Add custom exceptions for calling and validating user models #520
Conversation
Codecov Report
@@ Coverage Diff @@
## master #520 +/- ##
==========================================
+ Coverage 82.34% 82.49% +0.14%
==========================================
Files 76 77 +1
Lines 10334 10400 +66
==========================================
+ Hits 8510 8579 +69
+ Misses 1824 1821 -3
Unrelated to the content of this PR, but looks like the …
Nice. Is there a way to wire up exceptions as well for cases where we are not doing the dummy call, e.g. KernelShap? It feels like we are stuck with the dummy call approach longer term, which is fine, but just highlighting that if we want to get rid of it we might consider doing the checks differently.
@sakoush good point, but I don't think there is another way to validate the user callable without doing some kind of dummy call. Since we're doing it for …
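To make the pattern concrete, here is a minimal sketch of the dummy-call validation being discussed; `validate_predictor` is a hypothetical helper name, and the error message mirrors the one in the `AnchorImage` snippet just below:

```python
import numpy as np

from alibi.exceptions import AlibiPredictorCallException


def validate_predictor(predictor, x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: call the user model once on dummy data and
    re-raise any failure as a well-defined alibi exception."""
    try:
        return predictor(x)
    except Exception as e:
        raise AlibiPredictorCallException(
            f"Predictor failed to be called on {type(x)} of shape {x.shape} "
            f"and dtype {x.dtype}."
        ) from e
```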
```python
x = np.zeros((1,) + self.image_shape, dtype=self.dtype)
try:
    prediction = predictor(x)
except Exception as e:
    msg = f"Predictor failed to be called on {type(x)} of shape {x.shape} and dtype {x.dtype}. " \
          f"Check that the parameter `image_shape` is correctly specified."
    raise AlibiPredictorCallException(msg) from e
```
Another point here is that in mlserver / alibi explainer server v1 we are wrapping a `predict_fn` around a REST call, so there might be other exceptions raised by the connection; I am not sure how we would identify the difference here. Interestingly, in the mlserver runtime it will raise this exception. I'm not sure about the interop between libraries and how to deal with exceptions in this case.
Good point, there must be a way around this... I guess the issue is that with these semantics `alibi` doesn't know what can go wrong when calling a user model, so all exceptions are caught and re-raised with an error that assumes something about the model is misspecified. But clients such as `mlserver` would like to distinguish between these types of errors and network errors.
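One way a caller could make that distinction, given that `alibi` re-raises with `raise ... from e` so the original error survives as `__cause__` — a sketch only, with a stand-in predictor:

```python
import numpy as np

from alibi.explainers import AnchorImage
from alibi.exceptions import AlibiPredictorCallException


def predictor(x: np.ndarray) -> np.ndarray:
    # stand-in for a user model; in mlserver this would wrap a REST call
    return np.zeros((x.shape[0], 10))


try:
    explainer = AnchorImage(predictor, image_shape=(28, 28, 1))
except AlibiPredictorCallException as e:
    cause = e.__cause__  # the original exception from inside the predictor call
    if isinstance(cause, ConnectionError):
        ...  # a network-level failure from a remote predictor wrapper
    else:
        ...  # anything else points at a misspecified model
```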
Can we test the behaviour of this? I believe it could still work as-is, since the remote request shouldn't raise a network error but rather return an error code; in that case, if an error is raised it must have come from the model internals.
I am raising `ValueError` and here is what we get:
args = ()
kwargs = {'predictor_url': 'dummy_call', 'v2_payload': InferenceRequest(id=None, parameters=Parameters(content_type='np'), inpu... 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))], outputs=None)}
loop = <_UnixSelectorEventLoop running=False closed=False debug=False>
res = InferenceResponse(model_name='custom_tf_mnist_model', model_version=None, id=None, parameters=None, outputs=[ResponseO...274435043, 0.10133041441440582, 0.09029622375965118, 0.11116134375333786, 0.07727331668138504, 0.09395473450422287]))])
def mock_predict(*args, **kwargs):
# note: sometimes the event loop is not running and in this case
# we create a new one otherwise
# we use the existing one.
# this mock implementation is required as we dont want to spin up a server,
# we just use MLModel.predict
try:
loop = asyncio.get_event_loop()
res = loop.run_until_complete(
custom_runtime_tf.predict(kwargs["v2_payload"])
)
raise ValueError("break")
return res
except Exception:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
res = loop.run_until_complete(
custom_runtime_tf.predict(kwargs["v2_payload"])
)
> raise ValueError("break")
E ValueError: break
conftest.py:161: ValueError
The above exception was the direct cause of the following exception:
args = ()
kwargs = {'anchor_image_directory': PosixPath('/home/sa/code/MLServer/runtimes/alibi-explain/tests/data/mnist_anchor_image'), 'custom_runtime_tf': <helpers.tf_model.TFMNISTModel object at 0x7f6936454f50>}
loop = <_UnixSelectorEventLoop running=False closed=False debug=False>
setup = <function pytest_fixture_setup.<locals>.wrapper.<locals>.setup at 0x7f6a84016830>
def wrapper(*args, **kwargs):
loop = fixture_stripper.get_and_strip_from(
FixtureStripper.EVENT_LOOP, kwargs
)
async def setup():
res = await coro(*args, **kwargs)
return res
> return loop.run_until_complete(setup())
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/pytest_asyncio/plugin.py:160:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/nest_asyncio.py:70: in run_until_complete
return f.result()
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/asyncio/futures.py:181: in result
raise self._exception
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/asyncio/tasks.py:249: in __step
result = coro.send(None)
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/pytest_asyncio/plugin.py:157: in setup
res = await coro(*args, **kwargs)
conftest.py:177: in anchor_image_runtime_with_remote_predict_patch
await rt.load()
../mlserver_alibi_explain/runtime.py:165: in load
return await self._rt.load()
../mlserver_alibi_explain/explainers/black_box_runtime.py:43: in load
self._model = self._load_from_uri(self._infer_impl)
../mlserver_alibi_explain/runtime.py:91: in _load_from_uri
return load_explainer(uri, predictor=predictor)
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/alibi/saving.py:68: in load_explainer
return load_fn(path, predictor, meta)
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/alibi/saving.py:155: in _load_AnchorImage
explainer.reset_predictor(predictor)
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/alibi/explainers/anchor_image.py:643: in reset_predictor
self.predictor = self._transform_predictor(predictor)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = AnchorImage(meta={
'name': 'AnchorImage',
'type': ['blackbox'],
'explanations': ['local'],
'params': {
... 'images_background': None,
'segmentation_fn': 'slic'}
,
'version': '0.6.2dev'}
)
predictor = <bound method AlibiExplainBlackBoxRuntime._infer_impl of <mlserver_alibi_explain.explainers.black_box_runtime.AlibiExplainBlackBoxRuntime object at 0x7f6935b20ad0>>
def _transform_predictor(self, predictor: Callable) -> Callable:
# check if predictor returns predicted class or prediction probabilities for each class
# if needed adjust predictor so it returns the predicted class
x = np.zeros((1,) + self.image_shape, dtype=self.dtype)
try:
prediction = predictor(x)
except Exception as e:
msg = f"Predictor failed to be called on {type(x)} of shape {x.shape} and dtype {x.dtype}. " \
f"Check that the parameter `image_shape` is correctly specified."
> raise AlibiPredictorCallException(msg) from e
E alibi.exceptions.AlibiPredictorCallException: Predictor failed to be called on <class 'numpy.ndarray'> of shape (1, 28, 28, 1) and dtype float32. Check that the parameter `image_shape` is correctly specified.
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/alibi/explainers/anchor_image.py:630: AlibiPredictorCallException
@sakoush unless I'm misunderstanding, the above result looks good? As you are successfully raising the new `alibi` exception?
I believe the point is that `mlserver` would like to catch specific errors, e.g. `ValueError`, from the scope calling the predictor, but because inside `alibi` all `Exception`s are caught and re-raised as `AlibiPredictorCallException`, this is not possible? Would be good to clarify again what the goals here are.
@sakoush wouldn't this be solved by doing `except ValueError` instead? `mlserver` shouldn't really be catching bare exceptions, and if it has error handling for both `ValueError` and `AlibiPredictorCallException` we should be good to go.
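Concretely, the suggested handling on the `mlserver` side might look something like this (a sketch only; `load_explainer` is the entry point visible in the traceback above):

```python
from alibi.exceptions import AlibiPredictorCallException
from alibi.saving import load_explainer

try:
    # `uri` and `predictor` as in the traceback above
    explainer = load_explainer(uri, predictor=predictor)
except ValueError:
    ...  # an error raised directly in mlserver's own wrapper code
except AlibiPredictorCallException as e:
    # alibi's dummy-call validation failed; the predictor's original error
    # (possibly itself a ValueError) is chained as e.__cause__
    ...
```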
We are not actually capturing exceptions; what we do in `mlserver` when we get an error from calling the endpoint for `predict_fn` is:

```python
if response_raw.status_code != 200:
    raise RemoteInferenceError(response_raw.status_code, response_raw.reason)
```
We probably can be more specific about the error codes, but I don't think the statement `alibi.exceptions.AlibiPredictorCallException: Predictor failed to be called on <class 'numpy.ndarray'> of shape (1, 28, 28, 1) and dtype float32. Check that the parameter image_shape is correctly specified.` is generally valid. I need to think more about this from my side; basically, how can we propagate exceptions across processes?
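On propagating exceptions across processes, one common pattern (purely illustrative — this is not mlserver's actual API) is for the server to encode the exception type in the error payload so the client can map it back to a Python exception:

```python
import requests  # assumption: the remote predict_fn call goes over HTTP

from alibi.exceptions import AlibiPredictorCallException

# hypothetical mapping from server-reported error types to client-side exceptions
EXCEPTION_MAP = {
    "ValueError": ValueError,
    "AlibiPredictorCallException": AlibiPredictorCallException,
}


def remote_predict(url: str, payload: dict) -> dict:
    response_raw = requests.post(url, json=payload)
    if response_raw.status_code != 200:
        # re-raise the server-side error as the matching client-side exception
        body = response_raw.json()
        exc_cls = EXCEPTION_MAP.get(body.get("error_type"), RuntimeError)
        raise exc_cls(body.get("error_message", response_raw.reason))
    return response_raw.json()
```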
@jklaise these look nice to me. I'm assuming we'd add more exceptions as we go?
Yes, we can also create a more granular hierarchy if needed in the future so callers can catch more general exceptions.
I will go ahead with merging this unless there are any further comments? @ascillitoe @sakoush
Looks good!
This PR introduces a new public `alibi.exceptions` module defining a (small) hierarchy of general exceptions to be used across `alibi`. The motivation is to provide nicer messages when things go wrong and also well-defined exceptions which can be handled by the caller. This could be considered an adjacent building block towards #516.

The initial application of this is twofold (see the sketch after the list):

- Raising `AlibiPredictorCallException` when calling user-provided models with dummy data (currently `AnchorTabular`, `AnchorText`, `AnchorImage`).
- Raising `AlibiPredictorReturnTypeError` if the return type of the user model is not the expected `np.ndarray`.
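For orientation, a minimal sketch of what the new module could look like — the exception names are from this PR, but the exact base-class structure shown here is an assumption:

```python
# alibi/exceptions.py -- illustrative sketch only; the real module may differ
from abc import ABC


class AlibiException(Exception, ABC):
    """Assumed common base class so callers can catch all alibi errors at once."""


class AlibiPredictorCallException(AlibiException):
    """Raised when calling a user-provided model with dummy data fails."""


class AlibiPredictorReturnTypeError(AlibiException):
    """Raised when a user-provided model returns an unexpected type."""
```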
The first use case now clearly defines to the caller what error to expect and when it is raised. I've raised this when the call to a user-provided model fails during `__init__`, but not in other methods, as it could be argued that in `fit` and `explain` we are calling the model with user-provided data (and perturbations thereof) so it should work fine (if there is a compelling reason to introduce `try/except` on calling the user model elsewhere in the code, that can be done later).

The second use case is: if a call succeeds, we can check whether the return is of the expected type so an error can be raised early if it is not. I opted for an explicit `isinstance` check rather than something of the EAFP style `try: prediction.shape`. This has the side effect that we tighten the expected return type to be exactly `np.ndarray`, as opposed to anything that implements a `.shape` attribute (for example `tensorflow` tensors). I don't think this is a big deal, as other tensor/array types that may implement `.shape` should not work with `alibi` currently anyway, so it's good to tighten up the requirements at runtime.
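The two styles being contrasted, as a quick illustrative sketch (`check_return_type` is a hypothetical helper, not the PR's actual code):

```python
import numpy as np

from alibi.exceptions import AlibiPredictorReturnTypeError


def check_return_type(prediction) -> None:
    # Explicit isinstance check (the approach taken here): the return type
    # must be an np.ndarray.
    if not isinstance(prediction, np.ndarray):
        raise AlibiPredictorReturnTypeError(
            f"Expected predictor to return np.ndarray, got {type(prediction)}."
        )

# The rejected EAFP alternative would have probed for the attribute instead:
#     try:
#         prediction.shape
#     except AttributeError:
#         ...
# which would also accept e.g. tensorflow tensors that implement `.shape`.
```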
The PR should be ready to review as-is, with a few minor points outstanding:

- There are a few other explainers which call the `predictor` at `__init__` with dummy data (e.g. `Counterfactual`) which should be treated the same way. EDIT: this could be handled in a follow-up PR to keep this PR's discussion focused on the overall approach.
- `sphinx` directives to link to the custom exception definitions in the code.