Add custom exceptions for calling and validating user models #520

Merged — 4 commits merged into master on Nov 16, 2021

Conversation

@jklaise (Member) commented Nov 3, 2021

This PR introduces a new public alibi.exceptions module defining a (small) hierarchy of general exceptions to be used across alibi. The motivation is to provide nicer messages when things go wrong and also well-defined exceptions which can be handled by the caller. This could be considered an adjacent building block towards #516.
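For orientation, a minimal sketch of what such a module could contain; only AlibiPredictorCallException and AlibiPredictorReturnTypeError appear in this PR, and the base-class name AlibiException is an assumption for illustration, not necessarily what the PR defines:

    # Hypothetical sketch of alibi/exceptions.py; the actual definitions in the PR may differ.

    class AlibiException(Exception):
        """Assumed common base class so callers can catch any alibi-specific error."""


    class AlibiPredictorCallException(AlibiException):
        """Raised when calling the user-supplied predictor with dummy data fails."""


    class AlibiPredictorReturnTypeError(AlibiException):
        """Raised when the predictor call succeeds but returns an unexpected type."""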

The initial application of this is twofold:

  • Catching errors and re-raising our own AlibiPredictorCallException when calling user-provided models with dummy data (currently AnchorTabular, AnchorText, AnchorImage)
  • Validating that the output of the call (if successful) is of the expected type, and raising AlibiPredictorReturnTypeError if not

The first use case clearly defines to the caller what error to expect and when it is raised. I raise it when the call to a user-provided model fails during __init__, but not in other methods: it could be argued that in fit and explain we are calling the model with user-provided data (and perturbations thereof), so it should work fine. If there is a compelling reason to introduce try/except on calling the user model elsewhere in the code, that can be done later.

The second use case is that, if the call succeeds, we can check whether the return value is of the expected type, so an error can be raised early if it is not. I opted for an explicit isinstance check rather than something in the EAFP style (try: prediction.shape). This has the side effect of tightening the expected return type to exactly np.ndarray, as opposed to anything that implements a .shape attribute (for example, tensorflow tensors). I don't think this is a big deal, as other tensor/array types that implement .shape should not work with alibi currently anyway, so it's good to tighten up the requirements at runtime.
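For concreteness, a hedged sketch of the call-and-validate pattern described above, adapted from the _transform_predictor excerpt quoted later in this thread; the standalone helper name and the exact return-type error message are illustrative, not necessarily what the PR implements:

    import numpy as np

    from alibi.exceptions import AlibiPredictorCallException, AlibiPredictorReturnTypeError


    def check_predictor(predictor, image_shape, dtype=np.float32):
        # Call the user-supplied predictor on dummy data and re-raise any failure
        # as an alibi-specific exception so the caller knows exactly what went wrong.
        x = np.zeros((1,) + image_shape, dtype=dtype)
        try:
            prediction = predictor(x)
        except Exception as e:
            msg = (f"Predictor failed to be called on {type(x)} of shape {x.shape} and dtype {x.dtype}. "
                   f"Check that the parameter `image_shape` is correctly specified.")
            raise AlibiPredictorCallException(msg) from e
        # Explicit isinstance check (rather than EAFP on `.shape`) tightens the
        # expected return type to exactly np.ndarray.
        if not isinstance(prediction, np.ndarray):
            raise AlibiPredictorReturnTypeError(
                f"Expected the predictor to return a np.ndarray, got {type(prediction)} instead."
            )
        return prediction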

The PR should be ready to review as is with a few minor points outstanding:

  • There are a few other explainers which call the predictor at __init__ with dummy data (e.g. Counterfactual) and should be treated the same way. EDIT: this could be handled in a follow-up PR so that discussion here can focus on the overall approach.
  • In docstrings we should use the sphinx directives to link to the custom exception definitions in the code.

codecov bot commented Nov 3, 2021

Codecov Report

Merging #520 (6f60454) into master (d891152) will increase coverage by 0.14%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #520      +/-   ##
==========================================
+ Coverage   82.34%   82.49%   +0.14%     
==========================================
  Files          76       77       +1     
  Lines       10334    10400      +66     
==========================================
+ Hits         8510     8579      +69     
+ Misses       1824     1821       -3     
Impacted Files                                  Coverage Δ
alibi/exceptions.py                             100.00% <100.00%> (ø)
alibi/explainers/anchor_image.py                91.30% <100.00%> (+0.57%) ⬆️
alibi/explainers/anchor_tabular.py              78.75% <100.00%> (+0.92%) ⬆️
alibi/explainers/anchor_text.py                 86.93% <100.00%> (+0.31%) ⬆️
alibi/explainers/tests/test_anchor_image.py     97.93% <100.00%> (+0.26%) ⬆️
alibi/explainers/tests/test_anchor_tabular.py   72.82% <100.00%> (+1.72%) ⬆️
alibi/explainers/tests/test_anchor_text.py      93.15% <100.00%> (+0.39%) ⬆️

@jklaise (Member, Author) commented Nov 3, 2021

Unrelated to the content of this PR, but it looks like the tensorflow 2.6.1 release from 2 days ago now results in some errors when collecting tests (and potentially runtime errors...).

@sakoush (Member) left a comment

Nice. Is there a way to wire up exceptions as well for cases where we are not doing the dummy call, e.g. KernelShap?

It feels like we are stuck with the dummy-call approach for the longer term, which is fine, but I'm just highlighting that if we want to get rid of it we might want to do these checks differently.

@jklaise (Member, Author) commented Nov 4, 2021

@sakoush good point, but I don't think there is another way to validate the user callable without doing some kind of dummy call. Since we're already doing it for Anchor*, it makes sense to augment it with these exceptions. For KernelShap, if we wanted to validate the callable in the same way, we would have to make the dummy call... (One option, which I don't particularly like, is deferring validation until fit/explain where we use the user-supplied data, but this introduces complexity in the alibi code.)

    x = np.zeros((1,) + self.image_shape, dtype=self.dtype)
    try:
        prediction = predictor(x)
    except Exception as e:
@sakoush (Member) commented Nov 4, 2021

Another point here is that in mlserver / alibi explainer server v1 we wrap a REST call inside the predict_fn, so there might be other exceptions raised by the connection. I am not sure, though, how we would identify the difference here.

Interestingly, in the mlserver runtime it will raise this exception.

I'm not sure about the interop between the libraries and how to deal with exceptions in this case.

@jklaise (Member, Author) replied:

Good point, there must be a way around this... I guess the issue is that, with these semantics, alibi doesn't know what can go wrong when calling a user model, so all exceptions are caught and re-raised with an error that assumes something about the model is misspecified. But clients such as mlserver would like to distinguish between these kinds of errors and network errors.

@jklaise (Member, Author) commented Nov 4, 2021

Can we test the behaviour of this? I believe it could still work as is, since the remote request shouldn't raise a network error but rather return an error code, in which case, if an exception is raised, it must have come from the model internals.

@sakoush (Member) replied:

I am raising a ValueError and here is what we get:

args = ()
kwargs = {'predictor_url': 'dummy_call', 'v2_payload': InferenceRequest(id=None, parameters=Parameters(content_type='np'), inpu... 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))], outputs=None)}
loop = <_UnixSelectorEventLoop running=False closed=False debug=False>
res = InferenceResponse(model_name='custom_tf_mnist_model', model_version=None, id=None, parameters=None, outputs=[ResponseO...274435043, 0.10133041441440582, 0.09029622375965118, 0.11116134375333786, 0.07727331668138504, 0.09395473450422287]))])

    def mock_predict(*args, **kwargs):
        # note: sometimes the event loop is not running and in this case
        # we create a new one otherwise
        # we use the existing one.
        # this mock implementation is required as we dont want to spin up a server,
        # we just use MLModel.predict
        try:
            loop = asyncio.get_event_loop()
            res = loop.run_until_complete(
                custom_runtime_tf.predict(kwargs["v2_payload"])
            )
            raise ValueError("break")
            return res
        except Exception:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            res = loop.run_until_complete(
                custom_runtime_tf.predict(kwargs["v2_payload"])
            )
>           raise ValueError("break")
E           ValueError: break

conftest.py:161: ValueError

The above exception was the direct cause of the following exception:

args = ()
kwargs = {'anchor_image_directory': PosixPath('/home/sa/code/MLServer/runtimes/alibi-explain/tests/data/mnist_anchor_image'), 'custom_runtime_tf': <helpers.tf_model.TFMNISTModel object at 0x7f6936454f50>}
loop = <_UnixSelectorEventLoop running=False closed=False debug=False>
setup = <function pytest_fixture_setup.<locals>.wrapper.<locals>.setup at 0x7f6a84016830>

    def wrapper(*args, **kwargs):
        loop = fixture_stripper.get_and_strip_from(
            FixtureStripper.EVENT_LOOP, kwargs
        )
    
        async def setup():
            res = await coro(*args, **kwargs)
            return res
    
>       return loop.run_until_complete(setup())

/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/pytest_asyncio/plugin.py:160: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/nest_asyncio.py:70: in run_until_complete
    return f.result()
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/asyncio/futures.py:181: in result
    raise self._exception
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/asyncio/tasks.py:249: in __step
    result = coro.send(None)
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/pytest_asyncio/plugin.py:157: in setup
    res = await coro(*args, **kwargs)
conftest.py:177: in anchor_image_runtime_with_remote_predict_patch
    await rt.load()
../mlserver_alibi_explain/runtime.py:165: in load
    return await self._rt.load()
../mlserver_alibi_explain/explainers/black_box_runtime.py:43: in load
    self._model = self._load_from_uri(self._infer_impl)
../mlserver_alibi_explain/runtime.py:91: in _load_from_uri
    return load_explainer(uri, predictor=predictor)
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/alibi/saving.py:68: in load_explainer
    return load_fn(path, predictor, meta)
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/alibi/saving.py:155: in _load_AnchorImage
    explainer.reset_predictor(predictor)
/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/alibi/explainers/anchor_image.py:643: in reset_predictor
    self.predictor = self._transform_predictor(predictor)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = AnchorImage(meta={
  'name': 'AnchorImage',
  'type': ['blackbox'],
  'explanations': ['local'],
  'params': {
       ...           'images_background': None,
              'segmentation_fn': 'slic'}
            ,
  'version': '0.6.2dev'}
)
predictor = <bound method AlibiExplainBlackBoxRuntime._infer_impl of <mlserver_alibi_explain.explainers.black_box_runtime.AlibiExplainBlackBoxRuntime object at 0x7f6935b20ad0>>

    def _transform_predictor(self, predictor: Callable) -> Callable:
        # check if predictor returns predicted class or prediction probabilities for each class
        # if needed adjust predictor so it returns the predicted class
        x = np.zeros((1,) + self.image_shape, dtype=self.dtype)
        try:
            prediction = predictor(x)
        except Exception as e:
            msg = f"Predictor failed to be called on {type(x)} of shape {x.shape} and dtype {x.dtype}. " \
                  f"Check that the parameter `image_shape` is correctly specified."
>           raise AlibiPredictorCallException(msg) from e
E           alibi.exceptions.AlibiPredictorCallException: Predictor failed to be called on <class 'numpy.ndarray'> of shape (1, 28, 28, 1) and dtype float32. Check that the parameter `image_shape` is correctly specified.

/home/sa/miniconda3/envs/python3.7-mlserver/lib/python3.7/site-packages/alibi/explainers/anchor_image.py:630: AlibiPredictorCallException

A Contributor replied:

@sakoush unless I'm misunderstanding, the above result looks good? As you are successfully raising the new alibi exception?

@jklaise (Member, Author) replied:

I believe the point is that mlserver would like to catch specific errors, e.g. ValueError, from the scope calling the predictor, but because inside alibi all exceptions are caught and re-raised as AlibiPredictorCallException this is not possible? It would be good to clarify again what the goals are here.

@jklaise (Member, Author) replied:

@sakoush wouldn't this be solved by doing except ValueError instead? mlserver shouldn't really be catching bare exceptions, and if it has error handling for both ValueError and AlibiPredictorCallException we should be good to go.

@sakoush (Member) replied:

We are not actually capturing exceptions; what we do in mlserver when we get an error from calling the predict_fn endpoint is:

    if response_raw.status_code != 200:
        raise RemoteInferenceError(response_raw.status_code, response_raw.reason)

We probably can be more specific about the error codes, but I don't think that the message "alibi.exceptions.AlibiPredictorCallException: Predictor failed to be called on <class 'numpy.ndarray'> of shape (1, 28, 28, 1) and dtype float32. Check that the parameter image_shape is correctly specified." is generally valid. I need to think more about this on my side, basically about how we can propagate exceptions between processes.
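One possible way to keep the two failure modes distinguishable on the caller side: since alibi re-raises with `raise ... from e`, the original error stays attached as `__cause__`, so a wrapper in mlserver could branch on it. A hedged sketch under that assumption; `build_explainer` is a placeholder, and RemoteInferenceError refers to the class in the snippet above, not an import this sketch guarantees exists:

    from alibi.exceptions import AlibiPredictorCallException


    def load_explainer_safely(build_explainer):
        # `build_explainer` stands in for whatever constructs the alibi explainer
        # and therefore triggers the dummy predictor call discussed in this PR.
        try:
            return build_explainer()
        except AlibiPredictorCallException as e:
            cause = e.__cause__  # original exception preserved by `raise ... from e`
            if cause is not None and type(cause).__name__ == "RemoteInferenceError":
                # A connection / inference-endpoint problem rather than a misspecified model.
                raise RuntimeError("remote predict endpoint failed") from cause
            # Otherwise treat it as a genuinely misconfigured predictor.
            raise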

@ascillitoe (Contributor) commented:

@jklaise these look nice to me. I'm assuming we'd add more exceptions as we go? AlibiFitCallException etc?

@jklaise (Member, Author) commented Nov 8, 2021

> @jklaise these look nice to me. I'm assuming we'd add more exceptions as we go? AlibiFitCallException etc?

Yes, we can also create a more granular hierarchy if needed in the future so callers can catch more general alibi errors.
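As an illustration, catching the current exceptions together at the call site; with a shared base class (the AlibiException name assumed earlier in this thread) this would collapse to a single except clause. The toy predictor below is just a stand-in:

    import numpy as np

    from alibi.exceptions import AlibiPredictorCallException, AlibiPredictorReturnTypeError
    from alibi.explainers import AnchorImage


    def predictor(x: np.ndarray) -> np.ndarray:
        # Stand-in model returning fake class probabilities for each input.
        return np.tile([0.5, 0.5], (x.shape[0], 1))


    try:
        explainer = AnchorImage(predictor, image_shape=(28, 28, 1))
    except (AlibiPredictorCallException, AlibiPredictorReturnTypeError) as err:
        # With a common base class this could become `except AlibiException as err`.
        print(f"Could not initialise the explainer: {err}")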

@jklaise (Member, Author) commented Nov 15, 2021

I will go ahead and merge this unless there are any further comments? @ascillitoe @sakoush

@ascillitoe (Contributor) left a comment

Looks good!

@jklaise jklaise merged commit 00f7391 into SeldonIO:master Nov 16, 2021