Config driven detectors - part 3 #469
Conversation
The model …

Note: One side effect of the current …

alibi-detect/alibi_detect/utils/fetching.py, lines 82 to 86 in 6b5b2c7

… but in general, it should be capable of passing custom objects/functions (e.g. for custom layers) to TensorFlow models. This introduces another challenge: how do we save (and load) arbitrary custom objects stored inside …?

Note: This still needs a little more thought. It is also going to be a point of difference between TensorFlow and PyTorch models.
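For context, the snippet below is a generic Keras illustration (not code from this PR) of why custom objects complicate saving and loading: a model that uses a custom function or layer cannot be reloaded unless that object is passed back in via `custom_objects`. The function name `my_activation` and the file path are purely hypothetical.

```python
import tensorflow as tf

def my_activation(x):
    # A user-defined function baked into the model architecture.
    return x * tf.nn.sigmoid(x)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation=my_activation, input_shape=(4,))
])
model.save("my_model.h5")

# Reloading fails with "Unknown activation function" unless the custom
# object is supplied explicitly:
loaded = tf.keras.models.load_model(
    "my_model.h5", custom_objects={"my_activation": my_activation}
)
```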
doc/source/overview/config_files.md (outdated)

%## Advanced usage

### Validating config files
(validation)=
## Validating config files
This means the whole section is commented out?
I've just removed the "Advanced usage" header, and promoted "Validating config files", since we didn't have any other level 3 headings under "Advanced usage". Can change in future if we add more "Advanced topics".
This has been fixed by changing to …
I've just run this on …
I've added something to delete all new files (not ones already in …).
LGTM
LGTM, but can you remind me if changes to imports using alibi.saving/loading, as opposed to alibi.utils.saving/loading, were already made in the example notebooks?
Thanks, I had missed those. I've now updated all mentions and uses to …
Main config driven save/load functionality, testing, and docs.
This is the third part of a series of PRs for the config-driven detector functionality. The original PR (#389) has been split into a number of smaller PRs to aid the review process.
Summary of PR
This PR implements the main save and load functionality, and related docs and testing. For more details on the overall config-based save/load strategy, refer to the original PR #389.
Details
The mainstay of this PR is contained in `utils/saving.py` and the newly created `utils/loading.py`. Additional details:

- `save_detector` and `load_detector`, both of which have been reworked to write/read `config.toml` files (in the case of drift detectors); a usage sketch is given below this list. Other detectors are still saved to the legacy `.dill` format, and support is retained for all detectors to read these legacy files (to avoid having to regen all remote artifacts immediately).
- Loading functionality has been moved from `utils/saving.py` to `utils/loading.py`, since the loading submodule is now larger and is expected to become larger still in the future. A deprecation warning is raised (but it still works) when `from alibi_detect.utils.saving import load_detector` is called.
- Parts of `saving.py` and `loading.py` have been factored out into `tensorflow/_saving.py` etc., in preparation for the soon-to-be-built PyTorch/sklearn save/load functionality. This also means the file-wide `type: ignore`'s can be removed.
- Other functionality has also been moved into `tensorflow/_saving.py` and `tensorflow/_loading.py`, since in reality this was all tensorflow-specific.

Fine details will be given in code comments below.
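As a rough usage sketch of the reworked public API described above (illustrative only: the `KSDrift` detector, data, and paths are arbitrary examples, and the `utils.loading` import path follows the move described in this PR; per the note above, importing `load_detector` from `alibi_detect.utils.saving` still works but raises a deprecation warning):

```python
import numpy as np
from alibi_detect.cd import KSDrift
from alibi_detect.utils.saving import save_detector
from alibi_detect.utils.loading import load_detector

x_ref = np.random.randn(100, 10).astype(np.float32)
cd = KSDrift(x_ref, p_val=0.05)

# For drift detectors this now writes a config.toml (plus any referenced artefacts).
save_detector(cd, "./my_detector")

# Reconstructs the detector from the config; legacy .dill files can still be read.
cd = load_detector("./my_detector")
```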
Outstanding decisions
- Currently the top-level `backend` config field is used to declare the `backend` for backend-specific detectors. But it is also used to set the expected backend for all preprocessing models and kernels etc. Is this OK (at least for now), or do we want specific flags in the `model` (or `preprocess_fn`) configs? Part of this decision might depend on whether we ever envisage one library being used for the backend whilst another is used for preprocessing. I can't see this being sensible for PyTorch/TensorFlow, but perhaps for sklearn? - To be addressed in a subsequent PR.
- At the moment the following functions are public (a usage sketch follows this list):
  - `save_detector`/`load_detector` - to save/load a detector
  - `write_config`/`read_config` - to write/read a `config.toml` from/into a config dict
  - `validate_config` - to validate a config dict

  All other functions in `saving.py`/`loading.py` are private, and the `tensorflow/_saving.py` etc. are also private. A number of functions exist to save/load artefact configs, e.g. `_load_model_config`. Making these public could be useful for some users, e.g. so the `model` section of a `config.toml` could be loaded in isolation for debugging, or written from a runtime model to assist with `config.toml` generation. However, making these public will hinder future code changes, so I'm inclined to leave them private until the config functionality is more stable?
- The current options for model type (e.g. `'UAE', 'HiddenOutput', 'custom'`) have been carried over from the legacy save/load code. We could do with rethinking what is required here.
- Find an example where `custom_objects` is needed, and investigate how these objects are defined. It might be easier to remove support for this for now. - Removed for now, will be added back in a later PR.
- In `resolve_cfg`, we only attempt to resolve fields in the `config.toml` which are listed in `FIELDS_TO_RESOLVE`. This has the advantage of avoiding infinite recursion, and also allows us to easily tell downstream deps (e.g. MLServer etc.) what fields could potentially point to artefacts that will need resolving (such a list has been requested before). However, it is messy, and complicates the resolution of more generic containers such as `custom_objects`. If we assume that only a validated config will be passed to `resolve_cfg` (so we are certain of its structure), I'm wondering if we should go back to more generic recursion here? - Left for now. Can change later if there is a good reason.
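A hedged sketch of how the other public helpers listed above might fit together (the module paths and exact signatures here are assumptions based on this PR's description rather than confirmed API):

```python
from alibi_detect.utils.saving import write_config
from alibi_detect.utils.loading import read_config, validate_config

cfg = read_config("./my_detector/config.toml")  # config.toml -> (unresolved) config dict
cfg = validate_config(cfg)                      # validate the config dict (pydantic schemas)
print(cfg["backend"])                           # the top-level backend field discussed above
write_config(cfg, "./my_detector")              # config dict -> config.toml
```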
Post-PR TODOs (to be consolidated into issues)

- Add `custom_obj` back in. See above bullet, and Config driven detectors - part 3 #469 (comment).
- Specify `backend` for preprocessing and models separately? See above, and Config driven detectors - part 3 #469 (comment).
- `registry` submodule. See Config driven detectors - part 3 #469 (comment).
- Use an `enum` for `backend` etc. See Config driven detectors - part 3 #469 (comment).
- Improve the `alibi_detect.utils.schemas` api page, either with custom templates or using autodoc-pydantic.
- Remove `undoc-members`. This will require docstrings to be added to public objects that are currently missing docstrings.
- Run `isort` on the entire code base, and consider adding a check to CI.
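Purely as an illustration of the `enum` for `backend` TODO above (not code from this PR; the class name and values are hypothetical):

```python
from enum import Enum

class Backend(str, Enum):
    # Candidate backend values; enum membership replaces free-form strings.
    TENSORFLOW = "tensorflow"
    PYTORCH = "pytorch"
    SKLEARN = "sklearn"

assert Backend("tensorflow") is Backend.TENSORFLOW
```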