Is your feature request related to a problem? Please describe.
Some industries have strict regulations on retaining sensitive client data: how it is stored, who can access it, and so on. Some checks keep small samples of the data as part of their output (both result and display), e.g. TrainTestSamplesMix. This can conflict with those regulations, because model metrics are stored in, and can be accessed from, systems that do not comply with all of the restrictions.
Describe the solution you'd like
Provide a boolean flag "log_example_data" for each relevant check that can be set at both the check level and the suite level.
It turns out most checks already have a parameter 'n_to_show' that makes it possible to suppress data in the display. I opened PR #2337 to support this in the TrainTestSamplesMix check as well (the only one I could find that did not have it).
However, I'm still unsure how to address this in general, for instance how to make sure that the result (not just the display) of TrainTestSamplesMix does not contain data (currently it does). Taking the same approach as for the displays would introduce a new parameter in the context and a lot of if-else logic in every check, which might be hard to maintain.
Hi @MichaelMarien, admittedly I don't have a great solution for this, and I agree that adding a new parameter will add significant complexity / technical debt. I think that perhaps for now the best solution for such a case is for the user to anonymize their data prior to running the checks.
One solution we could consider is adding a function to deepchecks that anonymizes two datasets automatically, in a way that still allows the datasets to be compared with each other. If you're interested in implementing something like this, I'd be glad to discuss.
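To make the idea more concrete, here is a minimal sketch of what such a helper might look like. It is not part of deepchecks; the names `make_pseudonymizer` and `anonymize_pair` are hypothetical, and rows are represented as plain dicts to keep the example self-contained. It replaces values in the sensitive columns with keyed-hash tokens, using the same key for both datasets so that identical values still match across train and test.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer(key: bytes, digest_chars: int = 16):
    """Return a function that maps a value to a stable keyed-hash token."""

    def pseudonymize(value) -> str:
        # Include the type name so that 1 and "1" get different tokens.
        message = f"{type(value).__name__}:{value!r}".encode("utf-8")
        return hmac.new(key, message, hashlib.sha256).hexdigest()[:digest_chars]

    return pseudonymize


def anonymize_pair(train_rows, test_rows, sensitive_columns, key=None):
    """Pseudonymize the sensitive columns of two datasets with one shared key.

    Hypothetical helper: because both datasets use the same key, equal
    values map to equal tokens, so cross-dataset comparisons still work.
    """
    if key is None:
        # A fresh random key per call; discard it to make tokens unlinkable later.
        key = secrets.token_bytes(32)
    pseudonymize = make_pseudonymizer(key)

    def transform(rows):
        return [
            {
                col: pseudonymize(val) if col in sensitive_columns else val
                for col, val in row.items()
            }
            for row in rows
        ]

    return transform(train_rows), transform(test_rows)


train = [{"name": "Alice", "age": 34}, {"name": "Bob", "age": 51}]
test = [{"name": "Alice", "age": 34}, {"name": "Carol", "age": 29}]
anon_train, anon_test = anonymize_pair(train, test, {"name"})
```

Note that hashing only preserves equality, so this approach suits identifiers and categorical values. Applying it to numeric features would destroy their ordering and distribution, which drift-style checks rely on.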