Background
Although we have checks pre-configured in the package, a user may want a specific check that is yet to be created, or a check tailored to their personal needs. Without a way to add such checks, users might pass on the package entirely, as it won't fully answer their needs.
Considerations
Creating a check at runtime should be as easy as possible, while still answering the needs of both the user and the package (output-wise and interface-wise)
Checks that are created on the fly should declare whether they are data checks or model checks, as that changes some of the behaviour
User-created checks need to be fully documented with examples, as this may become a key factor in a user's decision to adopt the package
Proposal Concept
Allow users, whether working in an ipynb or a py file, to create a check at runtime (or 'on the fly')
That way, even if the package does not answer all of the user's needs, it might still answer enough, while allowing the user to fully customize the specifics.
Thought on implementations
At most, all checks have the following:
a "check", which actually validates against the data/model
a "validation", which makes sure that the input data is correct
an "output", which depends on future implementations and changes, and needs to correlate with the global package usage
As such, I feel the main option is to create a class (let's call it CustomCheck for this thought). CustomCheck should have a function that returns a check, which can later be injected into suites, or run independently.
Our CustomCheck may look a bit like this:

```python
## imports
# CustomCheck should inherit from a base class such as CustomDatasetBaseCheck
from mlchecks.base.check import CheckResult, CustomDatasetBaseCheck
...

class CustomCheck(CustomUserCheck):
    # variable to hold the "requires" field
    # variable to hold the "checkFunction" field
    # variable to know if we need to parse the output or not

    # basic usage functions
    def new(*, checkFunction, requires):
        # Returns an initialized CustomCheck based on the function requirements.
        # The "requires" param should change the class behaviour in terms of
        # validation and output, as they differ by the input requirements.
        # A param to know if we should parse the output of the check, or the
        # user wants to do it themselves (if it's a plot or something complex).
        ...

    def run(self, dataset=None, additional_dataset=None, model=None) -> CheckResult:
        # Based on the "requires" param, call the appropriate function in
        # mlchecks.base.check to validate the inputs.
        output = checkFunction(dataset=dataset, additional_dataset=additional_dataset, model=model)
        # If needed, based on the "requires" param, call the appropriate
        # function in mlchecks.base.check to "output" the data.
```
The vision of implementation by the user
As a basic check, let's assume a user wants to check the row count:
```python
### imports
from mlchecks.checks import CustomCheck
...

def checkRowCount(dataset: Union[pd.DataFrame, Dataset]):
    # assumes run() wraps a plain DataFrame into a Dataset before calling the check
    return len(dataset.data)

rowCountCheck = CustomCheck.new(
    checkFunction=checkRowCount,
    requires=CustomCheck.SINGLE_DATASET,  # TWO_DATASETS / DATASET_MODEL / MODEL, etc.
)  # in the future we may add "output_format=html/json/yaml/cli"

data = {'col1': ['foo', 'bar', 'cat']}
dataframe = pd.DataFrame(data=data)
rowCountCheck.run(dataframe)
```
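To make the proposal concrete, here is a minimal, self-contained sketch of how the proposed API could behave end to end. The mlchecks internals are replaced by stand-ins: `CheckResult` here is a plain wrapper, and `SINGLE_DATASET`, `new`, and `run` follow the hypothetical names used in the example above rather than any existing mlchecks API.

```python
from dataclasses import dataclass
from typing import Any, Callable

import pandas as pd


@dataclass
class CheckResult:
    """Stand-in for mlchecks' CheckResult: just wraps the raw value."""
    value: Any


class CustomCheck:
    """Sketch of the proposed CustomCheck factory (hypothetical API)."""
    SINGLE_DATASET = 'single_dataset'  # placeholder for the "requires" enum

    def __init__(self, check_function: Callable, requires: str):
        self.check_function = check_function
        self.requires = requires

    @classmethod
    def new(cls, *, checkFunction: Callable, requires: str) -> 'CustomCheck':
        # Returns an initialized CustomCheck based on the function requirements
        return cls(checkFunction, requires)

    def run(self, dataset=None) -> CheckResult:
        # Validation step: for SINGLE_DATASET, require a DataFrame input
        if self.requires == self.SINGLE_DATASET and not isinstance(dataset, pd.DataFrame):
            raise TypeError('SINGLE_DATASET check requires a DataFrame')
        return CheckResult(self.check_function(dataset=dataset))


def check_row_count(dataset: pd.DataFrame) -> int:
    return len(dataset)


row_count_check = CustomCheck.new(
    checkFunction=check_row_count, requires=CustomCheck.SINGLE_DATASET
)
result = row_count_check.run(dataset=pd.DataFrame({'col1': ['foo', 'bar', 'cat']}))
print(result.value)  # 3
```

The design point is that the `requires` value drives the validation step inside `run`, so the user-supplied function only ever sees already-validated inputs.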
Changes that will be required
there should be global validation functions for the input params (which should also be used by the internal checks)
there should be a global output function for the check output (which should also be used by the internal checks)
the two points above will force changes to all the internal checks
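A shared validation helper of the kind described above could look like the following sketch. The helper name and behaviour are illustrative assumptions, not mlchecks' actual API; the idea is simply that both internal and custom checks funnel their inputs through one place.

```python
import pandas as pd


def validate_dataset(obj) -> pd.DataFrame:
    """Hypothetical shared validation helper: accept a DataFrame, or any
    Dataset-like object exposing `.data` as a DataFrame, and return the
    underlying DataFrame. Raise a clear error for anything else."""
    if isinstance(obj, pd.DataFrame):
        return obj
    data = getattr(obj, 'data', None)
    if isinstance(data, pd.DataFrame):
        return data
    raise TypeError(f'Expected a DataFrame or Dataset, got {type(obj).__name__}')


df = validate_dataset(pd.DataFrame({'a': [1, 2]}))
```

With a helper like this, `checkRowCount` in the earlier example could safely call `len(validate_dataset(dataset))` regardless of whether it received a raw DataFrame or a Dataset wrapper.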
EDIT:
We've thought about allowing the user to "choose" whether we should parse the output of the check (if it's something simple like a dataset, etc.) or whether they want to add this logic to the check itself (which will make it easier for them to copy-paste their existing code almost 1:1).
Since this proposal we've had some thoughts on the matter; trying to outline some of them:
The tricky part is how the user generates a display. The current example only shows how to return a value. We need the user to be able to return a display object, which (as we do it now) is either text, a dataframe, or a callable (that plots with matplotlib). Doing only this - adding a custom check that prints something and nothing more - is the most basic part and should be easy.
Additional features, such as having a return value and defining a condition on it, could be more complicated. The logic being: if you're implementing a check and a condition, then it's for CI/CD-ish purposes, and you can and should invest more time doing it right.
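The three display kinds mentioned above (text, dataframe, callable) could be handled uniformly with a result holder like this sketch. The class and its `show` method are hypothetical names for illustration, not the existing mlchecks display mechanism.

```python
import pandas as pd


class SketchCheckResult:
    """Illustrative result holder whose `display` list may mix text,
    DataFrames, and callables (e.g. matplotlib plotting functions)."""

    def __init__(self, value, display=None):
        self.value = value
        self.display = display or []

    def show(self):
        for item in self.display:
            if callable(item):
                item()  # e.g. a function that draws a matplotlib figure
            elif isinstance(item, pd.DataFrame):
                print(item.to_string())
            else:
                print(str(item))


result = SketchCheckResult(
    value=3,
    display=['Row count: 3', pd.DataFrame({'col1': ['foo', 'bar', 'cat']})],
)
result.show()
```

Dispatching on the item type keeps the custom-check author's side simple: they append whatever they have to `display`, and rendering stays the package's responsibility.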
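A return value with a condition attached could be sketched as follows. All names here are hypothetical, chosen to match the CI/CD use case described above rather than any existing mlchecks interface.

```python
from typing import Callable


class ConditionedCheck:
    """Sketch: a check function paired with a pass/fail condition on its value."""

    def __init__(self, check_fn: Callable, condition: Callable[[object], bool],
                 condition_name: str):
        self.check_fn = check_fn
        self.condition = condition
        self.condition_name = condition_name

    def run(self, dataset):
        value = self.check_fn(dataset)
        passed = self.condition(value)
        return {'value': value, 'condition': self.condition_name, 'passed': passed}


check = ConditionedCheck(
    check_fn=len,
    condition=lambda n: n >= 2,
    condition_name='at least 2 rows',
)
outcome = check.run([{'col1': 'foo'}, {'col1': 'bar'}, {'col1': 'cat'}])
# outcome['passed'] is True; a CI job could fail whenever it is False
```

Separating the condition from the check itself means the same check can be reused with different thresholds per pipeline.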
DEE-206