[DEE-206] [feat] Allow Users to Create Checks On the Fly #120

Open
DanArlowski opened this issue Nov 10, 2021 · 1 comment
Labels
feature Feature update or code change to the package

Comments

@DanArlowski
Contributor

DanArlowski commented Nov 10, 2021

Background

The package ships with pre-configured checks, but a user may want a specific check that has not been created yet, or one tailored to their personal needs.
That gap might cause users to pass on the package altogether, since it won't fully answer their needs.

Considerations

  • Creating a check at runtime should be as easy as possible, while still answering the needs of both the user and the package (output-wise and interface-wise)
  • Checks that are created on the fly should declare whether they are data checks or model checks, as that changes some of the behaviour
  • User-created checks need to be fully documented with examples, as this may become a key factor in a user's decision to adopt the package

Proposal Concept

Allow users, whether working in an .ipynb notebook or a .py script, to create a check at runtime (or "on the fly").
That way, even if the package does not cover a user's whole needs, it can still cover enough while allowing the user to fully customize the specifics.

Thoughts on implementation

Broadly, all checks consist of the following:

  • a "check", which runs the actual logic against the data/model
  • a "validation", which makes sure the input data is correct
  • an "output", which depends on future implementations and changes, and needs to stay consistent with the global package usage

As such, I feel the main option is to create a class (let's call it CustomCheck for this thought).
CustomCheck should have a function that returns a check, which can later be injected into suites or run independently.

Our CustomCheck may look a bit like this:

...
# imports

# CustomDatasetBaseCheck is the proposed base class the package would expose
from mlchecks.base.check import CheckResult, CustomDatasetBaseCheck
...


class CustomCheck(CustomDatasetBaseCheck):
    # variable to hold the "requires" field
    # variable to hold the "checkFunction" field
    # variable to know if we need to parse the output or not
    # basic usage functions

    @classmethod
    def new(cls, *, checkFunction, requires):
        # Returns an initialized CustomCheck based on the function requirements.
        # The "requires" param should change the class behaviour in terms of
        # validation and output, as they differ by the input requirements.
        # An additional param could state whether we should parse the output of
        # the check or the user wants to do it (if it's a plot or something complex).
        ...

    def run(self, dataset=None, additional_dataset=None, model=None) -> CheckResult:
        # Based on the "requires" param, call the appropriate helper in
        # mlchecks.base.check to validate the inputs.
        output = self.checkFunction(dataset=dataset,
                                    additional_dataset=additional_dataset,
                                    model=model)
        # If needed, based on the "requires" param, call the appropriate helper
        # in mlchecks.base.check to "output" the data.
        return CheckResult(output)
The vision of implementation by the user

As a basic check, let's assume a user wants to check the row count:

# imports
from typing import Union

import pandas as pd

from mlchecks import Dataset
from mlchecks.checks import CustomCheck


def checkRowCount(dataset: Union[pd.DataFrame, Dataset]):
    # works both for a plain DataFrame and for an mlchecks Dataset
    data = dataset.data if isinstance(dataset, Dataset) else dataset
    return len(data)


rowCountCheck = CustomCheck.new(checkFunction=checkRowCount,
                                requires=CustomCheck.SINGLE_DATASET,  # TWO_DATASETS/DATASET_MODEL/MODEL, etc.
                                )  # in the future we may add "output_format=html/json/yaml/cli"

data = {'col1': ['foo', 'bar', 'cat']}
dataframe = pd.DataFrame(data=data)

rowCountCheck.run(dataframe)

Changes that will be required

  • there should be a global validation function for the input params (that should also be used by the internal checks); see the sketch after this list
  • there should be a global output function for the check output (that should also be used by the internal checks)
  • the two points above will force changes to all the internal checks
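
For illustration, a shared validation helper along the lines of the first bullet might look roughly like this (the helper names and the Dataset wrapping are assumptions for this sketch, not existing mlchecks API):

import pandas as pd

from mlchecks import Dataset


def validate_dataset(dataset):
    # Shared input validation (hypothetical helper): accept either a Dataset or a
    # plain DataFrame, and always hand a Dataset to the check logic.
    if isinstance(dataset, Dataset):
        return dataset
    if isinstance(dataset, pd.DataFrame):
        return Dataset(dataset)
    raise TypeError(f'Expected Dataset or DataFrame, got {type(dataset).__name__}')


def validate_model(model):
    # Shared model validation (hypothetical helper): minimal duck-typing,
    # the model must expose a predict method.
    if not hasattr(model, 'predict'):
        raise TypeError('Model must implement predict()')
    return model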

EDIT:

We've thought about allowing the user to "choose" whether we should parse the output of the check (if it's something simple like a dataset, etc.) or whether they want to add this logic to the check itself (which makes it easier for them to copy-paste their existing code almost 1:1).
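
A rough sketch of how that choice could look at the call site (the parse_output flag and the ready-made CheckResult return are hypothetical, following the idea above rather than any existing API; checkRowCount is the function from the example above):

from mlchecks.base.check import CheckResult
from mlchecks.checks import CustomCheck

# Option 1: the package parses the simple return value into its standard output
simpleCheck = CustomCheck.new(checkFunction=checkRowCount,
                              requires=CustomCheck.SINGLE_DATASET,
                              parse_output=True)  # hypothetical flag

# Option 2: the user keeps the output logic inside the check itself,
# which makes it easier to copy-paste existing code almost 1:1
def checkRowCountWithOutput(dataset):
    count = len(dataset.data)
    return CheckResult(count, display=[f'The dataset has {count} rows'])

customOutputCheck = CustomCheck.new(checkFunction=checkRowCountWithOutput,
                                    requires=CustomCheck.SINGLE_DATASET,
                                    parse_output=False)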

DEE-206

@noamzbr
Collaborator

noamzbr commented Nov 15, 2021

Since this proposal we've had some further thoughts on the matter; trying to outline some of them:

  1. The tricky part would be how the user generates a display. The current example only shows how to return a value. We need them to be able to return a display object, which (as we do it now) is either text, a dataframe, or a callable (that prints matplotlib). Doing only this - adding a custom check that prints something and nothing more - is the most basic part and should be easy (a rough sketch follows below).
  2. Additional functions, such as having a return value and defining a condition on it, could be more complicated. The logic being that if you're implementing a check & condition, then it's for CI/CD-ish purposes, and you can and should invest more time doing it right.
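
For illustration only, a user-defined check that produces a display might look roughly like this; the CheckResult(display=...) contract shown here just mirrors the description in point 1 and is an assumption about what the final API would accept:

import matplotlib.pyplot as plt

from mlchecks.base.check import CheckResult


def checkCategoryCounts(dataset):
    # value: a plain result that a condition could later be defined on
    counts = dataset.data['col1'].value_counts().to_frame('count')

    def plot_counts():
        # callable display: renders a matplotlib bar chart when the result is shown
        counts.plot.bar(legend=False)
        plt.title('Category counts for col1')

    # display mixes the three supported kinds: text, a dataframe and a callable
    return CheckResult(value=counts,
                       display=['Counts per category in col1:', counts, plot_counts])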

@ItayGabbay added the feature (Feature update or code change to the package) label and removed the kind/feature label on Jan 5, 2022
@ItayGabbay added and removed the linear label on Jan 11, 2023
@ItayGabbay changed the title from "[feat] Allow Users to Create Checks On the Fly" to "[DEE-206] [feat] Allow Users to Create Checks On the Fly" on Jan 11, 2023
@ItayGabbay removed the linear label on Jan 11, 2023