[Data Tests] Add ability to test your data #192

Open
aslotte opened this issue Jul 10, 2020 · 4 comments

@aslotte
Owner

aslotte commented Jul 10, 2020

Is your feature request related to a problem? Please describe.
Part of the model flow is to create and add data validation tests. These check the sanity of your data, e.g. that a column has a specific cardinality (say, only 5 distinct values), or that a numeric column stays within a specific range.

Creating these tests is pretty repetitive.

Describe the solution you'd like
Add a Fluent API to validate data structure.

I'm envisioning a syntax such as the following, in a new project, e.g. MLOps.NET.Data.Tests:

[TestMethod]
public void VerifyCardinalityOfColumn()
{
    var mlOpsTestingContext = new MLOpsTestingContext();

    mlOpsTestingContext.WithData(pathToData)
        .HasColumn(index, x => x.WithCardinality(3))
        .Assert();
}

[TestMethod]
public void VerifyRangeOfColumn()
{
    var mlOpsTestingContext = new MLOpsTestingContext();

    mlOpsTestingContext.WithData(pathToData)
        .HasColumn(index, x => x.WithRange(min: 0, max: 10000))
        .Assert();
}

[TestMethod]
public void VerifySchema()
{
    var mlOpsTestingContext = new MLOpsTestingContext();

    mlOpsTestingContext.WithData(pathToData)
        .HasNumberOfColumns(10)
        .HasMinimumNumberOfRows(5000)
        .Assert();
}

[TestMethod]
public void VerifyColumnOnlyContainsApprovedValues()
{
    var mlOpsTestingContext = new MLOpsTestingContext();

    mlOpsTestingContext.WithData(pathToData)
        .HasColumn(index, x => x.WithValues(listOfApprovedValues))
        .Assert();
}
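
To make the idea a bit more concrete, here is a rough sketch of what could sit behind that syntax. Everything in it is an assumption for illustration only: `MLOpsTestingContext` and `ColumnExpectation` don't exist in MLOps.NET today, the `separator` and `hasHeader` parameters are made up, and the file is read with a naive `Split` (no quoted fields) rather than a real CSV parser. Failures are collected along the chain and only thrown in `Assert()`, so a single test run reports every broken expectation at once.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical: checks that can be applied to the values of a single column.
public sealed class ColumnExpectation
{
    private readonly int _index;
    private readonly IReadOnlyList<string> _values;
    private readonly List<string> _failures;

    internal ColumnExpectation(int index, IReadOnlyList<string> values, List<string> failures)
    {
        _index = index;
        _values = values;
        _failures = failures;
    }

    // Expects exactly `expected` distinct values in the column.
    public ColumnExpectation WithCardinality(int expected)
    {
        var actual = _values.Distinct().Count();
        if (actual != expected)
            _failures.Add($"Column {_index}: expected {expected} distinct values, found {actual}.");
        return this;
    }

    // Expects every value to parse as a number within [min, max].
    public ColumnExpectation WithRange(double min, double max)
    {
        foreach (var raw in _values)
        {
            var ok = double.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out var v);
            if (!ok || v < min || v > max)
            {
                _failures.Add($"Column {_index}: value '{raw}' is not a number in [{min}, {max}].");
                break; // report the first offending value only
            }
        }
        return this;
    }

    // Expects every value to be one of the approved values.
    public ColumnExpectation WithValues(IEnumerable<string> approved)
    {
        var allowed = new HashSet<string>(approved);
        var unexpected = _values.Where(v => !allowed.Contains(v)).Distinct().ToList();
        if (unexpected.Count > 0)
            _failures.Add($"Column {_index}: unexpected values: {string.Join(", ", unexpected)}.");
        return this;
    }
}

// Hypothetical: entry point of the fluent chain.
public sealed class MLOpsTestingContext
{
    private readonly List<string> _failures = new List<string>();
    private List<string[]> _rows = new List<string[]>();

    // Naive delimited-file reader; assumes no quoted or escaped separators.
    public MLOpsTestingContext WithData(string pathToData, char separator = ',', bool hasHeader = false)
    {
        _rows = File.ReadLines(pathToData)
            .Skip(hasHeader ? 1 : 0)
            .Where(line => line.Length > 0)
            .Select(line => line.Split(separator))
            .ToList();
        return this;
    }

    // Column count is taken from the first data row.
    public MLOpsTestingContext HasNumberOfColumns(int expected)
    {
        var actual = _rows.Count == 0 ? 0 : _rows[0].Length;
        if (actual != expected)
            _failures.Add($"Expected {expected} columns, found {actual}.");
        return this;
    }

    public MLOpsTestingContext HasMinimumNumberOfRows(int minimum)
    {
        if (_rows.Count < minimum)
            _failures.Add($"Expected at least {minimum} rows, found {_rows.Count}.");
        return this;
    }

    public MLOpsTestingContext HasColumn(int index, Action<ColumnExpectation> configure)
    {
        if (_rows.Any(r => index < 0 || index >= r.Length))
        {
            _failures.Add($"Column {index} is missing in at least one row.");
            return this;
        }
        var values = _rows.Select(r => r[index]).ToList();
        configure(new ColumnExpectation(index, values, _failures));
        return this;
    }

    // Throws once, listing every failed expectation.
    public void Assert()
    {
        if (_failures.Count > 0)
            throw new InvalidOperationException(string.Join(Environment.NewLine, _failures));
    }
}
```

Throwing a plain `InvalidOperationException` keeps the sketch independent of any particular test framework; a real implementation would probably want its own exception type, or to plug into the framework's assertion mechanism.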
@aslotte
Owner Author

aslotte commented Jul 10, 2020

@lqdev, while working on the workshop material, I thought that something like this would be super useful for creating data validation tests as part of the pipeline.

@aslotte
Owner Author

aslotte commented Jul 10, 2020

We can probably implement some of these checks using the new DataFrame API (at least it would be fun to try).

@AnoojNair
Collaborator

@aslotte, nice! That would be a cool feature to have in an ML workflow. Would it be part of MLOpsContext or a separate context?

@aslotte
Owner Author

aslotte commented Jul 11, 2020

It would be a separate context, I think, as it wouldn't need the same dependencies we currently use.
