Objective

Does your dataset have issues? How do you find out, and how do you fix those issues?

I originally pitched this project as:

One dataset with several flawed samples (e.g. too much of one class, missing values, gender bias), each serving as its own "discover the data problem" exercise.
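As a rough illustration of what one such exercise might check, here is a minimal sketch using only the Python standard library. The toy CSV, its column names (`gender`, `age`, `label`), and the helper functions are all hypothetical and not part of this project; the group-rate comparison is only a crude first signal of possible bias, not a full fairness analysis.

```python
import csv
import io
from collections import Counter

# Hypothetical toy data; column names and values are illustrative only.
SAMPLE_CSV = """gender,age,label
F,34,approved
M,,approved
M,45,approved
F,29,denied
M,51,approved
F,,denied
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def class_shares(rows, column):
    """Return each value's share of the rows, to spot class imbalance."""
    counts = Counter(row[column] for row in rows)
    total = len(rows)
    return {value: count / total for value, count in counts.items()}


def missing_counts(rows):
    """Count empty or absent cells per column."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1
    return dict(missing)


def rate_by_group(rows, group_column, label_column, positive):
    """Share of rows carrying the positive label within each group."""
    totals, positives = Counter(), Counter()
    for row in rows:
        totals[row[group_column]] += 1
        if row[label_column] == positive:
            positives[row[group_column]] += 1
    return {group: positives[group] / totals[group] for group in totals}


rows = load_rows(SAMPLE_CSV)
print(class_shares(rows, "label"))
print(missing_counts(rows))
print(rate_by_group(rows, "gender", "label", "approved"))
```

In this toy sample, the label column is skewed toward one class, the `age` column has empty cells, and the approval rate differs sharply between the two groups, so each check surfaces one of the three problem types named above.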

I also wanted to include others' previous work on parsing CSV files and data sources in general, to offer as many examples as possible.

Work in progress

In the future, ideally there would be a data browser where you can programmatically review the dataset and determine its problems.
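A programmatic review could start from something as small as a per-column profile. The sketch below is an assumption about what such a browser might offer, not a design for it; `profile_columns`, `print_profile`, and the demo data are hypothetical names introduced for illustration.

```python
import csv
import io


def profile_columns(csv_text):
    """Summarize each column: row count, empty cells, distinct non-empty values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    profile = {
        name: {"rows": 0, "empty": 0, "distinct": set()}
        for name in (reader.fieldnames or [])
    }
    for row in reader:
        for name, stats in profile.items():
            # Treat absent cells (short rows) the same as empty strings.
            value = (row.get(name) or "").strip()
            stats["rows"] += 1
            if value:
                stats["distinct"].add(value)
            else:
                stats["empty"] += 1
    return profile


def print_profile(profile):
    """Print one summary line per column."""
    for name, stats in profile.items():
        print(
            f"{name}: {stats['rows']} rows, "
            f"{stats['empty']} empty, "
            f"{len(stats['distinct'])} distinct"
        )


# Hypothetical demo data.
DEMO_CSV = "city,temp\nOslo,3\nOslo,\nLima,19\n"
demo_profile = profile_columns(DEMO_CSV)
print_profile(demo_profile)
```

A summary like this gives a quick first pass: columns with many empty cells or unexpectedly few distinct values are natural places to look more closely.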

Open source, MIT License