Objective

Does your dataset have issues? How do you find out, and how do you fix those issues?

I originally pitched this project as:

One dataset with several flawed samples (e.g. too many examples of one class, missing values, gender bias), each serving as its own "discover the data problem" exercise.

I also wanted to include others' previous work on parsing CSV files and other data sources, to offer as many examples as possible.
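Here is a minimal sketch of what a few of these "discover the data problem" checks might look like. It assumes a small CSV with a categorical label column and a gender column. Every name in it is hypothetical and not part of this repository: `load_rows`, `class_imbalance`, `missing_fraction`, `positive_rate_by_group`, the `MISSING_TOKENS` set and the sample data. It uses only Python's standard `csv` module. A gap in positive rates between groups is a signal worth investigating, not proof of bias on its own.

```python
import csv
import io
from collections import Counter

# Hypothetical set of tokens treated as "missing"; adjust for your own data.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none", "?"}


def _clean(value):
    """Strip whitespace and map missing-value tokens (and absent fields) to None."""
    if value is None:
        return None
    value = value.strip()
    return None if value.lower() in MISSING_TOKENS else value


def load_rows(text):
    """Parse CSV text into a list of dicts with missing values normalized to None."""
    reader = csv.DictReader(io.StringIO(text))
    # DictReader puts extra fields beyond the header under the key None; drop them.
    return [{k: _clean(v) for k, v in raw.items() if k is not None} for raw in reader]


def class_imbalance(rows, label):
    """Ratio of the most common label count to the least common (1.0 = balanced)."""
    counts = Counter(r[label] for r in rows if r[label] is not None)
    return max(counts.values()) / min(counts.values())


def missing_fraction(rows):
    """Fraction of rows with a missing value, per column."""
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in rows[0]}


def positive_rate_by_group(rows, group, label, positive):
    """Share of rows whose label equals `positive`, split by a group column."""
    totals, hits = Counter(), Counter()
    for r in rows:
        if r[group] is None or r[label] is None:
            continue  # skip rows where either value is missing
        totals[r[group]] += 1
        hits[r[group]] += r[label] == positive
    return {g: hits[g] / totals[g] for g in totals}


# Hypothetical sample with all three problems: imbalance, a missing label, a group gap.
SAMPLE = """name,gender,hired
A,F,no
B,M,yes
C,M,yes
D,F,no
E,M,NA
F,M,yes
G,F,yes
H,M,yes
"""

rows = load_rows(SAMPLE)
print(class_imbalance(rows, "hired"))  # 2.5
print(missing_fraction(rows))  # {'name': 0.0, 'gender': 0.0, 'hired': 0.125}
print(positive_rate_by_group(rows, "gender", "hired", "yes"))  # F ≈ 0.33, M = 1.0
```

Normalizing the various spellings of "missing" at parse time keeps the later checks simple. Each check then becomes a short function over plain dicts.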

Work in progress

In the future, there would ideally be a data browser for programmatically reviewing the dataset and determining its problems.

License

Open source, MIT License