Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
./docs/images/learn2clean-text.png


Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Data Cleaning



Learn2Clean is a Python library for data preprocessing and cleaning based on Q-Learning, a model-free reinforcement learning technique. Given a dataset, an ML model, and a quality metric, it selects the optimal sequence of data preparation tasks, so that the quality of the ML model's result is maximized.
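To make the idea concrete, here is a minimal, self-contained sketch of tabular Q-learning choosing an order of preparation tasks. It does not use Learn2Clean's API: the task names, the `toy_quality` function, and the `learn_pipeline` helper are all hypothetical and exist only for illustration. In a real setting, the quality function would run the tasks on the dataset, fit the ML model, and return the chosen metric.

```python
import random
from collections import defaultdict

# Hypothetical task names, chosen only for illustration.
TASKS = ["impute", "dedupe", "normalize"]
STOP = "stop"


def toy_quality(pipeline):
    """Stand-in for the ML model's quality after running `pipeline`.

    Assumption: the numbers below are made up so that task order matters.
    """
    score = 0.5
    if "impute" in pipeline:
        score += 0.2
    if "normalize" in pipeline:
        imputed_first = (
            "impute" in pipeline
            and pipeline.index("impute") < pipeline.index("normalize")
        )
        score += 0.2 if imputed_first else -0.1
    if "dedupe" in pipeline:
        score -= 0.05
    return score


def available_actions(state):
    """Tasks not yet applied, plus the option to stop."""
    return [t for t in TASKS if t not in state] + [STOP]


def learn_pipeline(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated return

    for _ in range(episodes):
        state = ()  # tuple of tasks applied so far, in order
        while True:
            actions = available_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            if action == STOP:
                # Stopping is terminal and earns no further reward.
                q[(state, action)] += alpha * (0.0 - q[(state, action)])
                break
            next_state = state + (action,)
            # Reward is the change in quality caused by the new task.
            reward = toy_quality(next_state) - toy_quality(state)
            best_next = max(
                q[(next_state, a)] for a in available_actions(next_state)
            )
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state

    # Greedy rollout: follow the highest-valued action until "stop".
    state = ()
    while True:
        action = max(available_actions(state), key=lambda a: q[(state, a)])
        if action == STOP:
            return list(state)
        state = state + (action,)


if __name__ == "__main__":
    print(learn_pipeline())
```

With this made-up quality function, the greedy rollout should settle on imputing first and normalizing second, and skip the task that lowers the score. The state here is simply the ordered tuple of tasks already applied, which keeps the table small enough to learn exhaustively; this is a design choice of the sketch, not a description of how Learn2Clean represents states.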

You can use it to compose your own data preprocessing pipelines or to automate data preparation before clustering, regression, and classification.

./docs/images/figure_Learn2Clean.jpeg

For more details, please refer to the paper presented at the Web Conf 2019 and the tutorial.

  • Laure Berti-Equille. Learn2Clean: Optimizing the Sequence of Tasks for Web Data Preparation. Proceedings of the Web Conf 2019, San Francisco, May 2019. Preprint
  • Laure Berti-Equille. ML to Data Management: A Round Trip. Tutorial Part I, ICDE 2018. Tutorial

How to Contribute

Learn2Clean is a research prototype. Your help is very valuable to make it better for everyone.

  • Check out the call for contributions to see what can be improved, or open an issue if you want something.
  • Contribute to the tests to make it more reliable.
  • Contribute to the documentation to make it clearer for everyone.
  • Contribute to the examples to share your experience with other users.
  • Open an issue if you run into problems during development.

For more details, please refer to CONTRIBUTING.
