The Magical CSV Merge Machine

What does it do?

A Python 3 library to link a dirty CSV file with a clean reference table. It is meant to be as generic as possible and includes a labeller to learn optimal parameters for each matching scenario.

How to install?

Manual install

Non-Python Requirements

This library relies on Elasticsearch. We used version 5.6.7 for development and recommend Elasticsearch 5.x. Instructions here.
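Before installing the Python package, it can help to confirm that the running Elasticsearch node is a 5.x release. The sketch below is only an illustration and is not part of this library: `is_supported_es` is a hypothetical helper, and it assumes you have already fetched the JSON document that Elasticsearch serves at its root endpoint (by default `http://localhost:9200`), which reports the version under `version.number`.

```python
import json


def is_supported_es(root_response: str, major: int = 5) -> bool:
    """Return True if the Elasticsearch root response reports the given major version.

    Hypothetical helper for illustration; `root_response` is the raw JSON text
    returned by a GET on the node's root endpoint.
    """
    info = json.loads(root_response)
    number = info["version"]["number"]  # e.g. "5.6.7"
    return int(number.split(".")[0]) == major


# Trimmed-down sample of a root response (real responses contain more fields).
sample = '{"name": "node-1", "version": {"number": "5.6.7"}}'
supported = is_supported_es(sample)
```

With the sample above, `supported` is `True`; a response reporting a 6.x version would give `False`.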

PIP3 install

pip3 install merge-machine

From source (recommended for the time being):

git clone https://github.com/entrepreneur-interet-general/Merge-Machine.git
cd Merge-Machine
pip3 install -e .

How to use?

How does it work?

The reference is indexed in Elasticsearch with multiple analyzers (language-specific, integers, n-grams...). The labeller then proposes training samples from the source, which it tries to match to rows of the reference file. Upon user confirmation (match / no match), it updates its belief about which Elasticsearch queries perform best for matching. When labelling is over, the "best query" (a weighted combination of multiple ES queries with different analyzers on different fields) is run for each row of the source to try to find a match in the ES-indexed reference.
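To make the idea of a learned, weighted "best query" more concrete, here is a minimal sketch. It is not the library's actual implementation: the `CandidateQuery` class, the `record_label` and `best_query` functions, the field names (`name.french`, `name.n_grams`, `city.french`) and the scoring rule (a smoothed success rate) are all assumptions made for illustration. It also simplifies labelling to a single signal per candidate query: whether that query ranked the user-confirmed match first. Only the final dictionary follows the standard Elasticsearch query DSL (a `bool` query with boosted `match` clauses).

```python
from dataclasses import dataclass


@dataclass
class CandidateQuery:
    """One hypothetical candidate: search `ref_field` with the value of `source_col`."""
    ref_field: str   # analyzed field of the reference index (assumed name)
    source_col: str  # column of the dirty CSV used as the query text
    hits: int = 0    # labelled rows where this query ranked the confirmed match first
    misses: int = 0  # labelled rows where it did not

    def score(self) -> float:
        # Smoothed success rate, so unlabelled candidates start at 0.5.
        return (self.hits + 1) / (self.hits + self.misses + 2)


def record_label(candidates, top_hit_was_match):
    """Update every candidate after one user answer.

    top_hit_was_match[i] is True if candidates[i] ranked the confirmed match first.
    """
    for cand, ok in zip(candidates, top_hit_was_match):
        if ok:
            cand.hits += 1
        else:
            cand.misses += 1


def best_query(candidates, row, keep=2):
    """Combine the `keep` best candidates into one weighted Elasticsearch bool query."""
    ranked = sorted(candidates, key=CandidateQuery.score, reverse=True)[:keep]
    should = [
        {"match": {c.ref_field: {"query": row[c.source_col], "boost": round(c.score(), 3)}}}
        for c in ranked
    ]
    return {"query": {"bool": {"should": should}}}


candidates = [
    CandidateQuery("name.french", "nom"),
    CandidateQuery("name.n_grams", "nom"),
    CandidateQuery("city.french", "ville"),
]
# Two simulated labelling rounds.
record_label(candidates, [True, True, False])
record_label(candidates, [False, True, False])

query = best_query(candidates, {"nom": "lycee jean moulin", "ville": "paris"})
```

In this toy run the n-gram query on the name succeeds in both rounds, so it receives the highest boost, while the city query never succeeds and is dropped from the final combination.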

How to contribute?

Feel free to report bugs by opening issues and to submit pull requests.

Credits

This library was developed by Léo Bouloc over 10 months in 2017 at the French Ministry of Research and Higher Education, as part of the "Entrepreneur d'Intérêt Général" program funded by the French Government.

