This program uses NLP and ML techniques to match similar company names. Matches on common words like "LTD" and "COMPANY" are discounted automatically by the algorithm.
- pandas
- fuzzywuzzy (https://github.com/seatgeek/fuzzywuzzy)
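The idea of discounting common words can be illustrated with a minimal sketch. It is not the project's actual implementation: it assumes an IDF-style weighting (tokens that appear in many names get a lower weight), and it uses the standard library's `difflib.SequenceMatcher` as a stand-in for a fuzzywuzzy scorer so that it runs without extra installs. The function names (`tokenize`, `token_weights`, `weighted_similarity`) and the sample company names are made up for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(name):
    """Upper-case a company name, drop full stops, and split on whitespace."""
    return name.upper().replace(".", "").split()


def token_weights(names):
    """IDF-style weights (an assumption of this sketch): a token found in
    many names gets a low weight, and one found in every name gets zero."""
    n = len(names)
    doc_freq = Counter(tok for name in names for tok in set(tokenize(name)))
    return {tok: math.log(n / df) for tok, df in doc_freq.items()}


def weighted_similarity(a, b, weights):
    """Symmetric score in [0, 1]: each token is matched to its closest token
    in the other name, and the match is scaled by the token's weight."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)

    def directed(src, dst):
        # Unseen tokens default to weight 1.0 (hypothetical choice).
        total = sum(weights.get(t, 1.0) for t in src)
        if total == 0 or not dst:
            return 0.0
        score = sum(
            weights.get(t, 1.0)
            * max(SequenceMatcher(None, t, u).ratio() for u in dst)
            for t in src
        )
        return score / total

    return (directed(tokens_a, tokens_b) + directed(tokens_b, tokens_a)) / 2


# Made-up sample names, not taken from the dataset.
names = [
    "ACME TRADING LTD",
    "ACME TRADNG LIMITED",
    "ZENITH FOODS LTD",
    "BOREAL TRADING LTD",
]
weights = token_weights(names)

close_pair = weighted_similarity(names[0], names[1], weights)
far_pair = weighted_similarity(names[0], names[2], weights)
```

Here `close_pair` compares two spellings of the same company, while `far_pair` compares two names that share only "LTD". Because "LTD" occurs in most of the sample names, it contributes little to the score. To use fuzzywuzzy instead, the `SequenceMatcher(...).ratio()` call could be swapped for `fuzz.ratio(t, u) / 100`.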
The data we used is available at http://download.companieshouse.gov.uk/en_output.html. It is an openly licensed, publicly available dataset that contains a list of registered (limited liability) companies in Great Britain.
Slides (not finalized): http://slides.com/cheukting_ho/fuzzy-matching