
Efficiently fuzzy match strings with machine learning in PySpark

To run the example, you'll need virtualenv installed.

The code is implemented as a unit test that reads in two lists of 10 names each as a dataframe, runs the pipeline, and prints the resulting dataframe. It can be extended as needed.
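To give a feel for what fuzzy matching two name lists involves, here is a minimal plain-Python sketch. It is not the repository's pipeline and does not use PySpark: it assumes character bigrams compared by Jaccard similarity, and the function names, sample names, and the 0.5 threshold are all hypothetical. In PySpark, `pyspark.ml.feature` offers comparable building blocks such as `NGram` and `MinHashLSH` (which approximates Jaccard distance), but whether this project uses them is not stated here.

```python
# Plain-Python illustration only; every name below is hypothetical and
# not taken from the repository.

def char_ngrams(text, n=2):
    """Return the set of lowercase character n-grams in text."""
    cleaned = text.lower().strip()
    if len(cleaned) < n:
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_match(left, right, threshold=0.5):
    """Pair each left name with its best-scoring right name, if the
    score reaches the (assumed) threshold."""
    if not right:
        return []
    right_grams = [(name, char_ngrams(name)) for name in right]
    matches = []
    for name in left:
        grams = char_ngrams(name)
        best_name, best_score = max(
            ((r, jaccard(grams, g)) for r, g in right_grams),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            matches.append((name, best_name, round(best_score, 2)))
    return matches


if __name__ == "__main__":
    left = ["John Smith", "Maria Garcia", "Wei Chen"]
    right = ["Jon Smith", "Maria Garcia", "Alice Brown"]
    for row in fuzzy_match(left, right):
        print(row)
```

This brute-force version compares every left name with every right name, which is fine for two lists of 10 names but grows quadratically; avoiding that cost is the usual reason to move to a hashing-based approach in Spark.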

Clone the repository

git clone

Run the following command to set up the virtual environment and run the test

./ setup

After the setup has been run once, the test can subsequently be run without the setup flag.

More details available here
