fuzzy-string-match-pyspark

Efficiently fuzzy match strings with machine learning in PySpark

To run the example, you'll need virtualenv installed.

The code is implemented as a unit test that reads two lists of 10 names each as a DataFrame, runs the pipeline, and prints the resulting DataFrame. It can be extended as needed.
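To make the idea of matching two name lists concrete, here is a minimal plain-Python sketch. It is not the repository's pipeline and does not use Spark: the character-trigram Jaccard similarity, the function names (char_ngrams, jaccard, fuzzy_match), the sample names, and the 0.4 threshold are all assumptions chosen for illustration.

```python
# Illustrative only: plain-Python fuzzy matching of two name lists.
# The trigram/Jaccard approach, the sample names, and the threshold are
# assumptions, not taken from the repository.

def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased string."""
    s = text.lower()
    if len(s) < n:
        # Strings shorter than n are kept whole so they can still match.
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def fuzzy_match(left, right, threshold=0.4):
    """Return (left_name, right_name, score) for pairs scoring at or above threshold."""
    matches = []
    for a_name in left:
        for b_name in right:
            score = jaccard(char_ngrams(a_name), char_ngrams(b_name))
            if score >= threshold:
                matches.append((a_name, b_name, round(score, 3)))
    return matches


# Hypothetical sample data standing in for the two name lists.
names_a = ["Jonathan Smith", "Maria Garcia", "Wei Chen"]
names_b = ["Jonathon Smith", "Maria Garcia", "Peter Novak"]

for row in fuzzy_match(names_a, names_b):
    print(row)
```

The nested loop compares every pair of names, so this sketch only suits small inputs such as the short lists used in the test.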

Clone the repository

git clone https://github.com/changamire/fuzzy-string-match-pyspark.git

Run the following command to set up the virtual environment and run the test

./run.sh setup

Once the setup has been run, the test can be rerun without the setup flag.

More details available here
