Python script to parse address with concatenated street names and numbers.
Provide (1) built Docker image on Docker Hub and (2) a Dockerfile that can build a Docker image from source code.
Input: string of address
Output: string of street and string of street-number as JSON object
-
"Winterallee 3"
->{"street": "Winterallee", "housenumber": "3"}
-
"Musterstrasse 45"
->{"street": "Musterstrasse", "housenumber": "45"}
-
"Blaufeldweg 123B"
->{"street": "Blaufeldweg", "housenumber": "123B"}
-
"Am Bächle 23"
->{"street": "Am Bächle", "housenumber": "23"}
-
"Auf der Vogelwiese 23 b"
->{"street": "Auf der Vogelwiese", "housenumber": "23 b"}
-
"4, rue de la revolution"
->{"street": "rue de la revolution", "housenumber": "4"}
-
"200 Broadway Av"
->{"street": "Broadway Av", "housenumber": "200"}
-
"Calle Aduana, 29"
->{"street": "Calle Aduana", "housenumber": "29"}
-
"Calle 39 No 1540"
->{"street": "Calle 39", "housenumber": "No 1540"}
- Python 3
- libpostal : A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
- pypostal : Python bindings to libpostal for fast international address parsing/normalization
You can use the Docker container as a little virtual machine and only focus on the functionality and correctness of address parsing. To eliminate the environment setting and dependencies as much as possible. Here provide different ways that you can choose either pull the image from Docker Hub or build the image from source code. Then run the script in a Docker container.
1. Pull and run the built image from Docker Hub
-
Pull the image from Docker Hub
docker pull wu8149/address_parser
-
Run the image in interactive mode
docker run -ti wu8149/address_parser
-
Build the addressParser Docker image. This process will take a couple of minutes.
docker build -t address_parser .
-
Run the image in interactive mode
docker run -ti address_parser
-
Install libpostal, a C library for parsing. For more detailed installing instruction please refer to the original repository.
-
On Ubuntu/Debian
sudo apt-get install curl autoconf automake libtool python-dev pkg-config
-
Installing libpostal
git clone https://github.com/openvenues/libpostal cd libpostal ./bootstrap.sh ./configure --datadir=[...some dir with a few GB of space...] make sudo make install # On Linux it's probably a good idea to run sudo ldconfig
-
-
Install pypostal, Python bindings to libpostal
pip install postal
-
Install dependencies
pip install -r requirements.txt
-
Run Python script
python3 addressParser.py
Run the testing script. The testing script imports the testing dataset fromtestCase.csv
.
python3 test_addressParser.py