alannadevgen/french-address-matching

French address parser 🏠

Description

This project, carried out in partnership with the Ministry of Ecological Transition and Territorial Cohesion, aims to standardise French addresses using machine learning models.

Requirements 🐍

This project is deployed using Python 3.8.

Quick start

git clone https://github.com/alannadevgen/french-address-matching.git
cd french-address-matching
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
python3 main.py --help
# Usage: main.py [OPTIONS] BUCKET CSV_FILE ADDRESSES_COL CITIES_COL
#                POSTAL_CODE_COL CITY_CODE_COL

# Options:
#   --steps                   Task to perform: "create_sample", "hc", "hmm", "auto" (default "auto")
#   --size INTEGER            Sample size. If steps = create_sample (default 1000)
#   --correct_addresses TEXT  Column containing corrected addresses (default "adresse_corr")
#   --result_folder           Name of the folder where put the results (default "result")
#   --recompute_train         Boolean indicating if the model should be reestimated (default False)
#   --help                    Show this message and exit.
# Standardize the addresses in sample.csv using the default options.
# The column names must be passed as positional arguments:
# ADDRESSES_COL=adresse
# CITIES_COL=commune
# POSTAL_CODE_COL=cp_corr
# CITY_CODE_COL=CODGEO_2021     (INSEE code)
python3 main.py projet-pfe-adress-matching sample.csv adresse commune cp_corr CODGEO_2021

Create a sample of 100 addresses from the full address file.

python3 main.py projet-pfe-adress-matching DonneesCompletes.csv adresse commune cp_corr CODGEO_2021 --steps create_sample --size 100
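The sampling step can be pictured with a few lines of standard-library Python. This is only an illustrative sketch: the `create_sample` function, its `seed` parameter, and the in-memory rows are assumptions made for the example, and the project's own implementation may read from the bucket and sample differently.

```python
import csv
import io
import random


def create_sample(rows, size, seed=None):
    """Return `size` rows drawn uniformly at random, without replacement.

    Hypothetical helper, not taken from the repository.
    """
    rng = random.Random(seed)
    # Never ask for more rows than the file contains.
    return rng.sample(rows, min(size, len(rows)))


# Small in-memory stand-in for the full address file (made-up rows).
raw = "adresse,commune\n" + "".join(
    f"{i} rue de Paris,Rennes\n" for i in range(1, 501)
)
rows = list(csv.DictReader(io.StringIO(raw)))

sample = create_sample(rows, size=100, seed=0)
print(len(sample))
```

Fixing the seed makes the sample reproducible, which helps when the same sample is reused across the `hc`, `hmm`, and `auto` steps.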

Apply only the hard-coded rules with the hc option.

python3 main.py projet-pfe-adress-matching sample.csv adresse commune cp_corr CODGEO_2021 --steps hc
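To give an idea of what a hard-coded rule can look like, the sketch below tags each token of an address using fixed word lists. The tag names (`NUMBER`, `SUFFIX`, `STREET_TYPE`, `NAME`), the word lists, and the `tag_tokens` function are all assumptions chosen for illustration; the rules actually used by the project are not described here and are likely richer.

```python
import re

# Hypothetical word lists; a real rule set would be far larger.
STREET_TYPES = {"RUE", "AVENUE", "BOULEVARD", "CHEMIN", "IMPASSE", "PLACE"}
SUFFIXES = {"BIS", "TER"}


def tag_tokens(address):
    """Assign one tag per token using fixed rules only."""
    tagged = []
    # Split on whitespace and commas after upper-casing the address.
    for token in re.findall(r"[^\s,]+", address.upper()):
        if token.isdigit():
            tag = "NUMBER"
        elif token in SUFFIXES:
            tag = "SUFFIX"
        elif token in STREET_TYPES:
            tag = "STREET_TYPE"
        else:
            tag = "NAME"
        tagged.append((token, tag))
    return tagged


print(tag_tokens("12 bis rue de la Paix"))
```

Rules of this kind are fast and predictable, but they cannot resolve ambiguous tokens, which is where a statistical model becomes useful.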

Run only the HMM (Hidden Markov Model) with the hmm option.

python3 main.py projet-pfe-adress-matching sample.csv adresse commune cp_corr CODGEO_2021 --steps hmm
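For readers unfamiliar with HMMs, the following toy example shows how Viterbi decoding picks the most likely tag sequence for a tokenised address. Everything in it is assumed for illustration: the three states, the start and transition probabilities, and the unnormalised emission scores are made up, whereas the project estimates its model from training data.

```python
import math

STATES = ["NUMBER", "STREET_TYPE", "NAME"]

# Made-up probabilities, for illustration only.
START = {"NUMBER": 0.7, "STREET_TYPE": 0.2, "NAME": 0.1}
TRANS = {
    "NUMBER": {"NUMBER": 0.05, "STREET_TYPE": 0.85, "NAME": 0.10},
    "STREET_TYPE": {"NUMBER": 0.05, "STREET_TYPE": 0.05, "NAME": 0.90},
    "NAME": {"NUMBER": 0.05, "STREET_TYPE": 0.05, "NAME": 0.90},
}


def emission(state, token):
    """Toy, unnormalised emission score of `token` under `state`."""
    if state == "NUMBER":
        return 0.9 if token.isdigit() else 0.01
    if state == "STREET_TYPE":
        return 0.9 if token in {"RUE", "AVENUE", "BOULEVARD"} else 0.01
    return 0.05 if token.isdigit() else 0.5  # NAME


def viterbi(tokens):
    """Return the most likely state sequence for `tokens` (log-space)."""
    if not tokens:
        return []
    best = [{s: math.log(START[s]) + math.log(emission(s, tokens[0]))
             for s in STATES}]
    back = []
    for token in tokens[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s at this position.
            prev = max(STATES, key=lambda p: best[-1][p] + math.log(TRANS[p][s]))
            scores[s] = (best[-1][prev] + math.log(TRANS[prev][s])
                         + math.log(emission(s, token)))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi("12 RUE VICTOR HUGO".split()))
```

Working in log-space avoids numerical underflow on long addresses, since the product of many small probabilities becomes a sum.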

Use auto (the default) to apply the hard-coded rules first, then fall back to the HMM for any address whose hard-coded result is considered incorrect.

python3 main.py projet-pfe-adress-matching sample.csv adresse commune cp_corr CODGEO_2021 --steps auto
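The fallback logic of the auto mode can be summarised as below. This is a sketch under stated assumptions: `standardize_auto`, the two stand-in taggers, and the `is_valid` check are hypothetical names, and how the project decides that a hard-coded result is incorrect is not specified here.

```python
def standardize_auto(address, hc_tagger, hmm_tagger, is_valid):
    """Try the hard-coded rules first; use the HMM only if they fail.

    Returns the tagging result and the name of the method that produced it.
    """
    result = hc_tagger(address)
    if is_valid(result):
        return result, "hc"
    return hmm_tagger(address), "hmm"


# Stand-in taggers for the example (not the project's functions).
def fake_hc(address):
    # Pretend the rules only handle addresses starting with a number.
    tokens = address.split()
    return tokens if tokens and tokens[0].isdigit() else None


def fake_hmm(address):
    return address.split()


def not_none(result):
    return result is not None


print(standardize_auto("12 rue Victor Hugo", fake_hc, fake_hmm, not_none))
print(standardize_auto("lieu-dit Le Moulin", fake_hc, fake_hmm, not_none))
```

Passing the taggers and the validity check as arguments keeps the fallback logic independent of how each step is implemented.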

By default, --correct_addresses is set to adresse_corr.

python3 main.py projet-pfe-adress-matching DonneesCompletes.csv adresse commune cp_corr CODGEO_2021 --correct_addresses adresse_corr

Re-estimate the model with the --recompute_train option:

python3 main.py projet-pfe-adress-matching DonneesCompletes.csv adresse commune cp_corr CODGEO_2021 --recompute_train True

Contributors 👩‍💻

About

Parse and match French addresses
