The Socface project aims to analyze French census documents and extract information on a large scale. The goal is to create a database using handwriting recognition to process handwritten nominal lists from the census.
One specific task is to predict the household head status, which involves determining whether an individual is the head of the household based on various characteristics extracted from the census data. This prediction task is crucial for grouping individuals into households and analyzing social changes over time.
To use this project, please follow these installation steps:
- Clone this repository to your local machine.
- Install dependencies by running: pip install -r requirements.txt
Data preprocessing is performed using the preprocessing.py script. This script loads data from JSON and YAML files, prepares it for analysis, and converts it into a format suitable for model training.
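As a rough illustration of this kind of preprocessing, the sketch below parses JSON records and converts them into flat feature rows. It is not the code from preprocessing.py: the field names (household_id, firstname, age, relation), the "chef" label value, and the to_rows helper are all hypothetical, chosen only to show the idea. YAML input could be handled the same way with a YAML parser such as PyYAML (yaml.safe_load), which is left out here to keep the example standard-library only.

```python
import json

# Hypothetical record layout; the real Socface fields may differ.
RAW_JSON = """
[
  {"household_id": 1, "firstname": "Jean", "age": "42", "relation": "chef"},
  {"household_id": 1, "firstname": "Marie", "age": "38", "relation": "femme"},
  {"household_id": 2, "firstname": "Paul", "age": "", "relation": "chef"}
]
"""


def to_rows(records):
    """Turn raw transcribed records into flat rows usable for training."""
    rows = []
    for rec in records:
        # Transcribed ages may be empty or non-numeric, so parse defensively.
        age_text = str(rec.get("age", "")).strip()
        rows.append({
            "household_id": rec.get("household_id"),
            "firstname": str(rec.get("firstname", "")).lower(),
            "age": int(age_text) if age_text.isdigit() else None,
            # Assumed target: 1 if the person is recorded as household head.
            "is_head": 1 if rec.get("relation") == "chef" else 0,
        })
    return rows


rows = to_rows(json.loads(RAW_JSON))
for row in rows:
    print(row)
```

Missing ages are kept as None rather than replaced with a default value, so that the choice of imputation strategy stays with the modelling step.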
We explored the use of RandomForest and pre-trained models like BERT for predicting the household head status. Details on the implementation and training of the models are available in the modelling.ipynb notebook.
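For readers unfamiliar with the RandomForest approach, here is a minimal sketch of binary household-head classification using scikit-learn's RandomForestClassifier. It does not reproduce the notebook: the three features, the toy values, and the hyperparameters are made up for illustration, and it assumes scikit-learn is installed.

```python
from sklearn.ensemble import RandomForestClassifier

# Made-up toy data, not taken from the census.
# Assumed features: [age, position in the household listing, is_male].
X = [
    [45, 1, 1],
    [40, 2, 0],
    [12, 3, 1],
    [60, 1, 0],
    [30, 2, 1],
    [5, 3, 0],
]
# Assumed target: 1 = household head, 0 = other household member.
y = [1, 0, 0, 1, 0, 0]

# A fixed random_state makes the fitted forest reproducible.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Score two unseen, equally made-up individuals.
new_people = [[50, 1, 1], [8, 4, 0]]
predictions = clf.predict(new_people)
probabilities = clf.predict_proba(new_people)

print(predictions)
print(probabilities)
```

predict_proba returns one column per class, which is useful if the head of each household is chosen as the member with the highest predicted probability rather than by a fixed threshold.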
This project is licensed under the MIT license.