Imports the official KALI dumps to MongoDB.
KALI is the French Conventions Collectives Nationales (national collective agreements) database. It is produced by the Direction de l’information légale et administrative (DILA). The official way to access it is via Legifrance.
This project was created for the Code Du Travail Numérique project.
The app is divided into 3 separate docker containers:
- api: a lightweight read-only REST API that exposes the MongoDB collections (with HATEOAS).
- mongodb: a MongoDB server that stores the parsed documents
- extractor: Python scripts that download, extract and parse the XML files, then insert them into MongoDB.
Additionally, the scripts let you:
- flatten all nested XML documents to a common directory
- convert all the articles' XML files to individual JSON files
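As a rough illustration of the XML-to-JSON conversion mentioned above, here is a minimal sketch using only the Python standard library. It is not the extractor's actual code: the `element_to_dict` and `xml_to_json` helpers, the output layout (`tag`, `attributes`, `text`, `children`) and the sample element names are all assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an XML element into a plain dict.

    Hypothetical layout: tag, attributes, text, children.
    Text that follows a child element (its "tail") is ignored here.
    """
    node = {"tag": element.tag}
    if element.attrib:
        node["attributes"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_string):
    """Parse an XML string and return it serialized as JSON."""
    root = ET.fromstring(xml_string)
    # ensure_ascii=False keeps accented French characters readable.
    return json.dumps(element_to_dict(root), ensure_ascii=False, indent=2)


# Made-up sample; real KALI articles have a much richer structure.
SAMPLE = "<ARTICLE id='X1'><TITRE>Exemple</TITRE></ARTICLE>"
print(xml_to_json(SAMPLE))
```

The same function could be applied to each file of a flattened directory, writing one JSON file per article.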
We're working on publishing a public version of this MongoDB database and of the JSON REST API.
Some work-in-progress docs are available in a Google Spreadsheet.
The information in this documentation comes from:
- the DTD files available on Data Gouv
- exploring the MongoDB database that these scripts help create. The M3T tool is very helpful for this.
I also built a partial Entity Relationship Diagram, to see the relations between the different objects of this dump:
The following command should start the MongoDB server and the Python Eve API:
docker-compose up
Then run a script from the kali_extractor container to fetch and parse the KALI dumps and store them in MongoDB:
docker-compose run extractor python kali_extractor/parser.py --download
docker-compose run extractor python run_tests.py
The REST API is accessible locally at http://localhost:5000. Use curl or httpie to access it: by default it returns XML to browsers, which is not very convenient.
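If you would rather query the API from Python than from curl or httpie, the sketch below shows one way to ask for JSON explicitly through the `Accept` header. It assumes the API is running at the address given above; `build_request` and `fetch_json` are hypothetical helper names, not part of this project.

```python
import json
import urllib.request

# Local address of the API, as given in this README.
API_ROOT = "http://localhost:5000"


def build_request(path=""):
    """Build a GET request that explicitly asks for a JSON response."""
    url = API_ROOT.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_json(path=""):
    """Send the request and decode the JSON body.

    Requires the API container to be up; raises URLError otherwise.
    """
    with urllib.request.urlopen(build_request(path)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json()` with no argument queries the root endpoint, which in an Eve API normally lists the available resources. Resource paths are not documented here, so check that listing before guessing one.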
The MongoDB database is exposed on your host machine on port 27019, with the kali:kali credentials on the admin db.
You can access it using any MongoDB client (I recommend Studio 3T) to explore.
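For scripted access, a connection from Python might look like the sketch below. It assumes the third-party `pymongo` package is installed and that the port and credentials above are unchanged; `build_mongo_uri` and `list_databases` are hypothetical helpers written for this example.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user="kali", password="kali", host="localhost", port=27019):
    """Build a connection URI that authenticates against the admin database."""
    return "mongodb://{}:{}@{}:{}/?authSource=admin".format(
        quote_plus(user), quote_plus(password), host, port
    )


def list_databases():
    """Connect and return the database names.

    Requires pymongo (pip install pymongo) and a running MongoDB container.
    """
    from pymongo import MongoClient  # imported lazily: third-party dependency

    client = MongoClient(build_mongo_uri())
    try:
        return client.list_database_names()
    finally:
        client.close()
```

Listing the databases first is a simple way to find out where the parsed documents are stored, since their database and collection names are not stated here.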
Start MongoDB and the API as daemons:
sudo docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
Run the initial import:
sudo docker-compose -f docker-compose.yml -f docker-compose.prod.yml run extractor python kali_extractor/parser.py --download --drop
(cf https://docs.docker.com/compose/extends/)
- original KALI dumps on Data Gouv
- sample "convention collective" on Legifrance
- Code Du Travail Numérique project