This repository has been archived by the owner on Sep 3, 2021. It is now read-only.

Helps with using KALI dumps (the French Conventions collectives nationales database).

SocialGouv/kaligator

KALIGATOR

Imports the official KALI dumps to MongoDB.

KALI is the French Conventions Collectives Nationales database. It is created by the Direction de l’information légale et administrative (DILA). The official way to access it is via Legifrance.

This project was created for the Code Du Travail Numérique project.

The app is divided into 3 separate docker containers:

  • api: a lightweight read-only REST API that exposes the MongoDB collections (with HATEOAS).
  • mongodb: a MongoDB server that stores the parsed documents.
  • extractor: Python scripts to download, extract, parse the XMLs and insert them into MongoDB.

Additionally, the scripts let you:

  • flatten all nested XML documents to a common directory
  • convert all the articles' XML files to individual JSON files
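The two extra script features listed above can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not the code shipped in kali_extractor: the function names (`flatten_xml_tree`, `element_to_dict`, `xml_file_to_json`) and the JSON layout are assumptions made for this example, and it assumes that file names are unique across the nested folders.

```python
import json
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def flatten_xml_tree(src_dir, dest_dir):
    """Copy every .xml file found anywhere under src_dir into dest_dir.

    Assumption: file names are unique across the tree. If two files
    share a name, the later copy silently overwrites the earlier one.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for xml_path in sorted(Path(src_dir).rglob("*.xml")):
        target = dest / xml_path.name
        shutil.copy(xml_path, target)
        copied.append(target)
    return copied


def element_to_dict(element):
    """Recursively turn an XML element into a plain dict (hypothetical layout)."""
    node = {"tag": element.tag}
    if element.attrib:
        node["attributes"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


def xml_file_to_json(xml_path, json_dir):
    """Parse one XML file and write it as <same name>.json inside json_dir."""
    out_dir = Path(json_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    root = ET.parse(xml_path).getroot()
    json_path = out_dir / (Path(xml_path).stem + ".json")
    json_path.write_text(
        json.dumps(element_to_dict(root), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return json_path
```

Calling `flatten_xml_tree` on an extracted dump folder and then `xml_file_to_json` on each returned path would give one JSON file per XML file. The real scripts may name and structure their output differently.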

Coming soon

We're working on publishing a public version of this MongoDB database and of the JSON REST API.

Schema documentation

You can find some work-in-progress docs on a Google Spreadsheet here.

docs screenshot

The information in this documentation comes from:

  • the DTD files available on Data Gouv
  • exploring the MongoDB database that these scripts help create. The M3T tool is very helpful for this.

I also built a partial Entity Relationship Diagram, to see the relations between the different objects of this dump:

entity relationship diagram

Local setup

This should start the MongoDB server and the Python Eve API:

docker-compose up

Then run a script from the kali_extractor container to fetch and parse the KALI dumps and store them in MongoDB:

docker-compose run extractor python kali_extractor/parser.py --download

Run tests

docker-compose run extractor python run_tests.py

Usage

The REST API is accessible locally at http://localhost:5000. Use curl or httpie to access it: by default it returns XML to browsers, which is not very convenient.
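If you prefer a script to curl or httpie, the same request can be made from Python by asking for JSON explicitly. The sketch below uses only the standard library. The helper names (`build_json_request`, `fetch_json`) and the `articles` resource used in the usage note are hypothetical; check the API root for the resource names that are actually exposed.

```python
import json
import urllib.request

# Local address given in this README.
API_ROOT = "http://localhost:5000"


def build_json_request(resource=""):
    """Build a GET request that asks the API for JSON instead of XML."""
    url = API_ROOT.rstrip("/") + "/" + resource.lstrip("/")
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_json(resource=""):
    """Send the request and decode the JSON body (needs the API to be running)."""
    request = build_json_request(resource)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the containers up, `fetch_json()` should return the API home document, and `fetch_json("articles")` would return one page of a resource, assuming a resource with that name exists. Eve responses normally carry their HATEOAS links under a `_links` key.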

The MongoDB database is exposed on your host machine on port 27019 with the kali:kali credentials on the admin db.

You can access it using any MongoDB client (I recommend Studio 3T) to explore.
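The connection settings above can also be used from a script. The following is a minimal sketch, assuming the third-party pymongo package is installed on the host. The helper names are made up for this example, and the database name is left as a parameter because this README does not state it.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user="kali", password="kali", host="localhost",
                    port=27019, auth_db="admin"):
    """Build a MongoDB connection URI from the settings given in this README."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?authSource={quote_plus(auth_db)}"
    )


def list_collections(database_name):
    """List the collections of one database (requires pymongo and a running server).

    database_name is an assumption left to the caller: use whichever
    database the extractor created.
    """
    from pymongo import MongoClient  # third-party, imported only when needed

    client = MongoClient(build_mongo_uri())
    return client[database_name].list_collection_names()
```

`build_mongo_uri()` with its defaults yields `mongodb://kali:kali@localhost:27019/?authSource=admin`, which can also be pasted into a graphical client.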

Production

Start MongoDB and API as daemons:

sudo docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d

Initial import

sudo docker-compose -f docker-compose.yml -f docker-compose.prod.yml run extractor python kali_extractor/parser.py --download --drop

(See https://docs.docker.com/compose/extends/ for how multiple Compose files are combined.)

External Resources