Explore the datasets interactively: avantages.csv and conventions.csv
This is a first attempt at extracting the data from transparence.sante.gouv.fr.
The resulting files are in data:
- avantages.csv for the direct donations from the companies
- conventions.csv for the "conventions"
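For a quick first look at these files, a minimal sketch using only the standard library might look like the following. It assumes the files are comma-delimited, UTF-8 encoded, and start with a header row; none of that is confirmed here, so adjust `delimiter` and `encoding` if the output looks wrong. The helper name `load_rows` is made up for this example.

```python
import csv
from pathlib import Path


def load_rows(csv_path, delimiter=","):
    """Read a CSV file with a header row into a list of dicts.

    Assumption: UTF-8 encoding and a header row; adjust if the files differ.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


if __name__ == "__main__":
    for name in ("avantages.csv", "conventions.csv"):
        path = Path("data") / name
        if not path.exists():
            print(name, "not found, run the scraper first")
            continue
        rows = load_rows(path)
        print(name, len(rows), "rows")
        if rows:
            # Column names come from the header row of the file itself.
            print("columns:", list(rows[0].keys()))
```

Every value is read as a string, so numeric columns need converting before any arithmetic.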
- Install the dependencies:
pip install -r requirements.txt
- Launch
python sante.py
for single-threaded scraping
- Launch
python machinegun.py <number of threads>
to run multiple scraping processes in parallel
Scraping is segmented by postal code, and some postal codes are currently missing. Feel free to segment it another way.
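To illustrate the idea of segmenting by postal code, here is a rough sketch of how a list of postal codes could be dealt out to several workers. It is not the code in machinegun.py: `split_round_robin`, `scrape_postal_code` and `scrape_all` are hypothetical names, the scraping function is a placeholder that returns an empty result, and a thread pool is used here purely as an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(items, n_workers):
    """Deal items into n_workers lists, one item at a time."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return [items[i::n_workers] for i in range(n_workers)]


def scrape_postal_code(postal_code):
    """Hypothetical placeholder: a real version would fetch and parse the site."""
    return {"postal_code": postal_code, "records": []}


def scrape_all(postal_codes, n_workers):
    """Scrape every postal code, one batch per worker thread."""
    batches = split_round_robin(list(postal_codes), n_workers)

    def run_batch(batch):
        return [scrape_postal_code(code) for code in batch]

    results = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields batch results in the same order as `batches`.
        for batch_result in pool.map(run_batch, batches):
            results.extend(batch_result)
    return results


if __name__ == "__main__":
    codes = ["75001", "75002", "75003", "69001"]
    print(split_round_robin(codes, 2))
    print(len(scrape_all(codes, 2)), "postal codes processed")
```

Because the work is split on the list of postal codes, any code absent from that list is never requested, which is one way gaps in the dataset can appear.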
TODO: better cleaning of the dataset, code cleanup, and better documentation.