This project handles the storage, processing, and analysis of COVID-19 data.
Authors:
- A fully set up MongoDB instance, either on-premise or in the cloud. Use the
corresponding connection method in the file src/loader.py.
- Python 3.8 with the pip module.
- Using a virtual environment is recommended but not mandatory.
pip install -r requirements.txt
A distribution (.dist) version of the MongoDB secrets file is provided as
mongo_secrets.py.dist. Copy it to mongo_secrets.py and fill in your own
values (preferred). Alternatively, you can change the source code of the
load_data function in loader.py.
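As a rough illustration, a filled-in mongo_secrets.py could look like the sketch below. The variable names (MONGO_USER, MONGO_PASSWORD, MONGO_HOST, MONGO_DB), their values, and the build_uri helper are assumptions made for this example; the names actually expected by the project are the ones in mongo_secrets.py.dist, so check that file first.

```python
from urllib.parse import quote_plus

# Hypothetical contents of mongo_secrets.py. The real variable names are
# defined in mongo_secrets.py.dist and may differ from these placeholders.
MONGO_USER = "covid_user"
MONGO_PASSWORD = "p@ss/word"
MONGO_HOST = "localhost:27017"
MONGO_DB = "covid"


def build_uri(user, password, host, db):
    """Build a standard mongodb:// connection URI with escaped credentials."""
    # Reserved characters in the user name or password must be percent-encoded.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}/{db}"


uri = build_uri(MONGO_USER, MONGO_PASSWORD, MONGO_HOST, MONGO_DB)
```

For a cloud deployment the URI scheme is often mongodb+srv:// instead; which form applies depends on your provider.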
Run python3 main.py or make to download the data and insert it into the database.
To clean up the project, type make clean.
If you want to run the whole process from the beginning again, you must
delete all data files from your data folder and delete all collections from
the database.
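The reset described above could be scripted roughly as follows. This is only a sketch: reset_project is a hypothetical helper, not part of the project, and it assumes that db behaves like a pymongo Database (it only relies on list_collection_names() and drop_collection()). It is destructive, so point it only at a database dedicated to this project.

```python
from pathlib import Path


def reset_project(data_dir, db):
    """Delete all files in data_dir and drop every collection in db.

    `db` is assumed to behave like a pymongo Database, i.e. to offer
    list_collection_names() and drop_collection(name).
    """
    removed_files = []
    # Materialize the listing first so files are not removed mid-iteration.
    for path in list(Path(data_dir).iterdir()):
        if path.is_file():
            path.unlink()
            removed_files.append(path.name)

    dropped = []
    for name in db.list_collection_names():
        db.drop_collection(name)
        dropped.append(name)
    return sorted(removed_files), sorted(dropped)
```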
This code is expected to be run to set up the whole database from scratch.
That means that if a CSV file with the specified name already exists, a new one
is not downloaded; the same applies to the JSON files and data processing. If
the database already contains a collection with the specified name, no new data
is inserted. To rewrite a collection, you must delete it and run the script
again.
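The skip rules above can be summarized in a small sketch. The function pending_steps is hypothetical, and it assumes that a dataset called name is stored as name.csv and name.json in the data folder and in a collection called name; the project's actual naming may differ.

```python
from pathlib import Path


def pending_steps(data_dir, name, existing_collections):
    """Return the pipeline steps that would still run for dataset `name`."""
    data_dir = Path(data_dir)
    steps = []
    if not (data_dir / f"{name}.csv").exists():
        steps.append("download")  # CSV missing, so it would be downloaded
    if not (data_dir / f"{name}.json").exists():
        steps.append("process")   # JSON missing, so processing would run
    if name not in existing_collections:
        steps.append("insert")    # collection missing, so data would be inserted
    return steps
```

An empty result means that running the script again changes nothing for that dataset, which is why the files and the collection have to be deleted first.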
This file defines, in one configurable place, constants such as the data file
names, the sets of columns required for our analysis, and the base URLs.
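One possible shape for such a constants file is sketched below. Everything in it is a placeholder chosen for illustration: the Dataset class, the DATASETS mapping, the column names, and the example.org URL are assumptions and do not reflect the project's real data sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Describes one data file: its name, required columns, and base URL."""
    file_name: str
    columns: tuple
    base_url: str

    @property
    def url(self):
        # Join base URL and file name without producing a double slash.
        return f"{self.base_url.rstrip('/')}/{self.file_name}"


# Placeholder entry; real file names, columns, and URLs live in the project.
DATASETS = {
    "cases": Dataset(
        file_name="cases.csv",
        columns=("date", "region", "confirmed"),
        base_url="https://example.org/covid/",
    ),
}
```

Keeping these values together means that adding a data source only requires a new entry rather than changes to the download or loading code.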
You must create this file as a copy of mongo_secrets.py.dist. It is used for
storing database connection secrets.
This class downloads data files from the specified web pages and stores them
in the data folder (by default <project_root>/data).
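A minimal sketch of such a class, using only the standard library, is shown below. The class name Downloader, its download method, and the returned (path, downloaded) pair are assumptions for this example and are not taken from the project's source.

```python
import shutil
import urllib.request
from pathlib import Path


class Downloader:
    """Downloads files into a data folder, skipping files that already exist."""

    def __init__(self, data_dir="data"):
        self.data_dir = Path(data_dir)

    def download(self, url, file_name):
        """Fetch url into data_dir/file_name; return (path, downloaded flag)."""
        target = self.data_dir / file_name
        if target.exists():
            # An existing file is kept as-is and not downloaded again.
            return target, False
        self.data_dir.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        return target, True
```

Note that this sketch does not clean up after an interrupted transfer, so a partially written file would be treated as already downloaded on the next run.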
This function connects to the specified MongoDB database and inserts the data
into a collection named after the data file.
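The insertion step could look roughly like the sketch below. It is not the project's load_data: for the sake of a self-contained example it takes an already connected database object (anything that behaves like a pymongo Database) instead of connecting itself, and it assumes the processed JSON file holds a list of documents.

```python
import json
from pathlib import Path


def load_data(db, json_path):
    """Insert documents from json_path into the collection named after the file.

    Returns the number of inserted documents, or 0 if a collection with
    that name already exists (in which case nothing is inserted).
    """
    json_path = Path(json_path)
    name = json_path.stem  # e.g. "cases" for data/cases.json
    if name in db.list_collection_names():
        return 0
    with open(json_path, encoding="utf-8") as handle:
        documents = json.load(handle)  # assumed to be a list of dicts
    if documents:  # insert_many rejects an empty list
        db[name].insert_many(documents)
    return len(documents)
```

With pymongo installed, db could be obtained as pymongo.MongoClient(uri)["covid"], where uri and the database name "covid" are placeholders for your own connection settings.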
More information is available in our documentation located at this