RichardKlem/UPA

UPA project

This project deals with COVID-19 data storage, processing, and analysis.

Authors:

Prerequisites

  • A fully set up MongoDB instance, either on-premise or in the cloud. Use
    the corresponding connection method in src/loader.py.
  • Python 3.8 with pip.
  • Using a virtual environment is recommended but not mandatory.

Install requirements

pip install -r requirements.txt

MongoDB secrets

A distribution (.dist) version of the MongoDB secrets file is available as
mongo_secrets.py.dist. Copy this file to mongo_secrets.py and fill it in with
your own data (preferred). Alternatively, you can change the source code of the
load_data function in loader.py.
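
As an illustration only, a filled-in secrets file could look roughly like the sketch below. The variable names (MONGO_USER, MONGO_PASSWORD, MONGO_HOST, MONGO_DB) and the helper build_uri are assumptions made for this example; the real names are whatever mongo_secrets.py.dist defines.

```python
# Hypothetical contents of mongo_secrets.py. The real variable names are
# defined by mongo_secrets.py.dist and may differ from these placeholders.
from urllib.parse import quote_plus

MONGO_USER = "upa_user"         # placeholder
MONGO_PASSWORD = "change-me"    # placeholder
MONGO_HOST = "localhost:27017"  # placeholder for an on-premise instance
MONGO_DB = "upa"                # placeholder


def build_uri(user: str, password: str, host: str, srv: bool = False) -> str:
    """Build a MongoDB connection string (hypothetical helper).

    User name and password are percent-escaped so that characters such as
    '@' or ':' do not break the URI. Set srv=True for a cloud cluster that
    uses the mongodb+srv scheme.
    """
    scheme = "mongodb+srv" if srv else "mongodb"
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}"
```

Keeping the values in a separate, untracked file means the credentials never have to appear in loader.py itself.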

Run the code

Run python3 main.py or make to download the data and insert it into the database.

To clean up the project, type make clean.

If you want to run the whole process from the beginning again, you must
delete all data files from your data folder and delete all collections from
the database.
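
The manual reset described above could also be scripted. The following is a minimal sketch, not part of the project: reset_project is a hypothetical helper, and db is assumed to be an object that behaves like a pymongo Database (it only relies on list_collection_names and drop_collection). It is destructive, so point it only at a database dedicated to this project.

```python
from pathlib import Path


def reset_project(data_dir, db):
    """Delete all files in data_dir and drop every collection in db.

    Hypothetical helper: `db` is expected to behave like a pymongo
    Database object. Returns the names of removed files and dropped
    collections, both sorted.
    """
    removed = []
    for path in Path(data_dir).iterdir():
        if path.is_file():
            path.unlink()
            removed.append(path.name)

    dropped = []
    for name in db.list_collection_names():
        db.drop_collection(name)
        dropped.append(name)

    return sorted(removed), sorted(dropped)
```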

Behaviour of the script

This code is expected to be run to set up the whole database from scratch.
If a CSV file with the specified name already exists, a new one is not
downloaded; the same applies to the JSON files and data processing. If the
database already contains a collection with the specified name, no new data
is inserted. To rewrite a collection, you must delete it and run the script
again.

About source code

Constants data_files.py

This file provides a dynamic approach to constants such as data file names,
the sets of columns required for our analysis, and their base URLs.
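
To make the idea concrete, here is one possible shape for such a constants module. Everything in it is a placeholder chosen for this example: the base URL, the data file names, the column names, and the helpers csv_url and required_columns are assumptions, not the project's real values.

```python
# Hypothetical layout for data_files.py; the file names, column names and
# base URL below are placeholders, not the project's real values.
BASE_URL = "https://example.org/covid-19/"

DATA_FILES = {
    "infected": {"columns": ["date", "age", "region"]},
    "deaths": {"columns": ["date", "age", "region"]},
}


def csv_url(name: str) -> str:
    """Return the download URL of a data file listed in DATA_FILES."""
    if name not in DATA_FILES:
        raise KeyError(f"unknown data file: {name}")
    return f"{BASE_URL}{name}.csv"


def required_columns(name: str) -> list:
    """Return a copy of the columns the analysis needs from a data file."""
    return list(DATA_FILES[name]["columns"])
```

Keeping names, columns, and URLs in one mapping means adding a data set is a single dictionary entry rather than a change in several places.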

Credential file mongo_secrets.py

You must create this file as a copy of mongo_secrets.py.dist. It is used for
storing database connection secrets.

Class DataHandler

This class can download data files from the specified web pages and store them
in the data folder (the default is <project_root>/data).
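
The download step might be sketched as follows. This is not the project's DataHandler: the class name DataHandlerSketch, its download method, and its return value are assumptions for illustration, using only the Python standard library. It also shows the skip-if-present behaviour described earlier in this README.

```python
import shutil
import urllib.request
from pathlib import Path


class DataHandlerSketch:
    """Illustrative stand-in for DataHandler; not the project's class."""

    def __init__(self, data_dir="data"):
        self.data_dir = Path(data_dir)
        # Create the data folder on first use.
        self.data_dir.mkdir(parents=True, exist_ok=True)

    def download(self, url, file_name):
        """Download url into the data folder unless the file already exists.

        Returns True if a download happened, False if it was skipped.
        """
        target = self.data_dir / file_name
        if target.exists():
            return False
        # Stream the response to disk instead of reading it into memory.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        return True
```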

Function load_data

This function connects to the specified MongoDB database and inserts data into
a collection named after the data file.
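
A rough sketch of the insert step, under stated assumptions: load_data_sketch is a hypothetical name, its signature is not taken from loader.py, and db is assumed to behave like a pymongo Database (only list_collection_names, item access, and insert_many are used). It follows the rule that an existing collection is left untouched.

```python
def load_data_sketch(db, collection_name, documents):
    """Insert documents into db[collection_name] unless it already exists.

    `db` is expected to behave like a pymongo Database. Returns the number
    of inserted documents (0 when the collection was already present).
    """
    if collection_name in db.list_collection_names():
        return 0
    documents = list(documents)
    if not documents:
        return 0  # pymongo's insert_many rejects an empty list
    result = db[collection_name].insert_many(documents)
    return len(result.inserted_ids)
```

With pymongo installed, db would typically come from MongoClient(uri)[database_name], where the URI and database name are read from mongo_secrets.py.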

More information is available in our documentation.