5IF INSA - Foundations of data engineering project: Correlation of deaths with power plants (nuclear and thermal) in metropolitan France.
Student name | GitHub profile |
---|---|
Matthieu ROUX | M4TTRX |
Ewen CHAILLAN | EwenChaillann |
The goal of the project is to build a data pipeline that ingests records of deaths in France, including the place of death, together with the locations of active nuclear and thermal power plants in France, and then checks whether there is any correlation between the two. We are not aware of any existing work showing that such a correlation does or does not exist, but we thought it would be fun to find out.
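As an illustration of the final analysis step, here is a minimal sketch, assuming both datasets have already been reduced to one French department code per record (one per death, one per active power plant). The function names `count_by_department`, `pearson` and `deaths_vs_plants` are hypothetical and not taken from the project code. The sketch counts deaths and plants per department and computes the Pearson correlation coefficient between the two counts.

```python
from collections import Counter
from math import sqrt


def count_by_department(department_codes):
    """Count records per department code (e.g. "75", "2A")."""
    return Counter(department_codes)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def deaths_vs_plants(death_departments, plant_departments):
    """Correlate deaths per department with power plants per department.

    Both arguments are iterables of department codes (hypothetical input
    shape). A department that appears in only one dataset gets a count
    of 0 in the other.
    """
    deaths = count_by_department(death_departments)
    plants = count_by_department(plant_departments)
    departments = sorted(set(deaths) | set(plants))
    xs = [plants[d] for d in departments]  # Counter returns 0 for missing keys
    ys = [deaths[d] for d in departments]
    return pearson(xs, ys)
```

Raw death counts largely follow population size, so dividing by each department's population before correlating would likely be a fairer comparison; the sketch leaves that step out for brevity. A non-zero coefficient on its own would not show causation either.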
Project report available here
Presentation PDF available here
The project deliverables are:
- repository with the code, well documented
- docker-compose file to run the environment
- detailed description of the various steps
- report (can be in the repository README) with the project design steps (divided per area)
- example dataset: the project testing should work offline, i.e., you need to have some sample data points.
- slides for the project presentation. You can write them in markdown too.
All the data we extract comes from the French government's open data platform, data.gouv.fr.
- Ensure you have Docker installed and running, then install all required Python packages with `pip install -r requirements.txt`.
- Get your computer's user ID by running `id -u` in your bash terminal. Windows users must run this command in a WSL terminal, as it does not work in CMD or PowerShell.
- Create a `.env` file based on the `.template.env` file. Make sure to set `AIRFLOW_UID` in the `.env` file to the ID you obtained.
- Build and run the environment with the `docker-compose up` command, run from the project directory. This step takes a while, since many images are downloaded to your computer.
- If they do not exist yet, create a folder called `ingestion` in your `/dags` folder, with two subfolders inside it: `staging` and `ingestion`.
- You can now connect to localhost:8080 to access the Airflow dashboard. The username and password are both `airflow`.
- Next, add a connection to the PostgreSQL database: navigate to the Admin -> Connections menu, then click the blue + button to add a new connection.
- Fill in the form as shown in the image.
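The `.env` and folder steps above can also be scripted. The sketch below is a hypothetical helper, not part of the repository. It assumes it is run from the project root, that `.template.env` may contain a line of the form `AIRFLOW_UID=...`, and that `/dags` refers to the project's `dags` directory. The current user ID comes from `os.getuid()`, which exists only on Unix-like systems such as WSL, matching the `id -u` step.

```python
import re
from pathlib import Path


def write_env(project_dir: Path, uid: int) -> Path:
    """Copy .template.env to .env, setting AIRFLOW_UID to the given user id."""
    template = (project_dir / ".template.env").read_text()
    line = f"AIRFLOW_UID={uid}"
    # Replace an existing AIRFLOW_UID=... line, or append one if it is missing.
    content, count = re.subn(r"(?m)^AIRFLOW_UID=.*$", line, template)
    if count == 0:
        if content and not content.endswith("\n"):
            content += "\n"
        content += line + "\n"
    env_path = project_dir / ".env"
    env_path.write_text(content)
    return env_path


def make_ingestion_dirs(project_dir: Path) -> list:
    """Create dags/ingestion/staging and dags/ingestion/ingestion if missing."""
    base = project_dir / "dags" / "ingestion"
    created = []
    for name in ("staging", "ingestion"):
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

From the project root, calling `write_env(Path.cwd(), os.getuid())` and then `make_ingestion_dirs(Path.cwd())` (after `import os`) covers both steps. Because `mkdir(..., exist_ok=True)` is used, re-running the helper is harmless.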
The environment exposes the following services:

Service | Address | Image |
---|---|---|
postgres | localhost:5432 | postgres:13 |
airflow | http://localhost:8080/ | |
jupyter | http://localhost:8888/ | |
redis | localhost:6379 | redis:latest |
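Once `docker-compose up` has finished, one quick way to check that the services in the table are listening is to attempt a TCP connection to each port. This is a minimal sketch using only the Python standard library. `SERVICES`, `is_port_open` and `check_services` are hypothetical names. A successful connection only shows that something accepts connections on the port, not that the service behind it is healthy.

```python
import socket

# Ports from the services table (host assumed to be localhost).
SERVICES = {
    "postgres": 5432,
    "airflow": 8080,
    "jupyter": 8888,
    "redis": 6379,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


def check_services(services=SERVICES, host="localhost"):
    """Map each service name to whether its port accepts TCP connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}
```

For example, `for name, up in check_services().items(): print(name, "up" if up else "down")` prints one status line per service.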