NorberMV/data-final

Entregable-Final

This project implements a lightweight, functional ETL system. An Airflow DAG, scheduled to run daily at 3:00 AM UTC ("0 3 * * *"), retrieves Bitcoin data from the CoinGecko API, processes it into a Pandas DataFrame, and inserts it into a Redshift database. The system also compares the retrieved Bitcoin price against predefined thresholds, categorizes it as 'Low', 'Medium', or 'High', and notifies the designated recipients by email. The whole project is Dockerized.
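The threshold step described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the threshold values, `categorize_price`, and `build_subject` are all hypothetical names and numbers chosen for the example, since this README does not state the real ones.

```python
# Hypothetical thresholds in USD. The real values used by the DAG are not
# documented in this README, so treat these as placeholders.
LOW_MAX = 30_000.0    # prices below this are 'Low'
HIGH_MIN = 60_000.0   # prices at or above this are 'High'


def categorize_price(price_usd: float,
                     low_max: float = LOW_MAX,
                     high_min: float = HIGH_MIN) -> str:
    """Return 'Low', 'Medium', or 'High' for a Bitcoin price in USD."""
    if price_usd < 0:
        raise ValueError("price must be non-negative")
    if price_usd < low_max:
        return "Low"
    if price_usd < high_min:
        return "Medium"
    return "High"


def build_subject(price_usd: float) -> str:
    """Build an email subject line that mentions the category and the price."""
    category = categorize_price(price_usd)
    return f"Bitcoin price alert: {category} (${price_usd:,.2f})"
```

A function like this could be called from a Python task and its result handed to the email task, for example as the subject line.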

How can I configure entregable-final?

Just make sure you have the following installed and set up on your machine:

  • You need to have Docker Desktop and Docker Compose installed.
  • Create a ./logs folder at the root of the repository.
  • Create a .env file (see the .env.example file for reference) inside the dag_callables/ folder containing the following Redshift DB data, e.g.:
# Redshift connection config
DB_NAME=
HOST=
PORT=
USERNAME=
PASSW=

# Recipient email address for the notification;
# it is used by the EmailOperator task.
EMAIL_TO=
  • Optionally, create a config/ folder at the root of the repository to store your airflow.cfg file. You will end up with a folder structure similar to the following:
.
├── Dockerfile
├── Makefile
├── README.md
├── airflow_example.cfg
├── config
│   └── airflow.cfg
├── dag_callables
│   ├── __init__.py
│   ├── __pycache__
│   ├── api_data.py
│   ├── callables.py
│   ├── utils.py
│   └── .env
├── dags
│   └── redshift_dag.py
├── resources
│   ├── dag_graph.png
│   └── notification.png
├── docker-compose.yaml
├── logs
├── plugins
├── requirements.txt
└── sql
    ├── create_db.sql
    ├── populate_db.sql
    └── table_exists.sql
  • Finally, set the following [smtp] settings in your airflow.cfg file with your SMTP credentials to enable the email notifications.
[smtp]
smtp_host =
smtp_starttls = True
smtp_ssl = False
smtp_user =
smtp_password =
smtp_port = 25
smtp_mail_from =
smtp_timeout = 30
smtp_retry_limit = 5
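Before starting the containers, it can help to confirm that the .env file and the [smtp] section are filled in. The sketch below is one possible pre-flight check using only the standard library. The helper names (`parse_env_text`, `missing_env_keys`, `missing_smtp_keys`) are hypothetical and not part of this repository, and the choice of which [smtp] keys count as required is an assumption.

```python
import configparser

# Keys listed in the .env example above.
REQUIRED_ENV_KEYS = ("DB_NAME", "HOST", "PORT", "USERNAME", "PASSW", "EMAIL_TO")
# Assumption: these are the [smtp] keys that must not be left empty.
REQUIRED_SMTP_KEYS = ("smtp_host", "smtp_user", "smtp_password",
                      "smtp_port", "smtp_mail_from")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_keys(env_text: str) -> list[str]:
    """Return the required .env keys that are absent or empty."""
    values = parse_env_text(env_text)
    return [key for key in REQUIRED_ENV_KEYS if not values.get(key)]


def missing_smtp_keys(cfg_text: str) -> list[str]:
    """Return the required [smtp] keys that are absent or empty."""
    # interpolation=None so '%' characters elsewhere in airflow.cfg are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("smtp"):
        return list(REQUIRED_SMTP_KEYS)
    return [key for key in REQUIRED_SMTP_KEYS
            if not parser.get("smtp", key, fallback="").strip()]
```

To use it, read `dag_callables/.env` and `config/airflow.cfg` with `pathlib.Path(...).read_text()` and pass the contents to the two `missing_*` functions. An empty list from both means nothing obvious is missing.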

How can I run this?

  • Launch Docker Desktop so that the Docker daemon is running.
  • Run the command below to run the build:
make run
  • Once the build is done, open localhost:8080 in your browser.
  • Enter airflow as the user and airflow as the password.
  • Once inside, before activating the DAG, add a new Redshift connection as follows:
    • Go to Admin/Connections in the Airflow UI.
    • Fill in the connection fields with the values from the shared .env file, and use "redshift_coder" as the connection Id.
  • Finally, activate the DAG and wait for it to turn dark green, which means the pipeline has run.

[Image: dag_graph]

  • If everything went smoothly, you should receive a notification at the address set in the EMAIL_TO environment variable.

[Image: dag_graph]
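As an alternative to entering the connection by hand in the UI, Airflow can also read connections from environment variables named `AIRFLOW_CONN_<CONN_ID>` that hold a connection URI. The sketch below builds such a URI from the .env values. It is only an illustration under assumptions: the helper names are hypothetical, the `postgres` connection type is a guess that may not match what this DAG expects, and the variable would have to be set in the Airflow containers' environment. Connections defined this way do not appear under Admin/Connections.

```python
from urllib.parse import quote


def conn_env_var_name(conn_id: str) -> str:
    """Airflow looks up connections in AIRFLOW_CONN_<CONN_ID in upper case>."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"


def build_conn_uri(conn_type: str, host: str, port: int,
                   db_name: str, username: str, password: str) -> str:
    """Build a connection URI, percent-encoding the login, password, and schema."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    schema = quote(db_name, safe="")
    return f"{conn_type}://{user}:{secret}@{host}:{port}/{schema}"


# Example with placeholder values; conn_type="postgres" is an assumption.
uri = build_conn_uri("postgres", "example.redshift.amazonaws.com", 5439,
                     "dev", "me", "p@ss/word")
env_line = f"{conn_env_var_name('redshift_coder')}={uri}"
```

The percent-encoding matters because characters such as `@` or `/` in a password would otherwise be misread as URI separators.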

When you are done, run the following command to stop the project:

make stop

About

Coderhouse's Data Engineering Final Project
