This project is a lightweight, functional ETL system. An Airflow DAG, scheduled to run daily at 3:00 AM UTC ("0 3 * * *"), retrieves Bitcoin data from the CoinGecko API, loads it into a Pandas DataFrame, and inserts it into a Redshift database. The DAG also compares the retrieved Bitcoin price against predefined thresholds, categorizes it as 'Low', 'Medium', or 'High', and emails the result to the designated recipients. The whole system is Dockerized.
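The threshold comparison described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name categorize_price and both threshold values are hypothetical, since the real cut-offs are not stated here.

```python
from typing import Literal

# Hypothetical thresholds in USD; the project's real values are not given here.
LOW_THRESHOLD_USD = 20_000.0
HIGH_THRESHOLD_USD = 40_000.0


def categorize_price(
    price_usd: float,
    low: float = LOW_THRESHOLD_USD,
    high: float = HIGH_THRESHOLD_USD,
) -> Literal["Low", "Medium", "High"]:
    """Map a price to 'Low', 'Medium', or 'High' using two cut-off values."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if price_usd < low:
        return "Low"
    if price_usd < high:
        return "Medium"
    return "High"
```

In this sketch a price exactly equal to a threshold falls into the higher category; the real DAG may draw the boundaries differently.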
Just make sure you have the following installed and set up on your machine:
- You need to have Docker Desktop and Docker Compose installed.
- Create a ./logs folder at the root of the repository.
- This assumes you have an .env file (see the .env.example file for reference) within the dag_callables/ folder containing the following Redshift DB data, e.g.:
# Redshift connection config
DB_NAME=
HOST=
PORT=
USERNAME=
PASSW=
# Recipient email address for the notification;
# used by the EmailOperator task.
EMAIL_TO=
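Before starting the containers, it can help to confirm that every key in the .env file has a value. The sketch below uses only the Python standard library; how the project itself loads the file is not shown here, so the helper names parse_env_file and missing_keys are hypothetical.

```python
from pathlib import Path

# Keys listed in the .env example above.
REQUIRED_KEYS = ("DB_NAME", "HOST", "PORT", "USERNAME", "PASSW", "EMAIL_TO")


def parse_env_file(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling missing_keys(parse_env_file("dag_callables/.env")) from the repository root should return an empty list once the file is fully filled in.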
- Optionally, create a config/ folder at the root of the repository to store your airflow.cfg file. You will end up with a folder structure similar to the following:
.
├── Dockerfile
├── Makefile
├── README.md
├── airflow_example.cfg
├── config
│ └── airflow.cfg
├── dag_callables
│ ├── __init__.py
│ ├── __pycache__
│ ├── api_data.py
│ ├── callables.py
│ ├── utils.py
│ └── .env
├── dags
│ └── redshift_dag.py
├── resources
│ ├── dag_graph.png
│ └── notification.png
├── docker-compose.yaml
├── logs
├── plugins
├── requirements.txt
└── sql
  ├── create_db.sql
  ├── populate_db.sql
  └── table_exists.sql
- Finally, set the following [smtp] settings in your airflow.cfg file with your SMTP credentials to enable the email notifications.
[smtp]
smtp_host =
smtp_starttls = True
smtp_ssl = False
smtp_user =
smtp_password =
smtp_port = 25
smtp_mail_from =
smtp_timeout = 30
smtp_retry_limit = 5
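A quick way to catch an incomplete [smtp] section before launching is to parse the file with Python's configparser. This is only a sketch: the helper name unset_smtp_keys is hypothetical, and it assumes the file is plain INI that the standard parser accepts.

```python
import configparser

# Keys from the [smtp] block above that need a value.
SMTP_KEYS = (
    "smtp_host", "smtp_starttls", "smtp_ssl", "smtp_user",
    "smtp_password", "smtp_port", "smtp_mail_from",
)


def unset_smtp_keys(cfg_text: str) -> list[str]:
    """Return the [smtp] keys that are missing or have an empty value."""
    # interpolation=None keeps '%' characters in passwords from being
    # treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("smtp"):
        return list(SMTP_KEYS)
    return [
        key for key in SMTP_KEYS
        if not parser.get("smtp", key, fallback="").strip()
    ]
```

For example, passing the contents of config/airflow.cfg (read with pathlib.Path.read_text) returns the list of settings that still need to be filled in.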
- Launch Docker Desktop so that the Docker daemon is running.
- Then run the command below to start the build:
make run
- Once the build is done, go to localhost:8080 in any browser.
- Enter airflow as the user and airflow as the password.
- Once inside, before activating the DAG, add a new Redshift connection as follows:
  - Go to Admin/Connections in the Airflow UI.
  - Fill in the connection fields with the values from the shared .env file, and use "redshift_coder" as the connection Id.
- Finally, activate the DAG and wait for it to turn dark green, which means the pipeline has run.
- If everything went smoothly, you should receive a notification at the address set in the EMAIL_TO environment variable.
When you are done, simply run the following command:
make stop

