
Welcome to the airflow_aws_justwatch_pipeline wiki!

Setup:

AWS (watch your billing to avoid unexpected charges):

  1. Create your AWS account.
  2. Create an S3 bucket.
  3. Create a new IAM user with permissions to read and write to S3 and to run Glue jobs.
  4. Generate an access key pair for the newly created user (see the sketch after this list if you want to verify the keys).
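
If you want to confirm the key pair works before wiring it into Airflow, a quick check with boto3 like the sketch below can help. This script is not part of the repository; the bucket name and placeholder keys are assumptions you should replace with your own values.

```python
# Minimal sketch (not part of the repository): check that the new IAM user's
# access keys can write to and read from your bucket. Replace the placeholders.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="<YOUR_AWS_ACCESS_KEY_ID>",
    aws_secret_access_key="<YOUR_AWS_SECRET_ACCESS_KEY>",
)

bucket = "<YOUR_BUCKET_NAME>"

# Write a small test object, read it back, then clean it up.
s3.put_object(Bucket=bucket, Key="connection_test.txt", Body=b"ok")
print(s3.get_object(Bucket=bucket, Key="connection_test.txt")["Body"].read())
s3.delete_object(Bucket=bucket, Key="connection_test.txt")
```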

Airflow:

  1. Install Docker Desktop on your local machine.
  2. Start Docker.
  3. Go to your project root directory and create the subdirectories Airflow expects: config, dags, logs and plugins.
  4. Open \dockerfile\airflow\Dockerfile and change the Airflow version in the first line to the one you want to use, e.g.: FROM apache/airflow:2.7.1
  5. Open a terminal in your project root directory and run the command to start Airflow: docker compose up
  6. After Airflow starts, open the Airflow web interface (http://localhost:8080/) and log in (default user and password: airflow).
  7. Open the Admin > Connections tab to set up the AWS connection in Airflow:
    • Connection Id: AWSConnection (or any name you prefer; remember to change it in the DAG script accordingly, as in the sketch after this list)
    • Connection Type: Amazon Web Services
    • Extra: {"aws_access_key_id": "<YOUR_AWS_ACCESS_KEY_ID>", "aws_secret_access_key": "<YOUR_AWS_SECRET_ACCESS_KEY>"}
  8. Save the settings.
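
For reference, the sketch below shows one common way a DAG script references the connection id from step 7, via the Amazon provider's S3Hook. The DAG id, task and bucket name here are illustrative assumptions, not the repository's actual DAG; only the aws_conn_id needs to match the Connection Id you saved above.

```python
# Minimal sketch: using the Airflow connection created in step 7 from a DAG.
# The DAG id, task and bucket name are placeholders for illustration only.
from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


@dag(schedule=None, start_date=datetime(2023, 1, 1), catchup=False)
def aws_connection_check():
    @task
    def list_bucket_keys():
        # "AWSConnection" must match the Connection Id configured in the Airflow UI.
        hook = S3Hook(aws_conn_id="AWSConnection")
        print(hook.list_keys(bucket_name="<YOUR_BUCKET_NAME>"))

    list_bucket_keys()


aws_connection_check()
```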