## Extract data from API

[requests](https://pypi.org/project/requests/): HTTP library used to send requests to an API and retrieve its responses.

[json](https://docs.python.org/3/library/json.html): parses JSON text into Python data structures (here, a dictionary).

[Open-Meteo API](https://open-meteo.com/en/docs): weather forecast API. The request below asks for the hourly air temperature at 2 m above ground (`temperature_2m`) at latitude 52.52, longitude 13.41 (Berlin).
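The forecast request carries its options in the URL query string. A minimal sketch using only the standard library's `urllib.parse.urlencode` shows how the same URL can be assembled from a dictionary, which makes the coordinates and variables easier to change; `BASE_URL` and `params` are illustrative names. `requests` performs the same encoding when it is given a `params=` dictionary.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.open-meteo.com/v1/forecast"

# Query parameters: Berlin coordinates and the hourly 2 m temperature variable
params = {"latitude": 52.52, "longitude": 13.41, "hourly": "temperature_2m"}

# urlencode joins key=value pairs with '&' and escapes any unsafe characters
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
# https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m
```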

```python
import json
import requests
```

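```python
def extract() -> dict:
    """Fetch the hourly 2 m temperature forecast for Berlin from Open-Meteo."""
    response = requests.get(
        'https://api.open-meteo.com/v1/forecast',
        params={'latitude': 52.52, 'longitude': 13.41, 'hourly': 'temperature_2m'},
        timeout=30,  # don't hang forever if the API is unreachable
    )
    response.raise_for_status()  # fail on HTTP errors instead of parsing an error body
    dictionary = json.loads(response.text)

    return dictionary
```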
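`extract` returns the decoded JSON as a nested dictionary. The sketch below parses a trimmed, hand-written sample (the values are made up, and a real response also contains fields such as `elevation` and `timezone`) to show the structure the next step has to flatten: two parallel lists under `hourly`.

```python
import json

# Hand-written sample shaped like an Open-Meteo hourly response (illustrative values, not real data)
sample_text = """
{
  "latitude": 52.52,
  "longitude": 13.41,
  "hourly_units": {"time": "iso8601", "temperature_2m": "°C"},
  "hourly": {
    "time": ["2024-01-01T00:00", "2024-01-01T01:00", "2024-01-01T02:00"],
    "temperature_2m": [1.2, 0.8, 0.5]
  }
}
"""

dictionary = json.loads(sample_text)

# The hourly data arrives as two parallel lists of equal length
times = dictionary["hourly"]["time"]
temps = dictionary["hourly"]["temperature_2m"]
print(len(times), len(temps))  # 3 3
print(dictionary["hourly_units"]["temperature_2m"])  # °C
```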

## Transform data into a readable DataFrame

[pandas](https://pypi.org/project/pandas/): data analysis library; here it flattens the nested JSON into a tabular DataFrame.

```python
import pandas as pd
```

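```python
def transform(raw_data: dict) -> pd.DataFrame:
    # Flatten the nested JSON into one row; nested keys become dotted columns such as 'hourly.time'
    df = pd.json_normalize(raw_data)
    # Explode the parallel hourly lists together so each timestamp gets its own row (pandas >= 1.3)
    df = df.explode(['hourly.time', 'hourly.temperature_2m'], ignore_index=True)
    # explode leaves object columns, so restore proper dtypes
    df['hourly.time'] = pd.to_datetime(df['hourly.time'])
    df['hourly.temperature_2m'] = df['hourly.temperature_2m'].astype(float)

    return df
```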
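`json_normalize` turns the nested dictionary into a single row with dotted column names (for example `hourly.time`), and `explode` on both list columns then yields one row per timestamp. A dependency-free sketch of the same reshaping, using a hypothetical `to_rows` helper and a made-up sample, is a `zip` over the parallel lists. Unlike the pandas version, it drops the `hourly_units.*` columns.

```python
def to_rows(raw_data: dict) -> list[dict]:
    """Hypothetical helper: one record per hourly timestamp, like json_normalize + explode."""
    # Scalar top-level fields are repeated on every row; nested dicts (hourly, hourly_units) are skipped
    base = {k: v for k, v in raw_data.items() if not isinstance(v, dict)}
    hourly = raw_data["hourly"]
    # zip pairs the parallel lists element by element: (time, temperature)
    return [
        {**base, "hourly.time": t, "hourly.temperature_2m": temp}
        for t, temp in zip(hourly["time"], hourly["temperature_2m"])
    ]

# Made-up sample with the same shape as the API response
sample = {
    "latitude": 52.52,
    "longitude": 13.41,
    "hourly": {"time": ["2024-01-01T00:00", "2024-01-01T01:00"], "temperature_2m": [1.2, 0.8]},
}
rows = to_rows(sample)
print(rows[1])
# {'latitude': 52.52, 'longitude': 13.41, 'hourly.time': '2024-01-01T01:00', 'hourly.temperature_2m': 0.8}
```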

## Load data into S3

[boto3](https://aws.amazon.com/sdk-for-python/?nc1=h_ls): the AWS SDK for Python. A [session](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html) holds the credentials and region from which clients for AWS services are created.

[awswrangler](https://pypi.org/project/awswrangler/): AWS SDK for pandas; connects pandas DataFrames to AWS services, here writing them to S3 buckets as Parquet files.

[datetime](https://docs.python.org/3/library/datetime.html): provides the current date and time, used to build a date-based S3 path.

[dotenv](https://pypi.org/project/python-dotenv/): loads environment variables from a `.env` file.

[os](https://docs.python.org/3/library/os.html): operating system interface, used here to read environment variable values.
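`os.getenv` returns `None` when a variable is missing, and `boto3.Session` then silently falls back to its default credential chain. A small hypothetical helper such as `require_env` (not part of `os` or `python-dotenv`) makes a missing credential fail early with a clear message:

```python
import os

def require_env(name: str) -> str:
    """Hypothetical helper: return an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check the .env file")
    return value

# Usage: swap os.getenv('AWS_ACCESS_KEY_ID') for require_env('AWS_ACCESS_KEY_ID')
os.environ["DEMO_VAR"] = "example"
print(require_env("DEMO_VAR"))  # example
```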

```python
import os
from datetime import datetime

import awswrangler as wr
import boto3
from dotenv import load_dotenv
```

```python
def load(transformed_data: pd.DataFrame) -> None:
    """Write the transformed DataFrame to S3 as a date-partitioned Parquet file."""
    load_dotenv('../.env')  # read AWS credentials from the project's .env file
    now = datetime.now()

    year = now.strftime("%Y")
    month = now.strftime("%m")
    day = now.strftime("%d")
    time = now.strftime("%H-%M-%S")  # '-' instead of ':' to keep the S3 key free of colons

    # Credentials come from environment variables; they could instead be fetched from AWS Secrets Manager
    session = boto3.Session(
        aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
        aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
        region_name="us-east-2"
    )

    wr.s3.to_parquet(
        df=transformed_data,
        path=f's3://dee-tutorial/open-meteo/{year}/{month}/{day}/{time}.parquet',
        boto3_session=session,
    )
```
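The object path built in `load` encodes the run time as `year/month/day/time`. A sketch of that construction as a pure function, with a hypothetical `build_s3_path` helper and a fixed timestamp so the result is reproducible, makes the layout easy to check; the bucket and prefix are the ones used in `load`. It uses `-` rather than `:` in the time part, since AWS lists the colon among characters that may need special handling in object keys.

```python
from datetime import datetime

def build_s3_path(now: datetime, bucket: str = "dee-tutorial", prefix: str = "open-meteo") -> str:
    """Hypothetical helper: s3://<bucket>/<prefix>/YYYY/MM/DD/HH-MM-SS.parquet"""
    # A datetime format spec inside an f-string is passed to strftime
    return f"s3://{bucket}/{prefix}/{now:%Y/%m/%d/%H-%M-%S}.parquet"

print(build_s3_path(datetime(2024, 3, 5, 14, 7, 9)))
# s3://dee-tutorial/open-meteo/2024/03/05/14-07-09.parquet
```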


## Execute ETL

```python
raw_data = extract()
transformed_data = transform(raw_data)
load(transformed_data)
```
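Since each step only hands its output to the next, the pipeline can also be exercised offline by injecting stand-in functions. A minimal sketch with a hypothetical `run_etl` runner and stub steps, which makes no network or AWS calls:

```python
from typing import Any, Callable

def run_etl(
    extract: Callable[[], Any],
    transform: Callable[[Any], Any],
    load: Callable[[Any], None],
) -> Any:
    """Hypothetical runner: chain extract -> transform -> load and return the transformed data."""
    raw = extract()
    transformed = transform(raw)
    load(transformed)
    return transformed

# Stub steps standing in for the real API call and S3 upload
loaded = []
result = run_etl(
    extract=lambda: {"hourly": {"temperature_2m": [1.2, 0.8]}},
    transform=lambda raw: raw["hourly"]["temperature_2m"],
    load=loaded.append,
)
print(result, loaded)  # [1.2, 0.8] [[1.2, 0.8]]
```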

Next step: run this ETL as a DAG in Apache Airflow.