# AWS Glue Studio Notebook
##### You are now running an AWS Glue Studio notebook. To start using your notebook, you need to start an AWS Glue Interactive Session.


#### Run this cell to configure the session with notebook commands ("magics").


In [7]:
%iam_role arn:aws:iam::212430227630:role/LabRole
%region us-east-1
%number_of_workers 2

%idle_timeout 30
%glue_version 4.0
%worker_type G.1X

Welcome to the Glue Interactive Sessions Kernel
For more information on available magic commands, please type %help in any new cell.

Please view our Getting Started page to access the most up-to-date information on the Interactive Sessions kernel: https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html
Installed kernel version: 1.0.5 
Current iam_role is arn:aws:iam::212430227630:role/LabRole
iam_role has been set to arn:aws:iam::212430227630:role/LabRole.
Previous region: us-east-1
Setting new region to: us-east-1
Region is set to: us-east-1
Previous number of workers: None
Setting new number of workers to: 2
Current idle_timeout is None minutes.
idle_timeout has been set to 30 minutes.
Setting Glue version to: 4.0
Previous worker type: None
Setting new worker type to: G.1X


In [10]:
%extra_py_files s3://cryptoengineer/gluejobs-py-modules/load.py, s3://cryptoengineer/gluejobs-py-modules/storage.py
%additional_python_modules yfinance

Extra py files to be included:
s3://cryptoengineer/gluejobs-py-modules/load.py
s3://cryptoengineer/gluejobs-py-modules/storage.py
Additional python modules to be included:
yfinance


#### Run this cell to set up and start your interactive session.


In [13]:
%load_ext autoreload
%autoreload 2

In [1]:
import sys
import boto3

from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
  
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

Trying to create a Glue session for the kernel.
Session Type: glueetl
Worker Type: G.1X
Number of Workers: 2
Idle Timeout: 30
Session ID: a998f449-50e9-4166-961f-64343a8852ad
Applying the following default arguments:
--glue_kernel_version 1.0.5
--enable-glue-datacatalog true
--extra-py-files s3://cryptoengineer/gluejobs-py-modules/load.py,s3://cryptoengineer/gluejobs-py-modules/storage.py
--additional-python-modules yfinance
Waiting for session a998f449-50e9-4166-961f-64343a8852ad to get into ready status...
Session a998f449-50e9-4166-961f-64343a8852ad has been created.



## Batch Load - FOREX


In [2]:
from datetime import datetime, timedelta, timezone
import pandas as pd
import load




### Set AWS Storage parameters


In [3]:
BUCKET_NAME = "cryptoengineer"
PREFIX = "datalake/bronze/forex"




### Load job parameters

In [4]:
glue_client = boto3.client("glue")

if '--WORKFLOW_NAME' in sys.argv and '--WORKFLOW_RUN_ID' in sys.argv:
    print("Running in Glue Workflow")
    
    glue_args = getResolvedOptions(
        sys.argv, ['WORKFLOW_NAME', 'WORKFLOW_RUN_ID']
    )
    
    print("Reading the workflow parameters")
    workflow_args = glue_client.get_workflow_run_properties(
        Name=glue_args['WORKFLOW_NAME'], RunId=glue_args['WORKFLOW_RUN_ID']
    )["RunProperties"]

    
    base = workflow_args['base']
    time_frame = int(workflow_args['time_frame'])
    symbols = workflow_args['symbols']
    api_key = workflow_args['api_key']

else:
    try:
        print("Running as Job")
        args = getResolvedOptions(sys.argv,
                                  ['JOB_NAME',
                                   'base',
                                   'time_frame',
                                   'symbols',
                                   'api_key'])
        base = args['base']
        time_frame = int(args['time_frame'])
        # Example value: symbols = "EUR,CHF,JPY,CAD,GBP"
        symbols = args['symbols']
        api_key = args['api_key']
    except Exception:
        # Job arguments are missing (e.g. in an interactive session): use defaults
        base = 'USD'
        time_frame = 24
        symbols = "USDEUR"
        api_key = "xGFdE9Ydrcr1oCDJiCjHiZnkqUnQnjaH"
        


Running as Job
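
The fallback-to-defaults pattern in the cell above can be tried outside the Glue runtime. The sketch below is only an illustration: `resolve_params` is a hypothetical helper, not part of `awsglue`, and it assumes arguments arrive as `--name value` pairs.

```python
def resolve_params(argv, defaults):
    """Return `defaults` overridden by any `--name value` pairs found in argv.

    Hypothetical helper for illustration; it is not part of awsglue.
    """
    params = dict(defaults)
    # Stop one token early so argv[i + 1] always exists
    for i, token in enumerate(argv[:-1]):
        name = token[2:] if token.startswith("--") else None
        if name in params:
            params[name] = argv[i + 1]
    # time_frame is used as a number of hours, so normalise it to int
    params["time_frame"] = int(params["time_frame"])
    return params


defaults = {"base": "USD", "time_frame": 24, "symbols": "USDEUR"}
print(resolve_params(["script.py", "--symbols", "EUR,CHF", "--time_frame", "48"], defaults))
```

Unknown argument names are ignored, so anything not supplied keeps its default value.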


In [5]:
print("base: ", base)
print("Time Frame: ", time_frame)
print("Symbols: ", symbols)
print("API Key: ", api_key)

base:  USD
Time Frame:  24
Symbols:  USDEUR
API Key:  xGFdE9Ydrcr1oCDJiCjHiZnkqUnQnjaH
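
Printing a full API key leaves it in the notebook output and in the job logs. One option is to mask it before printing. A minimal sketch follows; `mask_secret` is a hypothetical helper and the key used here is a made-up example value.

```python
def mask_secret(value, visible=4):
    """Hide all but the last `visible` characters of a secret string."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Made-up key, for illustration only
print("API Key: ", mask_secret("example-key-1234"))
```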


#### Set the start and end dates for the data you want to load

In [6]:
# Start and end dates (UTC), formatted as YYYY-MM-DD
now = datetime.now(timezone.utc)
start_date = (now - timedelta(hours=time_frame)).strftime("%Y-%m-%d")
end_date = now.strftime("%Y-%m-%d")

print("Start date: ", start_date, " End date: ", end_date)

Start date:  2024-08-31  End date:  2024-09-01
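
Because both dates are truncated to whole days, the window depends on the time of day as well as on `time_frame`. The sketch below makes this easy to check with a fixed clock; `date_window` is a hypothetical helper and the timestamp is chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def date_window(hours, now=None):
    """Return (start_date, end_date) as YYYY-MM-DD strings for the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=hours)
    return start.strftime("%Y-%m-%d"), now.strftime("%Y-%m-%d")


# Fixed clock so the result is reproducible
fixed_now = datetime(2024, 9, 1, 10, 30, tzinfo=timezone.utc)
print(date_window(24, now=fixed_now))
print(date_window(6, now=fixed_now))
```

With a 6-hour window at 10:30 UTC, the start and end dates fall on the same day, so a date-based API query would cover that single day.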


## Load the historical rates - 15min frequency

Set some config values

In [7]:
freq = '15min'
source = 'FMP'




In [9]:
frames = []
for symbol in symbols.split(","):
    print("Loading: ", symbol)
    symbol_df = load.load_batch_freq_rates(
        base=base,
        start_date=start_date,
        end_date=end_date,
        freq=freq,
        symbol=symbol,
        api_key=api_key,
        source=source,
    )
    # Complete the table schema
    if len(symbol_df) > 0:
        symbol_df = load.set_schema_table(symbol_df, symbol, source, freq, base)
        print("Records: ", len(symbol_df))
        frames.append(symbol_df)
    else:
        print("No data for: ", symbol)

# Concatenate once at the end; fall back to an empty DataFrame when nothing loaded
df = pd.concat(frames) if frames else pd.DataFrame()


Loading:  USDEUR
Read  0
Creating the dataframe
Records:  0
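
The loop splits `symbols` on commas as-is, so stray spaces or empty entries in the job parameter would be passed to the loader unchanged. A possible pre-processing step is sketched below. `parse_symbols` is a hypothetical helper, and upper-casing the symbols is an assumption about what the data source expects.

```python
def parse_symbols(raw):
    """Split a comma-separated symbol string, dropping blanks and duplicates."""
    symbols = []
    for part in raw.split(","):
        symbol = part.strip().upper()  # assumption: the source expects upper-case symbols
        if symbol and symbol not in symbols:
            symbols.append(symbol)
    return symbols


print(parse_symbols("EUR, chf,,JPY,EUR"))
```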


In [10]:
print("Records: ", len(df))

Records:  0


In [11]:
df.head(5)

Empty DataFrame
Columns: []
Index: []


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrame


## Append the batch data to the RAW table

Set the destination raw table

In [13]:
path = f"s3://{BUCKET_NAME}/{PREFIX}"
print("Path:", path)

Path: s3://cryptoengineer/datalake/raw/forex
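
The destination path is plain string concatenation, so a leading or trailing slash in `PREFIX` would produce a doubled or dangling slash. The sketch below normalises both parts; `build_s3_path` is a hypothetical helper. Note that with the `PREFIX` value defined earlier in this notebook the path ends in `datalake/bronze/forex`, while the recorded output above shows `datalake/raw/forex`, which suggests that output came from a run with a different `PREFIX`.

```python
def build_s3_path(bucket, prefix):
    """Join a bucket name and key prefix into an s3:// URI without stray slashes."""
    return f"s3://{bucket.strip('/')}/{prefix.strip('/')}"


print(build_s3_path("cryptoengineer", "/datalake/bronze/forex/"))
```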


In [14]:
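if len(df) > 0:
    print("Saving data to: ", path)
    (
        spark.createDataFrame(df)
        # Send all rows for a given load_date to the same task before writing
        .repartition("load_date")
        .write
        .format("parquet")
        # Append to existing data; one sub-directory per load_date value
        .mode("append")
        .partitionBy(['load_date'])
        .save(path)
    )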
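
Spark's `partitionBy` writes Hive-style `column=value` sub-directories under the destination path. The sketch below shows where the files for one partition would be expected to land; `partition_dir` is a hypothetical helper, and the `load_date` value format is an assumption, since it depends on what `load.set_schema_table` produces.

```python
def partition_dir(base_path, column, value):
    """Return the Hive-style partition directory for one column value."""
    return f"{base_path.rstrip('/')}/{column}={value}/"


# The load_date value is illustrative; its real format depends on the load module
print(partition_dir("s3://cryptoengineer/datalake/bronze/forex", "load_date", "2024-09-01"))
```

Knowing this layout helps when checking in S3 whether an append run added files for the expected date.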


