# High-Frequency Effects and Modelling on FX Market in Response to US Macroeconomic Data Releases

## Steps:
1. Research and Planning
    * Define clear research goals and objectives
    * <span style="color:green">Decide on macroeconomic events and FX pairs</span>
2. Literature Review
    * Review past papers
    * Identify gaps (model that does stress testing, probability w/ price range, capturing retracement/correction)
3. Data Collection and Processing
    *  <span style="color:green">Identify data sources</span>
    * <span style="color:green">Collect FX pairs (date, minute interval, open, low, high, close)</span>
    * <span style="color:green">Collect macroeconomic events (date, expected value, actual value)</span>
    * Merge datasets
    * Analyse how much data to keep around each release (e.g., whether 10 minutes before and 10 minutes after the release is enough)
    * Keep only data with the macro events (not whole month but only on data releases)
4. Exploratory Analysis
    * Correlation of price movement with each macro event expected vs actual
    * Feature importance (which macro event brings more volume)
5. Model Development
    * Create some probabilistic TS model based on historical data
6. Backtesting and Market Simulation
    * KPIs and statistical measurement of model on test data (can be same as past)
    * Create a class/function to take input of a pair, expected value of macro data, and minute after event to give some price range and classification with confidence interval
7. Results and Discussion
    * Interpret market reactions to economic data
    * Compare model accuracy with benchmarks

----------------------

## <span style="color:blue">Libraries</span>

In [1]:
# Libraries for data fetching
import yfinance as yf
from polygon import RESTClient
from fredapi import Fred

# Libraries for data analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import time  

----------------------

## <span style="color:blue">Data Retrieval</span>
* Data Collection FX pair tickers (symbols) for **Yahoo Finance**: 
    * **DXY** 🠆 "DX-Y.NYB",
    * **USD/GBP** 🠆 "GBPUSD=X",
    * **USD/EUR** 🠆 "EURUSD=X",
    * **USD/CNY** 🠆 "CNY=X"


* Data Collection FX pair tickers (symbols) for **Polygon.io**:
    * **USD/EUR** 🠆 C:USDEUR
    * **USD/GBP** 🠆 C:USDGBP
    * **USD/CNH** 🠆 C:USDCNH
    
* Data Collection macroeconomic series IDs for **fred.stlouisfed.org**:
    * **Inflation** 🠆 CPIAUCSL
    * **Interest Rates** 🠆 FEDFUNDS
    * **GDP** 🠆 GDPC1
    * **Unemployment Rate** 🠆 UNRATE
    * **PMI** 🠆 NAPM
    * **DXY** 🠆 DTWEXBGS
    
* Macroeconomic events **Bloomberg Terminal (KCL)** (macro_bloomberg_data.csv):
    * **INTR** (every 6-weeks)
    * **CPI** (MoM)
    * **GDP** (QoQ)
    * **PCE** (MoM)
    * **UNRATE** (MoM)
    * **PMI** (MoM)

In [2]:
def polygon_get_fx_data(fx_pair, interval, start_date, end_date, api_key='0GbCWQKVtPfEMJvPmc1n50psE3zsW1c8'):
    """
    Retrieve historical FX data from Polygon.io.

    Parameters:
    - fx_pair (str): The FX pair without the 'C:' prefix (e.g., 'USDEUR').
    - interval (str): The aggregation timespan passed to Polygon (e.g., 'minute', 'hour', 'day').
    - start_date (str): The start date in 'YYYY-MM-DD' format.
    - end_date (str): The end date in 'YYYY-MM-DD' format.
    - api_key (str): Polygon.io API key.

    Returns:
    - pd.DataFrame: Historical data for the requested FX pair,
                    or None if nothing is returned or an error occurs.
    """
    try:
        # Bring the fx pair ticker to Polygon's format
        fx_pair_polygon_formated = "C:" + str(fx_pair)

        client = RESTClient(api_key)

        # Request data
        data_request = client.get_aggs(
            ticker=fx_pair_polygon_formated, 
            multiplier=1, 
            timespan=interval,
            from_=start_date, 
            to=end_date
        )
    
        # Convert response to DataFrame
        df = pd.DataFrame(data_request)
        
        # Check if data is returned
        if df.empty:
            print("No data found.")
            return None

        # Convert timestamp to datetime format and set Datetime as an index
        df["Datetime"] = pd.to_datetime(df["timestamp"], unit="ms")
        df.set_index("Datetime", inplace=True)
        
        # Drop columns not needed
        df = df.drop(columns=["otc", "transactions", "timestamp", "open"])
        
        # Rename columns to align with yf for the feature engineering and merging later on
        df = df.rename(columns={"high": "High", "low": "Low", "close": "Adj Close", "volume": "Volume"})
        df = df.rename_axis(f'{fx_pair}', axis=1)
        
        return df

    except Exception as e:
        print(f"An error occurred: {e}")
        return None
        
macro_df = pd.read_csv("macro_bloomberg_data.csv")

----------------------

## <span style="color:blue">Data Preprocessing</span>
1. Replace '--' with null
2. Standardize the DateTime format (some years have 2 digits, some 4, e.g., 25 and 2025)
3. Convert to DateTime
4. Strip the % sign and convert to float
In [3]:
def handle_missing_values(df: pd.DataFrame) -> pd.DataFrame:
    """ 
    Converts '--' placeholders to NaN, then drops rows where both
    Expected and Actual are NaN.
    
    Parameters:
    - df (pd.DataFrame): The raw macroeconomic DataFrame.

    Returns:
    - pd.DataFrame: DataFrame without the rows missing both values.
    """
    # Convert '--' placeholders to NaN (returns a copy, the input is left untouched)
    df = df.replace('--', np.nan)

    # Drop rows where both Expected and Actual are missing
    df = df.dropna(subset=["Expected", "Actual"], how="all")
    
    return df



def standardize_datetime_format(df: pd.DataFrame, column_name="Date Time") -> pd.DataFrame:
    """
    Standardizes the Date Time column to the format MM/DD/YYYY HH:MM,
    converts it to a datetime datatype, sets it as the index, and sorts the DataFrame.

    Parameters:
    - df (pd.DataFrame): The DataFrame containing the column to be formatted.
    - column_name (str): The name of the column to standardize.

    Returns:
    - pd.DataFrame: The DataFrame with the standardized datetime format, 
                    set as index, and sorted.
    """
    def fix_date_format(date_str):
        # Try parsing with two-digit year format first
        dt = pd.to_datetime(date_str, format='%m/%d/%y %H:%M', errors='coerce')
        
        # If conversion fails, try four-digit year format
        if pd.isnull(dt):
            dt = pd.to_datetime(date_str, format='%m/%d/%Y %H:%M', errors='coerce')
        
        return dt  # Return as datetime object

    # Apply the function to convert to datetime
    df = df.copy(deep=True)
    df["Datetime"] = df[column_name].astype(str).apply(fix_date_format)
    
    # Drop rows where conversion failed
    df = df.dropna(subset=["Datetime"])
    # Drop the original column now that 'Datetime' replaces it
    df = df.drop(columns=[column_name])

    # Convert the column to datetime dtype
    df["Datetime"] = pd.to_datetime(df["Datetime"])

    # Set Date Time as index
    df = df.set_index("Datetime")

    # Sort DataFrame by the Date Time index
    df = df.sort_index()

    return df



def convert_percentage_to_numeric(df: pd.DataFrame, columns=["Expected", "Actual", "Prior"]) -> pd.DataFrame:
    """
    Converts percentage values in specified columns to numerical float values.

    Parameters:
    - df (pd.DataFrame): The DataFrame containing percentage values.
    - columns (list): List of column names to convert.

    Returns:
    - pd.DataFrame: DataFrame with percentage values converted to floats.
    """
    for col in columns:
        # '%' is a literal character here, so no regex is needed
        df[col] = df[col].astype(str).str.replace('%', '', regex=False).astype(float)
    
    return df



def clean_macro_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """
    Drops the 'Event' column and renames 'Ticker' values to standardized names.

    Parameters:
    - df (pd.DataFrame): The macroeconomic DataFrame.

    Returns:
    - pd.DataFrame: Cleaned DataFrame with renamed tickers.
    """
    
    # Drop the 'Event' column
    df = df.drop(columns=["Event"], errors="ignore")

    # Dictionary mapping for renaming
    ticker_mapping = {
        "UNRATE": "UNRATE",
        "CPI_MoM": "CPI",
        "INTR": "INTR",
        "PCE_MoM": "PCE",
        "GDP_QoQ": "GDP",
        "PMI": "PMI"
    }

    # Rename the 'Ticker' values to the standardized names
    df["Ticker"] = df["Ticker"].replace(ticker_mapping)

    return df

----------------------

## <span style="color:blue">Feature Engineering</span>

In [4]:
def surprise_calculation(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate the surprise (difference) of Expected vs Actual.
    
    Parameters:
    - df (pd.DataFrame): macro economic data

    Returns:
    - pd.DataFrame: Original dataframe plus the surprise column.
    """
    
    df["Surprise"] = df["Actual"] - df["Expected"]
    return df


def calculate_volatility(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute per-row volatility as the absolute log return of the close price,
    its average over all rows of the DataFrame, and the ratio of the two.

    Parameters:
    - df (pd.DataFrame): FX data with an 'Adj Close' or 'Close' column.

    Returns:
    - pd.DataFrame: The original DataFrame with added 'Volatility', 
                    'Average_Volatility', and 'Volatility_Multiplier' columns
                    (None if the input is empty or invalid).
    """

    if df is None or df.empty:
        print("Error: DataFrame is empty or invalid.")
        return None
    if "Adj Close" in df.columns:
        # Compute Volatility
        df['Volatility'] = np.abs(np.log(df['Adj Close'] / df['Adj Close'].shift(1)))
    else:
        df['Volatility'] = np.abs(np.log(df['Close'] / df['Close'].shift(1)))
        
    # Compute Average Volatility based on number of rows
    df['Average_Volatility'] = df['Volatility'].mean()
    # Compute Volatility Multiplier
    df['Volatility_Multiplier'] = round(df['Volatility'] / df['Average_Volatility'],2)

    return df
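
To make the three volatility columns concrete, here is a small worked example on invented one-minute closes. It repeats the definitions used in `calculate_volatility` rather than calling it, so it runs on its own; the prices and timestamps are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up one-minute closes; the jump at 13:30 stands in for a data release.
idx = pd.date_range("2024-01-11 13:28", periods=5, freq="min")
toy = pd.DataFrame({"Close": [1.0000, 1.0001, 1.0021, 1.0019, 1.0020]}, index=idx)

# Same definitions as calculate_volatility: absolute one-minute log return,
# its sample mean, and the ratio of the two.
toy["Volatility"] = np.log(toy["Close"] / toy["Close"].shift(1)).abs()
toy["Average_Volatility"] = toy["Volatility"].mean()
toy["Volatility_Multiplier"] = (toy["Volatility"] / toy["Average_Volatility"]).round(2)

# Minute with the largest move relative to the sample average
peak_minute = toy["Volatility_Multiplier"].idxmax()
```

Because the average is taken over whatever frame is passed in, the multiplier is relative to that sample only. In the fetch loop later in this notebook the function is called once per release day, so each day is scaled by its own average.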

------------

## <span style="color:blue">Plotting Functions</span>

In [5]:
def plot_time_series(df, column):
    """
    Plots a time series for a given data and column.

    Parameters:
    - df (pd.DataFrame): DataFrame containing time series data with a Datetime index.
    - column (str): Column name to plot.

    Returns:
    - None (Displays the plot)
    """
    
    if column not in df.columns:
        print(f"Error: Column '{column}' not found in DataFrame.")
        return None
    
    fx_pair = df.columns.name if df.columns.name else "FX Pair"
    # Create plot
    fig, ax = plt.subplots(figsize=(12, 6))
    df[column].plot(ax=ax)
    
    # Improve x-axis readability
    ax.set_xlabel("Datetime", fontsize=12)
    ax.set_ylabel(column, fontsize=12)
    ax.set_title(f"'{fx_pair}' {column} Over Time", fontsize=14)
    ax.tick_params(axis='x', rotation=0)
    
    # Ensure the x-axis starts and ends exactly at the first and last data points (applicable only to yf data)
    if "vwap" not in df.columns:
        # Format x-axis to time
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
        ax.set_xlim(df.index.min(), df.index.max())

    ax.grid(True, which="major", linestyle="-", linewidth=0.6, alpha=1)  
    ax.grid(True, which="minor", linestyle="-", linewidth=0.5, alpha=1) 

    plt.show()

----------------------

## <span style="color:blue">Macro data preprocessed</span>

In [6]:
# Macroeconomics DataFrame
macro_df = clean_macro_dataframe(
                surprise_calculation(
                    convert_percentage_to_numeric(
                        standardize_datetime_format(
                            handle_missing_values(macro_df)
                        )
                   )
                )
           )

macro_df

Datetime,Ticker,Expected,Actual,Prior,Surprise
2020-01-10 13:30:00,UNRATE,3.50,3.50,3.50,0.0
2020-01-14 13:30:00,CPI,0.30,0.20,0.30,-0.1
2020-01-29 19:00:00,INTR,1.75,1.75,1.75,0.0
2020-01-30 13:30:00,GDP,2.00,2.10,2.10,0.1
2020-01-31 13:30:00,PCE,0.20,0.30,0.20,0.1
...,...,...,...,...,...
2025-01-29 19:00:00,INTR,4.50,4.50,4.50,0.0
2025-01-30 13:30:00,GDP,2.60,2.30,3.10,-0.3
2025-01-31 13:30:00,PCE,0.30,0.30,0.10,0.0
2025-02-07 13:30:00,UNRATE,4.10,4.10,4.10,0.0


----------------------

## <span style="color:blue">Create final df with all Pairs and Macro events</span>

In [7]:
filtered_macro_df = macro_df[macro_df.index >= "2023-03-10"].copy()

In [233]:
def fetch_and_combine_fx_data(macro_df, pair):
    
    # Get index values to list
    macro_dates_list = macro_df.index.strftime("%Y-%m-%d").tolist()
    
    # Initialize empty DataFrame
    data_df = pd.DataFrame()  

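    for i, date in enumerate(macro_dates_list):
        fx_data = calculate_volatility(polygon_get_fx_data(pair, "minute", date, date))

        # Both helpers return None when no data comes back, so skip that date
        if fx_data is None:
            print(f"Warning: No data for {pair} on {date}, skipping...")
            continue
        
        # Add pair symbol as a prefix to columns (excluding Datetime)
        fx_data = fx_data.rename(columns={col: f"{pair}_{col}" for col in fx_data.columns if col != "Datetime"})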
        
        # Append the retrieved data to the main DataFrame
        data_df = pd.concat([data_df, fx_data])

        # Implement sleep logic for API limitations
        if (i + 1) % 5 == 0:  
            time.sleep(70)
        else:
            time.sleep(2) 

    return data_df

In [24]:
# eur = fetch_and_combine_fx_data(filtered_macro_df, "USDEUR")
# gbp = fetch_and_combine_fx_data(filtered_macro_df, "USDGBP")
# cnh = fetch_and_combine_fx_data(filtered_macro_df, "USDCNH")

eur = pd.read_csv("USDEUR_data.csv")
eur.set_index('Datetime', inplace=True)

gbp = pd.read_csv("USDGBP_data.csv")
gbp.set_index('Datetime', inplace=True)

cnh = pd.read_csv("USDCNH_data.csv")
cnh.set_index('Datetime', inplace=True)
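
The saved CSVs store the index as day-first strings (for example `31/10/2024 23:58`), and `set_index` alone keeps them as strings. Any step that compares the index with timestamps, such as the event-window extraction later on, would need a real `DatetimeIndex`. A minimal sketch of that conversion, using an in-memory stand-in for one file and assuming the `DD/MM/YYYY HH:MM` layout seen in the printed output:

```python
import io
import pandas as pd

# In-memory stand-in for one of the saved CSVs (two rows copied from the output shown in this notebook).
csv_text = "Datetime,USDEUR_High\n01/02/2024 00:00,0.92561\n31/10/2024 23:58,0.91878\n"

fx = pd.read_csv(io.StringIO(csv_text))
# Day-first layout is an assumption inferred from values such as 31/10/2024.
fx["Datetime"] = pd.to_datetime(fx["Datetime"], format="%d/%m/%Y %H:%M")
fx = fx.set_index("Datetime").sort_index()
```

Passing an explicit `format` avoids pandas guessing month-first for ambiguous dates such as `01/02/2024`.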

## <span style="color:red">Merge DFs based on index</span>

In [26]:
# Merge the EUR and GBP dataframes on the 'Datetime' index, keeping only matching timestamps
fx_df = eur.merge(gbp, left_index=True, right_index=True, how="inner")

fx_df

Datetime,USDEUR_High,USDEUR_Low,USDEUR_AdjClose,USDEUR_Volume,USDEUR_vwap,USDEUR_Volatility,USDEUR_AverageVolatility,USDEUR_VolatilityMultiplier,USDGBP_High,USDGBP_Low,USDGBP_AdjClose,USDGBP_Volume,USDGBP_vwap,USDGBP_Volatility,USDGBP_AverageVolatility,USDGBP_VolatilityMultiplier
01/02/2024 00:00,0.92561,0.9253,0.92558,46,0.9255,,0.000095,,0.78867,0.788540,0.78865,56,0.7886,,0.000090,
01/02/2024 00:01,0.92562,0.9253,0.92557,47,0.9255,0.000011,0.000095,0.11,0.78868,0.788580,0.78860,31,0.7886,0.000063,0.000090,0.70
01/02/2024 00:02,0.92561,0.9253,0.92557,47,0.9256,0.000000,0.000095,0.00,0.78880,0.788590,0.78869,36,0.7887,0.000114,0.000090,1.26
01/02/2024 00:03,0.92559,0.9253,0.92548,45,0.9255,0.000097,0.000095,1.02,0.78880,0.788675,0.78870,40,0.7887,0.000013,0.000090,0.14
01/02/2024 00:04,0.92548,0.9252,0.92541,35,0.9254,0.000076,0.000095,0.79,0.78870,0.788590,0.78862,27,0.7886,0.000101,0.000090,1.12
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31/10/2024 23:55,0.91872,0.9184,0.91872,23,0.9186,0.000022,0.000093,0.23,0.77533,0.775300,0.77532,10,0.7753,0.000013,0.000111,0.12
31/10/2024 23:56,0.91871,0.9185,0.91871,20,0.9187,0.000011,0.000093,0.12,0.77540,0.775320,0.77540,7,0.7753,0.000103,0.000111,0.93
31/10/2024 23:57,0.91880,0.9186,0.91879,33,0.9188,0.000087,0.000093,0.93,0.77540,0.775300,0.77540,17,0.7754,0.000000,0.000111,0.00
31/10/2024 23:58,0.91878,0.9185,0.91876,28,0.9187,0.000033,0.000093,0.35,0.77541,0.775300,0.77538,6,0.7754,0.000026,0.000111,0.23


In [23]:
# Merge GBP and EUR 
gbp_eur = gbp.merge(eur, left_index=True, right_index=True, how="outer")
# Merge all FX pairs based on index
fx_df = gbp_eur.merge(cnh, left_index=True, right_index=True, how="outer")

# # Rename 'Datetime_x' to 'Datetime' and set it as the index
# fx_df.rename(columns={'Datetime_x': 'Datetime'}, inplace=True)
# fx_df['Datetime'] = pd.to_datetime(fx_df['Datetime'])
# fx_df.set_index('Datetime', inplace=True)

fx_df

Datetime,USDGBP_High,USDGBP_Low,USDGBP_AdjClose,USDGBP_Volume,USDGBP_vwap,USDGBP_Volatility,USDGBP_AverageVolatility,USDGBP_VolatilityMultiplier,USDEUR_High,USDEUR_Low,...,USDEUR_AverageVolatility,USDEUR_VolatilityMultiplier,USDCNH_High,USDCNH_Low,USDCNH_Adj Close,USDCNH_Volume,USDCNH_vwap,USDCNH_Volatility,USDCNH_Average_Volatility,USDCNH_Volatility_Multiplier
01/02/2024 00:00,0.78867,0.788540,0.78865,56.0,0.7886,,0.000090,,0.92561,0.9253,...,0.000095,,,,,,,,,
01/02/2024 00:01,0.78868,0.788580,0.78860,31.0,0.7886,0.000063,0.000090,0.70,0.92562,0.9253,...,0.000095,0.11,,,,,,,,
01/02/2024 00:02,0.78880,0.788590,0.78869,36.0,0.7887,0.000114,0.000090,1.26,0.92561,0.9253,...,0.000095,0.00,,,,,,,,
01/02/2024 00:03,0.78880,0.788675,0.78870,40.0,0.7887,0.000013,0.000090,0.14,0.92559,0.9253,...,0.000095,1.02,,,,,,,,
01/02/2024 00:04,0.78870,0.788590,0.78862,27.0,0.7886,0.000101,0.000090,1.12,0.92548,0.9252,...,0.000095,0.79,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31/10/2024 23:55,0.77533,0.775300,0.77532,10.0,0.7753,0.000013,0.000111,0.12,0.91872,0.9184,...,0.000093,0.23,,,,,,,,
31/10/2024 23:56,0.77540,0.775320,0.77540,7.0,0.7753,0.000103,0.000111,0.93,0.91871,0.9185,...,0.000093,0.12,,,,,,,,
31/10/2024 23:57,0.77540,0.775300,0.77540,17.0,0.7754,0.000000,0.000111,0.00,0.91880,0.9186,...,0.000093,0.93,,,,,,,,
31/10/2024 23:58,0.77541,0.775300,0.77538,6.0,0.7754,0.000026,0.000111,0.23,0.91878,0.9185,...,0.000093,0.35,,,,,,,,


In [199]:
def fetch_and_combine_fx_data(macro_df, pairs):
    # Get index values to list
    macro_dates_list = macro_df.index.strftime("%Y-%m-%d").tolist()
    
    # Initialize empty DataFrame
    data_df = pd.DataFrame()    

    for pair in pairs:
        pair_data = pd.DataFrame()
        
        for i, date in enumerate(macro_dates_list):
            fx_data = polygon_get_fx_data(pair, "minute", date, date)

            if fx_data is None or fx_data.empty:
                print(f"Warning: No data for {pair} on {date}, skipping...")
                continue  

            fx_data = calculate_volatility(fx_data)

            # Ensure Datetime is a column before prefixing
            if "Datetime" not in fx_data.columns:
                fx_data.reset_index(inplace=True)  # Convert index to column if needed

            # Add pair symbol as a prefix to columns (excluding Datetime)
            fx_data = fx_data.rename(columns={col: f"{pair}_{col}" for col in fx_data.columns if col != "Datetime"})

            # Append to the pair-specific DataFrame
            pair_data = pd.concat([pair_data, fx_data])

            # Implement sleep logic for API subscription limits
            if (i + 1) % 5 == 0:
                time.sleep(70)
            else:
                time.sleep(2)

        # Merge pair-specific data with main DataFrame
        if not pair_data.empty:
            if data_df.empty:
                data_df = pair_data
            else:
                data_df = data_df.merge(pair_data, on="Datetime", how="outer")

    # Ensure the correct index column is set
    if not data_df.empty:
        data_df.set_index("Datetime", inplace=True)

    return data_df

In [163]:
def extract_relevant_fx_data(fx_data, macro_df):
    extracted_data = []

    for event_time in macro_df.index:
        start_time = event_time - pd.Timedelta(minutes=5)
        end_time = event_time + pd.Timedelta(minutes=10)

        # Check if any timestamps in the range exist in fx_data
        available_times = fx_data.index[(fx_data.index >= start_time) & (fx_data.index <= end_time)]

        if available_times.empty:  
            continue  # Skip if no matching timestamps exist

        filtered_fx = fx_data.loc[available_times].copy()

        # Add event details from macro_df
        if event_time in macro_df.index:
            filtered_fx["Ticker"] = macro_df.loc[event_time, "Ticker"]
            filtered_fx["Expected"] = macro_df.loc[event_time, "Expected"]
            filtered_fx["Actual"] = macro_df.loc[event_time, "Actual"]
            filtered_fx["Prior"] = macro_df.loc[event_time, "Prior"]
            filtered_fx["Surprise"] = macro_df.loc[event_time, "Surprise"]

        extracted_data.append(filtered_fx)

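    # Nothing to combine if no event window matched any FX timestamps
    if not extracted_data:
        return pd.DataFrame()

    # Set Datetime index
    final_df = pd.concat(extracted_data).reset_index()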
    final_df = final_df.rename(columns={"index": "Datetime"}).set_index("Datetime")
    # Drop duplicates if they exist
    final_df = final_df[~final_df.index.duplicated(keep="first")]
    
    return final_df
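
The windowing idea in `extract_relevant_fx_data` can be tried out on synthetic data. The sketch below is a compact variant, not a replacement: the FX values and the single CPI event are invented, and `Minutes_From_Event` is a hypothetical extra column added here for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute FX frame and a single invented event at 13:30.
fx = pd.DataFrame(
    {"USDEUR_Volatility": np.linspace(0.0, 1.0, 60)},
    index=pd.date_range("2024-01-11 13:00", periods=60, freq="min", name="Datetime"),
)
events = pd.DataFrame(
    {"Ticker": ["CPI"], "Surprise": [0.1]},
    index=pd.DatetimeIndex(["2024-01-11 13:30"], name="Datetime"),
)

windows = []
for event_time, event in events.iterrows():
    # Same window as extract_relevant_fx_data: 5 minutes before to 10 minutes after, inclusive.
    lo = event_time - pd.Timedelta(minutes=5)
    hi = event_time + pd.Timedelta(minutes=10)
    window = fx.loc[lo:hi].copy()
    window["Ticker"] = event["Ticker"]
    window["Surprise"] = event["Surprise"]
    # Hypothetical helper column: signed minute offset from the release.
    window["Minutes_From_Event"] = ((window.index - event_time).total_seconds() // 60).astype(int)
    windows.append(window)

event_df = pd.concat(windows)
```

Iterating with `iterrows` reads each event row separately, so two releases that share a timestamp would each get their own window, whereas a label lookup on a duplicated timestamp returns several rows at once.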

-----------

## 1. Volatility Analysis on Pairs
1. Compute volatility per pair symbol on events (e.g., USD/GBP, USD/EUR, USD/CNH)
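
One possible way to tabulate this, assuming an event-window frame like the one returned by `extract_relevant_fx_data` (a `Ticker` column plus one `<PAIR>_Volatility` column per pair). The numbers below are invented.

```python
import pandas as pd

# Invented event-window rows; the column names follow the merged frame used in this notebook.
event_df = pd.DataFrame({
    "Ticker": ["CPI", "CPI", "GDP", "GDP"],
    "USDEUR_Volatility": [0.0004, 0.0002, 0.0001, 0.0001],
    "USDGBP_Volatility": [0.0003, 0.0003, 0.0002, 0.0000],
})

vol_cols = [c for c in event_df.columns if c.endswith("_Volatility")]
# Mean absolute one-minute log return per event type and pair.
vol_by_event = event_df.groupby("Ticker")[vol_cols].mean()
```

The result has one row per event type and one column per pair, which is a convenient shape for a bar chart or table.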

## 2. Economic Data Seasonality report

## 3. Economic Announcement Selection - Feature Importance of Macro Events (based on price change and volatility)

1. Run a feature-importance analysis to keep only the major macro data releases, then analyse those in detail.

## 4. Responses to CPI, IR, GDP, Unemployment Rate (R2 before appendix) - Jump (R4 p32) - Volatility Increase on each event table - Data Release Trend: Before/On/After

1. Charts
2. Table with all info for CPI, IR, GDP, UR, JUMP, Vol increase from avg for all pairs

## 5. Prior vs Now (volatility and importance)

1. Bar chart showing each event's importance (%) for DXY price movement (events on the x-axis), past vs. recent:
    * For the recent period, use the average of the last 1 or 2 years of data (e.g., 2023-2024)
    * For the past period, use the average of 1 or 2 earlier years of data (e.g., 2018-2019)

## 6. Analysis of High Volume Intraday on Data Releases
1. Determine how much of each day to keep (e.g., 10 minutes before and 10 minutes after the event)
2. Volatility within -5m/-1m, 1m, 1m-5m, >5m (as a % of the total, to understand when the market moves the most)
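
A minimal sketch of the bucket breakdown in item 2, on invented numbers. `Minutes_From_Event` is a hypothetical column holding the signed minute offset from the release, and the bucket edges are one assumed reading of "-5m/-1m, 1m, 1m-5m, >5m".

```python
import pandas as pd

# Invented absolute log returns (arbitrary units) for minutes -5..10 around one release.
minutes = list(range(-5, 11))
vol = [1, 1, 1, 1, 2, 10, 6, 3, 2, 2, 1, 1, 1, 1, 1, 1]
df = pd.DataFrame({"Minutes_From_Event": minutes, "Volatility": vol})

# Assumed bucket edges (right-closed): [-5, -1], the release minute and the
# first minute after it (0 and 1), (1, 5], (5, 10].
bins = [-6, -1, 1, 5, 10]
labels = ["-5m to -1m", "release to +1m", "+1m to +5m", "> +5m"]
df["Bucket"] = pd.cut(df["Minutes_From_Event"], bins=bins, labels=labels)

# Share of total volatility that falls in each bucket
share = df.groupby("Bucket", observed=True)["Volatility"].sum() / df["Volatility"].sum()
```

Averaging these shares across all releases of one event type would show whether the move is concentrated in the first minute or spread over the following ones.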

## 7. Correlation of USD pairs and DXY price movements

1. Correlation heatmap
2. Identify which one moves first, if they do not all move together, to look for arbitrage opportunities
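
Both items can be prototyped with plain pandas. The sketch below uses synthetic returns in which one series is built to follow the other by one minute, so the expected answer is known in advance; the pair names are placeholders, and a lagged correlation is only one simple way to look for a lead-lag relation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Synthetic one-minute returns: 'follower' copies 'leader' one minute later plus noise.
leader = pd.Series(rng.normal(0, 1e-4, n))
follower = leader.shift(1).fillna(0) + rng.normal(0, 2e-5, n)
returns = pd.DataFrame({"USDEUR": leader, "USDGBP": follower})

# Contemporaneous correlation matrix (the input to a heatmap).
corr = returns.corr()

# Correlation of USDGBP at t with USDEUR at t - lag; a peak at lag > 0
# suggests USDEUR tends to move first.
lags = range(-3, 4)
lead_lag = pd.Series(
    {lag: returns["USDGBP"].corr(returns["USDEUR"].shift(lag)) for lag in lags}
)
best_lag = lead_lag.idxmax()
```

The matrix in `corr` can be drawn with `plt.imshow(corr)` or a similar call. On real one-minute bars a lead shorter than the bar size would not be visible, so finer data may be needed for that question.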

-------