<h2 align="center" style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Time-Series Forecasting in Financial Markets: Integrating Attention Mechanisms with Traditional Neural Networks for High-Frequency Trading Data</h2>

### **Table of Contents**

- [Introduction](#Introduction)
   - Research Overview
   - Objectives
   - Data Source and Storage
- [Install and Import Required Libraries](#Install-and-Import-Required-Libraries)
- [Download and Load Dataset](#Download-and-Load-Dataset)
- [Data Exploration](#Data-Exploration)
   - View First Five Rows
   - Statistical Summary
   - Check for Missing Values

<h3 style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Introduction</h3>

### Research Overview

### Objectives

### Data Source and Storage

<h3 style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Install and Import Required Libraries</h3>

In [1]:
!pip install --upgrade -q yfinance
!pip install -q pandas
!pip install -q -U kaleido

In [2]:
import os
import yfinance as yf

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

import warnings
warnings.filterwarnings('ignore')

<h3 style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Download and Load Dataset</h3>

#### Download and Store Finance Data with the yfinance Library

In [3]:
# Define stock symbols
stocks = ["GOOG", "AMZN", "MSFT", "TSLA"]

In [4]:
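def download_stock_data(storage_path, symbols=stocks):
    """Fetch daily price history for each symbol and save it as a CSV in storage_path."""
    for stock in symbols:
        # Fetch max available stock data with 1-day interval.
        # yfinance treats `end` as exclusive, so the last row is the final
        # trading day before 2025-02-28.
        dat = yf.Ticker(stock)
        df = dat.history(period="max", interval="1d", end="2025-02-28", auto_adjust=False)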

        # Reset index to move Date column
        df.reset_index(inplace=True)

        # Convert Date column to string (YYYY-MM-DD format)
        df["Date"] = df["Date"].dt.strftime("%Y-%m-%d")

        # Save as CSV
        stock_path = os.path.join(storage_path, f"{stock}.csv")
        df.to_csv(stock_path, index=False)

        print(f"Stored {stock} data at {stock_path}")

In [5]:
# Define storage path
storage_path = "pandas_stock_data/"

# Ensure storage directory exists
os.makedirs(storage_path, exist_ok=True)

In [6]:
# Only call download function if the storage path is empty
if not any(os.scandir(storage_path)):  
    download_stock_data(storage_path)

Stored GOOG data at pandas_stock_data/GOOG.csv
Stored AMZN data at pandas_stock_data/AMZN.csv
Stored MSFT data at pandas_stock_data/MSFT.csv
Stored TSLA data at pandas_stock_data/TSLA.csv
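
The emptiness check above skips the download as soon as the directory holds any file, so a partially populated folder would never be completed. A minimal sketch of a per-symbol check is shown here. `missing_symbols` is a hypothetical helper that is not part of the notebook, and it assumes the `<symbol>.csv` naming used by `download_stock_data`. The demonstration runs in a temporary directory with an empty placeholder file.

```python
import os
import tempfile


def missing_symbols(storage_path, symbols):
    """Return the symbols that have no <symbol>.csv file in storage_path (hypothetical helper)."""
    return [
        symbol
        for symbol in symbols
        if not os.path.isfile(os.path.join(storage_path, f"{symbol}.csv"))
    ]


# Toy demonstration: only GOOG.csv exists in the temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, "GOOG.csv"), "w").close()
    still_needed = missing_symbols(tmp_dir, ["GOOG", "AMZN"])

print(still_needed)
```

The returned list could then drive the download loop so that only the absent files are fetched.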


#### Load data into Pandas DataFrames

In [7]:
goog_df = pd.read_csv(os.path.join(storage_path, "GOOG.csv"))
amzn_df = pd.read_csv(os.path.join(storage_path, "AMZN.csv"))
msft_df = pd.read_csv(os.path.join(storage_path, "MSFT.csv"))
tsla_df = pd.read_csv(os.path.join(storage_path, "TSLA.csv"))
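
The four `read_csv` calls can also be written as a loop. The sketch below is one possible variant, assuming the same `<symbol>.csv` layout. `load_stock_frames` is a hypothetical helper, and `parse_dates=["Date"]` additionally converts the stored `Date` strings back to datetimes. The demonstration uses a temporary directory and made-up rows, not real quotes.

```python
import os
import tempfile

import pandas as pd


def load_stock_frames(storage_path, symbols):
    """Read one CSV per symbol into a dict keyed by ticker (hypothetical helper)."""
    frames = {}
    for symbol in symbols:
        csv_path = os.path.join(storage_path, f"{symbol}.csv")
        # parse_dates turns the YYYY-MM-DD strings into datetime64 values
        frames[symbol] = pd.read_csv(csv_path, parse_dates=["Date"])
    return frames


# Toy demonstration with made-up rows written to a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    toy = pd.DataFrame({"Date": ["2024-01-02", "2024-01-03"], "Close": [1.0, 2.0]})
    toy.to_csv(os.path.join(tmp_dir, "DEMO.csv"), index=False)
    demo_frames = load_stock_frames(tmp_dir, ["DEMO"])

print(demo_frames["DEMO"].dtypes)
```

A dict keyed by ticker keeps later per-stock steps in a single loop instead of four near-identical statements.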

<h3 style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Data Exploration</h3>

#### **Viewing First 5 Rows of Each DataFrame**

In [8]:
print("GOOG Data:")
display(goog_df.head())

print("\nAMZN Data:")
display(amzn_df.head())

print("\nMSFT Data:")
display(msft_df.head())

print("\nTSLA Data:")
display(tsla_df.head())

GOOG Data:


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,2004-08-19,2.490664,2.591785,2.390042,2.499133,2.490186,897427216,0.0,0.0
1,2004-08-20,2.51582,2.716817,2.503118,2.697639,2.687981,458857488,0.0,0.0
2,2004-08-23,2.758411,2.826406,2.71607,2.724787,2.715032,366857939,0.0,0.0
3,2004-08-24,2.770615,2.779581,2.579581,2.61196,2.602609,306396159,0.0,0.0
4,2004-08-25,2.614201,2.689918,2.587302,2.640104,2.630652,184645512,0.0,0.0



AMZN Data:


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,1997-05-15,0.121875,0.125,0.096354,0.097917,0.097917,1443120000,0.0,0.0
1,1997-05-16,0.098438,0.098958,0.085417,0.086458,0.086458,294000000,0.0,0.0
2,1997-05-19,0.088021,0.088542,0.08125,0.085417,0.085417,122136000,0.0,0.0
3,1997-05-20,0.086458,0.0875,0.081771,0.081771,0.081771,109344000,0.0,0.0
4,1997-05-21,0.081771,0.082292,0.06875,0.071354,0.071354,377064000,0.0,0.0



MSFT Data:


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,1986-03-13,0.088542,0.101563,0.088542,0.097222,0.059707,1031788800,0.0,0.0
1,1986-03-14,0.097222,0.102431,0.097222,0.100694,0.061839,308160000,0.0,0.0
2,1986-03-17,0.100694,0.103299,0.100694,0.102431,0.062906,133171200,0.0,0.0
3,1986-03-18,0.102431,0.103299,0.098958,0.099826,0.061306,67766400,0.0,0.0
4,1986-03-19,0.099826,0.100694,0.097222,0.09809,0.06024,47894400,0.0,0.0



TSLA Data:


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,2010-06-29,1.266667,1.666667,1.169333,1.592667,1.592667,281494500,0.0,0.0
1,2010-06-30,1.719333,2.028,1.553333,1.588667,1.588667,257806500,0.0,0.0
2,2010-07-01,1.666667,1.728,1.351333,1.464,1.464,123282000,0.0,0.0
3,2010-07-02,1.533333,1.54,1.247333,1.28,1.28,77097000,0.0,0.0
4,2010-07-06,1.333333,1.333333,1.055333,1.074,1.074,103003500,0.0,0.0


#### **Statistical Summary of Each DataFrame**

In [9]:
print("GOOG Statistical Summary:")
display(goog_df.describe())

print("\nAMZN Statistical Summary:")
display(amzn_df.describe())

print("\nMSFT Statistical Summary:")
display(msft_df.describe())

print("\nTSLA Statistical Summary:")
display(tsla_df.describe())

GOOG Statistical Summary:


Unnamed: 0,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
count,5165.0,5165.0,5165.0,5165.0,5165.0,5165.0,5165.0,5165.0
mean,48.939902,49.456712,48.449156,48.958748,48.798136,113022100.0,0.000116,0.004454
std,47.978757,48.507691,47.502903,48.003413,47.872085,148553600.0,0.004819,0.280018
min,2.47049,2.534002,2.390042,2.490913,2.481995,158434.0,0.0,0.0
25%,13.158429,13.287694,13.027669,13.152202,13.105114,26578000.0,0.0,0.0
50%,27.826603,27.990652,27.599052,27.828691,27.729057,51845600.0,0.0,0.0
75%,64.875,65.436501,64.570503,64.940002,64.707504,138180000.0,0.0,0.0
max,204.5,208.699997,204.259995,207.710007,207.710007,1650833000.0,0.2,20.0



AMZN Statistical Summary:


Unnamed: 0,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
count,6991.0,6991.0,6991.0,6991.0,6991.0,6991.0,6991.0,6991.0
mean,40.792767,41.266395,40.275176,40.783167,40.783167,136003300.0,0.0,0.003862
std,58.27787,58.921598,57.561554,58.25306,58.25306,137551400.0,0.0,0.244217
min,0.070313,0.072396,0.065625,0.069792,0.069792,9744000.0,0.0,0.0
25%,2.1005,2.14875,2.06475,2.11075,2.11075,62977000.0,0.0,0.0
50%,9.05,9.1625,8.9255,9.0255,9.0255,100512000.0,0.0,0.0
75%,73.182247,74.66925,72.083252,73.084999,73.084999,155442000.0,0.0,0.0
max,239.020004,242.520004,238.029999,242.059998,242.059998,2086584000.0,0.0,20.0



MSFT Statistical Summary:


Unnamed: 0,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
count,9817.0,9817.0,9817.0,9817.0,9817.0,9817.0,9817.0,9817.0
mean,63.773129,64.415781,63.108028,63.785265,57.944029,56249560.0,0.003183,0.001732
std,99.64239,100.538049,98.678492,99.652971,99.506501,38115230.0,0.048258,0.057515
min,0.088542,0.092014,0.088542,0.090278,0.055442,2304000.0,0.0,0.0
25%,5.921875,6.03125,5.8125,5.90625,3.627209,31315100.0,0.0,0.0
50%,27.459999,27.790001,27.200001,27.51,19.244913,49371200.0,0.0,0.0
75%,47.779999,48.34375,47.310001,47.77,40.272118,70240500.0,0.0,0.0
max,467.0,468.350006,464.459991,467.559998,464.85434,1031789000.0,3.08,2.0



TSLA Statistical Summary:


Unnamed: 0,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
count,3690.0,3690.0,3690.0,3690.0,3690.0,3690.0,3690.0,3690.0
mean,84.613579,86.467937,82.612032,84.568629,84.568629,96486400.0,0.0,0.002168
std,111.340458,113.820686,108.576889,111.209422,111.209422,77378710.0,0.0,0.095979
min,1.076,1.108667,0.998667,1.053333,1.053333,1777500.0,0.0,0.0
25%,12.237833,12.446166,12.003833,12.226167,12.226167,49368000.0,0.0,0.0
50%,18.313334,18.599333,17.901333,18.316334,18.316334,81929550.0,0.0,0.0
75%,182.857502,186.205002,178.369999,182.617504,182.617504,121650800.0,0.0,0.0
max,475.899994,488.540009,457.51001,479.859985,479.859985,914082000.0,0.0,5.0


#### **Checking for Missing Values**

In [10]:
def check_missing_values(df, name):
    print(f"\nMissing Values in {name}:")
    display(df.isnull().sum().to_frame().T)

# Check for missing values in each stock dataset
check_missing_values(goog_df, "GOOG")
check_missing_values(amzn_df, "AMZN")
check_missing_values(msft_df, "MSFT")
check_missing_values(tsla_df, "TSLA")


Missing Values in GOOG:


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,0,0,0,0,0,0,0,0,0



Missing Values in AMZN:


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,0,0,0,0,0,0,0,0,0



Missing Values in MSFT:


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,0,0,0,0,0,0,0,0,0



Missing Values in TSLA:


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,0,0,0,0,0,0,0,0,0
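
A null count does not reveal duplicate rows, repeated dates, or an unsorted index. The following sketch shows how those checks could be added, along with the first and last date of each series. `check_date_integrity` is a hypothetical helper, and the input is a toy frame with one deliberately duplicated row, not data from the notebook.

```python
import pandas as pd


def check_date_integrity(df, name):
    """Report duplicates, ordering and date span for one dataset (hypothetical helper)."""
    dates = pd.to_datetime(df["Date"])
    return {
        "name": name,
        "duplicate_rows": int(df.duplicated().sum()),       # fully identical rows
        "duplicate_dates": int(dates.duplicated().sum()),   # same date seen more than once
        "is_sorted": bool(dates.is_monotonic_increasing),   # non-decreasing date order
        "first_date": dates.min().strftime("%Y-%m-%d"),
        "last_date": dates.max().strftime("%Y-%m-%d"),
    }


# Toy frame: the last row repeats the second one
toy_df = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-01-03"],
    "Close": [10.0, 11.0, 11.0],
})
toy_report = check_date_integrity(toy_df, "TOY")
print(toy_report)
```

Applied to the four stock DataFrames, the `last_date` field would also confirm where each series actually ends.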


### **Initial Data Exploration Summary**

#### **1. Google (GOOG) Data**
- **Date Range**: The data for Google starts on **2004-08-19** and runs up to the **2025-02-28** cutoff passed to `history()`.
- **Missing Values**: There are **no missing values** in any of the columns (`Date`, `Open`, `High`, `Low`, `Close`, `Adj Close`, `Volume`, `Dividends`, `Stock Splits`).
- **Statistical Summary**:
  - The average opening price is **\$48.94**, with a minimum of **\$2.47** and a maximum of **\$204.50**.
  - The average daily trading volume is approximately **113,022,100** shares.
  - Dividends and stock splits are rare, with most values being **0.0**; the largest recorded dividend is **\$0.20** and the largest split ratio is **20**.

#### **2. Amazon (AMZN) Data**
- **Date Range**: The data for Amazon starts on **1997-05-15** and runs up to the **2025-02-28** cutoff passed to `history()`.
- **Missing Values**: There are **no missing values** in any of the columns.
- **Statistical Summary**:
  - The average opening price is **\$40.79**, with a minimum of **\$0.07** and a maximum of **\$239.02**.
  - The average daily trading volume is approximately **136,003,300** shares.
  - Dividends are consistently **0.0**, and stock splits are rare (largest split ratio **20**).

#### **3. Microsoft (MSFT) Data**
- **Date Range**: The data for Microsoft starts on **1986-03-13** and runs up to the **2025-02-28** cutoff passed to `history()`.
- **Missing Values**: There are **no missing values** in any of the columns.
- **Statistical Summary**:
  - The average opening price is **\$63.77**, with a minimum of **\$0.09** and a maximum of **\$467.00**.
  - The average daily trading volume is approximately **56,249,560** shares.
  - Dividends are present: the `Dividends` column averages **\$0.003** across all trading days (most days are zero, and the largest single payout is **\$3.08**). Stock splits are rare (largest split ratio **2**).

#### **4. Tesla (TSLA) Data**
- **Date Range**: The data for Tesla starts on **2010-06-29** and runs up to the **2025-02-28** cutoff passed to `history()`.
- **Missing Values**: There are **no missing values** in any of the columns.
- **Statistical Summary**:
  - The average opening price is **\$84.61**, with a minimum of **\$1.08** and a maximum of **\$475.90**.
  - The average daily trading volume is approximately **96,486,400** shares.
  - Dividends are consistently **0.0**, and stock splits are rare (largest split ratio **5**).
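
Figures like the ones listed above can be derived directly from the DataFrames, which keeps the prose in step with the `describe()` output. The sketch below shows one way to do this. `summarise_prices` is a hypothetical helper, it assumes `Date` is stored as `YYYY-MM-DD` strings as in the saved CSV files, and the demonstration uses made-up numbers.

```python
import pandas as pd


def summarise_prices(df):
    """Collect the headline figures used in a prose summary (hypothetical helper)."""
    return {
        "first_date": df["Date"].min(),  # ISO date strings sort chronologically
        "last_date": df["Date"].max(),
        "mean_open": round(float(df["Open"].mean()), 2),
        "min_open": round(float(df["Open"].min()), 2),
        "max_open": round(float(df["Open"].max()), 2),
        "mean_volume": round(float(df["Volume"].mean())),
    }


# Toy frame with made-up values, not real quotes
toy_df = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-01-04"],
    "Open": [1.0, 2.0, 3.0],
    "Volume": [100, 200, 300],
})

# One row per ticker; real usage would pass each stock DataFrame in turn
summary_table = pd.DataFrame.from_dict({"TOY": summarise_prices(toy_df)}, orient="index")
print(summary_table)
```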

---

### **Key Observations**
1. **Date Range**:
   - The datasets cover different time periods, with **Microsoft (MSFT)** having the longest history (since 1986) and **Tesla (TSLA)** having the shortest (since 2010).
   - All four downloads share the same `end="2025-02-28"` cutoff. Because yfinance treats `end` as exclusive, each series should stop at the last trading day before that date, so the datasets end on the same day.

2. **Missing Values**:
   - There are **no missing values** in any column of any dataset. This check alone does not rule out duplicate rows or gaps in the trading calendar.

3. **Price Trends**:
   - Price levels vary widely across companies: **Microsoft (MSFT)** and **Tesla (TSLA)** reach maximum opening prices of **\$467.00** and **\$475.90**, compared with **\$204.50** for **Google (GOOG)** and **\$239.02** for **Amazon (AMZN)**.
   - The very low minimum prices (for example, **\$0.07** for AMZN and **\$0.09** for MSFT) are consistent with split-adjusted quotes for the early trading years rather than recording errors.

4. **Volume Trends**:
   - **Amazon (AMZN)** has the highest average daily trading volume, followed by **Google (GOOG)**, **Tesla (TSLA)**, and **Microsoft (MSFT)**.

5. **Dividends and Stock Splits**:
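   - Dividends are rare: only **Microsoft (MSFT)** (largest payout **\$3.08**) and **Google (GOOG)** (largest payout **\$0.20**) record any, while the `Dividends` column is all zeros for AMZN and TSLA.
   - Stock splits are also rare; the largest recorded split ratios are **20** for GOOG and AMZN, **5** for TSLA, and **2** for MSFT.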
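
Because the four histories start in different years, any side-by-side comparison needs a shared date window. The sketch below illustrates one way to trim every series to the overlap. `trim_to_common_window` is a hypothetical helper that is not used elsewhere in the notebook, and the two toy frames stand in for the real stock DataFrames.

```python
import pandas as pd


def trim_to_common_window(frames):
    """Restrict every DataFrame to the date range shared by all of them (hypothetical helper)."""
    parsed = {
        name: df.assign(Date=pd.to_datetime(df["Date"]))
        for name, df in frames.items()
    }
    common_start = max(df["Date"].min() for df in parsed.values())  # latest first date
    common_end = min(df["Date"].max() for df in parsed.values())    # earliest last date
    return {
        name: df[(df["Date"] >= common_start) & (df["Date"] <= common_end)].reset_index(drop=True)
        for name, df in parsed.items()
    }


# Toy frames with made-up values: "OLD" starts one day earlier than "NEW"
toy_frames = {
    "OLD": pd.DataFrame({"Date": ["2024-01-02", "2024-01-03", "2024-01-04"], "Close": [1.0, 2.0, 3.0]}),
    "NEW": pd.DataFrame({"Date": ["2024-01-03", "2024-01-04"], "Close": [5.0, 6.0]}),
}
trimmed = trim_to_common_window(toy_frames)
print({name: len(df) for name, df in trimmed.items()})
```

With the real data, the shared window would begin at the Tesla start date, since TSLA has the shortest history.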

<h3 style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Exploratory Data Analysis</h3>