<h2 align="center" style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Time-Series Forecasting in Financial Markets: Integrating Attention Mechanisms with Traditional Neural Networks for High-Frequency Trading Data</h2>

### **Table of Contents**

- [Introduction](#Introduction)
   - Research Overview
   - Objectives
   - Data Source and Storage
- [Install and Import Required Libraries](#Install-and-Import-Required-Libraries)
- [Download and Load Dataset](#Download-and-Load-Dataset)
- [Data Exploration](#Data-Exploration)
   - View First Five Rows
   - Inspect Shape
   - Investigate Missing Data, Duplicates, and Other Data-Quality Issues

<h3 style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Introduction</h3>

### Research Overview

### Objectives

### Data Source and Storage

<h3 style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Install and Import Required Libraries</h3>

In [None]:
!pip install --upgrade -q yfinance
!pip install -q pyspark pandas

In [None]:
import yfinance as yf
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, DoubleType, LongType, StringType, DateType

<h3 style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Download and Load Dataset</h3>

#### Download and Store Financial Data Using the yfinance API

In [None]:
# Initialize PySpark session
spark = SparkSession.builder.appName("StockDataStorage").getOrCreate()

# Define stock symbols
stocks = ["GOOG", "AMZN", "MSFT", "PYPL", "TSLA"]

In [None]:
# Define the schema (Date is read as a string first and cast to DateType after loading)
schema = StructType([
    StructField("Date", StringType(), True),  # Initially store as String
    StructField("Open", DoubleType(), True),
    StructField("High", DoubleType(), True),
    StructField("Low", DoubleType(), True),
    StructField("Close", DoubleType(), True),
    StructField("Volume", LongType(), True),  # 64-bit: split-adjusted volumes may exceed the 32-bit IntegerType range
    StructField("Dividends", DoubleType(), True),
    StructField("Stock_Splits", DoubleType(), True),
])
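Spark's `IntegerType` is a 32-bit signed integer, while yfinance typically returns trading volume as a pandas `int64` column, so long split-adjusted histories are worth checking before committing to a 32-bit field. The sketch below is one way to run that check in pandas; `fits_in_int32` and the `INT32_MIN`/`INT32_MAX` constants are hypothetical names introduced here, not part of yfinance or PySpark.

```python
import pandas as pd

# Bounds of a 32-bit signed integer, the storage type behind Spark's IntegerType.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def fits_in_int32(values: pd.Series) -> bool:
    """Hypothetical helper: True if every non-null value fits in 32 signed bits."""
    non_null = values.dropna()
    if non_null.empty:
        # Nothing to overflow.
        return True
    return bool(non_null.min() >= INT32_MIN and non_null.max() <= INT32_MAX)
```

Inside the download loop, `fits_in_int32(df["Volume"])` could be evaluated before `spark.createDataFrame`; if it returns `False`, the field needs to be declared as `LongType()` (64-bit) rather than `IntegerType()`.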

In [None]:
storage_path = "pyspark_stock_data/"

# Fetch each stock's history and store it as Parquet
for stock in stocks:
    # Fetch the full available price history at a daily (1-day) interval
    dat = yf.Ticker(stock)
    df = dat.history(period="max", interval="1d")

    # Move the Date index into a regular column
    df.reset_index(inplace=True)
    
    # Convert Date column to string (YYYY-MM-DD format)
    df["Date"] = df["Date"].dt.strftime("%Y-%m-%d")
    
    # Convert to PySpark DataFrame
    spark_df = spark.createDataFrame(df, schema=schema)

    # Convert Date column to actual DateType()
    spark_df = spark_df.withColumn("Date", spark_df["Date"].cast(DateType()))

    # Save as Parquet
    stock_path = f"{storage_path}{stock}.parquet"
    spark_df.write.mode("overwrite").parquet(stock_path)
    
    print(f"Stored {stock} data at {stock_path}")
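The loop above formats `Date` as a string and then casts it back to `DateType` inside Spark. An alternative is to finish the clean-up in pandas before handing the frame to Spark. The sketch below assumes, as current yfinance releases appear to do, that `Ticker.history()` returns a frame indexed by a timezone-aware `DatetimeIndex` named `Date` and containing a `Stock Splits` column; `prepare_history` is a hypothetical helper name.

```python
import pandas as pd


def prepare_history(hist: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reshape a yfinance-style history frame for Spark.

    Assumes the index is a (possibly timezone-aware) DatetimeIndex named "Date".
    """
    # Turn the Date index into an ordinary first column (returns a copy).
    df = hist.reset_index()
    # Keep plain calendar dates (datetime.date), dropping time of day and timezone.
    df["Date"] = pd.to_datetime(df["Date"]).dt.date
    # Replace spaces in column names, e.g. "Stock Splits" -> "Stock_Splits".
    df.columns = [str(c).replace(" ", "_") for c in df.columns]
    # Keep volume as 64-bit so large split-adjusted values are not truncated.
    df["Volume"] = df["Volume"].astype("int64")
    return df
```

With this helper, the schema's `Date` field could be declared directly as `DateType()`, since PySpark accepts Python `datetime.date` values for that type, and the column names would match the schema by name instead of only by position.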

#### Load the Saved Parquet Data

In [None]:
# Load data into separate variables
goog_df = spark.read.parquet(f"{storage_path}GOOG.parquet")
amzn_df = spark.read.parquet(f"{storage_path}AMZN.parquet")
msft_df = spark.read.parquet(f"{storage_path}MSFT.parquet")
pypl_df = spark.read.parquet(f"{storage_path}PYPL.parquet")
tsla_df = spark.read.parquet(f"{storage_path}TSLA.parquet")
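Five hand-written `spark.read.parquet` calls work, but they must be edited whenever `stocks` changes. A dictionary keyed by symbol keeps loading in sync with that list. The sketch below is one way to do this; `load_stock_tables` is a hypothetical helper, and it relies on Spark's default behaviour of writing an empty `_SUCCESS` marker into an output directory when a write job commits. The `Path`-based check only works for local file-system paths, not HDFS or object storage.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def load_stock_tables(symbols: Iterable[str], storage_path: str,
                      read_fn: Callable[[str], T]) -> Dict[str, T]:
    """Hypothetical helper: read one Parquet directory per symbol into a dict.

    A missing _SUCCESS marker usually means the write failed or never ran,
    so we raise instead of silently reading partial output.
    """
    tables: Dict[str, T] = {}
    for symbol in symbols:
        path = Path(storage_path) / f"{symbol}.parquet"
        if not (path / "_SUCCESS").exists():
            raise FileNotFoundError(f"No completed Parquet output for {symbol} at {path}")
        tables[symbol] = read_fn(str(path))
    return tables
```

`stock_dfs = load_stock_tables(stocks, storage_path, spark.read.parquet)` would then provide `stock_dfs["GOOG"]` in place of `goog_df`, and so on. Passing the reader in as a parameter keeps the helper independent of a live Spark session, which also makes it easy to test.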

<h3 style="background-color:#2c3e54;color:#ecf0f1;border-radius: 8px; padding:15px">Data Exploration</h3>