### Project: Apple Share Price Prediction (Part 3)
Aim: predict Apple's share price for the next 10 trading days (about 2 weeks) from the last 5 years of data

Features used: 
- Close Price 
- Open Price
- High Price
- Low Price
- Volume
- IXIC (NASDAQ Composite Index)
- GSPC (S&P 500 Index)
- VIX (Volatility Index)
- DX-Y.NYB (US Dollar Index)
- TNX (US Treasury Yield)
- SOX (PHLX Semiconductor Index)

Note: only the closing prices are used for the other indexes


### Model A1: Feed-Forward Neural Network (MLP) with PyTorch (not in this file)

### Model A2: Feed-Forward Neural Network (MLP) with TensorFlow and Keras (not in this file)

### Model B: LSTM/Sequence Model (in this file)

### Model C: Transformer Model (not in this file)

In [None]:
import yfinance as yf
apple = yf.Ticker("AAPL")
apple_data = apple.history(period = "5y")
tickers = [ "^IXIC", "^GSPC", "DJI", "^VIX", "DX-Y.NYB", "^TNX", "^SOX"]
others_data = yf.download(tickers, period = "5y")["Close"] # Only using the Close Prices for all indexes


[*********************100%***********************]  7 of 7 completed


### Fill missing values with the previous available value

In [None]:
apple_data = apple_data.ffill()

# Checking if there is any missing value in apple_data
apple_data.isna().sum()

Open            0
High            0
Low             0
Close           0
Volume          0
Dividends       0
Stock Splits    0
dtype: int64

In [None]:
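# Note: ffill() returns a new DataFrame; the result is not assigned here, so others_data itself is unchanged
others_data.ffill()
others_data.isna().sum()

# Note: too many missing values in DJI, so the DJI column is dropped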

Ticker
DJI         967
DX-Y.NYB      1
^GSPC         4
^IXIC         4
^SOX          4
^TNX          3
^VIX          3
dtype: int64
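
`DataFrame.ffill()` and `DataFrame.bfill()` return a new DataFrame by default rather than modifying the original, so a call whose result is discarded leaves the NaN counts untouched. A minimal sketch on a made-up frame (the name `demo` and its values are hypothetical, not from the notebook's data) shows the difference between calling the method and assigning its result:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with three missing values
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, np.nan]})

demo.ffill()  # result discarded: demo still holds all its NaNs
nan_before = int(demo.isna().sum().sum())

filled = demo.ffill()  # assign the result to keep the filled values
nan_after_ffill = int(filled.isna().sum().sum())  # the leading NaN in "b" has no earlier value

filled = filled.bfill()  # back-fill handles the remaining leading NaN
nan_after_bfill = int(filled.isna().sum().sum())

print(nan_before, nan_after_ffill, nan_after_bfill)
```

Forward fill cannot fill a gap at the very start of a column, which is why a back-fill or a final `dropna()` is still needed afterwards.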

In [None]:
others_data.drop(columns = ['DJI'], inplace = True)

In [None]:
others_data.isna().sum()

Ticker
DX-Y.NYB    1
^GSPC       4
^IXIC       4
^SOX        4
^TNX        3
^VIX        3
dtype: int64

In [None]:
# Note: bfill() also returns a new DataFrame; the result is displayed but not assigned
others_data.bfill()

Date,DX-Y.NYB,^GSPC,^IXIC,^SOX,^TNX,^VIX
2020-12-11,90.709999,3647.489990,12440.040039,2736.250000,0.893,24.719999
2020-12-14,90.709999,3647.489990,12440.040039,2736.250000,0.892,24.719999
2020-12-15,90.470001,3694.620117,12595.059570,2774.790039,0.923,22.889999
2020-12-16,90.449997,3701.169922,12658.190430,2773.419922,0.920,22.500000
2020-12-17,89.820000,3722.479980,12764.750000,2778.139893,0.930,21.930000
...,...,...,...,...,...,...
2025-12-08,99.089996,6846.509766,23545.900391,7375.220215,4.172,16.660000
2025-12-09,99.220001,6840.509766,23576.490234,7372.509766,4.186,16.930000
2025-12-10,98.790001,6886.680176,23654.150391,7467.490234,4.164,15.770000
2025-12-11,98.349998,6901.000000,23593.859375,7411.479980,4.141,14.850000


In [None]:
others_data = others_data.dropna()

In [None]:
others_data.isna().sum()


Ticker
DX-Y.NYB    0
^GSPC       0
^IXIC       0
^SOX        0
^TNX        0
^VIX        0
dtype: int64

In [None]:
print(apple_data.index.tz)  
print(others_data.index.tz)

# Comment: apple_data timestamps are timezone-aware, while others_data timestamps are timezone-naive (no timezone)

America/New_York
None


In [None]:
# Convert the times in apple_data to timezone-naive

apple_data.index = apple_data.index.tz_localize(None)
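
The reason for stripping the timezone is that the two frames are later joined on their date index, so both indexes need to be of the same kind. A small sketch, using hypothetical dates and values rather than the downloaded data, of what `tz_localize(None)` does and how an inner join then keeps only the shared dates:

```python
import pandas as pd

# Hypothetical timezone-aware index, similar in kind to apple_data.index
aware = pd.date_range("2025-12-08", periods=3, freq="D", tz="America/New_York")

# tz_localize(None) drops the timezone but keeps the local wall-clock time
naive = aware.tz_localize(None)

left = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=naive)
right = pd.DataFrame(
    {"^VIX": [10.0, 11.0]},
    index=pd.to_datetime(["2025-12-09", "2025-12-10"]),
)

# how="inner" keeps only the dates present in both frames
joined = left.join(right, how="inner")
print(joined)
```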

In [None]:
full_df = apple_data.join(others_data, how = "inner")
full_df.drop(columns = ["Dividends", "Stock Splits"], inplace = True)

In [None]:
len(full_df)

1255

In [None]:
full_df.tail(10)

Date,Open,High,Low,Close,Volume,DX-Y.NYB,^GSPC,^IXIC,^SOX,^TNX,^VIX
2025-11-28,277.26001,279.0,275.98999,278.850006,20135600,99.459999,6849.089844,23365.689453,7025.149902,4.017,16.35
2025-12-01,278.01001,283.420013,276.140015,283.100006,46587700,99.410004,6812.629883,23275.919922,7020.529785,4.096,17.24
2025-12-02,283.0,287.399994,282.630005,286.190002,53669500,99.360001,6829.370117,23413.669922,7149.470215,4.086,16.59
2025-12-03,286.200012,288.619995,283.299988,284.149994,43538700,98.849998,6849.720215,23454.089844,7280.509766,4.057,16.08
2025-12-04,284.100006,284.730011,278.589996,280.700012,43989100,98.989998,6857.120117,23505.140625,7215.970215,4.108,15.78
2025-12-05,280.540009,281.140015,278.049988,278.779999,47265800,98.989998,6870.399902,23578.130859,7294.839844,4.139,15.41
2025-12-08,278.130005,279.670013,276.149994,277.890015,38211800,99.089996,6846.509766,23545.900391,7375.220215,4.172,16.66
2025-12-09,278.160004,280.029999,276.920013,277.179993,32193300,99.220001,6840.509766,23576.490234,7372.509766,4.186,16.93
2025-12-10,277.75,279.75,276.440002,278.779999,33038300,98.790001,6886.680176,23654.150391,7467.490234,4.164,15.77
2025-12-11,279.100006,279.589996,273.809998,278.029999,33207600,98.349998,6901.0,23593.859375,7411.47998,4.141,14.85


In [None]:
y = full_df["Close"]
X = full_df.drop(columns = ["Close"])

### Preprocess the data

Each sample packs 30 consecutive days of features ('Open', 'High', 'Low', 'Volume', 'DX-Y.NYB', '^GSPC', '^IXIC', '^SOX', '^TNX', '^VIX') into one row of X; the "Close" prices of the following 10 days form the matching row of y

In [None]:
import numpy as np

# Build overlapping windows: each 30-day block of X predicts the next 10 days of y
# Sample 1: X = [day1, ..., day30], y = [day31, ..., day40]
# Sample 2: X = [day2, ..., day31], y = [day32, ..., day41]

# Illustration with a shorter 5-day input window and a single feature:

# Window 1 (t = 0 → 4):
# X₀ = [100, 102, 101, 103, 104]
# y₀ = next 10 days

# Window 2 (t = 1 → 5):
# X₁ = [102, 101, 103, 104, 106]
# y₁ = next 10 days

window_x = 30
window_y = 10

X = []
y = []

cols = ['Open', 'High', 'Low', 'Volume', 'DX-Y.NYB', '^GSPC', '^IXIC', '^SOX', '^TNX', '^VIX']

# range(window_x, len(full_df) - window_y + 1) yields 30, 31, ..., 1245 for the 1255 rows here

# full_df[cols].iloc[i-window_x : i].values returns a 2D array of shape (30, 10): one row per day, one column per feature

for i in range(window_x, len(full_df) - window_y + 1):
    X.append(full_df[cols].iloc[i-window_x : i].values)
    y.append(full_df[["Close"]].iloc[i:i + window_y].values)

X = np.array(X)

y = np.array(y)
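
The number of windows this loop produces is `len(full_df) - window_x - window_y + 1`, which for 1255 rows, a 30-day input and a 10-day target gives 1216 samples. The sketch below repeats the same slicing on a tiny made-up array so the shapes are easy to check by hand; `make_windows` and the toy data are hypothetical names introduced only for illustration:

```python
import numpy as np

def make_windows(features, target, window_x, window_y):
    """Slide over the rows: window_x past rows as input, window_y future targets as output."""
    X, y = [], []
    for i in range(window_x, len(features) - window_y + 1):
        X.append(features[i - window_x:i])  # rows i-window_x .. i-1
        y.append(target[i:i + window_y])    # rows i .. i+window_y-1
    return np.array(X), np.array(y)

# Toy data: 10 "days", 2 features per day, and a separate target series
features = np.arange(20).reshape(10, 2)
target = np.arange(100, 110)

X_demo, y_demo = make_windows(features, target, window_x=3, window_y=2)
print(X_demo.shape, y_demo.shape)
```

With 10 rows, a 3-day input and a 2-day target there are 10 - 3 - 2 + 1 = 6 windows, and the first target window starts on the row right after the first input window ends.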


In [None]:
num_samples = X.shape[0] # 1216 rows
window_x = X.shape[1] # 30 days of data in 1 row
num_features = X.shape[2] # 10 features per day

# Flatten X: there are now 1216 rows, each with 300 values (10 features * 30 days)
X = X.reshape(num_samples, window_x * num_features)


# Flatten y
y = y.reshape(y.shape[0], y.shape[1])

In [None]:
X.shape

(1216, 300)

In [None]:
y.shape

(1216, 10)
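
The flattened layout suits a dense network, whereas a recurrent layer typically expects input shaped as (samples, timesteps, features). Because the flattening is a plain row-major reshape, the 3D windows can be recovered by reshaping back. A minimal sketch on random stand-in data (`X_windows`, `X_flat` and `X_seq` are hypothetical names, and the data is not the notebook's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the windowed array: 5 samples, 30 days, 10 features
X_windows = rng.normal(size=(5, 30, 10))

# Flatten each sample to 300 values, as done for the dense model
X_flat = X_windows.reshape(5, 30 * 10)

# Reshape back to (samples, timesteps, features) for a sequence model
X_seq = X_flat.reshape(-1, 30, 10)

print(X_flat.shape, X_seq.shape)
```

In the flattened row, the first 10 values are the 10 features of day 1, the next 10 are day 2, and so on.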

### Scale the data

### MLP: StandardScaler, inputs centered around 0 to help prevent exploding or vanishing gradients
Vanishing gradients: during backpropagation the gradients become extremely small as they move backward through the network, so the model learns very slowly or not at all.

Exploding gradients: the gradients become extremely large, so the model jumps around instead of learning gradually.
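
As a rough illustration of both effects, the gradient reaching an early layer behaves like a product of per-layer factors. The sketch below is a simplification under that assumption (real networks multiply matrices, not single numbers): it uses the largest possible sigmoid derivative, 0.25, as a shrinking factor, and an arbitrary factor of 1.5 as a growing one.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# The sigmoid derivative s(x) * (1 - s(x)) peaks at x = 0 with value 0.25
max_sigmoid_grad = sigmoid(0.0) * (1.0 - sigmoid(0.0))

depth = 20  # hypothetical number of layers (or time steps)

vanishing = max_sigmoid_grad ** depth  # factors below 1 shrink the gradient
exploding = 1.5 ** depth               # factors above 1 blow it up

print(f"vanishing: {vanishing:.2e}, exploding: {exploding:.2e}")
```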

### LSTM: MinMaxScaler
### Transformer: MinMaxScaler
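LSTM and Transformer layers contain sigmoid, tanh, and softmax functions, which can saturate when their inputs are large in magnitude.

Standard scaling can still leave some extreme values, e.g. -3, -5, -7, whereas min-max scaling keeps the training inputs within a fixed range.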

LSTM input gate: sigmoid, sigmoid(x) = 1 / (1 + exp(-x))

LSTM forget gate: sigmoid

LSTM output gate: sigmoid

LSTM candidate state: tanh, tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

Transformer self-attention: Attention = softmax(QKᵀ / sqrt(d))
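
To make the saturation concern concrete, the sketch below evaluates the sigmoid and tanh derivatives at a few input values, including the -3 and -7 mentioned above. It is only an illustration: in a real LSTM the gates receive weighted sums of the inputs and hidden state, not the raw scaled features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-7.0, -3.0, 0.0, 3.0, 7.0])

s = sigmoid(x)
sigmoid_grad = s * (1.0 - s)        # derivative of sigmoid
tanh_grad = 1.0 - np.tanh(x) ** 2   # derivative of tanh

for xi, sg, tg in zip(x, sigmoid_grad, tanh_grad):
    print(f"x = {xi:+.0f}  sigmoid' = {sg:.6f}  tanh' = {tg:.6f}")
```

Both derivatives are largest at 0 and shrink quickly as the input moves away from it, so very large or very small inputs pass back almost no gradient.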

### Model B - LSTM

### Split the training and test datasets

In [None]:
# Note: decided not to use a validation set; only the total loss on the training and test data is checked
train_size = int(len(X) * 0.8)

X_train = X[:train_size]
X_test  = X[train_size:]

y_train = y[:train_size]
y_test  = y[train_size:]
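
The scaling step itself does not appear in this part of the notebook. One common way to apply min-max scaling after a chronological split, sketched here with NumPy only and on made-up data, is to compute the minimum and maximum from the training rows and reuse them for the test rows, so that no information from the test period leaks into training. All names ending in `_demo` or `_scaled` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in feature matrix: 100 samples, 4 features
X_demo = rng.uniform(50, 300, size=(100, 4))

train_size = int(len(X_demo) * 0.8)
X_train_demo = X_demo[:train_size]
X_test_demo = X_demo[train_size:]

# Statistics come from the training split only
col_min = X_train_demo.min(axis=0)
col_max = X_train_demo.max(axis=0)
scale = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard against constant columns

X_train_scaled = (X_train_demo - col_min) / scale
X_test_scaled = (X_test_demo - col_min) / scale  # may fall slightly outside [0, 1]

print(X_train_scaled.min(), X_train_scaled.max())
```

This mirrors what a min-max scaler with a (0, 1) range does when it is fitted on the training data and then used to transform the test data.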

In [None]:
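import tensorflow as tf
from tensorflow.keras import layers, models

input_dim = 300  # 30 days * 10 features, flattened
# Note: these are Dense (fully connected) layers applied to the flattened input
model = models.Sequential([
    layers.Input(shape = (input_dim,)),
    layers.Dense(128, activation = "gelu"),
    layers.Dense(128, activation = "gelu"),
    layers.Dense(10, activation = "linear")  # one output per forecast day
])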
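
For comparison, a recurrent version of this network could take the windows in their (30, 10) form instead of the flattened 300-value rows. The following is only a sketch under assumptions: the single LSTM layer, its 64 units, the optimizer and the loss are illustrative choices, not settings taken from the notebook, and `lstm_model` is a hypothetical name.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

window_x = 30      # days per input window
num_features = 10  # features per day
window_y = 10      # days to forecast

# Assumed architecture: one LSTM layer followed by a linear output layer
lstm_model = models.Sequential([
    layers.Input(shape=(window_x, num_features)),
    layers.LSTM(64),                              # returns the last hidden state only
    layers.Dense(window_y, activation="linear"),  # one output per forecast day
])

lstm_model.compile(optimizer="adam", loss="mse")
lstm_model.summary()
```

A model like this would be trained on input shaped `(num_samples, 30, 10)` rather than `(num_samples, 300)`.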

