# MSFT historical data
This notebook uses MSFT stock data.  
We train the model architecture built in the AAPL notebook.  
We test how well that architecture works on a stock other than the one it was designed on.  
We also test how well the functions from our package "ml-model" work when predicting on real-time data.

### Imports

In [1]:
import sys
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

from sklearn.metrics import f1_score
import datetime as dt


sys.path.append(str(Path("..").resolve()))
from src.ml_model.data import fetch_data, stock_data_prediction_pipeline, stock_data_feature_engineering, get_one_realtime_bar
from src.ml_model.training import train_model, sequence_split

2025-11-05 19:15:25.955173: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-11-05 19:15:26.014190: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-11-05 19:15:27.983135: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.


### Reading Data

In [2]:
df = fetch_data("MSFT", (2020, 1, 1), (2025, 9, 16))
df.head()

symbol,timestamp,open,high,low,close,volume,trade_count,vwap
MSFT,2020-01-02 05:00:00+00:00,158.78,160.73,158.33,160.62,25472631.0,175506.0,159.81551
MSFT,2020-01-03 05:00:00+00:00,158.32,159.945,158.06,158.62,24389239.0,166336.0,159.10112
MSFT,2020-01-06 05:00:00+00:00,157.08,159.1,156.51,159.03,24709110.0,148395.0,158.495004
MSFT,2020-01-07 05:00:00+00:00,159.32,159.67,157.32,157.58,24503429.0,167838.0,158.298227
MSFT,2020-01-08 05:00:00+00:00,158.93,160.8,157.9491,160.09,31748417.0,198632.0,159.714784
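
The model in this notebook predicts whether the stock goes up (1) or down (0) on the next day. The internals of `stock_data_feature_engineering` are not shown here, so the following is only a minimal sketch of how such a label could be derived from the `close` column. The helper name `next_day_direction` is hypothetical and not part of the package.

```python
import numpy as np

def next_day_direction(close):
    """Return 1 where the next close is higher than the current close, else 0.

    The last bar has no following close, so the result is one element
    shorter than the input. (Hypothetical helper, for illustration only.)
    """
    close = np.asarray(close, dtype=float)
    return (close[1:] > close[:-1]).astype(int)

# The five MSFT closes shown in the table above
closes = [160.62, 158.62, 159.03, 157.58, 160.09]
labels = next_day_direction(closes)
print(labels)  # [0 1 0 1]
```

Each label belongs to the earlier of the two bars it compares, which is why the most recent bar never has a label at training time.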


### Preprocessing and Visualization

In [3]:
X, y, scaler = stock_data_feature_engineering(df)
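
An LSTM expects its input as fixed-length windows of consecutive rows rather than single rows. The code of `sequence_split` is not shown in this notebook, so the sketch below only illustrates the general sliding-window idea. The function name `make_sequences` is hypothetical, and the window length of 50 is an assumption taken from the `X[-50:]` slice in the prediction cell.

```python
import numpy as np

def make_sequences(X, y, time_steps=50):
    """Build overlapping windows of `time_steps` rows and pair each window
    with the label of the row that directly follows it.
    (Hypothetical helper, for illustration only.)
    """
    X = np.asarray(X)
    y = np.asarray(y)  # plain arrays, so all indexing below is positional
    xs, ys = [], []
    for i in range(len(X) - time_steps):
        xs.append(X[i:i + time_steps])
        ys.append(y[i + time_steps])
    return np.array(xs), np.array(ys)

# Tiny demo: 10 rows with 2 features each, windows of 3 rows
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10) % 2
X_seq_demo, y_seq_demo = make_sequences(X_demo, y_demo, time_steps=3)
print(X_seq_demo.shape, y_seq_demo.shape)  # (7, 3, 2) (7,)
```

With 10 rows and a window of 3, there are 7 windows, and the resulting array has the `(samples, time_steps, features)` layout that Keras recurrent layers take as input.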

### Training & Evaluation

In [4]:
X_train, X_test, y_train, y_test = sequence_split(X, y)

model = train_model(X_train, y_train)

# Evaluate LSTM
y_pred = (model.predict(X_test) > 0.5).astype(int)
print("LSTM F1-score:", f1_score(y_test, y_pred))

  ys.append(y[i + time_steps])
2025-11-05 19:15:28.887015: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)


Epoch 1/20
62/62 - 2s - 30ms/step - accuracy: 0.5015 - loss: 0.6955 - val_accuracy: 0.4865 - val_loss: 0.7024
Epoch 2/20
62/62 - 0s - 4ms/step - accuracy: 0.4975 - loss: 0.6942 - val_accuracy: 0.4775 - val_loss: 0.6987
Epoch 3/20
62/62 - 0s - 4ms/step - accuracy: 0.5328 - loss: 0.6922 - val_accuracy: 0.5135 - val_loss: 0.6979
Epoch 4/20
62/62 - 0s - 4ms/step - accuracy: 0.5298 - loss: 0.6898 - val_accuracy: 0.4955 - val_loss: 0.6992
Epoch 5/20
62/62 - 0s - 4ms/step - accuracy: 0.5439 - loss: 0.6913 - val_accuracy: 0.5135 - val_loss: 0.7013
Epoch 6/20
62/62 - 0s - 5ms/step - accuracy: 0.5156 - loss: 0.6931 - val_accuracy: 0.5225 - val_loss: 0.6992
Epoch 7/20
62/62 - 0s - 4ms/step - accuracy: 0.5308 - loss: 0.6865 - val_accuracy: 0.5135 - val_loss: 0.7062
Epoch 8/20
62/62 - 0s - 3ms/step - accuracy: 0.5631 - loss: 0.6828 - val_accuracy: 0.5405 - val_loss: 0.6997
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 17ms/step
LSTM F1-score: 0.6191950464396285
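
An F1-score is easier to interpret next to a trivial baseline. The sketch below computes the F1-score of a predictor that always answers "up" on synthetic, perfectly balanced labels. The labels are made up for illustration, since the class balance of `y_test` is not shown in this notebook, and `f1_binary` is a hypothetical hand-rolled helper rather than the `f1_score` function used above.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """F1-score for the positive class (label 1), computed from counts.
    (Hypothetical helper, for illustration only.)
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Synthetic labels: exactly half "up", half "down"
y_true_demo = np.array([1, 0] * 50)
always_up = np.ones_like(y_true_demo)
baseline_f1 = f1_binary(y_true_demo, always_up)
print(round(baseline_f1, 3))  # 0.667
```

On balanced labels, always predicting "up" already reaches an F1-score of about 0.667, so the score above is worth comparing against the same baseline on the real split, for example with `f1_score(y_test, np.ones_like(y_test))`.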


In [5]:
realtime_bar = await get_one_realtime_bar("MSFT")

yesterday = dt.datetime.now(dt.UTC) - dt.timedelta(days=1)
# Look back 100 calendar days so that at least 50 trading days are available
lookback_start = dt.datetime.now(dt.UTC) - dt.timedelta(days=100)
hist_df = fetch_data("MSFT", 
                     start_date = (lookback_start.year, lookback_start.month, lookback_start.day), 
                     end_date = (yesterday.year, yesterday.month, yesterday.day))
hist_df = hist_df.reset_index()     # flatten the (symbol, timestamp) MultiIndex into regular columns
tdf = pd.concat([hist_df.tail(50), realtime_bar])
tdf.shape


(51, 9)
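
The cell above fetches 100 calendar days of history but keeps only the last 50 bars. A rough check of why a 50-day lookback would not be enough is to count the weekdays in each window. The sketch below uses `np.busday_count`, which counts Monday to Friday by default and knows nothing about market holidays, so the real number of trading bars is slightly lower. The helper name `weekdays_in_lookback` and the example date are assumptions for illustration.

```python
import datetime as dt
import numpy as np

def weekdays_in_lookback(end_date, lookback_days):
    """Count Monday-to-Friday days in the `lookback_days` calendar days
    before `end_date` (end date excluded, market holidays ignored).
    (Hypothetical helper, for illustration only.)
    """
    start_date = end_date - dt.timedelta(days=lookback_days)
    return int(np.busday_count(start_date.isoformat(), end_date.isoformat()))

example_end = dt.date(2025, 11, 4)  # arbitrary example date
short_window = weekdays_in_lookback(example_end, 50)
long_window = weekdays_in_lookback(example_end, 100)
print(short_window, long_window)
```

Any 50 consecutive calendar days contain at most 36 weekdays, while any 100 consecutive days contain at least 70, so only the longer window leaves room for 50 daily bars after holidays are removed.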

In [6]:
X = stock_data_prediction_pipeline(tdf, scaler)

In [7]:
X_seq = np.expand_dims(X[-50:], axis=0)
real_time_prediction = model.predict(X_seq)

print(f"The stock will go (1 for up, 0 for down) = {(real_time_prediction > 0.5).astype(int)} tomorrow with {real_time_prediction*100} % probability")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 119ms/step
The stock will go (1 for up, 0 for down) = [[1]] tomorrow with [[52.809547]] % probability
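
The message above prints the raw arrays returned by `model.predict`, which have shape `(1, 1)`. As a sketch of a more readable alternative, the helper below reduces the array to a single float before formatting. It assumes the model output is the probability of the "up" class, as the 0.5 threshold used in this notebook suggests. The name `describe_prediction` is hypothetical.

```python
import numpy as np

def describe_prediction(pred, threshold=0.5):
    """Turn a (1, 1) array holding the probability of 'up' into a sentence.
    (Hypothetical helper, for illustration only.)
    """
    p_up = float(np.asarray(pred).squeeze())  # (1, 1) array -> Python float
    direction = "up" if p_up > threshold else "down"
    # Report the probability of the predicted class, not always of "up"
    confidence = p_up if direction == "up" else 1.0 - p_up
    return (f"Predicted direction for the next bar: {direction} "
            f"({confidence * 100:.1f}% model probability)")

message = describe_prediction(np.array([[0.52809547]], dtype=np.float32))
print(message)
# Predicted direction for the next bar: up (52.8% model probability)
```

Reporting the probability of the predicted class keeps the message consistent when the model predicts "down", where the raw output would otherwise read as a probability below 50%.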
