# AI Training Using Linear Regression
====================================

This notebook is used to train an AI model to predict the future price of a stock.
The goal is to use this model to predict the future price of a stock and to use this data to train a machine learning model to predict the future price of a stock.

In [2]:
import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

print('All imports successful')

All imports successful


### Set up the data and look at it to make sure its correct and organized well

In [3]:
# Load the data
data = pd.read_csv("dummy_data_cleaned.csv")

In [None]:
# Convert it to a pandas DataFrame and print the first 5 rows
df = pd.DataFrame(data)
df.head()

# Check the min and max of the Close column for each Ticker, can also be done in data wrangler but eh idc <(＿　＿)>
print(f"Min = {df['Close'].min()}, \nMax = {df['Close'].max()}")


Min = 20.63808822631836, 
Max = 642.5139770507812


In [None]:
# Scale the data
scaler = MinMaxScaler(feature_range=(0, 1))

# Apply scaler only to numerical columns
numeric_cols = ["Open", "High", "Low", "Close", "Volume"]
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])

In [21]:
# Create the sequences for the past 5 days
def create_sequences(group, seq_length=5):
	X, y = [], []
	for i in range(len(group) - seq_length):
		X.append(group[i:i + seq_length].flatten())
		y.append(group[i + seq_length][-1])  # Predict next day's Close
	return np.array(X), np.array(y)

In [22]:
X, y = [], []
for symbol, group in df.groupby("Ticker"):
	group_data = group[["Open", "High", "Low", "Close", "Volume"]].values
	X_seq, y_seq = create_sequences(group_data)
	X.extend(X_seq)
	y.extend(y_seq)

X, y = np.array(X), np.array(y)

In [23]:
# Use sklearns train_test_split to split the data into training and testing sets

# Split the data 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Here we train the model and determine the R^2 score

**REMEMBER:**

R² Score Range:
* 1.0 → Perfect fit (ideal but rare)
* 0.0 → No predictive power (as good as predicting the mean)
* Negative → Worse than a random guess (bad model!)

In [24]:
# Train the model and evaluate the modals R^2 score
model = LinearRegression()
model.fit(X_train, y_train)

train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"Train R² Score: {train_score:.4f}")
print(f"Test R² Score: {test_score:.4f}")

Train R² Score: 0.8804
Test R² Score: 0.9356


In [25]:
# Finally dump the model to a joblib file
joblib.dump(model, "stock_model.pkl")
joblib.dump(scaler, "scaler.pkl")

print("Stock Linear Regression Model Trained and Saved!")

Stock Linear Regression Model Trained and Saved!
