# IS4487 Week 11 - Practice Code

This notebook is designed to help you follow along with the **Week 11 Lecture and Reading**, introducing you to Regression.

The practice code demos are intended to give you a chance to see working code and can be a source for your lab and assignment work. Each section contains short explanations and annotated code that reflect the steps in the reading.

### Topics for this demo:
- Create a linear regression to predict a numeric value
- Visualize the regression line

<a href="https://colab.research.google.com/github/Stan-Pugsley/is_4487_base/blob/main/Demos/demo_11_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


### Context: Apple (AAPL) Stock Price Prediction
We will use historical stock prices, in addition to two calculated metrics, to predict the stock price on the following day of trading. 

Variables include:

| Feature            | Description                                                                                               | Type    |
| ------------------ | --------------------------------------------------------------------------------------------------------- | ------- |
| `Open`             | Beginning stock price on the given day                                                                    | Numeric |
| `High`             | Maximum stock price on the given day                                                                      | Numeric |
| `Low`              | Minimum stock price on the given day                                                                      | Numeric |
| `Close`            | Closing stock price on the given day                                                                      | Numeric |
| `Volume`           | Number of shares traded on the given day                                                                  | Numeric |
| `Direction`        | Comparison of open to close price                                                                         | Numeric |
| `RSI`              | Relative Strength Index. RSI compares the average gains to the average losses in the last 14 trading days | Numeric |
| `SMA_5`            | 5-period Simple Moving Average                                                                            | Numeric |
| `Next_Day_Closing` | Closing stock price on the next trading day                                                               | Numeric |
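
To make the calculated columns more concrete, the sketch below shows one way `SMA_5`, `RSI`, and `Next_Day_Closing` could be derived from a `Close` column with pandas. The prices are made up, `simple_rsi` is a hypothetical helper, and the RSI here uses plain rolling averages, which is only one common variant. The values in the course dataset may have been computed differently.

```python
import pandas as pd

# Hypothetical closing prices for illustration only (not taken from the dataset)
prices = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0]})

# SMA_5: mean of the current close and the four closes before it
prices["SMA_5"] = prices["Close"].rolling(window=5).mean()

# Next_Day_Closing: the following row's close, so the last row has no target
prices["Next_Day_Closing"] = prices["Close"].shift(-1)


def simple_rsi(close, period=14):
    """RSI from simple rolling averages of gains and losses (hypothetical helper)."""
    delta = close.diff()
    gains = delta.clip(lower=0)       # positive day-over-day changes, else 0
    losses = -delta.clip(upper=0)     # magnitudes of negative changes, else 0
    avg_gain = gains.rolling(window=period).mean()
    avg_loss = losses.rolling(window=period).mean()
    rs = avg_gain / avg_loss          # relative strength
    return 100 - 100 / (1 + rs)


# A short period is used here only because the toy series has seven rows
prices["RSI_3"] = simple_rsi(prices["Close"], period=3)
print(prices)
```

The first rows of `SMA_5` and `RSI_3` and the last row of `Next_Day_Closing` are missing because there is not enough history (or no following day) to compute them, which is why rows with missing values need to be dropped before modeling.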


Your task is to predict the closing price on the next trading day (`Next_Day_Closing`) using the other variables as predictors.

### Linear Regression

This model will predict the next-day closing price, then compare the predictions to the actual values in a plot.

In [None]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
import matplotlib.pyplot as plt

# Load the dataset
url = "https://raw.githubusercontent.com/Stan-Pugsley/is_4487_base/refs/heads/main/DataSets/aapl_stock_prices.csv"
df = pd.read_csv(url, sep=',')

In [None]:
# Preview the first five rows of the data
df.head()

Prepare Data

In [None]:
# Select features and target for stock price prediction
features = ['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'RSI']
target = 'Next_Day_Closing'

# Drop rows with missing values in selected features or target
df_cleaned = df.dropna(subset=features + [target])

# Define features (X) and target (y)
X = df_cleaned[features]
y = df_cleaned[target]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the resulting datasets
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)
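
The split above shuffles rows at random, so some training rows come from later dates than some test rows. For time-ordered data such as stock prices, a chronological split is a common alternative. The sketch below uses synthetic prices and hypothetical variable names (`demo`, `X_tr`, `X_te`), and assumes the rows are already sorted by date. It is not part of the original demo.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, date-ordered prices for illustration only (not the AAPL data)
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Close": 100 + rng.normal(0, 1, 101).cumsum()})
demo["Next_Day_Closing"] = demo["Close"].shift(-1)
demo = demo.dropna()

# shuffle=False keeps row order: the earliest 80% train, the latest 20% test
X_tr, X_te, y_tr, y_te = train_test_split(
    demo[["Close"]], demo["Next_Day_Closing"], test_size=0.2, shuffle=False
)

print("Last training row:", X_tr.index.max())
print("First test row:", X_te.index.min())
```

With `shuffle=False`, every test row comes after every training row, so the model is evaluated only on days it has not seen and that follow its training period.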

Create Model

In [None]:
# Create and train Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
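
A fitted `LinearRegression` stores one coefficient per feature in `coef_` and a constant term in `intercept_`, and each prediction is the intercept plus the sum of coefficient times feature value. The sketch below checks this on synthetic, noise-free data. The names `X_demo`, `y_demo`, and `demo_model` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data for illustration: y = 5 + 2*x1 + 3*x2
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 5 + 2 * X_demo[:, 0] + 3 * X_demo[:, 1]

demo_model = LinearRegression().fit(X_demo, y_demo)

# Rebuild the predictions by hand from the fitted parameters
manual_pred = demo_model.intercept_ + X_demo @ demo_model.coef_

print("Intercept:", round(demo_model.intercept_, 3))
print("Coefficients:", demo_model.coef_.round(3))
print("Matches predict():", np.allclose(manual_pred, demo_model.predict(X_demo)))
```

In the notebook, `pd.Series(model.coef_, index=features)` would label each coefficient with its feature name. Because `Open`, `High`, `Low`, and `Close` tend to move together, individual coefficients may be hard to interpret on their own.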

Evaluate Model

In [None]:
# Evaluate
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print(f"R² score: {r2:.2f}")
print(f"Mean Squared Error: {mse:.2f}")
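
MSE is measured in squared price units, so its square root (RMSE) is often easier to read because it is in the same units as the stock price. It can also help to compare the model against a naive baseline that assumes tomorrow's close equals today's close. The sketch below uses made-up numbers and hypothetical variable names to show both ideas. It does not reflect results from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical values for illustration only (not results from the notebook)
today_close = np.array([100.0, 101.0, 102.0])   # today's closing prices
actual_next = np.array([101.0, 102.0, 104.0])   # actual next-day closes
model_pred = np.array([100.0, 103.0, 103.0])    # a model's predictions

# RMSE is the square root of MSE, so it is in the same units as the price
model_rmse = np.sqrt(mean_squared_error(actual_next, model_pred))

# Naive baseline: assume tomorrow's close equals today's close
baseline_rmse = np.sqrt(mean_squared_error(actual_next, today_close))

print(f"Model RMSE: {model_rmse:.2f}")
print(f"Baseline RMSE: {baseline_rmse:.2f}")
```

In the notebook, `X_test['Close']` could play the role of `today_close` and `y_test` the role of `actual_next`. A model is only adding value if its error is lower than the baseline's.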

Create Visualization

In [None]:
# Plot actual vs predicted next-day closing prices
plt.figure(figsize=(8, 5))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.xlabel('Actual Next-Day Closing Price')
plt.ylabel('Predicted Next-Day Closing Price')
plt.title('Actual vs Predicted Stock Prices')
# Dashed red reference line: points on it are perfect predictions (predicted == actual)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linestyle='--')
plt.tight_layout()
plt.show()