# IS4487 Week 11 - Practice Code

This notebook is designed to help you follow along with the **Week 11 Lecture and Reading**, introducing you to regression.

The practice code demos give you a chance to see working code and can serve as a source for your lab and assignment work. Each section contains short explanations and annotated code that reflect the steps in the reading.

### Topics for this demo:
- Create a linear regression to predict a numeric value
- Visualize predicted vs. actual values

<a href="https://colab.research.google.com/github/Stan-Pugsley/is_4487_base/blob/main/Demos/demo_11_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


### Context: Stock Price Prediction
We will use a dataset of daily Apple (AAPL) stock prices. Variables include:

| Feature | Description | Type |
| ----------- | ---------------------------------------------------- | ----------- |
| `Open` | Opening stock price on the given day | Numeric |
| `High` | Maximum stock price on the given day | Numeric |
| `Low` | Minimum stock price on the given day | Numeric |
| `Close` | Closing stock price on the given day | Numeric |
| `Volume` | Number of shares traded on the given day | Numeric |
| `Direction` | Comparison of the opening and closing prices | Numeric |
| `RSI` | Relative Strength Index: compares average gains to average losses over the last 14 trading days | Numeric |
| `SMA_5` | 5-period simple moving average | Numeric |
| `Next Day Closing` | Closing stock price on the next trading day | Numeric |
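
The derived columns in the table can be illustrated with pandas. This is a hedged sketch, not the dataset's documented method: it assumes `SMA_5` is a plain 5-day rolling mean of `Close`, `Next Day Closing` is `Close` shifted up one row (rows in date order), `Direction` is 1 when the close is above the open and 0 otherwise, and `RSI` uses simple 14-day averages of gains and losses (Wilder's original RSI uses a smoothed average, so the dataset's values may differ). The prices are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
close = 150 + np.cumsum(rng.normal(0, 1, 30))
prices = pd.DataFrame({
    "Open": close + rng.normal(0, 0.5, 30),
    "Close": close,
})

# Assumption: 5-period simple moving average of the closing price
prices["SMA_5"] = prices["Close"].rolling(window=5).mean()

# Assumption: 1 if the stock closed above its open, else 0
prices["Direction"] = (prices["Close"] > prices["Open"]).astype(int)

# Assumption: RSI from simple 14-day averages of gains and losses
delta = prices["Close"].diff()
avg_gain = delta.clip(lower=0).rolling(window=14).mean()
avg_loss = (-delta).clip(lower=0).rolling(window=14).mean()
prices["RSI"] = 100 - 100 / (1 + avg_gain / avg_loss)

# Assumption: the next row's close is the next trading day's close
prices["Next Day Closing"] = prices["Close"].shift(-1)

print(prices.tail())
```

Note that rolling windows leave missing values in the first rows and the shift leaves one in the last row, which is why rows with NaN usually need to be dropped before fitting a model.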


Your task is to predict the closing price on the next trading day (`Next Day Closing`) using all other variables.

### Linear Regression

This model will predict the next-day closing price, then compare the predictions to the actual values in a plot.
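
For reference, ordinary least squares linear regression (the model scikit-learn's `LinearRegression` fits) predicts the target $y_i$ for day $i$ as an intercept plus a weighted sum of the $p$ features $x_{i1}, \dots, x_{ip}$, choosing the weights that minimize the sum of squared errors over the $n$ training rows:

```latex
\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip},
\qquad
(\hat\beta_0, \dots, \hat\beta_p) = \arg\min_{\beta_0, \dots, \beta_p} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
```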

In [None]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
import matplotlib.pyplot as plt

# Load the dataset
url = "https://raw.githubusercontent.com/Stan-Pugsley/is_4487_base/refs/heads/main/DataSets/aapl_stock_prices.csv"
df = pd.read_csv(url, sep=',')

In [None]:
# Preview the first five rows of the data
df.head()
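
Before modeling, it helps to confirm that every column is numeric and to count missing values, since `LinearRegression` raises an error when the data contains NaN. On the real data you would call `df.info()` and `df.isna().sum()`; the sketch below runs equivalent checks on a small hypothetical frame (the values are made up) so it works offline.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for df (values are made up)
sample = pd.DataFrame({
    "Close": [150.0, 151.2, 149.8, 152.3],
    "RSI": [np.nan, 55.2, 48.7, 61.0],
    "Next Day Closing": [151.2, 149.8, 152.3, np.nan],
})

# Column data types: every feature should be numeric (int or float)
print(sample.dtypes)

# Number of missing values per column
missing = sample.isna().sum()
print(missing)

# Rows that would survive dropna() before fitting
print("Complete rows:", len(sample.dropna()))
```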

Prepare Data

In [None]:
# Select the predictor columns described in the table above (all numeric)
features = ['Open', 'High', 'Low', 'Close', 'Volume', 'Direction', 'RSI', 'SMA_5']
target = 'Next Day Closing'

# Keep only the needed columns and drop any rows with missing values
# (such as early days without a full RSI/SMA window, or a last day with no next-day close);
# LinearRegression raises an error if the data contains NaN
df_model = df[features + [target]].dropna()

# Feature matrix and target variable
X = df_model[features]
y = df_model[target]

# Split into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Look at the shapes of the training and test sets
print("Rows and columns in X_train:", X_train.shape)
print("Rows and columns in X_test:", X_test.shape)
print("Rows in y_train:", y_train.shape)
print("Rows in y_test:", y_test.shape)
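
`train_test_split` shuffles rows by default, so with daily stock data the model can be trained on days that come after some of the test days. A common alternative for time-ordered data is to hold out the most recent rows instead. The sketch below assumes the rows are already sorted oldest to newest and passes `shuffle=False`, which keeps the original order and puts the last 20% of rows in the test set. The data is synthetic, and the variable names are chosen so they do not overwrite the notebook's `X_train`/`X_test`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical time-ordered data: 10 trading days, one feature
days = pd.DataFrame({"Close": np.arange(100.0, 110.0)})
days["Next Day Closing"] = days["Close"] + 1.0

X_days = days[["Close"]]
y_days = days["Next Day Closing"]

# shuffle=False keeps the row order, so the test set is the final 20% of days
X_tr, X_te, y_tr, y_te = train_test_split(X_days, y_days, test_size=0.2, shuffle=False)

print("Train days:", list(X_tr.index))
print("Test days:", list(X_te.index))
```

In the notebook, the equivalent change would be replacing `random_state=42` with `shuffle=False` in the split, assuming `df` is in date order.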

Create Model

In [None]:
# Create and train Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
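
After `fit`, the learned intercept and one coefficient per feature are stored in `model.intercept_` and `model.coef_`, and `predict` applies the linear equation to each row. The sketch below checks this on synthetic data generated from a known line (the intercept 3.0 and coefficients 2.0 and -0.5 are made up), so the fitted values should land close to them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from y = 3.0 + 2.0*x1 - 0.5*x2 plus small noise
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 + 2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + rng.normal(0, 0.1, 200)

demo_model = LinearRegression().fit(X_demo, y_demo)
print("Intercept:", round(demo_model.intercept_, 2))
print("Coefficients:", np.round(demo_model.coef_, 2))

# predict() is the intercept plus the dot product of features and coefficients
manual = demo_model.intercept_ + X_demo @ demo_model.coef_
print("Matches predict():", np.allclose(manual, demo_model.predict(X_demo)))
```

For the notebook's model, `pd.Series(model.coef_, index=X_train.columns)` lists the coefficient learned for each feature.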

Evaluate Model

In [None]:
# Evaluate
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print(f"R² score: {r2:.2f}")
print(f"Mean Squared Error: {mse:.2f}")
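
Computing the two metrics by hand makes their meaning concrete. MSE is the average squared difference between actual and predicted values (in squared price units, so its square root, RMSE, is easier to read), and R² is 1 minus the model's squared error divided by the squared error of always predicting the mean. The sketch below uses a few made-up values and checks the results against scikit-learn.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical actual and predicted next-day closing prices
actual = np.array([150.0, 152.0, 149.0, 155.0])
predicted = np.array([151.0, 151.0, 150.0, 154.0])

# MSE: mean of squared errors
mse_manual = np.mean((actual - predicted) ** 2)

# R²: 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MSE:  {mse_manual:.2f}  (sklearn: {mean_squared_error(actual, predicted):.2f})")
print(f"RMSE: {np.sqrt(mse_manual):.2f}")
print(f"R²:   {r2_manual:.3f}  (sklearn: {r2_score(actual, predicted):.3f})")
```

With stock prices it is also worth comparing against a naive baseline such as "tomorrow's close equals today's close". Assuming `Close` is one of the features, `mean_squared_error(y_test, X_test['Close'])` gives that baseline's MSE; if the regression's MSE is not clearly lower, the extra features add little.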

Create Visualization

In [None]:
# Plot actual vs predicted next-day closing prices
plt.figure(figsize=(8, 5))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.xlabel('Actual Next-Day Close')
plt.ylabel('Predicted Next-Day Close')
plt.title('Actual vs Predicted Next-Day Closing Prices')
# Dashed red reference line: points on it are perfect predictions (predicted = actual)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linestyle='--')
plt.tight_layout()
plt.show()
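
A residual plot is a common companion to the actual-vs-predicted chart: if the errors look like random scatter around zero, a linear fit is reasonable, while curves or funnel shapes suggest structure the model is missing. The sketch below uses synthetic values; in the notebook you would use `y_test` and `y_pred` in place of the hypothetical `actual_vals` and `pred_vals`.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical actual values and model predictions
rng = np.random.default_rng(1)
actual_vals = rng.uniform(140, 160, 50)
pred_vals = actual_vals + rng.normal(0, 1.5, 50)

# Residual = actual - predicted
residuals = actual_vals - pred_vals

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(pred_vals, residuals, alpha=0.5)
ax.axhline(0, color="red", linestyle="--")  # zero-error reference line
ax.set_xlabel("Predicted Next-Day Close")
ax.set_ylabel("Residual (Actual - Predicted)")
ax.set_title("Residuals vs Predicted Values")
fig.tight_layout()
```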