## **Multiple Linear Regression Notebook**


### Teaching Tips for the Module

- **Concept**: Multiple Linear Regression models the relationship between one dependent variable and **two or more independent variables**.
- **Formula**:  
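  \[ y = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n\]
  where \(y\) is the predicted value, \(b_0\) is the intercept, and \(b_1, \dots, b_n\) are the coefficients of the features \(x_1, \dots, x_n\).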
- **Dataset Insight**:
  - Columns: R&D Spend, Administration, Marketing Spend, State, Profit
  - “Profit” is predicted based on various spending areas and location.

- **Skills Students Will Learn**:
  - Handling categorical data with OneHotEncoding
  - Fitting multiple linear regression
  - Evaluating model performance using R² score
  - Making comparisons and visualizing results
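
The formula above can be checked numerically. The sketch below fits `LinearRegression` on a small synthetic dataset (the numbers are made up for illustration and are not taken from `50_Startups.csv`) and confirms that the model's predictions equal \(b_0 + b_1x_1 + b_2x_2\) computed by hand. The names `X_demo`, `y_demo`, and `manual_pred` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y depends on two features plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 2))
y_demo = 5.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=30)

model = LinearRegression().fit(X_demo, y_demo)

# b_0 is the intercept; b_1..b_n are the coefficients
b0 = model.intercept_
b = model.coef_

# Apply y = b_0 + b_1*x_1 + b_2*x_2 by hand
manual_pred = b0 + X_demo @ b

print("Intercept b_0:", b0)
print("Coefficients b_1, b_2:", b)
print("Matches model.predict:", np.allclose(manual_pred, model.predict(X_demo)))
```

The same `intercept_` and `coef_` attributes are available on the `regressor` object trained later in this notebook.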


This notebook:

1. Manually applies **One-Hot Encoding** to the **State** column using `pd.get_dummies` (without `ColumnTransformer`).
2. Applies **Feature Scaling** with `StandardScaler` to check whether standardization affects performance.
3. Follows the same structure as the previous notebooks.
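
To see what the manual encoding in step 1 produces, here is a minimal sketch on a toy frame. The state names and spend values are placeholders chosen for illustration, not rows from the dataset, and `dtype=int` is added only so the dummies print as 0/1 rather than booleans.

```python
import pandas as pd

# Toy frame (values assumed for illustration only)
demo = pd.DataFrame({
    "State": ["New York", "California", "Florida", "California"],
    "R&D Spend": [100.0, 200.0, 300.0, 400.0],
})

# One dummy column per state, minus the first category in sorted order
state_dummies = pd.get_dummies(demo["State"], drop_first=True, dtype=int)

# Replace the text column with the dummy columns
encoded = pd.concat([state_dummies, demo.drop("State", axis=1)], axis=1)
print(encoded)
```

Because categories are sorted before encoding, `California` is the dropped column here, so a row with zeros in both remaining dummy columns represents California.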


### Key Notes for Students:
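- **Why `drop_first=True`?** When the model has an intercept, a full set of dummy columns is perfectly collinear (the dummy variable trap). Dropping one column removes the redundancy, and the dropped category becomes the baseline.
- **Why Scale the Features?** For ordinary least squares, standardization does not change the predictions or the R² score; it only rescales the coefficients, which makes their magnitudes comparable across features. Scaling matters if you plan to extend to algorithms sensitive to feature magnitudes (e.g., gradient-based or regularized models).
- **Model Evaluation:** The R² score measures the proportion of variance in the target that the model explains; here it is computed on the test set, where 1.0 means a perfect fit.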
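
The notes on scaling and R² can be checked with a small experiment. The sketch below uses synthetic data (assumed for illustration, with two features on very different scales), fits ordinary least squares with and without `StandardScaler`, and computes R² by hand as \(1 - SS_{res} / SS_{tot}\) for comparison with `r2_score`. All variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data with features on very different scales (assumed for illustration)
rng = np.random.default_rng(42)
X_demo = np.column_stack([
    rng.uniform(0, 100_000, size=60),  # large-magnitude feature
    rng.uniform(0, 1, size=60),        # small-magnitude feature
])
y_demo = 10.0 + 0.5 * X_demo[:, 0] + 2000.0 * X_demo[:, 1] + rng.normal(scale=500.0, size=60)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Fit without scaling
pred_raw = LinearRegression().fit(X_tr, y_tr).predict(X_te)

# Fit with scaling (scaler fitted on the training split only)
scaler = StandardScaler().fit(X_tr)
pred_scaled = LinearRegression().fit(scaler.transform(X_tr), y_tr).predict(scaler.transform(X_te))

# R² by hand: 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_te - pred_raw) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("R² (raw features):   ", r2_score(y_te, pred_raw))
print("R² (scaled features):", r2_score(y_te, pred_scaled))
print("R² (by hand):        ", r2_manual)
```

The two fitted models should give the same predictions up to floating-point error, and the hand-computed value should agree with `r2_score`.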

In [None]:
# 1. Importing the Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

In [None]:
# 2. Loading the Dataset
dataset = pd.read_csv('50_Startups.csv')

In [None]:
# 3. Separating Features and Target
X = dataset.iloc[:, :-1]  # All columns except 'Profit'
y = dataset.iloc[:, -1]   # 'Profit' column

In [None]:
# 4. One-Hot Encoding the 'State' Column (manually)
state = pd.get_dummies(X['State'], drop_first=True)  # Drop first to avoid dummy variable trap
X = X.drop('State', axis=1)
X = pd.concat([state, X], axis=1)

In [None]:
# 5. Splitting the Dataset into Training and Test Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [None]:
# 6. Feature Scaling (fit on the training set only, so no test-set statistics leak into training)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
# 7. Training the Multiple Linear Regression Model
regressor = LinearRegression()
regressor.fit(X_train, y_train)

In [None]:
# 8. Making Predictions
y_pred = regressor.predict(X_test)

In [None]:
# 9. Comparing Actual vs Predicted
comparison = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(comparison)

In [None]:
# 10. Evaluating the Model
print("R² Score:", r2_score(y_test, y_pred))

In [None]:
# 11. Visualization (Optional)
plt.scatter(range(len(y_test)), y_test, color='blue', label='Actual')
plt.scatter(range(len(y_pred)), y_pred, color='red', label='Predicted')
plt.title('Actual vs Predicted Profits')
plt.xlabel('Sample Index')
plt.ylabel('Profit')
plt.legend()
plt.show()
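
As an optional extension, scaling and regression can be bundled so that the scaler only ever learns its statistics from the data passed to `fit`. The sketch below uses scikit-learn's `make_pipeline` on synthetic data (assumed for illustration, not the startup dataset); it is one possible arrangement, not the method used in the cells above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed for illustration): three numeric features
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 1000, size=(50, 3))
y_demo = 100.0 + X_demo @ np.array([0.8, 0.1, 0.3]) + rng.normal(scale=5.0, size=50)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# fit() scales using training statistics only; score() reuses them on the test split
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)

print("Test R²:", pipe.score(X_te, y_te))
```

For a regressor, `pipe.score` returns the R² score, the same metric as `r2_score` used above.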