# **Summer Session: Machine Learning for High School Students**  
## **Week 6: Linear Regression - Theory and Implementation**  

### **1. Introduction to Linear Regression (30 min)**  
**Objective:** Understand the mathematical foundations of linear regression, including different optimization methods.  

#### **1.1 What is Linear Regression?**  
- A supervised learning algorithm for predicting continuous numerical values.  
- **Example Applications:**  
  - Predicting house prices based on square footage.  
  - Estimating student test scores based on study hours.  

#### **1.2 Simple Linear Regression Equation**  
The model assumes a linear relationship between input `X` and output `y`:  

$$
y = \beta_0 + \beta_1 X + \epsilon
$$  

- $y$: Target variable (dependent variable).  
- $X$: Feature (independent variable).  
- $\beta_0$: Intercept (bias term).  
- $\beta_1$: Slope (coefficient).  
- $\epsilon$: Error term (the noise the line cannot explain; the residuals are its estimate from data).  

#### **1.3 Cost Function: Mean Squared Error (MSE)**  
The goal is to minimize the difference between predicted and actual values:  

$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

- $n$: Number of data points.  
- $y_i$: Actual value.  
- $\hat{y}_i$: Predicted value.  
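
To make the formula concrete, here is a minimal sketch that computes MSE by hand with NumPy. The function name `mse` and the toy numbers are assumptions chosen for illustration; they are not part of the lesson's datasets.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Toy example: three actual values and three predictions
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
print(mse(y_true, y_pred))  # residuals are 1, 0, -2, so this is (1 + 0 + 4) / 3
```

Squaring keeps positive and negative residuals from cancelling out and penalizes large misses more than small ones.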

#### **1.4 Optimization Methods**  

##### **Method 1: Normal Equation (Closed-Form Solution)**  
- Directly computes optimal coefficients without iteration.  
- Formula:  

$$
\beta = (X^T X)^{-1} X^T y
$$  

**Advantages:**  
✔ Exact solution (no approximation).  
✔ Works well for small datasets.  

**Disadvantages:**  
✖ Computationally expensive with many features ($O(p^3)$ to invert $X^T X$ for $p$ features).  
✖ Requires matrix inversion (fails if $X^T X$ is singular).  
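
The singular case can be seen with a duplicated feature column. The sketch below uses assumed toy data: `np.linalg.matrix_rank` shows that $X^T X$ is rank-deficient, and `np.linalg.lstsq` (a least-squares solver that does not invert $X^T X$) still returns a usable solution.

```python
import numpy as np

# Toy design matrix: bias column, one feature, and an exact copy of that feature
x = np.array([1.0, 2.0, 3.0, 4.0])
X_b = np.c_[np.ones(4), x, x]        # third column duplicates the second
y = 1.0 + 2.0 * x                    # true relationship: y = 1 + 2x

gram = X_b.T @ X_b                   # this is X^T X
print(np.linalg.matrix_rank(gram))   # rank is below 3, so X^T X is singular

# lstsq returns the minimum-norm least-squares solution instead of inverting
beta, residuals, rank, sv = np.linalg.lstsq(X_b, y, rcond=None)
print(beta)                          # the slope is shared between the two copies
print(np.allclose(X_b @ beta, y))    # predictions still reproduce y
```

Dropping one of the duplicated columns would be the simpler fix in practice; the solver route is shown only to illustrate that the predictions remain well defined.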

##### **Method 2: Gradient Descent (Iterative Approach)**  
- Gradually adjusts coefficients to minimize MSE.  
- Update rule:  

$$
\beta_j := \beta_j - \alpha \frac{\partial}{\partial \beta_j} MSE
$$

- $\alpha$: Learning rate (controls step size).  

**Advantages:**  
✔ Scalable for large datasets ($O(np)$ per iteration for $n$ data points and $p$ features).  
✔ Works well with high-dimensional data.  

**Disadvantages:**  
✖ Requires tuning learning rate.  
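✖ Sensitive to feature scale: unscaled features can make it converge slowly or diverge. (Local minima are not a concern: the MSE of linear regression is convex.)  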
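
For the simple model $\hat{y}_i = \beta_0 + \beta_1 x_i$, the partial derivatives in the update rule can be written out explicitly. The following is a worked derivation from the MSE definition in Section 1.3, using the chain rule on each squared term:

$$
\frac{\partial}{\partial \beta_0} MSE = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i), \qquad
\frac{\partial}{\partial \beta_1} MSE = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)\, x_i
$$

Stacking both derivatives into one vector gives the matrix form, where $X$ includes a leading column of ones for the intercept:

$$
\nabla_\beta MSE = \frac{2}{n} X^T (X\beta - y)
$$

This matrix form is what the expression `(2/n) * X_b.T @ (y_pred - y)` computes in the from-scratch implementation in Section 2.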

---

### **2. Implementing Linear Regression in Python (60 min)**  

#### **Method 1: Using Scikit-learn**  
```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()  # Solves least squares directly with a closed-form solver (no gradient descent)
model.fit(X_train, y_train)
print(f"Slope (β₁): {model.coef_[0]}, Intercept (β₀): {model.intercept_}")
```

#### **Method 2: Normal Equation from Scratch**  
```python
import numpy as np

def normal_equation(X, y):
    X_b = np.c_[np.ones((len(X), 1)), X]  # Add bias term (β₀)
    beta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
    return beta

# Usage
X = np.array([1000, 1500, 1200, 1800, 2000])
y = np.array([300000, 400000, 350000, 450000, 500000])

beta = normal_equation(X.reshape(-1, 1), y)
print(f"Intercept (β₀): {beta[0]}, Slope (β₁): {beta[1]}")
```

#### **Method 3: Gradient Descent from Scratch**  
```python
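import numpy as np

def gradient_descent(X, y, learning_rate=0.01, epochs=1000):
    n = len(X)
    X_b = np.c_[np.ones((n, 1)), X]  # Add bias term once, outside the loop
    beta = np.random.randn(2, 1)  # Random initialization

    for _ in range(epochs):
        y_pred = X_b @ beta
        gradients = (2/n) * X_b.T @ (y_pred - y)  # Gradient of MSE w.r.t. beta
        beta -= learning_rate * gradients

    return beta

# Raw square footage (1000-2000) makes the updates overflow at this learning rate,
# so standardize X first; the slope is then per standard deviation of X.
X_scaled = (X - X.mean()) / X.std()
beta = gradient_descent(X_scaled.reshape(-1, 1), y.reshape(-1, 1))
print(f"Intercept (β₀): {beta[0][0]}, Slope (β₁, scaled X): {beta[1][0]}")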
```

---

### **3. Hands-on Exercise (30 min)**  
**Task:** Compare Normal Equation and Gradient Descent on the `diabetes` dataset.  

1. Load the dataset:  
   ```python
   from sklearn.datasets import load_diabetes
   data = load_diabetes()
   X, y = data.data, data.target
   ```  
2. Implement both methods.  
3. Compare runtime and coefficients.  

**Discussion Questions:**  
- Which method is faster for small datasets?  
- What happens if $X^T X$ is non-invertible?  
- How does learning rate affect gradient descent?  
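
As a starting point for the last question, the sketch below runs batch gradient descent with three learning rates and reports the final MSE. The helper name `final_mse`, the synthetic data, and the specific learning rates are assumptions for illustration; the same loop can be pointed at the `diabetes` features once they are standardized.

```python
import numpy as np

def final_mse(X, y, learning_rate, epochs=200):
    """Run batch gradient descent and return the MSE after the last epoch."""
    n = len(X)
    X_b = np.c_[np.ones((n, 1)), X]          # add bias column
    beta = np.zeros((X_b.shape[1], 1))       # start from all zeros
    for _ in range(epochs):
        gradients = (2 / n) * X_b.T @ (X_b @ beta - y)
        beta -= learning_rate * gradients
    return float(np.mean((X_b @ beta - y) ** 2))

# Synthetic, already standardized data: y = 4 + 3x + small noise
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1))
y = 4 + 3 * X + 0.1 * rng.standard_normal((100, 1))

for lr in (0.001, 0.1, 1.5):
    print(f"learning_rate={lr}: final MSE = {final_mse(X, y, lr):.4g}")
```

In this setup the smallest rate has not finished converging after 200 epochs, the middle rate reaches the noise level, and the largest rate overshoots on every step so the error grows instead of shrinking.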

---

### **Summary**  
- **Theory:**  
  - Linear regression models relationships using coefficients.  
  - Normal equation is exact but becomes expensive as the number of features grows.  
  - Gradient descent is scalable but requires tuning.  
- **Coding:**  
  - Implemented regression using Scikit-learn, normal equation, and gradient descent.  
- **Next Week:** Decision Trees for classification!

# Part 01: Linear Regression using sklearn

In [140]:
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)

# Display predictions
print("\n🔮 Synthetic Regression Predictions:")
for i in range(5):
    print(f"Actual: {y_test[i]:.2f}, Predicted: {y_pred[i]:.2f}")



🔮 Synthetic Regression Predictions:
Actual: -48.95, Predicted: -58.67
Actual: 89.03, Predicted: 65.49
Actual: 44.41, Predicted: 36.05
Actual: -5.91, Predicted: -17.25
Actual: -7.62, Predicted: -10.26


In [141]:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load data
housing = fetch_california_housing()
X = housing.data
y = housing.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)

# Display predictions
print("\n🏡 California Housing Predictions:")
for i in range(5):
    print(f"Actual: {y_test[i]:.2f}, Predicted: {y_pred[i]:.2f}")



🏡 California Housing Predictions:
Actual: 0.48, Predicted: 0.72
Actual: 0.46, Predicted: 1.76
Actual: 5.00, Predicted: 2.71
Actual: 2.19, Predicted: 2.84
Actual: 2.78, Predicted: 2.60


# Using normal equation

In [142]:
import numpy as np

def normal_equation(X, y):
    # Add bias term (column of ones for intercept)
    X_b = np.c_[np.ones((len(X), 1)), X]
    beta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
    return beta

# Training data: square footage vs. house price
X_train = np.array([1000, 1500, 1200, 1800, 2000])
y_train = np.array([300000, 400000, 350000, 450000, 500000])

# Fit model using normal equation
beta = normal_equation(X_train.reshape(-1, 1), y_train)

# Display learned parameters
print(f"Intercept (β₀): {beta[0]:.2f}, Slope (β₁): {beta[1]:.2f}")

# Test data: new square footage values
X_test = np.array([1100, 1600, 2100])
X_test_b = np.c_[np.ones((len(X_test), 1)), X_test]  # Add bias term

# Predict house prices
y_pred = X_test_b @ beta

# Display predictions
print("\n🔮 Test Predictions:")
for sqft, price in zip(X_test, y_pred):
    print(f"Square Footage: {sqft}, Predicted Price: ${price:,.2f}")


Intercept (β₀): 113235.29, Slope (β₁): 191.18

🔮 Test Predictions:
Square Footage: 1100, Predicted Price: $323,529.41
Square Footage: 1600, Predicted Price: $419,117.65
Square Footage: 2100, Predicted Price: $514,705.88


In [143]:
# price = 500k
# price = 200k + 10 * num_rooms + 20 * sq_foot
# y = b_0 + b1 x1 + b2 x2  linear equation

# y = b0 + b11 x1 + b12 x1^2 + b21 x2 + b22 x2^2  # polynomial in the features (still linear in the coefficients)

In [144]:
# y= a + bx
#

In [145]:
# y = price
# we have three houses
# 500k
# 300k
# 200k

In [146]:
# X = sqft, num_bedroom, num_bathroom, num_swimming_pools
# #bed
# 2
# 3
# 4


# bath
# 1
# 2
# 5


# X = [ sqft, bedroom, bathroom, pools]




In [147]:
note = r"""

β = (X^T X)^(-1) X^T y


X^T = transpose of X


"""

X = np.array([1000, 1500, 1200, 1800, 2000])
X

array([1000, 1500, 1200, 1800, 2000])

In [148]:
X.T

array([1000, 1500, 1200, 1800, 2000])
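
The output above is unchanged because `.T` has no effect on a 1-D NumPy array. A brief sketch of how reshaping into a column gives the transpose something to do (the variable name `X_col` is introduced here for illustration):

```python
import numpy as np

X = np.array([1000, 1500, 1200, 1800, 2000])
print(X.shape, X.T.shape)          # both are (5,): a 1-D array has no rows vs. columns

X_col = X.reshape(-1, 1)           # 5 rows, 1 column
print(X_col.shape, X_col.T.shape)  # (5, 1) and (1, 5)

# With a 2-D column, X^T X from the normal equation is a proper 1x1 matrix
print(X_col.T @ X_col)
```

This is why the `normal_equation` cell calls `X_train.reshape(-1, 1)` before building the design matrix.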

In [149]:
# Scikit-learn

In [150]:
import sklearn

In [151]:
sklearn.__version__

'1.6.1'

In [152]:
from sklearn import linear_model

In [153]:
print( [i for i in dir(linear_model) if i[0] != '_'])

['ARDRegression', 'BayesianRidge', 'ElasticNet', 'ElasticNetCV', 'GammaRegressor', 'HuberRegressor', 'Lars', 'LarsCV', 'Lasso', 'LassoCV', 'LassoLars', 'LassoLarsCV', 'LassoLarsIC', 'LinearRegression', 'LogisticRegression', 'LogisticRegressionCV', 'MultiTaskElasticNet', 'MultiTaskElasticNetCV', 'MultiTaskLasso', 'MultiTaskLassoCV', 'OrthogonalMatchingPursuit', 'OrthogonalMatchingPursuitCV', 'PassiveAggressiveClassifier', 'PassiveAggressiveRegressor', 'Perceptron', 'PoissonRegressor', 'QuantileRegressor', 'RANSACRegressor', 'Ridge', 'RidgeCV', 'RidgeClassifier', 'RidgeClassifierCV', 'SGDClassifier', 'SGDOneClassSVM', 'SGDRegressor', 'TheilSenRegressor', 'TweedieRegressor', 'enet_path', 'lars_path', 'lars_path_gram', 'lasso_path', 'orthogonal_mp', 'orthogonal_mp_gram', 'ridge_regression']


In [154]:
from sklearn.linear_model import LinearRegression

# Part 01: Read the data

In [155]:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load data
housing = fetch_california_housing()
X = housing.data
y = housing.target



In [156]:
type(X)

numpy.ndarray

In [157]:
X.shape

(20640, 8)

In [158]:
cols = housing.feature_names
cols

['MedInc',
 'HouseAge',
 'AveRooms',
 'AveBedrms',
 'Population',
 'AveOccup',
 'Latitude',
 'Longitude']

In [159]:
target = housing.target_names[0]
target

'MedHouseVal'

In [160]:
import pandas as pd

In [161]:
df = pd.DataFrame(X, columns=cols)
df.head()

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


In [162]:
target_series = pd.Series(y, name=target)
target_series.head()

|   | MedHouseVal |
|---|---|
| 0 | 4.526 |
| 1 | 3.585 |
| 2 | 3.521 |
| 3 | 3.413 |
| 4 | 3.422 |


In [163]:
data = pd.concat([df, target_series], axis=1)
data.head()

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |


In [164]:
housing.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])

In [165]:
print(housing.DESCR)

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
    - MedInc        median income in block group
    - HouseAge      median house age in block group
    - AveRooms      average number of rooms per household
    - AveBedrms     average number of bedrooms per household
    - Population    block group population
    - AveOccup      average number of household members
    - Latitude      block group latitude
    - Longitude     block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per ce

# Data Manipulation

In [166]:
# Add feature0 * feature1 and feature0^2 to the NumPy array X

feat1 = X[:, 0]
feat2 = X[:, 1]

feat12 = feat1 * feat2
feat12

array([341.3332, 174.3294, 377.3848, ...,  28.9   ,  33.6096,  38.2176])

In [167]:
feat13 = feat1 ** 2
feat13

array([69.30895504, 68.91324196, 52.66985476, ...,  2.89      ,
        3.48643584,  5.70540996])

In [168]:
X = np.c_[X, feat12, feat13]
X

array([[   8.3252    ,   41.        ,    6.98412698, ..., -122.23      ,
         341.3332    ,   69.30895504],
       [   8.3014    ,   21.        ,    6.23813708, ..., -122.22      ,
         174.3294    ,   68.91324196],
       [   7.2574    ,   52.        ,    8.28813559, ..., -122.24      ,
         377.3848    ,   52.66985476],
       ...,
       [   1.7       ,   17.        ,    5.20554273, ..., -121.22      ,
          28.9       ,    2.89      ],
       [   1.8672    ,   18.        ,    5.32951289, ..., -121.32      ,
          33.6096    ,    3.48643584],
       [   2.3886    ,   16.        ,    5.25471698, ..., -121.24      ,
          38.2176    ,    5.70540996]])

In [169]:
# np.c_ or np.r_ # c = column concatenate, r is row concatenate
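
A small sketch of the difference between the two helpers, using toy arrays `a`, `b`, and `M` that are assumptions for illustration rather than part of the housing data:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.c_[a, b])        # placed side by side as columns, shape (3, 2)
print(np.r_[a, b])        # joined end to end, shape (6,)

# Appending a derived feature column to a 2-D matrix, as done with X above
M = np.array([[1.0, 2.0], [3.0, 4.0]])
new_col = M[:, 0] * M[:, 1]   # product of the first two columns
print(np.c_[M, new_col])      # shape (2, 3)
```

`np.c_` is the one that matters for feature engineering here, because each new feature must line up row by row with the existing samples.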

In [170]:
cols = cols + ['feature0 * feature1', 'feature0^2']
cols

['MedInc',
 'HouseAge',
 'AveRooms',
 'AveBedrms',
 'Population',
 'AveOccup',
 'Latitude',
 'Longitude',
 'feature0 * feature1',
 'feature0^2']

# Part 02: Split train and test data (data preparation)

In [171]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [172]:
X_train

array([[   3.2596    ,   33.        ,    5.0176565 , ..., -117.03      ,
         107.5668    ,   10.62499216],
       [   3.8125    ,   49.        ,    4.47354497, ..., -118.16      ,
         186.8125    ,   14.53515625],
       [   4.1563    ,    4.        ,    5.64583333, ..., -120.48      ,
          16.6252    ,   17.27482969],
       ...,
       [   2.9344    ,   36.        ,    3.98671727, ..., -118.38      ,
         105.6384    ,    8.61070336],
       [   5.7192    ,   15.        ,    6.39534884, ..., -121.96      ,
          85.788     ,   32.70924864],
       [   2.5755    ,   52.        ,    3.40257649, ..., -122.42      ,
         133.926     ,    6.63320025]])

In [173]:
X_train_df = pd.DataFrame(X_train, columns=cols)
X_train_df.head()

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | feature0 * feature1 | feature0^2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.2596 | 33.0 | 5.017657 | 1.006421 | 2300.0 | 3.691814 | 32.71 | -117.03 | 107.5668 | 10.624992 |
| 1 | 3.8125 | 49.0 | 4.473545 | 1.041005 | 1314.0 | 1.738095 | 33.77 | -118.16 | 186.8125 | 14.535156 |
| 2 | 4.1563 | 4.0 | 5.645833 | 0.985119 | 915.0 | 2.723214 | 34.66 | -120.48 | 16.6252 | 17.27483 |
| 3 | 1.9425 | 36.0 | 4.002817 | 1.033803 | 1418.0 | 3.994366 | 32.69 | -117.11 | 69.93 | 3.773306 |
| 4 | 3.5542 | 43.0 | 6.268421 | 1.134211 | 874.0 | 2.3 | 36.78 | -119.8 | 152.8306 | 12.632338 |


In [174]:
X_test_df = pd.DataFrame(X_test, columns=cols)
X_test_df.head()

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | feature0 * feature1 | feature0^2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.6812 | 25.0 | 4.192201 | 1.022284 | 1392.0 | 3.877437 | 36.06 | -119.01 | 42.03 | 2.826433 |
| 1 | 2.5313 | 30.0 | 5.039384 | 1.193493 | 1565.0 | 2.679795 | 35.14 | -119.46 | 75.939 | 6.40748 |
| 2 | 3.4801 | 52.0 | 3.977155 | 1.185877 | 1310.0 | 1.360332 | 37.8 | -122.44 | 180.9652 | 12.111096 |
| 3 | 5.7376 | 17.0 | 6.163636 | 1.020202 | 1705.0 | 3.444444 | 34.28 | -118.72 | 97.5392 | 32.920054 |
| 4 | 3.725 | 34.0 | 5.492991 | 1.028037 | 1063.0 | 2.483645 | 36.62 | -121.93 | 126.65 | 13.875625 |


In [175]:
X_train.shape, X_test.shape

((16512, 10), (4128, 10))

# Part 03: Modelling

In [176]:
# Train model
model = LinearRegression()
model.fit(X_train, y_train)

In [177]:
# help(LinearRegression)

# Part 04: Prediction

In [178]:
X_test[0]

array([ 1.68120000e+00,  2.50000000e+01,  4.19220056e+00,  1.02228412e+00,
        1.39200000e+03,  3.87743733e+00,  3.60600000e+01, -1.19010000e+02,
        4.20300000e+01,  2.82643344e+00])

In [179]:
X_test.shape

(4128, 10)

In [180]:
X_test_df.head(4)

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | feature0 * feature1 | feature0^2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.6812 | 25.0 | 4.192201 | 1.022284 | 1392.0 | 3.877437 | 36.06 | -119.01 | 42.03 | 2.826433 |
| 1 | 2.5313 | 30.0 | 5.039384 | 1.193493 | 1565.0 | 2.679795 | 35.14 | -119.46 | 75.939 | 6.40748 |
| 2 | 3.4801 | 52.0 | 3.977155 | 1.185877 | 1310.0 | 1.360332 | 37.8 | -122.44 | 180.9652 | 12.111096 |
| 3 | 5.7376 | 17.0 | 6.163636 | 1.020202 | 1705.0 | 3.444444 | 34.28 | -118.72 | 97.5392 | 32.920054 |


In [181]:
# Predict on test set
y_pred = model.predict(X_test)

# Display predictions
print("\n🏡 California Housing Predictions:")
for i in range(5):
    print(f"Actual: {y_test[i]:.2f}, Predicted: {y_pred[i]:.2f}")


🏡 California Housing Predictions:
Actual: 0.48, Predicted: 0.65
Actual: 0.46, Predicted: 1.73
Actual: 5.00, Predicted: 2.76
Actual: 2.19, Predicted: 2.86
Actual: 2.78, Predicted: 2.62


In [182]:
y_pred

array([0.64650484, 1.73069768, 2.76183836, ..., 4.33119992, 1.16786446,
       2.05926499])

In [183]:
target

'MedHouseVal'

In [184]:
y_pred_df = pd.Series(y_pred, name='predicted'+ "_" + target)
y_pred_df.head()

|   | predicted_MedHouseVal |
|---|---|
| 0 | 0.646505 |
| 1 | 1.730698 |
| 2 | 2.761838 |
| 3 | 2.860529 |
| 4 | 2.616051 |


In [185]:
df_prediction = pd.concat([X_test_df, y_pred_df], axis=1)
df_prediction.head()

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | feature0 * feature1 | feature0^2 | predicted_MedHouseVal |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.6812 | 25.0 | 4.192201 | 1.022284 | 1392.0 | 3.877437 | 36.06 | -119.01 | 42.03 | 2.826433 | 0.646505 |
| 1 | 2.5313 | 30.0 | 5.039384 | 1.193493 | 1565.0 | 2.679795 | 35.14 | -119.46 | 75.939 | 6.40748 | 1.730698 |
| 2 | 3.4801 | 52.0 | 3.977155 | 1.185877 | 1310.0 | 1.360332 | 37.8 | -122.44 | 180.9652 | 12.111096 | 2.761838 |
| 3 | 5.7376 | 17.0 | 6.163636 | 1.020202 | 1705.0 | 3.444444 | 34.28 | -118.72 | 97.5392 | 32.920054 | 2.860529 |
| 4 | 3.725 | 34.0 | 5.492991 | 1.028037 | 1063.0 | 2.483645 | 36.62 | -121.93 | 126.65 | 13.875625 | 2.616051 |


# Part 05: Model Evaluation

In [186]:
y_test

array([0.477  , 0.458  , 5.00001, ..., 5.00001, 0.723  , 1.515  ])

In [187]:
y_test_ser = pd.Series(y_test, name=target)
y_test_ser.head()

|   | MedHouseVal |
|---|---|
| 0 | 0.477 |
| 1 | 0.458 |
| 2 | 5.00001 |
| 3 | 2.186 |
| 4 | 2.78 |


In [188]:
df_pred = pd.concat([df_prediction, y_test_ser], axis=1)
df_pred.head()

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | feature0 * feature1 | feature0^2 | predicted_MedHouseVal | MedHouseVal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.6812 | 25.0 | 4.192201 | 1.022284 | 1392.0 | 3.877437 | 36.06 | -119.01 | 42.03 | 2.826433 | 0.646505 | 0.477 |
| 1 | 2.5313 | 30.0 | 5.039384 | 1.193493 | 1565.0 | 2.679795 | 35.14 | -119.46 | 75.939 | 6.40748 | 1.730698 | 0.458 |
| 2 | 3.4801 | 52.0 | 3.977155 | 1.185877 | 1310.0 | 1.360332 | 37.8 | -122.44 | 180.9652 | 12.111096 | 2.761838 | 5.00001 |
| 3 | 5.7376 | 17.0 | 6.163636 | 1.020202 | 1705.0 | 3.444444 | 34.28 | -118.72 | 97.5392 | 32.920054 | 2.860529 | 2.186 |
| 4 | 3.725 | 34.0 | 5.492991 | 1.028037 | 1063.0 | 2.483645 | 36.62 | -121.93 | 126.65 | 13.875625 | 2.616051 | 2.78 |


In [189]:
dfx = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
dfx.head()

|   | Actual | Predicted |
|---|---|---|
| 0 | 0.477 | 0.646505 |
| 1 | 0.458 | 1.730698 |
| 2 | 5.00001 | 2.761838 |
| 3 | 2.186 | 2.860529 |
| 4 | 2.78 | 2.616051 |


In [190]:
dfx['difference'] = dfx['Actual'] - dfx['Predicted']
dfx.head()

|   | Actual | Predicted | difference |
|---|---|---|---|
| 0 | 0.477 | 0.646505 | -0.169505 |
| 1 | 0.458 | 1.730698 | -1.272698 |
| 2 | 5.00001 | 2.761838 | 2.238172 |
| 3 | 2.186 | 2.860529 | -0.674529 |
| 4 | 2.78 | 2.616051 | 0.163949 |


In [191]:
dfx['difference_abs'] = dfx['difference'].abs()
dfx.head()

|   | Actual | Predicted | difference | difference_abs |
|---|---|---|---|---|
| 0 | 0.477 | 0.646505 | -0.169505 | 0.169505 |
| 1 | 0.458 | 1.730698 | -1.272698 | 1.272698 |
| 2 | 5.00001 | 2.761838 | 2.238172 | 2.238172 |
| 3 | 2.186 | 2.860529 | -0.674529 | 0.674529 |
| 4 | 2.78 | 2.616051 | 0.163949 | 0.163949 |


In [192]:
dfx['difference_abs'].mean()

np.float64(0.531334531490923)

In [193]:
dfx['difference_square'] = dfx['difference'] ** 2
dfx.head()

|   | Actual | Predicted | difference | difference_abs | difference_square |
|---|---|---|---|---|---|
| 0 | 0.477 | 0.646505 | -0.169505 | 0.169505 | 0.028732 |
| 1 | 0.458 | 1.730698 | -1.272698 | 1.272698 | 1.619759 |
| 2 | 5.00001 | 2.761838 | 2.238172 | 2.238172 | 5.009412 |
| 3 | 2.186 | 2.860529 | -0.674529 | 0.674529 | 0.45499 |
| 4 | 2.78 | 2.616051 | 0.163949 | 0.163949 | 0.026879 |


In [194]:
dfx['difference_square'].mean()

np.float64(0.5533710793681178)

In [195]:
dfx['difference_square'].mean() ** 0.5

np.float64(0.7438891579853263)
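
The cells above build the metric one DataFrame column at a time. The same four steps can be sketched in plain NumPy on the five rows displayed in the tables; because only five rows are used here, the numbers differ from the full test-set values above.

```python
import numpy as np

# First five test rows shown above (actual vs. predicted MedHouseVal)
y_true = np.array([0.477, 0.458, 5.00001, 2.186, 2.78])
y_pred = np.array([0.646505, 1.730698, 2.761838, 2.860529, 2.616051])

error = y_true - y_pred    # E: error per row
squared = error ** 2       # S: squared error per row
mean = squared.mean()      # M: mean of the squared errors (this is the MSE)
root = np.sqrt(mean)       # R: root of the mean (this is the RMSE)

print(f"MSE on these 5 rows:  {mean:.4f}")
print(f"RMSE on these 5 rows: {root:.4f}")
```

Taking the root brings the metric back to the units of the target, which makes RMSE easier to interpret than MSE.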

In [196]:
note = r"""
RMSE, read from the bottom up:

root    = square root of the mean
mean    = average of the squared errors
squared = (y_true - y_pred) ** 2
error   = y_true - y_pred


"""

In [197]:
from sklearn.metrics import mean_squared_error

In [198]:
mean_squared_error(y_test, y_pred)

0.5533710793681178

In [199]:
from sklearn.metrics import root_mean_squared_error

In [200]:
root_mean_squared_error(y_test, y_pred)  # earlier run (modelA below): 0.7455813830127749

0.7438891579853263

In [201]:
# modelA ==> RMSE = 0.7455813830127749
# modelB ==> RMSE = 0.7438891579853263

# WHICH IS BETTER?
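
One way to start answering: RMSE measures the typical prediction error, so the lower value wins. The sketch below compares the two numbers from the note above; the variable names are assumptions introduced for illustration.

```python
rmse_a = 0.7455813830127749   # modelA, copied from the note above
rmse_b = 0.7438891579853263   # modelB, copied from the note above

# Lower RMSE means smaller typical error on the test set
better = "modelB" if rmse_b < rmse_a else "modelA"
relative_gain = (rmse_a - rmse_b) / rmse_a

print(f"Lower RMSE: {better}")
print(f"Relative reduction in RMSE: {relative_gain:.2%}")
```

Since the target is expressed in hundreds of thousands of dollars, the gap of roughly 0.0017 corresponds to about $170 of typical error. Whether an improvement that small justifies the two extra features is a judgement call, and a single train/test split may not be enough to settle it.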