2. Linear Regression

1) Linear Regression
There are two types: simple linear regression (one feature) and multiple linear regression (two or more features).
In practice we usually use multiple linear regression, because real datasets rarely have only one feature.
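
To make the difference concrete, here is a minimal sketch that fits both kinds of model with ordinary least squares in NumPy. The data is synthetic and noise-free, and the helper name `fit_linear` is an assumption made up for this illustration; it is not part of the notebook's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Made-up, noise-free target built from three features plus an intercept
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + 4.0

def fit_linear(features, target):
    """Least-squares fit; returns (coefficients, intercept)."""
    # Append a column of ones so the last solved value is the intercept
    design = np.column_stack([features, np.ones(len(features))])
    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    return solution[:-1], solution[-1]

# Simple linear regression: a single feature
coef_simple, intercept_simple = fit_linear(X[:, [0]], y)

# Multiple linear regression: all three features
coef_multi, intercept_multi = fit_linear(X, y)

print(coef_simple, intercept_simple)
print(coef_multi, intercept_multi)
```

With all three features the fit should recover the made-up coefficients, while the one-feature model can only explain the part of the target that depends on that feature.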

2) Cost Function
The cost function (also called loss function or objective function) helps us measure the distance between the data and the model.
It calculates the error (the difference between the real values and predicted values).
We try to find the best settings (parameters) that make the mean squared error as small as possible.
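
As a rough illustration of the cost function described above, the sketch below computes the mean squared error by hand. The values and the helper name `mse` are made up for this example and have nothing to do with the diabetes dataset.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Made-up values, for illustration only
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Differences are 1, 0 and -2, so the cost is (1 + 0 + 4) / 3
cost = mse(y_true, y_pred)
print(cost)
```

This should agree with `sklearn.metrics.mean_squared_error`, which the cells below use.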

3) Gradient Descent
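This is an iterative way to make the error smaller.
Starting from some initial parameters, we repeatedly move them a small step in the direction opposite to the slope (gradient) of the cost function, until the slope is close to zero. At that point the cost is at a minimum, which gives us a model with less error.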
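
The idea can be sketched in a few lines of NumPy. This is only a minimal illustration on synthetic data: the learning rate, the number of steps, and the data itself are assumptions chosen for the example. Note that scikit-learn's `LinearRegression` used below solves the least-squares problem directly rather than by gradient descent.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.05, size=100)

w, b = 0.0, 0.0        # arbitrary starting parameters
learning_rate = 0.1    # step size (hypothetical choice)
losses = []

for _ in range(2000):
    error = (w * x + b) - y
    losses.append(np.mean(error ** 2))    # current value of the MSE cost
    # Gradients of the MSE with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Step against the gradient to reduce the cost
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2), losses[0], losses[-1])
```

The recorded `losses` should fall from the first step to the last, and `w` and `b` should end up close to the slope and intercept used to generate the data.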

In [2]:
# Import libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


In [3]:
# import Data

from sklearn.datasets import load_diabetes

def make_dataset():
    dataset = load_diabetes()
    
    df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
    df['target'] = dataset.target 
    
    X_train, X_test, y_train, y_test = train_test_split(
        df.drop('target', axis=1), df['target'], test_size=0.2, random_state=1--4)
    
    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = make_dataset()
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((353, 10), (89, 10), (353,), (89,))

In [4]:
# Linear Regression

from sklearn.linear_model import LinearRegression

model = LinearRegression()

model.fit(X_train, y_train)

pred = model.predict(X_test)

mean_squared_error(y_test, pred)


2981.5854714667616

3. Ridge Regression

- Ridge regression is a model used to fix the problem of overfitting by adding regularization.
- The regularization method used is called L2 regularization.
- We can adjust the regularization by changing the alpha value.
- As the alpha value gets bigger, it makes the regression coefficients smaller.
- A well-chosen alpha helps the model generalize to new data; an alpha that is too large shrinks the coefficients so much that the model underfits.
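
The shrinking effect of alpha can be seen in a small sketch. It uses the closed-form ridge solution without an intercept on synthetic data; the helper name `ridge_fit` and the data are assumptions for illustration, and scikit-learn's `Ridge` additionally fits an intercept by default.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([3.0, -2.0, 0.5])    # made-up coefficients
y = X @ true_w + rng.normal(scale=0.1, size=50)

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution without intercept: (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

norms = []
for alpha in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_fit(X, y, alpha)
    norms.append(np.linalg.norm(w))    # overall size of the coefficient vector
    print(alpha, np.round(w, 3))
```

The length of the coefficient vector gets smaller as alpha grows, although an individual coefficient does not have to shrink at every step.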

In [5]:
# Ridge Regression

from sklearn.linear_model import Ridge

model = Ridge(alpha=1)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mean_squared_error(y_test, pred)

3584.0955479468957

In [6]:
# Coefficients

coef = pd.DataFrame(data=model.coef_, index=X_train.columns, columns=['alpha1'])
coef

feature,alpha1
age,34.980295
sex,-70.418618
bmi,289.990899
bp,197.301093
s1,14.361706
s2,-6.028385
s3,-140.961042
s4,112.855692
s5,215.648129
s6,95.414386


In [7]:
# Ridge Regression (alpha=10)

model = Ridge(alpha=10)
model.fit(X_train, y_train)
pred = model.predict(X_test)
mean_squared_error(y_test, pred)


5312.538476807287

In [8]:
# Coefficients (alpha=10)

coef['alpha10'] = model.coef_ 
coef

feature,alpha1,alpha10
age,34.980295,18.239772
sex,-70.418618,-0.179466
bmi,289.990899,64.638525
bp,197.301093,48.296866
s1,14.361706,16.116114
s2,-6.028385,12.650269
s3,-140.961042,-39.044831
s4,112.855692,38.730542
s5,215.648129,54.157253
s6,95.414386,34.314453


In [9]:
# Ridge Regression (alpha=0.05)

model = Ridge(alpha=0.05)
model.fit(X_train, y_train)
pred = model.predict(X_test)
mean_squared_error(y_test, pred)


3011.220004888878

In [10]:
# Coefficients (alpha=0.05)

coef['alpha0.05'] = model.coef_ 
coef

feature,alpha1,alpha10,alpha0.05
age,34.980295,18.239772,10.155109
sex,-70.418618,-0.179466,-228.627783
bmi,289.990899,64.638525,510.650019
bp,197.301093,48.296866,331.433858
s1,14.361706,16.116114,-120.923424
s2,-6.028385,12.650269,-5.901966
s3,-140.961042,-39.044831,-181.838744
s4,112.855692,38.730542,132.533346
s5,215.648129,54.157253,407.55023
s6,95.414386,34.314453,51.392493


4. Lasso Regression

- Lasso regression is a model that also helps with the problem of overfitting by using regularization.
- The regularization method here is called L1 regularization (it helps pick only the most important features).
- Compared to L2, which only shrinks coefficients toward zero, L1 can set the coefficients of less useful features to exactly zero, so only the features that matter stay in the model.

Quick Reminder:
Ridge Regression uses L2 regularization.
Lasso Regression uses L1 regularization, which picks the important features and leaves out the rest.
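
The feature-selection effect can be checked on synthetic data where only two of ten features actually influence the target. This is a minimal sketch; the data and the alpha value are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features influence the target (made-up setup)
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Count coefficients that are exactly zero in each model
lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))

print("Lasso:", np.round(lasso.coef_, 2), "zeros:", lasso_zeros)
print("Ridge:", np.round(ridge.coef_, 2), "zeros:", ridge_zeros)
```

Lasso is expected to zero out most of the irrelevant features, while Ridge keeps small non-zero values for all of them.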

In [11]:
# Lasso Regression

from sklearn.linear_model import Lasso

model = Lasso(alpha =1)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mean_squared_error(y_test, pred)


4157.679048082886

In [12]:
# Coefficients (alpha=1)

coef = pd.DataFrame(data = model.coef_, index = X_train.columns, columns = ['alpha1'])
coef

feature,alpha1
age,0.0
sex,-0.0
bmi,405.016266
bp,39.860492
s1,0.0
s2,0.0
s3,-0.0
s4,0.0
s5,220.846547
s6,0.0


In [13]:
# Lasso Regression (alpha=2)

model = Lasso(alpha =2)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mean_squared_error(y_test, pred)


6043.327017075469

In [14]:
# Coefficients (alpha=2)

coef['alpha2'] = model.coef_
coef

feature,alpha1,alpha2
age,0.0,0.0
sex,-0.0,0.0
bmi,405.016266,83.332652
bp,39.860492,0.0
s1,0.0,0.0
s2,0.0,0.0
s3,-0.0,-0.0
s4,0.0,0.0
s5,220.846547,0.0
s6,0.0,0.0


In [15]:
# Lasso Regression (alpha=0.05)

model = Lasso(alpha =0.05)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mean_squared_error(y_test, pred)

3026.9096979632095

In [16]:
# Coefficients (alpha=0.05)

coef['alpha0.05'] = model.coef_
coef

feature,alpha1,alpha2,alpha0.05
age,0.0,0.0,0.0
sex,-0.0,0.0,-199.056361
bmi,405.016266,83.332652,533.170336
bp,39.860492,0.0,319.457884
s1,0.0,0.0,-40.581851
s2,0.0,0.0,-0.0
s3,-0.0,-0.0,-254.140254
s4,0.0,0.0,4.901003
s5,220.846547,0.0,423.892247
s6,0.0,0.0,21.973553


5. ElasticNet Regression

- ElasticNet regression helps with overfitting by combining L2 and L1 regularization.
- It can take longer to tune, because two settings have to be chosen: the overall strength (alpha) and the mix between L1 and L2 (l1_ratio).

Quick Reminder:
Ridge Regression uses L2 regularization.
Lasso Regression uses L1 regularization (picks important features).
ElasticNet Regression uses both L2 + L1 regularization.
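
In scikit-learn the mix between the two penalties is controlled by `l1_ratio` (default 0.5). The sketch below, on synthetic data assumed for illustration, checks that `l1_ratio=1.0` behaves like Lasso and shows a half-and-half mix next to it.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Made-up target that depends on the first two features only
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
enet_l1 = ElasticNet(alpha=0.5, l1_ratio=1.0).fit(X, y)   # pure L1 penalty
enet_mix = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)  # half L1, half L2

same_as_lasso = np.allclose(lasso.coef_, enet_l1.coef_)

print("pure L1 matches Lasso:", same_as_lasso)
print("mixed penalty:", np.round(enet_mix.coef_, 2))
```

With `l1_ratio` between 0 and 1 the model keeps some of Lasso's ability to zero out coefficients while also shrinking the remaining ones as Ridge does.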

In [17]:
# ElasticNet Regression

from sklearn.linear_model import ElasticNet

model = ElasticNet(alpha = 1)

model.fit(X_train, y_train)

pred = model.predict(X_test)

mean_squared_error(y_test, pred)

6302.844051697682

In [18]:
# ElasticNet Regression with alpha and l1_ratio

from sklearn.linear_model import ElasticNet

model = ElasticNet(alpha = 0.0001, l1_ratio = 0.6)

model.fit(X_train, y_train)

pred = model.predict(X_test)

mean_squared_error(y_test, pred)

2988.9507985386003

6. RandomForest vs XGBoost

1) RandomForest

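It uses many decision trees.
It's a bagging method: each tree is trained on a random sample drawn from the data.
At the end, the trees' outputs are combined: a majority vote for classification, or the average of the trees' predictions for regression (as used here).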
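
For regression, the way a forest combines its trees can be checked directly. The sketch below uses synthetic data (an assumption for illustration) and compares the forest's prediction with a hand-computed average over the fitted trees in `estimators_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# Made-up non-linear target
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

X_new = rng.normal(size=(5, 4))
# Prediction of each individual tree, shape (n_trees, n_samples)
per_tree = np.array([tree.predict(X_new) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

matches = np.allclose(manual_average, forest.predict(X_new))
print(matches)
```

The individual trees usually disagree with each other; averaging them is what smooths out their errors.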

2) XGBoost

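XGBoost is a very powerful tree-based method.
It stands for eXtreme Gradient Boosting.
It builds weak learners (shallow trees) one after another, each one trained to correct the errors of the trees before it, so that together they form a stronger model.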
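
The boosting idea can be sketched with plain scikit-learn trees. This is a simplified gradient boosting loop for squared error, not XGBoost's actual algorithm (which adds regularization and other refinements); the data, tree depth, learning rate, and number of rounds are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)   # made-up target

learning_rate = 0.1
prediction = np.full_like(y, y.mean())    # start from a constant model
train_mse = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residual = y - prediction             # what the model still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)                 # weak learner fits the residual
    prediction += learning_rate * tree.predict(X)   # small corrective step
    train_mse.append(np.mean((y - prediction) ** 2))

print(train_mse[0], train_mse[-1])
```

Each shallow tree is a poor model on its own, but the training error falls as more of them are added.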

In [19]:

# RandomForest Regression 

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(X_train, y_train)
pred = model.predict(X_test)

mean_squared_error(y_test, pred)


3030.605039325843

In [20]:
# XGBoost


from xgboost import XGBRegressor

model = XGBRegressor()
model.fit(X_train, y_train)
pred = model.predict(X_test)

mean_squared_error(y_test, pred)




4260.163796212525

7. Improving Model Performance with Hyperparameter Tuning

You can use GridSearchCV and RandomizedSearchCV from scikit-learn to find the best settings for your model.

1) GridSearchCV

It tries all possible combinations of settings to find the best one.
This can take a lot of time.

2) RandomizedSearchCV
Instead of checking all combinations, it randomly samples N combinations (the n_iter setting) and tries only those.
It's usually faster, and it often still finds good settings.
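
The difference in cost can be made concrete by counting candidates with scikit-learn's `ParameterGrid` and `ParameterSampler` helpers. The search space here is a hypothetical one with four settings, chosen for illustration.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical search space, for illustration only
params = {
    'learning_rate': [0.07, 0.05],
    'max_depth': [3, 5, 7],
    'n_estimators': [100, 200],
    'subsample': [0.9, 0.8, 0.7],
}

# Grid search evaluates every combination: 2 * 3 * 2 * 3 = 36
grid_candidates = list(ParameterGrid(params))

# Randomized search evaluates only n_iter randomly chosen combinations
random_candidates = list(ParameterSampler(params, n_iter=10, random_state=0))

print(len(grid_candidates), len(random_candidates))   # 36 10
```

Each candidate is fitted once per cross-validation fold, so with cv=3 the grid would need 108 fits and the randomized search 30.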

In [21]:
# Import GridSearchCV and RandomizedSearchCV

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


In [22]:
# Hyperparameters

params = {# 'learning_rate':[0.07, 0.05],
          'max_depth': [3,5,7],
          'n_estimators': [100,200],
          # 'subsample': [0.9, 0.8, 0.7]
          }

In [23]:
# Loading Dataset


def make_dataset2():
    dataset = load_diabetes()
    df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
    df['target'] = dataset.target
    return df.drop('target', axis=1), df['target']

X,y = make_dataset2()


In [24]:
# GridSearchCV

xgb = XGBRegressor()
grid = GridSearchCV(xgb, params, cv=3, n_jobs=-1)
grid.fit(X,y)



GridSearchCV(cv=3,
             estimator=XGBRegressor(base_score=None, booster=None,
                                    colsample_bylevel=None,
                                    colsample_bynode=None,
                                    colsample_bytree=None, gamma=None,
                                    gpu_id=None, importance_type='gain',
                                    interaction_constraints=None,
                                    learning_rate=None, max_delta_step=None,
                                    max_depth=None, min_child_weight=None,
                                    missing=nan, monotone_constraints=None,
                                    n_estimators=100, n_jobs=None,
                                    num_parallel_tree=None, random_state=None,
                                    reg_alpha=None, reg_lambda=None,
                                    scale_pos_weight=None, subsample=None,
                                    tree_method=None, validate_para

In [25]:
# Best parameters found by GridSearchCV

grid.best_params_

{'max_depth': 3, 'n_estimators': 100}

In [26]:
# Retrain XGBoost with the tuned max_depth and n_estimators (learning_rate and subsample are set by hand)

xgb = XGBRegressor(
    learning_rate = 0.05,
    max_depth = 3,
    n_estimators = 100,
    subsample = 0.7
)

xgb.fit(X_train, y_train)
pred = xgb.predict(X_test)

mean_squared_error(y_test, pred)


2809.687899466801

In [27]:
# RandomizedSearchCV

xgb = XGBRegressor()
rand = RandomizedSearchCV(xgb, params, cv=3, n_iter=10, n_jobs=-1)
rand.fit(X,y)



RandomizedSearchCV(cv=3,
                   estimator=XGBRegressor(base_score=None, booster=None,
                                          colsample_bylevel=None,
                                          colsample_bynode=None,
                                          colsample_bytree=None, gamma=None,
                                          gpu_id=None, importance_type='gain',
                                          interaction_constraints=None,
                                          learning_rate=None,
                                          max_delta_step=None, max_depth=None,
                                          min_child_weight=None, missing=nan,
                                          monotone_constraints=None,
                                          n_estimators=100, n_jobs=None,
                                          num_parallel_tree=None,
                                          random_state=None, reg_alpha=None,
                                       

In [28]:
# Best parameters found by RandomizedSearchCV

rand.best_params_

{'n_estimators': 100, 'max_depth': 3}

In [29]:
# Retrain XGBoost with the tuned max_depth and n_estimators from RandomizedSearchCV

xgb = XGBRegressor(
    learning_rate = 0.05,
    max_depth = 3,
    n_estimators = 100,
    subsample = 0.7
)

xgb.fit(X_train, y_train)
pred = xgb.predict(X_test)

mean_squared_error(y_test, pred)


2809.687899466801

8. Regression Evaluation Metrics:

These are ways to check how well a model predicts numbers. We use these to see if the model's guesses are close to the real values.

Types of Evaluation Metrics:

- MAE (Mean Absolute Error):
This tells us the average "mistake" the model makes. It's like how far off the model's guesses are, but we don't care if it's above or below the correct number. It's just how big the mistake is.

- MSE (Mean Squared Error):
This is similar to MAE, but it squares the mistakes. It means bigger mistakes get a lot more attention than smaller ones. It helps to focus on big errors.

- RMSE (Root Mean Squared Error):
This is like MSE, but we take the square root at the end to bring it back to the same scale as the original numbers. It's a fancy way to show the average mistake, but it gives more weight to big errors.

- RMSLE (Root Mean Squared Logarithmic Error):
This one is special. It compares the logarithms of the predicted and real values, so it measures relative error rather than absolute error. That helps when the values you're predicting span a very wide range, like prices or populations. It only works when the values are not negative.

- R² (R-squared):
This tells us how much of the variation in the real data the model explains. It is at most 1. If it's close to 1, the model is doing a great job. If it's close to 0, the model is no better than always guessing the average, and it can even go below 0 when the model does worse than that.
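
To show how these metrics relate to each other, here is a minimal NumPy sketch that computes all five by hand. The values are made up for illustration, and the RMSLE line assumes the log(1 + y) form that scikit-learn's `mean_squared_log_error` uses.

```python
import numpy as np

# Made-up values, for illustration only
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 230.0, 250.0])

errors = y_true - y_pred                  # -10, 10, -30, 0

mae = np.mean(np.abs(errors))             # 12.5
mse = np.mean(errors ** 2)                # 275.0
rmse = np.sqrt(mse)                       # same unit as the target
rmsle = np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

# R² compares the model's squared error with that of always predicting the mean
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Predictions that are worse than the mean give a negative R²
bad_pred = np.array([250.0, 200.0, 150.0, 100.0])
r2_bad = 1.0 - np.sum((y_true - bad_pred) ** 2) / ss_tot

print(mae, mse, rmse, rmsle, r2, r2_bad)
```

The single 30-unit miss contributes most of the MSE, which is why RMSE ends up larger than MAE here.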

In [31]:
# Mean Absolute Error

from sklearn.metrics import mean_absolute_error

mean_absolute_error(y_test, pred)

42.86635401007835

In [32]:
# Mean Squared Error

from sklearn.metrics import mean_squared_error

mean_squared_error(y_test, pred)

2809.687899466801

In [33]:
# Root Mean Squared Error 

import numpy as np
np.sqrt(mean_squared_error(y_test,pred))

53.00648922034736

In [34]:
# Root Mean Squared Logarithmic Error 

from sklearn.metrics import mean_squared_log_error 
np.sqrt(mean_squared_log_error(y_test, pred))

0.3952020879440222

In [35]:
# R-squared

from sklearn.metrics import r2_score
r2_score(y_test, pred)

0.5544170854869348