# Linear Regression Model Without Regularization
## 1. Dataset Description
*A linear regression model without regularization, fitted to a housing dataset and evaluated on the same training data.*


**Thanks To:**
1. [Fighting Overfitting With L1 or L2 Regularization](https://neptune.ai/blog/fighting-overfitting-with-l1-or-l2-regularization)

## 2. Import libraries

In [1]:
# import the libraries
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

## 3. Load Dataset

In [3]:
# load the dataset
url = "https://raw.githubusercontent.com/akdubey2k/ML/main/ML_17_2_Linear_Regression_without_Regularization/ML_17_2_housing.csv"

# reading data into pandas dataframe
df = pd.read_csv(url, header=None)
df.head(5)

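|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |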
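
Because the file is read with `header=None`, the columns are labelled 0 to 13. The values match the widely circulated Boston housing data, so, assuming that is what this file contains, the columns could be given the conventional names. These names are an assumption and are not present in the CSV itself. Under that assumption, column 5 is `RM` (average number of rooms per dwelling) and column 13 is `MEDV` (median home value). A minimal sketch using the first two rows of the data:

```python
import pandas as pd

# Conventional Boston housing column names (assumed, not read from the file)
column_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# First two rows of the dataset, copied from the df.head(5) output
rows = [
    [0.00632, 18.0, 2.31, 0, 0.538, 6.575, 65.2, 4.09, 1, 296.0, 15.3, 396.9, 4.98, 24.0],
    [0.02731, 0.0, 7.07, 0, 0.469, 6.421, 78.9, 4.9671, 2, 242.0, 17.8, 396.9, 9.14, 21.6],
]

sample = pd.DataFrame(rows)      # integer column labels, as with header=None
sample.columns = column_names    # replace 0..13 with descriptive names

print(sample[["RM", "MEDV"]])
```

With the real file, the same list could be passed directly as `pd.read_csv(url, header=None, names=column_names)`.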


## 3. Dataset EDA (exploratory data analysis)
### 3.1 Summarize the Dataset
• Dimensions of the dataset.<br>
• Peek at the data itself *(take a brief look at the first rows)*.<br>
• Statistical summary of all attributes.<br>
• Breakdown of the data by the class variable.<br>

In [4]:
# dimension of dataset
print("\nShape matrix (rows & columns) ".ljust(50, '.'), ": ", df.shape)


Shape matrix (rows & columns) ................... :  (506, 14)


In [5]:
# peek at the dataset
print("\nA glance at the data content =>\n", df.head(5))


A glance at the data content =>
         0     1     2   3      4      5     6       7   8      9     10  \
0  0.00632  18.0  2.31   0  0.538  6.575  65.2  4.0900   1  296.0  15.3   
1  0.02731   0.0  7.07   0  0.469  6.421  78.9  4.9671   2  242.0  17.8   
2  0.02729   0.0  7.07   0  0.469  7.185  61.1  4.9671   2  242.0  17.8   
3  0.03237   0.0  2.18   0  0.458  6.998  45.8  6.0622   3  222.0  18.7   
4  0.06905   0.0  2.18   0  0.458  7.147  54.2  6.0622   3  222.0  18.7   

       11    12    13  
0  396.90  4.98  24.0  
1  396.90  9.14  21.6  
2  392.83  4.03  34.7  
3  394.63  2.94  33.4  
4  396.90  5.33  36.2  


In [6]:
# statistical summary of all attributes.
print("\nstatistical summary of all attributes =>\n", df.describe())


statistical summary of all attributes =>
                0           1           2           3           4           5   \
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000   
mean     3.613524   11.363636   11.136779    0.069170    0.554695    6.284634   
std      8.601545   23.322453    6.860353    0.253994    0.115878    0.702617   
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000   
25%      0.082045    0.000000    5.190000    0.000000    0.449000    5.885500   
50%      0.256510    0.000000    9.690000    0.000000    0.538000    6.208500   
75%      3.677083   12.500000   18.100000    0.000000    0.624000    6.623500   
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000   

               6           7           8           9           10          11  \
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000   
mean    68.574901    3.795043    9.549407  408.237154   18.455534

## 4. Split the data into **independent** and **dependent** variables

In [12]:
# selecting a single feature
# only using 100 instances for simplicity
X = df.iloc[0:100, 5].values
y = df.iloc[0:100, 13].values   # target label

# print("\nIndependent variable =>\n", X)
# print("\nDependent variable =>\n", y)

print("\nShape of independent variable ".ljust(50, '.'), ": ", X.shape)
print("\nShape of dependent variable ".ljust(50, '.'), ": ", y.shape)

# reshaping the data
X_reshaped = X[:, np.newaxis]
y_reshaped = y[:, np.newaxis]

print("\nReshape of independent variable ".ljust(50, '.'), ": ", X_reshaped.shape)
print("\nReshape of dependent variable ".ljust(50, '.'), ": ", y_reshaped.shape)


Shape of independent variable ................... :  (100,)

Shape of dependent variable ..................... :  (100,)

Reshape of independent variable ................. :  (100, 1)

Reshape of dependent variable ................... :  (100, 1)
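
The reshape is needed because scikit-learn estimators expect the features as a 2-D array of shape `(n_samples, n_features)`. Indexing with `np.newaxis` and calling `reshape(-1, 1)` are two equivalent ways to turn a 1-D array into a single column. A small sketch using three values of column 5 (the variable names here are illustrative only):

```python
import numpy as np

x = np.array([6.575, 6.421, 7.185])   # 1-D array, shape (3,)

x_newaxis = x[:, np.newaxis]          # add a trailing axis of length 1
x_reshape = x.reshape(-1, 1)          # -1 lets NumPy infer the number of rows

print(x.shape, x_newaxis.shape, x_reshape.shape)   # (3,) (3, 1) (3, 1)
print(np.array_equal(x_newaxis, x_reshape))        # True
```

Only the feature array has to be 2-D. The target may stay 1-D, in which case `predict` returns a 1-D array instead of a column.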


## 5. Model Creation

In [13]:
# instantiating the linear regression model
lr_model = LinearRegression()

# training the model
lr_model.fit(X_reshaped, y_reshaped)
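
`LinearRegression` fits ordinary least squares: it picks the slope and intercept that minimize the sum of squared residuals, with no penalty term on the coefficients. The sketch below is an illustration on the first five rows only, not the 100-row fit above. It reads the fitted parameters from `coef_` and `intercept_` and cross-checks them against `np.polyfit`, which solves the same least-squares problem for a straight line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First five values of column 5 (feature) and column 13 (target)
X_demo = np.array([6.575, 6.421, 7.185, 6.998, 7.147]).reshape(-1, 1)
y_demo = np.array([24.0, 21.6, 34.7, 33.4, 36.2])   # 1-D target is fine

demo_model = LinearRegression().fit(X_demo, y_demo)
slope = demo_model.coef_[0]          # one coefficient per feature
intercept = demo_model.intercept_    # scalar when y is 1-D

# Independent least-squares fit of a degree-1 polynomial
slope_ref, intercept_ref = np.polyfit(X_demo.ravel(), y_demo, deg=1)

print("slope    :", slope, "vs", slope_ref)
print("intercept:", intercept, "vs", intercept_ref)

# A prediction is just slope * x + intercept
manual_pred = slope * X_demo.ravel() + intercept
print(np.allclose(manual_pred, demo_model.predict(X_demo)))
```

In the notebook itself, the same attributes are available as `lr_model.coef_` and `lr_model.intercept_`; because `y_reshaped` is 2-D there, they come back with an extra dimension.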

## 7. Prediction

In [15]:
# making predictions on the training data
y_pred = lr_model.predict(X_reshaped)

print("\nPredicted values =>\n", y_pred)


Predicted values =>
 [[25.7910094 ]
 [24.21659604]
 [32.02732208]
 [30.11553442]
 [31.63883047]
 [24.30860721]
 [20.0351995 ]
 [21.67095365]
 [16.14005995]
 [19.9534118 ]
 [23.76676365]
 [20.00452911]
 [18.77771351]
 [19.39112131]
 [20.89397043]
 [18.21542302]
 [19.24799282]
 [19.81028331]
 [14.35095385]
 [17.12151244]
 [15.51642868]
 [19.55469672]
 [21.36424975]
 [18.00073029]
 [19.13553473]
 [15.81290912]
 [18.00073029]
 [20.39302072]
 [24.97313233]
 [26.80313228]
 [16.97838395]
 [20.64860731]
 [19.40134477]
 [16.85570239]
 [20.89397043]
 [19.2275459 ]
 [18.28698726]
 [18.37899843]
 [19.56492019]
 [25.99547867]
 [30.38134447]
 [27.78458476]
 [21.64028326]
 [22.06966872]
 [20.61793692]
 [16.66145658]
 [17.72469678]
 [20.21922185]
 [13.76821644]
 [15.84357951]
 [19.5342498 ]
 [21.08821624]
 [25.13670774]
 [19.89207102]
 [18.76749004]
 [32.68162373]
 [23.82810443]
 [28.25486408]
 [21.39492014]
 [19.16620512]
 [17.26464092]
 [19.56492019]
 [24.57441726]
 [27.70279706]
 [31.19922154]
 [2

## 8. Model Evaluation

In [16]:
# mean squared error between the targets and the predictions on the training data
mse = mean_squared_error(y_reshaped, y_pred)
print("\nMean Squared Error =>\n", mse)


Mean Squared Error =>
 9.860246638510331
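
For reference, the mean squared error is the average of the squared differences between the targets and the predictions. The sketch below computes it by hand with NumPy for the first five targets and the first five predicted values printed earlier; the helper name `mse_manual` is illustrative only. Because the error is measured on the same rows used for fitting, it is a training error and does not by itself show how the model behaves on unseen data.

```python
import numpy as np

def mse_manual(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# First five targets (column 13) and the first five predictions from the notebook
y_true_5 = [24.0, 21.6, 34.7, 33.4, 36.2]
y_hat_5 = [25.7910094, 24.21659604, 32.02732208, 30.11553442, 31.63883047]

mse_5 = mse_manual(y_true_5, y_hat_5)
rmse_5 = mse_5 ** 0.5   # root of the MSE, in the same units as the target

print("MSE on the first five rows :", mse_5)
print("RMSE on the first five rows:", rmse_5)
```

Applied to the full arrays, `mse_manual(y_reshaped, y_pred)` should agree with `mean_squared_error(y_reshaped, y_pred)`.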


## 9. Data Visualization
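
One possible way to visualize the result is a scatter plot of the observed targets with the fitted line drawn over it. The sketch below is an assumption about what such a plot could look like, not the notebook's original figure. The helper `plot_fit` is hypothetical, and the demo uses only the first five rows and their predictions from the outputs above.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_fit(x, y_true, y_pred):
    """Scatter the observations and overlay the model's predictions as a line."""
    x = np.ravel(x)
    y_true = np.ravel(y_true)
    y_pred = np.ravel(y_pred)
    order = np.argsort(x)                      # sort so the line is drawn left to right

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(x, y_true, label="observed")
    ax.plot(x[order], y_pred[order], color="red", label="fitted line")
    ax.set_xlabel("feature (column 5)")
    ax.set_ylabel("target (column 13)")
    ax.set_title("Linear regression without regularization")
    ax.legend()
    return fig, ax

# First five rows and their predictions, copied from the notebook outputs
x_demo = [6.575, 6.421, 7.185, 6.998, 7.147]
y_demo = [24.0, 21.6, 34.7, 33.4, 36.2]
y_hat_demo = [25.7910094, 24.21659604, 32.02732208, 30.11553442, 31.63883047]

fig, ax = plot_fit(x_demo, y_demo, y_hat_demo)
```

In the notebook, the same helper could be called as `plot_fit(X_reshaped, y_reshaped, y_pred)` followed by `plt.show()`.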
