# Linear Regression for "AI4I 2020 Predictive Maintenance Dataset"
### Dataset link: https://archive.ics.uci.edu/ml/datasets/AI4I+2020+Predictive+Maintenance+Dataset

## Submitted By: SAMYAMOY RAKSHIT

<h6>
    
### Problem Statement:
Using the given dataset, I have to predict the label "Air temperature [K]" from the feature values with Linear Regression. I also have to measure the accuracy of the model and obtain the best possible prediction of the label.

##### Attribute Information:

The dataset consists of 10000 data points stored as rows with 14 features in columns:

- UDI: unique identifier ranging from 1 to 10000
- product ID: consisting of a letter L, M, or H for low (50% of all products), medium (30%) and high (20%) as product quality variants and a variant-specific serial number
- air temperature [K]: generated using a random walk process, later normalized to a standard deviation of 2 K around 300 K
- process temperature [K]: generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K
- rotational speed [rpm]: calculated from a power of 2860 W, overlaid with normally distributed noise
- torque [Nm]: torque values are normally distributed around 40 Nm with σ = 10 Nm and no negative values
- tool wear [min]: the quality variants H/M/L add 5/3/2 minutes of tool wear to the used tool in the process
- machine failure: a label that indicates whether the machine has failed at this particular data point because any of the following failure modes is true

The machine failure consists of five independent failure modes:

- tool wear failure (TWF): the tool will be replaced or fail at a randomly selected tool wear time between 200 – 240 mins (120 times in our dataset). At this point in time, the tool is replaced 69 times and fails 51 times (randomly assigned).
- heat dissipation failure (HDF): heat dissipation causes a process failure if the difference between air and process temperature is below 8.6 K and the tool's rotational speed is below 1380 rpm. This is the case for 115 data points.
- power failure (PWF): the product of torque and rotational speed (in rad/s) equals the power required for the process. If this power is below 3500 W or above 9000 W, the process fails, which is the case 95 times in our dataset.
- overstrain failure (OSF): if the product of tool wear and torque exceeds 11,000 minNm for the L product variant (12,000 M, 13,000 H), the process fails due to overstrain. This is true for 98 data points.
- random failures (RNF): each process has a chance of 0.1 % to fail regardless of its process parameters. This is the case for only 5 data points, fewer than could be expected for 10,000 data points in our dataset.

If at least one of the above failure modes is true, the process fails and the 'machine failure' label is set to 1. It is therefore not transparent to the machine learning method, which of the failure modes has caused the process to fail <h6>
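The deterministic failure rules described above can be written down directly. The following is a minimal sketch and not part of the original notebook: the function names (`power_watts`, `is_power_failure`, `is_overstrain_failure`, `is_heat_dissipation_failure`) are hypothetical, and the thresholds are the ones quoted in the attribute description.

```python
import math

# Overstrain thresholds in min*Nm per product quality variant (from the description)
OSF_LIMIT = {"L": 11_000, "M": 12_000, "H": 13_000}


def power_watts(torque_nm, speed_rpm):
    """Mechanical power = torque * angular speed, with rpm converted to rad/s."""
    return torque_nm * speed_rpm * 2 * math.pi / 60


def is_power_failure(torque_nm, speed_rpm):
    """PWF: power below 3500 W or above 9000 W."""
    power = power_watts(torque_nm, speed_rpm)
    return power < 3500 or power > 9000


def is_overstrain_failure(tool_wear_min, torque_nm, variant):
    """OSF: tool wear * torque exceeds the limit of the product variant."""
    return tool_wear_min * torque_nm > OSF_LIMIT[variant]


def is_heat_dissipation_failure(air_k, process_k, speed_rpm):
    """HDF: temperature difference below 8.6 K and speed below 1380 rpm."""
    return (process_k - air_k) < 8.6 and speed_rpm < 1380


# First row of the dataset: 42.8 Nm at 1551 rpm
print(round(power_watts(42.8, 1551), 1))
print(is_power_failure(42.8, 1551))
```

The conversion factor 2π/60 turns revolutions per minute into radians per second, which is what the power rule requires.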

    



#### Importing required libraries

In [1]:
## pandas and numpy
import pandas as pd
import numpy as np

## Visualization library
import seaborn as sns
import matplotlib.pyplot as plt
## note: pandas_profiling has since been renamed to ydata-profiling (import name ydata_profiling)
from pandas_profiling import ProfileReport

## Machine Learning libraries
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge, Lasso, RidgeCV, LassoCV, ElasticNet, ElasticNetCV, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

## Statistics libraries
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm

## To save the model
import pickle

#### Importing Dataset

In [2]:
df = pd.read_csv(r"C:\Users\tarak\Downloads\ai4i2020.csv")

#### Data Exploration and Visualization

In [3]:
df

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,0,0,0,0,0
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,0,0,0,0,0
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,0,0,0,0,0
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,0,0,0,0,0
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,M24855,M,298.8,308.4,1604,29.5,14,0,0,0,0,0,0
9996,9997,H39410,H,298.9,308.4,1632,31.8,17,0,0,0,0,0,0
9997,9998,M24857,M,299.0,308.6,1645,33.4,22,0,0,0,0,0,0
9998,9999,H39412,H,299.0,308.7,1408,48.5,25,0,0,0,0,0,0


In [4]:
df.describe()

Unnamed: 0,UDI,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,5000.5,300.00493,310.00556,1538.7761,39.98691,107.951,0.0339,0.0046,0.0115,0.0095,0.0098,0.0019
std,2886.89568,2.000259,1.483734,179.284096,9.968934,63.654147,0.180981,0.067671,0.106625,0.097009,0.098514,0.04355
min,1.0,295.3,305.7,1168.0,3.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2500.75,298.3,308.8,1423.0,33.2,53.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,5000.5,300.1,310.1,1503.0,40.1,108.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,7500.25,301.5,311.1,1612.0,46.8,162.0,0.0,0.0,0.0,0.0,0.0,0.0
max,10000.0,304.5,313.8,2886.0,76.6,253.0,1.0,1.0,1.0,1.0,1.0,1.0


In [5]:
ProfileReport(df)


In [6]:
pf = ProfileReport(df)

In [7]:
pf.to_widgets()


In [8]:
pf.to_file('test_hw.html')


In [10]:
df.isnull().sum()

UDI                        0
Product ID                 0
Type                       0
Air temperature [K]        0
Process temperature [K]    0
Rotational speed [rpm]     0
Torque [Nm]                0
Tool wear [min]            0
Machine failure            0
TWF                        0
HDF                        0
PWF                        0
OSF                        0
RNF                        0
dtype: int64

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   UDI                      10000 non-null  int64  
 1   Product ID               10000 non-null  object 
 2   Type                     10000 non-null  object 
 3   Air temperature [K]      10000 non-null  float64
 4   Process temperature [K]  10000 non-null  float64
 5   Rotational speed [rpm]   10000 non-null  int64  
 6   Torque [Nm]              10000 non-null  float64
 7   Tool wear [min]          10000 non-null  int64  
 8   Machine failure          10000 non-null  int64  
 9   TWF                      10000 non-null  int64  
 10  HDF                      10000 non-null  int64  
 11  PWF                      10000 non-null  int64  
 12  OSF                      10000 non-null  int64  
 13  RNF                      10000 non-null  int64  
dtypes: float64(3), int64(9)

#### Drop one unnecessary column

In [12]:
## UDI is just a row index, so it is dropped
df.drop(columns=['UDI'], inplace=True)

In [13]:
df

Unnamed: 0,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
0,M14860,M,298.1,308.6,1551,42.8,0,0,0,0,0,0,0
1,L47181,L,298.2,308.7,1408,46.3,3,0,0,0,0,0,0
2,L47182,L,298.1,308.5,1498,49.4,5,0,0,0,0,0,0
3,L47183,L,298.2,308.6,1433,39.5,7,0,0,0,0,0,0
4,L47184,L,298.2,308.7,1408,40.0,9,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,M24855,M,298.8,308.4,1604,29.5,14,0,0,0,0,0,0
9996,H39410,H,298.9,308.4,1632,31.8,17,0,0,0,0,0,0
9997,M24857,M,299.0,308.6,1645,33.4,22,0,0,0,0,0,0
9998,H39412,H,299.0,308.7,1408,48.5,25,0,0,0,0,0,0


### Set label and feature values for the Dataset

In [14]:
## label value
y = df['Air temperature [K]']

In [15]:
y

0       298.1
1       298.2
2       298.1
3       298.2
4       298.2
        ...  
9995    298.8
9996    298.9
9997    299.0
9998    299.0
9999    299.0
Name: Air temperature [K], Length: 10000, dtype: float64

In [16]:
## feature values
x = df.drop(columns='Air temperature [K]')

In [17]:
x

Unnamed: 0,Product ID,Type,Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
0,M14860,M,308.6,1551,42.8,0,0,0,0,0,0,0
1,L47181,L,308.7,1408,46.3,3,0,0,0,0,0,0
2,L47182,L,308.5,1498,49.4,5,0,0,0,0,0,0
3,L47183,L,308.6,1433,39.5,7,0,0,0,0,0,0
4,L47184,L,308.7,1408,40.0,9,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...
9995,M24855,M,308.4,1604,29.5,14,0,0,0,0,0,0
9996,H39410,H,308.4,1632,31.8,17,0,0,0,0,0,0
9997,M24857,M,308.6,1645,33.4,22,0,0,0,0,0,0
9998,H39412,H,308.7,1408,48.5,25,0,0,0,0,0,0


### Feature Scaling

In [18]:
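## LabelEncoder maps the sorted distinct values of each column to 0, 1, 2, ...;
## applied to every column, it also turns the numeric columns into integer codes
x = x.apply(LabelEncoder().fit_transform)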
x

Unnamed: 0,Product ID,Type,Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
0,7003,2,29,325,313,0,0,0,0,0,0,0
1,1003,1,30,182,348,2,0,0,0,0,0,0
2,1004,1,28,272,379,4,0,0,0,0,0,0
3,1005,1,29,207,280,6,0,0,0,0,0,0
4,1006,1,30,182,285,8,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9997,2,27,378,180,13,0,0,0,0,0,0
9996,1001,0,27,406,203,16,0,0,0,0,0,0
9997,9998,2,29,419,219,21,0,0,0,0,0,0
9998,1002,0,30,182,370,24,0,0,0,0,0,0
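`LabelEncoder` maps the sorted distinct values of a column to the integers 0, 1, 2, ..., which is why a process temperature of 308.6 becomes 29 above. The sketch below uses a small made-up frame (the name `toy` and its values are illustrative assumptions) to show this effect, together with one possible alternative that encodes only the non-numeric columns and leaves the numeric ones untouched.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small made-up sample with one categorical and one numeric column
toy = pd.DataFrame({
    "Type": ["M", "L", "L", "H"],
    "Process temperature [K]": [308.6, 308.7, 308.5, 308.6],
})

# Applying LabelEncoder to every column replaces numeric values by their rank
encoded_all = toy.apply(LabelEncoder().fit_transform)

# Alternative: encode only the non-numeric (string) columns
encoded_cat = toy.copy()
for col in encoded_cat.columns:
    if not pd.api.types.is_numeric_dtype(encoded_cat[col]):
        encoded_cat[col] = LabelEncoder().fit_transform(encoded_cat[col])

print(encoded_all)
print(encoded_cat)
```

Whether the rank encoding of numeric columns is acceptable depends on the data; here the temperatures are evenly spaced in steps of 0.1 K, so the codes stay roughly proportional to the original values.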


In [19]:
scaler = StandardScaler()

In [20]:
scaler

In [21]:
arr = scaler.fit_transform(x)

In [22]:
arr

array([[ 0.69403276,  1.33388944, -0.94735989, ..., -0.09793424,
        -0.09948362, -0.04363046],
       [-1.38442822, -0.33222278, -0.879959  , ..., -0.09793424,
        -0.09948362, -0.04363046],
       [-1.38408181, -0.33222278, -1.01476077, ..., -0.09793424,
        -0.09948362, -0.04363046],
       ...,
       [ 1.7315312 ,  1.33388944, -0.94735989, ..., -0.09793424,
        -0.09948362, -0.04363046],
       [-1.38477463, -1.998335  , -0.879959  , ..., -0.09793424,
        -0.09948362, -0.04363046],
       [ 1.73187761,  1.33388944, -0.879959  , ..., -0.09793424,
        -0.09948362, -0.04363046]])

In [23]:
df1 = pd.DataFrame(arr)

In [24]:
df1

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
0,0.694033,1.333889,-0.947360,0.099755,0.284142,-1.681089,-0.187322,-0.06798,-0.10786,-0.097934,-0.099484,-0.04363
1,-1.384428,-0.332223,-0.879959,-0.783092,0.637292,-1.649655,-0.187322,-0.06798,-0.10786,-0.097934,-0.099484,-0.04363
2,-1.384082,-0.332223,-1.014761,-0.227454,0.950082,-1.618222,-0.187322,-0.06798,-0.10786,-0.097934,-0.099484,-0.04363
3,-1.383735,-0.332223,-0.947360,-0.628748,-0.048828,-1.586788,-0.187322,-0.06798,-0.10786,-0.097934,-0.099484,-0.04363
4,-1.383389,-0.332223,-0.879959,-0.783092,0.001622,-1.555354,-0.187322,-0.06798,-0.10786,-0.097934,-0.099484,-0.04363
...,...,...,...,...,...,...,...,...,...,...,...,...
9995,1.731185,1.333889,-1.082162,0.426964,-1.057827,-1.476770,-0.187322,-0.06798,-0.10786,-0.097934,-0.099484,-0.04363
9996,-1.385121,-1.998335,-1.082162,0.599829,-0.825757,-1.429619,-0.187322,-0.06798,-0.10786,-0.097934,-0.099484,-0.04363
9997,1.731531,1.333889,-0.947360,0.680088,-0.664317,-1.351035,-0.187322,-0.06798,-0.10786,-0.097934,-0.099484,-0.04363
9998,-1.384775,-1.998335,-0.879959,-0.783092,0.859272,-1.303884,-0.187322,-0.06798,-0.10786,-0.097934,-0.099484,-0.04363


In [87]:
pf1 = df1.profile_report()

In [88]:
pf1.to_widgets()


In [89]:
pf1.to_file('test1_hw.html')


In [27]:
df1.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,-2.2737370000000003e-17,-3.7658760000000006e-17,6.821210000000001e-17,-1.08713e-16,-5.2580160000000004e-17,-9.521273000000001e-17,2.842171e-18,-4.973799e-18,1.136868e-17,2.4158450000000003e-17,-6.7501560000000004e-18,-1.705303e-17
std,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005
min,-1.731878,-1.998335,-2.901986,-1.906716,-2.874027,-1.681089,-0.187322,-0.06797983,-0.10786,-0.09793424,-0.09948362,-0.04363046
25%,-0.8659388,-0.3322228,-0.8125581,-0.6904859,-0.6844973,-0.8638118,-0.187322,-0.06797983,-0.10786,-0.09793424,-0.09948362,-0.04363046
50%,0.0,-0.3322228,0.0636534,-0.1965854,0.01171247,0.0006161014,-0.187322,-0.06797983,-0.10786,-0.09793424,-0.09948362,-0.04363046
75%,0.8659388,1.333889,0.7376623,0.4763541,0.6877422,0.8493272,-0.187322,-0.06797983,-0.10786,-0.09793424,-0.09948362,-0.04363046
max,1.731878,1.333889,2.557486,3.896615,2.937812,2.169544,5.338401,14.71024,9.271274,10.21093,10.05191,22.91977
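The standard deviation of 1.00005 reported by `describe()` is expected and not a scaling error: `StandardScaler` divides by the population standard deviation (ddof = 0), while pandas reports the sample standard deviation (ddof = 1), so the two differ by a factor of sqrt(n / (n - 1)). A minimal sketch using only the standard library, on made-up values:

```python
import math
import statistics

values = [298.1, 298.2, 298.1, 298.2, 299.0, 300.5]  # made-up sample

# Standardize the way StandardScaler does: subtract the mean, divide by the population std
mean = statistics.fmean(values)
pop_std = statistics.pstdev(values)
z = [(v - mean) / pop_std for v in values]

# The sample standard deviation of the scaled values is sqrt(n / (n - 1)), not exactly 1
n = len(z)
print(statistics.stdev(z), math.sqrt(n / (n - 1)))

# Same factor for the 10000 rows used in this notebook
print(round(math.sqrt(10000 / 9999), 5))
```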


### Check multicollinearity

In [29]:
vif_df = pd.DataFrame()

In [30]:
arr.shape

(10000, 12)

In [31]:
vif_df['vif'] = [variance_inflation_factor(arr,i) for i in range(arr.shape[1])]

In [32]:
vif_df['feature'] = x.columns

In [33]:
vif_df

Unnamed: 0,vif,feature
0,4.408607,Product ID
1,4.292126,Type
2,1.105275,Process temperature [K]
3,5.900671,Rotational speed [rpm]
4,6.046345,Torque [Nm]
5,1.040322,Tool wear [min]
6,11.793695,Machine failure
7,2.428236,TWF
8,4.586744,HDF
9,3.561736,PWF


### Observation:
    
>Multicollinearity is present, because one of the features (Machine failure) has a VIF greater than 10.

>After trying the possible feature combinations, I conclude that keeping these four features (Process temperature [K], Tool wear [min], Machine failure, HDF) keeps every VIF below 10 and gives very good accuracy (testing accuracy > training accuracy).
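For reference, the variance inflation factor of feature i is 1 / (1 - R_i^2), where R_i^2 comes from regressing feature i on all the other features. The sketch below computes it with NumPy on synthetic data; the helper name `vif` and the data are illustrative assumptions, and the code is meant to mirror the idea behind `variance_inflation_factor`, not to replace it.

```python
import numpy as np


def vif(X, i):
    """VIF of column i: 1 / (1 - R^2) from regressing column i on the other columns."""
    target = X[:, i]
    others = np.delete(X, i, axis=1)
    # Add an intercept column so the regression is also valid for uncentered data
    design = np.column_stack([np.ones(len(others)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    r2 = 1 - residual.var() / target.var()
    return 1 / (1 - r2)


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)   # nearly a copy of a, so strongly collinear with it
X = np.column_stack([a, b, c])

print([round(vif(X, i), 2) for i in range(X.shape[1])])
```

Columns `a` and `c` get a large VIF because each is almost fully explained by the other, while the independent column `b` stays close to 1. Note that `variance_inflation_factor` leaves it to the caller to supply any constant column; on standardized (centered) data, as used here, that should make little difference.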

In [34]:
## keep only the four selected features
x = df.drop(columns=['Product ID','Type','Air temperature [K]','TWF','PWF','OSF','RNF','Rotational speed [rpm]','Torque [Nm]'])

In [35]:
x

Unnamed: 0,Process temperature [K],Tool wear [min],Machine failure,HDF
0,308.6,0,0,0
1,308.7,3,0,0
2,308.5,5,0,0
3,308.6,7,0,0
4,308.7,9,0,0
...,...,...,...,...
9995,308.4,14,0,0
9996,308.4,17,0,0
9997,308.6,22,0,0
9998,308.7,25,0,0


### Scale the new Dataset

In [36]:
x = x.apply(LabelEncoder().fit_transform)
x

Unnamed: 0,Process temperature [K],Tool wear [min],Machine failure,HDF
0,29,0,0,0
1,30,2,0,0
2,28,4,0,0
3,29,6,0,0
4,30,8,0,0
...,...,...,...,...
9995,27,13,0,0
9996,27,16,0,0
9997,29,21,0,0
9998,30,24,0,0


In [37]:
arr1 = scaler.fit_transform(x)

In [38]:
arr1

array([[-0.94735989, -1.68108917, -0.18732201, -0.10786004],
       [-0.879959  , -1.64965543, -0.18732201, -0.10786004],
       [-1.01476077, -1.61822169, -0.18732201, -0.10786004],
       ...,
       [-0.94735989, -1.35103487, -0.18732201, -0.10786004],
       [-0.879959  , -1.30388425, -0.18732201, -0.10786004],
       [-0.879959  , -1.22529989, -0.18732201, -0.10786004]])

In [39]:
df2 = pd.DataFrame(arr1)
df2

Unnamed: 0,0,1,2,3
0,-0.947360,-1.681089,-0.187322,-0.10786
1,-0.879959,-1.649655,-0.187322,-0.10786
2,-1.014761,-1.618222,-0.187322,-0.10786
3,-0.947360,-1.586788,-0.187322,-0.10786
4,-0.879959,-1.555354,-0.187322,-0.10786
...,...,...,...,...
9995,-1.082162,-1.476770,-0.187322,-0.10786
9996,-1.082162,-1.429619,-0.187322,-0.10786
9997,-0.947360,-1.351035,-0.187322,-0.10786
9998,-0.879959,-1.303884,-0.187322,-0.10786


In [40]:
pf2 = df2.profile_report()
pf2.to_widgets()


In [83]:
pf2.to_file('test2_hw.html')


In [42]:
arr1.shape

(10000, 4)

### Multicollinearity of new Dataset

In [43]:
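vif_df = pd.DataFrame()
## compute the VIF on arr1, the scaled array of the four selected features
vif_df['VIF'] = [variance_inflation_factor(arr1,i) for i in range(arr1.shape[1])]
vif_df['FEATURE'] = x.columns
vif_df

Unnamed: 0,VIF,FEATURE
0,4.408607,Process temperature [K]
1,4.292126,Tool wear [min]
2,1.105275,Machine failure
3,5.900671,HDF

Note: the values in this output are identical to the first four rows of the earlier 12-feature VIF table, because the cell was originally run with `arr` instead of `arr1`. Rerun it with `arr1` to obtain the VIFs of the four selected features.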


### Observation:
  <h4>  VIF < 10, so the dataset is OK (no multicollinearity) </h4>

### Splitting of the Data

In [44]:
x_train, x_test, y_train, y_test = train_test_split(arr1, y, test_size=0.15,random_state=100)

In [45]:
x_train

array([[ 0.73766225, -0.43945631, -0.18732201, -0.10786004],
       [-0.94735989, -1.08384805, -0.18732201, -0.10786004],
       [-0.67775635, -0.12511887, -0.18732201, -0.10786004],
       ...,
       [ 1.27686934, -1.36675174, -0.18732201, -0.10786004],
       [ 0.73766225, -1.28816738, -0.18732201, -0.10786004],
       [ 1.41167111,  1.16366461, -0.18732201, -0.10786004]])

### Linear Regression

In [46]:
lr = LinearRegression()

In [47]:
lr.fit(x_train,y_train)

In [48]:
## Find out the coefficient value
lr.coef_

array([ 1.73908941e+00,  6.38122318e-03, -2.85542922e-04,  1.77352563e-01])

In [49]:
## Find out the intercept value
lr.intercept_

300.0050815315702

In [50]:
## checking training accuracy(R-squared)
lr.score(x_train,y_train)

0.772199162488978

In [51]:
x

Unnamed: 0,Process temperature [K],Tool wear [min],Machine failure,HDF
0,29,0,0,0
1,30,2,0,0
2,28,4,0,0
3,29,6,0,0
4,30,8,0,0
...,...,...,...,...
9995,27,13,0,0
9996,27,16,0,0
9997,29,21,0,0
9998,30,24,0,0


### Predict the output value through a sample data

In [52]:
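## raw (unscaled) input: the model was trained on standardized features, so this value is not meaningful
lr.predict([[29,0,0,0]])

array([350.43867442])

In [53]:
## transform the sample with the fitted scaler before predicting
test1 = scaler.transform([[29,0,0,0]])
test1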



array([[-0.94735989, -1.68108917, -0.18732201, -0.10786004]])

In [54]:
lr.predict(test1)

array([298.32773481])
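Both predictions can be reproduced by hand from the fitted parameters, because a linear model predicts intercept + sum(coef_i * x_i). The sketch below copies `lr.coef_` and `lr.intercept_` from the outputs above (the helper name `predict_by_hand` is hypothetical) and shows why the raw input gives about 350 K while the standardized input gives about 298.3 K.

```python
# Values copied from the lr.coef_ and lr.intercept_ outputs of this notebook
coef = [1.73908941e+00, 6.38122318e-03, -2.85542922e-04, 1.77352563e-01]
intercept = 300.0050815315702


def predict_by_hand(features):
    """Linear model prediction: intercept + dot(coef, features)."""
    return intercept + sum(c * f for c, f in zip(coef, features))


raw = [29, 0, 0, 0]                                            # unscaled sample
scaled = [-0.94735989, -1.68108917, -0.18732201, -0.10786004]  # after scaler.transform

print(predict_by_hand(raw))     # close to lr.predict([[29,0,0,0]])
print(predict_by_hand(scaled))  # close to lr.predict(test1)
```

With the raw input, the first coefficient (about 1.74 per standard deviation of process temperature) is multiplied by 29 instead of by about -0.95, which accounts for the roughly 52 K gap between the two results.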

In [55]:
x_test

array([[ 1.27686934,  1.41513456, -0.18732201, -0.10786004],
       [-0.61035546,  0.18921856, -0.18732201, -0.10786004],
       [ 0.67026137,  1.35226707,  5.33840098, -0.10786004],
       ...,
       [-0.34075192, -0.97382995, -0.18732201, -0.10786004],
       [ 0.46805871,  0.92791153, -0.18732201, -0.10786004],
       [-0.61035546,  0.91219466, -0.18732201, -0.10786004]])

In [56]:
y_test

8018    301.0
9225    298.0
3854    302.4
2029    298.7
3539    302.0
        ...  
3398    301.4
6008    300.7
522     297.5
7066    300.8
2743    299.6
Name: Air temperature [K], Length: 1500, dtype: float64

In [57]:
lr.score(x_test,y_test) ## checking testing accuracy

0.7920272294981816

### Accuracy through Adjusted R-Squared

In [58]:
# Let's create a function to create adjusted R-Squared
def adj_r2(x,y):
    r2 = lr.score(x,y)
    n = x.shape[0]
    p = x.shape[1]
    adjusted_r2 = 1-(1-r2)*(n-1)/(n-p-1)
    return adjusted_r2

In [59]:
adj_r2(x_train,y_train)   ## training accuracy

0.7720918989986845

In [62]:
adj_r2(x_test,y_test)   ## testing accuracy

0.7914707806138959
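The adjusted R-squared values above follow from the formula 1 - (1 - R^2)(n - 1)/(n - p - 1) with p = 4 features, n = 8500 training rows and n = 1500 test rows (the 85/15 split of the 10000 rows). A small standalone sketch, with the R-squared scores copied from the earlier `lr.score` outputs and a hypothetical helper name `adjusted_r2`:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n samples and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# R-squared scores copied from the lr.score(...) outputs of this notebook
train_adj = adjusted_r2(0.772199162488978, n=8500, p=4)
test_adj = adjusted_r2(0.7920272294981816, n=1500, p=4)

print(train_adj, test_adj)
```

Because n is large compared with p, the penalty factor (n - 1)/(n - p - 1) is very close to 1, so the adjusted values differ from the plain R-squared scores only from the fourth decimal place on.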

### Regularization

#### L1 or LASSO Regularization

In [63]:
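## finding the best possible value for alpha (shrinkage factor)
## note: `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2, which is what the warning below is about
lassocv = LassoCV(alphas=None, cv=50, max_iter=200000, normalize=True)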
lassocv.fit(x_train,y_train)

If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Lasso())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * np.sqrt(n_samples). 


In [64]:
lassocv.alpha_   

8.76480384082038e-05

In [65]:
lasso = Lasso(alpha=lassocv.alpha_)
lasso.fit(x_train,y_train)

In [66]:
lasso.score(x_train,y_train)

0.7721991482019338

In [67]:
lasso.score(x_test,y_test)

0.7920274996516029
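On recent scikit-learn releases the `normalize` argument no longer exists, so the same search has to run on features that are already standardized (or inside a pipeline, as the warning suggests). The following is a hedged sketch on synthetic data, not a rerun of this notebook: the data, the seed and the smaller `cv=5` are assumptions chosen to keep it quick.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem with 4 features; the last one is irrelevant
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
y = 300 + 1.7 * X[:, 0] + 0.2 * X[:, 1] - 0.1 * X[:, 2] + rng.normal(scale=0.5, size=400)

X_scaled = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, test_size=0.15, random_state=100)

# Cross-validated search for alpha, without the removed `normalize` argument
lasso_cv = LassoCV(cv=5, max_iter=200000).fit(X_tr, y_tr)

# Refit a plain Lasso with the selected alpha, as in the cells above
lasso = Lasso(alpha=lasso_cv.alpha_).fit(X_tr, y_tr)
print(lasso_cv.alpha_, lasso.score(X_tr, y_tr), lasso.score(X_te, y_te))
```

Since the alpha grid then refers to the standardized features directly, the selected value is not comparable with the one obtained above using `normalize=True`.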

#### L2 or Ridge Regularization

In [68]:
np.random.uniform(0,10,50)

array([9.06775743, 0.40513287, 6.6268473 , 1.61317551, 6.71637237,
       1.40327054, 3.86565237, 1.44095174, 8.11837403, 6.62219521,
       9.77423572, 1.43038345, 2.42870149, 5.08259205, 8.36659685,
       5.12842038, 6.47208548, 8.7226375 , 3.3669417 , 4.52694855,
       1.42886907, 6.4079527 , 6.08510007, 4.50801447, 2.49285563,
       5.464525  , 0.81857442, 3.14890498, 1.13393511, 3.68248351,
       9.42083803, 5.5546267 , 3.36947217, 6.76420057, 5.41880374,
       3.40151893, 4.94459462, 3.40993056, 8.61573561, 0.48573388,
       5.80098624, 9.48502304, 7.43828143, 9.39087936, 3.25138889,
       0.62100849, 9.31339259, 3.19120998, 0.97536293, 3.90465425])

In [69]:
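## finding the best possible value for alpha (shrinkage factor)
## note: this call draws a fresh set of 50 random alphas, not the array displayed above;
## `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2 (hence the warning below)
ridgecv = RidgeCV(alphas=np.random.uniform(0,10,50), cv=10, normalize=True)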
ridgecv.fit(x_train,y_train)

If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * n_samples. 
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alp

If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * n_samples. 
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alp

If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * n_samples. 
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alp

If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * n_samples. 
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alp

If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * n_samples. 
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alp

If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * n_samples. 
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alp

If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * n_samples. 
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alp

If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * n_samples. 
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

In [70]:
ridgecv.alpha_

0.1830685652925168

In [71]:
ridge_lr = Ridge(alpha=ridgecv.alpha_)
ridge_lr.fit(x_train,y_train)

In [72]:
ridge_lr.score(x_train,y_train)

0.7721991621290183

In [73]:
ridge_lr.score(x_test,y_test)

0.7920268656973687
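
The warning text above recommends moving scaling into a pipeline rather than relying on the deprecated `normalize` argument. A minimal sketch of that pattern follows. It runs on synthetic data from `make_regression` (an assumption; the notebook's `x_train` and `y_train` are not used here), and the alpha grid and `cv=10` are illustrative choices rather than the notebook's settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's features and label (assumption).
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative alpha grid (assumption).
alphas = np.logspace(-3, 1, 20)

# The scaler is fit on the training split only; RidgeCV then picks alpha
# by cross-validation on the scaled training data.
ridge_pipe = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas, cv=10))
ridge_pipe.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name.
best_alpha = ridge_pipe.named_steps["ridgecv"].alpha_
test_r2 = ridge_pipe.score(X_test, y_test)
print(best_alpha, test_r2)
```

Because the scaler lives inside the pipeline, calling `ridge_pipe.predict` on new rows applies the same scaling automatically.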

#### ElasticNet

In [74]:
# Find the best value of alpha (the regularization strength) by cross-validation
elastic = ElasticNetCV(alphas=None, cv=50)
elastic.fit(x_train, y_train)

In [75]:
elastic.alpha_

0.0034662339997097408

In [76]:
elastic.l1_ratio_  # l1_ratio is the mixing parameter between the L1 (LASSO) and L2 (Ridge) penalties; 1.0 is pure LASSO

0.5
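
The value `0.5` is the default `l1_ratio` of `ElasticNetCV`. When a single value is given, only alpha is cross-validated. Passing a list of candidates makes the search cover `l1_ratio` as well. The sketch below shows this on synthetic data; the candidate list, `cv=5`, and the data are assumptions for illustration, not the notebook's settings.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Synthetic regression data (assumption; not the AI4I dataset).
X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=1)

# Candidate mixing values; 1.0 corresponds to a pure L1 (LASSO) penalty.
l1_candidates = [0.1, 0.5, 0.7, 0.9, 1.0]

# With a list, ElasticNetCV cross-validates over l1_ratio and alpha together.
enet_cv = ElasticNetCV(l1_ratio=l1_candidates, cv=5)
enet_cv.fit(X, y)

print(enet_cv.l1_ratio_, enet_cv.alpha_)
```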

In [77]:
elastic_lr = ElasticNet(alpha=elastic.alpha_ , l1_ratio=elastic.l1_ratio_)

In [78]:
elastic_lr.fit(x_train,y_train)

In [79]:
elastic_lr.score(x_train,y_train)

0.7721919069096735

In [80]:
elastic_lr.score(x_test,y_test)

0.7919877589948303
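
The values returned by `score` above (about 0.772 on the training set and 0.792 on the test set) are the coefficient of determination, R² = 1 - SS_res / SS_tot. As a sanity check, the sketch below computes R² by hand and compares it with `r2_score` and `score`. The data and the `alpha` value are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score

# Synthetic regression data (assumption; not the AI4I dataset).
X, y = make_regression(n_samples=200, n_features=4, noise=15.0, random_state=2)

model = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
manual_r2 = 1.0 - ss_res / ss_tot

print(manual_r2, r2_score(y, y_pred), model.score(X, y))
```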

### Save the Model
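
A model written with `pickle.dump` can be restored with `pickle.load` and used for prediction right away. The sketch below shows the round trip on a small synthetic `Ridge` model in a temporary directory. The file name `demo_model.pickle` and the data are hypothetical and are not among the notebook's files.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Ridge

# Small synthetic problem (assumption; not the AI4I dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
original = Ridge(alpha=0.1).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pickle")  # hypothetical file name
    # Write the fitted estimator; the with-block closes the file afterwards.
    with open(path, "wb") as f:
        pickle.dump(original, f)
    # Read it back into a new object.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same_predictions = np.allclose(original.predict(X), restored.predict(X))
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.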

In [81]:
# Save each fitted model; the with-block closes the file once it is written
models_to_save = {'lr': lr, 'lasso': lasso, 'ridge_lr': ridge_lr, 'elastic_lr': elastic_lr}
for name, fitted_model in models_to_save.items():
    with open(f'ai4i2020_maintainance__{name}_model.pickle', 'wb') as f:
        pickle.dump(fitted_model, f)
%ls

 Volume in drive C is OS
 Volume Serial Number is 5CFF-620A

 Directory of C:\Users\tarak\Downloads\AI4I 2020 Predictive Maintenance Dataset__Linear Regression

11/11/2022  09:19 PM    <DIR>          .
11/11/2022  09:19 PM    <DIR>          ..
11/11/2022  09:17 PM    <DIR>          .ipynb_checkpoints
11/11/2022  09:19 PM         4,166,986 AI4I 2020 Predictive Maintenance Dataset__Linear Regression.ipynb
11/05/2022  10:42 PM           522,048 ai4i2020.csv
11/11/2022  09:21 PM               600 ai4i2020_maintainance__elastic_lr_model.pickle
11/11/2022  09:21 PM               585 ai4i2020_maintainance__lasso_model.pickle
11/11/2022  09:21 PM               492 ai4i2020_maintainance__lr_model.pickle
11/11/2022  09:21 PM               499 ai4i2020_maintainance__ridge_lr_model.pickle
11/05/2022  10:42 PM           220,858 Predictive Maintenance Dataset - homepage.png
11/11/2022  09:19 PM         3,087,853 test_hw.html
11/11/2022  09:20 PM           759,924 test1_hw.html
               9 File(