### **Uploading the data to Google Colab**

In [1]:
from google.colab import files
uploaded = files.upload()  # opens a file picker; the chosen file (Boston.csv) is saved to the working directory

Saving Boston.csv to Boston.csv


### **1. Importing the required packages**

In [2]:
import pandas as pd
import numpy as np

# scikit-learn: model, train/test split, and evaluation metric
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

### **2. Reading and Exploring the data**

In [3]:
data = pd.read_csv('Boston.csv')

In [4]:
data.head()

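|   | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.9 | 5.33 | 36.2 |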


In [5]:
data.shape

(506, 14)

#### We will perform the following steps in data exploration:

1. Check for null values.
2. Check for duplicate rows.
3. Check the data type of each column.
4. Detect outliers in the data.
5. Create the necessary visualizations.
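The checks in steps 1 to 4 map onto a few pandas calls. The sketch below is one possible way to run them; the helper name `explore`, the 1.5 × IQR outlier rule, and the small made-up frame standing in for `Boston.csv` are all assumptions, and step 5 (visualizations) is left out.

```python
import numpy as np
import pandas as pd


def explore(df: pd.DataFrame) -> dict:
    """Hypothetical helper: run exploration steps 1-4 on a DataFrame."""
    numeric = df.select_dtypes(include="number")

    # Step 4: flag values outside 1.5 * IQR of each numeric column
    # (the IQR rule is one common convention, assumed here).
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    return {
        "nulls": df.isnull().sum(),                # step 1: null count per column
        "duplicates": int(df.duplicated().sum()),  # step 2: fully duplicated rows
        "dtypes": df.dtypes,                       # step 3: data type of each column
        "outliers": outside.sum(),                 # step 4: outlier count per column
    }


# Made-up frame standing in for Boston.csv (rows 0 and 4 are duplicates)
toy = pd.DataFrame({
    "rm": [6.5, 6.4, 7.2, 7.0, 6.5, 15.0],
    "medv": [24.0, 21.6, 34.7, 33.4, 24.0, np.nan],
})
report = explore(toy)
print(report["nulls"])
print(report["outliers"])
```

On the real data, `explore(data)` would be called in place of `explore(toy)`.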

### **Machine Learning Process**

#### Steps in the machine learning process

1. Create the X (input) and y (output) variables.
2. Split the dataset into training and testing data.
3. Standardize (scale) the data.
4. Apply the algorithm to the training data, i.e. train the ML model.
5. Check the model's performance on the testing data.
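As an aside, steps 3 and 4 can be bundled with scikit-learn's `make_pipeline`, which is a different technique from the separate cells this notebook uses. The sketch assumes synthetic data from `make_regression` (506 rows, 13 features) in place of `Boston.csv`, and a fixed `random_state` for reproducibility.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. X and y: synthetic stand-in with the same number of rows and features
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# 2. Train/test split (random_state fixed so the split is reproducible)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 3 + 4. The pipeline fits the scaler on X_train only,
# then fits LinearRegression on the scaled training data
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# 5. Evaluate: predict() applies the already-fitted scaler to X_test first
score = r2_score(y_test, model.predict(X_test))
print(f"R^2 on test data: {score:.3f}")
```

Bundling the scaler with the model makes it hard to accidentally fit the scaler on test data.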

In [6]:
#1. Creating X and y variables
X = data.drop(columns = 'medv')   # all 13 input (feature) columns
y = data['medv']  # output (target) column: median home value

In [7]:
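#2. Splitting the data into training and testing sets (80% train, 20% test)
# Note: without random_state the split, and every number below, changes on each run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)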
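To see what `test_size = 0.2` does to 506 rows, the sketch below splits a placeholder frame of the same shape (random values, hypothetical column names `x0`..`x12`, not the real data). scikit-learn rounds the test size up, so ceil(0.2 × 506) = 102 rows go to the test set, and rows are shuffled, which is why `y_test` further down starts with index 86, 449, 227, and so on.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder frame with the Boston shape: 13 feature columns + 'medv'
rng = np.random.default_rng(0)
cols = [f"x{i}" for i in range(13)] + ["medv"]
df = pd.DataFrame(rng.normal(size=(506, 14)), columns=cols)

X = df.drop(columns="medv")
y = df["medv"]

# random_state makes the shuffle (and hence later results) reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (404, 13) (102, 13)
```

The rows of `X_train` and `y_train` stay aligned: both keep the same shuffled index.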

In [None]:
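#3. Standardization: rescale each feature to mean 0 and standard deviation 1
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)  # learn each column's mean/std from the training data only
X_test = scaler.transform(X_test)        # apply the training mean/std to the test data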
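The cell above is marked `In [None]`, which in a notebook export usually means it was not run before the later cells. The intercept of about 38.5 reported below is consistent with a fit on unscaled features, since after standardization a least-squares intercept equals the mean of `y_train`. For plain `LinearRegression`, scaling changes the coefficients and intercept but not the predictions, so the R² is unaffected either way. The sketch below checks these facts on made-up data with three features on very different scales (an assumption, not the Boston columns).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Made-up features on very different scales (placeholders, not Boston columns)
means, stds = [0.0, 300.0, 6.0], [1.0, 150.0, 0.7]
X_train = rng.normal(loc=means, scale=stds, size=(80, 3))
X_test = rng.normal(loc=means, scale=stds, size=(20, 3))
y_train = X_train @ np.array([2.0, 0.01, 5.0]) + rng.normal(size=80)

scaler = StandardScaler()
Z_train = scaler.fit_transform(X_train)  # learns mean_ and scale_ from X_train only
Z_test = scaler.transform(X_test)        # reuses the training mean_ and scale_

# fit_transform computes z = (x - mean) / std with the population std (ddof=0)
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(Z_train, manual))  # True

# Plain least squares: scaling changes coef_ and intercept_, not the predictions
raw = LinearRegression().fit(X_train, y_train)
scaled = LinearRegression().fit(Z_train, y_train)
print(np.allclose(raw.predict(X_test), scaled.predict(Z_test)))  # True

# With centered features, the intercept is simply the mean of y_train
print(np.isclose(scaled.intercept_, y_train.mean()))  # True
```

Scaling matters more for regularized models (such as Ridge or Lasso) and distance-based methods than for ordinary least squares.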

### **Applying Linear Regression to the data**

In [8]:
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)  #4. train the model: learn the intercept and coefficients from the training data

In [10]:
lin_reg.intercept_  # intercept (c)

np.float64(38.516849168352984)

In [11]:
lin_reg.coef_  # coefficients m1 to m13, one per input column of X

array([-7.73996104e-02,  4.77940878e-02, -3.53859559e-02,  2.55651282e+00,
       -1.87124089e+01,  3.55708718e+00,  1.31967003e-02, -1.61121650e+00,
        2.82770326e-01, -1.19104608e-02, -9.36879021e-01,  9.94688417e-03,
       -5.36113058e-01])
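Read together, the intercept and coefficients define the fitted equation. Writing $c$ for `lin_reg.intercept_`, $m_1, \dots, m_{13}$ for the entries of `lin_reg.coef_`, and $x_1, \dots, x_{13}$ for the 13 columns of X in order (`crim` through `lstat`):

$$\hat{y} = c + m_1 x_1 + m_2 x_2 + \cdots + m_{13} x_{13}$$

`LinearRegression` chooses $c$ and $m_1, \dots, m_{13}$ by ordinary least squares, i.e. by minimizing the sum of squared errors over the $n$ training rows:

$$\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

For example, $m_5$ (the `nox` coefficient) is about $-18.7$; assuming these coefficients come from unscaled features, a 0.1 increase in `nox` with all other inputs held fixed lowers the predicted `medv` by roughly 1.87.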

In [12]:
y_pred = lin_reg.predict(X_test)  #5. predict medv for every test row
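`predict` simply evaluates the fitted linear equation for each test row. The sketch below confirms this on made-up data with 13 features (an assumption standing in for the Boston X), comparing `predict` with the matrix form `X_test @ coef_ + intercept_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Made-up data: 50 training rows, 13 features (same width as the Boston X)
X_train = rng.normal(size=(50, 13))
y_train = X_train @ rng.normal(size=13) + 3.0 + rng.normal(scale=0.1, size=50)
X_test = rng.normal(size=(10, 13))

lin_reg = LinearRegression().fit(X_train, y_train)

# y_hat = c + m1*x1 + ... + m13*x13, computed for all test rows at once
y_pred = lin_reg.predict(X_test)
y_manual = X_test @ lin_reg.coef_ + lin_reg.intercept_

print(np.allclose(y_pred, y_manual))  # True
```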

In [14]:
y_pred

array([22.42857003, 17.1572248 , 32.78244082, 22.92834916, 16.0438778 ,
       21.23634883, 13.94424548, 17.58661979, 20.26677973, 14.28845897,
       18.12771622, 31.77292515, 32.5014585 , 28.61019568, 20.37770989,
       13.11683632, 32.29213227, 42.95883843, 23.77547752, 31.46428607,
       22.21648373, 33.35072071, 18.19342644, 18.89306734, 16.36284931,
       28.79834975, 22.25142338, 30.29305645, 10.02758818, 20.45365194,
       18.71945783,  8.25524001, 12.55149229, 19.5425157 ,  8.60347958,
       35.33752907, 18.77256599, 21.49758408, 16.80046315, 25.25961875,
       18.37019681, 22.2455315 , 19.94756804, 25.57925106, 13.61480943,
       13.19950313, 20.14161473,  8.43318774, 14.85902699, 25.47764455,
       32.71283577, 24.41331196, 20.26773278, 20.95002502, 33.52572792,
       32.40455689, 36.70905356, 26.65662678, 19.97698316, 16.19596616,
       25.96108178, 22.84686187, 26.37450822, 35.32904726, 34.21305781,
       22.45700262, 35.88144808, 33.4537488 , 27.8964234 , 17.48

In [15]:
y_test

|     | medv |
|-----|------|
| 86  | 22.5 |
| 449 | 13.0 |
| 227 | 31.6 |
| 505 | 11.9 |
| 26  | 16.6 |
| ... | ...  |
| 353 | 30.1 |
| 502 | 20.6 |
| 138 | 13.3 |
| 67  | 22.0 |


In [13]:
#5. compare y_test (actual values) with y_pred (predicted values)
r2_score(y_test, y_pred)

0.7818088883369799
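`r2_score` computes $R^2 = 1 - SS_{res} / SS_{tot}$: one minus the squared prediction error divided by the squared deviation of `y_test` around its own mean. A score of about 0.78 therefore means the model's squared error on the test set is about 22% of what always predicting the test-set mean would give. The sketch below computes this by hand (the helper name `r2_manual` is hypothetical) on the first five `y_test` values and the first five predictions shown above, rounded; five points will not reproduce the full-test-set score.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_manual(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained (residual) variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


# First five actual and predicted values from the outputs above, rounded
y_true = [22.5, 13.0, 31.6, 11.9, 16.6]
y_hat = [22.43, 17.16, 32.78, 22.93, 16.04]

print(r2_manual(y_true, y_hat), r2_score(y_true, y_hat))  # the two values agree
```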