<h2>Linear Regression with the Iris dataset</h2>

<h4>Loading the dataset</h4>

In [1]:
import pandas as pd

iris_df = pd.read_csv("Iris.csv", delimiter=",")
print(iris_df)

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  \
0      1            5.1           3.5            1.4           0.2   
1      2            4.9           3.0            1.4           0.2   
2      3            4.7           3.2            1.3           0.2   
3      4            4.6           3.1            1.5           0.2   
4      5            5.0           3.6            1.4           0.2   
..   ...            ...           ...            ...           ...   
145  146            6.7           3.0            5.2           2.3   
146  147            6.3           2.5            5.0           1.9   
147  148            6.5           3.0            5.2           2.0   
148  149            6.2           3.4            5.4           2.3   
149  150            5.9           3.0            5.1           1.8   

            Species  
0       Iris-setosa  
1       Iris-setosa  
2       Iris-setosa  
3       Iris-setosa  
4       Iris-setosa  
..              ...  
145  

<h4>Checking for NaN values</h4>

In [2]:
print(iris_df.isnull().sum())

Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64


<h4>Editing the dataset</h4>

In [3]:
X = iris_df.iloc[:, 1:-1].values  # features: every column except Id and Species
y = iris_df.iloc[:, -1].values    # target: the Species column only

<h4>Encoding the dependent categorical column</h4>

In [4]:
from sklearn.preprocessing import LabelEncoder
import numpy as np

le = LabelEncoder()
y = le.fit_transform(y)  # already returns a NumPy array of integer class codes
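
To make the encoding step concrete, here is a minimal pure-Python sketch of the mapping that `LabelEncoder` produces: classes are sorted and numbered from 0. The helper `encode_labels` and the four-item `labels` list are hypothetical names introduced for illustration only; the notebook itself uses scikit-learn.

```python
def encode_labels(labels):
    """Map each distinct label to an integer code, in sorted order."""
    classes = sorted(set(labels))                      # sorted unique class names
    index = {name: code for code, name in enumerate(classes)}
    codes = [index[name] for name in labels]           # one integer per sample
    return codes, classes


# Hypothetical mini-sample using the three Iris species names.
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]
codes, classes = encode_labels(labels)
print(codes)    # integer codes, one per label
print(classes)  # position in this list is the code for that class
```

With scikit-learn, the equivalent lookup is available as `le.classes_` after fitting.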

In [5]:
print(X)

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.1 1.5 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 4.

In [6]:
print(y)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]


<h4>Splitting data into training & testing sets</h4>

In [7]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=23)

<h4>Performing feature scaling</h4>

In [8]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
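
As a rough illustration of what the scaler does to a single feature, the sketch below standardizes one column by hand: the mean and (population) standard deviation are estimated from the training values only and then reused for the test values. The function `standardize_column` is a hypothetical helper, and the tiny `train_col` / `test_col` split is an assumption made up from the first few sepal lengths, not the split used above.

```python
from statistics import fmean, pstdev


def standardize_column(train_values, other_values):
    """Scale both lists with the mean and std computed on train_values only."""
    mu = fmean(train_values)
    sigma = pstdev(train_values)  # population std (divides by n)

    def scale(value):
        return (value - mu) / sigma

    return [scale(v) for v in train_values], [scale(v) for v in other_values]


# Hypothetical split: first five sepal lengths as "train", the sixth as "test".
train_col = [5.1, 4.9, 4.7, 4.6, 5.0]
test_col = [5.4]
train_scaled, test_scaled = standardize_column(train_col, test_col)

print(fmean(train_scaled), pstdev(train_scaled))  # approximately 0 and 1
print(test_scaled)  # scaled with the training statistics, so not centered itself
```

Reusing the training statistics for the test set is why the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.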

In [9]:
print(X_train)

[[-1.12609418e+00  1.57273591e-01 -1.31034227e+00 -1.48404494e+00]
 [-7.63487823e-01 -7.41432642e-01  5.79629287e-02  2.39789838e-01]
 [-5.21750253e-01  1.95468606e+00 -1.42436771e+00 -1.08623692e+00]
 [ 2.25823179e+00 -5.16756084e-01  1.65431900e+00  1.03540589e+00]
 [-1.00522539e+00  8.31303265e-01 -1.31034227e+00 -1.35144227e+00]
 [-5.21750253e-01  1.95468606e+00 -1.19631684e+00 -1.08623692e+00]
 [ 8.07806377e-01 -5.16756084e-01  4.57051946e-01  3.72392514e-01]
 [-1.00522539e+00 -1.64013887e+00 -2.84113372e-01 -2.90620864e-01]
 [-1.59143900e-01 -6.74029674e-02  2.29001079e-01 -2.54155128e-02]
 [ 5.66068808e-01  6.06626707e-01  1.25522998e+00  1.69841927e+00]
 [ 1.29128152e+00  3.81950149e-01  1.08419183e+00  1.43321392e+00]
 [-1.24696296e+00  8.31303265e-01 -1.08229141e+00 -1.35144227e+00]
 [-4.00881469e-01 -1.41546232e+00 -5.60625048e-02 -2.90620864e-01]
 [-1.48870053e+00  3.81950149e-01 -1.36735499e+00 -1.35144227e+00]
 [ 5.66068808e-01 -1.64013887e+00  3.43026512e-01  1.07187163e

In [10]:
print(X_test)

[[ 1.29128152  0.15727359  0.91315368  1.16800857]
 [ 1.65388787  0.38195015  1.25522998  0.77020054]
 [-0.03827512 -0.74143264  0.17198836 -0.29062086]
 [-1.48870053  0.15727359 -1.31034227 -1.35144227]
 [-0.1591439  -1.19078576  0.68510281  1.03540589]
 [ 1.29128152  0.15727359  0.6280901   0.37239251]
 [-0.1591439   1.7300095  -1.19631684 -1.21883959]
 [ 0.68693759  0.15727359  0.9701664   0.77020054]
 [-0.52175025  0.83130327 -1.19631684 -1.35144227]
 [-0.28001268 -0.29207953 -0.11307522  0.10718716]
 [ 0.32433124 -0.51675608  0.51406466 -0.02541551]
 [-1.24696296  0.15727359 -1.25332956 -1.35144227]
 [ 1.53301908 -0.06740297  1.19821726  1.16800857]
 [-0.88435661  1.7300095  -1.31034227 -1.21883959]
 [-0.52175025  1.50533294 -1.31034227 -1.35144227]
 [ 0.56606881 -0.51675608  0.74211553  0.37239251]
 [ 0.68693759  0.38195015  0.40003923  0.37239251]
 [ 0.32433124 -0.29207953  0.51406466  0.23978984]
 [ 2.25823179 -0.06740297  1.3122427   1.43321392]
 [-0.76348782  0.83130327 -1.36

<h3>Regression models</h3>

Regression models (both linear and non-linear) predict a real-valued quantity, e.g. salary. If the independent variable is time, the model forecasts future values; otherwise it predicts present but unknown values.

<h4>Simple Linear Regression</h4>

$$Y = b_0 + b_1 X_1$$
- $Y$: dependent variable;
- $b_0$: y-intercept (constant);
- $b_1$: slope coefficient;
- $X_1$: independent variable.
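
The formula above can be sketched in code. This is a minimal example, assuming ordinary least squares as the fitting criterion (the text does not name one), with $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$. The function name and the toy data are hypothetical and chosen so that the points lie exactly on $Y = 1 + 2 X_1$.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1) minimizing the squared error of y ~ b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx              # slope coefficient
    b0 = mean_y - b1 * mean_x     # y-intercept
    return b0, b1


# Hypothetical toy data generated from Y = 1 + 2 * X_1 with no noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)  # 1.0 2.0
```

In practice `sklearn.linear_model.LinearRegression` would fit the same kind of model; the closed form is shown here only to tie the symbols $b_0$ and $b_1$ to a computation.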