<div style="color:#D81F26;
           display:fill;
           border-radius:30px;
           border-style: solid;
           border-color:#C1C1C1;
           background-color:#E0E0E0;
           font-size:30px;
           font-family:Verdana;
           letter-spacing:0.5px">
<h1 style="text-align: center;
           padding: 15px;
           color:#D81F26;">
<b>Linear Regression → Closed Form</b>
</h1>
</div>

<div style="color: #D81F26;
           display:fill;
           border-radius:0px;
           border-style: solid;
           border-color:#C1C1C1;
           background-color:#E0E0E0 ;
           font-size:30px;
           font-family:Verdana;
           letter-spacing:0.5px;">
<h1 style="text-align: center;
           padding: 15px;
           color:#D81F26;">
Normal Equation in Python: The Closed-Form Solution for Linear Regression
</h1> 

<hr> 
<center><img src="https://i.ibb.co/tbVDC8D/1.png" style="text-align:center" alt="1" border="0"></center>

<p style="color:black;">We will implement the Normal Equation, the closed-form solution for Linear Regression. It finds the optimal value of theta in a single step instead of iterating with the Gradient Descent algorithm.</p>

<b>Gradient Descent</b>

<p style="color:black;">We have:</p>

<p style="color:black;">1. <b>X</b> → Input data (training data)</p>

<p style="color:black;">2. <b>y</b> → Target variable</p>

<p style="color:black;">3. <b>theta</b> → The model parameters (weights)</p>

<p style="color:black;">4. <b>y_hat</b> → Prediction/hypothesis (the dot product of X and theta)</p>
    
<center><img src="https://i.ibb.co/9YpryHD/y-hat.png" alt="y-hat" border="0"></center>
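
<p style="color:black;">As a minimal sketch with made-up toy values, the prediction is a single matrix product once a column of ones is prepended to X so that theta[0] acts as the bias (intercept). This is the same np.hstack trick the notebook uses later on the real data.</p>

```python
import numpy as np

# Hypothetical toy data: m = 3 examples, n = 1 feature
x = np.array([[1.0], [2.0], [3.0]])
# Prepend a column of ones so theta[0] acts as the bias (intercept) term
X = np.hstack([np.ones((x.shape[0], 1)), x])  # shape (m, n + 1)

theta = np.array([[0.5], [2.0]])  # shape (n + 1, 1): [bias, weight]

# y_hat = X . theta gives one prediction per row of X
y_hat = X @ theta  # shape (m, 1)
print(y_hat.ravel())  # [2.5 4.5 6.5]
```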

<b>Loss Function:</b>
    
<p style="color:black;">Mean squared error (MSE) loss, which averages the squared error (y_hat - y)² over the training examples:</p>
    
<center><img src="https://i.ibb.co/S7Z9nsH/MSE.png" alt="MSE" border="0"></center>
    
<p style="color:black;"><b>m</b> → the number of training examples.</p>

<p style="color:black;"><b>n</b> → the number of features.</p>
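
<p style="color:black;">A small vectorized sketch of the loss, assuming the common convention J(theta) = (1/2m) Σ (y_hat - y)². The image above may use a slightly different constant; a positive constant in front does not change the optimal theta. The numbers are toy values for illustration.</p>

```python
import numpy as np

def mse_loss(y_hat, y):
    """MSE with the common 1/2 factor: J = (1 / 2m) * sum((y_hat - y)^2)."""
    m = y.shape[0]  # number of training examples
    return np.sum((y_hat - y) ** 2) / (2 * m)

y = np.array([[3.0], [5.0], [7.0]])
y_hat = np.array([[2.5], [4.5], [6.5]])
print(mse_loss(y_hat, y))  # 0.125
```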

<b>Gradient Descent Algorithm</b>

<center><img src="https://i.ibb.co/phyDWgv/Gradient-Descent-Algorithm.png" alt="Gradient Descent Algorithm" border="0"></center>

 
    
<b>First, we initialize the parameter theta randomly or with all zeros. Then:</b>

<p style="color:black;">1. Calculate the prediction/hypothesis y_hat using the equation above.</p>
<p style="color:black;">2. Use y_hat to calculate the MSE loss, which is built from the squared error (y_hat - y)².</p>
<p style="color:black;">3. Take the partial derivative (gradient) of the MSE loss with respect to theta.</p>
<p style="color:black;">4. Use this gradient to update theta: theta := theta - lr * gradient, where lr is the learning rate.</p>
<p style="color:black;">5. Repeat steps 1 to 4 until the loss stops decreasing (convergence), at which point theta is (approximately) optimal.</p>
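
<p style="color:black;">The five steps above can be sketched as a short batch Gradient Descent loop. This assumes the 1/(2m)-scaled MSE loss, whose gradient is X<sup>T</sup>(X theta - y)/m, and replaces the convergence test in step 5 with a fixed number of iterations. The values of lr and n_iters are illustrative, and the data is a hypothetical toy line y = 1 + 2x.</p>

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for J = (1 / 2m) * ||X theta - y||^2.

    X: (m, n + 1) design matrix that includes a bias column of ones.
    y: (m, 1) column vector of targets.
    """
    m, n_params = X.shape
    theta = np.zeros((n_params, 1))      # initialize with all zeros
    for _ in range(n_iters):             # step 5: fixed number of repeats (assumption)
        y_hat = X @ theta                # step 1: predictions
        error = y_hat - y                # step 2: residuals inside the MSE loss
        gradient = X.T @ error / m       # step 3: dJ/dtheta
        theta = theta - lr * gradient    # step 4: parameter update
    return theta

# Hypothetical toy data on the line y = 1 + 2x
x = np.array([[0.0], [1.0], [2.0], [3.0]])
X = np.hstack([np.ones((x.shape[0], 1)), x])
y = 1.0 + 2.0 * x

theta = gradient_descent(X, y, lr=0.1, n_iters=5000)
print(theta.ravel())  # approaches [1, 2]
```

Note how many iterations are needed to get close to the answer; the Normal Equation below reaches the same parameters in one step.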

<b>Normal Equation</b>
    
<p style="color:black;">Gradient Descent is an iterative algorithm: it takes many steps to reach the global optimum (the optimal parameters). For the special case of Linear Regression, however, we can solve for the optimal theta directly and jump to the global optimum in one step, without any iterations. This method is called the Normal Equation. It relies on the loss being a quadratic function of theta, so it applies to linear least-squares models such as Linear Regression; most other models, for example logistic regression, have no such closed-form solution.</p>
    
<p style="color:black;">The Normal Equation is the closed-form solution for Linear Regression: we obtain the optimal parameters from a single formula built from a few matrix multiplications and one matrix inversion.</p>
    
<p style="color:black;">To find theta, we take the partial derivative (gradient) of the MSE loss function (equation 2) with respect to theta, set it equal to zero, and then use a little linear algebra to solve for theta.</p>
    
<b>This is the Normal Equation: </b>
<center><img src="https://i.ibb.co/JcT9zd9/psedo.png" alt="Normal Equation" border="0"></center>
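
<p style="color:black;">A minimal NumPy sketch of the equation above, on hypothetical toy data lying exactly on y = 1 + 2x, so the recovered theta should be about [1, 2]. It also shows np.linalg.solve, which solves X<sup>T</sup>X theta = X<sup>T</sup>y without forming the inverse explicitly and is usually the more numerically stable choice.</p>

```python
import numpy as np

# Hypothetical toy data on the line y = 1 + 2x
x = np.array([[0.0], [1.0], [2.0], [3.0]])
X = np.hstack([np.ones((x.shape[0], 1)), x])  # bias column + feature, shape (4, 2)
y = 1.0 + 2.0 * x                               # shape (4, 1)

# Normal Equation: theta = (X^T X)^{-1} X^T y, computed in one step
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Same linear system solved without an explicit inverse
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(theta_inv.ravel(), theta_solve.ravel())
```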
    
<p style="color:black;">If you are familiar with matrix derivatives and a few properties of matrix transposes, you should be able to derive the Normal Equation yourself.</p>
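
<p style="color:black;">A sketch of that derivation, assuming the loss is written as J(theta) = (1/2m)(X theta - y)<sup>T</sup>(X theta - y); any positive constant in front leaves the minimizer unchanged. The last step requires X<sup>T</sup>X to be invertible.</p>

$$
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,(X\theta - y)^\top (X\theta - y) \\
\nabla_\theta J(\theta) &= \frac{1}{m}\,X^\top (X\theta - y) = 0 \\
X^\top X\,\theta &= X^\top y \\
\theta &= (X^\top X)^{-1} X^\top y
\end{aligned}
$$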
    
<p style="color:black;">You might wonder what happens if X<sup>T</sup>X is non-invertible (singular). This usually happens when you have redundant features, i.e. your features are linearly dependent, for example because the same feature appears twice, or when there are fewer training examples than parameters (m &lt; n + 1). You can find and remove the redundant features, or you can use the np.linalg.pinv function (the Moore-Penrose pseudo-inverse) in NumPy in place of the inverse, which still gives a valid least-squares solution.</p>
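
<p style="color:black;">A sketch of the pseudo-inverse fallback on a hypothetical design matrix where the same feature column appears twice, so X<sup>T</sup>X is singular. The explicit rcond and tol values are assumptions chosen so that floating-point noise in the zero singular value is treated as zero.</p>

```python
import numpy as np

x = np.array([[0.0], [1.0], [2.0], [3.0]])
# Hypothetical redundant design: the feature column is repeated twice
X = np.hstack([np.ones((4, 1)), x, x])   # shape (4, 3), but only rank 2
y = 1.0 + 2.0 * x

print(np.linalg.matrix_rank(X, tol=1e-10))  # 2, so X^T X (3 x 3) is singular

# Replace the inverse with the Moore-Penrose pseudo-inverse
theta = np.linalg.pinv(X.T @ X, rcond=1e-10) @ X.T @ y
print(theta.ravel())             # the weight 2 is split evenly: about [1, 1, 1]
print(np.allclose(X @ theta, y))  # True: the predictions are still exact
```

The pseudo-inverse picks the minimum-norm solution among all theta that fit equally well, which is why the weight of the duplicated feature is shared between its two copies.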

<b>The Algorithm</b>
<p style="color:black;">1.Calculate theta using the Normal Equation.</p>
<p style="color:black;">2.Use the theta to make predictions.</p>
    
<p style="color:black;">Check the shapes of X (m × (n + 1), including the column of ones for the bias) and y (m × 1) so that the matrix products in the equation line up.</p>
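
<p style="color:black;">The two steps, with the shape checks made explicit, might look like the sketch below. fit_normal_equation and predict are hypothetical helper names (the notebook itself uses find_theta and find_h_theta), and the two-feature data is made up so that the targets are exactly 1 + 2·x1 - 3·x2.</p>

```python
import numpy as np

def fit_normal_equation(X, y):
    """Step 1: theta = (X^T X)^{-1} X^T y.

    X: (m, n + 1) design matrix whose first column is all ones.
    y: (m, 1) column vector of targets.
    """
    m, n_params = X.shape
    assert y.shape == (m, 1), f"expected y with shape ({m}, 1), got {y.shape}"
    theta = np.linalg.inv(X.T @ X) @ X.T @ y
    assert theta.shape == (n_params, 1)  # one parameter per column of X
    return theta

def predict(X, theta):
    """Step 2: y_hat = X theta, shape (m, 1)."""
    return X @ theta

# Hypothetical data with two features
features = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [4.0, 3.0]])
X = np.hstack([np.ones((features.shape[0], 1)), features])               # shape (5, 3)
y = (1.0 + 2.0 * features[:, 0] - 3.0 * features[:, 1]).reshape(-1, 1)  # shape (5, 1)

theta = fit_normal_equation(X, y)
y_hat = predict(X, theta)
print(theta.ravel())  # about [1, 2, -3]
```

Passing y as a 1-D array of shape (m,) trips the first assertion; catching that early avoids silently getting a 1-D theta whose shape no longer matches the (m, 1) convention used in this notebook.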
</div>

# 📤 Import Libraries

```python
import numpy as np
import pandas as pd
```

```python
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.linear_model import LinearRegression

%matplotlib inline
```

# 💾 Check out the Data

```python
train = pd.read_csv("../input/hw1-pattern-shirazu/Data-Train.csv")
test = pd.read_csv("../input/hw1-pattern-shirazu/Data-Test.csv")
```

```python
train.corr()
```

```python
test.corr()
```

```python
sns.heatmap(train.corr(), annot=True, cmap='Reds')
```

# 📊 Exploratory Data Analysis (EDA)

```python
sns.pairplot(train)
```

```python
sns.pairplot(test)
```

# 🧱 **X and y arrays**

**Train**

```python
X_train_raw = train.drop('y', axis=1)
y_train_raw = train['y']
y_train_raw = y_train_raw.to_frame()
```

```python
X_train_raw
```

```python
y_train_raw
```

```python
print('The size of the dataset is:', train.shape)
print('The size of X is:', X_train_raw.shape)
print('The size of y is:', y_train_raw.shape)
```

**Test**

```python
X_test_raw = test.drop('y', axis=1)
y_test_raw = test['y']
y_test_raw = y_test_raw.to_frame()
```

```python
X_test_raw
```

```python
y_test_raw
```

# Normalized Data

```python
def norm(X):
    """Min-max scale a pandas Series to [0, 1] and return it as an (m, 1) column vector."""
    X_min = X.min()
    X_max = X.max()
    # Vectorized form of (X[i] - X_min) / (X_max - X_min) for every row
    X_norm = (X - X_min) / (X_max - X_min)
    return X_norm.to_numpy().reshape(-1, 1)
```



**Train**

```python
X_train_norm = norm(X_train_raw['x'])
X_train = X_train_norm
# Prepend a column of ones so theta[0] acts as the bias (intercept) term
X_train = np.hstack([np.ones([X_train.shape[0], 1]), X_train])
X_train.shape
```

```python
y_train_norm = norm(y_train_raw['y'])
y_train = y_train_norm
y_train.shape
```

**Test**

```python
X_test_norm = norm(X_test_raw['x'])
X_test = X_test_norm
# Prepend a column of ones so theta[0] acts as the bias (intercept) term
X_test = np.hstack([np.ones([X_test.shape[0], 1]), X_test])
X_test.shape
```

```python
y_test_norm = norm(y_test_raw['y'])
y_test = y_test_norm
y_test.shape
```
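
The cells above scale the test split with its own minimum and maximum. A common alternative, sketched below under the assumption that train and test should share one scale, reuses the training statistics for the test data. minmax_fit and minmax_apply are hypothetical helper names, and the Series values are made up; with this approach, scaled test values can fall outside [0, 1].

```python
import numpy as np
import pandas as pd

def minmax_fit(train_col):
    """Return the (min, max) statistics of a training column."""
    return train_col.min(), train_col.max()

def minmax_apply(col, col_min, col_max):
    """Scale a column with the given statistics and return an (m, 1) array."""
    return ((col - col_min) / (col_max - col_min)).to_numpy().reshape(-1, 1)

# Hypothetical values standing in for train['x'] and test['x']
train_x = pd.Series([0.0, 5.0, 10.0])
test_x = pd.Series([2.0, 12.0])

x_min, x_max = minmax_fit(train_x)
test_scaled = minmax_apply(test_x, x_min, x_max)
print(test_scaled.ravel())  # [0.2 1.2]
```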

# 📈 **Training a Linear Regression Model**

✔️ **Linear Regression**

**Functions**

```python
def find_theta(x, y):
    """Normal Equation: theta = (X^T X)^{-1} X^T y."""
    theta = np.dot(np.linalg.inv(np.dot(x.T, x)), np.dot(x.T, y))
    return theta
```
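
One way to sanity-check a Normal Equation implementation like find_theta is to compare it with scikit-learn's LinearRegression, which this notebook imports but never calls. The sketch below uses synthetic data (an assumption; it does not read the homework CSVs). LinearRegression fits the intercept itself, so it receives the feature without the column of ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(50, 1))                    # synthetic feature
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=(50, 1))     # noisy line

# Normal Equation with an explicit bias column
X = np.hstack([np.ones((x.shape[0], 1)), x])
theta = np.linalg.inv(X.T @ X) @ X.T @ y

# scikit-learn least squares with a fitted intercept
model = LinearRegression().fit(x, y)
print(theta.ravel())
print(model.intercept_, model.coef_.ravel())
```

Both solve the same least-squares problem, so the intercept should match theta[0] and the coefficient should match theta[1] up to floating-point error.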

```python
def find_h_theta(x, theta):
    """Hypothesis / predictions: h_theta = X theta."""
    h_theta = np.dot(x, theta)
    return h_theta
```

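```python
def J_theta(x, y, h_theta):
    """Cost J(theta) = (1 / 2m) * sum((h_theta - y)^2); x is used only to get m."""
    m = x.shape[0]
    # Divide by m so the cost is a mean and train/test values are comparable
    J = np.sum((h_theta - y) ** 2) / (2 * m)
    return J
```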

**Train**

```python
theta = find_theta(X_train, y_train)
h_theta_train = find_h_theta(X_train, theta)
y_pred_train = h_theta_train
J_train = J_theta(X_train, y_train, h_theta_train)
```

```python
J_train
```

**Test**

```python
h_theta_test = find_h_theta(X_test, theta)
y_pred_test = h_theta_test
J_test = J_theta(X_test, y_test, h_theta_test)
```

```python
J_test
```

# **Plots**

```python
plt.plot(X_train_norm, y_train, "bo")
plt.plot(X_train_norm, y_pred_train, "r-")
plt.title("Train Data")
plt.xlabel("X")
plt.ylabel("y")
plt.show()
```

```python
plt.plot(X_test_norm, y_test, "bo")
plt.plot(X_test_norm, y_pred_test, "r-")
plt.title("Test Data")
plt.xlabel("X")
plt.ylabel("y")
plt.show()
```
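
The notebook also imports sklearn.metrics without using it. A possible numeric summary to go with the plots is sketched below on made-up arrays; in the notebook you would pass y_test and y_pred_test instead. Note that mean_squared_error has no 1/2 factor, so it is twice the value J_theta reports for the same predictions.

```python
import numpy as np
from sklearn import metrics

# Made-up targets and predictions standing in for y_test and y_pred_test
y_true = np.array([[3.0], [5.0], [7.0], [9.0]])
y_pred = np.array([[2.5], [5.0], [7.5], [9.0]])

mse = metrics.mean_squared_error(y_true, y_pred)  # mean of squared errors
r2 = metrics.r2_score(y_true, y_pred)             # 1 - SS_res / SS_tot
print(mse, r2)  # 0.125 0.975
```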