## Importing Libraries

In [0]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from matplotlib import gridspec
from matplotlib import pyplot as plt

## Importing and Processing Data

### Importing and Viewing Data

*Importing Data from csv file*

In [0]:
df=     #Read the file from the dataset

*Printing Data and its Statistics*

In [0]:
df = df.reindex(np.random.permutation(df.index))			#Shuffle
print (df)
print (df.describe())					#Gives summary statistics of the data

In [0]:
X = df.loc[:, df.columns != "quality"]
y = df["quality"]
print (X.columns.values)
print (y)

In [0]:
#Converting pandas DataFrame and Series to NumPy arrays
X = X.to_numpy()
y = y.to_numpy()

### Preprocessing Data

*Preprocessing Data*

In [0]:
def Feature_Normalize(X):
    """
    Normalizes the features in X. returns a normalized version of X.
    """
    #Write your code here
    return X_norm, mu, sigma
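
One common way to fill in the stub above is z-score normalization: subtract each column's mean and divide by its standard deviation. The following is only a minimal sketch under that assumption; the name `feature_normalize_sketch`, the handling of constant columns, and the toy array are illustrative and not part of the original notebook.

```python
import numpy as np

def feature_normalize_sketch(X):
    """Z-score each column of X; returns (X_norm, mu, sigma)."""
    mu = X.mean(axis=0)                       # per-feature mean
    sigma = X.std(axis=0)                     # per-feature standard deviation
    # Assumption: a constant column (sigma == 0) is left unscaled
    # to avoid division by zero.
    sigma = np.where(sigma == 0, 1.0, sigma)
    X_norm = (X - mu) / sigma
    return X_norm, mu, sigma

# Toy values, made up purely for illustration
X_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_norm, mu, sigma = feature_normalize_sketch(X_demo)
print(X_norm.mean(axis=0))   # each column mean is approximately 0
print(X_norm.std(axis=0))    # each column standard deviation is approximately 1
```

Returning `mu` and `sigma` alongside `X_norm` makes it possible to apply the same scaling to data the statistics were not computed from.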

### Splitting the Data for Training and Testing

*Splitting Data to Train and Test sets to train and evaluate the Model*

In [0]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)

In [0]:
X_train, mu, sigma = Feature_Normalize(X_train)
X_test = (X_test - mu) / sigma			#Scale the test set with the training statistics
X_train = np.c_[np.ones(X_train.shape[0]), X_train] 
X_test = np.c_[np.ones(X_test.shape[0]), X_test] 

In [0]:
print ("Shape of Training Set is",X_train.shape)
print ("Shape of Test Set is",X_test.shape)

print ("Shape of Training Labels is",y_train.shape)
print ("Shape of Test Labels is",y_test.shape)

## Training and Validating the Model

The objective of linear regression is to minimize the cost function

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left( h_{\theta}(x^{(i)}) - y^{(i)}\right)^2 +\frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$$
where the hypothesis $h_\theta(x)$ is given by the linear model
$$ h_\theta(x) = \theta^Tx = \sum_{j=0}^n \theta_jx_j$$
with $x_0 = 1$ (the column of ones added to the feature matrix).

Recall that the parameters of your model are the $\theta_j$ values. These are the values you will adjust to minimize cost $J(\theta)$. One way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update

$$ \theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j} \qquad \text{simultaneously update } \theta_j \text{ for all } j$$
where, for $j \geq 1$, $$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j$$
The bias term $\theta_0$ is not regularized, so for $j = 0$ the $\frac{\lambda}{m}\theta_j$ term is dropped. With each step of gradient descent, provided the learning rate $\alpha$ is small enough, your parameters $\theta_j$ move closer to the values that achieve the lowest cost $J(\theta)$.

$\textbf{Note}$: Implementing regularization is optional; you may ignore it, which is equivalent to setting $\lambda = 0$.
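
To make the formulas concrete, here is a hedged, vectorized sketch of the cost $J(\theta)$ and of a single batch gradient step. It uses the derivative of $J$ as defined above, that is $\frac{1}{m}X^T(X\theta - y)$ plus $\frac{\lambda}{m}\theta_j$ for $j \geq 1$. The names `cost_sketch` and `gradient_step_sketch`, the `lam` parameter, and the toy data are assumptions made for illustration; they are not the notebook's reference solution.

```python
import numpy as np

def cost_sketch(X, y, theta, lam=0.0):
    """Regularized cost J(theta); theta[0] (the bias) is not penalized."""
    m = y.shape[0]
    residual = X @ theta - y                  # h_theta(x) - y for every sample
    penalty = lam * np.sum(theta[1:] ** 2)    # sum over j >= 1 only
    return (residual @ residual + penalty) / (2 * m)

def gradient_step_sketch(X, y, theta, lr, lam=0.0):
    """One batch gradient descent step; returns the updated theta."""
    m = y.shape[0]
    grad = X.T @ (X @ theta - y) / m          # data term of dJ/dtheta
    grad[1:] += (lam / m) * theta[1:]         # regularization term, j >= 1
    return theta - lr * grad                  # simultaneous update of all theta_j

# Toy data generated from y = 1 + 2x (illustrative only)
X_toy = np.c_[np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])]
y_toy = np.array([1.0, 3.0, 5.0, 7.0])

theta_toy = np.zeros(2)
initial_cost = cost_sketch(X_toy, y_toy, theta_toy)
for _ in range(2000):
    theta_toy = gradient_step_sketch(X_toy, y_toy, theta_toy, lr=0.1)
final_cost = cost_sketch(X_toy, y_toy, theta_toy)

print(theta_toy)                  # close to [1, 2]
print(initial_cost, final_cost)   # the cost decreases towards 0
```

Computing the whole gradient vector before subtracting it is what makes the update simultaneous: every $\theta_j$ is updated from the same old `theta`.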

*Compute Cost function*

In [0]:
def Cost_Function(X, y, theta):
    """
    Compute cost for linear regression with multiple variables.
    Computes the cost of using theta as the parameter for linear regression to fit the data points in X and y.
    """
    #Write your code here
    
    return J

*Gradient Descent*

In [0]:
def GradientDescent(X, y, theta, lr):
    """
    Performs one batch gradient descent step to update theta.
    Called once per epoch; lr is the learning rate (alpha in the formulas above).
    """
    #Write your code here
    return theta

*Function to Calculate Validation Loss*

In [0]:
def Validation_Loss_Function(y_pred,y):
    """
    Computes the validation loss as the root-mean-square error (RMSE) between y_pred and y.
    """
    m = y.shape[0]
    loss = np.square(y_pred-y)/(m)
    loss = np.sum(loss)
    loss = np.sqrt(loss)
    return loss   
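
The function above computes the root-mean-square error (RMSE), which is on the same scale as the target. The cost $J(\theta)$ is half the mean squared error, so the two curves are not directly comparable in magnitude. As a quick sanity check, here is a small worked example; `rmse_sketch` and the toy arrays are illustrative assumptions that repeat the same arithmetic in a compact form.

```python
import numpy as np

def rmse_sketch(y_pred, y):
    """Root-mean-square error: sqrt of the mean squared difference."""
    return np.sqrt(np.mean(np.square(y_pred - y)))

# Toy values: every prediction is off by exactly 1
y_true_demo = np.array([5.0, 6.0, 7.0, 8.0])
y_pred_demo = np.array([6.0, 5.0, 8.0, 7.0])

rmse_demo = rmse_sketch(y_pred_demo, y_true_demo)
print(rmse_demo)   # 1.0, since all squared errors equal 1
```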
    

### Training the Model

In [0]:
def Training_and_Validation(X_train,y_train,X_test,y_test,lr=,epochs=):
    """
    Training Linear Regression Model
    """
    #Rearrange all the following lines to get a perfect plot
    s = X_train.shape[1]
    theta = np.zeros(s)
    Training_Loss = []
    Validation_Loss = []
    for i in range(epochs):
        y_pred = np.dot(X_test, theta)
        Validation_Loss.append(Validation_Loss_Function(y_pred,y_test))
        theta = GradientDescent(X_train, y_train, theta, lr)
        Training_Loss.append(Cost_Function(X_train,y_train,theta))
    plt.plot(Training_Loss,'r')
    plt.plot(Validation_Loss,'y')
    plt.show()
    
    print ("Final Training Loss is equal to ",Training_Loss[-1])
    print ("Final Validation Loss is equal to ",Validation_Loss[-1])
    
    return theta  

In [0]:
theta = Training_and_Validation(X_train,y_train,X_test,y_test)
print(theta)