# Programming Machine Learning Lab
# Exercise 4

**General Instructions:**

1. You need to submit the PDF as well as the filled notebook file.
1. Name your submissions by prefixing your matriculation number to the filename. For example, if your matriculation number is 12345, rename the files to **"12345_Exercise_4.xxx"**.
1. Complete all tasks, then do a clean run before generating the final PDF (use the _Clear All Outputs_ and _Run All_ commands in Jupyter Notebook).

**Exercise-specific instructions:**

1. You are allowed to use only NumPy and Pandas (unless stated otherwise). You can use any library for visualizations.


In [1]:
# imports 
import numpy as np

### Part 1

**Optimization Routines and Loss Functions**

In this part of the assignment we learn how to write modular programs and make our code reusable. For this, declare a class named $\textbf{Optimization}$ that takes two inputs, X and y, and stores them as attributes. Next, implement the following optimization algorithms in this class.

- Stochastic Gradient Descent (For Mean Square Loss)
- Newton’s Method (For Cross Entropy Loss) 

You will need loss functions and their gradients for the optimization process. So implement a class $\textbf{Loss}$ which also takes in X and y and computes the following losses and their gradients.

- Mean Square Loss (for Regression)
- Cross Entropy Loss (for Classification)
    
Make the $\textbf{Loss}$ class such that you can access it from the $\textbf{Optimization}$ class.

*Note: You can use `np.linalg.solve` to solve systems of linear equations.*
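
For reference, the quantities the two classes need can be written out as follows. This is a sketch under stated assumptions: $m$ samples, design matrix $X$, parameter vector $\theta$, losses averaged over the samples, and binary labels $y_i \in \{0, 1\}$ with $h = \sigma(X\theta)$ for the classification case. None of this notation is fixed by the exercise text.

$$
L_{\text{MSE}}(\theta) = \frac{1}{m}\,\lVert X\theta - y \rVert^2,
\qquad
\nabla L_{\text{MSE}}(\theta) = \frac{2}{m}\, X^\top (X\theta - y)
$$

$$
L_{\text{CE}}(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[\, y_i \log h_i + (1 - y_i) \log (1 - h_i) \Big],
\qquad
\nabla L_{\text{CE}}(\theta) = \frac{1}{m}\, X^\top (h - y)
$$

$$
H(\theta) = \frac{1}{m}\, X^\top \operatorname{diag}\big(h \odot (1 - h)\big)\, X,
\qquad
\theta \leftarrow \theta - H(\theta)^{-1}\, \nabla L_{\text{CE}}(\theta)
$$

In practice the Newton step is usually obtained by solving the linear system $H d = \nabla L_{\text{CE}}$ for $d$ rather than forming the inverse explicitly.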

In [None]:
### Write your code here
import numpy as np

class Loss:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def mean_square_loss(self, theta):
        # Mean Square Loss for Regression
        predictions = np.dot(self.x, theta)
        mse = np.mean((predictions - self.y) ** 2)
        return mse

    def mean_square_loss_gradient(self, theta):
        # Gradient of Mean Square Loss for Regression
        gradient = 2 * np.dot(self.x.T, (np.dot(self.x, theta) - self.y))  / len(self.y)
        return gradient

    def cross_entropy_loss(self, theta):
        # Cross Entropy Loss for Classification
        m = len(self.y)
        h_theta = self.sigmoid(np.dot(self.x, theta))
        # Clip probabilities away from 0 and 1 so the logarithms stay finite
        h_theta = np.clip(h_theta, 1e-12, 1 - 1e-12)
        cost = (-1 / m) * np.sum(self.y * np.log(h_theta) + (1 - self.y) * np.log(1 - h_theta))
        return cost

    def cross_entropy_loss_gradient(self, theta):
        # Gradient of Cross Entropy Loss for Classification
        m = len(self.y)
        h_theta = self.sigmoid(np.dot(self.x, theta))
        gradient = np.dot(self.x.T, (h_theta - self.y)) / m
        return gradient

    def sigmoid(self, z):
        # Sigmoid activation function
        return 1 / (1 + np.exp(-z))


class Optimization:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.loss = Loss(x, y)

    def stochastic_gradient_descent(self, theta, learning_rate, epochs):
        # Stochastic Gradient Descent for Mean Square Loss
        m = len(self.y)
        for epoch in range(epochs):
            for _ in range(m):
                # Draw one sample at random and step along the gradient of its loss only
                random_index = np.random.randint(m)
                xi = self.x[random_index:random_index + 1]
                yi = self.y[random_index:random_index + 1]
                gradient = Loss(xi, yi).mean_square_loss_gradient(theta)
                theta = theta - learning_rate * gradient
        return theta

    def newtons_method(self, theta, epochs):
        # Newton's Method for Cross Entropy Loss
        m = len(self.y)
        for epoch in range(epochs):
            gradient = self.loss.cross_entropy_loss_gradient(theta)
            # Hessian of the mean cross entropy: X^T diag(h * (1 - h)) X / m,
            # where h is the sigmoid of the linear predictor X @ theta
            h_theta = self.loss.sigmoid(np.dot(self.x, theta))
            hessian = np.dot(self.x.T * (h_theta * (1 - h_theta)), self.x) / m
            # Solve H d = gradient instead of inverting the Hessian
            theta = theta - np.linalg.solve(hessian, gradient)
        return theta


# Example usage:
# Toy data: X holds the features and y the binary labels
X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([0, 1, 0])
theta_init = np.zeros(X.shape[1])  # One weight per feature column (X has no intercept column)

opt = Optimization(X, y)

# Stochastic Gradient Descent
theta_sgd = opt.stochastic_gradient_descent(theta_init, learning_rate=0.01, epochs=100)

# Newton's Method
theta_newton = opt.newtons_method(theta_init, epochs=10)

print("Theta after Stochastic Gradient Descent:", theta_sgd)
print("Theta after Newton's Method:", theta_newton)


### Part 2

In this task, you are given a data set named **"regression.csv"**. 
- Split the dataset into 80% for training and 20% for testing.
- Check the correlation of the features (X) with the target (Y), both numerically and visually.
- Remove the 3 least correlated variables. Compute the correlation on the training set only.
- Perform standard scaling on the remaining feature columns.
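
The preprocessing steps listed above could be sketched as follows. This is only an illustration under several assumptions: the data is a synthetic stand-in because the columns of "regression.csv" are not described here, the target column is assumed to be called `Y`, "least correlated" is read as smallest absolute Pearson correlation, and the scaling statistics are taken from the training set.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for regression.csv (column names are hypothetical).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.normal(size=(n, 6)), columns=[f"X{i}" for i in range(1, 7)])
# Only X1..X3 influence the target; X4..X6 are pure noise.
df["Y"] = 3 * df["X1"] - 2 * df["X2"] + 2 * df["X3"] + rng.normal(scale=0.1, size=n)

# 80/20 split after shuffling the rows.
shuffled = df.sample(frac=1.0, random_state=0).reset_index(drop=True)
n_train = int(0.8 * len(shuffled))
train, test = shuffled.iloc[:n_train], shuffled.iloc[n_train:]

# Absolute correlation of each feature with the target, computed on train only.
corr = train.drop(columns="Y").corrwith(train["Y"]).abs().sort_values()
dropped = list(corr.index[:3])  # the 3 least correlated features
kept = [c for c in train.columns if c not in dropped and c != "Y"]

# Standard scaling using the training mean and standard deviation.
mean, std = train[kept].mean(), train[kept].std(ddof=0)
X_train = ((train[kept] - mean) / std).to_numpy()
X_test = ((test[kept] - mean) / std).to_numpy()
y_train, y_test = train["Y"].to_numpy(), test["Y"].to_numpy()
```

Reusing the training mean and standard deviation for the test set keeps test information out of the fitted pipeline. For the visual check, a bar chart of `corr` or scatter plots of each feature against `Y` are two possible options.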

Implement a class $\textbf{LinearRegression}$ that has at least two functions, $\textbf{fit}$ and $\textbf{predict}$ for fitting a linear regression model and predicting the results. You need to use the $\textbf{Optimization}$ and $\textbf{Loss}$ class inside this. Fit a linear regression model with *Mean Square Loss* and *Stochastic Gradient Descent*.

Also, generate the loss trajectory for both the training and the test dataset.
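
One possible way to record a loss trajectory is to evaluate the mean square loss on both splits after every epoch. The sketch below is standalone and does not use the $\textbf{Optimization}$ or $\textbf{Loss}$ classes; the helper names `mse` and `sgd_with_trajectory` are hypothetical, the data is synthetic, and each epoch visits the samples in a random permutation, which is one common SGD variant rather than the only one.

```python
import numpy as np

def mse(X, y, theta):
    """Mean square loss of the linear model X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

def sgd_with_trajectory(X_train, y_train, X_test, y_test,
                        learning_rate=0.01, epochs=50, seed=0):
    """Run per-sample SGD and record the train/test loss after each epoch."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X_train.shape[1])
    train_losses, test_losses = [], []
    for _ in range(epochs):
        # Visit every training sample once per epoch, in random order.
        for i in rng.permutation(len(y_train)):
            residual = X_train[i] @ theta - y_train[i]
            theta = theta - learning_rate * 2 * residual * X_train[i]
        train_losses.append(mse(X_train, y_train, theta))
        test_losses.append(mse(X_test, y_test, theta))
    return theta, train_losses, test_losses

# Synthetic data with known coefficients, split 80/20.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
theta, train_losses, test_losses = sgd_with_trajectory(X[:80], y[:80], X[80:], y[80:])
```

Plotting `train_losses` and `test_losses` against the epoch index with any plotting library then gives the two trajectories on one figure.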


In [None]:
#### Write your code here

**Evaluation**

Compute the test predictions using `LinearRegression` from sklearn and compare its coefficients (betas) and results with those of your implementation.

In [None]:
#### Write your code here
from sklearn.linear_model import LinearRegression

**Point to ponder**

While optimizing the loss function for Linear Regression or Logistic Regression, one needs to initialize the model parameters. It is well known that deep neural networks do not function if the model parameters are initialized to zero. Why is it so? Does this issue also arise while optimizing the loss function for Linear or Logistic Regression? Explain.

### Part 3

You are given a file **"logistic.csv"**. 
- Split the dataset into 80% for training and 20% for test.
- Explore the dataset and visualize distribution of the features (train data only). 
- Do a Violin plot for the 5 features that have the highest standard deviation. 
- Remove outliers from the dataset. *(This can be done either by removing the rows with outliers or by clipping; comment on the pros and cons of whichever method you employ.)*
- Perform standard scaling.
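
The two outlier strategies mentioned in the list above can be contrasted on a tiny example. The sketch assumes the 1.5 × IQR rule as the outlier criterion, which the exercise does not prescribe, and uses made-up data and a hypothetical helper `iqr_bounds`; the bounds are estimated on the training data only.

```python
import numpy as np
import pandas as pd

def iqr_bounds(train, k=1.5):
    """Per-column lower/upper bounds from the training quartiles."""
    q1, q3 = train.quantile(0.25), train.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up data: column "a" contains one obvious outlier in the training set.
train = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 100.0],
                      "b": [10.0, 11.0, 12.0, 13.0, 14.0]})
test = pd.DataFrame({"a": [-50.0, 2.5], "b": [12.0, 30.0]})

lower, upper = iqr_bounds(train)

# Option 1: clipping keeps every row and caps extreme values at the bounds.
train_clipped = train.clip(lower=lower, upper=upper, axis=1)
test_clipped = test.clip(lower=lower, upper=upper, axis=1)

# Option 2: row removal drops any training row with a value outside the bounds.
inside = ((train >= lower) & (train <= upper)).all(axis=1)
train_dropped = train[inside]
```

Clipping preserves the sample size but distorts the tails, while row removal keeps the remaining values untouched at the cost of discarding data.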

This part of the assignment involves a classification task. Implement a class $\textbf{LogisticRegression}$ having at least two functions, $\textbf{fit}$ and $\textbf{predict}$ for fitting the model and getting the predictions. Fit a logistic regression model with Cross Entropy Loss and Newton’s Method.

Report the test accuracy, plot the confusion matrix and also compute the precision, recall and F-score. 
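
The metrics named above can be computed with NumPy alone. The following sketch assumes binary labels encoded as 0 and 1, with 1 as the positive class; the function names `confusion_matrix` and `binary_metrics` and the label vectors are illustrative only.

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 count matrix: rows are actual classes, columns are predicted classes."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # increment cell (actual, predicted)
    return cm

def binary_metrics(cm):
    """Accuracy, precision, recall and F1 score for the positive class (label 1)."""
    tn, fp, fn, tp = cm.ravel()
    accuracy = (tp + tn) / cm.sum()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# Illustrative labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
accuracy, precision, recall, f1 = binary_metrics(cm)
```

The matrix `cm` can be passed to a heatmap function of any plotting library to draw the confusion matrix.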

Also, generate the loss trajectory for both training and testing.


In [None]:
### Write your code here

**Point to Ponder**

Read about precision, recall and F-score. Suppose model A and model B have the same accuracy, but model B has a higher F-score. Which model would be better suited, and why?

### Part 4

**Discriminant Analysis**

In this part of the assignment you will implement linear and quadratic discriminant analysis classifiers on the iris dataset *from scratch*. Again, this should follow an object-oriented approach: you need two classes, $\textbf{LDA()}$ and $\textbf{QDA()}$, each with the associated $\textbf{fit()}$ and $\textbf{predict()}$ methods.
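
As a reminder, both classifiers assign a sample $x$ to the class $k$ with the largest discriminant score $\delta_k(x)$. The notation here is an assumption, not taken from the exercise: $\pi_k$ is the class prior, $\mu_k$ the class mean, $\Sigma_k$ the class covariance, and $\Sigma$ the covariance shared by all classes in LDA.

$$
\delta_k^{\text{LDA}}(x) = x^\top \Sigma^{-1} \mu_k - \tfrac{1}{2}\, \mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k
$$

$$
\delta_k^{\text{QDA}}(x) = -\tfrac{1}{2} \log \lvert \Sigma_k \rvert - \tfrac{1}{2}\, (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) + \log \pi_k
$$

One common choice of estimates, with $n_k$ samples in class $k$, $n$ samples in total and $K$ classes, is

$$
\hat{\pi}_k = \frac{n_k}{n},
\qquad
\hat{\mu}_k = \frac{1}{n_k} \sum_{i:\, y_i = k} x_i,
\qquad
\hat{\Sigma}_k = \frac{1}{n_k - 1} \sum_{i:\, y_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^\top,
\qquad
\hat{\Sigma} = \frac{1}{n - K} \sum_{k=1}^{K} (n_k - 1)\, \hat{\Sigma}_k
$$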


In [None]:
### Write your code here

from sklearn import datasets
import pandas as pd
import numpy as np
iris = datasets.load_iris()

#print(iris.DESCR)

df_iris = pd.DataFrame(np.hstack([iris.data,iris.target[...,np.newaxis]]),columns=['X1', 'X2', 'X3', 'X4', "Y"])

df_iris.head()

**Evaluation**

Compare your implementations with the corresponding sklearn classes in terms of both accuracy and timing. Visualize all comparisons in a meaningful manner.
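
A timing comparison might be organized around a small helper that works with any object exposing `fit` and `predict`. The sketch below is an assumption about how to structure the measurement: `time_fit_predict` and `MajorityClassifier` are hypothetical names, and the placeholder classifier only stands in for $\textbf{LDA()}$, $\textbf{QDA()}$ or the sklearn classes.

```python
import time
import numpy as np

def time_fit_predict(model_factory, X_train, y_train, X_test, repeats=20):
    """Return predictions plus mean and std of the fit+predict wall time in seconds."""
    durations = []
    for _ in range(repeats):
        model = model_factory()  # fresh model for every repetition
        start = time.perf_counter()
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        durations.append(time.perf_counter() - start)
    return predictions, float(np.mean(durations)), float(np.std(durations))

class MajorityClassifier:
    """Placeholder with a fit/predict interface; always predicts the most frequent class."""
    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)

# Synthetic data: the first 6 rows serve as the test set.
X = np.random.default_rng(0).normal(size=(30, 4))
y = np.array([0] * 20 + [1] * 10)
predictions, mean_s, std_s = time_fit_predict(MajorityClassifier, X[6:], y[6:], X[:6])
accuracy = float(np.mean(predictions == y[:6]))
```

Repeating the measurement and reporting the mean with its spread makes the comparison less sensitive to one-off delays; a bar chart with error bars per model is one way to visualize it.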

In [None]:
### Write your code here
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis, LinearDiscriminantAnalysis
