# Final Exercise
This is the final exercise in our Intro to Machine Learning Course.\
207027053 - Gil Yair Yamin

The first thing we need to do is, of course, import all the libraries we will be using:

In [29]:
import pandas as pd
import numpy as np
import matplotlib as mpl

We have here a CSV file with data about cancer patients.
Using numpy, let's import the data from the CSV file we were given.

In [30]:
raw_data = np.loadtxt("cancer_data.csv", dtype=np.longdouble, delimiter=",")

We need to normalize our data to continue:

In [31]:
def normalize(raw_data):
    res = np.zeros_like(raw_data, dtype=np.longdouble)
    ExpectedArr = np.zeros(shape=(len(raw_data[0])), dtype=np.longdouble)
    DeviationArr = np.zeros(shape=(len(raw_data[0])), dtype=np.longdouble)

    for i in range(len(raw_data[0])):
        Expected = raw_data[:, i].mean()

        shiftedCol = raw_data[:, i] - Expected
        Deviation = np.sqrt(np.square(shiftedCol).mean())

        res[:, i] = shiftedCol / Deviation if Deviation > 0 else shiftedCol

        ExpectedArr[i] = Expected
        DeviationArr[i] = Deviation

    return res, ExpectedArr, DeviationArr


normalized_data, ExpectedArr, DeviationArr = normalize(raw_data)
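
The loop above computes a column-wise z-score using the population standard deviation. As a minimal sketch, the same computation can be written without the explicit loop. The name `normalize_vectorized` and the small `demo` array are hypothetical and only serve as illustration:

```python
import numpy as np


def normalize_vectorized(raw_data):
    """Column-wise z-score; a hypothetical vectorized counterpart of normalize()."""
    mean = raw_data.mean(axis=0)
    # np.std defaults to ddof=0, i.e. sqrt(mean((x - mean)^2)), as in the loop version.
    std = raw_data.std(axis=0)
    # Constant columns have std == 0; divide by 1 so they are only shifted.
    safe_std = np.where(std > 0, std, 1)
    return (raw_data - mean) / safe_std, mean, std


# Made-up data: the third column is constant on purpose.
demo = np.array([[1.0, 10.0, 5.0],
                 [2.0, 20.0, 5.0],
                 [3.0, 60.0, 5.0]])
demo_normalized, demo_mean, demo_std = normalize_vectorized(demo)
print(demo_normalized)
```

After normalization every non-constant column has mean 0 and standard deviation 1, and a constant column becomes all zeros.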

Now we'll create the matrix $X$ and the vector $y$:

In [32]:
ones = np.ones((len(normalized_data), 1), dtype=np.longdouble)

X = np.concatenate((ones, normalized_data[:, :-1]), axis=1)
y = normalized_data[:, -1]

Let's define a function that receives the vector $\Theta$ and a vector $x$, and returns the result of $h_\Theta(x)$.\
We are assuming that $x[0] = 1$.

In [33]:
def hTheta(theta: np.ndarray, x: np.ndarray):
    return (theta * x).sum()
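
As a small illustration, applying `hTheta` to each row of a matrix should give the same values as a single matrix-vector product, which is the form the cost and gradient functions rely on. The arrays `X_demo` and `theta_demo` are made up for this sketch:

```python
import numpy as np


def hTheta(theta, x):
    # Same definition as in the cell above: the dot product of theta and x.
    return (theta * x).sum()


# Made-up example: two samples, first column is the constant 1.
X_demo = np.array([[1.0, 2.0, -1.0],
                   [1.0, 0.5, 3.0]])
theta_demo = np.array([0.5, 1.0, -2.0])

per_row = np.array([hTheta(theta_demo, row) for row in X_demo])
all_rows = X_demo.dot(theta_demo)
print(per_row, all_rows)
```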

Let's define a function that receives a vector $\Theta$, the matrix $X$, and the vector $y$, and returns the value of $J(\Theta)$.

In [34]:
def MSE(theta: np.ndarray, X: np.ndarray, y: np.ndarray):
    return np.square(np.linalg.norm(X.dot(theta) - y)) / (2 * len(y))
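
For reference, the code above corresponds to the usual least-squares cost. The formula below is written under the assumption that $m$ denotes the number of rows of $X$ and $x^{(i)}$ its $i$-th row, notation that the notebook itself does not introduce:

$$J(\Theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2m}||X\Theta - y||^2$$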

Let's define a function that receives a vector $\Theta$, the matrix $X$, and the vector $y$, and returns the value of $\nabla J(\Theta)$.

In [35]:
def Gradient(theta: np.ndarray, X: np.ndarray, y: np.ndarray):
    return X.transpose().dot((X.dot(theta) - y)) / len(y)
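
One way to gain confidence in an analytic gradient is to compare it against a central finite-difference approximation of the cost. The sketch below does this on random data. `numerical_gradient`, the step `h`, and the random arrays are assumptions made for illustration and are not part of the exercise:

```python
import numpy as np


def MSE(theta, X, y):
    return np.square(np.linalg.norm(X.dot(theta) - y)) / (2 * len(y))


def Gradient(theta, X, y):
    return X.transpose().dot(X.dot(theta) - y) / len(y)


def numerical_gradient(theta, X, y, h=1e-6):
    """Central-difference estimate of the gradient of MSE (hypothetical helper)."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (MSE(theta + step, X, y) - MSE(theta - step, X, y)) / (2 * h)
    return grad


# Random stand-in data, not the cancer dataset.
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 3))])
y_demo = rng.normal(size=50)
theta_demo = rng.normal(size=4)

max_gap = np.max(np.abs(Gradient(theta_demo, X_demo, y_demo)
                        - numerical_gradient(theta_demo, X_demo, y_demo)))
print(max_gap)
```

A gap close to floating-point noise suggests that `Gradient` matches the cost defined by `MSE`.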

Let's write the gradient descent function here.\
It must receive the data, $X$ and $y$, as well as parameters for our 3 ending conditions:
$$||\Theta^{(k + 1)} - \Theta^{(k)}|| < \epsilon$$
$$|J(\Theta^{(k + 1)}) - J(\Theta^{(k)})| < \delta $$
$$k + 1 \geq M $$

In [44]:
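def GradientDecent(X, y, alpha, epsilon, delta, M):
    # Start from the zero vector, one coefficient per column of X.
    thetai = np.zeros((len(X[0])))
    # Returned unchanged if M == 0, so the result is always an array.
    thetaIPlus = thetai

    for _ in range(M):
        # Step against the gradient, scaled by the learning rate alpha.
        thetaIPlus = thetai - Gradient(thetai, X, y) * alpha
        # Stop when the parameters barely move (epsilon condition).
        if np.linalg.norm(thetaIPlus - thetai) < epsilon:
            break
        # Stop when the cost barely changes (delta condition).
        if np.abs(MSE(thetaIPlus, X, y) - MSE(thetai, X, y)) < delta:
            break
        thetai = thetaIPlus

    return thetaIPlus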

Let's run the GradientDecent function with multiple alpha values, starting with $\alpha = 1$ and M=100:

In [49]:
epsilon = 0.000000001
delta = epsilon
alpha = 1
M = 100

GradientDecent(X, y, alpha, epsilon, delta, M)

array([ 1.38903820e+16,  1.04315693e+32,  1.05353324e+32,  8.21473438e+30,
        4.90964316e+31,  1.05837238e+32, -2.40405681e+31, -8.34917216e+30,
       -4.03217349e+31, -4.16107563e+31])

When we use $\alpha = 1$, the $\Theta$ values grow with M, and with a large enough M they overflow.\
So $\alpha = 1$ does not converge on an answer. Instead it diverges, meaning this step size is too big for our dataset.
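
A standard result for quadratic costs such as $J$ is that gradient descent with a fixed step converges only when $\alpha < 2 / \lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of the Hessian $X^TX/m$. The sketch below computes that bound. The helper name `max_stable_alpha` and the tiny matrix are hypothetical, and the cast to `float64` is there because `np.linalg` routines such as `eigvalsh` do not accept `np.longdouble` arrays:

```python
import numpy as np


def max_stable_alpha(X):
    """Step-size bound 2 / lambda_max for the Hessian X^T X / m (hypothetical helper)."""
    X64 = np.asarray(X, dtype=np.float64)
    hessian = X64.T.dot(X64) / len(X64)
    # eigvalsh is meant for symmetric matrices such as this Hessian.
    largest_eigenvalue = np.linalg.eigvalsh(hessian).max()
    return 2 / largest_eigenvalue


# Made-up example: a ones column plus two identical normalized features.
X_demo = np.array([[1.0, 1.0, 1.0],
                   [1.0, -1.0, -1.0]])
print(max_stable_alpha(X_demo))
```

In this made-up example the two identical features push the largest eigenvalue to 2, so the bound is 1. Strongly correlated columns in a real dataset can have a similar effect, which would be one possible explanation for the divergence at $\alpha = 1$.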

In [73]:
alpha = 0.1
M = 2250

GradientDecent(X, y, alpha, epsilon, delta, M)

array([ 1.39377989e-15, -2.84032219e-01,  4.23652633e-01,  4.47311047e-01,
       -2.02455830e-01, -2.18094411e-01,  2.41824487e-01, -2.22091361e-03,
       -7.32317783e-03,  1.53754788e-02])

When we use $\alpha = 0.1$, the result seems to converge after around 2250 iterations.
I am assuming that our $\epsilon$ or $\delta$ condition is met.
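
The assumption about which ending condition was met can be checked rather than guessed. The sketch below is a hypothetical variant of the descent loop, named `gradient_descent_report` here, that also returns the number of iterations performed and a label for the condition that stopped it. It runs on random stand-in data, not the cancer dataset:

```python
import numpy as np


def MSE(theta, X, y):
    return np.square(np.linalg.norm(X.dot(theta) - y)) / (2 * len(y))


def Gradient(theta, X, y):
    return X.transpose().dot(X.dot(theta) - y) / len(y)


def gradient_descent_report(X, y, alpha, epsilon, delta, M):
    """Gradient descent that also reports iterations used and the stop reason."""
    theta = np.zeros(X.shape[1])
    for k in range(M):
        theta_next = theta - alpha * Gradient(theta, X, y)
        if np.linalg.norm(theta_next - theta) < epsilon:
            return theta_next, k + 1, "epsilon"
        if np.abs(MSE(theta_next, X, y) - MSE(theta, X, y)) < delta:
            return theta_next, k + 1, "delta"
        theta = theta_next
    # The loop ran out of iterations without meeting epsilon or delta.
    return theta, M, "max_iterations"


# Random stand-in data with made-up coefficients.
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
y_demo = X_demo.dot(np.array([1.0, 2.0, -3.0])) + 0.1 * rng.normal(size=100)

theta_demo, iterations, reason = gradient_descent_report(
    X_demo, y_demo, alpha=0.1, epsilon=1e-9, delta=1e-9, M=100000)
print(iterations, reason)
```

With this kind of report, M can be set generously once and the iteration count read off directly, instead of tuning M by hand for each $\alpha$.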

In [92]:
alpha = 0.01
M = 16900

GradientDecent(X, y, alpha, epsilon, delta, M)

array([ 1.39442627e-15, -2.83102208e-01,  4.15759922e-01,  4.47553428e-01,
       -2.02596593e-01, -2.11037603e-01,  2.41779854e-01, -2.24661823e-03,
       -7.60235798e-03,  1.58558664e-02])

When using $\alpha = 0.01$, our answer is a bit different, probably more accurate.\
But we reach our ending conditions only after around 16900 iterations.\
That means it takes much longer to converge on an answer, even if the answer is a bit more accurate.

In [96]:
alpha = 0.001
M = 100000

GradientDecent(X, y, alpha, epsilon, delta, M)

array([ 1.39462726e-15, -2.78911122e-01,  3.81054361e-01,  4.48614417e-01,
       -2.03209377e-01, -1.80107499e-01,  2.41602278e-01, -2.35918520e-03,
       -8.74892708e-03,  1.78860456e-02])

When using $\alpha = 0.001$, it seems our answer can again get a bit more accurate.\
But to reach that result, we need to increase M by a lot.\
Even with M at 100,000, it seems we have not yet reached our $\epsilon$ or $\delta$ ending conditions.

In conclusion, it seems that the smaller our $\alpha$ is, the more accurate our results can get.\
But that also means we will need a lot more time to get them.\
Note that I used pretty small values of $\epsilon$ and $\delta$.\
With bigger values of $\epsilon$ and $\delta$, we will still converge on the same answer with smaller $\alpha$ values, even though it will take longer.
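
One way to check which run is actually closest to the minimizer of $J$ is to compare each result against a direct least-squares solution. Because $J$ is a convex quadratic, every step size that converges, starting from the zero vector, approaches the same minimizer, so differences between runs mostly reflect where the stopping rule fired. The sketch below uses `np.linalg.lstsq`. The helper name `least_squares_theta` and the tiny dataset are hypothetical, and the cast to `float64` is there because `np.linalg.lstsq` does not accept `np.longdouble` arrays:

```python
import numpy as np


def least_squares_theta(X, y):
    """Direct minimizer of ||X theta - y||^2 (hypothetical helper for comparison)."""
    X64 = np.asarray(X, dtype=np.float64)
    y64 = np.asarray(y, dtype=np.float64)
    theta, *_ = np.linalg.lstsq(X64, y64, rcond=None)
    return theta


# Made-up data that lies exactly on the line y = 1 + 2 * x.
X_demo = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [1.0, 2.0]])
y_demo = np.array([1.0, 3.0, 5.0])

theta_exact = least_squares_theta(X_demo, y_demo)
print(theta_exact)
```

On the notebook's data, a comparison such as `np.linalg.norm(GradientDecent(X, y, alpha, epsilon, delta, M) - least_squares_theta(X, y))` for each $\alpha$ would show directly which setting ends up nearest the minimizer.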