<a href="https://colab.research.google.com/github/VivianPita/lab_iagi/blob/main/Esercitazione6_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://colab.research.google.com/github/Sapienza-AI-Lab/esercitazione6-22-23/blob/main/Exercise1.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

## Logistic Regression Implementation
In this exercise we will implement logistic regression from scratch.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize as opt
from sklearn import linear_model

### Exercise 1 - Sigmoid Function
Start by implementing the sigmoid function, defined as:

$$g(z) = \frac{1}{1+e^{-z}}$$

In our case, $z = \theta^T x$.

Do not use for loops; use NumPy functions so that the computation is vectorized.
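
For comparison once you have attempted the exercise, here is a minimal sketch of one possible vectorized form. The name `sigmoid_sketch` is hypothetical, chosen so that it does not overwrite your own `sigmoid`. It relies only on `np.exp`, which is applied element-wise to arrays of any shape.

```python
import numpy as np

def sigmoid_sketch(z):
    """Element-wise logistic function g(z) = 1 / (1 + exp(-z))."""
    # Accept scalars, lists, or arrays; no explicit Python loop is needed
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Small usage example: the output has the same shape as the input
z_demo = np.array([-2.0, 0.0, 2.0])
g_demo = sigmoid_sketch(z_demo)
print(g_demo)
```

A quick sanity check for any implementation is that $g(0) = 0.5$ and $g(z) + g(-z) = 1$.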

In [None]:
# Sigmoid Function
def sigmoid(z):
    raise NotImplementedError("You need to implement this function")

### Exercise 2 - Logistic Regression Cost Function
Implement the cost function for logistic regression, defined as:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$

where $h_\theta(x) = g(\theta^T x)$ is the sigmoid applied to the linear combination of the features.

Use the sigmoid function you implemented earlier, and keep the computation vectorized.
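
One way the cost could be written is sketched below. It is not the only valid solution. The names `sigmoid_sketch` and `logistic_cost_sketch` are hypothetical, and two choices here are assumptions rather than requirements of the exercise: `W` and `Y` are flattened to 1-D so that `(3,)`, `(3, 1)`, and `(m, 1)` inputs all behave the same way, and the probabilities are clipped by a small `eps` to avoid `log(0)`.

```python
import numpy as np

def sigmoid_sketch(z):
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def logistic_cost_sketch(W, X, Y):
    """Mean cross-entropy cost J for weights W, inputs X (m x n), labels Y."""
    W = np.asarray(W, dtype=float).ravel()   # shape (n,)
    X = np.asarray(X, dtype=float)           # shape (m, n)
    y = np.asarray(Y, dtype=float).ravel()   # shape (m,)
    h = sigmoid_sketch(X @ W)                # predicted probabilities, shape (m,)
    eps = 1e-12                              # assumption: guard against log(0)
    h = np.clip(h, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)))

# With all-zero weights every probability is 0.5, so the cost equals log(2)
X_demo = np.array([[1.0, 2.0, 3.0], [1.0, -1.0, 0.5]])
Y_demo = np.array([[1], [0]])
cost_demo = logistic_cost_sketch(np.zeros(3), X_demo, Y_demo)
print(cost_demo)
```

Flattening `Y` matters: subtracting an `(m, 1)` array from an `(m,)` array would silently broadcast to an `(m, m)` matrix.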

In [None]:
# Logistic Regression Cost Function
def logistic_cost(W, X, Y):
    raise NotImplementedError("You need to implement this function")

### Exercise 3 - Gradient Function (single step)

Now implement the function that computes the gradient of the cost function. The gradient is defined as:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$$
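
In matrix form, the sum above is $\frac{1}{m}X^T(h - y)$, where $h$ is the vector of predicted probabilities. The following is a sketch of that form under the same assumptions as before: the names `sigmoid_sketch` and `cost_gradient_sketch` are hypothetical, and `W` and `Y` are flattened to 1-D. The function returns a 1-D array with one entry per weight.

```python
import numpy as np

def sigmoid_sketch(z):
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def cost_gradient_sketch(W, X, Y):
    """Gradient of the logistic cost with respect to W, returned with shape (n,)."""
    W = np.asarray(W, dtype=float).ravel()   # shape (n,)
    X = np.asarray(X, dtype=float)           # shape (m, n)
    y = np.asarray(Y, dtype=float).ravel()   # shape (m,)
    m = X.shape[0]
    error = sigmoid_sketch(X @ W) - y        # h - y, shape (m,)
    return X.T @ error / m                   # one partial derivative per weight

# Tiny hand-checkable case: two samples, bias column plus one feature
X_demo = np.array([[1.0, 0.0], [1.0, 1.0]])
Y_demo = np.array([[0], [1]])
grad_demo = cost_gradient_sketch(np.zeros(2), X_demo, Y_demo)
print(grad_demo)
```

With zero weights, both probabilities are 0.5, so the errors are $0.5$ and $-0.5$. They cancel in the bias component and leave $-0.25$ in the feature component.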

In [None]:
# Logistic Regression Cost Gradient
def cost_gradient(W, X, Y):
    raise NotImplementedError("You need to implement this function")

### Exercise 4 - Prediction Function

Implement the function that computes the prediction. Writing $W$ for the weight vector (called $\theta$ in the formulas above), the predicted label $\hat{y}$ is defined as:

$$\hat{y}(x) = \begin{cases} 1 & \text{if } g(W^Tx) \geq 0.5 \\ 0 & \text{if } g(W^Tx) < 0.5 \end{cases}$$
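
The thresholding rule can be sketched as a single vectorized comparison. As before, `sigmoid_sketch` and `predict_sketch` are hypothetical names, and returning a 1-D integer array is an assumption about the desired output format.

```python
import numpy as np

def sigmoid_sketch(z):
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def predict_sketch(W, X):
    """Return 1 where g(W^T x) >= 0.5 and 0 otherwise, one label per row of X."""
    W = np.asarray(W, dtype=float).ravel()   # shape (n,)
    X = np.asarray(X, dtype=float)           # shape (m, n)
    probabilities = sigmoid_sketch(X @ W)    # shape (m,)
    return (probabilities >= 0.5).astype(int)

# Rows give W^T x = -1, 0, 2, i.e. probabilities below, at, and above 0.5
X_demo = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 2.0]])
labels_demo = predict_sketch(np.array([0.0, 1.0]), X_demo)
print(labels_demo)
```

Because the sigmoid is increasing and $g(0) = 0.5$, the condition $g(W^Tx) \geq 0.5$ is equivalent to $W^Tx \geq 0$.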

In [None]:
# Predict Function
def predict(W, X):
    raise NotImplementedError("You need to implement this function")

## Admission Dataset
We use the Admission Dataset to test the functions we have implemented. The dataset contains each applicant's scores on two exams and the university's admission decision.

In [None]:
# Load test data
path = 'data/exercise1_data.txt'
data = pd.read_csv(path, header=None, names=['Exam 1', 'Exam 2', 'Admitted'])
data.head()

In [None]:
# Visualize data
positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]

fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
plt.show()


In [None]:
# Set up input and output matrices
X = data[['Exam 1', 'Exam 2']].values
m, n = X.shape
X = np.concatenate((np.ones((m, 1)), X), axis=1)
n += 1
Y = np.array(data[['Admitted']])

In [None]:
# Test sigmoid function
z = np.linspace(-10, 10, 100)
out = sigmoid(z)
plt.figure()
plt.plot(z, out)
plt.show()

In [None]:
# Test logistic cost function
W = np.ones(3) * 0.1  # plain 1-D ndarray (np.matrix is no longer recommended); fmin_tnc also passes a 1-D array
print('Test cost function: ', logistic_cost(W, X, Y))

In [None]:
# Test logistic regression cost gradient
result = opt.fmin_tnc(func=logistic_cost, x0=W, fprime=cost_gradient, args=(X, Y))
print('Logistic cost after optimization: ', logistic_cost(result[0], X, Y))

In [None]:
# Predict with computed weights
Y_hat = predict(result[0], X)
accuracy1 = 1.0/m * np.sum(Y_hat == Y.reshape((m,)))

print("Accuracy with our implementation: ", accuracy1)

# Test with sklearn
logreg = linear_model.LogisticRegression(penalty=None)
logreg.fit(X, Y.reshape((m,)))
result2 = logreg.predict(X)
accuracy2 = np.sum(result2 == Y.reshape((m,))) / m
print("Accuracy with sklearn: ", accuracy2)

# The two accuracies should be the same

In [None]:
# Visualize decision boundary
x1_min, x1_max = X[:, 1].min(), X[:, 1].max()
x2_min, x2_max = X[:, 2].min(), X[:, 2].max()
xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
h = sigmoid(np.c_[np.ones((xx1.ravel().shape[0], 1)), xx1.ravel(), xx2.ravel()].dot(result[0]))
h = h.reshape(xx1.shape)
fig, ax = plt.subplots(figsize=(12,8))
ax.contour(xx1, xx2, h, [0.5], linewidths=1, colors='g')
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
plt.show()

