# The Logistic Regression Algorithm
-   It is used for classification, most often binary, but it can also be extended to multiclass problems.

-   It has "Regression" in its name because it estimates a value between 0 and 1 that represents a probability. For example, an output of 0.762 means a 76.2% probability that the instance is True (class 1). With a threshold of 0.5 we can turn that probability into a class label, but under the hood the algorithm estimates a probability.
  
-   Up to a certain point it is very similar to linear regression: it computes the same linear combination of the features, $z = \theta^\top x$, and then applies a function that forces the result into the interval (0, 1).

-   We use the sigmoid function to force the values into the interval (0, 1): $\sigma(z) = \dfrac{1}{1 + e^{-z}}$.

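-   The cross-entropy loss function is, in short, the negative average log-likelihood: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right]$, where $h_\theta(x) = \sigma(\theta^\top x)$ and $m$ is the number of observations.
    -   It is negative because we want to minimize the function: maximizing the likelihood is equivalent to minimizing the negative log-likelihood.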
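
A minimal sketch of the two ingredients above, assuming a single feature and hypothetical parameter values: the linear part $z = \theta_0 + \theta_1 x$ (exactly as in linear regression), the sigmoid that squashes it into (0, 1), and the cross-entropy loss as the negative average log-likelihood. The `eps` clipping in `cross_entropy` is an assumption added only to keep `log` away from 0; it is not part of the formula.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y_true, p, eps=1e-12):
    # Negative average log-likelihood of 0/1 labels under predicted probabilities p.
    # eps (assumed value) clips p away from 0 and 1 so log() never receives exactly 0.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

theta = np.array([-1.0, 2.0])       # hypothetical bias and weight
x = np.array([-1.0, 0.5, 2.0])      # three one-feature samples
z = theta[0] + theta[1] * x         # linear part, as in linear regression: [-3, 0, 3]
p = sigmoid(z)                      # estimated probabilities of class 1

y = np.array([0, 1, 1])
loss = cross_entropy(y, p)
print(p.round(3), loss)
```

A confident correct prediction (e.g. 0.99 for a true label of 1) contributes far less to the loss than a confident wrong one (0.01 for the same label), which is what pushes the parameters in the right direction.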

In [1]:
%pip install numpy scikit-learn  # install the packages this notebook uses

In [2]:
import numpy as np

In [7]:
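def sigmoid(z):
    # Maps any real value into (0, 1); np.exp may warn about overflow for very negative z
    return 1.0 / (1.0 + np.exp(-z))

def calculate_gradient(theta, X, y):
    # Gradient of the cross-entropy loss: X^T (sigmoid(X theta) - y) / m
    m = y.size  # number of instances (observations)
    return (X.T @ (sigmoid(X @ theta) - y)) / m

def gradient_descent(X, y, alpha=0.1, num_iter=100, tol=1e-7):
    # Prepend a column of ones so that theta[0] acts as the bias (intercept)
    X_b = np.c_[np.ones((X.shape[0], 1)), X]
    theta = np.zeros(X_b.shape[1])

    for _ in range(num_iter):
        grad = calculate_gradient(theta, X_b, y)
        theta -= alpha * grad  # step against the gradient

        # Stop early once the gradient is (almost) zero
        if np.linalg.norm(grad) < tol:
            break

    return theta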
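
Where the formula in `calculate_gradient` comes from: differentiating the cross-entropy loss and using the identity $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$. Here $X$ is the design matrix that already includes the column of ones (`X_b` in the code), $x_i$ is its $i$-th row, and $m$ is the number of observations.

$$
\begin{aligned}
J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\Big[y_i \log h_i + (1 - y_i)\log(1 - h_i)\Big], \qquad h_i = \sigma(x_i^\top \theta) \\
\frac{\partial h_i}{\partial \theta} &= h_i(1 - h_i)\,x_i \\
\nabla_\theta J &= -\frac{1}{m}\sum_{i=1}^{m}\Big[\frac{y_i}{h_i} - \frac{1 - y_i}{1 - h_i}\Big]\,h_i(1 - h_i)\,x_i \\
&= -\frac{1}{m}\sum_{i=1}^{m}(y_i - h_i)\,x_i = \frac{1}{m}X^\top\big(\sigma(X\theta) - y\big) \\
\theta &\leftarrow \theta - \alpha\,\nabla_\theta J
\end{aligned}
$$

The last line is the update performed by `theta -= alpha * grad`, and `tol` ends the loop once $\lVert \nabla_\theta J \rVert$ drops below it.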

In [8]:
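def predict_prob(X, theta):
    # Add the same bias column used during training, then return P(y = 1 | x)
    X_b = np.c_[np.ones((X.shape[0], 1)), X]
    return sigmoid(X_b @ theta)

def predict(X, theta, threshold=0.5):
    # Class 1 if the estimated probability reaches the threshold, class 0 otherwise
    return (predict_prob(X, theta) >= threshold).astype(int)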
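
A small usage sketch of the two prediction helpers on hand-picked inputs, assuming hypothetical parameters `theta = [0, 2]` (bias 0, weight 2). The helpers are re-declared so the snippet runs on its own; it shows that the bias column is added internally and that moving the threshold changes the labels, not the probabilities.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_prob(X, theta):
    # Prepend a column of ones so theta[0] acts as the bias term
    X_b = np.c_[np.ones((X.shape[0], 1)), X]
    return sigmoid(X_b @ theta)

def predict(X, theta, threshold=0.5):
    # Class 1 whenever the estimated probability reaches the threshold
    return (predict_prob(X, theta) >= threshold).astype(int)

theta = np.array([0.0, 2.0])            # hypothetical: bias 0, weight 2
X = np.array([[-1.0], [0.0], [1.0]])    # three one-feature samples

probs = predict_prob(X, theta)                    # sigmoid([-2, 0, 2]) ≈ [0.119, 0.5, 0.881]
default_labels = predict(X, theta)                # threshold 0.5
strict_labels = predict(X, theta, threshold=0.9)  # stricter threshold
print(probs.round(3), default_labels, strict_labels)
```

Because the comparison is `>=`, a probability of exactly 0.5 is assigned to class 1.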

In [9]:
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [10]:
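X, y = load_breast_cancer(return_X_y=True)
# No random_state is set, so the split (and the accuracies below) change from run to run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = StandardScaler()

# Fit the scaler on the training set only, then apply the same transform to the test set
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)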

theta_hat = gradient_descent(X_train_scaled, y_train, alpha=0.1)

y_pred_train = predict(X_train_scaled, theta_hat)
y_pred_test = predict(X_test_scaled, theta_hat)

train_acc = accuracy_score(y_train, y_pred_train)
test_acc = accuracy_score(y_test, y_pred_test)
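
To check that gradient descent is actually minimizing the cross-entropy loss, one option is to record the loss before every update. This is a sketch on synthetic data (an assumption: 200 samples with two standard-normal features and a linear labelling rule), using the same update rule as `gradient_descent`; `gradient_descent_with_history` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(theta, X_b, y, eps=1e-12):
    # Negative average log-likelihood; X_b already contains the bias column
    p = np.clip(sigmoid(X_b @ theta), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient_descent_with_history(X, y, alpha=0.1, num_iter=100):
    # Same update as gradient_descent, but stores the loss before every step
    X_b = np.c_[np.ones((X.shape[0], 1)), X]
    theta = np.zeros(X_b.shape[1])
    history = []
    for _ in range(num_iter):
        history.append(cross_entropy(theta, X_b, y))
        grad = X_b.T @ (sigmoid(X_b @ theta) - y) / y.size
        theta -= alpha * grad
    return theta, np.array(history)

# Synthetic data (assumption): label is 1 when x1 + 0.5 * x2 > 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

theta, losses = gradient_descent_with_history(X, y, alpha=0.1, num_iter=100)
print(losses[0], losses[-1])
```

At $\theta = 0$ every probability is 0.5, so the first recorded loss is $\log 2 \approx 0.693$; with a small enough step size the loss should never increase from one iteration to the next.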

In [11]:
print(f"Train acc: {train_acc}")
print(f"Test acc: {test_acc}")

Train acc: 0.978021978021978
Test acc: 0.9912280701754386
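
As a sanity check, the hand-written model can be compared against scikit-learn's `LogisticRegression` on the same scaled data. This sketch fixes `random_state=0` and uses `stratify=y` (both assumptions, so the comparison is reproducible) and re-implements the same 100-step gradient descent inline. Note that `LogisticRegression` applies L2 regularization by default (`C=1.0`), so the two models are close but not identical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    # Hypothetical helper: prepend the column of ones used for the intercept
    return np.c_[np.ones((X.shape[0], 1)), X]

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Hand-written batch gradient descent (alpha = 0.1, 100 iterations)
X_b = add_bias(X_train_s)
theta = np.zeros(X_b.shape[1])
for _ in range(100):
    theta -= 0.1 * X_b.T @ (sigmoid(X_b @ theta) - y_train) / y_train.size
ours = accuracy_score(y_test, (sigmoid(add_bias(X_test_s) @ theta) >= 0.5).astype(int))

# scikit-learn's solver (L2-regularized by default)
clf = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
sk = accuracy_score(y_test, clf.predict(X_test_s))
print(f"gradient descent: {ours:.3f}, scikit-learn: {sk:.3f}")
```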
