In [None]:
Logistic Regression
This notebook implements logistic regression with L2 regularization. It covers the sigmoid function, the cost function
with L2 regularization, gradient computation, gradient descent optimization, model prediction, evaluation, and visualization.

In [None]:
In the following Jupyter cell we import the libraries used for the logistic regression.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt

In [None]:
Sigmoid Function: this function maps every value into the range between 0 and 1, which lets us
interpret the output as a probability. We use np.clip(z, -500, 500) to avoid overflow in np.exp when z
is very large or very negative.

In [None]:
def logistic(z):
    z = np.array(z, dtype=float)
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))
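
In [None]:
A quick sanity check of the sigmoid, as a minimal sketch. The function is repeated so the snippet runs on its own, and
the input values are arbitrary examples chosen for illustration, not values from the dataset.

```python
import numpy as np

def logistic(z):
    # Same definition as in the cell above, repeated so this sketch is self-contained.
    z = np.array(z, dtype=float)
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary illustrative inputs, including extreme values that would
# trigger an overflow warning in np.exp if they were not clipped first.
z_values = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
probs = logistic(z_values)

# Every output lies between 0 and 1, and z = 0 maps to exactly 0.5.
print(probs)
```

The outputs increase with z, and logistic(-z) equals 1 - logistic(z), which is why 0.5 is the natural midpoint.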

In [None]:
Cost Function: this function measures how far the model's predictions are from the actual labels. A regularization
term is added to penalize large weights and help prevent overfitting; the bias weight theta[0] is excluded from the penalty.

In [None]:
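def compute_cost(X, y, theta, lambda_=0.0):
    m = len(y)
    h = logistic(X.dot(theta))  # predicted probabilities
    h = np.clip(h, 1e-15, 1 - 1e-15)  # keep h away from 0 and 1 so np.log stays finite
    cost = - (1/m) * (y.T.dot(np.log(h)) + (1 - y).T.dot(np.log(1 - h)))  # mean cross-entropy
    reg = (lambda_ / (2*m)) * np.sum(theta[1:]**2)  # L2 penalty; the bias theta[0] is excluded
    return cost + reg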
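
In [None]:
To make the cost and its penalty concrete, here is a small sketch on hypothetical toy data (X_toy, y_toy and theta_fit
are made up for illustration). With all weights at zero every prediction is 0.5, so the cost should equal ln(2), about
0.693. The regularization term should add exactly (lambda_ / (2m)) times the squared non-bias weights.

```python
import numpy as np

def logistic(z):
    z = np.clip(np.array(z, dtype=float), -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, theta, lambda_=0.0):
    m = len(y)
    h = np.clip(logistic(X.dot(theta)), 1e-15, 1 - 1e-15)
    cost = -(1 / m) * (y.T.dot(np.log(h)) + (1 - y).T.dot(np.log(1 - h)))
    reg = (lambda_ / (2 * m)) * np.sum(theta[1:] ** 2)
    return cost + reg

# Hypothetical toy data: a bias column of ones plus a single feature.
X_toy = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])

# All-zero weights: every prediction is 0.5, so the cost is ln(2).
cost_zero = compute_cost(X_toy, y_toy, np.zeros(2))

# Weights that separate the classes give a lower unregularized cost.
theta_fit = np.array([0.0, 2.0])
cost_fit = compute_cost(X_toy, y_toy, theta_fit)

# With lambda_ = 1 the penalty is (1 / (2 * 4)) * 2**2 = 0.5.
cost_fit_reg = compute_cost(X_toy, y_toy, theta_fit, lambda_=1.0)

print(cost_zero, cost_fit, cost_fit_reg)
```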

In [None]:
Gradient: this tells us how to change the weights to reduce the error. We compute the difference between the
predictions h and the actual labels y, then apply the vectorized gradient formula and add the L2 regularization
term, which discourages large weights.

In [None]:
def gradient(X, y, theta, lambda_=0.0):
    m = len(y)
    h = logistic(X.dot(theta))
    grad = (1/m) * (X.T.dot(h - y))
    grad[1:] += (lambda_/m) * theta[1:]
    return grad
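
In [None]:
One way to gain confidence in the gradient formula is a finite-difference check: nudge each weight slightly, measure
how the cost changes, and compare with the analytic gradient. The sketch below does this on randomly generated toy
data; the data, the seed, and the step size eps are assumptions chosen for illustration.

```python
import numpy as np

def logistic(z):
    z = np.clip(np.array(z, dtype=float), -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, theta, lambda_=0.0):
    m = len(y)
    h = np.clip(logistic(X.dot(theta)), 1e-15, 1 - 1e-15)
    cost = -(1 / m) * (y.dot(np.log(h)) + (1 - y).dot(np.log(1 - h)))
    return cost + (lambda_ / (2 * m)) * np.sum(theta[1:] ** 2)

def gradient(X, y, theta, lambda_=0.0):
    m = len(y)
    grad = (1 / m) * X.T.dot(logistic(X.dot(theta)) - y)
    grad[1:] += (lambda_ / m) * theta[1:]
    return grad

# Hypothetical random data: 20 rows, a bias column plus two features.
rng = np.random.default_rng(0)
X_toy = np.hstack((np.ones((20, 1)), rng.normal(size=(20, 2))))
y_toy = (rng.random(20) > 0.5).astype(float)
theta_toy = rng.normal(size=3)
lambda_toy = 1.0

analytic = gradient(X_toy, y_toy, theta_toy, lambda_toy)

# Central finite differences, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(theta_toy)
for j in range(theta_toy.size):
    step = np.zeros_like(theta_toy)
    step[j] = eps
    cost_plus = compute_cost(X_toy, y_toy, theta_toy + step, lambda_toy)
    cost_minus = compute_cost(X_toy, y_toy, theta_toy - step, lambda_toy)
    numeric[j] = (cost_plus - cost_minus) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

If the two gradients disagree by more than a tiny amount, the cost and gradient functions are not consistent with each other.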

In [None]:
Gradient Descent: this function trains the model by repeatedly moving theta in the opposite direction
of the gradient in order to reduce the error. The learning rate alpha controls how big the steps are, and the
cost after each step is stored in cost_history.

In [None]:
def gradient_descent(X, y, theta, alpha, iterations, lambda_=0.0):
    theta = np.array(theta, dtype=float)  # float copy, so the caller's array is not modified in place
    cost_history = []
    for _ in range(iterations):
        grad = gradient(X, y, theta, lambda_)
        theta -= alpha * grad  # step against the gradient
        cost_history.append(compute_cost(X, y, theta, lambda_))
    return theta, cost_history
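
In [None]:
A minimal sketch of the training loop on hypothetical toy data, to show that the recorded cost goes down from step
to step. The toy arrays and the choice of 200 iterations are assumptions for illustration. In this sketch theta is
copied at the start so the array passed in is left untouched, which is a choice made here.

```python
import numpy as np

def logistic(z):
    z = np.clip(np.array(z, dtype=float), -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, theta, lambda_=0.0):
    m = len(y)
    h = np.clip(logistic(X.dot(theta)), 1e-15, 1 - 1e-15)
    cost = -(1 / m) * (y.dot(np.log(h)) + (1 - y).dot(np.log(1 - h)))
    return cost + (lambda_ / (2 * m)) * np.sum(theta[1:] ** 2)

def gradient(X, y, theta, lambda_=0.0):
    m = len(y)
    grad = (1 / m) * X.T.dot(logistic(X.dot(theta)) - y)
    grad[1:] += (lambda_ / m) * theta[1:]
    return grad

def gradient_descent(X, y, theta, alpha, iterations, lambda_=0.0):
    theta = np.array(theta, dtype=float)  # work on a copy
    cost_history = []
    for _ in range(iterations):
        theta -= alpha * gradient(X, y, theta, lambda_)
        cost_history.append(compute_cost(X, y, theta, lambda_))
    return theta, cost_history

# Hypothetical toy data: a bias column of ones plus a single feature.
X_toy = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])

theta_toy, history = gradient_descent(
    X_toy, y_toy, np.zeros(2), alpha=0.1, iterations=200, lambda_=1.0
)

# The cost after the last step is lower than after the first step.
print(history[0], history[-1])
print(theta_toy)
```

If the cost rises or oscillates instead, the learning rate is usually too large for the data.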

In [None]:
Prediction Function: this turns the probabilities into class labels. A probability greater than or
equal to the threshold (0.5 by default) gives class 1; otherwise the class is 0.

In [None]:
def predict(X, theta, threshold=0.5):
    probs = logistic(X.dot(theta))
    return (probs >= threshold).astype(int)
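
In [None]:
The effect of the threshold can be seen on a few hypothetical rows. The weights theta_toy and the inputs X_toy are
made up for illustration; the second call uses a stricter threshold of 0.9 to show how fewer rows are labelled 1.

```python
import numpy as np

def logistic(z):
    z = np.clip(np.array(z, dtype=float), -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, theta, threshold=0.5):
    probs = logistic(X.dot(theta))
    return (probs >= threshold).astype(int)

# Hypothetical inputs: a bias column of ones plus a single feature.
X_toy = np.array([[1.0, -2.0], [1.0, 0.0], [1.0, 0.5], [1.0, 3.0]])
theta_toy = np.array([0.0, 1.0])  # hypothetical weights

# Probabilities are the sigmoid of -2, 0, 0.5 and 3.
probs = logistic(X_toy.dot(theta_toy))

# Default threshold: a probability of exactly 0.5 counts as class 1.
default_labels = predict(X_toy, theta_toy)

# Stricter threshold: only the most confident row is labelled 1.
strict_labels = predict(X_toy, theta_toy, threshold=0.9)

print(probs)
print(default_labels, strict_labels)
```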

In [None]:
Load Dataset: we load the dataset and use dropna() to drop the rows that contain NaN, which avoids
errors in the math later on. We then keep only the numeric features, since logistic regression
works only with numbers.

In [None]:
data = pd.read_csv('cleaned_weather_data_2001-2021.csv')
data = data.dropna()
data = data.select_dtypes(include=[np.number])

In [None]:
Create Labels: the last numeric column is turned into the binary classification target: 1 if the value is
at or above the median, and 0 otherwise. The remaining numeric columns are used as features.

In [None]:
cols = data.columns.tolist()
target = cols[-1]
data['label'] = (data[target] >= data[target].median()).astype(int)
feature_cols = cols[:-1]
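
In [None]:
The labelling step can be traced on a tiny table. This is a sketch on a hypothetical DataFrame; the column names
temp and rain and their values are made up and do not come from the weather dataset.

```python
import pandas as pd

# Hypothetical numeric table; column names and values are illustrative only.
toy = pd.DataFrame({
    "temp": [10.0, 12.0, 15.0, 20.0, 25.0],
    "rain": [0.0, 1.0, 3.0, 5.0, 9.0],
})

cols = toy.columns.tolist()
target = cols[-1]                  # the last numeric column, here "rain"
median = toy[target].median()      # middle value of the target column

# Rows at or above the median get label 1, the rest get label 0.
toy["label"] = (toy[target] >= median).astype(int)

# The target column itself is left out of the features.
feature_cols = cols[:-1]

print(toy)
print(feature_cols)
```

Because the comparison is `>=`, rows that sit exactly on the median go to class 1, so the two classes are not always the same size.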

In [None]:
Scale the Features and Split the Dataset: StandardScaler places the features on the same scale (mean 0 and
standard deviation 1), which helps gradient descent converge faster. A column of ones is added to X to account
for the bias term, and then the data is split into training and test sets (70% / 30%).

In [None]:
scaler = StandardScaler()
X = scaler.fit_transform(data[feature_cols].values)
X = np.hstack((np.ones((X.shape[0], 1)), X))  
X_train, X_test, y_train, y_test = train_test_split(X, data['label'].values, test_size=0.3, random_state=42)
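
In [None]:
What the scaling and the bias column do can be shown with plain NumPy. This sketch reproduces the standardization
by hand (subtract the column mean, divide by the population standard deviation, which is what StandardScaler uses
by default) on a hypothetical array; it is an illustration, not a replacement for the cell above.

```python
import numpy as np

# Hypothetical raw features on very different scales.
raw = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0], [4.0, 700.0]])

# Standardize each column: subtract its mean, divide by its population std (ddof=0).
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# Prepend a column of ones so that theta[0] acts as the bias term.
X_toy = np.hstack((np.ones((scaled.shape[0], 1)), scaled))

print(scaled.mean(axis=0), scaled.std(axis=0))
print(X_toy)
```

After scaling, both columns have the same spread, so a single learning rate works reasonably for every weight.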

In [None]:
Train the Model: we initialize the weights (theta) to zeros, then run gradient descent for 3000 steps
using alpha = 0.1 as the learning rate and lambda_ = 1.0 as the regularization strength.

In [None]:
theta = np.zeros(X_train.shape[1])
alpha = 0.1
iterations = 3000
lambda_ = 1.0

theta, cost_history = gradient_descent(X_train, y_train, theta, alpha, iterations, lambda_)

In [None]:
Evaluating the Performance: this checks how well the model fits. We report the final cost (the lower
the better) and the accuracy on the training and test data.

In [None]:
print(f"Final cost: {cost_history[-1]:.4f}")
print(f"Training accuracy: {np.mean(predict(X_train, theta) == y_train):.4f}")
print(f"Testing accuracy : {np.mean(predict(X_test, theta) == y_test):.4f}")

In [None]:
Detailed Classification Metrics: these metrics show us the following
- Precision = of the predicted positives, how many were correct
- Recall = of the actual positives, how many were found
- F1 Score = the harmonic mean of precision and recall
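
In [None]:
The three metrics above can be computed by hand from the counts of true positives, false positives and false
negatives. The sketch below uses hypothetical label arrays (y_true and y_pred are made up for illustration) and
treats class 1 as the positive class.

```python
import numpy as np

# Hypothetical true labels and predictions, for illustration only.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted 1, actually 1
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted 1, actually 0
fn = np.sum((y_pred == 0) & (y_true == 1))  # predicted 0, actually 1

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)
```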