Importing libraries

NumPy: utilized for high-performance vector arithmetic.

Pandas: utilized for dataset management and cleaning.

Matplotlib: utilized for generating graphical visualizations.

LogisticRegressor: a simple custom-built class implementing the perceptron logic.

\* from shared: utilities for recurring tasks such as loading data, calculating metrics, and more.

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import sys
from pathlib import Path

root_dir = Path().absolute().parent.parent
if str(root_dir) not in sys.path:
    sys.path.insert(0, str(root_dir))

from src.regression.logistic import LogisticRegressor
from src.shared import *
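The source of `LogisticRegressor` is not shown in this notebook, so here is a minimal sketch of what such a class might look like. It assumes a single sigmoid unit trained with batch gradient descent on the cross-entropy loss. The name `TinyLogisticRegressor`, its attributes, and the toy data are all hypothetical and are not taken from `src.regression.logistic`.

```python
import numpy as np


class TinyLogisticRegressor:
    """Hypothetical stand-in, NOT the notebook's src.regression.logistic class."""

    def __init__(self):
        self.weights = None
        self.bias = 0.0

    @staticmethod
    def _sigmoid(z):
        # Clip the logits so np.exp cannot overflow on extreme inputs.
        return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

    def fit(self, X, y, learning_rate=0.1, n_epochs=1000):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        for _ in range(n_epochs):
            p = self._sigmoid(X @ self.weights + self.bias)
            error = p - y  # gradient of the cross-entropy loss w.r.t. the logits
            self.weights -= learning_rate * (X.T @ error) / n_samples
            self.bias -= learning_rate * error.mean()
        return self

    def predict(self, X):
        # Returns probabilities in (0, 1), not hard class labels.
        return self._sigmoid(X @ self.weights + self.bias)


# Toy data (assumption): targets are probabilities produced by a known sigmoid.
rng = np.random.default_rng(0)
X_demo = rng.random((200, 3))
y_demo = 1.0 / (1.0 + np.exp(-(X_demo @ np.array([1.5, -2.0, 0.5]) + 0.3)))

demo_model = TinyLogisticRegressor().fit(X_demo, y_demo, learning_rate=2.0, n_epochs=2000)
demo_mae = np.abs(demo_model.predict(X_demo) - y_demo).mean()
print(f"Toy MAE: {demo_mae:.6f}")
```

Because the targets here are themselves probabilities, the cross-entropy loss is minimized when the predicted probability matches the target, which is why a model of this shape can be scored with regression metrics such as MAE.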

I chose a simple dataset whose target is a probability. \
The Floods dataset fits this criterion well, which makes it a good choice for demonstrating the logistic regressor.

In [None]:
# --- 1. Load Data ---
df = load_dataset('floods')


# --- 2. Data Cleaning ---
# No cleaning is needed: this dataset is already clean.


# --- 3. Analysis ---
# Correlation of each feature with the target (simple pairwise relationships)
correlations = df.corr()['FloodProbability'].sort_values(ascending=False).drop('FloodProbability')
print("Correlation of each feature with flood probability:")
print(correlations)

# No single feature has a very high correlation with flood probability.
# This means we need to use as many features as we can to predict flood probability as accurately as possible.

ValueError: Dataset 'a' not found.
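To illustrate why weak individual correlations do not rule out accurate predictions, here is a small sketch on synthetic data. It is not the Floods dataset: the column names, sizes, and the "target is the average of the features" rule are assumptions chosen only to make the effect visible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_rows, n_features = 5000, 20

# Synthetic stand-in (assumption): independent features, target = their average.
features = pd.DataFrame(
    rng.random((n_rows, n_features)),
    columns=[f"feature_{i}" for i in range(n_features)],
)
features["target"] = features.mean(axis=1)

# Same pattern as the cell above: correlation of every feature with the target.
toy_correlations = (
    features.corr()["target"].drop("target").sort_values(ascending=False)
)
print(toy_correlations.head())

# Each feature alone is only weakly correlated with the target
# (in theory about 1 / sqrt(20), roughly 0.22) ...
max_single_corr = toy_correlations.max()

# ... yet all features together determine the target exactly.
reconstructed = features.drop(columns="target").mean(axis=1)
max_abs_error = (reconstructed - features["target"]).abs().max()
print(f"Largest single-feature correlation: {max_single_corr:.2f}")
print(f"Largest error using all features:   {max_abs_error:.2e}")
```

In this toy setup every feature carries a small, equal share of the signal, so dropping features loses information even though none of them stands out in the correlation table.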

In [7]:
# --- 1. Select Features & Target ---
# Use all features for prediction (with this many dimensions, we sadly can't plot a 2D or 3D graph)
X = df.drop('FloodProbability', axis=1).values
y = df['FloodProbability'].values


# --- 2. Split Data for training & testing ---
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=None)


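# Min-max normalize features to the 0-1 range for better performance.
# The min and range come from the training set only, so no test-set information leaks into training;
# the 1e-15 term guards against division by zero for constant features.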
train_min = X_train.min(axis=0)
train_range = X_train.max(axis=0) - train_min + 1e-15

X_train = (X_train - train_min) / train_range
X_test = (X_test - train_min) / train_range


# --- 3. Training ---
print(f"Training on {len(X_train)} samples")

model = LogisticRegressor()

# Small inputs (0-1) yield small gradients, so we use a high learning rate for fast convergence.
model.fit(X_train, y_train, learning_rate=2, n_epochs=1600, show_progress=False)


# --- 4. Evaluation ---
predictions = model.predict(X_test)

mae = calculate_mae(y_test, predictions)
r2 = calculate_r2(y_test, predictions)

print(f"Mean Absolute Error: {mae:.4f}")
print(f"R-squared Score: {r2:.4f}")

Training on 40000 samples
Mean Absolute Error: 0.0002
R-squared Score: 0.9999
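The two metrics come from `calculate_mae` and `calculate_r2` in the shared utilities, whose source is not shown here. As a rough guide to what they report, the sketch below uses the standard textbook definitions of mean absolute error and the coefficient of determination. The function names `mae_sketch` and `r2_sketch` and the sample values are hypothetical, and the shared helpers may differ in detail.

```python
import numpy as np


def mae_sketch(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))


def r2_sketch(y_true, y_pred):
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hand-checkable example values (assumption, not taken from the Floods data).
y_true_demo = np.array([0.40, 0.50, 0.60, 0.50])
y_pred_demo = np.array([0.45, 0.50, 0.55, 0.50])

mae_demo = mae_sketch(y_true_demo, y_pred_demo)  # about 0.025
r2_demo = r2_sketch(y_true_demo, y_pred_demo)    # about 0.75
print(f"MAE: {mae_demo:.4f}")
print(f"R^2: {r2_demo:.4f}")
```

Under these definitions, an MAE near 0 together with an R-squared near 1 means the predictions are close to the targets and explain almost all of their variance.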
