# Logistic Regression
Logistic regression is a classification method for categorical targets, here binary labels (0 or 1).
<br>It passes a linear score through a *sigmoid activation function*.
<br>The sigmoid maps any real number into the interval (0, 1): $s(z) = \frac{1}{1 + e^{-z}}$, where $z = w^T x + b$

- $z$: linear combination of the features
- $w$: weight vector
- $b$: bias
- $w^T x$: dot product of the weights and features
- $s(z)$ or $\hat{y}$: predicted probability

Decision rule:
- If $s(z) \geq 0.5$, predict 1
- Otherwise, predict 0
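
The sigmoid and the decision rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the names `sigmoid` and `predict`, the `threshold` parameter, and the toy values of `X`, `w`, and `b` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, b, threshold=0.5):
    """Return 0/1 labels by thresholding the predicted probabilities."""
    z = X @ w + b  # linear combination: one score per row of X
    return (sigmoid(z) >= threshold).astype(int)

# Hypothetical toy inputs: two samples with two features each
X = np.array([[1.0, 2.0], [-1.0, -2.0]])
w = np.array([0.5, 0.25])
b = 0.0

print(sigmoid(0.0))      # 0.5, the decision boundary
print(predict(X, w, b))  # [1 0]
```

A score of $z = 0$ sits exactly on the boundary, so the rule is equivalent to predicting 1 whenever $w^T x + b \geq 0$.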

## Cost function

Unlike linear regression, logistic regression is not usually trained with mean squared error: applying it to sigmoid outputs gives a **non-convex cost surface** in the weights, so gradient descent can stall in flat regions or local minima.

Instead, logistic regression uses **binary cross-entropy loss** (also known as **log loss**):

**Loss =**  
`-(1/m) * Σ [y(i) * log(ŷ(i)) + (1 - y(i)) * log(1 - ŷ(i))]`

Where:  
- `m` is the number of training examples  
- `y(i)` is the true label (0 or 1) of example `i`  
- `ŷ(i)` is the predicted probability from the sigmoid function
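
As a rough sketch, the loss can be computed with NumPy as follows. The function name `binary_cross_entropy` and the `eps` clipping constant are assumptions of this example; clipping is a common safeguard against `log(0)` and is not part of the formula itself. The label and probability arrays are made-up toy values.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy (log loss) over all examples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))

# Hypothetical labels with one set of good and one set of bad predictions
y = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.2]
bad = [0.1, 0.9, 0.2, 0.8]

loss_good = binary_cross_entropy(y, good)  # roughly 0.164
loss_bad = binary_cross_entropy(y, bad)    # roughly 1.956
print(loss_good, loss_bad)
```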

---

### 🔍 Intuition

- The loss measures how far off the predicted probability `ŷ` is from the actual label `y`.
- If the **true label is 1**, only the term `y * log(ŷ)` matters:
  - If `ŷ` is close to 1 → **low loss**
  - If `ŷ` is close to 0 → **high loss**
- If the **true label is 0**, only the term `(1 - y) * log(1 - ŷ)` matters:
  - If `ŷ` is close to 0 → **low loss**
  - If `ŷ` is close to 1 → **high loss**
- This means **confident, wrong predictions are penalized heavily**, which encourages the model to output well-calibrated probabilities.
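
To make the penalty concrete, the sketch below evaluates the loss of a single example with true label 1 at several predicted probabilities. The helper `example_loss` and the chosen probabilities are illustrative assumptions.

```python
import math

def example_loss(y, y_hat):
    """Cross-entropy loss for one example with label y and prediction y_hat."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# True label is 1; sweep the prediction from confident-right to confident-wrong
for y_hat in (0.99, 0.9, 0.5, 0.1, 0.01):
    print(f"y=1, y_hat={y_hat}: loss={example_loss(1, y_hat):.3f}")
```

The loss climbs from about 0.01 at `ŷ = 0.99` to about 4.6 at `ŷ = 0.01`, and it grows without bound as `ŷ` approaches 0.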

---

### 🎯 Why is this loss function used?

- It follows from **maximum likelihood estimation**: minimizing the loss is equivalent to maximizing the probability that the model assigns to the observed labels.
- It is **convex** in the weights and bias when used with a sigmoid, so gradient descent with a suitable learning rate can reliably approach a **global minimum**.
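
The sketch below runs plain batch gradient descent on this loss, using the standard gradients $\frac{1}{m} X^T(\hat{y} - y)$ for the weights and the mean of $\hat{y} - y$ for the bias. It is only an illustration under stated assumptions: the function `fit_logistic`, the learning rate, the iteration count, and the synthetic dataset are all hypothetical and are not the notebook's training code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=2000):
    """Batch gradient descent on the binary cross-entropy loss."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    losses = []
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)
        # Record the loss before the update (clipped to keep the logs finite)
        p = np.clip(y_hat, 1e-12, 1 - 1e-12)
        losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
        # Gradients of the loss with respect to w and b
        error = y_hat - y
        w -= lr * (X.T @ error) / m
        b -= lr * error.mean()
    return w, b, losses

# Hypothetical synthetic data: two features, noisy linear boundary
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)

w, b, losses = fit_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(losses[0], losses[-1], accuracy)
```

Starting from zero weights, every prediction is 0.5, so the first recorded loss is $\ln 2 \approx 0.693$. With a small enough learning rate the loss then decreases steadily, which is the behavior the convexity argument predicts.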



In [1]:
# Imports
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import pprint
import pickle

## Data Loading and Analysis

In [14]:
# From https://developer.ibm.com/articles/implementing-logistic-regression-from-scratch-in-python/
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

In [15]:
data = load_breast_cancer()

In [25]:
# Combine features and target into a single DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
# In this dataset, target 0 = malignant and 1 = benign
df['prognosis'] = data.target

# Split into train and test sets
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

In [26]:
df.sample(10)

index,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,prognosis
168,17.47,24.68,116.1,984.6,0.1049,0.1603,0.2159,0.1043,0.1538,0.06365,...,32.33,155.3,1660.0,0.1376,0.383,0.489,0.1721,0.216,0.093,0
104,10.49,19.29,67.41,336.1,0.09989,0.08578,0.02995,0.01201,0.2217,0.06481,...,23.31,74.22,402.8,0.1219,0.1486,0.07987,0.03203,0.2826,0.07552,1
549,10.82,24.21,68.89,361.6,0.08192,0.06602,0.01548,0.00816,0.1976,0.06328,...,31.45,83.9,505.6,0.1204,0.1633,0.06194,0.03264,0.3059,0.07626,1
522,11.26,19.83,71.3,388.1,0.08511,0.04413,0.005067,0.005664,0.1637,0.06343,...,26.43,76.38,435.9,0.1108,0.07723,0.02533,0.02832,0.2557,0.07613,1
172,15.46,11.89,102.5,736.9,0.1257,0.1555,0.2032,0.1097,0.1966,0.07069,...,17.04,125.0,1102.0,0.1531,0.3583,0.583,0.1827,0.3216,0.101,0
109,11.34,21.26,72.48,396.5,0.08759,0.06575,0.05133,0.01899,0.1487,0.06529,...,29.15,83.99,518.1,0.1699,0.2196,0.312,0.08278,0.2829,0.08832,1
76,13.53,10.94,87.91,559.2,0.1291,0.1047,0.06877,0.06556,0.2403,0.06641,...,12.49,91.36,605.5,0.1451,0.1379,0.08539,0.07407,0.271,0.07191,1
482,13.47,14.06,87.32,546.3,0.1071,0.1155,0.05786,0.05266,0.1779,0.06639,...,18.32,94.94,660.2,0.1393,0.2499,0.1848,0.1335,0.3227,0.09326,1
117,14.87,16.67,98.64,682.5,0.1162,0.1649,0.169,0.08923,0.2157,0.06768,...,27.37,127.1,1095.0,0.1878,0.448,0.4704,0.2027,0.3585,0.1065,0
233,20.51,27.81,134.4,1319.0,0.09159,0.1074,0.1554,0.0834,0.1448,0.05592,...,37.38,162.7,1872.0,0.1223,0.2761,0.4146,0.1563,0.2437,0.08328,0


In [27]:
px.histogram(data_frame=df, x='prognosis', color='prognosis', color_discrete_sequence=['#05445E', '#75E6DA'])