# Logistic Regression
Categorical data: a classification problem where the target takes one of two values (0 or 1)
<br>Requires the use of a *sigmoid activation function*
<br>The sigmoid function maps any real number to a value between 0 and 1: $s(z) = \frac{1}{1 + e^{-z}}$, where $z = w^Tx + b$

- $z$: linear combination
- $w$: weight vector
- $b$: bias
- $w^Tx$: dot product of the weights and features
- $s(z)$ or $\hat{y}$: predicted probability that the label is 1

Decision rule:
- If $s(z) \geq 0.5$, predict 1
- Otherwise, predict 0
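The sigmoid and the 0.5 decision rule can be sketched in NumPy as below. This is a minimal illustration rather than the notebook's own implementation; the names `sigmoid` and `predict` and the toy `X`, `w`, `b` values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid s(z) = 1 / (1 + e^(-z)).

    Written as exp(-log(1 + e^(-z))) via np.logaddexp(0, -z), which avoids
    overflow in e^(-z) when z is a large negative number.
    """
    z = np.asarray(z, dtype=float)
    return np.exp(-np.logaddexp(0.0, -z))

def predict(X, w, b, threshold=0.5):
    """Decision rule: predict 1 if s(w^T x + b) >= threshold, otherwise 0."""
    z = X @ w + b  # one linear combination per row (example) of X
    return (sigmoid(z) >= threshold).astype(int)

# Hypothetical toy inputs: 3 examples with 2 features each.
X = np.array([[2.0, 1.0], [-1.0, -3.0], [0.5, -0.5]])
w = np.array([0.5, 0.25])
b = 0.0
print(predict(X, w, b))  # [1 0 1]
```

Here the three linear combinations are 1.25, -1.25 and 0.125, so only the middle example falls below the 0.5 probability threshold. Because $s(z) \geq 0.5$ exactly when $z \geq 0$, the rule is equivalent to checking the sign of $z$.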

## Cost function

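Unlike linear regression, we should not use mean squared error here: applying it to sigmoid outputs produces a **non-convex cost surface**, which makes optimization unreliable (gradient descent can stall on flat regions and has no guarantee of reaching the global minimum).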
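The non-convexity claim can be checked numerically in the smallest possible setting. The sketch below assumes a single training example with $x = 1$, $y = 1$ and no bias (so $\hat{y} = s(w)$), and estimates the second derivative of each cost on a grid of weights with finite differences. A convex function never has negative curvature.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable s(z) = 1 / (1 + e^(-z))
    return np.exp(-np.logaddexp(0.0, -z))

# Toy assumption: a single example with x = 1, y = 1 and no bias, so y_hat = s(w).
w = np.linspace(-6.0, 6.0, 1201)  # grid of weights with spacing 0.01
h = w[1] - w[0]

mse = (1.0 - sigmoid(w)) ** 2     # squared error of the sigmoid output
log_loss = np.logaddexp(0.0, -w)  # -log s(w) = log(1 + e^(-w))

def second_difference(f, h):
    """Central finite-difference estimate of f'' at the interior grid points."""
    return (f[2:] - 2.0 * f[1:-1] + f[:-2]) / h**2

mse_curvature = second_difference(mse, h)
log_loss_curvature = second_difference(log_loss, h)

# Convex functions have f'' >= 0 everywhere; MSE changes sign, log loss does not.
print("MSE curvature range:", mse_curvature.min(), mse_curvature.max())
print("log-loss curvature min:", log_loss_curvature.min())

# Where the MSE curvature first turns positive (analytically at w = -ln 2).
print("sign change near w =", w[1:-1][np.argmax(mse_curvature > 0)])
```

Analytically, for this toy case the MSE curvature is $-2s(1-s)^2(1-3s)$ with $s = s(w)$, which is negative whenever $s < 1/3$ (that is, $w < -\ln 2$), while the curvature of $-\log s(w)$ is $s(1-s) > 0$ everywhere.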

Instead, logistic regression uses **binary cross-entropy loss** (also known as **log loss**):

$$\text{Loss} = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(\hat{y}^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]$$

Where:
- $m$ is the number of training examples
- $y^{(i)}$ is the true label (0 or 1) of example $i$
- $\hat{y}^{(i)}$ is the predicted probability from the sigmoid function
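A direct NumPy translation of the loss above is sketched below. The clipping constant `eps` is an assumption added so that a prediction of exactly 0 or 1 does not produce `log(0)`; the function name and toy arrays are illustrative only. scikit-learn's `sklearn.metrics.log_loss` computes the same mean log loss if you prefer a library routine.

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Mean log loss: -(1/m) * sum[y*log(y_hat) + (1-y)*log(1-y_hat)]."""
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from exactly 0 and 1 so log() never returns -inf.
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

# Hypothetical labels and predicted probabilities for m = 4 examples.
y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.1, 0.8, 0.3])
print(binary_cross_entropy(y_true, y_prob))
```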

---

### 🔍 Intuition

- The loss measures how far the predicted probability `ŷ` is from the actual label `y`.
- If the **true label is 1**, only the term `y * log(ŷ)` matters:
  - If `ŷ` is close to 1 → **low loss**
  - If `ŷ` is close to 0 → **high loss**
- If the **true label is 0**, only the term `(1 - y) * log(1 - ŷ)` matters:
  - If `ŷ` is close to 0 → **low loss**
  - If `ŷ` is close to 1 → **high loss**
- This means **confident, wrong predictions are penalized heavily**, which encourages the model to output well-calibrated probabilities.
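To see how steep that penalty is, the sketch below evaluates the per-example loss for a true label of 1 at a few predicted probabilities. `per_example_loss` is a hypothetical helper written for this illustration.

```python
import numpy as np

def per_example_loss(y, y_hat):
    # Log loss for a single example; y is 0 or 1, y_hat is strictly inside (0, 1).
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

for y_hat in (0.99, 0.9, 0.5, 0.1, 0.01):
    # True label 1: only -log(y_hat) survives.
    print(f"y=1, y_hat={y_hat:<4}: loss={per_example_loss(1, y_hat):.3f}")
```

The loss rises from about 0.010 at `ŷ = 0.99` to 0.693 at `ŷ = 0.5` and about 4.605 at `ŷ = 0.01`. The case `y = 0` is the mirror image, since `per_example_loss(0, p)` equals `per_example_loss(1, 1 - p)`.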

---

### 🎯 Why is this loss function used?

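- It follows from **maximum likelihood estimation**: minimizing this loss is equivalent to maximizing the likelihood that the model assigns to the observed labels.
- It is **convex** in $w$ and $b$ when paired with the sigmoid, so gradient descent with a suitable learning rate converges towards the **global minimum**. (One caveat: if the classes are perfectly separable, no finite minimum exists and the weights keep growing unless regularization is added.)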
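Because the loss is convex and smooth, plain batch gradient descent is enough to train the model. Differentiating the loss gives $\partial \text{Loss} / \partial w = \frac{1}{m} X^T(\hat{y} - y)$ and $\partial \text{Loss} / \partial b = \frac{1}{m}\sum_i (\hat{y}^{(i)} - y^{(i)})$. The sketch below uses those gradients; the learning rate, iteration count, function names and the toy Gaussian data are all assumptions chosen for illustration, not values from this notebook.

```python
import numpy as np

def sigmoid(z):
    return np.exp(-np.logaddexp(0.0, -z))  # stable 1 / (1 + e^(-z))

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def fit_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on binary cross-entropy (a from-scratch sketch)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    history = []
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)
        history.append(binary_cross_entropy(y, y_hat))  # loss before this update
        error = y_hat - y           # dLoss/dz for each example
        grad_w = X.T @ error / m    # (1/m) * X^T (y_hat - y)
        grad_b = error.mean()       # (1/m) * sum(y_hat - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history

# Toy data: two overlapping Gaussian blobs, so the classes are not perfectly separable.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-1.0, scale=1.0, size=(100, 2))
X1 = rng.normal(loc=1.0, scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b, history = fit_logistic_regression(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}, train accuracy: {accuracy:.2f}")
```

Starting from all-zero weights every prediction is 0.5, so the first recorded loss is $\ln 2 \approx 0.693$. On the breast cancer features, which range from fractions to thousands (e.g. `mean area`), it would likely help to standardize each column using the training-set mean and standard deviation before running a loop like this.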



```python
# Imports
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import pprint
import pickle
```

## Data Loading and Analysis

```python
# From https://developer.ibm.com/articles/implementing-logistic-regression-from-scratch-in-python/
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
```

```python
data = load_breast_cancer()
```

```python
# Combine features and target into a single DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['diagnosis'] = data.target

# Split into train and test sets
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
```

```python
df.sample(10)
```

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | diagnosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 54 | 15.1 | 22.02 | 97.26 | 712.8 | 0.09056 | 0.07081 | 0.05253 | 0.03334 | 0.1616 | 0.05684 | ... | 31.69 | 117.7 | 1030.0 | 0.1389 | 0.2057 | 0.2712 | 0.153 | 0.2675 | 0.07873 | 0 |
| 247 | 12.89 | 14.11 | 84.95 | 512.2 | 0.0876 | 0.1346 | 0.1374 | 0.0398 | 0.1596 | 0.06409 | ... | 17.7 | 105.0 | 639.1 | 0.1254 | 0.5849 | 0.7727 | 0.1561 | 0.2639 | 0.1178 | 1 |
| 423 | 13.66 | 19.13 | 89.46 | 575.3 | 0.09057 | 0.1147 | 0.09657 | 0.04812 | 0.1848 | 0.06181 | ... | 25.5 | 101.4 | 708.8 | 0.1147 | 0.3167 | 0.366 | 0.1407 | 0.2744 | 0.08839 | 1 |
| 167 | 16.78 | 18.8 | 109.3 | 886.3 | 0.08865 | 0.09182 | 0.08422 | 0.06576 | 0.1893 | 0.05534 | ... | 26.3 | 130.7 | 1260.0 | 0.1168 | 0.2119 | 0.2318 | 0.1474 | 0.281 | 0.07228 | 0 |
| 257 | 15.32 | 17.27 | 103.2 | 713.3 | 0.1335 | 0.2284 | 0.2448 | 0.1242 | 0.2398 | 0.07596 | ... | 22.66 | 119.8 | 928.8 | 0.1765 | 0.4503 | 0.4429 | 0.2229 | 0.3258 | 0.1191 | 0 |
| 353 | 15.08 | 25.74 | 98.0 | 716.6 | 0.1024 | 0.09769 | 0.1235 | 0.06553 | 0.1647 | 0.06464 | ... | 33.22 | 121.2 | 1050.0 | 0.166 | 0.2356 | 0.4029 | 0.1526 | 0.2654 | 0.09438 | 0 |
| 327 | 12.03 | 17.93 | 76.09 | 446.0 | 0.07683 | 0.03892 | 0.001546 | 0.005592 | 0.1382 | 0.0607 | ... | 22.25 | 82.74 | 523.4 | 0.1013 | 0.0739 | 0.007732 | 0.02796 | 0.2171 | 0.07037 | 1 |
| 401 | 11.93 | 10.91 | 76.14 | 442.7 | 0.08872 | 0.05242 | 0.02606 | 0.01796 | 0.1601 | 0.05541 | ... | 20.14 | 87.64 | 589.5 | 0.1374 | 0.1575 | 0.1514 | 0.06876 | 0.246 | 0.07262 | 1 |
| 531 | 11.67 | 20.02 | 75.21 | 416.2 | 0.1016 | 0.09453 | 0.042 | 0.02157 | 0.1859 | 0.06461 | ... | 28.81 | 87.0 | 550.6 | 0.155 | 0.2964 | 0.2758 | 0.0812 | 0.3206 | 0.0895 | 1 |
| 284 | 12.89 | 15.7 | 84.08 | 516.6 | 0.07818 | 0.0958 | 0.1115 | 0.0339 | 0.1432 | 0.05935 | ... | 19.69 | 92.12 | 595.6 | 0.09926 | 0.2317 | 0.3344 | 0.1017 | 0.1999 | 0.07127 | 1 |


```python
# Names of all columns (30 features plus the diagnosis target)
list(df.columns.values)
```

```
['mean radius',
 'mean texture',
 'mean perimeter',
 'mean area',
 'mean smoothness',
 'mean compactness',
 'mean concavity',
 'mean concave points',
 'mean symmetry',
 'mean fractal dimension',
 'radius error',
 'texture error',
 'perimeter error',
 'area error',
 'smoothness error',
 'compactness error',
 'concavity error',
 'concave points error',
 'symmetry error',
 'fractal dimension error',
 'worst radius',
 'worst texture',
 'worst perimeter',
 'worst area',
 'worst smoothness',
 'worst compactness',
 'worst concavity',
 'worst concave points',
 'worst symmetry',
 'worst fractal dimension',
 'diagnosis']
```

### NOTE: 0 - Malignant, 1 - Benign
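This encoding matches scikit-learn's `load_breast_cancer()`, whose `target_names` lists `'malignant'` first and `'benign'` second. The sketch below turns the 0/1 codes into readable names and counts each class; the helper names and the toy labels are hypothetical and not taken from the dataset.

```python
import numpy as np

# Assumed encoding (matches the note above and load_breast_cancer().target_names):
# index 0 -> malignant, index 1 -> benign.
LABEL_NAMES = np.array(["malignant", "benign"])

def to_label_names(codes):
    """Map 0/1 diagnosis codes to readable names by indexing LABEL_NAMES."""
    return LABEL_NAMES[np.asarray(codes, dtype=int)]

def class_counts(codes):
    """Count how many examples fall into each class, keyed by readable name."""
    counts = np.bincount(np.asarray(codes, dtype=int), minlength=len(LABEL_NAMES))
    return {str(name): int(n) for name, n in zip(LABEL_NAMES, counts)}

toy_codes = [0, 1, 1, 0, 1]  # hypothetical labels, not taken from the dataset
print(to_label_names(toy_codes))  # ['malignant' 'benign' 'benign' 'malignant' 'benign']
print(class_counts(toy_codes))    # {'malignant': 2, 'benign': 3}
```

On the notebook's DataFrame, something like `df['diagnosis_name'] = to_label_names(df['diagnosis'])` would give a string column that could be passed as `color` to the Plotly charts for named legend entries.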

```python
px.histogram(data_frame=df, x='diagnosis', color='diagnosis', color_discrete_sequence=['#05445E', '#75E6DA'])
```

```python
px.histogram(data_frame=df, x='mean area', color='diagnosis', color_discrete_sequence=['#05445E', '#75E6DA'])
```

Having plotted a couple of the basic features, my guess is that most of the measurements will differ between malignant and benign tumours.

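```python
# Cast labels to str so Plotly uses discrete colors (a numeric column gets a continuous scale)
px.scatter(data_frame=df, x='worst area', color=df['diagnosis'].astype(str), color_discrete_sequence=['#05445E', '#75E6DA'])
```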

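```python
# Cast labels to str so Plotly uses discrete colors (a numeric column gets a continuous scale)
px.scatter(data_frame=df, x='mean fractal dimension', color=df['diagnosis'].astype(str), color_discrete_sequence=['#05445E', '#75E6DA'])
```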

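Interestingly, this feature shows much less separation between the two diagnoses. A common definition reads: 'A tumour fractal dimension is a numerical value that quantifies the complexity and irregularity of a tumor's structure.' (In this dataset, the features are computed from the cell nuclei in a fine needle aspirate (FNA) image, and fractal dimension is the "coastline approximation - 1" of each nucleus boundary.) On its own, then, the complexity of the boundary does not appear to *indicate* how dangerous the growth is.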
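To put a number on "less separation" instead of judging it from the plots, one option is a standardized mean difference (Cohen's d): the gap between the two class means divided by their pooled standard deviation. The sketch below is a hypothetical helper checked on toy arrays only; it has not been run on this dataset here, so no feature ranking is claimed.

```python
import numpy as np

def standardized_mean_difference(x, y):
    """|mean(x | y=1) - mean(x | y=0)| divided by the pooled standard deviation.

    Larger values mean the two classes sit further apart relative to their spread;
    values near 0 mean the class distributions overlap heavily.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    a, b = x[y == 0], x[y == 1]
    # Pooled variance weights each class's sample variance by its degrees of freedom.
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return abs(b.mean() - a.mean()) / np.sqrt(pooled_var)

# Toy check: well-separated classes versus identical class distributions.
print(standardized_mean_difference([1, 2, 3, 11, 12, 13], [0, 0, 0, 1, 1, 1]))
print(standardized_mean_difference([1, 2, 3, 1, 2, 3], [0, 0, 0, 1, 1, 1]))
```

Applied to the notebook's data, a dictionary such as `{col: standardized_mean_difference(df[col], df['diagnosis']) for col in data.feature_names}`, sorted by value, would show whether `mean fractal dimension` really sits near the bottom of the list. A low score for one feature does not rule out its usefulness in combination with others inside the logistic regression model.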