# Logistic Regression Training
Logistic regression is a binary classification algorithm, typically used when the target variable has only two possible outcomes.

However, logistic regression can be extended to handle multi-class classification problems through techniques such as *one-vs-all* or *multinomial* logistic regression.

## One-vs-all or One-vs-Rest (OVR)
Train a separate binary logistic regression model for each class.
Each model is trained to distinguish that class from all other classes.
During prediction, we run each observation through all models, and the class with the highest probability is assigned as the predicted class.

Our dataset has a discrete number of possible outcomes: `[Ravenclaw, Slytherin, Gryffindor, Hufflepuff]`.

OvR breaks this multi-class problem down into several binary classification problems, one per house.

We will be using `k=4` *binary classifiers*.
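
As a rough illustration of the scheme above, the sketch below builds one binary target vector per house and then picks the house whose classifier reports the highest probability. The `labels` and `probabilities` arrays are made-up toy values, not results from the dataset.

```python
import numpy as np

houses = ["Ravenclaw", "Slytherin", "Gryffindor", "Hufflepuff"]

# Toy labels for four students (invented for illustration)
labels = np.array(["Slytherin", "Ravenclaw", "Hufflepuff", "Slytherin"])

# One binary target column per house: 1.0 for that house, 0.0 for the rest
Y = np.stack([(labels == house).astype(float) for house in houses], axis=1)

# Hypothetical outputs of the k=4 classifiers for two students
# (rows: students, columns: houses in the order of `houses`)
probabilities = np.array([
    [0.10, 0.70, 0.15, 0.20],
    [0.05, 0.10, 0.30, 0.85],
])

# The predicted house is the one with the highest probability in each row
predicted = [houses[i] for i in probabilities.argmax(axis=1)]
```

The four probabilities in a row need not sum to 1, because each classifier is trained independently of the others.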

## Feature Selection
Based on data visualization:
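- `Arithmancy` and `Care of Magical Creatures` do not discriminate well between houses.
- `Defense Against the Dark Arts` and `Astronomy` are anti-correlated, so they carry largely the same information; we can drop one (the code below drops `Defense Against the Dark Arts`).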

All other numerical features will be used for training.
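
One possible way to confirm such redundancy numerically, rather than only visually, is to look at the pairwise Pearson correlation. The sketch below uses a tiny invented table (the values are not taken from the dataset) and an arbitrary threshold of `0.95`, both of which are assumptions for illustration.

```python
import pandas as pd

# Invented values: the second column is an exact negative multiple of the first
toy = pd.DataFrame({
    "Astronomy": [1.0, 2.0, 3.0, 4.0],
    "Defense Against the Dark Arts": [-2.0, -4.0, -6.0, -8.0],
    "Herbology": [1.0, -1.0, 1.0, -1.0],
})

corr = toy.corr()  # Pearson correlation matrix
threshold = 0.95   # arbitrary cut-off chosen for this example

# Collect each pair of columns whose absolute correlation exceeds the threshold
columns = list(corr.columns)
redundant = [
    (a, b)
    for i, a in enumerate(columns)
    for b in columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
```

A correlation close to `-1` or `+1` means one feature is nearly a linear function of the other, so keeping both adds little information to the model.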

## Data Preparation
- Keep only the selected features
- Remove rows containing `NaN`
- *Standardize* the data (subtract each feature's mean and divide by its standard deviation)

## Model Training
- Input values: $X = (x_1, x_2, ..., x_n)$
- Weights: $W=(w_1, w_2, ..., w_n)$
- $b$: bias parameter

$$z=b+x_1.w_1+x_2.w_2+...+x_n.w_n = b+\sum_{i=1}^{n}{x_i.w_i}$$

**Logistic function:**
$$g(z)=\frac{1}{1+e^{-z}}=\frac{1}{1+e^{-(b+X.W)}}$$
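
The two formulas above can be sketched for a single observation as follows. The function name `sigmoid` and the numeric values are assumptions chosen for illustration; the values are picked so that $z = 0$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Toy observation with n = 2 features (invented values)
x = np.array([1.0, -2.0])
w = np.array([0.5, 0.25])
b = 0.0

z = b + np.dot(x, w)   # linear combination: b + sum of x_i * w_i
probability = sigmoid(z)
```

Since $g(0) = 0.5$, an observation with $z = 0$ sits exactly on the decision boundary; positive $z$ pushes the output toward 1 and negative $z$ toward 0.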

### Matrix Notation
$X = \begin{bmatrix} 
x_{1}^{(1)} & x_{2}^{(1)} & \cdots & x_{n}^{(1)} & 1 \\ 
x_{1}^{(2)} & x_{2}^{(2)} & \cdots & x_{n}^{(2)} & 1\\ 
\vdots & \vdots & \ddots & \vdots & \vdots \\ 
x_{1}^{(m)} & x_{2}^{(m)} & \cdots & x_{n}^{(m)} & 1
\end{bmatrix}$

$W = \begin{bmatrix} 
w_{1} \\ w_{2} \\ \vdots \\ w_{n} \\ b 
\end{bmatrix}$
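
With the column of ones appended to $X$ and $b$ appended to $W$, the product $XW$ yields $z$ for all $m$ observations at once. A minimal NumPy sketch with invented values (the names `raw_features`, `X_demo` and `W_demo` are hypothetical):

```python
import numpy as np

# m = 3 observations, n = 2 features (invented values)
raw_features = np.array([
    [1.0, 2.0],
    [3.0, -1.0],
    [0.0, 0.0],
])

# Append the column of ones on the right, as in the matrix X above
X_demo = np.hstack([raw_features, np.ones((raw_features.shape[0], 1))])

# Weights followed by the bias, as in the vector W above: (w_1, w_2, b)
W_demo = np.array([0.5, -0.5, 1.0])

z = X_demo @ W_demo                    # one z per observation
probabilities = 1.0 / (1.0 + np.exp(-z))

# Same result as the explicit form b + sum of x_i * w_i
z_explicit = raw_features @ W_demo[:-1] + W_demo[-1]
```

The position of the ones column in `X_demo` has to match the position of the bias in `W_demo`; here both come last.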


In [79]:
import numpy as np
import pandas as pd

%run "utils.ipynb"

df = get_data()
# ============== DATA PREPROCESSING ================
print('Data frame shape:', df.shape)
excluded_features = ['Arithmancy', 'Care of Magical Creatures', 'Defense Against the Dark Arts']
df.drop(df.columns[1:5], inplace=True, axis=1)
df.drop(excluded_features, inplace=True, axis=1)
df.dropna(inplace=True)
print('Data frame shape after data processing:', df.shape)

# Extract houses
df_houses = df['Hogwarts House']

df_features = df.drop(df.columns[:1], axis=1)
print(df_features.shape)

# Standardize data
df_std_features = df_features.apply(lambda x: (x - x.mean()) / x.std())

# ============== INITIALIZE MODEL ================
# Create matrices
X = np.array(df_std_features)
ones = np.ones((len(X), 1), dtype=float)
# Append a column of 1s on the right (multiplies the bias term)
X = np.concatenate((X, ones), axis=1)

features = df_std_features.columns.tolist()
houses = df_houses.unique().tolist()

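# One weight row per feature plus a 'Bias' row; one column per house.
# Note: 'Bias' is the first row here while the column of 1s is the last
# column of X, so the two orders must be aligned before computing X @ W.
w_indexes = df_std_features.columns.insert(0, 'Bias')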
W = pd.DataFrame(columns=houses, index=w_indexes)
W = W.infer_objects(copy=False).fillna(0)
W.head(11)

Data frame shape: (1600, 18)
Data frame shape after data processing: (1333, 11)
(1333, 10)


| | Ravenclaw | Slytherin | Gryffindor | Hufflepuff |
|---|---|---|---|---|
| Bias | 0.0 | 0.0 | 0.0 | 0.0 |
| Astronomy | 0.0 | 0.0 | 0.0 | 0.0 |
| Herbology | 0.0 | 0.0 | 0.0 | 0.0 |
| Divination | 0.0 | 0.0 | 0.0 | 0.0 |
| Muggle Studies | 0.0 | 0.0 | 0.0 | 0.0 |
| Ancient Runes | 0.0 | 0.0 | 0.0 | 0.0 |
| History of Magic | 0.0 | 0.0 | 0.0 | 0.0 |
| Transfiguration | 0.0 | 0.0 | 0.0 | 0.0 |
| Potions | 0.0 | 0.0 | 0.0 | 0.0 |
| Charms | 0.0 | 0.0 | 0.0 | 0.0 |
