## Classical Machine Learning
Prerequisites:
- Python experience, including a basic understanding of Python syntax, loops, conditional statements, functions, and data types
- Some background in numerical computing (MATLAB, R, NumPy, or similar) and an understanding of vectors, matrices, and relevant linear algebra concepts

Goals for this session:
- Introduce the scikit-learn (`sklearn`) API and common practices in the field of machine learning
- Build intuition for the complexity, performance, and effectiveness of various classical machine learning techniques across different applications
- Explore concepts such as feature selection, model selection, hyperparameter tuning, performance metrics, and the bias/variance tradeoff
- Apply this knowledge to a real-world dataset in a competition-style activity


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import precision_score, recall_score, f1_score
import pickle

Classifier shortlist
- LogisticRegression
- RidgeClassifier
- SVC
- KNeighborsClassifier
- GaussianProcessClassifier (too slow)
- GaussianNB
- DecisionTreeClassifier

In [4]:
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
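
The shortlist flags GaussianProcessClassifier as too slow, so training cost is clearly one of the selection criteria. As a rough sketch of how that cost could be compared, the cell below times `fit` for the remaining shortlisted classifiers. It uses a small synthetic dataset from `make_classification` as a stand-in (an assumption, not the activity data), and the names `X_demo`, `shortlist`, and `fit_results` are illustrative only. Timings depend on the machine and dataset size, so treat the numbers as relative.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: this is NOT the activity dataset).
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Shortlisted classifiers with mostly default settings.
shortlist = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RidgeClassifier": RidgeClassifier(),
    "SVC": SVC(),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

# Time each fit and record held-out accuracy alongside it.
fit_results = {}
for clf_name, clf in shortlist.items():
    start = time.perf_counter()
    clf.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - start
    acc = clf.score(X_te, y_te)
    fit_results[clf_name] = {"fit_seconds": elapsed, "accuracy": acc}
    print(f"{clf_name:<24} fit: {elapsed:.4f}s  accuracy: {acc:.3f}")
```

Adding `GaussianProcessClassifier()` to the `shortlist` dict and increasing `n_samples` is one way to see why it was dropped.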

In [5]:
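# Load the pre-split activity data (a pickled dict of pandas objects)
with open("../data/classical_ml_activity.pkl", "rb") as f:
    data = pickle.load(f)

# Convert each split to a NumPy array
X_train = data["X_train"].values
X_test = data["X_test"].values
y_train = data["y_train"].values
y_test = data["y_test"].values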

# Train your model

In [6]:
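params = {
    "class_weight": "balanced",  # reweight classes inversely to their frequency
    "random_state": 87,          # fixed seed for reproducibility
    "max_iter": 10_000           # generous iteration cap for the solver
}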
model = LogisticRegression(**params)
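
The `"class_weight": "balanced"` setting above weights each class by `n_samples / (n_classes * count_of_that_class)`, so rarer classes count more in the loss. The sketch below reproduces that heuristic by hand on a made-up label vector; `y_imbalanced` is hypothetical and unrelated to the activity data.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 8 samples of class 0, 2 samples of class 1.
y_imbalanced = np.array([0] * 8 + [1] * 2)
classes = np.unique(y_imbalanced)

# Weights as computed by scikit-learn's "balanced" heuristic.
balanced = compute_class_weight(
    class_weight="balanced", classes=classes, y=y_imbalanced
)

# The same weights by hand: n_samples / (n_classes * count_per_class).
manual = len(y_imbalanced) / (len(classes) * np.bincount(y_imbalanced))

print(dict(zip(classes.tolist(), balanced.tolist())))  # {0: 0.625, 1: 2.5}
```

Here the minority class gets four times the weight of the majority class, matching the 8:2 imbalance.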

In [7]:
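trained_models = []  # fitted estimators, kept for later inspection
metrics = {}         # metric name -> list of values, one entry per trained model

name = model.__class__.__name__
print(f"Training {name}...")
metrics.setdefault("model_name", []).append(name)

# Fit on the training split and report accuracy on the held-out test split
model.fit(X_train, y_train)
trained_models.append(model)
test_acc = model.score(X_test, y_test)
metrics.setdefault("accuracy", []).append(test_acc)
print(f"Model accuracy: {test_acc*100:.2f}%")

# Support-weighted precision, recall, and F1 on the test predictions
y_pred = model.predict(X_test)
for metric_name, metric in zip(
    ["precision", "recall", "f1"],
    [precision_score, recall_score, f1_score]
):
    metrics.setdefault(metric_name, []).append(metric(y_test, y_pred, average="weighted"))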


Training LogisticRegression...
Model accuracy: 87.85%
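
The training cell passes `average="weighted"` to each metric. That option computes the score per class and then averages those scores weighted by each class's support (its number of true samples). The sketch below checks this by hand on made-up labels; `y_true_demo` and `y_pred_demo` are hypothetical and not drawn from the activity data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels and predictions for three classes.
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0])

# Per-class scores and the support (true sample count) of each class.
per_class_precision = precision_score(y_true_demo, y_pred_demo, average=None)
per_class_recall = recall_score(y_true_demo, y_pred_demo, average=None)
support = np.bincount(y_true_demo)

# Support-weighted means computed by hand.
manual_precision = np.average(per_class_precision, weights=support)
manual_recall = np.average(per_class_recall, weights=support)

# The same quantities straight from scikit-learn.
weighted_precision = precision_score(y_true_demo, y_pred_demo, average="weighted")
weighted_recall = recall_score(y_true_demo, y_pred_demo, average="weighted")
accuracy = accuracy_score(y_true_demo, y_pred_demo)

print(f"weighted precision: {weighted_precision:.3f} (manual: {manual_precision:.3f})")
print(f"weighted recall:    {weighted_recall:.3f} (manual: {manual_recall:.3f})")
print(f"accuracy:           {accuracy:.3f}")
```

One consequence worth knowing: support-weighted recall is algebraically the same as accuracy, so the `recall` entry in `metrics` adds no information beyond `accuracy`. Passing `average="macro"` instead would treat every class equally, which may be more informative on imbalanced data.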
