# Homework 4

(╯°□°)╯︵◓

You've searched far and wide, but you're still trying to understand the power that's inside. This time round, though, you're armed with new weapons: supervised learning algorithms. Pokemons will have no more secrets after you analyse the pokedex!

The data can be found under `pokedex/pokemons.csv` and is the same as in homeworks 1, 2, and 3. Run the cell below to get an overview of the dataset:

In [None]:
import pandas as pd
import numpy as np

df = pd.read_csv('pokedex/pokemons.csv')
df.head()

## Problem 1

We've learned that Pokemon masters never mix training and battles. Since we are building supervised learning models, we want to _split our data_ into train and test datasets. We are only running a single round of experiments with no hyperparameter optimisation, so we'll skip validation sets this time.

💪 **Task: Split the `df` DataFrame into training and test DataFrames.**
- the split should be 80% training 20% test
- use `random_state=0`
- store your datasets into two variables called `train_df` and `test_df`. No need to save to disk!
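A minimal sketch of an 80/20 split, assuming scikit-learn's `train_test_split` and a small made-up DataFrame (`toy_df`, `toy_train`, and `toy_test` are hypothetical names, not the pokedex). With `test_size=0.2`, 10 rows become 8 training rows and 2 test rows:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy DataFrame standing in for the pokedex
toy_df = pd.DataFrame({"Attack": range(10), "HP": range(10, 20)})

# test_size=0.2 keeps 20% of the rows for testing; random_state makes the shuffle reproducible
toy_train, toy_test = train_test_split(toy_df, test_size=0.2, random_state=0)

print(len(toy_train), len(toy_test))  # 8 2
```

Passing a whole DataFrame keeps features and labels together, so the label can be selected later from each split.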

In [None]:
# INSERT YOUR CODE HERE

In [None]:
def test_split():
    assert "train_df" in globals(), "Can't find train_df, have you used the correct variable name for your train dataset?"
    assert "test_df" in globals(), "Can't find test_df, have you used the correct variable name for your test dataset?"
    assert train_df["Total"].sum() == 275746, "Your dataset split doesn't look quite right. Did you use the correct random_state?"
    print('Success! 🎉')

test_split()

## Problem 2

Now that we have split our dataset, we are ready to train. A crucial statistic in pokemon battles is `HP`. This is the amount of damage you have to inflict on your opponent to win the fight, so being able to _predict_ this amount would be an enormous advantage 👊.

💪 **Task: Train a linear regression model which predicts the label `HP`.**
- train the model on your training dataset
- use `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, and `Speed` as features
- use `HP` as label
- scale the features using standardization before you train the model
- store your trained model in a variable called `reg`
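To illustrate the scale-then-fit workflow, here is a sketch on synthetic data (the DataFrame `toy`, its true slopes 0.5 and 0.3, and the noise level are all made up). In the homework, the scaler should be fitted on the training features only. Dividing the learned coefficients by the scaler's `scale_` recovers the slopes in the original units, which is a useful sanity check:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = ["Attack", "Defense"]
# Synthetic data (hypothetical): HP depends linearly on two stats plus noise
toy = pd.DataFrame(rng.normal(70, 20, size=(200, 2)), columns=features)
toy["HP"] = 0.5 * toy["Attack"] + 0.3 * toy["Defense"] + 10 + rng.normal(0, 1, 200)

# Standardize the features, then train the linear model on the scaled values
scaler = StandardScaler()
X_scaled = scaler.fit_transform(toy[features])
toy_reg = LinearRegression().fit(X_scaled, toy["HP"])

# Coefficients are per standard deviation of each feature; intercept is the mean HP
print(toy_reg.coef_, toy_reg.intercept_)
```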

In [None]:
# INSERT YOUR CODE HERE

In [None]:
import math

def test_regression():
    assert "reg" in globals(), "Can't find reg, have you used the correct variable name for your model?"
    assert math.isclose(reg.coef_.sum(), 15.46275, rel_tol=1e-6), "Your model parameters don't look quite right"
    print('Success! 🎉')

test_regression()

🧠 **Bonus Task: List and describe the main steps that happen during your linear regression model's _training_, i.e. inside of sklearn's `.fit()` method.**
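One way to picture what `.fit()` does for ordinary least squares: choose the weights and intercept that minimize the squared error on the training set. To our understanding, scikit-learn's `LinearRegression` solves this in closed form with a least-squares solver rather than by gradient descent. The sketch below is a simplified NumPy re-implementation (the helper `fit_ols` is hypothetical, not scikit-learn's code) that centers the data, solves the least-squares problem, and recovers the intercept; on this random data it should agree with `LinearRegression`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_ols(X, y):
    """Ordinary least squares with an intercept, via centering + least squares."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    # Centering lets us solve for the weights without an explicit intercept column
    Xc, yc = X - X_mean, y - y_mean
    # Solve min_w ||Xc w - yc||^2 (numerically safer than inverting Xc.T @ Xc)
    w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    # The intercept makes the fitted line pass through the data means
    b = y_mean - X_mean @ w
    return w, b

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=50)

w, b = fit_ols(X, y)
ref = LinearRegression().fit(X, y)
print(np.allclose(w, ref.coef_), np.isclose(b, ref.intercept_))  # True True
```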

🧠 **Bonus Task: Explain the purpose of feature scaling, and why it's a good idea to use it here.**
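Standardization rescales each feature to zero mean and unit standard deviation, z = (x - mean) / std. The sketch below (with made-up numbers) checks that doing this by hand matches `StandardScaler`, which uses the population standard deviation. For unregularized least squares, scaling changes the size of the coefficients but not the predictions; it matters more once a penalty or an iterative solver is involved, and it makes coefficients of different stats comparable.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical stats on very different scales
X = np.array([[50.0, 1000.0], [80.0, 3000.0], [110.0, 2000.0]])

# Standardization: subtract each column's mean, divide by its (population) standard deviation
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
X_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))  # True
# Means close to 0, standard deviations close to 1
print(X_manual.mean(axis=0), X_manual.std(axis=0))
```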

## Problem 3

You encounter an unknown pokemon, and it looks very strong. 🙀 Use your `HP` regression model to see if you can take it on!

💪 **Task: Predict the `HP` of an unknown pokemon using your linear regression model.**
- the stats of the unknown pokemon are found below
- predict using your trained model, `reg`
- store the prediction in a variable called `y_predict`
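A common pitfall when predicting a single sample: it must go through the _same_ fitted scaler (call `transform`, not `fit_transform`) and be shaped as a one-row table with the training columns in the same order. A sketch on a hypothetical four-row training set (`train`, `model`, and `new` are illustrative names):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

features = ["Attack", "Defense"]
# Hypothetical training data, not the pokedex
train = pd.DataFrame({"Attack": [50, 60, 70, 80], "Defense": [40, 65, 55, 90], "HP": [45, 60, 62, 80]})

scaler = StandardScaler().fit(train[features])
model = LinearRegression().fit(scaler.transform(train[features]), train["HP"])

# A new sample must be a 2D table (1 row) with the same columns, in the same order
new = pd.DataFrame([[75, 70]], columns=features)
# Reuse the already-fitted scaler: transform only, never refit on new data
y_new = model.predict(scaler.transform(new))
print(y_new.shape)  # (1,)
```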


In [None]:
attack = 79
defense = 109
sp_atk = 73
sp_def = 84
speed = 68

# INSERT YOUR CODE HERE 

In [None]:
def test_predict_hp():
    expected_prediction = 70.324
    assert "y_predict" in globals(), "Can't find y_predict, have you used the correct variable name?"
    assert math.isclose(y_predict, expected_prediction, rel_tol=1e-4), f'The prediction should be {expected_prediction}, but your model predicted {y_predict}'
    print('Success! 🎉')
    print(f"The unknown pokemon has predicted HP: {y_predict.item():.1f}")
    return

test_predict_hp()

## Problem 4

Professor Oak told you about a rare breed of exceptionally powerful pokemon... the _legendary_ pokemon. A trainer who finds and captures a legendary pokemon is sure to become invincible!

💪 **Task: Train a logistic regression model which predicts if pokemons are `Legendary`.**
- train the model on your training dataset
- use `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, and `Speed` as features
- use `Legendary` as label
- scale the features using standardization before you train the model
- store your trained model in a variable called `clf`
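A sketch of the same scale-then-fit pattern for classification, on synthetic data where the rarer positive class has higher stats on average (all numbers and names such as `toy_clf` are made up). Note that scikit-learn's `LogisticRegression` already applies L2 regularization by default (`C=1.0`):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data (hypothetical): 90 ordinary samples, 10 "legendary" ones with higher stats
X = np.vstack([rng.normal(70, 15, size=(90, 3)), rng.normal(120, 15, size=(10, 3))])
y = np.array([False] * 90 + [True] * 10)

scaler = StandardScaler().fit(X)
toy_clf = LogisticRegression().fit(scaler.transform(X), y)

print(toy_clf.classes_)      # the two labels, False and True
print(toy_clf.coef_.shape)   # (1, 3): one weight per feature
```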

In [None]:
# INSERT YOUR CODE HERE

In [None]:
def test_classification():
    assert "clf" in globals(), "Can't find clf, have you used the correct variable name for your model?"
    assert math.isclose(clf.coef_.sum(), 5.71640, rel_tol=1e-5), "Your model parameters don't look quite right"
    print('Success! 🎉')
    return

test_classification()

🧠 **Bonus Task: What are the differences between logistic regression and linear regression?**
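One way to see the core difference in code: both models compute a linear score `X @ w + b`, but logistic regression passes it through the sigmoid to get a probability and is trained with the log loss rather than squared error. The sketch below, on random data, checks that `predict_proba` is the sigmoid of `decision_function`, and that thresholding the probability at 0.5 is the same as thresholding the score at 0 (the `sigmoid` helper is written here for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # Maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = X[:, 0] + X[:, 1] > 0

model = LogisticRegression().fit(X, y)
z = model.decision_function(X)  # the linear part, X @ w + b: unbounded, like linear regression
p = sigmoid(z)                  # squashed into a probability of the positive class
print(np.allclose(p, model.predict_proba(X)[:, 1]))  # True
# Predicting True whenever p > 0.5 is the same as predicting True whenever z > 0
print(np.array_equal(model.predict(X), z > 0))  # True
```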

## Problem 5

Finding legendary pokemons is no easy task, and we expect to need a more _powerful_ model to accurately predict them.

💪 **Task: Train a logistic regression model with polynomial features and regularization which predicts if pokemons are Legendary.**
- use `HP`, `Attack`, `Defense`, `Sp. Atk`, `Sp. Def`, and `Speed` as features
- use `Legendary` as label
- add polynomial features of degree 3
- scale the features using standardization before you train the model
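- use ridge (L2) regularization to regularize your model
- store your trained model in a variable called `clf`

Pro-tip: [RidgeClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html). Strictly speaking, it fits an L2-regularized least-squares model to labels encoded as -1/+1 rather than minimizing the logistic loss, but it plays the same role of a regularized linear classifier here.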
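The three preprocessing and modelling steps can be sketched as follows, on synthetic data standing in for the six stats (the data, the threshold of 560, and `alpha=1.0` are assumptions for illustration). With 6 input features, a degree-3 expansion yields C(9, 3) = 84 columns, including the constant term:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the six stats (hypothetical data, not the pokedex)
X = rng.normal(80, 25, size=(200, 6))
y = X.sum(axis=1) > 560

# 1. Expand to all monomials of degree <= 3 (including the constant term)
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)
# 2. Standardize the expanded features (cubes and products have very different scales)
scaler = StandardScaler()
X_poly_scaled = scaler.fit_transform(X_poly)
# 3. Fit an L2-regularized (ridge) linear classifier; alpha sets the penalty strength
toy_clf = RidgeClassifier(alpha=1.0).fit(X_poly_scaled, y)

print(X_poly.shape)  # (200, 84)
```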

In [None]:
# INSERT YOUR CODE HERE

In [None]:
def test_polynomial_regression():
    assert "clf" in globals(), "Can't find clf, have you used the correct variable name for your model?"
    assert math.isclose(clf.coef_.sum(), 0.340915, rel_tol=1e-5), "Your model parameters don't look quite right"
    print('Success! 🎉')
    return

test_polynomial_regression()

🧠 **Bonus Task: How do polynomial features make our model more powerful?**

🧠 **Bonus Task: What is the purpose of regularization? Why is it a good idea to use it here?**
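Regularization adds a penalty on the size of the weights to the training loss, which discourages the very large coefficients that many polynomial features tend to produce and so reduces overfitting. A sketch with scikit-learn's `Ridge` on made-up data where only one of ten features is useful: as `alpha` grows, the weight norm shrinks. `RidgeClassifier` applies the same penalty to -1/+1 labels.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
# Hypothetical data: 30 samples, 10 features, only the first one matters
X = rng.normal(size=(30, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=30)

norms = []
for alpha in [0.01, 1.0, 100.0]:
    # Ridge minimizes ||y - Xw - b||^2 + alpha * ||w||^2, so larger alpha gives smaller ||w||
    w = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(w))
    print(alpha, round(float(norms[-1]), 3))
```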

💪 **Bonus Task: Train the exact same regularized logistic regression model with polynomial features, but this time, chain your preprocessors and your model into a [Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html)**
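A sketch of the pipeline version on synthetic data (the data and the step names `poly`, `scaler`, and `ridge` are arbitrary choices). A `Pipeline` guarantees that the same fitted preprocessors are applied, in the same order, at training and prediction time:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(80, 25, size=(200, 6))  # hypothetical stand-in for the six stats
y = X.sum(axis=1) > 560

pipe = Pipeline([
    ("poly", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("ridge", RidgeClassifier()),
])
# fit() calls fit_transform on each preprocessor in order, then fits the final estimator
pipe.fit(X, y)
# predict() only transforms with the fitted preprocessors before predicting
print(pipe.predict(X[:3]))
```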

## Problem 6

You are travelling across the land, when you spot a large rainbow bird in the sky. Maybe it's a legendary pokemon! Let's check with our freshly trained regularized polynomial classifier.

💪 **Task: Predict if the rainbow bird is a legendary pokemon using your classifier.**
- the stats of the rainbow bird pokemon are found below
- predict using your trained model, `clf`
- store the prediction in a variable called `y_predict`

In [None]:
hp = 106
attack = 130
defense = 90
sp_atk = 110
sp_def = 154
speed = 90

# INSERT YOUR CODE HERE

In [None]:
def test_predict_legendary():
    assert y_predict == True, f'The prediction should be {True}, but your model predicted {y_predict}'
    print('Success! 🎉')
    print("The rainbow bird is predicted to be a legendary pokemon!")
    
test_predict_legendary()

## Problem 7

This legendary pokemon classifier is neat, and the Pokedex scientists are interested in including it in their next update. However, they want to make sure that it is accurate enough. 


💪 **Task: Evaluate the accuracy of your legendary Pokemon classifier.**
- evaluate your model on your test dataset
- store the accuracy in a variable called `accuracy`
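The key point when evaluating is to reuse the preprocessors fitted on the training set and only `transform` the test features. A sketch on made-up data (a single scaler stands in for the full preprocessing chain), using `accuracy_score` and showing that a classifier's `.score()` returns the same value:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))  # hypothetical data
y = X[:, 0] - X[:, 1] > 0
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)  # fitted on the training data only
model = RidgeClassifier().fit(scaler.transform(X_train), y_train)

X_test_scaled = scaler.transform(X_test)  # transform the test set, do not refit
acc = accuracy_score(y_test, model.predict(X_test_scaled))
# For classifiers, .score() returns the same mean accuracy
print(acc == model.score(X_test_scaled, y_test))  # True
```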

In [None]:
# INSERT YOUR CODE HERE

In [None]:
def test_evaluation():
    assert math.isclose(accuracy, 0.93125, rel_tol=1e-5), "Your accuracy doesn't look quite right"
    print('Success! 🎉')
    print(f"You can predict legendary pokemons with an accuracy of {accuracy*100:.1f}%!")
    
test_evaluation()

🧠 **Bonus Task: What is the definition of the accuracy metric?**
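For reference, one standard way to write accuracy for $n$ test examples with true labels $y_i$ and predictions $\hat{y}_i$, together with its equivalent form in terms of true/false positives and negatives (TP, TN, FP, FN):

```latex
\text{accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[\hat{y}_i = y_i\right]
                = \frac{TP + TN}{TP + TN + FP + FN}
```

Because legendary pokemon are rare, a model that always predicts `False` can already reach a high accuracy, so it is worth comparing against that baseline.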