<center><img src=img/MScAI_brand.png width=70%></center>

# Scikit-Learn and OOP: Exercises and Solutions

In [1]:
import doctest

In [2]:
class C:
    def __init__(self, data=17):
        self.data = data
    def __repr__(self):
        return f"C({self.data})"
    def __lt__(self, other):
        return self.data < other.data

### Exercise

Run this code and explain the result:

In [3]:
C(17) <= C(18)

TypeError: '<=' not supported between instances of 'C' and 'C'

### Exercise

Run this code and explain the result. **Hint**: the built-in `id` function in Python returns an object's identity, which in CPython is the **location** of the object in memory.

In [None]:
C(17) == C(17) 
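Following the hint, one way to investigate is to compare identities directly. The sketch below uses a hypothetical class `Plain` (not part of the exercise) which, like `C`, defines no `__eq__`, so `==` falls back to the default inherited from `object`, which compares identity rather than contents.

```python
class Plain:
    """Hypothetical stand-in for C: stores data but defines no __eq__."""
    def __init__(self, data=17):
        self.data = data

a = Plain(17)
b = Plain(17)

# Two separate objects holding equal data have distinct identities.
same_identity = id(a) == id(b)

# With no __eq__ defined, == behaves like `is`.
equal_to_other = a == b
equal_to_self = a == a

print(same_identity, equal_to_other, equal_to_self)  # False False True
```

So two objects built from the same data compare as unequal unless the class says how to compare them by value.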

### Exercise

Edit the definition of `C`, implementing `__eq__` and `__le__`, to fix the problems above.

In [None]:
class C:
    """
    >>> C(17) <= C(18)
    True
    >>> C(17) == C(17)
    True
    """
    def __init__(self, data=17):
        self.data = data
    def __repr__(self):
        return f"C({self.data})"
    def __lt__(self, other):
        return self.data < other.data

In [None]:
doctest.run_docstring_examples(C, globals(), verbose=True)
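One possible solution is sketched below; it is not the only reasonable one. Python does not derive `<=` from `<`, so `__le__` is written out explicitly, and `__eq__` compares the stored values instead of object identity. Returning `NotImplemented` for non-`C` operands is a design choice made here, not something the exercise requires.

```python
class C:
    """
    >>> C(17) <= C(18)
    True
    >>> C(17) == C(17)
    True
    """
    def __init__(self, data=17):
        self.data = data
    def __repr__(self):
        return f"C({self.data})"
    def __lt__(self, other):
        return self.data < other.data
    def __le__(self, other):
        # <= is not inferred from <, so it needs its own method.
        return self.data <= other.data
    def __eq__(self, other):
        # Compare by value; let Python handle unrelated types.
        if not isinstance(other, C):
            return NotImplemented
        return self.data == other.data
```

Two things to be aware of: a class that defines `__eq__` without `__hash__` becomes unhashable, and the standard library's `functools.total_ordering` decorator can generate the remaining comparison methods from `__eq__` plus one ordering method.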

### Exercise

This is a classic OOP exercise. Implement a class `Vehicle`, and then create sub-classes `Bicycle` and `Car` from it, calling `super().__init__` in each sub-class constructor. A `Vehicle` has a number of wheels, a colour, and a method `move`.

In [None]:
class Vehicle:
    """
    >>> v = Vehicle()
    >>> v.move()
    The vehicle is moving in an abstract kind of way
    """
    pass # REPLACE WITH YOUR CODE - remember, "pass" is a "do-nothing" placeholder.
    
class Bicycle(Vehicle):
    """
    >>> b = Bicycle("red")
    >>> b.move()
    The red bicycle is pedalling
    """
    pass
    
class Car(Vehicle):
    """
    >>> c = Car("blue")
    >>> c.move()
    The blue car is combusting petrol
    """
    pass

In [None]:
doctest.run_docstring_examples(Vehicle, globals(), verbose=True)
doctest.run_docstring_examples(Bicycle, globals(), verbose=True)
doctest.run_docstring_examples(Car, globals(), verbose=True)
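A minimal sketch of one solution follows. The default colour `"grey"`, the default of `0` wheels, and the argument order are assumptions made for illustration; the exercise only fixes the doctest output. Each sub-class passes its own wheel count up to `Vehicle.__init__` through `super()` and overrides `move`.

```python
class Vehicle:
    """
    >>> v = Vehicle()
    >>> v.move()
    The vehicle is moving in an abstract kind of way
    """
    def __init__(self, colour="grey", wheels=0):
        # Defaults are assumed so that Vehicle() works with no arguments.
        self.colour = colour
        self.wheels = wheels
    def move(self):
        print("The vehicle is moving in an abstract kind of way")

class Bicycle(Vehicle):
    """
    >>> b = Bicycle("red")
    >>> b.move()
    The red bicycle is pedalling
    """
    def __init__(self, colour):
        # Reuse the parent constructor, fixing the wheel count at 2.
        super().__init__(colour=colour, wheels=2)
    def move(self):
        print(f"The {self.colour} bicycle is pedalling")

class Car(Vehicle):
    """
    >>> c = Car("blue")
    >>> c.move()
    The blue car is combusting petrol
    """
    def __init__(self, colour):
        # Reuse the parent constructor, fixing the wheel count at 4.
        super().__init__(colour=colour, wheels=4)
    def move(self):
        print(f"The {self.colour} car is combusting petrol")
```

Keeping the attribute assignments in `Vehicle.__init__` means the sub-classes never set `colour` or `wheels` themselves; they only decide which values to pass up.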


### Exercise: predict the mode

In many machine learning scenarios it's good to create a simple **baseline** to compare a more sophisticated algorithm against. In classification, one simple example is to predict the **mode** -- the most common $y$ value in the training data, ignoring the $X$.

In [None]:
from collections import Counter

def mode(y):
    """
    Return the most common item in y.

    Example: Counter("aaba").most_common() returns (item, count) tuples
    ordered by count, most common first:
    [('a', 3), ('b', 1)]
    So the most common item is at [0][0]
    """
    return Counter(y).most_common()[0][0]

mode(['a', 'a', 'b', 'a'])

Create a `ModePredictor` class using the above code. Inherit from Scikit-Learn `BaseEstimator` and `ClassifierMixin`. Compare its classification accuracy on the dataset below against another classifier, such as `RandomForestClassifier`. Remember, we should **inherit** the classification accuracy calculation (the `score` method provided by `ClassifierMixin`), not implement it ourselves!

In [None]:
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

In [None]:
df = pd.read_csv("data/unbalanced.csv", index_col=0)
df.head()

In [None]:
X = df[["X0", "X1"]].values
y = df["y"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

In [None]:
X_train

In [None]:
y_train

In [None]:
class ModePredictor(BaseEstimator, ClassifierMixin):
    pass ## REPLACE WITH YOUR CODE
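The sketch below shows one way the class could be filled in, assuming only `fit` and `predict` need to be written and that `score` comes from `ClassifierMixin`. The attribute names `mode_` and `classes_` follow the Scikit-Learn convention of a trailing underscore for fitted state; the tiny demo dataset is made up for illustration. The mixin is listed before `BaseEstimator` here, which is the order I understand Scikit-Learn's developer guide to recommend; for the purposes of this exercise either order should behave the same.

```python
from collections import Counter

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class ModePredictor(ClassifierMixin, BaseEstimator):
    """Baseline classifier: always predicts the most common training label."""
    def fit(self, X, y):
        # X is ignored; only the distribution of labels matters.
        labels = np.asarray(y).tolist()
        self.mode_ = Counter(labels).most_common(1)[0][0]
        self.classes_ = np.unique(y)
        return self  # returning self is the Scikit-Learn convention
    def predict(self, X):
        # One copy of the stored mode for each row of X.
        return np.full(len(X), self.mode_)

# Tiny made-up dataset, for illustration only.
X_demo = [[0], [1], [2], [3]]
y_demo = ["a", "a", "b", "a"]
demo_clf = ModePredictor().fit(X_demo, y_demo)
print(demo_clf.predict(X_demo))        # ['a' 'a' 'a' 'a']
print(demo_clf.score(X_demo, y_demo))  # 0.75
```

Note that `score` is never defined in the class body: the call resolves to the inherited `ClassifierMixin.score`, which computes accuracy from `predict`.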

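When we run the code below we should see results similar to this (the exact `RandomForestClassifier` score may vary between runs, because no `random_state` is set for it):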

```python
ModePredictor(): 0.90
RandomForestClassifier(): 0.90
```

In [None]:
clfs = [ModePredictor(), RandomForestClassifier()]
for clf in clfs:
    clf.fit(X_train, y_train)
    print(f"{clf}: {clf.score(X_test, y_test):.2f}")