# Custom scorer

In [197]:
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from scipy import stats

Let's create a dummy "wine" dataset of 1000 wines, 10 features, and 3 classes (0=bad, 1=medium, 2=good wine)

In [198]:
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000, n_features=10, n_classes=3, n_clusters_per_class=1, weights=[0.6, 0.3, 0.1], random_state=0
)

-----
❓ Our objective is to train a model which **maximizes prediction precision for the good wines (y=2) only**.  

We don't want any customers to be dissatisfied!

----

## Optimizing for accuracy 

In [199]:
# Split train/test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state = 0)

❓ Random Search CV the best SVC model measured on the `accuracy`
(you can limit yourself to `rbf` kernels to start with)

In [225]:
%%time


# Instanciate model
svc = SVC(kernel='rbf')

# Hyperparameter search space
grid = { 'C' : stats.uniform(0, 1),
         'gamma' : [0.001, 0.01, 0.1, 1]
        }

# Instanciate Random Search
random_search = RandomizedSearchCV(svc,grid,
                                   scoring='r2',
                                   n_iter = 100,
                                   cv = 5, 
                                   n_jobs = -1)
# fit it
random_search = random_search.fit(X_train,y_train)

# show best model

random_search.best_estimator_

CPU times: user 245 ms, sys: 94.7 ms, total: 340 ms
Wall time: 2 s


In [226]:
random_search.best_score_

0.6130830976684601

❓ Print classification report on the test set

☝️ Not good enough, we want to focus **only** on class 2

## Custom scoring function

Let's make our own custom metric which returns the precision of class "2" 

reminder: ```precision2 = TP2/(TP2+FP2)``` 

In [227]:
def my_custom_metric(y_true, y_pred):
    TP2 = 0
    FP2 = 0
    for i in range(len(y_true)):
       if(y_pred[i] == 2):
           if(y_true[i] == y_pred[i]):
               TP2 = TP2 + 1
           else:
               FP2 = FP2 + 1
    #print(f'TP2 : {TP2}')
    #print(f'FP2 : {FP2}')
    precision2 =TP2/(TP2+FP2)
    return precision2


_true = [0,0,1,2,2,1]
_pred = [0,1,1,2,1,0]

my_custom_metric(_true, _pred)


1.0

## Optimizing hyper-params for our custom scoring function

In [229]:
# Can you use this function in  sklearn (will crash)
grid = { 'C' : stats.uniform(0, 100),
         'gamma' : [0.001, 0.01, 0.1, 1]
        }
svc = SVC()
random_search = RandomizedSearchCV(svc, grid, 
                   n_iter = 100,
                   cv=5,
                   scoring=my_custom_scorer,
                   random_state = 0).fit(X_train, y_train)
random_search
random_search.best_score_

Traceback (most recent call last):
  File "/Users/orchidaung/.pyenv/versions/lewagon/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/Users/orchidaung/.pyenv/versions/lewagon/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 220, in __call__
    return self._score(
  File "/Users/orchidaung/.pyenv/versions/lewagon/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 268, in _score
    return self._sign * self._score_func(y_true, y_pred, **self._kwargs)
  File "/var/folders/q0/927vzsz50hl8mmvk22_t7k280000gn/T/ipykernel_32741/2047272053.py", line 12, in my_custom_metric
    precision2 =TP2/(TP2+FP2)
ZeroDivisionError: division by zero

Traceback (most recent call last):
  File "/Users/orchidaung/.pyenv/versions/lewagon/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/Users/o

0.8295238095238096

In [237]:
random_search.best_estimator_

### `Make_scorer`

We need to transform our "metric" into a "sklearn scorer method"

In [238]:
from sklearn.metrics import make_scorer

my_custom_scorer = make_scorer(my_custom_metric)
my_custom_scorer

make_scorer(my_custom_metric)

### RandomizedSearchCV with custom scorer

In [240]:
svc.fit(X_train,y_train)
my_custom_scorer(svc,X_test,y_test)

best_svc = SVC(C=3.36, gamma=0.001)
best_svc.fit(X_train,y_train)

❓ print classification report on the test set and compare with previous one

In [241]:
from sklearn.metrics import classification_report


y_pred = best_svc.predict(X_test)
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_test, y_pred, target_names=target_names))


              precision    recall  f1-score   support

     class 0       0.88      0.97      0.92       183
     class 1       0.85      0.76      0.80        88
     class 2       0.74      0.48      0.58        29

    accuracy                           0.86       300
   macro avg       0.82      0.74      0.77       300
weighted avg       0.86      0.86      0.86       300



✅  We improved our precision for class 2 from 0.69 up to 0.73, but at the detriment of overall accuracy!