This notebook demonstrates and tests focal loss, reusing functions from `src.train_classifier.py`.
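Focal loss (Lin et al., 2017) rescales binary cross-entropy so that well-classified entries contribute less. Its standard form for a predicted probability $p$ and a label $y \in \{0, 1\}$ is given below for reference. Summing over samples $i$ and categories $j$ is an assumption about how `FocalBinaryLoss` reduces the per-entry values, and the optional class weight $\alpha$ is left out.

$$
p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{if } y = 0 \end{cases}
\qquad
\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log p_t
$$

$$
\mathcal{L}_{\gamma} = \sum_{i} \sum_{j} \mathrm{FL}(p_{t,ij})
$$

With $\gamma = 0$ this is plain binary cross-entropy. A larger $\gamma$ shrinks the weight $(1 - p_t)^{\gamma}$ of entries the model already classifies correctly.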

In [14]:
from src import config
from src.train_classifier import load_data, RANDOM_STATE, FocalBinaryLoss, split_data, build_model
import numpy as np

## Load data and split into X and y datasets

In [2]:
# Load the messages (X), the category labels (y) and the category names
X, y, category_names = load_data(config.path_database)
X.head()

Loading data...
Number of message records found 26179
 35 categories found: ['related', 'request', 'offer', 'aid_related', 'medical_help', 'medical_products', 'search_and_rescue', 'security', 'military', 'water', 'food', 'shelter', 'clothing', 'money', 'missing_people', 'refugees', 'death', 'other_aid', 'infrastructure_related', 'transport', 'buildings', 'electricity', 'tools', 'hospitals', 'shops', 'aid_centers', 'other_infrastructure', 'weather_related', 'floods', 'storm', 'fire', 'earthquake', 'cold', 'other_weather', 'direct_report']
Shape X: (26179, 2)
Shape y: (26179, 35)


| id | message | genre |
|---:|:--------|:------|
| 2 | Weather update - a cold front from Cuba that c... | direct |
| 7 | Is the Hurricane over or is it not over | direct |
| 8 | Looking for someone but no name | direct |
| 9 | UN reports Leogane 80-90 destroyed. Only Hospi... | direct |
| 12 | says: west side of Haiti, rest of the country ... | direct |
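The output shows X holding two columns (`message`, `genre`) indexed by `id`, and y holding the 35 category columns. The helper below is a hypothetical sketch of that separation on a synthetic two-row table; it assumes that every non-feature column is a binary category label and is not the actual `load_data` implementation.

```python
import pandas as pd


def split_features_labels(df, feature_cols=("message", "genre")):
    """Hypothetical helper: split a message table into X, y and category names.

    Assumes every column that is not a feature column is a binary category label.
    """
    feature_cols = list(feature_cols)
    X = df[feature_cols]                   # text features
    y = df.drop(columns=feature_cols)      # one 0/1 column per category
    return X, y, list(y.columns)


# Synthetic two-row table with only two of the 35 categories (illustration only)
toy = pd.DataFrame(
    {
        "message": ["first example message", "second example message"],
        "genre": ["direct", "direct"],
        "related": [1, 0],
        "request": [0, 0],
    },
    index=pd.Index([1, 2], name="id"),
)
X_toy, y_toy, names = split_features_labels(toy)
print(X_toy.shape, y_toy.shape, names)  # (2, 2) (2, 2) ['related', 'request']
```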


## Instantiate the `FocalBinaryLoss` class

In [5]:
focal_loss = FocalBinaryLoss(gamma=10)
focal_loss.gamma

10
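`FocalBinaryLoss` itself lives in `src.train_classifier` and its code is not shown here. To make the computation concrete, the class below is a hypothetical NumPy stand-in (`FocalBinaryLossSketch`) that mirrors the `gamma` attribute and the `focal_binary_cross_entropy(pred, true)` call used in this notebook. The clipping constant `eps` and the sum reduction are assumptions.

```python
import numpy as np


class FocalBinaryLossSketch:
    """Hypothetical stand-in for FocalBinaryLoss (not the project's code).

    Computes the standard binary focal loss, summed over samples and labels.
    """

    def __init__(self, gamma=2.0, eps=1e-7):
        self.gamma = gamma
        self.eps = eps  # assumed: clip probabilities away from 0 and 1 so log stays finite

    def focal_binary_cross_entropy(self, pred, true):
        p = np.clip(np.asarray(pred, dtype=float), self.eps, 1.0 - self.eps)
        y = np.asarray(true, dtype=float)
        # p_t is the probability assigned to the true class of each entry
        p_t = y * p + (1.0 - y) * (1.0 - p)
        # (1 - p_t) ** gamma down-weights entries that are already well classified
        return float(np.sum(-((1.0 - p_t) ** self.gamma) * np.log(p_t)))


y_true = np.array([[1, 0], [0, 1]])
probs = np.array([[0.9, 0.2], [0.4, 0.6]])
for gamma in [0, 1, 2, 5]:
    loss = FocalBinaryLossSketch(gamma=gamma).focal_binary_cross_entropy(probs, y_true)
    print(f"gamma={gamma}: {loss:.4f}")
```

Clipping matters here because `model.predict` returns hard 0/1 labels, for which an unclipped `log` would be infinite.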

## Split the data, train the model and predict

In [8]:
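model = build_model()
# Three-way split; X_val and y_val are not used further in this notebook
X_train, y_train, X_val, y_val, X_test, y_test = split_data(X, y, random_state=RANDOM_STATE)
model.fit(X_train, y_train)
# Hard 0/1 predictions, one column per category
y_pred = model.predict(X_test)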

Total records: X(26179, 2):y(26179, 35)
Train shape: X(17452, 2):y(17452, 35)
Validation shape: X(4363, 2):y(4363, 35)
Test shape: X(4364, 2):y(4364, 35)
Training shapes before augmentation: (17452, 2) (17452, 35)
Imbalanced labels: ['offer', 'security', 'clothing', 'missing_people', 'tools', 'hospitals', 'shops', 'aid_centers', 'fire']
Minority samples: (1395, 2) (1395, 35)
Training shapes after augmentation: (30815, 2) (30815, 35)
[ColumnTransformer] ....... (1 of 3) Processing one_hot, total=   0.0s
[ColumnTransformer] . (2 of 3) Processing starting_verb, total=  46.3s
[ColumnTransformer] ..... (3 of 3) Processing text_vect, total= 2.0min
[Pipeline] ...... (step 1 of 2) Processing preprocessor, total= 2.8min
[Pipeline] ............... (step 2 of 2) Processing clf, total= 4.0min
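The printed shapes are consistent with holding out one third of the 26179 rows and splitting that hold-out in half, rounding the test side up. The sketch below reproduces those sizes with NumPy. The fractions, the rounding rule and the seed are assumptions inferred from the output, and the real `split_data` may differ (for example, it may stratify).

```python
import math

import numpy as np


def three_way_split_indices(n, holdout_frac=1 / 3, test_frac_of_holdout=0.5, seed=42):
    """Hypothetical sketch: shuffle row indices, carve off a hold-out set,
    then split the hold-out into validation and test (test side rounded up)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_holdout = math.ceil(n * holdout_frac)
    train_idx, holdout_idx = idx[: n - n_holdout], idx[n - n_holdout:]
    n_test = math.ceil(n_holdout * test_frac_of_holdout)
    val_idx, test_idx = holdout_idx[: n_holdout - n_test], holdout_idx[n_holdout - n_test:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split_indices(26179)
print(len(train_idx), len(val_idx), len(test_idx))  # 17452 4363 4364
```

Splitting row indices rather than the data itself means the same indices can select from both X and y (for example with `X.iloc[train_idx]`), so features and labels stay aligned.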

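The log shows that rare labels are detected and their rows are added again before training, but the exact rule is not shown. As a hypothetical illustration only, the sketch below flags labels whose positive rate falls below a threshold (`min_pos_rate`, an assumed value) and appends `n_copies` duplicates of every row that carries at least one of them. It does not claim to reproduce the 1395 minority rows or the 30815-row result above.

```python
import numpy as np


def find_imbalanced_labels(y, min_pos_rate=0.01):
    """Return indices of label columns whose share of positives is below min_pos_rate."""
    y = np.asarray(y)
    return np.flatnonzero(y.mean(axis=0) < min_pos_rate)


def oversample_minority(X, y, label_idx, n_copies=2):
    """Append n_copies duplicates of every row that has any of the given labels set."""
    X, y = np.asarray(X), np.asarray(y)
    mask = y[:, label_idx].any(axis=1)  # rows carrying at least one rare label
    X_aug = np.concatenate([X] + [X[mask]] * n_copies)
    y_aug = np.concatenate([y] + [y[mask]] * n_copies)
    return X_aug, y_aug


# Deterministic toy data: 200 rows, 3 labels with positive rates 0.5, 0.25 and 0.005
y_toy = np.zeros((200, 3), dtype=int)
y_toy[:100, 0] = 1
y_toy[::4, 1] = 1
y_toy[0, 2] = 1
X_toy = np.arange(200).reshape(-1, 1)

rare = find_imbalanced_labels(y_toy)
X_aug, y_aug = oversample_minority(X_toy, y_toy, rare, n_copies=2)
print(rare, X_aug.shape, y_aug.shape)  # [2] (202, 1) (202, 3)
```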

In [9]:
y_pred[:5]

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0.],
       [1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0.],
       [1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 1.],
       [1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0.]])

In [10]:
type(y_pred)

numpy.ndarray

In [11]:
type(y_test)

pandas.core.frame.DataFrame
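`y_pred` is a NumPy array while `y_test` is a DataFrame, which is why the gamma loop in the next section passes `np.array(y_test)`. The sketch below uses toy stand-ins with hypothetical values to show the same conversion with pandas' `DataFrame.to_numpy`, plus a shape check before the two are compared.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the notebook's y_test (DataFrame) and y_pred (ndarray); values are made up
y_test_toy = pd.DataFrame({"related": [1, 0], "request": [0, 1]}, index=pd.Index([2, 7], name="id"))
y_pred_toy = np.array([[1.0, 0.0], [1.0, 1.0]])

# to_numpy() drops the index and column labels, so rows are matched by position only
y_true_arr = y_test_toy.to_numpy(dtype=float)
assert y_true_arr.shape == y_pred_toy.shape  # rows and label columns must line up
print(y_true_arr.dtype, y_true_arr.shape)  # float64 (2, 2)
```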

## Calculate the focal loss for different values of gamma

In [25]:
for gamma in [1, 2, 5, 6, 7, 8, 9, 10, 20]:
    focal_loss = FocalBinaryLoss(gamma=gamma)
    # y_test is a DataFrame, so convert it to an ndarray to match y_pred
    loss = focal_loss.focal_binary_cross_entropy(y_pred, np.array(y_test))
    print('Focal loss with gamma {} is {}'.format(gamma, loss))

Focal loss with gamma 1 is 1488.270975843912
Focal loss with gamma 2 is 748.9352974434116
Focal loss with gamma 5 is 100.16223133317654
Focal loss with gamma 6 is 52.432590490146346
Focal loss with gamma 7 is 27.943814758788744
Focal loss with gamma 8 is 15.237098882081522
Focal loss with gamma 9 is 8.544089992423912
Focal loss with gamma 10 is 4.948833800331088
Focal loss with gamma 20 is 0.09612400133964835
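The printed losses shrink as gamma grows. Assuming `FocalBinaryLoss` sums the standard focal term $-(1 - p_t)^{\gamma} \log p_t$ over samples $i$ and labels $j$, where $p_t \in (0, 1]$ is the probability given to the true class (an assumption, since the implementation is not shown here), this must happen term by term:

$$
0 \le 1 - p_{t,ij} < 1,\quad \gamma' \ge \gamma \ge 0
\;\Longrightarrow\;
(1 - p_{t,ij})^{\gamma'} \le (1 - p_{t,ij})^{\gamma}
$$

$$
-\log p_{t,ij} \ge 0
\;\Longrightarrow\;
\mathcal{L}_{\gamma'} = -\sum_{i,j} (1 - p_{t,ij})^{\gamma'} \log p_{t,ij}
\;\le\;
-\sum_{i,j} (1 - p_{t,ij})^{\gamma} \log p_{t,ij} = \mathcal{L}_{\gamma}
$$

So a smaller loss at a larger gamma says nothing about which gamma is better for training. It only reflects that a larger gamma scales every term down. The absolute values also depend on how the implementation treats predictions of exactly 0 or 1, since `y_pred` here holds hard labels rather than probabilities.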
