In [1]:
# import libraries
from sqlalchemy import create_engine
import pandas as pd
import numpy as np

In [2]:
engine = create_engine(r'sqlite:///data/DisasterResponse.db')

## Results from the initial pipeline

In [10]:
pd.read_sql_table('Pipeline', engine)

| | category | precision | recall | f1-score |
|---|---|---|---|---|
| 0 | related | 0.79078 | 0.685067 | 0.713283 |
| 1 | request | 0.869552 | 0.742379 | 0.785334 |
| 2 | offer | 0.497925 | 0.5 | 0.49896 |
| 3 | aid_related | 0.778912 | 0.769242 | 0.77267 |
| 4 | medical_help | 0.76947 | 0.538462 | 0.551288 |
| 5 | medical_products | 0.903009 | 0.535614 | 0.554401 |
| 6 | search_and_rescue | 0.861613 | 0.516509 | 0.525054 |
| 7 | security | 0.49047 | 0.499922 | 0.495151 |
| 8 | military | 0.857756 | 0.519074 | 0.527846 |
| 9 | child_alone | 1.0 | 1.0 | 1.0 |


With a median f1-score of around 0.55, there is room for improvement for the model. Let's see what a GridSearch over the pipeline parameters can do for us.
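The median f1-score quoted above is simply the median of the `f1-score` column. As a small sketch, the snippet below rebuilds only the ten rows shown in the table (in the notebook you would use the DataFrame returned by `pd.read_sql_table('Pipeline', engine)` directly); the variable name `pipeline_results` is hypothetical.

```python
import pandas as pd

# Only the ten rows reproduced in the table above; the full 'Pipeline' table may hold more categories.
pipeline_results = pd.DataFrame({
    "category": ["related", "request", "offer", "aid_related", "medical_help",
                 "medical_products", "search_and_rescue", "security", "military",
                 "child_alone"],
    "f1-score": [0.713283, 0.785334, 0.49896, 0.77267, 0.551288,
                 0.554401, 0.525054, 0.495151, 0.527846, 1.0],
})

# With an even number of rows, the median is the mean of the two middle values.
median_f1 = pipeline_results["f1-score"].median()
print(round(median_f1, 4))  # → 0.5528

# The weakest categories are the ones pulling the median down.
print(pipeline_results.nsmallest(3, "f1-score"))
```

Here the two middle values are medical_help (0.551288) and medical_products (0.554401), which is where the "around 0.55" comes from for these rows.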

## GridSearch results
### GridSearch.fit best parameters

In [4]:
pd.read_sql_table('GsFit', engine)

| | parameter | value |
|---|---|---|
| 0 | `clf__estimator__class_weight` | balanced |
| 1 | `clf__estimator__min_samples_leaf` | 5 |
| 2 | `clf__estimator__n_estimators` | 200 |
| 3 | `vect__ngram_range` | (1, 1) |
| 4 | median f1-score | 0.709469701251744 |
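The parameter names in this table follow scikit-learn's `step__parameter` convention, which suggests a `Pipeline` with a vectorizer step named `vect` and a classifier step named `clf` that wraps a random forest in `MultiOutputClassifier` (whose wrapped model is exposed as `estimator`). The following is a minimal sketch under those assumptions, not the notebook's actual code: the toy messages and labels, the choice of `CountVectorizer`, the grid values and the `median_f1` scorer are all hypothetical. The scorer mirrors the "median f1-score" row by taking the median of the per-category macro f1-scores.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def median_f1(y_true, y_pred):
    """Hypothetical scorer: median over categories of each category's macro f1-score."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = [
        f1_score(y_true[:, i], y_pred[:, i], average="macro", zero_division=0)
        for i in range(y_true.shape[1])
    ]
    return float(np.median(scores))


# Hypothetical toy data: two binary categories per message (think water, medical_help).
messages = [
    "we need water", "water please", "no clean water here", "send drinking water",
    "need a doctor", "injured people need medical help", "doctor needed urgently", "medicine please",
    "all is fine", "thanks for the update", "roads are open", "weather is calm",
]
labels = np.array([[1, 0]] * 4 + [[0, 1]] * 4 + [[0, 0]] * 4)

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("clf", MultiOutputClassifier(RandomForestClassifier(random_state=42))),
])

# Same parameter names as in the GsFit table; the candidate values are illustrative only.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__estimator__class_weight": [None, "balanced"],
    "clf__estimator__min_samples_leaf": [1, 5],
    "clf__estimator__n_estimators": [10],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring=make_scorer(median_f1),
    cv=KFold(n_splits=3, shuffle=True, random_state=0),  # shuffle so every fold mixes categories
)
search.fit(messages, labels)
print(search.best_params_)
print(search.best_score_)
```

Using the median rather than the mean as the selection criterion keeps a single trivially perfect category (such as child_alone) from inflating the score.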


### GridSearch.predict results

In [5]:
pd.read_sql_table('GsPredict', engine)

| | category | precision | recall | f1-score |
|---|---|---|---|---|
| 0 | related | 0.748754 | 0.774629 | 0.759604 |
| 1 | request | 0.724643 | 0.820253 | 0.751283 |
| 2 | offer | 0.585954 | 0.627083 | 0.602431 |
| 3 | aid_related | 0.766564 | 0.774221 | 0.765932 |
| 4 | medical_help | 0.688796 | 0.787565 | 0.723234 |
| 5 | medical_products | 0.627432 | 0.777912 | 0.664978 |
| 6 | search_and_rescue | 0.613583 | 0.657207 | 0.631279 |
| 7 | security | 0.585864 | 0.57019 | 0.577133 |
| 8 | military | 0.692953 | 0.844166 | 0.743763 |
| 9 | child_alone | 1.0 | 1.0 | 1.0 |


With the adjustments to the pipeline parameters listed above, we're able to considerably improve the median f1-score from around 0.55 to around 0.69. For a real-world project we wouldn't be satisfied with this result yet. We could add more parameters to the GridSearch, but due to limited time and hardware resources we won't go down this route here.
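To see where the GridSearch actually helped, the two result tables can be joined on `category` and compared row by row. This sketch again uses only the ten rows shown above, so any summary figures it produces need not match the medians quoted in the text; the names `baseline`, `tuned` and `comparison` are hypothetical.

```python
import pandas as pd

categories = ["related", "request", "offer", "aid_related", "medical_help",
              "medical_products", "search_and_rescue", "security", "military",
              "child_alone"]

# f1-scores copied from the 'Pipeline' and 'GsPredict' tables above (first ten rows only).
baseline = pd.DataFrame({"category": categories,
                         "f1-score": [0.713283, 0.785334, 0.49896, 0.77267, 0.551288,
                                      0.554401, 0.525054, 0.495151, 0.527846, 1.0]})
tuned = pd.DataFrame({"category": categories,
                      "f1-score": [0.759604, 0.751283, 0.602431, 0.765932, 0.723234,
                                   0.664978, 0.631279, 0.577133, 0.743763, 1.0]})

# Join on the category name so each row holds both scores side by side.
comparison = baseline.merge(tuned, on="category", suffixes=("_pipeline", "_gridsearch"))
comparison["f1_change"] = comparison["f1-score_gridsearch"] - comparison["f1-score_pipeline"]

print(comparison.sort_values("f1_change", ascending=False).to_string(index=False))
```

Within these rows, military (+0.216) and medical_help (+0.172) gain the most, while request and aid_related get slightly worse, so the improvement is concentrated in the weaker categories.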

## Balanced Pipeline results

In [13]:
pd.read_sql_table('BalancedPipeline', engine)

| | category | precision | recall | f1-score |
|---|---|---|---|---|
| 0 | related | 0.657793 | 0.709873 | 0.561283 |
| 1 | request | 0.600383 | 0.617473 | 0.390044 |
| 2 | offer | 0.00292 | 0.5 | 0.005806 |
| 3 | aid_related | 0.763317 | 0.754245 | 0.735807 |
| 4 | medical_help | 0.544141 | 0.509004 | 0.098809 |
| 5 | medical_products | 0.525404 | 0.500971 | 0.050291 |
| 6 | search_and_rescue | 0.013601 | 0.5 | 0.026481 |
| 7 | security | 0.008606 | 0.5 | 0.016921 |
| 8 | military | 0.016674 | 0.5 | 0.032272 |
| 9 | water | 0.53118 | 0.505758 | 0.082664 |


Because we're dealing with an imbalanced dataset, we also tried the BalancedRandomForestClassifier. For each tree in the forest, this classifier randomly under-samples the bootstrap sample so that the classes are equally represented during fitting. Beforehand we removed the category 'child_alone', because not a single message in the dataset carries this tag: a column with only one class would only cause errors during fitting and gives the model nothing to learn. This is also why child_alone shows perfect scores of 1.0 in the earlier tables, since predicting the absent tag for every message is trivially correct.
The results we get with the best parameters found by the GridSearch are disappointing: the median f1-score drops to around 0.05, so this model can't make any useful predictions. We will therefore keep working with the fine-tuned pipeline from the GridSearch.
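The sampling idea behind the BalancedRandomForestClassifier can be illustrated with NumPy. According to the imbalanced-learn documentation, the classifier randomly under-samples each tree's bootstrap sample to balance the classes. The sketch below reproduces that idea for one tree and one binary category; it is an illustration, not imbalanced-learn's implementation, and the helper `balanced_bootstrap_indices` and the toy labels are hypothetical. It also shows one way to drop single-valued label columns such as child_alone.

```python
import numpy as np
import pandas as pd


def balanced_bootstrap_indices(y, rng):
    """Hypothetical helper: bootstrap the rows, then under-sample every class to the rarest one."""
    y = np.asarray(y)
    boot = rng.choice(len(y), size=len(y), replace=True)   # ordinary bootstrap sample
    classes, counts = np.unique(y[boot], return_counts=True)
    n_min = counts.min()                                    # size of the smallest class in the sample
    balanced = [
        rng.choice(boot[y[boot] == c], size=n_min, replace=False)  # keep n_min rows of class c
        for c in classes
    ]
    return np.concatenate(balanced)


rng = np.random.default_rng(0)
y = np.array([1] * 10 + [0] * 90)          # hypothetical label with a 10% positive rate
idx = balanced_bootstrap_indices(y, rng)
print(np.bincount(y[idx]))                 # both classes now appear equally often

# A label column holding a single value (like child_alone) leaves only one class to sample
# from, so such columns are dropped before fitting.
labels = pd.DataFrame({"water": [0, 1, 0, 1], "child_alone": [0, 0, 0, 0]})
labels = labels.loc[:, labels.nunique() > 1]
print(list(labels.columns))                # → ['water']
```

Under-sampling throws away most of the majority-class rows in every tree, which is one plausible reason why this approach can trade a lot of precision for recall on very rare categories.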

## Conclusion
The imbalanced structure of this dataset makes it hard to get acceptable f1-scores for the underrepresented categories. One way to address this would be to treat each category separately: create a dedicated train and test dataset for every single category and fit a separate model on that data. We won't do that here and will work with the result we got from the GridSearch.
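A minimal sketch of the per-category approach, assuming one binary label column per category: each category gets its own stratified train/test split and its own classifier. The toy data, the `TfidfVectorizer` plus `LogisticRegression` model and `class_weight="balanced"` are illustrative choices, not what this notebook uses.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: one text column and one binary column per category.
df = pd.DataFrame({
    "message": ["we need water", "water please", "no clean water", "send drinking water",
                "need a doctor", "injured need medical help", "doctor needed", "medicine please",
                "all is fine", "thanks for the update", "roads are open", "weather is calm"],
    "water":        [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "medical_help": [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
})

scores = {}
for category in ["water", "medical_help"]:
    # A dedicated split per category; stratify keeps the positive rate equal in train and test.
    X_train, X_test, y_train, y_test = train_test_split(
        df["message"], df[category], test_size=0.5, stratify=df[category], random_state=42
    )
    # A dedicated model per category.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(class_weight="balanced"))
    model.fit(X_train, y_train)
    scores[category] = f1_score(y_test, model.predict(X_test), average="macro")

print(scores)
```

Stratifying each split on its own category is what makes this worthwhile for rare labels: with one shared split, a rare category can end up with very few or no positive examples in the test set.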