## Model Template for Project Classifiers
This notebook can be used as a template for building your own classification models for the project. The template implements the classifier from the <a href="https://www.kaggle.com/philculliton/nlp-getting-started-tutorial">Getting Started Tutorial</a> for this Kaggle challenge. Each model must implement the <a href="https://scikit-learn.org/stable/developers/develop.html">scikit-learn API</a>. The easiest way to do this is to inherit from the base estimator and classifier mixin classes.

In [1]:
from sklearn.base import BaseEstimator, ClassifierMixin

The API requires us to implement two methods in our custom model, namely `fit` and `predict`. We will usually also implement `__init__` and `score` (the default `score` method returns the mean accuracy, which is only informative when the classes are roughly balanced).

In [2]:
import numpy as np
from sklearn.utils.validation import check_is_fitted
from sklearn import metrics, feature_extraction, linear_model

class TemplateClassifier(BaseEstimator, ClassifierMixin):
    
    def __init__(self):
        # Setup model parameters and instance attributes
        self.count_vectorizer = feature_extraction.text.CountVectorizer()
        self.clf = linear_model.RidgeClassifier()

    def score(self, X, y, sample_weight=None):
        # We use the F1 score to have the same evaluation metric as in the challenge
        return metrics.f1_score(y, self.predict(X), sample_weight=sample_weight)

The training method `fit` takes the feature matrix `X` of shape `(n_samples, n_features)` and the binary target vector `y` of shape `(n_samples,)`. The feature matrix has 3 columns (in order):

* `keyword`: The keyword feature of the tweets (may have null entries)
* `location`: The location feature of the tweets (may have null entries)
* `text`: The actual tweet (non-null)

The target vector `y` consists of the two classes `[0, 1]`. Note that class `1` is slightly underrepresented (run the notebook `exploratory_data_analysis` for a first look at the data).
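
To make the expected input layout concrete, here is a minimal sketch of a feature matrix and target vector in the shape described above. The rows are made up for illustration and are not taken from the challenge data; the use of an object-typed NumPy array is an assumption based on the `X[:, 2]` indexing used in this template.

```python
import numpy as np

# Made-up rows in the column order keyword, location, text
# (illustrative only, not from the real data set).
X = np.array(
    [
        ["fire", None, "Forest fire near the village, residents evacuated"],
        [None, "Berlin", "I love this sunny weather"],
    ],
    dtype=object,
)
y = np.array([1, 0])  # 1 = real disaster, 0 = no disaster

# The tweet text is the third column (index 2); keyword and location may be null.
texts = X[:, 2]
```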

In [3]:
class TemplateClassifier(TemplateClassifier):
    def fit(self, X, y):
        # We implement model training by first applying all feature transformations
        X_trans = self.count_vectorizer.fit_transform(X[:, 2])
        # We then train the model with the vectorized feature matrix
        self.clf.fit(X_trans, y)
        return self

The `predict` method takes a feature matrix `X` of shape `(n_samples, n_features)` containing the same columns as described for the training method above (i.e. `keyword`, `location` and `text`). It must return a vector of shape `(n_samples,)` containing the binary prediction (`0` or `1`) for each sample in `X`. A common pitfall is failing to apply the same feature transformations as in model training (forgotten steps, changed order, etc.). If you have a lot of feature transformations, consider using the pipeline model approach (see the notebook `template_model_pipeline`) instead.

In [4]:
class TemplateClassifier(TemplateClassifier):
    def predict(self, X):
        # Perform some checks
        check_is_fitted(self.count_vectorizer)
        check_is_fitted(self.clf)
        # Don't forget to apply the same transformations that were used for training
        X_trans = self.count_vectorizer.transform(X[:, 2])
        # Compute and return the predictions
        return self.clf.predict(X_trans)
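
As a rough idea of the pipeline approach mentioned above, the same steps (select the text column, vectorize, classify) can be chained with scikit-learn's `make_pipeline`, so that `predict` automatically reuses the transformations fitted during training. This is only a sketch and not necessarily what the `template_model_pipeline` notebook contains; the helper `select_text` and the toy rows are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def select_text(X):
    # Hypothetical helper: keep only the tweet text (third column)
    return X[:, 2]


pipeline = make_pipeline(
    FunctionTransformer(select_text),  # column selection
    CountVectorizer(),                 # bag-of-words features
    RidgeClassifier(),                 # linear classifier
)

# Made-up rows in the column order keyword, location, text
X_toy = np.array(
    [
        ["fire", None, "Forest fire spreading near the town"],
        [None, "Berlin", "What a lovely sunny day"],
        ["flood", "Texas", "Flood waters rising, people evacuated"],
        [None, None, "Just had a great coffee"],
    ],
    dtype=object,
)
y_toy = np.array([1, 0, 1, 0])

pipeline.fit(X_toy, y_toy)
predictions = pipeline.predict(X_toy)  # vector of shape (n_samples,)
```

Because the vectorizer lives inside the pipeline, there is no separate transformation step that could be forgotten or applied in a different order at prediction time.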

## Evaluation and Submission File
Once the model is defined, we run our shared evaluation pipeline. Model performance is measured using stratified cross-validation (i.e. the training data is repeatedly split into a training and a test set, preserving the class proportions in each split, until every data point has been used in a test set once). This type of evaluation might take a while, depending on how fast your model trains and computes predictions. When it has finished, several performance metrics for your model are printed:

* <b>F1-Score</b>: The main metric which is used by the challenge to evaluate the model. The score combines recall and precision (see below for details) into a single score that takes class imbalance into account.
* <b>Accuracy</b>: A standard score used by many classifiers, but prone to misinterpretation with imbalanced class distributions.
* <b>Recall</b>: The ability of the model to detect tweets about real disasters (i.e. the probability that the model actually finds real disaster tweets).
* <b>Precision</b>: The ability of the model to correctly classify tweets about real disasters (i.e. the probability that tweets classified as real disaster tweets by the model are actually real disaster tweets).
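
The four metrics listed above can be computed with `sklearn.metrics`. The following sketch uses made-up label vectors (not results from this project) with 2 true positives, 2 false negatives, 1 false positive and 5 true negatives.

```python
from sklearn import metrics

# Made-up labels for illustration: 4 real disaster tweets, 6 others
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision = metrics.precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
recall = metrics.recall_score(y_true, y_pred)        # TP / (TP + FN) = 2/4
f1 = metrics.f1_score(y_true, y_pred)                # harmonic mean of both = 4/7
accuracy = metrics.accuracy_score(y_true, y_pred)    # correct / total = 7/10
```

In this example the accuracy (0.70) looks better than the F1 score (about 0.57) because the many correctly classified negatives hide the fact that half of the real disaster tweets were missed.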

The evaluation will also store the model and create a submission file for the challenge if the corresponding flags are set (all outputs are stored in the `/models` directory). Stored files are labeled with a datetime stamp, followed by the model class name, the cross-validation settings and the F1 score the model achieved in CV.
E.g. the submission file `submission_2020-12-05_224810_TemplateClassifier_1x5cv_0.73.csv` was created on 5 December 2020 at 22:48:10 for a `TemplateClassifier` model achieving an F1-Score of `0.73` using 1 run of 5-fold cross-validation.
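
The `evaluation` module itself is not shown in this notebook. As a rough idea of what stratified cross-validation with these metrics looks like, here is a sketch using scikit-learn's `StratifiedKFold` and `cross_validate`. It is not the project's implementation; the toy texts, the number of folds and the random seed are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline

# Made-up tweets, alternating disaster (1) and non-disaster (0)
texts = np.array([
    "forest fire near the town", "lovely sunny day today",
    "flood waters rising fast", "great coffee this morning",
    "earthquake damage in the city", "watching a movie tonight",
    "fire spreading to the city", "sunny day at the beach",
    "flood damage near the river", "coffee and a good movie",
    "earthquake near the town", "great day with friends",
])
labels = np.array([1, 0] * 6)

model = make_pipeline(CountVectorizer(), RidgeClassifier())

# Each fold keeps the class proportions of the full data set
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
results = cross_validate(
    model, texts, labels, cv=cv,
    scoring=["f1", "accuracy", "recall", "precision"],
    return_train_score=True,  # needed to compare training and test scores
)

mean_test_f1 = results["test_f1"].mean()
mean_train_f1 = results["train_f1"].mean()
```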

In [5]:
import evaluation

model = TemplateClassifier()
evaluation.evaluate(model, store_model=True, store_submission=True)

INFO:root:Loading training data from ../data/external/kaggle/train.csv...
INFO:root:-> Number of samples: 7613
INFO:root:-> Number of features: 3
INFO:root:Evaluating model with 1 experiment(s) of 10-fold Cross Validation...
INFO:root:Run 1/10 finished
INFO:root:Run 2/10 finished
INFO:root:Run 3/10 finished
INFO:root:Run 4/10 finished
INFO:root:Run 5/10 finished
INFO:root:Run 6/10 finished
INFO:root:Run 7/10 finished
INFO:root:Run 8/10 finished
INFO:root:Run 9/10 finished
INFO:root:Run 10/10 finished
INFO:root:---
INFO:root:Expected submission results (F1-Score): around 0.74
INFO:root:F1-Score: 1.00 (training); 0.74 (test)
INFO:root:Accuracy: 99.57% (training); 78.81% (test)
INFO:root:Recall: 99.36% (training); 69.67% (test)
INFO:root:Precision: 99.64% (training); 78.59% (test)
INFO:root:---
INFO:root:Retraining model on the complete data set...
INFO:root:-> F1-Score on complete training set: 0.99
INFO:root:-> Stored model to ../models/model_2020-12-06_124915_TemplateClassifier_1x10cv_

The results can be used to decide what to improve in the model. Some basic guidelines:

* If there is a big discrepancy between training and test scores, your model might be too strong (overfitting). Consider decreasing its power by tuning the parameters towards less complex models.
* If training and test scores are close and low, your model might be too weak (underfitting). Consider increasing its power by tuning the parameters towards more complex models.
* If there is a big discrepancy between recall and precision, your model might have issues with the class imbalance. Consider balancing the classes during preprocessing or weighting the classes in model training.
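
For the two components used in this template, the guidelines above could translate into parameter changes like the following. This is only a sketch: the parameter values are illustrative assumptions, not tuned recommendations, and suitable settings have to be found through evaluation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import RidgeClassifier

# Less complex model: stronger regularization (the default alpha is 1.0)
# and a smaller vocabulary (ignore words seen in fewer than 2 tweets).
simpler_clf = RidgeClassifier(alpha=10.0)
smaller_vocab = CountVectorizer(min_df=2)

# More complex model: weaker regularization and additional
# word-pair (bigram) features.
more_flexible_clf = RidgeClassifier(alpha=0.1)
richer_features = CountVectorizer(ngram_range=(1, 2))

# Class weighting: weight samples inversely to their class frequency.
balanced_clf = RidgeClassifier(class_weight="balanced")
```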


The actual submission result is `0.78057`.