## Using Pipelines
Because the evaluation function accepts any scikit-learn compatible estimator, scikit-learn's <a href="https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html">pipelines</a> can be used to build models concisely. A pipeline chains feature transformers and ends with an estimator. In the following, we evaluate a support vector machine with a linear kernel whose preprocessing steps, a custom column selector and a `TfidfVectorizer`, are chained together in such a pipeline model.

In [7]:
from sklearn.pipeline import Pipeline
from sklearn import preprocessing, base, svm, linear_model
from sklearn.feature_extraction.text import TfidfVectorizer

import evaluation

# Set up the model as a transformer pipeline ending in a linear SVM
model = Pipeline([
    # Extract the `text` feature
    ('col-selector', preprocessing.FunctionTransformer(func=lambda X: X[:, 2])),
    # Vectorize the text with TF-IDF
    ('tfidf', TfidfVectorizer()),
    # Classify data with a linear SVM
    ('clf', svm.LinearSVC(C=1e-2, class_weight='balanced', random_state=42))
])

# Evaluate model pipeline
evaluation.evaluate(model, store_model=False, store_submission=False)

INFO:root:Loading training data from ../data/external/kaggle/train.csv...
INFO:root:-> Number of samples: 7613
INFO:root:-> Number of features: 3
INFO:root:Evaluating model with 1 experiment(s) of 10-fold Cross Validation...
INFO:root:Run 1/10 finished
INFO:root:Run 2/10 finished
INFO:root:Run 3/10 finished
INFO:root:Run 4/10 finished
INFO:root:Run 5/10 finished
INFO:root:Run 6/10 finished
INFO:root:Run 7/10 finished
INFO:root:Run 8/10 finished
INFO:root:Run 9/10 finished
INFO:root:Run 10/10 finished
INFO:root:---
INFO:root:Expected submission results (F1-Score): around 0.71
INFO:root:F1-Score: 0.75 (training); 0.71 (test)
INFO:root:Accuracy: 78.44% (training); 74.45% (test)
INFO:root:Recall: 75.38% (training); 73.43% (test)
INFO:root:Precision: 74.68% (training); 69.06% (test)
INFO:root:Evaluation finished.
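
The `evaluation` module is project-specific, so the cell above cannot be run in isolation. As a minimal sketch of how the chained steps interact, the block below fits the same kind of pipeline on a tiny made-up array. The three-column layout (keyword, location, text) and the sample rows are assumptions for illustration only, and `select_text_column` is a hypothetical named replacement for the lambda.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC


def select_text_column(X):
    """Return column 2 as a 1-D array of strings, the input TfidfVectorizer expects."""
    return X[:, 2]


# Made-up samples; the column order (keyword, location, text) is an assumption.
X = np.array([
    ["fire", "Berlin", "forest fire spreading near the town"],
    ["", "", "what a lovely sunny day"],
    ["flood", "Paris", "flood waters rising fast in the city"],
    ["", "Rome", "enjoying coffee with friends"],
], dtype=object)
y = np.array([1, 0, 1, 0])

model = Pipeline([
    # Extract the text column
    ('col-selector', FunctionTransformer(func=select_text_column)),
    # Turn raw strings into a sparse TF-IDF matrix
    ('tfidf', TfidfVectorizer()),
    # Classify with a linear SVM
    ('clf', LinearSVC(C=1e-2, class_weight='balanced', random_state=42)),
])
model.fit(X, y)

# Slicing off the final estimator exposes the output of the preprocessing steps.
features = model[:-1].transform(X)
predictions = model.predict(X)
print(features.shape)
print(predictions)
```

Using a module-level function instead of a lambda is a design choice rather than a requirement: it behaves the same during fitting, but it keeps the pipeline picklable with the standard `pickle` module should the fitted model need to be saved.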




In [8]:
from sklearn.pipeline import Pipeline
from sklearn import preprocessing, base, svm, linear_model
from sklearn.feature_extraction.text import TfidfVectorizer

import evaluation

# Set up the model as a transformer pipeline ending in a linear SVM
model = Pipeline([
    # Extract the `text` feature
    ('col-selector', preprocessing.FunctionTransformer(func=lambda X: X[:, 2])),
    # Vectorize the text with TF-IDF
    ('tfidf', TfidfVectorizer()),
    # Classify data with a linear SVM
    ('clf', svm.LinearSVC(C=0.5, class_weight='balanced', random_state=42))
])

# Evaluate model pipeline
evaluation.evaluate(model, store_model=False, store_submission=False)

INFO:root:Loading training data from ../data/external/kaggle/train.csv...
INFO:root:-> Number of samples: 7613
INFO:root:-> Number of features: 3
INFO:root:Evaluating model with 1 experiment(s) of 10-fold Cross Validation...
INFO:root:Run 1/10 finished
INFO:root:Run 2/10 finished
INFO:root:Run 3/10 finished
INFO:root:Run 4/10 finished
INFO:root:Run 5/10 finished
INFO:root:Run 6/10 finished
INFO:root:Run 7/10 finished
INFO:root:Run 8/10 finished
INFO:root:Run 9/10 finished
INFO:root:Run 10/10 finished
INFO:root:---
INFO:root:Expected submission results (F1-Score): around 0.76
INFO:root:F1-Score: 0.96 (training); 0.76 (test)
INFO:root:Accuracy: 96.91% (training); 79.06% (test)
INFO:root:Recall: 95.69% (training); 75.11% (test)
INFO:root:Precision: 97.09% (training); 75.90% (test)
INFO:root:Evaluation finished.
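
Raising `C` from 1e-2 to 0.5 weakens the regularization of the SVM: the test F1-score improves from 0.71 to 0.76, but the gap between training and test F1 widens from 0.04 to 0.20, which suggests the model fits the training folds much more closely. The sketch below shows one way such a comparison could be reproduced without the project's `evaluation` module. It swaps in scikit-learn's `cross_validate` as a stand-in (how `evaluation.evaluate` computes its scores is not shown here), and the tiny corpus is made up, so its numbers will not match the logs above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Made-up corpus for illustration only; not the Kaggle training data.
disaster = [
    "forest fire spreading near the town",
    "flood waters rising fast in the city",
    "earthquake damages several buildings",
    "wildfire forces residents to evacuate",
    "storm floods streets and cuts power",
    "explosion reported near the harbour",
]
normal = [
    "what a lovely sunny day",
    "enjoying coffee with friends",
    "new album is absolutely fire",
    "my phone battery died again",
    "watching a movie tonight",
    "this traffic is a disaster for my mood",
]
texts = np.array(disaster + normal)
labels = np.array([1] * len(disaster) + [0] * len(normal))

# 3 folds because the toy corpus is tiny; the logs above use 10.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

gaps = {}
for C in (1e-2, 0.5):
    model = Pipeline([
        ('tfidf', TfidfVectorizer()),
        ('clf', LinearSVC(C=C, class_weight='balanced', random_state=42)),
    ])
    scores = cross_validate(
        model, texts, labels,
        cv=cv, scoring='f1', return_train_score=True,
    )
    train_f1 = scores['train_score'].mean()
    test_f1 = scores['test_score'].mean()
    # A large positive gap means the model scores far better on data it has seen.
    gaps[C] = train_f1 - test_f1
    print(f"C={C}: F1 {train_f1:.2f} (training); {test_f1:.2f} (test)")
```

Tracking the training score alongside the test score makes it easier to judge whether a further increase of `C` still pays off or mainly adds overfitting.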

