## Using Pipelines
Because the evaluation function accepts any scikit-learn compatible estimator, we can use scikit-learn's <a href="https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html">pipelines</a> to define models concisely. A pipeline chains one or more feature transformers and ends with an estimator. In the following, we evaluate a support vector machine with a linear kernel, preceded by three preprocessing steps in a single pipeline: a custom column selector, a `CountVectorizer`, and a `MaxAbsScaler`.

In [2]:
from sklearn.pipeline import Pipeline
from sklearn import preprocessing, feature_extraction, svm
import evaluation

# Set up the model as a transformer pipeline ending in a linear SVM
model = Pipeline([
    # Extract the `text` feature
    ('col-selector', preprocessing.FunctionTransformer(func=lambda X: X[:, 2])),
    # Vectorize the text
    ('vectorizer', feature_extraction.text.CountVectorizer()),
    # Scale data to maximum absolute value of 1 and keep sparsity properties
    ('scaler', preprocessing.MaxAbsScaler()),
    # Classify data with a linear SVM
    ('clf', svm.LinearSVC(C=1e-2, class_weight='balanced', random_state=42))
])

# Evaluate model pipeline
_, _, _ = evaluation.evaluate(model, store_model=True, store_submission=True)

INFO:root:Loading training data from ../data/external/kaggle/train.csv...
INFO:root:-> Number of samples: 7613
INFO:root:-> Number of features: 3
INFO:root:Evaluating model with 1 experiment(s) of 10-fold Cross Validation...
INFO:root:Run 1/10 finished
INFO:root:Run 2/10 finished
INFO:root:Run 3/10 finished
INFO:root:Run 4/10 finished
INFO:root:Run 5/10 finished
INFO:root:Run 6/10 finished
INFO:root:Run 7/10 finished
INFO:root:Run 8/10 finished
INFO:root:Run 9/10 finished
INFO:root:Run 10/10 finished
INFO:root:---
INFO:root:Expected submission results (F1-Score): around 0.75
INFO:root:F1-Score: 0.86 (training); 0.75 (test)
INFO:root:Accuracy: 88.18% (training); 79.36% (test)
INFO:root:Recall: 81.99% (training); 71.69% (test)
INFO:root:Precision: 89.63% (training); 78.43% (test)
INFO:root:---
INFO:root:Retraining model on the complete data set...
INFO:root:-> F1-Score on complete training set: 0.85
INFO:root:-> Stored model to ../models/model_2021-01-23_221250_Pipeline_1x10cv_0.75.pck

The actual submission result is `0.79160`.
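
To make the chaining concrete without the project-specific `evaluation` module, the following minimal sketch rebuilds the same four pipeline steps on a handful of made-up samples. The toy data, the helper name `select_text_column`, and the assumption that the text sits in the third column (index 2) of a NumPy array are illustrative only and are not taken from the actual data set. The sketch fits the pipeline, predicts on the same samples, and inspects the output of the preprocessing steps.

```python
import numpy as np
from scipy import sparse
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, MaxAbsScaler
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC


def select_text_column(X):
    """Return the third column (index 2), assumed here to hold the raw text."""
    return X[:, 2]


# Hypothetical toy data: three columns, with the text in the last one.
X = np.array([
    ["fire", "Berlin", "forest fire spreading near the village"],
    ["", "", "what a lovely sunny day"],
    ["flood", "Paris", "flood waters rising fast in the city"],
    ["", "", "enjoying coffee with friends"],
    ["quake", "Tokyo", "earthquake damage reported downtown"],
    ["", "", "new album is fire, love it"],
], dtype=object)
y = np.array([1, 0, 1, 0, 1, 0])

# Same step names and hyperparameters as the pipeline above.
model = Pipeline([
    ("col-selector", FunctionTransformer(func=select_text_column)),
    ("vectorizer", CountVectorizer()),
    ("scaler", MaxAbsScaler()),
    ("clf", LinearSVC(C=1e-2, class_weight="balanced", random_state=42)),
])

# Fitting runs fit_transform on each transformer in order, then fits the classifier.
model.fit(X, y)
predictions = model.predict(X)

# Slicing the pipeline yields the fitted preprocessing steps without the classifier.
features = model[:-1].transform(X)
print("Predictions shape:", predictions.shape)
print("Feature matrix shape:", features.shape)
print("Features are sparse:", sparse.issparse(features))
```

Using a named function instead of a `lambda` for the column selector is a design choice of this sketch: named module-level functions can be serialized with the standard `pickle` module, whereas lambdas cannot.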