
### **Design Principles**

The Scikit-Learn API paper outlines the library's design principles as follows:

*   **Consistency:** All objects share a common interface drawn from a limited set of methods, with consistent documentation.
*   **Inspection:** All specified parameter values are exposed as public attributes.
*   **Limited object hierarchy:** Only algorithms are represented by Python classes; datasets are represented in standard formats (NumPy arrays, Pandas DataFrames, SciPy sparse matrices) and parameter names use standard Python strings.
*   **Composition:** Many machine learning tasks can be expressed as sequences of more fundamental algorithms, and Scikit-Learn makes use of this wherever possible.
*   **Sensible defaults:** When models require user-specified parameters, the library defines an appropriate default value.
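
As a rough illustration of several of these principles, the sketch below builds an estimator with no arguments, reads a parameter back, and composes two estimators into a pipeline. The variable names (`clf`, `scaler`, `pipe`) are chosen here for illustration and do not come from the paper.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Sensible defaults: the estimator can be created without any arguments.
clf = LogisticRegression()

# Inspection: constructor parameters are stored as public attributes
# and are also returned by get_params().
print(clf.C)
print(clf.get_params()["C"])

# Consistency: transformers and predictors both expose fit().
scaler = StandardScaler()
print(hasattr(scaler, "fit"), hasattr(clf, "fit"))

# Composition: a pipeline chains estimators and is itself an estimator.
pipe = make_pipeline(scaler, clf)
print([name for name, _ in pipe.steps])
```

`make_pipeline` names each step after its lowercased class name, which is why the step names reappear later as prefixes such as `logisticregression__C`.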


### **Hands-on with Pipelines**

In [0]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

In [0]:
iris = load_iris()
X, y = iris.data, iris.target

# Stratified 80/20 train/test split with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=42, stratify=y)

In [0]:
# Build the pipeline: standardize the features, then fit a one-vs-rest logistic regression
pipeline = make_pipeline(StandardScaler(), LogisticRegression(multi_class='ovr', solver='lbfgs'))

In [7]:
pipeline.fit(X_train, y_train)

Pipeline(memory=None,
         steps=[('standardscaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('logisticregression',
                 LogisticRegression(C=1.0, class_weight=None, dual=False,
                                    fit_intercept=True, intercept_scaling=1,
                                    l1_ratio=None, max_iter=100,
                                    multi_class='ovr', n_jobs=None,
                                    penalty='l2', random_state=None,
                                    solver='lbfgs', tol=0.0001, verbose=0,
                                    warm_start=False))],
         verbose=False)
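
Once a pipeline is fitted, its individual steps can be inspected through `named_steps`. The following is a minimal, self-contained sketch of that idea. It assumes a default `LogisticRegression` (with `max_iter=200` picked arbitrarily) rather than the exact settings used above, and it fits on the full iris data only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Assumption: default solver settings, not the notebook's exact configuration.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
pipe.fit(X, y)

# Step names are the lowercased class names generated by make_pipeline.
scaler = pipe.named_steps["standardscaler"]
clf = pipe.named_steps["logisticregression"]

print(scaler.mean_)      # per-feature means learned during fit
print(clf.coef_.shape)   # (n_classes, n_features)
```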

In [0]:
y_pred = pipeline.predict(X_test)

In [9]:
# Metrics used below; the unused roc_auc_score import has been dropped
from sklearn.metrics import classification_report, accuracy_score, precision_score, recall_score
from sklearn.metrics import f1_score
import pandas as pd

cr = classification_report(y_test, y_pred)
print(cr)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.89      0.80      0.84        10
           2       0.82      0.90      0.86        10

    accuracy                           0.90        30
   macro avg       0.90      0.90      0.90        30
weighted avg       0.90      0.90      0.90        30



In [10]:
# Collect the headline metrics (weighted across the three classes) in a one-row table
score_df = pd.DataFrame({'accuracy': accuracy_score(y_test, y_pred),
                         'precision': precision_score(y_test, y_pred, average='weighted'),
                         'recall': recall_score(y_test, y_pred, average='weighted'),
                         'f1': f1_score(y_test, y_pred, average='weighted')},
                        index=pd.Index([0]))

score_df

   accuracy  precision  recall        f1
0       0.9   0.902357     0.9  0.899749
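
`GridSearchCV` is imported at the top of this section but never used. One plausible next step is to tune the pipeline as a whole, as in the sketch below. The parameter grid and `max_iter=200` are hypothetical values chosen for illustration, and the example uses a default `LogisticRegression` rather than the exact configuration above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))

# Hypothetical grid: "<step name>__<parameter>" addresses a step's parameter.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# The scaler is refit inside each CV fold, so no statistics leak from
# the validation folds into training.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)            # mean cross-validated accuracy
print(search.score(X_test, y_test))  # accuracy of the refit best pipeline
```

Searching over the pipeline rather than the bare classifier keeps preprocessing and model selection inside the same cross-validation loop.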
