## Difference between Pipeline and make_pipeline?

**Pipeline requires you to name every step; make_pipeline does not let you name them and generates the names automatically.**

**Pipeline - takes a list of (name, estimator) tuples** <br>
e.g.: Pipeline([('imputer', imp), ('classifier', clf)])

**make_pipeline - This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names are set automatically to the lowercase of their class names.**<br>
e.g.: make_pipeline(imp, clf) (the steps are named 'simpleimputer' and 'logisticregression')
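A minimal sketch of the naming difference, assuming scikit-learn is installed. It uses freshly constructed estimators instead of the `imp` and `clf` objects defined later so it runs on its own. When the same estimator type appears more than once, make_pipeline appends a numeric suffix to keep the generated names unique.

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline

# Pipeline: every step is an explicit (name, estimator) tuple
named = Pipeline([('imputer', SimpleImputer()), ('classifier', LogisticRegression())])

# make_pipeline: pass only the estimators; names are the lowercased class names
auto = make_pipeline(SimpleImputer(), LogisticRegression())

# Repeated estimator types get a numeric suffix so the names stay unique
dup = make_pipeline(SimpleImputer(), SimpleImputer(strategy='median'), LogisticRegression())

print(list(named.named_steps))  # ['imputer', 'classifier']
print(list(auto.named_steps))   # ['simpleimputer', 'logisticregression']
print(list(dup.named_steps))    # ['simpleimputer-1', 'simpleimputer-2', 'logisticregression']
```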

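**(The same applies to ColumnTransformer vs make_column_transformer: ColumnTransformer takes (name, transformer, columns) tuples such as ('imputer', imp, ['Age']), while make_column_transformer takes (transformer, columns) tuples such as (imp, ['Age']).)**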
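The same comparison for column transformers, as a sketch assuming scikit-learn is installed. The column names are the Titanic columns used below; nothing is fitted, so no data is needed to inspect the names.

```python
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# ColumnTransformer: (name, transformer, columns) triples
explicit = ColumnTransformer([
    ('encoder', OneHotEncoder(), ['Embarked', 'Sex']),
    ('imputer', SimpleImputer(), ['Age']),
])

# make_column_transformer: (transformer, columns) pairs; names are generated
short = make_column_transformer(
    (OneHotEncoder(), ['Embarked', 'Sex']),
    (SimpleImputer(), ['Age']),
)

print([name for name, _, _ in explicit.transformers])  # ['encoder', 'imputer']
print([name for name, _, _ in short.transformers])     # ['onehotencoder', 'simpleimputer']
```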

In [1]:
```python
import pandas as pd

# nrows=6 reads only the first 6 rows of the dataset
df = pd.read_csv('Datasets/08_3_titanic_train.csv', nrows=6)
```

In [2]:
```python
cols = ['Embarked', 'Sex', 'Age', 'Fare']
X = df[cols]
X
```

|   | Embarked | Sex | Age | Fare |
|---|---|---|---|---|
| 0 | S | male | 22.0 | 7.25 |
| 1 | C | female | 38.0 | 71.2833 |
| 2 | S | female | 26.0 | 7.925 |
| 3 | S | female | 35.0 | 53.1 |
| 4 | S | male | 35.0 | 8.05 |
| 5 | Q | male | NaN | 8.4583 |
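If the CSV file is not at hand, the same six rows can be rebuilt inline as a sketch (values copied from the output above, with the blank Age in row 5 as `NaN`). Age is the only column with a missing value, which is why SimpleImputer is applied to it below.

```python
import numpy as np
import pandas as pd

# The six rows shown above; row 5 has a missing Age
X = pd.DataFrame({
    'Embarked': ['S', 'C', 'S', 'S', 'S', 'Q'],
    'Sex': ['male', 'female', 'female', 'female', 'male', 'male'],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583],
})

# List the columns that contain at least one missing value
print(X.columns[X.isna().any()].tolist())  # ['Age']
```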


In [3]:
```python
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
```

In [4]:
```python
ohe = OneHotEncoder()
imp = SimpleImputer()
clf = LogisticRegression()
```

## Using make_pipeline

In [5]:
```python
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
```

In [6]:
```python
ct = make_column_transformer(
    (ohe, ['Embarked', 'Sex']),
    (imp, ['Age']),
    remainder='passthrough')
```
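To see what this column transformer produces, here is a sketch that fits it on the six rows shown earlier (rebuilt inline, assuming scikit-learn and pandas are installed). OneHotEncoder expands Embarked into 3 columns and Sex into 2, SimpleImputer fills the missing Age with the mean of the five known ages, and `remainder='passthrough'` appends Fare unchanged as the last column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# The six rows shown earlier; row 5 has a missing Age
X = pd.DataFrame({
    'Embarked': ['S', 'C', 'S', 'S', 'S', 'Q'],
    'Sex': ['male', 'female', 'female', 'female', 'male', 'male'],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583],
})

ct = make_column_transformer(
    (OneHotEncoder(), ['Embarked', 'Sex']),
    (SimpleImputer(), ['Age']),
    remainder='passthrough')

Xt = ct.fit_transform(X)
# Densify in case the result comes back as a sparse matrix
Xt = Xt.toarray() if hasattr(Xt, 'toarray') else np.asarray(Xt)

# 3 Embarked + 2 Sex indicator columns, then imputed Age, then passthrough Fare
print(Xt.shape)  # (6, 7)
# Row 5's missing Age becomes the mean of the known ages: 156 / 5
print(Xt[5, 5])  # 31.2
```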

In [7]:
```python
pipe = make_pipeline(ct, clf)
```
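A sketch of why the generated names matter, assuming scikit-learn is installed: nested parameters are addressed as `<step>__<transformer>__<param>`, so with make_pipeline and make_column_transformer you have to use the auto-generated names when reading or changing a parameter.

```python
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

ct = make_column_transformer(
    (OneHotEncoder(), ['Embarked', 'Sex']),
    (SimpleImputer(), ['Age']),
    remainder='passthrough')
pipe = make_pipeline(ct, LogisticRegression())

# Step names were generated from the class names
print(list(pipe.named_steps))  # ['columntransformer', 'logisticregression']

# Nested parameter keys are built from those generated names
params = pipe.get_params()
print('columntransformer__simpleimputer__strategy' in params)  # True
print('logisticregression__C' in params)  # True

# Changing a nested parameter uses the same key
pipe.set_params(columntransformer__simpleimputer__strategy='median')
print(pipe.get_params()['columntransformer__simpleimputer__strategy'])  # median
```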

## Using Pipeline

In [8]:
```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
```

In [9]:
```python
ct = ColumnTransformer(
    [('encoder', ohe, ['Embarked', 'Sex']),
     ('imputer', imp, ['Age'])],
    remainder='passthrough')
```

In [10]:
```python
pipe = Pipeline([('preprocessor', ct), ('classifier', clf)])
```
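With Pipeline and ColumnTransformer the names you chose become the parameter prefixes instead, which tends to give shorter, more readable keys (for example in a parameter grid for a hyperparameter search). A sketch assuming scikit-learn is installed; `param_grid` is only an illustrative dictionary and is not passed to any search object here.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(
    [('encoder', OneHotEncoder(), ['Embarked', 'Sex']),
     ('imputer', SimpleImputer(), ['Age'])],
    remainder='passthrough')
pipe = Pipeline([('preprocessor', ct), ('classifier', LogisticRegression())])

# The chosen names are the prefixes of every nested parameter
params = pipe.get_params()
print('preprocessor__imputer__strategy' in params)  # True
print('classifier__C' in params)                    # True

# Steps can also be looked up by the chosen name
print(type(pipe.named_steps['classifier']).__name__)  # LogisticRegression

# Illustrative grid using the chosen names as keys
param_grid = {'classifier__C': [0.1, 1.0, 10.0]}
```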