# Pipelines in Machine Learning using sklearn

Trainer: Rajesh Jakhotia

- A `Pipeline` applies a list of transformers in sequence to preprocess the data and can optionally end with a final predictor for predictive modeling.
Ref: https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

### Import Packages

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import scale 
from sklearn.linear_model import LogisticRegression

### Load the Dataset

In [2]:
# Load the development and hold-out samples
dev = pd.read_csv("DEV_SAMPLE.csv")
holdout = pd.read_csv("HOLDOUT_SAMPLE.csv")

print(len(dev), len(holdout))

14000 6000


### Variable Transformation

In [3]:
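# scale() standardizes using the mean and standard deviation of the array it is given
dev["Balance_Standardized"] = scale(dev["Balance"])
# Single quotes inside the f-string keep this valid on Python versions before 3.12
print(f"Mean : {round(dev['Balance_Standardized'].mean(), 2)}")
print(f"Standard Deviation : {round(dev['Balance_Standardized'].std(), 2)}")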

Mean : 0.0
Standard Deviation : 1.0


### Build Logistic Regression Model

In [4]:
X = pd.DataFrame(dev.loc[:, "Balance_Standardized"])
y = dev["Target"]

In [5]:
## Running one variable Logistic Regression
mylogit = LogisticRegression(random_state=0).fit(X, y)

## Apply Model on Hold-Out sample for prediction

#### Variable Transformation : Standardization step on hold-out data

In [6]:
holdout["Balance_Standardized"] = scale(holdout["Balance"]) 
X_ho = pd.DataFrame(holdout.loc[:, "Balance_Standardized"])
X_ho.head()

|   | Balance_Standardized |
|---|----------------------|
| 0 | -0.477471 |
| 1 | 0.013377 |
| 2 | -0.474415 |
| 3 | 1.296725 |
| 4 | 0.790488 |


In [7]:
y_ho_pred = mylogit.predict_proba(X_ho)
y_ho_pred

array([[0.90187237, 0.09812763],
       [0.91503437, 0.08496563],
       [0.90195969, 0.09804031],
       ...,
       [0.90730505, 0.09269495],
       [0.89545464, 0.10454536],
       [0.89876758, 0.10123242]])

### What is wrong with the above step?
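
One way to see the issue, sketched on synthetic data (the balances below are made up and are not drawn from `DEV_SAMPLE.csv`): `scale` standardizes whatever array it receives using that array's own mean and standard deviation. Calling it separately on the hold-out sample therefore applies a different transformation from the one the model was trained on.

```python
import numpy as np
from sklearn.preprocessing import scale

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two samples (assumed values, for illustration only)
dev_balance = rng.normal(loc=50_000, scale=20_000, size=1_000)
holdout_balance = rng.normal(loc=60_000, scale=25_000, size=500)

# What the cell above does: hold-out scaled with its OWN mean and std
ho_own = scale(holdout_balance)

# What the model expects: hold-out scaled with the DEVELOPMENT mean and std
ho_dev = (holdout_balance - dev_balance.mean()) / dev_balance.std()

# The first mean is forced to zero; the second reflects the shift between samples
print(round(ho_own.mean(), 2), round(ho_dev.mean(), 2))
```

When the two samples have different distributions, the two versions of the standardized variable disagree, so the same raw balance can map to different model inputs depending on which sample it happens to sit in.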

## Application of Model on a new record

In [8]:
df = holdout.iloc[0, 0:8]
df

Cust_ID            C12935
Target                  0
Age                    26
Gender                  M
Balance          67291.63
Occupation           SENP
No_OF_CR_TXNS           6
AGE_BKT             26-30
Name: 0, dtype: object

### How will you apply the model to the above record?

# Let's Apply Pipelines

In [9]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

In [10]:
steps = [("standard_scaler", StandardScaler()),
         ("classifier", LogisticRegression())
        ]

In [11]:
pipe = Pipeline(steps)

## Visualize the Pipeline

In [12]:
from sklearn import set_config
set_config(display="diagram")
pipe

In [13]:
pipe.fit(dev.loc[:,["Balance"]],y)

In [14]:
y_pred = pipe.predict_proba(holdout.loc[:,["Balance"]])
y_pred

array([[0.90239776, 0.09760224],
       [0.91562647, 0.08437353],
       [0.90248558, 0.09751442],
       ...,
       [0.90786047, 0.09213953],
       [0.89594042, 0.10405958],
       [0.89927435, 0.10072565]])
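
Because the fitted pipeline stores the scaler's parameters together with the classifier, it can also score a single new record, which was the open question earlier. The sketch below is a self-contained illustration on synthetic data; `demo_pipe`, `train`, `target`, and `new_record` are hypothetical names, and the training values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic training data (assumed, not the course dataset)
train = pd.DataFrame({"Balance": rng.normal(50_000, 20_000, size=500)})
target = (rng.random(500) < 0.1).astype(int)

demo_pipe = Pipeline([
    ("standard_scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])
demo_pipe.fit(train, target)

# A single new record, passed as a one-row DataFrame with the same column name
new_record = pd.DataFrame({"Balance": [67291.63]})

# The pipeline scales the record with the training statistics, then predicts
proba = demo_pipe.predict_proba(new_record)
print(proba)
```

The result has one row with two class probabilities. In this notebook, the equivalent call would presumably be `pipe.predict_proba(holdout.loc[[0], ["Balance"]])`, where the double brackets keep the selection a one-row DataFrame.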

## What if I just want to do the transformation?

In [15]:
holdout["Balance_Standardized"] = pipe['standard_scaler'].transform(holdout.loc[:,["Balance"]])


In [16]:
pipe.named_steps['standard_scaler'].transform(holdout.loc[:,["Balance"]])

array([[-0.45904374],
       [ 0.03703499],
       [-0.4559548 ],
       ...,
       [-0.26200045],
       [-0.67966189],
       [-0.56732437]])
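
These values differ from the ones produced earlier by `scale(holdout["Balance"])` because the pipeline's scaler reuses the mean and standard deviation learned from the development sample. The following is a minimal sketch on synthetic data (made-up numbers, hypothetical variable names) showing where those learned parameters live, and that slicing the pipeline with `[:-1]` is another way to run only the preprocessing steps.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic data (assumed, for illustration only)
X_train = rng.normal(50_000, 20_000, size=(400, 1))
y_train = (rng.random(400) < 0.2).astype(int)
X_new = rng.normal(60_000, 25_000, size=(5, 1))

demo_pipe = Pipeline([
    ("standard_scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
]).fit(X_train, y_train)

scaler = demo_pipe.named_steps["standard_scaler"]
# Parameters learned from the training data during fit
print(scaler.mean_, scaler.scale_)

# Slicing returns a sub-pipeline; [:-1] drops the final classifier
only_transform = demo_pipe[:-1].transform(X_new)

# Same result computed by hand from the stored training statistics
manual = (X_new - scaler.mean_) / scaler.scale_
```

With more than one preprocessing step, the slice applies all of them in order, whereas indexing a single named step applies only that step.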

## Thank You