<a href="https://colab.research.google.com/github/dr-mushtaq/Machine-Learning/blob/master/Pipelines_in_scikit_learn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


#<p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**Creating a Pipeline in scikit-learn**</p>

## **Example 1**

**Step 1: Import the Necessary Libraries**

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

**Step 2: Load and Split the Data**

In [None]:
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
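A quick way to check what the split produced is to print the array shapes. This minimal sketch reloads Iris so it runs on its own; the `print` calls are additions for illustration. Iris has 150 samples, 4 features and 50 samples per class, so `test_size=0.2` leaves 120 rows for training and 30 for testing.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Same split as in Step 2: 20% of the rows go to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Full data:", X.shape)                            # (150, 4)
print("Samples per class:", np.bincount(y))             # [50 50 50]
print("Train:", X_train.shape, "Test:", X_test.shape)   # (120, 4) (30, 4)
```

Passing `stratify=y` to `train_test_split` would additionally keep the equal class proportions in both subsets; the notebook itself does not use it.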

**Step 3: Define the Pipeline**

In [None]:
# Define the pipeline
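pipeline = Pipeline([
    ('scaler', StandardScaler()),           # standardize each feature to zero mean, unit variance
    ('pca', PCA(n_components=2)),           # project onto the first 2 principal components
    ('classifier', LogisticRegression())    # final estimator
])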
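scikit-learn also provides `make_pipeline`, which builds the same chain but generates each step name from the lowercased class name. A short sketch for comparison:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Same three steps as the Pipeline above, but the step names are generated automatically
auto_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())

step_names = list(auto_pipeline.named_steps)
print(step_names)  # ['standardscaler', 'pca', 'logisticregression']
```

The trade-off shows up in parameter names: with generated names, the classifier's regularization strength is addressed as `logisticregression__C` instead of `classifier__C`, so the explicit `Pipeline([...])` form keeps grid-search keys shorter.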

**Step 4: Train the Pipeline**

In [None]:
# Train the pipeline
pipeline.fit(X_train, y_train)
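Calling `fit` on the pipeline runs `fit_transform` on each intermediate step in order and then `fit` on the final estimator; at prediction time only `transform` is applied to the intermediate steps. The sketch below reproduces that by hand and checks that both routes give the same test-set predictions. The names `scaler`, `pca`, `clf` and `manual_pred` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# The pipeline route
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('classifier', LogisticRegression()),
])
pipeline.fit(X_train, y_train)
pipeline_pred = pipeline.predict(X_test)

# The manual route: fit_transform each transformer in order, then fit the final estimator
scaler = StandardScaler()
pca = PCA(n_components=2)
clf = LogisticRegression()
X_train_reduced = pca.fit_transform(scaler.fit_transform(X_train))
clf.fit(X_train_reduced, y_train)

# At prediction time only transform() is applied, using statistics learned on X_train
manual_pred = clf.predict(pca.transform(scaler.transform(X_test)))

print("Same predictions:", np.array_equal(manual_pred, pipeline_pred))
```

Because the scaler and PCA only see `X_train` while fitting, the pipeline also keeps test-set statistics out of preprocessing, which matters most inside cross-validation.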

**Step 5: Make Predictions and Evaluate the Model**

In [None]:
# Make predictions on the test set
y_pred = pipeline.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

Accuracy: 90.00%
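For classifiers, `Pipeline.score` passes the data through the transformers and then calls the final estimator's `score`, which reports mean accuracy. This sketch rebuilds the same pipeline so it runs standalone and checks that `score`, `accuracy_score` and a hand-computed fraction of correct predictions agree.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Pipeline.fit returns the fitted pipeline itself, so the calls can be chained
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('classifier', LogisticRegression()),
]).fit(X_train, y_train)

y_pred = pipeline.predict(X_test)

acc_metric = accuracy_score(y_test, y_pred)   # metric function used in Step 5
acc_method = pipeline.score(X_test, y_test)   # final estimator's score(): mean accuracy
acc_manual = (y_pred == y_test).mean()        # fraction of correct predictions

print(acc_metric, acc_method, acc_manual)
```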


**Step 6: Hyperparameter Tuning with Pipelines**

**Hyperparameter Tuning with GridSearchCV**

In [None]:
from sklearn.model_selection import GridSearchCV
# Define the parameter grid
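param_grid = {
    'pca__n_components': [2, 3],    # values for n_components of the step named 'pca'
    'classifier__C': [0.1, 1, 10]   # inverse regularization strength of LogisticRegression
}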
# Create a GridSearchCV object
grid_search = GridSearchCV(pipeline, param_grid, cv=5)
# Perform grid search
grid_search.fit(X_train, y_train)
# Best parameters and score
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Cross-Validation Score: {grid_search.best_score_:.2f}")

Best Parameters: {'classifier__C': 1, 'pca__n_components': 3}
Best Cross-Validation Score: 0.96
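The grid keys follow the `<step name>__<parameter>` convention, so `pca__n_components` reaches the `n_components` argument of the step named `'pca'`, and `get_params()` lists every valid key. Because `GridSearchCV` refits the best combination on the whole training set by default (`refit=True`), `best_estimator_` is a complete, fitted pipeline that can be scored on the held-out test set. A minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('classifier', LogisticRegression()),
])

# Every valid grid key has the form '<step name>__<parameter>'
valid_keys = pipeline.get_params().keys()
print('pca__n_components' in valid_keys, 'classifier__C' in valid_keys)

param_grid = {'pca__n_components': [2, 3], 'classifier__C': [0.1, 1, 10]}
grid_search = GridSearchCV(pipeline, param_grid, cv=5)  # refit=True by default
grid_search.fit(X_train, y_train)

# best_estimator_ is the full pipeline, refit on all of X_train with best_params_
best_pipeline = grid_search.best_estimator_
test_accuracy = best_pipeline.score(X_test, y_test)
print(f"Test accuracy of the tuned pipeline: {test_accuracy * 100:.2f}%")
```

The cross-validation score reported above and this test accuracy answer different questions: the first guides model selection, the second estimates performance on unseen data.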


#<p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 1px; color:#207d06; font-size:100%; text-align:left;padding: 0px; border-bottom: 3px solid #207d06;">**SK Factor**</p>

**Step 1: Install SK Factor**

In [7]:
# Install SK Factor directly from its GitHub repository.
# In the run recorded below, pip rejected the repository (it has neither a
# setup.py nor a pyproject.toml), so the skfactor import fails.
!pip install git+https://github.com/B2F/sk-factor.git

# If the install succeeds but the import still cannot find the package,
# restart the kernel (Kernel -> Restart) and run this cell again.

# Import SK Factor's pipeline builder and scikit-learn's dataset loader
from skfactor.pipeline_builder import PipelineBuilder
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

Collecting git+https://github.com/B2F/sk-factor.git
  Cloning https://github.com/B2F/sk-factor.git to /tmp/pip-req-build-vrbzyz51
  Running command git clone --filter=blob:none --quiet https://github.com/B2F/sk-factor.git /tmp/pip-req-build-vrbzyz51
  Resolved https://github.com/B2F/sk-factor.git to commit 326a426732cb1be0e6b4a6438d608f8446f4a110
ERROR: git+https://github.com/B2F/sk-factor.git does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.

ModuleNotFoundError: No module named 'skfactor'
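Since pip could not install the repository, the `skfactor` import raises `ModuleNotFoundError`. A minimal sketch of a guarded import using the standard library's `importlib.util.find_spec`, which returns `None` when a module cannot be located; the helper name `is_importable` is hypothetical, and the `skfactor.pipeline_builder` path is simply the one used in the cell above.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current import path."""
    return importlib.util.find_spec(module_name) is not None


if is_importable("skfactor"):
    # Same import path as the cell above; only attempted when the package is present
    from skfactor.pipeline_builder import PipelineBuilder
else:
    print("skfactor is not installed; skipping the SK Factor example.")
```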

In [None]:
# Build a simple classification pipeline with SK Factor.
# This cell depends on the PipelineBuilder import above, which failed in the recorded run.
pipeline = (
    PipelineBuilder(task='classification')
    .with_data(X, y)
    .with_model('decision_tree')
    .build()
)

# Train the model
pipeline.train()

# Evaluate the model
pipeline.evaluate()
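Because the install output above ends in `ModuleNotFoundError`, the `PipelineBuilder` calls could not be executed. For comparison, here is a minimal sketch of a similar workflow in plain scikit-learn, assuming the intent is to train a decision tree classifier on Iris and report its accuracy. This is not SK Factor's API; the `DecisionTreeClassifier`, the 80/20 split and `random_state=42` are choices made for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# A single-step pipeline: decision trees split on raw feature thresholds, so no scaler is needed
tree_pipeline = Pipeline([
    ('classifier', DecisionTreeClassifier(random_state=42)),
])

# "Train" and "evaluate" in plain scikit-learn terms
tree_pipeline.fit(X_train, y_train)
tree_accuracy = accuracy_score(y_test, tree_pipeline.predict(X_test))
print(f"Decision tree accuracy: {tree_accuracy * 100:.2f}%")
```

Wrapping even a single estimator in a `Pipeline` keeps the `classifier__...` naming, so the `GridSearchCV` pattern from Step 6 carries over unchanged, for example with a grid over `classifier__max_depth`.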