# Practical Pipeline with Model Evaluation

In real-world machine learning, we combine **data preprocessing, model training, hyperparameter tuning, and evaluation** into a single pipeline.

## Why Use Pipelines?
- Keeps code **clean and modular**.
- Ensures that preprocessing (e.g., scaling, encoding) is applied consistently during training and testing.
- Integrates with **cross-validation** and hyperparameter tuning.


In [None]:
# Example: Pipeline with StandardScaler + Logistic Regression + Cross-Validation
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
import numpy as np

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression(max_iter=500))
])

# Cross-validation evaluation
scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='accuracy')
print("Cross-Validation Accuracy: %.3f ± %.3f" % (np.mean(scores), np.std(scores)))

# Train and evaluate on test set
pipeline.fit(X_train, y_train)
print("Test Accuracy:", pipeline.score(X_test, y_test))
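
The hyperparameter-tuning integration mentioned earlier is not exercised by the example above. A minimal sketch of it, assuming the same dataset and pipeline, could use `GridSearchCV`. The grid of `C` values is an illustrative assumption and not a recommendation; step parameters are addressed with the `<step name>__<parameter>` convention.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same data and split as the example above
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=500)),
])

# "model__C" targets the C parameter of the step named "model".
# The candidate values are assumptions chosen for illustration.
param_grid = {"model__C": [0.01, 0.1, 1.0, 10.0]}

# Each candidate is scored with 5-fold cross-validation on the training data;
# the scaler is refitted inside every fold along with the model.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy: %.3f" % search.best_score_)
print("Test accuracy: %.3f" % search.score(X_test, y_test))
```

Because the whole pipeline is passed to the search, the test set is only touched once, by the final `search.score` call.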

## Key Takeaways
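- Pipelines bundle preprocessing and modeling into a single estimator.
- They work seamlessly with **cross-validation** and **hyperparameter tuning**.
- They help prevent data leakage: preprocessing is fitted on the training data (or training folds) only, and the fitted transform is then applied unchanged to validation and test data.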
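
The leakage point can be checked directly. The sketch below, which assumes the same dataset and split as the example above, fits the pipeline and compares the scaler's learned means against the training data and against the full dataset. The variable names `uses_train_only` and `matches_full_data` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=500)),
])
pipeline.fit(X_train, y_train)

# The fitted scaler is available through named_steps
fitted_scaler = pipeline.named_steps["scaler"]

# Compare the learned per-feature means with the training-set means
# and with the means of the full dataset (train + test).
uses_train_only = np.allclose(fitted_scaler.mean_, X_train.mean(axis=0))
matches_full_data = np.allclose(fitted_scaler.mean_, X.mean(axis=0))

print("Scaler means match training data:", uses_train_only)
print("Scaler means match full dataset:", matches_full_data)
```

When `pipeline.score(X_test, y_test)` or `pipeline.predict(X_test)` is called, the test data is transformed with these training-set statistics; the scaler is not refitted.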

Prefer pipelines in production ML workflows, so that the preprocessing fitted during training is applied identically at prediction time.