## 🔷 Scikit-learn Design Philosophy

Scikit-learn exposes a unified, modular, object-oriented API built around three core interfaces: estimators, transformers, and predictors.

---

### 🔹 1. Estimator Interface
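An estimator is any object that learns from data. It implements:

- .fit(X, y) → learns parameters from the data and stores them on the object as attributes ending in an underscore (for example coef_). The target y is optional for unsupervised estimators such as KMeans or StandardScaler.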

Examples:
- LinearRegression()
- KMeans()
- StandardScaler()


In [2]:
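# from sklearn.linear_model import LinearRegression
# model = LinearRegression()
# model.fit(X_train, y_train)  # X_train, y_train: training features and targets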
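
To make the estimator contract concrete, here is a minimal runnable sketch. The toy arrays are made up for illustration and are not part of the original notebook. It fits one supervised and two unsupervised estimators and reads back the attributes each one learned.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative assumption): y = 2 * x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Supervised estimator: fit(X, y) learns a slope and an intercept
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)

# Unsupervised estimators: fit(X) needs no target
scaler = StandardScaler().fit(X)
print(scaler.mean_)  # per-feature mean learned from X

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.shape)  # one row per cluster, one column per feature
```

Note that fit() returns the estimator itself, which is why the constructor and the call to fit() can be chained on one line.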

### 🔹 2. Transformer Interface
A transformer is an estimator that also implements:

- .transform(X) → applies the transformation learned in fit() to the input data.

Transformers are often used for data preprocessing.

Most also provide:

- .fit_transform(X, y=None) → equivalent to fit() followed by transform() on the same data, sometimes implemented more efficiently.

Examples:
- StandardScaler()
- OneHotEncoder()
- PCA()

In [None]:
# from sklearn.preprocessing import StandardScaler
# scaler = StandardScaler()
# X_scaled = scaler.fit_transform(X)
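
The relationship between fit(), transform(), and fit_transform() can be sketched as follows. The small arrays are invented for illustration. The sketch also shows the usual pattern of fitting on training data and reusing the learned statistics on new data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data, not from the original notebook
X_train = np.array([[1.0], [2.0], [3.0]])
X_new = np.array([[4.0]])

# Two-step form: learn mean and standard deviation, then apply them
scaler = StandardScaler()
scaler.fit(X_train)
X_two_step = scaler.transform(X_train)

# One-step form on a fresh scaler
X_one_step = StandardScaler().fit_transform(X_train)
print(np.allclose(X_two_step, X_one_step))

# New data is scaled with the statistics learned from X_train,
# so the scaler is not refitted here
X_new_scaled = scaler.transform(X_new)
print(X_new_scaled)
```

Calling fit_transform() on test data would relearn the statistics from that data, which is usually not what you want. Calling transform() alone keeps the training-time scaling.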


### 🔹 3. Predictor Interface
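A predictor is an estimator that can make predictions once it has been fitted. It implements:

- .predict(X) → predicts outputs (class labels or target values) for the input data.

Most also implement:

- .score(X, y) → returns a default quality metric: mean accuracy for classifiers, R² for regressors.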

Examples:
- LogisticRegression()
- SVC()
- DecisionTreeClassifier()

In [3]:
# from sklearn.tree import DecisionTreeClassifier
# clf = DecisionTreeClassifier()
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)
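
As a runnable sketch of predict() and score(), the example below uses the iris dataset bundled with scikit-learn. That dataset choice is an assumption for illustration and does not come from the original notebook. It checks that a classifier's score() matches the accuracy computed from its own predictions.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Bundled dataset, used here only as a stand-in for real data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# For classifiers, score(X, y) is the mean accuracy of predict(X) against y
score = clf.score(X_test, y_test)
manual_accuracy = accuracy_score(y_test, y_pred)
print(score, manual_accuracy)
```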


### 🔹 4. Pipelines
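A Pipeline chains one or more transformers with a final estimator into a single estimator. Calling fit() on the pipeline fits each step in order, and predict() passes the data through every fitted transformer before handing it to the final estimator.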

In [5]:
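# from sklearn.pipeline import Pipeline
# from sklearn.preprocessing import StandardScaler
# from sklearn.linear_model import LogisticRegression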

# pipe = Pipeline([
#     ('scaler', StandardScaler()),
#     ('model', LogisticRegression())
# ])

# pipe.fit(X_train, y_train)
# y_pred = pipe.predict(X_test)
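
To show what a pipeline does under the hood, the sketch below compares it with the equivalent manual sequence of calls. The iris dataset and the max_iter=1000 setting are assumptions made for the illustration, not choices from the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Manual version: fit the scaler on training data, reuse it on test data
scaler = StandardScaler()
model = LogisticRegression(max_iter=1000)
model.fit(scaler.fit_transform(X_train), y_train)
manual_pred = model.predict(scaler.transform(X_test))

# Pipeline version: the same steps behind a single fit/predict
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

print(np.array_equal(manual_pred, pipe_pred))

# Fitted steps remain accessible by name
print(pipe.named_steps["scaler"].mean_)
```

Because the pipeline only ever fits its transformers inside fit(), it helps avoid accidentally learning preprocessing statistics from test data.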


           +-----------------+
           |  Estimator      | <--- all models, transformers
           +-----------------+
            |   |        |
        fit()  transform()  predict()
         |       |           |
         ↓       ↓           ↓
      Learns   Transforms   Predicts
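
The diagram can be read as duck typing: every estimator has fit(), while transform() and predict() are present only where they make sense. A small sketch of that idea follows, where supported_methods is a hypothetical helper written for this illustration and not a scikit-learn function.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def supported_methods(estimator):
    """Hypothetical helper: list which of the three core methods exist."""
    return [
        name
        for name in ("fit", "transform", "predict")
        if hasattr(estimator, name)
    ]


for est in (StandardScaler(), LogisticRegression(), KMeans(n_clusters=2)):
    print(type(est).__name__, supported_methods(est))
```

StandardScaler is a transformer only, LogisticRegression is a predictor only, and KMeans offers both transform() and predict(), so the roles are not mutually exclusive.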


In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Sample dataset
data = {
    'Age': [25, 45, 35, 33, 22],
    'Salary': [50000, 80000, 60000, 58000, 52000],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Purchased': [0, 1, 0, 1, 0]
}

df = pd.DataFrame(data)

# Features and label
X = df[['Age', 'Salary', 'Gender']]
y = df['Purchased']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define preprocessing:
# - StandardScaler for numerical
# - OneHotEncoder for categorical
numeric_features = ['Age', 'Salary']
categorical_features = ['Gender']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
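        # handle_unknown='ignore' avoids an error if a category unseen during fit appears later
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)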
    ]
)

# Define pipeline with preprocessing + model
pipeline = Pipeline([
    ('preprocessing', preprocessor),
    ('classifier', LogisticRegression())
])

# Fit the pipeline
pipeline.fit(X_train, y_train)

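# Predict and evaluate (5 rows with test_size=0.2 leave a single test sample,
# so the scores below are only illustrative)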
y_pred = pipeline.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Accuracy: 1.0

Classification Report:
               precision    recall  f1-score   support

           1       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1
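
Because a pipeline is itself an estimator, its steps and their parameters stay reachable through the same API. The sketch below rebuilds the pipeline from the cell above without fitting it. It then reads a step by name and sets nested parameters using the step__parameter naming convention. The specific values 0.5 and False are arbitrary choices for illustration.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["Age", "Salary"]),
        ("cat", OneHotEncoder(), ["Gender"]),
    ]
)
pipeline = Pipeline([
    ("preprocessing", preprocessor),
    ("classifier", LogisticRegression()),
])

# Steps are reachable by the names given in the constructor
print(type(pipeline.named_steps["classifier"]).__name__)

# Nested parameters are addressed as <step>__<parameter>
pipeline.set_params(classifier__C=0.5, preprocessing__num__with_mean=False)
params = pipeline.get_params()
print(params["classifier__C"], params["preprocessing__num__with_mean"])
```

The same double-underscore names are what tools such as GridSearchCV expect when tuning the steps of a pipeline.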

