### Custom Transformers

In scikit-learn, you can create custom transformers using either a class-based approach or a function-based approach. Each method has its advantages, and the choice often depends on your specific use case and preferences. Here’s a detailed comparison of both approaches:

### Class-Based Approach

**Overview:**
The class-based approach defines a class that implements `fit` and `transform`. Inheriting from `TransformerMixin` supplies `fit_transform` automatically, and inheriting from `BaseEstimator` supplies `get_params` and `set_params`. This is the recommended method for building reusable and maintainable transformers.

**Advantages:**
- **Encapsulation:** You can encapsulate related functionality and state (e.g., parameters, learned statistics) in a single class.
- **Scikit-learn Integration:** Classes can easily be integrated into scikit-learn pipelines, allowing for consistent behavior across different transformers.
- **Reusability:** Once defined, you can create multiple instances of the class with different parameters.
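
The points above can be sketched with a very small transformer. This is only an illustration: the class name `MeanCenterer` and its `with_centering` parameter are hypothetical and are not part of scikit-learn or of the example later in this notebook.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: subtracts the per-feature mean learned in fit."""

    def __init__(self, with_centering=True):
        # Hyperparameters are stored unchanged under the same name
        self.with_centering = with_centering

    def fit(self, X, y=None):
        # Learned state lives on the instance (trailing underscore by convention)
        self.means_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return X - self.means_ if self.with_centering else X


X_demo = np.array([[1.0, 10.0], [3.0, 30.0]])

# Encapsulation: fit stores the means, transform reuses them
centered = MeanCenterer().fit_transform(X_demo)

# Reusability: a second instance with a different parameter
params = MeanCenterer(with_centering=False).get_params()

# Integration: clone() gives an unfitted copy with the same parameters
unfitted_copy = clone(MeanCenterer(with_centering=False))
```

Here `fit_transform`, `get_params`, and compatibility with `clone` come from the two base classes rather than from code written in `MeanCenterer` itself.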


### Function-Based Approach

**Overview:**
The function-based approach involves defining a standalone function that takes data as input and returns the transformed data. This approach is simpler and can be quicker to implement for one-off transformations or when you don’t need to maintain state.

**Advantages:**
- **Simplicity:** It’s easier and quicker to write for simple transformations.
- **Less Overhead:** No need to define a class structure, which can be beneficial for simple tasks.
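
As a minimal sketch of this approach, a stateless NumPy function can be wrapped in scikit-learn's `FunctionTransformer` so that it exposes `fit` and `transform`. The choice of `np.log1p` and the sample values below are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Wrap a stateless function; inverse_func is optional
log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

X_demo = np.array([[0.0, 1.0], [9.0, 99.0]])

X_log = log_transformer.fit_transform(X_demo)       # element-wise log(1 + x)
X_back = log_transformer.inverse_transform(X_log)   # element-wise exp(x) - 1
```

Nothing is learned from the data here, which is why a plain function is sufficient.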


### Comparison

| Aspect                    | Class-Based Approach                          | Function-Based Approach                       |
|---------------------------|----------------------------------------------|----------------------------------------------|
| **Structure**             | Encapsulated in a class with methods         | Simple function                                |
| **State Maintenance**     | Can maintain state (e.g., learned parameters) | No state is maintained                        |
| **Reusability**           | Easily reusable with different instances      | Reusable but typically for simpler tasks     |
| **Integration with Pipelines** | Directly integrates with scikit-learn pipelines | Must be wrapped in `FunctionTransformer` first |
| **Complexity**            | More complex, suitable for advanced use cases | Simpler, suitable for quick transformations   |

### Conclusion

- Use the **class-based approach** if you need a reusable, maintainable transformer that learns state from the data during `fit` and integrates seamlessly with scikit-learn’s functionality.
- Use the **function-based approach** for simple, one-off transformations that don’t require the overhead of a class structure.


#### *Example of the Class-Based Approach*

In [40]:
from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np

class Robust_Scaler(BaseEstimator, TransformerMixin):
    """
    A robust scaler that scales features based on the interquartile range (IQR) 
    to make the model robust to outliers. This transformer scales data by removing 
    the median and dividing by the IQR (75th percentile - 25th percentile), making 
    it less sensitive to outliers compared to standard scaling techniques.

    Attributes
    ----------
    iqr : array-like, shape (n_features,)
        The interquartile range (IQR) for each feature, calculated as the 
        difference between the 75th and 25th percentile of the data.
    
    medians : array-like, shape (n_features,)
        The median for each feature, calculated during the fitting process.

    Methods
    -------
    fit(X, y=None)
        Computes the IQR and median for each feature in the dataset `X`.
    
    transform(X)
        Scales the features in the dataset `X` using the IQR and median values.
        Raises an error if the scaler has not been fitted yet.
    """

    def __init__(self):
        self.iqr = None
        self.medians = None

    def fit(self, X, y=None):
        """
        Compute the interquartile range (IQR) and median for each feature 
        in the dataset `X`.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            The input data to compute the statistics.
        
        y : Ignored
            Not used, present here for consistency with scikit-learn API.
        
        Returns
        -------
        self : object
            Returns the instance of the transformer.
        """
        self.iqr = np.percentile(X, 75, axis=0) - np.percentile(X, 25, axis=0)
        self.medians = np.median(X, axis=0)

        # Handle division by zero for features with constant values
        self.iqr[self.iqr == 0] = 1
        return self

    def transform(self, X):
        """
        Scale the input data `X` using the IQR and medians calculated during fitting.
        The transformation is applied to each feature individually.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            The input data to be transformed.

        Returns
        -------
        X_scaled : array-like, shape (n_samples, n_features)
            The transformed data with features scaled by the IQR.
        
        Raises
        ------
        RuntimeError
            If the `fit` method has not been called before `transform`.
        """
        # Ensure that fit has been called
        if self.medians is None or self.iqr is None:
            raise RuntimeError("Missing medians or IQR values. Please fit the transformer first.")
        
        # Apply IQR scaling to each feature
        return (X - self.medians) / self.iqr
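
The class above keeps its learned values in `iqr` and `medians` and checks for `None` by hand. scikit-learn's own estimators usually follow a slightly different convention: learned attributes get a trailing underscore, are created only in `fit`, and are checked with `check_is_fitted`. The following is a sketch of that convention, not a replacement for the class above; the name `RobustScalerSketch` and the sample data are hypothetical.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class RobustScalerSketch(BaseEstimator, TransformerMixin):
    """Hypothetical variant: median/IQR scaling with scikit-learn naming conventions."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q25, q75 = np.percentile(X, [25, 75], axis=0)
        iqr = q75 - q25
        # Constant features have IQR 0; use 1 so they are only centered
        self.iqr_ = np.where(iqr == 0, 1.0, iqr)
        self.medians_ = np.median(X, axis=0)
        return self

    def transform(self, X):
        # Raises NotFittedError if fit has not set the learned attributes
        check_is_fitted(self, ["medians_", "iqr_"])
        return (np.asarray(X, dtype=float) - self.medians_) / self.iqr_


# Second column is constant to exercise the zero-IQR branch
X_demo = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [100.0, 5.0]])

try:
    RobustScalerSketch().transform(X_demo)
    raised = False
except NotFittedError:
    raised = True

X_scaled = RobustScalerSketch().fit_transform(X_demo)
```

Calling `transform` before `fit` raises `NotFittedError`, the same exception scikit-learn's built-in estimators raise in that situation.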


#### *The Formula Behind the Class Above*
Robust scaling is used when the data contains outliers. It scales the data based on the median and the interquartile range (IQR), making it less sensitive to extreme values.

The formula for Robust Scaling is:

![Robust Scaling Formula](Robust%20Scaling.png)

```
X_scaled = (X - median(X)) / IQR
```

- `median(X)` is the median of the feature values.
- `IQR` is the interquartile range (difference between the 75th and 25th percentiles).
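
To make the formula concrete, here is a small worked example on a single feature that contains one outlier. The sample values are made up for illustration, and scikit-learn's built-in `RobustScaler` is used only as a cross-check, assuming its default settings (centering on the median and scaling by the 25th to 75th percentile range).

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with an obvious outlier (illustrative values)
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

median = np.median(X_demo, axis=0)                   # 3.0
q25, q75 = np.percentile(X_demo, [25, 75], axis=0)   # 2.0 and 4.0
iqr = q75 - q25                                      # 2.0

# Apply the formula directly
X_manual = (X_demo - median) / iqr

# Cross-check against the built-in scaler with default settings
X_sklearn = RobustScaler().fit_transform(X_demo)
```

The four ordinary values map to -1.0, -0.5, 0.0, and 0.5, while the outlier maps to 48.5. The outlier stays far away, but it does not affect how the other values are scaled, because neither the median nor the IQR depends on it.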

##### Example Usage of the Custom Transformer

In [41]:
from sklearn.datasets import make_blobs

# Generate synthetic data

X, _ = make_blobs(n_samples=100, n_features=2, centers=3, random_state=42)

# Create and fit the transformer
custom_tra = Robust_Scaler()
fit_mo = custom_tra.fit(X)

# Transform the data
X_transformed = fit_mo.transform(X)

In [33]:
custom_tra.medians

array([-2.60302683,  1.92268979])

In [36]:
simple = (X[0] - custom_tra.medians) / custom_tra.iqr

In [38]:
simple

array([-0.49872679, -0.71613207])

In [30]:
X[0]

array([-7.72642091, -8.39495682])

In [39]:
X_transformed[0]

array([-0.49872679, -0.71613207])

#### *Example of the Function-Based Approach*

In [10]:
def custom_scaler(X, scale_factor=1):
    return X * scale_factor

In [11]:
from sklearn.preprocessing import FunctionTransformer
custom_tra = FunctionTransformer(custom_scaler, kw_args={'scale_factor': 2})

In [22]:
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)

# Scale the features with the wrapped function
X_transformed = custom_tra.transform(X)

# Fit a linear regression on the scaled features
lin_reg = LinearRegression()

fit_mo = lin_reg.fit(X_transformed, y)
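
The same two steps can also be chained into a single estimator. The sketch below assumes `make_pipeline` from `sklearn.pipeline` and repeats the `custom_scaler` definition so that it runs on its own; the `baseline` model is added here only for comparison and is not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def custom_scaler(X, scale_factor=1):
    return X * scale_factor


X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)

# Chain the wrapped function and the regressor into a single estimator
model = make_pipeline(
    FunctionTransformer(custom_scaler, kw_args={"scale_factor": 2}),
    LinearRegression(),
)
model.fit(X, y)

# Baseline fitted on the unscaled feature, for comparison only
baseline = LinearRegression().fit(X, y)

# make_pipeline names each step after its lowercased class name
scaled_coef = model.named_steps["linearregression"].coef_
```

Because the feature is doubled before fitting, the learned coefficient is half the baseline coefficient, while the predictions of the two models are the same up to floating-point error.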
