***Want to do feature engineering within a ColumnTransformer or Pipeline?***
1. *Select an existing function (or write your own).*
2. *Convert it into a transformer using FunctionTransformer.*
3. *Yay, done!*

***` FunctionTransformer: `***
<br>***FunctionTransformer** is a **wrapper** that lets you use **any Python function** inside an sklearn pipeline.*

*Why is it needed?*
>*Every intermediate step in an sklearn pipeline must be a transformer, i.e. an object with `fit()` and `transform()` methods. A plain function has neither.
**`FunctionTransformer`** wraps a normal function so that it exposes both.*
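A minimal sketch of that point, assuming `np.log1p` as the function to wrap (it is not used elsewhere in this notebook and is chosen only for illustration): the bare function has no `transform()` method, while the wrapped version has both `fit()` and `transform()`.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# np.log1p is an illustrative choice; any function that accepts an array works.
log_transformer = FunctionTransformer(np.log1p)

# A plain function does not expose the transformer interface...
print(hasattr(np.log1p, "transform"))         # False

# ...but the wrapper does.
print(hasattr(log_transformer, "fit"))        # True
print(hasattr(log_transformer, "transform"))  # True

# fit() learns nothing here; transform() simply calls the wrapped function.
X = np.array([[0.0], [9.0]])
out = log_transformer.fit_transform(X)
```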

In [14]:
import pandas as pd

df = pd.DataFrame({
    "Fare": [200, 300, 50, 120, 800],
    "Code": ["X12", "C102", "Z7", "A200", "B15"],
    "Deck": ["A101", "C102", "Z7", "A200", "B15"]
})

In [15]:
from sklearn.preprocessing import FunctionTransformer

***Convert an existing function into a transformer:***

In [16]:
import numpy as np

In [None]:
# Cap Fare values between 100 and 600 so extreme outliers don’t dominate the model.
# np.clip receives the data as its first argument; kw_args supplies the bounds.
clip_values = FunctionTransformer(
    np.clip,
    kw_args={'a_min': 100, 'a_max': 600}
)

# Example: [50, 200, 800] → [100, 200, 600]
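The example in the comment above can be checked directly. This is a small sketch that rebuilds the same transformer and applies it to a hypothetical one-column array named `fares`:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Same transformer as in the cell above.
clip_values = FunctionTransformer(np.clip, kw_args={"a_min": 100, "a_max": 600})

# Hypothetical sample: one feature column with a low and a high outlier.
fares = np.array([[50], [200], [800]])

clipped = clip_values.fit_transform(fares)
print(clipped.ravel().tolist())  # [100, 200, 600]
```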

***Convert a custom function into a transformer:***

In [18]:
# Extract the first character of each string in every column of the DataFrame
def first_letter(df):
    return df.apply(lambda x: x.str.slice(0,1))

In [19]:
get_first_letter = FunctionTransformer(first_letter)
get_first_letter

Parameter,Value
func,<function fir...001A208B81DA0>
inverse_func,
validate,False
accept_sparse,False
check_inverse,True
feature_names_out,
kw_args,
inv_kw_args,
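To see what the wrapped custom function returns on its own, here is a short sketch on a hypothetical two-row frame named `sample` (the values are borrowed from `df`). With the default `validate=False`, the DataFrame is passed to `first_letter` unchanged, so the `.str` accessor is available.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def first_letter(df):
    # Keep only the first character of each string, column by column.
    return df.apply(lambda col: col.str.slice(0, 1))

get_first_letter = FunctionTransformer(first_letter)

# Hypothetical sample with two string columns.
sample = pd.DataFrame({"Code": ["X12", "C102"], "Deck": ["A101", "C102"]})

letters = get_first_letter.fit_transform(sample)
print(letters["Code"].tolist())  # ['X', 'C']
print(letters["Deck"].tolist())  # ['A', 'C']
```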


***Include them in a ColumnTransformer:***

In [21]:
from sklearn.compose import make_column_transformer

In [23]:
ct = make_column_transformer(
    (clip_values, ['Fare']),
    (get_first_letter, ['Code', 'Deck'])
)

***Apply the transformations:***

In [24]:
df

,Fare,Code,Deck
0,200,X12,A101
1,300,C102,C102
2,50,Z7,Z7
3,120,A200,A200
4,800,B15,B15


In [25]:
ct.fit_transform(df)

array([[200, 'X', 'A'],
       [300, 'C', 'C'],
       [100, 'Z', 'Z'],
       [120, 'A', 'A'],
       [600, 'B', 'B']], dtype=object)
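Once fitted, the same ColumnTransformer can be applied to rows it has never seen. The sketch below rebuilds the steps above in one place and calls `transform()` on a hypothetical `new_rows` frame (its values are invented for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import FunctionTransformer

def first_letter(df):
    # Keep only the first character of each string, column by column.
    return df.apply(lambda col: col.str.slice(0, 1))

ct = make_column_transformer(
    (FunctionTransformer(np.clip, kw_args={"a_min": 100, "a_max": 600}), ["Fare"]),
    (FunctionTransformer(first_letter), ["Code", "Deck"]),
)

train = pd.DataFrame({
    "Fare": [200, 300, 50, 120, 800],
    "Code": ["X12", "C102", "Z7", "A200", "B15"],
    "Deck": ["A101", "C102", "Z7", "A200", "B15"],
})
ct.fit(train)

# Hypothetical unseen rows: the same clipping and slicing rules are applied.
new_rows = pd.DataFrame({
    "Fare": [30, 950],
    "Code": ["Q9", "M44"],
    "Deck": ["D1", "E22"],
})
result = ct.transform(new_rows)
print(result)
```

The first row's fare is raised to 100 and the second is capped at 600, while each string column is reduced to its first character.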

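***Using FunctionTransformer means: `Wrap my function so sklearn can use it.`***<br>
*✅ the same logic runs at train and test time, because the function is a step of the fitted pipeline*<br>
*✅ no data leakage from these steps, as long as the wrapped function is stateless (it learns nothing from the data, like clipping to fixed bounds)*<br>
*✅ easier to take to production, since the preprocessing travels as part of one pipeline object*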
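The opening question also mentions Pipeline. As a minimal sketch, two FunctionTransformer steps can be chained with `make_pipeline`. The second step, `np.log1p`, is an assumption added only for illustration and is not part of the notebook above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Step 1 clips fares to [100, 600]; step 2 (illustrative) applies log(1 + x).
fare_pipeline = make_pipeline(
    FunctionTransformer(np.clip, kw_args={"a_min": 100, "a_max": 600}),
    FunctionTransformer(np.log1p),
)

fares = np.array([[50.0], [200.0], [800.0]])
transformed = fare_pipeline.fit_transform(fares)
print(transformed.ravel())
```

Each step receives the output of the previous one, so the log is taken of the clipped values 100, 200 and 600.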