# What is FunctionTransformer in Machine Learning?

FunctionTransformer is a class in scikit-learn, a popular Python library for machine learning, that wraps an arbitrary function so it can be used as a transformer. It is useful for applying custom transformations to input data in a machine learning pipeline.

FunctionTransformer takes a single callable, which is applied to the whole input array at once rather than to each sample separately. The callable can be any Python function that accepts the data as its first argument, such as a NumPy function, a lambda function, or a user-defined function. The function should return the transformed data.

In [4]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# create a dataset
x = np.array([[1,2],[3,4]])

# define the transformation function
log_transform = FunctionTransformer(np.log1p)

# apply the transformation to the dataset
x_transformed = log_transform.transform(x)

# view the transformed data
print(x_transformed)

[[0.69314718 1.09861229]
 [1.38629436 1.60943791]]
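FunctionTransformer also accepts an `inverse_func` argument, which makes the transformation reversible through `inverse_transform`. The following is a minimal sketch, assuming `np.expm1` as the inverse of `np.log1p`; the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# same toy dataset as above, stored as floats
x = np.array([[1, 2], [3, 4]], dtype=float)

# pair the forward function with its mathematical inverse
log_transform = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

# forward transform, then map the result back to the original scale
x_log = log_transform.fit_transform(x)
x_back = log_transform.inverse_transform(x_log)

# check that the round trip recovers the input
print(np.allclose(x, x_back))
```

This is handy when a model is trained on log-scaled values and the results need to be reported on the original scale.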


# FunctionTransformer and ColumnTransformer

scikit-learn offers two related tools for applying custom transformations to input data:

# 1. FunctionTransformer
This transformer allows you to specify a single function that will be applied to the entire input data matrix. This transformer can be useful for feature scaling or feature extraction.
# 2. ColumnTransformer
This transformer, found in `sklearn.compose`, allows you to apply a different transformer to each column or subset of columns in the input data matrix. It is useful for applying different transformations to different features in a dataset, and a FunctionTransformer can be one of the transformers it applies.

Both of these transformers are part of the scikit-learn library in Python and can be used in a machine learning pipeline to preprocess data before training a model.
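The two transformers are often used together. The sketch below assumes a small made-up array whose second column is skewed; it applies `np.log1p` to that column only and passes the first column through unchanged.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

# hypothetical data: column 0 is left alone, column 1 is skewed
X = np.array([[1.0, 10.0], [3.0, 100.0]])

# apply log1p to column 1 only; keep every other column as it is
ct = ColumnTransformer(
    transformers=[("log", FunctionTransformer(np.log1p), [1])],
    remainder="passthrough",
)

X_out = ct.fit_transform(X)

# transformed columns come first, passthrough columns follow
print(X_out)
```

Note that the column order changes: the log-transformed column is placed before the passthrough column in the output.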

# When should I use FunctionTransformer in machine learning?

We might consider using a FunctionTransformer in a machine learning pipeline in the following situations:

Custom feature engineering: If you want to engineer new features using a custom function, you can use a FunctionTransformer to apply the function to the input data matrix and create new features based on the output.

Scaling and normalization: If you want to scale or normalize the input data matrix in a custom way, you can use a FunctionTransformer to apply a custom scaling or normalization function.

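Data cleaning: If you want to clean the input data matrix by clipping outliers, filling missing values with a fixed value, or replacing certain values, you can use a FunctionTransformer to apply a custom cleaning function. The function should keep the number of rows unchanged, because a pipeline does not drop the matching target values.

Dimensionality reduction: If you want to reduce the dimensionality of the input data matrix by selecting a subset of features or by applying a dimensionality reduction technique such as PCA, you can use a FunctionTransformer to apply the custom function. Because FunctionTransformer is stateless, a technique that has to be fitted, such as PCA, is refit on every call when wrapped this way, so in a real pipeline it is usually better to use the estimator itself as a step.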

In general, a FunctionTransformer can be useful for any situation in which you want to apply a custom function to the input data matrix before training a machine learning model.
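Custom functions often need parameters of their own. FunctionTransformer forwards extra keyword arguments to the function through its `kw_args` parameter. The sketch below uses a hypothetical helper, `clip_values`, with made-up bounds to clip outliers.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# hypothetical helper: limit every value to the range [lower, upper]
def clip_values(X, lower, upper):
    return np.clip(X, lower, upper)

# kw_args supplies the extra arguments each time the function is called
clipper = FunctionTransformer(clip_values, kw_args={"lower": 0, "upper": 10})

X = np.array([[-5, 3], [7, 42]])
X_clipped = clipper.transform(X)

print(X_clipped)
```

Keeping the bounds in `kw_args` rather than hard-coding them in the function makes the same helper reusable with different limits.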

# Practical Use Cases

# 1. Custom Feature Engineering

In [6]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# create a dataset
x = np.array([[1,2],[3,4]])

# define the transformation function
def my_feature_engineering(x):
    return np.hstack((x,x**2))

# create a FunctionTransformer to apply the custom function
custom_transformer = FunctionTransformer(my_feature_engineering)

# apply the transformer to the input data
x_transformed = custom_transformer.transform(x)

# view the transformed data
print(x_transformed)

[[ 1  2  1  4]
 [ 3  4  9 16]]


# 2. Scaling and Normalization

In [7]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# create a dataset
x = np.array([[1,2],[3,4]])

# define the custom scaling function
def my_scaling(x):
    return x/np.max(x)

# create a FunctionTransformer to apply the custom function
custom_transformer = FunctionTransformer(my_scaling)

# apply the transformer to the input data
x_transformed = custom_transformer.transform(x)

# view the transformed data
print(x_transformed)

[[0.25 0.5 ]
 [0.75 1.  ]]


# 3. Data Cleaning

In [8]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# create a dataset with missing values
x = np.array([[1,2],[3,np.nan]])

# define the custom cleaning function
def my_cleaning(x):
    # work on a copy so the caller's array is not modified in place
    x = x.copy()
    x[np.isnan(x)] = 0
    return x

# create a FunctionTransformer to apply the custom function
custom_transformer = FunctionTransformer(my_cleaning)

# apply the transformer to the input data
x_transformed = custom_transformer.transform(x)

# view the transformed data
print(x_transformed)

[[1. 2.]
 [3. 0.]]


# 4. Dimensionality Reduction

In [1]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# create a dataset
X = np.array([[1, 2], [3, 4]])

# define a custom PCA function
# note: PCA is refit on whatever data is passed in, so this is for illustration only
def my_pca(X):
    pca = PCA(n_components=1)
    X_pca = pca.fit_transform(X)
    return X_pca

# create a FunctionTransformer to apply the custom function
custom_transformer = FunctionTransformer(my_pca)

# apply the transformer to the input data
X_transformed = custom_transformer.transform(X)

# view the transformed data
print(X_transformed)

[[-1.41421356]
 [ 1.41421356]]
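When the reduction has to be learned on training data and then reused on new data, PCA can be placed in a Pipeline as its own step, with FunctionTransformer handling only the stateless part. The following is a minimal sketch under that assumption; the training and new arrays are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# hypothetical training data and one new sample
X_train = np.array([[1, 2], [3, 4], [5, 7], [8, 9]], dtype=float)
X_new = np.array([[2, 3]], dtype=float)

# stateless log step followed by a PCA step that is fitted once
pipe = Pipeline([
    ("log", FunctionTransformer(np.log1p)),
    ("pca", PCA(n_components=1)),
])

# the principal component is learned from the training data only
pipe.fit(X_train)

# the new sample is projected with the component learned above
X_new_reduced = pipe.transform(X_new)
print(X_new_reduced.shape)
```

With this layout the projection applied to new data is the one learned during `fit`, rather than one recomputed from the new data.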


# Real-Life Use Cases of FunctionTransformer

There are many real-life use cases where FunctionTransformer can be useful in machine learning pipelines. Here are a few examples:

1. Image processing: In computer vision applications, FunctionTransformer can be used to apply custom functions to preprocess image data. For example, a custom function can be used to resize images, change the color balance, or apply filters to improve image quality.

2. Natural language processing: In NLP applications, FunctionTransformer can be used to preprocess text data by applying custom functions to perform tasks such as tokenization, stemming, or removing stop words.

3. Financial modeling: In finance, FunctionTransformer can be used to preprocess financial data by applying custom functions to transform the data, such as scaling stock prices, normalizing financial ratios, or imputing missing values.

4. Audio signal processing: In speech recognition or music analysis applications, FunctionTransformer can be used to preprocess audio data by applying custom functions to perform tasks such as filtering noise, extracting features such as MFCCs (Mel-frequency cepstral coefficients), or resampling the audio signal.

5. Sensor data processing: In Internet of Things (IoT) applications, FunctionTransformer can be used to preprocess sensor data by applying custom functions to remove outliers, impute missing values, or rescale sensor readings.
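As a small illustration of the text preprocessing case, the sketch below wraps a hypothetical `normalize_text` helper that trims and lowercases a list of strings. It relies on the default `validate=False`, under which the input is handed to the function without being converted to a numeric array.

```python
from sklearn.preprocessing import FunctionTransformer

# hypothetical helper: strip surrounding whitespace and lowercase each document
def normalize_text(docs):
    return [doc.strip().lower() for doc in docs]

text_cleaner = FunctionTransformer(normalize_text)

docs = ["  Hello World ", "FunctionTransformer IS Stateless"]
cleaned = text_cleaner.transform(docs)

print(cleaned)
```

A step like this would typically sit in front of a vectorizer in a text pipeline.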

In [9]:
import numpy as np
import pandas as pd

In [10]:
df = pd.read_csv("C:\\Users\\91636\\OneDrive\\Desktop\\Regex ML\\Data\\covid_toy.csv")

In [11]:
df.head(2)

|  | age | gender | fever | cough | city | has_covid |
|---|---|---|---|---|---|---|
| 0 | 60 | Male | 103.0 | Mild | Kolkata | No |
| 1 | 27 | Male | 100.0 | Mild | Delhi | Yes |


In [12]:
from sklearn.preprocessing import LabelEncoder

In [17]:
lb = LabelEncoder()

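In [18]:
# the same encoder is refit for each column; scikit-learn intends LabelEncoder
# for target labels, and OrdinalEncoder is the equivalent for feature columns
df['gender'] = lb.fit_transform(df['gender'])
df['cough'] = lb.fit_transform(df['cough'])
df['city'] = lb.fit_transform(df['city'])
df['has_covid'] = lb.fit_transform(df['has_covid'])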

In [23]:
df.head()

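|  | age | gender | fever | cough | city | has_covid |
|---|---|---|---|---|---|---|
| 0 | 60 | 1 | 103.0 | 0 | 2 | 0 |
| 1 | 27 | 1 | 100.0 | 0 | 1 | 1 |
| 2 | 42 | 1 | 101.0 | 0 | 1 | 0 |
| 3 | 31 | 0 | 98.0 | 0 | 2 | 0 |
| 4 | 65 | 0 | 101.0 | 0 | 3 | 0 |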


In [19]:
from sklearn.preprocessing import FunctionTransformer

In [20]:
trf = FunctionTransformer(func = np.log1p)

In [21]:
new_df = trf.fit_transform(df)

In [22]:
new_df

|  | age | gender | fever | cough | city | has_covid |
|---|---|---|---|---|---|---|
| 0 | 4.110874 | 0.693147 | 4.644391 | 0.000000 | 1.098612 | 0.000000 |
| 1 | 3.332205 | 0.693147 | 4.615121 | 0.000000 | 0.693147 | 0.693147 |
| 2 | 3.761200 | 0.693147 | 4.624973 | 0.000000 | 0.693147 | 0.000000 |
| 3 | 3.465736 | 0.000000 | 4.595120 | 0.000000 | 1.098612 | 0.000000 |
| 4 | 4.189655 | 0.000000 | 4.624973 | 0.000000 | 1.386294 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | 2.564949 | 0.000000 | 4.653960 | 0.000000 | 0.000000 | 0.000000 |
| 96 | 3.951244 | 0.000000 | 4.624973 | 0.693147 | 1.098612 | 0.693147 |
| 97 | 3.044522 | 0.000000 | 4.624973 | 0.000000 | 0.000000 | 0.000000 |
| 98 | 1.791759 | 0.000000 | 4.595120 | 0.693147 | 1.386294 | 0.000000 |
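The cells above apply `np.log1p` to every column, including the label-encoded categorical ones, where a logarithm of an arbitrary category code has no clear meaning. A minimal sketch of restricting the transform to the numeric columns follows. It rebuilds a small frame from the five rows shown by `df.head()` instead of reading the CSV, and the names `toy`, `toy_log`, and `numeric_cols` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# small frame rebuilt from the first five rows shown above
toy = pd.DataFrame({
    "age": [60, 27, 42, 31, 65],
    "gender": [1, 1, 1, 0, 0],
    "fever": [103.0, 100.0, 101.0, 98.0, 101.0],
})

# only these columns are log-transformed
numeric_cols = ["age", "fever"]

trf = FunctionTransformer(func=np.log1p)

# transform the numeric columns and leave the encoded column untouched
toy_log = toy.copy()
toy_log[numeric_cols] = trf.fit_transform(toy[numeric_cols])

print(toy_log)
```

The encoded `gender` column keeps its original codes, while `age` and `fever` are replaced by their `log1p` values.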


In [26]:
df = pd.read_csv("C:\\Users\\91636\\OneDrive\\Desktop\\Regex ML\\Data\\tips.csv")

In [27]:
df.head(2)

|  | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |


In [28]:
from sklearn.preprocessing import LabelEncoder

In [29]:
lb = LabelEncoder()

In [30]:
df['sex'] = lb.fit_transform(df['sex'])
df['smoker'] = lb.fit_transform(df['smoker'])
df['day'] = lb.fit_transform(df['day'])
df['time'] = lb.fit_transform(df['time'])

In [31]:
df.head()

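|  | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | 0 | 0 | 2 | 0 | 2 |
| 1 | 10.34 | 1.66 | 1 | 0 | 2 | 0 | 3 |
| 2 | 21.01 | 3.5 | 1 | 0 | 2 | 0 | 3 |
| 3 | 23.68 | 3.31 | 1 | 0 | 2 | 0 | 2 |
| 4 | 24.59 | 3.61 | 0 | 0 | 2 | 0 | 4 |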


In [32]:
from sklearn.preprocessing import FunctionTransformer

In [33]:
tf = FunctionTransformer(func = np.log1p)

In [34]:
new_df = tf.fit_transform(df)

In [35]:
new_df

|  | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 2.889816 | 0.698135 | 0.000000 | 0.000000 | 1.098612 | 0.0 | 1.098612 |
| 1 | 2.428336 | 0.978326 | 0.693147 | 0.000000 | 1.098612 | 0.0 | 1.386294 |
| 2 | 3.091497 | 1.504077 | 0.693147 | 0.000000 | 1.098612 | 0.0 | 1.386294 |
| 3 | 3.205993 | 1.460938 | 0.693147 | 0.000000 | 1.098612 | 0.0 | 1.098612 |
| 4 | 3.242202 | 1.528228 | 0.000000 | 0.000000 | 1.098612 | 0.0 | 1.609438 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 3.402197 | 1.934416 | 0.693147 | 0.000000 | 0.693147 | 0.0 | 1.386294 |
| 240 | 3.338613 | 1.098612 | 0.000000 | 0.693147 | 0.693147 | 0.0 | 1.098612 |
| 241 | 3.164208 | 1.098612 | 0.693147 | 0.693147 | 0.693147 | 0.0 | 1.098612 |
| 242 | 2.934920 | 1.011601 | 0.693147 | 0.000000 | 0.693147 | 0.0 | 1.098612 |

