# Feature Transformation with FunctionTransformer | [Link](https://github.com/AdilShamim8/50-Days-of-Machine-Learning/tree/main/Day%2014%20Function%20Transformer)

In feature engineering, transforming your data can help it better satisfy the assumptions of many machine learning algorithms. For example, linear models often perform better when features are not heavily skewed. The **FunctionTransformer** from scikit-learn wraps any custom function so it can be applied to your data as a step in a preprocessing pipeline.
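
As a minimal sketch of the pipeline idea, the snippet below puts a `FunctionTransformer` in front of a regressor. The synthetic log-normal feature, the linear target, and the choice of `LinearRegression` are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LinearRegression

# Synthetic, right-skewed feature (illustrative data only)
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 1))
# Assumed target: linear in log(1 + X), plus a little noise
y = 3.0 * np.log1p(X[:, 0]) + rng.normal(scale=0.1, size=200)

# FunctionTransformer is stateless: fit() learns nothing, and transform()
# applies np.log1p element-wise before the regressor sees the data
model = make_pipeline(FunctionTransformer(np.log1p), LinearRegression())
model.fit(X, y)

print(model.score(X, y))       # R^2 on the training data
print(model[-1].coef_)         # slope on the log-transformed feature
```

Because the transform lives inside the pipeline, the same `np.log1p` step is applied automatically when you later call `model.predict` on new data.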

Below, we detail three types of transformations: Log Transform, Reciprocal Transform, and Square Root Transform.

---

## 1. Log Transform

### Formula

For a feature x, the log transformation is defined as:  

$$  
x' = \log(x + c)  
$$  

- x is the original value.  
- c is a constant (often 1) chosen so that x + c > 0, because the logarithm is undefined at zero and for negative numbers.  
- x' is the transformed value.  
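
A quick numeric check of why the shift matters, using the same sample values as the example below. It also confirms that NumPy's `np.log1p(x)` equals log(x + 1), i.e. the formula with c = 1 (`np.log1p` is additionally more accurate than `np.log(1 + x)` for very small x).

```python
import numpy as np

x = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])

# np.log(0) evaluates to -inf (NumPy warns rather than raising), so the unshifted log breaks on zeros
with np.errstate(divide="ignore"):
    raw_log = np.log(x)

# With c = 1 every argument is at least 1, so every output is finite and non-negative
shifted_log = np.log(x + 1)
log1p = np.log1p(x)

print(raw_log[0])                       # -inf
print(np.round(log1p, 3))
print(np.allclose(shifted_log, log1p))  # True
```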

### When to Use

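- **Reduce right-skewness:** A long tail of large values is pulled in much more than the bulk of the data.
- **Stabilize variance:** Useful when the spread of the data grows with its magnitude.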
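
To see the skew reduction numerically, here is a small sketch on synthetic log-normal data. The distribution, seed, and sample size are illustrative assumptions. pandas' `Series.skew()` reports sample skewness, where values near 0 indicate a roughly symmetric distribution.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed sample (log-normal), used only for illustration
rng = np.random.default_rng(42)
values = pd.Series(rng.lognormal(mean=2.0, sigma=1.0, size=1_000))

skew_before = values.skew()
# np.log1p works element-wise on a Series and returns a Series
skew_after = np.log1p(values).skew()

print(f"skew before: {skew_before:.2f}")
print(f"skew after:  {skew_after:.2f}")
```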

### Python Code Example

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer
import matplotlib.pyplot as plt

# Create a sample dataset
df = pd.DataFrame({
    'Value': [0, 1, 10, 100, 1000]
})

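# Define the log transformation using np.log1p (computes log(1 + x))
log_transformer = FunctionTransformer(np.log1p, validate=True)

# Apply the log transformation: fit_transform() fits the (stateless) transformer first,
# and ravel() flattens the (n, 1) array it returns into a single column
df['Log_Transformed'] = log_transformer.fit_transform(df[['Value']]).ravel()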

print("Log Transform:\n", df)

# Visualize the transformation
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(df['Value'], marker='o')
plt.title("Original Data")
plt.xlabel("Index")
plt.ylabel("Value")

plt.subplot(1, 2, 2)
plt.plot(df['Log_Transformed'], marker='o', color='green')
plt.title("Log Transformed Data")
plt.xlabel("Index")
plt.ylabel("log(Value + 1)")
plt.tight_layout()
plt.show()
```
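
If you need predictions or transformed values back on the original scale, `FunctionTransformer` also accepts an `inverse_func`. The sketch below pairs `np.log1p` with its exact inverse `np.expm1`; this is one possible setup, not something the example above requires.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# np.expm1 computes exp(x) - 1, the exact inverse of np.log1p
log_transformer = FunctionTransformer(np.log1p, inverse_func=np.expm1, validate=True)

X = np.array([[0.0], [1.0], [10.0], [100.0], [1000.0]])
X_log = log_transformer.fit_transform(X)
X_back = log_transformer.inverse_transform(X_log)

print(np.allclose(X_back, X))  # True
```

By default (`check_inverse=True`), `fit()` checks on a sample of the data that `inverse_func(func(X))` reproduces `X` and warns if it does not.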

---

## 2. Reciprocal Transform

### Formula

The reciprocal transformation is defined as:

$$  
x' = \frac{1}{x + c}  
$$  

- x is the original value.  
- c is a constant added to avoid division by zero.  
- x' is the transformed value.  
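
A short numeric illustration of the role of c, again on the tutorial's sample values. With c = 1 and non-negative x, every denominator is at least 1, so the outputs stay finite and fall in (0, 1].

```python
import numpy as np

x = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])

# Without a shift, 1/0 evaluates to inf (NumPy warns instead of raising)
with np.errstate(divide="ignore"):
    unshifted = 1 / x

# With c = 1 the zero maps to 1 and larger values map toward 0
c = 1
shifted = 1 / (x + c)

print(unshifted[0])          # inf
print(np.round(shifted, 4))
```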

### When to Use

- **Compress high values:** Very large values are all mapped close to zero, which limits their influence.
- **Reverse order:** Larger original values become smaller after transformation.
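
Both properties can be checked directly. This sketch assumes positive inputs and c = 1:

```python
import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0])
recip = 1 / (x + 1)

# Order reverses: increasing x gives strictly decreasing 1/(x + 1)
print(np.all(np.diff(recip) < 0))  # True

# Compression: the 900-unit gap between 100 and 1000 shrinks to under 0.01
print(recip[2] - recip[3])
```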

### Python Code Example

```python
# Define the reciprocal transformation function
def reciprocal_transform(x, c=1):
    return 1 / (x + c)

# Create a FunctionTransformer for the reciprocal function
reciprocal_transformer = FunctionTransformer(func=lambda x: reciprocal_transform(x, c=1), validate=True)

# Apply the reciprocal transformation
df['Reciprocal_Transformed'] = reciprocal_transformer.transform(df[['Value']])

print("Reciprocal Transform:\n", df)

# Plot the reciprocal transformation
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(df['Value'], marker='o')
plt.title("Original Data")
plt.xlabel("Index")
plt.ylabel("Value")

plt.subplot(1, 2, 2)
plt.plot(df['Reciprocal_Transformed'], marker='o', color='red')
plt.title("Reciprocal Transformed Data")
plt.xlabel("Index")
plt.ylabel("1/(Value + 1)")
plt.tight_layout()
plt.show()
```

---

## 3. Square Root Transform

### Formula

The square root transformation is defined as:

$$  
x' = \sqrt{x + c}  
$$  

- x is the original value.  
- c is a constant added to shift negative values to zero or above; since √0 = 0, c = 0 is sufficient for non-negative data.  
- x' is the transformed value.  
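
For a feature that does contain negative values, one common choice is c = -min(x), so that the smallest value maps to 0. The sample values below are hypothetical.

```python
import numpy as np

# Hypothetical feature that contains negative values
x = np.array([-4.0, -1.0, 0.0, 5.0, 12.0])

# Shift by c = -min(x) so the smallest value maps to 0 and sqrt is defined everywhere
c = -x.min()
x_sqrt = np.sqrt(x + c)

print(c)       # 4.0
print(x_sqrt)  # square roots of [0, 3, 4, 9, 16]
```

In a real workflow, c would be computed on the training data only; a test value below the training minimum would still make x + c negative and produce `nan`.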

### When to Use

- **Moderate variance stabilization:** Useful when data values are moderately skewed.
- **Compress scale:** Reduces the range of data, but less aggressively than a log transform.
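
One way to compare how strongly each transform compresses the scale is to look at the ratio between the two largest sample values after transforming. This is an illustrative sketch on the tutorial's sample values:

```python
import numpy as np

x = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])

sqrt_x = np.sqrt(x)
log_x = np.log1p(x)

# Ratio of largest to second-largest value after each transform:
# 10 on the raw scale, sqrt(10) (about 3.16) for sqrt, and about 1.5 for log1p
for name, vals in [("raw", x), ("sqrt", sqrt_x), ("log1p", log_x)]:
    print(f"{name:>5}: max = {vals.max():8.3f}, ratio = {vals[-1] / vals[-2]:.3f}")
```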

### Python Code Example

```python
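import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import FunctionTransformer

# Reuses `df` from the log-transform example above

# Define the square root transformation function
def sqrt_transform(x, c=0):
    return np.sqrt(x + c)

# Pass c through kw_args instead of a lambda so the transformer stays picklable
sqrt_transformer = FunctionTransformer(sqrt_transform, kw_args={"c": 0}, validate=True)

# Apply the square root transformation (ravel() flattens the (n, 1) output into one column)
df['Sqrt_Transformed'] = sqrt_transformer.fit_transform(df[['Value']]).ravel()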

print("Square Root Transform:\n", df)

# Plot the square root transformation
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(df['Value'], marker='o')
plt.title("Original Data")
plt.xlabel("Index")
plt.ylabel("Value")

plt.subplot(1, 2, 2)
plt.plot(df['Sqrt_Transformed'], marker='o', color='purple')
plt.title("Square Root Transformed Data")
plt.xlabel("Index")
plt.ylabel("sqrt(Value)")
plt.tight_layout()
plt.show()
```

---

## Conclusion

Using the **FunctionTransformer** in scikit-learn makes it easy to integrate custom transformations into your machine learning pipeline. The log, reciprocal, and square root transforms are useful tools for reducing skewness and stabilizing variance, which can improve the performance of models that are sensitive to the distribution of their inputs, such as linear models. Adjust the constant c based on the range of your data: x + c must be positive for the log transform, non-zero for the reciprocal, and non-negative for the square root.
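
As a sketch of how the three transforms might be combined in one preprocessing step, the example below uses scikit-learn's `ColumnTransformer` to apply a different `FunctionTransformer` to each column. The column names and values are hypothetical, and matching each transform to a column is an illustrative assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

def reciprocal_transform(x, c=1):
    return 1 / (x + c)

def sqrt_transform(x, c=0):
    return np.sqrt(x + c)

# Hypothetical columns; each gets the transform assumed to suit its distribution
df = pd.DataFrame({
    "income": [0, 1, 10, 100, 1000],
    "wait_time": [0, 2, 4, 8, 16],
    "n_visits": [0, 1, 4, 9, 16],
})

preprocess = ColumnTransformer([
    ("log", FunctionTransformer(np.log1p), ["income"]),
    ("recip", FunctionTransformer(reciprocal_transform, kw_args={"c": 1}), ["wait_time"]),
    ("sqrt", FunctionTransformer(sqrt_transform, kw_args={"c": 0}), ["n_visits"]),
])

# Output columns follow the order of the transformers listed above
X_out = preprocess.fit_transform(df)
print(X_out)
```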