### Feature Engineering on Numerical data:
- Binarization
- Rounding
- Binning
- Interactions & Transformations.
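
As a rough illustration of the rounding and binning items above, here is a minimal sketch using NumPy and pandas. The sample values, bin edges, and group labels are hypothetical and chosen only for demonstration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
ages = pd.Series([3, 17, 25, 42, 68], name="age")
ratings = np.array([3.14, 4.76, 2.49])

# Rounding: reduce precision to one decimal place.
rounded = np.round(ratings, 1)          # [3.1, 4.8, 2.5]

# Fixed-width binning: integer-divide by the bin width (decades here).
decade_bin = ages // 10                 # 0, 1, 2, 4, 6

# Custom binning: pd.cut assigns each value to a labelled interval.
# Intervals are right-inclusive by default: (0, 18], (18, 40], (40, 100].
age_group = pd.cut(ages, bins=[0, 18, 40, 100], labels=["child", "adult", "senior"])

print(rounded)
print(decade_bin.tolist())
print(list(age_group))
```

Fixed-width bins are simple but can leave some bins nearly empty; custom edges (or quantile-based edges) are usually a better fit for skewed data.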

### Feature Engineering on Categorical data:
- Ordinal data
- Nominal data

### Binarization and Label Encoding

| **Method**              | **Description**                                                              |
|-------------------------|------------------------------------------------------------------------------|
| **Binarizer**           | Binarize data (set feature values to 0 or 1) according to a threshold.       |
| **LabelBinarizer**      | Binarize labels in a one-vs-all fashion.                                     |
| **MultiLabelBinarizer** | Transform between iterable of iterables and a multilabel format.             |
| **label_binarize**      | Function form of LabelBinarizer; binarizes labels given a fixed list of classes. |


In [None]:
import sklearn.preprocessing


# Binarizer converts numerical data into binary values (0 or 1) based on a threshold.
# Values strictly greater than the threshold become 1; values less than or equal to it become 0.
binarizer = sklearn.preprocessing.Binarizer(threshold=0.0)   # threshold must be a number; 0.0 is the default

# LabelBinarizer converts categorical labels into a binary format, typically used for one-hot encoding in classification tasks.
# It transforms each label into a separate binary vector, with each class represented by its own column.
labelbinarizer = sklearn.preprocessing.LabelBinarizer()

# MultiLabelBinarizer handles multi-label classification by converting multiple labels for each sample into a binary matrix.
# Each label is represented by a separate column, and samples are encoded with multiple 1's as appropriate.
multilabelbinarizer = sklearn.preprocessing.MultiLabelBinarizer()


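# label_binarize is a function, not an estimator: it needs the labels and the full list of classes (illustrative values here).
labelbinarize = sklearn.preprocessing.label_binarize(['a', 'c', 'b'], classes=['a', 'b', 'c'])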

In [None]:
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import label_binarize

# Sample data: Categories of animals
labels = ['cat', 'dog', 'dog', 'bird', 'cat', 'bird']

label_binarizer = LabelBinarizer()
binary_labels = label_binarizer.fit_transform(labels)
print("Binarized labels using LabelBinarizer:")
print(binary_labels)


classes = ['bird', 'cat', 'dog']
binary_labels = label_binarize(labels, classes=classes)
print("\nBinarized labels using label_binarize:")
print(binary_labels)
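
The cells above only exercise `LabelBinarizer` and `label_binarize`. The following is a minimal sketch of the other two entries in the table, `Binarizer` and `MultiLabelBinarizer`. The feature values and genre labels are made-up sample data.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, MultiLabelBinarizer

# Hypothetical numeric features.
X = np.array([[0.5, 2.0],
              [1.0, 3.5]])

# Values strictly greater than the threshold map to 1, everything else to 0.
binarizer = Binarizer(threshold=1.0)
X_bin = binarizer.fit_transform(X)
print(X_bin)            # [[0. 1.]
                        #  [0. 1.]]   (1.0 is not greater than 1.0, so it maps to 0)

# Hypothetical multi-label data: each sample can carry several labels.
genres = [{"comedy", "drama"}, {"action"}, {"drama"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(genres)
print(mlb.classes_)     # ['action' 'comedy' 'drama']
print(Y)                # [[0 1 1]
                        #  [1 0 0]
                        #  [0 0 1]]
```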


### Encoding Categorical Features

| **Method**              | **Description**                                                              |
|-------------------------|------------------------------------------------------------------------------|
| **OneHotEncoder**       | Encode categorical features as a one-hot numeric array.                      |
| **OrdinalEncoder**      | Encode categorical features as an integer array. Categories are sorted by default (alphabetically for strings); pass `categories` to set an explicit order. |
| **TargetEncoder**       | Encode each category using the target values observed for it (regression and classification targets). |


#### One-hot encoder.

In [None]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder creates one 0/1 column per category.
onehotencoder = OneHotEncoder()

# Example:

data = {
    'Product_ID': [1, 2, 3, 4, 5],
    'Category': ['Electronics', 'Clothing', 'Furniture', 'Clothing', 'Electronics']
}
df = pd.DataFrame(data)

onehotencoder = OneHotEncoder()
encoded_data = onehotencoder.fit_transform(df[['Category']])
encoded_df = pd.DataFrame(encoded_data.toarray(), columns=onehotencoder.categories_[0])

df = pd.concat([df, encoded_df], axis=1)    # axis=1 appends the encoded columns side by side (the default axis=0 would stack rows)
df.head()


# <-------------------------------------------------------(or)------------------------------------------------------->
## Manual

import pandas as pd

data = {
    'City': ['New York', 'Chicago', 'New York', 'Dallas', 'Chicago']
}
df = pd.DataFrame(data)

unique_cities = df['City'].unique()     # Step 1: Get the unique categories (i.e., columns for each unique category)
for city in unique_cities:              # Step 2: Create a new column for each unique category and assign 1 or 0 based on presence
    df[city] = (df['City'] == city).astype(int)
df
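
The manual loop above can also be written with `pandas.get_dummies`. This is a minimal sketch on the same city data; note that `get_dummies` orders the new columns alphabetically, whereas the loop follows order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({"City": ["New York", "Chicago", "New York", "Dallas", "Chicago"]})

# dtype=int gives 0/1 columns (recent pandas versions default to booleans).
dummies = pd.get_dummies(df["City"], dtype=int)

# axis=1 places the indicator columns next to the original column.
df_encoded = pd.concat([df, dummies], axis=1)
print(df_encoded)
```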

#### Label encoder.

In [None]:
import pandas as pd
import sklearn.preprocessing

# LabelEncoder maps each distinct label to an integer code.
labelencoder = sklearn.preprocessing.LabelEncoder()

# Example:

# Sample dataset with 'Level' column
data = {
    'Product_ID': [1, 2, 3, 4, 5],
    'Level': ['small','large','medium','extra-large','large']
}
df = pd.DataFrame(data)

labelencoder = sklearn.preprocessing.LabelEncoder()
# Codes follow sorted (alphabetical) order, not size: extra-large=0, large=1, medium=2, small=3
df['Level_encoded'] = labelencoder.fit_transform(df['Level'])               # Apply LabelEncoder to the 'Level' column
df['Level_decoded'] = labelencoder.inverse_transform(df['Level_encoded'])   # Reverse mapping (i.e. map numbers back to the original labels)

print("\nDecoded DataFrame:\n",df)


# <-------------------------------------------------------(or)------------------------------------------------------->
## Manual



import pandas as pd

data = {
    'Product_ID': [1, 2, 3, 4, 5],
    'Level': ['low', 'high', 'medium', 'medium', 'low']
}
df = pd.DataFrame(data)

level_mapping = {'low': 1, 'medium': 2, 'high': 3}      # Define a mapping dictionary
df['Level_encoded'] = df['Level'].map(level_mapping)    # Apply the mapping using the map function
df
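
The manual mapping above can also be expressed with `OrdinalEncoder` by passing the category order explicitly. This is a minimal sketch on the same `Level` values; unlike the dictionary, which starts at 1, `OrdinalEncoder` assigns codes starting at 0 and returns floats.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"Level": ["low", "high", "medium", "medium", "low"]})

# One list of categories per input column, in the desired order.
encoder = OrdinalEncoder(categories=[["low", "medium", "high"]])

# fit_transform expects 2D input and returns a 2D array, hence [["Level"]] and ravel().
df["Level_encoded"] = encoder.fit_transform(df[["Level"]]).ravel()
print(df)   # low -> 0.0, medium -> 1.0, high -> 2.0
```

Without the `categories` argument the encoder would sort the labels alphabetically (high, low, medium), which loses the intended order.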

#### Target Encoder

In [None]:
import pandas as pd

data = {
    'City': ['New York', 'Chicago', 'New York', 'Chicago', 'Dallas'],
    'Price': [500, 300, 550, 400, 350]
}
df = pd.DataFrame(data)

city_mean = df.groupby('City')['Price'].mean()      # Step 1: Calculate the mean of the 'Price' column for each unique 'City'
df['City_encoded'] = df['City'].map(city_mean)      # Step 2: Map each city to its mean 'Price' in a new column
df

### Transformation and Feature Generation
| **Method**              | **Description**                                                       |
|-------------------------|-----------------------------------------------------------------------|
| **FunctionTransformer** | Constructs a transformer from an arbitrary callable.                  |
| **PolynomialFeatures**  | Generate polynomial and interaction features.                         |
| **PowerTransformer**    | Apply a power transform featurewise to make data more Gaussian-like.  |
| **QuantileTransformer** | Transform features using quantile information.                        |
| **SplineTransformer**   | Generate univariate B-spline bases for features.                      |


In [None]:
import sklearn.preprocessing

# FunctionTransformer wraps an arbitrary callable as a transformer.
# The arguments below are the defaults, written out explicitly.
functiontransformer = sklearn.preprocessing.FunctionTransformer(
    func=None,              # callable applied in transform; None means identity
    inverse_func=None,      # callable applied in inverse_transform; None means identity
    validate=False,
    accept_sparse=False,
    check_inverse=True,
    feature_names_out=None,
    kw_args=None,
    inv_kw_args=None,
)
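
As a usage illustration, here is a minimal sketch that wraps a log transform in a `FunctionTransformer`. The choice of `np.log1p` with `np.expm1` as its inverse, and the sample values, are assumptions made for the example.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# log1p(x) = log(1 + x) handles zeros safely; expm1 is its exact inverse.
log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

# Hypothetical right-skewed, non-negative data.
X = np.array([[0.0, 9.0],
              [99.0, 999.0]])

X_log = log_transformer.fit_transform(X)            # compress large values
X_back = log_transformer.inverse_transform(X_log)   # recover the original scale

print(X_log)
print(X_back)
```

Because the result is a regular transformer, it can be dropped into a `Pipeline` or `ColumnTransformer` like any other preprocessing step.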