In [26]:
import numpy as np
import pandas as pd
from category_encoders import TargetEncoder

### Target Encoding - It transforms categorical variables into numerical values.
For each category in a categorical feature, it computes the mean of the target variable (or another aggregate statistic, such as the median or sum) and replaces the category with that value.  
Pros:  
1. Dimensionality Reduction: Unlike one-hot encoding, it does not create a new column for each category, which is beneficial for high-cardinality categorical features.  

Cons:  
1. Overfitting: For low-frequency categories, the encoded value can be heavily influenced by a small number of samples, causing overfitting to those categories. Techniques like smoothing can help here. In smoothing, instead of using the raw category mean, we take a weighted average of the category mean and the global target mean, where the weight on the category mean grows with the category's frequency. Rare categories are therefore pulled toward the global mean.  
2. Target Leakage: The mapping must be learned from the training data only and then applied unchanged to the test data. If the mapping is learned on the full dataset, the training features carry information from the test targets, and the model may appear to perform unrealistically well on the test set.
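
To make the leakage point concrete, here is a minimal sketch of learning the mapping on the training rows only and reusing it on the test rows. It uses plain pandas and unsmoothed means rather than `TargetEncoder`, and the train/test split and the unseen category `D` are hypothetical values chosen for illustration.

```python
import pandas as pd

# Hypothetical toy split; column names mirror the example in this notebook.
train = pd.DataFrame({
    "category": ["A", "B", "A", "C", "C", "B"],
    "target": [12, 21, 14, 35, 29, 18],
})
test = pd.DataFrame({"category": ["A", "C", "D"]})  # "D" never appears in train

# Learn the mapping from the training rows only.
global_mean = train["target"].mean()
category_means = train.groupby("category")["target"].mean()

# Apply the learned mapping to both splits.
# Categories unseen during training fall back to the training global mean.
train_encoded = train["category"].map(category_means)
test_encoded = test["category"].map(category_means).fillna(global_mean)

print(test_encoded.tolist())
```

The test targets are never read, so nothing about them can reach the encoded features. Falling back to the global mean for unseen categories is one reasonable choice, not the only one.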

In [27]:
# target encoding
data = {
    'category': ['A', 'B', 'A', 'C', 'C', 'B', 'A', 'A', 'C'],
    'target': [12, 21, 14, 35, 29, 18, 11, 15, 32]
}
data = pd.DataFrame(data)
encoder = TargetEncoder(cols = ['category'])
data_encoded = encoder.fit_transform(data['category'], data['target'])
print(data_encoded)

    category
0  19.471254
1  20.596524
2  19.471254
3  22.511221
4  22.511221
5  20.596524
6  19.471254
7  19.471254
8  22.511221
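
The encoded values above sit much closer to the global target mean (about 20.78) than to the raw category means (13, 19.5 and 32). That is what smoothing does on such a small dataset. The sketch below reproduces the blend in plain Python, assuming a sigmoid weight with `min_samples_leaf = 20` and `smoothing = 10`. Those parameter values are an assumption that happens to match the printed output; they are not set anywhere in this notebook, so check the defaults of the installed `category_encoders` version.

```python
import math
from collections import defaultdict

categories = ['A', 'B', 'A', 'C', 'C', 'B', 'A', 'A', 'C']
targets = [12, 21, 14, 35, 29, 18, 11, 15, 32]

# Assumed parameter values (not stated in the notebook itself).
MIN_SAMPLES_LEAF = 20
SMOOTHING = 10.0

global_mean = sum(targets) / len(targets)

# Collect the target values observed for each category.
grouped = defaultdict(list)
for cat, y in zip(categories, targets):
    grouped[cat].append(y)

encoded = {}
for cat, ys in grouped.items():
    n = len(ys)
    cat_mean = sum(ys) / n
    # Sigmoid weight on the category mean; it grows with the category count.
    weight = 1.0 / (1.0 + math.exp(-(n - MIN_SAMPLES_LEAF) / SMOOTHING))
    encoded[cat] = weight * cat_mean + (1.0 - weight) * global_mean

for cat in sorted(encoded):
    print(cat, round(encoded[cat], 4))
```

With only 2 to 4 rows per category, the weight on the category mean stays below 0.2, so every category is pulled strongly toward the global mean.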


### Polynomial Features - Linear models might fail to capture non-linear relationships in the data. Using polynomial features allows a linear model to learn non-linear relationships.  
When to use:
1. When a linear model underfits data.  
2. When the relationship between input and output is non-linear.  
3. When we want to capture interaction effects between features.  

Considerations:  
1. Higher-degree polynomials may introduce too many features, resulting in overfitting. Regularization (Lasso, Ridge, ElasticNet) helps control this by penalizing large coefficients.  
2. Polynomial features often introduce highly correlated variables. Using PCA can help in reducing redundancy.  
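
As a rough illustration of the regularization point, the sketch below fits the same high-degree polynomial features with and without a Ridge penalty and compares the size of the learned coefficients. The data is synthetic, and the degree (8) and `alpha = 1.0` are arbitrary choices for the demonstration, not recommendations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data for illustration only: a noisy quadratic.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=1.0, size=30)

def fit_poly(regressor, degree=8):
    """Fit scaled polynomial features of the given degree, then the regressor."""
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        regressor,
    )
    return model.fit(X, y)

ols = fit_poly(LinearRegression())
ridge = fit_poly(Ridge(alpha=1.0))

# Compare the size of the learned coefficient vectors.
ols_norm = np.linalg.norm(ols[-1].coef_)
ridge_norm = np.linalg.norm(ridge[-1].coef_)
print(f"OLS coefficient norm:   {ols_norm:.2f}")
print(f"Ridge coefficient norm: {ridge_norm:.2f}")
```

The Ridge penalty shrinks the coefficient vector relative to the unpenalized fit, which is the mechanism that limits overfitting. In practice `alpha` would be chosen by cross-validation.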

When not to use:  
1. High-dimensional data: With many input features, the number of polynomial terms grows rapidly, leading to feature explosion and computational inefficiency.  
2. Noisy Data: High-degree polynomials are sensitive to noise, resulting in overfitting.  
3. Better alternatives: Decision Trees, kernel SVMs and Neural Networks can often model non-linearity more directly.

In [28]:
# polynomial features
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2], [3], [4]])
poly = PolynomialFeatures(degree = 2, include_bias = False)
X_poly = poly.fit_transform(X)
print(f"Polynomial Features: \n{X_poly}")

Polynomial Features: 
[[ 2.  4.]
 [ 3.  9.]
 [ 4. 16.]]
