
In many machine learning problems, the relationship between the input features and the target variable is not purely linear. By combining features, we can expose relationships that a model would miss if it considered each feature independently. For example, a linear model that adds up separate contributions from `x1` and `x2` cannot represent a target that depends on their product.

A feature cross is a new feature created by combining two or more existing features. It captures the interaction between the original features, meaning the part of their joint effect on the target that cannot be expressed as a sum of their individual effects.

For a more detailed explanation, see https://krishna-yogik.medium.com/feature-cross-a-deep-dive-with-practical-examples-4e87a373f117
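To make the idea concrete, here is a small sketch that does not come from the linked article. It uses synthetic data whose target is the product of two features; the data, random seed, and sample size are arbitrary choices for illustration. A linear model on the raw features cannot fit this target. Appending the single cross `x1 * x2` as an extra column raises R² to essentially 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): the target depends only on x1 * x2
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0] * X[:, 1]

# Linear model on the raw features: it can only learn w1*x1 + w2*x2 + b
r2_raw = LinearRegression().fit(X, y).score(X, y)

# Same model with the feature cross appended as a third column
X_crossed = np.column_stack([X, X[:, 0] * X[:, 1]])
r2_crossed = LinearRegression().fit(X_crossed, y).score(X_crossed, y)

print(f"R^2 without the cross: {r2_raw:.3f}")
print(f"R^2 with the cross:    {r2_crossed:.3f}")
```

The model itself is unchanged between the two fits. Only the input representation differs, which is the whole point of a feature cross.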

**1. Polynomial Feature Crosses**

Polynomial feature crosses are created by raising existing features to a power or multiplying them together. For a single feature x, the polynomial terms are x², x³, and so on. With two features x and y, a degree-2 expansion also includes the cross term xy. These features are useful in regression models where the relationship between the features and the target is nonlinear: the model stays linear in its coefficients but can fit curves in the original inputs.

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Sample data
data = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10]})

# Degree-2 expansion: bias (1), x, y, x^2, x*y, y^2
poly = PolynomialFeatures(degree=2)
poly_features = poly.fit_transform(data)

# Convert to a DataFrame; get_feature_names_out() returns the generated column
# names in output order, so the labels cannot drift out of sync with the data
poly_df = pd.DataFrame(poly_features, columns=poly.get_feature_names_out())

print(poly_df)
```

```
     1    x     y   x^2   x y    y^2
0  1.0  1.0   6.0   1.0   6.0   36.0
1  1.0  2.0   7.0   4.0  14.0   49.0
2  1.0  3.0   8.0   9.0  24.0   64.0
3  1.0  4.0   9.0  16.0  36.0   81.0
4  1.0  5.0  10.0  25.0  50.0  100.0
```
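Polynomial features are usually fed into an ordinary linear model. The sketch below uses synthetic, noise-free data generated from an assumed curve, y = 3x² - 2x + 1. It chains `PolynomialFeatures` and `LinearRegression` with `make_pipeline`. The model is still linear in its coefficients, yet it recovers the quadratic that a plain linear fit misses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data from an assumed quadratic: y = 3x^2 - 2x + 1
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 3 * x[:, 0] ** 2 - 2 * x[:, 0] + 1

# Plain linear regression on x alone
linear = LinearRegression().fit(x, y)
linear_r2 = linear.score(x, y)

# Expand x into [x, x^2], then fit the same linear model on the expanded features
quadratic = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
quadratic.fit(x, y)
quadratic_r2 = quadratic.score(x, y)

# make_pipeline names each step after its lowercased class name
reg = quadratic.named_steps["linearregression"]
coefs, intercept = reg.coef_, reg.intercept_

print(f"linear R^2:    {linear_r2:.3f}")
print(f"quadratic R^2: {quadratic_r2:.3f}")
print("coefficients for [x, x^2]:", np.round(coefs, 3), "intercept:", round(intercept, 3))
```

`include_bias=False` drops the constant column because `LinearRegression` already fits its own intercept.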


**2. Interaction Feature Crosses**

Interaction feature crosses are created by multiplying two or more features together. For example, given the features age and income, an interaction cross is age * income. It captures an interaction effect, where the influence of one feature on the target depends on the value of the other. Such crosses are often used in linear and logistic regression models, which otherwise treat each feature's contribution as independent.


```python
import pandas as pd

# Sample data
data = pd.DataFrame({'age': [25, 32, 47, 51, 62], 'income': [50000, 60000, 80000, 90000, 120000]})

# Create the interaction feature as the element-wise product of the two columns
data['age_income'] = data['age'] * data['income']

print(data)
```

```
   age  income  age_income
0   25   50000     1250000
1   32   60000     1920000
2   47   80000     3760000
3   51   90000     4590000
4   62  120000     7440000
```


**3. Binning and One-Hot Encoding Crosses**


This type of cross is built by binning continuous features into categories and then crossing the binned features with other categorical features. For example, suppose you bin age into young, middle-aged, and old, and have a feature income_level with categories low, medium, and high. You can then create crossed features such as young_low_income and middle-aged_medium_income. This is useful for decision-tree-based models and other models that handle categorical features well.

```python
import pandas as pd

# Sample data
data = pd.DataFrame({'age': [25, 32, 47, 51, 62], 'income_level': ['low', 'medium', 'medium', 'high', 'high']})

# Binning age into categories: (0, 30] young, (30, 50] middle-aged, (50, 100] old
bins = [0, 30, 50, 100]
labels = ['young', 'middle-aged', 'old']
data['age_group'] = pd.cut(data['age'], bins=bins, labels=labels)

# One-hot encode each categorical feature separately (this step alone does not cross them)
data_encoded = pd.get_dummies(data, columns=['age_group', 'income_level'])

print(data_encoded)
```

```
   age  age_group_young  age_group_middle-aged  age_group_old  \
0   25             True                  False          False   
1   32            False                   True          False   
2   47            False                   True          False   
3   51            False                  False           True   
4   62            False                  False           True   

   income_level_high  income_level_low  income_level_medium  
0              False              True                False  
1              False             False                 True  
2              False             False                 True  
3               True             False                False  
4               True             False                False  
```
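The one-hot step above encodes `age_group` and `income_level` separately; it does not yet cross them. The sketch below performs the cross itself, assuming the same data and bins. It concatenates the two categories into a single value per row and one-hot encodes that value. The code also declares the full list of combinations as categories. That is a design choice, not a requirement, and it gives every pair a stable column, including pairs absent from the sample. The column name `age_x_income` is hypothetical.

```python
import pandas as pd

data = pd.DataFrame({'age': [25, 32, 47, 51, 62],
                     'income_level': ['low', 'medium', 'medium', 'high', 'high']})

age_labels = ['young', 'middle-aged', 'old']
income_labels = ['low', 'medium', 'high']
data['age_group'] = pd.cut(data['age'], bins=[0, 30, 50, 100], labels=age_labels)

# The cross: one combined category per (age_group, income_level) pair
data['age_x_income'] = data['age_group'].astype(str) + '_' + data['income_level'] + '_income'

# Declare all 3 x 3 combinations so unseen pairs still get an (all-False) column
all_pairs = [f'{a}_{i}_income' for a in age_labels for i in income_labels]
data['age_x_income'] = pd.Categorical(data['age_x_income'], categories=all_pairs)

crossed = pd.get_dummies(data['age_x_income'])
print(data['age_x_income'].tolist())
print(crossed.columns.tolist())
```

Without the explicit categories, `get_dummies` would create columns only for the three pairs that occur in this sample. A model trained on those columns would then see a mismatched layout if a new pair such as `young_high_income` appeared later.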


**4. Crossed Embedding Features**


Crossed embedding features are used mainly in deep learning models, especially those that handle sparse, high-cardinality categorical data, such as recommender systems. The technique combines the embeddings of categorical features to create new, informative features. For instance, in a movie recommendation system, combining the embeddings of user_id and movie_id lets the model learn how particular users interact with particular movies.

```python
import tensorflow as tf

# Sample data
user_ids = tf.constant([1, 2, 3, 4, 5])
movie_ids = tf.constant([101, 102, 103, 104, 105])

# Embedding layers (input_dim must exceed the largest ID: 10 > 5 and 200 > 105)
user_embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=4)
movie_embedding = tf.keras.layers.Embedding(input_dim=200, output_dim=4)

# Look up one 4-dimensional vector per ID; weights are randomly initialized (untrained),
# so the printed values differ from run to run
user_embeds = user_embedding(user_ids)
movie_embeds = movie_embedding(movie_ids)

# Combine the embeddings by concatenation: (5, 4) + (5, 4) -> (5, 8)
crossed_embeds = tf.concat([user_embeds, movie_embeds], axis=1)

print(crossed_embeds)
```

```
tf.Tensor(
[[ 1.86966769e-02 -1.59388892e-02 -2.24998351e-02  1.09314695e-02
   2.63718851e-02  1.63127109e-03 -3.28007825e-02  4.58093397e-02]
 [ 1.79199092e-02 -3.05398591e-02  6.36471435e-03  4.44461815e-02
   2.17534564e-02  3.34068388e-03 -1.65980570e-02  4.57465984e-02]
 [ 1.61955617e-02 -1.36791840e-02  2.21272595e-02  4.89951111e-02
  -3.19010504e-02  9.12203640e-03 -3.55796888e-03 -1.78154223e-02]
 [ 4.26277183e-02  3.81649621e-02  1.74470805e-02 -1.37405023e-02
  -2.13567857e-02 -3.35861333e-02  3.17292847e-02 -4.55721989e-02]
 [-4.14873734e-02  1.10461600e-02 -2.13115346e-02  3.71809863e-02
  -6.82752207e-03 -6.61611557e-05  7.11059570e-03  1.97275393e-02]], shape=(5, 8), dtype=float32)
```
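Concatenation places the two embeddings side by side and leaves it to later layers to learn how they interact. A common alternative models the interaction explicitly with the element-wise (Hadamard) product. Summing that product gives the dot-product score used in matrix factorization. The sketch below uses NumPy rather than TensorFlow to stay framework-free. Random tables stand in for embeddings that a real model would learn during training, and the table sizes mirror the TensorFlow example above.

```python
import numpy as np

rng = np.random.default_rng(42)
num_users, num_movies, dim = 10, 200, 4

# Stand-ins for learned embedding tables (random here, trained in a real model)
user_table = rng.normal(scale=0.05, size=(num_users, dim))
movie_table = rng.normal(scale=0.05, size=(num_movies, dim))

user_ids = np.array([1, 2, 3, 4, 5])
movie_ids = np.array([101, 102, 103, 104, 105])

# Embedding lookup is just row indexing into the table
user_embeds = user_table[user_ids]     # shape (5, 4)
movie_embeds = movie_table[movie_ids]  # shape (5, 4)

# Option 1: concatenation, as in the TensorFlow example -> shape (5, 8)
concatenated = np.concatenate([user_embeds, movie_embeds], axis=1)

# Option 2: element-wise product, one interaction term per dimension -> shape (5, 4)
crossed = user_embeds * movie_embeds

# Summing the product gives each user-movie dot-product score -> shape (5,)
scores = crossed.sum(axis=1)

print(concatenated.shape, crossed.shape, scores.shape)
```

In a trained model, the embedding tables would be learned so that `scores` (or a dense layer applied to `crossed`) predicts the target, for example whether a user rated a movie highly.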


**5. Combining Multiple Feature Crosses**

**Example: Combining polynomial and interaction features.**

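```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Sample data
data = pd.DataFrame({'x1': [1, 2, 3, 4, 5], 'x2': [10, 20, 30, 40, 50]})

# Create polynomial features: x1, x2, x1^2, x1*x2, x2^2 (no bias column)
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(data)

# Convert to DataFrame for readability
poly_df = pd.DataFrame(poly_features, columns=poly.get_feature_names_out(['x1', 'x2']))

# Create interaction feature manually (note: this duplicates the 'x1 x2' column above)
poly_df['x1_x2'] = poly_df['x1'] * poly_df['x2']

print(poly_df)
```

```
    x1    x2  x1^2  x1 x2    x2^2  x1_x2
0  1.0  10.0   1.0   10.0   100.0   10.0
1  2.0  20.0   4.0   40.0   400.0   40.0
2  3.0  30.0   9.0   90.0   900.0   90.0
3  4.0  40.0  16.0  160.0  1600.0  160.0
4  5.0  50.0  25.0  250.0  2500.0  250.0
```

The manually created `x1_x2` column is identical to the `x1 x2` column that `PolynomialFeatures` already produced, so in practice you would keep only one of them. An exact duplicate adds no information, and in a linear model it makes the coefficients of the two copies non-unique.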
