Commit 2965916

feat(KDP): adding passthrough feature
1 parent fac7806 commit 2965916

5 files changed (+370, -76 lines)

docs/features/overview.md

Lines changed: 16 additions & 3 deletions
@@ -4,7 +4,7 @@
 
 ## 💪 Feature Types at a Glance
 
-KDP supports five primary feature types, each with specialized processing:
+KDP supports six primary feature types, each with specialized processing:
 
 | Feature Type | What It's For | Processing Magic |
 |--------------|---------------|------------------|
@@ -13,6 +13,7 @@ KDP supports five primary feature types, each with specialized processing:
 | 📝 **Text** | Free-form text like reviews, descriptions | Tokenization, embeddings, sequence handling |
 | 📅 **Date** | Temporal data like signup dates, transactions | Component extraction, cyclical encoding, seasonality |
 |**Cross Features** | Feature interactions | Combined embeddings, interaction modeling |
+| 🔍 **Passthrough** | Pre-processed data, custom vectors | No modification, type casting only |
 
 ## 🚀 Getting Started
 
@@ -34,7 +35,10 @@ features = {
 
     # Text and dates - specialized processing
     "product_review": FeatureType.TEXT,
-    "signup_date": FeatureType.DATE
+    "signup_date": FeatureType.DATE,
+
+    # Passthrough feature - use with pre-processed data
+    "embedding_vector": FeatureType.PASSTHROUGH
 }
 
 # Create your preprocessor
@@ -88,13 +92,15 @@ Learn about each feature type in detail:
 - [Text Features](text-features.md) - Work with free-form text
 - [Date Features](date-features.md) - Extract temporal patterns
 - [Cross Features](cross-features.md) - Model feature interactions
+- [Passthrough Features](passthrough-features.md) - Include unmodified data
 
 ## 👨‍💻 Advanced Feature Configuration
 
 For more control, use specialized feature classes:
 
 ```python
-from kdp.features import NumericalFeature, CategoricalFeature, TextFeature, DateFeature
+from kdp.features import NumericalFeature, CategoricalFeature, TextFeature, DateFeature, PassthroughFeature
+import tensorflow as tf
 
 # Advanced feature configuration
 features = {
@@ -129,6 +135,12 @@ features = {
         add_day_of_week=True,
         add_month=True,
         cyclical_encoding=True
+    ),
+
+    # Passthrough feature
+    "embedding": PassthroughFeature(
+        name="embedding",
+        dtype=tf.float32
     )
 }
 ```
@@ -140,6 +152,7 @@ features = {
 3. **Combine Approaches**: Mix distribution-aware, attention, embeddings for best results
 4. **Check Distributions**: Review your data distribution before choosing feature types
 5. **Experiment with Types**: Sometimes a different encoding provides better results
+6. **Consider Passthrough**: Use passthrough features for pre-processed data or custom vectors
 
 ---
 
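The overview table now sums up passthrough processing as "No modification, type casting only". The plain-Python sketch below illustrates that contrast; it is not KDP code. `FeatureType` here is a hypothetical two-member stand-in, `preprocess_column` is a hypothetical helper, and the standard-library `array("f", ...)` (a C float) loosely plays the role of `tf.float32`. A normalized column is rescaled, while a passthrough column keeps its values and only changes type.

```python
from array import array
from enum import Enum, auto
from statistics import mean, pstdev


class FeatureType(Enum):
    """Hypothetical two-member stand-in for kdp's FeatureType (not the real enum)."""

    FLOAT_NORMALIZED = auto()
    PASSTHROUGH = auto()


def preprocess_column(values, feature_type):
    """Hypothetical helper: normalize numeric columns, only cast passthrough ones."""
    if feature_type is FeatureType.PASSTHROUGH:
        # Type casting only: typecode "f" stores C floats, loosely like tf.float32.
        return array("f", [float(v) for v in values])
    if feature_type is FeatureType.FLOAT_NORMALIZED:
        floats = [float(v) for v in values]
        mu, sigma = mean(floats), pstdev(floats)
        # z-score each value; a constant column (sigma == 0) maps to zeros.
        return [(v - mu) / sigma if sigma else 0.0 for v in floats]
    raise ValueError(f"unsupported feature type: {feature_type}")


raw = [1, 2, 3]
print(list(preprocess_column(raw, FeatureType.PASSTHROUGH)))  # [1.0, 2.0, 3.0]
print(preprocess_column(raw, FeatureType.FLOAT_NORMALIZED))   # mean 0, unit variance
```

In this sketch the passthrough branch never looks at the column's statistics, which is why it cannot fix scale or outliers for you.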
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
+# 🔍 Passthrough Features
+
+> Passthrough features allow you to include data in your model without any preprocessing modifications.
+
+## 🚀 When to Use Passthrough Features
+
+Passthrough features are ideal when:
+
+1. **Pre-processed Data**: You have already processed the data externally
+2. **Custom Vectors**: You want to include pre-computed embeddings or vectors
+3. **Preserving Raw Values**: You need the exact original values in your model
+4. **Feature Testing**: You want to compare raw vs processed feature performance
+5. **Gradual Migration**: You're moving from a legacy system and need compatibility
+
+## 💡 Defining Passthrough Features
+
+Define a passthrough feature using the `FeatureType.PASSTHROUGH` enum or the `PassthroughFeature` class:
+
+```python
+from kdp import PreprocessingModel, FeatureType
+from kdp.features import PassthroughFeature
+import tensorflow as tf
+
+# Simple approach using enum
+features = {
+    "embedding_vector": FeatureType.PASSTHROUGH,
+    "age": FeatureType.FLOAT_NORMALIZED,
+    "category": FeatureType.STRING_CATEGORICAL
+}
+
+# Advanced configuration with PassthroughFeature class
+features = {
+    "embedding_vector": PassthroughFeature(
+        name="embedding_vector",
+        dtype=tf.float32 # Specify the data type
+    ),
+    "raw_text_embedding": PassthroughFeature(
+        name="raw_text_embedding",
+        dtype=tf.float32
+    ),
+    "age": FeatureType.FLOAT_NORMALIZED,
+    "category": FeatureType.STRING_CATEGORICAL
+}
+
+# Create your preprocessor
+preprocessor = PreprocessingModel(
+    path_data="customer_data.csv",
+    features_specs=features
+)
+```
+
+## 📊 How Passthrough Features Work
+
+Unlike other feature types that undergo normalization, encoding, or other transformations, passthrough features are:
+
+1. **Added to Inputs**: Included in model inputs like other features
+2. **Type Casting**: Cast to their specified dtype for compatibility
+3. **No Transformation**: Pass through the pipeline without normalization or encoding
+4. **Feature Selection (Optional)**: Can still use feature selection if enabled
+
+## 🔧 Configuration Options
+
+The `PassthroughFeature` class supports these parameters:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `name` | str | The name of the feature |
+| `feature_type` | FeatureType | Set to `FeatureType.PASSTHROUGH` by default |
+| `dtype` | tf.DType | The data type of the feature (default: tf.float32) |
+
+## 🎯 Example: Using Pre-computed Embeddings
+
+Here's how to use passthrough features for pre-computed embeddings:
+
+```python
+import pandas as pd
+from kdp import PreprocessingModel, FeatureType
+from kdp.features import PassthroughFeature, NumericalFeature
+import tensorflow as tf
+
+# Define features
+features = {
+    # Regular features
+    "age": NumericalFeature(
+        name="age",
+        feature_type=FeatureType.FLOAT_NORMALIZED
+    ),
+    "category": FeatureType.STRING_CATEGORICAL,
+
+    # Passthrough features for pre-computed embeddings
+    "product_embedding": PassthroughFeature(
+        name="product_embedding",
+        dtype=tf.float32
+    )
+}
+
+# Create your preprocessor
+preprocessor = PreprocessingModel(
+    path_data="data.csv",
+    features_specs=features
+)
+
+# Build the model
+model = preprocessor.build_preprocessor()
+```
+
+## ⚠️ Things to Consider
+
+1. **Data Type Compatibility**: Ensure the data type of your passthrough feature is compatible with the overall model
+2. **Dimensionality**: Make sure the feature dimensions fit your model architecture
+3. **Data Quality**: Since no preprocessing is applied, ensure your data is clean and ready for use
+4. **Performance Impact**: Using raw data may affect model performance; test both approaches
+
+## 🚀 Best Practices
+
+1. **Document Your Decision**: Make it clear why certain features are passed through
+2. **Test Both Approaches**: Compare passthrough vs preprocessed features for performance
+3. **Consider Feature Importance**: Use feature selection to see if passthrough features contribute meaningfully
+4. **Monitor Gradients**: Watch for gradient issues since passthrough features may have different scales
+
+---
+
+<div class="prev-next">
+    <a href="cross-features.md" class="prev">← Cross Features</a>
+    <a href="../optimization/overview.md" class="next">Optimization →</a>
+</div>
+
+<style>
+.prev-next {
+    display: flex;
+    justify-content: space-between;
+    margin-top: 40px;
+}
+.prev-next a {
+    padding: 10px 15px;
+    background-color: #f1f1f1;
+    border-radius: 5px;
+    text-decoration: none;
+    color: #333;
+}
+.prev-next a:hover {
+    background-color: #ddd;
+}
+</style>
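The new page's "Things to Consider" list points out that a passthrough column gets no cleaning and must already have the right dimensionality. One way to check both before building a preprocessor is a small validation pass like the one below. `check_passthrough_vectors` and `value_range` are hypothetical plain-Python helpers, not part of KDP, and the sketch assumes each row of the column is a list of numbers. The range check is a rough way to spot the scale mismatch mentioned under "Monitor Gradients".

```python
import math


def check_passthrough_vectors(name, rows):
    """Hypothetical helper (not part of KDP): validate a column of vectors.

    Returns the shared vector length, or raises ValueError for problems that
    a passthrough column does not get cleaned up for you.
    """
    if not rows:
        raise ValueError(f"{name}: column is empty")
    dim = len(rows[0])
    for i, vec in enumerate(rows):
        # Dimensionality: every row must have the same length.
        if len(vec) != dim:
            raise ValueError(f"{name}: row {i} has length {len(vec)}, expected {dim}")
        for x in vec:
            # Data quality: nothing imputes missing values, so reject None, NaN and inf.
            if x is None or not math.isfinite(x):
                raise ValueError(f"{name}: row {i} contains {x!r}")
    return dim


def value_range(rows):
    """Smallest and largest component, for comparing scale with normalized features."""
    flat = [x for vec in rows for x in vec]
    return min(flat), max(flat)


embeddings = [[0.1, -0.3, 0.7], [0.0, 0.2, -0.5]]
print(check_passthrough_vectors("product_embedding", embeddings))  # 3
print(value_range(embeddings))  # (-0.5, 0.7)
```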

kdp/features.py

Lines changed: 24 additions & 0 deletions
@@ -32,6 +32,7 @@ class FeatureType(Enum):
     TEXT = auto()
     CROSSES = auto()
     DATE = auto()
+    PASSTHROUGH = auto()
 
 
 class DistributionType(str, Enum):
@@ -235,3 +236,26 @@ def __init__(
         super().__init__(name, feature_type, **kwargs)
         self.dtype = tf.string
         self.kwargs = kwargs
+
+
+class PassthroughFeature(Feature):
+    """PassthroughFeature for including features in output without processing."""
+
+    def __init__(
+        self,
+        name: str,
+        feature_type: FeatureType = FeatureType.PASSTHROUGH,
+        dtype: tf.DType = tf.float32,
+        **kwargs,
+    ) -> None:
+        """Initializes a PassthroughFeature instance.
+
+        Args:
+            name (str): The name of the feature.
+            feature_type (FeatureType): The type of the feature.
+            dtype (tf.DType): The data type of the feature (defaults to float32).
+            **kwargs: Additional keyword arguments for the feature.
+        """
+        super().__init__(name, feature_type, **kwargs)
+        self.dtype = dtype
+        self.kwargs = kwargs
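In the `features.py` change, `PASSTHROUGH = auto()` is appended after `DATE`. With `auto()`, each new member gets the previous member's value plus one, so appending at the end leaves the values of existing members unchanged, which would matter if anything stores those integers. The sketch below checks this with hypothetical stand-in enums holding only the members visible in the hunk (the real `FeatureType` has more), and mirrors the `PassthroughFeature` constructor shape using an assumed minimal `Feature` base class and a plain string in place of `tf.DType`.

```python
from enum import Enum, auto


class FeatureTypeBefore(Enum):
    """Hypothetical stand-in with only the members visible in the hunk."""

    TEXT = auto()
    CROSSES = auto()
    DATE = auto()


class FeatureTypeAfter(Enum):
    """The same stand-in after the change: PASSTHROUGH is appended last."""

    TEXT = auto()
    CROSSES = auto()
    DATE = auto()
    PASSTHROUGH = auto()


class Feature:
    """Hypothetical minimal base class; the real kdp Feature may differ."""

    def __init__(self, name, feature_type, **kwargs):
        self.name = name
        self.feature_type = feature_type
        self.kwargs = kwargs


class PassthroughFeature(Feature):
    """Mirrors the constructor shape from the diff, with dtype as a plain string."""

    def __init__(
        self,
        name,
        feature_type=FeatureTypeAfter.PASSTHROUGH,
        dtype="float32",
        **kwargs,
    ):
        super().__init__(name, feature_type, **kwargs)
        self.dtype = dtype
        self.kwargs = kwargs


# Appending with auto() does not renumber the existing members.
for member in FeatureTypeBefore:
    assert FeatureTypeAfter[member.name].value == member.value

feature = PassthroughFeature("embedding")
print(feature.feature_type.name, feature.dtype)  # PASSTHROUGH float32
```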
