|
| 1 | +# 🔍 Passthrough Features |
| 2 | + |
| 3 | +> Passthrough features let you feed data into your model without normalization, encoding, or other transformations; the only change applied is a cast to the configured dtype. |
| 4 | +
|
| 5 | +## 🚀 When to Use Passthrough Features |
| 6 | + |
| 7 | +Passthrough features are ideal when: |
| 8 | + |
| 9 | +1. **Pre-processed Data**: You have already processed the data externally |
| 10 | +2. **Custom Vectors**: You want to include pre-computed embeddings or vectors |
| 11 | +3. **Preserving Raw Values**: You need the exact original values in your model |
| 12 | +4. **Feature Testing**: You want to compare raw vs processed feature performance |
| 13 | +5. **Gradual Migration**: You're moving from a legacy system and need compatibility |
| 14 | + |
| 15 | +## 💡 Defining Passthrough Features |
| 16 | + |
| 17 | +Define a passthrough feature using the `FeatureType.PASSTHROUGH` enum or the `PassthroughFeature` class: |
| 18 | + |
| 19 | +```python |
| 20 | +from kdp import PreprocessingModel, FeatureType |
| 21 | +from kdp.features import PassthroughFeature |
| 22 | +import tensorflow as tf |
| 23 | + |
| 24 | +# Simple approach using enum |
| 25 | +features = { |
| 26 | + "embedding_vector": FeatureType.PASSTHROUGH, |
| 27 | + "age": FeatureType.FLOAT_NORMALIZED, |
| 28 | + "category": FeatureType.STRING_CATEGORICAL |
| 29 | +} |
| 30 | + |
| 31 | +# Advanced configuration with PassthroughFeature class |
| 32 | +features = { |
| 33 | + "embedding_vector": PassthroughFeature( |
| 34 | + name="embedding_vector", |
| 35 | + dtype=tf.float32 # Specify the data type |
| 36 | + ), |
| 37 | + "raw_text_embedding": PassthroughFeature( |
| 38 | + name="raw_text_embedding", |
| 39 | + dtype=tf.float32 |
| 40 | + ), |
| 41 | + "age": FeatureType.FLOAT_NORMALIZED, |
| 42 | + "category": FeatureType.STRING_CATEGORICAL |
| 43 | +} |
| 44 | + |
| 45 | +# Create your preprocessor |
| 46 | +preprocessor = PreprocessingModel( |
| 47 | + path_data="customer_data.csv", |
| 48 | + features_specs=features |
| 49 | +) |
| 50 | +``` |
| 51 | + |
| 52 | +## 📊 How Passthrough Features Work |
| 53 | + |
| 54 | +Unlike other feature types, which undergo normalization, encoding, or other transformations, passthrough features are handled as follows: |
| 55 | + |
| 56 | +1. **Added to Inputs**: They are included in the model inputs like any other feature |
| 57 | +2. **Type Casting**: They are cast to their specified dtype for compatibility |
| 58 | +3. **No Transformation**: They pass through the pipeline without normalization or encoding |
| 59 | +4. **Feature Selection (Optional)**: They can still take part in feature selection if it is enabled |
| 60 | + |
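The contrast between a transforming feature and a passthrough feature can be illustrated with a small, library-independent sketch. It is not KDP's implementation: `normalize` and `passthrough` are hypothetical helpers written for this illustration, and z-score scaling is only assumed as a stand-in for whatever a normalizing feature type does.

```python
from statistics import mean, pstdev


def normalize(values):
    """Z-score scaling, standing in for a transforming feature type (assumption)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def passthrough(values, dtype=float):
    """Cast each value to `dtype` and otherwise leave it unchanged."""
    return [dtype(v) for v in values]


ages = [20, 30, 40]
scores = ["0.25", "0.5", "0.75"]  # already scaled upstream, stored as strings

normalized_ages = normalize(ages)  # values are rescaled around zero
raw_scores = passthrough(scores)   # values are only cast to float
```

After the cast, `raw_scores` holds the same numbers that were supplied, whereas `normalized_ages` no longer resembles the original ages.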
| 61 | +## 🔧 Configuration Options |
| 62 | + |
| 63 | +The `PassthroughFeature` class supports these parameters: |
| 64 | + |
| 65 | +| Parameter | Type | Description | |
| 66 | +|-----------|------|-------------| |
| 67 | +| `name` | str | The name of the feature | |
| 68 | +| `feature_type` | FeatureType | Set to `FeatureType.PASSTHROUGH` by default | |
| 69 | +| `dtype` | tf.DType | The data type of the feature (default: tf.float32) | |
| 70 | + |
| 71 | +## 🎯 Example: Using Pre-computed Embeddings |
| 72 | + |
| 73 | +Here's how to use passthrough features for pre-computed embeddings: |
| 74 | + |
| 75 | +```python |
| 76 | +import pandas as pd |
| 77 | +from kdp import PreprocessingModel, FeatureType |
| 78 | +from kdp.features import PassthroughFeature, NumericalFeature |
| 79 | +import tensorflow as tf |
| 80 | + |
| 81 | +# Define features |
| 82 | +features = { |
| 83 | + # Regular features |
| 84 | + "age": NumericalFeature( |
| 85 | + name="age", |
| 86 | + feature_type=FeatureType.FLOAT_NORMALIZED |
| 87 | + ), |
| 88 | + "category": FeatureType.STRING_CATEGORICAL, |
| 89 | + |
| 90 | + # Passthrough features for pre-computed embeddings |
| 91 | + "product_embedding": PassthroughFeature( |
| 92 | + name="product_embedding", |
| 93 | + dtype=tf.float32 |
| 94 | + ) |
| 95 | +} |
| 96 | + |
| 97 | +# Create your preprocessor |
| 98 | +preprocessor = PreprocessingModel( |
| 99 | + path_data="data.csv", |
| 100 | + features_specs=features |
| 101 | +) |
| 102 | + |
| 103 | +# Build the model |
| 104 | +model = preprocessor.build_preprocessor() |
| 105 | +``` |
| 106 | + |
| 107 | +## ⚠️ Things to Consider |
| 108 | + |
| 109 | +1. **Data Type Compatibility**: Ensure the data type of your passthrough feature is compatible with the overall model |
| 110 | +2. **Dimensionality**: Make sure the feature dimensions fit your model architecture |
| 111 | +3. **Data Quality**: Since no preprocessing is applied, ensure your data is clean and ready for use |
| 112 | +4. **Performance Impact**: Unscaled raw values can degrade model performance; compare the passthrough version of a feature against a preprocessed version of the same feature |
| 113 | + |
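Because nothing downstream will clean a passthrough feature, a quick check before training can catch dimensionality and data-quality problems early. The following is a minimal sketch in plain Python; `check_embeddings` and the sample data are hypothetical and not part of KDP.

```python
import math


def check_embeddings(rows, expected_dim):
    """Return (row_index, problem) pairs for vectors unfit to pass through as-is."""
    problems = []
    for i, vec in enumerate(rows):
        if len(vec) != expected_dim:
            problems.append((i, f"expected {expected_dim} values, got {len(vec)}"))
        elif any(not math.isfinite(x) for x in vec):
            problems.append((i, "contains NaN or infinite values"))
    return problems


# Illustrative data: one clean vector, one with a NaN, one that is too short
embeddings = [
    [0.1, 0.2, 0.3],
    [0.4, float("nan"), 0.6],
    [0.7, 0.8],
]
issues = check_embeddings(embeddings, expected_dim=3)
```

An empty result means every vector has the expected length and contains only finite numbers.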
| 114 | +## 🚀 Best Practices |
| 115 | + |
| 116 | +1. **Document Your Decision**: Make it clear why certain features are passed through |
| 117 | +2. **Test Both Approaches**: Compare passthrough vs preprocessed features for performance |
| 118 | +3. **Consider Feature Importance**: Use feature selection to see if passthrough features contribute meaningfully |
| 119 | +4. **Monitor Gradients**: Watch for unstable gradients, since passthrough features are not normalized and may be on a very different scale from the other inputs |
| 120 | + |
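One way to anticipate scale-related gradient issues is to compare the magnitude of each input column before training. The sketch below is an assumption-laden illustration: `scale_report`, the threshold of 100, and the column names are hypothetical, and the right threshold depends on your model.

```python
def scale_report(columns, ratio_threshold=100.0):
    """Return features whose largest magnitude is at least `ratio_threshold`
    times the smallest non-zero magnitude found among all features."""
    max_abs = {name: max(abs(v) for v in values) for name, values in columns.items()}
    baseline = min((m for m in max_abs.values() if m > 0), default=None)
    if baseline is None:  # every column is all zeros, nothing to compare
        return {}
    return {
        name: m / baseline
        for name, m in max_abs.items()
        if m / baseline >= ratio_threshold
    }


columns = {
    "age_normalized": [-1.2, 0.0, 1.2],
    "raw_income": [32000.0, 54000.0, 81000.0],  # passthrough, unscaled
}
flagged = scale_report(columns)
```

Features that appear in `flagged` are candidates for scaling upstream or for switching to a normalizing feature type.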
| 121 | +--- |
| 122 | + |
| 123 | +<div class="prev-next"> |
| 124 | + <a href="cross-features.md" class="prev">← Cross Features</a> |
| 125 | + <a href="../optimization/overview.md" class="next">Optimization →</a> |
| 126 | +</div> |
| 127 | + |
| 128 | +<style> |
| 129 | +.prev-next { |
| 130 | + display: flex; |
| 131 | + justify-content: space-between; |
| 132 | + margin-top: 40px; |
| 133 | +} |
| 134 | +.prev-next a { |
| 135 | + padding: 10px 15px; |
| 136 | + background-color: #f1f1f1; |
| 137 | + border-radius: 5px; |
| 138 | + text-decoration: none; |
| 139 | + color: #333; |
| 140 | +} |
| 141 | +.prev-next a:hover { |
| 142 | + background-color: #ddd; |
| 143 | +} |
| 144 | +</style> |