
Commit cfbd38b

feat(kdp): adding TabularAttentionLayers and implementation (#11)
2 parents 9d8f8b3 + 352e72b commit cfbd38b

9 files changed: +1256 -353 lines


.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,7 +1,7 @@
 default_language_version:
-  python: python3.11
+  python: python3.12

-default_stages: [commit]
+default_stages: [pre-commit]
 default_install_hook_types: [pre-commit, commit-msg]

 repos:
```

docs/tabular_attention.md

Lines changed: 188 additions & 0 deletions

# Tabular Attention in KDP

The KDP package includes two attention mechanisms for tabular data:

1. Standard TabularAttention for uniform feature processing
2. MultiResolutionTabularAttention for type-specific feature processing

## Overview

### Standard TabularAttention

The TabularAttention layer applies attention uniformly across all features, capturing:

- Dependencies between features for each sample
- Dependencies between samples for each feature

### MultiResolutionTabularAttention

The MultiResolutionTabularAttention layer implements a hierarchical attention mechanism that processes each feature type appropriately:

1. **Numerical Features**: Full-resolution attention that preserves precise numerical relationships
2. **Categorical Features**: Embedding-based attention that captures categorical patterns
3. **Cross-Feature Attention**: Hierarchical attention between numerical and categorical features

## Usage

### Standard TabularAttention

```python
from kdp.processor import PreprocessingModel, TabularAttentionPlacementOptions

model = PreprocessingModel(
    # ... other parameters ...
    tabular_attention=True,
    tabular_attention_heads=4,
    tabular_attention_dim=64,
    tabular_attention_dropout=0.1,
    tabular_attention_placement=TabularAttentionPlacementOptions.ALL_FEATURES.value,
)
```

### Multi-Resolution TabularAttention

```python
from kdp.processor import PreprocessingModel, TabularAttentionPlacementOptions

model = PreprocessingModel(
    # ... other parameters ...
    tabular_attention=True,
    tabular_attention_heads=4,
    tabular_attention_dim=64,
    tabular_attention_dropout=0.1,
    tabular_attention_embedding_dim=32,  # Dimension for categorical embeddings
    tabular_attention_placement=TabularAttentionPlacementOptions.MULTI_RESOLUTION.value,
)
```

## Configuration Options

### Common Options

- `tabular_attention` (bool): Enable/disable attention mechanisms
- `tabular_attention_heads` (int): Number of attention heads
- `tabular_attention_dim` (int): Dimension of the attention model
- `tabular_attention_dropout` (float): Dropout rate for regularization

### Placement Options

- `tabular_attention_placement` (str):
  - `ALL_FEATURES`: Apply uniform attention to all features
  - `NUMERIC`: Apply only to numeric features
  - `CATEGORICAL`: Apply only to categorical features
  - `MULTI_RESOLUTION`: Use type-specific attention mechanisms
  - `NONE`: Disable attention

### Multi-Resolution Specific Options

- `tabular_attention_embedding_dim` (int): Dimension for categorical embeddings in multi-resolution mode

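For example, to restrict attention to numeric features only, set the placement option accordingly (the other parameters below simply mirror the usage examples above):

```python
from kdp.processor import PreprocessingModel, TabularAttentionPlacementOptions

# Attention applied only to numeric features; remaining parameters are
# the same illustrative values used in the usage examples above.
model = PreprocessingModel(
    # ... other parameters ...
    tabular_attention=True,
    tabular_attention_heads=4,
    tabular_attention_dim=64,
    tabular_attention_dropout=0.1,
    tabular_attention_placement=TabularAttentionPlacementOptions.NUMERIC.value,
)
```
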
## How It Works

### Standard TabularAttention

1. **Self-Attention**: Applied uniformly across all features
2. **Layer Normalization**: Stabilizes learning
3. **Feed-forward Network**: Processes attention outputs

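A minimal sketch of this attention → layer normalization → feed-forward pattern, written with stock Keras layers; it illustrates the steps above and is not KDP's internal implementation (the shapes, residual connections, and dimensions are assumptions):

```python
import tensorflow as tf

# Illustrative only: the pattern described above, built from stock Keras layers.
# Assumed input shape: (batch, num_features, d_model).
d_model, num_heads, dropout_rate = 64, 4, 0.1

features = tf.keras.Input(shape=(10, d_model))  # 10 features, each embedded to d_model

# 1. Self-attention across the feature axis
attended = tf.keras.layers.MultiHeadAttention(
    num_heads=num_heads, key_dim=d_model // num_heads, dropout=dropout_rate
)(features, features)

# 2. Residual connection + layer normalization
x = tf.keras.layers.LayerNormalization()(features + attended)

# 3. Feed-forward network on the attention outputs
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(4 * d_model, activation="relu"),
    tf.keras.layers.Dense(d_model),
])
outputs = tf.keras.layers.LayerNormalization()(x + ffn(x))

model = tf.keras.Model(features, outputs)
```
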
### MultiResolutionTabularAttention

1. **Numerical Processing**:
   - Full-resolution self-attention
   - Preserves numerical precision
   - Captures complex numerical relationships

2. **Categorical Processing**:
   - Embedding-based attention
   - Lower-dimensional representations
   - Captures categorical patterns efficiently

3. **Cross-Feature Integration**:
   - Hierarchical attention between feature types
   - Numerical features attend to categorical features
   - Preserves type-specific characteristics while enabling interaction

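The cross-feature step can be sketched with a plain Keras cross-attention block in which numerical features supply the queries and categorical embeddings supply the keys and values. This illustrates the idea above, not the layer's actual internals; the shapes and the projection are assumptions:

```python
import tensorflow as tf

# Sketch of cross-feature attention: numerical features (queries) attend to
# categorical embeddings (keys/values). Shapes are illustrative assumptions.
d_model, embedding_dim, num_heads = 64, 32, 4

numerical = tf.keras.Input(shape=(8, d_model))           # 8 numerical features
categorical = tf.keras.Input(shape=(5, embedding_dim))   # 5 categorical features

# Project categorical embeddings up to d_model so both branches are compatible
categorical_proj = tf.keras.layers.Dense(d_model)(categorical)

# Numerical features attend to categorical features
cross_attended = tf.keras.layers.MultiHeadAttention(
    num_heads=num_heads, key_dim=d_model // num_heads
)(query=numerical, value=categorical_proj, key=categorical_proj)

# Residual + normalization keeps the numerical branch's own information
numerical_out = tf.keras.layers.LayerNormalization()(numerical + cross_attended)

model = tf.keras.Model([numerical, categorical], numerical_out)
```
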
## Best Practices

### When to Use Standard TabularAttention

- Data has uniform feature importance
- Features are of similar scales
- Memory usage is a concern

### When to Use MultiResolutionTabularAttention

- Mixed numerical and categorical features
- Different feature types have different importance
- Need to preserve type-specific characteristics
- Complex interactions between feature types

### Configuration Tips

1. **Attention Heads**:
   - Start with 4-8 heads
   - Increase for complex relationships
   - Monitor computational cost

2. **Dimensions**:
   - `tabular_attention_dim`: Based on feature complexity
   - `tabular_attention_embedding_dim`: Usually smaller than the main dimension
   - Balance between expressiveness and efficiency

3. **Dropout**:
   - Start with 0.1
   - Increase if overfitting
   - Monitor validation performance

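Putting these tips together, a plausible starting configuration for a mixed-type dataset could look like the following (the values are the suggested starting points above, not tuned recommendations):

```python
from kdp.processor import PreprocessingModel, TabularAttentionPlacementOptions

# Starting point following the tips above: moderate head count and dimensions,
# a smaller categorical embedding, and light dropout.
model = PreprocessingModel(
    # ... other parameters ...
    tabular_attention=True,
    tabular_attention_heads=4,            # start with 4-8 heads
    tabular_attention_dim=64,             # grow with feature complexity
    tabular_attention_embedding_dim=32,   # smaller than the main dimension
    tabular_attention_dropout=0.1,        # increase if overfitting
    tabular_attention_placement=TabularAttentionPlacementOptions.MULTI_RESOLUTION.value,
)
```
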
## Advanced Usage

### Custom Layer Integration

```python
from kdp.custom_layers import MultiResolutionTabularAttention
import tensorflow as tf

# Example input shapes: (number of features, per-feature dimension)
num_numerical, numerical_dim = 8, 64
num_categorical, categorical_dim = 5, 32

# Create custom model with multi-resolution attention
numerical_inputs = tf.keras.Input(shape=(num_numerical, numerical_dim))
categorical_inputs = tf.keras.Input(shape=(num_categorical, categorical_dim))

attention_layer = MultiResolutionTabularAttention(
    num_heads=4,
    d_model=64,
    embedding_dim=32,
    dropout_rate=0.1,
)

num_attended, cat_attended = attention_layer(numerical_inputs, categorical_inputs)
combined = tf.keras.layers.Concatenate(axis=1)([num_attended, cat_attended])
outputs = tf.keras.layers.Dense(1)(combined)

model = tf.keras.Model(
    inputs=[numerical_inputs, categorical_inputs],
    outputs=outputs,
)
```

### Layer Factory Usage

```python
from kdp.layers_factory import PreprocessorLayerFactory

attention_layer = PreprocessorLayerFactory.multi_resolution_attention_layer(
    num_heads=4,
    d_model=64,
    embedding_dim=32,
    dropout_rate=0.1,
    name="custom_multi_attention",
)
```

## Performance Considerations

1. **Memory Usage**:
   - MultiResolutionTabularAttention is more memory-efficient for categorical features
   - Uses lower-dimensional embeddings for categorical data
   - Consider batch size when using multiple attention heads

2. **Computational Cost**:
   - Standard TabularAttention: O(n²) for n features
   - MultiResolutionTabularAttention: O(n_num² + n_cat²) for n_num numerical and n_cat categorical features; for example, with 15 numerical and 5 categorical features this is 15² + 5² = 250 pairwise interactions instead of 20² = 400
   - Balance between resolution and performance

3. **Training Tips**:
   - Start with smaller dimensions and increase if needed
   - Monitor memory usage and training time
   - Use gradient clipping to stabilize training (see the sketch below)

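Gradient clipping can be enabled directly on a standard Keras optimizer when compiling whatever model consumes the preprocessed features; this is a generic Keras sketch rather than a KDP-specific API, and the clipping threshold is a common default, not a KDP recommendation:

```python
import tensorflow as tf

# Stand-in model; in practice this would be a model that includes the
# attention layers described above.
inputs = tf.keras.Input(shape=(16,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = tf.keras.Model(inputs, outputs)

# Clip the global gradient norm to stabilize training.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
model.compile(optimizer=optimizer, loss="mse")
```
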
## References

- [Attention Is All You Need](https://arxiv.org/abs/1706.03762) - Original transformer paper
- [TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442) - Attention for tabular data
- [Heterogeneous Graph Attention Network](https://arxiv.org/abs/1903.07293) - Multi-type attention mechanisms
