Commit 8e7d0a7

docs(KDP): smart processing for custom pipelines
1 parent d4fc5f3 commit 8e7d0a7

4 files changed: 21 additions and 4 deletions

docs/distribution_aware_encoder.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 # Distribution-Aware Encoder

 ## Overview
-The Distribution-Aware Encoder is an advanced preprocessing layer that automatically detects and handles various types of data distributions. It uses TensorFlow Probability (tfp) for accurate modeling and applies specialized transformations while preserving the statistical properties of the data.
+The **Distribution-Aware Encoder** is an advanced preprocessing layer that automatically detects and handles various types of data distributions. It leverages TensorFlow Probability (tfp) for accurate modeling and applies specialized transformations while preserving the statistical properties of the data.

 ## Features

@@ -80,7 +80,7 @@ The Distribution-Aware Encoder is an advanced preprocessing layer that automatic

 ### Basic Usage

-The capability only works with numerical features!
+The Distribution-Aware Encoder works seamlessly (and only) with numerical features. Enable it by setting `use_distribution_aware=True` in the `PreprocessingModel`.

 ```python
 from kdp.processor import PreprocessingModel

docs/features.md

Lines changed: 6 additions & 2 deletions
@@ -13,7 +13,7 @@ Explore various methods to define numerical features tailored to your needs:
     "feat1": "float",
     "feat2": "FLOAT",
     "feat3": "FLOAT_NORMALIZED",
-    "feat3": "FLOAT_RESCALED",
+    "feat4": "FLOAT_RESCALED",
     ...
 }
 ```
@@ -50,7 +50,7 @@ Explore various methods to define numerical features tailored to your needs:
     "feat3": NumericalFeature(
         name="feat3",
         feature_type=FeatureType.FLOAT_DISCRETIZED,
-        bin_boundaries=[(1, 10)],
+        bin_boundaries=[0.0, 1.0, 2.0],
     ),
     "feat4": NumericalFeature(
         name="feat4",
@@ -60,6 +60,10 @@ Explore various methods to define numerical features tailored to your needs:
 }
 ```

+### 📊 **Distribution-Aware Encoding**
+
+Enhance your numerical feature processing by leveraging the **Distribution-Aware Encoder**. This allows automatic or manual detection of data distributions, applying appropriate transformations to preserve the integrity and statistical properties of your data.
+
 Here's how the numeric preprocessing pipeline looks:

 ![Numeric Feature Pipeline](imgs/num_feature_pipeline.png)
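
The `bin_boundaries` change in the hunk above replaces a list holding a tuple, `[(1, 10)]`, with a flat, ascending list of floats. The following is a minimal sketch of how such a list is commonly interpreted, assuming `FLOAT_DISCRETIZED` follows the same convention as Keras's `Discretization` layer (this diff does not confirm that): `n` boundaries define `n + 1` bins, the outermost bins are open-ended, and a value equal to a boundary lands in the upper bin. The `bucketize` helper is hypothetical and uses only the standard library; it is not part of KDP.

```python
from bisect import bisect_right


def bucketize(value, bin_boundaries):
    """Return the index of the bin that `value` falls into.

    Hypothetical helper, not a KDP API. With boundaries [0.0, 1.0, 2.0]
    the bins are (-inf, 0), [0, 1), [1, 2) and [2, +inf).
    """
    if list(bin_boundaries) != sorted(bin_boundaries):
        raise ValueError("bin_boundaries must be sorted in ascending order")
    # bisect_right counts the boundaries that are <= value, which is
    # exactly the index of the bin whose lower edge is inclusive.
    return bisect_right(bin_boundaries, value)


boundaries = [0.0, 1.0, 2.0]
samples = [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 7.0]
indices = [bucketize(x, boundaries) for x in samples]
print(indices)  # [0, 1, 1, 2, 2, 3, 3]
```

Under this reading, the old value `[(1, 10)]` is not a list of scalar edges at all, which is presumably why the example was corrected.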

docs/index.md

Lines changed: 2 additions & 0 deletions
@@ -65,6 +65,8 @@ features_specs = {
 ppr = PreprocessingModel(
     path_data="data/my_data.csv",
     features_specs=features_spec,
+    use_distribution_aware=True, # Enable Distribution-Aware Encoding
+    distribution_aware_bins=1000, # Set number of bins for finer data encoding
 )
 # construct the preprocessing pipelines
 ppr.build_preprocessor()

docs/kdp_overview.md

Lines changed: 11 additions & 0 deletions
@@ -33,6 +33,17 @@ KDP (Keras Data Processor) is a powerful preprocessing library designed to strea
   - Dynamic feature filtering
   - Interpretable weights

+### 4. 📈 **Distribution-Aware Encoder**
+- **Automatic Distribution Detection**
+  - Identifies underlying data distributions (e.g., Normal, Heavy-Tailed, Multimodal, etc.)
+  - Applies specialized transformations to preserve statistical properties
+- **Adaptive Transformations**
+  - Learns optimal parameters during training
+  - Adjusts to data distribution changes dynamically
+- **Robust Handling**
+  - Manages sparse and periodic data effectively
+  - Ensures numerical stability across transformations
+
 ## 🏗️ Architecture Overview

 ```mermaid
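
The "Automatic Distribution Detection" bullets added above describe the idea without showing what detection could look like. The following is a rough, standard-library sketch of one simple approach, not KDP's implementation (the docs say KDP relies on TensorFlow Probability). It labels a sample from its skewness and excess kurtosis. The function names, the three labels, and every threshold are assumptions chosen for illustration, and multimodality is deliberately left out because moments alone do not detect it reliably.

```python
from statistics import NormalDist, fmean


def standardized_moments(values):
    """Return (skewness, excess kurtosis) from population moments."""
    mean = fmean(values)
    centered = [v - mean for v in values]
    m2 = fmean([c ** 2 for c in centered])
    if m2 == 0:
        raise ValueError("constant data has no defined skewness or kurtosis")
    m3 = fmean([c ** 3 for c in centered])
    m4 = fmean([c ** 4 for c in centered])
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def guess_distribution(values, tail_threshold=3.0):
    """Hypothetical heuristic: thresholds are illustrative, not tuned."""
    skew, excess_kurtosis = standardized_moments(values)
    if excess_kurtosis > tail_threshold:
        return "heavy_tailed"
    if abs(skew) < 0.5 and abs(excess_kurtosis) < 1.0:
        return "normal"
    return "other"


# Deterministic toy samples so the sketch needs no random seed.
n = 1000
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
spiky = [0.0] * 98 + [100.0, -100.0]      # mostly zeros, two extreme values
two_clusters = [-1.0] * 50 + [1.0] * 50   # symmetric but clearly not normal

labels = [guess_distribution(d) for d in (normal_like, spiky, two_clusters)]
print(labels)
```

A detector of this kind would only pick a label; the transformation applied afterwards (for example a log or rank transform for heavy tails) is a separate design decision that this sketch does not cover.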
