dashi v0.1.0
Release v0.1.0 – Initial Version
We are excited to introduce dashi, a Python library for analyzing and characterizing dataset shift!
This first release provides robust tools for analyzing temporal and multi-source dataset shifts, enabling both supervised and unsupervised evaluations to detect, understand, and mitigate changes in data distributions.
Key Features
Supervised Characterization
- Train classification/regression models (Random Forests) on batched data (temporal or multi-source).
- Analyze how dataset shifts affect model performance and identify the batches where performance degrades.
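
The cross-batch idea can be sketched in a few lines: fit a model on each batch, score it on every batch, and look for drops away from the diagonal. This is an illustrative sketch only, not dashi's API. To stay dependency-free it swaps the Random Forest for a one-feature nearest-centroid classifier and reports accuracy; the batch names, data, and helper functions are all hypothetical.

```python
from statistics import mean

# Hypothetical batches of (feature, label) pairs. In batch "2021" the
# feature values drift upward, simulating a dataset shift.
batches = {
    "2019": [(1.0, 0), (1.2, 0), (0.8, 0), (3.0, 1), (3.2, 1), (2.8, 1)],
    "2020": [(1.1, 0), (0.9, 0), (1.3, 0), (3.1, 1), (2.9, 1), (3.3, 1)],
    "2021": [(2.6, 0), (2.4, 0), (2.8, 0), (4.6, 1), (4.4, 1), (4.8, 1)],
}

def fit_centroids(rows):
    """Mean feature value per class (a simple stand-in for a real model)."""
    labels = {y for _, y in rows}
    return {c: mean(x for x, y in rows if y == c) for c in labels}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def accuracy(centroids, rows):
    """Fraction of rows whose predicted class matches the label."""
    return mean(1.0 if predict(centroids, x) == y else 0.0 for x, y in rows)

def cross_batch_scores(batches):
    """Train on each batch, then evaluate on every batch (including itself)."""
    models = {name: fit_centroids(rows) for name, rows in batches.items()}
    return {
        train: {test: accuracy(model, rows) for test, rows in batches.items()}
        for train, model in models.items()
    }

scores = cross_batch_scores(batches)
for train, row in scores.items():
    print(train, {test: round(acc, 2) for test, acc in row.items()})
```

In this toy data, models trained on 2019 or 2020 transfer well to each other but fall to chance-level accuracy on 2021, which is the kind of pattern a supervised characterization is meant to surface.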
Unsupervised Characterization
- Detect temporal dataset shifts by estimating statistical distributions over time.
- Project these distributions onto non-parametric statistical manifolds to reveal hidden trends and latent temporal variability.
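
As a rough illustration of the first step, the sketch below reduces each temporal batch of a single numeric variable to a histogram over shared bins and compares batches pairwise with the Jensen-Shannon distance. The histogram estimator, the distance, the bin count, and the data are assumptions chosen for illustration; dashi's actual estimators and API may differ. A dissimilarity matrix of this kind is what an embedding method such as multidimensional scaling could then project into a low-dimensional space.

```python
import math

# Hypothetical monthly batches of one numeric variable.
batches = {
    "2020-01": [1.0, 1.5, 2.0, 2.5, 1.2, 1.8],
    "2020-02": [1.1, 1.6, 2.1, 2.4, 1.3, 1.9],
    "2020-03": [3.0, 3.5, 4.0, 4.5, 3.2, 3.8],
}

def histogram(values, lo, hi, n_bins):
    """Relative frequencies over n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # put v == hi in the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the divergence), in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = (kl_divergence(p, m) + kl_divergence(q, m)) / 2
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding

# Use one common range so that bins are comparable across batches.
lo = min(v for vals in batches.values() for v in vals)
hi = max(v for vals in batches.values() for v in vals)
dists = {name: histogram(vals, lo, hi, n_bins=7) for name, vals in batches.items()}

names = list(dists)
matrix = [[js_distance(dists[a], dists[b]) for b in names] for a in names]
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

The first two batches overlap heavily and end up close together, while the third shares no bins with them and sits at the maximum distance.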
Visualization Tools
To facilitate exploration and interpretation, dashi includes:
- Data Temporal Heatmaps (DTHs) – Visualize temporal shifts in data distributions.
- Information Geometric Temporal (IGT) plots – Embed temporal batches onto latent statistical manifolds for deeper insights.
- Multi-batch Contingency Matrices – Compare multiple evaluation metrics (F1-Score, Recall, Precision, AUC, etc.) across pairs of batches (temporal or multi-source).
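
To give a sense of the data behind a temporal heatmap, here is a small sketch that groups records by month, computes the relative frequency of each category per month, and renders the result as a text grid. It is not dashi's plotting code; the records, the function names, and the character-based shading are hypothetical and chosen only to keep the example dependency-free.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical records: (observation date, category).
records = [
    (date(2021, 1, 5), "A"), (date(2021, 1, 20), "A"), (date(2021, 1, 25), "B"),
    (date(2021, 2, 3), "A"), (date(2021, 2, 14), "B"), (date(2021, 2, 28), "B"),
    (date(2021, 3, 9), "B"), (date(2021, 3, 17), "C"), (date(2021, 3, 30), "C"),
]

def temporal_frequencies(records):
    """Return ({'YYYY-MM': {category: relative frequency}}, sorted categories)."""
    counts = defaultdict(Counter)
    for day, category in records:
        counts[day.strftime("%Y-%m")][category] += 1
    categories = sorted({c for _, c in records})
    freqs = {
        month: {c: counter[c] / sum(counter.values()) for c in categories}
        for month, counter in sorted(counts.items())
    }
    return freqs, categories

SHADES = " .:*#"  # later characters stand for higher relative frequency

def render(freqs, categories):
    """One row per category, one column per month, shaded by frequency."""
    lines = []
    for c in categories:
        cells = "".join(
            SHADES[round(month_freqs[c] * (len(SHADES) - 1))]
            for month_freqs in freqs.values()
        )
        lines.append(f"{c} |{cells}|")
    return "\n".join(lines)

freqs, categories = temporal_frequencies(records)
print(render(freqs, categories))
```

Reading the grid left to right shows category A fading out and category C appearing, which is the kind of gradual change a heatmap over time makes visible.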
This release provides a foundation for dataset shift analysis, helping researchers and data practitioners monitor and understand data integrity over time.
Installation
pip install dashi