
dashi v0.3.0


dfernar released this 09 Mar 11:17


Highlights

🚀 New dimensionality reduction: SVD is now available as a dimensionality reduction method for numerical data, alongside PCA, MCA, and FAMD.
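For readers new to the method, here is what an SVD projection does, written in plain NumPy. This is an illustrative sketch, not dashi's implementation: svd_reduce is a hypothetical helper, and like scikit-learn's TruncatedSVD it does not center the data first (PCA does). These notes do not say whether dashi centers or scales data before applying SVD.

```python
import numpy as np


def svd_reduce(X: np.ndarray, n_components: int) -> np.ndarray:
    """Hypothetical helper (not part of dashi): project X onto its top right singular vectors.

    No centering is applied, as with scikit-learn's TruncatedSVD;
    PCA would subtract the column means first.
    """
    # Thin SVD: X = U @ diag(S) @ Vt, with S sorted in descending order
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    # Use the first n_components right singular vectors as the new axes
    return X @ Vt[:n_components].T


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Z = svd_reduce(X, n_components=3)
print(Z.shape)  # (200, 3)
```

The projected coordinates equal U[:, :k] * S[:k], so the columns of Z are mutually orthogonal and ordered by singular value.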

🌲 Histogram Gradient Boosting models: A new model family (histogram_gradient_boosting) is now supported in estimate_multibatch_models, offering faster training and native handling of categorical features. On large datasets, this model family generally performs better than the standard Random Forest.

📊 PR-AUC metric: Precision-Recall AUC is now reported per class and as a macro average in classification tasks, complementing the existing ROC-AUC metrics.
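PR-AUC can be estimated in more than one way. The sketch below assumes scikit-learn's average_precision_score, a step-wise estimate of the area under the precision-recall curve, applied one-vs-rest to toy data. dashi may use a different estimator (for example, trapezoidal integration of the curve). The macro average here is the unweighted mean of the per-class scores.

```python
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)
classes = [0, 1, 2]
y_true = np.tile(classes, 20)  # 60 samples, 20 per class

# Toy probability scores with some signal for the true class
y_score = rng.random((y_true.size, len(classes)))
y_score[np.arange(y_true.size), y_true] += 0.5
y_score /= y_score.sum(axis=1, keepdims=True)

# One-vs-rest: one binary indicator column per class
Y = label_binarize(y_true, classes=classes)

per_class = average_precision_score(Y, y_score, average=None)  # one score per class
macro = average_precision_score(Y, y_score, average="macro")   # unweighted mean
print(dict(zip(classes, per_class.round(3))), round(macro, 3))
```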

💾 Memory optimization: The format_data function now supports inplace=True to avoid copying large DataFrames, and data type downcasting is available for further memory savings.
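The general idea behind downcasting and in-place transformation can be sketched with pandas alone. downcast_numeric is a hypothetical helper and does not describe format_data's actual behavior or signature. It uses pd.to_numeric with the downcast argument to pick the smallest dtype that holds each numeric column's values.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame, inplace: bool = False):
    """Hypothetical helper: shrink numeric columns to the smallest dtype that holds their values.

    With inplace=True the caller's DataFrame is modified and None is returned
    (the usual pandas convention); otherwise a downcast copy is returned.
    """
    target = df if inplace else df.copy()
    for col in target.columns:
        kind = target[col].dtype.kind
        if kind == "i":
            target[col] = pd.to_numeric(target[col], downcast="integer")
        elif kind == "u":
            target[col] = pd.to_numeric(target[col], downcast="unsigned")
        elif kind == "f":
            # pandas stops at float32 and keeps float64 if the cast would change values
            target[col] = pd.to_numeric(target[col], downcast="float")
    return None if inplace else target


df = pd.DataFrame({"count": np.arange(100, dtype="int64"),
                   "score": np.arange(100) / 4.0})
before = df.memory_usage(deep=True).sum()

small = downcast_numeric(df)        # returns a copy; df keeps its dtypes here
downcast_numeric(df, inplace=True)  # modifies df itself
after = df.memory_usage(deep=True).sum()
print(df.dtypes.to_dict(), before, after)
```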

What's Changed

New Features

  • SVD as a dimensionality reduction method for numerical data.
  • Histogram Gradient Boosting for classification and regression in estimate_multibatch_models via model_type='histogram_gradient_boosting'. Note: Histogram Gradient Boosting is now the default model; Random Forest can still be selected via model_type='random_forest'.
  • PR-AUC classification metric (per class and macro average).
  • inplace parameter in format_data for memory-efficient transformation.
  • Data type downcasting support for memory optimization.

Bug Fixes

  • Fixed data type recognition when creating supports for variable distribution estimation.
  • Fixed bugs in the supervised characterization pipeline that decreased model performance.
  • All datetime units (datetime64[ns], [us], [ms], [s]) are now correctly recognized.
  • Fixed label misalignment in estimate_conditional_data_temporal_map when using start_date or end_date parameters.
  • Corrected various incorrectly raised or suppressed warnings.
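A common cause of datetime recognition bugs is a dtype check that only accepts nanosecond precision. The sketch below illustrates the difference with plain NumPy dtypes; it is not dashi's code, and both helper names are hypothetical. np.issubdtype matches datetime64 at every unit, while an equality check against datetime64[ns] does not. For pandas columns, pd.api.types.is_datetime64_any_dtype also covers timezone-aware dtypes.

```python
import numpy as np


def is_datetime_dtype(dtype) -> bool:
    # Matches datetime64 at any unit: ns, us, ms, s, ...
    return np.issubdtype(np.dtype(dtype), np.datetime64)


def is_datetime_dtype_ns_only(dtype) -> bool:
    # Too strict: only recognizes nanosecond precision
    return np.dtype(dtype) == np.dtype("datetime64[ns]")


for unit in ["ns", "us", "ms", "s"]:
    dt = np.dtype(f"datetime64[{unit}]")
    print(unit, is_datetime_dtype(dt), is_datetime_dtype_ns_only(dt))
# ns True True
# us True False
# ms True False
# s True False
```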

Dependency Updates

  • plotly compatibility expanded from ==5.18.0 to >=5.18.0,<6.0.0.
  • scikit-learn compatibility expanded from ==1.5.1 to >=1.5.1,<2.0.0.
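To check whether an existing environment falls inside the new ranges, here is a stdlib-only sketch. release_tuple is a simplified, hypothetical parser that compares only the leading numeric release segment; it treats pre-release and post-release suffixes more coarsely than PEP 440. For exact semantics, packaging.specifiers.SpecifierSet is the standard tool.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Supported ranges from the dependency notes: inclusive lower bound, exclusive upper bound
RANGES = {
    "plotly": ((5, 18, 0), (6, 0, 0)),
    "scikit-learn": ((1, 5, 1), (2, 0, 0)),
}


def release_tuple(v: str) -> tuple:
    """Simplified parse of the leading numeric segment, e.g. '1.5.2' -> (1, 5, 2).

    Suffixes such as 'rc1' or '.post1' are dropped, so this is coarser than PEP 440.
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    parts = [int(p) for p in match.group().split(".")][:3] if match else []
    return tuple(parts + [0] * (3 - len(parts)))


def in_range(v: str, lower: tuple, upper: tuple) -> bool:
    return lower <= release_tuple(v) < upper


def report() -> None:
    for pkg, (lower, upper) in RANGES.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            print(f"{pkg}: not installed")
            continue
        status = "ok" if in_range(installed, lower, upper) else "outside the supported range"
        print(f"{pkg} {installed}: {status}")


report()
```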

⚠️ Upgrade Notes

  • Dependency versions: This release widens the accepted versions for plotly and scikit-learn. If your environment pins versions outside these ranges, you may need to update them.
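  • No breaking API changes: All existing code should continue to work without modification, and new features are additive (new parameters with backward-compatible defaults). Note, however, that the default model in estimate_multibatch_models is now Histogram Gradient Boosting, so results may differ from earlier releases; pass model_type='random_forest' to select Random Forest explicitly.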

Installation

pip install --upgrade dashi