Changelog

development

  • added a way to pass sample_weight to loss functions in model_parts() (variable importance) using weights from dx.Explainer (#563)
  • fixed the visualization of shap_wrapper for shap==0.45.0
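
The sample_weight entry above means a loss function can receive per-observation weights taken from the Explainer. The sketch below is a minimal illustration under that assumption. It takes observed values, predictions and an optional sample_weight. The helper name weighted_rmse and its exact signature are hypothetical and are not dalex's API. A weighted root mean square error could look like this:

```python
import math
from typing import Optional, Sequence


def weighted_rmse(observed: Sequence[float],
                  predicted: Sequence[float],
                  sample_weight: Optional[Sequence[float]] = None) -> float:
    """Hypothetical weighted RMSE; reduces to plain RMSE when no weights are given."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if sample_weight is None:
        sample_weight = [1.0] * len(observed)
    if len(sample_weight) != len(observed):
        raise ValueError("sample_weight must match the number of observations")
    total_weight = sum(sample_weight)
    if total_weight <= 0:
        raise ValueError("sample_weight must sum to a positive value")
    # Weighted mean of the squared residuals, followed by the square root
    weighted_sq = sum(w * (y - p) ** 2
                      for y, p, w in zip(observed, predicted, sample_weight))
    return math.sqrt(weighted_sq / total_weight)


# Unweighted: residuals 1 and 3 give sqrt((1 + 9) / 2) = sqrt(5)
print(weighted_rmse([0.0, 0.0], [1.0, 3.0]))
# Weighting the first observation 3x gives sqrt((3*1 + 1*9) / 4) = sqrt(3)
print(weighted_rmse([0.0, 0.0], [1.0, 3.0], sample_weight=[3.0, 1.0]))
```

With equal weights the result matches ordinary RMSE, so passing no weights keeps the previous behaviour.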

v1.7.0 (2024-02-28)

  • increase the dependencies to python>=3.8, pandas>=1.5.0, numpy>=1.23.3 and add python==3.11 to CI
  • added keras.src.models.sequential.Sequential to classes with a known predict_function; this should address the changes in keras==3.0.0 and tensorflow==2.16.0
  • turn off verbose in the predict method of tensorflow/keras models that changed in tensorflow>=2.9.0
  • update the warning occurring when specifying variable_splits (#558)
  • fix an error occurring in predict_profile() when a DataFrame has MultiIndex in pandas>=1.3.0 (#550)
  • fix gaussian norm() calculation in model_profile() from pi*sqrt(2) to sqrt(2*pi)
  • fix a warning (future error) between prepare_numerical_categorical() and prepare_x() with pandas==2.1.0
  • fix a warning (future error) concerning the default value of numeric_only in pandas.DataFrame.corr() in dalex.aspect.calculate_assoc_matrix()
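
The norm() fix in model_profile() matters because the Gaussian kernel exp(-x^2 / (2 sigma^2)) integrates to sigma * sqrt(2*pi). Only dividing by that value yields a density that integrates to one. The sketch below checks this numerically with a plain midpoint Riemann sum. It does not use dalex's code. The bandwidth sigma and the integration range are illustrative assumptions.

```python
import math


def gaussian_kernel(x: float, sigma: float, norm: float) -> float:
    """exp(-x^2 / (2 sigma^2)) divided by sigma * norm."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * norm)


def integrate(norm: float, sigma: float = 0.5,
              half_width: float = 10.0, steps: int = 100_000) -> float:
    """Midpoint Riemann sum of the kernel over [-half_width, half_width]."""
    dx = 2 * half_width / steps
    return sum(gaussian_kernel(-half_width + (i + 0.5) * dx, sigma, norm) * dx
               for i in range(steps))


correct = integrate(math.sqrt(2 * math.pi))   # approximately 1
previous = integrate(math.pi * math.sqrt(2))  # approximately 1 / sqrt(pi), about 0.564
print(round(correct, 6), round(previous, 6))
```

With the old constant pi*sqrt(2), the smoothing weights are scaled by sqrt(2*pi) / (pi*sqrt(2)) = 1/sqrt(pi). They then no longer form a proper density.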

v1.6.0 (2023-02-16)

  • add ZeroDivisionError to precision and recall functions (#532)
  • add a warning to calculate_depend_matrix() when there is a variable with only one value (#537)
  • fix missing EDA plots in (Python) Arena (#544)
  • fix baseline positions in the subplots of the predict parts explanations: BreakDown, Shap (#545)

v1.5.0 (2022-09-07)

This release consists mostly of maintenance updates and, after a year, marks the Beta -> Stable release.

  • increase the dependency from python>=3.6 to python>=3.7 (at this moment, both numpy and pandas depend on python>=3.8), and add python==3.10 to CI
  • increase the dependencies to pandas>=1.2.5, numpy>=1.20.3 (#526), scipy>=1.6.3, plotly>=5.1.0, and tqdm>=4.61.2 due to errors with pandas (see tqdm/#1199)
  • remove the use of pd.Series.append() (#489)
  • remove the use of np.isnan causing error in dalex.fairness (#491)
  • fix iBreakDown plot y-axis labels (#493)
  • stop the Arena's werkzeug server using a cleaner and still supported API (#518)

v1.4.1 (2021-11-08)

features

  • added fairness plot for regression models to Arena (dalex/#408)
  • added new facet_scales parameter to AP.plot and CP.plot, which allows freeing the y-axis with facet_scales="free" (dalex/#469); consistent with R (DALEX/#468, ingredients/#140)

fixes

  • fixed AP and CP progress bars

v1.4.0 (2021-09-09)

  • added new aspect module, which will focus on groups of dependent variables @krzyzinskim & @arturzolkowski
  • added new scipy>=1.5.4 dependency

breaking changes

  • improved the calculation of AUC and the ROC plot (#459)

fixes

  • wrong y-axis labels in VariableImportance.plot(split="variable") (#451)
  • repr_html() didn't work for explanation objects before using the fit method (#449)

features

  • added new Aspect object with the predict_triplot, model_triplot, predict_parts, model_parts, get_aspects methods
  • added new PredictTriplot, ModelTriplot, PredictAspectImportance, ModelAspectImportance objects with the plot method

v1.3.0 (2021-07-17)

features

  • added bias mitigation techniques (resample, reweight, roc_pivot) into the fairness module (#432)

v1.2.0 (2021-05-31)

breaking changes

  • method set_options in Arena now takes option_category instead of plot_type (SHAPValues => ShapleyValues, FeatureImportance => VariableImportance) (#420)
  • methods using the N parameter now properly sample rows from data

fixes

  • fixed wrong error value when no predict_function is found in Explainer (77ca90d)
  • set multiprocessing context to 'spawn' (#412)
  • fixed bug in metric_scores plot that made only one subgroup appear on y-axis (#416)
  • added support for older keras models (#415)

features

  • added a resource mechanism to Arena (#419)
  • added ShapleyValuesImportance and ShapleyValuesDependence plots to Arena (#420)
  • return error instead of NaN when AUC is calculated on observations from one class only (#415)

v1.1.0 (2021-04-18)

breaking changes

  • fixed concurrent random seeds when processes > 1 (#392), which means that the results of parallel computation will vary between v1.1.0 and previous versions
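
The seeding fix above concerns parallel workers that should not draw identical random numbers. The sketch below is only a generic illustration and is not dalex's internal code. It assumes the problem was workers sharing a seed. A common remedy derives one distinct seed per worker from a single base seed, so that runs stay reproducible while the workers' streams differ. The helper names spawn_worker_seeds and worker_draws are hypothetical.

```python
import random
from typing import List


def spawn_worker_seeds(base_seed: int, processes: int) -> List[int]:
    """Hypothetical helper: derive distinct, reproducible seeds from one base seed."""
    master = random.Random(base_seed)
    # sample() draws without replacement, so no two workers share a seed
    return master.sample(range(2 ** 31), processes)


def worker_draws(seed: int, n: int = 3) -> List[float]:
    """Each worker owns its generator, so its stream does not depend on scheduling."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


seeds = spawn_worker_seeds(base_seed=42, processes=4)
streams = [worker_draws(s) for s in seeds]
print(len(set(seeds)) == 4)                 # distinct seeds for distinct workers
print(spawn_worker_seeds(42, 4) == seeds)   # same base seed, same worker seeds
```

Because the per-worker streams change once each worker gets its own seed, parallel results are expected to differ from those produced before such a change, as the entry notes.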

fixes

  • GroupFairnessX.plot(type='fairness_check') generates ticks according to the x-axis range (#409)
  • GroupFairnessRegression.plot(type='density') has a more readable hover - only for outliers (#409)
  • BreakDown.plot() wrongly displayed the "+all factors" bar when max_vars < p (#401)
  • GroupFairnessClassification.plot(type='metric_scores') did not handle NaNs (#399)

features

  • Experimental support for regression models in the fairness module. Added GroupFairnessRegression object, with the plot method having two types: fairness_check and density. Explainer.model_fairness method now depends on the model_type attribute. (#391)
  • added N parameter to the predict_parts method which is None by default (#402)
  • epsilon is now an argument of the GroupFairnessClassification object (#397)

v1.0.1 (2021-02-19)

fixes

  • fixed broken range on y-axis in fairness_check plot (#376)
  • fixed warnings caused by np.float being deprecated since numpy v1.20 (#384)

other

  • added ipython to test dependencies

v1.0.0 (2020-12-29)

breaking changes

These are summed up in (#368):

  • rename modules: dataset_level into model_explanations, instance_level into predict_explanations, _arena module into arena
  • use __dir__ method to define autocompletion in IPython environment - show only ['Explainer', 'Arena', 'fairness', 'datasets']
  • add plot method and result attribute to LimeExplanation (use lime.explanation.Explanation.as_pyplot_figure() and lime.explanation.Explanation.as_list())
  • CeterisParibus.plot(variable_type='categorical') now has horizontal barplots - horizontal_spacing=None by default (depends on variable_type). Also, once again added the "dot" for the observation value.
  • predict_fn in predict_surrogate now uses predict_function (trying to make it work for more frameworks)

fixes

  • fixed wrong verbose output when any value in y_hat/residuals was an int instead of a float
  • added proper "-" sign to negative dropout losses in VariableImportance.plot

features

  • added geom='bars' to AggregateProfiles.plot to force the categorical plot
  • added geom='roc' and geom='lift' to ModelPerformance.plot
  • added Fairness plot to Arena

other

  • remove colorize from Explainer
  • updated the documentation, refactored code (import modules not functions, unify variable names in object.py, move utils functions from checks.py to utils.py, etc.)
  • added license notice next to data

v0.4.1 (2020-12-03)

  • added support for h2o.estimators.* (#332)
  • added tensorflow.python.keras.engine.functional.Functional to the tensorflow list
  • updated the plotly dependency to >=4.12.0
  • code maintenance: yhat, check_data

fixes

  • fixed check_if_empty_fields() used in loading the Explainer from a pickle file, since several checks were changed
  • fixed plot() method in GroupFairnessClassification as it omitted plotting a metric when NaN was present in metric ratios (result)
  • fixed the dragons and HR datasets having a , delimiter instead of ., which transformed numerical columns into categorical ones
  • fixed representation of the ShapWrapper class (removed _repr_html_ method)

features

  • allow for y to be a pandas.DataFrame (converted)
  • allow for data, y to be an H2OFrame (converted)
  • added label parameter to all the relevant dx.Explainer methods, which overrides the default label in explanation's result
  • now using GradientExplainer for tf.keras.engine.sequential.Sequential, added proper warning when shap_explainer_type is None (#366)

defaults

  • unify verbose output of Explainer

v0.4.0 (2020-11-17)

  • added new arena module, which adds the backend for Arena dashboard @piotrpiatyszek

features

  • added new aliases to dx.Explainer methods (#350) in model_parts it is {'permutational': 'variable_importance', 'feature_importance': 'variable_importance'}, in model_profile it is {'pdp': 'partial', 'ale': 'accumulated'}
  • added Arena object for dashboard backend. See https://github.com/ModelOriented/Arena
  • new fairness plot types: stacked, radar, performance_and_fairness, heatmap, ceteris_paribus_cutoff
  • upgraded fairness_check()
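
The aliases added in #350 map alternative type names onto canonical ones. The minimal sketch below performs that lookup with the two dictionaries listed in the entry above. The resolve_type helper is hypothetical and is not part of dalex.

```python
from typing import Dict

# Alias tables as listed in the v0.4.0 changelog entry
MODEL_PARTS_ALIASES: Dict[str, str] = {
    "permutational": "variable_importance",
    "feature_importance": "variable_importance",
}
MODEL_PROFILE_ALIASES: Dict[str, str] = {"pdp": "partial", "ale": "accumulated"}


def resolve_type(requested: str, aliases: Dict[str, str]) -> str:
    """Hypothetical helper: return the canonical type name, or the input unchanged."""
    return aliases.get(requested, requested)


print(resolve_type("pdp", MODEL_PROFILE_ALIASES))          # partial
print(resolve_type("accumulated", MODEL_PROFILE_ALIASES))  # accumulated
print(resolve_type("permutational", MODEL_PARTS_ALIASES))  # variable_importance
```

Under these mappings, a model_profile call with type='pdp' should behave like one with type='partial', and type='ale' like type='accumulated'.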

v0.3.0 (2020-10-26)

  • added new fairness module, which will focus on bias detection, visualization and mitigation @jakwisn

fixes

  • removed unnecessary warning when precalculate=False and verbose=False (#340)

features

  • added model_fairness method to the Explainer, which performs fairness explanation
  • added GroupFairnessClassification object, with the plot method having two types: fairness_check and metric_scores

defaults

  • added the N=50000 argument to ResidualDiagnostics.plot, which samples observations from the result parameter to avoid performance issues when smooth=True (#341)

v0.2.2 (2020-09-21)

  • added support for tensorflow.python.keras.engine.sequential.Sequential and tensorflow.python.keras.engine.training.Model (#326)
  • updated the tqdm dependency to >=4.48.2, pandas dependency to >=1.1.2 and numpy dependency to >=1.18.4

fixes

  • fixed the wrong order of Explainer verbose messages
  • fixed a bug that caused model_info parameter to be overwritten by the default values
  • fixed a bug occurring when the variable from groups was not of str type (#327)
  • fixed model_profile: variable_type='categorical' not working when user passed variables parameter (#329) + the reverse order of bars in 'categorical' plots + (again) added variable_splits_type parameter to model_profile to specify how grid points shall be calculated (#266) + allow for both 'quantile' and 'quantiles' types (alias)

features

  • added informative error messages when importing optional dependencies (#316)
  • allow for data and y to be None - added checks in Explainer methods

defaults

  • wrong parameter name title_x changed to y_title in CeterisParibus.plot and AggregatedProfiles.plot (#317)
  • now warning the user in Explainer when predict_function returns an error or doesn't return numpy.ndarray (1d) (#325)

v0.2.1 (2020-08-31)

  • updated the pandas dependency to >=1.1.0

fixes

  • ModelPerformance.plot now uses a drwhy color palette
  • use unique method instead of np.unique in variable_splits (#293)
  • v0.2.0 didn't export new datasets
  • fixed a bug where predict_parts(type='shap') calculated wrong contributions (#300)
  • model_profile uses observation mean instead of profile mean in _yhat_ centering
  • fixed barplot baseline in categorical model_profile and predict_profile plots (#297)
  • fixed model_profile(type='accumulated') giving wrong results (#302)
  • vertical/horizontal lines in plots now end on the plot edges

features

  • added new type='shap_wrapper' to predict_parts and model_parts methods, which returns a new ShapWrapper object. It contains the main result attribute (shapley_values) and the plot method (force_plot and summary_plot respectively). These come from the shap package
  • Explainer.predict method now accepts numpy.ndarray
  • added the ResidualDiagnostics object with a plot method
  • added model_diagnostics method to the Explainer, which performs residual diagnostics
  • added predict_surrogate method to the Explainer, which is a wrapper for the lime tabular explanation from the lime package
  • added model_surrogate method to the Explainer, which creates a basic surrogate decision tree or linear model from the black-box model using the scikit-learn package
  • added a _repr_html_ method to all of the explanation objects (it prints the result attribute)
  • added dalex.__version__
  • added informative error messages in Explainer methods when y is of wrong type (#294)
  • CeterisParibus.plot(variable_type='categorical') now allows for multiple observations
  • new verbose checks for model_type
  • add type to model_info in dump and dumps for R compatibility (#303)
  • ModelPerformance.result now has label as index

defaults

  • removed _grid_ column in AggregatedProfiles.result and center only works with type=accumulated
  • use Pipeline._final_estimator to extract model_class of the actual model
  • use model._estimator_type to extract model_type if possible

v0.2.0 (2020-08-07)

  • major documentation update (#270)
  • unified the order of function parameters

fixes

  • v0.1.9 had wrong _original_ column in predict_profile
  • vertical_spacing acts as intended in VariableImportance.plot when split='variable'
  • loss_function='auc' now uses loss_one_minus_auc as this should be a descending measure
  • plots are now saved with the original height and width
  • model_profile now properly passes the variables parameter to CeterisParibus
  • variables parameter in predict_profile now can also be a string

features

  • use plotly.express instead of core plotly to make model_profile and predict_profile plots, which improves performance and scalability
  • added a verbose parameter wherever tqdm is used to show a progress bar
  • added loss_one_minus_auc function that can be used with loss_function='1-auc' in model_parts
  • added new example data sets: apartments, dragons and hr
  • added color, opacity, title_x parameters to model_profile and predict_profile plots (#236), changed tooltips and legends (#262)
  • added geom='profiles' parameter to model_profile plot and raw_profiles attribute to AggregatedProfiles
  • added variable_splits_type parameter to predict_profile to specify how grid points shall be calculated (#266)
  • added variable_splits_with_obs parameter to predict_profile function to extend split points with observation variable values (#269)
  • added variable_splits parameter to model_profile
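
Permutation-based variable importance ranks variables by how much the loss grows after a variable is permuted. The loss must therefore decrease as the model improves, while AUC increases. This is why loss_function='auc' switched to a 1 - AUC loss. The minimal pure-Python sketch below computes AUC as the probability that a random positive is scored above a random negative, with ties counting one half. The function name mirrors the changelog, but this implementation is an assumption, not dalex's code. Following the v1.2.0 entry, it raises an error instead of returning NaN when only one class is present.

```python
from typing import Sequence


def loss_one_minus_auc(observed: Sequence[int], predicted: Sequence[float]) -> float:
    """1 - AUC, where AUC is the probability that a random positive outranks a random negative."""
    positives = [p for y, p in zip(observed, predicted) if y == 1]
    negatives = [p for y, p in zip(observed, predicted) if y == 0]
    if not positives or not negatives:
        # AUC is undefined for a single class; raise instead of returning NaN
        raise ValueError("AUC requires observations from both classes")
    # Count pairwise wins of positives over negatives; ties count as half a win
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    auc = wins / (len(positives) * len(negatives))
    return 1.0 - auc


print(loss_one_minus_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.25
print(loss_one_minus_auc([0, 1], [0.2, 0.9]))                   # 0.0
```

The pairwise loop is quadratic in the number of observations. A rank-based formula is the usual choice for large data, but the pairwise form makes the definition explicit.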

defaults

  • use different loss_function for classification and regression (#248)
  • models that use proba yhats now get model_type='classification' if it's not specified
  • use uniform way of grid points calculation in predict_profile and model_profile (see variable_splits_type parameter)
  • add the variable values of new_observation to variable_splits in predict_profile (see variable_splits_with_obs parameter)
  • use N=1000 in model_parts and N=300 in model_profile to comply with the R version
  • keep_raw_permutation is now set to False instead of None in model_parts
  • intercept parameter in model_profile is now named center

v0.1.9 (2020-07-01)

  • feature: added random_state parameter for predict_parts(type='shap') and model_profile for reproducible calculations
  • fix: fixed random_state parameter in model_parts
  • feature: multiprocessing added for: model_profile, model_parts, predict_profile and predict_parts(type='shap'), through the processes parameter
  • fix: significantly improved the speed of accumulated and conditional types in model_profile
  • bugfix: use pd.api.types.is_numeric_dtype() instead of np.issubdtype() to cover more types; e.g. it caused errors with string type
  • defaults: use pd.convert_dtypes() on the result of CeterisParibus to fix variable dtypes and later allow for a concatenation without the dtype conversion
  • fix: variables parameter now can be a single str value
  • fix: number rounding in predict_parts, model_parts (#245)
  • fix: CP calculations for models that take only variables as an input

v0.1.8 (2020-05-28)

  • bugfix: variable_splits parameter now works correctly in predict_profile
  • bugfix: fix baseline for 3+ models in AggregatedProfiles.plot (#234)
  • printing: now rounding numbers in Explainer messages
  • fix: minor checks fixes in instance_level
  • bugfix: AggregatedProfiles.plot now works with groups

v0.1.7 (2020-05-10)

  • feature: parameter N in model_profile can be set to None, to select all observations
  • input: groups and variable parameters in model_profile can be: str, list, numpy.ndarray, pandas.Series
  • fix: check_label returned only the first letter
  • bugfix: removed the conversion of all_variables to str in prepare_all_variables, which caused an error in model_profile (#214)
  • defaults: change numpy data variable names from numbers to strings

v0.1.6 (2020-04-30)

  • fix: change short_name encoding in fifa dataset (utf8->ascii)
  • fix: remove scipy dependency
  • defaults: default loss_root_mean_square in model parts changed to rmse
  • bugfix: checks related to new_observation in BreakDown, Shap, CeterisParibus now work for multiple inputs (#207)
  • bugfix: CeterisParibus.fit and CeterisParibus.plot now work for more types of new_observation.index, but won't work for a boolean type (#211)

v0.1.5 (2020-04-21)

  • feature: add xgboost package compatibility (#188)
  • feature: added model_class parameter to Explainer to handle wrapped models
  • feature: Explainer attribute model_info remembers if parameters are default
  • bugfix: variable_groups parameter now works correctly in model_parts
  • fix: changed parameter order in Explainer: model_type, model_info, colorize
  • documentation: model_parts documentation is updated
  • feature: new show parameter in plot methods that (if False) returns plotly Figure (#190)
  • feature: load_fifa() function which loads the preprocessed players_20 dataset
  • fix: CeterisParibus.plot tooltip

v0.1.4 (2020-04-14)

  • feature: new Explainer.residual method which uses residual_function to calculate residuals
  • feature: new dump and dumps methods for saving Explainer in a binary form; load and loads methods for loading Explainer from binary form
  • fix: Explainer constructor verbose text
  • bugfix: B:=B+1 - Shap now stores average results as B=0 and path results as B=1,2,...
  • bugfix: Explainer.model_performance method uses self.model_type when model_type is None
  • bugfix: values in BreakDown and Shap are now rounded to 4 significant places (#180)
  • bugfix: Shap by default uses path='average', sign column is properly updated and bars in plot are sorted by abs(contribution)

v0.1.3 (2020-04-10)

  • release of the dalex package
  • Explainer object with predict, predict_parts, predict_profile, model_performance, model_parts and model_profile methods
  • BreakDown, Shap, CeterisParibus, ModelPerformance, VariableImportance and AggregatedProfiles objects with a plot method
  • load_titanic() function which loads the titanic_imputed dataset