
Resolve AppVeyor xfails #393

Closed
bbengfort opened this issue Apr 23, 2018 · 2 comments
Labels
os: windows (specifically related to Windows)
priority: low (no particular rush in addressing)
type: technical debt (work to optimize or generalize code)

Comments

@bbengfort
Member

In #386 we added an AppVeyor configuration, but we mostly resolved image-comparison failures by marking them as xfail (thinking they were the product of different operating systems producing different renderings of the images).

In reality, a number of images have some inherent variability, and their tolerances have been increased to the point where the tests no longer verify anything meaningful (see any tolerance >= 10 in the code).

Once we get #379 working, we can start to diagnose these issues in detail and hopefully begin removing the xfail markers so that our tests run on all platforms.
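To make the tolerance concern concrete, here is a minimal sketch of an RMSE-based image comparison of the kind these tests rely on (the helper names are hypothetical, not yellowbrick's actual API): with a tolerance as loose as 10, even visibly different images pass.

```python
import numpy as np

def rmse(expected, actual):
    """Root-mean-square error between two same-shaped image arrays."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((expected - actual) ** 2)))

def images_similar(expected, actual, tol=0.01):
    """Pass the comparison when the RMSE is within tolerance."""
    return rmse(expected, actual) <= tol

# Identical images always pass; a small perturbation fails a tight
# tolerance but sails through a loose one like tol=10.
base = np.zeros((4, 4))
noisy = base + 0.5
```

This is why a tolerance of >= 10 effectively disables the test: almost any rendering difference produces an RMSE well under that threshold.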

@nickpowersys
Contributor

nickpowersys commented Jun 9, 2019

About 2/3 of xfails and/or Windows checks can be removed completely, without any win_tol

After removing the x and y labels and legends in #823, I analyzed the number of tests on AppVeyor with failing image comparisons. For analysis purposes only, I modified the Windows checks so that they did not lead to xfails (replacing win32 with a nonsense string), allowing them to fail whenever the RMSE exceeded the default value of tol.

Based on the passing and failing image comparisons, 55 of the checks for sys.platform == "win32" are no longer needed, given the updates to the baseline images with x and y labels and legends removed in #823.
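As a sketch of the pattern being removed (win_tol and the helper name here are assumptions for illustration, not the project's exact code), each such check picks a looser tolerance only on Windows:

```python
import sys

def choose_tolerance(default_tol=0.01, win_tol=None, platform=None):
    """Return win_tol on Windows when one is given, else the default.

    `platform` defaults to sys.platform; it is a parameter here only so
    the behavior can be exercised off-Windows.
    """
    platform = sys.platform if platform is None else platform
    if win_tol is not None and platform == "win32":
        return win_tol
    return default_tol
```

Dropping the win32 branch for the 55 passing tests leaves them comparing against a single default tolerance on every platform.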

Remaining 1/3 currently xfail on both Windows and Linux conda

The 31 other cases are xfailing because they contain text beyond the types already removed. They xfail with the reason IS_WINDOWS_OR_CONDA from issue #892, as they fail in both Windows and conda environments.
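A minimal sketch of the condition behind that xfail reason (the detection logic here is an assumption for illustration; yellowbrick's actual check in #892 may differ):

```python
import os
import sys

def is_windows_or_conda(platform=None, environ=None):
    """True on Windows or inside a conda environment (hypothetical check)."""
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    return platform == "win32" or "CONDA_PREFIX" in environ

# A test would then carry a marker such as:
#   @pytest.mark.xfail(is_windows_or_conda(), reason="IS_WINDOWS_OR_CONDA")
```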

Freetype can address Linux conda and Windows conda

The matplotlib testing package has been validated on AppVeyor Miniconda and Travis Miniconda CI builds. There is currently only a conda package.

Any test marked "P" below is passing, so its Windows check and any xfail can be removed.

Of the two Python distributions, conda tends to have a slightly higher RMSE, so it is shown first.

Shown below are P or F (pass or fail), the test name or image, and the RMSEs if failing.

Builds for the analysis:

PyPI/pip Python: https://ci.appveyor.com/project/nickpowersys/yellowbrick/builds/25047957
Anaconda: https://ci.appveyor.com/project/nickpowersys/yellowbrick/builds/25049306

P(ass)/F(ail), Test, conda RMSE, PyPI Python RMSE

tests/test_base.py
F,test_draw_visualizer_grid,8.102,7.905
F,test_draw_with_rows,6.764,6.486
F,test_draw_with_cols,5.387,5.221

test_classifier/test_classification_report.py
P,test_multiclass_class_report
P,test_pandas_integration

test_classifier/test_confusion_matrix.py
P,test_confusion_matrix
P,test_no_classes_provided
F,test_pandas_integration,3.193,3.176
F,test_quick_method,2.417,2.417

test_classifier/test_prcurve.py
P,test_binary_probability
F,test_binary_probability_decision,4.274,4.274
P,test_binary_decision
F,test_custom_iso_f1_scores,4.181,4.181
F,test_multiclass_probability,4.729,4.729
P,test_multiclass_probability_with_class_labels
F,test_quick_method,4.725,4.725
P,test_no_scoring_function
P,test_quick_method_with_test_set

test_classifier/test_threshold.py
P,test_binary_discrimination_threshold
P,test_pandas_integration
P,test_binary_discrimination_threshold_alt_args

test_cluster/test_elbow.py
F,test_distortion_metric,0.024,NA
P,test_silhouette_metric
P,test_calinski_harabaz_metric
F,test_timings,2.249,0.698

test_cluster/test_icdm.py
F,test_kmeans_mds,4.641,4.211
F,test_affinity_tsne_no_legend,1.557,1.557
F,test_quick_method,1.48,1.48

test_cluster/test_silhouette.py
P,test_integrated_kmeans_silhouette
P,test_integrated_mini_batch_kmeans_silhouette
P,test_colormap_silhoutte
P,test_colors_silhouette

test_contrib/test_classifier/test_boundaries.py
P,test_real_data_set_viz

tests/test_features/test_jointplot.py
F,test_columns_none_x_y,2.471,NA
F,test_columns_single_int_index_numpy,2.787,NA
F,test_columns_single_str_index_pandas,4.513,NA
F,test_columns_single_int_index_numpy_hist,5.469,NA
F,test_columns_single_str_index_pandas_hist,4.835,NA

test_features/test_pca.py
P,test_pca_decomposition_quick_method
P,test_scale_true_2d
P,test_biplot_2d
P,test_biplot_3d

test_features/test_radviz.py
F,test_integrated_radviz_with_pandas,0.272,1.991
P,test_integrated_radviz_with_numpy
P,test_integrated_radviz_pandas_classes_features
P,test_integrated_radviz_numpy_classes_features

test_features/test_rankd.py
F,test_rank2d_pearson,1.339,0.494
F,test_rank2d_covariance,1.339,0.494
F,test_rank2d_spearman,1.339,0.494
F,test_rank2d_kendalltau,1.339,0.494
F,test_rank2d_integrated_pandas,1.339,0.494
F,test_rank2d_integrated_numpy,1.339,0.494

test_features/test_rfecv.py
P,test_rfecv_classification
P,test_quick_method
P,test_pandas_integration
P,test_numpy_integration

test_model_selection/test_learning_curve.py
P,test_classifier
P,test_regressor
P,test_quick_method
P,test_pandas_integration

test_model_selection/test_validation_curve.py
P,test_classifier
P,test_quick_method
P,test_pandas_integration

test_regressor/test_alphas.py
P,test_similar_image

test_regressor/test_residuals.py
F,test_residuals_plot,1.410,1.384
P,test_residuals_plot_no_histogram
F,test_residuals_quick_method,1.384,1.384
F,test_residuals_plot_pandas,1.411,1.411

test_target/test_feature_correlation.py
P,test_feature_correlation_integrated_pearson
P,test_feature_correlation_integrated_mutual_info_regression
P,test_feature_correlation_integrated_mutual_info_classification
P,test_feature_correlation_quick_method

test_text/test_freqdist.py
F,test_integrated_freqdist,1.401

test_text/test_umap.py
P,test_make_pipeline
P,test_integrated_umap
P,test_sklearn_umap_size
P,test_sklearn_umap_title
P,test_custom_title_umap
P,test_custom_size_umap
P,test_custom_colors_umap
P,test_make_classification_umap
P,test_make_classification_umap_class_labels
P,test_umap_mismatched_labels
P,test_no_target_umap
P,test_visualizer_with_pandas
P,test_alpha_param

@bbengfort
Member Author

We've just settled into a mode of xfail handling for the different environments, so I'm going to go ahead and close this since it's not on our roadmap.
