
Resolve AppVeyor xfails #393

Closed
bbengfort opened this issue Apr 23, 2018 · 2 comments
Labels
os: windows (specifically related to Windows)
priority: low (no particular rush in addressing)
type: technical debt (work to optimize or generalize code)

Comments

@bbengfort
Member

In #386 we added an AppVeyor configuration, but we mostly resolved image-comparison failures by marking them as xfail (thinking they were the product of different operating systems producing different renderings of the images).

In reality, a number of images have some inherent variability, and their tolerances have been increased to the point where the tests no longer verify anything meaningful (see any tolerance >= 10 in the code).

Once we get #379 working, we can start to diagnose these issues in detail and hopefully begin removing the xfail markers so that our tests run on all platforms.
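To make the tolerance concern concrete, here is a minimal sketch of an RMSE-based image comparison of the kind these tests rely on (the helper names are hypothetical, not yellowbrick's actual API): with a tolerance as loose as 10, even visibly different images pass.

```python
import numpy as np

def rmse(expected, actual):
    """Root-mean-square error between two same-shaped image arrays."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((expected - actual) ** 2)))

def images_similar(expected, actual, tol=0.01):
    """Pass the comparison when the RMSE is within tolerance."""
    return rmse(expected, actual) <= tol

# Identical images always pass; a small perturbation fails a tight
# tolerance but sails through a loose one like tol=10.
base = np.zeros((4, 4))
noisy = base + 0.5
```

This is why a tolerance of >= 10 effectively disables the test: almost any rendering difference produces an RMSE well under that threshold.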

@nickpowersys
Contributor

nickpowersys commented Jun 9, 2019

About 2/3 of xfails and/or Windows checks can be removed completely, without any win_tol

After removing the x and y labels and legends in #823, I analyzed the number of tests on AppVeyor with failing image comparisons. For analysis purposes only, I modified the Windows checks so that they did not lead to xfails (replacing win32 with a nonsense string), allowing them to fail whenever the RMSE exceeded the default value of tol.

Based on the passing and failing image comparisons, 55 of the checks for sys.platform == "win32" are no longer needed, given the updates to the baseline images with x and y labels and legends removed in #823.
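As a sketch of the pattern being removed (win_tol and the helper name here are assumptions for illustration, not the project's exact code), each such check picks a looser tolerance only on Windows:

```python
import sys

def choose_tolerance(default_tol=0.01, win_tol=None, platform=None):
    """Return win_tol on Windows when one is given, else the default.

    `platform` defaults to sys.platform; it is a parameter here only so
    the behavior can be exercised off-Windows.
    """
    platform = sys.platform if platform is None else platform
    if win_tol is not None and platform == "win32":
        return win_tol
    return default_tol
```

Dropping the win32 branch for the 55 passing tests leaves them comparing against a single default tolerance on every platform.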

Remaining 1/3 currently xfail on both Windows and Linux conda

The 31 other cases are xfailing because they contain text beyond the types already removed. They xfail with the reason IS_WINDOWS_OR_CONDA from issue #892, as they fail in both Windows and conda environments.
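A minimal sketch of the condition behind that xfail reason (the detection logic here is an assumption for illustration; yellowbrick's actual check in #892 may differ):

```python
import os
import sys

def is_windows_or_conda(platform=None, environ=None):
    """True on Windows or inside a conda environment (hypothetical check)."""
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    return platform == "win32" or "CONDA_PREFIX" in environ

# A test would then carry a marker such as:
#   @pytest.mark.xfail(is_windows_or_conda(), reason="IS_WINDOWS_OR_CONDA")
```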

Freetype can address Linux conda and Windows conda

The matplotlib testing package has been validated on AppVeyor Miniconda and Travis Miniconda CI builds. There is currently only a conda package.

Any test marked "P" below is passing, so its Windows check and any xfail can be removed.

Of the two Python distributions, conda tends to have a slightly higher RMSE, so it is shown first.

Shown below are P or F (pass or fail), the test name or image, and the RMSEs if failing.

Builds for the analysis:

PyPI/pip Python: https://ci.appveyor.com/project/nickpowersys/yellowbrick/builds/25047957
Anaconda: https://ci.appveyor.com/project/nickpowersys/yellowbrick/builds/25049306

P(ass)/F(ail), Test, conda RMSE, PyPI Python RMSE

tests/test_base.py
F,test_draw_visualizer_grid,8.102,7.905
F,test_draw_with_rows,6.764,6.486
F,test_draw_with_cols,5.387,5.221

test_classifier/test_classification_report.py
P,test_multiclass_class_report
P,test_pandas_integration

test_classifier/test_confusion_matrix.py
P,test_confusion_matrix
P,test_no_classes_provided
F,test_pandas_integration,3.193,3.176
F,test_quick_method,2.417,2.417

test_classifier/test_prcurve.py
P,test_binary_probability
F,test_binary_probability_decision,4.274,4.274
P,test_binary_decision
F,test_custom_iso_f1_scores,4.181,4.181
F,test_multiclass_probability,4.729,4.729
P,test_multiclass_probability_with_class_labels
F,test_quick_method,4.725,4.725
P,test_no_scoring_function
P,test_quick_method_with_test_set

test_classifier/test_threshold.py
P,test_binary_discrimination_threshold
P,test_pandas_integration
P,test_binary_discrimination_threshold_alt_args

test_cluster/test_elbow.py
F,test_distortion_metric,0.024,NA
P,test_silhouette_metric
P,test_calinski_harabaz_metric
F,test_timings,2.249,0.698

test_cluster/test_icdm.py
F,test_kmeans_mds,4.641,4.211
F,test_affinity_tsne_no_legend,1.557,1.557
F,test_quick_method,1.48,1.48

test_cluster/test_silhouette.py
P,test_integrated_kmeans_silhouette
P,test_integrated_mini_batch_kmeans_silhouette
P,test_colormap_silhoutte
P,test_colors_silhouette

test_contrib/test_classifier/test_boundaries.py
P,test_real_data_set_viz

tests/test_features/test_jointplot.py
F,test_columns_none_x_y,2.471,NA
F,test_columns_single_int_index_numpy,2.787,NA
F,test_columns_single_str_index_pandas,4.513,NA
F,test_columns_single_int_index_numpy_hist,5.469,NA
F,test_columns_single_str_index_pandas_hist,4.835,NA

test_features/test_pca.py
P,test_pca_decomposition_quick_method
P,test_scale_true_2d
P,test_biplot_2d
P,test_biplot_3d

test_features/test_radviz.py
F,test_integrated_radviz_with_pandas,0.272,1.991
P,test_integrated_radviz_with_numpy
P,test_integrated_radviz_pandas_classes_features
P,test_integrated_radviz_numpy_classes_features

test_features/test_rankd.py
F,test_rank2d_pearson,1.339,0.494
F,test_rank2d_covariance,1.339,0.494
F,test_rank2d_spearman,1.339,0.494
F,test_rank2d_kendalltau,1.339,0.494
F,test_rank2d_integrated_pandas,1.339,0.494
F,test_rank2d_integrated_numpy,1.339,0.494

test_features/test_rfecv.py
P,test_rfecv_classification
P,test_quick_method
P,test_pandas_integration
P,test_numpy_integration

test_model_selection/test_learning_curve.py
P,test_classifier
P,test_regressor
P,test_quick_method
P,test_pandas_integration

test_model_selection/test_validation_curve.py
P,test_classifier
P,test_quick_method
P,test_pandas_integration

test_regressor/test_alphas.py
P,test_similar_image

test_regressor/test_residuals.py
F,test_residuals_plot,1.410,1.384
P,test_residuals_plot_no_histogram
F,test_residuals_quick_method,1.384,1.384
F,test_residuals_plot_pandas,1.411,1.411

test_target/test_feature_correlation.py
P,test_feature_correlation_integrated_pearson
P,test_feature_correlation_integrated_mutual_info_regression
P,test_feature_correlation_integrated_mutual_info_classification
P,test_feature_correlation_quick_method

test_text/test_freqdist.py
F,test_integrated_freqdist,1.401

test_text/test_umap.py
P,test_make_pipeline
P,test_integrated_umap
P,test_sklearn_umap_size
P,test_sklearn_umap_title
P,test_custom_title_umap
P,test_custom_size_umap
P,test_custom_colors_umap
P,test_make_classification_umap
P,test_make_classification_umap_class_labels
P,test_umap_mismatched_labels
P,test_no_target_umap
P,test_visualizer_with_pandas
P,test_alpha_param

@bbengfort
Member Author

We've just settled into a mode of xfail handling for the different environments, so I'm going to go ahead and close this since it's not on our roadmap.
