
Conda packages and image test-driven text removal for Miniconda on Appveyor for #744 and #690 #823

Merged
merged 15 commits into DistrictDataLabs:develop from nickpowersys:appveyorminiconda on Jun 7, 2019

Conversation

nickpowersys
Contributor

@nickpowersys nickpowersys commented Apr 21, 2019

Summary

To address the failing imports in Windows VMs with Miniconda:

The Matplotlib backend configuration has been addressed by #862. mpl.use('agg') has been working as intended, as explained in that PR.
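
For reference, a minimal sketch of that pattern: the backend is selected before pyplot is imported anywhere.

import matplotlib as mpl
mpl.use('agg')  # select the non-interactive Agg backend up front

import matplotlib.pyplot as plt  # later pyplot imports now use Agg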

To minimize the probability of image comparison tests failing in general:

  • removed x and y labels
  • changed default value of remove_legend to True
  • updated baseline images

Before the labels and legends were removed, approximately 25% of the Windows image comparison tests required an elevated value of tol in order to pass: 227 of the 894 image comparison tests carry a check on sys.platform == 'win32'.

When 'win32' is changed to 'wwwww' purely for analysis purposes (so the Windows-specific tol values are ignored), the number of failing tests with regular Python on Appveyor is 24, and with Miniconda it is 31. The number of Windows tests with RMSE values > 0.1 has therefore dropped by roughly 90%. Follow-up efforts can remove the instances of Windows-specific tol values.
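
For reference, the Windows-specific tolerance checks referred to above have roughly this shape (illustrative only; the exact tol values vary per test):

import sys

tol = 0.1
if sys.platform == 'win32':
    tol = 5.0  # elevated tolerance on Windows only (value illustrative)
# used inside a test as: self.assert_images_similar(visualizer, tol=tol)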

Builds for above analysis:
PyPI/pip Python: https://ci.appveyor.com/project/nickpowersys/yellowbrick/builds/25047957
Anaconda: https://ci.appveyor.com/project/nickpowersys/yellowbrick/builds/25049306

To create the conda environment, I used tests/requirements.txt; the conda packages were determined by the conda solver. The environment is named yellowbrick and is activated immediately after the packages are installed, so that all subsequent commands involving the environment run from within it.

Conda environments can generally be expected to be used primarily with conda packages, so using conda packages in the Miniconda builds helps verify the experience of conda users and developers through tests run in these environments, including behavior involving the backend.

@bbengfort
Member

@nickpowersys thank you so much for taking a look at this and diving in with the update! We don't have very many Windows/Anaconda users on the core contributor side, so we struggle with supporting it and really appreciate your PR. Also, I apologize if I end up asking basic questions for clarification as a result.

To summarize the solution to make sure we're on the same page - if using miniconda (identified by the MINICONDA_VERSION envvar), then running the tests in a conda environment ensures that the DLLs associated with the miniconda install are correctly loaded? If not using miniconda, then simply proceed with the pip/python mechanism, which has not been changed. The agg backend cropped up when making this change as per #690 - so the solution was to modify the configuration as per that discussion.

It looks like the tests are now failing due to increased RMSE in the conda environment. The failing tests all appear to have an elevated tolerance if sys.platform == win32. When we previously had access to a Windows machine we took a look at the image comparisons and found that there were font and aliasing issues that may have contributed to the error; would you be willing to do a minor spot check on the conda environment and see if that's also the case? Do you know if it is possible to check from within Python whether or not the Python version you're using is anaconda/miniconda?

To answer your questions:

  • We should probably handle the failing tests in this PR as they seem to be related to the conda environment; I'm hoping there is an easy configuration solution.
  • Documentation about locating the configuration would be helpful; I'm also willing to add the MPLBACKEND=agg to the tests to supersede all configurations, as suggested in Some YB tests fail with conda Python 3.7 & Matplotlib 3.x #690.

Thank you again for taking a look into this - it's a relief that we're moving forward and may be able to test miniconda on AppVeyor again!

@nickpowersys
Contributor Author

@bbengfort Thank you for your feedback! I'm glad this is useful. About the conda environment, imports and DLLs, and using pip for non-conda VMs: yes, that summary is correct.

I would be happy to look at aliasing that may be occurring. It may take me until next weekend to investigate.

It is possible to determine within Python whether a regular Python or Anaconda Python is being used. I could reply with more detail after ensuring the method is reliable.

@nickpowersys
Contributor Author

nickpowersys commented Apr 26, 2019

I tried uncommenting several settings for antialiasing in matplotlibrc, which did not result in any changes in the number of tests failing (16) on a Windows machine with a conda environment. I looked at a pair of images (test_binary_probability.png) and could not see aliasing.

It is possible to detect whether the active Python is in a conda environment using Python, based on whether a directory specific to conda installations ('conda-meta') is part of the active environment path.
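
A minimal sketch of that check (the helper name is illustrative, not necessarily what the test suite will use):

import os
import sys

def in_conda_environment():
    # conda installs (Anaconda/Miniconda and conda envs) ship a 'conda-meta'
    # directory directly under the environment prefix
    return os.path.isdir(os.path.join(sys.prefix, 'conda-meta'))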

Using an environment variable for the backend seems straightforward to apply for testing purposes.

@nickpowersys
Contributor Author

nickpowersys commented Apr 30, 2019

@bbengfort Would you recommend an approach of xfails for the failing tests in conda environments for now? For this PR, I would resolve those that are failing on Windows (there are 16). The RMSE values are well below the tol of 10 mentioned in #393.

To understand possible causes for the image comparison failures, I ran builds on Travis for Python 3.6 and 3.7 Miniconda environments with conda packages on Ubuntu Xenial, and image comparison tests failed there as well (those were the only types of failing tests on Linux).

Therefore, there seems to be something associated with conda packages or dependencies that is causing the comparisons to fail. The xfails would be based on the Anaconda interpreter being active.

@nickpowersys
Contributor Author

There is a possibility that Freetype is the reason. I can investigate this before going to xfails.

#690 (comment)

@nickpowersys
Contributor Author

nickpowersys commented May 9, 2019

While there are some font differences between baseline and test images in the failing tests, in general the baseline and test images contain no text at all. I will need to investigate further to see why text remains in the failing tests.

Before comparisons are done, the ImageComparison class has a cleanup() function that removes the title and ticks on the x, y and z axes with a NullFormatter https://github.com/DistrictDataLabs/yellowbrick/blob/develop/tests/base.py
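
Roughly, the cleanup does something like the following (a sketch only; see tests/base.py for the real implementation):

from matplotlib.ticker import NullFormatter

def cleanup(ax):
    # remove the title and suppress tick text so fonts cannot affect the comparison
    ax.set_title('')
    ax.xaxis.set_major_formatter(NullFormatter())
    ax.yaxis.set_major_formatter(NullFormatter())
    if hasattr(ax, 'zaxis'):  # only present on 3D axes
        ax.zaxis.set_major_formatter(NullFormatter())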

As an example of a passing test, the baseline image at https://github.com/DistrictDataLabs/yellowbrick/blob/develop/tests/baseline_images/test_classifier/test_class_prediction_error/test_class_prediction_error_quickmethod.png has all text removed. The images in the corresponding actual_images directory also contain no text.

@bbengfort
Member

@nickpowersys I'm sorry it's taken me so long to respond; we've been bogged down in PRs and issues and many of us were at PyCon so your messages got lost in all of the GitHub notices; I apologize.

I really do appreciate you taking a look at this for us. We really do want to support our Windows/Anaconda users but unfortunately, none of us are Windows/Anaconda users, so it makes it tricky.

To answer some of your earlier questions:

  • We do modify the rcParams in yellowbrick.style - perhaps these rcParam modifications are overriding what you're doing with anti-aliasing? I'd be happy to take a look if you mention the specific parameters.

  • Even though we do try to strip out as much text as possible, occasionally there is still some text that gets in. E.g. the numeric labels in classification reports. My sense is that text is probably where things are going wrong.

  • Thank you for investigating freetype - I do recommend the use of xfail if a conda environment is detected. Of course, we are hoping for a better solution but having many tests with conda is better than having None!

How can I assist you in getting this to the end with the xfails? Do you want to investigate the style overrides for anti-aliasing? Also, sorry if I missed this, but do you have the tests passing locally on your windows machine? I'd be interested in the difference between the baseline and actual images on Windows if you were looking at them.

@bbengfort
Member

@nickpowersys - on #690 there was some discussion about using the testing version of matplotlib to unify the freetype dependency with anaconda. It can be installed using conda-forge as follows:

conda install -c conda-forge/label/testing matplotlib

The version may be a bit behind (currently 3.0.2), but it might be worth it so we don't have to xfail all these tests. Would you be willing to try this in this PR?

@nickpowersys
Contributor Author

nickpowersys commented May 14, 2019

x and y labels were not being removed for the image comparison tests. The failing Windows Miniconda tests were at 16 after addressing the backend for conda. I will update the rest of the baseline images to remove the text using Linux.

Legends are not removed by default. [Note: This PR changed the default to True.] If you have any thoughts on removing legends either way, please let me know. I could look at Freetype if tests were still failing and there was a need to preserve legend text.

Aside: before using Windows on some projects, I used Ubuntu until hardware issues affected my previous laptop. Otherwise, I do development on a Mac.

@bbengfort
Member

@nickpowersys thanks for sticking with this! I realize that any changes to the image similarity tests generally affect every single baseline image. If you need help regenerating the images, please let @ndanielsen or me know!

@nickpowersys
Contributor Author

nickpowersys commented May 17, 2019

@bbengfort

I have

  • removed x and y labels for all image comparisons
  • removed legends by default if one exists
  • updated the baseline images

Then, there were 5 failing tests in tests/test_features/test_jointplot.py on the Miniconda builds. Within that test file, I marked these as xfail conditional on a conda environment, independent of platform.
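
The conditional markers have roughly this shape (the test name and reason string here are illustrative, not the exact ones committed):

import os
import sys
import pytest

IS_CONDA = os.path.isdir(os.path.join(sys.prefix, 'conda-meta'))

@pytest.mark.xfail(IS_CONDA, reason="image comparison RMSE differs in conda environments")
def test_jointplot_example():
    ...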

The diffs created through test_jointplot.py show vertical gridlines that were not aligned, and some subtle aliasing otherwise. I did not see yellowbrick.style imported in that module. This PR partially addresses #690.

@ndanielsen
Contributor

@nickpowersys thanks for pinging me. I'm getting caught up on this issue at the moment. I plan to deep dive into this over the next couple days.

@nickpowersys
Contributor Author

nickpowersys commented May 19, 2019

This PR helps to address conda on Travis and #690.

Miniconda on Travis: after the x and y labels were removed, 73 image comparison tests were still failing with conda on Linux. Therefore, there are also RMSE differences between conda environments and regular Python on Linux.

More analysis is needed to quantify the tests failing in Linux conda environments with the new baseline images that have legends removed; there should be significantly fewer failing image comparisons. An assessment can then determine whether the causes of the failures on Travis are the same as the causes of the Windows image comparison failures with RMSE > 0.1 on Appveyor.

@nickpowersys
Contributor Author

nickpowersys commented May 20, 2019

Freetype: I ran a Travis build after installing the testing version of matplotlib 3.0.2 into a conda environment. However, I saw no change in the number of tests passing/failing.

@ndanielsen
Contributor

@nickpowersys Thanks for your patience and comments over the weekend. Please accept my apologies that we were unable to immediately respond to them.

As I mentioned in #823 (comment), I'm doing a deep dive on these windows testing issues today, and hopefully I can offer some guidance soon.

Definitely want to acknowledge that you've done a ton of work on this issue, and we're grateful for it.

@ndanielsen
Contributor

ndanielsen commented May 21, 2019

@nickpowersys I spent a good chunk of today diving into several of the issues that this PR addresses. A similar issue also popped up in PR #854 due to changes in matplotlib==3.1.0, where it seems the backend can no longer be set programmatically as is done here:

matplotlib.use('Agg')

The matplotlibrc line in your appveyor config pointed me to a solution for the issue in #854.

I think that the best way to manage the backend for testing is to create a matplotlibrc file in the project root and set backend : agg inside it, as described in the docs. This might also be a more transparent way to set other attributes for testing, but more research on that would be required.
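
Not something that needs to go into the PR, but an easy way to confirm which rc file and backend the suite actually picks up when run from the project root:

import matplotlib

print(matplotlib.matplotlib_fname())  # path of the matplotlibrc actually loaded
print(matplotlib.get_backend())       # expected: agg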

When I did this on one of my branches, tests were passing on Travis, Appveyor, my MacBook, my Ubuntu box, and Windows 10 in a VirtualBox (the stars seemed to be aligned on this). With matplotlib==3.1.0 there were some failing image comparison tests in test_class_balance and test_rfecv, but that is expected behavior when dependencies get a version bump. When I generated new images, there were no problems.

What do you think of this as a solution? I know that you've done a ton of work on these issues, and I want you to get your credit as a contributor. If you're okay with this, do you mind opening a separate PR for #690 that adds a matplotlibrc file in the project root, allows matplotlib==3.1.0, and generates any new baseline images?

After that, let's put our heads together on a highly focused PR for #744 to address the miniconda issues. I like your clever solution for removing the x and y labels. I've also been tinkering with a separate windows_tolerance metric for when the test suite is run on Windows machines; I think that it is a better solution than an expected fail. Also, when I ran the test suite locally on my Ubuntu box in a miniconda environment, the image failure diff was the result of the images being rendered a few pixels off --- so there are some breadcrumbs to follow on this.

Thanks again for your efforts on these issues. A robust test suite and cross-platform CI are highly important for a growing library like yellowbrick.

@bbengfort
Member

@ndanielsen and @nickpowersys I know you guys are cranking away at this and I want to say thank you to you both! @ndanielsen is the original author of our image similarity tests ported from matplotlib, so you're in good hands now @nickpowersys - good luck and let me know if you need me to chime in at any point!

@ndanielsen
Contributor

@nickpowersys thanks for the response. I hope that you are enjoying the conference. They definitely take a lot out of you. I feel like I'm just starting to get my normal levels of energy back after spending about a week at pycon. =)

I am blocking off time to work on this and other yellowbrick testing-related issues on Friday.

For setting the MPL backend and related configuration for testing, I think that part of the solution needs to keep the contributor experience in mind in addition to the CI configuration. Whatever the end solution is, I think that we should strive to make running the test suite on Mac, Linux, and Windows as easy as possible, along the lines of this workflow:

git clone http://github.com/../yellowbrick.git
cd yellowbrick
python3 -m venv yb-dev && source yb-dev/bin/activate 
pip install -r tests/requirements.txt
make test

I think one of the first goals is to have a painless contributor experience, and we can extend the CI solution(s) around that, but I'm definitely not tied to any particular solution. =)

As for structuring this effort, let's try to keep the PRs small, focused, and minimally complex so that we can incrementally add value. The best approach might be to slowly "paint these problems into a corner" and (re)build cross-platform test suite support.

Again, thanks for your continued follow-up on these issues. This is probably one of the most important issues on the board, as it directly impacts how easy it is to maintain the project and to contribute to it.

Enjoy the rest of your conference. 🥇

@ndanielsen
Contributor

ndanielsen commented May 24, 2019

@nickpowersys I agree with your suggestion for limiting this just to appveyor.yml. Please go ahead =)

I hope it would be okay if we keep the scope and approach for #823 to specifying appveyor.yml so that a conda environment and conda packages are installed, and then using xfails for the failing tests. The number of xfails was limited to 5 after the x and y labels were removed. It would still require me to submit a commit or two to ensure that all tests and baseline images that are now part of develop are handled.

@nickpowersys nickpowersys changed the title Conda packages and 'agg' MPL backend for Miniconda on Appveyor for #744 and #690 Conda packages and image test-driven text removal for Miniconda on Appveyor for #744 and #690 May 28, 2019
@nickpowersys
Contributor Author

nickpowersys commented Jun 1, 2019

@ndanielsen I pulled develop through and including the PR for the added rankd test (#866), then did an interactive rebase (I confirmed that PR #866 is in the git log for the appveyorminiconda branch).

The updates reflect the removal of both x and y labels, as well as legends.

Before proceeding further, I wanted to confirm that it will be okay to push with a regular force to origin/appveyorminiconda (not force-with-lease). I would do git push origin appveyorminiconda -f. Whatever you recommend, I will do.

The only baseline_image that I did not understand and did not update (when I tried, the image comparison did not fail as intended) is test_scatter_image_fail. The test is passing with the existing baseline image, which still has labels and a legend.

Thank you!

@nickpowersys
Contributor Author

nickpowersys commented Jun 2, 2019

There were Appveyor build failures for regular Python and Miniconda due to test collection failures with tests/test_features/test_base.py.

By excluding pytest 4.6.0 via the requirement pytest>=4.2.0,!=4.6.0, test collection works again; the resulting build used pytest 4.6.1.

@nickpowersys
Contributor Author

nickpowersys commented Jun 3, 2019

Although tests pass with mpl 3.0.0, the reason it was excluded was that images broke silently. Therefore, the last commit in the PR recognizes this and maintains the exclusion.

Contributor

@ndanielsen ndanielsen left a comment

@nickpowersys great job on this so far.

This is really close. I'm getting this error when I run it on Ubuntu with pyenv python-3.6.3.

______________________ DispersionPlotTests.test_dispersion_plot_annotate_docs _______________________

self = <tests.test_text.test_dispersion.DispersionPlotTests testMethod=test_dispersion_plot_annotate_docs>

    def test_dispersion_plot_annotate_docs(self):
        """
        Assert no errors occur during DispersionPlot integration
        with annotate_docs parameter turned on
        """
        text = [doc.split() for doc in corpus.data]
        target_words = ['girl', 'she', 'boy', 'he', 'man']
    
        visualizer = DispersionPlot(target_words, annotate_docs=True)
        visualizer.fit(text)
        visualizer.ax.grid(False)
    
>       self.assert_images_similar(visualizer, tol=25)
E       yellowbrick.exceptions.ImageComparisonFailure: images not close (RMS 25.070)
E       tests/actual_images/test_text/test_dispersion/test_dispersion_plot_annotate_docs.png
E       	vs
E       tests/baseline_images/test_text/test_dispersion/test_dispersion_plot_annotate_docs.png

tests/test_text/test_dispersion.py:108: ImageComparisonFailure

Review threads: tests/base.py (resolved); tests/test_features/test_jointplot.py (outdated, resolved)
@ndanielsen
Contributor

For the record, here's the actual image drawn in Ubuntu and the error diff:

[Images: test_dispersion_plot_annotate_docs and test_dispersion_plot_annotate_docs-failed-diff]

@nickpowersys
Contributor Author

nickpowersys commented Jun 4, 2019 via email

@nickpowersys
Contributor Author

nickpowersys commented Jun 4, 2019

If I could merge from develop (I understand the conflict needs to be resolved first?), then I could regenerate the new baseline images for the tests added in test_draw.py, and the PR will hopefully be finalized with that update.

@ndanielsen
Contributor

@nickpowersys I'll see what I can do on this tomorrow.

@ndanielsen
Contributor

@nickpowersys I think that I figured this one out and got it passing in a PR on my personal fork.

Following the github instructions...

 $ git checkout -b nickpowersys-appveyorminiconda develop
 $ git pull git://github.com/nickpowersys/yellowbrick.git appveyorminiconda
 $ git checkout develop
 $ git merge --no-ff nickpowersys-appveyorminiconda

the only change I made was that instead of git push origin develop, I merged it back into the feature branch like this:

 $ git checkout appveyorminiconda
 $ git merge develop

Followed by a git push as appropriate.

Hope this helps!

if self.remove_legend:
    ...

if self.remove_labels:
    self.ax.set_xlabel("")
    self.ax.set_ylabel("")

Contributor

Just a note that this solves the xlabel/ylabel issue that has popped up in matplotlib 3.x.

@ndanielsen
Contributor

@nickpowersys I've encountered this git problem a couple of times where there are protected branches.

I think that the problem context is simply that tests/baseline_images/test_draw/test_manual_legend.png was updated ~2 days ago, and your current branch doesn't have those changes.

The solution is to merge in the development branch with a bit of git-fu and get tests passing without merge conflicts. As this PR changes 197 files (a large scope) and there are other PRs under active development, there is a lot of overhead and potential for conflicts in keeping it open.

Outside of that, thanks again for your work on this PR and others. I'm starting to actively reference some of the changes in other PRs with other contributors, and they have solved matplotlib 3.x problems in particular that we didn't know we had.

@nickpowersys
Contributor Author

I went through steps 1-6 (through git merge develop), then ran make test locally on Linux: 759 tests passed and none failed. Then I ran git push.

@nickpowersys
Contributor Author

📢 A huge thank you @ndanielsen!

@ndanielsen ndanielsen merged commit a39df0d into DistrictDataLabs:develop Jun 7, 2019
@ndanielsen
Contributor

@nickpowersys thanks for your big efforts on this! We sincerely hope to keep you as a contributor on yellowbrick!

Successfully merging this pull request may close these issues.

AppVeyor with Conda/Miniconda Broken