
Addressing issue #589: Adding alpha params to PCA #811

Merged: 5 commits into DistrictDataLabs:develop on Apr 14, 2019

Conversation

@naresh-bachwani (Contributor) commented Apr 11, 2019

The following changes were made to PCA, as referenced in #589.
This PR was partially reviewed in #807.

  1. Added a new parameter, alpha, to pca.py to specify the opacity.
  2. Modified test_pca.py to add a test for alpha.

The result is attached below:
[screenshot: Screenshot (144)]
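For illustration, a minimal usage sketch of the new parameter (toy data; the class and module path follow the diffs below, and poof() is assumed to be the draw call in this version of Yellowbrick):

```python
import numpy as np
from yellowbrick.features.pca import PCADecomposition

# toy data just to exercise the visualizer (hypothetical, not from this PR)
rng = np.random.RandomState(9932)
X = rng.rand(100, 5)
y = rng.randint(0, 3, 100)

# alpha controls the opacity of the scatter points (1.0 is fully opaque)
visualizer = PCADecomposition(scale=True, proj_dim=2, alpha=0.75)
visualizer.fit(X, y)
visualizer.transform(X)
visualizer.poof()
```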

@naresh-bachwani (Contributor, Author)

Hello @lwgray,
I have made the changes that you suggested in #807.

@lwgray (Contributor) left a comment

Great job on adding alpha. All my requested changes are very minor. I am unsure about the test you built, so I will wait for a comment from @rebeccabilbro.

@@ -69,6 +69,10 @@ class PCADecomposition(MultiFeatureVisualizer):
Optional string or matplotlib cmap to colorize lines.
Use either color to colorize the lines on a per class basis or
colormap to color them on a continuous scale.

alpha : float, default: 1
Contributor:

alpha should be set to 0.75

@@ -99,6 +103,7 @@ def __init__(self,
proj_features=False,
color=None,
colormap=palettes.DEFAULT_SEQUENCE,
alpha = 1,
Contributor:

set alpha to 0.75

Member:

Tiny thing, but the spacing is off here. We shouldn't have spaces before and after the equals sign for keyword arguments.

@@ -215,7 +221,7 @@ def finalize(self, **kwargs):

def pca_decomposition(X, y=None, ax=None, features=None, scale=True,
proj_dim=2, proj_features=False, color=None,
colormap=palettes.DEFAULT_SEQUENCE,
colormap=palettes.DEFAULT_SEQUENCE, alpha=1,
Contributor:

set alpha = 0.75

visualizer = PCADecomposition(**params).fit(self.dataset.X)
pca_array = visualizer.transform(self.dataset.X)
alpha=0.3
assert visualizer.alpha == alpha
Contributor:

0.3 doesn't need to be assigned to alpha. You should test it directly: assert visualizer.alpha == 0.3
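In other words, something along these lines (a sketch only; the params dict and test-method name are illustrative, and self.dataset follows the existing test fixtures):

```python
def test_alpha_param(self):
    # hypothetical sketch inside the existing PCA test class;
    # only the alpha value matters for this assertion
    params = {"scale": True, "proj_dim": 2, "alpha": 0.3}
    visualizer = PCADecomposition(**params).fit(self.dataset.X)
    visualizer.transform(self.dataset.X)

    # assert against the literal value rather than a throwaway local variable
    assert visualizer.alpha == 0.3
```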

@@ -188,3 +191,26 @@ def test_scale_true_3d_execption(self):
with pytest.raises(ValueError, match=e):
pca = PCADecomposition(**params)
pca.fit(X)

@mock.patch('yellowbrick.features.pca.plt.sca', autospec=True)
Contributor:

I am unsure about this test, so I will wait on a review from @rebeccabilbro.

Member:

Yes, the test looks good! Patching pyplot's set-current-axes method (sca) ensures that when we make the mock on line 207 (which is used to call scatter and retrieve the alpha), matplotlib doesn't complain about it.
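Roughly, the pattern being described might look like this (a sketch under the assumption that the visualizer forwards alpha to ax.scatter during transform; the method and fixture names are illustrative):

```python
from unittest import mock

@mock.patch('yellowbrick.features.pca.plt.sca', autospec=True)
def test_alpha_param(self, mock_sca):
    # sketch inside the existing PCA test class: patching plt.sca keeps
    # matplotlib happy when the visualizer's axes are replaced by a mock
    params = {"scale": True, "proj_dim": 2, "alpha": 0.3}
    visualizer = PCADecomposition(**params).fit(self.dataset.X)
    assert visualizer.alpha == 0.3

    # swap in a mock axes so we can inspect the scatter call
    visualizer.ax = mock.MagicMock()
    visualizer.transform(self.dataset.X)

    # the alpha we passed in should be forwarded to scatter
    _, scatter_kwargs = visualizer.ax.scatter.call_args
    assert scatter_kwargs.get("alpha") == 0.3
```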

@rebeccabilbro (Member) left a comment

Hey there @naresh-bachwani, good progress on this!

In addition to the feedback from @lwgray, I'm noticing that a few of the image comparison tests are failing due to your updates to this visualizer. This is fairly common, and the fix isn't too hard — essentially you'll need to (1) run the tests locally on your machine (2) copy the actual images generated by the tests (3) paste them into the baseline images and (4) commit the new baseline images to this same PR branch. You can find the instructions here. Let us know if you have questions!
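If it helps, a rough sketch of steps (2) and (3) is shown below; the directory names are an assumption based on matplotlib-style image comparison layouts and may differ from Yellowbrick's actual test tree:

```python
import shutil
from pathlib import Path

# assumed layout: the test run writes its output to actual_images/ and the
# committed reference images live in baseline_images/
ACTUAL = Path("tests/actual_images")
BASELINE = Path("tests/baseline_images")

for img in ACTUAL.rglob("*.png"):
    target = BASELINE / img.relative_to(ACTUAL)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(img, target)          # overwrite the old baseline
    print(f"updated baseline: {target}")
```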


@@ -118,7 +123,7 @@ def __init__(self,
[('scale', StandardScaler(with_std=self.scale)),
('pca', PCA(self.proj_dim, random_state=random_state))]
)

self.alpha=alpha
Member:

Another spacing thing here (should be self.alpha = alpha)

@@ -280,7 +290,7 @@ def pca_decomposition(X, y=None, ax=None, features=None, scale=True,
visualizer = PCADecomposition(
ax=ax, features=features, scale=scale, proj_dim=proj_dim,
proj_features=proj_features, color=color, colormap=colormap,
random_state=random_state, **kwargs
alpha=alpha, random_state=random_state,**kwargs
Member:

It looks like the spaces were modified here? We should reinsert the space after the last comma and before **kwargs


@naresh-bachwani (Contributor, Author)

I changed the baseline images. The tests pass on my local computer but are failing here!

@rebeccabilbro (Member)

Ok, thanks @naresh-bachwani — we may just need to tweak the comparison tolerances. We'll have a look this weekend and get back to you!

@naresh-bachwani (Contributor, Author)

Ok! Thanks!

@lwgray (Contributor) commented Apr 14, 2019

@naresh-bachwani I advise that you increase the tolerance for the tests that didn't pass.

@naresh-bachwani (Contributor, Author)

I will increase the tolerance, but I have a question.

All these tests pass on my local computer so how can I know in advance if my tests would pass after making a push?

[screenshots of the local test run: Capture1, Capture]

@lwgray (Contributor) commented Apr 14, 2019

What OS are you using locally?

@naresh-bachwani (Contributor, Author)

Windows 10

@lwgray (Contributor) commented Apr 14, 2019

@naresh-bachwani That might be the reason for the difference. We have found that the tests run differently on Windows than on Linux (the environment our tests run in). There isn't a way to know ahead of time whether the tests will run successfully on Travis. The goal is to make them pass on Travis regardless of what is happening locally.

@naresh-bachwani (Contributor, Author)

Thanks for the help!

@rebeccabilbro (Member)

> All these tests pass on my local computer so how can I know in advance if my tests would pass after making a push?

@naresh-bachwani — yes, this unfortunately is one of the challenges of doing visual comparison tests. Different operating systems have slightly different versions of fonts, colors, etc, which can cause our tests to pass locally (since the images match on our local OSs) but fail mysteriously in the CI. Our CI solution is to use both Travis and AppVeyor, which allows us to see what will happen in both a linux and windows virtual machine.

Generally my strategy is to run the tests locally, copy and commit any new baseline images, and then push to the PR branch to trigger the CI builds. If any builds fail, the first thing I check is if there are any random_state parameters I can set in the tests to make them run more deterministically (e.g. here, where we set it to 9932). Most scikit-learn estimators seem to have them.

If the random_state has already been set, I'll inspect the Travis or AppVeyor build report to see what the RMS error of the test failure was, and round up to set the tol for the self.assert_images_similar call in the failing test. Occasionally it takes a few times, but it looks like you've figured out good tolerance values in this case to ensure the tests will pass on a range of operating systems! Nice work and thanks for working through this!
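Concretely, the two knobs described above might look like this in a test (a sketch only; the method name and numbers are examples, and assert_images_similar is the helper named in the comment):

```python
def test_pca_projection_image(self):
    # sketch inside the existing PCA test class

    # 1) pin the estimator's seed so the projection is the same on every run
    visualizer = PCADecomposition(scale=True, proj_dim=2, random_state=9932)
    visualizer.fit(self.dataset.X, self.dataset.y)
    visualizer.transform(self.dataset.X)
    visualizer.finalize()

    # 2) if CI still reports a small RMS difference, round it up into tol
    self.assert_images_similar(visualizer, tol=5.0)
```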

@rebeccabilbro rebeccabilbro merged commit 0b9bf13 into DistrictDataLabs:develop Apr 14, 2019
@naresh-bachwani (Contributor, Author)

Hello @rebeccabilbro,
Thanks for your help! I had a question. How can I find the random state? Is it through some loop until the RMSE drops below the threshold value?

@rebeccabilbro (Member)

> Hello @rebeccabilbro,
> Thanks for your help! I had a question. How can I find the random state? Is it through some loop until the RMSE drops below the threshold value?

Great question @naresh-bachwani! The random_state is actually a scikit-learn thing (here's where it's defined for pca, for example) that's just a seed for a random number generator. By default, random_state picks a value at random to seed the relevant algorithm(s) with. But, if you set the random_state to a specific integer value, it will always start with the same seed, which is very helpful for the tests (less so for actual machine learning though ;) ). Generally people just pick an arbitrary number; 42 is popular because many OSS contributors are also Adams fans :D
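A tiny sketch of the effect being described, using scikit-learn's PCA directly (any fixed integer works as the seed):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 6)

# the randomized solver actually consumes the seed; with the same
# random_state the fitted projection is identical on every run, which is
# what keeps image comparison tests stable
a = PCA(n_components=2, svd_solver="randomized", random_state=9932).fit_transform(X)
b = PCA(n_components=2, svd_solver="randomized", random_state=9932).fit_transform(X)
assert np.allclose(a, b)
```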

@naresh-bachwani (Contributor, Author)

Ok so 9932 is just an arbitrary number?

@rebeccabilbro (Member)

> Ok so 9932 is just an arbitrary number?

Yep!
