
Addressing issue #589: Adding alpha params to PCA #811

Merged: 5 commits into DistrictDataLabs:develop on Apr 14, 2019

Conversation

@naresh-bachwani (Contributor) commented Apr 11, 2019

The following changes were made to PCA, as referenced in #589.
This PR was partially reviewed in #807.

  1. Added a new parameter, alpha, to pca.py to specify the opacity.
  2. Modified test_pca.py to add a test for alpha.

The result is attached below:
[screenshot: Screenshot (144)]
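For illustration, a minimal usage sketch of the new parameter (toy data; the class and module path follow the diffs below, and poof() is assumed to be the draw call in this version of Yellowbrick):

```python
import numpy as np
from yellowbrick.features.pca import PCADecomposition

# toy data just to exercise the visualizer (hypothetical, not from this PR)
rng = np.random.RandomState(9932)
X = rng.rand(100, 5)
y = rng.randint(0, 3, 100)

# alpha controls the opacity of the scatter points (1.0 is fully opaque)
visualizer = PCADecomposition(scale=True, proj_dim=2, alpha=0.75)
visualizer.fit(X, y)
visualizer.transform(X)
visualizer.poof()
```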

@naresh-bachwani (Contributor, Author)

Hello @lwgray,
I have made the changes that you suggested in #807.

@lwgray (Contributor) left a comment

Great job on adding alpha. All my requested changes are very minor. I am unsure about the test you built, so I will wait for a comment from @rebeccabilbro.

@@ -69,6 +69,10 @@ class PCADecomposition(MultiFeatureVisualizer):
Optional string or matplotlib cmap to colorize lines.
Use either color to colorize the lines on a per class basis or
colormap to color them on a continuous scale.

alpha : float, default: 1
Contributor:

alpha should be set to 0.75

@@ -99,6 +103,7 @@ def __init__(self,
proj_features=False,
color=None,
colormap=palettes.DEFAULT_SEQUENCE,
alpha = 1,
Contributor:

set alpha to 0.75

Member:

Tiny thing, but the spacing is off here. We shouldn't have spaces before and after the equals sign for keyword arguments.

@@ -215,7 +221,7 @@ def finalize(self, **kwargs):

def pca_decomposition(X, y=None, ax=None, features=None, scale=True,
proj_dim=2, proj_features=False, color=None,
colormap=palettes.DEFAULT_SEQUENCE,
colormap=palettes.DEFAULT_SEQUENCE, alpha=1,
Contributor:

set alpha = 0.75

visualizer = PCADecomposition(**params).fit(self.dataset.X)
pca_array = visualizer.transform(self.dataset.X)
alpha=0.3
assert visualizer.alpha == alpha
Contributor:

0.3 doesn't need to be assigned to alpha. You should test it directly: assert visualizer.alpha == 0.3
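In other words, something along these lines (a sketch only; the params dict and test-method name are illustrative, and self.dataset follows the existing test fixtures):

```python
def test_alpha_param(self):
    # hypothetical sketch inside the existing PCA test class;
    # only the alpha value matters for this assertion
    params = {"scale": True, "proj_dim": 2, "alpha": 0.3}
    visualizer = PCADecomposition(**params).fit(self.dataset.X)
    visualizer.transform(self.dataset.X)

    # assert against the literal value rather than a throwaway local variable
    assert visualizer.alpha == 0.3
```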

@@ -188,3 +191,26 @@ def test_scale_true_3d_execption(self):
with pytest.raises(ValueError, match=e):
pca = PCADecomposition(**params)
pca.fit(X)

@mock.patch('yellowbrick.features.pca.plt.sca', autospec=True)
Contributor:

I am unsure about this test, so I will wait on a review from @rebeccabilbro.

Member:

Yes, the test looks good! Patching pyplot's set-current-axes method (sca) ensures that when we make the mock on line 207 (which is used to call scatter and retrieve the alpha), matplotlib doesn't complain about it.
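Roughly, the pattern being described might look like this (a sketch under the assumption that the visualizer forwards alpha to ax.scatter during transform; the method and fixture names are illustrative):

```python
from unittest import mock

@mock.patch('yellowbrick.features.pca.plt.sca', autospec=True)
def test_alpha_param(self, mock_sca):
    # sketch inside the existing PCA test class: patching plt.sca keeps
    # matplotlib happy when the visualizer's axes are replaced by a mock
    params = {"scale": True, "proj_dim": 2, "alpha": 0.3}
    visualizer = PCADecomposition(**params).fit(self.dataset.X)
    assert visualizer.alpha == 0.3

    # swap in a mock axes so we can inspect the scatter call
    visualizer.ax = mock.MagicMock()
    visualizer.transform(self.dataset.X)

    # the alpha we passed in should be forwarded to scatter
    _, scatter_kwargs = visualizer.ax.scatter.call_args
    assert scatter_kwargs.get("alpha") == 0.3
```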

@rebeccabilbro (Member) left a comment

Hey there @naresh-bachwani, good progress on this!

In addition to the feedback from @lwgray, I'm noticing that a few of the image comparison tests are failing due to your updates to this visualizer. This is fairly common, and the fix isn't too hard — essentially you'll need to (1) run the tests locally on your machine (2) copy the actual images generated by the tests (3) paste them into the baseline images and (4) commit the new baseline images to this same PR branch. You can find the instructions here. Let us know if you have questions!
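If it helps, a rough sketch of steps (2) and (3) is shown below; the directory names are an assumption based on matplotlib-style image comparison layouts and may differ from Yellowbrick's actual test tree:

```python
import shutil
from pathlib import Path

# assumed layout: the test run writes its output to actual_images/ and the
# committed reference images live in baseline_images/
ACTUAL = Path("tests/actual_images")
BASELINE = Path("tests/baseline_images")

for img in ACTUAL.rglob("*.png"):
    target = BASELINE / img.relative_to(ACTUAL)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(img, target)          # overwrite the old baseline
    print(f"updated baseline: {target}")
```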


@@ -118,7 +123,7 @@ def __init__(self,
[('scale', StandardScaler(with_std=self.scale)),
('pca', PCA(self.proj_dim, random_state=random_state))]
)

self.alpha=alpha
Member:

Another spacing thing here (should be self.alpha = alpha)

@@ -280,7 +290,7 @@ def pca_decomposition(X, y=None, ax=None, features=None, scale=True,
visualizer = PCADecomposition(
ax=ax, features=features, scale=scale, proj_dim=proj_dim,
proj_features=proj_features, color=color, colormap=colormap,
random_state=random_state, **kwargs
alpha=alpha, random_state=random_state,**kwargs
Member:

It looks like the spaces were modified here? We should reinsert the space after the last comma and before **kwargs


@naresh-bachwani (Contributor, Author)

I changed the baseline images. The tests pass on my local computer but are failing here!

@rebeccabilbro (Member)

Ok, thanks @naresh-bachwani — we may just need to tweak the comparison tolerances. We'll have a look this weekend and get back to you!

@naresh-bachwani (Contributor, Author)

Ok! Thanks!

@lwgray (Contributor) commented Apr 14, 2019

@naresh-bachwani I advise that you increase the tolerance for the tests that didn't pass.

@naresh-bachwani (Contributor, Author)

I will increase the tolerance, but I have a question.

All these tests pass on my local computer so how can I know in advance if my tests would pass after making a push?

[screenshots of the local test run: Capture1, Capture]

@lwgray (Contributor) commented Apr 14, 2019

What OS are you using locally?

@naresh-bachwani (Contributor, Author)

Windows 10

@lwgray (Contributor) commented Apr 14, 2019

@naresh-bachwani That might be the reason for the difference. We have found that the tests run differently on Windows than on Linux (the environment our tests run in). There isn't a way to know ahead of time whether the tests will run successfully on Travis. The goal is to make them pass on Travis regardless of what is happening locally.

@naresh-bachwani (Contributor, Author)

Thanks for the help!

@rebeccabilbro (Member)

> All these tests pass on my local computer so how can I know in advance if my tests would pass after making a push?

@naresh-bachwani — yes, this unfortunately is one of the challenges of doing visual comparison tests. Different operating systems have slightly different versions of fonts, colors, etc, which can cause our tests to pass locally (since the images match on our local OSs) but fail mysteriously in the CI. Our CI solution is to use both Travis and AppVeyor, which allows us to see what will happen in both a linux and windows virtual machine.

Generally my strategy is to run the tests locally, copy and commit any new baseline images, and then push to the PR branch to trigger the CI builds. If any builds fail, the first thing I check is if there are any random_state parameters I can set in the tests to make them run more deterministically (e.g. here, where we set it to 9932). Most scikit-learn estimators seem to have them.

If the random_state has already been set, I'll inspect the Travis or AppVeyor build report to see what the RMS error of the test failure was, and round up to set the tol for the self.assert_images_similar call in the failing test. Occasionally it takes a few times, but it looks like you've figured out good tolerance values in this case to ensure the tests will pass on a range of operating systems! Nice work and thanks for working through this!
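Concretely, the two knobs described above might look like this in a test (a sketch only; the method name and numbers are examples, and assert_images_similar is the helper named in the comment):

```python
def test_pca_projection_image(self):
    # sketch inside the existing PCA test class

    # 1) pin the estimator's seed so the projection is the same on every run
    visualizer = PCADecomposition(scale=True, proj_dim=2, random_state=9932)
    visualizer.fit(self.dataset.X, self.dataset.y)
    visualizer.transform(self.dataset.X)
    visualizer.finalize()

    # 2) if CI still reports a small RMS difference, round it up into tol
    self.assert_images_similar(visualizer, tol=5.0)
```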

@rebeccabilbro rebeccabilbro merged commit 0b9bf13 into DistrictDataLabs:develop Apr 14, 2019
@naresh-bachwani (Contributor, Author)

Hello @rebeccabilbro,
Thanks for your help! I had a question. How can I find the random state? Is it through some loop until the RMSE drops below the threshold value?

@rebeccabilbro (Member)

> Hello @rebeccabilbro,
> Thanks for your help! I had a question. How can I find the random state? Is it through some loop until the RMSE drops below the threshold value?

Great question @naresh-bachwani! The random_state is actually a scikit-learn thing (here's where it's defined for pca, for example) that's just a seed for a random number generator. By default, random_state picks a value at random to seed the relevant algorithm(s) with. But, if you set the random_state to a specific integer value, it will always start with the same seed, which is very helpful for the tests (less so for actual machine learning though ;) ). Generally people just pick an arbitrary number; 42 is popular because many OSS contributors are also Adams fans :D
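A tiny sketch of the effect being described, using scikit-learn's PCA directly (any fixed integer works as the seed):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 6)

# the randomized solver actually consumes the seed; with the same
# random_state the fitted projection is identical on every run, which is
# what keeps image comparison tests stable
a = PCA(n_components=2, svd_solver="randomized", random_state=9932).fit_transform(X)
b = PCA(n_components=2, svd_solver="randomized", random_state=9932).fit_transform(X)
assert np.allclose(a, b)
```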

@naresh-bachwani (Contributor, Author)

Ok so 9932 is just an arbitrary number?

@rebeccabilbro (Member)

> Ok so 9932 is just an arbitrary number?

Yep!
