Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Addressing issue 874: Added Projection Visualizer #908

Merged
merged 19 commits into from Jul 17, 2019

Conversation

naresh-bachwani
Copy link
Contributor

@naresh-bachwani naresh-bachwani commented Jul 4, 2019

This PR handles #874 . I updated the tests/features/base.py, which now contains ProjectionVisualizer. It unifies the functionality of PCA and Manifold. It contains the following features:

  1. It handles both discrete and continuous variables.
  2. It handles both 2d and 3d plots.
  3. It manually draws colorbar for 2d plots.

TODOs and questions

  • Update to handle more features(like LabelEncoder)
  • Add docstring.
  • Create tests

Copy link
Member

@bbengfort bbengfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naresh-bachwani good start - and a lot of good things for discussion! Let's keep pressing forward, making sure we've got docstrings and tests to go along with this visualizer as well!

yellowbrick/features/base.py Outdated Show resolved Hide resolved
yellowbrick/features/base.py Outdated Show resolved Hide resolved
## Projection Visualizers
##########################################################################

class ProjectionVisualizer(DataVisualizer):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should put this in yellowbrick/features/projection.py? base.py is getting a little full at this point!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also to wrap this up, I'd like to make sure we have a complete docstring that describes what this base class does for other yellowbrick developers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will write the docstring and move it to projection.py!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I'm in the airport I'm having trouble with my Internet connection and can't easily check the source code - is there already a projection.py? (I'm hoping there is not).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, currently we do not have projection.py!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok awesome, I'm now thinking about the following organization:

yellowbrick
└── features
|   ├── __init__.py
|   └── projection
|   |   ├── __init__.py
|   |   ├── base.py
|   |   ├── manifold.py
|   |   ├── pca.py
|   ├── base.py
|   ├── decomposition.py

But let me think about this a little more.

yellowbrick/features/base.py Outdated Show resolved Hide resolved
yellowbrick/features/base.py Outdated Show resolved Hide resolved

elif self._target_color_type == TargetType.CONTINUOUS:
if(self.proj_dim==3):
self.fig.colorbar(self._scatter, ax=self.ax)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't rely on there being a self.fig on the visualizer. In this code because that only happens if proj_dim == 3; but generally since we won't have a figure object to work with.

self._scatter = self.ax.scatter(X[:,0], X[:,1], **scatter_kwargs)

if self.proj_dim == 3:
self.fig = plt.figure()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than creating a new figure, I'd prefer if we used plt.gcf (get current figure) - and also I don't think we should store the fig on the visualizer as discussed in #826.


if self.proj_dim == 3:
self.fig = plt.figure()
self.ax = self.fig.add_subplot(111, projection='3d')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the case where the user has passed in an Axes object? Let's refactor this to override the Visualizer.ax property and do appropriate checks as needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bbengfort,
Can you elaborate on what is needed here? I understand that when user passes Axes we get axes lines on the plot. To deal with that we just need to clear the figure and redefine the Axes. Is that what you mean here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a couple of things here, maybe I'm misunderstanding - but there is a difference between Axes and axis spines (see anatomy of a matplotlib plot). The Axes has nothing to do with lines on the plot, it is a part of the figure that is drawn upon.

Consider the following use case:

fig, axes = plt.subplots(ncols=2, projection='3d', figsize=(18,6))

Manifold(ax=axes[0], projection='3d').fit_transform(X, y)
PCA(ax=axes[1], projection='3d').fit_transform(X, y)

In this code, the figure is composed of two subplots represented by two Axes objects. The user is asking the Manifold visualization to be drawn on the left axes and the PCA visualization to be drawn on the right axes.

Because of this we can't just create a figure or an axes on our own - we have to respect what the user passes in if they do, and only create the axes if nothing has been supplied.

Does that help clear up the confusion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok! Great! But what if the user passes non-3d axes? We can't draw 3d plots on that! Also, I could not find methods to convert 2d to 3d axes. Will we raise an error in this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good point - we should see what happens! If matplotlib raises a sensible error, then we'll keep that error, or wrap it in a YellowbrickError. Otherwise matplotlib might just plot it in 2D, I'm not sure - so let's find out!

yellowbrick/features/base.py Outdated Show resolved Hide resolved
X = self.transformed_features_
scatter_kwargs = self._determine_scatter_kwargs(y);

if self.proj_dim == 2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we divide this up into _draw_2d and _draw_3d methods to make the organization a bit easier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do this, but I think that the method is quite small and understandable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, in that case let's leave it as it is!

@@ -60,8 +60,8 @@ def fit(self, X, y=None, **kwargs):

self.y = y

super(MissingDataVisualizer, self).fit(X, y, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably do still want the call to super right? To ensure that the target type is identified and that the class and feature names are collected?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump. Or is this going into #923?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! I think that dealing with this in #923 will make more sense.

Copy link
Member

@bbengfort bbengfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really coming together, thank you so much for your hard work on this! The tests look great and are really well done.

I think the only remaining thing is the transfomer property on the visualizer. Right now it's accessed on fit() and transform() but without any other indication that it was supposed to be there. I had considered offering suggestions of creating a set_transformer function or using self._transfomer ... however I think that these are going to cause problems with our subclasses as well.

Instead what I recommend is that we remove the fit() and transform() methods completely from this visualizer. They don't really do anything except call the self.transformer method and our subclasses will likely modify that behavior. However, the subclasses will need to call fit super in order to set the classes, features, and detect the target type; so having this in the middle is going to cause problems when there is no self.transformer on the class. All the ProjectionVisualizer needs to know is how to draw Xp with 2 or 3 dimensions, and then the subclasses can call draw in their fit or transform methods.

This will necessarily change your testing methodology. So instead of relying on the transformer property in tests, you'll set up the subclass just as a Yellowbrick developer would use it:

class MockVisualizer(ProjectionVisualizer):

    def __init__(self, n_components=2):
        self.transformer = PCA(n_components, random_state=2019)
  
    def fit(self, X, y):
        super(MockVisualizer, self).fit(X, y)
        self.transformer.fit(X)
        return self 
    
     def transform(self, X, y):
         Xp = self.transformer.transform(X)
         self.draw(Xp, y)
         return Xp


class TestProjectionVisualizer(VisualTestCase):
    
    def test_discrete_plot(self):
        X, y = self.discrete
        oz = MockVisualizer()
        Xp = oz.fit_transform(X, y)
        self.assert_images_similar(oz)

Note that this does make some of my comments in the test section obsolete.

@@ -60,8 +60,8 @@ def fit(self, X, y=None, **kwargs):

self.y = y

super(MissingDataVisualizer, self).fit(X, y, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump. Or is this going into #923?

yellowbrick/features/pcoords.py Show resolved Hide resolved
tests/test_features/test_projection.py Outdated Show resolved Hide resolved
tests/test_features/test_projection.py Outdated Show resolved Hide resolved
tests/test_features/test_projection.py Outdated Show resolved Hide resolved
tests/test_features/test_projection.py Outdated Show resolved Hide resolved
yellowbrick/features/projection.py Outdated Show resolved Hide resolved
yellowbrick/features/projection.py Outdated Show resolved Hide resolved
yellowbrick/features/projection.py Outdated Show resolved Hide resolved
yellowbrick/features/projection.py Outdated Show resolved Hide resolved
@bbengfort
Copy link
Member

@naresh-bachwani when you've made the changes and are ready - let me know and I'll do the final run through of the code with the test images, etc.

@naresh-bachwani
Copy link
Contributor Author

Well I like your suggestion related to fit() and transform(). I will make the necessary changes and let you know!

Copy link
Member

@bbengfort bbengfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great and I'm glad the new MockVisualizer was able to smooth things to get the tests to pass! I have some minor cleanup that I'm going to push and then when the tests pass I'll get this merged in. I'll also quickly toss-up #923 for you to review so that you can have that to work with as you move forward.

In terms of organizing things for the next step, I think what I'd like to do is to move projection.py to yellowbrick/features/projection/base.py and move manifold.py and pca.py to yellowbrick/features/projection but make sure they are directly importable from yellowbrick.features (e.g. make sure the import is correct in __init__.py. We can then move decomposition.py to yellowbrick/features/pca.py and have ExplainedVariance be a tad more accessible. Don't worry - we'll go over this as you move to the next PR!

Great work!

@bbengfort bbengfort merged commit 21eb9d2 into DistrictDataLabs:develop Jul 17, 2019
@naresh-bachwani naresh-bachwani deleted the projectviz branch July 22, 2019 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants