New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Addressing issue 874: Added Projection Visualizer #908
[WIP] Addressing issue 874: Added Projection Visualizer #908
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@naresh-bachwani good start - and a lot of good things for discussion! Let's keep pressing forward, making sure we've got docstrings and tests to go along with this visualizer as well!
yellowbrick/features/base.py
Outdated
## Projection Visualizers | ||
########################################################################## | ||
|
||
class ProjectionVisualizer(DataVisualizer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should put this in yellowbrick/features/projection.py
? base.py
is getting a little full at this point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also to wrap this up, I'd like to make sure we have a complete docstring that describes what this base class does for other yellowbrick developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will write the docstring and move it to projection.py
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I'm in the airport I'm having trouble with my Internet connection and can't easily check the source code - is there already a projection.py
? (I'm hoping there is not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, currently we do not have projection.py!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok awesome, I'm now thinking about the following organization:
yellowbrick
└── features
| ├── __init__.py
| └── projection
| | ├── __init__.py
| | ├── base.py
| | ├── manifold.py
| | ├── pca.py
| ├── base.py
| ├── decomposition.py
But let me think about this a little more.
yellowbrick/features/base.py
Outdated
|
||
elif self._target_color_type == TargetType.CONTINUOUS: | ||
if(self.proj_dim==3): | ||
self.fig.colorbar(self._scatter, ax=self.ax) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't rely on there being a self.fig
on the visualizer. In this code because that only happens if proj_dim
== 3; but generally since we won't have a figure
object to work with.
yellowbrick/features/base.py
Outdated
self._scatter = self.ax.scatter(X[:,0], X[:,1], **scatter_kwargs) | ||
|
||
if self.proj_dim == 3: | ||
self.fig = plt.figure() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than creating a new figure, I'd prefer if we used plt.gcf
(get current figure) - and also I don't think we should store the fig on the visualizer as discussed in #826.
yellowbrick/features/base.py
Outdated
|
||
if self.proj_dim == 3: | ||
self.fig = plt.figure() | ||
self.ax = self.fig.add_subplot(111, projection='3d') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the case where the user has passed in an Axes
object? Let's refactor this to override the Visualizer.ax
property and do appropriate checks as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @bbengfort,
Can you elaborate on what is needed here? I understand that when user passes Axes
we get axes lines on the plot. To deal with that we just need to clear the figure and redefine the Axes
. Is that what you mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So a couple of things here, maybe I'm misunderstanding - but there is a difference between Axes
and axis spines
(see anatomy of a matplotlib plot). The Axes
has nothing to do with lines on the plot, it is a part of the figure that is drawn upon.
Consider the following use case:
fig, axes = plt.subplots(ncols=2, projection='3d', figsize=(18,6))
Manifold(ax=axes[0], projection='3d').fit_transform(X, y)
PCA(ax=axes[1], projection='3d').fit_transform(X, y)
In this code, the figure is composed of two subplots represented by two Axes objects. The user is asking the Manifold visualization to be drawn on the left axes and the PCA visualization to be drawn on the right axes.
Because of this we can't just create a figure or an axes on our own - we have to respect what the user passes in if they do, and only create the axes if nothing has been supplied.
Does that help clear up the confusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! Great! But what if the user passes non-3d axes? We can't draw 3d plots on that! Also, I could not find methods to convert 2d to 3d axes. Will we raise an error in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very good point - we should see what happens! If matplotlib raises a sensible error, then we'll keep that error, or wrap it in a YellowbrickError. Otherwise matplotlib might just plot it in 2D, I'm not sure - so let's find out!
yellowbrick/features/base.py
Outdated
X = self.transformed_features_ | ||
scatter_kwargs = self._determine_scatter_kwargs(y); | ||
|
||
if self.proj_dim == 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we divide this up into _draw_2d
and _draw_3d
methods to make the organization a bit easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do this, but I think that the method is quite small and understandable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, in that case let's leave it as it is!
@@ -60,8 +60,8 @@ def fit(self, X, y=None, **kwargs): | |||
|
|||
self.y = y | |||
|
|||
super(MissingDataVisualizer, self).fit(X, y, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably do still want the call to super
right? To ensure that the target type is identified and that the class and feature names are collected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump. Or is this going into #923?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! I think that dealing with this in #923 will make more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really coming together, thank you so much for your hard work on this! The tests look great and are really well done.
I think the only remaining thing is the transfomer
property on the visualizer. Right now it's accessed on fit()
and transform()
but without any other indication that it was supposed to be there. I had considered offering suggestions of creating a set_transformer
function or using self._transfomer
... however I think that these are going to cause problems with our subclasses as well.
Instead what I recommend is that we remove the fit()
and transform()
methods completely from this visualizer. They don't really do anything except call the self.transformer
method and our subclasses will likely modify that behavior. However, the subclasses will need to call fit super in order to set the classes, features, and detect the target type; so having this in the middle is going to cause problems when there is no self.transformer
on the class. All the ProjectionVisualizer
needs to know is how to draw Xp
with 2 or 3 dimensions, and then the subclasses can call draw in their fit
or transform
methods.
This will necessarily change your testing methodology. So instead of relying on the transformer
property in tests, you'll set up the subclass just as a Yellowbrick developer would use it:
class MockVisualizer(ProjectionVisualizer):
def __init__(self, n_components=2):
self.transformer = PCA(n_components, random_state=2019)
def fit(self, X, y):
super(MockVisualizer, self).fit(X, y)
self.transformer.fit(X)
return self
def transform(self, X, y):
Xp = self.transformer.transform(X)
self.draw(Xp, y)
return Xp
class TestProjectionVisualizer(VisualTestCase):
def test_discrete_plot(self):
X, y = self.discrete
oz = MockVisualizer()
Xp = oz.fit_transform(X, y)
self.assert_images_similar(oz)
Note that this does make some of my comments in the test section obsolete.
@@ -60,8 +60,8 @@ def fit(self, X, y=None, **kwargs): | |||
|
|||
self.y = y | |||
|
|||
super(MissingDataVisualizer, self).fit(X, y, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump. Or is this going into #923?
@naresh-bachwani when you've made the changes and are ready - let me know and I'll do the final run through of the code with the test images, etc. |
Well I like your suggestion related to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great and I'm glad the new MockVisualizer
was able to smooth things to get the tests to pass! I have some minor cleanup that I'm going to push and then when the tests pass I'll get this merged in. I'll also quickly toss-up #923 for you to review so that you can have that to work with as you move forward.
In terms of organizing things for the next step, I think what I'd like to do is to move projection.py
to yellowbrick/features/projection/base.py
and move manifold.py
and pca.py
to yellowbrick/features/projection
but make sure they are directly importable from yellowbrick.features
(e.g. make sure the import is correct in __init__.py
. We can then move decomposition.py
to yellowbrick/features/pca.py
and have ExplainedVariance
be a tad more accessible. Don't worry - we'll go over this as you move to the next PR!
Great work!
This PR handles #874 . I updated the
tests/features/base.py
, which now contains ProjectionVisualizer. It unifies the functionality of PCA and Manifold. It contains the following features:TODOs and questions