Add Polish to Missing Value Visualizers in Contrib #668
     """

     def setUp(self):
-        super(TestMissingBarVisualizer, self).setUp()
+        super(MissingBarVisualizerTestCase, self).setUp()
         self.tol = 0.01
         if os.name == 'nt': # Windows
             self.tol = 0.5
In another PR, I'm going to move these tolerance settings into VisualTestCase
so they can be more generally shared.
I like that idea, e.g. if tol is None then use these default tolerances based on the OS.
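A minimal sketch of that fallback, assuming a hypothetical helper named `resolve_tol` (it is not part of `VisualTestCase`; the name, the constants, and the `os_name` parameter are illustrative only). The default values mirror the ones in the `setUp` hunk above.

```python
import os

# Hypothetical defaults, copied from the values used in setUp above.
DEFAULT_TOL = 0.01
WINDOWS_TOL = 0.5


def resolve_tol(tol=None, os_name=None):
    """Return tol if it was given, otherwise an OS-dependent default."""
    if tol is not None:
        return tol
    # os_name is injectable so the fallback can be exercised on any platform.
    os_name = os.name if os_name is None else os_name
    return WINDOWS_TOL if os_name == "nt" else DEFAULT_TOL
```

A test would then call something like `self.assert_images_similar(viz, tol=resolve_tol(tol))` and only pass an explicit `tol` when it needs to override the default.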
Great polish! In this case, it's the little things like fixtures and reference removals that make a huge impact on the overall quality and robustness of the code. Thanks for continuing to knock away at these visualizers!
Is your plan to move these visualizers out of contrib at some point? If so, can I ask what your criteria for moving them out of contrib are?
 @pytest.fixture(scope='class')
 def missingdata(request):
Great fixture!
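For readers unfamiliar with the pattern, here is a rough sketch of how a class-scoped fixture like this can share data with `unittest`-style test classes. The fixture body is not shown in the diff, so everything below is an assumption: the `Dataset` tuple, the `make_missing_data` helper, the seed, the array shapes, and the 1.5 cutoff are all illustrative.

```python
from collections import namedtuple

import numpy as np
import pytest

# Hypothetical container; the tests above only rely on .X and .y attributes.
Dataset = namedtuple("Dataset", ["X", "y"])


def make_missing_data(n_samples=400, n_features=20, seed=0):
    """Build a random feature matrix with some entries replaced by NaN."""
    rng = np.random.RandomState(seed)
    X = rng.normal(size=(n_samples, n_features))
    y = rng.randint(0, 2, size=n_samples)
    X[X > 1.5] = np.nan  # knock out the upper tail to simulate missing values
    return Dataset(X, y)


@pytest.fixture(scope="class")
def missingdata(request):
    # Attach the dataset to the test class so methods can use self.missingdata.
    request.cls.missingdata = make_missing_data()
```

A test class would opt in with `@pytest.mark.usefixtures("missingdata")`, after which `self.missingdata.X` and `self.missingdata.y` are available in every test method and the data is built once per class.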
     def test_viz_properties(self):
         """
         Integration test of visualizer with pandas
         """
Why does this test require pandas?
         viz.poof()

         self.assert_images_similar(viz, tol=self.tol)


     def test_missingvaluesbar_numpy_with_y_target(self):
         """
         Integration test of visualizer with numpy without target y passed in
*with target y passed in
         viz.poof()

         self.assert_images_similar(viz, tol=self.tol)


     def test_missingvaluesbar_numpy_with_y_target_with_labels(self):
         """
         Integration test of visualizer with numpy without target y passed in
         but no class labels
*with y passed in and class labels
@@ -41,20 +41,25 @@ def setUp(self):
         if os.name == 'nt': # Windows
             self.tol = 5.0

     @pytest.mark.skipif(pd is None, reason="test requires pandas")
Same question as before - why does this test require pandas? Sorry if I'm missing something obvious!
I'm making explicit test cases for support of pandas objects.
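To illustrate the pattern, here is a small sketch of a pandas-specific test guarded by `skipif`. The optional import mirrors the `pd is None` check in the hunk above; `count_missing` is a hypothetical stand-in for the code under test, not a Yellowbrick function.

```python
import numpy as np
import pytest

try:
    import pandas as pd
except ImportError:
    pd = None  # pandas is optional; pandas-specific tests are skipped


def count_missing(X):
    """Hypothetical stand-in for the code under test: NaN count per column."""
    if pd is not None and isinstance(X, pd.DataFrame):
        X = X.values
    return np.isnan(np.asarray(X, dtype=float)).sum(axis=0)


@pytest.mark.skipif(pd is None, reason="test requires pandas")
def test_count_missing_pandas():
    # Exercises the DataFrame code path explicitly.
    df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, np.nan]})
    assert count_missing(df).tolist() == [1, 2]
```

Keeping a separate numpy test alongside this one means the visualizer is still covered when pandas is not installed.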

         features = [str(n) for n in range(20)]
         classes = ['Class A', 'Class B']
         viz = MissingValuesDispersion(features=features, classes=classes)
-        viz.fit(X, y=y)
+        viz.fit(self.missingdata.X, y=self.missingdata.y)
         viz.poof()

         self.assert_images_similar(viz, tol=self.tol)
Awesome clean up of the tests - the fixture definitely made the tests much simpler to read!
         else:
             # add in counting of np.nan per target y by column
             nan_counts = []
-            for target_value in np.unique(self.y):
+            for target_value in np.unique(y):
Thank you for removing the data storage on the class! I think this is going to make the missing values visualizer much more performant!
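As a rough illustration of the stateless approach, the per-target counting can be written as a function that takes `X` and `y` as arguments instead of reading them from the instance. This is a sketch only: the function name and the dictionary return type are assumptions, and the visualizer's actual loop body is not shown in the hunk.

```python
import numpy as np


def nan_counts_by_target(X, y):
    """Count NaN values per column, separately for each unique target value."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    counts = {}
    for target_value in np.unique(y):
        rows = X[y == target_value]  # rows belonging to this target value
        counts[target_value] = np.isnan(rows).sum(axis=0)
    return counts
```

Because nothing is stored on the object, the input arrays can be garbage-collected as soon as the counts are computed.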
@bbengfort my personal criteria for hoisting these visualizers into the main package are:
|
I'm going to address the comments here and then I'll merge in. @bbengfort thanks as always for the great code review!
@ndanielsen those sound like good criteria and I'm excited that we're close to getting them into the main package! What would you think about putting these visualizers under a new package called
As for the quick methods, I agree that these should be hoisted to prime time. See #600 which is already underway in #669. These issues change the quick method API a tad (e.g. returning a visualizer) but also ensure consistency across all quick methods. Once that is done, we need to find a way to include them more informatively in the documentation. I'm not sure whether to autodoc them or to create a special section for them (or a special format in the API documentation for them).
Also one quick comment; see #673 -- there might be a minor merge collision with this PR. Would you mind working with @Kautumn06 to make sure both PRs are merged easily?
Hey there @ndanielsen - is this something you're still interested in working on? It's just occurring to me that the develop branch is starting to diverge somewhat significantly from your PR branch (and it's about to change a lot more with #687, #669, etc), so wanted to double check with you before things get too tangly!
Hi @ndanielsen! Given that the
This PR is addressing comments in the original PR for the missing values bar and dispersion visualizers.
The original PR is located here: #519