Add Polish to Missing Value Vizualizers in Contrib #668

ndanielsen · 2018-12-10T00:22:58Z

This PR addresses the review comments on the original PR for the missing values bar and dispersion visualizers.

The original PR is located here: #519

ndanielsen · 2018-12-10T21:16:12Z

tests/test_contrib/test_missing/test_bar.py

    """

    def setUp(self):
-        super(TestMissingBarVisualizer, self).setUp()
+        super(MissingBarVisualizerTestCase, self).setUp()
        self.tol = 0.01
        if os.name == 'nt': # Windows
            self.tol = 0.5


In another PR, I'm going to move these tolerance settings into VisualTestCase so they can be more generally shared.

bbengfort

Great polish! In this case, it's the little things like fixtures and reference removals that make a huge impact on the overall quality and robustness of the code. Thanks for continuing to knock away at these visualizers!

Is your plan to move these visualizers out of contrib at some point? If so, can I ask what your criteria for moving them out of contrib are?

bbengfort · 2018-12-11T13:55:48Z

tests/test_contrib/test_missing/conftest.py

+
+
+@pytest.fixture(scope='class')
+def missingdata(request):


Great fixture!

bbengfort · 2018-12-11T13:57:48Z

tests/test_contrib/test_missing/test_bar.py

+    def test_viz_properties(self):
+        """
+        Integration test of visualizer with pandas
+        """


Why does this test require pandas?

bbengfort · 2018-12-11T13:58:27Z

tests/test_contrib/test_missing/test_bar.py

    """

    def setUp(self):
-        super(TestMissingBarVisualizer, self).setUp()
+        super(MissingBarVisualizerTestCase, self).setUp()
        self.tol = 0.01
        if os.name == 'nt': # Windows
            self.tol = 0.5


I like that idea, e.g. if tol is None then use these default tolerances based on the OS.
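
A minimal sketch of that idea, assuming a hypothetical helper named `resolve_tol` (not part of yellowbrick's `VisualTestCase`): an explicit `tol` wins, and `None` falls back to an OS-dependent default. The default values are the ones from the `setUp` hunk above; the `os_name` parameter exists only so the fallback can be exercised on any platform.

```python
import os

# Assumed defaults, copied from the setUp hunk in this review
DEFAULT_TOL = 0.01
WINDOWS_TOL = 0.5


def resolve_tol(tol=None, os_name=None):
    """Return tol if given, otherwise a default based on the OS.

    Hypothetical helper for illustration; os_name defaults to os.name.
    """
    if tol is not None:
        return tol
    if os_name is None:
        os_name = os.name
    # Image comparisons are looser on Windows ('nt')
    return WINDOWS_TOL if os_name == "nt" else DEFAULT_TOL
```

A test would then call something like `self.assert_images_similar(viz, tol=resolve_tol(tol))` and only override the tolerance where a specific image needs it.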

bbengfort · 2018-12-11T13:59:37Z

tests/test_contrib/test_missing/test_bar.py

        viz.poof()

        self.assert_images_similar(viz, tol=self.tol)

+
    def test_missingvaluesbar_numpy_with_y_target(self):
        """
        Integration test of visualizer with numpy without target y passed in


*with target y passed in

bbengfort · 2018-12-11T13:59:57Z

tests/test_contrib/test_missing/test_bar.py

        viz.poof()

        self.assert_images_similar(viz, tol=self.tol)

+
    def test_missingvaluesbar_numpy_with_y_target_with_labels(self):
        """
        Integration test of visualizer with numpy without target y passed in
        but no class labels


*with y passed in and class labels

bbengfort · 2018-12-11T14:00:45Z

tests/test_contrib/test_missing/test_dispersion.py

@@ -41,20 +41,25 @@ def setUp(self):
        if os.name == 'nt': # Windows
            self.tol = 5.0

+    @pytest.mark.skipif(pd is None, reason="test requires pandas")


Same question as before - why does this test require pandas? Sorry if I'm missing something obvious!

I'm making explicit test cases for pandas object support.

bbengfort · 2018-12-11T14:01:30Z

tests/test_contrib/test_missing/test_dispersion.py


        features = [str(n) for n in range(20)]
        classes = ['Class A', 'Class B']
        viz = MissingValuesDispersion(features=features, classes=classes)
-        viz.fit(X, y=y)
+        viz.fit(self.missingdata.X, y=self.missingdata.y)
        viz.poof()

        self.assert_images_similar(viz, tol=self.tol)


Awesome clean up of the tests - the fixture definitely made the tests much simpler to read!

bbengfort · 2018-12-11T14:02:38Z

yellowbrick/contrib/missing/bar.py


        else:
            # add in counting of np.nan per target y by column
-            nan_counts = []
-            for target_value in np.unique(self.y):
+            for target_value in np.unique(y):


Thank you for removing the data storage on the class! I think this is going to make the missing values visualizer much more performant!
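
For readers following along, here is a rough sketch of what counting `np.nan` per target value by column can look like when `X` and `y` are passed in as arguments instead of being read from attributes on the visualizer. The function name `nan_counts_by_target` and the dict return type are assumptions for illustration, not the code in this PR.

```python
import numpy as np


def nan_counts_by_target(X, y):
    """Count NaN entries per column of X for each unique value in y.

    Hypothetical helper: nothing is stored on an instance; the inputs
    are used once and only the per-class counts are returned.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    nan_mask = np.isnan(X)  # boolean matrix, True where a value is missing
    return {
        target_value: nan_mask[y == target_value].sum(axis=0)
        for target_value in np.unique(y)
    }


X = [[1.0, np.nan], [np.nan, np.nan], [3.0, 4.0]]
y = [0, 0, 1]
counts = nan_counts_by_target(X, y)
```

Only the small per-class count arrays outlive the call, so the full data set does not need to be kept on the visualizer.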

ndanielsen · 2018-12-11T17:45:45Z

@bbengfort my personal criteria for hoisting these visualizers into the main package are:

For the bar viz, adding an option for percent in addition to count when targets are supplied
For dispersion, making sure that it accepts a datetime index
Adding the quick method to the documentation. I think we should emphasize quick methods more, especially for feature visualizers

ndanielsen · 2018-12-11T18:10:25Z

I'm going to address the comments here and then I'll merge in. @bbengfort thanks as always for the great code review!

bbengfort · 2018-12-11T21:50:11Z

@ndanielsen those sound like good criteria and I'm excited that we're close to getting them into the main package!

What would you think about putting these visualizers under a new package called yellowbrick.wrangling? Potentially this falls under feature analysis, but I think this along with outlier detection visualizers might be something that would fall under its own package?

As for the quick methods, I agree that these should be hoisted to prime time. See #600 which is already underway in #669. These issues change the quick method API a tad (e.g. returning a visualizer) but also ensure consistency across all quick methods. Once that is done, we need to find a way to include them more informatively in the documentation. I'm not sure whether to autodoc them or to create a special section for them (or a special format in the API documentation for them).

Also one quick comment; see #673 -- there might be a minor merge collision with this PR. Would you mind working with @Kautumn06 to make sure both PRs are merged easily?

rebeccabilbro · 2019-01-31T17:59:50Z

Hey there @ndanielsen - is this something you're still interested in working on? It's just occurring to me that the develop branch is starting to diverge somewhat significantly from your PR branch (and it's about to change a lot more with #687, #669, etc), so wanted to double check with you before things get too tangly!

rebeccabilbro · 2019-02-15T17:21:18Z

Hi @ndanielsen! Given that the develop branch has evolved enough that we'll probably end up with some fairly gnarly conflicts if we try to merge this in now, I'm going to go ahead and archive this PR. Thank you for all the work you put into this and hope we can revisit these contributions sometime in the future; I know it will be a very popular feature!

ndanielsen and others added 3 commits July 31, 2018 10:40

missing values docs

f63ea03

update tests per PR comments

51fb849

Merge remote-tracking branch 'origin/develop' into missingvalues-2

b26e6f9

bbengfort added the review (PR is open) label Dec 10, 2018

ndanielsen changed the title ~~Add Polish to Missing Value Vizualizers in Contrib~~ Add Polish to Missing Value Vizualizers in Contrib (Work in progress) Dec 10, 2018

ndanielsen added 6 commits December 9, 2018 16:58

enhance: making nan_counts public properties in bar and dispersion viz

3c56347

enhance: fix bar doc string.

c880d54

fix: missingvalues broken tests

e529fcf

fix: X and y no longer stored on the viz instance

f1d2e86

enhance: pull out create_nan_matrix into the missing values base class

0052fa0

enhance: making nan count properties have learned prop convention

fa0a690

ndanielsen changed the title ~~Add Polish to Missing Value Vizualizers in Contrib (Work in progress)~~ Add Polish to Missing Value Vizualizers in Contrib Dec 10, 2018

bbengfort approved these changes Dec 11, 2018

bbengfort mentioned this pull request Dec 11, 2018

Docs headings #673

Merged

add datetime labels for missing values dispersion viz

28b098c

rebeccabilbro added the gone-stale (PR or issue has not seen activity in >30 days and/or develop branch has since diverged significantly) label Feb 13, 2019

rebeccabilbro closed this Feb 15, 2019

rebeccabilbro removed the review (PR is open) label Feb 15, 2019

ndanielsen deleted the missing-value-refactor branch May 20, 2019 20:17
