
Grid search color plot viz (WIP) #261

Merged — 11 commits, Feb 3, 2018

Conversation

@pbs929 (Contributor) commented Jun 12, 2017

Don't mean to inundate you all with PRs, but I'm having fun :)

I put together a color plot visualizer for grid search - see the ipynb for a demo. Also aiming to create a structure into which other grid search visualizers can fit.

Unit tests are definitely needed... I will come back to this after getting some experience with the testing workflow on my other PRs. General feedback on the approach is welcome in the meantime.

Called for in issue #12.

@bbengfort (Member):

@NealHumphrey do you have time to take a look at this by any chance?

@pbs929 keep them coming!

@bbengfort (Member):

@pbs929 still interested in finishing this PR? I'm happy to do a review.

@pbs929 (Contributor, Author) commented Dec 5, 2017

Yes, still interested! As I recall I was waiting on some general feedback on the approach (see demo in the ipynb).

@bbengfort (Member) left a review:

This looks very, very good - and I'm excited to get it into yellowbrick. I'm sorry it took so long to review, it just got buried in my email. Next time feel free to ping me/us so it comes back to our attention!

Let me know your thoughts on my comments below and then I can approve when you're ready.

param_1 : string
The name of the parameter to be visualized on the horizontal axis.

param_2 : string
@bbengfort (Member):

I can see the motivation for param_1 vs param_2 - but because these names are connected to the vertical and horizontal axes, maybe it would be easier to name them "x_param" and "y_param" - I know this is not really the x and y axes but they do it on bar graphs so I think it would be ok here.

@pbs929 (Author):

sure, easy enough

"""
# Get unique values of the two display parameters
x_vals = sorted(list(set(cv_results['param_' + param_1].compressed())))
y_vals = sorted(list(set(cv_results['param_' + param_2].compressed())))
@bbengfort (Member):

Ok, so just to make sure I understand what's happening here - you're fetching the param mask from the cv results dictionary, compressing it to get the real values instead, then calling set to find all unique values. These are the categories for the grid, right?

I've got a couple of notes:

  1. 'param_' + param_1 is used twice, it might be easier to store these in variables and note what they are or just fetch the mask array from the results and store that.
  2. If the parameter doesn't exist in the grid search, this will return a KeyError - it would be better if we raised a yellowbrick key error, e.g. YellowbrickKeyError(YellowbrickError, KeyError) that says "parameter not found in grid search" so it's more obvious to the user what happened.

@pbs929 (Author):

Yes, that's right - I've added a couple comment lines to try to clarify and stored the masked array like you suggested in (1). Also did what you suggested in (2).

# This is an n_x by n_y array of lists with `None` in place of empties
# (my kingdom for a dataframe...)
all_scores = [[None for _ in range(n_x)] for _ in range(n_y)]
for x, y, score in zip(idx_x, idx_y, cv_results['mean_test_score']):
@bbengfort (Member):

It feels like we should allow the user to choose which metric to plot as an input to the visualizer (defaulting to mean_test_score, but also allowing mean_train_score, mean_fit_time, and mean_score_time).
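A minimal sketch of that suggestion, assuming the keys listed above and a hypothetical `metric` keyword (not necessarily the name ultimately adopted):

```python
# The four metrics the reviewer names; all are standard cv_results_ keys.
VALID_METRICS = frozenset([
    "mean_test_score", "mean_train_score", "mean_fit_time", "mean_score_time",
])

def select_metric(cv_results, metric="mean_test_score"):
    """Return the array for the requested metric, validating the name."""
    if metric not in VALID_METRICS:
        raise ValueError(
            "metric must be one of: {}".format(", ".join(sorted(VALID_METRICS)))
        )
    return cv_results[metric]
```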

if x is not None and y is not None:
if all_scores[y][x] is None:
all_scores[y][x] = []
all_scores[y][x].append(score)
@bbengfort (Member):

I certainly understand the motivation for creating this intermediate data structure, but is it not possible to compute max score at this point?
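Computing the maximum on the fly, as suggested, would avoid the intermediate lists entirely. A sketch under the assumption that `idx_x`/`idx_y` index cells as in the PR (function name invented for illustration):

```python
import numpy as np

def best_scores_grid(idx_x, idx_y, scores, n_x, n_y):
    """Track the running max score per cell; unfilled cells stay NaN."""
    best = np.full((n_y, n_x), np.nan)
    for x, y, score in zip(idx_x, idx_y, scores):
        if x is None or y is None:
            continue
        # NaN comparisons are False, so the first score always wins a cell
        if not (best[y, x] >= score):
            best[y, x] = score
    return best
```

Leaving NaN in the empty cells also feeds directly into the `masked_invalid` hatching step further down.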

## Dimension reduction utility
##########################################################################

def param_projection(cv_results, param_1, param_2):
@bbengfort (Member):

This function seems extremely useful for all GridSearchVisualizers - perhaps this should be in base.py -- possibly as a method of GridSearchVisualizer?

@bbengfort (Member):

Or as a standalone function, with an aliased (static?) method in GridSearchVisualizer?

@bbengfort (Member):

Also: future work - add more params?

@pbs929 (Author):

Alright, I moved it up to the base module and aliased it in the base GridSearchVisualizer class. I didn't make the aliased method static, rather letting it access the gridsearch model object as self.estimator.

Definitely could add more params as future work.
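The arrangement described — a standalone utility aliased as an instance method that reads the fitted grid search off `self.estimator` — might look roughly like this simplified sketch (the `param_projection` body is elided; only the aliasing pattern is shown):

```python
def param_projection(cv_results, x_param, y_param):
    """Standalone utility: project grid search results onto two parameters.
    (Real body elided; this stub just echoes its inputs for illustration.)"""
    return (x_param, y_param, cv_results)

class GridSearchVisualizer:
    """Simplified stand-in for the real base class."""

    def __init__(self, estimator):
        self.estimator = estimator

    def param_projection(self, x_param, y_param):
        # Instance alias: pulls cv_results_ from the wrapped grid search
        # model rather than requiring the caller to pass it in.
        return param_projection(self.estimator.cv_results_, x_param, y_param)
```

Keeping the method non-static is what lets it reach `self.estimator`, as pbs929 notes.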

## Base Grid Search Visualizer
##########################################################################

class GridSearchVisualizer(ModelVisualizer):
@bbengfort (Member):

Very good set up of this into the Yellowbrick class hierarchy!

@pbs929 (Author):

👍

Should return an error if it isn't.
"""
# A bit of type checking
if not is_gridsearch(model):
@bbengfort (Member):

Nice!

)

# Mask nans so that they can be filled with a hatch
data = np.ma.masked_invalid(best_scores)
@bbengfort (Member):

I like this a lot! I'm going to have to use this in other things! The hatch nans thing I mean.
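For readers curious about the trick being praised: `pcolormesh` skips masked cells, so a hatch set on the axes background shows through wherever scores are missing. A minimal, self-contained sketch of the technique (not the PR's actual draw code):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

scores = np.array([[0.1, np.nan], [0.7, 0.9]])
data = np.ma.masked_invalid(scores)  # NaN cells become masked

fig, ax = plt.subplots()
# Hatch the axes background; masked cells are simply not drawn,
# so the hatch shows through where data is missing.
ax.patch.set(hatch="x", edgecolor="grey")
mesh = ax.pcolormesh(data)
fig.colorbar(mesh, ax=ax)
```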

self.ax.set_yticklabels(y_vals, rotation=45)

# Add the colorbar
cb = self.ax.figure.colorbar(mesh, None, self.ax)
@bbengfort (Member):

This is a running debate that I hope you might weigh in on: should color bars have the range (min, max) or should they have the range (0, 1)? Should this be an option we supply to the user?

Arguments for (min, max):

  • they create greater color disparity so it's easier to tell values apart
  • They bound the area you're looking in

Arguments for (0,1):

  • You can easily compare multiple heatmaps with each other
  • When scores cluster tightly, (0, 1) honestly shows them as similar, whereas (min, max) exaggerates small differences.

@bbengfort (Member):

@ndanielsen @rebeccabilbro -- just wanted to highlight this discussion for you again here.

@pbs929 (Author):

In this case I'd advocate for min, max. With grid search, you can use any of a number of error metrics, which may or may not be bounded by 1. Furthermore when fine tuning parameters you may be optimizing over the 3rd or 4th decimal of the error metric, so fixing the range on a larger scale would make the visualization not very useful.
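If the range were ever exposed as a user option, it could be as simple as an optional limits pair that falls back to the data's own extent. A hedged sketch — the `colorbar_range` name is invented for illustration:

```python
import numpy as np

def resolve_color_limits(scores, colorbar_range=None):
    """Return (vmin, vmax) for the heatmap's colorbar.

    None (the default) scales to the data's own min/max for maximal
    color disparity, per pbs929's argument above; passing (0, 1)
    would pin the scale so multiple heatmaps are directly comparable.
    """
    if colorbar_range is not None:
        return colorbar_range
    data = np.ma.masked_invalid(scores)  # ignore NaN cells, as the PR does
    return float(data.min()), float(data.max())
```

The resulting pair would be passed straight through as `vmin`/`vmax` to `pcolormesh`.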

cb = self.ax.figure.colorbar(mesh, None, self.ax)
cb.outline.set_linewidth(0)

self.ax.set_aspect("equal")
@bbengfort (Member):

Nice - glad you included this!

@pbs929 (Author):

👍

@pbs929 (Contributor, Author) commented Feb 3, 2018

Alright @bbengfort I think I've addressed your comments. Let me know if you catch anything else - I'm pretty tied up these days with a newborn but I will try to find the time. :)

@bbengfort (Member) left a review:

@pbs929 This looks great thank you so much! I've approved the PR and will merge it into develop. The next steps for this include writing some tests (or even just test stubs) and updating the documentation. I think we can add those to an issue (or update an existing issue, I haven't looked yet).

Thanks again for all your contributions, and congrats on the new baby!

@bbengfort bbengfort merged commit 03724ed into DistrictDataLabs:develop Feb 3, 2018
@bbengfort bbengfort removed the ready label Feb 3, 2018
This was referenced Feb 3, 2018
@pbs929 pbs929 deleted the gridsearch branch February 3, 2018 20:35
pbs929 added a commit to pbs929/yellowbrick that referenced this pull request Feb 5, 2018
* develop: (36 commits)
  Grid search color plot viz  (DistrictDataLabs#261)
  Feature visualizers clean up missing data to ensure visualization works DistrictDataLabs#304
  Using visualizers on MNIST digits data. (DistrictDataLabs#290)
  Add pytest into the mix instead of nose (DistrictDataLabs#292)
  Updated ROC/AUC to correctly check the number of classes Bugfix DistrictDataLabs#288
  Fix tests (DistrictDataLabs#293)
  fixed joint plot tests
  DistrictDataLabs#161 Created @Property for visualizer size. (DistrictDataLabs#289)
  Threshold visualizer (DistrictDataLabs#207)
  Add image similarity tests for feature visualizers (DistrictDataLabs#283)
  add image similarity test for freqdist (DistrictDataLabs#284)
  alphas image similarity tests (DistrictDataLabs#282)
  Add image similarity tests for elbow and silhouette plots (DistrictDataLabs#281)
  add image similarity tests for confusion matrix (DistrictDataLabs#280)
  adding image comparison document to the contributing rst (DistrictDataLabs#279)
  some editorial changes on matplotlib page and sphinx repair to line 184 of rst
  peplot test stubs
  effective matplotlib draft
  updated readme, conda description and pypi description
  update pypi description
  ...