Change plots to use plotnine #112

patrick-miller · 2017-08-22T21:21:02Z

Here I am replacing all of the plots with the ggplot-inspired plotnine package. I think these look good, but let me know what you think.

rdvelazquez · 2017-08-22T22:54:23Z

Nice work. These look really good based on my first look at them. A few quick comments:

Should you also revise the utils.py file to remove the functions that we no longer need?
I really like the revision to the probabilities plots at the end of the notebook (showing the plots for all three models). I think labeling the axes would also be very helpful.

dhimmel

Nice. What do you think about the default plotnine style? In general, I find theme_bw to be more aesthetic. If we want, we can define a style in utils. We can always do this in a future PR. Here are some ggplot2 mods I often use.

dhimmel · 2017-08-23T19:01:04Z

scripts/2.mutation-classifier.py

 import numpy as np
 import pandas as pd
-import seaborn as sns
+from plotnine import *


Let's avoid import * (reasons). If you're worried about plotnine taking up lots of space, you could do something like import plotnine as pln or import plotnine as gg.

patrick-miller (Aug 24, 2017): Agreed, I was just copying from some plotnine examples.
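
One concrete reason behind the advice above: star imports from two modules silently overwrite each other wherever their public names collide. The sketch below uses only the standard library (`math` and `cmath`, not plotnine) and a hypothetical helper, `public_names`, to list such collisions. It illustrates the general problem rather than anything specific to this PR.

```python
import cmath
import math


def public_names(module):
    """Return the names that `from module import *` would bind."""
    names = getattr(module, "__all__", None)
    if names is None:
        # Without __all__, a star import binds every name not starting with "_".
        names = [name for name in dir(module) if not name.startswith("_")]
    return set(names)


# Names that a second star import would silently rebind.
clashes = public_names(math) & public_names(cmath)

# With namespaced imports, both versions stay available and unambiguous.
real_root = math.sqrt(4)      # a float
complex_root = cmath.sqrt(4)  # a complex number
```

The same reasoning is why `import plotnine as gg` keeps calls such as `gg.ggplot(...)` traceable to the package they come from.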

dhimmel · 2017-08-23T19:09:02Z

scripts/2.mutation-classifier.py

+(ggplot(cv_results_df, aes(x='classify__alpha',
+                           y='mean_test_score',
+                           fill='feature_set'))
+ + geom_bar(stat='identity', position='dodge')


I'm not sure a barplot is the best. I'm thinking a scatterplot with model shown via line/dot color or via facet.

You could do a scale_x_log10 so the x-axis is continuous rather than discrete.

Ideally the plot would use expand_limits for y to include 0.5 and 1. This will have to wait until the next plotnine release: see has2k1/plotnine@192609d. In other words, the ylim should be 0.5-1.0 unless some AUROCs are below 0.5, in which case the lower bound should accommodate the lowest AUROC.

patrick-miller (Aug 24, 2017): I switched to a scatterplot with a log scale. It initially looked funky to me with the default theme. I used a hack to get the 'expand limits' behavior.
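
The y-axis rule described in this thread (show 0.5 to 1.0 unless an AUROC falls below 0.5) can be sketched as a small helper. `auroc_ylim` is a hypothetical name, not a function from `utils.py` or plotnine, and this is not necessarily the hack used in the PR.

```python
def auroc_ylim(scores, floor=0.5, ceiling=1.0):
    """Return (lower, upper) y-limits for a plot of AUROC values.

    The range is floor..ceiling by default; the lower bound drops to the
    smallest score when any score falls below `floor`.
    """
    scores = list(scores)
    if not scores:
        return (floor, ceiling)
    return (min(floor, min(scores)), ceiling)


# One score is below 0.5, so the lower bound follows it.
limits = auroc_ylim([0.62, 0.71, 0.48])
```

The resulting tuple could then be passed to plotnine's `ylim`, for example `gg.ylim(*limits)`, assuming plotnine is imported as `gg`.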

dhimmel · 2017-08-23T19:14:34Z

scripts/2.mutation-classifier.py


-Vega(final_spec)
+
+pd.pivot_table(auc_output,


Minor: models as rows and partition as columns makes a little more sense to me.
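
The suggested layout, one row per model and one column per partition, might look like the sketch below. The column names (`model`, `partition`, `auroc`), the model labels, and the AUROC values are made-up placeholders, because the original `pd.pivot_table` call is truncated in the diff.

```python
import pandas as pd

# Placeholder long-format data: one row per (model, partition) pair.
auc_output = pd.DataFrame({
    "model": ["model_a", "model_a", "model_b", "model_b", "model_c", "model_c"],
    "partition": ["train", "test"] * 3,
    "auroc": [0.95, 0.90, 0.94, 0.89, 0.70, 0.68],
})

# Models become the rows (index) and partitions become the columns.
auc_table = pd.pivot_table(
    auc_output,
    values="auroc",
    index="model",
    columns="partition",
)
```

With this orientation each model's training and testing AUROC sit side by side on one row, which makes the two partitions easy to compare.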

patrick-miller · 2017-08-24T16:59:38Z

I translated one of my R ggplot themes to plotnine in the utils.py file. We can play around with this and the color scheme for the plots. The ROC plot might be a little busy.

rdvelazquez · 2017-08-24T17:32:09Z

Would it be possible to show the actual AUROC values on the AUROC plot? Maybe next to the lines in the legend? If not, maybe we can insert a table that shows this. I think this is one of the more important pieces of information in the notebook, so it would be good to be able to easily see the testing and training AUROC for each model.

dhimmel · 2017-08-25T15:31:07Z

@rdvelazquez I gave you write access to this repo. After all your suggestions have been addressed and @patrick-miller is done, you can "Squash and merge" this pull request. Make sure to select "Squash and merge" rather than "Create a merge commit".

patrick-miller · 2017-08-25T16:34:23Z

I agree @rdvelazquez about the AUROC values. I include them in the next few cells, but it would be better to have them inside the plot itself. Because there are 6 values, things can get cluttered. Maybe we only include the test AUROC inside the plot?

dhimmel · 2017-08-25T17:02:16Z

I'm fine with the current display of AUROCs (in the following cell). I don't think there is really a clean way to include them in the plot.

dhimmel · 2017-08-25T17:23:46Z

utils.py

+          panel_grid = gg.element_line(color = "#b3b3b3"), 
+          panel_grid_major_x = gg.element_blank(),
+          panel_grid_minor = gg.element_blank(),
+          strip_background = gg.element_rect(fill = "#e5e5e5", color = None),


Consider changing #e5e5e5 to #FEF2E2 for strip.background color to make the facet strips a little less drab.

dhimmel · 2017-08-28T20:44:16Z

@patrick-miller merge when you're ready. So ml-workers should now be able to use this notebook directly?

@dcgoss how should we go about using this notebook in production? Can ml-workers just use the URL for this notebook once it's merged?

rdvelazquez · 2017-08-28T21:03:07Z

Quoting @dhimmel: "how should we go about using this notebook in production? Can ml-workers just use the URL for this notebook once it's merged?"

We are planning to use the machine-learning repo for development of the notebooks and ml-workers for production, so the next step (either now or once all the items in #110 are done) will be to open a PR in ml-workers updating that notebook. @patrick-miller's revisions in #111 were designed to let this notebook work in both the machine-learning and ml-workers repos.

I'm not opposed to just having ml-workers reference this notebook, but I think when we discussed it at the last meetup there were some potential issues with doing that.

patrick-miller added 4 commits August 21, 2017 15:13

Merge conflicts

70103ef

Add Python script

1e40f14

Remove Dockerfile

7334247

Change all of the notebook plots to plotnine

8423cd8

dhimmel reviewed Aug 23, 2017

rdvelazquez mentioned this pull request Aug 23, 2017

Machine Learning Punch List for Launch #110

Closed

5 tasks

patrick-miller added 2 commits August 24, 2017 10:49

Remove unneeded utility functions

814e7e4

Add namespace, create cognoma ggplot theme, update CV plot

0a446ad

patrick-miller force-pushed the change-plots branch from f26c402 to 0a446ad Compare August 24, 2017 14:54

Add new theme

559e390

dhimmel approved these changes Aug 25, 2017

dhimmel reviewed Aug 25, 2017

Change theme in facet titles

3b0a106

rdvelazquez merged commit 891ade3 into cognoma:master Aug 28, 2017

rdvelazquez mentioned this pull request Sep 26, 2017

Standardize the plots in notebooks #97

Closed
