-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change plots to use plotnine #112
Conversation
Nice work. These look really good based on my first look at them. A few quick comments:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. What do you think about the default plotnine style? In general, I find theme_bw
to be more aesthetic. If we want, we can define a style in utils. Always can do this in a future PR. Here are some ggplot2 mods I often use.
scripts/2.mutation-classifier.py
Outdated
import numpy as np | ||
import pandas as pd | ||
import seaborn as sns | ||
from plotnine import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid import *
(reasons). If you're worried about plotnine taking up lot's of space you could do something like import plotnine as pln
or import plotnine as gg
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I was just copying from some plotnine
examples.
scripts/2.mutation-classifier.py
Outdated
(ggplot(cv_results_df, aes(x='classify__alpha', | ||
y='mean_test_score', | ||
fill='feature_set')) | ||
+ geom_bar(stat='identity', position='dodge') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure a barplot is the best. I'm thinking a scatterplot with model shown via line/dot color or via facet.
You could do a scale_x_log10
so the x-axis is continuous rather than discreet.
Ideally plot would expand_limits for y to include 0.5 and 1. This will have to wait until the next plotnine release: see has2k1/plotnine@192609d. In other words, 0.5-1.0 should be the ylim unless AUROCs are below 0.5 in which case the lower bound should accommodate the lowest AUROC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to a scatterplot and the log scale -- it initially looked funky to me with the default theme. I used a hack to get 'expand limits'.
scripts/2.mutation-classifier.py
Outdated
|
||
Vega(final_spec) | ||
|
||
pd.pivot_table(auc_output, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: models as rows and partition as columns makes a little more sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
f26c402
to
0a446ad
Compare
I translated one of my R ggplot themes to |
Would it be possible to show the actual AUROC values on the AUROC plot? Maybe next to the lines on the legend? If not maybe we can insert a table that shows this. I think this is one of the more important pieces of information in the notebook so it would be good to be able to easily see the testing and training AUROC for each model. |
@rdvelazquez I gave you write access to this repo. After all your suggestions have been addressed and @patrick-miller is done, you can "Squash and merge" this pull request. Make sure to select "Squash and merge" rather than "Create a merge commit" |
I agree @rdvelazquez about the AUROC values. I include them in the next few cells, but it would be better to have them inside the plot itself. Because there are 6 values, things can get cluttered. Maybe we only include the test AUROC inside the plot? |
I'm fine with the current display of AUROCs (in the following cell). I don't think there is really a clean way to include them in the plot. |
utils.py
Outdated
panel_grid = gg.element_line(color = "#b3b3b3"), | ||
panel_grid_major_x = gg.element_blank(), | ||
panel_grid_minor = gg.element_blank(), | ||
strip_background = gg.element_rect(fill = "#e5e5e5", color = None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider changing #e5e5e5
to #FEF2E2
for strip.background
color to make the facet strips a little less drab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
@patrick-miller merge when you're ready. So ml-workers should be able now to directly use this notebook? @dcgoss how should we go about using this notebook in production? Can ml-workers just use the URL for this notebook once it's merged? |
We are planning to use the machine-learning repo for development of the notebooks but ml-workers for production so the next step (either now or once all the items in #110 are done) will be to open a PR in ml-workers updating that notebook. @patrick-miller 's revisions in #111 were designed to let this notebook work in both the machine-learning and the ml-workers repos. I'm not opposed to just having ml-workers reference this notebook but I think when we discussed it at the last meetup there were some potential issues with doing that. |
Here I am replacing all of the plots with the ggplot inspired
plotnine
package. I think that these look good to me, but let me know what you think.