Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change plots to use plotnine #112

Merged
merged 8 commits into from
Aug 28, 2017
Merged

Conversation

patrick-miller
Copy link
Member

Here I am replacing all of the plots with the ggplot inspired plotnine package. I think that these look good to me, but let me know what you think.

@rdvelazquez
Copy link
Member

Nice work. These look really good based on my first look at them. A few quick comments:

  1. Should you also revise the utils.py file to remove the functions that we no longer need?

  2. I really like the revision to the probabilities plots at the end of the notebook (showing the plots for all three models). I think labeling the axis would also be very helpful.

Copy link
Member

@dhimmel dhimmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. What do you think about the default plotnine style? In general, I find theme_bw to be more aesthetic. If we want, we can define a style in utils. Always can do this in a future PR. Here are some ggplot2 mods I often use.

import numpy as np
import pandas as pd
import seaborn as sns
from plotnine import *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid import * (reasons). If you're worried about plotnine taking up lot's of space you could do something like import plotnine as pln or import plotnine as gg.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I was just copying from some plotnine examples.

(ggplot(cv_results_df, aes(x='classify__alpha',
y='mean_test_score',
fill='feature_set'))
+ geom_bar(stat='identity', position='dodge')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure a barplot is the best. I'm thinking a scatterplot with model shown via line/dot color or via facet.

You could do a scale_x_log10 so the x-axis is continuous rather than discreet.

Ideally plot would expand_limits for y to include 0.5 and 1. This will have to wait until the next plotnine release: see has2k1/plotnine@192609d. In other words, 0.5-1.0 should be the ylim unless AUROCs are below 0.5 in which case the lower bound should accommodate the lowest AUROC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to a scatterplot and the log scale -- it initially looked funky to me with the default theme. I used a hack to get 'expand limits'.


Vega(final_spec)

pd.pivot_table(auc_output,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: models as rows and partition as columns makes a little more sense to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@patrick-miller
Copy link
Member Author

I translated one of my R ggplot themes to plotnine in the utils.py file. We can play around with this and the color scheme for the plots. The ROC plot might be a little busy.

@rdvelazquez
Copy link
Member

Would it be possible to show the actual AUROC values on the AUROC plot? Maybe next to the lines on the legend? If not maybe we can insert a table that shows this. I think this is one of the more important pieces of information in the notebook so it would be good to be able to easily see the testing and training AUROC for each model.

@dhimmel
Copy link
Member

dhimmel commented Aug 25, 2017

@rdvelazquez I gave you write access to this repo. After all your suggestions have been addressed and @patrick-miller is done, you can "Squash and merge" this pull request. Make sure to select "Squash and merge" rather than "Create a merge commit"

@patrick-miller
Copy link
Member Author

I agree @rdvelazquez about the AUROC values. I include them in the next few cells, but it would be better to have them inside the plot itself. Because there are 6 values, things can get cluttered. Maybe we only include the test AUROC inside the plot?

@dhimmel
Copy link
Member

dhimmel commented Aug 25, 2017

I'm fine with the current display of AUROCs (in the following cell). I don't think there is really a clean way to include them in the plot.

utils.py Outdated
panel_grid = gg.element_line(color = "#b3b3b3"),
panel_grid_major_x = gg.element_blank(),
panel_grid_minor = gg.element_blank(),
strip_background = gg.element_rect(fill = "#e5e5e5", color = None),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing #e5e5e5 to #FEF2E2 for strip.background color to make the facet strips a little less drab.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@dhimmel
Copy link
Member

dhimmel commented Aug 28, 2017

@patrick-miller merge when you're ready. So ml-workers should be able now to directly use this notebook?

@dcgoss how should we go about using this notebook in production? Can ml-workers just use the URL for this notebook once it's merged?

@rdvelazquez
Copy link
Member

how should we go about using this notebook in production? Can ml-workers just use the URL for this notebook once it's merged?

We are planning to use the machine-learning repo for development of the notebooks but ml-workers for production so the next step (either now or once all the items in #110 are done) will be to open a PR in ml-workers updating that notebook. @patrick-miller 's revisions in #111 were designed to let this notebook work in both the machine-learning and the ml-workers repos.

I'm not opposed to just having ml-workers reference this notebook but I think when we discussed it at the last meetup there were some potential issues with doing that.

@rdvelazquez rdvelazquez merged commit 891ade3 into cognoma:master Aug 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants