Add first version of Vega ROC plots #77

patrick-miller · 2017-01-14T01:29:05Z

This introduces a Vega-based ROC plot. Without the interactivity it looks like the following:

Currently, it takes a CSV file/data stream, but we can use a JSON one instead, depending on how the backend team wants to serve it. Its inputs are the false positive rate, the true positive rate, and the curve type (train, test, CV). I plan on adding the ability to specify the data set (or model) used, so that we can split out the full feature model and the covariates-only model.

Let me know if you have any questions/comments.

dhimmel

That was fast! Nice pull request.

Did you try vega-lite? The specification is higher level and gets compiled to vega. If vega-lite isn't lacking a necessary feature, I think that would be preferred. I'm impressed that you tackled the vega!

dhimmel · 2017-01-14T01:40:06Z

explore/visualization/roc_curves_sample.html

+
+<!-- TODO -->
+<!-- Install with npm install vega -->
+<script src="http://vega.github.io/vega-editor/vendor/d3.min.js" charset="utf-8"></script>


Let's switch to versioned includes, so we don't have any surprises at deployment time. From https://github.com/vega/vega-lite-demo/issues/1#issuecomment-271972536:

<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.17/d3.min.js" charset="utf-8"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/vega/2.6.5/vega.min.js" charset="utf-8"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/vega-lite/1.3.1/vega-lite.min.js" charset="utf-8"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/vega-embed/2.2.0/vega-embed.min.js" charset="utf-8"></script>

Check to see if you want vega 2 or 3.

Vega 2 uses D3 v3. Vega 3 uses D3 v4.

dhimmel · 2017-01-14T01:44:24Z

@patrick-miller where do you think we should note the AUROC for each curve? Either as additional text in the legend or on hover?

dhimmel · 2017-01-14T01:51:55Z

I'm starting to think being able to compute the TPRs and FPRs to create an ROC in javascript would be killer. There are up to 33 different cancers that can be selected -- users may be interested in selecting certain cancers, which will filter to a subset of samples (observations). Thus the ROC curve would change.

We could always have the backend recalculate, if doing this on the frontend is too burdensome. Not that this decision or implementation should be part of this PR. Just wanted to jot down my thoughts and get your opinion.

patrick-miller · 2017-01-14T02:20:08Z

I'll give it some thought, though I doubt I will have a strong opinion between the versions. I think vega 3 is still in development. As for vega vs. vega-lite, you can definitely do more with vega -- I'm not sure whether you can do any interactive stuff with vega-lite (I have only used vega in the past).

There are a few different places we could put the AUROC. We can put it in the legend like you have been doing in Python. We can put it on hover (would switch to keeping hover on permanently). We can put it to the right of the lines. I'll play around with adding it in some different places in a separate pull request.

In terms of the way the data is going to be served...anytime a user filters to a subset of cancers we would need to make a server side call to the data set, correct? Or are you imagining storing all of the prediction data in the frontend? We can certainly move a step to the frontend, I'm just not sure if this will really speed things up that much if you have the data cached in Redis on the backend anyway. Correct me if I'm wrong, but isn't the difference just IO?


dhimmel · 2017-01-14T14:26:09Z

> Correct me if I'm wrong, but isn't the difference just IO?

IO and programming language. The javascript method could be done entirely client side. Otherwise, we can use python via the backend to compute the ROC curve.

> In terms of the way the data is going to be served...anytime a user filters to a subset of cancers we would need to make a server side call to the data set, correct?

Unless we load the entire prediction table into the browser. This table is at most 8,000 rows, so it's a possibility.

Let's defer any decisions here until we have a better idea of the results viewer.
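For reference, recomputing the curve after a cancer-type filter is a small amount of work wherever it runs. The following is a minimal pure-Python sketch, not the project's implementation: the row fields `disease`, `status`, and `probability` are hypothetical names for the prediction table's columns, and the AUROC uses the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists with one point per distinct score threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Rank samples from the highest predicted score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auroc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


def roc_for_subset(rows, selected_diseases):
    """Filter rows to the selected diseases, then recompute the curve.

    Each row is a dict with hypothetical keys: disease, status, probability.
    """
    subset = [row for row in rows if row["disease"] in selected_diseases]
    labels = [row["status"] for row in subset]
    scores = [row["probability"] for row in subset]
    fpr, tpr = roc_points(labels, scores)
    return fpr, tpr, auroc(fpr, tpr)
```

With a table of at most 8,000 rows, the sort dominates the cost, so the same logic ported to the browser should also be cheap.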

dhimmel

Everything looking good.

> I plan on adding in the ability to specify the data set used (or model) so that we can split out the full feature model and the covariates only model.

Once you get this implemented, I'll run locally and play with the viz.


patrick-miller · 2017-01-14T17:25:08Z

Here is how the visualization looks now. We can play with how the interactivity works once I start putting together the AUROC for each curve.

dhimmel

Coming along nicely.

I think it makes most sense to map partition to color and feature_set to linetype (e.g. solid for all features, dashed for covariates only). How difficult would that be to implement?

dhimmel · 2017-01-14T17:58:15Z

explore/visualization/data/sample_roc_data.csv

@@ -0,0 +1,25 @@
+false_positive,true_positive,curve,data


May want to add 0, 0 and 1, 1 to each curve to more closely represent the ROC curves on real data.
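One way to pad a curve with those endpoints, sketched in Python under the assumption that a curve is a list of `(false_positive, true_positive)` pairs (the helper name `pad_roc_endpoints` is hypothetical):

```python
def pad_roc_endpoints(points):
    """Return the curve sorted by FPR, starting at (0, 0) and ending at (1, 1)."""
    padded = sorted(points)  # tuples sort by FPR first, then TPR
    if not padded or padded[0] != (0.0, 0.0):
        padded.insert(0, (0.0, 0.0))
    if padded[-1] != (1.0, 1.0):
        padded.append((1.0, 1.0))
    return padded
```

Curves that already contain both endpoints pass through unchanged.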

dhimmel · 2017-01-14T18:01:15Z

explore/visualization/README.md

+In order to run these sample files, you should first start up a simple HTTP server such as:
+
+```sh
+python -m SimpleHTTPServer 8000


Note that this repository's environment uses Python 3.

In Python 3, I think this should be:

python -m http.server 8000

Feel free to include both commands, if you'd like.

Ah, I'll fix that.

dhimmel · 2017-01-14T18:04:36Z

explore/visualization/data/sample_roc_data.csv

@@ -0,0 +1,25 @@
+false_positive,true_positive,curve,data


Can we make the column names more descriptive:

false_positive_rate

true_positive_rate

partition

feature_set

dhimmel · 2017-01-14T18:06:07Z

explore/visualization/README.md

+python -m SimpleHTTPServer 8000
+```
+
+Then navigate to that instance (localhost:8000) and click on the file that you wish to view.


Replacing localhost:8000 with http://localhost:8000/ will make the link clickable -- just worried that some ML devs will be confused by what localhost is.


patrick-miller · 2017-01-14T21:31:56Z

Made the small tweaks and switched to dashed lines for the covariates. It wasn't exactly straightforward, so there may be an easier way that I couldn't find to do it. Latest update:

dhimmel · 2017-01-16T19:40:45Z

@patrick-miller nice. I'm thinking we want to remove the dots (and keep just the lines), since there can be thousands of actual points in some of our ROC curves.

For the "feature set" legend, is it possible to use a line rather than a point to show the difference between solid and dashed? No big deal if this is too difficult.

Also, how hard is it to add some transparency/alpha to the lines? I'm thinking we may have overlapping ROCs.

dhimmel · 2017-01-16T19:41:08Z

Would love to get you some real data to plug in.

patrick-miller · 2017-01-16T22:47:35Z

Agreed on removing the dots, they are placeholders for now for the interactive portion -- still considering how I would want to best display it (thoughts are very welcome!)

I'll switch the legend to a line, I'm pretty sure it should be possible.

Transparency should be easy, I'll play around with some values. I'll do a data dump from one of the notebooks so that I can work out which values will be better.


patrick-miller · 2017-01-27T20:15:52Z

I added 'real' data for the ROC plot (comes from the 2.TCGA-MLexample notebook) -- for the covariates only model I fabricated the data. I took out the dots to make the rendering faster, but we will probably want to sample from the full ROC data that sklearn outputs (too many FPR and TPR breaks).

Things left to decide on: interactivity and where to put the AUROC for each feature set/partition split.

dhimmel · 2017-01-27T20:28:27Z

@patrick-miller, looks great and thanks for creating the more realistic data.

> I took out the dots to make the rendering faster, but we will probably want to sample from the full ROC data that sklearn outputs

Since most points in our ROC curve lie on the line and are not actually inflection points, we can prune many of the points without any change to the curve! Here is an R implementation of this method. It shouldn't be hard for us to implement this in python.
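A minimal Python sketch of that pruning idea follows. It is not a port of the R implementation referenced above; it simply drops every interior point that lies on the straight line between the last kept point and the next point. The function name `prune_collinear` and the `tol` parameter are assumptions made for illustration.

```python
def prune_collinear(fpr, tpr, tol=1e-12):
    """Drop interior ROC points that do not change the shape of the curve."""
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")
    if len(fpr) < 3:
        return list(fpr), list(tpr)
    keep_x, keep_y = [fpr[0]], [tpr[0]]
    for i in range(1, len(fpr) - 1):
        # Cross product of (last kept -> current) and (current -> next).
        # It is zero when the three points are collinear or coincide.
        cross = (fpr[i] - keep_x[-1]) * (tpr[i + 1] - tpr[i]) - (
            tpr[i] - keep_y[-1]
        ) * (fpr[i + 1] - fpr[i])
        if abs(cross) > tol:
            keep_x.append(fpr[i])
            keep_y.append(tpr[i])
    # The final point is always kept.
    keep_x.append(fpr[-1])
    keep_y.append(tpr[-1])
    return keep_x, keep_y
```

Because only points on straight segments are removed, the plotted line and its area are unchanged; the kept points are the corners of the step-like curve.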

> Things left to decide on: interactivity and where to put the AUROC for each feature set/partition split.

For the AUROC, I think the two options are in the tooltip that appears on hover or in an additional legend. The additional legend could just contain the linetypes and the AUROC%.


patrick-miller · 2017-02-01T03:35:49Z

I got some interactivity working. It isn't perfect, but it is definitely a start.

dhimmel · 2017-02-01T15:12:09Z

> I got some interactivity working. It isn't perfect, but it is definitely a start.

Looks great. My only suggestion would be to make AUC a percentage, and to give the TPR, FPR, and AUC percentages 1 decimal point of precision... like TPR 88.1%.
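As an illustration of the requested format only (the plot itself is specified in Vega, and the helper name `format_rate` is hypothetical), the same rounding in Python would be:

```python
def format_rate(name, value):
    """Format a rate in [0, 1] as a percentage with one decimal place."""
    return f"{name} {value:.1%}"


print(format_rate("TPR", 0.881))  # TPR 88.1%
```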

patrick-miller · 2017-02-01T19:53:53Z

Ok, I formatted the interactive legend to have 1 decimal point and all three figures are %s.

dhimmel · 2017-02-01T20:14:01Z

Great. I got the visualization up and running locally. See

I noticed the box overlaps with the AUC percentage sign. Is there an easy fix? If not, I'm happy to merge as is! Thanks for seeing this PR through. Can't wait till we deploy it.

patrick-miller · 2017-02-01T20:18:00Z

Yep, it is very easy. Right now, a lot of those parameters are hard coded, so I'm going to look at changing that in the future.

Add first version of ROC plots

00d01e2

dhimmel reviewed Jan 14, 2017


Update the includes to versioned instances. Make aesthetic changes to…

7bab733

… the legend.

dhimmel reviewed Jan 14, 2017


Add functionality for visualizing multiple types of models/data sets …

82832c9

…(covariate only)

dhimmel reviewed Jan 14, 2017


Add dashed lines for the covariate only model and rename the variable…

03dafcc

…s to be more meaningful

pwmiller added 2 commits January 18, 2017 11:40

Add transparency to the lines and change the legend shapes

ed835ff

Update the data to use one of the workbook's ROC curve data and adjus…

567562a

…t the transparency

Fix legend to accomodate dashes

677913a

pwmiller added 3 commits January 29, 2017 16:55

Add interactivity based on FPR location, set up for the ability to fa…

b12b2d3

…cet by partition/feature set

Add AUC and put into interactive legend

4d92d48

Build interactivity and format the interactive legend

c0e2e61

dhimmel mentioned this pull request Feb 1, 2017

Results Viewer View/App cognoma/frontend#64

Closed

Format the interactive legend

10c8c0b

Fix box overlapping text

122619d

dhimmel approved these changes Feb 1, 2017


dhimmel merged commit 40a02f7 into cognoma:master Feb 1, 2017

patrick-miller deleted the feature-vega_ROC branch February 1, 2017 21:28

dhimmel mentioned this pull request Feb 14, 2017

Create Vega specification for AUROC bar plots #84

Merged

patrick-miller mentioned this pull request May 31, 2017

Standardize the plots in notebooks #97

Closed
