Using Vega-Lite and Altair API to visualize the data #74

superkostya · 2016-12-29T18:07:44Z

This is a preliminary result of using Vega-Lite together with Altair to visualize some of the obtained results, e.g. heatmaps. The main objective is to take advantage of the lean and sufficiently flexible JSON format that Vega-Lite uses for graphs, which should allow us to generate at least some of the figures on the front end, thereby reducing network traffic and improving performance.

The changes are as follows:

Added a new file explore/Visualization_with_Vega-Lite_and_Altair.ipynb
Modified the file environment.yml

dhimmel · 2016-12-31T17:59:18Z

Cool, really exciting. Can't wait to take a deeper look.

Can you export the notebook to a script for easier code review? From explore, run:

jupyter nbconvert --to=script Visualization_with_Vega-Lite_and_Altair.ipynb

Also, I suggest moving this analysis to a new directory inside of explore. In this directory, you can also export a vega-lite-heatmap.json specification as its own file.

superkostya · 2017-01-03T16:53:56Z

Done. The created JSON file has a few changes already applied to it to improve the appearance. As I pointed out in the notebook, more formatting options need to be explored.

dhimmel · 2017-01-04T19:33:54Z

explore/visualization_vega_lite_altair/Visualization_with_Vega-Lite_and_Altair.py

+# In[18]:
+
+# Give the third column a meaningful name: 'count'
+df111.columns = ['disease', 'gene_symbol', 'count']


I think frequency would be more accurate than count.

dhimmel · 2017-01-04T19:38:44Z

explore/visualization_vega_lite_altair/Visualization_with_Vega-Lite_and_Altair.py

+# In[16]:
+
+# Convert into the Pandas dataframe
+df000 = pd.DataFrame(heatmap_df_stacked)


Let's try to use only descriptive variable names -- no 000. In instances where you are improving the dataframe, you can overwrite the same variable. For example, keep overwriting heatmap_df_stacked. In fact, you may be able to chain all the operations together, so you only have to assign to a variable once.
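A minimal sketch of the chaining suggested above, assuming the wide table has diseases as the index and gene symbols as columns. The name `heatmap_df`, the second disease, and the toy values are illustrative and not taken from the notebook; the value column is called `frequency` as proposed earlier in this review.

```python
import pandas as pd

# Hypothetical wide table: rows are diseases, columns are gene symbols.
heatmap_df = pd.DataFrame(
    {"AJUBA": [0.0128, 0.0], "AMOT": [0.0, 0.02]},
    index=pd.Index(["adrenocortical cancer", "bladder cancer"], name="disease"),
)

# One chained expression, assigned once: wide -> long (tidy) format.
heatmap_df_stacked = (
    heatmap_df
    .rename_axis(columns="gene_symbol")  # name the column axis before stacking
    .stack()                             # Series indexed by (disease, gene_symbol)
    .rename("frequency")                 # descriptive name for the value column
    .reset_index()                       # back to a flat three-column DataFrame
)
```

Because every step returns a new object, no intermediate names such as `df000` are needed.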

dhimmel · 2017-01-04T20:00:54Z

I think the next step is to touch up the vega-lite specification separately from altair. You've started to do this in your final notebook cell. What I think would be ideal is to separate the JSON for the dataset from the JSON of the vega-lite spec.

See this function for exporting a pandas.DataFrame to the vega-lite JSON specification. Once you upload the data to GitHub, you can modify the vega-lite spec to load the data from a URL (example).

Then we'll be able to give the JSON spec directly to the frontend and they'll generate the data.
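The linked function is not reproduced here. As a rough sketch of the same idea, a tidy DataFrame can be written as record-oriented JSON, one object per heatmap cell, which is the shape a Vega-Lite `data` block expects. The rows, the `frequency` column name, and the file name below are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Toy tidy table; the rows are illustrative only.
heatmap_df = pd.DataFrame({
    "disease": ["adrenocortical cancer", "adrenocortical cancer"],
    "gene_symbol": ["AJUBA", "AMOT"],
    "frequency": [0.0128, 0.0],
})

# orient="records" yields a JSON array with one object per row.
data_path = Path(tempfile.mkdtemp()) / "heatmap-data.json"  # hypothetical file name
heatmap_df.to_json(data_path, orient="records")

# Read the file back to confirm the structure that would be served from a URL.
records = json.loads(data_path.read_text())
```

Once such a file is hosted, the spec only needs to reference its location rather than embed the rows.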

dhimmel · 2017-01-04T20:07:11Z

@bdolly currently @superkostya is generating the heatmap from the following data structure:

"data": {
  "values": [
    {
      "disease": "adrenocortical cancer",
      "gene_symbol": "AJUBA",
      "count": 0.01282051282051282
    },
    {
      "disease": "adrenocortical cancer",
      "gene_symbol": "AMOT",
      "count": 0
    },
    {
      "disease": "adrenocortical cancer",
      "gene_symbol": "AMOTL1",
      "count": 0
    },
    {
      "disease": "adrenocortical cancer",
      "gene_symbol": "AMOTL2",
      "count": 0.01282051282051282
    },
    {
      "disease": "adrenocortical cancer",
      "gene_symbol": "LATS1",
      "count": 0
    }
  ]
}

Each value encodes a single cell in the heatmap and is a (disease, gene, frequency-of-mutation) combination. The idea is for the heatmap to show all of the diseases and genes the user has selected. We can obviously change what types of IDs we're using for genes and diseases.
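To make the structure concrete, here is a small sketch of restricting the cell list to a user's selection. The helper `select_cells` is hypothetical and not part of the pull request; the records are a subset of those shown above.

```python
def select_cells(values, diseases, genes):
    """Keep only the cells whose disease and gene are both selected."""
    diseases, genes = set(diseases), set(genes)
    return [
        cell for cell in values
        if cell["disease"] in diseases and cell["gene_symbol"] in genes
    ]


values = [
    {"disease": "adrenocortical cancer", "gene_symbol": "AJUBA", "count": 0.01282051282051282},
    {"disease": "adrenocortical cancer", "gene_symbol": "AMOT", "count": 0},
    {"disease": "adrenocortical cancer", "gene_symbol": "LATS1", "count": 0},
]

# Hypothetical user selection: one disease, two genes.
selected = select_cells(values, ["adrenocortical cancer"], ["AJUBA", "LATS1"])

# Wrap the result in the same "data" block layout as above.
data_block = {"data": {"values": selected}}
```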

@bdolly can the frontend generate the above data structure? Or should we accommodate a different input data structure?

George-Zipperlen · 2017-01-09T19:49:43Z

Hi Kostyra, and Daniel,

Happy New Year. Sorry to be late replying, just getting back into the swing of things.

The Chart object can take a file/url argument instead of a dataframe. This is how I've been doing it:

```python
from altair import Chart

# heatmap cell size in pixels, matches default text size
# in jupyter notebook.
hm_cell_pixel_size = (8, 8)

hm_data_url = '3-tcga-hmdata.csv'
# hm_df is the tidied/normalized dataframe previously computed,
# or passed in once this is made into a function
hm_df.to_csv(hm_data_url)

hm_chart_url = '3-tcga-hmchart.json'
hm_chart_file = open(hm_chart_url, 'w')
hm_chart = Chart(hm_data_url).mark_text(
    # other parameters, ...
)
print(hm_chart.to_json(indent=2), file=hm_chart_file)
hm_chart_file.close()

# altair chart display must be on the last line of jupyter cell
# this is a gotcha I found buried in the altair documentation
hm_chart
```

Minor nit: The TOTAL column should be moved to the right. This should be an easy slice and dice. Better yet, make it a parallel, single-column heatmap, as it is not a gene_symbol. Compute it as part of the heatmap display process, rather than in the disease/gene_symbol dataframe as is currently done in "3.TCGA-MLexample_Pathway".

Management of the file name space, and deletion of .csv and .json files when no longer needed, will need to be coordinated.
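A quick sketch of the two TOTAL-column options mentioned in this comment, assuming a wide pandas table with one column per gene plus a `TOTAL` column. The table contents and disease labels are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table with a TOTAL column that is not a gene symbol.
wide = pd.DataFrame(
    {"TOTAL": [0.03, 0.01], "AJUBA": [0.01, 0.0], "AMOT": [0.02, 0.01]},
    index=["disease A", "disease B"],
)

gene_columns = [col for col in wide.columns if col != "TOTAL"]

# Option 1: move TOTAL to the right-most position.
reordered = wide[gene_columns + ["TOTAL"]]

# Option 2: split TOTAL off so it can be drawn as its own single-column heatmap.
genes_only = wide[gene_columns]
total_only = wide[["TOTAL"]]
```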

superkostya · 2017-01-15T17:36:33Z

Daniel,
Several changes have been made per your suggestion:

Dataframes have been renamed to more meaningful names, see the notebook "Visualization_with_Vega-Lite_and_Altair.ipynb"
The data for the heatmap has been stored in a separate file (heatmap_data_Altair_compatible.json). Note that this is a so-called tidy (aka long) format, which is one of the requirements of the Altair API.
The Vega-lite compatible JSON file for the Heatmap has been created (heatmap_vega-lite.json). It does not contain the dataset; instead, the data to be visualized is read from a file "heatmap_data_Altair_compatible.json". See the line
"url": "./heatmap_data_Altair_compatible.json"
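For orientation, a spec that reads its data from a file instead of embedding it might look roughly like the sketch below. Only the `url` value comes from this thread; the `rect` mark, the encoding channels, and the `frequency` field name are assumptions, and the actual heatmap_vega-lite.json may differ.

```python
import json

# Sketch of a Vega-Lite spec whose data block points at a file.
# Assumed encoding: genes on x, diseases on y, color by the value column.
heatmap_spec = {
    "data": {"url": "./heatmap_data_Altair_compatible.json"},
    "mark": "rect",  # assumption: a Vega-Lite version that supports the rect mark
    "encoding": {
        "x": {"field": "gene_symbol", "type": "nominal"},
        "y": {"field": "disease", "type": "nominal"},
        "color": {"field": "frequency", "type": "quantitative"},  # assumed field name
    },
}

# Serialize the spec; the dataset itself stays in the separate file.
spec_json = json.dumps(heatmap_spec, indent=2)
```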

dhimmel · 2017-01-16T19:55:56Z

Nice, looks almost ready to merge.

Can we rename explore/visualization_vega_lite_altair/ to explore/heatmap-vega-lite/?

Would be nice if we could change the underscores to dashes in paths, so that Visualization_with_Vega-Lite_and_Altair.ipynb becomes Visualization-with-Vega-Lite-and-Altair.ipynb. Or even simplify to heatmap.ipynb.

dhimmel · 2017-01-16T19:51:21Z

explore/visualization_vega_lite_altair/heatmap_vega-lite.json

+    }
+  },
+  "data": {
+    "url": "./heatmap_data_Altair_compatible.json"


Can we change the name of heatmap_data_Altair_compatible.json to heatmap-data.json?

superkostya · 2017-01-22T19:10:37Z

Done. Files and the main directory are renamed per your suggestion.

dhimmel

Congrats @superkostya on the nice pull request! Thanks for learning something new while contributing to cognoma.

Kostyantyn Borysenko added 2 commits December 29, 2016 12:43

Added the file explore/Visualization_with_Vega-Lite_and_Altair.ipynb

afa9f01

Updated the file environment.yml to include the Altair API

4539a00

Kostyantyn Borysenko added 2 commits January 3, 2017 11:23

Moved all related files to a new directory 'explore/visualization_veg…

8e5c315

…a_lite_altair'

Added a file 'vega-lite-heatmap.json'

599e6a0

dhimmel reviewed Jan 4, 2017

Kostyantyn Borysenko added 4 commits January 15, 2017 12:13

Added the file heatmap_data_Altair_compatible.json

1481fc7

Created the Heatmap JSON file: heatmap_vega-lite.json

5249f56

Made changes in 'Visualization_with_Vega-Lite_and_Altair.ipynb' (give…

4b471c7

… dataframes meaningful names, etc.)

Updated the script Visualization_with_Vega-Lite_and_Altair.py

b3d2ff0

dhimmel reviewed Jan 16, 2017

Renamed files and directories. New directory name: 'explore/heatmap-v…

ae74b84

…ega-lite'

dhimmel approved these changes Jan 23, 2017

dhimmel merged commit ba7e4e6 into cognoma:master Jan 23, 2017

This was referenced Feb 1, 2017

Results Viewer View/App cognoma/frontend#64

Closed

Query Review Screen cognoma/frontend#65

Open

patrick-miller mentioned this pull request May 31, 2017

Standardize the plots in notebooks #97

Closed
