
Using Vega-Lite and Altair API to visualize the data #74

Merged: 9 commits merged on Jan 23, 2017


superkostya
Contributor

This is a preliminary attempt at combining Vega-Lite and Altair to visualize some of our results, e.g. heatmaps. The main objective is to take advantage of Vega-Lite's lean yet sufficiently flexible JSON format for graphs, which should allow us to generate (at least some of) the figures on the front end, thereby reducing network traffic and improving performance.

The changes are as follows:

  1. Added a new file explore/Visualization_with_Vega-Lite_and_Altair.ipynb
  2. Modified the file environment.yml

@dhimmel
Member

dhimmel commented Dec 31, 2016

Cool, really exciting. Can't wait to take a deeper look.

Can you export the notebook to a script for easier code review? From explore, run:

jupyter nbconvert --to=script Visualization_with_Vega-Lite_and_Altair.ipynb

Also, I suggest moving this analysis to a new directory inside of explore. In this directory, you can also export a vega-lite-heatmap.json specification as its own file.

@superkostya
Contributor Author

Done. I have already applied a few changes to the exported JSON file to improve the appearance. As I pointed out in the notebook, more formatting options still need to be explored.

# In[18]:

# Give the third column a meaningful name: 'count'
df111.columns = ['disease', 'gene_symbol', 'count']
Member


I think frequency would be more accurate than count.

# In[16]:

# Convert into the Pandas dataframe
df000 = pd.DataFrame(heatmap_df_stacked)
Member


Let's try to use only descriptive variable names -- no 000. In instances where you are improving the dataframe, you can overwrite the same variable. For example, keep overwriting heatmap_df_stacked. In fact, you may be able to chain all the operations together, so you only have to assign to a variable once.
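A minimal sketch of that suggestion, assuming the heatmap starts as a wide DataFrame with one row per disease and one column per gene. The variable names `heatmap_df` and `heatmap_tidy_df` are hypothetical, and the values are the adrenocortical cancer cells quoted later in this thread. It reshapes to the tidy layout and names the value column `frequency` in a single chained expression, so the result is assigned only once. It uses `DataFrame.melt`; `stack` followed by `reset_index` is another way to get the same long format.

```python
import pandas as pd

# Hypothetical wide matrix: one row per disease, one column per gene,
# each value being the mutation frequency for that (disease, gene) cell
heatmap_df = pd.DataFrame(
    {
        "AJUBA": [0.01282051282051282],
        "AMOT": [0.0],
        "AMOTL1": [0.0],
        "AMOTL2": [0.01282051282051282],
        "LATS1": [0.0],
    },
    index=pd.Index(["adrenocortical cancer"], name="disease"),
)

# Wide -> tidy (long) in one chain, with no intermediate df000/df111 variables
heatmap_tidy_df = (
    heatmap_df
    .reset_index()  # move the 'disease' index into a regular column
    .melt(id_vars="disease", var_name="gene_symbol", value_name="frequency")
)
print(heatmap_tidy_df)
```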

@dhimmel
Member

dhimmel commented Jan 4, 2017

I think the next step is to touch up the Vega-Lite specification separately from Altair. You've started to do this in your final notebook cell. Ideally, we would separate the JSON for the dataset from the JSON of the Vega-Lite spec.

See this function for exporting a pandas.DataFrame to the Vega-Lite JSON specification. Once you upload the data to GitHub, you can modify the Vega-Lite spec to load the data from a URL (example).

Then we'll be able to give the JSON spec directly to the frontend and they'll generate the data.
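A minimal sketch of that split using only the standard library, assuming the tidy records are already in memory. The file names `heatmap-data.json` and `vega-lite-heatmap.json` come from this thread, but the helper `export_heatmap` and the particular encoding (genes on x, diseases on y, color by frequency) are illustrative assumptions. No `$schema` is set, so pin whichever Vega-Lite version the frontend uses. With pandas, `DataFrame.to_json(orient='records')` yields the same list-of-records layout for the data file.

```python
import json
import tempfile
from pathlib import Path

DATA_FILE = "heatmap-data.json"       # tidy records, one per heatmap cell
SPEC_FILE = "vega-lite-heatmap.json"  # Vega-Lite spec that points at DATA_FILE


def export_heatmap(records, out_dir):
    """Write data and spec as separate JSON files (hypothetical helper); return both paths."""
    out_dir = Path(out_dir)
    data_path = out_dir / DATA_FILE
    spec_path = out_dir / SPEC_FILE

    # The data file is just the list of records, with no Vega-Lite wrapper
    data_path.write_text(json.dumps(records, indent=2))

    # The spec references the data by URL instead of embedding "values"
    spec = {
        "data": {"url": "./" + DATA_FILE},
        "mark": "rect",
        "encoding": {
            "x": {"field": "gene_symbol", "type": "nominal"},
            "y": {"field": "disease", "type": "nominal"},
            "color": {"field": "frequency", "type": "quantitative"},
        },
    }
    spec_path.write_text(json.dumps(spec, indent=2))
    return data_path, spec_path


# Example usage: export into a throwaway directory and read both files back
records = [
    {"disease": "adrenocortical cancer", "gene_symbol": "AJUBA", "frequency": 0.01282051282051282},
    {"disease": "adrenocortical cancer", "gene_symbol": "AMOT", "frequency": 0},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    data_path, spec_path = export_heatmap(records, tmp_dir)
    exported_data = json.loads(data_path.read_text())
    exported_spec = json.loads(spec_path.read_text())
print(exported_spec["data"])
```

Keeping the spec free of data means the same spec can be reused whenever the data file is regenerated, and the frontend only needs to fetch the two files.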

@dhimmel
Member

dhimmel commented Jan 4, 2017

@bdolly, currently @superkostya is generating the heatmap from the following data structure:

"data": {
  "values": [
    {
      "disease": "adrenocortical cancer",
      "gene_symbol": "AJUBA",
      "count": 0.01282051282051282
    },
    {
      "disease": "adrenocortical cancer",
      "gene_symbol": "AMOT",
      "count": 0
    },
    {
      "disease": "adrenocortical cancer",
      "gene_symbol": "AMOTL1",
      "count": 0
    },
    {
      "disease": "adrenocortical cancer",
      "gene_symbol": "AMOTL2",
      "count": 0.01282051282051282
    },
    {
      "disease": "adrenocortical cancer",
      "gene_symbol": "LATS1",
      "count": 0
    }
  ]
}

Each value encodes a single cell in the heatmap and is a (disease, gene, frequency-of-mutation) combination. The idea is for the heatmap to show all of the diseases and genes the user has selected. We can obviously change what types of IDs we're using for genes and diseases.
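Whichever side builds it, this structure is the cross product of the selected diseases and genes. Below is a language-agnostic sketch in Python. It assumes a lookup of known frequencies keyed by (disease, gene_symbol) pairs; the names `build_heatmap_values` and `frequencies` are hypothetical. Unobserved pairs are filled with 0 so every selected cell is still drawn, and the field name `count` is kept to match the structure above.

```python
from itertools import product


def build_heatmap_values(diseases, genes, frequencies):
    """Return one record per (disease, gene) cell, in disease-major order.

    `frequencies` maps (disease, gene_symbol) -> frequency; pairs missing
    from it are treated as 0 so the heatmap still shows that cell.
    """
    return [
        {"disease": disease, "gene_symbol": gene, "count": frequencies.get((disease, gene), 0)}
        for disease, gene in product(diseases, genes)
    ]


# Example reproducing the five cells shown above
values = build_heatmap_values(
    diseases=["adrenocortical cancer"],
    genes=["AJUBA", "AMOT", "AMOTL1", "AMOTL2", "LATS1"],
    frequencies={
        ("adrenocortical cancer", "AJUBA"): 0.01282051282051282,
        ("adrenocortical cancer", "AMOTL2"): 0.01282051282051282,
    },
)
heatmap_data = {"data": {"values": values}}
```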

@bdolly can the frontend generate the above data structure? Or should we accommodate a different input data structure?

@George-Zipperlen
Contributor

George-Zipperlen commented Jan 9, 2017 via email

@superkostya
Contributor Author

Daniel,
Several changes have been made per your suggestions:

  1. Dataframes have been renamed to more meaningful names; see the notebook "Visualization_with_Vega-Lite_and_Altair.ipynb".
  2. The data for the heatmap has been stored in a separate file (heatmap_data_Altair_compatible.json). Note that this is a so-called tidy (aka long) format, which is one of the requirements of the Altair API.
  3. The Vega-Lite-compatible JSON file for the heatmap has been created (heatmap_vega-lite.json). It does not contain the dataset; instead, the data to be visualized is read from the file "heatmap_data_Altair_compatible.json". See the line
    "url": "./heatmap_data_Altair_compatible.json"
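To make the tidy requirement concrete: each record is one observation (one heatmap cell) and each field is one variable. Below is a small standard-library check of that property. It assumes the data file is a JSON array of flat records; the function name `check_tidy_heatmap` and the default field names are assumptions, since the thread uses both `count` and `frequency` for the value field.

```python
import json


def check_tidy_heatmap(records, fields=("disease", "gene_symbol", "frequency")):
    """Raise ValueError unless `records` is tidy; return the number of cells.

    Tidy here means every record has exactly `fields`, and each
    (disease, gene) pair (the first two fields) appears only once.
    """
    seen_cells = set()
    for i, record in enumerate(records):
        if set(record) != set(fields):
            raise ValueError(f"record {i} has fields {sorted(record)}, expected {sorted(fields)}")
        cell = (record[fields[0]], record[fields[1]])
        if cell in seen_cells:
            raise ValueError(f"duplicate cell {cell} at record {i}")
        seen_cells.add(cell)
    return len(seen_cells)


# Example: parse a JSON string the same way the data file would be parsed
sample_json = """[
  {"disease": "adrenocortical cancer", "gene_symbol": "AJUBA", "frequency": 0.01282051282051282},
  {"disease": "adrenocortical cancer", "gene_symbol": "AMOT", "frequency": 0}
]"""
n_cells = check_tidy_heatmap(json.loads(sample_json))
```

A wide layout (one key per gene in each record) fails the field check, which is why the long format is the one Altair expects.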

@dhimmel
Member

dhimmel commented Jan 16, 2017

Nice, looks almost ready to merge.

Can we rename explore/visualization_vega_lite_altair/ to explore/heatmap-vega-lite/?

Would be nice if we could change the underscores to dashes in paths. So Visualization_with_Vega-Lite_and_Altair.ipynb becomes Visualization-with-Vega-Lite-and-Altair.ipynb. Or even simplify to heatmap.ipynb.

}
},
"data": {
"url": "./heatmap_data_Altair_compatible.json"
Copy link
Member


Can we change the name of heatmap_data_Altair_compatible.json to heatmap-data.json?

@superkostya
Contributor Author

Done. The files and the main directory have been renamed per your suggestions.

Member

@dhimmel dhimmel left a comment


Congrats @superkostya on the nice pull request! Thanks for learning something new while contributing to cognoma.


3 participants