Using Vega-Lite and Altair API to visualize the data #74
Cool, really exciting. Can't wait to take a deeper look. Can you export the notebook to a script for easier code review? From
Also I suggest moving this analysis to a new directory inside of |
Done. The created JSON file has a few changes already applied to it to improve the appearance. As I pointed out in the notebook, more formatting options need to be explored.
# In[18]:

# Give the third column a meaningful name: 'count'
df111.columns = ['disease', 'gene_symbol', 'count']
I think `frequency` would be more accurate than `count`.
# In[16]:

# Convert into the Pandas dataframe
df000 = pd.DataFrame(heatmap_df_stacked)
Let's try to use only descriptive variable names -- no `000`. In instances where you are improving the dataframe, you can overwrite the same variable. For example, keep overwriting `heatmap_df_stacked`. In fact, you may be able to chain all the operations together, so you only have to assign to a variable once.
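A minimal sketch of the chaining suggested above, assuming the starting point is a wide disease-by-gene table. The input values and the second disease name are invented for illustration; only the column names `disease`, `gene_symbol` and `count` and the variable name `heatmap_df_stacked` come from the script under review.

```python
import pandas as pd

# Hypothetical wide table: one row per disease, one column per gene.
heatmap_df = pd.DataFrame(
    {"AJUBA": [0.0128, 0.0], "AMOT": [0.0, 0.05]},
    index=["adrenocortical cancer", "bladder cancer"],
)

# Stack, flatten the index, and name the columns in a single chain,
# so only one descriptively named variable is ever assigned.
heatmap_df_stacked = (
    heatmap_df
    .stack()
    .reset_index()
    .rename(columns={"level_0": "disease", "level_1": "gene_symbol", 0: "count"})
)
```

Written this way, the intermediate `df000` and `df111` frames are not needed.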
I think the next step is to touch up the Vega-Lite specification separately from Altair. You've started to do this in your final notebook cell. What I think would be ideal is to separate the JSON for the dataset from the JSON of the Vega-Lite spec. See this function for exporting a Then we'll be able to give the JSON spec directly to the frontend and they'll generate the data.
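One way the separation could look, as a sketch using only the standard library. The helper name `split_spec_and_data` is hypothetical, and the `encoding` block is an assumed minimal heatmap encoding rather than the spec from the notebook; the `data.values` and `data.url` forms are the two data layouts that appear in this thread.

```python
import json


def split_spec_and_data(spec, data_url):
    """Return (spec pointing at data_url, inline values taken from spec)."""
    values = spec["data"]["values"]
    # Copy the top level so the caller's spec is left untouched.
    slim_spec = dict(spec, data={"url": data_url})
    return slim_spec, values


# Illustrative spec with inline data (encoding is an assumption).
spec = {
    "mark": "text",
    "encoding": {
        "x": {"field": "gene_symbol", "type": "nominal"},
        "y": {"field": "disease", "type": "nominal"},
        "color": {"field": "count", "type": "quantitative"},
    },
    "data": {"values": [
        {"disease": "adrenocortical cancer", "gene_symbol": "AJUBA",
         "count": 0.01282051282051282},
        {"disease": "adrenocortical cancer", "gene_symbol": "AMOT",
         "count": 0},
    ]},
}

slim_spec, values = split_spec_and_data(spec, "./heatmap-data.json")
spec_json = json.dumps(slim_spec, indent=2)  # spec handed to the frontend
data_json = json.dumps(values, indent=2)     # dataset, stored separately
```

The spec stays small and static, while the dataset can be regenerated or replaced without touching it.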
@bdolly currently @superkostya is generating the heatmap from the following data structure:
"data": {
"values": [
{
"disease": "adrenocortical cancer",
"gene_symbol": "AJUBA",
"count": 0.01282051282051282
},
{
"disease": "adrenocortical cancer",
"gene_symbol": "AMOT",
"count": 0
},
{
"disease": "adrenocortical cancer",
"gene_symbol": "AMOTL1",
"count": 0
},
{
"disease": "adrenocortical cancer",
"gene_symbol": "AMOTL2",
"count": 0.01282051282051282
},
{
"disease": "adrenocortical cancer",
"gene_symbol": "LATS1",
"count": 0
}
]
}
Each value encodes a single cell in the heatmap and is a (disease, gene, frequency-of-mutation) combination. The idea is for the heatmap to show all of the diseases and genes the user has selected. We can obviously change what types of IDs we're using for genes and diseases. @bdolly can the frontend generate the above data structure? Or should we accommodate a different input data structure?
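For reference, a sketch of how such a structure could be assembled for a user's selection. The function name `heatmap_values`, the nested `frequencies` lookup, and the choice to fill missing cells with 0 are all assumptions made for illustration, not part of the pull request.

```python
def heatmap_values(frequencies, diseases, genes):
    """Build one record per (disease, gene) cell for the selected items."""
    return [
        # Assumption: a gene missing from the lookup has frequency 0.
        {"disease": d, "gene_symbol": g, "count": frequencies[d].get(g, 0)}
        for d in diseases
        for g in genes
    ]


# Hypothetical lookup: disease -> gene -> mutation frequency.
frequencies = {
    "adrenocortical cancer": {
        "AJUBA": 0.01282051282051282,
        "AMOTL2": 0.01282051282051282,
    },
}

data = {
    "values": heatmap_values(
        frequencies, ["adrenocortical cancer"], ["AJUBA", "AMOT"]
    )
}
```

The number of records is the number of selected diseases times the number of selected genes, one per heatmap cell.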
Hi Kostyra, and Daniel,
Happy New Year.
Sorry to be late replying, just getting back into the swing of things.
The Chart object can take a file/url argument instead of a dataframe.
This is how I’ve been doing it:
```python
from altair import Chart

# heatmap cell size in pixels, matches default text size
# in jupyter notebook.
hm_cell_pixel_size = (8, 8)
hm_data_url = '3-tcga-hmdata.csv'
# hm_df is the tidied/normalized dataframe previously computed,
# or passed in once this is made into a function
hm_df.to_csv(hm_data_url)
hm_chart_url = '3-tcga-hmchart.json'
# the chart references the CSV by URL instead of embedding the data
hm_chart = Chart(hm_data_url).mark_text(
    # other parameters, ...
)
# write the chart spec to disk; the with block closes the file
with open(hm_chart_url, 'w') as hm_chart_file:
    print(hm_chart.to_json(indent=2), file=hm_chart_file)
# altair chart display must be on the last line of jupyter cell
# this is a gotcha I found buried in the altair documentation
hm_chart
```
Minor nit: The TOTAL column should be moved to the right. This should be an easy slice and dice.
Better yet, make it a parallel, single column heatmap, as it is not a gene_symbol. Compute it as
part of the heatmap display process, rather than in the disease/gene_symbol dataframe as is
currently done in "3.TCGA-MLexample_Pathway"
Management of the file name space, and deletion of .csv and .json files when no longer needed
will need to be coordinated.
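The column move mentioned above might look like the following sketch. It assumes a wide table in which `TOTAL` sits alongside the gene columns; the table contents and the second disease name are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table with a TOTAL column in the first position.
heatmap_df = pd.DataFrame(
    {"TOTAL": [2, 1], "AJUBA": [1, 0], "AMOT": [1, 1]},
    index=["adrenocortical cancer", "bladder cancer"],
)

# Keep the gene columns in their current order and append TOTAL last.
gene_columns = [c for c in heatmap_df.columns if c != "TOTAL"]
heatmap_df = heatmap_df[gene_columns + ["TOTAL"]]
```

The same `gene_columns` list could also be used to split `TOTAL` off into its own single-column frame for a separate, parallel heatmap.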
… dataframes meaningful names, etc.)
Daniel,
Nice, looks almost ready to merge. Can we rename Would be nice if we could change the dashes to underscores in paths. So |
    }
  },
  "data": {
    "url": "./heatmap_data_Altair_compatible.json"
Can we change the name of `heatmap_data_Altair_compatible.json` to `heatmap-data.json`?
Done. Files and the main directory are renamed per your suggestion.
Congrats @superkostya on the nice pull request! Thanks for learning something new while contributing to cognoma.
This is a preliminary result of combining Vega-Lite and Altair to visualize some of the obtained results, e.g. heatmaps. The main objective is to take advantage of Vega-Lite's lean and sufficiently flexible JSON format for graphs, which should allow us to generate at least some of the figures on the front end, thereby reducing network traffic and improving performance.
The changes are as follows: