Planned Analysis: Figure showing the distribution of samples by cancer type #5
Comments
This could include a graphic of a brain to show the origin / location of the different brain tumor types. |
To address the first part of this, would a bar plot with the cancer type on the x axis and the frequency on the y axis be well suited for this purpose? Possibly, a more complex version of this: or part A of this: |
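A minimal sketch of the tally that such a bar plot would draw, written in Python with only the standard library. The sample labels are made up for illustration and are not taken from the dataset, and `count_by_type` is a hypothetical helper name.

```python
from collections import Counter


def count_by_type(cancer_types):
    """Return (cancer_type, count, percent) tuples, most frequent first."""
    counts = Counter(cancer_types)
    total = sum(counts.values())
    # most_common() orders by descending count; ties keep first-seen order.
    return [
        (name, n, round(100 * n / total, 1))
        for name, n in counts.most_common()
    ]


# Hypothetical labels, one per sample (not real data).
samples = [
    "Medulloblastoma",
    "Ependymoma",
    "Medulloblastoma",
    "Low-grade glioma",
    "Medulloblastoma",
    "Ependymoma",
]

for name, n, pct in count_by_type(samples):
    print(f"{name:<20} {n:>3} {pct:>5.1f}%")
```

The names would go on the x axis and either the count or the percentage on the y axis. Sorting by frequency keeps the plot readable when there are many types.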
@cbethell I think that the bar chart from the first paper would be aligned with what I was thinking here! I'm pretty sure the vision would be to use Column O in https://docs.google.com/spreadsheets/d/1Sa0hNX1lje40HdBpWiLUBY6-53UzvQczFLYb3uIm2jw/edit#gid=1663834900 . |
@cgreene Great! Thank you for the clarification. |
The graph below is a rough draft of my interpretation of the first figure suggested by @cgreene. As is, the bars are colored by |
This is really helpful to me. It's nice to understand the characteristics of the data. @jharenza: does this align with what you were expecting the distributions to look like in the dataset? |
This looks right. The coloring in this case would be fine being all black, or the bars could later be colored by subtype, though that may make the legend complex. You can also see https://pedcbioportal.kidsfirstdrc.org/study/summary?id=pbta_cbttc as a guide for this dataset. I was also thinking a multi-layer pie chart could be useful here, with the inner ring being the broad histology, the next ring being these unique histologies, and the outermost ring being molecular subtype. Example is the html image here. I have some code for this (made in R) - will push to my last paper repo tomorrow and share. We could also consider assigning specific colors for each histology that we use throughout the paper for consistency. See some of the TCGA papers for color scheme examples. |
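The nested rings described above come down to counting samples at three levels of one hierarchy. Below is a minimal Python sketch of that counting step, using only the standard library. It is not the R code mentioned in the comment. The records are invented for illustration, and `ring_counts` is a hypothetical helper name.

```python
from collections import defaultdict


def ring_counts(records):
    """Count samples per ring of a three-level hierarchy.

    Each record is (broad_histology, histology, molecular_subtype).
    Returns three dicts keyed by the path down to that ring, so every
    outer wedge can be drawn inside the span of its parent wedge.
    """
    rings = [defaultdict(int) for _ in range(3)]
    for record in records:
        path = tuple(record)
        for depth in range(3):
            rings[depth][path[: depth + 1]] += 1
    return [dict(ring) for ring in rings]


# Hypothetical records (not real data).
records = [
    ("Glioma", "High-grade glioma", "subtype A"),
    ("Glioma", "High-grade glioma", "subtype B"),
    ("Glioma", "Low-grade glioma", "subtype C"),
    ("Embryonal tumor", "Medulloblastoma", "subtype D"),
]

inner, middle, outer = ring_counts(records)
print(inner)
print(middle)
```

Keying each ring by its full path rather than by the label alone keeps the counts separate if the same subtype label ever appears under two different histologies.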
@cbethell I pushed the code here: https://github.com/marislab/create-pptc-pdx-pie - let me know if you have any issues! |
@jharenza Thank you! I'll take a look at it now. |
In addition to the draft PR I filed, I have put together a graphic of the brain showing the sites of various cancer types as suggested by @PichaiRaman. Given the relatively large number of unique cancer types in the dataset, I thought it may be more feasible to label the graphic with the highest expressed cancer type at each primary site. Let me know your thoughts on this concept (I'm sure the graphic itself can be further manipulated for better presentation so please feel free to provide input in this area as well): |
Hi @cbethell - we actually have a graphic designer on staff who can handle this brain region figure. I think this is a good start, but we are missing major tumor types (e.g., medulloblastoma), gliomas will have to be narrowed further, and some types have other regions not represented (e.g., DIPG also in brainstem and thalamus). This may be difficult to represent, so maybe we should think about how to do this a bit more. I saw a figure in a paper I liked where they had a b&w brain with the number of tumors per region shown in circles, with each circle's size proportional to the sample N of that region and its color indicating either molecular subtype or tumor type, so maybe we can iterate around that a bit. (I can't remember the paper offhand, but I think it was some GBM/HGG subtyping paper.) |
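One way to size such circles, sketched in Python under the assumption that circle area (rather than radius) should be proportional to the sample N. The region counts are invented for illustration, and `circle_radius` is a hypothetical helper name.

```python
import math


def circle_radius(n, n_max, max_radius=1.0):
    """Radius for a circle whose area is proportional to n.

    Area grows with the square of the radius, so the radius is scaled
    by sqrt(n / n_max); the largest region gets max_radius.
    """
    if n_max <= 0:
        raise ValueError("n_max must be positive")
    return max_radius * math.sqrt(n / n_max)


# Hypothetical sample counts per brain region (not real data).
region_counts = {"Brainstem": 25, "Cerebellum": 100, "Thalamus": 4}

n_max = max(region_counts.values())
for region, n in region_counts.items():
    print(f"{region:<12} N={n:>3} radius={circle_radius(n, n_max):.2f}")
```

Scaling the radius directly by N would exaggerate differences between regions, because perceived size tracks area.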
Hi @jharenza - this seems like a good way to present the major tumor types. I will look for the paper and for other diagrams similar to what you describe above, and prepare the data for use in this manner. |
@cbethell This issue may be a better spot for discussion around the plots included as part of #40. Can you post what the plots look like here? For the interactive plot, you can probably create a repo of your own that includes the HTML file generated in 857fd81 (if I understand correctly) -> turn on GitHub pages. Then you can post a link here for easy access. |
@jaclyn-taroni per your suggestion, here are the plots included in draft PR #40 : I modified the above plot to show percentages above each bar instead of the raw count. Thoughts? Above is a treemap plot used to display
|
As noted in draft PR #40, the main ideas I would like input on include:
|
For me, the treemap is less valuable as a static image. That is not to say that an interactive treemap would be better than the interactive pie chart you linked above. |
Good point. I can make the treemap interactive and we can decide how useful it may be from there. |
Find the link to the interactive treemap here. Its value still does not appear to exceed that of the interactive pie chart. What are your thoughts on including it now? |
I believe the analysis portion of this has been satisfied by #52, #54, and #55 with the exception of the changes in glioma brain region classification tracked in #57. I've opened AlexsLemonade/OpenPBTA-manuscript#38 to track the final assembly of the figure. If anything related to |
It is often helpful to have a part of the first figure for a dataset landscape paper that shows how the samples are distributed across cancer type to characterize the overall dataset. We would like a figure that summarizes the content of the dataset "at a glance."