Which body habitats are most/least variable through time? #2
Comments
Question about the beta diversity part of this analysis: instead of averaging all the pairwise comparisons for an individual, should we average only those from adjacent time points?
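To make the two options concrete, here is a minimal sketch of averaging over all pairs versus adjacent time points only. The sample IDs, the toy distances, and the `dist` lookup keyed by a `frozenset` of two sample IDs are all hypothetical; in practice the values would come from a UniFrac distance matrix.

```python
from itertools import combinations
from statistics import mean

def mean_all_pairs(dist, order):
    """Mean distance over every pair of one individual's samples."""
    return mean(dist[frozenset(pair)] for pair in combinations(order, 2))

def mean_adjacent(dist, order):
    """Mean distance over consecutive time points only."""
    return mean(dist[frozenset(pair)] for pair in zip(order, order[1:]))

# Hypothetical toy data: one individual, four weekly samples, with made-up
# distances that grow with the time gap (a simple time-distance decay).
order = ["w1", "w2", "w3", "w4"]
dist = {
    frozenset((a, b)): 0.1 * abs(i - j)
    for (i, a), (j, b) in combinations(enumerate(order), 2)
}

print(mean_all_pairs(dist, order))
print(mean_adjacent(dist, order))
```

If distance grows with the time gap, as in this toy example, the all-pairs average comes out larger than the adjacent-only average, and the difference depends on how long the series is.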
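Added some data showing a comparison across individuals. See the analysis results here (https://github.com/gregcaporaso/student-microbiome-project/wiki/Overall-variability-across-body-sites). Working on within-individual comparisons now.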
One question I had about these analyses is how to normalize sampling effort across individuals, since some people have only 5 samples and others have up to 14. If there is a time-distance decay relationship, then you would expect individuals who turned in samples further apart in time to show greater variability than those who turned in samples closer together in time. Should we be randomly sampling five samples from each individual for these analyses?
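A rough sketch of what the random subsampling could look like, assuming each individual's samples are identified by week number. The function name, the `weeks_by_person` mapping, and the choice to drop anyone with fewer than `n` samples are assumptions for illustration, not an agreed protocol.

```python
import random

def subsample_even_effort(samples_by_person, n=5, seed=0):
    """Randomly keep n samples per individual; skip anyone with fewer than n."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    even = {}
    for person, samples in samples_by_person.items():
        if len(samples) >= n:
            # Sort so the retained samples stay in time order.
            even[person] = sorted(rng.sample(samples, n))
    return even

# Hypothetical week numbers at which each person returned a sample.
weeks_by_person = {
    "A": [1, 2, 3, 4, 5],
    "B": list(range(1, 15)),
    "C": [1, 2, 3],
}
even = subsample_even_effort(weeks_by_person)
print(even)
```

Note that this equalizes the number of samples but not their spacing: five samples drawn from a 14-week series can still span more time than five consecutive weeks, so the time-distance decay concern is only partly addressed.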
I would recommend doing some matched analyses testing the effect vs the subset of subjects who returned all samples (i.e., compare vs the same 5 timepoints from subjects with all timepoints).

On Nov 9, 2012, at 3:12 PM, "floresg" <notifications@github.com> wrote:
> One question I had about these analyses is normalizing sampling effort across individuals since some people have only 5 samples and others have up to 14? If there is a time distance decay relationship, then you would expect individuals who turned in samples further apart in time would have greater variability than those that turned in samples closer in time. Should we be randomly sampling five samples from each individual for these analyses?
Thanks. Those are extremely significant t-test values, and I bet all the nonparametric values are 0 even if you do 10^9 iterations. The fact that forehead is lower diversity/lower variability than palm was known in Costello et al., though I'm not sure we reported it clearly. It might be worth reopening the discussion about which measures of variability are useful and how we should apply and compare them.

On Nov 9, 2012, at 1:27 PM, Greg Caporaso <notifications@github.com> wrote:
> Added some data showing a comparison across individuals. See the analysis results here (https://github.com/gregcaporaso/student-microbiome-project/wiki/Overall-variability-across-body-sites). Working on within individual comparisons now.
From Rob's comment:
This is something that @jrrideout is actively working on for the microbiogeo analysis/paper and we'll feed the results into this analysis.
I think that one of the most important questions we need to answer is what
On Sat, Nov 10, 2012 at 8:15 AM, Greg Caporaso <notifications@github.com> wrote:
Antonio González Peña
I think just mean/median is not enough; a five-number summary (minimum, first quartile, median, third quartile, and maximum) would be better. An alternative would be median and median absolute deviation. Thoughts on this? I really don't like the mean for this, for the usual reason of sensitivity to outliers, which can pop up here all the time, e.g., if someone sneezed on their hands a couple of mins before sampling at one of the time points (while these would look different, they are probably not different enough to be flagged as mislabeled).
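For reference, a small sketch of both summaries using only the Python standard library. The distance values are made up, with one deliberate outlier standing in for the "sneezed on their hands" time point, and the quartile convention (`method="inclusive"`) is just one of several reasonable choices.

```python
from statistics import mean, median, quantiles

def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

def median_abs_deviation(values):
    """Median of the absolute deviations from the median (unscaled MAD)."""
    center = median(values)
    return median(abs(v - center) for v in values)

# Hypothetical within-individual distances; the last value is an outlier.
distances = [0.2, 0.25, 0.3, 0.35, 0.9]

print(five_number_summary(distances))
print(median_abs_deviation(distances))
print(mean(distances))
```

In this toy example the single outlier pulls the mean above the third quartile, while the median and MAD barely move, which is the sensitivity being described.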
From Rob's comment:
One relatively minor issue here is that we don't currently define what it means for someone to have turned in all samples. Technically the sampling period was 10 weeks, but if people kept providing samples, we kept taking them, so we have up to ~13 weeks of data from some individuals. Gilbert/Dan, you're most familiar with the metadata: would we be safe defining 10 weeks as "all"? If so, does anyone object to that definition?
We may want to define "all" as 8 weeks' worth of samples, because then more individuals will be included. One other thing to consider is consecutive time points. For some individuals those 8 samples could have been turned in over a 14-week period.
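One way to check both criteria at once is sketched below, assuming we have the week number of each returned sample per individual. The function name, the returned field names, and the example week lists are hypothetical; the threshold of 8 is just the value suggested above.

```python
def completeness(weeks, min_samples=8):
    """Summarize how many weeks were sampled and over what span."""
    weeks = sorted(set(weeks))
    # Span counts calendar weeks from first to last sample, inclusive.
    span = weeks[-1] - weeks[0] + 1 if weeks else 0
    return {
        "n_samples": len(weeks),
        "span_weeks": span,
        "meets_minimum": len(weeks) >= min_samples,
        "consecutive": span == len(weeks),
    }

# Hypothetical individuals: both returned 8 samples, but over different spans.
steady = completeness([1, 2, 3, 4, 5, 6, 7, 8])
gappy = completeness([1, 2, 4, 6, 7, 9, 12, 14])
print(steady)
print(gappy)
```

Both individuals meet an 8-sample definition of "all", but only the first has consecutive time points; the second spreads the same number of samples over 14 weeks.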
Sounds reasonable. If you're worried about outliers, might it be worth looking at histograms of some/all of the distributions, e.g., as thumbnails?

On Nov 11, 2012, at 8:54 AM, "Greg Caporaso" <notifications@github.com> wrote:
> I think just mean/median is not enough, but rather a five number summary - minimum, first quartile, median, third quartile, and maximum - would be better. Alternative would be median and median absolute deviation. Thoughts on this? I really don't like mean for this for the usual sensitivity to outliers reason, which can pop up here all the time e.g. if someone sneezed on their hands a couple of mins before sampling at one of the time points (while these would look different, probably not different enough to be flagged as mislabeled).
I guess my comment wasn't clear enough. My concern between mean/median is
On Sun, Nov 11, 2012 at 12:21 PM, Rob Knight <notifications@github.com> wrote:
Antonio González Peña
I think the histograms cover what we'd show in a five-number summary. I think you're saying that it'd be worth mentioning in the paper why we're choosing to use median, etc. rather than mean. Is that right? I agree that that's a good technical point to mention. Also, I wanted to point out that Jai is working on a subsampling strategy relevant for time series analysis to address Gilbert's suggestion for subsampling. We're discussing this here (https://github.com/biocore/qiime/issues/446), and he is shooting to have a function in place that we could use to explore this by the end of this week.
There are two separate points here:
In both cases, comparison and discussion would probably be a good idea.

Rob

On Nov 13, 2012, at 5:40 PM, Greg Caporaso <notifications@github.com> wrote:
> I think the histograms cover what we'd show in a five number summary. I think you're saying that it'd be worth mentioning in the paper why we're choosing to use median, etc rather than mean - is that right? I agree that that's a good technical point to mention. Also, I wanted to point out that Jai is working on a subsampling strategy relevant for time series analysis to address Gilbert's suggestion for subsampling. We're discussing this here (https://github.com/biocore/qiime/issues/446), and he is shooting to have a function in place that we could use to explore this by the end of this week.
Besides the moving pictures data and the infant gut time series, the other human microbiome time-series studies involve the vagina and nares. Both used different metrics to quantify beta diversity variability. In the vaginal paper, they used the median of the Jensen-Shannon divergence to represent "community deviation from constancy." The supplemental section of that manuscript describes this metric, but it sounds like it is just another metric based on entropy. They do provide justification for this choice, but it is not very clear. The nares paper used the index of multivariate dispersion (IMD) to measure "the variability of an individuals bacterial community structure among the months." I did a little digging on this metric but could not find anything very helpful. These two metrics might be something we want to look into for our work, and they should at least start a constructive conversation. I am not sure how to add the papers to GitHub, so I will send them to Greg and maybe he can add them to my comment here?
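As a starting point for that conversation, here is a minimal sketch of the Jensen-Shannon divergence between two relative-abundance profiles, plus a per-subject median. The base-2 logarithm and the choice to take the median over all pairs of a subject's samples are assumptions made here; exactly which comparisons the vaginal paper summarizes is not stated in this thread, and the profiles below are made up.

```python
from itertools import combinations
from math import log2
from statistics import median

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(a * log2(a / b) for a, b in zip(p, q) if a > 0)

def jsd(p, q):
    """Jensen-Shannon divergence of two relative-abundance vectors (0 to 1)."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)

def median_jsd(profiles):
    """Median JSD over all pairs of one subject's samples (an assumption)."""
    return median(jsd(p, q) for p, q in combinations(profiles, 2))

# Hypothetical relative abundances of three taxa at three time points.
subject = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
print(median_jsd(subject))
```

With base-2 logarithms the divergence is 0 for identical profiles and 1 for profiles that share no taxa, so values are comparable across subjects regardless of how many taxa are present.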
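Here are links to those two papers: Camarinha-Silva (2012) (http://onlinelibrary.wiley.com/doi/10.1111/j.1758-2229.2011.00313.x/full) and Gajer (2012) (http://sciencemedicine.org/content/4/132/132ra52.short).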
We collaborate with Jacques/Pawel, so let me know if methods clarifications are needed: Jacques and I are on the NIH call right after the Fri meeting, so I can bug him then...

On Nov 14, 2012, at 9:02 PM, "Greg Caporaso" <notifications@github.com> wrote:
> Here are links to those two papers: Camarinha-Silva (2012) (http://onlinelibrary.wiley.com/doi/10.1111/j.1758-2229.2011.00313.x/full) and Gajer (2012) (http://sciencemedicine.org/content/4/132/132ra52.short).
Added beta diversity dotplots for average values and MAD. For unweighted UniFrac, the results agree with Greg's boxplots and statistical analysis; that is, variability of palm > forehead > gut > tongue. However, weighted UniFrac and MAD tell a different story.
I have added some text and tables but am still working on this issue.
A. Alpha diversity
a) Metrics – richness, phylogenetic diversity, Shannon index
b) Look for difference within each body habitat based on:
B. Beta diversity
a) Metrics – weighted/unweighted UniFrac