Which body habitats are most/least variable through time? #2

floresg · 2012-11-01T16:26:55Z

A. Alpha diversity
a) Metrics – richness, phylogenetic diversity, Shannon index

Coefficient of variation (CV = standard deviation/mean) – useful to compare the variation of two populations independent of the magnitude of their means.
b) Look for differences within each body habitat based on:

gender, university, antibiotics, C-section birth, allergies, BMI class, etc.
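A minimal sketch of the CV comparison described in A, assuming alpha diversity has already been computed per sample. The site names and values are made up for illustration, and the use of the sample (n-1) standard deviation is an assumption; the thread does not say which form is intended.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(values) / mean(values)

# Hypothetical richness values for one individual across three time points.
richness = {
    "palm": [100.0, 140.0, 180.0],
    "tongue": [50.0, 52.0, 48.0],
}

# CV is unitless, so habitats with very different mean richness can be compared.
cv_by_site = {site: coefficient_of_variation(v) for site, v in richness.items()}
```

Because CV divides by the mean, it is only meaningful for metrics that are strictly positive and measured on a ratio scale.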

B. Beta diversity
a) Metrics – weighted/unweighted UniFrac

Metrics that contain abundance information are more appropriate for these data because skin habitats are rich in low-abundance transient OTUs, which will be more heavily weighted using a presence/absence metric.
median absolute deviation (MAD) – not sensitive to outliers
mean of pairwise comparisons
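To make the two summaries in B concrete, here is a small sketch that computes both the mean and the MAD of one individual's pairwise distances. The distances are invented for illustration (one deliberately extreme value), not taken from the study.

```python
from statistics import mean, median

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    values = list(values)
    med = median(values)
    return median(abs(v - med) for v in values)

# Hypothetical within-individual UniFrac distances; 0.90 plays the outlier.
distances = [0.30, 0.32, 0.31, 0.29, 0.90]

mean_distance = mean(distances)                    # pulled upward by 0.90
mad_distance = median_absolute_deviation(distances)  # barely affected by 0.90
```

Note that the two numbers answer different questions: the mean summarizes how large the distances are, while MAD summarizes how spread out they are.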

floresg · 2012-11-01T21:01:00Z

Question about the Beta diversity part of this analysis - instead of averaging all the pairwise comparisons for an individual, should we average only those from adjacent time points?
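A sketch of the two options being compared, assuming a symmetric distance matrix for one individual whose rows and columns are already ordered by sampling time. The function names and the toy matrix are hypothetical.

```python
from itertools import combinations
from statistics import mean

def mean_all_pairs(dm):
    """Average over every pair of time points (upper triangle)."""
    n = len(dm)
    return mean(dm[i][j] for i, j in combinations(range(n), 2))

def mean_adjacent_pairs(dm):
    """Average only over consecutive time points (first off-diagonal)."""
    return mean(dm[i][i + 1] for i in range(len(dm) - 1))

# Toy matrix in which distance grows with the time lag between samples.
dm = [[0.1 * abs(i - j) for j in range(4)] for i in range(4)]

all_pairs = mean_all_pairs(dm)
adjacent = mean_adjacent_pairs(dm)
```

When distance increases with time lag, as in this toy case, the all-pairs mean is larger than the adjacent-pairs mean, so the two summaries are not interchangeable.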

gregcaporaso · 2012-11-09T20:27:03Z

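Added some data showing a comparison across individuals. See the analysis results here (https://github.com/gregcaporaso/student-microbiome-project/wiki/Overall-variability-across-body-sites). Working on within individual comparisons now.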

floresg · 2012-11-09T22:11:23Z

One question I had about these analyses is normalizing sampling effort across individuals since some people have only 5 samples and others have up to 14? If there is a time distance decay relationship, then you would expect individuals who turned in samples further apart in time would have greater variability than those that turned in samples closer in time. Should we be randomly sampling five samples from each individual for these analyses?
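One way the random subsampling could look, as a sketch only. It assumes each individual's samples are recorded as week numbers; the function name, the seed, and the example data are hypothetical.

```python
import random

def subsample_timepoints(weeks_by_individual, n=5, seed=0):
    """Randomly keep n time points per individual; skip anyone with fewer than n."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    result = {}
    for individual, weeks in sorted(weeks_by_individual.items()):
        if len(weeks) < n:
            continue
        result[individual] = sorted(rng.sample(list(weeks), n))
    return result

# Hypothetical sampling records (week numbers in which a sample was returned).
weeks_by_individual = {
    "subject_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "subject_b": [1, 3, 5, 7, 9],
    "subject_c": [2, 4, 6],
}

subsampled = subsample_timepoints(weeks_by_individual)
```

Random subsampling equalizes the number of samples but not the time span they cover, so it does not by itself remove the time-distance decay effect raised above.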

rob-knight · 2012-11-09T22:38:49Z

I would recommend doing some matched analyses testing the effect against the subset of subjects who returned all samples (i.e., compare against the same 5 timepoints from subjects with all timepoints).
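A sketch of the matching step, under the assumption that samples are indexed by week number. The function name and data are hypothetical; the idea is to restrict each fully sampled subject to exactly the weeks that a partially sampled subject returned.

```python
def matched_timepoints(partial_weeks, complete_subjects):
    """Restrict each complete subject to the weeks sampled by a partial subject."""
    wanted = set(partial_weeks)
    return {
        subject: [w for w in weeks if w in wanted]
        for subject, weeks in complete_subjects.items()
    }

# Hypothetical: one subject returned 5 samples, two subjects returned all 10.
partial_weeks = [1, 3, 4, 7, 9]
complete_subjects = {
    "subject_x": list(range(1, 11)),
    "subject_y": list(range(1, 11)),
}

matched = matched_timepoints(partial_weeks, complete_subjects)
```

Variability computed on `matched` can then be compared with variability computed on the full series for the same subjects, which isolates the effect of having fewer time points.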

rob-knight · 2012-11-10T00:33:22Z

Thanks. Those are extremely significant t test values and I bet all the nonparametric values are 0 even if you do 10^9 iterations.

The fact that forehead is lower diversity/lower variability than palm was known in Costello et al. though not sure we reported it clearly.

It might be worth reopening the discussion about which measures of variability are useful and how we should apply and compare them?

gregcaporaso · 2012-11-10T15:15:37Z

From Rob's comment:

It might be worth reopening the discussion about which
measures of variability are useful and how we should apply
and compare them?

This is something that @jrrideout is actively working on for the microbiogeo analysis/paper and we'll feed the results into this analysis.

antgonza · 2012-11-10T19:10:19Z

I think that one of the most important questions we need to answer is what
is the best way to characterize variation in bacterial communities: mean or
median. Now, I'm not sure this is the perfect dataset to do this, but it
will be good to keep it in mind while selecting analytical tools.

Antonio González Peña
Research Assistant, Knight Lab
University of Colorado at Boulder
https://chem.colorado.edu/knightgroup/
http://scholar.google.com/citations?user=d5EXd78AAAAJ

gregcaporaso · 2012-11-11T15:54:06Z

I think just mean/median is not enough, but rather a five number summary - minimum, first quartile, median, third quartile, and maximum - would be better. Alternative would be median and median absolute deviation. Thoughts on this?

I really don't like mean for this for the usual sensitivity-to-outliers reason. Outliers can pop up here all the time, e.g. if someone sneezed on their hands a couple of minutes before sampling at one of the time points (while these would look different, they are probably not different enough to be flagged as mislabeled).
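A small sketch of the five-number summary and of the outlier sensitivity described here. The distances are invented, and the "inclusive" quartile method is an assumption (other quartile conventions give slightly different values).

```python
from statistics import mean, median, quantiles

def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# Hypothetical distances for one individual, then the same with one odd sample.
clean = [0.30, 0.31, 0.32, 0.33, 0.34]
with_outlier = clean + [0.95]

summary = five_number_summary(clean)

# How far does each statistic move when the single outlier is added?
mean_shift = mean(with_outlier) - mean(clean)
median_shift = median(with_outlier) - median(clean)
```

In this toy case the mean moves by roughly 0.1 while the median moves by 0.005, which is the behavior motivating the preference for median-based summaries.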

gregcaporaso · 2012-11-11T15:57:43Z

From Rob's comment:

I would recommend doing some matched analyses testing
the effect vs the subset of subjects who returned all samples
(ie compare vs same 5 timepoints from subjects with all
timepoints).

One relatively minor issue here is that we don't currently define what it means for someone to have turned in all samples. Technically the sampling period was 10 weeks, but if people kept providing samples, we kept taking them, so we have up to ~13 weeks of data from some individuals. Gilbert/Dan, you're most familiar with the metadata - would we be safe defining 10 weeks as "all"? If so, does anyone object to that definition?

floresg · 2012-11-11T16:13:43Z

We may want to define "all" as 8 weeks' worth of samples because then more individuals will be included. One other thing to consider is consecutive time points. For some individuals those 8 samples could have been turned in over a 14-week period.
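A sketch of how the "8 samples" definition and the consecutiveness concern could be checked together. It assumes samples are recorded as week numbers; the function names, threshold parameter, and data are hypothetical.

```python
def summarize_sampling(weeks):
    """Report sample count, span in weeks, and whether all weeks are consecutive."""
    weeks = sorted(weeks)
    span = weeks[-1] - weeks[0] + 1
    consecutive = all(b - a == 1 for a, b in zip(weeks, weeks[1:]))
    return {"n_samples": len(weeks), "span_weeks": span, "consecutive": consecutive}

def complete_individuals(weeks_by_individual, min_samples=8):
    """Keep individuals with at least min_samples samples, with their summaries."""
    return {
        individual: summarize_sampling(weeks)
        for individual, weeks in weeks_by_individual.items()
        if len(weeks) >= min_samples
    }

# Hypothetical records: same sample count, very different time spans.
weeks_by_individual = {
    "subject_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "subject_b": [1, 2, 4, 6, 8, 10, 12, 14],
    "subject_c": [1, 2, 3, 4, 5],
}

complete = complete_individuals(weeks_by_individual)
```

Both `subject_a` and `subject_b` pass the 8-sample threshold, but only the first covers 8 consecutive weeks; the second spreads the same number of samples over 14 weeks.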

rob-knight · 2012-11-11T19:21:56Z

Sounds reasonable. If you're worried about outliers might it be worth looking at histograms of some/all of the distributions eg as thumbnails?
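As a rough stand-in for plotted thumbnails, the sketch below bins distances and renders one text row per habitat. It assumes distances lie in [0, 1]; the bin count, function names, and data are hypothetical, and a plotting library would be the natural choice for real figures.

```python
def histogram_counts(values, n_bins=5, lo=0.0, hi=1.0):
    """Count values per equal-width bin; values are assumed to lie in [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # put v == hi in the last bin
        counts[idx] += 1
    return counts

def render_thumbnail(label, counts):
    """One text row per distribution: '#' repeated per count, '.' for empty bins."""
    bars = " ".join("#" * c if c else "." for c in counts)
    return f"{label:>10} {bars}"

# Hypothetical distances per habitat.
distances = {
    "palm": [0.30, 0.31, 0.32, 0.33, 0.34, 0.95],
    "forehead": [0.10, 0.12, 0.15, 0.18],
}

thumbnails = [render_thumbnail(site, histogram_counts(v)) for site, v in distances.items()]
```

An isolated bar far from the main mass, as in the palm row here, is the kind of outlier a quick visual scan would catch.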

antgonza · 2012-11-12T04:02:59Z

I guess my comment wasn't clear enough. My concern about mean vs. median is
due to the introduction of median absolute deviation (MAD) vs. the
histograms/mean we have used before for other analyses, and I just do not
want this point to get lost.

gregcaporaso · 2012-11-14T00:40:45Z

I think the histograms cover what we'd show in a five number summary. I think you're saying that it'd be worth mentioning in the paper why we're choosing to use median, etc. rather than mean - is that right? I agree that that's a good technical point to mention.

Also, I wanted to point out that Jai is working on a subsampling strategy relevant for time series analysis to address Gilbert's suggestion for subsampling. We're discussing this here (https://github.com/biocore/qiime/issues/446), and he is shooting to have a function in place that we could use to explore this by the end of this week.

rob-knight · 2012-11-14T00:51:14Z

There are two separate points here:

1. mean vs median for comparisons of distances
2. whether to use a measure of central tendency (mean or median or whatever) or a measure of spread (standard deviation or MAD or whatever)

In both cases, comparison and discussion would probably be a good idea.

Rob

floresg · 2012-11-14T21:45:05Z

Besides the moving pictures data and infant gut time-series, the other human microbiome time series studies involve the vagina and nares. Both used different metrics to quantify beta diversity variability. In the vaginal paper, they used the median of Jensen-Shannon divergence to represent "community deviation from constancy." The supplemental section of this manuscript describes this metric but it sounds like it is just another metric based on entropy. They do provide justification of this choice but it is not very clear. The nares paper used the index of multivariate dispersion (IMD) to measure "the variability of an individuals bacterial community structure among the months." I did a little digging on this metric but could not find anything very helpful. These two metrics might be something we want to look into for our work and at least should start a constructive conversation. I am not sure how to add the papers to GitHub so I will send them to Greg and maybe he can add them to my comment here?
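For reference, a sketch of the Jensen-Shannon divergence itself and of taking a median over an individual's time points. The divergence formula is standard; summarizing by the median of all pairwise values is an assumption made here for illustration and may not match the exact procedure in the vaginal microbiome paper. The profiles are invented.

```python
from itertools import combinations
from math import log2
from statistics import median

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; zero-probability terms in p contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_divergence(p, q):
    """JSD = 0.5*KL(p||m) + 0.5*KL(q||m), with m the average of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def median_pairwise_jsd(profiles):
    """Median JSD over all pairs of time points (an assumed summary, see lead-in)."""
    return median(jensen_shannon_divergence(p, q) for p, q in combinations(profiles, 2))

# Hypothetical relative-abundance profiles (two taxa, three time points).
profiles = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
stability = median_pairwise_jsd(profiles)
```

With base-2 logarithms the divergence is bounded between 0 (identical profiles) and 1 (no shared taxa), which makes values comparable across habitats.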

gregcaporaso · 2012-11-15T04:02:17Z

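Here are links to those two papers: Camarinha-Silva (2012) (http://onlinelibrary.wiley.com/doi/10.1111/j.1758-2229.2011.00313.x/full) and Gajer (2012) (http://sciencemedicine.org/content/4/132/132ra52.short).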

rob-knight · 2012-11-15T04:05:54Z

We collaborate with Jacques/Pawel so let me know if methods clarifications needed: Jacques and I are on the NIH call right after Fri meeting so I can bug him then...

floresg · 2012-11-30T17:22:33Z

Added beta diversity dotplots for average values and MAD. For unweighted UniFrac, the results agree with Greg's boxplots and statistical analysis, that is variability of palm > forehead > gut > tongue. However, weighted UniFrac and MAD tell a different story.

gregcaporaso · 2013-03-27T17:28:22Z

@floresg is going to look specifically at what was previously issue #8 here (Are individuals that reported having atopic diseases (allergies, asthma, eczema, etc) more or less stable than those that did not? Diversity higher or lower?)

floresg · 2013-04-09T18:54:43Z

I have added some text and tables but am still working on this issue.

ghost assigned floresg Nov 1, 2012

gregcaporaso added a commit that referenced this issue Nov 9, 2012

adding variability figure in response to issue #2

c175d97

gregcaporaso mentioned this issue Mar 27, 2013

Are individuals that reported having atopic diseases (allergies, asthma, eczema, etc) more or less stable than those that did not? Diversity higher or lower? #8

Closed
