
Are individuals always more similar to themselves through time for a given body habitat than they are to other individuals? #3

Open
floresg opened this issue Nov 1, 2012 · 10 comments

@floresg
Contributor

floresg commented Nov 1, 2012

A. Beta diversity – within vs. between weighted UniFrac/Bray-Curtis values

@ghost ghost assigned gregcaporaso Nov 10, 2012
@gregcaporaso
Member

See summary of analysis here.

Individuals are not always more similar to themselves, but on average, for all body sites, within-individual distances are smaller than between-individual distances.
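As a rough illustration of the within vs. between comparison summarized above, here is a minimal Python sketch. It is not the analysis script used for the summary; the function name `split_within_between`, the dictionary-based distance layout, and the toy values are all assumptions made up for this example.

```python
from itertools import combinations
from statistics import mean


def split_within_between(dist, person_of):
    """Split pairwise distances into within- and between-individual lists.

    dist: dict mapping frozenset({sample_a, sample_b}) -> distance
    person_of: dict mapping sample ID -> individual ID
    (Both layouts are hypothetical, chosen to keep the sketch short.)
    """
    within, between = [], []
    for a, b in combinations(sorted(person_of), 2):
        d = dist[frozenset((a, b))]
        if person_of[a] == person_of[b]:
            within.append(d)
        else:
            between.append(d)
    return within, between


# Toy data (invented): two individuals, two time points each, one body site.
person_of = {"s1": "P1", "s2": "P1", "s3": "P2", "s4": "P2"}
dist = {
    frozenset(("s1", "s2")): 0.20,  # P1 vs. P1
    frozenset(("s3", "s4")): 0.70,  # P2 vs. P2
    frozenset(("s1", "s3")): 0.60,
    frozenset(("s1", "s4")): 0.65,
    frozenset(("s2", "s3")): 0.55,
    frozenset(("s2", "s4")): 0.60,
}

within, between = split_within_between(dist, person_of)
print("mean within:", mean(within))
print("mean between:", mean(between))
```

In this toy case the mean within distance is lower than the mean between distance, even though one individual (P2) has a within distance larger than every between distance, which is the "not always, but on average" pattern.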

@floresg
Contributor Author

floresg commented Nov 10, 2012

So Jon and I have been developing a script to look at within-individual variability per body site based on any distance/dissimilarity input matrix. We have it set up to color each box by metadata (gender, for example), and it includes a line for the median of all within-individual distances (red line) and the mean (gold line). We are also toying with the idea of having it randomly sample 5 samples from each individual to account for differences in sampling effort, which is why I brought up the point in the previous GitHub comment.
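A minimal sketch of the random subsampling idea, assuming samples are already grouped by individual. The function name `subsample_per_individual` and the choice to drop individuals with fewer than `n` samples are assumptions for illustration, not a description of the script itself.

```python
import random


def subsample_per_individual(samples_by_person, n=5, seed=0):
    """Randomly pick n samples per individual, without replacement.

    Individuals with fewer than n samples are skipped (an assumption;
    another option would be to keep all of their samples).
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    picked = {}
    for person, samples in samples_by_person.items():
        if len(samples) >= n:
            picked[person] = rng.sample(samples, n)
    return picked


# Toy data (invented): uneven sampling effort across three individuals.
samples_by_person = {
    "P1": [f"P1.t{i}" for i in range(12)],
    "P2": [f"P2.t{i}" for i in range(5)],
    "P3": [f"P3.t{i}" for i in range(3)],
}

picked = subsample_per_individual(samples_by_person, n=5)
for person, samples in picked.items():
    print(person, samples)
```

Repeating the draw with several seeds and checking that the boxplots are stable would show how sensitive the result is to which 5 samples happen to be chosen.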

I am curious as to your choice of unweighted UniFrac as opposed to weighted? I think weighted metrics are more appropriate for this study because many of the changes we may see could be a bloom of organisms, which would be more easily detectable with a weighted metric. Also, for skin habitats with many transient organisms that likely don't mean anything biologically to the system, the unweighted metrics may become inflated.

@rob-knight

We have empirically found unweighted metrics to give clearer patterns in a lot of past studies, and if you want to establish blooms you need absolute abundance data anyway (you cannot detect this in the relative abundance data you have).
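To illustrate the point about relative abundance, here is a small Python sketch with invented counts. A bloom of one taxon and a decline of all the other taxa produce identical relative abundance profiles, so the two scenarios cannot be told apart without absolute abundance data. The helper names `relative` and `bray_curtis` are defined here for the example only.

```python
def relative(counts):
    """Convert absolute counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


# Invented absolute counts for three taxa (A, B, C).
baseline = [90, 90, 90]
bloom = [270, 90, 90]    # taxon A triples, B and C unchanged
decline = [90, 30, 30]   # taxon A unchanged, B and C drop to a third

print("bloom   (relative):", relative(bloom))
print("decline (relative):", relative(decline))
print("Bray-Curtis on relative abundances:",
      bray_curtis(relative(bloom), relative(decline)))
print("Bray-Curtis on absolute counts:",
      bray_curtis(bloom, decline))
```

The two scenarios differ clearly in absolute counts but collapse to the same profile once each sample is normalized to proportions.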

@floresg
Contributor Author

floresg commented Nov 15, 2012

Greg, would it be possible for you to produce the tabular data for each PersonalID that accompanies the boxplots you generated? Also, could you have it calculate the mean for each PersonalID?

@gregcaporaso
Member

@jrrideout, is that something that'd be easy/possible to spit out of your script?

@floresg
Contributor Author

floresg commented Nov 15, 2012

Another question on the boxplots: for the All within and All between comparisons, should we be plotting the distributions of the means for each individual as opposed to all within and all between distances?

@jairideout
Collaborator

@floresg, try running the make_distance_boxplots.py script with --save_raw_data. This will create a TSV file with all of the distances that were used to generate the plots. You can then open those up in Excel and calculate the means.

Regarding the All within/between comparisons, the script currently uses all distances when generating the boxplots. If this is a feature that we need, I can add an option to the script that will only use the means.
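For anyone who would rather script the per-individual means than use Excel, a minimal Python sketch follows. The TSV layout here (one row per distance, with `PersonalID` and `Distance` columns) is a hypothetical stand-in; the file written by --save_raw_data may be laid out differently and would need to be reshaped first.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: one within-individual distance per row.
raw_tsv = (
    "PersonalID\tDistance\n"
    "P1\t0.20\n"
    "P1\t0.30\n"
    "P2\t0.70\n"
    "P2\t0.50\n"
    "P2\t0.60\n"
)


def mean_distance_per_individual(handle):
    """Return {PersonalID: mean distance} from a tab-separated file handle."""
    by_person = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        by_person[row["PersonalID"]].append(float(row["Distance"]))
    return {person: mean(values) for person, values in by_person.items()}


means = mean_distance_per_individual(io.StringIO(raw_tsv))
for person, value in sorted(means.items()):
    print(person, round(value, 3))
```

Plotting these per-individual means gives each person a single value regardless of how many samples they contributed, which is the difference between the two All within/between options discussed here.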

@floresg
Contributor Author

floresg commented Nov 15, 2012

@jrrideout Thanks - and I think adding that functionality is the way to go with these data because each person is essentially a replicate sample, and this may also help us overcome the unequal number of samples from each person. It will also allow us to identify outlier individuals and then see if something in the metadata is special about that person. Greg, Rob, Noah, thoughts?

@gregcaporaso
Member

@DDomogala3 did a Procrustes analysis to attempt to get at this question, but the results don't look good. See his notes here.

@jairideout
Collaborator

@gregcaporaso, @floresg: is this functionality that I should add to make_distance_boxplots.py?

@gregcaporaso gregcaporaso removed their assignment Sep 11, 2018