This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

add functionality for collapsing replicate samples in either OTU tables or distance matrices #1678

Closed
gregcaporaso opened this issue Sep 24, 2014 · 2 comments

Comments

@gregcaporaso
Contributor

Define a replicate group as samples that are considered replicates of each other (e.g., biological or technical replicates). Samples belonging to a replicate group, in practice, are likely to be grouped based on a pair of sample metadata categories (e.g., subject-id and replicate-number).

It's common that we have replicate samples in a study, but when we start performing downstream analyses we want to collapse the replicates to a single sample per replicate group.

Some possible ways that we would want to collapse samples in a replicate group at the OTU table or distance matrix stage are:

  • randomly select one sample from the replicate group
  • for each observation, take the median count across samples in the replicate group
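
The two strategies above could be sketched as follows. This is a minimal illustration, not QIIME code: the table is assumed to be a plain dict mapping sample IDs to per-OTU counts, the metadata is a dict of dicts, and the helper names (`group_samples`, `collapse_random`, `collapse_median`) and the `subject-id` category are hypothetical.

```python
import random
from statistics import median


def group_samples(metadata, categories):
    """Map each replicate-group key to the list of sample IDs in that group.

    The key is the tuple of values the sample has for the given categories.
    """
    groups = {}
    for sample_id, fields in metadata.items():
        key = tuple(fields[c] for c in categories)
        groups.setdefault(key, []).append(sample_id)
    return groups


def collapse_random(table, groups, seed=None):
    """Keep the counts of one randomly chosen sample per replicate group."""
    rng = random.Random(seed)
    return {key: dict(table[rng.choice(sorted(ids))])
            for key, ids in groups.items()}


def collapse_median(table, groups):
    """For each observation, take the median count across the group."""
    collapsed = {}
    for key, ids in groups.items():
        # An OTU missing from a sample is treated as a count of zero.
        otus = sorted({otu for s in ids for otu in table[s]})
        collapsed[key] = {
            otu: median([table[s].get(otu, 0) for s in ids]) for otu in otus
        }
    return collapsed


# Hypothetical toy data: three replicates of subject S1, one sample of S2.
table = {
    "S1.r1": {"OTU1": 10, "OTU2": 0},
    "S1.r2": {"OTU1": 20, "OTU2": 4},
    "S1.r3": {"OTU1": 30, "OTU2": 2},
    "S2.r1": {"OTU1": 5, "OTU2": 5},
}
metadata = {
    "S1.r1": {"subject-id": "S1", "replicate-number": "1"},
    "S1.r2": {"subject-id": "S1", "replicate-number": "2"},
    "S1.r3": {"subject-id": "S1", "replicate-number": "3"},
    "S2.r1": {"subject-id": "S2", "replicate-number": "1"},
}

groups = group_samples(metadata, ["subject-id"])
by_median = collapse_median(table, groups)
by_random = collapse_random(table, groups, seed=0)
```

Note that the median of integer counts can be fractional when a group has an even number of samples, so downstream steps that expect integer counts may need rounding.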

Other ideas for how we might want to collapse these?

If anyone has code for doing this already, please follow up here. I'll take the lead on this as I need the code for an analysis I'm running now.

@rob-knight

  • Mean count across samples
  • Pick the sample with the most reads (i.e., collapse before rarefaction)
  • Sum counts across samples (before or after rarefaction)
  • Pick the sample that is the centroid of the set of replicates

Also, you might want to apply automated outlier detection on a per-group or per-dataset basis before running any of these.

Thanks for adding functionality that does this in a general and correct way -- it will be really useful!
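
The strategies suggested in this comment could look roughly like the sketch below, applied to one replicate group at a time. The function names and data layout (a dict of per-sample OTU counts, a nested dict of pairwise distances) are assumptions for illustration, not an existing QIIME API. "Centroid" is interpreted here as the medoid, i.e., the replicate with the smallest total distance to the other replicates in its group; that is one reading of the suggestion, not necessarily the intended one.

```python
from statistics import mean


def _otus(table, ids):
    """All OTU IDs observed in any of the given samples."""
    return sorted({otu for s in ids for otu in table[s]})


def collapse_sum(table, ids):
    """Sum counts across the samples in a replicate group."""
    return {otu: sum(table[s].get(otu, 0) for s in ids)
            for otu in _otus(table, ids)}


def collapse_mean(table, ids):
    """Mean count across the samples in a replicate group."""
    return {otu: mean([table[s].get(otu, 0) for s in ids])
            for otu in _otus(table, ids)}


def pick_most_reads(table, ids):
    """Return the ID of the sample with the largest total count."""
    return max(sorted(ids), key=lambda s: sum(table[s].values()))


def pick_medoid(distances, ids):
    """Return the ID of the sample closest, in total, to the other replicates."""
    return min(sorted(ids),
               key=lambda s: sum(distances[s][t] for t in ids if t != s))


# Hypothetical toy data for a single replicate group.
table = {
    "S1.r1": {"OTU1": 10, "OTU2": 0},
    "S1.r2": {"OTU1": 20, "OTU2": 4},
    "S1.r3": {"OTU1": 30, "OTU2": 2},
}
distances = {
    "S1.r1": {"S1.r2": 0.2, "S1.r3": 0.6},
    "S1.r2": {"S1.r1": 0.2, "S1.r3": 0.3},
    "S1.r3": {"S1.r1": 0.6, "S1.r2": 0.3},
}
ids = ["S1.r1", "S1.r2", "S1.r3"]

summed = collapse_sum(table, ids)
averaged = collapse_mean(table, ids)
deepest = pick_most_reads(table, ids)
medoid = pick_medoid(distances, ids)
```

The count-based functions work on an OTU table, while the medoid needs a distance matrix, which matches the two stages (OTU table or distance matrix) named in the issue description.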


@gregcaporaso
Contributor Author

Some initial experiments with this here.

@jairideout jairideout added this to the QIIME 1.9.0 milestone Dec 8, 2014