General questions: Estimated richness lower than observed richness #190

Open
naurasd opened this issue May 16, 2023 · 3 comments

Comments

naurasd commented May 16, 2023

Hi,

I have been running breakaway a few times now on several data sets and there is a general observation I have made which I would like to understand. Breakaway/betta always gives me OTU richness estimates for a group of samples which are way lower than the observed OTU richness in these samples.

I am running the following (with ps16S being my phyloseq object) to estimate richness for the factor "Reef" (I am basically following this part from your website https://adw96.github.io/breakaway/articles/diversity-hypothesis-testing.html).

set.seed(1)

richness16S<-breakaway(ps16S)

meta16S <- ps16S %>% sample_data %>% as_tibble %>% mutate("sample_names" = ps16S %>% sample_names )

estimates16S <- meta16S %>% left_join(summary(richness16S), by = "sample_names")

betta16S <- betta(formula = estimate ~ Reef -1, ses = error, data = estimates16S)

For example, for one of the reefs, which is a data set of 3 samples, the observed OTU richness is 20,698 OTUs. I do have singletons in there (n = 515). Breakaway/betta estimates that the overall OTU richness for these 3 samples is roughly 11,000 OTUs, with a standard error of around 1,200, so about half of the observed richness.

So as I understand it, betta is not actually estimating richness for this entire Reef community, but the average richness, i.e., the richness I should expect to be present in any given sample from the respective location. Is this correct? And if so, how much sense does this make? I have clearly observed around 20,000 OTUs in my data set, so the overall community is obviously richer than any given sample, which is why we take multiple samples from one location in the first place. Is there a way for betta to give estimates not for a hypothetical sample community from this location, but for the location itself, taking all samples into account?

Thanks

Nauras

Owner

adw96 commented May 17, 2023

Hi @naurasd -- thanks for your question. Indeed, it looks to me that, given the way you've set it up, betta will be estimating the expected (i.e., average) richness at a given level of Reef. Given what you're describing, I expect that you have some sites with a large number of OTUs, but others with a lower number, and an overall average of the estimates at around 11k.

I think that what you're looking for is to have a "denoised" estimate of the richness at a given site, where the denoising comes from learning from similar samples. This is actually the purpose of the objects called BLUPs, which should be returned as part of the betta16S object. If you're struggling to find them, @svteichman can probably help point you to them. I think they also come with standard errors (maybe $blupses)?
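As a rough sketch of where these objects live, the snippet below fits betta to a small made-up table of richness estimates and then pulls out the per-sample BLUPs and their standard errors. The data frame `est` and its values are hypothetical, and the element names `blups` and `blupses` are assumed to be the ones the reply above refers to; check `names(fit)` on your own fit.

```r
library(breakaway)

# Hypothetical per-sample richness estimates and standard errors
# (made-up values, not taken from the data set in this thread).
est <- data.frame(
  estimate = c(11800, 11000, 9200, 5200, 4800, 5500),
  error    = c(900, 850, 700, 400, 380, 420),
  Reef     = factor(rep(c("A", "B"), each = 3))
)

# Same call pattern as in the original post: one fixed effect per Reef level.
fit <- betta(formula = estimate ~ Reef - 1, ses = error, data = est)

fit$table    # fixed effects: average richness for each Reef level
fit$blups    # per-sample richness, shrunk toward the Reef-level average
fit$blupses  # standard errors for the BLUPs (assumed element name)
```

Note that `fit$blups` has one value per sample, not one per Reef level, so it does not by itself give a pooled richness for a location.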

I think I understand what you're looking for, and I'm glad that it is implemented already! Let us know if you have further questions, otherwise, please go ahead and close.

Author

naurasd commented May 18, 2023

Hi @adw96,

I appreciate the swift response.

The thing is not really that I have a large number of OTUs in some samples and a low number in others. In my 3 samples, I have ~11,800, ~11,000 and ~9,200 OTUs, respectively. The overall observed richness in these 3 samples is ~20,000 OTUs. So the 3 samples of this particular habitat (i.e., Reef) have a large number of exclusive OTUs, which are restricted to one of the 3 samples.

So this is what I am concerned about. I have a data set of 3 samples with an observed richness of ~20,000 OTUs. Breakaway/betta is estimating that I can expect a richness of ~11,000 OTUs in this particular reef. This doesn't really add up for me, because I clearly have a much higher richness in this reef, as my data shows. I have (data) evidence that ~20,000 distinct OTUs exist in this habitat (obviously, some of these are still most likely spurious, but this will be the case for all other samples of the other reefs as well). I am not interested in an estimate of the richness I would expect from any one sample taken from this reef, but in what I can expect to find overall if I sample this habitat sufficiently.
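To make the distinction concrete, here is a minimal base-R sketch with a made-up presence/absence table (six OTUs, three samples; none of these values come from the real data). It shows how the average per-sample richness can be well below the richness of the pooled samples when many OTUs are exclusive to one sample.

```r
# Toy presence/absence table: rows are OTUs, columns are three samples from one reef.
# The values are invented for illustration only.
otu <- matrix(
  c(1, 1, 0,
    1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    1, 1, 1,
    0, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:6), c("s1", "s2", "s3"))
)

# Observed richness of each sample: number of OTUs present in that sample.
per_sample <- colSums(otu > 0)

# Observed richness of the pooled samples: OTUs present in at least one sample.
pooled <- sum(rowSums(otu) > 0)

print(per_sample)        # 3 OTUs in each of s1, s2, s3
print(mean(per_sample))  # 3
print(pooled)            # 6
```

In this toy case each sample holds 3 OTUs, so the average per-sample richness is 3, while the pooled richness is 6. That mirrors the pattern above, where per-sample values of roughly 9,000 to 12,000 sit alongside a pooled observed richness of about 20,000.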

Thanks for pointing out the blups object. Going through the official breakaway manual, I am still having issues understanding what these represent. Also, the condfits mentioned there do not appear in any other part of the manual, so I am not sure what these represent. Either way, I am not sure how these would alleviate the issue, as the blups again just return values for every single sample, but not for each fixed factor level (Reef in this case). Could you elaborate a bit more on this part?

Thanks

Nauras

Author

naurasd commented Jul 17, 2023

Hi,

any update on this?

Cheers

Nauras
