Cannot understand the relative abundance of bins #67
Comments
The values reported by the Quant_bins module are essentially estimated average read coverage values for each bin in each sample, standardized to the number of reads in each sample, so they do not have to add up to any particular number. It looks like your bins have relatively low coverage in each sample, but you were able to recover them because you co-assembled all samples together. Usually you can start to recover decent bins above roughly 6X coverage. We still call these values relative abundance because we do not know the true abundance: the total biomass can also change between samples.
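To make the point about totals concrete, here is a minimal sketch of what "average coverage standardized to the number of reads" could look like. This is not metaWRAP's actual code: the function name `normalized_abundance`, the per-million scaling factor as written here, and all numbers are assumptions chosen for illustration.

```python
# Hypothetical illustration: scale each bin's mean coverage to a library
# of one million reads. Names and numbers are made up for this sketch.

def normalized_abundance(mean_coverage, library_reads):
    """Return mean coverage rescaled to a library of one million reads."""
    if library_reads <= 0:
        raise ValueError("library_reads must be positive")
    return mean_coverage * 1_000_000 / library_reads

# Made-up inputs: total reads per sample and mean coverage per bin.
samples = {
    "sample1": {"reads": 20_000_000, "coverage": {"bin.1": 8.0, "bin.2": 2.0}},
    "sample2": {"reads": 5_000_000, "coverage": {"bin.1": 2.0, "bin.2": 0.5}},
}

table = {
    name: {
        bin_id: normalized_abundance(cov, info["reads"])
        for bin_id, cov in info["coverage"].items()
    }
    for name, info in samples.items()
}

for name, bins in table.items():
    # The per-sample total is whatever the recovered bins add up to;
    # nothing forces it to equal 1, 100, or the total of another sample.
    print(name, bins, "total =", sum(bins.values()))
```

In this toy case the raw coverage differs fourfold between the two samples, but so does the library size, so the standardized values match. The per-sample totals are simply sums of coverage-derived numbers, not percentages.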
Can I compare the relative abundance of bins that metaWRAP generated in separate runs? In detail, I ran the metaWRAP pipeline on two metagenome samples separately. Can I compare the relative abundances of these bins with each other?
Yes, the counts are normalized to library size, so you should be able to.
If I want to compare the treatment (samples 1, 2, 3) and the control (samples 4, 5, 6) using abundance_table.tab, do I need to normalize anything?
Good question. No, you do not need to modify the values: they are already standardized to contig counts per million reads.
If I want to find bins whose abundance differs significantly between the treatment and the control, which statistical test should I use? I have tried treating bins as genes and running a differential expression analysis between treatment and control, but edgeR and DESeq2 expect a read count matrix as input.
Dear developer:
This is a good but tricky question. To put it simply, the total abundance of the bins absolutely does NOT have to be the same in each sample. This is very different from something like RNAseq gene expression values, because we cannot reliably reconstruct all the bins from all the samples. Because assembly and binning biases vary between samples, the total of bin (and contig) abundances can be different.
To explain why, let's consider a simple example. Let's say you are comparing two microbiomes that have a total of 10 species living in them, but the distribution of their abundances is different. You perform binning and are able to assemble and extract 5 of the species as MAGs (bins). However, it is entirely possible that these MAGs are the dominant species in sample 1 but are less abundant in sample 2. Remember that good coverage is only one factor in how easy it is to extract a bin: maybe the abundant species in sample 2 have high GC, similar k-mer content, or higher strain heterogeneity. When you quantify your bins, you will find that the total abundance of the bins is much greater in sample 1 than in sample 2, yet those abundances are very real observations.
If you standardize to the total abundance of the MAGs instead of the library size, you can lose a lot of information. This principle also applies to contig quantitation, since some samples assemble more easily than others. It is also important to note that co-assembly does NOT resolve this issue.
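As a rough illustration of the point above, the sketch below takes made-up, library-size-standardized abundances for 5 recovered MAGs in two samples and rescales them to the MAG total. Every value and name (`abundance`, `renormalize_to_mag_total`) is hypothetical and not taken from metaWRAP output.

```python
# Hypothetical library-size-standardized abundances for the 5 recovered MAGs.
# The other 5 species in each community were never binned, so they are absent.
abundance = {
    "sample1": {"MAG1": 30.0, "MAG2": 20.0, "MAG3": 15.0, "MAG4": 10.0, "MAG5": 5.0},
    "sample2": {"MAG1": 30.0, "MAG2": 2.0, "MAG3": 1.5, "MAG4": 1.0, "MAG5": 0.5},
}

def renormalize_to_mag_total(sample):
    """Rescale one sample so that its recovered MAGs sum to 1."""
    total = sum(sample.values())
    return {mag: value / total for mag, value in sample.items()}

totals = {name: sum(sample.values()) for name, sample in abundance.items()}
fractions = {name: renormalize_to_mag_total(s) for name, s in abundance.items()}

for name in abundance:
    print(
        name,
        "MAG total =", totals[name],
        "| MAG1 standardized =", abundance[name]["MAG1"],
        "| MAG1 as fraction of MAG total =", round(fractions[name]["MAG1"], 3),
    )
```

With these numbers, MAG1 has the same library-standardized abundance in both samples, but after rescaling to the MAG total it appears to more than double in sample 2, purely because the other recovered MAGs are rarer there. The difference in MAG totals between the samples is itself an observation, and the rescaling discards it.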
For me, the relative abundance is the proportion of a bin among all bacteria in a sample, so the value should be less than 100%. But in my result, abundance_table.tab, some values are more than 100%. The result is below:
Best,
Chunxu