Cannot understand the relative abundance of bins #67

Open
xuechunxu opened this issue Nov 6, 2018 · 8 comments

Comments

@xuechunxu

xuechunxu commented Nov 6, 2018

As I understand it, the relative abundance of a bin is its proportion among all bacteria in a sample, so each value should be less than 100%. But in my abundance_table.tab, some values are greater than 100%. The result is below:

Genomic bins P105 P104 P0 P100 P40
bin.89 1.01642417377 0.740592386455 0.0321206258667 0.218846480065 0.253145659366
bin.20 0.963242581623 0.524569817933 0.483642636588 3.27032881539 0.373453067762
bin.4 0.134127586944 0.169022620222 0.233740733919 0.185851477488 5.68130711026
bin.25 2.35837130015 2.89895004794 1.65217234455 0.687260743825 0.154927149251
bin.22 7.55454483699 4.18372928847 3.02498742027 2.40211272832 1.62468810039
bin.6 1.00329154299 0.491644116008 0.395472069503 0.842832324297 2.20394439833
bin.59 0.440208718227 2.22139093047 1.82780827394 0.726687098784 11.0912418403
bin.68 2.92792344489 0.702841057218 0.451971716723 0.624863317806 4.05763308655
bin.85 1.62597944618 2.2720013755 0.247840525204 0.993474047324 0.0243826926672
bin.70 5.2318808775 7.61637153762 0.543062410804 1.13768856145 2.80215597327
bin.46 0.00434900895308 0.00251981757132 0.0433195791454 0.0490018217851 2.37195654128
bin.54 2.15793628359 0.473770927286 0.141830643966 0.10916727931 0.0202958895629
bin.95 26.8128199057 0.11091891259 0.0475952155267 0.166989815169 3.07631781802
bin.43 2.34696728197 0.0538148163354 0.0846571529608 0.0636025753186 13.1897450547
bin.34 0.0394271019168 0.231815821177 0.3115832827 0.204541678386 6.47588873649
bin.37 0.239585802334 1.12369237474 5.84147018732 0.292707613385 0.668295339555
bin.32 0.213289445496 4.94568532922 0.166315541272 0.0874991451354 0.371490425819
bin.81 0.79353979682 0.761626381336 0.547699065344 0.335467401803 0.890959210605
bin.83 0.0192213055191 0.00775236104698 0.0505738079269 0.0052712200005 7.93839249807
bin.62 0.0424499490408 0.0634018522757 0.0786795553277 0.0717357956635 4.51239895531
bin.8 0.0869582593618 0.295646028737 0.769898819825 0.12559488423 2.30691559464
bin.53 0.715306856866 1.74695061708 5.03639940699 0.41971298049 1.28252356548
bin.96 0.119105678688 0.0411800233566 0.0564904098266 0.160159146079 3.83014186834
bin.12 2.29643698679 1.41841460702 0.325746720601 0.991381282801 0.223452596518
bin.1 0.327097216569 0.0158335809757 0.0584811656129 0.0713339403354 2.67288294734
bin.45 0.06702998048 1.5429963239 0.678975239549 0.172776208385 0.518324908554
bin.26 2.87715456175 1.07899589676 0.106952222796 0.225889449395 0.0690628253146
bin.19 0.00893653594517 0.00994879296056 6.58003044981 0.278355257796 0.024599722229
bin.84 15.1863525634 3.51183513889 1.22314998064 1.3919416632 11.958378571
bin.74 5.79563198262 3.92590782446 0.121449190387 0.212161920838 0.112280980073
bin.10 1.33979029911 1.693317173 0.0287509883091 0.351395268913 0.0231678316515
bin.82 6.40117578992 0.141331773275 0.0566487586618 0.235285197946 0.14302365095
bin.90 0.0902687034809 0.307398544863 0.261845094056 0.177149160616 4.34099630008
bin.63 0.11310324361 0.101029956664 0.159343612054 0.128280691797 10.2178831515
bin.42 0.00653093817518 0.00983527563305 0.026179555544 0.013028917488 3.09070106035
bin.80 6.45666777432 14.7921396399 11.8993578035 3.66589778684 1.12715280751
bin.3 0.0049708880577 0.00559133274411 0.0494536421958 0.0166739060868 4.36392320949
bin.38 0.0257131242018 0.0152405097624 3.34384864574 1.54611847187 0.0532976503274
bin.33 0.673987463114 1.61233136649 2.84331565472 0.682605621736 0.391666052436
bin.56 35.60807561 12.8934552667 38.9805656918 9.40451390012 16.9408968891
bin.14 0.297124350666 1.04138816517 0.113937303353 0.808230935446 0.248284305638
bin.13 0.0499529925063 0.465087372457 0.076471352044 0.186845649632 4.33277348287
bin.44 0.042423403563 0.014274994087 0.0291381875584 0.0154896500547 3.03221058958
bin.97 0.29426761927 0.0677073478683 0.0929824624904 0.0973960097809 7.62364944145
bin.41 0.0182001643762 0.021300697229 3.36993295997 1.82775025284 0.022435676523
bin.48 0.588705418481 0.706193361628 0.341519201752 0.629302752059 1.05833126294
bin.93 1.43273042491 0.0269065411081 0.129550375956 0.0595524178275 9.11254599919
bin.29 18.3142078224 0.580586821309 0.0839324959808 0.346647029071 0.132035123932
bin.78 0.0186137404444 0.0191062623792 7.71450174938 1.75011356579 0.0377024976526
bin.64 0.118352928527 0.138694745749 0.210272697567 0.201122757308 11.0657209681
bin.61 0.0405185632092 0.047630179385 0.130408476278 0.11697595577 4.32095122353
bin.66 1.546107928 0.532151151538 0.0422920757326 0.272057145106 0.0553509601092
bin.73 1.1464179313 1.53757562572 1.32189373925 0.759133994014 8.57810376372
bin.86 0.226681445244 0.0303478306252 0.153667372675 0.0263745755797 23.5554626173
bin.76 0.835470257529 1.34829396228 0.0343602313778 0.415837490579 0.0830056495306
bin.9 0.930025244298 1.60073661694 1.83089369695 1.02491182717 1.57670557214
bin.52 1.45843209291 0.839945921069 0.0203108389339 1.35811949962 0.0077563872481
bin.36 4.33492183014 0.0276727476348 0.061967385132 0.038509521961 9.33421405534
bin.27 0.00690879783419 0.181196379326 0.119525670044 0.140213619614 7.5077652911
bin.15 0.0123258327893 0.0121638985535 9.05319458272 0.960502153141 0.0258278539853
bin.17 7.14569312044 0.351198048496 1.57855869638 0.458860363636 1.05451812283
bin.49 8.42151582256 18.3686033291 8.74809845445 1.90777907346 15.8326168898
bin.40 0.0866032016368 0.309009942943 0.430562206112 0.224549447448 4.97495348107
bin.79 0.0231126153121 0.0201067194238 0.0777286978042 0.074850196877 8.53451037626
bin.18 0.751824358254 0.380376054798 0.236433254304 4.50044372315 0.361523595224
bin.28 0.366693979718 0.142403129479 1.07557922833 0.696432924211 2.28237567337
bin.35 8.00943012833 21.1034712651 129.866328282 7.69542229 9.11670482947
bin.30 2.83352008783 1.20693517545 0.149082861059 1.38443805342 0.0805387837442
bin.60 1.14531038283 1.20794850801 0.083832251763 3.39103018959 0.0213217955728
bin.88 0.0319876425246 0.0291397825448 15.3257075868 3.34407529898 0.0551228864745
bin.16 3.18774623194 4.71949277875 0.737464148634 1.15373414382 0.0671992522958
bin.77 0.878450731584 1.79194446385 2.07247497783 1.00101501984 1.10978200919
bin.39 0.193204029521 0.226118277867 0.430512349898 0.531370317985 28.8863706811
bin.24 0.248014423877 0.752853541556 1.79286940773 0.443897744311 0.182517858756
bin.98 0.402057204638 1.27728846893 1.16147517524 0.572184174175 3.11889345797
bin.55 0.0197531097726 0.0142118466322 0.0675537612963 0.0539315681948 2.89771762915
bin.50 0.362928647142 0.442737875081 9.3083412028 0.21401073177 10.7966196211
bin.71 3.30104754295 1.92827400909 5.55421993857 0.694084271924 0.322543747661
bin.92 26.9017791763 12.6573946342 3.92805386103 4.94337967301 2.23815425471
bin.87 0.00461601740262 0.00592792620342 0.0221535813219 0.0121411026699 2.89975144694
bin.67 0.139305245104 1.09156814825 0.772901549856 1.00011569058 8.50798826771
bin.2 1.10218327782 5.59264424616 1.63893447797 1.38711861424 1.79960792692
bin.5 0.524240956392 2.2345284087 0.336979486036 0.601706315704 13.634334936
bin.11 1.41434223224 0.0100647843393 0.0347602264229 0.0245605879295 1.40875493442
bin.31 0.012616140164 0.00689259187842 5.10259591338 2.11776753123 0.0157340292319
bin.7 6.23267858628 0.0973739968354 0.271747207493 0.262648075738 0.107933837851
bin.58 1.08923180417 1.64160334864 1.28465508884 0.505304701362 3.43022078584
bin.72 0.0311571771495 0.014775502679 17.5903456571 3.87829412039 0.0427469476213
bin.69 0.873629810992 0.521707324392 0.6353333976 1.52784345414 0.237895983573
bin.21 0.0239148147374 0.0607423563455 0.0394364749389 0.0815155016936 3.67779637813
bin.65 0.0582818148887 0.572030459278 0.320069390498 0.20427979297 2.5151478608
bin.91 0.812610065395 2.20493885408 2.39596002892 0.823989995718 0.851853755026
bin.51 2.93998372985 4.05657812749 7.97401849857 1.45435770088 1.12746157903
bin.94 1.12813063682 0.768264162576 0.0317690429722 0.243753152716 0.360976183297
bin.75 0.00737057950409 0.00526097436315 4.02787909498 0.857672905799 0.018916816811
bin.47 1.6565328428 1.10585489702 0.145630815955 0.261496100592 0.20735476094
bin.57 2.06411449301 0.0711262988851 0.0836426656942 0.1207483655 0.633055163148

Best
Chunxu

@ursky
Collaborator

ursky commented Nov 6, 2018

The values reported by the Quant_bins module are essentially estimated average read coverage values for each bin in each sample, standardized to the number of reads in each sample. So they do not have to add up to any particular number. It looks like your bins have relatively low coverage in each sample, but you were able to recover them because you co-assembled all samples together. Usually you can start to recover decent bins at roughly 6X coverage or more. We still call these values relative abundance because we cannot know the true abundance: the biomass can also change between samples.
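
To see concretely that the columns are not percentages, here is a minimal sketch that sums each sample's column. It uses three rows copied from the table above; the tab delimiter is an assumption based on the `.tab` extension, and `column_totals` is a hypothetical helper, not part of metaWRAP.

```python
import csv
import io

# Three rows copied from the table posted in this issue.
# ASSUMPTION: abundance_table.tab is tab-delimited (note the space in "Genomic bins").
EXCERPT = (
    "Genomic bins\tP105\tP104\tP0\tP100\tP40\n"
    "bin.89\t1.01642417377\t0.740592386455\t0.0321206258667\t0.218846480065\t0.253145659366\n"
    "bin.56\t35.60807561\t12.8934552667\t38.9805656918\t9.40451390012\t16.9408968891\n"
    "bin.35\t8.00943012833\t21.1034712651\t129.866328282\t7.69542229\t9.11670482947\n"
)


def column_totals(table_text):
    """Return {sample: sum of bin abundances} for a tab-delimited abundance table."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column holds the bin names
    totals = {sample: 0.0 for sample in samples}
    for row in reader:
        if not row:  # skip blank lines
            continue
        for sample, value in zip(samples, row[1:]):
            totals[sample] += float(value)
    return totals


totals = column_totals(EXCERPT)
for sample, total in totals.items():
    print(f"{sample}\t{total:.2f}")
```

Even with only these three bins, the P0 column already exceeds 100, which is consistent with the values being standardized coverage rather than proportions. To run it on a real file, pass `open("abundance_table.tab").read()` to `column_totals`.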

@xuechunxu
Author

Can I compare the relative abundances of bins that metaWRAP generated separately? Specifically, I ran the metaWRAP pipeline on two metagenome samples separately. Can I compare the relative abundances of those bins with each other?

@ursky
Collaborator

ursky commented Nov 7, 2018

Yes, the counts are normalized to library size, so you should be able to.

@tianchen2019

Yes, the counts are normalized to library size, so you should be able to.

If I want to compare the treatment (samples 1, 2, 3) and the control (samples 4, 5, 6) using abundance_table.tab, do I need to normalize anything?

@ursky
Collaborator

ursky commented Mar 26, 2019

Good question. No, you do not need to modify the values - they are already standardized to contig counts per million reads.
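
As a rough illustration of what standardizing to library size buys you, the sketch below scales a bin's average coverage to a library of one million reads. This is not metaWRAP's actual code; the function `coverage_per_million_reads` and all the numbers are hypothetical, chosen only to show that sequencing depth cancels out.

```python
def coverage_per_million_reads(mapped_bases, genome_length, library_reads):
    """Average coverage of a bin, scaled to a library of one million reads.

    ASSUMPTION: a simplified stand-in for the standardization described
    in this thread, not the formula used by metaWRAP itself.
    """
    if genome_length <= 0 or library_reads <= 0:
        raise ValueError("genome_length and library_reads must be positive")
    mean_coverage = mapped_bases / genome_length
    return mean_coverage / (library_reads / 1_000_000)


# Hypothetical bin with the same share of the community in both samples,
# but sample B was sequenced twice as deeply as sample A.
sample_a = coverage_per_million_reads(
    mapped_bases=30_000_000, genome_length=3_000_000, library_reads=10_000_000
)
sample_b = coverage_per_million_reads(
    mapped_bases=60_000_000, genome_length=3_000_000, library_reads=20_000_000
)
print(sample_a, sample_b)
```

Raw coverage differs between the two samples (10X and 20X), but the standardized values match, which is why values standardized this way can be compared across libraries of different sizes.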

@tianchen2019

Good question. No, you do not need to modify the values - they are already standardized to contig counts per million reads.

I want to compare each bin's abundance between treatment and control and find the bins whose abundance differs significantly. Which test should I use? I tried treating bins as genes and running a differential expression analysis between treatment and control, but edgeR and DESeq2 expect a read count matrix as input.

@tianchen2019

Good question. No, you do not need to modify the values - they are already standardized to contig counts per million reads.

Dear developer:
In abundance_table.tab, I found that the sum of all bins' abundance values differs from sample to sample. I think the sum should be the same in every sample when I compare bin abundance between the treatment (samples 1, 2, 3) and the control (samples 4, 5, 6). With TPM in RNA-seq, for example, the sum of all genes' TPM values is the same in every sample.

  1. Is this because of the way split_salmon_out_into_bins.py normalizes the values?
  2. Since the sum of all bins' abundance values differs between samples, can I normalize by dividing each bin's abundance value by the sum of all bins' abundance values in the corresponding sample?

@ursky
Collaborator

ursky commented Apr 22, 2019

This is a good but tricky question. To put it simply, the total abundance of the bins absolutely does NOT have to be the same in each sample. This is very different from something like RNAseq gene expression values, because we cannot reliably reconstruct all the bins from all the samples. Because assembly and binning biases vary between samples, the total of bin (and contig) abundances can be different. To explain why, let's consider a simple example:

Let's say you are comparing two microbiomes that have a total of 10 species living in them, but the distribution of their abundances is different. You perform binning and are able to assemble and extract 5 of the species as MAGs (bins). However, it is entirely possible that these MAGs are the dominant species in sample 1 but are less abundant in sample 2 (remember that good coverage is only one factor in how easy it is to extract a bin - maybe the abundant species in sample 2 have high GC, similar k-mer content, or higher strain heterogeneity). When you quantify your bins, you will find that the total bin abundance in sample 1 is much greater than in sample 2, but those abundances are very real observations. If you standardize to the total abundance of the MAGs instead of the library size, you can lose a lot of information. This principle also applies to contig quantitation - some samples assemble more easily than others. It is also important to note that co-assembly does NOT resolve this issue.
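
The example above can be put into numbers with a small sketch. All abundances below are made up for illustration (10 hypothetical species, of which `sp1` to `sp5` were recovered as MAGs), and the values are assumed to be already standardized to library size.

```python
# Hypothetical library-size-standardized abundances for 10 species.
# Only sp1..sp5 were recovered as MAGs; sp6..sp10 were never binned.
RECOVERED = ["sp1", "sp2", "sp3", "sp4", "sp5"]

sample1 = {"sp1": 30.0, "sp2": 20.0, "sp3": 15.0, "sp4": 10.0, "sp5": 5.0,
           "sp6": 4.0, "sp7": 4.0, "sp8": 4.0, "sp9": 4.0, "sp10": 4.0}
sample2 = {"sp1": 6.0, "sp2": 4.0, "sp3": 3.0, "sp4": 2.0, "sp5": 1.0,
           "sp6": 16.8, "sp7": 16.8, "sp8": 16.8, "sp9": 16.8, "sp10": 16.8}


def mag_abundances(sample):
    """Keep only the species that were recovered as MAGs."""
    return {sp: sample[sp] for sp in RECOVERED}


def renormalize_to_mag_total(abundances):
    """Divide each MAG by the summed MAG abundance of its sample."""
    total = sum(abundances.values())
    return {sp: value / total for sp, value in abundances.items()}


mags1 = mag_abundances(sample1)
mags2 = mag_abundances(sample2)

# Totals differ because the MAGs dominate sample 1 but not sample 2.
print("MAG totals:", sum(mags1.values()), sum(mags2.values()))

# Library-size standardization keeps the real difference between samples.
fold_library = {sp: mags1[sp] / mags2[sp] for sp in RECOVERED}
print("fold change (library-size standardized):", fold_library)

# Renormalizing to the MAG total erases that difference.
frac1 = renormalize_to_mag_total(mags1)
frac2 = renormalize_to_mag_total(mags2)
fold_renormalized = {sp: frac1[sp] / frac2[sp] for sp in RECOVERED}
print("fold change (renormalized to MAG total):", fold_renormalized)
```

In this made-up case every MAG is five times more abundant in sample 1, yet after dividing by each sample's MAG total the two samples look identical. That is the information loss described above.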
