Unexpected extra haplotype/ two alleles in one haplotype assembly #130
Comments
It might be an issue with hifiasm rather than yak; I have recently found similar problems on some datasets. May I ask what the het rate of the problematic sample is? Is it a sample with a very low het rate?
Could you please show how many "HG:A:a", "HG:A:p" and "HG:A:m" labels there are on the green problematic unitig? There is an issue where, if the informative parental k-mers are too sparse, hifiasm may not remove it correctly. We have fixed it but haven't made a new release yet.
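The per-unitig label tally asked for here can be sketched with awk. This is a minimal sketch, not part of hifiasm: the function name `count_hg_tags` is hypothetical, and it assumes tab-separated "A" lines with the unitig name in column 2 and an `HG:A:` tag somewhere after it, as in the lines grepped from the r_utg.gfa in this thread.

```shell
# Tally read labels (the letter after "HG:A:") per unitig.
# Assumption: GFA "A" lines are tab-separated with the unitig name in column 2.
count_hg_tags() {
    awk -F'\t' '
        $1 == "A" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^HG:A:/) n[$2 "\t" substr($i, 6)]++
        }
        END { for (k in n) print k "\t" n[k] }
    ' "$1" | sort
}

# Usage (the file name is a placeholder):
#   count_hg_tags asm.r_utg.gfa | grep utg000017l
```

Each output line is unitig, label, count, so a unitig dominated by one parental label and a largely ambiguous one are easy to tell apart.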
Here is the summary:
Here are the lines I grepped from the r_utg.gfa:
A utg000017l 0 - m64157_210116_203159/141623423/ccs 0 12923 id:i:1229444 HG:A:m
The corresponding problem unitig (red): HG:A:p = 134
A utg000217l 0 - m64157_210116_203159/71239946/ccs 0 11772 id:i:627806 HG:A:a
For contrast, here are the results from the unitig graph that worked well:
Hap1 unitig
Hap2 unitig
Thank you for pointing out this feature of the hifiasm output; it seems pretty useful. The successful trio has a clear partitioning signal in the bubbles. However, the problem trio has one unitig (red) that is largely ambiguous and another unitig (green) that seems largely maternal and also has some ambiguity at the terminal end.
I see. It is caused by misjoins at the unitig level. How about directly breaking these contigs and removing the subregions with a large number of unexpected labels?
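One way to locate such subregions before breaking anything is to bin the read labels along a unitig. The following is only a sketch under assumptions: `window_hg_tags` is a hypothetical helper, column 3 of each "A" line is taken to be the read's start position on the unitig, and the default 100 kb window is arbitrary. It reports counts and does not break or edit any contig.

```shell
# Per-window label counts along one unitig.
# Args: GFA file, unitig name, window size in bp (default 100000, an arbitrary choice).
# Output columns: window start, count of p, count of m, count of a.
window_hg_tags() {
    awk -F'\t' -v utg="$2" -v win="${3:-100000}" '
        $1 == "A" && $2 == utg {
            for (i = 4; i <= NF; i++)
                if ($i ~ /^HG:A:/) {
                    w = int($3 / win) * win
                    n[w, substr($i, 6)]++
                    seen[w] = 1
                }
        }
        END {
            for (w in seen)
                printf "%d\t%d\t%d\t%d\n", w, n[w, "p"], n[w, "m"], n[w, "a"]
        }
    ' "$1" | sort -n
}

# Usage (file and unitig names are placeholders):
#   window_hg_tags asm.r_utg.gfa utg000017l 50000
```

Windows where the minority parental label spikes are candidate misjoin regions to inspect by hand.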
I got ahead of myself in running hifiasm. On going back and checking the overlapping distributions of k-mers ("hap-mers"), it seems that one of the parents in the problem trio is incorrect (probably a recording error during crossing). Sorry for not checking this prior to posting the issue! Perhaps it's worth recommending a QC step in the README (I used the merqury hapmer script)? Thanks for your help.
Yeah, we need to have a detailed manual...
If you think it'd be helpful, I could write up what I did to check the trio parentage and share it here?
Of course, that would be very helpful. Thank you in advance.
Thank you so much @swomics! I'm curious how long Merqury/Meryl took for the check. Did it take longer than the assembly?
It was fairly quick for me.
Generating a meryl database for the HiFi reads (not including the parental read databases):
Running the hapmers.sh script:
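For readers who want to repeat this check, the two steps can be sketched as below. This follows the general Merqury workflow and is not necessarily what was run in this thread: `k=21`, the `child.meryl` name and the `trio_qc` wrapper are assumptions, the parental meryl databases are assumed to exist already with the same k, and `MERQURY` must point at a local Merqury checkout. By default the wrapper only prints the commands.

```shell
# Print the commands (DRY_RUN=1, the default) or execute them (DRY_RUN=0).
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Args: child HiFi reads, maternal meryl db, paternal meryl db.
# k=21 and child.meryl are placeholder choices, not values from this thread.
trio_qc() {
    # Step 1: k-mer database for the child HiFi reads.
    run meryl k=21 count output child.meryl "$1"
    # Step 2: build hap-mers from the parental databases and the child database.
    run sh "${MERQURY:-/path/to/merqury}/trio/hapmers.sh" "$2" "$3" child.meryl
}

# Usage (file names are placeholders):
#   MERQURY=~/bin/merqury DRY_RUN=0 trio_qc child.ccs.fastq.gz mat.meryl pat.meryl
```

If one of the recorded parents is wrong, the inherited hap-mer distributions should look clearly off, which is the signal described in this thread.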
I see. Thanks a lot! I will integrate your great suggestion into our manual.
We have integrated your great suggestion here: https://hifiasm.readthedocs.io/en/latest/faq.html#why-the-hamming-error-rate-or-the-swith-error-rate-of-trio-binning-assembly-is-very-high. Thank you so much!
Hi,
Hifiasm does a wonderful job of partitioning our focal locus into two clean haplotypes in most of our trio sets. However, there's some unexpected behaviour in one trio set: The hap1 assembly contains a single clean haplotype, but the hap2 assembly contains two clean haplotypes and one appears to perfectly match the hap1 result, giving three haplotypes overall (the haplotype 2 assembly total size also comes out slightly larger than expected). I figure that since one haplotype is unique to the hap1 assembly but also occurs in the hap2 assembly, alongside another haplotype, I can just remove the shared haplotype from hap2, but is this a mistake?
I gather from the paper that these events should be very rare and are likely caused by some form of ambiguity in the read partitioning. I had a look at various parameters, but nothing seems like an obvious fix. I suppose I could try increasing the yak k-mer length; what do you think?
Here are the commands for clarity:
screen -L ~/bin/yak/yak count -k31 -b37 -t10 -o 406PFemale.yak ../../Illumina_trio/Trimmed/Sample_13-406PFemale/406PFemale_merged.fastq
screen -L ~/bin/yak/yak count -k31 -b37 -t10 -o 406PMale.yak ../../Illumina_trio/Trimmed/Sample_14-406PMale/406PMale_merged.fastq
screen -L hifiasm -o 406-F1-03-femG.asm -t 30 -1 406PFemale.yak -2 406PMale.yak ../../HiFi_all/HiFi/22804_8_8-406-F1-03-femG_m64157_210116_203159.ccs.fastq.gz
I tested again with the most recent release and the behaviour is the same.
Thanks,
Sam