Error with blobtools_create step #213
Comments
Hi, it does look like a similar issue. That one should have been fixed in this commit, which parsed the sequence IDs differently. Could you share the FASTA header for the contig that is in the BUSCO results so I can take a look at what needs to be done to get them to match this time? |
Hi,
They are contigs output from a hifiasm assembly. In the BUSCO metazoa and eukaryota results the contig IDs are the same as in the FASTA file:
270107at2759 Complete h1tg000039l_path 1923850 1928544 + 617.0 599
For the archaea and bacteria lineages the "_path" has been removed:
Hope that answers your question, let me know if you need more info. Cheers |
Hi, |
Looking at this more closely, it is not straightforward to fix this in the BlobToolKit import code. The previous issue was caused by extra characters being added to the sequence IDs, but here characters are removed, so it is not as easy to put them back. I think the best approach is a manual workaround: either change the sequence IDs in the assembly or edit the BUSCO file before import. To get the pipeline to run, you could also drop the bacteria and archaea lineages from the set of BUSCOs to run, as these are usually included for use when exploring the results rather than being required by the pipeline. |
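For the "edit the BUSCO file before import" route, a minimal sketch could look like the following. It assumes the only difference is a dropped `_path` suffix and that the sequence ID sits in the third tab-separated column of `full_table.tsv`; the function name `restore_sequence_ids` is hypothetical and not part of BlobToolKit or BUSCO.

```python
def restore_sequence_ids(table_lines, assembly_ids, suffix="_path"):
    """Return BUSCO full-table lines with truncated sequence IDs restored.

    Assumption: BUSCO dropped `suffix` from the end of some contig IDs and
    changed nothing else. IDs that do not match are left untouched.
    """
    # Map each truncated ID back to the full ID used in the assembly.
    restored = {}
    for full_id in assembly_ids:
        if suffix and full_id.endswith(suffix):
            restored[full_id[: -len(suffix)]] = full_id

    fixed = []
    for line in table_lines:
        fields = line.rstrip("\n").split("\t")
        # Comment lines and "Missing" rows have no sequence column to fix.
        if not line.startswith("#") and len(fields) > 2:
            fields[2] = restored.get(fields[2], fields[2])
        fixed.append("\t".join(fields))
    return fixed
```

The assembly IDs would come from the FASTA headers, and the table would need to be decompressed, rewritten and recompressed before rerunning the import step. If two assembly IDs collapse to the same truncated ID, this mapping is ambiguous and should not be used as-is.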
Thanks for this. I'll just skip the bacteria and archaea lineages for now. |
Hi,
|
I think you may just need to remove bacteria and archaea from the |
Yes, it worked, but I ran into another issue at the run_blobtools_create step:
Reading all TSV files in ../window_stats
I have redownloaded the newtaxdump to make sure that this was not the issue.
-rw-r--r-- 1 adejode bmtitus 4,9K 14 mai 08:31 gencode.dmp
It looks like a parsing issue with one of the files, but I could not figure out which one. Any idea? Here is my config file:
|
Hello, does anyone have an idea about this last issue?
seq_id, start, end = re.split(r"[:-]", parts[0]) |
This step parses the BUSCO locations from the diamond blast results file so it expects the sequence IDs to look like
From the error, I guess you may have |
My diamond output looks like this: My diamond_blastp results look like this: and my blastn results look like this: So I guess the issue comes from the diamond_blastp output file. Is there a way to avoid this issue without having to manually edit the files? |
I think I can update the code to split only on the suffix that the pipeline adds so seq IDs with these characters can still be imported - should be able to get a release out with this fixed by the end of next week. |
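As an illustration of splitting only on a trailing coordinate suffix, here is a rough sketch. It assumes the suffix has the form `:<start>-<end>`, which is inferred from the `re.split(r"[:-]", parts[0])` line quoted above and not confirmed in this thread; `split_location` is a hypothetical helper, not the actual BlobToolKit fix.

```python
import re

# Assumed format: "<seq_id>:<start>-<end>". The pattern is anchored at the
# end of the string, so ":" or "-" inside the sequence ID is left alone.
LOCATION = re.compile(r"^(?P<seq_id>.+):(?P<start>\d+)-(?P<end>\d+)$")


def split_location(value):
    """Split a location string into (seq_id, start, end)."""
    match = LOCATION.match(value)
    if match is None:
        # Fail with a message that names the offending value.
        raise ValueError(f"unexpected location string: {value!r}")
    return match["seq_id"], int(match["start"]), int(match["end"])
```

With this approach an ID such as `ctg-1` keeps its hyphen, whereas splitting on every `:` or `-` yields four parts and the three-name unpacking fails. A string with no coordinate suffix raises an error that includes the value, which makes it easier to see which input file is responsible.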
Looks like I need to trace this back a bit further, could you share the first few lines from the eukaryota busco |
Hello,
I have encountered the following error:
more SHADDVT3_asm_bp_hap1_p_ctg/run_blobtools_create.log
Reading all TSV files in ../window_stats
Traceback (most recent call last):
File "/share/apps/miniconda3/py311_23.5.2-0/installed/envs/snakemake7/bin/blobtools", line 8, in <module>
sys.exit(cli())
File "/share/apps/miniconda3/py311_23.5.2-0/installed/envs/snakemake7/lib/python3.9/site-packages/blobtools/blobtools.py", line 105, in cli
sys.exit(subcommand())
File "/share/apps/miniconda3/py311_23.5.2-0/installed/envs/snakemake7/lib/python3.9/site-packages/blobtools/lib/add.py", line 203, in cli
main(args)
File "/share/apps/miniconda3/py311_23.5.2-0/installed/envs/snakemake7/lib/python3.9/site-packages/blobtools/lib/add.py", line 149, in main
parsed = field["module"].parse(
File "/share/apps/miniconda3/py311_23.5.2-0/installed/envs/snakemake7/lib/python3.9/site-packages/blobtools/lib/busco.py", line 82, in parse
busco = parse_busco(file, identifiers=kwargs["dependencies"]["identifiers"])
File "/share/apps/miniconda3/py311_23.5.2-0/installed/envs/snakemake7/lib/python3.9/site-packages/blobtools/lib/busco.py", line 63, in parse_busco
raise UserWarning(
UserWarning: Contig names in the Busco file did not match dataset identifiers.
I saw people having a similar issue (e.g. #2) in the past, but I did not find a fix for it.
In my case everything seems to work fine for the BUSCO metazoa and eukaryota lineages, but for archaea and bacteria BUSCO removed "_path" from the contig IDs, and I think that is what is causing the error.
zcat SHADDVT3_asm_bp_hap1_p_ctg.busco.bacteria_odb10/full_table.tsv.gz | head
BUSCO version is: 5.6.1
The lineage dataset is: bacteria_odb10 (Creation date: 2020-03-06, number of genomes: 4085, number of BUSCOs: 124)
Busco id Status Sequence Gene Start Gene End Strand Score Length OrthoDB url Description
4421at2 Missing
9601at2 Missing
26038at2 Fragmented h1tg000560l 68481 70196 + 151.4 265 https://www.orthodb.org/v10?query=26038at2 phosphoribosylformylglycinamidine synthase
91428at2 Missing
95696at2 Missing
143460at2 Missing
182107at2 Missing
Is there a fix for this, or is it something that has to be done manually?
Thanks for your help