not working with compressed files #432

Closed
leomrtns opened this issue Apr 13, 2022 · 1 comment

Comments

@leomrtns

Hello! I am having trouble working with compressed files (xz in my case), which used to be detected automatically. I am using pangolin 4.0.5, and the file has the proper .xz suffix.

Here is the full output:

****
Pangolin running in usher mode.
****
Maximum ambiguity allowed is 0.5.
****
Query file:     /usr/users/QIB_fr005/deolivl/Academic/Quadram/021.ncov/220307.react_omicron/05.update_wave19_0404/09.omicron_recombinant_0411/../../00.received/05.w13_19.nonrecombinant.fasta.xz
****
Data files found:
usher_pb:       /usr/users/QIB_fr005/deolivl/local/miniconda3/lib/python3.7/site-packages/pangolin_data/data/lineageTree.pb
****
Job stats:
job                      count    min threads    max threads
---------------------  -------  -------------  -------------
align_to_reference           1              1              1
all                          1              1              1
cache_sequence_assign        1              1              1
create_seq_hash              1              1              1
get_constellations           1              1              1
merged_info                  1              1              1
scorpio                      1              1              1
sequence_qc                  1              1              1
total                        8              1              1

[Wed Apr 13 11:49:28 2022]
Error in rule align_to_reference:
    jobid: 1
    output: /tmp/tmpjt4nustg/alignment.fasta
    log: /tmp/tmpjt4nustg/logs/minimap2_sam.log (check log file(s) for error message)
    shell:

        awk '{ if ($0 !~ /^>/) { gsub("-", "",$0); } print $0; }' "/usr/users/QIB_fr005/deolivl/Academic/Quadram/021.ncov/220307.react_omicron/05.update_wave19_0404/09.omicron_recombinant_0411/../../00.received/05.w13_19.nonrecombinant.fasta.xz" |         awk '{ { gsub(" ", "_",$0); } { gsub(",", "_",$0); } print $0; }'  |         minimap2 -a -x asm20 --sam-hit-only --secondary=no --score-N=0  -t  1 /usr/users/QIB_fr005/deolivl/local/miniconda3/lib/python3.7/site-packages/pangolin/data/reference.fasta - -o /tmp/tmpjt4nustg/mapped.sam &> /tmp/tmpjt4nustg/logs/minimap2_sam.log
        gofasta sam toMultiAlign             -s /tmp/tmpjt4nustg/mapped.sam             -t 1             --reference /usr/users/QIB_fr005/deolivl/local/miniconda3/lib/python3.7/site-packages/pangolin/data/reference.fasta             --trimstart 265             --trimend 29674             --trim             --pad > '/tmp/tmpjt4nustg/alignment.fasta'

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)


Exiting because a job execution failed. Look above for error message

(If I decompress the file first, everything runs smoothly.)
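
The decompress-first workaround can be scripted. The following is a minimal sketch, assuming Python 3 with the standard `lzma`, `gzip`, and `bz2` modules. The helper name `decompress_to_plain_fasta` and the suffix table are illustrative assumptions, not part of pangolin.

```python
import bz2
import gzip
import lzma
import shutil
import tempfile
from pathlib import Path

# Map a file suffix to the standard-library opener that can read it.
OPENERS = {".xz": lzma.open, ".gz": gzip.open, ".bz2": bz2.open}


def decompress_to_plain_fasta(query, out_dir=None):
    """Return a path to an uncompressed copy of `query`.

    Hypothetical helper, not part of pangolin. A file without a known
    compression suffix is returned unchanged.
    """
    query = Path(query)
    opener = OPENERS.get(query.suffix.lower())
    if opener is None:
        return query
    out_dir = Path(out_dir) if out_dir else Path(tempfile.mkdtemp())
    plain = out_dir / query.stem  # e.g. seqs.fasta.xz -> seqs.fasta
    # Stream the decompressed bytes so large files are not held in memory.
    with opener(query, "rb") as src, open(plain, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return plain
```

The returned path can then be passed to `pangolin` as the query file in place of the compressed one.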

cheers,
Leo

@pcjentsch
Contributor

I'm also having this issue. Here are the contents of minimap2_sam.log:

[M::mm_idx_gen::1.103*0.99] collected minimizers
[M::mm_idx_gen::1.104*0.99] sorted minimizers
[M::main::1.104*0.99] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::1.104*0.99] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::1.105*0.99] distinct minimizers: 5370 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.569; total length: 29903
[WARNING] empty sequence name in the input.
[WARNING] failed to parse the FASTA/FASTQ record next to ''. Continue anyway.
[M::worker_pipeline::1.106*0.99] mapped 6 sequences
[WARNING] failed to parse the FASTA/FASTQ record next to '��U���5?����5���m�ǯ��_�Y�H������x����9NcC֛�z�����hm��#�$�.�_h���Ӄw���~�h|'�\s�*66��V���*L�)���$���}ّ�{;��'�c�˞�9z{��˨"�h�0�L+��0E�H�9����b������0��Y�ǨYS�>������t�ʟ'. Continue anyway.
[M::worker_pipeline::1.106*0.99] mapped 1 sequences
[WARNING] empty sequence name in the input.
[WARNING] failed to parse the FASTA/FASTQ record next to '��E8���_O��&���\n�M'. Continue anyway.
[M::worker_pipeline::1.106*0.99] mapped 5 sequences
[WARNING] failed to parse the first FASTA/FASTQ record. Continue anyway.
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -a -x asm20 --sam-hit-only --secondary=no --score-N=0 -t 1 -o /home/workaccount/Work/postdoc/phylogeny/./mapped.sam /home/workaccount/.conda/envs/pangolin/lib/python3.8/site-packages/pangolin/data/reference.fasta -
[M::main] Real time: 1.107 sec; CPU: 1.101 sec; Peak RSS: 0.004 GB
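
The binary fragments in these warnings are consistent with compressed bytes reaching minimap2 unmodified, since the shell command in the first comment runs `awk` directly on the `.xz` file. One way to confirm this for a given query file is to inspect its leading magic bytes. The sketch below assumes Python 3; the function name `sniff_compression` is hypothetical and not part of pangolin or minimap2.

```python
# Leading byte signatures of common compression formats.
MAGIC = {
    b"\xfd7zXZ\x00": "xz",
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
}


def sniff_compression(path):
    """Return the compression format implied by the file's first bytes, or None."""
    with open(path, "rb") as fh:
        head = fh.read(6)  # the longest signature above is 6 bytes
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return None
```

A plain FASTA file starts with `>`, so it returns `None`; a result of `"xz"` means the file must be decompressed before a line-oriented tool such as `awk` can read it.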
