Error when running puma via docker #2

yjx1217 · 2023-01-13T03:09:29Z

Hello, we were trying to run PuMA via Docker but encountered the following error. Could you please help us diagnose the cause? Thanks in advance!

Command:

docker run --rm -v "$(pwd)/data/:/data" -v "$(pwd)/in_out/:/in_out"  kvdlab/puma:1.2.1 run_puma.py \
    -i /in_out/hom_sap_BF288.final.tidy.fa \
    -o /in_out/puma_out \
    -d /data

Error message:


Traceback (most recent call last):
  File "/app/puma/scripts/run_puma.py", line 138, in <module>
    main()
  File "/app/puma/scripts/run_puma.py", line 130, in main
    puma.run(args)
  File "/app/puma/scripts/puma.py", line 2178, in run
    altered_genome = linearize_genome(original_genome, args)
  File "/app/puma/scripts/puma.py", line 107, in linearize_genome
    proteins = identify_main_proteins(extended_genome, args)
  File "/app/puma/scripts/puma.py", line 232, in identify_main_proteins
    blast_out, orfs_fa = blast_main_orfs(genome, args)
  File "/app/puma/scripts/puma.py", line 214, in blast_main_orfs
    num_hits = run_blastp(orfs_fa, blast_sub, blast_out)
  File "/app/puma/scripts/puma.py", line 175, in run_blastp
    stdout, stderr = cmd()
  File "/usr/local/lib/python3.8/site-packages/Bio/Application/__init__.py", line 569, in __call__
    raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'blastp -out /in_out/puma_out/hom_sap_BF288/program_files/main_blast/blast_results_main.tab -outfmt 6 -query /in_out/puma_out/hom_sap_BF288/program_files/main_blast/orfs.fa -evalue 1e-05 -subject /data/main_blast.fa', message 'Command line argument error: Argument "subject". File is not accessible:  `/data/main_blast.fa\''
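The blastp message means that `/data/main_blast.fa` could not be read inside the container, which typically happens when the host folder mounted at `/data` does not contain that file. A minimal pre-flight check, sketched in Python, can catch this before Docker is started. It assumes `main_blast.fa` is the only file worth checking, because it is the only data file named in the error; the full contents of PuMA's data_dir are not listed in this thread. The helper name `missing_data_files` is hypothetical.

```python
from pathlib import Path


def missing_data_files(data_dir, required=("main_blast.fa",)):
    """Return the required file names that are absent from data_dir.

    Assumption: only main_blast.fa is checked by default, because it is the
    only data file named in the blastp error; extend `required` as needed.
    """
    base = Path(data_dir)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    # "data" is the host folder that the failing command mounted at /data.
    missing = missing_data_files("data")
    if missing:
        print("Missing from ./data:", ", ".join(missing))
    else:
        print("./data contains every file that was checked")
```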


KVDlab · 2023-01-25T19:37:21Z

Sorry about the delayed response. Would you mind sharing the FASTA file?

RobJackson28 · 2023-01-26T22:23:58Z

@yjx1217 I was able to reproduce the error with Docker and resolve it as follows:

1. Remove -v "$(pwd)/data/:/data". This mounts a local data folder, which would need to contain the data_dir files.
2. Change -d /data to -d /app/puma/data_dir. Rather than requiring a mounted local folder, this points directly to the required data_dir files that are already inside the Docker container.
3. Make sure you have a local folder that contains your input FASTA file. In the working command below, I named it "input_and_output" to keep it distinct. This local folder is mounted as "in_out", and the output folder is written to the same place.

This Docker command runs successfully:
sudo docker run --rm -v "$(pwd)/input_and_output/:/in_out" kvdlab/puma:1.2.1 run_puma.py -i /in_out/hom_sap_BF288.final.tidy.fa -o /in_out/puma_out -d /app/puma/data_dir

Hopefully that works for you!
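If PuMA is driven from a Python script, the working command above can be assembled as an argument list, which avoids shell-quoting problems. This is a sketch: the helper name `puma_docker_args` is hypothetical, and only the flags and paths already used in the command above are included (`sudo` is left out; add it only if your Docker setup requires root).

```python
import os
import shlex


def puma_docker_args(fasta_name, host_dir, image="kvdlab/puma:1.2.1"):
    """Build the argument list for the working docker command.

    host_dir is mounted at /in_out; -d points at the data_dir that ships
    inside the image, so no local data folder needs to be mounted.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(host_dir)}/:/in_out",
        image, "run_puma.py",
        "-i", f"/in_out/{fasta_name}",
        "-o", "/in_out/puma_out",
        "-d", "/app/puma/data_dir",
    ]


if __name__ == "__main__":
    args = puma_docker_args("hom_sap_BF288.final.tidy.fa", "input_and_output")
    print(shlex.join(args))
    # To actually run it: subprocess.run(args, check=True)
```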

yjx1217 · 2023-01-27T06:23:27Z

Dear @RobJackson28,

Thanks for the feedback and detailed information. I made the changes you suggested, and the previous error disappeared. Many thanks for the tips!

However, I have now encountered a new error. Could you please help further?

The error message is as follows:

Traceback (most recent call last):
  File "/app/puma/scripts/run_puma.py", line 138, in <module>
    main()
  File "/app/puma/scripts/run_puma.py", line 130, in main
    puma.run(args)
  File "/app/puma/scripts/puma.py", line 2278, in run
    virus.update(find_urr(virus))
  File "/app/puma/scripts/puma.py", line 666, in find_urr
    int(genomelen), 1,
NameError: name 'genomelen' is not defined

The input FASTA file is attached below.

Best,
Jia-Xing

hom_sap_BF288.final.tidy.fa.gz

KVDlab · 2023-01-27T14:59:58Z

@yjx1217, could you share the fasta sequence?

yjx1217 · 2023-01-28T00:31:29Z

Hi @KVDlab,

The input fasta file has been attached in my last reply on the github issue page (#2). The direct URL is as follows: https://github.com/KVD-lab/puma/files/10516081/hom_sap_BF288.final.tidy.fa.gz :-)

Best,
Jia-Xing

KVDlab · 2023-01-30T18:14:01Z

@yjx1217

This new 'puma.py' should run your sequence. That was a weird bug; thanks for pointing it out.
We will update the Docker image and the GitHub repo as soon as possible.

puma.py.zip

yjx1217 · 2023-02-02T06:25:52Z

Dear @KVDlab ,

Many thanks for the quick fix! Since I installed PuMA via Docker, is it possible to replace the old puma.py script with the new one directly inside the container? If so, could you guide me through it? Thanks in advance!

Best,
Jia-Xing
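Until an updated image is published, one possible workaround (not one suggested by the maintainers in this thread) is to bind-mount the patched file over the copy inside the container. Docker's -v flag can mount a single host file, and the traceback shows that the script lives at /app/puma/scripts/puma.py in the 1.2.1 image. The sketch below assumes the unzipped puma.py sits in the current directory; `patched_puma_args` is a hypothetical helper. Make sure the host file exists first, because Docker creates an empty directory when a -v source path is missing.

```python
import os
import shlex

# Location of the script inside the image, taken from the traceback above.
CONTAINER_SCRIPT = "/app/puma/scripts/puma.py"


def patched_puma_args(patched_script, io_dir, fasta_name,
                      image="kvdlab/puma:1.2.1"):
    """Docker args that shadow the image's puma.py with a local patched copy.

    The read-only (:ro) file mount replaces the script only for this
    container run; the image itself is left unchanged.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(patched_script)}:{CONTAINER_SCRIPT}:ro",
        "-v", f"{os.path.abspath(io_dir)}/:/in_out",
        image, "run_puma.py",
        "-i", f"/in_out/{fasta_name}",
        "-o", "/in_out/puma_out",
        "-d", "/app/puma/data_dir",
    ]


if __name__ == "__main__":
    print(shlex.join(patched_puma_args("puma.py", "input_and_output",
                                       "hom_sap_BF288.final.tidy.fa")))
```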

KVDlab · 2023-02-08T16:43:37Z

@yjx1217 The Docker image should be updated and good to go.

yjx1217 · 2023-02-15T01:42:10Z

Dear @KVDlab ,

Many thanks for the fix and version update! The docker version 1.2.2 works nicely now!

Best,
Jia-Xing

yjx1217 · 2023-02-17T06:59:47Z

Dear @KVDlab

Sorry. While testing with the HPV16 and HPV18 reference genomes, we noticed that although the annotated CDS/protein sequences seem to be correct, the genomic coordinates might be wrong.

For example, the reference annotation of HPV16 (https://www.ncbi.nlm.nih.gov/nuccore/NC_001526.4) shows:


     gene            1892..2989
                     /gene="E2"
                     /locus_tag="HpV16gp4"
                     /db_xref="GeneID:1489080"
     CDS             1892..2989
                     /gene="E2"
                     /locus_tag="HpV16gp4"
                     /note="E2. Plays a role in the initiation of viral DNA
                     replication. Forms E1-E2 dimer with replication protein
                     E1. The E1-E2 complex binds to the replication origin
                     which contains binding sites for both proteins."
                     /codon_start=1
                     /product="regulatory protein E2"
                     /protein_id="NP_041328.1"
                     /db_xref="GeneID:1489080"
                     /translation="METLCQRLNVCQDKILTHYENDSTDLRDHIDYWKHMRLECAIYY
                     KAREMGFKHINHQVVPTLAVSKNKALQAIELQLTLETIYNSQYSNEKWTLQDVSLEVY
                     LTAPTGCIKKHGYTVEVQFDGDICNTMHYTNWTHIYICEEASVTVVEGQVDYYGLYYV
                     HEGIRTYFVQFKDDAEKYSKNKVWEVHAGGQVILCPTSVFSSNEVSSPEIIRQHLANH
                     PAATHTKAVALGTEETQTTIQRPRSEPDTGNPCHTTKLLHRDSVDSAPILTAFNSSHK
                     GRINCNSNTTPIVHLKGDANTLKCLRYRFKKHCTLYTAVSSTWHWTGHNVKHKSAIVT
                     LTYDSEWQRDQFLSQVKIPKTITVSTGFMSI"

However, the annotation from PuMA reports:


E2 start and stop position:
3506,4603

E2 sequence:
atggagactctttgccaacgtttaaatgtgtgtcaggacaaaatactaacacattatgaaaatgatagtacagacctacgtgaccatatagactattggaaacacatgcgcctagaatgtgctatttattacaaggccagagaaatgggatttaaacatattaaccaccaggtggtgccaacactggctgtatcaaagaataaagcattacaagcaattgaactgcaactaacgttagaaacaatatataactcacaatatagtaatgaaaagtggacattacaagacgttagccttgaagtgtatttaactgcaccaacaggatgtataaaaaaacatggatatacagtggaagtgcagtttgatggagacatatgcaatacaatgcattatacaaactggacacatatatatatttgtgaagaagcatcagtaactgtggtagagggtcaagttgactattatggtttatattatgttcatgaaggaatacgaacatattttgtgcagtttaaagatgatgcagaaaaatatagtaaaaataaagtatgggaagttcatgcgggtggtcaggtaatattatgtcctacatctgtgtttagcagcaacgaagtatcctctcctgaaattattaggcagcacttggccaaccaccccgccgcgacccataccaaagccgtcgccttgggcaccgaagaaacacagacgactatccagcgaccaagatcagagccagacaccggaaacccctgccacaccactaagttgttgcacagagactcagtggacagtgctccaatcctcactgcatttaacagctcacacaaaggacggattaactgtaatagtaacactacacccatagtacatttaaaaggtgatgctaatactttaaaatgtttaagatatagatttaaaaagcattgtacattgtatactgcagtgtcgtctacatggcattggacaggacataatgtaaaacataaaagtgcaattgttacacttacatatgatagtgaatggcaacgtgaccaatttttgtctcaagttaaaataccaaaaactattacagtgtctactggatttatgtctatatga

E2 translated sequence:
METLCQRLNVCQDKILTHYENDSTDLRDHIDYWKHMRLECAIYYKAREMGFKHINHQVVPTLAVSKNKALQAIELQLTLETIYNSQYSNEKWTLQDVSLEVYLTAPTGCIKKHGYTVEVQFDGDICNTMHYTNWTHIYICEEASVTVVEGQVDYYGLYYVHEGIRTYFVQFKDDAEKYSKNKVWEVHAGGQVILCPTSVFSSNEVSSPEIIRQHLANHPAATHTKAVALGTEETQTTIQRPRSEPDTGNPCHTTKLLHRDSVDSAPILTAFNSSHKGRINCNSNTTPIVHLKGDANTLKCLRYRFKKHCTLYTAVSSTWHWTGHNVKHKSAIVTLTYDSEWQRDQFLSQVKIPKTITVSTGFMSI

Best,
Jia-Xing

KVDlab · 2023-02-17T15:31:36Z

Hi Jia-Xing,

When we made PuMA, we decided to recircularize all genomes at the same position (i.e., the first nt after the L1 stop codon). Since most of the older genomes in GenBank are linearized somewhere upstream of E6, our positions will be offset.

I hope this helps!
Koenraad
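Because PuMA starts every genome at the first nucleotide after the L1 stop codon, its coordinates differ from the RefSeq numbering by a constant rotation. The E2 positions reported in this thread give that shift for this HPV16 input: 3506 - 1892 = 1614, and likewise 4603 - 2989 = 1614. Below is a minimal sketch of mapping PuMA positions back to reference numbering, assuming 1-based inclusive coordinates and a known genome length. The function names are hypothetical, and the genome length used in the demo is a placeholder: use the length of your own input sequence.

```python
def rotation_offset(puma_pos, ref_pos):
    """Shift between PuMA and reference numbering, from one shared landmark."""
    return puma_pos - ref_pos


def puma_to_ref(puma_pos, offset, genome_length):
    """Map a 1-based PuMA position to 1-based reference numbering.

    Both coordinate systems describe the same circular genome, so the
    mapping is a rotation: subtract the offset and wrap around the origin.
    """
    return (puma_pos - 1 - offset) % genome_length + 1


if __name__ == "__main__":
    offset = rotation_offset(3506, 1892)  # E2 start: PuMA vs NC_001526.4
    genome_length = 8000  # placeholder; use len() of your input sequence
    print(offset,
          puma_to_ref(3506, offset, genome_length),
          puma_to_ref(4603, offset, genome_length))
```

A feature that spans the origin in one numbering will come out with start > end after mapping, which is expected for a circular genome.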

yjx1217 · 2023-02-21T06:52:30Z

Dear @KVDlab,

I see. That makes sense. Thank you very much!

On another note, after more testing with PuMA, we noticed that:

1. The HPV18 L1 gene annotated by PuMA is always a bit shorter than the one in the HPV18 RefSeq entry (NC_001357.1) on NCBI.
2. The E4 gene does not seem to be annotated by PuMA.

We were wondering whether future updates could address these points.

FYI:
This is the HPV18 L1 protein sequence from the NCBI RefSeq entry (NC_001357.1):
MCLYTRVLILHYHLLPLYGPLYHPQPLPLHSILVYMVHIIICGHYIILFLKSVNVFPIFLQMALWRPSDNTVYLPPPSVARVVNTDDYVTRTSIFYHAGSSRLLTVGNPYFRVPAGGGNKQDIPKVSAYQYRVFRVQLPDPNKFGLPDNSIYNPETQRLVWACAGVEIGRGQPLGVGLSGHPFYNKLDDTESSHAATSNVSEDVRDNVSVDYKQTQLCILGCAPAIGEHWAKGTACKSRPLSQGDCPPLELKNTVLEDGDMVDTGYGAMDFSTLQDTKCEVPLDICQSICKYPDYLQMSADPYGDSMFFCLRREQLFARHFWNRAGTMGDTVPQSLYIKGTGMRASPGSCVYSPSPSGSIVTSDSQLFNKPYWLHKAQGHNNGICWHNQLFVTVVDTTRSTNLTICASTQSPVPGQYDATKFKQYSRHVEEYDLQFIFQLCTITLTADVMSYIHSMNSSILEDWNFGVPPPPTTSLVDTYRFVQSVAITCQKDAAPAENKDPYDKLKFWNVDLKEKFSLDLDQYPLGRKFLVQAGLRRKPTIGPRKRSAPSATTSSKPAKRVRVRARK

And this is the HPV18 L1 protein sequence annotated by PuMA using the same input genome sequence:
MALWRPSDNTVYLPPPSVARVVNTDDYVTPTSIFYHAGSSRLLTVGNPYFRVPAGGGNKQDIPKVSAYQYRVFRVQLPDPNKFGLPDTSIYNPETQRLVWACAGVEIGRGQPLGVGLSGHPFYNKLDDTESSHAATSNVSEDVRDNVSVDYKQTQLCILGCAPAIGEHWAKGTACKSRPLSQGDCPPLELKNTVLEDGDMVDTGYGAMDFSTLQDTKCEVPLDICQSICKYPDYLQMSADPYGDSMFFCLRREQLFARHFWNRAGTMGDTVPQSLYIKGTGMPASPGSCVYSPSPSGSIVTSDSQLFNKPYWLHKAQGHNNGVCWHNQLFVTVVDTTPSTNLTICASTQSPVPGQYDATKFKQYSRHVEEYDLQFIFQLCTITLTADVMSYIHSMNSSILEDWNFGVPPPPTTSLVDTYRFVQSVAITCQKDAAPAENKDPYDKLKFWNVDLKEKFSLDLDQYPLGRKFLVQAGLRRKPTIGPRKRSAPSATTSSKPAKRVRVRARK

Best,
Jia-Xing
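One way to quantify the length difference is to locate PuMA's N-terminus inside the RefSeq protein: every residue before that point is the upstream extension that PuMA does not annotate. The two sequences above also differ at several internal positions (for example, ...VTRTS... versus ...VTPTS...), so a plain suffix check would fail. This sketch assumes the first few residues of the PuMA protein (4 here, an arbitrary choice) occur exactly once in the reference; `n_terminal_extension` is a hypothetical helper.

```python
def n_terminal_extension(ref_protein, puma_protein, anchor_len=4):
    """Number of reference residues upstream of PuMA's start codon.

    Looks up the first anchor_len residues of the PuMA protein in the
    reference; returns None if that anchor is absent or not unique.
    """
    anchor = puma_protein[:anchor_len]
    first = ref_protein.find(anchor)
    if first == -1 or ref_protein.find(anchor, first + 1) != -1:
        return None
    return first


if __name__ == "__main__":
    # Shortened stand-ins for the full sequences quoted above.
    ref = "MCLYTRVLILQMALWRPSDN"
    puma = "MALWRPSDN"
    print(n_terminal_extension(ref, puma))  # 11 residues precede MALW
```

Pass the two full sequences from this comment to get the actual extension for HPV18.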

KVDlab · 2023-02-27T02:40:46Z

Hey, I'm not sure why you are reopening, but if it is due to E4 and L1, I can explain:

1) As we describe in the PuMA paper, the biological evidence suggests that L1 uses a Met start codon that is typically 4 aa upstream of a conserved W.
2) We do not think 'E4' is a protein. We annotate E1^E4, not the E4 exon alone.

Hope that helps
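E1^E4 is a spliced product: the first few codons of E1 are joined to the E4 ORF, so it cannot be read as one contiguous interval of the genome. Below is a generic sketch of assembling a spliced CDS from 1-based inclusive exon intervals, in the spirit of a GenBank join(...) location. The toy sequence and coordinates are hypothetical and are not PuMA's splice sites, and `join_exons` is a hypothetical helper.

```python
def join_exons(genome, exons):
    """Concatenate 1-based, inclusive exon intervals from a genome string.

    Exons are taken in the order given, on the forward strand only.
    """
    pieces = []
    for start, end in exons:
        if not 1 <= start <= end <= len(genome):
            raise ValueError(f"bad exon interval {start}..{end}")
        pieces.append(genome[start - 1:end])
    return "".join(pieces)


if __name__ == "__main__":
    toy = "ATGGCCgtaaagGATTAA"  # hypothetical; lowercase marks the intron
    cds = join_exons(toy, [(1, 6), (13, 18)])  # like join(1..6,13..18)
    print(cds, len(cds) % 3 == 0)
```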

yjx1217 · 2023-02-27T04:43:41Z

Dear @KVDlab,

Many thanks for the explanation! Very helpful!
I am closing the ticket now.

Best,
Jia-Xing

yjx1217 closed this as completed Feb 15, 2023

yjx1217 reopened this Feb 27, 2023

yjx1217 closed this as completed Feb 27, 2023
