Error when running puma via docker #2

Closed
yjx1217 opened this issue Jan 13, 2023 · 14 comments

@yjx1217

yjx1217 commented Jan 13, 2023

Hello, we were trying to run puma via docker but encountered the following error. Could you please help us diagnose the cause? Thanks in advance!

Command:

docker run --rm -v "$(pwd)/data/:/data" -v "$(pwd)/in_out/:/in_out"  kvdlab/puma:1.2.1 run_puma.py \
    -i /in_out/hom_sap_BF288.final.tidy.fa \
    -o /in_out/puma_out \
    -d /data

Error message:


Traceback (most recent call last):
  File "/app/puma/scripts/run_puma.py", line 138, in <module>
    main()
  File "/app/puma/scripts/run_puma.py", line 130, in main
    puma.run(args)
  File "/app/puma/scripts/puma.py", line 2178, in run
    altered_genome = linearize_genome(original_genome, args)
  File "/app/puma/scripts/puma.py", line 107, in linearize_genome
    proteins = identify_main_proteins(extended_genome, args)
  File "/app/puma/scripts/puma.py", line 232, in identify_main_proteins
    blast_out, orfs_fa = blast_main_orfs(genome, args)
  File "/app/puma/scripts/puma.py", line 214, in blast_main_orfs
    num_hits = run_blastp(orfs_fa, blast_sub, blast_out)
  File "/app/puma/scripts/puma.py", line 175, in run_blastp
    stdout, stderr = cmd()
  File "/usr/local/lib/python3.8/site-packages/Bio/Application/__init__.py", line 569, in __call__
    raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'blastp -out /in_out/puma_out/hom_sap_BF288/program_files/main_blast/blast_results_main.tab -outfmt 6 -query /in_out/puma_out/hom_sap_BF288/program_files/main_blast/orfs.fa -evalue 1e-05 -subject /data/main_blast.fa', message 'Command line argument error: Argument "subject". File is not accessible:  `/data/main_blast.fa\''
@KVDlab
Contributor

KVDlab commented Jan 25, 2023

Sorry about the delayed response. Would you mind sharing the fasta file?

@RobJackson28

@yjx1217 I was able to reproduce the error via docker and was able to resolve it as follows:

  1. Remove -v "$(pwd)/data/:/data". This mounts a local data folder, which would need to contain the data_dir files (such as main_blast.fa, the file that blastp reported as not accessible).
  2. Change -d /data to -d /app/puma/data_dir. Rather than requiring a mounted local folder, this points directly to the required data_dir files that are already within the docker container.
  3. Make sure you have a local folder that contains your input fasta file. In the working code below, I named this "input_and_output", just to keep it unique. This local folder gets mounted as "in_out". The output folder will get written to this same folder.

This docker code runs successfully:
sudo docker run --rm -v "$(pwd)/input_and_output/:/in_out" kvdlab/puma:1.2.1 run_puma.py -i /in_out/hom_sap_BF288.final.tidy.fa -o /in_out/puma_out -d /app/puma/data_dir
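
For anyone scripting this, the same invocation can be assembled in Python. This is only a minimal sketch: the image tag, the /in_out mount point, and /app/puma/data_dir are taken from the command above, while the function name build_puma_command and its parameters are hypothetical and not part of puma. The command is printed rather than executed.

```python
import shlex
from pathlib import Path

# Path reported in this thread as holding the data files inside the image.
BUNDLED_DATA_DIR = "/app/puma/data_dir"


def build_puma_command(local_dir, fasta_name, image="kvdlab/puma:1.2.1"):
    """Build the docker argv for run_puma.py (hypothetical helper).

    Only one folder is mounted (input and output share it); the data
    directory is the one already bundled in the container.
    """
    local = Path(local_dir).resolve()  # docker bind mounts need absolute paths
    return [
        "docker", "run", "--rm",
        "-v", f"{local}:/in_out",
        image, "run_puma.py",
        "-i", f"/in_out/{fasta_name}",
        "-o", "/in_out/puma_out",
        "-d", BUNDLED_DATA_DIR,
    ]


cmd = build_puma_command("input_and_output", "hom_sap_BF288.final.tidy.fa")
print(shlex.join(cmd))  # inspect first; pass cmd to subprocess.run to execute
```

Passing the list to subprocess.run(cmd, check=True) would run it; prefix with sudo only if your docker setup requires it.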

Hopefully that works for you!

@yjx1217
Author

yjx1217 commented Jan 27, 2023

Dear @RobJackson28,

Thanks for the feedback and detailed information. I made the change according to your suggestions and the previous error disappeared. So many thanks for the tips!

However, I have now encountered a new error. Could you please provide further help?

The error message is as follows:

Traceback (most recent call last):
  File "/app/puma/scripts/run_puma.py", line 138, in <module>
    main()
  File "/app/puma/scripts/run_puma.py", line 130, in main
    puma.run(args)
  File "/app/puma/scripts/puma.py", line 2278, in run
    virus.update(find_urr(virus))
  File "/app/puma/scripts/puma.py", line 666, in find_urr
    int(genomelen), 1,
NameError: name 'genomelen' is not defined

The input fasta file is attached.

Best,
Jia-Xing

hom_sap_BF288.final.tidy.fa.gz

@KVDlab
Contributor

KVDlab commented Jan 27, 2023

@yjx1217, could you share the fasta sequence?

@yjx1217
Author

yjx1217 commented Jan 28, 2023

Hi @KVDlab

The input fasta file has been attached in my last reply on the github issue page (#2). The direct URL is as follows: https://github.com/KVD-lab/puma/files/10516081/hom_sap_BF288.final.tidy.fa.gz :-)

Best,
Jia-Xing

@KVDlab
Contributor

KVDlab commented Jan 30, 2023

@yjx1217

This new 'puma.py' should run your sequence. That was a weird bug; thanks for pointing it out.
We will update the docker image and the GitHub repo ASAP.

puma.py.zip

@yjx1217
Author

yjx1217 commented Feb 2, 2023

Dear @KVDlab ,

Many thanks for the quick fix! I installed puma via docker. Is it possible to directly replace the old puma.py script with the new one within docker? If so, could you guide me on how to do it? Thanks in advance!

Best,
Jia-Xing
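
One general docker technique for trying a patched script without rebuilding the image is to bind-mount the local file over the copy inside the container. The sketch below assumes the script lives at /app/puma/scripts/puma.py, which is the path shown in the tracebacks above; the function name build_patched_command and its parameters are hypothetical, and whether the patched puma.py is compatible with the rest of the 1.2.1 image is not confirmed in this thread.

```python
import shlex
from pathlib import Path

# Location of puma.py inside the container, as shown in the tracebacks above.
CONTAINER_SCRIPT = "/app/puma/scripts/puma.py"


def build_patched_command(patched_script, local_dir, fasta_name,
                          image="kvdlab/puma:1.2.1"):
    """Build a docker argv that overlays a local puma.py (hypothetical helper)."""
    script = Path(patched_script).resolve()  # bind mounts need absolute paths
    local = Path(local_dir).resolve()
    return [
        "docker", "run", "--rm",
        # Mount the single patched file read-only over the bundled script.
        "-v", f"{script}:{CONTAINER_SCRIPT}:ro",
        "-v", f"{local}:/in_out",
        image, "run_puma.py",
        "-i", f"/in_out/{fasta_name}",
        "-o", "/in_out/puma_out",
        "-d", "/app/puma/data_dir",
    ]


patched_cmd = build_patched_command(
    "puma.py", "input_and_output", "hom_sap_BF288.final.tidy.fa"
)
print(shlex.join(patched_cmd))  # printed only; nothing is executed here
```

The overlay lasts only for that one container run, so the image itself is left unchanged.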

@KVDlab
Contributor

KVDlab commented Feb 8, 2023

@yjx1217 docker should be updated and good to go

@yjx1217
Author

yjx1217 commented Feb 15, 2023

Dear @KVDlab ,

Many thanks for the fix and version update! The docker version 1.2.2 works nicely now!

Best,
Jia-Xing

yjx1217 closed this as completed Feb 15, 2023

@yjx1217
Author

yjx1217 commented Feb 17, 2023

Dear @KVDlab

Sorry. While testing with the HPV16 and HPV18 reference genomes, we noticed that although the annotated CDS/protein sequences seem to be correct, the genomic coordinates might be wrong.

For example, the reference annotation of HPV16 (https://www.ncbi.nlm.nih.gov/nuccore/NC_001526.4) shows:


     gene            1892..2989
                     /gene="E2"
                     /locus_tag="HpV16gp4"
                     /db_xref="GeneID:1489080"
     CDS             1892..2989
                     /gene="E2"
                     /locus_tag="HpV16gp4"
                     /note="E2. Plays a role in the initiation of viral DNA
                     replication. Forms E1-E2 dimer with replication protein
                     E1. The E1-E2 complex binds to the replication origin
                     which contains binding sites for both proteins."
                     /codon_start=1
                     /product="regulatory protein E2"
                     /protein_id="NP_041328.1"
                     /db_xref="GeneID:1489080"
                     /translation="METLCQRLNVCQDKILTHYENDSTDLRDHIDYWKHMRLECAIYY
                     KAREMGFKHINHQVVPTLAVSKNKALQAIELQLTLETIYNSQYSNEKWTLQDVSLEVY
                     LTAPTGCIKKHGYTVEVQFDGDICNTMHYTNWTHIYICEEASVTVVEGQVDYYGLYYV
                     HEGIRTYFVQFKDDAEKYSKNKVWEVHAGGQVILCPTSVFSSNEVSSPEIIRQHLANH
                     PAATHTKAVALGTEETQTTIQRPRSEPDTGNPCHTTKLLHRDSVDSAPILTAFNSSHK
                     GRINCNSNTTPIVHLKGDANTLKCLRYRFKKHCTLYTAVSSTWHWTGHNVKHKSAIVT
                     LTYDSEWQRDQFLSQVKIPKTITVSTGFMSI"

However, the annotation from PUMA suggests:


E2 start and stop position:
3506,4603

E2 sequence:
atggagactctttgccaacgtttaaatgtgtgtcaggacaaaatactaacacattatgaaaatgatagtacagacctacgtgaccatatagactattggaaacacatgcgcctagaatgtgctatttattacaaggccagagaaatgggatttaaacatattaaccaccaggtggtgccaacactggctgtatcaaagaataaagcattacaagcaattgaactgcaactaacgttagaaacaatatataactcacaatatagtaatgaaaagtggacattacaagacgttagccttgaagtgtatttaactgcaccaacaggatgtataaaaaaacatggatatacagtggaagtgcagtttgatggagacatatgcaatacaatgcattatacaaactggacacatatatatatttgtgaagaagcatcagtaactgtggtagagggtcaagttgactattatggtttatattatgttcatgaaggaatacgaacatattttgtgcagtttaaagatgatgcagaaaaatatagtaaaaataaagtatgggaagttcatgcgggtggtcaggtaatattatgtcctacatctgtgtttagcagcaacgaagtatcctctcctgaaattattaggcagcacttggccaaccaccccgccgcgacccataccaaagccgtcgccttgggcaccgaagaaacacagacgactatccagcgaccaagatcagagccagacaccggaaacccctgccacaccactaagttgttgcacagagactcagtggacagtgctccaatcctcactgcatttaacagctcacacaaaggacggattaactgtaatagtaacactacacccatagtacatttaaaaggtgatgctaatactttaaaatgtttaagatatagatttaaaaagcattgtacattgtatactgcagtgtcgtctacatggcattggacaggacataatgtaaaacataaaagtgcaattgttacacttacatatgatagtgaatggcaacgtgaccaatttttgtctcaagttaaaataccaaaaactattacagtgtctactggatttatgtctatatga

E2 translated sequence:
METLCQRLNVCQDKILTHYENDSTDLRDHIDYWKHMRLECAIYYKAREMGFKHINHQVVPTLAVSKNKALQAIELQLTLETIYNSQYSNEKWTLQDVSLEVYLTAPTGCIKKHGYTVEVQFDGDICNTMHYTNWTHIYICEEASVTVVEGQVDYYGLYYVHEGIRTYFVQFKDDAEKYSKNKVWEVHAGGQVILCPTSVFSSNEVSSPEIIRQHLANHPAATHTKAVALGTEETQTTIQRPRSEPDTGNPCHTTKLLHRDSVDSAPILTAFNSSHKGRINCNSNTTPIVHLKGDANTLKCLRYRFKKHCTLYTAVSSTWHWTGHNVKHKSAIVTLTYDSEWQRDQFLSQVKIPKTITVSTGFMSI
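
Both coordinate pairs span the same length (2989 - 1892 + 1 = 4603 - 3506 + 1 = 1098 nt) and differ by a constant 1614 nt, which would be consistent with the two annotations numbering the circular genome from different origins. The tracebacks earlier in this thread show PUMA calling linearize_genome, so a shifted origin is one plausible cause, but that is an assumption here. A minimal sketch for checking where a reported CDS actually sits in a given input sequence, treating the genome as circular, could look like this (find_in_circular is a hypothetical helper, not a PUMA function):

```python
def find_in_circular(genome, feature):
    """Return 1-based inclusive (start, end) of feature in a circular genome.

    end < start means the feature wraps around the origin. Returns None if
    the feature is absent on the forward strand.
    """
    genome = genome.lower()
    feature = feature.lower()
    n = len(genome)
    if not feature or len(feature) > n:
        return None
    # Append the first len(feature) - 1 bases so wrapping matches are found.
    idx = (genome + genome[:len(feature) - 1]).find(feature)
    if idx == -1:
        return None
    start = idx + 1
    end = (idx + len(feature) - 1) % n + 1
    return start, end


# Toy example; real use would pass the input genome and the reported CDS.
toy_genome = "aaccggtt"
print(find_in_circular(toy_genome, "ggt"))   # (5, 7)
print(find_in_circular(toy_genome, "ttaa"))  # (7, 2), wraps the origin
```

Running this on the same genome FASTA that was given to PUMA and on the NCBI record would show whether the two sequences simply start at different positions.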

Best,
Jia-Xing

@KVDlab
Contributor

KVDlab commented Feb 17, 2023 via email

@yjx1217
Author

yjx1217 commented Feb 21, 2023

Dear @KVDlab,

I see. Makes sense. Thank you very much!

On another note, after running more tests with puma, we noticed that:

  1. The HPV18 L1 gene annotated by puma is always a bit shorter than the one in the HPV18 RefSeq (NC_001357.1) on NCBI.
  2. It seems that the E4 gene cannot be annotated by PUMA.

We were wondering if further updates could be applied to make corresponding improvements.

FYI:
This is the HPV18 L1 protein sequence based on NCBI REF (NC_001357.1):
MCLYTRVLILHYHLLPLYGPLYHPQPLPLHSILVYMVHIIICGHYIILFLKSVNVFPIFLQMALWRPSDNTVYLPPPSVARVVNTDDYVTRTSIFYHAGSSRLLTVGNPYFRVPAGGGNKQDIPKVSAYQYRVFRVQLPDPNKFGLPDNSIYNPETQRLVWACAGVEIGRGQPLGVGLSGHPFYNKLDDTESSHAATSNVSEDVRDNVSVDYKQTQLCILGCAPAIGEHWAKGTACKSRPLSQGDCPPLELKNTVLEDGDMVDTGYGAMDFSTLQDTKCEVPLDICQSICKYPDYLQMSADPYGDSMFFCLRREQLFARHFWNRAGTMGDTVPQSLYIKGTGMRASPGSCVYSPSPSGSIVTSDSQLFNKPYWLHKAQGHNNGICWHNQLFVTVVDTTRSTNLTICASTQSPVPGQYDATKFKQYSRHVEEYDLQFIFQLCTITLTADVMSYIHSMNSSILEDWNFGVPPPPTTSLVDTYRFVQSVAITCQKDAAPAENKDPYDKLKFWNVDLKEKFSLDLDQYPLGRKFLVQAGLRRKPTIGPRKRSAPSATTSSKPAKRVRVRARK

And this is the HPV18 L1 protein sequence annotated by PUMA using the same input genome sequence:
MALWRPSDNTVYLPPPSVARVVNTDDYVTPTSIFYHAGSSRLLTVGNPYFRVPAGGGNKQDIPKVSAYQYRVFRVQLPDPNKFGLPDTSIYNPETQRLVWACAGVEIGRGQPLGVGLSGHPFYNKLDDTESSHAATSNVSEDVRDNVSVDYKQTQLCILGCAPAIGEHWAKGTACKSRPLSQGDCPPLELKNTVLEDGDMVDTGYGAMDFSTLQDTKCEVPLDICQSICKYPDYLQMSADPYGDSMFFCLRREQLFARHFWNRAGTMGDTVPQSLYIKGTGMPASPGSCVYSPSPSGSIVTSDSQLFNKPYWLHKAQGHNNGVCWHNQLFVTVVDTTPSTNLTICASTQSPVPGQYDATKFKQYSRHVEEYDLQFIFQLCTITLTADVMSYIHSMNSSILEDWNFGVPPPPTTSLVDTYRFVQSVAITCQKDAAPAENKDPYDKLKFWNVDLKEKFSLDLDQYPLGRKFLVQAGLRRKPTIGPRKRSAPSATTSSKPAKRVRVRARK
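
The PUMA sequence begins with MALWRPSDN, which also occurs inside the NCBI sequence, so the two appear to share a C-terminus and differ in which methionine is used as the start. A minimal sketch for quantifying this, assuming the two proteins end at the same stop codon and contain no insertions or deletions relative to each other, is shown below (compare_from_c_terminus is a hypothetical helper):

```python
def compare_from_c_terminus(longer, shorter):
    """Align two proteins at their C-termini without gaps.

    Returns (extra, mismatches): extra is the number of additional
    N-terminal residues in `longer`; mismatches lists
    (position_in_longer, residue_in_longer, residue_in_shorter), 1-based.
    """
    if len(shorter) > len(longer):
        raise ValueError("first argument must be the longer sequence")
    extra = len(longer) - len(shorter)
    mismatches = [
        (extra + i + 1, longer[extra + i], residue)
        for i, residue in enumerate(shorter)
        if longer[extra + i] != residue
    ]
    return extra, mismatches


# Toy example; real use would pass the two L1 sequences quoted above.
extra, mismatches = compare_from_c_terminus("MCLMALWRK", "MALWPK")
print(extra)       # 3
print(mismatches)  # [(8, 'R', 'P')]
```

Any mismatches reported for the real sequences would indicate substitutions in addition to the different start position.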

Best,
Jia-Xing

yjx1217 reopened this Feb 27, 2023

@KVDlab
Contributor

KVDlab commented Feb 27, 2023 via email

@yjx1217
Author

yjx1217 commented Feb 27, 2023

Dear @KVDlab,

Many thanks for the explanation! Very helpful!
I am closing the ticket now.

Best,
Jia-Xing

yjx1217 closed this as completed Feb 27, 2023