add dockerfile for pgma #290

Merged: 4 commits, Feb 8, 2022


rpetit3
Contributor

@rpetit3 rpetit3 commented Feb 8, 2022

This PR adds a fork of pmga for serotyping and MLST of all Neisseria species and Haemophilus influenzae.

It was originally part of the Bacterial Meningitis Genome Analysis Platform (BMGAP), which is available on the OAMD portal, but I wanted to be able to run it from the command line.

This Docker build includes the pre-built BLAST databases, which makes the image slightly larger. If there is enough interest in this version of pmga, I would consider making the pre-built databases smaller (e.g. do we need 5k databases?).

  • This comment contains a description of what is in the pull request.
  • Build your own docker image using a Dockerfile
    • Directory structure should be name of the tool in lower case with special characters removed with a subdirectory of the version number (i.e. spades/3.12.0/Dockerfile)
    • Includes the recommended LABELS
  • (Optional) Dockerfile is built with best practices and has been approved by a linter (such as https://hadolint.github.io/hadolint/)
  • Edit main README.md
  • Edit Program_Licenses.md
  • Create a simple container-specific README.md in the same directory as the Dockerfile (i.e. spades/3.12.0/README.md)
  • Write a GitHub actions workflow
    • Should be located in .github/workflows/ and named test-<tool>.yml (i.e. .github/workflows/test-spades.yml)
    • Any files required for building are located in the same directory as the Dockerfile (i.e. spades/3.12.0/my_spades_tests.sh)
    • Have successfully run the workflow "Test image" in your forked repository
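
The directory convention in the checklist can be sketched as a small shell helper. This is only an illustration: `tool_dir` is a hypothetical name, and reading "special characters" as anything other than letters and digits is an assumption, not a rule stated by the repository.

```shell
# Hypothetical helper: derive "<tool>/<version>" from a tool name and version.
# Assumption: "special characters" means anything other than a-z and 0-9.
tool_dir() {
  local name version
  # Lower-case the name, then delete every character that is not a-z or 0-9.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9')
  version=$2
  printf '%s/%s\n' "$name" "$version"
}

# Paths the checklist expects for this PR's tool and version.
dir=$(tool_dir "pmga" "3.0.2")
echo "${dir}/Dockerfile"
echo "${dir}/README.md"
```

For the checklist's own example, `tool_dir "SPAdes" "3.12.0"` gives the `spades/3.12.0` directory.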

@rpetit3
Contributor Author

rpetit3 commented Feb 8, 2022

Example run:

docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data pmga:latest pmga.py /data/GCF_900478275.fna --blast /pmga/blastdbs --force
2022-02-08 02:39:38:root:INFO - Found --force, removing existing /data/pmga
2022-02-08 02:39:39:root:INFO - Using Mash predicted species: Haemophilus influenzae
2022-02-08 02:39:39:root:INFO - Step 1. BLASTing against PubMLST DBs
2022-02-08 02:39:39:root:INFO - Blasting against pubMLST database with 1 workers for file /data/GCF_900478275.fna
2022-02-08 02:49:05:root:INFO - Completed BLAST for /data/GCF_900478275.fna
2022-02-08 02:49:06:root:INFO - Evaluating BLAST Results
2022-02-08 02:49:06:root:INFO - Step 2. Parsing BLAST results
2022-02-08 02:49:06:root:INFO - Parsing BLAST data to identify alleles to extract from pubMLST
2022-02-08 02:49:08:root:INFO - Compiling results
2022-02-08 02:49:08:root:INFO - Step 3. Writing outputs

Review comment on pmga/3.0.2/Dockerfile (outdated, resolved)
rpetit3 and others added 3 commits February 8, 2022 08:10
Got this error on my first attempt; it turned out to be a small typo (`/pmga/blastdb` instead of `/pmga/blastdbs`):

```
$ docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data robert/pmga:latest pmga /data/GCF_021535585.1_ASM2153558v1_genomic.Ngonorrhoeae.fna --blast /pmga/blastdb -t 4 -o /data/Ngonorrhoeae-pmga-test
2022-02-08 15:49:39:root:ERROR - Input BLAST directory (/pmga/blastdb) does not exist, please verify and try again
```
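
A path typo like this can be caught before starting a long run by checking the directory inside the image first. The following is a minimal sketch, not something taken from this PR: `blast_dir_exists` is a hypothetical helper, and it assumes the image defines no ENTRYPOINT and ships a `test` binary (as Ubuntu-based images normally do).

```shell
# Hypothetical pre-flight check: confirm the BLAST database directory exists
# inside the image before starting a long pmga run.
# Assumptions: the image has no ENTRYPOINT and provides the `test` command.
IMAGE=${IMAGE:-robert/pmga:latest}
BLAST_DIR=${BLAST_DIR:-/pmga/blastdbs}

blast_dir_exists() {
  # Exit status is 0 if the directory exists in the container, non-zero otherwise.
  docker run --rm "$IMAGE" test -d "$BLAST_DIR"
}
```

Calling `blast_dir_exists || echo "missing: $BLAST_DIR"` before the real `docker run` would flag the wrong `--blast` path within seconds.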
@kapsakcj
Collaborator

kapsakcj commented Feb 8, 2022

Thanks for the PR!

The Docker image builds successfully, although it takes quite a long time to build (not your fault). I'll likely build locally and push to Docker Hub/Quay manually since this one takes so long.

I was able to run pmga successfully on one N. gonorrhoeae genome and one H. influenzae genome.

$ docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data robert/pmga:latest pmga /data/GCF_021535585.1_ASM2153558v1_genomic.Ngonorrhoeae.fna --blast /pmga/blastdbs -t 4 -o /data/Ngonorrhoeae-pmga-test
2022-02-08 15:50:05:root:INFO - Using Mash predicted species: Neisseria gonorrhoeae
2022-02-08 15:50:05:root:INFO - Step 1. BLASTing against PubMLST DBs
2022-02-08 15:50:05:root:INFO - Blasting against pubMLST database with 4 workers for file /data/GCF_021535585.1_ASM2153558v1_genomic.Ngonorrhoeae.fna
2022-02-08 16:07:34:root:INFO - Completed BLAST for /data/GCF_021535585.1_ASM2153558v1_genomic.Ngonorrhoeae.fna
2022-02-08 16:07:59:root:INFO - Evaluating BLAST Results
2022-02-08 16:07:59:root:INFO - Step 2. Parsing BLAST results
2022-02-08 16:07:59:root:INFO - Parsing BLAST data to identify alleles to extract from pubMLST
2022-02-08 16:08:04:root:INFO - Compiling results
/usr/local/lib/python3.8/dist-packages/Bio/Seq.py:2979: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
2022-02-08 16:08:05:root:INFO - Step 3. Writing outputs

# roughly 18 minutes wall time


# H. influenzae ran faster than N. gonorrhoeae (about 4 minutes vs. 18)
$ time docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data robert/pmga:latest pmga /data/GCF_900478275.1_34211_D02_genomic.H-influenzae.fna --blast /pmga/blastdbs -t 4 -o /data/Hinfluenzae-pmga-test
2022-02-08 16:12:30:root:INFO - Using Mash predicted species: Haemophilus influenzae
2022-02-08 16:12:30:root:INFO - Step 1. BLASTing against PubMLST DBs
2022-02-08 16:12:30:root:INFO - Blasting against pubMLST database with 4 workers for file /data/GCF_900478275.1_34211_D02_genomic.H-influenzae.fna
2022-02-08 16:16:22:root:INFO - Completed BLAST for /data/GCF_900478275.1_34211_D02_genomic.H-influenzae.fna
2022-02-08 16:16:24:root:INFO - Evaluating BLAST Results
2022-02-08 16:16:24:root:INFO - Step 2. Parsing BLAST results
2022-02-08 16:16:24:root:INFO - Parsing BLAST data to identify alleles to extract from pubMLST
2022-02-08 16:16:26:root:INFO - Compiling results
/usr/local/lib/python3.8/dist-packages/Bio/Seq.py:2979: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
2022-02-08 16:16:26:root:INFO - Step 3. Writing outputs

real    3m59.915s
user    0m0.049s
sys     0m0.028s

Generated the expected output files 👍
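
The wall times quoted above can be recomputed from the first and last log timestamps of each run. This is a rough sketch: `elapsed_seconds` is a hypothetical helper and it assumes GNU `date` (the `-d` option is not available in BSD/macOS `date`).

```shell
# Hypothetical helper: seconds between two "YYYY-MM-DD HH:MM:SS" log timestamps.
# Assumption: GNU date; both timestamps are treated as UTC so no DST shift applies.
elapsed_seconds() {
  local start end
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  echo $((end - start))
}

# N. gonorrhoeae run: first and last log lines.
elapsed_seconds "2022-02-08 15:50:05" "2022-02-08 16:08:05"
# H. influenzae run: first and last log lines.
elapsed_seconds "2022-02-08 16:12:30" "2022-02-08 16:16:26"
```

This prints 1080 (18 minutes) and 236 (just under 4 minutes, in line with the `real 3m59.915s` reported by `time`, which also includes container start-up).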

@kapsakcj kapsakcj merged commit 91f790f into StaPH-B:master Feb 8, 2022
@rpetit3 rpetit3 deleted the rp3-add-pgma branch February 8, 2022 16:28
@rpetit3
Contributor Author

rpetit3 commented Feb 8, 2022

Thank you!

@kapsakcj
Collaborator

kapsakcj commented Feb 8, 2022

pmga is now available on dockerhub and quay 🥳

https://hub.docker.com/r/staphb/pmga/tags

https://quay.io/repository/staphb/pmga?tab=tags
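
For anyone pulling the image, the two registries map to the image references sketched below. The tag is an assumption: `3.0.2` is taken from the Dockerfile directory in this PR, so check the tag pages linked above before relying on it. `pull_commands` is a hypothetical helper that only prints the commands rather than running them.

```shell
# Hypothetical helper: print the pull command for each registry.
# Assumption: a tag matching the Dockerfile directory (default 3.0.2) was pushed.
pull_commands() {
  local tag=${1:-3.0.2}
  echo "docker pull staphb/pmga:${tag}"          # Docker Hub
  echo "docker pull quay.io/staphb/pmga:${tag}"  # Quay
}

pull_commands
```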
