Add database to Seroba recipe #35378

Merged
merged 12 commits into from
Jun 17, 2022

rpetit3 (Member) commented Jun 14, 2022

The database for Seroba is included in the repository (https://github.com/sanger-pathogens/seroba/tree/master/database) and is only 150 MB in size.

This PR adds the database to the recipe, and defaults to it. Users still have the option to provide their own database.


Please read the guidelines for Bioconda recipes before opening a pull request (PR).

  • If this PR adds or updates a recipe, use "Add" or "Update" appropriately as the first word in its title.
  • New recipes not directly relevant to the biological sciences need to be submitted to the conda-forge channel instead of Bioconda.
  • PRs require reviews prior to being merged. Once your PR is passing tests and ready to be merged, please issue the @BiocondaBot please add label command.
  • Please post questions on Gitter or ping @bioconda/core in a comment.
Please use the following BiocondaBot commands:

Everyone has access to the following BiocondaBot commands, which can be given in a comment:

@BiocondaBot please update            Merge the master branch into a PR.
@BiocondaBot please add label         Add the please review & merge label.
@BiocondaBot please fetch artifacts   Post links to CI-built packages/containers. You can use this to test packages locally.

For members of the Bioconda project, the following command is also available:

@BiocondaBot please merge             Upload built packages/containers and merge a PR. Someone must approve a PR first! This reduces CI build time by reusing built artifacts.

Also, the bot watches for comments from non-members that include @bioconda/<team> and will automatically re-post them to notify the addressed <team>.

@rpetit3
Member Author

rpetit3 commented Jun 14, 2022

@BiocondaBot please fetch artifacts

@BiocondaBot
Collaborator

Package(s) built on Azure are ready for inspection:

Arch     Package                             Zip File
noarch   seroba-1.0.2-pyhdfd78af_1.tar.bz2   LinuxArtifacts

You may also use conda to install these after downloading and extracting the appropriate zip file. From the LinuxArtifacts or OSXArtifacts directories:

conda install -c packages <package name>

Docker image(s) built (images are in the LinuxArtifacts zip file above):

Package   Tag                   Install with docker
seroba    1.0.2--pyhdfd78af_1   gzip -dc LinuxArtifacts/images/seroba:1.0.2--pyhdfd78af_1.tar.gz | docker load

@rpetit3
Member Author

rpetit3 commented Jun 14, 2022

Grabbed test data from https://github.com/sanger-pathogens/pathogen-informatics-training/tree/master/Notebooks/SEROBA/data/run_seroba

Testing Conda

mamba create -n test-seroba -c conda-forge -c bioconda 'python>=3' setuptools 'kmc>=3.0' mummer bowtie2 cd-hit 'ariba>=2.9.1' 'pymummer>=0.10.2' 'biopython>=1.68,<1.78' 'pyyaml>=3.12' 'pyfastaq>=3.14.0' 'pysam>=0.15.3'
conda activate test-seroba
conda install LinuxArtifacts/packages/noarch/seroba-1.0.2-pyhdfd78af_1.tar.bz2

# Test --help
seroba runSerotyping --help
usage: seroba runSerotyping [options]  <read1> <read2> <prefix>

identify serotype of your input data

positional arguments:
  read1                 forward read file
  read2                 backward read file
  prefix                unique prefix

optional arguments:
  -h, --help            show this help message and exit

Other options:
  --databases DATABASES
                        path to database directory, default /home/robert_petit/miniconda3/envs/test-seroba/share/seroba-1.0.2/database
  --noclean             Do not clean up intermediate files (assemblies, ariba report)
  --coverage COVERAGE   threshold for k-mer coverage of the reference sequence , default = 20

# Test genome
mkdir conda
cd conda
seroba runSerotyping ../sample1_1.fq.gz ../sample1_2.fq.gz test
Traceback (most recent call last):
  File "/home/robert_petit/miniconda3/envs/test-seroba/bin/seroba", line 88, in <module>
    args.func(args)
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.9/site-packages/seroba/tasks/sero_run.py", line 13, in run
    sero = serotyping.Serotyping(options.databases,
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.9/site-packages/seroba/serotyping.py", line 29, in __init__
    self.kmer_size = open(os.path.join(databases,'kmer_size.txt'),'r').readline().strip()
FileNotFoundError: [Errno 2] No such file or directory: '/home/robert_petit/miniconda3/envs/test-seroba/share/seroba-1.0.2/database/kmer_size.txt'

Forgot to build db! brb
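
The traceback above comes from seroba opening kmer_size.txt in the database directory before doing anything else. A minimal pre-flight check, sketched in Python, could catch an unbuilt database earlier. The helper name `missing_db_entries` is hypothetical, and the expected entries are simply some of those visible in the `ls` output further down this thread, not an official list.

```python
import os

# Entries taken from the directory listing shown later in this thread; this is
# an assumption about what a built database contains, not an official list.
EXPECTED_ENTRIES = ("ariba_db", "kmer_db", "kmer_size.txt", "meta.tsv", "reference.fasta")


def missing_db_entries(database_dir):
    """Return the expected entries that are absent from database_dir."""
    if not os.path.isdir(database_dir):
        # A missing path or a non-directory (e.g. /dev/null) lacks everything.
        return list(EXPECTED_ENTRIES)
    present = set(os.listdir(database_dir))
    return [name for name in EXPECTED_ENTRIES if name not in present]
```

An empty return value would mean the directory at least looks like a built database; it says nothing about whether the contents are valid.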

@rpetit3
Member Author

rpetit3 commented Jun 14, 2022

Ran into an issue with the latest KMC version:

/home/robert_petit/miniconda3/envs/test-seroba/bin/kmc -k71  -ci1 -m1 -t1 -fm /home/robert_petit/miniconda3/envs/test-seroba/share/seroba-1.0.2/database/kmer_db/alternative_aliB_NT/alternative_aliB_NT.fasta /home/robert_petit/miniconda3/envs/test-seroba/share/seroba-1.0.2/database/kmer_db/alternative_aliB_NT/alternative_aliB_NT /home/robert_petit/miniconda3/envs/test-seroba/share/seroba-1.0.2/database/kmer_db/alternative_aliB_NT
Error: Wrong parameret: min memory must be at least 2GB
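
The failing call passes -m1, and this KMC build rejects anything under 2 GB. One way a caller could guard against that is to clamp the memory value before assembling the command. The sketch below is illustrative only: `build_kmc_command` and `KMC_MIN_MEMORY_GB` are hypothetical names that do not come from seroba, and the flags are copied from the command line above.

```python
KMC_MIN_MEMORY_GB = 2  # floor taken from the error message above


def build_kmc_command(kmc_bin, fasta, out_prefix, work_dir,
                      kmer_size=71, memory_gb=1, threads=1):
    """Assemble a kmc argument list, raising memory to the accepted minimum."""
    memory_gb = max(memory_gb, KMC_MIN_MEMORY_GB)
    return [
        kmc_bin,
        f"-k{kmer_size}",
        "-ci1",             # keep k-mers seen at least once, as in the call above
        f"-m{memory_gb}",   # never below the 2 GB floor
        f"-t{threads}",
        "-fm",              # multi-FASTA input, as in the call above
        fasta,
        out_prefix,
        work_dir,
    ]
```

The returned list can be handed to `subprocess.run` without shell quoting concerns.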

@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

@BiocondaBot please fetch artifacts

@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

Round 2!

Conda

mamba clean -ay
mamba create -n test-seroba -c conda-forge -c bioconda 'python>=3' setuptools 'kmc>=3.0' mummer bowtie2 cd-hit 'ariba>=2.9.1' 'pymummer>=0.10.2' 'biopython>=1.68,<1.78' 'pyyaml>=3.12' 'pyfastaq>=3.14.0' 'pysam>=0.15.3'
conda activate test-seroba
conda install LinuxArtifacts/packages/noarch/seroba-1.0.2-pyhdfd78af_1.tar.bz2

seroba --help
usage: seroba <command> <options>

optional arguments:
  -h, --help     show this help message and exit

Available commands:

    getPneumocat
                 downloads genetic information from PneumoCat
    createDBs    creates Databases for kmc and ariba
    runSerotyping
                 indetify serotype of your input data
    summary      output folder has to contain all folders with prediction results
    version      Get versions and exit

seroba runSerotyping --help
usage: seroba runSerotyping [options]  <read1> <read2> <prefix>

identify serotype of your input data

positional arguments:
  read1                 forward read file
  read2                 backward read file
  prefix                unique prefix

optional arguments:
  -h, --help            show this help message and exit

Other options:
  --databases DATABASES
                        path to database directory, default /home/robert_petit/miniconda3/envs/test-seroba/share/seroba-1.0.2/database
  --noclean             Do not clean up intermediate files (assemblies, ariba report)
  --coverage COVERAGE   threshold for k-mer coverage of the reference sequence , default = 20

# Looks like everything is there
ls -lha /home/robert_petit/miniconda3/envs/test-seroba/share/seroba-1.0.2/database
total 1.7M
drwxr-xr-x   5 robert_petit robert_petit 4.0K Jun 15 00:50 .
drwxr-xr-x   3 robert_petit robert_petit 4.0K Jun 15 00:50 ..
drwxr-xr-x  61 robert_petit robert_petit 4.0K Jun 15 00:50 ariba_db
-rw-rw-r--   2 robert_petit robert_petit 1.1K Jun 15 00:02 cd_cluster.tsv
-rw-rw-r--   2 robert_petit robert_petit  405 Jun 15 00:02 cdhit_cluster
drwxr-xr-x 104 robert_petit robert_petit 4.0K Jun 15 00:50 kmer_db
-rw-rw-r--   2 robert_petit robert_petit    2 Jun 15 00:04 kmer_size.txt
-rw-rw-r--   2 robert_petit robert_petit  12K Jun 15 00:02 meta.tsv
-rw-rw-r--   2 robert_petit robert_petit 1.7M Jun 15 00:02 reference.fasta
drwxr-xr-x  22 robert_petit robert_petit 4.0K Jun 15 00:50 streptococcus-pneumoniae-ctvdb

# Test Seroba
seroba runSerotyping ../sample1_1.fq.gz ../sample1_2.fq.gz test
 -ci4  -m2 -t1
/home/robert_petit/miniconda3/envs/test-seroba/bin/kmc -k71  -ci4  -m2 -t1 ../sample1_1.fq.gz /home/robert_petit/temp/test-seroba/conda/temp.kmcncc7xy5i/test /home/robert_petit/temp/test-seroba/conda/temp.kmcncc7xy5i
**
Stage 1: 100%
Stage 2: 100%
1st stage: 0.135496s
2nd stage: 0.087368s
Total    : 0.222864s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :        23440
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :        36140
   No. of unique counted k-mers       :        12700
   Total no. of k-mers                :       383225
   Total no. of reads                 :        12789
   Total no. of super-k-mers          :        24178
   
... TRUNCATED ...
   
08
cluster detected 1 threads available to it
mpileup: invalid option -- 't'
Failed cluster: cluster
Traceback (most recent call last):
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.9/site-packages/ariba/clusters.py", line 612, in run
    self._run()
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.9/site-packages/ariba/clusters.py", line 646, in _run
    raise Error('At least one cluster failed! Stopping...')
ariba.clusters.Error: At least one cluster failed! Stopping...


# the most recent versions of bcftools and pysam are not playing nice with Ariba
mamba install -c conda-forge -c bioconda 'bcftools<=1.14' 'pysam<=0.18.0'

# Try rerun
seroba runSerotyping ../sample1_1.fq.gz ../sample1_2.fq.gz test
 -ci4  -m2 -t1
/home/robert_petit/miniconda3/envs/test-seroba/bin/kmc -k71  -ci4  -m2 -t1 ../sample1_1.fq.gz /home/robert_petit/temp/test-seroba/conda/temp.kmc1c222wyi/test /home/robert_petit/temp/test-seroba/conda/temp.kmc1c222wyi
**
Stage 1: 100%
Stage 2: 100%
1st stage: 0.134339s
2nd stage: 0.089949s
Total    : 0.224288s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :        23440
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :        36140
   No. of unique counted k-mers       :        12700
   Total no. of k-mers                :       383225
   Total no. of reads                 :        12789
   Total no. of super-k-mers          :        24178
/home/robert_petit/miniconda3/envs/test-seroba/bin/kmc_tools simple /home/robert_petit/temp/test-seroba/conda/temp.kmc1c222wyi/test /home/robert_petit/miniconda3/envs/test-seroba/share/seroba-1.0.2/database/kmer_db/13/13 intersect /home/robert_petit/temp/test-seroba/conda/temp.kmc1c222wyi/inter
in1: 100% in2: 100%
in1: 100%
0.01688057153695391

... TRUNCATED ...

0.11736472201588481
/home/robert_petit/miniconda3/envs/test-seroba/bin/kmc_tools simple /home/robert_petit/temp/test-seroba/conda/temp.kmc1c222wyi/test /home/robert_petit/miniconda3/envs/test-seroba/share/seroba-1.0.2/database/kmer_db/alternative_aliB_NT/alternative_aliB_NT intersect /home/robert_petit/temp/test-seroba/conda/temp.kmc1c222wyi/inter
in1: 100% in2: 100%
in1: 100%
0.0
08
cluster detected 1 threads available to it
cluster reported completion

# print prediction
cat test/pred.tsv
test    08      contamination

# try another one!
cat test7/pred.tsv
test7   22A     contamination

cat test7/detailed_serogroup_info.txt
Predicted Serotype:     22A
Serotype predicted by ariba     :22A
assembly from ariba as an identiy of:   99.41    with this serotype
Serotype         Genetic Variant
22A     genes   wcwC
22A     genes   wcwA
22F     genes   wcwC
22F     genes   wcwA

Super close, just need to fix the bcftools/pysam issue

@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

@BiocondaBot please fetch artifacts

@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

Round 3

Conda

mamba create -n test-seroba -c conda-forge -c bioconda 'python>=3' setuptools 'kmc>=3.0' mummer bowtie2 cd-hit 'ariba>=2.9.1' 'pymummer>=0.10.2' 'biopython>=1.68,<1.78' 'pyyaml>=3.12' 'pyfastaq>=3.14.0' 'pysam>=0.15.3,<=0.18.0' 'bcftools<=1.14'
conda activate test-seroba
conda install LinuxArtifacts/packages/noarch/seroba-1.0.2-pyhdfd78af_1.tar.bz2

# Skipping the helps and straight to runs
mkdir conda
cd conda

seroba runSerotyping ../sample1_1.fq.gz ../sample1_2.fq.gz test1
 -ci4  -m2 -t1
/home/robert_petit/miniconda3/envs/test-seroba/bin/kmc -k71  -ci4  -m2 -t1 ../sample1_1.fq.gz /home/robert_petit/temp/test-seroba/conda/temp.kmc_yeiukmo/test1 /home/robert_petit/temp/test-seroba/conda/temp.kmc_yeiukmo
**
Stage 1: 100%
Stage 2: 100%
... TRUNCATED ...
cluster detected 1 threads available to it
cluster reported completion

cat test1/pred.tsv
test1   08      contamination

seroba runSerotyping ../sample7_1.fq.gz ../sample7_2.fq.gz test7
 -ci4  -m2 -t1
/home/robert_petit/miniconda3/envs/test-seroba/bin/kmc -k71  -ci4  -m2 -t1 ../sample7_1.fq.gz /home/robert_petit/temp/test-seroba/conda/temp.kmcxqlyj8re/test7 /home/robert_petit/temp/test-seroba/conda/temp.kmcxqlyj8re
***
Stage 1: 100%
... TRUNCATED ...
cluster_1 reported completion
{'22A': 0, '22F': 8}
{'22A': 0, '22F': 8}
{'22A': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}, '22F': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}}
['22A']
{'22A': 0, '22F': 8}
{'22A': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}, '22F': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}}

cat test7/detailed_serogroup_info.txt
Predicted Serotype:     22A
Serotype predicted by ariba     :22A
assembly from ariba as an identiy of:   99.41    with this serotype
Serotype         Genetic Variant
22A     genes   wcwC
22A     genes   wcwA
22F     genes   wcwC
22F     genes   wcwA

# verify --databases (abbreviated here as --database) is working with a bad path
seroba runSerotyping ../sample7_1.fq.gz ../sample7_2.fq.gz test7 --database /dev/null
Traceback (most recent call last):
  File "/home/robert_petit/miniconda3/envs/test-seroba/bin/seroba", line 88, in <module>
    args.func(args)
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.9/site-packages/seroba/tasks/sero_run.py", line 13, in run
    sero = serotyping.Serotyping(options.databases,
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.9/site-packages/seroba/serotyping.py", line 29, in __init__
    self.kmer_size = open(os.path.join(databases,'kmer_size.txt'),'r').readline().strip()
NotADirectoryError: [Errno 20] Not a directory: '/dev/null/kmer_size.txt'

# verify --databases (abbreviated here as --database) is working with a good path
seroba runSerotyping ../sample7_1.fq.gz ../sample7_2.fq.gz test7 --database /home/robert_petit/miniconda3/envs/test-seroba/share/seroba-1.0.2/database
 -ci4  -m2 -t1
/home/robert_petit/miniconda3/envs/test-seroba/bin/kmc -k71  -ci4  -m2 -t1 ../sample7_1.fq.gz /home/robert_petit/temp/test-seroba/conda/temp.kmcj9sy47a4/test7 /home/robert_petit/temp/test-seroba/conda/temp.kmcj9sy47a4
***
Stage 1: 100%
Stage 2: 100%
... TRUNCATED ...

# check summary
seroba summary ./
cat summary.tsv
test1   08      contamination
test7   22A     contamination

Conda looks good, test docker now
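
A note on the Conda run above: the help text lists the option as --databases, yet the commands pass --database and the bad path is still picked up. That is consistent with argparse, which accepts unambiguous prefixes of long options by default. Whether seroba's CLI is built on argparse is an assumption here (the help layout suggests it); the parser below is a stand-in, not seroba's own.

```python
import argparse

# Stand-in parser with a single long option named like the one in the help text.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--databases", default="/default/db")

# "--database" is an unambiguous prefix of "--databases", so argparse accepts it.
args = parser.parse_args(["--database", "/dev/null"])
print(args.databases)
```

Spelling the option out in full avoids surprises if a second option with the same prefix is ever added.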

Docker

gzip -dc LinuxArtifacts/images/seroba:1.0.2--pyhdfd78af_1.tar.gz | docker load

docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data quay.io/biocontainers/seroba:1.0.2--pyhdfd78af_1 seroba --help
usage: seroba <command> <options>

optional arguments:
  -h, --help     show this help message and exit

Available commands:

    getPneumocat
                 downloads genetic information from PneumoCat
    createDBs    creates Databases for kmc and ariba
    runSerotyping
                 indetify serotype of your input data
    summary      output folder has to contain all folders with prediction
                 results
    version      Get versions and exit

# Run a genome
docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data quay.io/biocontainers/seroba:1.0.2--pyhdfd78af_1 seroba runSerotyping /data/sample7_1.fq.gz /data/sample7_2.fq.gz /data/test7
Traceback (most recent call last):
  File "/usr/local/bin/seroba", line 88, in <module>
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/seroba/tasks/sero_run.py", line 19, in run
    sero.run()
  File "/usr/local/lib/python3.8/site-packages/seroba/serotyping.py", line 468, in run
    self._run_kmc()
  File "/usr/local/lib/python3.8/site-packages/seroba/serotyping.py", line 55, in _run_kmc
    temp_dir = tempfile.mkdtemp(prefix = 'temp.kmc', dir=os.getcwd())
  File "/usr/local/lib/python3.8/tempfile.py", line 358, in mkdtemp
    _os.mkdir(file, 0o700)
PermissionError: [Errno 13] Permission denied: '/temp.kmco4ebdaal'

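Looks like seroba creates its temp directory in the current working directory (os.getcwd()), which is / at the container entry point and is not writable by the mapped user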
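
The traceback shows tempfile.mkdtemp being called with dir=os.getcwd(); inside the container the working directory is /, which the mapped user cannot write to. One possible way around this, sketched below, is to fall back to the system temp directory when the preferred location is not writable. The function name `make_work_dir` is hypothetical, and the change actually made in the recipe or upstream may differ.

```python
import os
import tempfile


def make_work_dir(prefix="temp.kmc", preferred=None):
    """Create a temp directory in `preferred` (default: cwd) if it is writable,
    otherwise in the system temp directory."""
    base = preferred if preferred is not None else os.getcwd()
    if not (os.path.isdir(base) and os.access(base, os.W_OK | os.X_OK)):
        base = tempfile.gettempdir()
    return tempfile.mkdtemp(prefix=prefix, dir=base)
```

Passing the output directory as `preferred` would keep intermediate files next to the results, which is convenient when only the mounted volume is writable.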

@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

@BiocondaBot please fetch artifacts

@rpetit3 rpetit3 mentioned this pull request Jun 15, 2022
@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

Round 4

Conda

mamba clean -ay
mamba create -n test-seroba -c conda-forge -c bioconda 'python>=3' setuptools 'kmc>=3.0' mummer bowtie2 cd-hit 'ariba>=2.9.1' 'pymummer>=0.10.2' 'biopython>=1.68,<1.78' 'pyyaml>=3.12' 'pyfastaq>=3.14.0' 'pysam>=0.15.3,<=0.18.0' 'bcftools<=1.14'
conda activate test-seroba
conda install LinuxArtifacts/packages/noarch/seroba-1.0.2-pyhdfd78af_1.tar.bz2

# Straight to running genomes
seroba runSerotyping ../sample7_1.fq.gz ../sample7_2.fq.gz test7

cat test7/detailed_serogroup_info.txt
Predicted Serotype:     22A
Serotype predicted by ariba     :22A
assembly from ariba as an identiy of:   99.41    with this serotype
Serotype         Genetic Variant
22A     genes   wcwC
22A     genes   wcwA
22F     genes   wcwC
22F     genes   wcwA

Conda looks good still

Docker

gzip -dc LinuxArtifacts/images/seroba:1.0.2--pyhdfd78af_1.tar.gz | docker load
docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data quay.io/biocontainers/seroba:1.0.2--pyhdfd78af_1 seroba runSerotyping /data/sample7_1.fq.gz /data/sample7_2.fq.gz /data/test7
***
Stage 1: 100%
Stage 2: 100%
1st stage: 0.15856s
2nd stage: 0.13474s
... TRUNCATED ...
/usr/local/bin/kmc_tools simple /data/test7 /usr/local/share/seroba-1.0.2/database/kmer_db/alternative_aliB_NT/alternative_aliB_NT intersect /data/test7/temp.kmc5xonr7ly/inter
0.001181035937236376
22A
Traceback (most recent call last):
  File "/usr/local/bin/seroba", line 88, in <module>
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/seroba/tasks/sero_run.py", line 19, in run
    sero.run()
  File "/usr/local/lib/python3.8/site-packages/seroba/serotyping.py", line 481, in run
    self._prediction(assemblie_file,cluster)
  File "/usr/local/lib/python3.8/site-packages/seroba/serotyping.py", line 452, in _prediction
    self.sero, self.imp = Serotyping._find_serotype(assemblie_file,serogroup_fasta,self.meta_data_dict[serogroup],\
  File "/usr/local/lib/python3.8/site-packages/seroba/serotyping.py", line 261, in _find_serotype
    pymummer.nucmer.Runner(
  File "/usr/local/lib/python3.8/site-packages/pymummer/nucmer.py", line 139, in run
    tmpdir = tempfile.mkdtemp(prefix='tmp.run_nucmer.', dir=os.getcwd())
  File "/usr/local/lib/python3.8/tempfile.py", line 358, in mkdtemp
    _os.mkdir(file, 0o700)
PermissionError: [Errno 13] Permission denied: '/tmp.run_nucmer.727hfrey'

Bummer. At this point the problem extends outside the scope of Seroba, since pymummer is now the one making the temp directory.

I've submitted a PR to fix the pymummer recipe: #35379

@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

Submitted an upstream PR to help fix the temp dir issue: sanger-pathogens/seroba#68. Unsure of the review timeline for that PR.

@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

Have to update the Ariba dependencies to get this working: #35383

@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

@BiocondaBot please fetch artifacts

@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

Round 5

Conda

mamba create -n test-seroba -c conda-forge -c bioconda 'python>=3' setuptools 'kmc>=3.0' mummer bowtie2 cd-hit 'ariba>=2.9.1' 'pymummer>=0.11.0' 'biopython>=1.68,<1.78' 'pyyaml>=3.12' 'pyfastaq>=3.14.0' 'pysam>=0.15.3,<=0.18.0' 'bcftools<=1.14'
conda activate test-seroba
conda install LinuxArtifacts/packages/noarch/seroba-1.0.2-pyhdfd78af_1.tar.bz2

# Straight to running genomes
seroba runSerotyping ../sample7_1.fq.gz ../sample7_2.fq.gz test7

# Output
cat test7/detailed_serogroup_info.txt
Predicted Serotype:     22A
Serotype predicted by ariba     :22A
assembly from ariba as an identiy of:   99.41    with this serotype
Serotype         Genetic Variant
22A     genes   wcwC
22A     genes   wcwA
22F     genes   wcwC
22F     genes   wcwA

Now the test....

Docker

gzip -dc LinuxArtifacts/images/seroba:1.0.2--pyhdfd78af_1.tar.gz | docker load
docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data quay.io/biocontainers/seroba:1.0.2--pyhdfd78af_1 seroba runSerotyping /data/sample7_1.fq.gz /data/sample7_2.fq.gz /data/test-docker
Stage 1: 100%
Stage 2: 100%
1st stage: 0.156408s
2nd stage: 0.136832s
Total    : 0.29324s
... TRUNCATED ...
0.001181035937236376
22A
{'22A': 0, '22F': 8}
{'22A': 0, '22F': 8}
{'22A': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}, '22F': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}}
['22A']
{'22A': 0, '22F': 8}
{'22A': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}, '22F': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}}

# SUCCESS!!!!
cat test-docker/detailed_serogroup_info.txt
Predicted Serotype:     22A
Serotype predicted by ariba     :22A
assembly from ariba as an identiy of:   99.41    with this serotype
Serotype         Genetic Variant
22A     genes   wcwC
22A     genes   wcwA
22F     genes   wcwC
22F     genes   wcwA

Last test, docker with --entrypoint to play nicely with workflow managers (e.g. Nextflow)

Docker with --entrypoint

docker run --rm -u $(id -u):$(id -g) -v ${PWD}:/data --entrypoint /bin/bash quay.io/biocontainers/seroba:1.0.2--pyhdfd78af_1 -c 'seroba runSerotyping /data/sample7_1.fq.gz /data/sample7_2.fq.gz /data/test-docker-entrypoint'
Stage 1: 100%
Stage 2: 100%
1st stage: 0.17872s
...TRUNCATED...
22A
{'22A': 0, '22F': 8}
{'22A': 0, '22F': 8}
{'22A': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}, '22F': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}}
['22A']
{'22A': 0, '22F': 8}
{'22A': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}, '22F': {'genes': ['wcwC', 'wcwA'], 'pseudo': [], 'allele': [], 'snps': []}}

# Check results
cat test-docker-entrypoint/detailed_serogroup_info.txt
Predicted Serotype:     22A
Serotype predicted by ariba     :22A
assembly from ariba as an identiy of:   99.41    with this serotype
Serotype         Genetic Variant
22A     genes   wcwC
22A     genes   wcwA
22F     genes   wcwC
22F     genes   wcwA

Verify outputs are all the same

find . -name "*detailed*" | xargs -I {} md5sum {}
bb5094dfec091a90f2b80d7d2a868c5b  ./conda/test7/detailed_serogroup_info.txt
bb5094dfec091a90f2b80d7d2a868c5b  ./docker/test-docker/detailed_serogroup_info.txt
bb5094dfec091a90f2b80d7d2a868c5b  ./docker-entrypoint/test-docker-entrypoint/detailed_serogroup_info.txt

All good! This recipe is ready to play nicely with everyone!
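
The md5sum comparison above can also be scripted, which is handy when the number of runs grows. This is a minimal sketch; `md5_of` and `all_identical` are hypothetical helper names, and the paths passed in would be the detailed_serogroup_info.txt files from each run.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_identical(paths):
    """True if every file in paths has the same MD5 digest."""
    return len({md5_of(p) for p in paths}) <= 1
```

MD5 is used only to mirror the md5sum call; any stable hash would serve for an equality check like this.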

@rpetit3
Member Author

rpetit3 commented Jun 15, 2022

@BiocondaBot please add label

Member

@apeltzer apeltzer left a comment

Nice work, also liking that you've documented your attempts at fixing things :-)

@apeltzer
Member

@BiocondaBot please merge

@BiocondaBot
Collaborator

Sorry, this PR cannot be merged at this time.

@rpetit3
Member Author

rpetit3 commented Jun 17, 2022

@BiocondaBot please merge

@BiocondaBot
Collaborator

I will attempt to upload artifacts and merge this PR. This may take some time, please have patience.

@BiocondaBot BiocondaBot merged commit e03b961 into master Jun 17, 2022
@BiocondaBot BiocondaBot deleted the rp3-update-seroba branch June 17, 2022 14:06
@rpetit3
Member Author

rpetit3 commented Jun 17, 2022

Thank you!

Labels
please review & merge set to ask for merge
3 participants