Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent gene calls between Pyrodigal-2.0.2 and Prodigal-2.6.3+31b300a #26

Closed
aaronmussig opened this issue Dec 19, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@aaronmussig
Copy link

Hello,

I just wanted to say I appreciate the port of Prodigal, it's great work!

There is a case where running GCA_900004415.1 gives different results when running Pyrodigal-2.0.2 compared to Prodigal-2.6.3 and Prodigal-2.6.3+31b300a.

Notably, there are quite a few differences where the gc_cont value differs slightly, there are also a few cases where genes differ (content and quantity).

Below are the commands that I ran (output attached):

# Set the environment

mkdir -p /tmp/prodigal
ls /tmp/prodigal
# GCA_900004415.1_BBA-B_PRJEB8795_v1_genomic.fna


# ---------------------------------------------------

# Create the prodigal-2.6.3 environment
mamba create -n prodigal-2.6.3
mamba activate prodigal-2.6.3
mamba install -c bioconda prodigal==2.6.3

# Run Prodigal
mkdir -p /tmp/prodigal/prodigal-2.6.3
cd /tmp/prodigal/prodigal-2.6.3
prodigal -i /tmp/prodigal/GCA_900004415.1_BBA-B_PRJEB8795_v1_genomic.fna -m -g 11 -o features.gff -a amino.faa -d nuc.fna

# Get checksums
md5sum *
# f1abac1dbe74388093e56550625d943a  amino.faa
# 44be68f0f21f9db116b853a162800b6c  features.gff
# 587802c6eae01eabd23d27d9c6f72f59  nuc.fna


# ---------------------------------------------------


# Create the prodigal-2.6.3 environment (with patch)
mamba create -n prodigal-2.6.3+31b300a
mamba activate prodigal-2.6.3+31b300a

# Fetch the Git repository
mkdir -p ~/mambaforge/envs/prodigal-2.6.3+31b300a/share/prodigal-2.6.3+31b300a
cd ~/mambaforge/envs/prodigal-2.6.3+31b300a/share/prodigal-2.6.3+31b300a
git clone https://github.com/hyattpd/Prodigal.git
cd Prodigal
git rev-parse HEAD 
# 31b300a99a39964893057128ea10338e9a26bd6c

# Make the install
mkdir -p ~/mambaforge/envs/prodigal-2.6.3+31b300a/bin
make install INSTALLDIR=~/mambaforge/envs/prodigal-2.6.3+31b300a/bin

# Run Prodigal
mkdir -p /tmp/prodigal/prodigal-2.6.3+31b300a
cd /tmp/prodigal/prodigal-2.6.3+31b300a
prodigal -i /tmp/prodigal/GCA_900004415.1_BBA-B_PRJEB8795_v1_genomic.fna -m -g 11 -o features.gff -a amino.faa -d nuc.fna

# Get checksums
md5sum *
# f1abac1dbe74388093e56550625d943a  amino.faa
# 44be68f0f21f9db116b853a162800b6c  features.gff
# 587802c6eae01eabd23d27d9c6f72f59  nuc.fna


# ---------------------------------------------------


# Create the pyrodigal environment
mamba create -n pyrodigal-2.0.2
mamba activate pyrodigal-2.0.2
mamba install -c bioconda pyrodigal==2.0.2

# Run Prodigal
mkdir -p /tmp/prodigal/pyrodigal
cd /tmp/prodigal/pyrodigal
pyrodigal -i /tmp/prodigal/GCA_900004415.1_BBA-B_PRJEB8795_v1_genomic.fna -m -g 11 -o features.gff -a amino.faa -d nuc.fna

# Get checksums
md5sum *
# 6fd912cdf58f2442c11ddfb810e20c4e  amino.faa
# b3b1694dd07dd9b94cede4d446cafcfb  features.gff
# bfc670a857f6fe5a138dd356f5b4e424  nuc.fna
@althonos
Copy link
Owner

althonos commented Dec 20, 2022

Hi @aaronmussig , thanks for using Pyrodigal!

I'll have a look at it. I think the bug is coming from region masking (-m), since I'm not seeing a difference in predicted genes when testing with region masking disabled. Note that comparing the MD5 sums is not the proper way to check result consistency because the headers may be different, because of rounding issues or this kind of thing. You can use the comparison repository to make sure the gene coordinates match exactly between the two.

@althonos althonos added the bug Something isn't working label Dec 20, 2022
@althonos
Copy link
Owner

Okay, I think found the discrepancy between Pyrodigal and Prodigal: Prodigal only masks regions of 50 N or more, whereas Prodigal masks any N when masking is enabled. I'll make a patch and see if that addresses the problem.

@althonos
Copy link
Owner

$ python compare.py --prodigal prodigal --genome GCA_900004415.fna -c -m
Genomes closed=True masked=True
Hits genome=GCA_900004415: prodigal=6693, pyrodigal=6693, equal=True

Looks like that was it, I'll make a new release.

@althonos
Copy link
Owner

Fixed in v2.0.3.

@aaronmussig
Copy link
Author

You're quick! Thanks for looking into this

@althonos
Copy link
Owner

I'd rather not leave this kind of bugs around before going on holidays, ahah 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants