GitHub - deprekate/goodorfs: Code to classify open reading frames into coding and noncoding

Introduction

GOODORFS is a tool to classify open reading frames (ORFs) as coding or noncoding.

It takes as input a FASTA file containing an entire genome. It finds all potential open reading frames and, for each one, calculates the Energy Density Profile from its amino acid frequencies.
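The two steps above (find every candidate ORF, then turn its amino acid frequencies into a profile) can be sketched in Python. This is not GOODORFS's own code, and every name in it (`find_orfs`, `entropy_density_profile`, the `min_codons` cutoff, and so on) is hypothetical. It assumes ATG/GTG/TTG starts, TAA/TAG/TGA stops, one longest ORF per stop codon, and coordinates reported like the GOODORFS output (1-based, stop codon included, start > stop on the minus strand). For the profile it assumes an entropy-density formulation, where each of the 20 amino acid frequencies p_i is mapped to -p_i ln p_i divided by the Shannon entropy H; the exact definition GOODORFS uses may differ.

```python
import math
from itertools import product

# Standard genetic code in TCAG order (assumption: no alternative codon tables).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STARTS = {"ATG", "GTG", "TTG"}  # assumed start codons
STOPS = {"TAA", "TAG", "TGA"}
AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def revcomp(seq):
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_orfs(genome, min_codons=30):
    """Yield (start, stop, strand): 1-based, inclusive, stop codon included.
    Minus-strand ORFs are reported with start > stop. The first start codon
    after a stop is used, so each stop yields its longest ORF."""
    genome = genome.upper()
    n = len(genome)
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            begin = None  # index of the first start codon since the last stop
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if begin is None and codon in STARTS:
                    begin = i
                elif codon in STOPS:
                    if begin is not None and (i - begin) // 3 >= min_codons:
                        end = i + 2  # last base of the stop codon (0-based)
                        if strand == "+":
                            yield begin + 1, end + 1, strand
                        else:
                            # map reverse-complement indices to forward coordinates
                            yield n - begin, n - end, strand
                    begin = None


def orf_sequence(genome, start, stop):
    """Nucleotide sequence of an ORF given GOODORFS-style coordinates."""
    if start <= stop:
        return genome[start - 1:stop]
    return revcomp(genome[stop - 1:start])


def translate(orf_seq):
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(orf_seq[i:i + 3].upper(), "X")
                   for i in range(0, len(orf_seq) - 2, 3))


def entropy_density_profile(protein):
    """20-dimensional profile: -p_i ln p_i / H for each amino acid frequency p_i."""
    counts = {aa: 0 for aa in AA}
    for aa in protein:
        if aa in counts:  # ignore '*' and 'X'
            counts[aa] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AA)
    freqs = [counts[aa] / total for aa in AA]
    terms = [-p * math.log(p) if p > 0 else 0.0 for p in freqs]
    h = sum(terms)
    return [t / h if h > 0 else 0.0 for t in terms]
```

A profile for every candidate ORF could then be built with `{orf: entropy_density_profile(translate(orf_sequence(g, orf[0], orf[1]))) for orf in find_orfs(g)}`; the profiles, not the raw frequencies, would be what a classifier separates into coding and noncoding.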

To install GOODORFS with pip:

pip3 install goodorfs

or from source:

git clone https://github.com/deprekate/goodorfs.git
cd goodorfs
python3 setup.py install

To run GOODORFS, simply provide the path to a FASTA file. The default output uses the same format as Glimmer's long-orfs program, so that GOODORFS can serve as a drop-in replacement. The columns are: orf_id, start_location, stop_location, frame, and a filler column of zeros.

$ goodorfs.py tests/NC_001416.fna
00001     191     736  +2   0.000
00002     711    2636  +3   0.000
00003    2633    2839  +2   0.000
00004    3270    2830  -1   0.000
00005    2836    4437  +1   0.000
00006    5095    4604  -2   0.000
00007    4283    5737  +2   0.000
...
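A minimal sketch for loading this output in Python, assuming whitespace-separated columns exactly as in the sample above; `Orf`, `parse_coords`, and `glimmer_frame` are hypothetical names. The frame rule in `glimmer_frame` is inferred from the seven sample rows (the frame equals ((lower coordinate - 1) mod 3) + 1, negated for minus-strand ORFs, whose start is larger than their stop); it is not taken from GOODORFS or Glimmer documentation.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    orf_id: str
    start: int
    stop: int
    frame: int
    filler: float  # always 0.000 in GOODORFS output


def parse_coords(lines):
    """Yield one Orf per five-column line; skip anything else (e.g. '...')."""
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        orf_id, start, stop, frame, filler = fields
        # int() accepts signed strings such as "+2" and "-1"
        yield Orf(orf_id, int(start), int(stop), int(frame), float(filler))


def glimmer_frame(start, stop):
    """Frame rule inferred from the sample output above (an assumption)."""
    low = min(start, stop)
    frame = (low - 1) % 3 + 1
    return frame if start < stop else -frame
```

For example, `list(parse_coords(open("orfs.txt")))` would load a saved run, where `orfs.txt` is a hypothetical file holding redirected GOODORFS output.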

GOODORFS can also output the nucleotide sequences in FASTA format for use in other applications:

$ goodorfs.py -Y fna tests/NC_001416.fna | head
>NC_001416_orf1 [START=191] [STOP=736]
ATGGAAGTCAACAAAAAGCAGCTGGCTGACATTTTCGGTGCGAGTATCCGTACCATTCA...
>NC_001416_orf2 [START=711] [STOP=2636]
GTGAATATATCGAACAGTCAGGTTAACAGGCTGCGGCATTTTGTCCGCGCCGGGCTTCG...
>NC_001416_orf3 [START=2633] [STOP=2839]
ATGACGCGACAGGAAGAACTTGCCGCTGCCCGTGCGGCACTGCATGACCTGATGACAGG...
>NC_001416_orf4 [START=3270] [STOP=2830]
GTGCATGGCCACACCTTCCCGAATCATCATGGTAAACGTGCGTTTTCGCTCAACGTCAA...
...
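To reuse this FASTA output elsewhere, the START/STOP tags can be read back from each header. The sketch below assumes headers of exactly the form shown above, and it assumes (from the example, not from documentation) that coordinates are 1-based and include the stop codon, with a minus-strand ORF written as the reverse complement of the genome between STOP and START. `read_orf_fasta` and `expected_sequence` are hypothetical names.

```python
import re

# Matches headers such as ">NC_001416_orf4 [START=3270] [STOP=2830]"
HEADER = re.compile(r">(\S+)\s+\[START=(\d+)\]\s+\[STOP=(\d+)\]")


def _make_record(header, chunks):
    m = HEADER.match(header)
    if m is None:
        raise ValueError(f"unexpected header: {header}")
    name, start, stop = m.group(1), int(m.group(2)), int(m.group(3))
    return {"id": name, "start": start, "stop": stop,
            "strand": "+" if start < stop else "-", "seq": "".join(chunks)}


def read_orf_fasta(text):
    """Parse GOODORFS-style FASTA text into a list of record dicts."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line, []
        elif line:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def expected_sequence(genome, start, stop):
    """ORF sequence implied by the coordinates (reverse complement if start > stop)."""
    if start <= stop:
        return genome[start - 1:stop]
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return genome[stop - 1:start].translate(complement)[::-1]
```

Comparing `record["seq"]` with `expected_sequence(genome, record["start"], record["stop"])` is a quick way to check that assumed coordinate convention against a real genome before relying on it.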

We have started testing GOODORFS on metagenomes. The reads only need to be binned by GC content; each bin is then run through GOODORFS as a batch to predict gene fragments within the reads.

We have added a script that groups the reads by GC content. It prints batches of 500 reads separated by the null character, so its output can be piped to xargs -0. To run GOODORFS on the supplied sample metagenome (a FASTA file), run:

python3 scripts/bin_reads.py tests/ERR5004783_part.fasta | xargs -0 -I {} sh -c "echo '{}' | ./goodorfs.py -Y fna"
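The repository's scripts/bin_reads.py is not reproduced here. The following is a hypothetical stand-in showing one way to produce GC-grouped, null-separated batches of 500 reads that xargs -0 can consume. It assumes the reads are sorted by GC fraction and the sorted list is cut into consecutive batches; the real script may bin differently, and `gc_batches`, `gc_content`, and the file name gc_bins.py are illustrative only.

```python
import sys


def read_fasta(handle):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, seq = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:], []
        elif line:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


def gc_content(seq):
    """Fraction of G+C among unambiguous bases (N and other codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_batches(records, batch_size=500):
    """Sort reads by GC content and cut them into consecutive batches."""
    ordered = sorted(records, key=lambda rec: gc_content(rec[1]))
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]


def main(path, batch_size=500):
    with open(path) as handle:
        for batch in gc_batches(read_fasta(handle), batch_size):
            text = "".join(f">{name}\n{seq}\n" for name, seq in batch)
            sys.stdout.write(text + "\0")  # null byte ends each batch for xargs -0


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Under those assumptions it would slot into the same pipeline: `python3 gc_bins.py tests/ERR5004783_part.fasta | xargs -0 -I {} sh -c "echo '{}' | ./goodorfs.py -Y fna"`. Sorting before batching keeps reads with similar GC content together, which is the property the binning step relies on.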
