This repository has been archived by the owner on Apr 24, 2018. It is now read-only.
forked from taiwaness/I5KNAL_OGS

NOTE: This repo is no longer maintained; please refer to https://github.com/NAL-i5K/GFF3toolkit/. This project develops Python tools for generating an official gene set (OGS) by integrating manually curated and predicted gene annotations (GFF3 format). There are two phases involved: (1) a QC phase and (2) a Merge phase.

NAL-i5K/I5KNAL_OGS

QC and OGS generation pipeline by I5K Workspace@NAL

This project develops Python tools for generating an official gene set (OGS) by integrating manually curated and predicted gene annotations in GFF3 format. The pipeline has two phases: (1) a QC phase and (2) a Merge phase. The I5K Workspace@NAL team has completed a prototype of the whole pipeline. However, the source code of the prototype has not been released to the public, because it incorporates several components written in programming languages other than Python. This project will therefore re-implement those non-Python components, with the aim of delivering a complete Python package for OGS generation. If you have an urgent need for OGS generation, you can send queries to I5K [at] ars.usda.gov. The i5k team can help host your data and run the OGS generation pipeline on it for you.

Wiki page for QC and OGS generation pipeline by I5K Workspace@NAL

__develop__/

Tools under development.

  • example_file/
    • Example files for testing
  • function4gff/
    • Functions for working with GFF3 files
  • inter_model/
    • QC functions that compare features across different models (inter-model) in a GFF3 file.
  • intra_model/
    • QC functions that compare features within a single model (intra-model) in a GFF3 file.
  • single_feature/
    • QC functions that check each feature in a GFF3 file on its own.
  • template/
    • Template script for development
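
The three QC levels listed above (single-feature, intra-model, inter-model) differ in how much context a check needs. The following is a minimal sketch of that distinction, not the code in these directories: the feature dictionaries, function names, and the specific checks are all assumptions made for illustration.

```python
# Hypothetical in-memory features; the field names are chosen for this sketch only.
features = [
    {"id": "gene1", "parent": None, "type": "gene", "start": 100, "end": 500},
    {"id": "mrna1", "parent": "gene1", "type": "mRNA", "start": 100, "end": 600},
    {"id": "gene2", "parent": None, "type": "gene", "start": 900, "end": 800},
    {"id": "gene3", "parent": None, "type": "gene", "start": 100, "end": 500},
]


def single_feature_check(feature):
    """Single-feature QC: inspect one feature in isolation."""
    if feature["start"] > feature["end"]:
        return ["%s: start is greater than end" % feature["id"]]
    return []


def intra_model_check(all_features):
    """Intra-model QC: compare each child feature with its own parent."""
    by_id = {f["id"]: f for f in all_features}
    problems = []
    for f in all_features:
        parent = by_id.get(f["parent"])
        if parent and (f["start"] < parent["start"] or f["end"] > parent["end"]):
            problems.append("%s: extends beyond parent %s" % (f["id"], parent["id"]))
    return problems


def inter_model_check(all_features):
    """Inter-model QC: compare top-level models with each other.

    A real check would also compare seqid and strand; this sketch
    only looks at coordinates.
    """
    seen = {}
    problems = []
    for f in all_features:
        if f["parent"] is not None:
            continue  # only top-level models are compared here
        key = (f["start"], f["end"])
        if key in seen:
            problems.append("%s: same coordinates as %s" % (f["id"], seen[key]))
        else:
            seen[key] = f["id"]
    return problems


single_problems = [p for f in features for p in single_feature_check(f)]
intra_problems = intra_model_check(features)
inter_problems = inter_model_check(features)
```

With the sample data, each level reports exactly one problem: `gene2` has reversed coordinates, `mrna1` extends past its parent, and `gene3` duplicates the coordinates of `gene1`.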

bin/

General scripts for running through the different phases of the OGS pipeline.

  • gff-QC.py
    • Detects GFF format errors (about 50 error types; details can be found on the wiki page)
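
To give a sense of what a format check looks like, here is a small sketch that validates one GFF3 feature line against a few rules from the GFF3 specification (nine tab-separated columns, integer coordinates with start <= end, a valid strand, and a phase on CDS features). It is not the logic of gff-QC.py, whose actual error types are documented on the wiki; the function name `check_gff3_line` and the error messages are assumptions for this example.

```python
VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}


def check_gff3_line(line):
    """Return a list of format problems found in one GFF3 feature line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        # Without nine columns the remaining checks are meaningless.
        return ["expected 9 tab-separated columns, found %d" % len(columns)]

    seqid, source, ftype, start, end, score, strand, phase, attributes = columns
    errors = []

    if not (start.isdigit() and end.isdigit()):
        errors.append("start and end must be positive integers")
    elif int(start) < 1 or int(start) > int(end):
        errors.append("start must be >= 1 and <= end")

    if strand not in VALID_STRANDS:
        errors.append("invalid strand: %r" % strand)

    if phase not in VALID_PHASES:
        errors.append("invalid phase: %r" % phase)
    elif ftype == "CDS" and phase == ".":
        errors.append("CDS features require a phase of 0, 1, or 2")

    return errors


good_line = "chr1\t.\tgene\t100\t200\t.\t+\t.\tID=gene1"
bad_line = "chr1\t.\tCDS\t300\t200\t.\tx\t.\tID=cds1"
good_errors = check_gff3_line(good_line)
bad_errors = check_gff3_line(bad_line)
```

Here `good_errors` is empty, while `bad_errors` holds three messages: reversed coordinates, an invalid strand, and a missing CDS phase.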

lib/

Completed tools are placed in their own directory. Tools still under development appear as symbolic links.

  • gff3_modified/
    • Basic data structure used for nesting the information of genome annotations in GFF3 format.
  • gff3_to_fasta/
    • Extracts specific sequences from the genome sequences according to a GFF file.
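
As an illustration of extracting a sequence by GFF coordinates, the sketch below slices one feature out of an in-memory genome and reverse-complements it for minus-strand features. It does not reproduce the interface of gff3_to_fasta; the function names and the `genome` dictionary are assumptions for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement.get(base, "N") for base in reversed(seq.upper()))


def extract_feature(genome, gff_line):
    """Return the sequence of one GFF3 feature line from a {seqid: sequence} dict."""
    columns = gff_line.rstrip("\n").split("\t")
    seqid = columns[0]
    start, end = int(columns[3]), int(columns[4])
    strand = columns[6]
    # GFF3 coordinates are 1-based and inclusive;
    # Python slices are 0-based and end-exclusive.
    seq = genome[seqid][start - 1:end]
    return reverse_complement(seq) if strand == "-" else seq


genome = {"chr1": "ACGTACGTAC"}  # toy genome for the example
plus_seq = extract_feature(genome, "chr1\t.\tgene\t2\t5\t.\t+\t.\tID=g1")
minus_seq = extract_feature(genome, "chr1\t.\tgene\t2\t5\t.\t-\t.\tID=g2")
```

For the toy genome, `plus_seq` is `CGTA` and `minus_seq` is `TACG`. A real tool would read the genome from a FASTA file and join the child features (for example exons or CDS segments) of each model before writing output.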
