forked from wang-q/withncbi

egaz and alignDB work with external (NCBI/EBI) data


IvanWoo22/withncbi

 
 


Name

withncbi - egaz and alignDB work with external (NCBI/EBI) data.

Purpose

Fetch sequences, generate reports and build alignments according to various NCBI databases.

For more details, see the README.md in each subdirectory.

Directory organization

  • db/: turn NCBI genome reports and assembly reports into a queryable MySQL database.

  • ensembl/: Ensembl related scripts.

  • misc/: miscellaneous projects.

  • pop/: build alignments across a whole eukaryotic genus.

  • taxon/: process (small) genomes according to NCBI Taxonomy.

  • util/: miscellaneous utilities.

Conventions

fasta

  • .fa - genomic sequences
  • .fas - blocked fasta files
  • .fasta - normal/miscellaneous fasta files

fastq

Use .fq over .fastq

Concepts

IntSpans

An IntSpan represents a set of integers as a list of inclusive ranges, for example '1-10,19,45-48'.

The following picture is the schema of an IntSpan object. Jump lines are above the baseline; loop lines are below it.

[Figure: intspans]

AlignDB::IntSpan and jintspan are implementations of IntSpan objects in Perl and Java, respectively.
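To make the runlist notation concrete, here is a minimal Python sketch that converts between a runlist string and a list of inclusive ranges. It is not the API of AlignDB::IntSpan or jintspan; the function names `parse_runlist`, `to_runlist` and `cardinality` are hypothetical, and the sketch assumes non-negative integers only.

```python
def parse_runlist(text):
    """Parse a runlist such as '1-10,19,45-48' into sorted, merged (start, end) pairs.

    Assumption: all integers are non-negative, so '-' only ever separates
    the start and end of a range.
    """
    spans = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
        else:
            lo = hi = int(part)
        if lo > hi:
            raise ValueError(f"start is greater than end in {part!r}")
        spans.append((lo, hi))

    # Merge overlapping or adjacent ranges so the result is canonical.
    spans.sort()
    merged = []
    for lo, hi in spans:
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def to_runlist(spans):
    """Format (start, end) pairs back into a runlist string."""
    return ",".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in spans)


def cardinality(spans):
    """Count the integers covered by the inclusive ranges."""
    return sum(hi - lo + 1 for lo, hi in spans)


spans = parse_runlist("1-10,19,45-48")
print(spans)               # [(1, 10), (19, 19), (45, 48)]
print(to_runlist(spans))   # 1-10,19,45-48
print(cardinality(spans))  # 15
```

Storing only the range boundaries keeps large, mostly contiguous sets (such as covered genome positions) compact.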

Positions

Examples in S288c.txt

I:1-100
I(+):90-150
S288c.I(-):190-200
II:21294-22075
II:23537-24097

[Figure: positions]

Simple rules:

  • chromosome and start are required
  • species, strand and end are optional
  • . to separate species and chromosome
  • strand is either + or - and is surrounded by round brackets
  • : to separate names and digits
  • - to separate start and end
  • names should be alphanumeric and without spaces
species.chromosome(strand):start-end
--------^^^^^^^^^^--------^^^^^^----
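
The rules above can be sketched as a single regular expression. This is a minimal Python illustration, not the parser used by the repository's tools; `POSITION_RE` and `parse_position` are hypothetical names. Two assumptions are made: names may contain underscores as well as alphanumerics (headers such as `gi_151941327` appear in the blocked fasta example), and missing optional fields are returned as `None` rather than filled with defaults.

```python
import re

# species.chromosome(strand):start-end
# Only chromosome and start are required.
POSITION_RE = re.compile(
    r"(?:(?P<species>\w+)\.)?"     # optional species, followed by '.'
    r"(?P<chromosome>\w+)"         # required chromosome name
    r"(?:\((?P<strand>[+-])\))?"   # optional strand in round brackets
    r":(?P<start>\d+)"             # required start, after ':'
    r"(?:-(?P<end>\d+))?",         # optional end, after '-'
    re.ASCII,
)


def parse_position(text):
    """Split a position string into its fields; absent optional fields are None."""
    match = POSITION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid position: {text!r}")
    fields = match.groupdict()
    fields["start"] = int(fields["start"])
    if fields["end"] is not None:
        fields["end"] = int(fields["end"])
    return fields


for example in ["I:1-100", "I(+):90-150", "S288c.I(-):190-200"]:
    print(example, "->", parse_position(example))
```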

Runlists in YAML

App::RL

jrunlist

Blocked fasta files

Examples in example.fas

>S288c.I(+):13267-13287|species=S288c
TCGTCAGTTGGTTGACCATTA
>YJM789.gi_151941327(-):5668-5688|species=YJM789
TCGTCAGTTGGTTGACCATTA
>RM11.gi_61385832(-):5590-5610|species=RM11
TCGTCAGTTGGTTGACCATTA
>Spar.gi_29362400(+):2477-2497|species=Spar
TCATCAGTTGGCAAACCGTTA

[Figure: blocked-fasta-files]

App::Fasops
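
As an illustration of the header layout in the example above, the following Python sketch reads the records of one block and splits each header into its position and its attribute. It is not part of App::Fasops; `read_fasta_records` and `split_header` are hypothetical helpers, and the header grammar (a position, then `|`, then a single `key=value` pair) is inferred from the example only.

```python
EXAMPLE = """\
>S288c.I(+):13267-13287|species=S288c
TCGTCAGTTGGTTGACCATTA
>YJM789.gi_151941327(-):5668-5688|species=YJM789
TCGTCAGTTGGTTGACCATTA
>RM11.gi_61385832(-):5590-5610|species=RM11
TCGTCAGTTGGTTGACCATTA
>Spar.gi_29362400(+):2477-2497|species=Spar
TCATCAGTTGGCAAACCGTTA
"""


def read_fasta_records(text):
    """Return (header, sequence) pairs in file order."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_header(header):
    """Split 'position|key=value' into (position, {key: value}).

    Assumption: at most one key=value pair follows the '|'.
    """
    position, _, attr_text = header.partition("|")
    attrs = {}
    if attr_text:
        key, _, value = attr_text.partition("=")
        attrs[key] = value
    return position, attrs


for header, sequence in read_fasta_records(EXAMPLE):
    position, attrs = split_header(header)
    print(attrs.get("species"), position, len(sequence))
```

In this example every sequence is 21 bases long, which matches the length implied by each header's start and end coordinates.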

Ranges and links of ranges

App::Rangeops

jrange

Author

Qiang Wang <wang-q@outlook.com>

Copyright and license

This software is copyright (c) 2015 by Qiang Wang.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
