Convert sequence IDs between ucsc/refseq/genbank


Squidstream (an amazing tool that does wonderful stuff)—Documentation

Generic Feature Format version 3 (GFF3) is a file format commonly used in bioinformatics applications. Institutions follow different naming conventions for the sequence identifier (seqid) column, so GFF3 files from different sources can use different seqids for the same sequence. Other file formats, such as GTF, BED, SAM, and BAM, also carry sequence identifiers. Squidstream is an easy-to-use command-line tool that converts the reference names of chromosomes, scaffolds, and contigs in these file formats to the corresponding seqid from NCBI’s RefSeq database. GFF3 files are a common input to many bioinformatics tools and pipelines, and Squidstream keeps naming consistent across these inputs by converting every sequence ID in a file to the desired ID format with a single command.

Figure 1. Examples of NCBI, UCSC, and RefSeq GFF3 files.

Sequence Identifier Conversion Examples:

  • Convert annotations from RefSeq IDs to UCSC IDs for use in UCSC Genome Browser tracks
  • Convert to NCBI IDs to search the KEGG GENES database
  • Convert RefSeq IDs to GenBank IDs
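
Each of these conversions comes down to rewriting the seqid, which is the first tab-separated column of every GFF3 feature line. The sketch below illustrates that idea under stated assumptions and is not Squidstream's implementation: the function name `convert_seqids` and the two-entry `UCSC_TO_REFSEQ` table are hypothetical, the table covers only two GRCh38 sequences, and embedded FASTA sections are not handled.

```python
# Hypothetical lookup table for illustration only:
# UCSC names -> RefSeq accessions for two GRCh38 sequences.
UCSC_TO_REFSEQ = {
    "chr1": "NC_000001.11",
    "chrM": "NC_012920.1",
}


def convert_seqids(lines, mapping):
    """Yield GFF3 lines with the seqid (column 1) replaced via `mapping`.

    Seqids that are missing from `mapping` are left unchanged.
    """
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith("##sequence-region"):
            # Directive layout: ##sequence-region seqid start end
            parts = text.split()
            if len(parts) >= 2:
                parts[1] = mapping.get(parts[1], parts[1])
            yield " ".join(parts)
        elif text.startswith("#") or not text.strip():
            # Other directives, comments, and blank lines pass through.
            yield text
        else:
            fields = text.split("\t")
            fields[0] = mapping.get(fields[0], fields[0])
            yield "\t".join(fields)


# Made-up sample records, used only to exercise the function.
example = [
    "##gff-version 3",
    "##sequence-region chr1 1 248956422",
    "chr1\texample\tgene\t11874\t14409\t.\t+\t.\tID=gene0",
]
converted = list(convert_seqids(example, UCSC_TO_REFSEQ))
```

Leaving unknown seqids untouched is one possible design choice; a stricter tool might instead report them so that a partially converted file is not mistaken for a fully converted one.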

Squidstream was built in Python and runs from the command line. Users provide the input file, the specific reference genome, and the desired name of the output file.
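
To make those three inputs concrete, here is a minimal sketch of a command-line wrapper built with Python's standard `argparse` module. It is a stand-in written under assumptions, not Squidstream's actual interface: the program name `seqid_sketch`, the argument names `input`, `--genome`, and `output`, and the `MAPPINGS` table are all hypothetical.

```python
import argparse

# Hypothetical per-genome tables; a real tool would load complete mappings.
MAPPINGS = {
    "GRCh38": {"chr1": "NC_000001.11", "chrM": "NC_012920.1"},
}


def build_parser():
    """Build a parser for the three inputs: input file, genome, output file."""
    parser = argparse.ArgumentParser(prog="seqid_sketch")
    parser.add_argument("input", help="path to the file to convert")
    parser.add_argument("--genome", required=True, choices=sorted(MAPPINGS),
                        help="reference genome whose mapping table to use")
    parser.add_argument("output", help="path for the converted file")
    return parser


def run(argv):
    """Parse `argv`, rewrite column 1 of each data line, and write the output."""
    args = build_parser().parse_args(argv)
    mapping = MAPPINGS[args.genome]
    with open(args.input) as src, open(args.output, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # comments and directives are copied unchanged
                continue
            fields = line.rstrip("\n").split("\t")
            fields[0] = mapping.get(fields[0], fields[0])
            dst.write("\t".join(fields) + "\n")
    return args
```

Calling `run(["--genome", "GRCh38", "in.gff3", "out.gff3"])` would read `in.gff3` and write the converted copy to `out.gff3`; both file names are placeholders.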

A summary of the seqconv commands is provided below.

Command    Description
convert    Converts sequence IDs

Links to file format descriptions: GFF3, SAM, BED, GFF/GTF



python install



Squidstream Workflow:

Figure 2. Squidstream workflow.