Visualizing-genome-annotations

D3 JavaScript interactive visualization of genome feature (gff) files

This introduces a framework to create an interactive and informative visualization of a genome feature file (GFF).

General Feature Format (GFF), also known as Gene-Finding Format, is a file format that describes the features of genomic and protein sequences. A GFF file is a tab-delimited text file in which each feature is described on a single line.

More information about GFF format can be found at Wellcome Trust Sanger Institute.

For example, the maize GFF file I used looks like this:

9	ensembl	chromosome	1	156750706	.	.	.	ID=9;Name=chromosome:AGPv2:9:1:156750706:1
9	ensembl	gene	66347	68582	.	-	.	ID=GRMZM2G354611;Name=GRMZM2G354611;biotype=protein_coding
9	ensembl	mRNA	66347	68582	.	-	.	ID=GRMZM2G354611_T01;Parent=GRMZM2G354611;Name=GRMZM2G354611_T01;biotype=protein_coding
9	ensembl	intron	68433	68561	.	-	.	Parent=GRMZM2G354611_T01;Name=intron.1
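To make the column layout concrete, here is a minimal Python sketch that splits one GFF line into its nine tab-separated columns and turns the last column into a dictionary. The function name `parse_gff_line` and the field names are illustrative assumptions; this is not the parser shipped in gff_parser.py.

```python
def parse_gff_line(line):
    """Split one tab-delimited GFF line into a dict of its nine columns.

    Hypothetical helper for illustration; not part of gff_parser.py.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    # The ninth column holds semicolon-separated key=value pairs.
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()

    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


# The gene line from the maize sample above, with explicit tab characters.
line = ("9\tensembl\tgene\t66347\t68582\t.\t-\t.\t"
        "ID=GRMZM2G354611;Name=GRMZM2G354611;biotype=protein_coding")
feature = parse_gff_line(line)
print(feature["type"], feature["start"], feature["end"])
```

The `ID` and `Parent` attributes extracted this way are what link a feature to its parent feature.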

The visualization is built with the D3.js JavaScript framework.

Usage:

We first need to create a CSV file containing the relevant information. A Python script parses the GFF file; the parsing code is taken from GFF Parser. Our script, main.py, creates a CSV file named parentChild.csv. Choose the chromosome and the segment of that chromosome to explore, e.g.:
```
  python main.py begin_segment end_segment chromosome gff
  python main.py 5000 50000 chromosomeI sample1.gff
```
Open gffTree.html in any browser; it contains the tree visualization.
In short, the main script is main.py. After running it as shown above, open the HTML file directly to view the tree.
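The segment selection can be sketched as follows. This is only an illustration under stated assumptions: the README does not document whether main.py keeps features that overlap the segment or only those fully inside it, nor the column layout of parentChild.csv. The overlap test, the helper names and the CSV header below are all hypothetical.

```python
import csv
import io


def overlaps_segment(feature, chromosome, begin, end):
    """True if the feature is on `chromosome` and overlaps [begin, end].

    Assumption: overlap is used rather than strict containment.
    """
    return (feature["seqid"] == chromosome
            and feature["start"] <= end
            and feature["end"] >= begin)


def write_parent_child(features, out):
    """Write one row per feature; the column names are assumed, not main.py's."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "parent", "type", "start", "end"])
    for f in features:
        writer.writerow([f["id"], f["parent"], f["type"], f["start"], f["end"]])


# Made-up features, for illustration only.
features = [
    {"seqid": "chromosomeI", "id": "geneA", "parent": "chromosomeI",
     "type": "gene", "start": 6000, "end": 9000},
    {"seqid": "chromosomeI", "id": "geneB", "parent": "chromosomeI",
     "type": "gene", "start": 60000, "end": 70000},
    {"seqid": "chromosomeII", "id": "geneC", "parent": "chromosomeII",
     "type": "gene", "start": 7000, "end": 8000},
]

# Mirrors the arguments of: python main.py 5000 50000 chromosomeI sample1.gff
selected = [f for f in features
            if overlaps_segment(f, "chromosomeI", 5000, 50000)]

buffer = io.StringIO()
write_parent_child(selected, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Only `geneA` is selected here: `geneB` lies outside the segment and `geneC` is on a different chromosome.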

Exploration:

The visualization is a tree encoding the parent-child relationships of the genomic annotations: each chromosome has several genes, and each gene, in turn, is composed of one or more transcripts, which are further divided into coding regions.
Each node in the tree represents an annotation, e.g. chromosome, gene, transcript, CDS.
Each node is collapsible: clicking a node collapses its children into it.
Each node carries the type of annotation it encodes and its start and end positions. To see them, move the mouse over the node's name (text); the values pop up on the screen.
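The parent-child structure described above can be sketched in Python as a nested dictionary of nodes, each with a `children` list, which is the general shape D3 tree layouts work with. The function `build_tree` and the record fields are hypothetical. Attaching the gene to its chromosome, and using `Name` as the identifier for the intron (which has no `ID` in the sample), are assumptions made for this illustration rather than documented behavior of main.py.

```python
def build_tree(features):
    """Nest flat feature records into a tree using their parent references.

    Each record needs 'id', 'parent' (None for a root), 'type', 'start', 'end'.
    Hypothetical helper for illustration only.
    """
    nodes = {}
    for f in features:
        nodes[f["id"]] = {
            "name": f["id"],
            "type": f["type"],
            "start": f["start"],
            "end": f["end"],
            "children": [],
        }

    roots = []
    for f in features:
        node = nodes[f["id"]]
        parent = nodes.get(f["parent"])
        if parent is None:
            roots.append(node)  # no known parent: treat as a root
        else:
            parent["children"].append(node)
    return roots


def print_tree(node, depth=0):
    """Print one node per line, indented by its depth in the tree."""
    print("  " * depth + "%s %s [%d, %d]"
          % (node["type"], node["name"], node["start"], node["end"]))
    for child in node["children"]:
        print_tree(child, depth + 1)


# Records based on the maize sample lines above.
features = [
    {"id": "9", "parent": None, "type": "chromosome",
     "start": 1, "end": 156750706},
    {"id": "GRMZM2G354611", "parent": "9", "type": "gene",
     "start": 66347, "end": 68582},
    {"id": "GRMZM2G354611_T01", "parent": "GRMZM2G354611", "type": "mRNA",
     "start": 66347, "end": 68582},
    {"id": "intron.1", "parent": "GRMZM2G354611_T01", "type": "intron",
     "start": 68433, "end": 68561},
]

roots = build_tree(features)
for root in roots:
    print_tree(root)
```

Each node keeps its type, start and end alongside its name, which is the information the tooltip displays on hover.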

See the visualization in action at bl.ocks

Jverma/Visualizing-genome-annotations
