alexandra-zaharia/genome-coverage

Genome Coverage

Given a set of genomes and a large number of reads, GenomeCoverage computes and displays how the reads cover each genome.

Introduction

We start out with the input data:

  • a multi-FASTA file containing several genomes, and
  • a FASTQ file containing several thousand reads (short DNA sequences).

The program searches for every exact occurrence of every read in every genome, then computes and displays genome coverage: for each position in a genome, the number of read occurrences that span that position.
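
To make the definition of coverage concrete, here is a minimal sketch of the computation for a single genome. It is not the project's actual code: the class name `CoverageSketch` and the method `coverage` are made up for illustration, and `String.indexOf` stands in for whichever search method is selected.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; names do not come from the GenomeCoverage sources.
final class CoverageSketch {

    /** Returns, for each genome position, the number of read occurrences spanning it. */
    static int[] coverage(String genome, List<String> reads) {
        // Difference array: +1 where an occurrence starts, -1 just past its end.
        int[] diff = new int[genome.length() + 1];
        for (String read : reads) {
            if (read.isEmpty()) continue; // an empty read would match everywhere
            int start = genome.indexOf(read);
            while (start >= 0) {
                diff[start]++;
                diff[start + read.length()]--;
                // Advance by one position only, so overlapping occurrences are found too.
                start = genome.indexOf(read, start + 1);
            }
        }
        // A running sum over the difference array yields the per-position coverage.
        int[] coverage = new int[genome.length()];
        int running = 0;
        for (int i = 0; i < coverage.length; i++) {
            running += diff[i];
            coverage[i] = running;
        }
        return coverage;
    }

    public static void main(String[] args) {
        int[] c = coverage("ACGTACGT", List.of("ACG", "GTAC", "CGT"));
        System.out.println(Arrays.toString(c)); // prints [1, 2, 3, 2, 2, 3, 2, 1]
    }
}
```

The difference array keeps the cost of recording one occurrence constant regardless of read length, which matters when thousands of reads hit the same genome.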

Screenshot

Features

  • Overlapping reads are allowed.
  • Occurrence search methods can be plugged in at any time by extending the abstract class PatternSearch. For now, only two types of search are implemented: naïve search (slow) and suffix array pattern search (really fast). The search method can be selected at run time.
  • The user can:
    • Save the coverage chart for any given genome.
    • Save the coverage charts for all genomes.
    • View a coverage chart in full screen.
    • Superimpose several coverage charts on a single graphic (and save this file).
    • Save read occurrences in a parsable output file.
    • Get an orange "About" pop-up :-)
    • Get usage instructions (in French) :-)
    • See stack traces in a scrollable Swing panel along with a helpful error message, but of course there are never any errors :-)
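
The pluggable search design can be sketched as follows. This is an illustration under assumptions, not the project's code: `SearchSketch` is a hypothetical stand-in for `PatternSearch` (whose real constructor and method signatures may differ), and the suffix array here is built with a plain comparison sort rather than the Sedgewick and Wayne implementation the project is based on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for PatternSearch; the real signatures may differ. */
abstract class SearchSketch {
    protected final String text;

    SearchSketch(String text) { this.text = text; }

    /** Returns the sorted start positions of every exact occurrence of pattern. */
    abstract List<Integer> findAll(String pattern);
}

/** Naive search: test every start position, O(n * m) per pattern. */
final class NaiveSearchSketch extends SearchSketch {
    NaiveSearchSketch(String text) { super(text); }

    @Override
    List<Integer> findAll(String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) return hits;
        for (int i = 0; i + pattern.length() <= text.length(); i++) {
            if (text.startsWith(pattern, i)) hits.add(i);
        }
        return hits;
    }
}

/** Suffix array search: sort the suffixes once, then binary-search each pattern. */
final class SuffixArraySearchSketch extends SearchSketch {
    private final Integer[] suffixes; // suffix start positions in lexicographic order

    SuffixArraySearchSketch(String text) {
        super(text);
        suffixes = new Integer[text.length()];
        for (int i = 0; i < suffixes.length; i++) suffixes[i] = i;
        // Simple comparison sort; adequate for a sketch, not the fastest construction.
        Arrays.sort(suffixes, (a, b) -> text.substring(a).compareTo(text.substring(b)));
    }

    @Override
    List<Integer> findAll(String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) return hits;
        // All suffixes that start with the pattern are contiguous in the sorted order.
        int first = bound(pattern, false);
        int last = bound(pattern, true);
        for (int i = first; i < last; i++) hits.add(suffixes[i]);
        Collections.sort(hits);
        return hits;
    }

    /** First index whose truncated suffix is >= pattern (strict=false) or > pattern (strict=true). */
    private int bound(String pattern, boolean strict) {
        int lo = 0, hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int pos = suffixes[mid];
            int end = Math.min(text.length(), pos + pattern.length());
            int cmp = text.substring(pos, end).compareTo(pattern);
            boolean goRight = strict ? cmp <= 0 : cmp < 0;
            if (goRight) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

Because both classes share the same abstract parent, calling code can hold a `SearchSketch` reference and pick the concrete implementation at run time, for example `new SuffixArraySearchSketch(genome).findAll(read)`.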

Running GenomeCoverage

GenomeCoverage can be run from the provided JAR:

java -jar GenomeCoverage.jar

Alternatively, GenomeCoverage can be run from the compiled classes in a clone of the repository:

git clone https://github.com/alexandra-zaharia/genome-coverage.git
cd genome-coverage/out/production/GenomeCoverage
java -cp "../../../lib/*:." io.github.alexandra.zaharia.gui.GUI

Test files

A multi-FASTA file containing 8 genomes of about 8,000 nucleotides each is provided: HPV.fna

wget https://raw.githubusercontent.com/alexandra-zaharia/genome-coverage/master/res/HPV.fna

A FASTQ file containing 10,000 reads of 100 nucleotides each is provided: reads.fq

wget https://raw.githubusercontent.com/alexandra-zaharia/genome-coverage/master/res/reads.fq
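
For readers who want to inspect these files programmatically, the following is a rough sketch of how the two formats can be read. It is not taken from the project: the class `SequenceFilesSketch` and its methods are hypothetical, and the FASTQ reader assumes the common layout of exactly four lines per record (header, sequence, `+` line, qualities) with no line wrapping.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; names do not come from the GenomeCoverage sources.
final class SequenceFilesSketch {

    /** Reads a multi-FASTA stream into header -> sequence, preserving file order. */
    static Map<String, String> readFasta(Reader in) throws IOException {
        Map<String, StringBuilder> parts = new LinkedHashMap<>();
        StringBuilder current = null;
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                // A '>' line starts a new genome; the rest of the line is its header.
                current = new StringBuilder();
                parts.put(line.substring(1), current);
            } else if (current != null) {
                // Sequence lines may be wrapped, so concatenate them.
                current.append(line);
            }
        }
        Map<String, String> genomes = new LinkedHashMap<>();
        parts.forEach((header, seq) -> genomes.put(header, seq.toString()));
        return genomes;
    }

    /** Returns the sequence line of each 4-line FASTQ record (assumed unwrapped). */
    static List<String> readFastq(Reader in) throws IOException {
        List<String> reads = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String header;
        while ((header = reader.readLine()) != null) {
            if (header.isEmpty()) continue; // tolerate blank lines between records
            String sequence = reader.readLine();
            String separator = reader.readLine();
            String qualities = reader.readLine();
            if (!header.startsWith("@") || sequence == null || separator == null
                    || !separator.startsWith("+") || qualities == null) {
                throw new IOException("Malformed FASTQ record near: " + header);
            }
            reads.add(sequence.trim());
        }
        return reads;
    }
}
```

Both methods take a `Reader`, so they can be fed a `java.io.FileReader` for HPV.fna or reads.fq, or a `java.io.StringReader` when experimenting.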

Notes

  • I wrote this project for an assignment in 2014, when I was a first-year Master's student. Class, method and variable names are in English, but comments and documentation are in French.

  • Reads are considered error-free.

  • This project uses JFreeChart.

  • The suffix array implementation is based on an existing implementation by Robert Sedgewick and Kevin Wayne.

  • I used a workaround from Stack Overflow for an issue where java.awt.Desktop fails to open a URL in the default web browser under Linux when libgnome2-0 is not installed.
