mreid / splice

Playing around with BioJava and Clojure


DNA Analysis

This code is used to analyse DNA sequences using BioJava and Clojure.

Setup

This project depends on several data files and libraries, all of which must be installed before the code will run.

We assume that you are running Mac OS X Leopard, that downloads go to ~/Downloads, and that commands are run from the directory containing this file. The following directories need to be created manually:

$ mkdir -p data/dm4p1 vendor

NOTE: In the interests of space, the files installed here are not (and should not be) checked into the source control repository. If you need to set up a new local copy of this project, you will have to download and set up these files manually.

Clojure

  1. Follow these instructions for setting up Clojure for Mac OS X Leopard.

  2. Use the following command to add the extra jars to Clojure's classpath:
    $ echo "vendor/biojava.jar:vendor/bytecode.jar" > .clojure

D. melanogaster (4.1) EID data

  1. Download dm4p1.EID.tar.gz.

  2. Unpack to the data directory (if your browser left the download compressed, use dm4p1.EID.tar.gz as the file name instead):
    $ tar xf ~/Downloads/dm4p1.EID.tar -C data/dm4p1

BioJava

  1. Download the jar file for BioJava 1.6.

  2. Download the bytecode jar file (required for BioJava).

  3. Copy both jars to the vendor directory:
    $ cp ~/Downloads/biojava.jar vendor ; cp ~/Downloads/bytecode.jar vendor
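
Once the jars are copied, it can be worth confirming that every entry listed in .clojure actually exists, since a mistyped path may only surface later as a class-loading error. The following is a minimal sketch, not part of this project: the function name missing_classpath_entries is hypothetical, and it assumes the file holds a single colon-separated classpath line with no spaces in the paths.

```shell
# Hypothetical helper (not part of this repository): print every entry
# of a colon-separated classpath file that does not exist on disk.
# Returns non-zero if at least one entry is missing.
missing_classpath_entries() {
  cp_file="${1:-.clojure}"
  status=0
  # Turn "a.jar:b.jar" into one entry per word and test each in turn.
  # Assumes the paths contain no whitespace.
  for entry in $(tr ':' '\n' < "$cp_file"); do
    if [ ! -e "$entry" ]; then
      echo "missing: $entry"
      status=1
    fi
  done
  return $status
}
```

Run from the project directory, `missing_classpath_entries .clojure && echo "classpath OK"` prints the confirmation only when both jars are in place.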

Simple BioJava Test

There is a simple test file test.clj, written in Clojure, that reads in the start of the dm4p1.dEID file and matches a pattern against it.

Once everything is set up correctly, run the test from the project directory (the one containing this file):

$ clj test.clj

It should print the following:

Sequence "gattggggcaaagtttatccaaatatgtctggagatggtgctcttggtatgcttattaatcgtaaagcagatatatgcattggagctatgtactcgtggtacgaagattacacatacttagacctttctatgtatcttgtacgttctggaataacgtgtcttgtaccagcgcctttgcggttgactagttggtaccttcccttagagcctttcaaagaaactttatgggctgcaattctattatgtctatgtgcagaagccacaggattggttttagcatataaaagtgagcaggcgctgtatgtactgcctggctaccgagagggctggtggacttgtacaagctttggagtatgtaccacctttaaacttttcatatcgcaatcaggaaacagcaaggcatattcactgacagttcgtgtactactctttgcctgcttccttaatgatttaataataaccagcatatatggtggcggccttgctagtatattgacaattcctagcatggacgaggcagccgacactgtcacccgcttgcgatttcaccgattacagtgggcggccaactcagaggcatgggtctcggccatacgcgcttctgatgaggtaagtgttttaatgaagatcaatatcatcttagaagcatacgtttctttctatgaaaggcattagtgaaggatatattgtacaattttcacatctatagcgacgatgagttgctacgcttagcacaggaccagcatatgcgcattggatttactgtggagcgtctgccattcggtaacaacaaaaattaatgacgggaaccaatattatatttttcttgttctgtaggtcactttgctatcggaaactatttggggcctcaagcgattgaccagttagttataatgaaggacgatatttattttcaatatacggtggcttttgttcccagactttggcccctcctcgataaattaaatactctgatatatagctggcattcgtctggtttcgataaatactgggaatatcgagttgttgccgataacttaaatctgaagatacaacaacaagttcaagaaacaatgacaggaactaaagatattggtccagtcccgcttggaatgtcaaactttgcgggatttataattgtctggatattaggatctgctatagctacattaacttttttgttagaactatcactgacatatattttaaaacagagcaatctgaaataa"
Pattern "agnct"
Match "agcct"
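
In the pattern above, n stands for any nucleotide, which is why agnct matches agcct. To illustrate the idea without BioJava, the sketch below expands n into a character class and searches with grep. It is an assumption-laden stand-in rather than what test.clj does: the function name first_match is hypothetical, it handles only lowercase a, c, g, t and n, and it ignores the other ambiguity codes.

```shell
# Hypothetical helper (not how test.clj works): expand the ambiguity
# code "n" to the character class [acgt], then print the first
# substring of the sequence that matches the resulting regex.
first_match() {
  pattern="$1"
  sequence="$2"
  regex=$(printf '%s' "$pattern" | sed 's/n/[acgt]/g')
  # grep -o prints each match on its own line; keep only the first.
  printf '%s\n' "$sequence" | grep -o "$regex" | head -n 1
}

first_match agnct ttagcctaa   # prints agcct
```

The same function can be pointed at the full sequence printed above by passing it as the second argument.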