BED file reader/writer #20

Merged
merged 3 commits into from Jan 11, 2017


alumi
Member

@alumi alumi commented Jan 5, 2017

BED file reader/writer

Added a BED file reader and writer.
The BED format handled here follows the UCSC description: https://genome.ucsc.edu/FAQ/FAQformat#format1

e.g.

(def bed
  (with-open [r (clojure.java.io/reader "PATH/TO/BED/FILE")]
    (doall (cljam.bed/read-fields r))))
;; cljam.bed/read-fields returns a lazy sequence of maps

bed => `({:chr "chr1" :start 1 :end 100 :name "Plus1" :score 0 :strand :plus :thick-start 1 :thick-end 0
         :item-rgb "255,0,0" :block-count 2 :block-sizes [10 90] :block-starts [0 10]}
        {:chr "chr2" :start 1 :end 2000 ...} ...)

(with-open [w (clojure.java.io/writer "PATH/TO/BED/FILE")]
  (cljam.bed/write-fields w bed))
;; Writing BED fields is also supported

(with-open [r (cljam.bam/reader "PATH/TO/BAM/FILE")]
  (doall (cljam.io/read-alignments r (first bed))))
;; => ({:qname "Read1" :flags 4 :rname "chr1" ...} ...)
;; A map of BED fields can be used directly as a range-based query
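
To make the shape of these maps concrete, here is a minimal sketch of turning one BED line into a map. `parse-bed-line` is a hypothetical helper written for illustration, not the implementation in `cljam.bed`; it assumes tab-separated input, handles only the first six columns, and leaves coordinates exactly as written in the file (no conversion to cljam's style).

```clojure
(require '[clojure.string :as cstr])

(defn parse-bed-line
  "Hypothetical helper: parse the first six tab-separated BED columns
  into a map. Coordinates are kept as they appear in the file."
  [line]
  (let [[chr start end nm score strand] (cstr/split line #"\t")]
    (cond-> {:chr   chr
             :start (Long/parseLong start)
             :end   (Long/parseLong end)}
      ;; Optional columns are added only when present on the line.
      nm     (assoc :name nm)
      score  (assoc :score (Long/parseLong score))
      strand (assoc :strand (case strand
                              "+" :plus
                              "-" :minus
                              nil)))))

(parse-bed-line "chr1\t0\t100\tPlus1\t0\t+")
;; => {:chr "chr1", :start 0, :end 100, :name "Plus1", :score 0, :strand :plus}
```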

Range-based query

BED files use a zero-based start index and an exclusive end index,
which means that chr1 0 3 represents the first three bases of chromosome 1.
Currently, cljam uses a one-based index with an inclusive end for the BAM APIs,
but a zero-based index with an exclusive end for the FASTA and pileup APIs.

To provide consistent APIs, this PR unifies them on a one-based index and an inclusive end.
Thus, {:chr "chr1" :start 1 :end 3} represents the first three bases of chromosome 1.
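
The conversion between the two conventions can be sketched as follows. The function names `bed->one-based` and `one-based->bed` are made up for this illustration and are not necessarily what the PR uses; the sketch also touches only `:start` and `:end`, whereas a full BED record has other coordinate fields that would need the same treatment.

```clojure
(defn bed->one-based
  "Convert a zero-based, end-exclusive interval to a one-based,
  end-inclusive one. Only :start shifts: the exclusive zero-based end
  is numerically equal to the inclusive one-based end."
  [m]
  (update m :start inc))

(defn one-based->bed
  "Inverse of bed->one-based."
  [m]
  (update m :start dec))

;; BED line `chr1 0 3`: the first three bases of chromosome 1.
(bed->one-based {:chr "chr1" :start 0 :end 3})
;; => {:chr "chr1", :start 1, :end 3}
```

Because only the start shifts, the interval length is `(- end start)` in BED coordinates and `(inc (- end start))` in the one-based inclusive form.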

Please be aware that this is a destructive change!

@alumi alumi changed the title BED file reader BED file reader/writer Jan 5, 2017
@totakke
Member

totakke commented Jan 6, 2017

> To provide consistent APIs, this PR unifies the APIs as zero-based index and inclusive end.

Is "zero-based index" a typo for "one-based index"?

This change is destructive, as you say. I think this PR should be merged after the release of v0.1.6 (followed by a version bump to v0.2.0-SNAPSHOT).

@alumi
Member Author

alumi commented Jan 6, 2017

> > To provide consistent APIs, this PR unifies the APIs as zero-based index and inclusive end.
>
> Is "zero-based index" a typo for "one-based index"?

Sorry about the mistake 🙇 It's fixed now.

> bump up to v0.2.0-SNAPSHOT

LGTM 👍 I have some projects depending on v0.1.6-SNAPSHOT.

Member

@totakke totakke left a comment

I've added some comments on code formatting. The BED I/O functions themselves look fine.

@@ -0,0 +1,135 @@
(ns cljam.bed
(:require [clojure.string :as str]
Member

Please use cstr as the alias for clojure.string in the cljam project.

(defn- normalize [m]
"Normalize BED fields.
BED fields are stored in format: 0-origin and inclusive-start / exclusive-end.
This function converts the coordinate into cljam style: 1-origin and inclusive-start / inclusive-end."
Member

The arg vector and docstring are in reverse order; the docstring should come before the arg vector.
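
For context, a small sketch of why the order matters in Clojure. The function names below are invented for the illustration and do not come from the PR.

```clojure
;; Docstring before the arg vector: stored as :doc metadata on the var.
(defn- documented
  "Returns its argument unchanged."
  [m]
  m)

;; Arg vector first: the string is just the first expression of the body.
;; It is evaluated and discarded, so the var carries no :doc metadata.
(defn- undocumented [m]
  "Returns its argument unchanged."
  m)

(:doc (meta #'documented))   ;; => "Returns its argument unchanged."
(:doc (meta #'undocumented)) ;; => nil
```

Both functions behave identically when called; only the documentation visible to `doc` and tooling differs.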


(defn- denormalize [m]
"De-normalize BED fields.
This is an inverse function of normalize."
Member

The arg vector and docstring are in reverse order here as well.

@alumi
Member Author

alumi commented Jan 6, 2017

Thank you for your advice. I've fixed them and rebased.

@totakke totakke merged commit d4f2ba0 into master Jan 11, 2017
totakke added a commit that referenced this pull request Jan 11, 2017
@totakke
Member

totakke commented Jan 11, 2017

Thank you for the fixes. I've just merged this PR.

@totakke totakke deleted the feature/bed-reader branch January 12, 2017 04:05