New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-327] Adding gene, transcript, and exon models. #404

Merged
merged 4 commits into from Oct 13, 2014

Conversation

Projects
None yet
4 participants
@tdanford
Contributor

tdanford commented Oct 7, 2014

A few major changes in these commits:

  • Upgrading to bdg-formats 0.3.1
  • This version of bdg-formats changes the Pileup schema, so we fix all the errors that this introduces -- and we eliminate the Pileup aggregation code
  • This version of bdg-formats also changes the Feature schema, so we've updated the feature parsing code to create these new features correctly
  • The FeatureHierarchy classes have been rewritten as wrappers around Feature
  • Added Gene, Transcript, Exon, and UTR case classes to represent gene models, and added code for assembling them from 'flat' Features.
@tdanford

This comment has been minimized.

Show comment
Hide comment
@tdanford

tdanford Oct 7, 2014

Contributor

This PR is replacing the previously-open PR with some of the same code, #359

Contributor

tdanford commented Oct 7, 2014

This PR is replacing the previously-open PR with some of the same code, #359

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 7, 2014

Member

+1, great to see this!

Member

fnothaft commented Oct 7, 2014

+1, great to see this!

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 7, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/272/

Build result: FAILURE

GitHub pull request #404 of commit 17ecc29 automatically merged.[EnvInject] - Loading node environment variables.Building remotely on test03.amplab (CentOS 6.5) (test03 centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/404/merge^{commit} # timeout=10Checking out Revision 7df3c73 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 7df3c73 > git rev-list c6703100935589fbf08b7dd7f62fa59728f34606 # timeout=10Triggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 1.0.4,centos completed with result FAILUREADAM-prb » 2.3.0,centos completed with result FAILUREADAM-prb » 2.2.0,centos completed with result FAILURETest FAILed.

AmplabJenkins commented Oct 7, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/272/

Build result: FAILURE

GitHub pull request #404 of commit 17ecc29 automatically merged.[EnvInject] - Loading node environment variables.Building remotely on test03.amplab (CentOS 6.5) (test03 centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/404/merge^{commit} # timeout=10Checking out Revision 7df3c73 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 7df3c73 > git rev-list c6703100935589fbf08b7dd7f62fa59728f34606 # timeout=10Triggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 1.0.4,centos completed with result FAILUREADAM-prb » 2.3.0,centos completed with result FAILUREADAM-prb » 2.2.0,centos completed with result FAILURETest FAILed.

@@ -1,58 +0,0 @@
/**

This comment has been minimized.

@fnothaft

fnothaft Oct 7, 2014

Member

+1

@fnothaft
@tdanford

This comment has been minimized.

Show comment
Hide comment
@tdanford

tdanford Oct 7, 2014

Contributor

Is this failure because we're building from 0.3.1-SNAPSHOT rather than just 0.3.1? it's clearly because we're not picking up the latest version of bdg-formats.

Contributor

tdanford commented Oct 7, 2014

Is this failure because we're building from 0.3.1-SNAPSHOT rather than just 0.3.1? it's clearly because we're not picking up the latest version of bdg-formats.

@tdanford

This comment has been minimized.

Show comment
Hide comment
@tdanford

tdanford Oct 7, 2014

Contributor

Fixes #381 and #327

Contributor

tdanford commented Oct 7, 2014

Fixes #381 and #327

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 7, 2014

Member

Do you need 0.3.1-SNAPSHOT instead of 0.3.0? I thought we wanted to keep the bdg-format dependency off of SNAPSHOTs?

Member

fnothaft commented Oct 7, 2014

Do you need 0.3.1-SNAPSHOT instead of 0.3.0? I thought we wanted to keep the bdg-format dependency off of SNAPSHOTs?

@tdanford

This comment has been minimized.

Show comment
Hide comment
@tdanford

tdanford Oct 7, 2014

Contributor

I think we need 0.3.1 of some kind, right? Am I forgetting which version of bdg-formats these Pileup/Feature changes went into? Let me check. Maybe we need to merge a PR over there first?

Contributor

tdanford commented Oct 7, 2014

I think we need 0.3.1 of some kind, right? Am I forgetting which version of bdg-formats these Pileup/Feature changes went into? Let me check. Maybe we need to merge a PR over there first?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft
Member

fnothaft commented Oct 7, 2014

Do you need bigdatagenomics/bdg-formats#30 merged?

@tdanford

This comment has been minimized.

Show comment
Hide comment
@tdanford

tdanford Oct 7, 2014

Contributor

I think this will depend on that ... yeah.

Contributor

tdanford commented Oct 7, 2014

I think this will depend on that ... yeah.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 7, 2014

Member

Sure, I'll merge bigdatagenomics/bdg-formats#30 when ready and cut a release.

Member

fnothaft commented Oct 7, 2014

Sure, I'll merge bigdatagenomics/bdg-formats#30 when ready and cut a release.

@tdanford

This comment has been minimized.

Show comment
Hide comment
@tdanford

tdanford Oct 7, 2014

Contributor

Large scale questions I have about this:

  • is the gene modeling complete/sufficient?
  • am I handling coordinates correctly in all cases here? (I suspect the answer is, 'no', this could use some review)
  • how do we want to handle the alphabet question? (I think it's clear we don't want to hard-code it, as I've done here.)
Contributor

tdanford commented Oct 7, 2014

Large scale questions I have about this:

  • is the gene modeling complete/sufficient?
  • am I handling coordinates correctly in all cases here? (I suspect the answer is, 'no', this could use some review)
  • how do we want to handle the alphabet question? (I think it's clear we don't want to hard-code it, as I've done here.)
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 7, 2014

Member

I've cut bdg-formats-0.3.1; expect to see it in Maven Central in a few hours...

Member

fnothaft commented Oct 7, 2014

I've cut bdg-formats-0.3.1; expect to see it in Maven Central in a few hours...

* Extractable (not necessarily a contiguous substring, and possibly reverse-
* complemented or transformed in other ways).
*
* @param referenceSequence The reference sequence (e.g. chromosomal sequence) with

This comment has been minimized.

@fnothaft

fnothaft Oct 7, 2014

Member

Again, how much reference is this? Am I giving you the whole contig, just the portion of the contig that overlaps, etc?

@fnothaft

fnothaft Oct 7, 2014

Member

Again, how much reference is this? Am I giving you the whole contig, just the portion of the contig that overlaps, etc?

This comment has been minimized.

@tdanford

tdanford Oct 9, 2014

Contributor

Basically, it's a question of the interaction between the coordinates of the Extractable object and the sequence you give it -- the coordinate 0 needs to correspond to the first character of the sequence you pass in. I'll add more documentation.

@tdanford

tdanford Oct 9, 2014

Contributor

Basically, it's a question of the interaction between the coordinates of the Extractable object and the sequence you give it -- the coordinate 0 needs to correspond to the first character of the sequence you pass in. I'll add more documentation.

* An exon model (here represented as a value of the Exon class) is a representation of a
* single exon from a transcript in genomic coordinates.
*
* NOTE: we're not handling shared exons here

This comment has been minimized.

@fnothaft

fnothaft Oct 7, 2014

Member

I presume this means that if an exon is shared, we duplicate it, correct?

@fnothaft

fnothaft Oct 7, 2014

Member

I presume this means that if an exon is shared, we duplicate it, correct?

This comment has been minimized.

@tdanford

tdanford Oct 9, 2014

Contributor

Yup. "Shared" exons are a slippery notion.

@tdanford

tdanford Oct 9, 2014

Contributor

Yup. "Shared" exons are a slippery notion.

class GeneContext(sc: SparkContext) {
// This import has to go here and not at the top level, because there's a
// conflict with another implicit used in asGenes, below.

This comment has been minimized.

@fnothaft

fnothaft Oct 7, 2014

Member

Why do the two implicits conflict?

@fnothaft

fnothaft Oct 7, 2014

Member

Why do the two implicits conflict?

This comment has been minimized.

@tdanford

tdanford Oct 9, 2014

Contributor

I have no idea. But basically, if you put this import at the top level, then all the implicit conversions from java.util.List to the Scala equivalents (basically, all the JavaConversions) blow up in the Gene transformation code below.

@tdanford

tdanford Oct 9, 2014

Contributor

I have no idea. But basically, if you put this import at the top level, then all the implicit conversions from java.util.List to the Scala equivalents (basically, all the JavaConversions) blow up in the Gene transformation code below.

object ReferenceUtils {
def unionReferenceSet(refs: Iterable[ReferenceRegion]): Iterable[ReferenceRegion] = {

This comment has been minimized.

@fnothaft

fnothaft Oct 7, 2014

Member

What does this method do? Docs would be useful.

@fnothaft

fnothaft Oct 7, 2014

Member

What does this method do? Docs would be useful.

.setFeatureType(feature)
.setSource(source)
val _strand = strand match {

This comment has been minimized.

@fnothaft

fnothaft Oct 7, 2014

Member

We haven't been prefixing internal values with _ elsewhere in the code; let's remove those for consistency.

@fnothaft

fnothaft Oct 7, 2014

Member

We haven't been prefixing internal values with _ elsewhere in the code; let's remove those for consistency.

This comment has been minimized.

@tdanford

tdanford Oct 9, 2014

Contributor

Yeah, in this case, it's an Option[T] value and I wanted to break it up into two lines for readability. There's already a strand field above (ditto for _id, etc, below).

@tdanford

tdanford Oct 9, 2014

Contributor

Yeah, in this case, it's an Option[T] value and I wanted to break it up into two lines for readability. There's already a strand field above (ditto for _id, etc, below).

}
f.setStrand(_strand)
val (_id, _parentId) =

This comment has been minimized.

@fnothaft

fnothaft Oct 7, 2014

Member

Ditto here RE: _

@fnothaft

fnothaft Oct 7, 2014

Member

Ditto here RE: _

This comment has been minimized.

@tdanford

tdanford Oct 9, 2014

Contributor

Ditto, as above.

@tdanford

tdanford Oct 9, 2014

Contributor

Ditto, as above.

f.setAttributes(attrs)
Seq(f.build())

This comment has been minimized.

@fnothaft

fnothaft Oct 7, 2014

Member

As an aside, we're just wrapping all of the features in a Seq before returning. This seems inefficient. Shall we remove the Seq[Feature], and just return Feature?

@fnothaft

fnothaft Oct 7, 2014

Member

As an aside, we're just wrapping all of the features in a Seq before returning. This seems inefficient. Shall we remove the Seq[Feature], and just return Feature?

This comment has been minimized.

@tdanford

tdanford Oct 9, 2014

Contributor

Or Option[Feature] would work too.

But I'm looking forward to the future, when we might parse (say) a GBFF file and get multiple features-per-entry.

@tdanford

tdanford Oct 9, 2014

Contributor

Or Option[Feature] would work too.

But I'm looking forward to the future, when we might parse (say) a GBFF file and get multiple features-per-entry.

This comment has been minimized.

@laserson

laserson Oct 10, 2014

Contributor

+1 to @tdanford. Covers the case of no features returned and multiple features returned.

@laserson

laserson Oct 10, 2014

Contributor

+1 to @tdanford. Covers the case of no features returned and multiple features returned.

import scala.io.Source
class GeneSuite extends SparkFunSuite {

This comment has been minimized.

@fnothaft

fnothaft Oct 7, 2014

Member

Why are we ignoring the first two tests?

@fnothaft

fnothaft Oct 7, 2014

Member

Why are we ignoring the first two tests?

This comment has been minimized.

@tdanford

tdanford Oct 9, 2014

Contributor

Those'll be added back in.

@tdanford

tdanford Oct 9, 2014

Contributor

Those'll be added back in.

@@ -105,5 +105,17 @@ trait SparkFunSuite extends FunSuite with BeforeAndAfter {
def sparkTest(name: String)(body: => Unit) {
sparkTest(name, silenceSpark = true)(body)
}

This comment has been minimized.

@fnothaft

fnothaft Oct 7, 2014

Member

+1; can you make sure this gets propagated upstream?

@fnothaft

fnothaft Oct 7, 2014

Member

+1; can you make sure this gets propagated upstream?

This comment has been minimized.

@tdanford

tdanford Oct 9, 2014

Contributor

Sure.

@tdanford

tdanford Oct 9, 2014

Contributor

Sure.

This comment has been minimized.

@tdanford
@tdanford
@@ -19,18 +19,23 @@ package org.bdgenomics.adam.rdd.features
import org.apache.spark.{ SparkContext, Logging }
import org.apache.spark.rdd.RDD
import org.bdgenomics.adam.models.{ NarrowPeakFeature, BEDFeature }
import org.bdgenomics.formats.avro.Feature
object ADAMFeaturesContext {
implicit def sparkContextToADAMFeaturesContext(sc: SparkContext): ADAMFeaturesContext = new ADAMFeaturesContext(sc)

This comment has been minimized.

@laserson

laserson Oct 7, 2014

Contributor

Proposal: change to FeaturesContext

@laserson

laserson Oct 7, 2014

Contributor

Proposal: change to FeaturesContext

This comment has been minimized.

@tdanford

tdanford Oct 9, 2014

Contributor

We talked about this in our call -- @fnothaft can you remind us, did we decide to rename this or not?

@tdanford

tdanford Oct 9, 2014

Contributor

We talked about this in our call -- @fnothaft can you remind us, did we decide to rename this or not?

This comment has been minimized.

@fnothaft

fnothaft Oct 13, 2014

Member

No; we'll fix this later. I've opened #416.

@fnothaft

fnothaft Oct 13, 2014

Member

No; we'll fix this later. I've opened #416.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 7, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/276/

Build result: FAILURE

GitHub pull request #404 of commit 0d52afd automatically merged.[EnvInject] - Loading node environment variables.Building remotely on test03.amplab (CentOS 6.5) (test03 centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/404/merge^{commit} # timeout=10Checking out Revision 206c778 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 206c778 > git rev-list c6703100935589fbf08b7dd7f62fa59728f34606 # timeout=10Triggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 1.0.4,centos completed with result FAILUREADAM-prb » 2.3.0,centos completed with result FAILUREADAM-prb » 2.2.0,centos completed with result FAILURETest FAILed.

AmplabJenkins commented Oct 7, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/276/

Build result: FAILURE

GitHub pull request #404 of commit 0d52afd automatically merged.[EnvInject] - Loading node environment variables.Building remotely on test03.amplab (CentOS 6.5) (test03 centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/404/merge^{commit} # timeout=10Checking out Revision 206c778 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 206c778 > git rev-list c6703100935589fbf08b7dd7f62fa59728f34606 # timeout=10Triggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 1.0.4,centos completed with result FAILUREADAM-prb » 2.3.0,centos completed with result FAILUREADAM-prb » 2.2.0,centos completed with result FAILURETest FAILed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 9, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/278/

Build result: FAILURE

GitHub pull request #404 of commit 50fad0d automatically merged.[EnvInject] - Loading node environment variables.Building remotely on test03.amplab (CentOS 6.5) (test03 centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/404/merge^{commit} # timeout=10Checking out Revision 7824ccf1b550d3d5efb661327cf290b4eeb6674a (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 7824ccf1b550d3d5efb661327cf290b4eeb6674a > git rev-list c6703100935589fbf08b7dd7f62fa59728f34606 # timeout=10Triggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 1.0.4,centos completed with result FAILUREADAM-prb » 2.3.0,centos completed with result FAILUREADAM-prb » 2.2.0,centos completed with result FAILURETest FAILed.

AmplabJenkins commented Oct 9, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/278/

Build result: FAILURE

GitHub pull request #404 of commit 50fad0d automatically merged.[EnvInject] - Loading node environment variables.Building remotely on test03.amplab (CentOS 6.5) (test03 centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/404/merge^{commit} # timeout=10Checking out Revision 7824ccf1b550d3d5efb661327cf290b4eeb6674a (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 7824ccf1b550d3d5efb661327cf290b4eeb6674a > git rev-list c6703100935589fbf08b7dd7f62fa59728f34606 # timeout=10Triggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 1.0.4,centos completed with result FAILUREADAM-prb » 2.3.0,centos completed with result FAILUREADAM-prb » 2.2.0,centos completed with result FAILURETest FAILed.

@tdanford

This comment has been minimized.

Show comment
Hide comment
@tdanford

tdanford Oct 9, 2014

Contributor

@fnothaft Let's talk about the gzipped data files that are in src/main/resources here. Is the objection to them that they're gzipped, that they're large, that they're unnecessary, or something else?

Contributor

tdanford commented Oct 9, 2014

@fnothaft Let's talk about the gzipped data files that are in src/main/resources here. Is the objection to them that they're gzipped, that they're large, that they're unnecessary, or something else?

Show outdated Hide outdated pom.xml
@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 10, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/283/

Build result: FAILURE

GitHub pull request #404 of commit 3fde5c7 automatically merged.[EnvInject] - Loading node environment variables.Building remotely on test03.amplab (CentOS 6.5) (test03 centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/404/merge^{commit} # timeout=10Checking out Revision a075a10 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f a075a10 > git rev-list c6703100935589fbf08b7dd7f62fa59728f34606 # timeout=10Triggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 1.0.4,centos completed with result FAILUREADAM-prb » 2.3.0,centos completed with result FAILUREADAM-prb » 2.2.0,centos completed with result FAILURETest FAILed.

AmplabJenkins commented Oct 10, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/283/

Build result: FAILURE

GitHub pull request #404 of commit 3fde5c7 automatically merged.[EnvInject] - Loading node environment variables.Building remotely on test03.amplab (CentOS 6.5) (test03 centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/404/merge^{commit} # timeout=10Checking out Revision a075a10 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f a075a10 > git rev-list c6703100935589fbf08b7dd7f62fa59728f34606 # timeout=10Triggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 1.0.4,centos completed with result FAILUREADAM-prb » 2.3.0,centos completed with result FAILUREADAM-prb » 2.2.0,centos completed with result FAILURETest FAILed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 10, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/284/Test PASSed.

AmplabJenkins commented Oct 10, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/284/Test PASSed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 10, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/285/Test PASSed.

AmplabJenkins commented Oct 10, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/285/Test PASSed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 10, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/286/Test PASSed.

AmplabJenkins commented Oct 10, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/286/Test PASSed.

val f = Feature.newBuilder()
.setContig(contig)
.setStart(start.toLong - 1) // GTF/GFF ranges are 1-based
.setEnd(end.toLong) // GTF/GFF ranges are closed

This comment has been minimized.

@laserson

laserson Oct 10, 2014

Contributor

Looks good to me here

@laserson

laserson Oct 10, 2014

Contributor

Looks good to me here

// if (fields.length > 10) fb.setBlockSizes(fields(10).split(",").map(new java.lang.Long(_)).toList)
// if (fields.length > 11) fb.setBlockStarts(fields(11).split(",").map(new java.lang.Long(_)).toList)
new BEDFeature(fb.build())

This comment has been minimized.

@laserson

laserson Oct 10, 2014

Contributor

Looks good here

@laserson

laserson Oct 10, 2014

Contributor

Looks good here

_id.foreach(f.setFeatureId)
_parentId.foreach(parentId => f.setParentIds(List[CharSequence](parentId)))
f.setAttributes(attrs)

This comment has been minimized.

@laserson

laserson Oct 10, 2014

Contributor

Do attributes ever have coordinates that also need to be tranformed into our coordinate system? That would be highly annoying...

@laserson

laserson Oct 10, 2014

Contributor

Do attributes ever have coordinates that also need to be tranformed into our coordinate system? That would be highly annoying...

This comment has been minimized.

@tdanford

tdanford Oct 13, 2014

Contributor

I don't know -- but I don't think we should address it here. The answer is probably, "probably."

@tdanford

tdanford Oct 13, 2014

Contributor

I don't know -- but I don't think we should address it here. The answer is probably, "probably."

fb.setFeatureId(UUID.randomUUID().toString)
// Peak files are 0-based space-coordinates, so conversion to
// our coordinate space should mean that the values are unchanged.
fb.setStart(fields(1).toLong)
fb.setEnd(fields(2).toLong)

This comment has been minimized.

@laserson

laserson Oct 10, 2014

Contributor

Looks good here

@laserson

laserson Oct 10, 2014

Contributor

Looks good here

def getFeature: String = feature.getFeatureType.toString
def getStart: Long = feature.getStart

This comment has been minimized.

@laserson

laserson Oct 10, 2014

Contributor

Does this need to be feature.getStart + 1 to match the GFF coord system?

@laserson

laserson Oct 10, 2014

Contributor

Does this need to be feature.getStart + 1 to match the GFF coord system?

This comment has been minimized.

@tdanford

tdanford Oct 13, 2014

Contributor

This is the setting that worked, w/r/t the gencode GTF transcript extraction test. Maybe I'm misunderstanding something? Do we have a data comparison test (the same data that we can pull from two different sources) to test this particular question?

@tdanford

tdanford Oct 13, 2014

Contributor

This is the setting that worked, w/r/t the gencode GTF transcript extraction test. Maybe I'm misunderstanding something? Do we have a data comparison test (the same data that we can pull from two different sources) to test this particular question?

This comment has been minimized.

@laserson

laserson Oct 13, 2014

Contributor

Perhaps that's because it's getting all the other coords using the ADAM coord system as well. I am guessing the transcript extraction might still work if you change it to have the +1, as long as you don't mix GTF features with non-GTF features, and you always use the GTFFeature wrapper.

@laserson

laserson Oct 13, 2014

Contributor

Perhaps that's because it's getting all the other coords using the ADAM coord system as well. I am guessing the transcript extraction might still work if you change it to have the +1, as long as you don't mix GTF features with non-GTF features, and you always use the GTFFeature wrapper.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 11, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/292/Test PASSed.

AmplabJenkins commented Oct 11, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/292/Test PASSed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 13, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amp.lab.cs.berkeley.edu/jenkins//job/ADAM-prb/294/
Test PASSed.

AmplabJenkins commented Oct 13, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amp.lab.cs.berkeley.edu/jenkins//job/ADAM-prb/294/
Test PASSed.

fnothaft added a commit that referenced this pull request Oct 13, 2014

Merge pull request #404 from tdanford/gene-models
[ADAM-327] Adding gene, transcript, and exon models.

@fnothaft fnothaft merged commit beec466 into bigdatagenomics:master Oct 13, 2014

1 check passed

default Merged build finished.
Details
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 13, 2014

Member

Merged! Thanks @tdanford! Great to see this work hit.

Member

fnothaft commented Oct 13, 2014

Merged! Thanks @tdanford! Great to see this work hit.

@tdanford tdanford deleted the tdanford:gene-models branch Oct 14, 2014

@tdanford

This comment has been minimized.

Show comment
Hide comment
@tdanford

tdanford Oct 14, 2014

Contributor

Phew. Thanks Frank.

There are probably a few rough edges still left in, please call 'em out as you see 'em, and I'll add fixes.

@laserson I think we were still having a coordinate discussion, let's not let that drop.

Contributor

tdanford commented Oct 14, 2014

Phew. Thanks Frank.

There are probably a few rough edges still left in, please call 'em out as you see 'em, and I'll add fixes.

@laserson I think we were still having a coordinate discussion, let's not let that drop.

@fnothaft fnothaft referenced this pull request Oct 16, 2014

Closed

Remove Gene models #39

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment