Newbie questions - learning resources? Reading a range of records from Adam? #281

Closed
pkondratyuk opened this Issue Jun 28, 2014 · 10 comments

Being totally new to ADAM, I was wondering whether there is a set of learning resources I could use to start teaching myself the Java/Scala API.

On a more specific note, what API functions would I need to read a range of records (say, all records from number 5 to 637) from an ADAM file?

alartin commented Jun 28, 2014

I ran into the same issue. Where are the API docs (Javadoc or Scaladoc) and examples?

Member

fnothaft commented Jun 28, 2014

Hi @pkondratyuk and @alartin!

We're glad to see you using the project. We're working toward a production release near the end of this year, which should come with fully fleshed-out documentation. However, there has been a lot of demand for Scaladoc recently, so we will work to publish that. https://github.com/hammerlab has built an IPython notebook that documents how to set up ADAM; @massie @hammer @arahuja, can you post the link? If you have the repo checked out, you can generate the Scaladoc with:

mvn scala:doc

@pkondratyuk there are two ways to filter on genomic region, depending on whether you want to filter as the data is read from Parquet or after you've read the data. Are you interested in a particular pattern?
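To make the difference between the two patterns concrete, here is a minimal sketch in plain Scala. It does not use ADAM or Parquet: `Read`, `Region`, and both helper functions are hypothetical stand-ins, and coordinates are assumed to be 0-based and end-exclusive. A real Parquet predicate works inside the storage layer, so this only illustrates where in the pipeline the filter is applied.

```scala
// Hypothetical, simplified stand-ins; these are NOT ADAM's actual classes.
final case class Read(contig: String, start: Long, end: Long)
final case class Region(contig: String, start: Long, end: Long)

object RegionFilterSketch {
  // A read overlaps a region when both are on the same contig and their
  // half-open intervals [start, end) intersect.
  def overlaps(read: Read, region: Region): Boolean =
    read.contig == region.contig && read.start < region.end && read.end > region.start

  // Pattern 1: filter while reading. The predicate is applied to the lazy
  // source, so non-matching records are dropped before anything is collected.
  def loadFiltered(source: Iterator[Read], region: Region): Vector[Read] =
    source.filter(overlaps(_, region)).toVector

  // Pattern 2: load everything first, then filter the in-memory collection.
  def filterAfterLoad(source: Iterator[Read], region: Region): Vector[Read] = {
    val all = source.toVector
    all.filter(overlaps(_, region))
  }
}
```

Both functions return the same records; the first avoids holding the non-matching ones in memory, which is the usual reason to prefer filtering at read time when only a small region is needed.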

pkondratyuk commented Jun 28, 2014

Thank you for your response. I don't think we really need to filter for this particular task. We would just like to retrieve a set of records as ADAMRecord, given the start record number and the end record number, e.g. retrieve record numbers 500 to 576 (contiguous), and possibly the SAM header from that ADAM file. Or does that count as filtering on record number?

Member

fnothaft commented Jul 2, 2014

The IPython notebook I mentioned above was hiding in plain sight! ;) For anyone interested, you can find it here.

@pkondratyuk sorry for the slow response to your question above; I've been traveling for the last few days, and a few other team members are out of the office as well. I'll try to post a more detailed reply tomorrow, but you may want to check out the code in LocusPredicate for an example of how a Parquet predicate can be used to filter records. In your case, I think you'd want to apply a predicate on the range of records you're looking to pull out.
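As a rough illustration of selecting a contiguous range by record number, the sketch below pairs each record with its position and keeps only positions in the requested range. It is plain Scala over an `Iterator`, not ADAM or Parquet code; `RecordRangeSketch` and `sliceByIndex` are hypothetical names, and "record number" is assumed to mean the 0-based position of a record in the order it is read. Spark RDDs offer a `zipWithIndex` method that could play the same role, though the ordering of records loaded from a multi-file dataset is something to verify before relying on it.

```scala
object RecordRangeSketch {
  // Returns the records whose 0-based position lies in [first, last], inclusive.
  // Reading stops at the first position past `last`, so the tail is never consumed.
  def sliceByIndex[A](records: Iterator[A], first: Long, last: Long): Vector[A] =
    records.zipWithIndex
      .dropWhile { case (_, i) => i < first }
      .takeWhile { case (_, i) => i <= last }
      .map { case (rec, _) => rec }
      .toVector
}
```

For the example in this thread, `RecordRangeSketch.sliceByIndex(records, 500, 576)` would return 77 records, positions 500 through 576.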

pkondratyuk commented Jul 2, 2014

Thanks, that will definitely be helpful to us. We've built a simple program based on the adam-core API that reads from an ADAM file and tries to write a SAM file. It compiles fine but fails at runtime with a strange-looking exception:

java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.bdgenomics.adam.avro.ADAMRecord

What could be going on here? The code is below; it's pretty basic:

def main(args: Array[String]): Unit = {
  // Input ADAM directory on HDFS and output SAM path on the local filesystem
  val adamFolder: String = "hdfs://test/user/root/testAdamFiles"
  val samFile: String = "/usr/local/adam/testSAM/file.sam"

  // Local Spark context; the last argument lists the application jar(s)
  val sc = new SparkContext("local", "Simple App", "/usr/lib/spark",
    List("/root/NetBeansProjects/mavenscala1/target/mavenscala1-1.0.jar"))

  // Load the ADAM records as an RDD
  val adamContext: ADAMContext = new ADAMContext(sc)
  val rddAdamRec: RDD[ADAMRecord] = adamContext.adamLoad(adamFolder)

  val adamFunctions: ADAMRecordRDDFunctions = new ADAMRecordRDDFunctions(rddAdamRec)

  adamFunctions.adamSAMSave(samFile, true) // save as SAM
}

Member

fnothaft commented Jul 2, 2014

I think @massie and I were just talking about this a few days ago; there's a specific problem (something to do with dependency injection) that can cause Parquet/Avro to disagree about whether two schemas match. @massie do you remember the fix?
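For readers wondering how code like this can compile and still throw a ClassCastException, the following plain-Scala sketch reproduces the general shape of the symptom. It is an illustration only, not a diagnosis of the ADAM problem: `GenericRecord`, `SpecificRecord`, and `load` are hypothetical stand-ins, and the assumption is that a loader hands back generic records inside a container typed as holding the specific class. Because element types are erased at runtime, the mismatch surfaces only when an element is actually used.

```scala
object ErasedCastSketch {
  // Hypothetical stand-ins: a generic record and the specific class the caller expects.
  class GenericRecord(val fields: Map[String, Any])
  class SpecificRecord(val readName: String)

  // Simulates a loader that promises Seq[SpecificRecord] but actually built
  // generic records. The cast is unchecked because of type erasure, so it
  // succeeds here without inspecting the elements.
  def load(): Seq[SpecificRecord] = {
    val raw: Seq[Any] = Seq(new GenericRecord(Map("readName" -> "r1")))
    raw.asInstanceOf[Seq[SpecificRecord]]
  }

  // Returns true if using an element as SpecificRecord fails at runtime.
  def failsOnAccess(): Boolean =
    try {
      load().head.readName
      false
    } catch {
      case _: ClassCastException => true
    }
}
```

In this sketch `load()` returns normally and the failure appears only at the first access to an element, which matches the "compiles fine, fails at runtime" pattern described above.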

pkondratyuk commented Jul 4, 2014

@fnothaft : do you mind if I post the avro.generic.GenericData class cast problem as a separate issue? I think other people starting to work with ADAM may run into it too.

Member

fnothaft commented Jul 4, 2014

@pkondratyuk yes, please! That would be great.

pkondratyuk commented Jul 4, 2014

Done! If you could help us with this issue, a bioinformatics company would be on its way to storing its data internally in the ADAM format :)) Thanks a lot for developing this product.

Member

fnothaft commented Jul 20, 2016

Closing in favor of other documentation issues.

fnothaft closed this Jul 20, 2016
