Newbie questions - learning resources? Reading a range of records from Adam? #281

Closed
pkondratyuk opened this issue Jun 28, 2014 · 10 comments

Comments

@pkondratyuk

Being totally new to ADAM, I was wondering whether there is a set of learning resources I could use to start teaching myself the Java/Scala API.

On a more specific note, which API functions would I need to read a range of records (say, all records from number 5 to number 637) from an ADAM file?

@alartin

alartin commented Jun 28, 2014

I ran into the same issue. Where are the API docs (Javadoc or Scaladoc) and examples?


@fnothaft
Member

Hi @pkondratyuk and @alartin!

We're glad to see you using the project. We're working towards a production release near the end of this year, which will come with fully fleshed-out documentation. In the meantime, there has been a lot of demand for Scaladoc recently, so we will work on publishing it. The folks at https://github.com/hammerlab have built an IPython notebook that documents how to set up ADAM; @massie @hammer @arahuja, can you post the link? If you have the repo checked out, you can generate the Scaladoc with:

mvn scala:doc

@pkondratyuk there are two ways to filter on a genomic region, depending on whether you want to filter while reading from Parquet or after you've read the data. Are you interested in a particular pattern?
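To illustrate the second pattern (filtering after the data has been read), here is a minimal sketch in plain Scala. `Read`, `Region`, and `RegionFilter` are hypothetical names made up for this example; they are not ADAM's `ADAMRecord` schema or API. Intervals are assumed to be half-open.

```scala
// Hypothetical stand-in for an aligned read; NOT ADAM's ADAMRecord schema.
final case class Read(name: String, contig: String, start: Long, end: Long)

// Half-open genomic interval [start, end) on a named contig (an assumption of this sketch).
final case class Region(contig: String, start: Long, end: Long) {
  // True when the read shares at least one base with this region.
  def overlaps(r: Read): Boolean =
    r.contig == contig && r.start < end && r.end > start
}

object RegionFilter {
  // Filter after loading: every record is materialized first, then tested.
  def filterAfterLoad(reads: Seq[Read], region: Region): Seq[Read] =
    reads.filter(region.overlaps)
}
```

Filtering while reading from Parquet would instead hand a comparable test to the reader, so that non-matching records need not be fully deserialized; that path is not shown here.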

@pkondratyuk
Author

Thank you for your response. We don't really need to filter for this particular task, I think. We would just like to retrieve a set of records as ADAMRecord objects, given the start and end record numbers, i.e. retrieve records 500 to 576 (contiguous), and possibly the SAM header, from that ADAM file. Or does that count as filtering on record number?

@fnothaft
Member

fnothaft commented Jul 2, 2014

The iPython notebook I mentioned above was hiding in plain sight! ;) For anyone interested, you can find it here.

@pkondratyuk sorry for the slow response to your question above; I've been traveling the last few days, and a few other team members are out of the office as well. I'll try to post a more detailed reply tomorrow, but you may want to check out the code in LocusPredicate for an example of how a predicate can be used in Parquet to filter records. In your case, I think you'd want to apply a predicate on the range of records you're looking to pull out.
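As a rough illustration of a predicate on record position, here is a minimal sketch in plain Scala that pairs each record with its index and keeps only those inside the requested range. `RecordRange` and `slice` are hypothetical names for this example, and the 0-based, inclusive numbering is an assumption. This is not a Parquet predicate and does not use ADAM's API. Spark's `RDD` offers a `zipWithIndex` method as well, so the same filter could be applied to an `RDD` of records, assuming the RDD's ordering matches the record numbering you have in mind.

```scala
object RecordRange {
  // Keep the records whose 0-based position lies in [first, last] (inclusive).
  // Operates on an Iterator, so the input is streamed rather than held in memory.
  def slice[T](records: Iterator[T], first: Long, last: Long): Iterator[T] =
    records.zipWithIndex
      .filter { case (_, idx) => idx >= first && idx <= last } // the range predicate
      .map { case (rec, _) => rec }                            // drop the index again
}
```

For example, `RecordRange.slice(records, 500, 576)` would yield the 77 records at positions 500 through 576.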

@pkondratyuk
Author

Thanks, that will definitely be helpful to us. We've built a simple program based on the adam-core API that reads from an ADAM file and tries to write a SAM file, and it compiles fine but for some reason fails at runtime with a strange-looking exception:

java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.bdgenomics.adam.avro.ADAMRecord

What could be going on here? The code is below; it's pretty basic:

def main(args: Array[String]): Unit = {
  val adamFolder: String = "hdfs://test/user/root/testAdamFiles"
  val samFile: String = "/usr/local/adam/testSAM/file.sam"

  val sc = new SparkContext("local", "Simple App", "/usr/lib/spark",
    List("/root/NetBeansProjects/mavenscala1/target/mavenscala1-1.0.jar"))

  val adamContext: ADAMContext = new ADAMContext(sc)
  val rddAdamRec: RDD[ADAMRecord] = adamContext.adamLoad(adamFolder)

  val adamFunctions: ADAMRecordRDDFunctions = new ADAMRecordRDDFunctions(rddAdamRec)

  adamFunctions.adamSAMSave(samFile, true) // save as SAM
}

@fnothaft
Member

fnothaft commented Jul 2, 2014

I think @massie and I were just talking about this a few days ago; there's a specific problem (something to do with dependency injection) that can cause Parquet and Avro to disagree about whether two schemas match. @massie do you remember the fix?

@pkondratyuk
Author

@fnothaft : do you mind if I post the avro.generic.GenericData class cast problem as a separate issue? I think other people starting to work with ADAM may run into it too.

@fnothaft
Member

fnothaft commented Jul 4, 2014

@pkondratyuk yes, please! That would be great.

@pkondratyuk
Author

Done! If you could help us with this issue, a bioinformatics company would be on its way to storing its data internally in the ADAM format :)) Thanks a lot for developing this product.

@fnothaft
Member

Closing in favor of other documentation issues.
