Question: How to get paired-end AlignmentRecords like RDD[(AlignmentRecord, AlignmentRecord)]? #1419
Question: How to get paired-end AlignmentRecords like RDD[(AlignmentRecord, AlignmentRecord)]?
I want to use it for read mapping.
I tried loadPairedFastq and loadFastq,
but I only get an AlignmentRecordRDD or an RDD[AlignmentRecord]; I can't get the reads as pairs.
Could you please help? Thanks.
Thank you, @fnothaft! I will read it later.
For now I am trying a groupBy on readName to get an RDD[(String, Iterable[AlignmentRecord])]. Have you tried this? I don't know whether there is a problem with it...
I ran cs-bwamem with its upload function before, and it spent a lot of time copying paired-end FASTQ from the local filesystem to HDFS.
The groupBy on read name usually works, but sometimes people put a /1, /2 or _1, _2 suffix on the read names to mark the first/second read of a pair, so you'll want to look out for that.
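To make the suffix caveat concrete, here is a minimal sketch in plain Scala collections of stripping a trailing /1, /2, _1 or _2 before grouping. `Read`, `PairReads`, `pairKey` and `pairUp` are hypothetical names used only for illustration; they are not part of ADAM or cs-bwamem. Note the assumption that a trailing `_1`/`_2` always marks a mate: if your read names legitimately end that way for another reason, this regex would merge unrelated reads.

```scala
// Hypothetical stand-in for an alignment record; only the read name matters here.
final case class Read(name: String, sequence: String)

object PairReads {
  // Matches a trailing /1, /2, _1 or _2 mate suffix (assumed convention).
  private val MateSuffix = "[/_][12]$".r

  // Strip the mate suffix so both reads of a pair share one grouping key.
  def pairKey(readName: String): String =
    MateSuffix.replaceFirstIn(readName, "")

  // Group reads by normalized name and keep only groups of exactly two reads.
  // Within a pair, the read whose name sorts first is placed first.
  def pairUp(reads: Seq[Read]): Seq[(Read, Read)] =
    reads
      .groupBy(r => pairKey(r.name))
      .values
      .collect { case Seq(a, b) => if (a.name <= b.name) (a, b) else (b, a) }
      .toSeq
}
```

On a Spark RDD the same key function could presumably be passed to `groupBy`, keyed on each record's read name, but that wiring is not shown here and should be checked against the ADAM version in use.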
Actually, the slowness of the FASTQ upload was one of my motivations for doing the cs-bwamem refactor with Fragment!
ADAM and cs-bwamem both already deal with /1 and /2; I group the data by read name after loadFastq in ADAM.
I have not seen the _1/_2 suffix format, and I am not sure whether ADAM handles it.
Thanks. I have used Spark + ADAM + BWA to do read mapping in a distributed environment, and it is faster than cs-bwamem, but I have not verified all of the results yet.
Thank you for your help and contribution.