Question: How to get paired-end AlignmentRecords like RDD[AlignmentRecord, AlignmentRecordRDD]? #1419
Question: How can I get paired-end AlignmentRecords as something like RDD[(AlignmentRecord, AlignmentRecord)]?
I want to use them for read mapping.
I tried loadPairedFastq and loadFastq, but they only return an AlignmentRecordRDD (or RDD[AlignmentRecord]), so I can't get the reads as pairs.
Any help would be appreciated, thanks.
Thank you, @fnothaft! I will read it later.
For now I am trying groupBy on readName to get an RDD[(readName, Iterable[AlignmentRecord])]. Have you tried this? I don't know whether there is a problem with it...
Before this I ran cs-bwamem with its upload function, and it spent a lot of time copying the paired-end FASTQ files from the local filesystem to HDFS.
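A minimal sketch of the groupBy-on-read-name idea, written with plain Scala collections instead of a Spark RDD so it runs on its own. `Read` and `pairByName` are hypothetical names: `Read` stands in for ADAM's AlignmentRecord and keeps only a name and a sequence. Names with exactly two records become a pair; orphans and names with extra records are dropped.

```scala
// Hypothetical stand-in for ADAM's AlignmentRecord, keeping only what this sketch needs.
final case class Read(readName: String, sequence: String)

object PairByName {
  // Group reads by name and keep names that have exactly two records,
  // returned as (name, (firstSeen, secondSeen)) sorted by name.
  // Names with a single record (orphans) or more than two records are dropped.
  def pairByName(reads: Seq[Read]): Seq[(String, (Read, Read))] =
    reads
      .groupBy(_.readName) // Map[String, Seq[Read]]; each group keeps input order
      .toSeq
      .collect { case (name, Seq(r1, r2)) => (name, (r1, r2)) }
      .sortBy(_._1)
}
```

On a Spark RDD, `groupBy` has the same shape but returns `RDD[(K, Iterable[T])]`, shuffles every record, and does not guarantee the order of records inside a group, so with real data you would decide which mate is first from the record's pairing information rather than from its position.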
The groupBy on read name usually works, but sometimes people put a /1, /2 or _1, _2 suffix on the read names to mark the first/second read of a pair, so you'll want to look out for that.
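A small sketch of stripping mate suffixes before grouping, assuming that when a suffix is present it is exactly `/1`, `/2`, `_1` or `_2` at the end of the name. `ReadNames`, `baseName` and `groupByBaseName` are hypothetical helpers for illustration, not ADAM functions.

```scala
object ReadNames {
  // Matches a trailing "/1", "/2", "_1" or "_2" mate suffix.
  private val MateSuffix = "[/_][12]$".r

  // Strip the mate suffix so both mates share one grouping key.
  // Caution: a name whose own last token happens to be "_1"/"_2" is stripped too.
  def baseName(name: String): String = MateSuffix.replaceFirstIn(name, "")

  // Group raw read names by their suffix-free base name.
  def groupByBaseName(names: Seq[String]): Map[String, Seq[String]] =
    names.groupBy(baseName)
}
```

The `_1`/`_2` rule is the risky part, so it is worth checking a sample of names from the actual input before relying on it.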
Actually, the slowness of the FASTQ upload was one of my motivations for doing the cs-bwamem refactor with Fragment!
ADAM and cs-bwamem both handle the /1, /2 suffix; I group the data after loadFastq in ADAM.
I haven't seen the _1/_2 suffix format, and I'm not sure whether ADAM handles it.
Thanks. I have used Spark + ADAM + BWA for read mapping in a distributed environment, and it is faster than cs-bwamem, but I have not verified all of the results yet.
Thank you for your help and contribution.