Question: How to get paired-end AlignmentRecord like RDD[AlignmentRecord, AlignmentRecordRDD]? #1419

Closed
xubo245 opened this Issue Mar 4, 2017 · 5 comments

xubo245 commented Mar 4, 2017

Question: How can I get paired-end AlignmentRecords, like RDD[(AlignmentRecord, AlignmentRecord)]?

I want to use them for read mapping.

I tried loadPairedFastq and loadFastq:

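    // Attempt 1: load both FASTQ files with loadPairedFastq.
    val pairedRdd1 = sc.loadPairedFastq(str1, str2, None, ValidationStringency.STRICT)
    // Attempt 2: load the same files with loadFastq, passing the second file as an Option.
    val pairedRdd2 = sc.loadFastq(filePath1 = str1, filePath2Opt = Option(str2))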

But these only return an AlignmentRecordRDD or RDD[AlignmentRecord]; I can't get the reads as pairs.

Any help would be appreciated, thanks.

fnothaft Mar 4, 2017

Member

Hi @xubo245! ADAM's paired end data structure is the Fragment/FragmentRDD. I have a usage example for alignment at ytchen0323/cloud-scale-bwamem#9.

xubo245 Mar 4, 2017

Thank you, @fnothaft! I will read it later.

For now I am trying a groupBy on readName to get RDD[(readName, Iterable[AlignmentRecord])]. Have you tried this? I don't know whether there is a problem with it...

I ran cs-bwamem with its upload function before, and it spent a lot of time uploading the paired-end FASTQ from the local fs to HDFS.

fnothaft Mar 4, 2017

Member

The groupBy on readname usually works, but sometimes people put a /1,/2 or _1/_2 suffix on the reads for first/second of pair, so you'll want to look out for that.

Actually, the slowness of the FASTQ upload was one of my motivations for doing the cs-bwamem refactor with Fragment!
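The suffix caveat above can be sketched as a name-normalization step before the groupBy. This is a minimal illustration on plain Scala collections, not ADAM code: `Read`, `PairByName`, `baseName`, and `pairReads` are hypothetical names, and `Read` is only a stand-in for AlignmentRecord. It assumes that a trailing `/1`, `/2`, `_1`, or `_2` is always a mate marker, which may not hold for every naming scheme.

```scala
// Hypothetical stand-in for an alignment record; only the fields needed here.
final case class Read(readName: String, sequence: String)

object PairByName {
  // Matches a trailing "/1", "/2", "_1" or "_2".
  // Assumption: such a suffix is a mate marker and never part of the real name.
  private val MateSuffix = "[/_][12]$".r

  // Strip the mate marker so both reads of a pair share one key.
  def baseName(readName: String): String =
    MateSuffix.replaceFirstIn(readName, "")

  // Group reads by normalized name and keep only names with exactly two reads.
  // Unpaired reads and names with more than two reads are dropped.
  def pairReads(reads: Seq[Read]): Map[String, (Read, Read)] =
    reads.groupBy(r => baseName(r.readName)).collect {
      case (name, Seq(first, second)) => (name, (first, second))
    }
}
```

On an RDD the same idea would be a groupBy keyed on the normalized name; whether orphans should be dropped or kept depends on the downstream aligner.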

xubo245 Mar 5, 2017

ADAM and cs-bwamem both already deal with /1 and /2; I groupBy the data after loadFastq in ADAM.

I have not seen the _1/_2 suffix format, and I am not sure whether it is handled in ADAM?

Thanks. I have used Spark+ADAM+BWA to do read mapping in a distributed environment, and it is faster than cs-bwamem, but I have not verified everything yet.

Thank you for your help and contribution.

xubo245 Mar 5, 2017

Have you tried to improve the alignment performance of cs-bwamem? For example, SWExtend in worker1 of cs-bwamem.

@fnothaft fnothaft closed this May 12, 2017

@heuermh heuermh modified the milestone: 0.23.0 Jul 22, 2017

@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018
