
Question: Why is the number of paired sequences in adam-0.21.0 less than in adam-0.19.0? #1424

Closed
xubo245 opened this Issue Mar 7, 2017 · 13 comments

3 participants

xubo245 commented Mar 7, 2017

I load the paired FASTQ files and count the records with:

    val rdd = ac.loadAlignments(fastq1, filePath2Opt = Option(fastq2)).rdd
    println("count" + rdd.count())

fastq1 and fastq2 contain 10000 + 10000 = 20000 sequences.
The count is 15458 with adam-0.21.0, but it is 20000 with both adam-0.18.2 and adam-0.19.0. I do not know the reason.

I also tried loading fastq1 and fastq2 separately:

    val df4 = ac.loadAlignments(fastq1).rdd
    println(df4.count)
    val df5 = ac.loadAlignments(fastq2).rdd
    println(df5.count)

The counts are:

7729
7729

Why is neither count 10000?

Question: Why is the number of paired sequences in adam-0.21.0 less than in adam-0.19.0?

Thanks
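As a hedged aside, one way to confirm the 10000-record figure independently of ADAM is to count the raw FASTQ lines and divide by four. This is a minimal sketch that assumes plain, unwrapped four-line FASTQ records in a local file; `FastqRecordCount` and its methods are hypothetical names, not part of ADAM.

```scala
import scala.io.Source

// Hypothetical helper, not part of ADAM: counts FASTQ records assuming the
// plain four-line layout (header, sequence, "+" separator, quality).
object FastqRecordCount {
  // Count lines with a Long accumulator (safe for very large files), then
  // convert to records; fail loudly if the line count is not a multiple of 4.
  def countRecords(lines: Iterator[String]): Long = {
    val n = lines.foldLeft(0L)((acc, _) => acc + 1)
    require(n % 4 == 0, s"$n lines is not a multiple of 4")
    n / 4
  }

  // Convenience wrapper for a local file. Source.getLines does not yield an
  // extra empty element for a single trailing newline. The file is always closed.
  def countRecordsInFile(path: String): Long = {
    val src = Source.fromFile(path)
    try countRecords(src.getLines())
    finally src.close()
  }
}
```

For files on HDFS, Spark's `sc.textFile(fastq1).count() / 4` performs the same check under the same four-line assumption.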

heuermh (Member) commented Mar 7, 2017

I can't reproduce, at least with the data I have locally. Does this happen with a smaller set of data that you could share?

fnothaft (Member) commented Mar 7, 2017

Hi @xubo245! Does this occur on a file that you can share with us? I'm guessing that there are some records that fail validation, with error messages that should appear in a log file.
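If some records do fail validation, a quick structural scan of each input file can help locate them. This is a minimal sketch with hypothetical names (`FastqCheck`, `Problem`); it checks only a few simple rules (header starts with `@`, separator starts with `+`, sequence and quality have equal length, no truncated final record) and is not ADAM's actual validation logic.

```scala
// Hypothetical structural check, NOT ADAM's validation logic: it flags
// four-line FASTQ records that look malformed.
object FastqCheck {
  final case class Problem(recordIndex: Int, reason: String)

  def check(lines: Seq[String]): Seq[Problem] =
    lines.grouped(4).toSeq.zipWithIndex.flatMap { case (rec, i) =>
      if (rec.length < 4) {
        // The input ended in the middle of a record.
        Seq(Problem(i, s"truncated record (${rec.length} lines)"))
      } else {
        val header = rec(0)
        val bases = rec(1)
        val separator = rec(2)
        val quals = rec(3)
        Seq(
          if (!header.startsWith("@")) Some(Problem(i, "header does not start with '@'")) else None,
          if (!separator.startsWith("+")) Some(Problem(i, "separator does not start with '+'")) else None,
          if (bases.length != quals.length)
            Some(Problem(i, s"sequence length ${bases.length} != quality length ${quals.length}"))
          else None
        ).flatten
      }
    }
}
```

Reading a local copy of each FASTQ file with `scala.io.Source.fromFile(path).getLines().toVector` and passing the lines to `FastqCheck.check` would list any suspicious records by their zero-based index.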

xubo245 commented Mar 8, 2017

I cannot find any error log messages in IDEA (running on Windows 7).

Output, with the first ten records printed via RDD.take(10):

start:
2017-03-08 10:40:29 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-03-08 10:40:30 WARN  MetricsSystem:71 - Using default name DAGScheduler for source because spark.app.id is not set.
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_2210736_2211206_1:0:0_3:0:0_0", "sequence": "AAAAGAAACTTGGTCCCAAGAGAGGCAGTGGCATGGCTGCCGGGGCCCAA", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_37775980_37776505_1:0:0_2:1:0_1", "sequence": "GTGAGGCAAGATCGCGCCATTGCACTCCAGCCTGGGCAACAAGAGCGAAA", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_82335203_82335645_1:0:0_0:0:0_2", "sequence": "TTATTTTCCTCAATGTAGGGGACAAGTAGGAAAACCAACCCATGAAGGAG", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_198664314_198664834_1:0:0_2:0:0_3", "sequence": "TGCAAACTGTTTTGGGAGCAAAACGAATTATTTCGGTAGCAAGAAGAGAC", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_221652337_221652840_0:0:0_5:0:0_4", "sequence": "ACTAGAGAAACACTTCTAGCTTGTTACCATGGGTACCCAGGGGATTGGGC", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_84654344_84654775_1:0:0_2:0:0_5", "sequence": "AAACCAGGAAAGAAAATTGATAATACTACAATTAAGTAAATGATATTCTA", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_24973382_24973883_0:0:0_0:0:0_6", "sequence": "GGGTGGCTTACCGTCCTGGATATCTGGAGCTCAACCCCACCAGGGACGTT", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_223525105_223525623_2:0:0_0:0:0_7", "sequence": "TCTTGATTACAAGCTAAATATGTGGTGAACTATTCATGAGTTATCCAGGA", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_80637418_80637903_1:0:0_1:0:0_8", "sequence": "CATGGATAAAGACCACATTAATAGTGATGGCATAAATGAGGTCTAGGGAA", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_186229095_186229622_0:0:0_1:0:0_9", "sequence": "TTTTTTTACTCCTACATAAAAATCCTCACAGCAGGTATCCTATTCACTAT", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
count:15458
end

The data was created with the wgsim tool (Heng Li: https://github.com/lh3/wgsim).
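Each printed record carries `readName` and `readInFragment` (0 for the first read of a pair, 1 for its mate), so one way to see which pairs lost a mate is to group the records by read name. This sketch works on plain `(readName, readInFragment)` tuples; `MateCheck` is a hypothetical helper, and extracting those tuples from the loaded RDD (for example through the Avro-generated getters for the two fields, then `collect()` on a small dataset) is an assumption left to the reader.

```scala
// Hypothetical helper: given (readName, readInFragment) tuples, report the
// read names that do not have exactly one fragment 0 and one fragment 1.
object MateCheck {
  def unpaired(reads: Seq[(String, Int)]): Map[String, Seq[Int]] =
    reads
      .groupBy(_._1)                                            // collect fragments per read name
      .map { case (name, frags) => name -> frags.map(_._2).sorted }
      .filter { case (_, frags) => frags != Seq(0, 1) }         // keep missing or duplicated mates
}
```

An empty result means every read name has both mates; any entry shows which fragment numbers were actually loaded for that name.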

xubo245 commented Mar 8, 2017

@heuermh I tested a small dataset with 6 paired-end reads, but loadAlignments loads only 5 pairs.
Code:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.bdgenomics.adam.rdd.ADAMContext

    // Runs inside an object's main method (this.getClass names the app).
    println("start:")
    val fastq1 = "hdfs://Master:9000/xubo/project/alignment/sparkBWA/input/g38/newsmall/newg38L50c6Nhs20Paired1.fastq"
    val fastq2 = "hdfs://Master:9000/xubo/project/alignment/sparkBWA/input/g38/newsmall/newg38L50c6Nhs20Paired2.fastq"
    val conf = new SparkConf().setMaster("local[16]").setAppName(this.getClass().getSimpleName().filter(!_.equals('$')))
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)
    val ac = new ADAMContext(sc)
    // Load both mates of each pair from the two FASTQ files
    val df3 = ac.loadAlignments(fastq1, filePath2Opt = Option(fastq2)).rdd
    df3.take(10).foreach(println)
    println("count:" + df3.count())
    // Load each FASTQ file on its own for comparison
    val df4 = ac.loadAlignments(fastq1).rdd
    println(df4.count)
    val df5 = ac.loadAlignments(fastq2).rdd
    println(df5.count)
    println("end")
    sc.stop()
Result:

start:


2017-03-08 11:08:41 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-03-08 11:08:42 WARN  MetricsSystem:71 - Using default name DAGScheduler for source because spark.app.id is not set.
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_154009142_154009658_0:0:0_0:0:0_0", "sequence": "CTCAGGTGATCCACCTGCCTCGGCCTCCCAAAGTACTGGGATTACAGGTG", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_221066361_221066927_1:0:0_2:0:0_1", "sequence": "ATTATGGAGAAATAAAACTTGAAAAGGTTATATTCAAGAAGGGAAATGAG", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_78454694_78455121_0:0:0_2:0:0_2", "sequence": "AAGTCCTACCTCTAGCACTGATATTTGCTTGCATGCACCAGCATCAGAGC", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_229513854_229514250_1:0:0_0:0:0_3", "sequence": "TAGTGTGTGCAGTGATACGGTTCAGTACCTACCACCCCAAAATGTGGCAC", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 0, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_218786619_218787132_2:0:0_1:0:0_4", "sequence": "GCACTTACTTTTTCACTAACCTAATATTTTGGGAAAAGTAACAAAAATGT", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 1, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_154009142_154009658_0:0:0_0:0:0_0", "sequence": "CATCATGCCTTTTTTTTTTTTTTTTTTTTTTTTTAAGAGCAACGTGATCT", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 1, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_221066361_221066927_1:0:0_2:0:0_1", "sequence": "CAAAGATTGTTACAGTGAGAAGTAGACGGGCAGCTCAGTTTATGATGCGG", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 1, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_78454694_78455121_0:0:0_2:0:0_2", "sequence": "CATCCTCCTGTCAAAGGTGAATCTGCCTTCCTTGCATTAGGTATCCCTCC", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 1, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_229513854_229514250_1:0:0_0:0:0_3", "sequence": "ATTTATTTAAAAGTTTAACATGACATAGGAGCCCTTGGAAATGAAGACCC", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
{"readInFragment": 1, "contigName": null, "start": null, "oldPosition": null, "end": null, "mapq": null, "readName": "chr1_218786619_218787132_2:0:0_1:0:0_4", "sequence": "TCATTTTAAATTATTTTTATAGCTGTCTGATATAATTAGAATGCATAATT", "qual": "22222222222222222222222222222222222222222222222222", "cigar": null, "oldCigar": null, "basesTrimmedFromStart": 0, "basesTrimmedFromEnd": 0, "readPaired": true, "properPair": false, "readMapped": false, "mateMapped": false, "failedVendorQualityChecks": false, "duplicateRead": false, "readNegativeStrand": false, "mateNegativeStrand": false, "primaryAlignment": false, "secondaryAlignment": false, "supplementaryAlignment": false, "mismatchingPositions": null, "origQual": null, "attributes": null, "recordGroupName": null, "recordGroupSample": null, "mateAlignmentStart": null, "mateAlignmentEnd": null, "mateContigName": null, "inferredInsertSize": null}
count:10
5
5
end
@xubo245

xubo245 Mar 8, 2017

@heuermh @fnothaft

I tested the test data shipped with adam-0.21.0. bqsr1-r1.fq has 1952 lines, which should be 488 reads; bqsr1-r2.fq is the same.

There should be 976 reads in total after loadAlignments, but my code loads only 892.

code:

  sparkTest("load fastq") {
    val fastq1Path = testFile("bqsr1-r1.fq")
    val fastq2Path = testFile("bqsr1-r2.fq")
    // Load both mates of each pair into a single RDD
    val align1 = sc.loadAlignments(fastq1Path, filePath2Opt = Option(fastq2Path))
    println(align1.rdd.count())
    println(align1.rdd.count() * 2)
  }

I added the code to org.bdgenomics.adam.cli.Adam2FastqSuite;
the data are in adam-adam-parent-spark2_2.10-0.21.0\adam-cli\src\test\resources.

result:

2017-03-08 11:21:46 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-03-08 11:21:47 WARN  MetricsSystem:71 - Using default name DAGScheduler for source because spark.app.id is not set.
892
1784
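
The expected total follows from FASTQ storing each read as four lines: 1952 / 4 = 488 reads per file, and 2 × 488 = 976 reads for the pair of files. The paired load already returns both mates in one RDD. The small example above prints `count:10` with records from `readInFragment` 0 and 1. So the 892 from the first `println` is the number to compare against 976, and the doubled 1784 overcounts. The sketch below just does that arithmetic. It assumes plain four-line FASTQ with no wrapped sequence lines. `FastqCounts` is a hypothetical name.

```scala
// Sketch: expected read count from FASTQ line counts, assuming standard
// 4-line records (wrapped multi-line FASTQ is not handled).
object FastqCounts {

  def readsFromLines(lineCount: Long): Long = {
    require(lineCount % 4 == 0, s"$lineCount lines is not a multiple of 4")
    lineCount / 4
  }

  def main(args: Array[String]): Unit = {
    val perFile = readsFromLines(1952) // bqsr1-r1.fq and bqsr1-r2.fq each
    val expected = 2 * perFile         // both mates of every pair
    val observed = 892L                // count reported with adam-0.21.0
    println(s"expected=$expected observed=$observed missing=${expected - observed}")
  }
}
```

Running `main` prints `expected=976 observed=892 missing=84`.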

@fnothaft

fnothaft Mar 8, 2017

Member

Oh, interesting! Any chance you could email those files to me at fnothaft@berkeley.edu? I'd be glad to debug them locally.

@xubo245

xubo245 Mar 8, 2017

@fnothaft I have sent an email to you. Thank you very much!

@fnothaft

fnothaft Mar 8, 2017

Member

Thanks @xubo245! I've received your email and will look at the data sometime later today or tomorrow.

@fnothaft

fnothaft Mar 10, 2017

Member

Hmmm, interesting! I assume it was the newg38L50c10000Nhs20Paired*.fastq files that were the 10000+10000 -> 20000 ones. On the current master branch, I get 20000 reads:

    scala> import org.bdgenomics.adam.rdd.ADAMContext._
    import org.bdgenomics.adam.rdd.ADAMContext._

    scala> sc.loadAlignments("data/wgsimCreate/newg38L50c10000Nhs20Paired1.fastq", filePath2Opt = Some("data/wgsimCreate/newg38L50c10000Nhs20Paired2.fastq")).rdd.count
    res1: Long = 20000

On the 0.21.0 tag, I do get dropped reads:

    scala> import org.bdgenomics.adam.rdd.ADAMContext._
    import org.bdgenomics.adam.rdd.ADAMContext._

    scala> sc.loadAlignments("data/wgsimCreate/newg38L50c10000Nhs20Paired1.fastq", filePath2Opt = Some("data/wgsimCreate/newg38L50c10000Nhs20Paired2.fastq")).rdd.count
    res0: Long = 15458

Can you try running on the latest master branch? I'm going to try to do a git bisect (or something of that type) to see if I can find the breaking change.
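
`git bisect` finds a breaking change by binary search over an ordered range of commits. It needs one known-good endpoint and one known-bad endpoint, and it assumes that once a commit is bad, every later commit in the range is also bad. Here the counts were correct in 0.19.0, wrong in 0.21.0, and correct again on master. That means either range can be searched. Searching 0.19.0..0.21.0 finds the change that introduced the drop. Searching 0.21.0..master, with the good/bad test inverted, finds the fix. The Scala sketch below shows the search itself. `isGood` is a hypothetical stand-in for "build ADAM at this commit and check that the paired FASTQ loads with the expected number of reads".

```scala
// Sketch of the binary search behind `git bisect`. Commits are ordered
// oldest to newest; commits.head is known good, commits.last is known bad,
// and the predicate is assumed monotonic (good ... good, bad ... bad).
object BisectSketch {

  def firstBad[A](commits: IndexedSeq[A])(isGood: A => Boolean): A = {
    require(commits.size >= 2, "need a known-good and a known-bad commit")
    var good = 0                // index of a known-good commit
    var bad = commits.size - 1  // index of a known-bad commit
    while (bad - good > 1) {
      val mid = good + (bad - good) / 2
      if (isGood(commits(mid))) good = mid else bad = mid
    }
    commits(bad)                // first commit for which isGood is false
  }
}
```

With git itself, the same loop is driven by `git bisect start <bad> <good>` followed by `git bisect run <script>`. The script's exit status marks each commit: 0 for good, 125 to skip, and other values from 1 to 127 for bad.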

@xubo245

xubo245 Mar 10, 2017

I tested the latest master branch, and the counts are correct.

Only ADAM 0.21.0 and 0.20.0 have this problem.
What causes it? Could you please let me know if you find out?

My project is built with Maven, so I will try replacing the old version with an ADAM jar. When will the new version of ADAM be released?

Thank you very much.

@fnothaft

fnothaft Mar 10, 2017

Member

Hi @xubo245! I believe the error was fixed in 0e57357. We are planning to release ADAM 0.22.0 within the next two weeks (current slated date is 3/18); you can track our progress both here and on ticket #1210.

In the meantime, you can use the ADAM snapshot. A new snapshot gets built and pushed every time there is a commit to ADAM.

@xubo245

xubo245 Mar 10, 2017

Thanks! I tested some samples with 0e57357 and with adam-0.21.1-SNAPSHOT; neither has the read-count problem.

Thank you for your help.

@xubo245

xubo245 Mar 10, 2017

I opened a PR with a test case for this issue: #1433