Difference running markdups with and without projection #1014

Closed
jpdna opened this issue Apr 24, 2016 · 1 comment
Comments

@jpdna
Member

@jpdna jpdna commented Apr 24, 2016

I see a seemingly minor but unexpected difference when running ADAM markdups with and without the -limit_projection flag. My assumption was that this flag is purely a performance optimization and should not affect the result.

I noticed the difference when running ADAM flagstat on two ADAM files produced by markdups, one of which was generated with the -limit_projection flag:

adam-submit --executor-memory=14G --master spark://127.0.0.1:7077 --conf spark.sql.shuffle.partitions=121 -- transform ./input.adam ./output.dupsmarked.adam -mark_duplicate_reads -limit_projection

and one produced by the same command without -limit_projection.

The difference is a shift of 3 reads between two flagstat categories: "primary duplicates - both read and mate mapped" and "primary duplicates - only read mapped". All other counts are identical.

Without -limit_projection:

38632222 + 0 in total (QC-passed reads + QC-failed reads)
1765963 + 0 primary duplicates
1290944 + 0 primary duplicates - both read and mate mapped
475019 + 0 primary duplicates - only read mapped
535062 + 0 primary duplicates - cross chromosome
0 + 0 secondary duplicates
0 + 0 secondary duplicates - both read and mate mapped
0 + 0 secondary duplicates - only read mapped
0 + 0 secondary duplicates - cross chromosome
36854876 + 0 mapped (95.40%:0.00%)
32767453 + 0 paired in sequencing
16383937 + 0 read1
16383516 + 0 read2
26170970 + 0 properly paired (67.74%:0.00%)
29212761 + 0 with itself and mate mapped
1777346 + 0 singletons (4.60%:0.00%)
1324347 + 0 with mate mapped to a different chr
676289 + 0 with mate mapped to a different chr (mapQ>=5)

With -limit_projection:

38632222 + 0 in total (QC-passed reads + QC-failed reads)
1765963 + 0 primary duplicates
1290947 + 0 primary duplicates - both read and mate mapped
475016 + 0 primary duplicates - only read mapped
535062 + 0 primary duplicates - cross chromosome
0 + 0 secondary duplicates
0 + 0 secondary duplicates - both read and mate mapped
0 + 0 secondary duplicates - only read mapped
0 + 0 secondary duplicates - cross chromosome
36854876 + 0 mapped (95.40%:0.00%)
32767453 + 0 paired in sequencing
16383937 + 0 read1
16383516 + 0 read2
26170970 + 0 properly paired (67.74%:0.00%)
29212761 + 0 with itself and mate mapped
1777346 + 0 singletons (4.60%:0.00%)
1324347 + 0 with mate mapped to a different chr
676289 + 0 with mate mapped to a different chr (mapQ>=5)
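
To make the discrepancy easier to spot, here is a minimal sketch that diffs two flagstat outputs and prints only the categories whose counts changed. The helper names `parse_flagstat` and `diff_flagstat` are hypothetical and not part of ADAM; the input is just the four primary-duplicate lines copied from the two reports above, and only the QC-passed column is compared.

```python
def parse_flagstat(text):
    """Map each flagstat category label to its QC-passed count.

    Hypothetical helper: expects lines shaped like '<passed> + <failed> <label>'.
    """
    counts = {}
    for line in text.strip().splitlines():
        passed, _plus, _failed, label = line.split(None, 3)
        counts[label] = int(passed)
    return counts


def diff_flagstat(before, after):
    """Return {label: after - before} for labels whose counts differ."""
    return {
        label: after[label] - before[label]
        for label in before
        if label in after and after[label] != before[label]
    }


# Primary-duplicate lines from the run without -limit_projection.
without_projection = parse_flagstat("""
1765963 + 0 primary duplicates
1290944 + 0 primary duplicates - both read and mate mapped
475019 + 0 primary duplicates - only read mapped
535062 + 0 primary duplicates - cross chromosome
""")

# The same lines from the run with -limit_projection.
with_projection = parse_flagstat("""
1765963 + 0 primary duplicates
1290947 + 0 primary duplicates - both read and mate mapped
475016 + 0 primary duplicates - only read mapped
535062 + 0 primary duplicates - cross chromosome
""")

delta = diff_flagstat(without_projection, with_projection)
for label, change in delta.items():
    print(f"{label}: {change:+d}")
```

On these inputs the script reports +3 for "both read and mate mapped" and -3 for "only read mapped". The total number of primary duplicates is unchanged, so the same reads are marked as duplicates in both runs and only their sub-category differs.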
@fnothaft fnothaft added this to the 0.20.0 milestone Jul 20, 2016
@heuermh heuermh modified the milestones: 0.20.0, 0.22.0 Oct 13, 2016
@fnothaft
Member

@fnothaft fnothaft commented Mar 3, 2017

As referenced in #941, this was caused by an issue in the projection that dropped the read-in-fragment field. Closing; @jpdna, let me know if I am incorrect.

@fnothaft fnothaft closed this Mar 3, 2017
@heuermh heuermh added this to Completed in Release 0.23.0 Mar 8, 2017