Make AlignmentRecordConverter public so that it can be used from other projects #1157

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
4 participants
@tomwhite
Member

tomwhite commented Sep 9, 2016

AlignmentRecordConverter used to be a public class, but was made private in 9764b2c. Unfortunately, this causes a problem for GATK, which uses AlignmentRecordConverter to read ADAM formatted files. This PR reinstates public visibility. There are other converters in the same package that are private that could be changed back, but I haven't done that in this PR.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Sep 9, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1476/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1476/
Test PASSed.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 9, 2016

Member

Hi @tomwhite! Thanks for sending along the patch! It'd be my preference to keep the AlignmentRecordConverter class private (so that we can change underlying interfaces, etc, without breaking user code), so I'm wondering if there might be another way that we could expose this functionality. Are you all looking for the ability to convert an RDD of SAMRecord(Writable) to AlignmentRecords? If so, I think it would be preferable to expose this conversion from inside of the org.bdgenomics.adam.rdd package. I'll take a gander over the GATK code and see what I find. That being said, if there's no way around making AlignmentRecordConverter public, we'll go with that route.

Member

fnothaft commented Sep 9, 2016

Hi @tomwhite! Thanks for sending along the patch! It'd be my preference to keep the AlignmentRecordConverter class private (so that we can change underlying interfaces, etc, without breaking user code), so I'm wondering if there might be another way that we could expose this functionality. Are you all looking for the ability to convert an RDD of SAMRecord(Writable) to AlignmentRecords? If so, I think it would be preferable to expose this conversion from inside of the org.bdgenomics.adam.rdd package. I'll take a gander over the GATK code and see what I find. That being said, if there's no way around making AlignmentRecordConverter public, we'll go with that route.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 12, 2016

Member

I believe early on the formats schema and autogenerated classes were meant as a point of collaboration but now I see them more as an internal implementation detail. Thus I'm in agreement that public APIs for conversion at the level of RDDs seem more appropriate.

Member

heuermh commented Sep 12, 2016

I believe early on the formats schema and autogenerated classes were meant as a point of collaboration but now I see them more as an internal implementation detail. Thus I'm in agreement that public APIs for conversion at the level of RDDs seem more appropriate.

@tomwhite

This comment has been minimized.

Show comment
Hide comment
@tomwhite

tomwhite Sep 12, 2016

Member

AlignmentRecord is still a public class though, so it would be useful to have a utility to convert to and from SAMRecord. SAMRecordConverter is still public, so it's odd that its counterpart AlignmentRecordConverter is not. Perhaps the private methods on AlignmentRecordConverter could be moved elsewhere?

Also, I think it's more useful to have the conversion utility at the record level, rather than the RDD level, since then it can be used outside Spark. Also, there are some issues to do with how headers are transmitted that can be handled differently by the downstream client.

Member

tomwhite commented Sep 12, 2016

AlignmentRecord is still a public class though, so it would be useful to have a utility to convert to and from SAMRecord. SAMRecordConverter is still public, so it's odd that its counterpart AlignmentRecordConverter is not. Perhaps the private methods on AlignmentRecordConverter could be moved elsewhere?

Also, I think it's more useful to have the conversion utility at the record level, rather than the RDD level, since then it can be used outside Spark. Also, there are some issues to do with how headers are transmitted that can be handled differently by the downstream client.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 12, 2016

Member

I believe early on the formats schema and autogenerated classes were meant as a point of collaboration but now I see them more as an internal implementation detail.

I would strongly disagree here; I still very much so see the schemas as an interchange format.

SAMRecordConverter is still public, so it's odd that its counterpart AlignmentRecordConverter is not.

TBH, this was just an oversight. Both of them were meant to be moved to private when I went through and cleaned up this package a bit ago.

Also, I think it's more useful to have the conversion utility at the record level, rather than the RDD level, since then it can be used outside Spark. Also, there are some issues to do with how headers are transmitted that can be handled differently by the downstream client.

I think these are two reasonable points. I am OK with merging this PR to resolve them. That being said, what I might want to do to follow up is to:

  • Do a review pass and make some of the AlignmentRecordConverter methods private
  • Make SAMFileHeaderWritable private to ADAM, and change the AlignmentRecordConverter.convert signature to take a SAMFileHeader

I'm going to leave this open for another 24hrs, and unless there is a -1 I will merge and create a ticket for my two bullet points.

Member

fnothaft commented Sep 12, 2016

I believe early on the formats schema and autogenerated classes were meant as a point of collaboration but now I see them more as an internal implementation detail.

I would strongly disagree here; I still very much so see the schemas as an interchange format.

SAMRecordConverter is still public, so it's odd that its counterpart AlignmentRecordConverter is not.

TBH, this was just an oversight. Both of them were meant to be moved to private when I went through and cleaned up this package a bit ago.

Also, I think it's more useful to have the conversion utility at the record level, rather than the RDD level, since then it can be used outside Spark. Also, there are some issues to do with how headers are transmitted that can be handled differently by the downstream client.

I think these are two reasonable points. I am OK with merging this PR to resolve them. That being said, what I might want to do to follow up is to:

  • Do a review pass and make some of the AlignmentRecordConverter methods private
  • Make SAMFileHeaderWritable private to ADAM, and change the AlignmentRecordConverter.convert signature to take a SAMFileHeader

I'm going to leave this open for another 24hrs, and unless there is a -1 I will merge and create a ticket for my two bullet points.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 13, 2016

Member

I didn't hear any more peeps, so I've merged this as fd2c27b. Thanks, @tomwhite!

Member

fnothaft commented Sep 13, 2016

I didn't hear any more peeps, so I've merged this as fd2c27b. Thanks, @tomwhite!

@fnothaft fnothaft closed this Sep 13, 2016

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 13, 2016

Member

@fnothaft please do create a follow up issue as you describe above.

Also, I think it's more useful to have the conversion utility at the record level, rather than the RDD level, since then it can be used outside Spark.

If that were the case, it would be preferable to me to have the conversion stuff written in Java instead of Scala and be a library external to ADAM.

It appears though that there are different ideas of what our bdg-formats are for. :)

Member

heuermh commented Sep 13, 2016

@fnothaft please do create a follow up issue as you describe above.

Also, I think it's more useful to have the conversion utility at the record level, rather than the RDD level, since then it can be used outside Spark.

If that were the case, it would be preferable to me to have the conversion stuff written in Java instead of Scala and be a library external to ADAM.

It appears though that there are different ideas of what our bdg-formats are for. :)

@tomwhite

This comment has been minimized.

Show comment
Hide comment
@tomwhite

tomwhite Sep 15, 2016

Member

Thanks @fnothaft! I agree with your suggested cleanups.

Member

tomwhite commented Sep 15, 2016

Thanks @fnothaft! I agree with your suggested cleanups.

@tomwhite tomwhite deleted the tomwhite:public_alignment_record_converter branch Sep 15, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment