[BDG-FORMATS-29] Re-organize the Feature schema #30

Merged: 1 commit into bigdatagenomics:master on Oct 7, 2014

tdanford
Contributor

This PR attempts to re-organize the Feature schema, following the
discussions that we (Timothy, Uri, Matt, Frank) have had by email and on
the phone. The main requirements are:

  • less file-format dependence in the field choice ('qValue'-like fields
    could be relegated to the 'attributes' field)
  • fewer fields to improve the memory footprint
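
To make the first requirement concrete, here is a minimal Python sketch of what "relegating" format-specific fields to `attributes` could look like. It is not the schema change from this PR: `CORE_FIELDS`, `to_generic_feature`, the `contig`/`start`/`end` names, and the sample row are all hypothetical, and it assumes `attributes` is a string-to-string map.

```python
# Hypothetical set of fields kept directly on the record. Only "value" and
# the peak-specific names below come from the PR; the rest are assumptions.
CORE_FIELDS = ("contig", "start", "end", "value")


def to_generic_feature(row):
    """Split a flat row into core fields plus a string-valued attributes map."""
    feature = {name: row.get(name) for name in CORE_FIELDS}
    # Anything that is not a core field is stored as a string attribute.
    feature["attributes"] = {
        key: str(val)
        for key, val in row.items()
        if key not in CORE_FIELDS and val is not None
    }
    return feature


# Made-up peak-style row carrying the fields the old schema had.
peak_row = {
    "contig": "chr1", "start": 100, "end": 200, "value": 5.0,
    "signalValue": 12.5, "pValue": 3.2, "qValue": 1.1, "peak": 40,
}
feature = to_generic_feature(peak_row)
print(feature["attributes"])
```

The trade-off in this sketch is that the record has fewer top-level fields, while consumers that need `qValue` have to parse it back out of a string.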

@tdanford changed the title from "[FORMATS-92] Re-organize the Feature schema" to "[FORMATS-29] Re-organize the Feature schema" on Sep 15, 2014
@tdanford
Contributor Author

This PR is a fix for Issue #29.


In src/main/resources/avro/bdg.avdl:

/**
Cross-references into other databases.
Key is database name and value is the accession.
*/
map<string> dbxrefs = null;
Contributor

Last time, you mentioned the possibility of multiple accessions in a single database. Does this reflect your resolution of this potential issue?

Contributor Author

Yup! This is meant to be an ID->DB map, but I haven't done any downstream testing yet.

Contributor

Is it ever possible that an object can have one accession in multiple databases? I don't know if we're talking about a 7-sigma issue here, but is it worth going back to your original proposal of having an array of Dbxref objects?

Contributor Author

Well, an ID->DB (as opposed to DB->ID) mapping would handle "multiple acc's in one DB," right? I'm agnostic either way.
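
The difference between the two map directions can be sketched in a few lines of Python. This is only an illustration of the argument, not code from bdg-formats; the database name and accessions are made up.

```python
# Hypothetical cross-references for one feature, as (database, accession)
# pairs: two accessions in the same database.
xrefs = [("GenBank", "A1"), ("GenBank", "A2")]

# DB -> ID: keyed by database name, as the schema comment describes.
db_to_id = {}
for db, acc in xrefs:
    db_to_id[db] = acc  # a second accession in the same DB overwrites the first

# ID -> DB: keyed by accession, so both entries survive.
id_to_db = {}
for db, acc in xrefs:
    id_to_db[acc] = db

print(db_to_id)
print(id_to_db)
```

With these inputs the DB->ID map keeps only one of the two accessions, while the ID->DB map keeps both.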

Contributor

No, I mean, what if the same accession is used in multiple databases? Does
this ever happen?
On Sep 15, 2014 12:39 PM, "Timothy Danford" notifications@github.com wrote:

> In src/main/resources/avro/bdg.avdl:
>
>     union { null, double } signalValue = null;
>     union { null, double } pValue = null;
>     union { null, double } qValue = null;
>     union { null, long } peak = null;
>     /**
>     The value associated with this feature (if double)
>     */
>     union { null, double } value = null;
>     /**
>     Cross-references into other databases.
>     Key is database name and value is the accession.
>     */
>     map<string> dbxrefs = null;
>
> Well, an ID->DB (as opposed to DB->ID) mapping would handle "multiple
> acc's in one DB," right? I'm agnostic either way.

Contributor Author

Oh. Um. Yeah, maybe. "7-sigma," like you said, but we should probably plan for it.

Contributor

Should we go back to your initial suggestion then? Of an array of Dbxref objects?
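
For comparison, a rough Python sketch of the "array of Dbxref objects" idea follows. The `Dbxref` class with `db` and `accession` fields, the two helper functions, and the sample data are assumptions made for illustration; they are not the record definition from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dbxref:
    """Hypothetical cross-reference: one (database, accession) pair."""
    db: str
    accession: str


def databases_for(dbxrefs, accession):
    """All databases in which the given accession appears."""
    return sorted(x.db for x in dbxrefs if x.accession == accession)


def accessions_in(dbxrefs, db):
    """All accessions recorded for the given database."""
    return sorted(x.accession for x in dbxrefs if x.db == db)


# Made-up data: two accessions in DB1, and accession X1 reused in DB2.
dbxrefs = [Dbxref("DB1", "X1"), Dbxref("DB1", "X2"), Dbxref("DB2", "X1")]

# The same data squeezed into an ID -> DB map loses the DB1/X1 pair,
# because the later DB2 entry overwrites it.
id_to_db = {x.accession: x.db for x in dbxrefs}

print(databases_for(dbxrefs, "X1"))
print(accessions_in(dbxrefs, "DB1"))
print(id_to_db)
```

A list of pairs keeps every combination, at the cost of a linear scan (or a separately built index) for lookups.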

@laserson
Contributor

+1 overall

@tdanford force-pushed the revised-feature branch 3 times, most recently from c39a1c2 to 9020236, on September 15, 2014 18:04
@tdanford changed the title from "[FORMATS-29] Re-organize the Feature schema" to "[BDG-FORMATS-29] Re-organize the Feature schema" on Sep 15, 2014
@fnothaft
Member

@tdanford would you like to get this in before I cut a new bdg-formats release?

@tdanford
Contributor Author

I don't think it's necessary, no.

@fnothaft
Member

OK, thanks. I'll cut the release now.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/bdg-formats-prb/44/

@tdanford
Contributor Author

tdanford commented Oct 7, 2014

Should this be rebased down before merging?

ANSWER: YES.

Let me do that, real quick.

@fnothaft
Member

fnothaft commented Oct 7, 2014

Yes, please squash.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/bdg-formats-prb/45/

fnothaft added a commit that referenced this pull request on Oct 7, 2014:
[BDG-FORMATS-29] Re-organize the Feature schema
@fnothaft merged commit 4f86ad1 into bigdatagenomics:master on Oct 7, 2014
@fnothaft
Member

fnothaft commented Oct 7, 2014

Merged! Thanks @tdanford!

@tdanford deleted the revised-feature branch on October 9, 2014 10:04