Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Store all variant headers and also combine them for an overall header #301

Merged
merged 2 commits into from
Oct 26, 2014

Conversation

jmthibault79
Copy link
Contributor

Before: the first variant file's header was used for all variants in MultipleVariantReader/Iterator

Now: each variant returned by MultipleVariantIterator is associated with its source header, and MultipleVariantReader::combined_header() returns the combination of all input headers.

@MauricioCarneiro
Copy link
Contributor

very interesting that clang failed but gcc passed, any idea what's going on?

@jmthibault79
Copy link
Contributor Author

I can't reproduce using clang on Mavericks, and I don't have a working clang environment on a gsa host. Any ideas on how to solve this?

@MauricioCarneiro
Copy link
Contributor

Annoying... :-/

We need to ask help to install clang 3.5 on the server. Can you ask them?
There are packages for all systems.

@jmthibault79
Copy link
Contributor Author

I asked last week. No response yet. BITS ticket is INC0037669.

@MauricioCarneiro
Copy link
Contributor

cool , you can always raise the priority by escalating it to Chris Dwan's
attention.

if you link your bug and mention that your work is stalled because of this,
they will prioritize it accordingly.

Another good thing we could do would be to ask IT for a Ubuntu 12.04 VM, so
we can have a "mimic'd travis environment" to test things out. We can't
always ask Travis for a VM.

@lbergelson
Copy link
Member

@MauricioCarneiro I've been using vagrant to spin up ubuntu vms locally. It's easy and has the advantage of being able to be restarted from a clean state with a button push. I couldn't get it to be exactly like whatever the travis vm is though. I had a number of builds that went fine on my local ubuntu 12.04 vm and then failed on travis.

@MauricioCarneiro
Copy link
Contributor

@droazen did you pick this one up?

@droazen
Copy link
Contributor

droazen commented Oct 15, 2014

I have poked help about it and asked them to prioritize the ticket.

@droazen
Copy link
Contributor

droazen commented Oct 17, 2014

clang 3.5 now available as a dotkit.

@kgururaj kgururaj mentioned this pull request Oct 21, 2014
droazen referenced this pull request Oct 21, 2014
take care of bcf_*_vector_end values implicitly
A more serious bug in the previous implementation of the == operator was
in the way string fields were handled. Points to remember for string
fields:
1. The size() function returns the number of characters allocated for the string.
2. The variable m_num_bytes also stores the number of characters
allocated for the string (same as size())
3. When operator[i] is called for string fields, the address that is
accessed is m_format_ptr->p + i*m_num_bytes (not m_format_ptr->p + i).
The address accessed is outside the original allocated memory.

Moral of the story: use iterators always, or remember the intricacies of
string v/s other types of fields.

Similar fix for SharedField also
@droazen
Copy link
Contributor

droazen commented Oct 23, 2014

A fix for the clang issues (which were actually undefined behavior resulting from bugs in gamgee) has been worked out in #316 -- we're going to apply the fix to this jt_mvr_fixes branch so that this pull request can finally be reviewed properly and merged.

take care of bcf_*_vector_end values implicitly
A more serious bug in the previous implementation of the == operator was
in the way string fields were handled. Points to remember for string
fields:
a. The size() function returns the number of characters allocated for the string.
b. The variable m_num_bytes also stores the number of characters
allocated for the string (same as size())
c. When operator[i] is called for string fields, the address that is
accessed is m_format_ptr->p + i*m_num_bytes (not m_format_ptr->p + i).
The address accessed is outside the original allocated memory.

Moral of the story: use iterators always, or remember the intricacies of
string v/s other types of fields.

Similar fix for SharedField also

2. Use references to avoid copies where possible as per David's
suggestions
3. Specialized operator[] for IndividualFieldValue<string> and
SharedField<string> so that if the original VCF field type is of type
String, the index argument for operator [] cannot be greater than 0. If
it is greater than 0, then an out_of_range exception is thrown.

3. Added extra out of bounds test cases (by catching exceptions)

4. Fixed a memory leak in file_utils.h
@kgururaj
Copy link
Collaborator

Moved the commit for the clang fix to Joel's branch

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.12%) when pulling 37d7964 on jt_mvr_fixes into 7bdce8e on master.

for (auto i=0u; i != size(); ++i) {
if (!utils::bcf_check_equal_primitive(operator[](i), other[i]))
return false;
//Use iterators where possible as they take care of field sizes, bcf_*_vector_end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't we just check that end() - begin() == other.end() - other.begin() ?

This would save us iterating over if the sizes are mismatched

I forgot that in the new implementation of the iterator, once it sees a vector_end it jumps to the end(). You are right, this is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to add a comment explaining this rather unexpected behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

David and I discussed this, the problem occurs for variable length vector
fields with a bcf_vector_end value marking the end of the vector. The
vector capacity might be different for the 2 vectors being compared, but
the number of valid elements (including bcf_vector_end) could be the same.
-Karthik

On Sun, Oct 26, 2014 at 2:19 PM, Mauricio Carneiro <notifications@github.com

wrote:

In gamgee/individual_field_value.h:

@@ -110,12 +110,20 @@ class IndividualFieldValue {
bool operator==(const IndividualFieldValue& other) const {
if (this == &other)
return true;

  • if (size() != other.size())
  •  return false;
    
  • for (auto i=0u; i != size(); ++i) {
  •  if (!utils::bcf_check_equal_primitive(operator[](i), other[i]))
    
  •    return false;
    
  • //Use iterators where possible as they take care of field sizes, bcf_*_vector_end

couldn't we just check that end() - begin() == other.end() - other.begin()
?

This would save us iterating over if the sizes are mismatched


Reply to this email directly or view it on GitHub
https://github.com/broadinstitute/gamgee/pull/301/files#r19383832.

@MauricioCarneiro
Copy link
Contributor

Good to merge!

MauricioCarneiro pushed a commit that referenced this pull request Oct 26, 2014
Store all variant headers and also combine them for an overall header
@MauricioCarneiro MauricioCarneiro merged commit d92bf08 into master Oct 26, 2014
@MauricioCarneiro MauricioCarneiro deleted the jt_mvr_fixes branch October 26, 2014 21:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants