-
Notifications
You must be signed in to change notification settings - Fork 13
Store all variant headers and also combine them for an overall header #301
Conversation
very interesting that clang failed but gcc passed, any idea what's going on? |
I can't reproduce using clang on Mavericks, and I don't have a working clang environment on a gsa host. Any ideas on how to solve this? |
Annoying... :-/ We need to ask help to install clang 3.5 on the server. Can you ask them? |
I asked last week. No response yet. BITS ticket is INC0037669. |
cool , you can always raise the priority by escalating it to Chris Dwan's if you link your bug and mention that your work is stalled because of this, Another good thing we could do would be to ask IT for a Ubuntu 12.04 VM, so |
@MauricioCarneiro I've been using vagrant to spin up ubuntu vms locally. It's easy and has the advantage of being able to be restarted from a clean state with a button push. I couldn't get it to be exactly like whatever the travis vm is though. I had a number of builds that went fine on my local ubuntu 12.04 vm and then failed on travis. |
@droazen did you pick this one up? |
I have poked help about it and asked them to prioritize the ticket. |
clang 3.5 now available as a dotkit. |
take care of bcf_*_vector_end values implicitly A more serious bug in the previous implementation of the == operator was in the way string fields were handled. Points to remember for string fields: 1. The size() function returns the number of characters allocated for the string. 2. The variable m_num_bytes also stores the number of characters allocated for the string (same as size()) 3. When operator[i] is called for string fields, the address that is accessed is m_format_ptr->p + i*m_num_bytes (not m_format_ptr->p + i). The address accessed is outside the original allocated memory. Moral of the story: use iterators always, or remember the intricacies of string v/s other types of fields. Similar fix for SharedField also
A fix for the clang issues (which were actually undefined behavior resulting from bugs in gamgee) has been worked out in #316 -- we're going to apply the fix to this jt_mvr_fixes branch so that this pull request can finally be reviewed properly and merged. |
take care of bcf_*_vector_end values implicitly A more serious bug in the previous implementation of the == operator was in the way string fields were handled. Points to remember for string fields: a. The size() function returns the number of characters allocated for the string. b. The variable m_num_bytes also stores the number of characters allocated for the string (same as size()) c. When operator[i] is called for string fields, the address that is accessed is m_format_ptr->p + i*m_num_bytes (not m_format_ptr->p + i). The address accessed is outside the original allocated memory. Moral of the story: use iterators always, or remember the intricacies of string v/s other types of fields. Similar fix for SharedField also 2. Use references to avoid copies where possible as per David's suggestions 3. Specialized operator[] for IndividualFieldValue<string> and SharedField<string> so that if the original VCF field type is of type String, the index argument for operator [] cannot be greater than 0. If it is greater than 0, then an out_of_range exception is thrown. 3. Added extra out of bounds test cases (by catching exceptions) 4. Fixed a memory leak in file_utils.h
0f59d8c
to
37d7964
Compare
Moved the commit for the clang fix to Joel's branch |
for (auto i=0u; i != size(); ++i) { | ||
if (!utils::bcf_check_equal_primitive(operator[](i), other[i])) | ||
return false; | ||
//Use iterators where possible as they take care of field sizes, bcf_*_vector_end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couldn't we just check that end() - begin() == other.end() - other.begin()
?
This would save us iterating over if the sizes are mismatched
I forgot that in the new implementation of the iterator, once it sees a vector_end it jumps to the end(). You are right, this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to add a comment explaining this rather unexpected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
David and I discussed this, the problem occurs for variable length vector
fields with a bcf_vector_end value marking the end of the vector. The
vector capacity might be different for the 2 vectors being compared, but
the number of valid elements (including bcf_vector_end) could be the same.
-Karthik
On Sun, Oct 26, 2014 at 2:19 PM, Mauricio Carneiro <notifications@github.com
wrote:
In gamgee/individual_field_value.h:
@@ -110,12 +110,20 @@ class IndividualFieldValue {
bool operator==(const IndividualFieldValue& other) const {
if (this == &other)
return true;
- if (size() != other.size())
return false;
- for (auto i=0u; i != size(); ++i) {
if (!utils::bcf_check_equal_primitive(operator[](i), other[i]))
return false;
- //Use iterators where possible as they take care of field sizes, bcf_*_vector_end
couldn't we just check that end() - begin() == other.end() - other.begin()
?This would save us iterating over if the sizes are mismatched
—
Reply to this email directly or view it on GitHub
https://github.com/broadinstitute/gamgee/pull/301/files#r19383832.
Good to merge! |
Store all variant headers and also combine them for an overall header
Before: the first variant file's header was used for all variants in MultipleVariantReader/Iterator
Now: each variant returned by MultipleVariantIterator is associated with its source header, and MultipleVariantReader::combined_header() returns the combination of all input headers.