As brought up by Dan Robinson on the mailing list (thank you for catching this!), the format documents are inconsistent with the imported ValueVectors code in how they represent nulls. Since I drafted these format documents initially I'll take the blame for the inconsistency, but:
Drill / ValueVectors uses the value 0 for null data, and 1 for non-null data
The format document currently states the opposite (values are null if the bit is set)
I can see arguments both ways, but one argument for the ValueVectors style is that values must be explicitly set to be non-null, versus uninitialized values being accidentally interpreted as being non-null. When initializing a bitmap, one can memset the bits to 0, then set them to 1 when non-null values are appended during construction.
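A minimal sketch of that construction pattern, in Python for brevity. The class name `ValidityBitmapBuilder`, its methods, and the least-significant-bit-first packing are assumptions made for illustration, not something taken from the format documents or from ValueVectors; the zero-filled `bytearray` stands in for the `memset` call.

```python
class ValidityBitmapBuilder:
    """Hypothetical fixed-capacity builder: bit i is 1 when slot i is non-null."""

    def __init__(self, capacity):
        # bytearray(n) is zero-filled, which plays the role of
        # memset(bits, 0, n): every slot starts out null.
        self.bits = bytearray((capacity + 7) // 8)
        self.length = 0

    def append(self, is_valid):
        # Only non-null values touch the bitmap; null slots keep their 0 bit.
        # Bit order (least significant bit first) is an assumption.
        if is_valid:
            self.bits[self.length // 8] |= 1 << (self.length % 8)
        self.length += 1

    def is_valid(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))


builder = ValidityBitmapBuilder(10)
for value in [7, None, 3]:
    builder.append(value is not None)
```

Slots that were never appended (index 3 and up here) read back as null, which is the safety property argued for above.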
Jacques Nadeau / @jacques-n:
I consider the bitmap to be a validity map as opposed to a null map. I've also seen a couple places where it is nice to zero out values that are null using the zero in the bitmap without a condition... although I can't remember where we took advantage of this previously.
Dan Robinson / @danrobinson:
Being able to bitwise-& against the null bitmask definitely seems nice, although (returning to my other idea from the e-mail list) if the spec required values in nulled slots to be zeroed out, you wouldn't even have to do this.
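One way to picture zeroing null slots without a condition, as a rough sketch only. The function `zero_nulls` is hypothetical, and the validity flags are kept as one 0/1 integer per slot for readability, whereas a real bitmap would pack eight slots per byte. Python still iterates here, so this shows the masking arithmetic rather than a truly branch-free kernel.

```python
def zero_nulls(values, valid):
    # -v is 0 when v == 0 and -1 (all bits set) when v == 1, so the AND
    # keeps values in valid slots and zeroes null slots with no `if`.
    return [x & -v for x, v in zip(values, valid)]


values = [7, 99, 3, 42]   # slots 1 and 3 hold leftover garbage
valid = [1, 0, 1, 0]      # 1 = non-null, 0 = null
cleaned = zero_nulls(values, valid)
```

With nulls encoded as 1 instead, the mask would have to be derived from the inverted flag first.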
Wes McKinney / @wesm:
Since we already have production code (i.e. Drill) using 0 as null, and it's consistent with Postgres, I'm inclined to stick with that.
I expect that the null bitmap will also be used in practice in conjunction with evaluated predicates, so an aggregation will take in only the values that are both included by the predicate and not null. If nulls are 1, then you need to use included[i] & ~nulls[i] versus included[i] & valid[i].
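To make the comparison concrete, here is a small sketch of both encodings feeding the same aggregation. The function names and sample data are made up for illustration, and the masks use one 0/1 integer per slot rather than packed bits.

```python
def masked_sum_valid(values, included, valid):
    # Validity encoding (1 = non-null): the two masks combine with a plain AND.
    return sum(x for x, inc, ok in zip(values, included, valid) if inc & ok)


def masked_sum_nulls(values, included, nulls):
    # Null encoding (1 = null): the null flag has to be inverted first.
    return sum(x for x, inc, null in zip(values, included, nulls) if inc & ~null)


values = [10, 20, 30, 40]
included = [1, 1, 0, 1]           # predicate result
valid = [1, 0, 1, 1]              # slot 1 is null
nulls = [1 - v for v in valid]    # same information, opposite encoding

total_valid = masked_sum_valid(values, included, valid)
total_nulls = masked_sum_nulls(values, included, nulls)
```

Both calls add up slots 0 and 3 only; the validity encoding simply saves the extra inversion on every element.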
Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm
Note: This issue was originally created as ARROW-62. Please see the migration documentation for further details.