Format: Are the nulls bits 0 or 1 for null values? #15479

asfimport · 2016-03-11T00:24:56Z

As Dan Robinson pointed out on the mailing list (thank you for catching this!), the format documents are inconsistent with the imported ValueVectors code in how they represent nulls. I drafted these format documents initially, so I'll take the blame for the inconsistency:

- Drill / ValueVectors uses the value 0 for null data, and 1 for non-null data
- The format document currently states the opposite (values are null if the bit is set)

I can see arguments both ways, but one argument for the ValueVectors style is that values must be explicitly set to be non-null, versus uninitialized values being accidentally interpreted as being non-null. When initializing a bitmap, one can memset the bits to 0, then set them to 1 when non-null values are appended during construction.
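
As a rough illustration of the construction pattern described above, here is a minimal Python sketch. The helper names (`new_validity_bitmap`, `set_valid`, `is_valid`) are hypothetical, and the least-significant-bit-first ordering within each byte is an assumption made for this example, not something stated in this issue.

```python
def new_validity_bitmap(length):
    # bytearray(n) is zero-filled, the analogue of memset(bits, 0, n):
    # every slot starts out null until it is explicitly marked valid.
    return bytearray((length + 7) // 8)


def set_valid(bitmap, i):
    # Assumed bit order: bit (i % 8) of byte (i // 8), least significant first.
    bitmap[i // 8] |= 1 << (i % 8)


def is_valid(bitmap, i):
    return ((bitmap[i // 8] >> (i % 8)) & 1) == 1


# Append-style construction: only non-null values flip their bit to 1.
values = [7, None, 3, None, 5]
bitmap = new_validity_bitmap(len(values))
for i, v in enumerate(values):
    if v is not None:
        set_valid(bitmap, i)

validity = [is_valid(bitmap, i) for i in range(len(values))]
```

With this convention, a slot that the builder never touches reads back as null rather than as a valid value.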

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-62. Please see the migration documentation for further details.

asfimport · 2016-03-13T20:34:10Z

Jacques Nadeau / @jacques-n:
I consider the bitmap to be a validity map as opposed to a null map. I've also seen a couple places where it is nice to zero out values that are null using the zero in the bitmap without a condition... although I can't remember where we took advantage of this previously.

asfimport · 2016-03-14T01:22:55Z

Dan Robinson / @danrobinson:
Being able to bitwise-& against the null bitmask definitely seems nice, although (returning to my other idea from the e-mail list) if the spec required values in nulled slots to be zeroed out, you wouldn't even have to do this.
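
A small sketch of the branch-free zeroing idea discussed in the two comments above, written in Python for brevity. The function name `zero_nulls` and the one-integer-per-slot layout of the validity bits are assumptions for illustration; a real implementation would work on packed bitmaps.

```python
def zero_nulls(values, valid_bits):
    # -bit is 0 for a null slot and -1 (all ones) for a valid slot,
    # so v & -bit yields 0 or v without any conditional.
    return [v & -bit for v, bit in zip(values, valid_bits)]


raw = [7, 99, 3, 42, 5]        # slots 1 and 3 hold leftover garbage
valid_bits = [1, 0, 1, 0, 1]   # 0 means null, 1 means valid
cleaned = zero_nulls(raw, valid_bits)
```

If nulls were encoded as 1 instead, the mask would first have to be inverted before it could be used this way.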

asfimport · 2016-03-14T03:33:08Z

Dan Robinson / @danrobinson:
For whatever it's worth: it seems PostgreSQL uses 0 in a null bitmap to indicate null values (http://www.postgresql.org/docs/8.0/static/storage-page-layout.html) while MySQL and SQL Server use 1 (https://dev.mysql.com/doc/internals/en/null-bitmap.html, http://www.sqlpassion.at/archive/2011/06/29/the-mystery-of-the-null-bitmap-mask/). And of course Drill uses 0, while NumPy uses 1. So there does not seem to be an established convention yet. IMHO the validity-map approach that uses 0 is a little more elegant.

asfimport · 2016-03-14T03:50:43Z

Wes McKinney / @wesm:
Since we already have production code (i.e. Drill) using 0 as null, and it's consistent with Postgres, I'm inclined to stick with that.

I expect that the null bitmap will also be used in practice in conjunction with evaluated predicates, so in aggregations you will include only values that are selected by the predicate and not null. If nulls are 1, then you need to use included[i] & ~nulls[i] versus included[i] & valid[i].
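
To make the comparison concrete, here is a minimal Python sketch that treats a plain integer as a 4-row bitmap, with bit i standing for row i. The data, the variable names, and the helper `masked_sum` are made up for illustration.

```python
values = [10, 20, 30, 40]
n = len(values)
mask = (1 << n) - 1       # keeps only the low n bits after a bitwise NOT

included = 0b1011         # predicate selected rows 0, 1 and 3
valid = 0b1101            # validity map: rows 0, 2 and 3 are non-null
nulls = ~valid & mask     # the same information as a null map (1 means null)

# Validity map: a single AND combines the two bitmaps.
selected_with_valid = included & valid

# Null map: the bitmap has to be inverted before the AND.
selected_with_nulls = included & ~nulls & mask


def masked_sum(values, selected):
    # Sum the rows whose bit is set in the combined bitmap.
    return sum(v for i, v in enumerate(values) if (selected >> i) & 1)


total = masked_sum(values, selected_with_valid)
```

Both expressions select the same rows; the validity-map form just avoids the extra inversion.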

asfimport · 2016-03-23T02:17:12Z

Wes McKinney / @wesm:
see patch #34

asfimport · 2016-03-25T02:20:05Z

Wes McKinney / @wesm:
Issue resolved by pull request 34
#34

asfimport closed this as completed Mar 25, 2016

asfimport assigned wesm Jan 10, 2023
