Skip to content

[BEAM-6604] Check for null values when encoding Key columns.#7747

Merged
chamikaramj merged 1 commit intoapache:masterfrom
nielm:nielmbeam
Feb 8, 2019
Merged

[BEAM-6604] Check for null values when encoding Key columns.#7747
chamikaramj merged 1 commit intoapache:masterfrom
nielm:nielmbeam

Conversation

@nielm
Copy link
Contributor

@nielm nielm commented Feb 6, 2019

When encoding Key Column values, if the column value is unspecified, assume that the value is null.

This corrects an NPE when encoding keys.
@chamikaramj

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
Build Status --- --- ---

for (SpannerSchema.KeyPart part : schema.getKeyParts(m.getTable())) {
Value val = mutationMap.get(part.getField());
if (val.isNull()) {
if (val == null || val.isNull()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically we consider all fields that are omitted as NULL ? Isn't it better to ask users to explicitly set these values to NULL ? I'm worried if this will result in NULL values being inserted due to bugs in user code.

Another option might be to make this more explicit, for example, by adding a transform builder method withNullAsDefault().

Also, looks like Spanner supports NOT NULL columns which can result in a confusing behavior in-combination with a default non-explicit NULL from Beam.

WDYT ?

Copy link
Contributor Author

@nielm nielm Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the behavior of a normal insert statement that omits a column value:

CREATE TABLE test (
    key1 STRING(MAX), 
    key2 STRING(max)) 
    PRIMARY KEY (key1, key2)
INSERT INTO test 
    (key1) 
    VALUES ("a")

This will result in a row with values ("a", <NULL>)
So assuming a missed value == NULL is the same behavior as the database.

Note also that this code is only concerned with Primary Key columns, and the encoded value is only used for sorting mutations in a bundle by table and primary key -- the value encoded here is never written to the database.

NOT NULL constraints can be set on any column. If the Mutation is an insert and does not specify a column which is NOT NULL (or sets it to NULL), then the insert will fail, and the Mutation appended to a PCollection of failed MutationGroups which is output by the SpannerIO.Write transform.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Thanks for clarifying.

When encoding Key Column values, if the column value is
unspecified, assume that the value is null.
@chamikaramj
Copy link
Contributor

Run Beam PostCommit

@chamikaramj
Copy link
Contributor

Run Java PostCommit

@chamikaramj
Copy link
Contributor

LGTM. Will merge after post-commit tests pass.

@chamikaramj chamikaramj merged commit 1512551 into apache:master Feb 8, 2019
@nielm nielm changed the title [BEAM-4359] Check for null values when encoding Key columns. [BEAM-6604] Check for null values when encoding Key columns. Feb 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants