Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve handling of NULLs #5019

Merged
merged 7 commits into from Apr 14, 2020

Conversation

big-andy-coates
Copy link
Contributor

@big-andy-coates big-andy-coates commented Apr 7, 2020

Description

This PR replaces #4869. It has many of the same changes, but not SqlNull type, and more fixes on top.

fixes: #4912 and may other bugs around use of NULL, you can now:

  • insert NULL values using INSERT INTO, for example INSERT INTO foo VALUES (NULL, NULL);, including use of ARRAY, MAP & STRUCT constructors with NULLs.
  • create MAPs with null keys and values, for example MAP('k0' := NULL, CAST(NULL AS STRING) := 10). (At least one key needs to be non-null or CAST so that ksqlDB can determine the schema of the map).
  • CAST NULL to any type, including the complex types ARRAY, MAP and STRUCT.

Also, better error messages, on:

  • Pull query on NULL key. Previously, blew up for non-string keys.
  • Pull query with non-literal expression for the key. Previously blew up.

Testing done

usual.

Reviewing notes:

Commits are broken down as follows:

  1. Production changes. I've added notes inline where I thought it might help.
  2. Remove some unused functions in production code.
  3. Test code updates (including (r)QTT).
  4. historical plans

I advise reviewing each commit separately. It may help to view the QTT & RQTT test changes in the third commit to understand the problems being solved.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

fixes: confluentinc#4912 and may other bugs around use of NULL.

fixes a few issues with NULLs in ksqlDB. You can now:

 - insert NULL values using `INSERT INTO`, for example `INSERT INTO foo VALUES (NULL, NULL);`, including use of `ARRAY`, `MAP` & `STRUCT` constructors with `NULL`s.
 - create MAPs with null keys and values, for example `MAP('k0' := NULL, CAST(NULL AS STRING) := 10)`.  (At least one key needs to be non-null or CAST so that ksqlDB can determine the schema of the map).
 - CAST NULL to any type, including the complex types `ARRAY`, `MAP` and `STRUCT`.

Also, better error messages, on:

 - Pull query on NULL key.  Previously, blew up for non-string keys.
 - Pull query with non-literal expression for the key. Previously blew up.
@big-andy-coates big-andy-coates requested a review from a team as a code owner April 7, 2020 15:22
@big-andy-coates big-andy-coates mentioned this pull request Apr 7, 2020
2 tasks

});
@Override
public Object visitNullLiteral(final NullLiteral node, final Void context) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids any compiling of code etc to handle a NULL literal.

Comment on lines 115 to 118
if (expressionType != null) {
// expressionType can be null if expression is NULL.
ee.setExpressionType(SQL_TO_JAVA_TYPE_CONVERTER.toJavaType(expressionType));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids an NPE. If not called the return value is left at the default Object, which is fine for nulls.

Comment on lines +839 to +841
final StringBuilder map = new StringBuilder("new MapBuilder(");
map.append(exp.getMap().size());
map.append((')'));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to a new builder type that won't throw on null keys or values.

Comment on lines +932 to +940
final SqlType sourceType = expr.getRight();
if (sourceType == null || sqlType.equals(sourceType)) {
// sourceType is null if source is SQL NULL
return new Pair<>(expr.getLeft(), sqlType);
}

return CASTERS.getOrDefault(
sqlType.baseType(),
(e, t, r) -> {
throw new KsqlException("Invalid cast operation: " + t);
}
)
.cast(expr, sqlType, sqlType);
return CASTERS.getOrDefault(sqlType.baseType(), CastVisitor::unsupportedCast)
.cast(expr, sqlType);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCast no longer calls sqlType.supportsCast as it is this the getCast method that determines which casts are supported and which are not. Hence supportsCast is superfluous and likely to get out of date with what this method supports.

Comment on lines +1008 to +1011
final Pair<String, SqlType> expr, final SqlType returnType
) {
if (!(sqltype instanceof SqlDecimal)) {
throw new KsqlException("Expected decimal type: " + sqltype);
if (!(returnType instanceof SqlDecimal)) {
throw new KsqlException("Expected decimal type: " + returnType);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: these methods were always called with the same param for returnType and sqlType: removed the duplication.

* Used to construct maps using the builder pattern. Note that we cannot use {@link
* com.google.common.collect.ImmutableMap} because it does not accept null values.
*/
public class MapBuilder {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Builder of maps with nulls

}

expressionTypeContext.setSqlType(sqlType);
expressionTypeContext.setSqlType(node.getType().getSqlType());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this as its not needed. It's getCast in SqlToJavaVisitor that determines what casts are supported. Removing this check just means it fails, with the same error, slightly later in flow.

}
}).build();

@Override
public Optional<?> coerce(final Object value, final SqlType targetType) {
public Result coerce(final Object value, final SqlType targetType) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored to allow coerce to return a successful coercion with a no value result, i.e. coercion of nulls. Previous return value would not differentiate between a failure and 'no value'

Comment on lines +62 to +65
if (value == null) {
// NULL can be cast to any type:
return Result.nullResult();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look mar, null handling!

*/
Optional<?> coerce(Object value, SqlType targetSchema);
Result coerce(Object value, SqlType targetSchema);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this change in return value is to allow callers to differentiate between:

  • the coercion failed. Previously this returned Optional.empty and now returns Result.failed.
  • the coercion succeeded and the result was null. Previously, this threw an NPE and now returns Result.nullResult.

Normal results where previously Optional.of(whateva) and are now Result.of(whateva).

Comment on lines +568 to +574
if (!(other instanceof Literal)) {
throw new KsqlException("Ony comparison to literals is currently supported: " + comparison);
}

if (other instanceof NullLiteral) {
throw new KsqlException("Primary key columns can not be NULL: " + comparison);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catches errors on pull queries.

  • first one catches comparison of ROWKEY to non-literal, e.g. a udf call, or some other expression. Previously this would blow up with class cast exception.
  • second one catches comparison with NULL. For STRING keys this previously returned some weird streams error and for non-STRING keys some confusing error about not being able to coerce a STRING to the required key type. (NullLiteral returns "null" from getValue for some weird reason).

Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems surprisingly straightforward 😄 always my favorite types of PRs

@big-andy-coates big-andy-coates merged commit c53dd68 into confluentinc:master Apr 14, 2020
@big-andy-coates big-andy-coates deleted the nulls branch April 14, 2020 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can not use NULLs in INSERT VALUES
2 participants