Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improved NULL handling #4869

Closed
wants to merge 3 commits into from

Conversation

big-andy-coates
Copy link
Contributor

@big-andy-coates big-andy-coates commented Mar 23, 2020

Description

fixes: #4912

fixes a few issues with NULLs in ksqlDB. You can now:

  • insert NULL values using INSERT INTO, for example INSERT INTO foo VALUES (NULL, NULL);.
  • create MAPs with null values, for example MAP('k0' := NULL, 'k1' := 10). (At least one key needs to be non-null or CAST so that ksqlDB can determine the schema of the map).
  • CAST NULL to any type, (or itself), including the complex types ARRAY, MAP and STRUCT.

Reviewing notes

FYI, I've noticed a few times recently that our NULL handling wasn't great. This came to ahead when I was trying to test #4867 and couldn't insert NULLs. I fixed INSERT VALUES to I could test, and then finished off with some more stuff.

Previously ExpressionTypeManager would return Java null for SQL NULL. I've seen this result in NPEs if NULL is used in expressions and have fixed a few of these.

With this commit, I've introduced a NULL SqlType and fixed up any code that was handling null to now handle NULL.

Testing done

Usual

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

fixes a few issues with NULLs in ksqlDB. You can now:

 - insert NULL values using `INSERT INTO`, for example `INSERT INTO foo VALUES (NULL, NULL);`.
 - create MAPs with null values, for example `MAP('k0' := NULL, 'k1' := 10)`.  (At least one key needs to be non-null or CAST so that ksqlDB can determine the schema of the map).
 - CAST NULL to any type, (or itself), including the complex types `ARRAY`, `MAP` and `STRUCT`.
@big-andy-coates big-andy-coates requested a review from a team as a code owner March 23, 2020 16:16
Conflicting files
ksqldb-functional-tests/src/test/resources/query-validation-tests/null.json
Conflicting files
ksqldb-engine/src/main/java/io/confluent/ksql/engine/InsertValuesExecutor.java
ksqldb-functional-tests/src/test/resources/query-validation-tests/null.json
import io.confluent.ksql.schema.ksql.FormatOptions;
import io.confluent.ksql.schema.ksql.SqlBaseType;

public final class SqlNull extends SqlType {
Copy link
Contributor

@purplefox purplefox Mar 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @big-andy-coates
I'm sure there's a good reason for it, but I am curious to understand why null is being modelled as a SqlType. I would have thought that null is a value that is valid for any SqlType, not a SqlType itself. E.g. if there is null in a column, then the column does not have a type of SqlNull, it would have a type of whatever the column type is (E.g. SqlInteger) and the value would be null...

@big-andy-coates big-andy-coates requested a review from a team March 25, 2020 12:59
@big-andy-coates
Copy link
Contributor Author

big-andy-coates commented Mar 25, 2020

I hear what you're saying. I'm also not 100% sure this is the right approach, but I think it works.

There are times where we don't know the type. Consider:

SELECT ID, NULL FROM FOO;

What's the type of the second column?

Or

SELECT SOME_FUNC(ID, NAME, NULL) FROM FOO;

What's the type of the third param passed to SOME_FUNC?

Or, the original reason why I was working on this:

INSERT INTO FOO VALUES (1, NULL, 'NAME', NULL);

What's the type of the second and forth values being inserted?
(Yes, we can infer the types by looking at the schema of FOO, we the arguments themselves still have a type and then we see if we can coerce the argument types to the column types).

These are all valid SQL and our type system needs to support them. Hence why I'm adding a schema type for NULL: a SQL NULL has type of SqlNull and we can then have logic to handle that.

Thoughts?

@purplefox
Copy link
Contributor

I hear what you're saying. I'm also not sure this is the right approach.

However, there are times where we don't know the type. Consider:

SELECT ID, NULL FROM FOO;

What's the type of the second column?

In this case the type is unknown so we could model that as an explicit type like SqlUnknown, or we could just choose another type (e.g SqlInteger), it probably doesn't matter.

I googled this https://dba.stackexchange.com/questions/228046/why-do-i-need-to-cast-null-to-column-type on how Postgres handles this case, which was quite interesting.

Or

SELECT SOME_FUNC(ID, NAME, NULL) FROM FOO;

What's the type of the third param passed to SOME_FUNC?

I think we can infer this from the signature(s) of some_func.

If there are multiple overloaded versions of SOME_FUNC, could just choose any one that matches, unless the 3rd param is explicitly cast to a type. (Similar to how we do in Java when selecting an overloaded method with a null param: int x = someFunc(id, name, (String)null);

Or, the original reason why I was working on this:

INSERT INTO FOO VALUES (1, NULL, 'NAME', NULL);

(though in this case we could infer the type of the second and fourth values).

Yes, I think we know the types here.

@big-andy-coates
Copy link
Contributor Author

big-andy-coates commented Mar 25, 2020

I think we can infer this from the signature(s) of some_func.

Sure, but the code that does this needs a list of the types of the parameters being passed. That's how we match it to a function signature. SqlNull can map to any NULLABLE type, (though not any NON NULL type.

Is SqlUnknown any different to SqlNull? Happy to switch the name to Unknown, though I actually think internally SqlNull is more correct due to the above function matching differentiating between NULLABLE and NON NULLABLE argument types.

@purplefox
Copy link
Contributor

Is SqlUnknown any different to SqlNull? Happy to switch the name to Unknown, though I actually think internally SqlNull is more correct due to the above function matching differentiating between NULLABLE and NON NULLABLE argument types.

Naming is hard ;)
Personally I would prefer SqlUnknown as null as is not a type, but it's probably not too important. Take your pick.

@big-andy-coates
Copy link
Contributor Author

I'd prefer to stick with SqlNull as I think it more closely imparts what it means. It will only be used for the type of a NULL. SqlUnknown wouldn't work well with parameter matching in the UDF framework, as it would be weird that SqlUnknown does match a NULLABLE parameter, but not a NON NULL parameter.

@big-andy-coates
Copy link
Contributor Author

Waiting on a second opinion from Almog as, like I said, I'm still not convinced by this approach. I'm just not sure of a better approach. I agree NULL is not really a type.

@big-andy-coates
Copy link
Contributor Author

Closing in favour of #5019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can not use NULLs in INSERT VALUES
2 participants