Type of __time column is determined by RowSignature in case of External Datasource #12770
kfaraz merged 7 commits into apache:master from
| // 1. __time in the external data source may map to a different column | ||
| // 2. __time in the external data source might be ignored while ingesting the data | ||
| // 3. Even if __time is of type LONG, it may be used with some function, which would cause the abovementioned Calcite failure | ||
| // The most optimal solution would be to prevent assumption of the __time column having the type SqlTypeName.TIMESTAMP |
Thanks for looking into this!
Is there a reason why we cannot do the more optimal solution now? Something like: for __time, have RowSignatures.toRelDataType return TIMESTAMP for regular Druid tables, and the type from the provided signature otherwise. To get the information through, we could add a parameter to toRelDataType like boolean isDruidTable.
Let me take a go at this, but it should be possible. The thing preventing me from doing it was that I was unable to understand the reason for ignoring the RowSignature and casting the __time column to type TIMESTAMP.
The idea was that we don't want to use the default conversion for regular Druid tables, since that would be BIGINT, and we actually want TIMESTAMP. This rationale only really applies to regular Druid tables.
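To make the proposal above concrete, here is a minimal, self-contained sketch of the `isDruidTable` idea. It deliberately does not use the real Druid or Calcite classes: `ColumnType`, `SqlType`, `defaultConversion` and `toRelTypes` are hypothetical stand-ins for Druid's `ColumnType`, Calcite's `SqlTypeName` and `RowSignatures.toRelDataType`, and the flag name is taken from the suggestion in this thread rather than from the merged code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins only; not the actual Druid/Calcite API.
final class TimeColumnTypeSketch {
  enum ColumnType { LONG, STRING, DOUBLE }
  enum SqlType { TIMESTAMP, BIGINT, VARCHAR, DOUBLE }

  static final String TIME_COLUMN_NAME = "__time";

  // Default mapping from a native column type to a SQL type (LONG becomes BIGINT).
  static SqlType defaultConversion(ColumnType type) {
    switch (type) {
      case LONG:
        return SqlType.BIGINT;
      case STRING:
        return SqlType.VARCHAR;
      case DOUBLE:
        return SqlType.DOUBLE;
      default:
        throw new IllegalArgumentException("Unknown type: " + type);
    }
  }

  // For regular Druid tables, __time is always TIMESTAMP.
  // Otherwise (e.g. an external datasource), the declared signature type is respected.
  static Map<String, SqlType> toRelTypes(Map<String, ColumnType> signature, boolean isDruidTable) {
    Map<String, SqlType> result = new LinkedHashMap<>();
    for (Map.Entry<String, ColumnType> column : signature.entrySet()) {
      if (isDruidTable && TIME_COLUMN_NAME.equals(column.getKey())) {
        result.put(column.getKey(), SqlType.TIMESTAMP);
      } else {
        result.put(column.getKey(), defaultConversion(column.getValue()));
      }
    }
    return result;
  }
}
```

With a signature that declares `__time` as `STRING`, this sketch yields `TIMESTAMP` when `isDruidTable` is true and `VARCHAR` when it is false, which is the distinction being asked for.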
…xternal datasource
Thanks for the comment @gianm. I have updated the approach in the PR to determine the Rel type from the RowSignature in case of ExternalDataSource.
| { | ||
| return RowSignatures.toRelDataType(rowSignature, typeFactory); | ||
| // For external datasources, the row type should be determined by whatever the row signature has been explicitly | ||
| // passed in. Typecasting directly to SqlTypeName.TIMESTAMP will lead to inconsistencies with the Calcite functions |
I think that, for future readers, it would be good to clarify this with an example of the inconsistency.
are these changes still needed?
Yes, I think so. This change is required for the Calcite layer. The two use cases that I am thinking of right now:
- Let's say we have a function FUNC(a) that accepts long. If the user passes a __time column of type long, it would be typecast to SqlTypeName.TIMESTAMP before being passed to FUNC(a). This will cause a signature mismatch, even though the query would have produced correct results downstream.
- In the future, when the typecasting with the cursor is improved to accept columns of type other than LONG, this would ensure that the explicit type passed by the user is respected.
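The first use case can be illustrated with a small sketch. This is a simplified model, not Calcite's operand checking: `SqlType`, `FUNC_OPERANDS`, `plannerTypeOfLongTime` and `funcAccepts` are hypothetical names, and the check is a plain exact-match comparison, which is an assumption made for illustration.

```java
import java.util.Collections;
import java.util.List;

// Simplified model of the signature mismatch; all names are hypothetical.
final class OperandMismatchSketch {
  enum SqlType { TIMESTAMP, BIGINT }

  // Declared operand types of the hypothetical FUNC(a): a single BIGINT argument.
  static final List<SqlType> FUNC_OPERANDS = Collections.singletonList(SqlType.BIGINT);

  // Planner-side type of a LONG __time column: TIMESTAMP when the cast is forced,
  // BIGINT when the declared signature is respected.
  static SqlType plannerTypeOfLongTime(boolean forceTimestamp) {
    return forceTimestamp ? SqlType.TIMESTAMP : SqlType.BIGINT;
  }

  // Exact-match operand check (an assumption; real planners may also coerce types).
  static boolean funcAccepts(SqlType operand) {
    return FUNC_OPERANDS.equals(Collections.singletonList(operand));
  }

  public static void main(String[] args) {
    System.out.println(funcAccepts(plannerTypeOfLongTime(true)));  // false: forced TIMESTAMP mismatches
    System.out.println(funcAccepts(plannerTypeOfLongTime(false))); // true: BIGINT matches
  }
}
```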
LGTM as well!!
Please hold off merging this PR for now. I was doing my own testing, and it doesn't seem to work as expected.
Reverted the PR to prevent using
| // of inconsistency is that functions such as TIME_PARSE evaluate incorrectly | ||
| Optional<ColumnType> timestampColumnTypeOptional = signature.getColumnType(ColumnHolder.TIME_COLUMN_NAME); | ||
| if (timestampColumnTypeOptional.isPresent() && !timestampColumnTypeOptional.get().equals(ColumnType.LONG)) { | ||
| throw new ISE("Unable to use EXTERN function with data containing a __time column of any type other than long"); |
| throw new ISE("Unable to use EXTERN function with data containing a __time column of any type other than long"); | |
| throw new ISE("EXTERN function with __time column can be used when __time column is of type long. Please change the column name to something other than __time "); |
cryptoe left a comment
Minor exception message changes.
LGTM
| // of inconsistency is that functions such as TIME_PARSE evaluate incorrectly | ||
| Optional<ColumnType> timestampColumnTypeOptional = signature.getColumnType(ColumnHolder.TIME_COLUMN_NAME); | ||
| if (timestampColumnTypeOptional.isPresent() && !timestampColumnTypeOptional.get().equals(ColumnType.LONG)) { | ||
| throw new ISE("EXTERN function with __time column can be used when __time column is of type long. " |
you mean this, right?
| throw new ISE("EXTERN function with __time column can be used when __time column is of type long. " | |
| throw new ISE("EXTERN function with __time column can be used when __time column is not of type long. " |
I think the original one should be correct. We are only allowing the EXTERN function with __time columns iff they are of type long. Because of the special handling of the __time column in the cursors, the forced typecasting would produce incorrect results if it is not of type long.
The exception is thrown when type(__time) != long. Hence either we use a double negation or leave the message as it is, no?
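For readers following the wording debate, the check under discussion behaves roughly like the sketch below. It is self-contained and uses stand-ins: the signature is modeled as a plain `Map`, `ColumnType` is a hypothetical enum, and `IllegalStateException` takes the place of Druid's `ISE`. The message states the allowed case (type long), while the exception fires on the disallowed one.

```java
import java.util.Map;
import java.util.Optional;

// Stand-in for the EXTERN __time validation; not the actual Druid classes.
final class ExternTimeColumnCheckSketch {
  enum ColumnType { LONG, STRING, DOUBLE }

  static final String TIME_COLUMN_NAME = "__time";

  // Passes when there is no __time column or when it is LONG; throws otherwise.
  static void validate(Map<String, ColumnType> signature) {
    Optional<ColumnType> timeType = Optional.ofNullable(signature.get(TIME_COLUMN_NAME));
    if (timeType.isPresent() && timeType.get() != ColumnType.LONG) {
      throw new IllegalStateException(
          "EXTERN function with __time column can be used when __time column is of type long. "
          + "Please change the column name to something other than __time"
      );
    }
  }
}
```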
Merged as build failure was unrelated.
A bit late to the party on this one. Hit this when merging code. I'm not sure the fixes are quite right. The fix in Then, in The proper place for this kind of check is in the validator. But, since The result is that I should be able to have a CSV file with a
INSERT INTO foo
SELECT TIME_PARSE("__time") AS __time, ...
FROM TABLE(...)
To be clear how SQL works: the first Looks like the code is trying to guess when to cast If we did want to do this, the function to get the row type,
Thanks for the comment @paul-rogers. The original behavior of the code (Calcite layer) was to cast any column with the name
INSERT INTO foo
SELECT TIME_PARSE("__time") AS __time, ...
FROM TABLE(...)
is that even this would fail because
Description
Queries like
would fail at the Calcite layer. This is because any column with name __time is considered to be of type SqlTypeName.TIMESTAMP (code) without consideration for the RowSignature that is passed (i.e. the type that is getting passed in EXTERN). This messes up functions like TIME_PARSE which are expected to work with a certain signature.
This PR modifies RowSignatures.toRelDataType() so that the type of the __time column is determined by the RowSignature's type.
This PR has: