[CALCITE-6146] Target charset should be used when comparing two strings through CONVERT/TRANSLATE function during validation #3558
c9f9b4d to e851afe
    return type;
  }

  @Nullable Charset getCharsetAfterConvert(SqlBasicCall call, @Nullable Charset typeCharset) {
Are there other functions which should preserve charsets?
For example, if you do string concatenation, or cast from CHAR to VARCHAR?
As far as I know, only the CONVERT and TRANSLATE functions explicitly use a charset.
In MySQL, the CAST function can be used like cast(name as CHAR CHARACTER SET utf8), but we don't currently support that form.
I was asking about something like CAST(TRANSLATE("name" using BIG5) AS CHAR(5)).
Is the charset of the result of this expression correct?
CHAR(5) has the charset ISO-8859-1, which is preserved in its SqlCollation, so the charset of CAST(TRANSLATE("name" using BIG5) AS CHAR(5)) should also be ISO-8859-1.
I've added a test for this; please take a look.
> I was asking about something like CAST(TRANSLATE("name" using BIG5) AS CHAR(5)). Is the charset of the result of this expression correct?
The result charset of TRANSLATE("name" using BIG5) is BIG5, so the cast to CHAR(5) with ISO-8859-1 is not allowed.
    with.query("select \"name\", \"empid\" from \"hr\".\"emps\"\n"
        + "where cast(convert(\"name\" using GBK) as char(5))=_BIG5'Eric'")
        .throws_(
            "Cannot apply = to the two different charsets ISO-8859-1 and Big5");
I know you didn't write this error message, but I think it can be improved.
I would write "Cannot apply operation '=' to strings with different charsets 'ISO-8859-1' and 'Big5'".
Have you tried a similar test as a SqlOperatorTest as well, just to make sure that it triggers there as well?
I don't think this change is JDBC specific.
> I know you didn't write this error message, but I think it can be improved. I would write "Cannot apply operation '=' to strings with different charsets 'ISO-8859-1' and 'Big5'".
Agreed, and I've changed the message.
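To make the agreed wording concrete, here is a minimal standalone sketch of a charset check that produces a message in that form. It uses only `java.nio.charset`. The names `CharsetCheckSketch` and `checkComparable` are hypothetical and are not Calcite APIs; the real validator reports errors through its own resource mechanism, which is not modeled here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not part of Calcite. */
final class CharsetCheckSketch {
  private CharsetCheckSketch() {}

  /**
   * Returns null when the two charsets match, otherwise an error message
   * in the form suggested in the review.
   */
  static String checkComparable(String op, Charset left, Charset right) {
    if (left.equals(right)) {
      return null;
    }
    return "Cannot apply operation '" + op
        + "' to strings with different charsets '" + left.name()
        + "' and '" + right.name() + "'";
  }

  public static void main(String[] args) {
    // UTF-8 stands in for Big5 here so that the sketch runs on any JVM.
    System.out.println(
        checkComparable("=", StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
  }
}
```

Quoting the operator and the charset names keeps the message readable when the operator is a single symbol such as `=`.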
> Have you tried a similar test as a SqlOperatorTest as well, just to make sure that it triggers there as well? I don't think this change is JDBC specific.

I've added some tests in SqlOperatorTest; however, the test testStringComparisonWithConvertFunc in JdbcTest still fails.
When a new JavaType with a different charset is created (by createTypeWithCharsetAndCollation) during validation, the charset wrapped in this JavaType is overwritten during canonization because of the DATATYPE_CACHE.
I'm not sure whether this is a bug or by design. I think including the charset when computing the digest of a JavaType (java.lang.String) might be a way to go.
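The caching problem described here can be reproduced with a toy model. The sketch below is not Calcite code: `ToyJavaType`, `ToyCache`, and both digest functions are hypothetical stand-ins, and it only assumes that canonization interns types by a digest string. When the digest omits the charset, the second type collides with the first and its charset is lost; when the digest includes the charset, both variants survive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model of a canonizing type cache; all names here are hypothetical. */
final class DigestCacheSketch {
  private DigestCacheSketch() {}

  /** Stand-in for a Java-class-backed type that also carries a charset. */
  static final class ToyJavaType {
    final Class<?> clazz;
    final Charset charset;

    ToyJavaType(Class<?> clazz, Charset charset) {
      this.clazz = clazz;
      this.charset = charset;
    }
  }

  /** Interns types by digest: the first type seen for a digest wins. */
  static final class ToyCache {
    private final Map<String, ToyJavaType> cache = new HashMap<>();
    private final Function<ToyJavaType, String> digest;

    ToyCache(Function<ToyJavaType, String> digest) {
      this.digest = digest;
    }

    ToyJavaType canonize(ToyJavaType type) {
      return cache.computeIfAbsent(digest.apply(type), key -> type);
    }
  }

  /** Digest that ignores the charset, so charset variants collide. */
  static String digestWithoutCharset(ToyJavaType t) {
    return "JavaType(" + t.clazz.getName() + ")";
  }

  /** Digest that includes the charset, so charset variants stay distinct. */
  static String digestWithCharset(ToyJavaType t) {
    return digestWithoutCharset(t) + " CHARACTER SET \"" + t.charset.name() + "\"";
  }

  public static void main(String[] args) {
    ToyCache lossy = new ToyCache(DigestCacheSketch::digestWithoutCharset);
    lossy.canonize(new ToyJavaType(String.class, StandardCharsets.ISO_8859_1));
    ToyJavaType second =
        lossy.canonize(new ToyJavaType(String.class, StandardCharsets.UTF_8));
    // The cached ISO-8859-1 instance is returned; the UTF-8 variant is dropped.
    System.out.println(second.charset.name());
  }
}
```

Whether the real cache should distinguish charsets is exactly the open question in this comment; the sketch only shows why a charset-free digest leads to the observed overwrite.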
    if (SqlTypeUtil.inCharFamily(operandType0)
        && SqlTypeUtil.inCharFamily(operandType1)) {
      Charset cs0 = operandType0.getCharset();
      if (call.operand(0) instanceof SqlBasicCall) {
Is this the right place for this check?
Why does the operandType0 have the wrong charset?
Maybe the bug is in the place where the charset for operand0 type was inferred.
You are right. I've added inferReturnType for both the CONVERT and TRANSLATE functions.
@mihaibudiu I've marked the PR as a draft, and I'll let you know if anything is updated in the next few days.
9c32a45 to 2951138
2951138 to 6e9ae7e
        + "where cast(convert(\"name\" using LATIN1) as char(5))='Eric'")
        .returns("name=Eric; empid=200\n");
    with.query("select \"name\", \"empid\" from \"hr\".\"emps\"\n"
        + "where cast(convert(\"name\" using GBK) as char(5))='Eric'")
Why is this the expected result?
> Why is this the expected result?

The result of convert("name" using GBK) has the GBK charset, while CHAR(5) has the ISO-8859-1 charset. Casting between the two is not allowed in SqlCastFunction.
That is not obvious. I would add that as a comment.
OK, I've added a comment in both JdbcTest and SqlOperatorTest.
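The rule discussed in this thread can be sketched as a toy model: CONVERT yields its target charset, and a cast to CHAR(n) is rejected when the operand's charset differs from the charset of CHAR(n). This is only an illustration of the behavior described above, not Calcite's implementation. The class, the method names, and the exception message are hypothetical, and ISO-8859-1 is assumed as the charset of CHAR(n), as stated in the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Toy model of charset derivation; hypothetical names, not Calcite code. */
final class CastCharsetSketch {
  private CastCharsetSketch() {}

  /** Assumed charset of CHAR(n), as stated in the thread. */
  static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;

  /** CONVERT(x USING target): the result carries the target charset. */
  static Charset convertResultCharset(Charset operandCharset, Charset target) {
    // The operand's own charset does not influence the result charset.
    return target;
  }

  /** CAST(x AS CHAR(n)): rejected unless the operand already uses the default charset. */
  static Charset castToCharResultCharset(Charset operandCharset) {
    if (!operandCharset.equals(DEFAULT_CHARSET)) {
      throw new IllegalArgumentException(
          "Cannot cast from charset " + operandCharset.name()
              + " to " + DEFAULT_CHARSET.name());
    }
    return DEFAULT_CHARSET;
  }
}
```

Under this model, the LATIN1 query in the test passes the cast because LATIN1 is ISO-8859-1, while the GBK query fails at the cast before the comparison with 'Eric' is ever checked.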
    @Override public RelDataType deriveType(SqlValidator validator,
        SqlValidatorScope scope, SqlCall call) {
      // special case for TRANSLATE: don't need to derive type for Charsets
It's not obvious what this comment refers to. Special compared to what?
Got your point. I'll change it to "don't need to derive type for Charsets in both CONVERT and TRANSLATE". WDYT?
    @Test void testStringComparisonWithConvertFunc() {
      final SqlOperatorFixture f = fixture();
      f.setFor(SqlStdOperatorTable.CONVERT, VM_JAVA);
      f.check("select 'a' as col\n"
This example is confusing, because the column name is col and the literal is also col.
Can you change one of them?
> This example is confusing, because the column name is col and the literal is also col. Can you change one of them?

Sure, I've changed the column alias.
mihaibudiu left a comment
I have never used these functions myself, but, assuming the implemented semantics are correct, this PR looks fine.
Can some of these results be validated on another DB?
Thank you for the review, @mihaibudiu! In an earlier PR, we tested CONVERT (MySQL) and TRANSLATE (BigQuery) in function.iq; please take a look and see whether that is satisfactory. BTW, I'll squash the commits if no more comments come in.
…gs through CONVERT/TRANSLATE function during validation
d79a4a6 to d3f0aec