-
Notifications
You must be signed in to change notification settings - Fork 1.8k
fix: binaryExpr not supported for LargeUtf8 #5896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ef9a37f to
cfdf1a0
Compare
| # like and ilike for LargeUtf8 | ||
| # issue: https://github.com/apache/arrow-datafusion/issues/5893 | ||
| query B | ||
| select arrow_cast('foo', 'LargeUtf8') like '%foo%'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, btw, why is it like '%foo%', not == 'foo' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a case of issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a case of issue
Sorry, I still didn't get it, you mean inner value of Utf8("foo") not the same as inner value LargeUtf8("foo")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it's not clear.😂
This test cover issue in #5893
English isn't my native language, so I subconsciously thought issue is github issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a more comprehensive test would be
select arrow_cast('foobar', 'LargeUtf8') like '%oba%';
—-
true
select arrow_cast('FooBar', 'LargeUtf8') ilike '%oba%';
—-
true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another test that would be great to add would be to compare columns as well as constants (as that is the other binary expr codepath)
Something like
create table t as values
(arrow_cast('Foo', 'LargeUtf8') '%f'),
(arrow_cast('Bar', 'LargeUtf8') 'B%');
select column1 like column2 from t;
select column1 ilike column2 from t;
select column1 not like column2 from t;
select column1 not ilike column2 from t;
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this @jackwener
| # like and ilike for LargeUtf8 | ||
| # issue: https://github.com/apache/arrow-datafusion/issues/5893 | ||
| query B | ||
| select arrow_cast('foo', 'LargeUtf8') like '%foo%'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another test that would be great to add would be to compare columns as well as constants (as that is the other binary expr codepath)
Something like
create table t as values
(arrow_cast('Foo', 'LargeUtf8') '%f'),
(arrow_cast('Bar', 'LargeUtf8') 'B%');
select column1 like column2 from t;
select column1 ilike column2 from t;
select column1 not like column2 from t;
select column1 not ilike column2 from t;cfdf1a0 to
2bac5a2
Compare
|
I remember there are some other issues with |
#[tokio::test]
async fn sort_on_window_null_string() -> Result<()> {
let d1: DictionaryArray<Int32Type> =
vec![Some("one"), None, Some("three")].into_iter().collect();
let d2: StringArray = vec![Some("ONE"), None, Some("THREE")].into_iter().collect();
let d3: LargeStringArray =
vec![Some("One"), None, Some("Three")].into_iter().collect();
let batch = RecordBatch::try_from_iter(vec![
("d1", Arc::new(d1) as ArrayRef),
("d2", Arc::new(d2) as ArrayRef),
("d3", Arc::new(d3) as ArrayRef),
])
.unwrap();
let ctx = SessionContext::with_config(SessionConfig::new().with_target_partitions(1));
ctx.register_batch("test", batch)?;
let sql =
"SELECT d1, row_number() OVER (partition by d1) as rn1 FROM test order by d1 asc";
let actual = execute_to_batches(&ctx, sql).await;
// NULLS LAST
let expected = vec![
"+-------+-----+",
"| d1 | rn1 |",
"+-------+-----+",
"| one | 1 |",
"| three | 1 |",
"| | 1 |",
"+-------+-----+",
];
assert_batches_eq!(expected, &actual);
let sql =
"SELECT d2, row_number() OVER (partition by d2) as rn1 FROM test ORDER BY d2 asc";
let actual = execute_to_batches(&ctx, sql).await;
// NULLS LAST
let expected = vec![
"+-------+-----+",
"| d2 | rn1 |",
"+-------+-----+",
"| ONE | 1 |",
"| THREE | 1 |",
"| | 1 |",
"+-------+-----+",
];
assert_batches_eq!(expected, &actual);
let sql =
"SELECT d2, row_number() OVER (partition by d2 order by d2 desc) as rn1 FROM test ORDER BY d2 desc";
let actual = execute_to_batches(&ctx, sql).await;
// NULLS FIRST
let expected = vec![
"+-------+-----+",
"| d2 | rn1 |",
"+-------+-----+",
"| | 1 |",
"| THREE | 1 |",
"| ONE | 1 |",
"+-------+-----+",
];
assert_batches_eq!(expected, &actual);
// FIXME sort on LargeUtf8 String has bug.
// let sql =
// "SELECT d3, row_number() OVER (partition by d3) as rn1 FROM test";
// let actual = execute_to_batches(&ctx, sql).await;
// let expected = vec![
// "+-------+-----+",
// "| d3 | rn1 |",
// "+-------+-----+",
// "| | 1 |",
// "| One | 1 |",
// "| Three | 1 |",
// "+-------+-----+",
// ];
// assert_batches_eq!(expected, &actual);
Ok(())
} |
|
@mingmwang Can we add it in following PR? Look like it don't relate with this PR |
|
if there isn't new review, I prepare to merge this PR in the tomorrow due to some review already in this PR. |
|
I agree adding additional tests for unrelated functionality (e.g. general support for LargeUtf8) would be best done in some other PR |
* fix: like and ilike not supported for LargeUtf8 * add test
Which issue does this PR close?
Closes #5893.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?