ARROW-11653: [Rust][DataFusion] Postgres String Functions: ascii, chr, initcap, repeat, reverse, to_hex #9625
Thanks @seddonm1 -- I plan to review this tomorrow
I skimmed through this and it looks good. The only nit I have is that we are often repeating this code:
    args[0]
        .as_any()
        .downcast_ref::<GenericStringArray<T>>()
        .ok_or_else(|| {
            DataFusionError::Internal("could not cast string to StringArray".to_string())
        })?;
It might be worth considering adding a helper method or macro for this.
@andygrove no problem. I will create a macro for that.
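A minimal, std-only sketch of the kind of helper macro discussed here. Everything in it is hypothetical: `DataFusionError` is a stand-in enum with only an `Internal` variant, `downcast_arg!` is an invented name (the PR's own macro is `downcast_string_arg!`, whose definition is not shown in this thread), and arguments are plain `&dyn Any` values rather than Arrow `ArrayRef`s. It shows the technique: wrap `downcast_ref` plus `ok_or_else(...)?` in one macro so each function body shrinks to a single line.

```rust
use std::any::Any;

/// Hypothetical stand-in for DataFusion's error type; only the
/// `Internal` variant is modelled in this sketch.
#[derive(Debug, PartialEq)]
pub enum DataFusionError {
    Internal(String),
}

/// Downcast `$arg` (a `&dyn Any`) to `&$ty`, or return early from the
/// enclosing function with an `Internal` error naming the argument.
/// `downcast_arg!` is a hypothetical name used only for this sketch.
macro_rules! downcast_arg {
    ($arg:expr, $name:expr, $ty:ty) => {{
        $arg.downcast_ref::<$ty>().ok_or_else(|| {
            DataFusionError::Internal(format!(
                "could not cast {} to {}",
                $name,
                stringify!($ty)
            ))
        })?
    }};
}

/// Example caller: count the characters of a `String` argument.
/// `args` stands in for DataFusion's `&[ArrayRef]`.
pub fn char_length(args: &[&dyn Any]) -> Result<usize, DataFusionError> {
    // One line instead of the repeated downcast/ok_or_else chain.
    let s = downcast_arg!(args[0], "string", String);
    Ok(s.chars().count())
}
```

Because the macro expands to an expression ending in `?`, it can only be used inside functions returning `Result<_, DataFusionError>`; that is the main trade-off against a plain helper function, which would leave the `?` at the call site.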
I didn't review all of the code in detail but I am approving this based on the tests (which are really cool) and the fact that this is new functionality, so we can always follow up with bug fixes if issues are found. Thanks for adding the macros - I think that makes the code much easier to review.
Thanks @andygrove. I have a few more PRs to finish this first phase of work. Then I think it's time to tackle type coercion.
I went through this PR carefully and I like it!
I agree with @andygrove's comments about the excellent test coverage and new code. 🏅 🏅
Thank you so much @seddonm1. I'll plan to merge this sometime over the next day or two.
@@ -2051,13 +2054,19 @@ async fn test_string_expressions() -> Result<()> {
test_expression!("character_length('chars')", "5");
test_expression!("character_length('josé')", "4");
test_expression!("character_length(NULL)", "NULL");
test_expression!("chr(CAST(120 AS int))", "x");
test_expression!("chr(CAST(128175 AS int))", "💯");
💯 indeed!
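The two `chr` tests map an integer code point to a one-character string (120 to `x`, 128175 = U+1F4AF to `💯`). A minimal sketch of that mapping, not the PR's implementation: `ChrError` is a hypothetical error type, rejecting 0 mirrors Postgres's "null character not permitted" error, and rejecting negative values, surrogates, and values above U+10FFFF is an assumption of this sketch (it falls out of `char::from_u32`).

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ChrError {
    NullCharacter,
    InvalidCodePoint(i64),
}

/// Sketch of a Postgres-style `chr`: turn an integer Unicode code point
/// into a one-character string.
pub fn chr(code: i64) -> Result<String, ChrError> {
    if code == 0 {
        // Postgres refuses to produce the NUL character.
        return Err(ChrError::NullCharacter);
    }
    u32::try_from(code)
        .ok() // negative or > u32::MAX
        .and_then(char::from_u32) // surrogates or > U+10FFFF
        .map(|c| c.to_string())
        .ok_or(ChrError::InvalidCodePoint(code))
}
```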
test_function!(
Ascii,
&[lit(ScalarValue::Utf8(Some("💯".to_string())))],
Ok(Some(128175)),
It is strange to me that a function called `ascii` can return something larger than `127`. However, it seems that quirkiness / awesomeness came from postgres :)
alamb=# select ascii('💯');
ascii
--------
128175
(1 row)
👍
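As the psql output shows, Postgres `ascii` really returns the Unicode code point of the first character, not an ASCII code. A minimal sketch of that behaviour over a nullable column, using `Option<&str>` as a hypothetical stand-in for a nullable Arrow string array; returning 0 for an empty string is an assumption of this sketch.

```rust
/// Sketch of a Postgres-style `ascii` for one value: the code point of the
/// first character. NULL (None) propagates; "" yields 0 (assumed).
pub fn ascii(input: Option<&str>) -> Option<i32> {
    // char -> i32 is lossless: code points never exceed 0x10FFFF.
    input.map(|s| s.chars().next().map_or(0, |c| c as i32))
}

/// Apply `ascii` element-wise, the way a scalar function runs over a column.
pub fn ascii_column(values: &[Option<&str>]) -> Vec<Option<i32>> {
    values.iter().map(|v| ascii(*v)).collect()
}
```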
"could not cast fill to StringArray".to_string(),
)
})?;
let string_array = downcast_string_arg!(args[0], "string", T);
These macros certainly make the code nicer to read. 👍
…, initcap, repeat, reverse, to_hex @alamb This is the second last of the current string functions but I think there may be one after that with new code. This implements some of the miscellaneous string functions `ascii`, `chr`, `initcap`, `repeat`, `reverse`, `to_hex`. The next PR will have more useful functions (including regex). A little bit of tidying for consistency to the other functions was applied. This PR is a child of apache#9243 Closes apache#9625 from seddonm1/ascii-chr-initcap-repeat-reverse-tohex Authored-by: Mike Seddon <seddonm1@gmail.com> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb This is the second-to-last of the current string function PRs, but I think there may be one after that with new code.
This implements some of the miscellaneous string functions `ascii`, `chr`, `initcap`, `repeat`, `reverse`, `to_hex`. The next PR will have more useful functions (including regex). A little bit of tidying was applied for consistency with the other functions.
This PR is a child of #9243
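For reference, a minimal per-value sketch of the remaining four functions in Postgres terms. These are illustrations under stated assumptions, not the PR's code: `initcap` treats any non-alphanumeric character as a word separator (as the Postgres docs describe), `repeat` returns an empty string for counts of zero or less, `reverse` works on Unicode code points (not grapheme clusters), and `to_hex` takes an `i64`, so negative inputs print as 64-bit two's complement (Postgres prints 32-bit for `int`).

```rust
/// Uppercase the first letter of each word, lowercase the rest.
/// Words are runs of alphanumeric characters.
pub fn initcap(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut prev_alnum = false;
    for c in s.chars() {
        if prev_alnum {
            out.extend(c.to_lowercase());
        } else {
            out.extend(c.to_uppercase());
        }
        prev_alnum = c.is_alphanumeric();
    }
    out
}

/// Repeat `s` `n` times; non-positive counts give an empty string.
pub fn repeat(s: &str, n: i64) -> String {
    if n <= 0 {
        String::new()
    } else {
        s.repeat(n as usize)
    }
}

/// Reverse by code point (combining marks will not stay attached).
pub fn reverse(s: &str) -> String {
    s.chars().rev().collect()
}

/// Lowercase hex; Rust's `{:x}` prints signed values as two's complement.
pub fn to_hex(n: i64) -> String {
    format!("{:x}", n)
}
```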