Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11653: [Rust][DataFusion] Postgres String Functions: ascii, chr, initcap, repeat, reverse, to_hex #9625

Closed

Conversation

seddonm1
Copy link
Contributor

@seddonm1 seddonm1 commented Mar 3, 2021

@alamb This is the second last of the current string functions but I think there may be one after that with new code.

This implements some of the miscellaneous string functions ascii, chr, initcap, repeat, reverse, to_hex. The next PR will have more useful functions (including regex).

A little bit of tidying for consistency to the other functions was applied.

This PR is a child of #9243

@github-actions
Copy link

github-actions bot commented Mar 3, 2021

@alamb
Copy link
Contributor

alamb commented Mar 3, 2021

Thanks @seddonm1 -- I plan to review this tomorrow

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed through this and it looks good. The only nit I have is that we are often repeating this code:

args[0]
        .as_any()
        .downcast_ref::<GenericStringArray<T>>()
        .ok_or_else(|| {
            DataFusionError::Internal("could not cast string to StringArray".to_string())
        })?;

It might be worth considering adding a helper method or macro for this.

@seddonm1
Copy link
Contributor Author

seddonm1 commented Mar 3, 2021

@andygrove no problem. I will create a macro for that.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review all of the code in detail but I am approving this based on the tests (which are really cool) and the fact that this is new functionality, so we can always follow up with bug fixes if issues are found. Thanks for adding the macros - I think that makes the code much easier to review.

@seddonm1
Copy link
Contributor Author

seddonm1 commented Mar 4, 2021

Thanks @andygrove . I have a few more PRs to do to finish this first phase of work. Then I think it's time to tackle type coercion.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through this PR carefully and I like it!

I agree with @andygrove 's comments about the excellent test coverage and new code. 🏅 🏅

Thank you so much @seddonm1 . I'll plan to merge this sometime over the next day or two

@@ -2051,13 +2054,19 @@ async fn test_string_expressions() -> Result<()> {
test_expression!("character_length('chars')", "5");
test_expression!("character_length('josé')", "4");
test_expression!("character_length(NULL)", "NULL");
test_expression!("chr(CAST(120 AS int))", "x");
test_expression!("chr(CAST(128175 AS int))", "💯");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 indeed!

test_function!(
Ascii,
&[lit(ScalarValue::Utf8(Some("💯".to_string())))],
Ok(Some(128175)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is strange to me that a function called ascii can return something larger than 127. However, it seems that quirkiness / awesomeness came from postgres :)

alamb=# select ascii('💯');
 ascii  
--------
 128175
(1 row)

👍

"could not cast fill to StringArray".to_string(),
)
})?;
let string_array = downcast_string_arg!(args[0], "string", T);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these macros certainly make the code nicer to read. 👍

@alamb alamb closed this in dec5ab9 Mar 5, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…, initcap, repeat, reverse, to_hex

@alamb This is the second last of the current string functions but I think there may be one after that with new code.

This implements some of the miscellaneous string functions `ascii`, `chr`, `initcap`, `repeat`, `reverse`, `to_hex`. The next PR will have more useful functions (including regex).

A little bit of tidying for consistency to the other functions was applied.

This PR is a child of apache#9243

Closes apache#9625 from seddonm1/ascii-chr-initcap-repeat-reverse-tohex

Authored-by: Mike Seddon <seddonm1@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
…, initcap, repeat, reverse, to_hex

@alamb This is the second last of the current string functions but I think there may be one after that with new code.

This implements some of the miscellaneous string functions `ascii`, `chr`, `initcap`, `repeat`, `reverse`, `to_hex`. The next PR will have more useful functions (including regex).

A little bit of tidying for consistency to the other functions was applied.

This PR is a child of apache#9243

Closes apache#9625 from seddonm1/ascii-chr-initcap-repeat-reverse-tohex

Authored-by: Mike Seddon <seddonm1@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants