Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28123][SQL] String Functions: support btrim #31128

Closed
wants to merge 3 commits into from

Conversation

beliefer
Copy link
Contributor

What changes were proposed in this pull request?

Spark support trim/ltrim/rtrim now. The function btrim is an alternate form of TRIM(BOTH <chars> FROM <expr>).
btrim removes the longest string consisting only of specified characters from the start and end of a string.

The mainstream database support this feature show below:

Postgresql
https://www.postgresql.org/docs/11/functions-binarystring.html

Vertica
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/BTRIM.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_____5

Redshift
https://docs.aws.amazon.com/redshift/latest/dg/r_BTRIM.html

Druid
https://druid.apache.org/docs/latest/querying/sql.html#string-functions

Greenplum
http://docs.greenplum.org/6-8/ref_guide/function-summary.html

Why are the changes needed?

btrim is very useful.

Does this PR introduce any user-facing change?

Yes. btrim is a new function

How was this patch tested?

Jenkins test.

@SparkQA
Copy link

SparkQA commented Jan 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38509/

@SparkQA
Copy link

SparkQA commented Jan 11, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38509/

@SparkQA
Copy link

SparkQA commented Jan 11, 2021

Test build #133922 has finished for PR 31128 at commit 736027e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot added the SQL label Jan 11, 2021
@SparkQA
Copy link

SparkQA commented Jan 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38521/

@SparkQA
Copy link

SparkQA commented Jan 11, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38521/

@SparkQA
Copy link

SparkQA commented Jan 11, 2021

Test build #133933 has finished for PR 31128 at commit e4814dd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

cc @cloud-fan

@srowen
Copy link
Member

srowen commented Jan 12, 2021

There are a number of PRs floating around about this. For the record, I don't strongly oppose it, but, does not seem like a semantically useful function, even if some legacy DBs support it.

@beliefer
Copy link
Contributor Author

beliefer commented Jan 13, 2021

There are a number of PRs floating around about this. For the record, I don't strongly oppose it, but, does not seem like a semantically useful function, even if some legacy DBs support it.

This PR is different from the others, this is also because my previous misunderstanding of btrim. At the first, I Investigated the PostgreSql, the implement of btrim in PostgreSql accepts byte array and return byte array as result.
So I mistakenly thought that btrim's behavior is like this.

Then I created two PR as you know. But the two PR try to let btrim or trim accepts byte array and return byte array as result.

After that, I Investigated other database and found btrim means trim from both ends of source string and it is just an alternate form of TRIM(BOTH <chars> FROM <expr>).
btrim is a little similar to ltrim or rtrim.

This PR only need minor changes (such as giving trim an alias) can support it.

@SparkQA
Copy link

SparkQA commented Jan 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38579/

@SparkQA
Copy link

SparkQA commented Jan 13, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38579/

@@ -872,31 +872,21 @@ object StringTrim {
Examples:
> SELECT _FUNC_(' SparkSQL ');
SparkSQL
> SELECT _FUNC_(BOTH FROM ' SparkSQL ');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit bad to drop these examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should think about a better way to add function alias, so that they can have their own examples.

@SparkQA
Copy link

SparkQA commented Jan 13, 2021

Test build #133991 has finished for PR 31128 at commit e9b95e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

The function adopt the new PR #31390

@beliefer beliefer closed this Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants