builtins: Implement unaccent function #54628

Merged
merged 1 commit into from Sep 30, 2020

@mknycha mknycha commented Sep 21, 2020

Unaccent function should work the same as in PostgreSQL.

Release note (sql change): Implement the string builtin unaccent.

Described in #41299

cockroach-teamcity commented Sep 21, 2020

CLA assistant check
All committers have signed the CLA.

blathers-crl bot commented Sep 21, 2020

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

I have added a few people who may be able to assist in reviewing:

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Sep 21, 2020
@cockroach-teamcity
Member

This change is Reviewable

hELLO`

query T
SELECT unaccent('some­thing')
Contributor Author

There is actually U+00AD in the middle, it's visible in git diff.

Contributor

we can combine these into one test case that's easier to mess with if you do:

SELECT unaccent(str) FROM ( VALUES 
   ('no_special_CHARACTERS1!'),
   ('Żółć')
) tbl(str)

Contributor Author

Ok, done

mknycha commented Sep 22, 2020

I forgot to add a label "first PR", sorry

@rohany rohany removed their request for review September 22, 2020 16:11
@otan otan left a comment

hello @mknycha, thanks for doing this!
I was away last week, so apologies for the delay.
I have a couple of small comments.

@@ -0,0 +1,1519 @@
// Copyright 2015 The Cockroach Authors.
Contributor

nit: 2020.

I think this may be useful as a separate package, i.e. in pkg/util/unaccent/unaccent.go

Contributor Author

Ok, done

package builtins

var unaccentDictionary = map[string]string{
Contributor

i'm curious - should this be map[rune]string?

Contributor

(if not, I'm not sure how strings.Split(s, "") above works with this dictionary.)

Contributor Author

You're right, I have just realized that strings.Split may not work correctly. I guess map[rune]string is indeed the way to go. Some "characters" in the dictionary consist of two code points, so I need to modify the dictionary a bit.
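One reading of the concern above, as a minimal Go sketch: strings.Split(s, "") splits after each UTF-8 sequence, so every element holds exactly one code point, and a string-keyed entry made of two code points can never match a single element. The dictionary entry and the helper name splitCodePoints are hypothetical, chosen only to illustrate the mismatch.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCodePoints mirrors the strings.Split(s, "") call under review:
// with an empty separator, Split yields one element per UTF-8 sequence.
func splitCodePoints(s string) []string {
	return strings.Split(s, "")
}

func main() {
	precomposed := "\u00E9" // "é" as a single code point
	decomposed := "e\u0301" // "e" followed by COMBINING ACUTE ACCENT

	fmt.Println(len(splitCodePoints(precomposed))) // one element
	fmt.Println(len(splitCodePoints(decomposed)))  // two elements

	// Hypothetical dictionary entry whose key spans two code points.
	dict := map[string]string{"e\u0301": "e"}

	// Each element is a single code point, so the two-code-point key
	// is never looked up as a whole and never matches.
	for _, part := range splitCodePoints(decomposed) {
		_, ok := dict[part]
		fmt.Printf("%+q found=%v\n", part, ok)
	}
}
```

Keying the map by rune makes the one-code-point-per-key constraint explicit in the type, which is why multi-code-point entries need to be reworked first.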

"unaccent": makeBuiltin(tree.FunctionProperties{Category: categoryString},
	stringOverload1(
		func(evalCtx *tree.EvalContext, s string) (tree.Datum, error) {
			separatedStrings := strings.Split(s, "")
@otan otan Sep 28, 2020

nit: this is more efficient if we use a strings.Builder to write out the whole string.
I feel as if we should be iterating rune by rune, something like

var b strings.Builder
for _, ch := range s {
   if repl, ok := unaccent.Characters[ch]; ok {
      b.WriteString(repl)
   } else {
      b.WriteRune(ch)
   }
}

return tree.NewDString(b.String()), nil
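To try the suggestion above outside the CockroachDB tree, here is a self-contained sketch of the same loop. The characters map is a tiny hypothetical stand-in for the real dictionary (it is not the PR's data), and the function returns a plain string instead of a tree.Datum so it compiles on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// characters is a small, hypothetical subset used only for illustration.
// Values are strings because one rune may expand to several characters.
var characters = map[rune]string{
	'\u017B': "Z",  // Ż
	'\u00F3': "o",  // ó
	'\u0142': "l",  // ł
	'\u0107': "c",  // ć
	'\u00C6': "AE", // Æ
}

// unaccent walks s rune by rune and writes either the replacement
// or the original rune into a single strings.Builder.
func unaccent(s string) string {
	var b strings.Builder
	b.Grow(len(s)) // a reasonable starting capacity; the builder grows if needed
	for _, ch := range s {
		if repl, ok := characters[ch]; ok {
			b.WriteString(repl)
		} else {
			b.WriteRune(ch)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(unaccent("\u017B\u00F3\u0142\u0107")) // the "Żółć" test input
	fmt.Println(unaccent("no_special_CHARACTERS1!"))
}
```

Ranging over the string decodes one rune per iteration, so no intermediate slice of substrings is allocated, and the builder avoids repeated string concatenation.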

Contributor Author

Ok, done

@mknycha mknycha requested a review from a team as a code owner September 30, 2020 07:32
@blathers-crl blathers-crl bot requested a review from otan September 30, 2020 07:32
blathers-crl bot commented Sep 30, 2020

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@mknycha mknycha force-pushed the sql-support-unaccent branch 2 times, most recently from d327e71 to e4eea43 Compare September 30, 2020 10:55
otan commented Sep 30, 2020

looks like some linting issues -- i'll merge it after those are fixed!

otan commented Sep 30, 2020

bors r+

thank you for the contribution!

craig bot commented Sep 30, 2020

Build succeeded:

@craig craig bot merged commit 8d0e636 into cockroachdb:master Sep 30, 2020
This was referenced Oct 6, 2020