builtins: Implement unaccent function #54628

mknycha · 2020-09-21T19:58:52Z

Unaccent function should work the same as in PostgreSQL.

Release note (sql change): Implement the string builtin unaccent.

Described in #41299

cockroach-teamcity · 2020-09-21T19:58:55Z

All committers have signed the CLA.

blathers-crl · 2020-09-21T19:58:56Z

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

I have added a few people who may be able to assist in reviewing:

@rohany (author of sql: Support unaccent() #41299)
@jordanlewis (commented on sql: Support unaccent() #41299)
@otan (commented on sql: Support unaccent() #41299)

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.


mknycha · 2020-09-21T20:02:33Z

pkg/sql/logictest/testdata/logic_test/builtin_function

+hELLO`
+
+query T
+SELECT unaccent('something')


There is actually U+00AD in the middle, it's visible in git diff.


mknycha · 2020-09-22T06:01:17Z

I forgot to add a label "first PR", sorry

otan

hello @mknycha, thanks for doing this!
I was away last week, so apologies for the delay.
I have a couple of small comments.

otan · 2020-09-28T14:25:56Z

pkg/sql/sem/builtins/unaccent_dictionary.go

@@ -0,0 +1,1519 @@
+// Copyright 2015 The Cockroach Authors.


nit: 2020.

I think this may be useful as a separate package, i.e. in pkg/util/unaccent/unaccent.go

otan · 2020-09-28T14:32:04Z

pkg/sql/logictest/testdata/logic_test/builtin_function

+hELLO`
+
+query T
+SELECT unaccent('something')


we can combine these into one test case that's easier to mess with if you do:

SELECT unaccent(str) FROM ( VALUES ('no_special_CHARACTERS1!'), ('Żółć') ) tbl(str)

otan · 2020-09-28T14:33:34Z

pkg/sql/sem/builtins/unaccent_dictionary.go

+
+package builtins
+
+var unaccentDictionary = map[string]string{


i'm curious - should this be map[rune]string?

(if not, I'm not sure how strings.Split(s, "") above works with this dictionary.)

mknycha

You're right. I have just realized that strings.Split may not work correctly, so map[rune]string is indeed the way to go. Some "characters" in the dictionary consist of two code points, so I need to modify the dictionary a bit.
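As an illustration of the two-code-point concern raised here, the following is a minimal Go sketch and not code from the PR. The helper name `runeCount` is hypothetical. It shows that a precomposed "é" is a single rune, while "e" followed by a combining acute accent is two runes, so the latter cannot be used as a single `map[rune]string` key.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeCount reports how many Unicode code points s contains.
// (Hypothetical helper, used only for this illustration.)
func runeCount(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // é encoded as one code point (U+00E9)
	combining := "e\u0301"  // e followed by U+0301 COMBINING ACUTE ACCENT

	// Byte length and rune count differ for non-ASCII text.
	fmt.Println(len(precomposed), runeCount(precomposed)) // 2 1
	fmt.Println(len(combining), runeCount(combining))     // 3 2

	// Ranging over a string yields one rune per code point, so the
	// combining sequence arrives as two separate loop iterations.
	for i, r := range combining {
		fmt.Printf("byte offset %d: %U\n", i, r)
	}
}
```

A dictionary keyed by rune therefore needs such sequences either rewritten to their precomposed form or handled separately, which appears to be the modification mentioned in the reply.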

otan · 2020-09-28T14:36:51Z

pkg/sql/sem/builtins/builtins.go

+	"unaccent": makeBuiltin(tree.FunctionProperties{Category: categoryString},
+		stringOverload1(
+			func(evalCtx *tree.EvalContext, s string) (tree.Datum, error) {
+				separatedStrings := strings.Split(s, "")


nit: this is more efficient if we use a strings.Builder to write out the whole string.
I feel as if we should be iterating rune by rune, something like:

    var b strings.Builder
    for _, ch := range s {
        if repl, ok := unaccent.Characters[ch]; ok {
            b.WriteString(repl)
        } else {
            b.WriteRune(ch)
        }
    }
    return tree.NewDString(b.String()), nil
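The rune-by-rune loop suggested above can be tried outside the CockroachDB tree with a self-contained sketch like the one below. It is only an approximation under stated assumptions: `characters` is a tiny hypothetical stand-in for the `unaccent.Characters` dictionary named in the suggestion, `unaccentSketch` is a hypothetical function name, and the result is a plain string rather than a `tree.Datum`.

```go
package main

import (
	"fmt"
	"strings"
)

// characters is a small, hypothetical stand-in for the PR's dictionary.
// Keys are written as escapes so the exact code points are unambiguous.
var characters = map[rune]string{
	'\u017B': "Z",  // Ż
	'\u00F3': "o",  // ó
	'\u0142': "l",  // ł
	'\u0107': "c",  // ć
	'\u00C6': "AE", // Æ: a replacement may be longer than one character
}

// unaccentSketch replaces every rune that has an entry in characters
// and copies all other runes through unchanged.
func unaccentSketch(s string) string {
	var b strings.Builder
	b.Grow(len(s)) // capacity hint only; the output length may differ
	for _, ch := range s {
		if repl, ok := characters[ch]; ok {
			b.WriteString(repl)
		} else {
			b.WriteRune(ch)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(unaccentSketch("\u017B\u00F3\u0142\u0107")) // Zolc
	fmt.Println(unaccentSketch("no_special_CHARACTERS1!"))  // no_special_CHARACTERS1!
}
```

Writing into one `strings.Builder` avoids allocating an intermediate slice of single-character strings, which is the efficiency point made in the review comment.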

blathers-crl · 2020-09-30T07:32:28Z

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

otan · 2020-09-30T15:18:15Z

looks like some linting issues -- i'll merge it after those are fixed!


otan · 2020-09-30T17:56:50Z

bors r+

thank you for the contribution!

craig · 2020-09-30T19:34:40Z

Build succeeded:

GitHub CI (Cockroach)

blathers-crl bot added the labels O-community (Originated from the community) and X-blathers-triaged (blathers was able to find an owner) Sep 21, 2020

blathers-crl bot requested review from jordanlewis, otan and rohany September 21, 2020 19:58

mknycha commented Sep 21, 2020


rohany removed their request for review September 22, 2020 16:11

otan reviewed Sep 28, 2020


mknycha force-pushed the sql-support-unaccent branch from 546cf70 to 34f6864 Compare September 30, 2020 07:32

mknycha requested a review from a team as a code owner September 30, 2020 07:32

blathers-crl bot requested a review from otan September 30, 2020 07:32

mknycha force-pushed the sql-support-unaccent branch 2 times, most recently from d327e71 to e4eea43 Compare September 30, 2020 10:55

builtins: Implement unaccent function

d833390

Unaccent function should work the same as in PostgreSQL. Release note (sql change): Implement the string builtin unaccent.

mknycha force-pushed the sql-support-unaccent branch from e4eea43 to d833390 Compare September 30, 2020 17:33

otan approved these changes Sep 30, 2020


craig bot merged commit 8d0e636 into cockroachdb:master Sep 30, 2020

This was referenced Oct 6, 2020

sql: Support unaccent() #41299

Closed

sql: unaccent function #30540

Closed
