New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
builtins: add fuzzystrmatch metaphone() built-in #110950
Conversation
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. Before a member of our team reviews your PR, I have some potential action items for you:
I have added a few people who may be able to assist in reviewing:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
867c854
to
0885b00
Compare
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! can you please add tests for this in pkg/sql/logictest/testdata/logic_test/fuzzystrmatch ?
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
3 similar comments
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
f99f873
to
a875e32
Compare
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
90b7ce4
to
f99f873
Compare
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
f99f873
to
cd410b0
Compare
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
@rafiss Sorry for the delay... I intend to add tests but the built-in is a bit tricky to implement with all the rules. The PR was a draft since it was still a work in progress, but commits have been squashed and it's ready for review now. Both metaphone(...)'s behaviour and interface are intended to match with Postgres. With all these if/else-if rules, the implementation focuses on readability as much as possible as well. |
cd410b0
to
fc3fde5
Compare
Friendly ping @rafiss |
fc3fde5
to
d125556
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your continued work on this @charlespnh!
I found a few more test cases from looking at other open-source implementations of metaphone. There are a few where this PR differs:
metaphone('Campbel', 10) -> expected "KMPBL" got "MBL"
metaphone('Cammmppppbbbeeelll', 10) -> expected "KMPBL" got "M"
metaphone('What', 10) -> expected "WT" got "HT"
metaphone('astronomical', 10) -> expected "ASTRNMKL" got "ASTRNML"
metaphone('district', 10) -> expected "TSTRKT" got "TSTR"
metaphone('hockey', 10) -> expected "HK" got "H"
metaphone('capital', 10) -> expected "KPTL" got "PTL"
metaphone('lightning', 10) -> expected "LTNNK" got "LFTK"
metaphone('light', 10) -> expected "LT" got "LFT"
I have added these test cases to your PR, but I will leave it to you to fix the remaining bugs, if you're still interested in working on this. Consider using other implementations as a reference (and cite them in comments if you do so). see https://github.com/go-dedup/metaphoneand and https://github.com/vividvilla/metaphone
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @fqazi, @jordanlewis, and @michae2)
d125556
to
3b37eae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @charlespnh, @fqazi, @jordanlewis, and @michae2)
pkg/util/fuzzystrmatch/metaphone_test.go
line 90 at r5 (raw file):
{ Source: "What", Expected: "HT",
postgres returns WT
for this test case
pkg/util/fuzzystrmatch/metaphone_test.go
line 142 at r5 (raw file):
{ Source: "lightning", Expected: "LFTNNK",
postgres returns LTNNK
for this case
pkg/util/fuzzystrmatch/metaphone_test.go
line 146 at r5 (raw file):
{ Source: "light", Expected: "LFT",
postgres returns LT
for this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for helping me out! I'm using PostgreSQL 12.17 and this is what I get though:
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @fqazi, @jordanlewis, and @michae2)
3b37eae
to
17c5380
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for clarifying. i must have been checking against a different implementation; i confirmed that i see the same postgres behavior now.
just had one more nit
Release note (sql change): metaphone() builtin function was added.
17c5380
to
13fb2c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your work on this!
bors r+
Build succeeded: |
informs: #56820
Release note (sql change): The metaphone() builtin function was added.