builtins: add fuzzystrmatch metaphone() built-in #110950

Merged
merged 1 commit into from Jan 10, 2024

Conversation

charlespnh
Contributor

@charlespnh charlespnh commented Sep 20, 2023

informs: #56820

Release note (sql change): The metaphone() builtin function was added.

@cockroach-teamcity
Member

cockroach-teamcity commented Sep 20, 2023

CLA assistant check
All committers have signed the CLA.

@blathers-crl

blathers-crl bot commented Sep 20, 2023

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try to break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

I have added a few people who may be able to assist in reviewing:

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Sep 20, 2023
@cockroach-teamcity
Member

This change is Reviewable

@otan otan removed their request for review September 21, 2023 23:13
@charlespnh charlespnh changed the title builtins: add fuzzystrmatch metaphone built-in function builtins: add fuzzystrmatch metaphone() built-in Sep 27, 2023
@blathers-crl

blathers-crl bot commented Sep 27, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@jordanlewis jordanlewis requested a review from a team September 28, 2023 13:54
@rharding6373 rharding6373 requested review from a team and michae2 and removed request for a team September 28, 2023 14:05
Collaborator

@rafiss rafiss left a comment

Thanks for your contribution! Can you please add tests for this in pkg/sql/logictest/testdata/logic_test/fuzzystrmatch?

blathers-crl bot commented Nov 23, 2023

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try to break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot commented Nov 23, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try to break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot commented Nov 25, 2023

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot commented Nov 25, 2023

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try to break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot commented Nov 25, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try to break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot commented Nov 25, 2023

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@charlespnh charlespnh marked this pull request as ready for review November 26, 2023 15:57
@charlespnh charlespnh requested a review from a team as a code owner November 26, 2023 15:57
@charlespnh
Contributor Author

charlespnh commented Nov 26, 2023

@rafiss Sorry for the delay. I intended to add tests, but the built-in is a bit tricky to implement with all the rules. The PR was a draft while it was still a work in progress; the commits have now been squashed and it's ready for review.
Postgres doesn't seem to have many tests for metaphone. Please let me know if there are any other meaningful tests I should add.

Both the behaviour and the interface of metaphone(...) are intended to match Postgres. Given all the if/else-if rules, the implementation also aims to be as readable as possible.

@yuzefovich
Member

Friendly ping @rafiss

Collaborator

@rafiss rafiss left a comment

Thanks for your continued work on this @charlespnh!

I found a few more test cases from looking at other open-source implementations of metaphone. There are a few where this PR differs:

metaphone('Campbel', 10) -> expected "KMPBL" got "MBL"
metaphone('Cammmppppbbbeeelll', 10) -> expected "KMPBL" got "M"
metaphone('What', 10) -> expected "WT" got "HT"
metaphone('astronomical', 10) -> expected "ASTRNMKL" got "ASTRNML"
metaphone('district', 10) -> expected "TSTRKT" got "TSTR"
metaphone('hockey', 10) -> expected "HK" got "H"
metaphone('capital', 10) -> expected "KPTL" got "PTL"
metaphone('lightning', 10) -> expected "LTNNK" got "LFTK"
metaphone('light', 10) -> expected "LT" got "LFT"

I have added these test cases to your PR, but I will leave it to you to fix the remaining bugs, if you're still interested in working on this. Consider using other implementations as a reference (and cite them in comments if you do so). See https://github.com/go-dedup/metaphone and https://github.com/vividvilla/metaphone

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @fqazi, @jordanlewis, and @michae2)

Collaborator

@rafiss rafiss left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @charlespnh, @fqazi, @jordanlewis, and @michae2)


pkg/util/fuzzystrmatch/metaphone_test.go line 90 at r5 (raw file):

		{
			Source:   "What",
			Expected: "HT",

postgres returns WT for this test case


pkg/util/fuzzystrmatch/metaphone_test.go line 142 at r5 (raw file):

		{
			Source:   "lightning",
			Expected: "LFTNNK",

postgres returns LTNNK for this case


pkg/util/fuzzystrmatch/metaphone_test.go line 146 at r5 (raw file):

		{
			Source:   "light",
			Expected: "LFT",

postgres returns LT for this case

Contributor Author

@charlespnh charlespnh left a comment

Thanks for helping me out! I'm using PostgreSQL 12.17 and this is what I get though:

[Screenshot attached: image.png]

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @fqazi, @jordanlewis, and @michae2)

Collaborator

@rafiss rafiss left a comment

thanks for clarifying. i must have been checking against a different implementation; i confirmed that i see the same postgres behavior now.

just had one more nit

pkg/util/fuzzystrmatch/metaphone.go (outdated review thread, resolved)
Release note (sql change): metaphone() builtin function was added.
Collaborator

@rafiss rafiss left a comment

thanks for your work on this!

bors r+

@craig
Contributor

craig bot commented Jan 10, 2024

Build succeeded:

@craig craig bot merged commit dab086d into cockroachdb:master Jan 10, 2024
9 checks passed