Bug: tolerance option not behaving as hoped #480

Closed
ttillberg opened this issue Sep 14, 2023 · 12 comments
Labels
bug Something isn't working 💰 Rewarded

Comments

@ttillberg

Thanks for the amazing lib and clear documentation! I'm looking at using Orama to search local chat messages (typically involving a few words up to several sentences).

Using @orama/orama ^1.2.3, I'm getting fast and correct results for exact and prefix matching; however, typos don't seem to work the way I was hoping. I'm probably missing something obvious, but testing the tolerance parameter against an example in the docs returns poor results, so I'm wondering what could be wrong.

Looking at the following example.
https://docs.oramasearch.com/usage/search/introduction#typo-tolerance

If I grab a slightly bigger database:
https://github.com/erik-sytnyk/movies-list/blob/master/db.json

{
  term: "Christopher Nolan",
  properties: ["director"]
}
// result: OK: matches 1 exact result, as expected

{
  term: "Cris",
  properties: ["director"],
}
// result: OK: matches 1 document, "Michael Cristofer" (no tolerance was set, so this is kind of expected)

{
  term: 'Cris',
  properties: ['director'],
  tolerance: 1,
}
// result: "fails": matches 0 documents; in the documentation this query would return all the "Chris" entries. Note that this still fails when bumping the tolerance level.
// one example in the DB: "director": "Pierre Coffin, Chris Renaud",

Here's my playground (all output is in the console):
https://codesandbox.io/p/sandbox/keen-knuth-9wql22?file=/src/main.ts:65,28

I've played with other options, such as the tokenizer, stemming, relevance, threshold but without luck. What am I missing?

@micheleriva
Member

I fear that's a known issue. We're performing the Levenshtein edit distance on words living in the same prefix bucket, rather than performing the edit distance calculation on trees. For instance, searching for Chris and hris will give you totally different results, as they don't share a common prefix.
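
To illustrate the difference, here is a minimal TypeScript sketch. It is a toy model under stated assumptions, not Orama's actual radix-tree code: the word list and the helper names `bucketSearch` and `fullScanSearch` are hypothetical, and the "bucket" is approximated as "all indexed words that start with the search term".

```typescript
// Classic dynamic-programming Levenshtein distance
// (insertions, deletions and substitutions each cost 1).
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical model of the buggy behaviour: only words that share the
// term as a prefix are candidates, and the edit distance is then applied
// to those candidates only.
function bucketSearch(words: string[], term: string, tolerance: number): string[] {
  return words
    .filter((w) => w.startsWith(term))
    .filter((w) => levenshtein(term, w) <= tolerance);
}

// Hypothetical reference behaviour: compare the term against every word.
// (A real fix would walk the tree with a bounded distance instead of
// scanning everything; this full scan is only for illustration.)
function fullScanSearch(words: string[], term: string, tolerance: number): string[] {
  return words.filter((w) => levenshtein(term, w) <= tolerance);
}

// Made-up tokens, loosely based on the director names in this issue.
const words = ["christopher", "chris", "cristofer", "pierre", "michael"];

console.log(bucketSearch(words, "cris", 1));   // []
console.log(fullScanSearch(words, "cris", 1)); // ["chris"]
console.log(bucketSearch(words, "hris", 1));   // []
console.log(fullScanSearch(words, "hris", 1)); // ["chris"]
```

In this model, "cris" only ever sees "cristofer" (distance 5), so a tolerance of 1 returns nothing even though "chris" is a single insertion away.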

I'll be putting a bounty on this bug, thanks for opening it!

/bounty 500

@algora-pbc

algora-pbc bot commented Sep 14, 2023

💎 $500 bounty created by micheleriva
🙋 If you start working on this, comment /attempt #480 to notify everyone
👉 To claim this bounty, submit a pull request that includes the text /claim #480 somewhere in its body
📝 Before proceeding, please make sure you can receive payouts in your country
💵 Payment arrives in your account 2-5 days after the bounty is rewarded
💯 You keep 100% of the bounty award
🙏 Thank you for contributing to oramasearch/orama!

| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @mnmt7 | Sep 15, 2023, 5:50:54 AM | WIP |
| 🟢 @bicky21 | Sep 18, 2023, 2:53:27 PM | WIP |
| 🟢 @melsonic | Oct 2, 2023, 4:51:02 PM | WIP |
| 🟢 @SP321 | Oct 10, 2023, 12:23:49 PM | #516 |

@micheleriva changed the title from "Question: tolerance option not behaving as hoped" to "Bug: tolerance option not behaving as hoped" Sep 14, 2023
@micheleriva added the "bug" (Something isn't working) label and removed the "💎 Bounty" label Sep 14, 2023

@mnmt7

mnmt7 commented Sep 15, 2023

Hey @micheleriva, I would like to work on this issue. Can you please assign this issue to me?
/attempt #480

@bicky21

bicky21 commented Sep 18, 2023

Hey, I have a solution
/attempt #480

@algora-pbc

algora-pbc bot commented Sep 18, 2023

Note: The user @mnmt7 is already attempting to complete issue #480 and claim the bounty. If you attempt to complete the same issue, there is a chance that @mnmt7 will complete the issue first, and be awarded the bounty. We recommend discussing with @mnmt7 and potentially collaborating on the same solution versus creating an alternate solution.

@ogil7190

@ttillberg is this open to work?

@micheleriva
Member

@ogil7190 yes

@melsonic
Contributor

melsonic commented Oct 2, 2023

/attempt #480

@SP321
Contributor

SP321 commented Oct 10, 2023

/attempt #480

@algora-pbc

algora-pbc bot commented Oct 10, 2023

💡 @SP321 submitted a pull request that claims the bounty. You can visit your org dashboard to reward.

@algora-pbc

algora-pbc bot commented Oct 11, 2023

🎉🎈 @SP321 has been awarded $500! 🎈🎊

@micheleriva
Member

Fixed with v1.2.11

7 participants