Return scores with search results #13

Closed
luryus opened this issue Mar 7, 2022 · 2 comments · Fixed by #14

luryus commented Mar 7, 2022

First, thanks for the great library! I'm using it in a hobby password manager project of mine. It's the only Rust library I've found that actually works nicely for searching password items.

The current search functions, `search` and `search_tokens`, return only the ids of the matching items. This is fine in most situations, since the results are already sorted by score.

However, being able to get the scores along with the ids would make it possible to order items with equal scores in a custom way. For instance, in my password manager I would like to sort items with equal scores alphabetically.
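A minimal sketch of the tie-breaking this would enable, assuming results come back as `(id, score)` pairs where a higher score is better. The helper `sort_results` and the data are hypothetical and made up for illustration; they are not part of simsearch.

```rust
/// Hypothetical helper: sorts `(id, score)` pairs by descending score and
/// breaks ties alphabetically by id. `f32` is not `Ord`, so scores are
/// compared with `f32::total_cmp`.
fn sort_results(results: &mut [(&str, f32)]) {
    results.sort_by(|(id_a, score_a), (id_b, score_b)| {
        score_b.total_cmp(score_a).then_with(|| id_a.cmp(id_b))
    });
}

fn main() {
    // Made-up results in the shape a score-returning search might produce.
    let mut results: Vec<(&str, f32)> = vec![
        ("Sample item 2", 0.8),
        ("Bank login", 0.9),
        ("Sample item 10", 0.8),
        ("Sample item 1", 0.8),
    ];
    sort_results(&mut results);
    // Ties are ordered lexicographically, so "Sample item 10" sorts before
    // "Sample item 2".
    for (id, score) in &results {
        println!("{score:.1}  {id}");
    }
}
```

Because the tie-break lives in the caller's comparator, any ordering (alphabetical, most recently used, and so on) can be plugged in without changes to the search itself.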


As a side note, the biggest issue I have right now is that search result order is not stable. Searching with the same term on the same data set multiple times returns items with equal scores in different orders:

```rust
use simsearch::SimSearch;

fn main() {
    // Generate items
    let items: Vec<_> = (1..=50).map(|n| format!("Sample item {}", n)).collect();

    let mut ss = SimSearch::new();
    for i in &items {
        ss.insert(i, i);
    }

    for i in 0..3 {
        println!("Run #{i}");
        let res = ss.search("sample item");
        for id in &res[0..2] {
            println!("  {id}");
        }
        println!("---");
    }
}
```

Running this prints out:

```text
Run #0
  Sample item 28
  Sample item 21
---
Run #1
  Sample item 1
  Sample item 29
---
Run #2
  Sample item 37
  Sample item 38
---
```

I think the reason for this instability is the internal use of `HashMap`. It could perhaps be fixed by changing the hasher implementation, but ultimately I think returning the scores is a more versatile solution.
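For context: std's `HashMap` uses `RandomState` by default, which seeds each map randomly, so iteration order can differ between maps. The "change the hasher" idea can be sketched with std's `BuildHasherDefault<DefaultHasher>`, whose keys are fixed, so two maps built with the same insertions iterate in the same order. This assumes simsearch's internal maps could be given such a hasher; `FixedHashMap` and `iteration_order` are hypothetical names for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// A HashMap whose hasher has fixed keys instead of a per-map random seed.
type FixedHashMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Builds a map of item -> score and returns its keys in iteration order.
fn iteration_order(items: &[String]) -> Vec<String> {
    let mut scores: FixedHashMap<String, f32> = FixedHashMap::default();
    for item in items {
        // Every item gets the same score, mimicking a tie.
        scores.insert(item.clone(), 1.0);
    }
    scores.keys().cloned().collect()
}

fn main() {
    let items: Vec<String> = (1..=50).map(|n| format!("Sample item {n}")).collect();
    let first = iteration_order(&items);
    for run in 1..3 {
        // Same hasher keys and same insertions give the same order every time.
        assert_eq!(iteration_order(&items), first, "run {run} differed");
    }
    println!("order is stable: {:?}", &first[..2]);
}
```

Note that this only makes the order repeatable, not meaningful: ties still come out in an arbitrary hash-dependent order, and std does not specify `DefaultHasher`'s algorithm, so the order may change between Rust releases.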

andylokandy (Owner) commented

Thank you for taking the time to look into this small crate. You've given a strong example of why returning results with scores is necessary. But would it be sufficient if we replaced `HashMap` with `IndexMap`, which should make the results stable?
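Why an insertion-ordered map helps can be sketched with std alone: `IndexMap` iterates in insertion order, and Rust's `slice::sort_by` is stable, so if scores are collected in insertion order and then sorted by score, equal-scored items keep their insertion order. In this sketch a `Vec` of pairs stands in for the `IndexMap` (to avoid the external crate), and `rank_stable` is a hypothetical helper, not simsearch code.

```rust
/// Hypothetical helper: sorts `(id, score)` pairs by descending score.
/// `sort_by` is stable, so pairs with equal scores keep the order in which
/// they were inserted.
fn rank_stable(scored: &mut [(&str, f32)]) {
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
}

fn main() {
    // Stand-in for an insertion-ordered map such as IndexMap: entries stay in
    // the order they were inserted.
    let mut scored: Vec<(&str, f32)> = vec![
        ("Sample item 3", 0.5),
        ("Sample item 1", 0.9),
        ("Sample item 2", 0.5),
        ("Sample item 4", 0.9),
    ];
    rank_stable(&mut scored);
    let ids: Vec<&str> = scored.iter().map(|(id, _)| *id).collect();
    // Ties stay in insertion order: 1 before 4, then 3 before 2.
    assert_eq!(ids, ["Sample item 1", "Sample item 4", "Sample item 3", "Sample item 2"]);
    println!("{ids:?}");
}
```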

luryus (Author) commented Mar 9, 2022

For this specific issue, yes, it would suffice.

However, I would still prefer to be able to sort results with equal scores alphabetically, with something like this:

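```rust
// `search_with_scores` is the proposed API; it does not exist yet.
let mut results: Vec<(&str, f32)> = ss.search_with_scores("term");
// `f32` is not `Ord`, so sort by descending score via `total_cmp`, then by id.
results.sort_by(|(r1, s1), (r2, s2)| s2.total_cmp(s1).then_with(|| r1.cmp(r2)));
```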
