Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search does not find all matches #9

Closed
jwillmer opened this issue Jun 15, 2016 · 15 comments
Closed

Search does not find all matches #9

jwillmer opened this issue Jun 15, 2016 · 15 comments

Comments

@jwillmer
Copy link

Maybe I missunderstand the search but I think it does not work as it should. I use fuzisearch.js to compare urls with each other.

If someone misstypes a url like:

http://jwillmer.github.io/jekyllDecent/features

he is redirected to an error page and I add all urls from my website:

    {
        "title": "Theme Installation and Usage",
        "url": "http://jwillmer.github.io/jekyllDecent/blog/readme/Readme",
    },
    {
        "title": "Theme Features",
        "url": "http://jwillmer.github.io/jekyllDecent/blog/features/Features",
    },
    {
        "title": "YAML Custom Features",
        "url": "http://jwillmer.github.io/jekyllDecent/blog/features/YAML-Features",
    },
    {
        "title": "This post demonstrates post content styles",
        "url": "http://jwillmer.github.io/jekyllDecent/blog/features/Content",
    }

to fuzisearch.js and as search result I only get one url back:

http://jwillmer.github.io/jekyllDecent/blog/features/Content

Shouldn’t there be at least three urls that have some kind of match?

@Glench
Copy link
Owner

Glench commented Jun 15, 2016

hi @jwillmer! Is is possible for you to set up a simple test page for me to see how you're using fuzzyset and how it's failing? That would make it a lot easier for me to understand if there's a bug and how to fix it. Thanks!

@jwillmer
Copy link
Author

jwillmer commented Jun 15, 2016

What is the problem with my example? The files are not minified, the init for fuzziset.js is in offline.jsand you can find the repo at GitHub.

image

@Glench
Copy link
Owner

Glench commented Jun 16, 2016

Okay, I see now. I was worried that there was lots of other code that would interfere with the bug-finding process, but it's not bad.

I briefly looked into it, but I don't know why it's happening. It's only matching on gram size 3, returning one result, and not checking for gram size 2 because it found a result for 3. I probably won't get around to checking this for a long time, but you might try messing with the gram sizes in the mean time, @jwillmer.

@jwillmer
Copy link
Author

It could be that a fork of the project has fixed the bug without knowing that it is a bug: willlma@b6d9c59

@zbigg
Copy link

zbigg commented Sep 20, 2016

The bug is in those lines:

        for (var i = 0; i < results.length; ++i) {
            if (results[i][0] == results[0][0]) {
                newResults.push([results[i][0], this.exactSet[results[i][1]]]);
            }
        }

That alone causes that only best scored items is returned.
By some logic, this can fit implicit semantics of get method which usually returns concrete value, not list of search results.

My proposition would be to add search method that returns all results, sorted by score and delegate slicing (which this code does) to application, so it can choose how many results shall be shown.

@ryanweal
Copy link

ryanweal commented Jan 5, 2017

I'm also seeing this on the project example page.

Go here: http://glench.github.io/fuzzyset.js/#example type in "Mas" or "Mass".

Expected result: Massachusetts

Given results:
Mas -> Maine, Texas
Mass -> Maine

Interestingly, if you skip the first letter you get the correct result.

@ryanweal
Copy link

ryanweal commented Jan 5, 2017

Indeed @WillMa's fork resolves this. I had been testing with @washt's fork locally as that is what lives in npm but dropping in willma's replacement gives me multiple results and the results match my expectation.

@Glench
Copy link
Owner

Glench commented Jan 20, 2017

Ok cool, does someone want to submit a pull request with the correct changes?

@monolithpl
Copy link

Another problem in the example: "Northern Marianas Islands" is in the list, but when you search for "Islands Northern" you get no results. Not such a good fuzzy implementation if it doesn't tokenize.

@jwillmer
Copy link
Author

jwillmer commented Jun 1, 2017

@Glench would you mind to fix this issue for us? Since you know the project best and the investigation has already been done 😉

@monolithpl
Copy link

why reinvent the wheel? use something that works now https://github.com/atom/fuzzaldrin

@Glench
Copy link
Owner

Glench commented Jun 1, 2017

working on it! It will probably be fixed today.

@Glench
Copy link
Owner

Glench commented Jun 1, 2017

@monolithpl I think the problem with that particular example is that you should set useLevenshtein to false. I believe Levenshtein distance sees your example and the thing you think it should match as very far away from each other since it would require lots of transposition to make the strings match. Whereas just using cosine distance the entries have a higher match percentage.

@Glench
Copy link
Owner

Glench commented Jun 1, 2017

@ryanweal Testing this now, it seems like setting useLevenshtein = false helps match partially-typed words better. I would suggest using that option.

@Glench
Copy link
Owner

Glench commented Jun 1, 2017

Ok, so I fixed the library to look a bit more like @WillMa's fork. Now by default results with a score >= .33 are shown and that score can be passed to the get method as the third parameter e.g.

fuzzyset.get('hi', null, .5) (any result with a score >= .5 will be matched)

In general, I would also recommend trying to set useLevenshtein to false in the case of partial string matching. For such long entries such as URLs as in @jwillmer's original post, I would also try experimenting with the gram size, or trim the URLs so only relevant paths are stored in the search. Or maybe try @jeancroy's strategy (which I haven't tried). Thanks @jeancroy!

There's also now a rough debugging interface for fuzzyset.js that might help you understand what's going on a little better. I used it to try out @jwillmer's original example.

@Glench Glench closed this as completed Jun 1, 2017
@DSh3p4rd DSh3p4rd mentioned this issue May 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants