Autocomplete: Score not given as expected #203

Closed
borisdaeppen opened this Issue · 5 comments

4 participants

@borisdaeppen

Autocomplete has strange behavior here:

http://api.metacpan.org//v0/search/autocomplete?&q=Log++Log4perl

     [...]
     {
        "_score" : 1.169038,
        "fields" : {
           "documentation" : "Tie::Log4perl",
           "release" : "Tie-Log4perl-0.1",
           "author" : "FRODWITH",
           "distribution" : "Tie-Log4perl"
        },
        [...]
     },
     {
        "_score" : 1.169038,
        "fields" : {
           "documentation" : "Log::Log4perl",
           "release" : "Log-Log4perl-1.36",
           "author" : "MSCHILLI",
           "distribution" : "Log-Log4perl"
        },
        [...]
     },
     {
        "_score" : 1.1590381,
        "fields" : {
           "documentation" : "Test::Log4perl",
           "release" : "Test-Log4perl-0.1001",
           "author" : "FOTANGO",
           "distribution" : "Test-Log4perl"
        },
        [...]
     },
     [...]

Both Tie::Log4perl and Log::Log4perl have the same score, 1.169038.
Even worse, Tie::Log4perl is placed on top. If I add the parameter &size=1 to the request (in the URL), I only get Tie::Log4perl, but I was asking for a match with Log::Log4perl.

This looks like a bug to me...

I use the autocomplete functionality at http://perlybook.org/ to guess what the user is asking for. It's very nice because this way you can even handle typos in user input. But in this case, the user just gets a completely wrong result.

Any ideas on that?
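
For illustration, here is a minimal client-side sketch of the tie described above, using only the three hits from the response excerpt. The helper names `name_parts` and `pick_best` are hypothetical and not part of the MetaCPAN API. The tie-breaking rule (prefer the tied hit whose "documentation" name matches the query word for word) is an assumption about what a caller might want, not a fix for the endpoint itself.

```python
import re

# Hits copied from the response excerpt above, trimmed to the relevant fields.
hits = [
    {"_score": 1.169038, "fields": {"documentation": "Tie::Log4perl"}},
    {"_score": 1.169038, "fields": {"documentation": "Log::Log4perl"}},
    {"_score": 1.1590381, "fields": {"documentation": "Test::Log4perl"}},
]


def name_parts(text):
    """Split a query or module name on whitespace and ':' and lowercase it."""
    return [part for part in re.split(r"[\s:]+", text.lower()) if part]


def pick_best(hits, query):
    """Among the hits sharing the top score, prefer one whose name matches the query.

    Falls back to the first top-scored hit, which is what size=1 returns.
    """
    if not hits:
        return None
    top_score = max(hit["_score"] for hit in hits)
    tied = [hit for hit in hits if hit["_score"] == top_score]
    wanted = name_parts(query)
    for hit in tied:
        if name_parts(hit["fields"]["documentation"]) == wanted:
            return hit
    return tied[0]


# "Log++Log4perl" in the URL decodes to "Log  Log4perl" (two spaces).
best = pick_best(hits, "Log  Log4perl")
print(best["fields"]["documentation"])
```

This only works if the request asks for more than one hit, so that both tied results are available to choose from.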

@monken
Owner

You will see that the results for http://api.metacpan.org//v0/search/autocomplete?&q=Log++Log4perl and http://api.metacpan.org//v0/search/autocomplete?&q=Log4perl are the same. This is because both queries are tokenized to the same terms, "log", "4", and "perl" (duplicates are removed). I agree that this is not the result one would expect, but it's not entirely random. We need to tweak the search algorithm a bit to address this issue.
So thanks for reporting this case!
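
The tokenization described above can be sketched roughly as follows. This is an illustrative approximation, not the analyzer the search backend actually uses: the splitting rule (runs of letters versus runs of digits, lowercased) and the function names `tokenize` and `unique_terms` are assumptions, chosen only because they reproduce the terms "log", "4", and "perl".

```python
import re


def tokenize(query):
    """Split a query into lowercase runs of letters or runs of digits."""
    return [token.lower() for token in re.findall(r"[A-Za-z]+|[0-9]+", query)]


def unique_terms(query):
    """Tokenize and drop duplicate terms, keeping first-seen order."""
    terms = []
    for token in tokenize(query):
        if token not in terms:
            terms.append(token)
    return terms


# "+" in the URL decodes to a space, so q=Log++Log4perl is "Log  Log4perl".
print(unique_terms("Log  Log4perl"))
print(unique_terms("Log4perl"))
```

Under these assumptions both queries reduce to the same term list, so the leading "Log" adds no information that could separate Log::Log4perl from Tie::Log4perl.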

@dvergin

Similarly, a request at http://perlybook.org/ for "Moo" returns the docs for "ppt". A visit to api.metacpan.org shows that the two have the same score. In this case the issue does not seem to hinge on the tokenizing and duplicate removal that monken describes in connection with Log4perl.

Checking the page http://api.metacpan.org//v0/search/autocomplete?&q=Moo I note that the ppt distribution is shown with: "documentation" : "moo".

Knowing next to nothing about metacpan, I can't speculate about the source of that confusion or whether the "documentation" anomaly is cause or symptom.

@monken
Owner

This (https://github.com/CPAN-API/metacpan-web/blob/master/lib/MetaCPAN/Web/Model/API/Module.pm#L52) is the query that metacpan.org issues at the /autocomplete endpoint. This endpoint does the autocompletion for all modules on CPAN, but you need to filter the results further to get useful ones.

@monken
Owner

Sorry, I was wrong, we send that query to /file/_search instead of /search/autocomplete. Let me check how to fix your query.

@ranguard
Owner

Closing as part of Nov 2014 cleanup

@ranguard ranguard closed this