How does the Ranking work? #1790

Open
maximilianpreisinger opened this issue Dec 30, 2019 · 6 comments

@maximilianpreisinger maximilianpreisinger commented Dec 30, 2019

Hi Guys,

I searched the internet for a "configurable meta search engine", and I found THIS! Amazing, that was exactly what I was looking for. I really like your project, and at the moment I am getting familiar with Searx. However, I have to say, for my taste the documentation (for users) is very poor. A lot of questions are popping up in my mind for which I can't find an answer here or elsewhere.

One of the most important questions is: how does the website ranking work in Searx? We know that different search engines use different metrics, so the order of the websites differs (as does which websites are shown at all). That, at least, is the reason to use a meta search engine: to collect the results from multiple sources.
So the question is: how does the Searx algorithm decide which result from which engine is featured at position 1, 2, etc.? I tried to figure this out, but without success. Depending on the keyword, different search engines dominate. On page 1, two search engines often alternate their results, while on the next page all results come from a single search engine (even though the other search engine must have many more results, too).

Also (this maybe could be a separate feature request, if it does not exist yet), is there an easy way to filter for special search engines right in the window with the search results? (Without going to preferences, toggling switches, and repeating the search? This seems very laborious...)

That's all for now: I wish you success for the next year, and I am looking forward to testing and tweaking Searx a bit more.

Collaborator

@return42 return42 commented Dec 30, 2019

The ranking weight is set per engine.

The operator of a searx instance can set the weight of each engine (e.g. https://github.com/asciimoo/searx/blob/master/searx/settings.yml#L161 ). A normal user can neither change nor see the weight.

> filter for special search engines right in the window with the search results?

No, but your PR is welcome :)

> I have to say, for my taste the documentation (for users) is very poor.

Yes, our documentation is missing a lot of information. In the past it was a bit confusing how to add documentation, so we reorganized the documentation process. I released the new doc build just this second:

https://asciimoo.github.io/searx/

If you think your issue isn't solved, feel free to reopen.

@return42 return42 closed this Dec 30, 2019
Author

@maximilianpreisinger maximilianpreisinger commented Dec 30, 2019

@return42 Many thanks for your answer! The updated documentation looks way better than half an hour ago ;) Really nice (though not quite finished yet, by the looks of it :P)

Thanks also for the link to the settings file. This is a good first hint. However, it does not answer my question completely. (By the way: I didn't find the weight option in the new documentation either. It would be cool if that could be added, with an explanation of how it works.)

Let me give you an example. I enable the search engines wikidata (wd) (weight 2) and duckduckgo (ddg) (no weight, so 1 by default, I guess), and search for a keyword.
The two engines return the following results:
wd: A, B, C, D
ddg: a, b, c, d

So I know that I have to multiply the results from wd by 2 because of the weight. But what initial weight do the results A, B, C, ... have? The new docs say that the returned results (https://asciimoo.github.io/searx/dev/engine_overview.html#returned-results) are: url, title, content and publishedDate. There is no initial weight, however. So how does Searx order the results? Is it like:
A, B, C, D, a, b, c, d
because wd has priority due to the weight, or is it like
A, a, B, b, C, c, D, d
or is it something else? What is the initial weight of A, B, ..., a, b, ...?

I am assuming here that both engines return completely different results. A follow-up question would be: what happens if A == a, for example? Would the weights simply be added?

Author

@maximilianpreisinger maximilianpreisinger commented Dec 30, 2019

By the way, I think I cannot reopen this issue. At least I couldn't find out how...

@return42 return42 reopened this Dec 30, 2019
Owner

@asciimoo asciimoo commented Jan 2, 2020

@maximilianpreisinger the scoring algo is quite simple. This is the actual code to determine a score of a result:

def result_score(result):
    weight = 1.0

    for result_engine in result['engines']:
        if hasattr(engines[result_engine], 'weight'):
            weight *= float(engines[result_engine].weight)

    occurences = len(result['positions'])

    return sum((occurences * weight) / position for position in result['positions'])

The score is sum((occurences * weight) / position for position in result['positions']). The loop above it computes the weight value by multiplying together the weights of every engine in which the result appeared; an engine with no custom weight set contributes the default weight (1.0). Here, occurences is the number of positions recorded for the result, i.e. how many times it was returned across the engines.
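To make this concrete for the wd/ddg example asked about earlier in the thread, here is a minimal, self-contained sketch. It is not the searx implementation: `ENGINE_WEIGHTS`, `rank` and the merge-by-identical-URL step are hypothetical simplifications for illustration, and only the scoring formula is taken from the code quoted above. Under that formula, A scores 2.0, a and B tie at 1.0, C scores about 0.67, b and D tie at 0.5, and so on. If both engines returned the same result at position 1 (A == a), it would score 8.0, so the weights are multiplied rather than added, and the number of occurrences multiplies in as well.

```python
# Hypothetical engine weights from the example in this thread:
# wikidata (wd) has weight 2, duckduckgo (ddg) has no custom weight.
ENGINE_WEIGHTS = {"wd": 2.0}


def result_score(engines, positions):
    """Apply the quoted formula to one merged result."""
    weight = 1.0
    for engine in engines:
        # Engines without a custom weight contribute the default 1.0.
        weight *= ENGINE_WEIGHTS.get(engine, 1.0)
    occurrences = len(positions)
    return sum(occurrences * weight / position for position in positions)


def rank(engine_results):
    """Merge per-engine result lists and sort by descending score.

    Assumption: two results are "the same" only if their URLs are
    identical strings. The real merge logic may be more involved.
    """
    merged = {}
    for engine, urls in engine_results.items():
        for position, url in enumerate(urls, start=1):
            entry = merged.setdefault(url, {"engines": [], "positions": []})
            entry["engines"].append(engine)
            entry["positions"].append(position)
    scores = {
        url: result_score(entry["engines"], entry["positions"])
        for url, entry in merged.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Case 1: the two engines return completely different results.
disjoint = rank({"wd": ["A", "B", "C", "D"], "ddg": ["a", "b", "c", "d"]})
# Case 2: both engines return A at position 1.
overlap = rank({"wd": ["A", "B", "C", "D"], "ddg": ["A", "b", "c", "d"]})

for label, ranking in (("disjoint", disjoint), ("overlap", overlap)):
    print(label)
    for url, score in ranking:
        print(f"  {url}: {score:.3f}")
```

How ties (such as a and B) are ordered in this sketch is just an artifact of Python's stable sort and should not be read as a statement about searx.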

EDIT:
You can see the scores and positions of every result if you use JSON output format:
e.g. curl '127.0.0.1:8888/?q=test&format=json' | jq
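If jq is not at hand, the same inspection can be sketched in Python. This assumes that the JSON response has a top-level `results` list and that each entry carries `url`, `engines`, `positions` and `score` fields; those field names are an assumption here and should be checked against the output of your own instance. The sample payload below is made up for illustration and is not real output.

```python
import json


def summarize(payload):
    """Return (score, engines, positions, url) tuples, highest score first."""
    rows = []
    for result in payload.get("results", []):
        rows.append((
            result.get("score", 0.0),
            result.get("engines", []),
            result.get("positions", []),
            result.get("url", ""),
        ))
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Made-up sample in the assumed shape; replace it with the text returned by
# a request such as the curl command above.
sample_json = """
{"query": "test", "results": [
  {"url": "https://example.org/b", "title": "B",
   "engines": ["duckduckgo"], "positions": [2], "score": 0.5},
  {"url": "https://example.org/a", "title": "A",
   "engines": ["wikidata", "duckduckgo"], "positions": [1, 1], "score": 8.0}
]}
"""

rows = summarize(json.loads(sample_json))
for score, engines, positions, url in rows:
    print(f"{score:6.2f}  {','.join(engines):<22} {positions}  {url}")
```

Seeing the engines and positions next to each score makes it easier to check a ranking by hand against the formula.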

@asciimoo asciimoo added the question label Jan 2, 2020
@return42 return42 added the doc label Jan 3, 2020
Collaborator

@return42 return42 commented Jan 3, 2020

Thanks for the question and the answer. Please leave the issue open; I will add a note to the docs and close it afterwards.

Author

@maximilianpreisinger maximilianpreisinger commented Jan 8, 2020

Thank you for this answer! The sorting is now very clear and transparent to me (and I also know how to tweak it via the config file if I am not satisfied with the order of the results).
If this answer made it into the docs in a comprehensible way, that would be awesome :)
