[BUG] Less results than real Google #1004
Search language already defaults to "all", but some public instance maintainers change this default to their preferred setting instead. The same goes for interface language (and country). Users can still override these settings, but the location where the instance is hosted can also affect results. I've always recommended that anyone who doesn't get decent results from a public instance spin up their own or run it locally. The instances I personally run all "passed" the test you outlined. Also, "safe" is already a configurable search param via the home page config, and "nfpr" is enabled when clicking the "exact results only" prompt on the search page. The "filter" param doesn't seem to impact the results, at least from my testing.
That's one of the problems. Instance maintainers can't be trusted to know what the heck they're doing (as proven), so let's take their toy away.
Many public instances disable customizations. I suppose they developed this habit because, in the past, Whoogle settings applied "globally", meaning that if one user changed the settings, the change applied to everyone else in the world. Again, take their toy away.
The filter parameter handles the "omitted results". Currently they are problematic in Whoogle: they show up at the bottom of a page, but clicking on the link leads directly to google.com! Sample link that currently shows this behavior: https://wg.vern.cc/search?safe=off&gbv=1&q=%22sfendaki%22&nfpr=1
This is a legit bug that should be taken care of; please re-open the issue, @benbusby. The easiest fix would be to include …
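One possible approach, sketched in Python (the language Whoogle is written in): rewrite the omitted-results link so it points back at the instance's own /search page with filter=0 instead of at google.com. This is not Whoogle's code, and the helper name is hypothetical. It assumes Google's link is a google.com (or relative) /search URL carrying the query in q, and that the instance accepts the same parameters on /search as the sample link above (q, safe, gbv, nfpr), plus filter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rewrite_omitted_results_link(google_url: str, instance_base: str) -> str:
    """Hypothetical helper: turn Google's "omitted results" link into a link
    back to the instance's own /search page with filter=0."""
    google = urlsplit(google_url)
    # Keep every original parameter (q, safe, nfpr, gbv, ...) except filter.
    params = [
        (key, value)
        for key, value in parse_qsl(google.query, keep_blank_values=True)
        if key != "filter"
    ]
    # filter=0 asks Google to include the omitted results.
    params.append(("filter", "0"))
    base = urlsplit(instance_base)
    # Preserve any path prefix the instance is served under (e.g. /whoogle).
    path = base.path.rstrip("/") + "/search"
    return urlunsplit((base.scheme, base.netloc, path, urlencode(params), ""))


print(rewrite_omitted_results_link(
    "https://www.google.com/search?q=%22sfendaki%22&safe=off&nfpr=1",
    "https://wg.vern.cc",
))
# prints https://wg.vern.cc/search?q=%22sfendaki%22&safe=off&nfpr=1&filter=0
```

Keeping the rest of the query string intact means settings such as safe=off and nfpr=1 survive the round trip instead of being reset by Google.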
There's no such thing as "real Google results", because Google's results are inconsistent (they differ between browsers, IPs, locales, etc.). If you get 10 results for a query, it doesn't mean that everyone should get the same number of results for that query. Nor does it mean that you should get the same number(s) from LibreX, SearX, Startpage or Whoogle. All these projects try to hide your identity in order to give you more neutral results (compared to your filter bubble) than you would get by using Google directly, but they cannot be completely neutral, and therefore their results cannot be reproduced across users and/or instances.
The fewer users your instance has, the less neutral the results you get. It kind of defeats the purpose of escaping the filter bubble... unless you make your search queries through Tor, maybe, but it's painfully slow.
(cross-posted in searxng/searxng#2438, hnhx/librex#225)
It's been years. Google-searching through a "privacy search engine frontend" will rarely find as many results as the real Google.
Here's a simple test to verify that: come up with a unique Google query that finds as few results as possible, preferably not in English. For example, in my test I used
"sfendazi"
but you might need your own unique query, since the results come and go. Perform the same search on every public instance and observe how many find the same results (if the results contain unrelated garbage, count it as a failure). This was the outcome yesterday, as of 2023-05-15:

LibreX instances: 20 tested, 0 work
lmao
Whoogle instances: 17 tested, 3 work
https://s.tokhmi.xyz
https://whoogle.dcs0.hu
https://whoogle.privacydev.net
SearX/SearXNG instances: 92 tested, 19 work if you tweak a setting, only 1 works with defaults
(the only one that works with defaults is https://opnxng.com)
https://priv.au
https://xo.wtf
https://offtheradar.info
https://searx.oakleycord.dev
https://searx.cthd.icu
https://ooglester.com
https://search.bus-hit.me
https://myprivatesrx.us
https://coppedge.info
https://search.neet.works
https://search.zzls.xyz
https://search.us.projectsegfau.lt
https://s.frlt.one
https://searx.sev.monster
https://stalk.antelope.day
https://searx.esmailelbob.xyz
https://search.serginho.dev
https://search.cronobox.one
https://searx.mxchange.org
Those 19 instances I listed think they're "smart" and have set their Search language to [auto], which auto-selects it based on your browser headers... or they're simply set to something arbitrary, like [en-US]. Choosing [all] fixes the problem for them.

Meanwhile, the rest of the instances somehow won't find the correct results even when set to [all].
From what I've tested with a local SearXNG instance, adding the search query parameter nfpr=1 (along with the pre-existing safe=off and filter=0) to searxng/searx/engines/google.py fixed it. Here's what they do:
nfpr=1 -> "Showing results for XXX / Search instead for YYY": ON
safe=off -> SafeSearch: OFF
filter=0 -> Include omitted results: ON

Changing the Interface language is fine. Actually, I'd argue language auto-detection should apply to the interface, not to the search results filter, which would be consistent with how major search engines work.

Honestly, just take the Search language option away; it does more harm than good. Or at least make [all] the default and lock the option behind huge warning signs with flaming skulls saying that searching will be seriously degraded for everyone if anything other than [all] is selected. People don't understand that they're touching the equivalent of the setting on Google's official advanced search page.

TL;DR
Here's a picture to sum up the problem most search frontends are facing:
Proposed fixes:
- … Search language and default it to [all]
- … [auto] to the Interface language instead?
- … safe=off&nfpr=1&filter=0
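The three proposed fixes can be sketched together in Python (the language SearXNG and Whoogle are written in). This is not code from searx/engines/google.py or any existing project, and both function names are hypothetical. It assumes Google's hl parameter controls only the interface language and that lr=lang_xx restricts results by language. [auto] is resolved from the Accept-Language header and applied to the interface only, the search language defaults to all (no lr parameter), and safe=off, nfpr=1 and filter=0 are always sent.

```python
from urllib.parse import urlencode


def interface_language(accept_language: str, default: str = "en") -> str:
    """Return the highest-weighted tag from an Accept-Language header.

    This is the "[auto]" behavior, used here only for the interface language.
    """
    best, best_q = default, -1.0
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        if not tag or tag == "*":
            continue
        q = 1.0  # tags without an explicit q= weight default to 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # q=0 means "not acceptable"; on ties, the first tag listed wins.
        if q > 0 and q > best_q:
            best, best_q = tag, q
    return best


def google_search_url(query: str, accept_language: str = "",
                      search_language: str = "all") -> str:
    """Hypothetical request builder applying the three proposed fixes."""
    params = {
        "q": query,
        "hl": interface_language(accept_language),  # interface language only
        "safe": "off",  # SafeSearch OFF
        "nfpr": "1",    # search the exact query, no auto-correction
        "filter": "0",  # include omitted results
    }
    # The search language defaults to "all", i.e. no language restriction.
    # Only an explicit user choice adds one; it is never inferred from headers.
    if search_language != "all":
        params["lr"] = f"lang_{search_language}"
    return "https://www.google.com/search?" + urlencode(params)


print(google_search_url('"sfendazi"', "de-DE,de;q=0.9,en;q=0.8"))
# prints https://www.google.com/search?q=%22sfendazi%22&hl=de-DE&safe=off&nfpr=1&filter=0
```

With this split, a German browser still gets a German interface, but its searches are not silently limited to German-language pages.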