See also:
You may be tempted to proxy your queries towards a monopolist through a metasearch engine. Here are some reasons why this is not desirable:
- You can still be profiled by correlating the timing, typos or other properties of successive queries, or the phrases themselves
- You may accidentally copy & paste personal data into a query
- Every search you execute may finance the upstream search provider directly by way of commission
- You help improve the upstream search provider by providing input to popularity, phrases, parsing, query refinement and result ranking
- You withhold these crowdsourced inputs from competitors, so you do not help improve them
- https://www.activesearchresults.com/searchsubmit.php
- own index
- can be used without JavaScript
- https://www.alexandria.org/
- https://github.com/alexandria-org
- FOSS
- own index
- can be used without JavaScript
- https://www.artadosearch.com/search?Button1=Artado&i=test
- own index
- can be used without JavaScript (by putting the search query into the URL)
- DE
- https://www.base-search.net/
- own index of academic publications
- can be used without JavaScript
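Several entries in this list (for example Artado) work without JavaScript once the search phrase is placed directly in the URL. A minimal Python sketch of building such URLs, assuming the URL prefixes copied from this document are still valid and that each engine accepts a plain percent-encoded phrase; the `TEMPLATES` dictionary and the `search_url` function are hypothetical names for illustration.

```python
from urllib.parse import quote_plus

# URL prefixes copied from entries in this list; each one ends with the
# parameter that carries the search phrase (i=, q=, query=, keyword=).
TEMPLATES = {
    "artado": "https://www.artadosearch.com/search?Button1=Artado&i=",
    "clew": "https://clew.se/search?q=",
    "ichido": "https://ichi.do/search?q=",
    "jambot": "https://jambot.com/se?query=",
    "yessle": "https://www.yessle.com/index.php?keyword=",
}


def search_url(engine: str, query: str) -> str:
    """Return a results URL that can be opened without running JavaScript."""
    # quote_plus turns spaces into '+' and escapes '&', '=', '/', '#', etc.,
    # so the phrase cannot break out of its query parameter.
    return TEMPLATES[engine] + quote_plus(query)


if __name__ == "__main__":
    print(search_url("clew", "small web & indie search"))
```

For example, `search_url("clew", "small web & indie search")` yields `https://clew.se/search?q=small+web+%26+indie+search`, which can be bookmarked or opened in a text-mode browser.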
- https://www.blog-search.com/cgi-bin/search.cgi
- source: ExactSeek.com
- can be used without JavaScript
- US
- https://search.brave.com
- can be used without JavaScript
- own index
- the search result description seems to be generated from the beginning of the static HTML (Brave)
- US
- https://clew.se/search?q=
- https://codeberg.org/Clew
- FOSS frontend and backend (except crawler)
- can be used without JavaScript
- own index
- https://curlie.org/search
- can be used without JavaScript
- own index
- Canada
- can be used without JavaScript
- own index
- https://ichi.do/search?q=
- can be used without JavaScript
- own index
- host: AWS
- https://infotiger.com/
- can be used without JavaScript
- own index
- also offers a Tor onion address
- https://jambot.com/se?query=
- can be used without JavaScript
- own index
- https://kukei.eu/?q=gemini
- can be used without JavaScript
- own index
- https://search.marginalia.nu/
- FOSS, can also be self-hosted
- https://git.marginalia.nu/marginalia/marginalia.nu
- can be used without JavaScript
- own index
- http://metager.de/
- https://gitlab.metager.de/open-source/MetaGer
- DE
- own index: Scopia
- free metasearch: Yahoo, Yandex
- paid metasearch: Brave, Mojeek, Bing
- returns results within an iframe; opening the iframe link directly shows the results without JavaScript
- https://github.com/bkil/static-wonders.js/blob/master/userjs/metager.de.user.js
- option to open result hits through their proxy
- the search result description seems to be a subset of DuckDuckGo
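The MetaGer entry notes that its results live in an iframe whose link can be opened directly. A minimal sketch for finding that link in a saved results page with Python's standard-library `html.parser`, assuming the results frame is an ordinary `<iframe src=...>` element; the `iframe_urls` helper and the markup and URLs in the example are hypothetical, not MetaGer's actual ones.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "iframe":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def iframe_urls(page_html: str, page_url: str) -> list[str]:
    """Return absolute URLs of the frames embedded in page_html."""
    parser = IframeSrcCollector()
    parser.feed(page_html)
    parser.close()
    # Relative src values are resolved against the URL of the outer page.
    return [urljoin(page_url, src) for src in parser.sources]
```

With hypothetical input, `iframe_urls('<iframe src="/results?q=test"></iframe>', "https://metager.de/search")` returns `["https://metager.de/results?q=test"]`; that URL can then be opened on its own.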
- https://mojeek.com/
- GB
- can be used without JavaScript
- own index
- the search result description seems to be unique
- https://mwmbl.org/
- https://github.com/mwmbl/mwmbl
- FOSS
- can be used without JavaScript
- own index
- the search result description seems to be generated from an interesting continuous part of the static HTML (mwmbl)
- https://search.naver.com/search.naver?where=nexearch&sm=top_hty&fbm=0&ie=utf8&query=
- South Korea
- own index
- can be used without JavaScript
- https://www.qwant.com/
- FR
- has its own index
- also uses Bing
- can be used without JavaScript
- shows a CAPTCHA when searching for overly long strings
- the search result description does not match Bing and seems to be faithfully summarized
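Many entries compare an engine's result descriptions with those of an upstream provider ("matches Bing", "a subset of Bing", "unique"). The list does not state how these comparisons were made; one way to make them repeatable is to score the description of the same hit from two engines with Python's `difflib.SequenceMatcher`. This is a sketch under assumptions: the 0.9 threshold, the labels and the function names are hypothetical and would need tuning on real samples.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Case-fold and collapse whitespace so formatting differences do not count.
    return re.sub(r"\s+", " ", text).strip().casefold()


def snippet_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two result descriptions."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def classify(candidate: str, upstream: str, threshold: float = 0.9) -> str:
    """Label a description relative to the upstream one (hypothetical labels)."""
    c, u = normalize(candidate), normalize(upstream)
    if SequenceMatcher(None, c, u).ratio() >= threshold:
        return "matches upstream"
    # A shortened copy scores low on the ratio, so check containment separately.
    if c and c in u:
        return "subset of upstream"
    return "differs"
```

A description that differs from every known upstream provider would then be a candidate for "seems to be unique" or "summarized", which still needs manual inspection.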
- https://rightdao.com/
- can be used without JavaScript
- own index
- https://search.ch/
- can only find pages hosted in Switzerland
- can be used without JavaScript
- https://searchmysite.net/
- can be used without JavaScript
- own index; a website can only be added after subscribing
- https://search.seznam.cz/
- Czech
- can be used without JavaScript
- own index
- the search result description seems to be unique
- http://www.secretsearchenginelabs.com/find/
- own index
- can be used without JavaScript
- https://stract.com/search
- https://github.com/StractOrg/stract
- FOSS
- can be used without JavaScript
- own index
- https://www.webcrawler.com/serp
- can be used without JavaScript
- own index
- the search result description seems to match Startpage
- https://github.com/wibyweb/wiby/
- FOSS
- https://wiby.me/
- can be used without JavaScript
- own index
- https://en.wikipedia.org/wiki/Yacy
- FOSS
- https://yacy.searchlab.eu/yacysearch.html?query=yacy
- can be used without JavaScript
- own index
- the search result description seems to be generated from certain boring parts from the beginning of the static HTML (YaCy)
- https://www.yessle.com/index.php?keyword=
- own index
- can be used without JavaScript
- https://greppr.org/
- own index
- requires JavaScript
- https://plumb.one/results/?q=index
- own index
- requires JavaScript
- https://www.seekport.com/?language=en&q=
- own index
- requires JavaScript
- Poland
- https://svmetasearch.eu.org/s/search
- https://codeberg.org/SVWareHouse/SVMetaSearch
- FOSS
- SearXNG metasearch: Qwant, Yahoo, Wikipedia, Wikidata
- also own index (sometimes times out)
- can be used without JavaScript
- https://github.com/RimoChan/sese-engine
- https://github.com/YunYouJun/sese-engine-ui
- https://sese.yyj.moe/search?q=
- FOSS
- own index
- requires JavaScript
- https://yep.com/web
- requires JavaScript
- own index
- the search result description seems to be generated from a combination of the beginning and an interesting part of the static HTML (yep)
- https://www.dogpile.com/serp?q=
- can be used without JavaScript
- source: Bing
- Infospace Holdings LLC, a System1 Company
- the search result description seems to match Bing
- DE
- https://www.ecosia.org/
- source: Bing
- can be used without JavaScript
- the search result description does not match Bing and seems to be faithfully summarized
- https://www.etools.ch/
- metasearch engine: Base, Bing, Brave, DuckDuckGo, Google, Lilo, Mojeek, Qwant, Search, Tiger, Wikipedia, Yahoo, Yandex
- can be used without JavaScript
- the search result description seems to match a subset of Bing
- https://results.excite.com/serp?q=
- can be used without JavaScript
- source: Bing
- the search result description seems to match Bing
- http://www.frogfind.com/
- can be used without JavaScript
- source: DuckDuckGo
- the search result description seems to match DuckDuckGo
- https://ghosterysearch.com/search?q=
- can be used without JavaScript
- search result descriptions seem to match Brave
- https://good-search.org/en/search/?q=
- DE
- can be used without JavaScript
- source: Bing (?)
- search result descriptions seem to match Brave
- https://search.lilo.org/?q=
- source: Bing
- can be used without JavaScript
- the search result description seems to match a subset of Bing
- https://www.metacrawler.com/serp?q=
- can be used without JavaScript
- source: Bing
- the search result description seems to match Bing
- https://monocles.eu/
- DE
- can be used without JavaScript
- Searx
- the search result description seems to match DuckDuckGo
- https://www.nona.de/
- can be used without JavaScript
- source: Bing
- the search result description seems to be unique
- https://www.privacywall.org/search/secure/?q=
- can be used without JavaScript
- source: Bing (?)
- the search result description does not match Bing; it seems to be generated from the beginning of the static HTML (PrivacyWall)
- metasearch engine, does not have a crawler
- https://github.com/searx/searx
- https://github.com/searxng/searxng
- https://searx.neocities.org/nojs
- https://searxng.online/search
- https://searx.nixnet.services/
- FOSS
- can be used without JavaScript
- https://whoogle.herokuapp.com/
- https://whoogle.org/
- source: Google
- can be used without JavaScript
- the search result description seems to be summarized with hallucination (Google)
- https://lite.duckduckgo.com/lite/
- US
- source: Bing (formerly: also Yandex)
- TODO: own index
- can be used without JavaScript
- shows a custom form and image-based CAPTCHA when searching too fast or for long strings; it can be solved without JavaScript
- the search result description seems to be summarized twice with hallucination: one custom and one coming from Bing (DuckDuckGo)
- https://search11.lycos.com/web/?q=
- can be used without JavaScript
- source: Yahoo (Bing)
- the search result description seems to match Bing
- http://startpage.com/
- NL
- owned by adtech company System1
- source: Bing (formerly: Google)
- formerly: Ixquick
- the search result description does not match Bing: seems to be generated from an interesting part of the static HTML (StartPage)
- claims to require JavaScript, but the results are actually visible after adding the following CSS:
  #root {
    opacity: initial;
  }
- https://4get.ca/
- metasearch engine: DuckDuckGo, Brave, Yandex, Google, Qwant, Yep, Greppr, Crowdview, mwmbl, mojeek, marginalia, wiby, curlie
- can be used without JavaScript
- requires solving an image-selection CAPTCHA every 100 search queries (FIXME: it somehow always asks for it, maybe due to cookies?); the CAPTCHA works without JavaScript or cookies
- https://alohafind.com/search/?q=
- requires JavaScript
- https://crowdview.ai/
- requires JavaScript
- https://www.ekoru.org/?q=
- source: Yahoo (Bing)
- requires JavaScript
- https://www.findx.com/search?noscript=1&q=
- requires JavaScript
- source: Bing
- https://gibiru.com/
- requires JavaScript
- https://search.gmx.com/web/result?q=
- requires JavaScript
- source: Google, YouTube
- https://oceanhero.today/web?q=secure
- requires JavaScript
- source: Bing
- https://presearch.com/
- requires JavaScript
- https://www.search.com/
- requires JavaScript
- https://www.swisscows.com
- Switzerland
- requires JavaScript
- source: Bing
- https://tiger.ch/
- metasearch: Alugha, Ask, Bing, Brave, Dailymotion, DuckDuckGo, Google.ch, Mojeek, Nona, Search.ch, Vimeo, Wiki-Tube.de, Wikipedia, YouTube
- can only find pages hosted in Switzerland
- requires JavaScript
- https://youcare.world/all?q=
- requires JavaScript
- https://github.com/chatnoir-eu
- https://github.com/capjamesg/indieweb-search
- https://git.mills.io/prologic/spyda
- https://github.com/presearchofficial
- https://en.wikipedia.org/wiki/Lemur_Project
- https://gitlab.com/users/infinitysearch/projects
- https://github.com/orgs/WordPress/repositories?q=openverse&type=all&language=&sort=
- https://seekseek.org/technology
- https://github.com/gigablast/open-source-search-engine
- https://www.baidu.com
- China
- can be used without JavaScript
- will start blocking if you search for the wrong keywords
- https://www.bing.com/
- US
- the search result description seems to be summarized with hallucination (Bing)
- https://www.sogou.com/web?query=
- China
- can be used without JavaScript
- https://www.yandex.com/
- Russia
- can be used without JavaScript
- the search result description seems to be summarized with hallucination (Yandex)
- https://en.wikipedia.org/wiki/Reverse_image_search#Application_in_popular_search_systems
- https://en.wikipedia.org/wiki/List_of_CBIR_engines
- https://en.wikipedia.org/wiki/Content-based_image_retrieval#Techniques
- https://badbot.org/
- https://en.wikipedia.org/wiki/Comparison_of_web_search_engines#Search_crawlers
- https://gist.github.com/JoeyBurzynski/9953198b825b9a8675715220586fb494
- https://github.com/CapnDucks/scripts/blob/master/bot-regex
- https://gist.github.com/JoeyBurzynski/b0b0606e2817158455997797e42e78c7
- https://gist.github.com/gaffling/9ed6a55023530d8440f880958e248ebb
- https://github.com/mbilozub/angular-prerender-test/blob/master/.htaccess
- https://github.com/fabiomb/is_bot/blob/master/is_bot.php
- https://github.com/prescience-data/php-cloaker/blob/master/header.php#L237
- https://github.com/VeliovGroup/ostrio/blob/master/docs/prerendering/nginx.md#simple-proxy-pass
- https://github.com/Thomas--F/BotTracker/blob/master/botlist.txt
- https://github.com/e107inc/e107/blob/master/e107_handlers/user_model.php
- https://github.com/flumono/WP_demo/blob/main/robots.txt
- https://github.com/lesterchan/wp-useronline/blob/master/bots.php
- https://jamesbachini.com/robots-disallow-all/
- https://github.com/monperrus/crawler-user-agents/blob/master/crawler-user-agents.json
- https://stackoverflow.com/questions/20084513/detect-search-crawlers-via-javascript
- https://advancedweb.fr/detecter-les-robots-de-recherche-via-javascript/
- https://community.centminmod.com/threads/blocking-bad-or-aggressive-bots.6433/
- https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/
- https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/
- https://github.com/mwmbl/crawler-extension/blob/main/src/worker.js
- https://github.com/alexmolas/microsearch/blob/main/src/microsearch/engine.py
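Most of the bot-detection links above boil down to matching the `User-Agent` request header against a list of known crawler patterns. A minimal Python sketch of that approach; the token list is a hypothetical, heavily shortened stand-in for a maintained list such as crawler-user-agents.json, and since any client can send any User-Agent, it only recognizes crawlers that announce themselves.

```python
import re

# Illustrative tokens only; maintained lists contain many more patterns.
CRAWLER_TOKENS = [
    r"Googlebot",
    r"bingbot",
    r"YandexBot",
    r"Baiduspider",
    r"DuckDuckBot",
    # Generic fallbacks: many crawlers self-identify with these words.
    r"bot\b",
    r"crawl",
    r"spider",
]

CRAWLER_RE = re.compile("|".join(CRAWLER_TOKENS), re.IGNORECASE)


def is_crawler(user_agent: str) -> bool:
    """Heuristic: True if the User-Agent header names a known crawler."""
    return bool(CRAWLER_RE.search(user_agent or ""))
```

The generic fallbacks catch many unlisted bots but can also produce false positives, which is why the linked lists prefer explicit per-crawler patterns.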
Crawler roots:
- https://domainsproject.org/
- https://github.com/tb0hdan/domains
- https://github.com/Kukei-eu/spider/blob/914b8dfffc10cb3a948561aef2bf86937d3a0b2e/index-sources.js
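The crawler roots above provide seed domains for a new index. A minimal sketch of turning such a list into a deduplicated crawl frontier, assuming a plain-text input with one domain per line and `#` comments; the `seed_urls` helper and that file convention are hypothetical, and the actual formats of these projects may differ.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def seed_urls(lines: Iterable[str]) -> Iterator[str]:
    """Turn a plain list of domain names into unique root URLs."""
    seen: set[str] = set()
    for line in lines:
        # Normalize case and drop a trailing dot (fully qualified form).
        domain = line.strip().lower().rstrip(".")
        # Skip blank lines and '#' comments (an assumed file convention).
        if not domain or domain.startswith("#"):
            continue
        if domain in seen:
            continue
        seen.add(domain)
        # Assume HTTPS; a real crawler would fall back to http:// on failure.
        yield f"https://{domain}/"
```

Because it is a generator, `seed_urls(open("domains.txt"))` (a hypothetical file name) can stream a large list without loading it into memory, although the `seen` set still grows with the number of unique domains.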
- https://www.exalead.com/search/
- FR
- TODO: what other providers does it use?
- can be used without JavaScript
- https://gigablast.com/
- US
- FOSS, can also be self-hosted
- requires JavaScript, but copying the rand= and pxb= values from the HTML source and appending &fromjs=1&rand=...&opxb=... to the end of the URL shows the results (userscript: https://github.com/bkil/static-wonders.js/blob/master/userjs/gigablast.com.user.js)
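The Gigablast workaround described above can be scripted outside the browser. A minimal sketch, assuming the `rand=` and `pxb=` values appear literally in the page source as `name=value` pairs; the regular expressions, the `gigablast_results_url` helper and the example markup are hypothetical, and the linked userscript remains the in-browser route.

```python
import re


def gigablast_results_url(search_url: str, page_html: str) -> str:
    """Append the parameters that the page's JavaScript would otherwise add."""
    # Assumption: the source contains 'rand=<token>' and 'pxb=<token>'.
    # '\b' keeps 'pxb=' from matching inside 'opxb=' and 'rand=' inside 'operand='.
    rand = re.search(r"\brand=([\w.-]+)", page_html)
    pxb = re.search(r"\bpxb=([\w.-]+)", page_html)
    if rand is None or pxb is None:
        raise ValueError("rand= or pxb= not found in page source")
    return f"{search_url}&fromjs=1&rand={rand.group(1)}&opxb={pxb.group(1)}"
```

With hypothetical markup containing `&rand=1234&pxb=ab12`, the search URL `https://gigablast.com/search?q=test` becomes `https://gigablast.com/search?q=test&fromjs=1&rand=1234&opxb=ab12`.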
- https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/
- https://thenewleafjournal.com/a-2021-list-of-alternative-search-engines-and-search-resources/
- https://www.searchenginemap.com/
- https://searchengine.party/
- https://en.wikipedia.org/wiki/List_of_academic_databases_and_search_engines
- https://wutsearch.com/