Web search engines

See also:

Concept

You may be tempted to proxy your queries to a monopolist through a metasearch engine. Here are some reasons why this is not desirable:

  • You can still be profiled by correlating timing, typos, other properties of successive queries, or the query phrase itself
  • You may accidentally copy & paste personal data into a query, which is then forwarded upstream
  • Every search you execute may directly finance the upstream search provider through commissions
  • You help improve the upstream search provider by providing input for popularity, phrases, parsing, query refinement and result ranking
  • You do not help the competitors improve, since they never receive these crowdsourced inputs
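The first point can be illustrated with a hypothetical linking heuristic. A proxy strips the client's IP address, but queries that arrive close together in time and share words (a refinement, a retyped typo) can still be chained into one session. The `Query` type, the 60-second gap and the shared-word rule below are illustrative assumptions. They do not describe what any particular provider actually does.

```python
from dataclasses import dataclass


@dataclass
class Query:
    timestamp: float  # seconds, as observed by the upstream provider
    phrase: str


def _words(phrase: str) -> set[str]:
    return set(phrase.lower().split())


def link_queries(queries: list[Query], max_gap: float = 60.0) -> list[list[Query]]:
    """Chain queries into sessions by timing and shared words.

    Hypothetical heuristic: a query joins an existing session if it arrives
    within `max_gap` seconds of that session's last query and shares at least
    one word with it; otherwise it starts a new session.
    """
    sessions: list[list[Query]] = []
    for q in sorted(queries, key=lambda q: q.timestamp):
        candidates = [
            s for s in sessions
            if q.timestamp - s[-1].timestamp <= max_gap
            and _words(q.phrase) & _words(s[-1].phrase)
        ]
        if candidates:
            # Attach to the most recently active matching session.
            max(candidates, key=lambda s: s[-1].timestamp).append(q)
        else:
            sessions.append([q])
    return sessions
```

For example, interleaved queries about cheap flights and about the Rust borrow checker end up in two separate sessions, even when both pass through the same proxy.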

Independent with index

activesearchresults.com

alexandria.org

artadosearch.com

base-search.net

blog-search.com

brave.com

  • US
  • https://search.brave.com
  • can be used without JavaScript
  • own index
  • the search result description seems to be generated from the beginning of the static HTML (Brave)

clew.se

curlie.org

exactseek.com

  • Canada
  • can be used without JavaScript
  • own index

ichi.do

infotiger.com

jambot.com

kukei.eu

Marginalia

MetaGer

Mojeek

  • https://mojeek.com/
  • GB
  • can be used without JavaScript
  • own index
  • the search result description seems to be unique

mwmbl.org

naver.com

Qwant.com

  • https://www.qwant.com/
  • FR
  • has its own index
  • also uses Bing
  • can be used without JavaScript
  • shows a CAPTCHA when searching for overly long strings
  • the search result description does not match Bing and seems to be faithfully summarized

rightdao.com

search.ch

  • https://search.ch/
  • can only find pages hosted in Switzerland
  • can be used without JavaScript

searchmysite.net

search.seznam.cz

secretsearchenginelabs.com

stract.com

webcrawler.com

wibyweb

YaCy

yessle.com

Worrisome with index

greppr.org

plumb.one

seekport.com

svmetasearch.eu.org

sese.yyj.moe

yep.com

  • https://yep.com/web
  • requires JavaScript
  • own index
  • the search result description seems to be generated from a combination of the beginning and an interesting part of the static HTML (yep)
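The description styles noted for Brave and yep can be made concrete with a minimal sketch. It assumes descriptions are cut from the visible text of the static HTML. One description comes from the beginning of the page. Another comes from a window around the earliest query-term match, which is only a guess at what an "interesting part" means. The third combines the two. The function names and window sizes are hypothetical and are not taken from any of these engines.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text of static HTML, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def page_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Separate text from different elements by a space, then collapse whitespace.
    return " ".join(" ".join(parser.parts).split())


def leading_snippet(text: str, width: int = 160) -> str:
    """Description cut from the beginning of the page."""
    return text[:width]


def query_snippet(text: str, query: str, width: int = 160) -> str:
    """Description cut from a window centered on the earliest query-term match."""
    lowered = text.lower()
    hits = [p for p in (lowered.find(t) for t in query.lower().split()) if p >= 0]
    if not hits:
        return leading_snippet(text, width)
    start = max(0, min(hits) - width // 2)
    return text[start:start + width]


def combined_snippet(text: str, query: str, width: int = 160) -> str:
    """Beginning of the page followed by the query window, if they differ."""
    head = leading_snippet(text, width)
    body = query_snippet(text, query, width)
    return head if body == head else f"{head} ... {body}"
```

Comparing a result's description against `leading_snippet` and `query_snippet` for the same page is one way to judge which style an engine seems to use.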

Independent proxy

dogpile.com

  • https://www.dogpile.com/serp?q=
  • can be used without JavaScript
  • source: Bing
  • Infospace Holdings LLC, a System1 Company
  • the search result description seems to match Bing

ecosia.org

  • DE
  • https://www.ecosia.org/
  • source: Bing
  • can be used without JavaScript
  • the search result description does not match Bing and seems to be faithfully summarized

etools.ch

  • https://www.etools.ch/
  • metasearch engine: Base, Bing, Brave, DuckDuckGo, Google, Lilo, Mojeek, Qwant, Search, Tiger, Wikipedia, Yahoo, Yandex
  • can be used without JavaScript
  • the search result description seems to match a subset of Bing
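A metasearch engine such as etools.ch has to merge the ranked lists it receives from its sources. The notes above do not say how etools.ch does this. The sketch below uses reciprocal rank fusion, one common and simple technique, purely for illustration. It also assumes the URLs are already normalized, so that the same page compares equal across sources.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge ranked URL lists from several upstream engines.

    Each URL scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1, so URLs found by several engines rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for urls in result_lists.values():
        # dict.fromkeys drops repeats within one list while keeping first-seen order.
        for rank, url in enumerate(dict.fromkeys(urls), start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

With `k = 60`, the constant usually quoted for this method, a URL ranked second by three sources beats one ranked first by a single source. The fusion therefore rewards agreement between engines over any single engine's top pick.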

excite.com

frogfind.com

  • http://www.frogfind.com/
  • can be used without JavaScript
  • source: DuckDuckGo
  • the search result description seems to match DuckDuckGo

ghosterysearch.com

good-search.org

search.lilo.org

  • https://search.lilo.org/?q=
  • source: Bing
  • can be used without JavaScript
  • the search result description seems to match a subset of Bing
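Dogpile and Lilo are listed with a query URL prefix, so you can query them without JavaScript by appending the search phrase. A minimal sketch follows. It assumes the `q` parameter accepts standard form encoding, with spaces as `+` and reserved characters percent-encoded. The `QUERY_URL_PREFIXES` table and `search_url` helper are hypothetical names.

```python
from urllib.parse import quote_plus

# Query URL prefixes as listed above; the search phrase is appended to each.
QUERY_URL_PREFIXES = {
    "dogpile": "https://www.dogpile.com/serp?q=",
    "lilo": "https://search.lilo.org/?q=",
}


def search_url(engine: str, phrase: str) -> str:
    """Build a results-page URL; spaces become '+', reserved characters are percent-encoded."""
    return QUERY_URL_PREFIXES[engine] + quote_plus(phrase)


print(search_url("dogpile", "privacy search engines"))
# https://www.dogpile.com/serp?q=privacy+search+engines
```

Encoding matters for phrases such as `c++ & rust`. Without it, the `+` would be read as a space and the `&` would start a new URL parameter.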

metacrawler.com

monocles.eu

  • https://monocles.eu/
  • DE
  • can be used without JavaScript
  • based on Searx
  • the search result description seems to match DuckDuckGo

nona.de

  • https://www.nona.de/
  • can be used without JavaScript
  • source: Bing
  • the search result description seems to be unique

privacywall.org

Searx

whoogle.org

Monopolist proxy

DuckDuckGo.com

  • https://lite.duckduckgo.com/lite/
  • US
  • source: Bing (formerly: also Yandex)
  • TODO: own index
  • can be used without JavaScript
  • shows a custom form with an image-based CAPTCHA when searching too fast or for long strings; it can be solved without JavaScript
  • the search result description seems to be summarized twice, with hallucination: one custom summary and one coming from Bing (DuckDuckGo)

lycos.com

Startpage

  • http://startpage.com/
  • NL
  • owned by adtech company System1
  • source: Bing (formerly: Google)
  • formerly: Ixquick
  • the search result description does not match Bing and seems to be generated from an interesting part of the static HTML (StartPage)
  • requires JavaScript, but the results become visible with the following CSS addition:
#root {
  /* undo the opacity that keeps the results hidden without JavaScript */
  opacity: initial;
}

Worrisome proxy

4get.ca

  • https://4get.ca/
  • metasearch engine: DuckDuckGo, Brave, Yandex, Google, Qwant, Yep, Greppr, Crowdview, mwmbl, Mojeek, Marginalia, Wiby, Curlie
  • can be used without JavaScript
  • requires solving an image ticker CAPTCHA every 100 search queries (FIXME: it somehow always asks for it, maybe due to cookies?); the CAPTCHA works without JavaScript or cookies

alohafind.com

crowdview.ai

ekoru.org

findx.com

gibiru.com

gmx.com

oceanhero.today

presearch.com

search.com

swisscows.com

tiger.ch

  • https://tiger.ch/
  • metasearch engine: Alugha, Ask, Bing, Brave, Dailymotion, DuckDuckGo, Google.ch, Mojeek, Nona, Search.ch, Vimeo, Wiki-Tube.de, Wikipedia, YouTube
  • can only find pages hosted in Switzerland
  • requires JavaScript

youcare.world

Self-hosted FOSS

Monopolists

baidu.com

  • https://www.baidu.com
  • China
  • can be used without JavaScript
  • starts blocking you if you search for the wrong (e.g. censored) keywords

bing.com

  • https://www.bing.com/
  • US
  • the search result description seems to be summarized with hallucination (Bing)

sogou.com

Yandex.com

  • https://www.yandex.com/
  • Russia
  • can be used without JavaScript
  • the search result description seems to be summarized with hallucination (Yandex)

Image search

Implementing a crawler
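At its core, a crawler is a breadth-first traversal of the link graph, starting from a set of root URLs. The following is a minimal, single-threaded sketch under stated assumptions. `fetch` is a caller-supplied function (hypothetical here) that returns a page's HTML or `None`. Only `http`/`https` links are followed, and URL fragments are dropped so that each page is queued once.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    roots: list[str],
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first crawl from `roots`; returns fetched URLs in crawl order."""
    queue: deque[str] = deque()
    seen: set[str] = set()

    def enqueue(url: str) -> None:
        # Drop the #fragment so each page is queued at most once.
        url = urldefrag(url).url
        if urlparse(url).scheme in ("http", "https") and url not in seen:
            seen.add(url)
            queue.append(url)

    for root in roots:
        enqueue(root)

    crawled: list[str] = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # fetch could not return a page
            continue
        crawled.append(url)
        parser = _LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            enqueue(urljoin(url, href))  # resolve relative links
    return crawled
```

A real crawler would also honour robots.txt, rate-limit requests per host, and normalize URLs more thoroughly. For a quick experiment, `fetch` could wrap `urllib.request.urlopen`. Passing `fetch` in as a parameter keeps the traversal testable without network access.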

Crawler roots:

Defunct

Exalead

Gigablast

References