Helios - Open Source Meta-Search Engine
C C++
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
parser
src
ver
.gitignore
LICENSE
Makefile
README.TXT
README.md
engines.dat

README.md

Helios - Meta Search Engine

=========================

History

I developed Helios in 2004 as part of my Master Thesis at the University of Pisa. Back then computers were not as powerful as they are today, they had very little RAM and threaded programming was not fully developed. Thus, I developed Helios using select() and other low I/O programming tecniques to minimize the resources requirements. It worked pretty well and the benchmarks can be found in my thesis.

The internet changed in the last years, Google is basically the only search engine which matters right now and I do not think any of the parsers works anymore. But the code was well written so distributing it on GitHub make sense.

How to Add a New Engine

Add a new engine is very simple. First, you have to create a new entry in the file engines.dat in the main directory of the software. Then, you create an empty file with the name of the engine and with the extention ".prs" in the "/parser" directory.

Now launch the software as

  helios -d -p <query> <newEngineName>

this will make HELIOS query the new search engine, without considering the parser script (-p) and duping (-d) on the hard disk the data received. Now you can go in the parser scripts directory, and create the ad-hoc parser script.

Entries in the Engines.dat file

The new entry should look like

   [newengine]
   host					=	216.239.59.104
   port					=	80
   desidered_pages	=	1
   result_per_page	=	100
   next_page_calc		=	2
   {
   	GET /search?q=%QUERYSTR&num=%RESULTS&start=%PAGE HTTP/1.0
   	HOST: www.newengine.com

   }

Between the square brakets there is the name of the new engine. In you write the address of the engine, that can either be in dot-format or as domain name (slower). The parameter is self explanatory. In <desidered_pages> you write the default number of pages that you desire to download sequentially from the new engine, and in <result_per_page> the default number of results that you want in each page.

This two last parameters will be used if HELIOS will be launched with the name of the engine, without any specific parameter. As

   helios flowers newengine

they can be overrided with the syntax

   helios flowers newengine=<r>,<p>,<s>

where determine the number of results in each page,

is the number of sequential pages to download and is the first page to download. These values will be substituted to %RESULTS and %PAGE. The query, obviously will be substituted to %QUERYSTR.

The field <next_page_calc> determine how the number of the page is handled by the search engine. If I desire page 5, for example, some search engine accept the number 5 as it is, while others use the number of results (like 50, if I requested 10 results for each page), and so on. The various scheme supported by HELIOS are

   0) %PAGE = page
   1) %PAGE = page + 1
   2) %PAGE = page * rpp
   3) %PAGE = (page * rpp) + 1

and should embrace all the possibilities.

The curling brackets contain the HTTP request, with the variables. You can write whatever you want here, but remember to leave an empty line at the end.

The file ENGINES.DAT has to be terminated with an empty line. Comments are defined by the character # and skipped by the system.