public
Description: Aino - a content-based search engine
Homepage: http://www.cs.helsinki.fi/u/tuulos/
Clone URL: git://github.com/tuulos/aino.git
aino /
name age message
file COPYING Sat Aug 09 01:09:14 -0700 2008 README and COPYING added [Ville Tuulos]
file README Sat Aug 09 01:09:14 -0700 2008 README and COPYING added [Ville Tuulos]
file SConstruct Sat Aug 09 01:08:43 -0700 2008 Removed checks for unnecessary dependencies [Ville Tuulos]
directory aino.fi/ Sun Jan 06 00:14:05 -0800 2008 aino.fi indexing scripts NB! There's a known bu... [nodeuser]
directory ainopy/ Mon Aug 04 01:31:13 -0700 2008 Added an optional num_docs-parameter to ainodex... [Ville Tuulos]
directory examples/ Tue Jan 22 13:30:40 -0800 2008 Simple example added [Ville Tuulos]
directory index/ Tue Aug 05 03:38:53 -0700 2008 bugfix: query_proc didn't work correctly in did... [Ville Tuulos]
directory lang/ Mon Dec 31 15:06:43 -0800 2007 Initial commit. This is the version that was us... [Ville Tuulos]
directory lib/ Sat Jul 12 19:08:00 -0700 2008 related to the previous commit: added a range f... [Ville Tuulos]
directory preproc/ Mon Aug 04 03:52:15 -0700 2008 Added REMOVE_PREFIX flag to create_ixicon. Remo... [Ville Tuulos]
directory utils/ Mon Dec 31 15:06:43 -0800 2007 Initial commit. This is the version that was us... [Ville Tuulos]
directory wp/ Wed Apr 22 22:11:38 -0700 2009 Cleaned indexwp scripts [Ville Tuulos]
README
August 9th 2008 / Ville Tuulos


Aino - a Content-Based Search Engine
------------------------------------

This package contains source code for the Aino search engine. In brief,
Aino is a system for building optimized indices for documents made of
discrete attributes (words). Once the index has been built, documents in
the index can be ranked against queries using statistical measures based
on attribute co-occurrences.

Aino was developed in the Complex System Computation (CoSCo) research
group at the Helsinki Institute for Information Technology and
University of Helsinki, as part of the project Search-In-a-Box (SIB). Our
home page is at http://cosco.hiit.fi.

I apologize that the package doesn't contain proper documentation (yet).
However, the overall design of the system is described in my M.S. thesis,
"Design and Implementation of a Content-Based Search Engine" that can be 
found at

http://www.cs.helsinki.fi/u/tuulos/tuulos-thesis.pdf

As I'm fully responsible for the lacking documentation, you're very welcome
to contact me directly if you have any questions about the package. Reach 
me at tuulos@cs.helsinki.fi.


Contents
--------

This package contains the following subdirectories:

preproc/   Indexing pipeline. See wp/index/indexwp.sh to get an idea how 
           different programs work together. You need this stuff for 
     building new indices from raw text documents.

index/     Query processing and ranking. index/dexvm.[c|h] is the core 
           interface for the index. You need this stuff for ranking documents
     in the index, given a query.

ainopy/    Python interface for the index. You need this stuff if you want to
           use the indices in Python, which is currently the preferred way to
     use them.

lib/       Utility modules.

lang/     Snowball-library and modules related to language recognition and 
           stemming.

wp/     Scripts for building distributed search engine for Wikipedia

aino.fi/   Scripts for building aino.hiit.fi

utils/     Scripts used in compiling

examples/  An example script that shows how to rank documents using the Python 
           interface


Compiling
---------

Aino depends on the following packages in Debian:

scons
binutils-dev
libjudy-dev
python2.4-dev
flex
gcc
libc6-dev

It should be easy to find the corresponding packages in other distros.

Aino uses a build system called Scons (http://scons.org) instead of traditional 
Makefiles. To compile the package, say

scons
scons

(twice, really -- I'll fix it at some point..)


That's all for now. Happy hacking!