Building blocks for Information Retrieval & Machine Learning
mekano
support
tests
LICENSE
MANIFEST.in
README.markdown
setup.cfg
setup.py

Mekano

Provides low-level building blocks for information retrieval and machine learning, with a special focus on text processing.

Features

  • Representing text documents as sparse vectors
  • Representing a collection of documents as a dataset, which can be split into subsets (e.g. for cross-validation)
  • Evaluation using various metrics
  • Reading various common input formats like SMART and TREC
  • Parsing and tokenizing text
  • Maintaining corpus statistics (term frequencies) and creating inverted indexes
  • Creating weighted document vectors (TF-IDF) based on corpus statistics
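
To make the sparse-vector, inverted-index and TF-IDF features above concrete, here is a minimal sketch in plain Python. It does not use Mekano's API: the function names (`tokenize`, `build_corpus_stats`, `tfidf_vector`), the whitespace tokenizer and the `tf * log(N / df)` weighting are all assumptions chosen for illustration, and Mekano's own classes and weighting scheme may differ.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_corpus_stats(docs):
    """Return (document frequency per term, inverted index: term -> set of doc ids)."""
    df = Counter()
    inverted = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Use set() so each term is counted once per document.
        for term in set(tokenize(text)):
            df[term] += 1
            inverted[term].add(doc_id)
    return df, inverted


def tfidf_vector(text, df, n_docs):
    """Sparse TF-IDF vector as a dict {term: weight}; zero weights are omitted."""
    tf = Counter(tokenize(text))
    vec = {}
    for term, count in tf.items():
        if term in df:  # skip terms never seen in the corpus
            weight = count * math.log(n_docs / df[term])
            if weight != 0.0:
                vec[term] = weight
    return vec


# Hypothetical three-document corpus, for illustration only.
docs = ["the cat sat", "the dog sat", "the cat ran"]
df, inverted = build_corpus_stats(docs)
vectors = [tfidf_vector(d, df, len(docs)) for d in docs]
```

Storing only the non-zero weights in a dict is what makes the vector sparse: a term such as "the", which occurs in every document, gets weight `log(1) = 0` and is simply left out.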

Most of the code is in Python, with some crucial functions implemented in C++.

See http://www.cs.cmu.edu/~alad/mekano for documentation.

Installation

python setup.py install

Dependencies:

  • python >= 2.6
  • cython >= 0.10
  • numpy >= 1.1.1 (required by evaluator.py)