Building blocks for Information Retrieval & Machine Learning
Mekano

Provides low-level building blocks for information retrieval and machine learning, with a special focus on text processing.

Features

  • Representing text documents as sparse vectors
  • Representing a collection of documents as a dataset, which can be split into subsets (for cross-validation, for example)
  • Evaluation using various metrics
  • Reading various common input formats like SMART and TREC
  • Parsing and tokenizing text
  • Maintaining corpus statistics (term frequencies), creating inverted indexes
  • Creating weighted document vectors (TF-IDF) based on corpus statistics
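
To make the ideas in the list above concrete, here is a minimal sketch in plain Python of sparse document vectors, an inverted index, and TF-IDF weighting. It does not use Mekano's API: the names `tokenize`, `build_index`, and `tfidf_vector` are hypothetical, the tokenizer is deliberately naive, and the weighting `tf * log(N / df)` is just one common TF-IDF variant, not necessarily the one Mekano implements.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Collect corpus statistics for a {doc_id: text} mapping.

    Returns (df, inverted), where df maps each term to the number of
    documents containing it, and inverted maps each term to the set of
    document ids containing it.
    """
    df = Counter()
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(tokenize(text)):
            df[term] += 1
            inverted[term].add(doc_id)
    return df, inverted


def tfidf_vector(text, df, n_docs):
    """Return a sparse vector as a {term: weight} dict.

    Weight is tf * log(N / df); terms unseen in the corpus are skipped.
    """
    tf = Counter(tokenize(text))
    return {
        term: count * math.log(float(n_docs) / df[term])
        for term, count in tf.items()
        if df[term] > 0
    }


# Hypothetical three-document corpus, for illustration only.
docs = {
    "d1": "information retrieval with sparse vectors",
    "d2": "machine learning with text",
    "d3": "sparse text vectors",
}
df, inverted = build_index(docs)
vec = tfidf_vector(docs["d1"], df, len(docs))
```

Only the terms that occur in a document are stored in its vector, which is what makes the representation sparse. Terms that appear in fewer documents receive a larger weight.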

Most of the code is in Python, with some crucial functions implemented in C++.

See http://www.cs.cmu.edu/~alad/mekano for documentation.

Installation

python setup.py install

Dependencies:

  • python >= 2.6
  • cython >= 0.10
  • numpy >= 1.1.1 (required by evaluator.py)