Mathematical namespace discovery

Pipeline for Mathematical namespace discovery



  • namespaces (like here)

Running It

git clone
cd namespacediscovery-pipeline/src

Modify luigi.cfg to set the configuration parameters.

At a minimum, you need to change the following parameters:

  • [MlpResultsReadTask]/mlp_results - path to the output of mlp
  • [MlpResultsReadTask]/categories_processed - path to the category information
  • (optional) [DEFAULT]/intermediate_result_dir - path to directory where pre-calculated results will be stored

Other parameters ([DEFAULT] section):

  • isv_type - type of identifier vector space model; can be nodef, weak or strong
  • vectorizer_dim_red - type of dimensionality reduction; can be none, svd, nmf or random
  • clustering_algorithm - clustering algorithm; currently only kmeans is implemented
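
Before starting the pipeline, it can help to check luigi.cfg against the parameters listed above. The following is a minimal sketch, not part of the pipeline: the function name check_config and the constants ALLOWED and REQUIRED are hypothetical, and the only things taken from this README are the section names, option names and permitted values.

```python
try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2

# Permitted values for the [DEFAULT] options, as listed in this README.
ALLOWED = {
    "isv_type": ("nodef", "weak", "strong"),
    "vectorizer_dim_red": ("none", "svd", "nmf", "random"),
    "clustering_algorithm": ("kmeans",),
}
# Options that must be set in [MlpResultsReadTask].
REQUIRED = ("mlp_results", "categories_processed")


def check_config(path):
    """Return a list of human-readable problems found in the config file."""
    parser = configparser.RawConfigParser()
    parser.read(path)  # a missing file is silently ignored and reported below
    problems = []

    section = "MlpResultsReadTask"
    for option in REQUIRED:
        if not parser.has_section(section) or not parser.has_option(section, option):
            problems.append("missing [%s]/%s" % (section, option))

    # Options in [DEFAULT] are only checked when they are present.
    defaults = parser.defaults()
    for option, allowed in sorted(ALLOWED.items()):
        value = defaults.get(option)
        if value is not None and value not in allowed:
            problems.append(
                "[DEFAULT]/%s must be one of %s, got %r"
                % (option, ", ".join(allowed), value)
            )
    return problems


if __name__ == "__main__":
    for problem in check_config("luigi.cfg"):
        print(problem)
```

An empty result means the required paths are set and the [DEFAULT] values are among those listed; it does not check that the paths actually exist.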


Dependencies

  • python2
  • numpy
  • scipy
  • scikit-learn
  • nltk
  • python-Levenshtein
  • fuzzywuzzy
  • rdflib
  • luigi

For PyData stack libraries such as numpy, scipy, scikit-learn and nltk, it is best to use the Anaconda installer.

Not all dependencies come pre-installed with Anaconda; use pip to install the rest:

pip install python-Levenshtein
pip install fuzzywuzzy
pip install luigi
pip install rdflib
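
To see which of the dependencies are still missing after installation, a short check like the one below can be run with the same interpreter that will run the pipeline. This is only a sketch: the function missing_packages and the DEPENDENCIES mapping are hypothetical helpers, not part of the repository.

```python
import importlib

# Package name (as installed) -> module name (as imported).
# Two entries differ: scikit-learn is imported as "sklearn",
# python-Levenshtein as "Levenshtein".
DEPENDENCIES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "nltk": "nltk",
    "python-Levenshtein": "Levenshtein",
    "fuzzywuzzy": "fuzzywuzzy",
    "rdflib": "rdflib",
    "luigi": "luigi",
}


def missing_packages(dependencies=DEPENDENCIES):
    """Return the sorted package names whose module cannot be imported."""
    missing = []
    for package, module in sorted(dependencies.items()):
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(package)
    return missing


if __name__ == "__main__":
    for package in missing_packages():
        print("missing: " + package)
```

The check only confirms that each module imports; it says nothing about versions.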

We also need to download some data for nltk: the list of stopwords and the model for tokenization. Run the following in the Python console to install them:

import nltk
nltk.download('stopwords')
nltk.download('punkt')

see for an example how to set up the environment


We use the following datasets as input:

  • mlp ...
  • dbpedia category information

Classification schemes:

The classification scheme datasets are already available in the data directory.