 Uplug - NLP tools for processing (parallel) corpora

Uplug is a collection of tools and scripts for processing text corpora.

Uplug is provided in several packages:

uplug-main	  main components and scripts
uplug-webalign	  web interface for interactive alignment (ICA, ISA)
uplug-xx	  language-specific tools and models (<xx> = language ID)
uplug-treetagger  config files for integrating the TreeTagger
uplug-cwb	  scripts for indexing and querying parallel corpora with CWB


You can either download the latest sources from Bitbucket or install the distributed packages. If you want to download the entire Uplug system with all its components, use git to download the sources:

 git clone

and run

 make all
 sudo make install
 make test

Note that the installation will take some time. Note also that you need
to agree with the licenses of external tools that will be included, such
as the TreeTagger and its models, which are not free for non-academic use.

Uplug is also distributed as several packages containing different
components. Start by downloading uplug-main and installing the main
components. Then select the language packs you would like to include and
install them. See the documentation and READMEs inside the packages.
You can also install selected packages from source when downloading
via git.
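The following is a minimal, hypothetical sketch of installing only selected packages from a git checkout. It installs uplug-main first, as described above, and then the chosen packages. It ASSUMES that each package is a subdirectory named after the package (uplug-main, uplug-treetagger, ...) and that each one has its own Makefile with "all" and "install" targets, mirroring the top-level `make all` / `sudo make install`. Check the READMEs inside the packages before relying on this. By default the script only prints the commands. Set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Hypothetical sketch: install uplug-main plus selected packages from a
# git checkout. ASSUMPTION: every package is a subdirectory with its own
# Makefile providing "all" and "install" targets (not confirmed here).

# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and install one package directory given as $1.
install_package() {
    run make -C "$1" all
    run sudo make -C "$1" install
}

# uplug-main is installed first, then each package named on the command line.
install_packages() {
    install_package uplug-main
    for selected in "$@"; do
        install_package "$selected"
    done
}
```

For example, `install_packages uplug-treetagger` prints the build and install commands for uplug-main followed by those for uplug-treetagger. The dry-run default lets you review the commands before anything is run with sudo.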

More information and the newest sources are at

The new project home page:
The old pages at SourceForge:

Please cite the following dissertation if you use Uplug:

  author =	 {J\"org Tiedemann},
  title =	 {Recycling Translations -- {E}xtraction of Lexical
                  Data from Parallel Corpora and their Application in
                  Natural Language Processing},
  school =	 {Uppsala University},
  year =	 2003,
  address =	 {Uppsala, Sweden},
  note =	 {Anna S{\aa}gvall{ }Hein, {\AA}ke Viberg (eds): Studia
                  Linguistica Upsaliensia},
  url =		 {},


Copyright (C) 2004-2012 Joerg Tiedemann
The software is distributed under the GPL v3 license.