-----------------------------------------------------
 Uplug - NLP tools for processing (parallel) corpora
-----------------------------------------------------

Uplug is a collection of tools and scripts for processing text corpora, including parallel corpora.


Uplug is provided in several packages:

uplug-main	  main components and scripts
uplug-webalign	  web interface for interactive alignment (ICA, ISA)
uplug-xx	  language-specific tools and models (<xx> = language ID)
uplug-treetagger  config files for integrating the TreeTagger
uplug-cwb	  scripts for indexing and querying parallel corpora with CWB


INSTALLATION

You can either download the latest sources from Bitbucket or install the distributed packages. If you want to download the entire Uplug system with all its components, use git to download the sources:

 git clone https://bitbucket.org/tiedemann/uplug.git

and run the following commands from inside the cloned directory:

 cd uplug
 make all
 sudo make install
 make test

Note that the installation will take some time. Note also that you need
to agree to the licenses of the external tools that will be included, such
as the TreeTagger and its models, which are not free for non-academic use.

Uplug is also distributed as several separate packages covering different
components. Start by downloading uplug-main and installing the main
components, then select the language packs you would like to include and
install them. See the documentation and README files inside the packages
for details. You can also install selected packages from source after
downloading them via git.


More information and the newest sources are available at:

The new project home page:	https://bitbucket.org/tiedemann/uplug
The old pages at SourceForge:	http://sourceforge.net/projects/uplug/


Please cite the following dissertation if you use Uplug:

 @PhdThesis{Tiedemann:PhD03,
  author =	 {J\"org Tiedemann},
  title =	 {Recycling Translations -- {E}xtraction of Lexical
                  Data from Parallel Corpora and their Application in
                  Natural Language Processing},
  school =	 {Uppsala University},
  year =	 2003,
  address =	 {Uppsala, Sweden},
  note =	 {Anna S{\aa}gvall{ }Hein, {\AA}ke Viberg (eds): Studia
                  Linguistica Upsaliensia},
  url =		 {http://uu.diva-portal.org/smash/record.jsf?pid=diva2:163715},
 }


COPYRIGHT AND LICENSE

Copyright (C) 2004-2012 Joerg Tiedemann
The software is distributed under the GPL v3 license.