Skip to content
A multilingual lexical and semantic resource that links words of natural languages to abstract semantic concepts. Also called U++ Common UNL Dictionary.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
data update Dec 23, 2018
docs
snapshots New data files format Jan 4, 2015
CHANGELOG Imported to GitLab Jun 6, 2018
LICENSE.CC main directory structure Jan 28, 2014
LICENSE.GPL
README.md

README.md

Universal Dictionary of Concepts

Common U++ UNL dictionary

This is a multilingual pivot dictionary designed for various computer linguistic applications. An artificial language UNL (Universal Networking Language) http://www.undl.org is used as a neutral pivot to link words of different natural languages with each other. The basic units of the dictionary are "UNL concepts" or "Interlingual lexemes", which are usually equivalent to individual word senses. They are uniquely labeled with UNL "Universal Words" (UW). Each UW in the dictionary may be linked to one or several words in different natural languages (NL).

For example, in order to translate a word into another human language we should 1) get the list of its possible senses 2) choose one or more senses, that we are interested in, and get the corresponding UWs 3) check what words may be used to represent this sense in the target language.

For example, the English word milk has several senses:

  • milk(icl>dairy_product>thing) "A white nutritious liquid secreted by mammals.." [semantic class "Milk"]
  • milk(icl>foodstuff>thing) "Any of several nutritive milklike liquids" [semantic class "Beverage"]
  • milk(iof>river>thing) "A tributary of the Missouri river" [instance of the semantic class "River"]

Your appication will know that the first two can be translated into Russian as молоко, French as lait, Malay as susu, etc., but in the last case it should be left as is or transliterated, e.g. as Милк in Russian.

The dictionary consists of several volumes, one for each language/orthography variant/dialect. An additional volume holds links between UWs. Each NL volume contains links between words of some natural language and UNL UWs. It may also provide links to the dictionaries of computer systems used to convert NL text to UNL or deconvert UNL back to the NL. The idea is to enable continuous automated dictionary data exchange and synchronization between different systems that support UNL for the sake of interoperability. The UNL volume provides definitions of the UWs, and links to other semantic resources.

This repository provides only data. There is no dedicated GUI shell to view or edit it here. To browse the data locally you can get xdxf formatted version in data/xdxf, which is compatible with several dictionary shells.

License:

This data as a whole is available under the terms of GNU General Public License v3 or later and Creative Commons License (CC-BY-SA). It means that you are welcome to download, view, use, convert to other formats, modify and distribute the data. Commercial use and distribution in opensource GPL projects is permitted by GPL. Other terms may be negotiated with the authors and members of the U++ Consortium: http://www.unl.fi.upm.es/consorcio/index.php?estado=publicaciones&idioma=ingles&grupo=Publico IITP RAS (Moscow), GETALP (Grenoble), UPM (Madrid), IIT (Mumbai).

Data from external resources may also fall under futher licenses, notably WOLF is under CeCILL-C. The NL volumes may include bindings of the NL words to the dictionaries of linguistic processors, such as ETAP-3 (Russian, English) and Ariane (French). These systems and their dictionaries are not covered or restricted in any way by the licenses above.

Authors:

  • Viacheslav Dikonov - Russian and initial versions of the French, Malay and Vietnamese volumes.
  • Spanish UNL center at UPM (Madrid) - Initial set of UWs generated automatically from Princeton Wordnet 2.1.
  • French UNL center "Study Group for Machine Translation and Automated Processing of Languages and Speech" at the Laboratory of Informatics (LIG) in Grenoble (UMR CNRS/INPG/UJF 5217) - French dictionary, dBPedia conversion, terminology and support in making the initial version of the Russian volume.
  • Center for Indian Language Technology (CFILT) at IIT Mumbai - Hindi dictionary and the Hindi Wordnet

Contributors:

  • Dr. Enya Kong Tang - Malay data
  • Dr. Hung Vo Trung - Vietnamese data
  • Mathieu Mangeot - Pivax system

Other resources used:

  • Princeton Wordnet 2.1, 3.0
  • Hindi wordnet, part of the Indonet project
  • WOLF French Wordnet (part of the French volume)
  • GPL subset of a Spanish Wordnet translation
  • FEV dictionary for Vietnamese
  • SUMO
  • DBPedia

Related links:

Online demos of UNL enabled MT systems:

  • Russian/English ⇄ UNL converter and deconverter is available at http://www.unl.ru.

Other UNL dictionaries:

Git repositories

Our resource is hosted at Gitlab and mirrored at GitHub.

You can’t perform that action at this time.