lemon lexicon for DBpedia
de
en
es
jp
linking
machine_translations
target
test
.gitignore
RDFmerger.py
README.md
allURIs
de_lexicalizedURIs
de_todoURIs
en_lexicalizedURIs
en_todoURIs
es_lexicalizedURIs
es_todoURIs
export.sh
references.ttl
statistics.py


lemon lexica for DBpedia

The folders en, de, es and jp contain the development versions of an English, German, Spanish and Japanese lexicon for the DBpedia ontology. Each lexicon consists of several LDP files, whose entries are written using the lemon design patterns and grouped by domain (persons, organizations, arts and entertainment, animals and plants, etc.), together with a file extra.ttl containing all entries that could not be created using those patterns and were instead written directly as lemon RDF triples. Additionally, the file references.ttl defines classes and properties that are not part of the DBpedia ontology but are used in the lexicalizations.

To create a single RDF lexicon file, run the export script with the language folder as its argument, for example:

$ ./export.sh en

This requires:

The file allURIs contains a list of all URIs in the DBpedia 3.8 ontology (schema but no instance data). Exporting the English lexicon creates the files en_lexicalizedURIs (all URIs that occur in the lexicalizations) and en_todoURIs (all URIs that do not yet occur in any lexicalization).
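The relationship between the three URI files can be illustrated with a small sketch. It is not the code used by export.sh; it assumes each file holds one URI per line, and the helper names `read_uris` and `todo_uris` are hypothetical.

```python
def read_uris(path):
    """Read a URI list file, assuming one URI per line (an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def todo_uris(all_uris, lexicalized_uris):
    """Return the ontology URIs that do not occur in any lexicalization."""
    lexicalized = set(lexicalized_uris)
    # Keep only URIs from the full ontology list that are not yet covered.
    return sorted(uri for uri in set(all_uris) if uri not in lexicalized)
```

Under these assumptions, `todo_uris(read_uris("allURIs"), read_uris("en_lexicalizedURIs"))` would yield the kind of list stored in en_todoURIs.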

Further, the statistics.py script outputs the number of verbalizations (per class, per property, and in total) as well as the average number of entries and their distribution.
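As a rough illustration of these figures (not the actual statistics.py), the sketch below computes a total, an average and a distribution from a mapping of URI to number of lexical entries. The function name `entry_statistics` and the input format are assumptions made for this example.

```python
from collections import Counter


def entry_statistics(entries_per_uri):
    """Summarize a hypothetical mapping of URI -> number of lexical entries.

    Returns (total entries, average entries per URI, distribution), where the
    distribution maps an entry count to the number of URIs with that count.
    """
    counts = list(entries_per_uri.values())
    total = sum(counts)
    average = total / len(counts) if counts else 0.0
    distribution = dict(Counter(counts))
    return total, average, distribution
```

Calling it separately on the class URIs and on the property URIs would give per-class and per-property figures alongside the overall total.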

Latest release

English lexicon version 1 (July 2013)

The first release of the English lexicon for DBpedia 3.8 covers 353 classes as well as 300 properties (all those that have more than 10,000 occurrences in the DBpedia dataset, with only a few exceptions).

  • Total lexicalizations: 1,216 (1.8 entries per concept)
  • Class lexicalizations: 443 (1.3 entries per class)
  • Property lexicalizations: 773 (2.4 entries per property)

Published on: lemon-model.net/lexica/dbpedia_en (under Creative Commons BY 3.0 license)

Want to contribute?

If you want to help improve and extend the lexicon, if you want to port it to other languages, or if you are using the lexicon, we'd love to hear from you!