The language folders (for example en and jp) contain the development versions of English, German, Spanish and Japanese lexicons for the DBpedia ontology.
Each lexicon comprises several LDP files whose entries use the lemon design patterns
and are grouped by domain (persons, organizations, arts and entertainment, animals and plants, etc.),
together with a file containing all entries that could not be created using those patterns
but only by writing lemon RDF triples (extra.ttl).
Additionally, the file references.ttl defines classes and properties
that are not part of the DBpedia ontology but are used in the lexicalizations.
In order to create a single RDF lexicon file, run the
export script with the language folder as argument, for example:
$ ./export.sh en
The file allURIs lists all URIs in the DBpedia 3.8 ontology (schema only, no instance data).
Exporting the English lexicon creates the following files:
- en_lexicalizedURIs (all URIs that occur in the lexicalizations)
- en_todoURIs (all URIs that do not yet occur in any lexicalization)
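The relationship between these files can be sketched as a set difference: the to-do list is everything in allURIs that is absent from the lexicalized list. The Python snippet below is only an illustration, not the logic of the export script itself. The helper names `todo_uris` and `read_uri_list` are hypothetical, the one-URI-per-line file format is an assumption, and the sample URIs are illustrative.

```python
def todo_uris(all_uris, lexicalized_uris):
    """Return the URIs from all_uris that are not lexicalized yet, keeping their order."""
    covered = set(lexicalized_uris)
    return [uri for uri in all_uris if uri not in covered]


def read_uri_list(path):
    """Read a URI list file, assuming one URI per line (the real format may differ)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Small in-memory example with illustrative URIs.
all_uris = [
    "http://dbpedia.org/ontology/Person",
    "http://dbpedia.org/ontology/birthPlace",
    "http://dbpedia.org/ontology/Volcano",
]
lexicalized = ["http://dbpedia.org/ontology/Person"]

remaining = todo_uris(all_uris, lexicalized)
print(remaining)
```

If the files do follow the one-URI-per-line assumption, the same check could be run on real data with `todo_uris(read_uri_list("allURIs"), read_uri_list("en_lexicalizedURIs"))`.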
The statistics.py script outputs the number of verbalizations (for classes, for properties, and in total)
as well as the average number of entries and their distribution.
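As a rough illustration of what such figures mean, the following Python sketch computes a total, an average and a distribution from a mapping of concepts to entry counts. It is not the code of statistics.py. The function name `summarize` and the sample counts are hypothetical and chosen only for the example.

```python
from collections import Counter


def summarize(entries_per_uri):
    """Return (total entries, average entries per concept, distribution of entry counts)."""
    counts = list(entries_per_uri.values())
    total = sum(counts)
    average = total / len(counts) if counts else 0.0
    # Maps "number of entries" to "how many concepts have that many entries".
    distribution = dict(sorted(Counter(counts).items()))
    return total, average, distribution


# Hypothetical counts, not taken from the actual lexicon.
sample = {"Person": 2, "Volcano": 1, "birthPlace": 3, "author": 2}

total, average, distribution = summarize(sample)
print(total, average, distribution)
```

Running the same function separately on the class entries and the property entries would give the per-class and per-property figures.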
English lexicon version 1 (July 2013)
The first release of the English lexicon for DBpedia 3.8 covers 353 classes as well as 300 properties (all those that have more than 10,000 occurrences in the DBpedia dataset, with only a few exceptions).
- Total lexicalizations: 1,216 (1.8 entries per concept)
- Class lexicalizations: 443 (1.3 entries per class)
- Property lexicalizations: 773 (2.4 entries per property)
Want to contribute?
If you want to help improve and extend the lexicon, if you want to port it to other languages, or if you are using the lexicon, we'd love to hear from you!