DBpedia Open Text Extraction Challenge - a never ending knowledge acquisition spiral

Join the chat at https://gitter.im/NLP2RDF/DBpediaOpenDBpediaTextExtractionChallenge

The DBpedia Open Text Extraction Challenge differs significantly from other challenges in language technology and related areas: it is not a one-time call, but a continuously growing and expanding challenge that aims to sustainably advance the state of the art and transcend boundaries in a systematic way. The DBpedia Association and the people behind this challenge are committed to providing the necessary infrastructure and to driving the challenge for an indefinite time, as well as potentially extending it beyond Wikipedia.

We provide the extracted and cleaned full text of all Wikipedia articles in 9 different languages, at regular intervals, for download and as a Docker image in the machine-readable NIF-RDF format. Challenge participants are asked to wrap their NLP and extraction engines in Docker images and submit them to us. We will run participants’ tools at regular intervals in order to extract:

  • Facts, relations, events, terminology, and ontologies as RDF triples (Triple track)
  • Useful NLP annotations such as POS tags, dependencies, and co-reference (Annotation track)
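To give a rough idea of the input format, here is a minimal, hypothetical sketch of a NIF context resource for one article. The URI, offsets, and text are invented for illustration; see the NIF 2.0 Core ontology for the exact vocabulary.

```turtle
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# A context resource holding the cleaned full text of one article.
# URI and content are made-up placeholders.
<http://example.org/wiki/Berlin#char=0,28>
    a nif:String , nif:Context ;
    nif:isString   "Berlin is a city in Germany." ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex   "28"^^xsd:nonNegativeInteger .
```

Participants' tools read resources like this and emit additional triples (facts or annotations) anchored to the same context.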

Get more at http://wiki.dbpedia.org/textext

Our software image is available on Docker Hub:

  • nlp2rdf/dbpediaopendbpediatextextractionchallenge:tools

How to run

  1. Create a volume to store the data

docker volume create --name nif-datasets

  2. Start our software image

docker run -v nif-datasets:/home/developer -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -it nlp2rdf/dbpediaopendbpediatextextractionchallenge:tools bash

  3. Run our install script once (get some coffee; this will take some time)


Now you are ready to start working. Our software image contains some useful tools to help you take advantage of NIF.


NIF / Turtle files:

String content = ResourceLoader.getContent("PATH");
NIFParser parser = new NIFParser(content);
NIF nif = parser.getNIF();

N3 files:

Stream<String> content = ResourceLoader.getStream("PATH");
NTripleParser parser = new NTripleParser(content);
List<NIF> nif = parser.getNIF();

Can't find a tool or a piece of information that you need in the list? Please let us know by opening an issue.

How to submit

Create a repository named dbpediaopendbpediatextextractionchallenge on Docker Hub;

Backup your Docker volume;

tar cvfz content.tgz /var/lib/docker/volumes/nif-datasets/_data

Create a Dockerfile that adds your Docker volume content:

FROM java:8

ADD content.tgz /opt

Create an image using your Dockerfile

docker build -t response .

Then push this image to Docker Hub and open an issue telling us your repository name.
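The push step might look like the following sketch; "alice" is a placeholder Docker Hub username, to be replaced with your own.

```shell
# Tag the locally built "response" image for your Docker Hub repository.
# "alice" is a placeholder username - substitute your own.
docker tag response alice/dbpediaopendbpediatextextractionchallenge

# Authenticate and push the tagged image to Docker Hub.
docker login
docker push alice/dbpediaopendbpediatextextractionchallenge
```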

Supported Docker versions

This image is officially supported on Docker version 1.9.1.

Please see the Docker installation documentation for details on how to upgrade your Docker daemon.


Documentation for this image is stored in this GitHub repo.


If you have any problems with or questions about this image, please contact us through a GitHub issue.