DBpedia Open Text Extraction Challenge - a never-ending knowledge acquisition spiral
The DBpedia Open Text Extraction Challenge differs significantly from other challenges in language technology and related areas in that it is not a one-time call but a continuously growing and expanding challenge, with a focus on sustainably advancing the state of the art and transcending boundaries in a systematic way. The DBpedia Association and the people behind this challenge are committed to providing the necessary infrastructure and driving the challenge for an indefinite time, and potentially to extending the challenge beyond Wikipedia.
We provide the extracted and cleaned full text of all Wikipedia articles in 9 languages at regular intervals, both for download and as a Docker image, in the machine-readable NIF-RDF format. Challenge participants are asked to wrap their NLP and extraction engines in Docker images and submit them to us. We will run participants' tools at regular intervals in order to extract:
- Facts, relations, events, terminology, ontologies as RDF triples (Triple track)
- Useful NLP annotations such as POS tags, dependencies, co-reference (Annotation track)
Get more at http://wiki.dbpedia.org/textext
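To make the Annotation track output more concrete, here is a minimal Python sketch that serializes one NIF context and one substring annotation as N-Triples. It relies only on terms from the NIF core vocabulary (nif:Context, nif:String, nif:isString, nif:referenceContext, nif:beginIndex, nif:endIndex, nif:anchorOf). The base URI http://example.org/doc1, the #char=begin,end URI scheme, and the helper names literal, triple and annotate are assumptions made for illustration; the challenge datasets may use different URIs.

```python
# Minimal sketch: one NIF context plus one substring annotation as N-Triples.
# The base URI and all helper names are hypothetical, chosen for illustration.

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value, datatype=None):
    """Format a value as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (str(value).replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    if datatype:
        return '"{}"^^<{}>'.format(escaped, datatype)
    return '"{}"'.format(escaped)


def triple(subject, predicate, obj):
    """Build one N-Triples line; obj is a formatted literal or an absolute IRI."""
    if not obj.startswith('"'):
        obj = "<{}>".format(obj)
    return "<{}> <{}> {} .".format(subject, predicate, obj)


def annotate(base_uri, text, begin, end):
    """Describe the whole text as a nif:Context and text[begin:end] as a nif:String."""
    context = "{}#char=0,{}".format(base_uri, len(text))
    span = "{}#char={},{}".format(base_uri, begin, end)
    index_type = XSD + "nonNegativeInteger"
    return [
        triple(context, RDF_TYPE, NIF + "Context"),
        triple(context, NIF + "isString", literal(text)),
        triple(span, RDF_TYPE, NIF + "String"),
        triple(span, NIF + "referenceContext", context),
        triple(span, NIF + "beginIndex", literal(begin, index_type)),
        triple(span, NIF + "endIndex", literal(end, index_type)),
        triple(span, NIF + "anchorOf", literal(text[begin:end])),
    ]


if __name__ == "__main__":
    for line in annotate("http://example.org/doc1", "Leipzig is a city in Germany.", 0, 7):
        print(line)
```

A real tool would attach its own annotation properties (for example part-of-speech or entity links) to the span resource in the same way.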
How to run
- Create a volume to store the data
docker volume create --name nif-datasets
- Start our software image
docker run -v nif-datasets:/home/developer -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -it nlp2rdf/dbpediaopendbpediatextextractionchallenge:tools bash
- Run our install script once (get some coffee; this will take some time)
Now you are ready to start working. Our software image contains some useful tools to help you take advantage of NIF:
- JDK 1.8
- Maven 3.x
- Rapper - Raptor RDF Syntax Library
- Python 2.7.x
- IntelliJ Community Edition 2016 - Path: /opt/idea-IC-163.12024.16/bin/idea.sh
- Datasets in NIF - Path: /home/developer/data/
- And finally, a small Java stub for reading NIF/Turtle or N3 files, to help you build your own code
NIF / Turtle files:
String content = ResourceLoader.getContent("PATH");
NIFParser parser = new NIFParser(content);
NIF nif = parser.getNIF();
NIF / N-Triples files:
Stream<String> content = ResourceLoader.getStream("PATH");
NTripleParser parser = new NTripleParser(content);
List<NIF> nif = parser.getNIF();
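For readers who prefer the Python interpreter that ships in the image, the streaming idea behind the stub can be sketched without any library. The sketch below is not the stub's API: the names context_texts and unescape are hypothetical, and the regular expressions only cover plain N-Triples lines with IRI subjects (blank nodes and \u escapes are not handled). It reads lines one at a time and yields the text of every nif:isString triple.

```python
import re

NIF_IS_STRING = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#isString"

# "<subject> <predicate> object ." with an IRI subject; other lines are skipped.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$')
# The quoted part of a literal, allowing backslash-escaped characters inside.
LITERAL_RE = re.compile(r'^"((?:[^"\\]|\\.)*)"')

ESCAPES = {"n": "\n", "r": "\r", "t": "\t", '"': '"', "\\": "\\"}


def unescape(value):
    """Undo the simple backslash escapes; \\u escapes are left untouched in this sketch."""
    return re.sub(r'\\([nrt"\\])', lambda m: ESCAPES[m.group(1)], value)


def context_texts(lines):
    """Yield (context IRI, text) for each nif:isString triple in an iterable of lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if not match or match.group(2) != NIF_IS_STRING:
            continue
        quoted = LITERAL_RE.match(match.group(3))
        if quoted:
            yield match.group(1), unescape(quoted.group(1))


if __name__ == "__main__":
    sample = [
        '<http://example.org/doc1#char=0,8> <{}> "Leipzig." .'.format(NIF_IS_STRING),
    ]
    for iri, text in context_texts(sample):
        print(iri, len(text))
```

Because context_texts accepts any iterable of lines, an open file handle can be passed directly, so a large dump is processed line by line rather than loaded into memory.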
Didn't find a tool or information you need in the list? Please tell us by opening an issue.
How to submit
- Create a repo with the name dbpediaopendbpediatextextractionchallenge on Docker Hub
- Back up your Docker volume
tar cvfz content.tgz /var/lib/docker/volumes/nif-datasets/_data
- Create a Dockerfile with your Docker volume content
FROM java:8
ADD content.tgz /opt
- Create an image using your Dockerfile
docker build -t response .
- Then push this image to Docker Hub and open an issue telling us your repository name
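The push step is not spelled out in the steps listed here. As a rough sketch, the image built as response can be tagged with the required repository name and pushed using the standard docker login, docker tag and docker push commands. The Docker Hub user name your-hub-user, the latest tag, and the helper names submission_commands and run are assumptions for illustration. The script only prints the commands unless dry_run is set to False.

```python
import subprocess

# Repository name required by the challenge (taken from the steps in this section).
REPO_NAME = "dbpediaopendbpediatextextractionchallenge"


def submission_commands(hub_user, local_image="response", tag="latest"):
    """Return the docker CLI calls that tag and push the locally built image."""
    remote = "{}/{}:{}".format(hub_user, REPO_NAME, tag)
    return [
        ["docker", "login"],
        ["docker", "tag", local_image, remote],
        ["docker", "push", remote],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.check_call(command)


if __name__ == "__main__":
    # "your-hub-user" is a placeholder for your own Docker Hub account.
    run(submission_commands("your-hub-user"))
```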
Supported Docker versions
This image is officially supported on Docker version 1.9.1.
Please see the Docker installation documentation for details on how to upgrade your Docker daemon.
Documentation for this image is stored in the GitHub repo.
If you have any problems with or questions about this image, please contact us through a GitHub issue.