Warm up tasks
If you are going to develop with the DBpedia Spotlight team, it is important that you do the "everybody should do" tasks, and at least one of the "would be cool to have" tasks.
Everybody should do
- Play with the Spotlight demo (http://dbpedia-spotlight.github.io/demo/)
- Compile and run it locally
- Check the issue pages
- Read "A Generative Entity-Mention Model for Linking Entities with Knowledge Base", which explains the main ideas behind Spotlight
- Understand what data is saved and how the stores work. Take a look at the model editor post.
The best warm-up task is to polish the user documentation. Read it, ask questions, and add the answers to the documentation so that the next person doesn't have to ask them.
A good starting point is to update the documentation as you complete the warm-up tasks.
- Write how to solve issues you went through
- Rephrase confusing sentences
- Gather recurring questions from the mailing list and GitHub issues (maybe creating a FAQ)
Build from source
Download the latest code, build and try to run the software. Add to the documentation any error messages that you may find, or any information that we may have forgotten to add. Try first to run the server with files offered for download, instead of running the indexing process from scratch, because that can take a while.
Play a bit with the endpoints (check
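
As a starting point for exploring the endpoints, here is a minimal Python sketch. It assumes a Spotlight server running locally on port 2222 and exposing an `annotate` endpoint under `/rest` that accepts `text` and `confidence` parameters and returns JSON with a `Resources` list. Adjust the host, port, and field names to match your own setup. The helper names and the sample response are illustrative only, not part of Spotlight.

```python
import json
from urllib.parse import urlencode

# Assumption: a locally started server listening on port 2222.
BASE_URL = "http://localhost:2222/rest"


def build_annotate_url(text, confidence=0.5, base_url=BASE_URL):
    """Return the GET URL for the annotate endpoint with URL-encoded parameters."""
    query = urlencode({"text": text, "confidence": confidence})
    return f"{base_url}/annotate?{query}"


def extract_entities(response_body):
    """Pull (surface form, URI) pairs out of a JSON annotate response."""
    payload = json.loads(response_body)
    # "Resources" may be missing when nothing was annotated.
    return [(r["@surfaceForm"], r["@URI"]) for r in payload.get("Resources", [])]


# Hand-written sample in the assumed response shape, not real server output.
SAMPLE_RESPONSE = json.dumps({
    "@text": "Berlin is a city",
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Berlin",
         "@surfaceForm": "Berlin",
         "@offset": "0"}
    ],
})

print(build_annotate_url("Berlin is a city"))
print(extract_entities(SAMPLE_RESPONSE))
```

To call a running server, pass the built URL to `urllib.request.Request` with the header `Accept: application/json` and feed the decoded response body to `extract_entities`.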
Learn Scala, Basic Functional and Solid OO Programming
You will quickly fall in love with the elegance of Scala's functional programming combined with object-oriented programming. We don't need to be the most idiomatic Scala programmers, but you'll quickly notice that some patterns just stick with you. You should invest at least an hour a day during the "community bonding period" to learn Scala. See, for example, the Scala School by the Twitter folks. We learned the language while building DBpedia Spotlight, so you can do it too.
If you are going to work on the tool that generates the models from a Wikipedia dump, it would be best to get familiar with Spark as well. Spark's YouTube channel offers some good material for getting a grasp of its concepts.
- Check how to set up a simple project using Spark and SBT
- Try some of the examples and run them locally in your IDE
- Set up a local Spark master and try submitting some of the packaged examples via the submit script
Data in the Models
If you want to peek at how the data in the models is stored and what data is saved in each of the stores, have a play with the Model Editor (https://github.com/idio/spotlight-model-editor).
Would be cool to have
Look for tasks in our issue tracker, assign yourself if nobody else is assigned yet, and start discussing or working on the issue.
Here are some additional ideas:
- Design some "Powered by DBpedia Spotlight" logos. Size suggestions: 80x20 (example), 200x80, 140x56, 100x40, and 70x28 (example)
- Create or enhance step-by-step instructions for configuring a dev environment in IntelliJ, Eclipse, TypeSafe Scala-IDE, or whatever your favourite IDE is.
- Run indexing for one or two entity types or categories (small data set).
- Run indexing for another language besides English (as long as you have working knowledge of that language)
Earn major props if you build test cases for ongoing issues, or for others that you find yourself.
Other small tasks that we'd be very glad to receive as contributions:
- The Google Freebase Annotations of the TREC KBA 2014 Stream Corpus is a huge corpus annotated with Freebase MIDs. It would be good to:
  - Play with it
  - Slice off a small portion of it
  - Write a script that maps the MIDs to DBpedia IDs and replaces the annotations accordingly.
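
The MID mapping script could start from something like the following Python sketch. It rests on several assumptions: that a MID-to-DBpedia mapping is available as N-Triples `owl:sameAs` lines linking DBpedia resources to `rdf.freebase.com` identifiers, that annotations use the `/m/...` form of the MID, and that each annotation can be read into a `(mention, start, end, mid)` tuple. The corpus's actual file layout is not described here, so the tuple shape, the function names, and the placeholder MIDs are all hypothetical.

```python
import re

# Assumed link format: <dbpedia resource> owl:sameAs <freebase m.xxx> .
SAME_AS = re.compile(
    r"<(http://dbpedia\.org/resource/[^>]+)>\s+"
    r"<http://www\.w3\.org/2002/07/owl#sameAs>\s+"
    r"<http://rdf\.freebase\.com/ns/(m\.[^>]+)>\s*\."
)


def load_mid_to_dbpedia(lines):
    """Build a {"/m/xxx": DBpedia URI} map from N-Triples lines."""
    mapping = {}
    for line in lines:
        match = SAME_AS.match(line.strip())
        if match:
            uri, mid = match.groups()
            # RDF form "m.0abc1" becomes annotation form "/m/0abc1".
            mapping["/" + mid.replace(".", "/", 1)] = uri
    return mapping


def replace_annotations(annotations, mapping):
    """Swap the MID in each (mention, start, end, mid) tuple for a DBpedia URI.

    Annotations whose MID has no mapping are dropped and counted.
    """
    converted, missing = [], 0
    for mention, start, end, mid in annotations:
        uri = mapping.get(mid)
        if uri is None:
            missing += 1
            continue
        converted.append((mention, start, end, uri))
    return converted, missing


# Placeholder data for illustration; the MIDs are made up.
links = [
    "<http://dbpedia.org/resource/Berlin> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://rdf.freebase.com/ns/m.0abc1> .",
]
annotations = [("Berlin", 0, 6, "/m/0abc1"), ("Foo", 10, 13, "/m/0zzz9")]

mid_map = load_mid_to_dbpedia(links)
converted, missing = replace_annotations(annotations, mid_map)
print(converted, missing)
```

Counting the unmapped MIDs rather than silently discarding them makes it easy to report how much of the sliced corpus survives the conversion.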