Sean Gallagher edited this page Nov 30, 2015 · 10 revisions

Since this is a knowledge-based application, it needs serious sources of knowledge to power it, so there is a lot of data to download. We have tried to make setup easy: just two steps. Depending on your connection and machine, though, those two steps can take quite some time.

The Local Indexes (Indri, Lucene and Jena)

We distribute our data on Dropbox, but occasionally we get throttled; report an issue if you run into problems. Download the data/ archive and extract it into the already existing data/ directory, overwriting if necessary.
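The extract-and-overwrite step above can be sketched as follows. This is a minimal illustration, not part of the project: the archive filename and format (a tarball) are assumptions, so adjust them to whatever you actually downloaded.

```python
import tarfile
from pathlib import Path

def extract_archive(archive_path: str, dest: str = "data") -> None:
    """Extract the data archive into dest, overwriting files already there."""
    # dest is assumed to be the project's existing data/ directory
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest)  # members with matching paths overwrite in place
```

Extraction leaves any files not present in the archive untouched, which is why overwriting into the existing data/ directory is safe.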

The Relational Database

Depending on the scale of your installation you may prefer the Postgres database backup, but we recommend the SQLite database-in-a-file until you are sure you have outgrown it. (Unpack it first, though; it is compressed to save bandwidth.)
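Unpacking the SQLite file and opening it might look like the sketch below. The compression format (gzip) and the output filename are assumptions for illustration; substitute whatever the distributed dump actually uses.

```python
import gzip
import shutil
import sqlite3

def unpack_and_open(compressed_path: str, db_path: str = "watsonsim.db") -> sqlite3.Connection:
    """Decompress the SQLite backup to db_path, then open a connection to it."""
    # assumes a gzip-compressed dump; adjust for the actual archive format
    with gzip.open(compressed_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return sqlite3.connect(db_path)
```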

What we are using

Our data comes from several sources, including:

We have also pre-indexed these in the archive:

  • Indri indexes of all the articles and paragraphs
  • Lucene indexes of articles, paragraphs, and DBpedia labels
  • A Jena store of everything else we downloaded from DBpedia
  • PostgreSQL and SQLite dumps of the relational database, each containing many tables indexed in several ways

In Development

We're still working out a better way to synchronize the data the project needs. We think BitTorrent Sync can replace Dropbox and avoid the bandwidth problems, so you should eventually be able to fetch the external data archive through a BitTorrent Sync link.