Commit

README: Add Docker reference
pasky committed Jun 13, 2016
1 parent 6665106 commit 00d66e9
Showing 2 changed files with 11 additions and 18 deletions.
25 changes: 7 additions & 18 deletions README.md
@@ -97,7 +97,10 @@ To connect YodaQA to IRC, see ``contrib/irssi-brmson-pipe.pl``.

By default, YodaQA uses preconfigured data sources running on the authors'
infrastructure that supply open domain information. Detailed documentation
-on setup of these open domain data sources is available below.
+on setup of these open domain data sources is available below. Furthermore,
+all the data source components are now compartmentalized and easy to deploy
+using Docker - see the Dockerfiles in respective ``data/`` subdirectories
+and ``data/README_DockerCompose.txt`` for details.

It is certainly possible to adapt YodaQA for a particular domain and use
custom data sources, but this process is not documented in detail yet.
@@ -118,15 +121,7 @@ memory and IO intensive process. You will need about 80-100GiB of disk space and
bandwidth to download 10GiB source file; indexing will require roughly 8GiB RAM.

To index and then search in Wikipedia, we need to set it up as a standalone Solr
-source:
-
-* Download solr (http://www.apache.org/dyn/closer.cgi/lucene/solr/ - we use
-version 4.6.0), unpack and cd to the ``example/`` subdirectory.
-* Symlink or copy the ``data/enwiki/`` directory from this repository to the
-``example/`` subdirectory; it contains the data import configuration.
-* Proceed with instructions in ``data/enwiki/README.md``.
-
-You may want to edit the URL in ``src/main/java/cz/brmlab/yodaqa/pipeline/YodaQA.java``.
+source. See ``data/enwiki/README.md`` for instructions.

### Database Data Source

@@ -141,14 +136,8 @@ below.
Regarding Freebase, we use its RDF export with SPARQL endpoint,
running on infrastructure
provided by the author's academic group (Jan Šedivý's 3C Group at the
-Dept. of Cybernetics, FEE CTU Prague). If the endpoint is not available
-for some reason, you can also disable Freebase usage by editing the
-method getConceptProperties() (instructions inside) of:
-
-src/main/java/cz/brmlab/yodaqa/pipeline/structured/FreebaseOntologyPrimarySearch.java
-
-You can start your own instance by following the instructions in
-``data/freebase/README.md`` but it is quite arduous and resource intensive.
+Dept. of Cybernetics, FEE CTU Prague). See ``data/freebase/README.md``
+for details.

### Ontology Data Source

4 changes: 4 additions & 0 deletions data/enwiki/README.md
@@ -43,6 +43,10 @@ and somewhat smaller dump. Then, we import this into Solr.

### Solr Import

+* Download solr (http://www.apache.org/dyn/closer.cgi/lucene/solr/ - we use
+version 4.6.0), unpack and cd to the ``example/`` subdirectory.
+* Symlink or copy the ``data/enwiki/`` directory from this repository to the
+``example/`` subdirectory; it contains the data import configuration.
* Revise the enwiki-text XML file reference in ``collection1/conf/data-config.xml``
according to the dump date you used.
* In the parent directory (``example/``), start the standalone Solr server: