Add Dockerfile and README.md to help development
Stegallo committed Dec 27, 2018
1 parent 24c375b commit 1e58a46
Showing 2 changed files with 80 additions and 0 deletions.
26 changes: 26 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,26 @@
FROM ubuntu

RUN apt-get update \
    && apt-get install -y \
        apt-utils \
        curl \
        wget \
        nano \
        libsm6 \
        libxrender1 \
        libxext6 \
        ghostscript \
        python3-minimal \
        python3-setuptools \
        python3-pip \
    && ln -s /usr/bin/python3 /usr/bin/python \
    && ln -s /usr/bin/pip3 /usr/bin/pip

RUN pip install 'excalibur-py[dev]'

EXPOSE 5000

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

WORKDIR /excalibur/
54 changes: 54 additions & 0 deletions docker/README.md
@@ -0,0 +1,54 @@
<p align="center">
<img src="https://raw.githubusercontent.com/camelot-dev/excalibur/master/docs/_static/excalibur-logo.png" width="200">
</p>

# Excalibur: Docker
This is the Docker configuration which allows you to run Excalibur without installing any dependencies on your machine!<br/>
OK, any except `docker`.

## Prerequisites

As stated, the only thing you need is `docker`.

Follow the instructions on [Install Docker](https://docs.docker.com/engine/installation/) for your environment if you haven't got `docker` already.

## Usage

### Prepare the image

Switch to the `docker` directory here and run `docker build -t excalibur .` (don't forget the final `.`) to build your Docker image. That may take some time but is only required once, or perhaps a few more times if you tweak something in the `Dockerfile`.

After the process is finished you have an `excalibur` image, which will be the base for your experiments. You can confirm this by checking the output of the `docker images` command.

### Run the container

From your project folder, run `docker run -it -p 5000:5000 -v $(pwd):/excalibur/ excalibur /bin/bash`.
This will start the container, publish port 5000 on your machine, mount the current folder at `/excalibur/`, and open up a bash console inside it.
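
To make the individual flags in that command easier to see, here is a minimal Python sketch that assembles the same `docker run` arguments. The helper name `docker_run_command` and its parameters are hypothetical and not part of Excalibur or Docker; only the flags themselves come from the command above.

```python
import os


def docker_run_command(project_dir, image="excalibur", port=5000):
    """Build the argument list for the `docker run` call described above.

    Hypothetical helper for illustration only.
    """
    project_dir = os.path.abspath(project_dir)
    return [
        "docker", "run",
        "-it",                                # interactive session with a terminal
        "-p", f"{port}:{port}",               # publish the container port on the host
        "-v", f"{project_dir}:/excalibur/",   # bind-mount the project folder
        image,                                # image built in the previous step
        "/bin/bash",                          # command to run inside the container
    ]
```

The resulting list could be passed to `subprocess.run(docker_run_command("."), check=True)` on a machine where `docker` is installed.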

At this point you need to initialize the metadata database using:

<pre>
$ excalibur initdb
</pre>

Once initialized, you need to make the web server reachable from outside the container.

Use nano to open the config file ...

<pre>
$ nano /root/excalibur/excalibur.cfg
</pre>

... and change the following setting in the `[webserver]` section:

<pre>
web_server_host = 0.0.0.0
</pre>
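
If you would rather not edit the file by hand, the same change can be scripted. The following is a minimal sketch using Python's standard `configparser`, assuming the config file is INI-style with a `[webserver]` section as described above. The function name `set_webserver_host` is hypothetical, and note that `configparser` drops any comments when it rewrites the file.

```python
import configparser
from pathlib import Path


def set_webserver_host(cfg_path, host="0.0.0.0"):
    """Set web_server_host in the [webserver] section of an INI-style file.

    Hypothetical helper for illustration; it rewrites the whole file.
    """
    path = Path(cfg_path)
    # ConfigParser.read() silently ignores missing files, so fail loudly here.
    if not path.is_file():
        raise FileNotFoundError(path)

    # Disable interpolation so literal '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)

    if not parser.has_section("webserver"):
        parser.add_section("webserver")
    parser.set("webserver", "web_server_host", host)

    with path.open("w") as fh:
        parser.write(fh)
    return parser.get("webserver", "web_server_host")
```

Inside the container this could be called as `set_webserver_host("/root/excalibur/excalibur.cfg")`.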

And then start the webserver using:

<pre>
$ excalibur webserver
</pre>

That's it! Now you can go to http://localhost:5000 and start extracting tabular data from your PDFs.
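
If the page does not load, a quick check from the host can tell you whether anything is answering on the published port. This is a minimal sketch using only the Python standard library; the function name `is_reachable` is hypothetical, and it only checks that some HTTP server responds, not that it is Excalibur.

```python
import urllib.error
import urllib.request


def is_reachable(url="http://localhost:5000", timeout=3.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False
```

A `False` result usually means the web server is not running, the port was not published with `-p 5000:5000`, or `web_server_host` is still bound to localhost inside the container.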
