Add how-to guides
vinayak-mehta committed Nov 5, 2018
1 parent 889a867 commit a9f59a5
Showing 12 changed files with 112 additions and 16 deletions.
14 changes: 13 additions & 1 deletion HISTORY.md
@@ -4,7 +4,19 @@ Release History
master
------

*
0.2.0 (2018-11-05)
------------------

**Improvements**

* Add MySQL and Celery support. [#8](https://github.com/camelot-dev/excalibur/pull/8) by Vinayak Mehta.
* Add table auto-detection. [#7](https://github.com/camelot-dev/excalibur/pull/7) by Vinayak Mehta.
* Add static website. [#6](https://github.com/camelot-dev/excalibur/pull/6) by Nikhil Sikka.

0.1.1 (2018-10-22)
------------------

* Add Windows and Linux executables.

0.1.0 (2018-10-20)
------------------
2 changes: 1 addition & 1 deletion Makefile
@@ -19,4 +19,4 @@ publish:

build-executable:
	pip install pyinstaller
	pyinstaller --add-data 'excalibur/www/templates:excalibur/www/templates' --add-data 'excalibur/www/static:excalibur/www/static' arthur.py
	pyinstaller --add-data 'excalibur/www/templates:excalibur/www/templates' --add-data 'excalibur/www/static:excalibur/www/static' --add-data 'excalibur/config_templates:excalibur/config_templates' arthur.py
8 changes: 7 additions & 1 deletion README.md
@@ -14,7 +14,13 @@

## Using Excalibur

After [installation with pip](https://excalibur-py.readthedocs.io/en/master/user/install.html), you can start the webserver using:
After [installation with pip](https://excalibur-py.readthedocs.io/en/master/user/install.html), you can initialize the metadata database using:

<pre>
$ excalibur initdb
</pre>

And then start the webserver using:

<pre>
$ excalibur webserver
3 changes: 1 addition & 2 deletions arthur.py
@@ -2,10 +2,9 @@

import multiprocessing

from excalibur.cli import webserver, initdb
from excalibur.cli import webserver


if __name__ == '__main__':
    multiprocessing.freeze_support()
    initdb()
    webserver()
Binary file modified docs/_static/screenshots/rule.png
Binary file modified docs/_static/screenshots/stream/rule_options.png
14 changes: 8 additions & 6 deletions docs/index.rst
@@ -3,11 +3,8 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Excalibur: A web interface for Camelot
======================================

(PDF Table Extraction for Humans)
---------------------------------
Excalibur: PDF Table Extraction for Humans
==========================================

Release v\ |version|. (:ref:`Installation <install>`)

@@ -36,7 +33,11 @@ Release v\ |version|. (:ref:`Installation <install>`)
Using Excalibur
---------------

After :ref:`installation <install>`, you can start the webserver using::
After :ref:`installation with pip <install>`, you can initialize the metadata database using::

    $ excalibur initdb

And then start the webserver using::

    $ excalibur webserver

@@ -78,6 +79,7 @@ This part of the documentation focuses on instructions to get you up and running

user/intro
user/install
user/howto
user/concepts
user/usage

71 changes: 71 additions & 0 deletions docs/user/howto.rst
@@ -0,0 +1,71 @@
.. _howto:

How-to Guides
=============

Excalibur's architecture is heavily inspired by Airflow, so you may get a feeling of déjà vu while reading this page of the documentation. See the `Airflow LICENSE`_.

.. _Airflow LICENSE: https://github.com/apache/incubator-airflow/blob/master/LICENSE

Setting Configuration Options
-----------------------------

The first time you run Excalibur, it will create a file called ``excalibur.cfg`` in your ``$EXCALIBUR_HOME`` directory (``~/excalibur`` by default). This file contains Excalibur’s configuration and you can edit it to change any of the settings.

For example, the metadata database connection string can be set in ``excalibur.cfg`` like this::

    [core]
    sql_alchemy_conn = my_conn_string
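
To check which value is currently set, the file can be read back with Python's standard ``configparser`` module. This is only a minimal sketch: the helper name ``read_conn_string`` is hypothetical and not part of Excalibur, and the default path simply follows the ``$EXCALIBUR_HOME`` / ``~/excalibur`` convention described above.

```python
import configparser
import os


def read_conn_string(cfg_path=None):
    """Return [core] sql_alchemy_conn from an excalibur.cfg file, or None.

    Hypothetical helper for illustration; not part of Excalibur's API.
    """
    if cfg_path is None:
        # Assumed default location: $EXCALIBUR_HOME, else ~/excalibur.
        home = os.environ.get('EXCALIBUR_HOME',
                              os.path.expanduser('~/excalibur'))
        cfg_path = os.path.join(home, 'excalibur.cfg')

    # Disable interpolation so a '%' in a connection string is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # silently skips files that do not exist
    return parser.get('core', 'sql_alchemy_conn', fallback=None)
```

Calling ``read_conn_string()`` with no arguments looks in the assumed default location; passing a path reads any other config file.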

Using the MySQL Database Backend
--------------------------------

Excalibur uses SQLAlchemy to connect to a database backend. By default, it stores all metadata in a SQLite database. To use MySQL, you need to first install MySQL and then create a database and a user.

Installing MySQL
^^^^^^^^^^^^^^^^

To use the MySQL database backend, you need to install Excalibur using::

    $ pip install excalibur-py[mysql]

You can install MySQL using your system's package manager. For Ubuntu::

    $ sudo apt update
    $ sudo apt install mysql-server libmysqlclient-dev

And then set it up using::

    $ mysql_secure_installation

Setup
^^^^^

Now you can create a database and a user for Excalibur::

    > CREATE DATABASE excalibur CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    > GRANT ALL ON excalibur.* TO 'excalibur'@'%' IDENTIFIED BY '1234';

Finally, you need to change the ``sql_alchemy_conn`` in ``excalibur.cfg`` to::

    [core]
    sql_alchemy_conn = mysql://excalibur:1234@localhost:3306/excalibur

And initialize the metadata database using::

    $ excalibur initdb
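
Before running ``initdb``, it can help to confirm that the connection string breaks down into the parts you expect. The sketch below uses only the standard library's ``urllib.parse``; the helper name ``describe_conn`` is hypothetical, and this is not how Excalibur or SQLAlchemy parse the URL internally.

```python
from urllib.parse import urlsplit


def describe_conn(conn):
    """Split a SQLAlchemy-style URL into its parts for a quick sanity check.

    Hypothetical helper for illustration; the password is left out on purpose.
    """
    parts = urlsplit(conn)
    return {
        'backend': parts.scheme,
        'user': parts.username,
        'host': parts.hostname,
        'port': parts.port,
        'database': parts.path.lstrip('/'),
    }


info = describe_conn('mysql://excalibur:1234@localhost:3306/excalibur')
print(info)
```

For the example string above, the backend is ``mysql``, the user and database are both ``excalibur``, and the host and port are ``localhost`` and ``3306``.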

Scaling Out with Celery
-----------------------

``CeleryExecutor`` is one of the ways you can scale out the number of workers. For this to work, you need to set up a Celery backend (RabbitMQ, Redis, …) and change your ``excalibur.cfg`` to point the executor parameter to ``CeleryExecutor`` and provide the related Celery settings.

For more information about setting up a Celery broker, refer to the exhaustive `Celery documentation on the topic`_.

.. _Celery documentation on the topic: http://docs.celeryproject.org/en/latest/getting-started/brokers/index.html

To kick off a worker, you need to set up Excalibur and run the worker subcommand::

    $ excalibur worker

Your worker should start picking up tasks as soon as they get fired in its direction.
7 changes: 6 additions & 1 deletion excalibur/cli.py
@@ -4,7 +4,7 @@

import click

from . import __version__
from . import __version__, settings
from . import configuration as conf
from .operators.python_operator import PythonOperator
from .tasks import split, extract
@@ -39,6 +39,11 @@ def resetdb(*args, **kwargs):

@cli.command('webserver')
def webserver(*args, **kwargs):
    if conf.USING_SQLITE:
        sqlite_path = settings.SQL_ALCHEMY_CONN.replace('sqlite:///', '')
        if not os.path.isfile(sqlite_path):
            initialize_database()

    app = create_app(conf)
    app.run(use_reloader=False)
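
The check added to ``webserver`` above can be read as a small standalone predicate. The following is a sketch under assumptions: ``sqlite_needs_init`` is a hypothetical name, not a function in this commit, and it only mirrors the ``startswith('sqlite')`` and ``os.path.isfile`` logic shown in the diff.

```python
import os


def sqlite_needs_init(conn):
    """Return True when conn is a SQLite URL whose database file is missing.

    Hypothetical helper mirroring the webserver check; not part of the commit.
    """
    if not conn.startswith('sqlite'):
        # Other backends (e.g. MySQL) are initialized explicitly with initdb.
        return False
    # Strip the URL prefix to get the filesystem path, as the diff does.
    sqlite_path = conn.replace('sqlite:///', '')
    return not os.path.isfile(sqlite_path)
```

Under this reading, the metadata database is auto-initialized only for the SQLite default, which is presumably why ``arthur.py`` no longer needs to call ``initdb()`` itself.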

1 change: 1 addition & 0 deletions excalibur/configuration.py
@@ -134,6 +134,7 @@ def parameterized_config(template):
SECRET_KEY = conf.get('webserver', 'SECRET_KEY')
PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))
PDFS_FOLDER = os.path.join(PROJECT_ROOT, 'www/static/uploads')
USING_SQLITE = True if conf.get('core', 'SQL_ALCHEMY_CONN').startswith('sqlite') else False

get = conf.get
has_option = conf.has_option
3 changes: 2 additions & 1 deletion excalibur/executors/celery_executor.py
@@ -21,7 +21,8 @@

app = Celery(
    conf.get('celery', 'CELERY_APP_NAME'),
    config_source=celery_configuration)
    config_source=celery_configuration,
    fixups=[])


@app.task
5 changes: 2 additions & 3 deletions setup.py
@@ -14,14 +14,14 @@

requires = [
    'camelot-py[cv]>=0.2.3',
    'celery>=4.1.1',
    'Click>=7.0',
    'configparser>=3.5.0, <3.6.0',
    'Flask>=1.0.2',
    'SQLAlchemy>=1.2.12'
]
celery = ['celery>=4.1.1']
mysql = ['mysqlclient>=1.3.6']
all_requires = requires + celery + mysql
all_requires = requires + mysql
dev_requires = [
    'codecov>=2.0.15',
    'pytest>=3.8.0',
@@ -46,7 +46,6 @@ def setup_package():
install_requires=requires,
extras_require={
'all': all_requires,
'celery': celery,
'mysql': mysql,
'dev': dev_requires
},