spacyapp is an NLP service provided by ACDH. It is built around spaCy, but extends spaCy's functionality and provides an easy-to-use web service.
The service is currently under heavy development. So far it provides:
- REST API endpoints for all services
- an endpoint that provides a standard spaCy pipeline
- an endpoint that uses spaCy to extract named entities
- an endpoint that returns POS tags for the tokens provided
- a pipeline endpoint that allows batch processing of TEI documents:
  - accepts a ZIP of TEI documents
  - uses the xtx tokenizer developed at ACDH to tokenize TEI documents while preserving existing tags
  - lets you choose between a TreeTagger-based service (also developed at ACDH) and spaCy for POS tagging
  - provides the processed files as zipped TEIs
  - notifies logged-in users by email when their job is finished (processing a lot of TEI files can take a while)
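The batch endpoint expects a ZIP of TEI documents. A minimal sketch of how such an archive could be assembled with Python's standard library follows. The helper name `build_tei_zip` and the sample files are hypothetical, and the upload itself is left out because the endpoint's URL and parameters are not described here.

```python
import io
import zipfile


def build_tei_zip(documents):
    """Pack TEI documents into an in-memory ZIP archive.

    `documents` maps a file name to its XML content (str).
    Returns the archive as bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, xml in documents.items():
            # writestr encodes str content as UTF-8
            archive.writestr(name, xml)
    return buffer.getvalue()


# Hypothetical sample input: two tiny TEI files
sample = {
    "doc1.xml": "<TEI><text><body><p>Erster Text.</p></body></text></TEI>",
    "doc2.xml": "<TEI><text><body><p>Zweiter Text.</p></body></text></TEI>",
}
payload = build_tei_zip(sample)
```

Building the archive in memory avoids temporary files; for large collections, writing to a file on disk may be the better choice.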
Have a look at https://spacyapp.acdh.oeaw.ac.at/ for a running version.
- clone the repo
- set up a virtual environment (optional)
- install the required packages: `pip install -r requirements.txt`
spacyapp uses modularized settings. To start the development server you'll need to add a settings parameter, e.g. `python manage.py runserver --settings=spacyapp.settings.dev`
Settings for Celery are stored in `celery_settings.py`. Celery depends on the Django settings. You can either
- provide them as environment variables (TODO: add example)
- or adapt the line `os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'spacyapp.settings.dev_custom')` in `celery_settings.py`
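As an illustration of how the two options interact (a sketch, not taken from the project): `os.environ.setdefault` only sets `DJANGO_SETTINGS_MODULE` when it is not already defined, so a value provided through the environment takes precedence over the line in celery_settings.py. The module name `spacyapp.settings.my_server` is a hypothetical placeholder.

```python
import os

# Option 1: the variable is already set in the environment, e.g. exported in
# the shell before starting the worker (hypothetical settings module name).
os.environ["DJANGO_SETTINGS_MODULE"] = "spacyapp.settings.my_server"

# The line from celery_settings.py: setdefault leaves an existing value untouched.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "spacyapp.settings.dev_custom")
from_environment = os.environ["DJANGO_SETTINGS_MODULE"]

# Option 2: the variable is not set, so the default from celery_settings.py applies.
del os.environ["DJANGO_SETTINGS_MODULE"]
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "spacyapp.settings.dev_custom")
from_default = os.environ["DJANGO_SETTINGS_MODULE"]

print(from_environment)  # spacyapp.settings.my_server
print(from_default)      # spacyapp.settings.dev_custom
```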
- start spacyapp: `python manage.py runserver --settings=spacyapp.settings.dev`
- start a Celery worker: `celery -A celery_settings worker --loglevel=info`
  - on Windows you'll need to add `--pool=solo`
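If the worker is started from a script, the Windows-specific flag can be added conditionally. The following is a minimal sketch; the helper name `celery_worker_command` is hypothetical and not part of spacyapp.

```python
import sys


def celery_worker_command(platform=sys.platform):
    """Return the Celery worker command as an argument list for the given platform."""
    command = ["celery", "-A", "celery_settings", "worker", "--loglevel=info"]
    if platform.startswith("win"):
        # Windows needs the solo pool, as noted above
        command.append("--pool=solo")
    return command


# Show the command for the platform this script runs on
print(" ".join(celery_worker_command()))
```

The resulting list could be passed to `subprocess.run` from the project directory.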
spacyal is a Python package for training your own spaCy language models using active learning. To plug spacyal into spacyapp you'll need to:
- install the package: `pip install spacyal`
- add `spacyal` to your project's `INSTALLED_APPS`, e.g. in `spacyapp/settings/base.py`
- add `spacyal.urls` and `spacyal.api_urls` to your project's main URL definition `spacyapp/urls.py`, something like
from django.urls import include, path

urlpatterns = [
    ...
    path('spacyal_api/', include('spacyal.api_urls')),
    path('spacyal/', include('spacyal.urls')),
    ...
]
- run `python manage.py migrate --settings=spacyapp.settings.your_custom_settings`
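The `INSTALLED_APPS` step can be sketched as follows. The starting list is only a stand-in for whatever `spacyapp/settings/base.py` actually defines, which is not shown here.

```python
# Stand-in for the list defined in spacyapp/settings/base.py (assumed content)
INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
]

# Register spacyal only once, so repeated edits do not duplicate the entry
if "spacyal" not in INSTALLED_APPS:
    INSTALLED_APPS.append("spacyal")
```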
For further information, please refer to the spacyal project.