This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

spacyapp is a small web application that exposes spaCy's NLP functionality over HTTP.

acdh-oeaw/spacyapp

nlp - An NLP-App/Service

spacyapp is an NLP service provided by ACDH. It is built around spaCy, but extends spaCy's functionality and provides an easy-to-use web service.

Our service is still under heavy development, but so far it provides:

  • REST API endpoints for all services
  • an endpoint that provides a standard spaCy pipeline
  • an endpoint that uses spaCy to extract named entities
  • an endpoint that returns POS tags for the tokens provided
  • a pipeline endpoint that allows batch processing of TEI documents:
    • accepts a ZIP of TEI documents
    • uses the xtx tokenizer developed at ACDH to tokenize TEI documents while preserving existing tags
    • lets you choose between a Treetagger-based service (also developed at ACDH) and spaCy for POS tagging
    • provides the processed files as zipped TEIs
    • notifies logged-in users by email when their job is finished (processing a lot of TEI files can take a while)
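
The batch workflow above (a ZIP of TEI documents in, a ZIP of processed documents out) can be sketched with the standard library alone. This is only an illustration and not spacyapp's actual implementation: `process_tei` and `process_zip` are hypothetical names, `process_tei` is a stub standing in for the tokenizing and tagging step, and UTF-8 encoded `.xml` members are assumed.

```python
import io
import zipfile


def process_tei(xml_text: str) -> str:
    """Hypothetical placeholder for the tokenizing and POS-tagging step."""
    # A real pipeline would call the tokenizer and tagger here;
    # this stub returns the document unchanged.
    return xml_text


def process_zip(zip_bytes: bytes) -> bytes:
    """Apply process_tei to every .xml member and return a new ZIP archive."""
    out_buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out_buffer, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip directories and non-XML files
            text = src.read(name).decode("utf-8")  # assumption: UTF-8 input
            dst.writestr(name, process_tei(text))
    return out_buffer.getvalue()
```

Keeping the archive in memory keeps the sketch short; for large uploads a temporary file would probably be the safer choice.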

Have a look at https://spacyapp.acdh.oeaw.ac.at/ for a running version.

install

  • clone the repo
  • set up a virtual environment (optional)
  • install the required packages (pip install -r requirements.txt)

customize settings

spacyapp uses modularized settings. To start the development server you'll need to pass a settings parameter, e.g. python manage.py runserver --settings=spacyapp.settings.dev

celery settings

Settings for celery are stored in celery_settings.py. Celery depends on the Django settings, which you can provide in one of two ways:

  • as environment variables (TODO Add example)
  • or by adapting the following line in celery_settings.py: os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'spacyapp.settings.dev_custom')
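
As a minimal sketch of how the two options interact: `os.environ.setdefault` only takes effect when the variable is not already set, so a `DJANGO_SETTINGS_MODULE` provided by the environment takes precedence over the default in celery_settings.py. Here the environment variable is set from Python purely for illustration (in practice it would be exported in the shell), and `active_settings` is a name introduced for this example.

```python
import os

# Option 1: the variable is already present in the environment, which has
# the same effect as exporting DJANGO_SETTINGS_MODULE before starting celery.
os.environ["DJANGO_SETTINGS_MODULE"] = "spacyapp.settings.dev"

# Option 2: the default from celery_settings.py. setdefault() is a no-op
# when the variable is already set, so option 1 wins here.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "spacyapp.settings.dev_custom")

active_settings = os.environ["DJANGO_SETTINGS_MODULE"]
print(active_settings)  # prints spacyapp.settings.dev
```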

start the app

  • start spacyapp: python manage.py runserver --settings=spacyapp.settings.dev
  • start a celery worker: celery -A celery_settings worker --loglevel=info
    • on Windows you'll need to add --pool=solo
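
If you start the worker from a script, the Windows special case can be handled in one place. The helper below is a hypothetical sketch (`celery_worker_command` is not part of spacyapp); it only assembles the command line given above and does not launch anything.

```python
import sys


def celery_worker_command(platform: str = sys.platform) -> list:
    """Build the celery worker command line described above."""
    command = ["celery", "-A", "celery_settings", "worker", "--loglevel=info"]
    if platform.startswith("win"):
        # On Windows the worker needs the solo pool.
        command.append("--pool=solo")
    return command


print(" ".join(celery_worker_command()))
```

The resulting list can be passed to `subprocess.run` as is.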

Spacy Active learning (spacyal)

spacyal is a Python package for training your own spaCy language models using active learning. To plug spacyal into spacyapp you'll need to:

  • install the package: pip install spacyal
  • add spacyal to your project's INSTALLED_APPS, e.g. in spacyapp/settings/base.py
  • add spacyal.urls and spacyal.api_urls to your project's main URL definition in spacyapp/urls.py, something like
from django.urls import include, path

urlpatterns = [
    ...
    path('spacyal_api/', include('spacyal.api_urls')),
    path('spacyal/', include('spacyal.urls')),
    ...
]
  • run python manage.py migrate --settings=spacyapp.settings.your_custom_settings
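
For the INSTALLED_APPS step, a settings excerpt might look like the sketch below. The pre-existing entries are placeholders (standard Django apps), not the actual contents of spacyapp/settings/base.py; only the added `spacyal` entry comes from the instructions above.

```python
# Excerpt in the style of spacyapp/settings/base.py.
# The existing entries are placeholders for whatever the project already lists.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
]

# Register spacyal so Django picks up its models, migrations and templates.
INSTALLED_APPS += ["spacyal"]
```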

For further information about spacyal please refer to spacyal