This repository hosts and tracks a customized Docker image of the traditional Stanford CoreNLP server. The build is intended to serve as a dedicated service for personal use or small research teams.
A ready-to-use image is provided on Docker Hub, along with deployment instructions and the option to download and customize the image through the Dockerfile using simple build instructions. The currently supported version is 4.5.7; however, it is possible to change the `VERSION` variable in the Dockerfile.
Also, deployment testing is provided over plain HTTP, both via the `curl` command and via the official Python library `stanza`.
DISCLAIMER: this is not an official documentation guide.
To deploy the prebuilt Docker image, two options are provided: the `docker` command or the `docker compose` tool from the CLI.
Run the `docker` command directly, as in the following example, which changes two variables. For more information, check the Official Documentation.
docker run -e JAVA_XMX=12g -e ANNOTATORS=tokenize,ssplit,parse -p 9000:9000 d1egoprog/stanford-corenlp
Download the prepared `compose.yaml` file from the repository via `wget` and start the service with the utility.
wget https://raw.githubusercontent.com/d1egoprog/docker-stanford-corenlp/main/compose.yaml
docker compose up -d
Happy hacking!! 🖖🖖.
To check the functionality, you can open a web browser window to your Docker engine IP and the chosen service port, e.g., `PORT=9000`, generally at localhost:9000.
Also, to test your local or remote service, send the following `curl` request from your preferred CLI.
curl --data 'The quick brown fox jumped over the lazy dog.' 'http://localhost:9000/?properties={%22annotators%22%3A%22tokenize%2Cssplit%2Cpos%22%2C%22outputFormat%22%3A%22json%22}' -o -
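The percent-encoded `properties` parameter in the URL above can also be built programmatically instead of being written by hand. A minimal Python sketch (the endpoint and port follow the defaults used in this guide):

```python
import json
from urllib.parse import quote

# Annotation properties matching the curl example above.
props = {"annotators": "tokenize,ssplit,pos", "outputFormat": "json"}

# Compact JSON, then percent-encode it for use as a query parameter.
encoded = quote(json.dumps(props, separators=(",", ":")), safe="")
url = f"http://localhost:9000/?properties={encoded}"
print(url)
```

The resulting `url` can then be used with `curl` or any HTTP client to POST the text to be annotated.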
To use the official Python library `stanza`, a small example has been prepared in a Jupyter notebook, stanza-example.
For more information on the configuration and functionality of the Stanford CoreNLP server, use the official documentation.
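Whichever client you use, the server's JSON output has the same overall shape: a list of sentences, each with a list of tokens. A small sketch of pulling out tokens and part-of-speech tags from such a response (the `sample_response` below is a hand-written illustration, not actual server output; real responses carry more fields per token):

```python
import json

# Illustrative (hand-written) fragment of a CoreNLP JSON response.
sample_response = """
{"sentences": [{"index": 0, "tokens": [
  {"index": 1, "word": "The", "pos": "DT"},
  {"index": 2, "word": "quick", "pos": "JJ"},
  {"index": 3, "word": "fox", "pos": "NN"}
]}]}
"""

doc = json.loads(sample_response)

# Flatten all sentences into (word, POS tag) pairs.
tagged = [(t["word"], t["pos"])
          for sentence in doc["sentences"]
          for t in sentence["tokens"]]
print(tagged)  # [('The', 'DT'), ('quick', 'JJ'), ('fox', 'NN')]
```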
To build the image locally, clone the repository:
git clone https://github.com/d1egoprog/docker-stanford-corenlp.git
Use the `docker` CLI tool to build and run:
docker build -t stanford-corenlp:4.5.7 docker-stanford-corenlp/.
docker run -p 9000:9000 stanford-corenlp:4.5.7
All the JVM parameters can be adjusted by editing the Dockerfile and rebuilding the image. By default, the configured parameters are:
ENV JAVA_XMX 8G
ENV ANNOTATORS all
ENV TIMEOUT_MILLISECONDS 60000
ENV THREADS 5
ENV MAX_CHAR_LENGTH 100000
ENV PORT 9000
If you do not want to edit the Dockerfile, the environment variables can be overridden from the `docker run` command, e.g., changing the JVM memory parameter `JAVA_XMX` to reserve more memory.
docker run -e JAVA_XMX=12g -p 9000:9000 stanford-corenlp:4.5.7
If preferred, a Docker Compose file is also available with the standard build from the Dockerfile and an override configuration (the same parameters as the Dockerfile); change it to set your desired annotators and specific computing requirements. To run the service, run the command:
docker compose -f build.yaml up -d
Or use the Compose file to build the Docker image by storing the following in a new `build.yaml` file. It is also possible to override the variables, e.g., changing the JVM memory parameter `JAVA_XMX` to reserve more memory, or changing `ANNOTATORS` to select the tasks performed by the server.
services:
  stanford_corenlp:
    image: d1egoprog/stanford-corenlp
    ports:
      - "9000:9000"
    environment:
      - JAVA_XMX=12G
      - ANNOTATORS=tokenize
    restart: always
If you have any questions about deployment, or if you find an error, please contact me or open an issue on the GitHub repository Issues page. Contributions are always welcome.