An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
PLEASE NOTE that AllenNLP is currently pre-release! There are many rough edges. We will have our first release in early September.
The fastest way to get an environment to run AllenNLP is with Docker. Once you have installed Docker, just run
docker run -it --rm allennlp/allennlp-cpu
to get an environment that will run on the CPU (use allennlp-gpu if you have a CUDA-supported GPU).
Now you can do any of the following:
- Run a model on example sentences with allennlp/run bulk.
- Start a web service to host our models with allennlp/run serve.
- Interactively code against AllenNLP from the Python interpreter by running python.
Built on PyTorch, AllenNLP makes it easy to design and evaluate new deep learning models for nearly any NLP problem, along with the infrastructure to easily run them in the cloud or on your laptop. AllenNLP was designed with the following principles:
- Hyper-modular and lightweight. Use the parts you like seamlessly with PyTorch.
- Extensively tested and easy to extend. Test coverage is above 90% and the example models provide a template for contributions.
- Take padding and masking seriously, making it easy to implement correct models without the pain.
- Experiment-friendly. Run reproducible experiments from a JSON specification with comprehensive logging.
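To make the padding-and-masking principle concrete, here is a minimal pure-Python sketch. It is not AllenNLP's API: the helper names pad_batch and masked_mean, and the choice of 0 as the padding id, are hypothetical. It pads variable-length sequences into a rectangular batch, records which positions hold real tokens, and uses that mask so padding never leaks into an average.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical: index 0 is reserved for padding


def pad_batch(
    sequences: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id sequences to the longest length and build a 0/1 mask."""
    max_len = max((len(seq) for seq in sequences), default=0)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks a padding position.
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


def masked_mean(values: List[List[float]], mask: List[List[int]]) -> List[float]:
    """Average each row over real (mask == 1) positions only, ignoring padding."""
    means = []
    for row, row_mask in zip(values, mask):
        total = sum(v * m for v, m in zip(row, row_mask))
        count = sum(row_mask)
        means.append(total / count if count else 0.0)
    return means


if __name__ == "__main__":
    padded, mask = pad_batch([[5, 8, 2], [7]])
    print(padded)  # [[5, 8, 2], [7, 0, 0]]
    print(mask)    # [[1, 1, 1], [1, 0, 0]]
    print(masked_mean([[1.0, 2.0, 3.0], [4.0, 9.0, 9.0]], mask))  # [2.0, 4.0]
```

Without the mask, the second row would average to about 7.33 instead of 4.0, because the values sitting in padded positions would be counted as if they were real.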
AllenNLP includes reference implementations of high-quality models for Semantic Role Labelling, Question Answering (BiDAF), Entailment (decomposable attention), and more.
AllenNLP is built and maintained by the Allen Institute for Artificial Intelligence, in close collaboration with researchers at the University of Washington and elsewhere. With a dedicated team of best-in-field researchers and software engineers, the AllenNLP project is uniquely positioned to provide state-of-the-art models with high-quality engineering.
| Component | Description |
| --- | --- |
| allennlp | an open-source NLP research library, built on PyTorch |
| allennlp.commands | functionality for a CLI and web service |
| allennlp.data | a data processing module for loading datasets and encoding strings as integers for representation in matrices |
| allennlp.models | a collection of state-of-the-art models |
| allennlp.modules | a collection of PyTorch modules for use with text |
| allennlp.nn | tensor utility functions, such as initializers and activation functions |
| allennlp.service | a web server to serve our demo and API |
| allennlp.training | functionality for training models |
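The allennlp.data row mentions encoding strings as integers. As a rough sketch of that idea only (the ToyVocabulary class, its method names, and the reserved <pad> and <unk> tokens are hypothetical, not the module's real API), a vocabulary assigns each distinct token an integer and maps unseen tokens to a reserved unknown id:

```python
from typing import Dict, Iterable, List


class ToyVocabulary:
    """Hypothetical string-to-integer mapping; not allennlp.data's actual API."""

    def __init__(self, pad_token: str = "<pad>", unk_token: str = "<unk>") -> None:
        # Reserve index 0 for padding and index 1 for out-of-vocabulary tokens.
        self.token_to_index: Dict[str, int] = {pad_token: 0, unk_token: 1}
        self.index_to_token: List[str] = [pad_token, unk_token]
        self.unk_index = 1

    def add_tokens(self, tokens: Iterable[str]) -> None:
        """Assign the next free integer to each previously unseen token."""
        for token in tokens:
            if token not in self.token_to_index:
                self.token_to_index[token] = len(self.index_to_token)
                self.index_to_token.append(token)

    def encode(self, tokens: Iterable[str]) -> List[int]:
        """Map tokens to ids, falling back to the unknown id for unseen tokens."""
        return [self.token_to_index.get(t, self.unk_index) for t in tokens]

    def decode(self, ids: Iterable[int]) -> List[str]:
        """Map ids back to their tokens."""
        return [self.index_to_token[i] for i in ids]


if __name__ == "__main__":
    vocab = ToyVocabulary()
    vocab.add_tokens("the cat sat on the mat".split())
    print(vocab.encode("the dog sat".split()))  # [2, 1, 4]
```

Reserving index 0 for padding means padded positions in a batch never collide with the id of a real token.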
Conda will set up a virtual environment with the exact version of Python used for development along with all the dependencies needed to run AllenNLP.
- Create a Conda environment with Python 3.5.
  conda create -n allennlp python=3.5
- Now activate the Conda environment.
  source activate allennlp
- Install the required dependencies.
  INSTALL_TEST_REQUIREMENTS="true" ./scripts/install_requirements.sh
- Visit http://pytorch.org/ and install the relevant PyTorch package.
- Set the PYTHONHASHSEED environment variable for repeatable experiments.
  export PYTHONHASHSEED=2157
You should now be able to test your installation with pytest -v. Congratulations!
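Fixing PYTHONHASHSEED matters because Python randomizes string hashes per process by default, which can change things like the iteration order of a set of strings from one run to the next. The standard-library check below (the helper name hash_in_fresh_interpreter is hypothetical) starts two fresh interpreters with the same seed and confirms that they agree:

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text: str, seed: str) -> int:
    """Return hash(text) as computed by a new Python process with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    a = hash_in_fresh_interpreter("allennlp", "2157")
    b = hash_in_fresh_interpreter("allennlp", "2157")
    # With a fixed seed, every process computes the same string hash.
    print(a == b)  # True
```

If you use a different seed value, or leave the variable unset, the printed hash will generally be different.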
Docker provides a container with everything set up to run AllenNLP, whether you will use a GPU or just a CPU. Compared with a Conda environment, Docker provides more isolation and consistency, and it also makes it easy to distribute your environment to a compute cluster.
It is easy to run a pre-built Docker development environment. AllenNLP is configured with Docker Cloud to build a new image on every update to the master branch. To download an image from Docker Hub:
docker pull allennlp/allennlp-cpu:latest
You can alternatively download an environment set up to use a GPU.
docker pull allennlp/allennlp-gpu:latest
These instructions create a Docker environment that uses the CPU. To use the GPU, follow the same steps but substitute gpu for cpu. The following command takes some time, because it builds the complete environment needed to run AllenNLP:
docker build --file Dockerfile.cpu --tag allennlp/allennlp-cpu .
You should now see this image listed when you run docker images allennlp/allennlp-cpu:
REPOSITORY TAG IMAGE ID CREATED SIZE
allennlp/allennlp-cpu latest b66aee6cb593 5 minutes ago 2.38GB
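The cpu-to-gpu substitution in the build step can be made explicit with a small Python helper that only assembles the command. docker_build_command is hypothetical, and it assumes a Dockerfile.gpu sits next to Dockerfile.cpu, as the substitution rule implies:

```python
import shlex
from typing import List


def docker_build_command(device: str = "cpu") -> List[str]:
    """Return the argv for building the AllenNLP image for the given device.

    Assumes a Dockerfile.<device> exists at the repository root for each device.
    """
    if device not in ("cpu", "gpu"):
        raise ValueError(f"device must be 'cpu' or 'gpu', got {device!r}")
    return [
        "docker", "build",
        "--file", f"Dockerfile.{device}",
        "--tag", f"allennlp/allennlp-{device}",
        ".",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to execute it.
    print(shlex.join(docker_build_command("gpu")))
```

Keeping the command as an argument list (rather than a single string) means it can be handed to subprocess.run without shell quoting issues.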
You can run the image with docker run --rm -it allennlp/allennlp-cpu. The --rm flag removes the container when it exits, and the -it flags make the session interactive so you can use the bash shell the Docker image starts.
The Docker environment uses Conda to install Python and automatically enters the Conda environment "allennlp".
You can test your installation by running pytest -v.
Kubernetes will deploy your Docker images into the cloud, so you can have a reproducible development environment on AWS.
- Set up kubectl to connect to your Kubernetes cluster.
- Create a "job" on the cluster, which you can later connect to using bash:
  kubectl create -f /path/to/kubernetes-dev-environment.yaml
  Note that the job runs the most recently pushed Docker image, so its source code may not match what you have locally.
- Retrieve the name of the pod that was created. The pod name will be your job name followed by some additional characters:
  kubectl describe job <JOBNAME> --namespace=allennlp
- Get a shell inside the container:
  kubectl exec -it <PODNAME> -- bash
- When you are done, don't forget to delete your job:
  kubectl delete -f /path/to/kubernetes-dev-environment.yaml
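The pod-lookup step relies on the pod name being the job name plus a generated suffix. A hypothetical helper (pods_for_job) that applies that rule to a list of pod names, for example names copied from kubectl get pods --namespace=allennlp, could look like this:

```python
import re
from typing import List


def pods_for_job(job_name: str, pod_names: List[str]) -> List[str]:
    """Return pod names that are exactly the job name plus one generated suffix."""
    # Assumption: the generated suffix is lowercase alphanumeric with no further dashes,
    # so a pod from a different job such as "<job_name>-extra-abcde" is not matched.
    pattern = re.compile(re.escape(job_name) + r"-[a-z0-9]+")
    return [name for name in pod_names if pattern.fullmatch(name)]


if __name__ == "__main__":
    pods = ["allennlp-dev-x7k2p", "allennlp-dev-extra-q9w3z", "other-job-abcde"]
    print(pods_for_job("allennlp-dev", pods))  # ['allennlp-dev-x7k2p']
```

Name matching is only a heuristic. Kubernetes typically also labels a Job's pods with job-name=<JOBNAME>, so filtering with kubectl get pods --selector=job-name=<JOBNAME> is a sturdier alternative.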
AllenNLP is an open-source project backed by the Allen Institute for Artificial Intelligence (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering. To learn more about who specifically contributed to this codebase, see our contributors page.