An open-source NLP research library, built on PyTorch.
Latest commit 55b9bd0 Jan 21, 2019

An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.

Package Overview

allennlp           an open-source NLP research library, built on PyTorch
allennlp.commands  functionality for a CLI and web service
allennlp.data      a data processing module for loading datasets and encoding strings as integers for representation in matrices
allennlp.models    a collection of state-of-the-art models
allennlp.modules   a collection of PyTorch modules for use with text
allennlp.nn        tensor utility functions, such as initializers and activation functions
allennlp.service   a web server that can serve demos for your models
allennlp.training  functionality for training models

Installation

AllenNLP requires Python 3.6.1 or later. The preferred way to install AllenNLP is via pip. Just run pip install allennlp in your Python environment and you're good to go!
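Before installing, it can help to confirm that the interpreter meets the stated minimum. The following is a minimal sketch, not part of AllenNLP: the helper name `python_is_supported` and the constant `MIN_PYTHON` are made up for illustration, and only the 3.6.1 threshold comes from this README.

```python
import sys

# Minimum version stated in this README (hypothetical constant name).
MIN_PYTHON = (3, 6, 1)


def python_is_supported(version_info=None):
    """Return True if the given (or the running) interpreter is new enough."""
    if version_info is None:
        version_info = sys.version_info
    # Compare (major, minor, micro) only; tuples compare element by element.
    return tuple(version_info[:3]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old for AllenNLP"
    print("Python {}: {}".format(current, status))
```

Run it with the same interpreter you plan to use for `pip install allennlp`.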

If you need pointers on setting up an appropriate Python environment or would like to install AllenNLP using a different method, see below.

Windows is currently not officially supported, although we try to fix issues when they are easily addressed.

Installing via pip

Setting up a virtual environment

Conda can be used to set up a virtual environment with the version of Python required for AllenNLP. If you already have a Python 3.6 or 3.7 environment you want to use, you can skip to the 'Installing the library and dependencies' section.

  1. Download and install Conda.

  2. Create a Conda environment with Python 3.6:

    conda create -n allennlp python=3.6

  3. Activate the Conda environment. You will need to activate the Conda environment in each terminal in which you want to use AllenNLP.

    source activate allennlp

Installing the library and dependencies

Installing the library and dependencies is simple using pip.

pip install allennlp

That's it! You're now ready to build and train AllenNLP models. Installing the Python package also installs an allennlp script, so you can run AllenNLP commands just by typing allennlp into a terminal.

You can now test your installation with allennlp test-install.
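As a quick preliminary check before running the full test-install command, you can ask Python whether the package is importable and whether the `allennlp` script is on your PATH. This is a minimal sketch using only the standard library; `check_install` is a hypothetical helper, not an AllenNLP function, and it does not replace `allennlp test-install`.

```python
import importlib.util
import shutil


def check_install(package="allennlp", command="allennlp"):
    """Report whether a package is importable and its script is on PATH."""
    return {
        # find_spec returns None when a top-level package cannot be found.
        "importable": importlib.util.find_spec(package) is not None,
        # shutil.which returns the script's full path, or None if missing.
        "command_path": shutil.which(command),
    }


if __name__ == "__main__":
    print(check_install())
```

If `importable` is true but `command_path` is `None`, the package is probably installed in an environment that is not active in the current terminal.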

pip currently installs PyTorch built for CUDA 9 only (or without GPU support). If you require a build for an older CUDA version, please visit http://pytorch.org/ and install the relevant PyTorch binary.
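To see which PyTorch build ended up in your environment, a short check along these lines may help. It is a sketch only: `describe_pytorch` is a hypothetical helper name, and it returns `None` instead of failing when PyTorch is not installed.

```python
def describe_pytorch():
    """Return basic facts about the installed PyTorch, or None if absent."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch_version": torch.__version__,
        # CUDA version the binary was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_pytorch())
```

A `cuda_build` value that does not match the CUDA version on your machine suggests you need a different PyTorch binary.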

Installing using Docker

Docker provides a container with everything set up to run AllenNLP, whether you will use a GPU or just run on a CPU. Docker provides more isolation and consistency, and also makes it easy to distribute your environment to a compute cluster.

Once you have installed Docker, just run the following command to get an environment that will run on either the CPU or GPU.

docker run -it -p 8000:8000 --rm allennlp/allennlp:v0.8.1

You can test the Docker environment with docker run -it -p 8000:8000 --rm allennlp/allennlp:v0.8.1 test-install.
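If you launch the container from a script, the two commands above differ only in the trailing arguments. The sketch below assembles the same argument list in Python; `docker_run_argv` is a hypothetical helper, and the tag and port defaults are simply the values used in this README.

```python
import shlex


def docker_run_argv(tag="v0.8.1", port=8000, extra_args=()):
    """Build the `docker run` argument list shown in this README."""
    argv = [
        "docker", "run",
        "-it",                           # interactive session with a terminal
        "-p", "{0}:{0}".format(port),    # publish the port on the host
        "--rm",                          # remove the container when it exits
        "allennlp/allennlp:{}".format(tag),
    ]
    # Anything after the image name is passed through to the container.
    argv.extend(extra_args)
    return argv


if __name__ == "__main__":
    argv = docker_run_argv(extra_args=["test-install"])
    # Print a shell-safe rendering of the command rather than running it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The list can be handed to `subprocess.run` as-is once Docker is installed.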

Installing from source

You can also install AllenNLP by cloning our git repository:

git clone https://github.com/allenai/allennlp.git

Create a Python 3.6 virtual environment, and install the necessary requirements by running:

INSTALL_TEST_REQUIREMENTS=true scripts/install_requirements.sh

Change the flag to false if you don't want to be able to run tests. Once the requirements have been installed, run:

pip install --editable .

This installs the AllenNLP library in editable mode into your environment. It makes allennlp available on your system, but it uses the sources from your local clone of the repository.
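One way to confirm that an editable install resolves to your clone is to look at where Python finds the package. This is a minimal standard-library sketch; `package_directory` is a hypothetical helper intended for regular packages such as `allennlp`.

```python
import importlib.util
import os


def package_directory(name):
    """Return the directory a regular package is loaded from, or None."""
    spec = importlib.util.find_spec(name)
    # spec is None if the package is missing; origin is None for namespaces.
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(os.path.abspath(spec.origin))


if __name__ == "__main__":
    # After `pip install --editable .`, this is expected to point inside
    # the cloned repository rather than site-packages.
    print(package_directory("allennlp"))
```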

You can test your installation with allennlp test-install. The full development environment also requires the JVM and Perl, which must be installed separately. ./scripts/verify.py will run the full suite of tests used by our continuous build environment.

Running AllenNLP

Once you've installed AllenNLP, you can run the command-line interface with the allennlp command (whether you installed from pip or from source).

$ allennlp
Run AllenNLP

optional arguments:
  -h, --help    show this help message and exit
  --version     show program's version number and exit

Commands:

    configure   Generate configuration stubs.
    train       Train a model
    evaluate    Evaluate the specified model + dataset
    predict     Use a trained model to make predictions.
    make-vocab  Create a vocabulary
    elmo        Create word vectors using a pretrained ELMo model.
    fine-tune   Continue training a model on a new dataset
    dry-run     Create a vocabulary, compute dataset statistics and other
                training utilities.
    test-install
                Run the unit tests.
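The same commands can be driven from Python, for example in a job script. The following is a sketch under the assumption that the `allennlp` script is on your PATH; `run_cli` is a hypothetical wrapper, not part of the library, and it uses only the `--help` and `--version` options shown above.

```python
import shutil
import subprocess


def run_cli(args, command="allennlp"):
    """Run a command-line tool; return (exit_code, output) or None if missing."""
    if shutil.which(command) is None:
        return None
    completed = subprocess.run(
        [command] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # fold error messages into the output
        universal_newlines=True,    # decode bytes to str
    )
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    result = run_cli(["--version"])
    if result is None:
        print("allennlp command not found on PATH")
    else:
        exit_code, output = result
        print(exit_code, output.strip())
```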

Docker images

AllenNLP releases Docker images to Docker Hub for each release. For information on how to run these releases, see Installing using Docker.

Building a Docker image

For various reasons you may need to create your own AllenNLP Docker image. The same image can be used either with a CPU or a GPU.

First, you need to install Docker. Then run the following command (it will take some time, as it completely builds the environment needed to run AllenNLP):

docker build -f Dockerfile.pip --tag allennlp/allennlp:latest .

You should now be able to see this image listed by running docker images allennlp.

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
allennlp/allennlp   latest              b66aee6cb593        5 minutes ago       2.38GB

Running the Docker image

You can run the image with docker run --rm -it allennlp/allennlp:latest. The --rm flag removes the container on exit, and the -it flags make the session interactive so you can use the bash shell the Docker image starts.

You can test your installation by running allennlp test-install.

Issues

Everyone is welcome to file issues with feature requests, bug reports, or general questions. As a small team with our own internal goals, we may ask for contributions if a prompt fix doesn't fit into our roadmap. We allow users a two-week window to follow up on questions, after which we will close issues. They can be re-opened if there is further discussion.

Contributions

The AllenNLP team at AI2 (@allenai) welcomes contributions from the greater AllenNLP community, and, if you would like to get a change into the library, this is likely the fastest approach. If you would like to contribute a larger feature, we recommend first creating an issue with a proposed design for discussion. This will prevent you from spending significant time on an implementation which has a technical limitation someone could have pointed out early on. Small contributions can be made directly in a pull request.

Pull requests (PRs) must have one approving review and no requested changes before they are merged. As AllenNLP is primarily driven by AI2 (@allenai), we reserve the right to reject or revert contributions that we don't think are good additions.

Citing

If you use AllenNLP in your research, please cite AllenNLP: A Deep Semantic Natural Language Processing Platform.

@inproceedings{Gardner2017AllenNLP,
  title={AllenNLP: A Deep Semantic Natural Language Processing Platform},
  author={Matt Gardner and Joel Grus and Mark Neumann and Oyvind Tafjord
    and Pradeep Dasigi and Nelson F. Liu and Matthew Peters and
    Michael Schmitz and Luke S. Zettlemoyer},
  year={2017},
  Eprint = {arXiv:1803.07640},
}

Team

AllenNLP is an open-source project backed by the Allen Institute for Artificial Intelligence (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering. To learn more about who specifically contributed to this codebase, see our contributors page.