Stanford CoreNLP: A Java suite of core NLP tools.
Latest commit 58ab70c Sep 19, 2018

Stanford CoreNLP

Stanford CoreNLP provides a set of natural language analysis tools written in Java. It can take raw human language text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize and interpret dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases or word dependencies, and indicate which noun phrases refer to the same entities. It was originally developed for English, but now also provides varying levels of support for (Modern Standard) Arabic, (mainland) Chinese, French, German, and Spanish. Stanford CoreNLP is an integrated framework, which makes it very easy to apply a bunch of language analysis tools to a piece of text. Starting from plain text, you can run all the tools with just two lines of code. Its analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications. Stanford CoreNLP is a set of stable and well-tested natural language processing tools, widely used by various groups in academia, industry, and government. The tools variously use rule-based, probabilistic machine learning, and deep learning components.

The Stanford CoreNLP code is written in Java and licensed under the GNU General Public License (v3 or later). Note that this is the full GPL, which allows many free uses, but not its use in proprietary software that you distribute to others.

Build Instructions

Several times a year we distribute a new version of the software, which corresponds to a stable commit.

Between releases, you can always use the latest, under-development version of our code.

Here are some helpful instructions to use the latest code:

Provided build

Sometimes we will provide updated jars here which have the latest version of the code.

At present, the most recent jar we provide is the current released version of the code, though you can always build the very latest from GitHub HEAD yourself.

Build with Ant

  1. Make sure you have Ant installed; details are here: http://ant.apache.org/
  2. Compile the code with this command: cd CoreNLP ; ant
  3. Then run this command to build a jar with the latest version of the code: cd CoreNLP/classes ; jar -cf ../stanford-corenlp.jar edu
  4. This will create a new jar called stanford-corenlp.jar in the CoreNLP folder, which contains the latest code.
  5. The dependencies that work with the latest code are in CoreNLP/lib and CoreNLP/liblocal, so make sure to include those in your CLASSPATH.
  6. When using the latest version of the code, make sure to download the latest versions of the corenlp-models, english-models, and english-models-kbp and include them in your CLASSPATH. If you are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in.
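
Steps 2 through 5 above can be scripted. The following is a minimal shell sketch, not part of the CoreNLP distribution: the function names build_corenlp_jar and corenlp_classpath are hypothetical, and it assumes a POSIX shell on a system where CLASSPATH entries are separated by colons. The model jars from step 6 are passed as extra arguments.

```shell
# Hypothetical helpers; assumes a POSIX shell and ':' as the CLASSPATH separator.

# Steps 2-3: compile with Ant, then package the classes into a jar.
# Argument: path to the CoreNLP checkout (default: CoreNLP).
build_corenlp_jar() {
  root="${1:-CoreNLP}"
  (cd "$root" && ant) || return 1
  (cd "$root/classes" && jar -cf ../stanford-corenlp.jar edu)
}

# Steps 4-6: print a CLASSPATH made of the built jar, every jar in lib/ and
# liblocal/, and any extra jars (such as models jars) given after the root.
corenlp_classpath() {
  root="${1:-CoreNLP}"
  if [ "$#" -gt 0 ]; then shift; fi
  classpath="$root/stanford-corenlp.jar"
  for jar in "$root"/lib/*.jar "$root"/liblocal/*.jar; do
    # An unmatched glob stays as a literal pattern; skip it.
    if [ -e "$jar" ]; then classpath="$classpath:$jar"; fi
  done
  # Extra jars are appended as given, without an existence check.
  for jar in "$@"; do
    classpath="$classpath:$jar"
  done
  printf '%s\n' "$classpath"
}
```

For example, after running build_corenlp_jar CoreNLP, the result could be exported with export CLASSPATH="$(corenlp_classpath CoreNLP /path/to/models.jar)", where /path/to/models.jar is a placeholder for wherever you saved the downloaded models jar.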

Build with Maven

  1. Make sure you have Maven installed; details are here: https://maven.apache.org/
  2. Run this command in the CoreNLP directory: mvn package . It should run the tests and build this jar file: CoreNLP/target/stanford-corenlp-3.7.0.jar (the version number in the file name follows the version set in pom.xml).
  3. When using the latest version of the code, make sure to download the latest versions of the corenlp-models, english-models, and english-models-kbp and include them in your CLASSPATH. If you are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in.
  4. If you want to use Stanford CoreNLP as part of a Maven project you need to install the models jars into your Maven repository. Below is a sample command for installing the Spanish models jar. For other languages just change the language name in the command. To install stanford-corenlp-models-current.jar you will need to set -Dclassifier=models. Here is the sample command for Spanish: mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=3.9.1 -Dclassifier=models-spanish -Dpackaging=jar
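
Since step 4 says that only the language name changes between languages, the sample command can be generated rather than edited by hand. This is a small sketch under that assumption; the function name models_install_cmd is hypothetical, and the defaults (/location/of and 3.9.1) are simply the placeholder path and version from the Spanish sample. It prints the command instead of running it, so you can check it first.

```shell
# Hypothetical helper: print (not run) the mvn install:install-file command
# for one language's models jar, following the Spanish sample above.
# Arguments: language name, directory holding the jar, version.
models_install_cmd() {
  language="$1"
  jar_dir="${2:-/location/of}"
  version="${3:-3.9.1}"
  printf 'mvn install:install-file -Dfile=%s/stanford-%s-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=%s -Dclassifier=models-%s -Dpackaging=jar\n' \
    "$jar_dir" "$language" "$version" "$language"
}
```

Calling models_install_cmd spanish reproduces the sample command; models_install_cmd french "$HOME/Downloads" would print the equivalent for a French models jar saved in your Downloads folder. The default models jar, stanford-corenlp-models-current.jar, does not follow this naming pattern and needs -Dclassifier=models, so it is not covered by this helper.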

Useful resources

You can find releases of Stanford CoreNLP on Maven Central.

You can find more explanation and documentation on the Stanford CoreNLP homepage.

The most recent models associated with the code in the HEAD of this repository can be found here.

Some of the larger (English) models -- like the shift-reduce parser and WikiDict -- are not distributed with our default models jar. The most recent version of these models can be found here.

We distribute resources for other languages as well, including Arabic models, Chinese models, French models, German models, and Spanish models.

For information about making contributions to Stanford CoreNLP, see the file CONTRIBUTING.md.

Questions about CoreNLP can be posted either on Stack Overflow with the tag stanford-nlp or on the mailing lists.