Commit 9af5a65

Grace Muzny authored and Stanford NLP committed
merge master
1 parent 2275ca3 commit 9af5a65

168 files changed: +99233 -94398 lines changed

README.md

Lines changed: 21 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -5,25 +5,39 @@ Stanford CoreNLP provides a set of natural language analysis tools written in Ja
55

66
The Stanford CoreNLP code is written in Java and licensed under the GNU General Public License (v3 or later). Note that this is the full GPL, which allows many free uses, but not its use in proprietary software that you distribute to others.
77

8-
#### How To Compile (with ant)
8+
#### Build Instructions
99

10-
1. cd CoreNLP ; ant
10+
Several times a year we distribute a new version of the software, which corresponds to a stable commit.
1111

12-
#### How To Create A Jar
12+
During the time between releases, one can always use the latest, under development version of our code.
1313

14-
1. compile the code
15-
2. cd CoreNLP/classes ; jar -cf ../stanford-corenlp.jar edu
14+
Here are some helpful instructions to use the latest code:
15+
16+
1. Make sure you have ant installed.
17+
2. Compile the code with this command: `cd CoreNLP ; ant`
18+
3. Then run this command to build a jar with the latest version of the code: `cd CoreNLP/classes ; jar -cf ../stanford-corenlp.jar edu`
19+
4. This will create a new jar called stanford-corenlp.jar in the CoreNLP folder which contains the latest code
20+
5. The dependencies that work with the latest code are in CoreNLP/lib and CoreNLP/liblocal, so make sure to include those in your CLASSPATH.
21+
6. Also make sure to download the latest versions of the [corenlp-models](http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar),
22+
and [english-models](http://nlp.stanford.edu/software/stanford-english-corenlp-models-current.jar), and include them in your CLASSPATH. If you
23+
are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in.
1624

1725
You can find releases of Stanford CoreNLP on [Maven Central](http://search.maven.org/#browse%7C11864822).
1826

1927
You can find more explanation and documentation on [the Stanford CoreNLP homepage](http://nlp.stanford.edu/software/corenlp.shtml#Demo).
2028

2129
The most recent models associated with the code in the HEAD of this repository can be found [here](http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar).
2230

23-
Some of the larger (English) models -- like the shift-reduce parser and WikiDict -- are not distributed with our default models jar.
31+
Some of the larger (English) models -- like the shift-reduce parser and WikiDict -- are not distributed with our default models jar.
2432
The most recent version of these models can be found [here](http://nlp.stanford.edu/software/stanford-english-corenlp-models-current.jar).
2533

34+
We distribute resources for other languages as well, including [Arabic models](http://nlp.stanford.edu/software/stanford-arabic-corenlp-models-current.jar),
35+
[Chinese models](http://nlp.stanford.edu/software/stanford-chinese-corenlp-models-current.jar),
36+
[French models](http://nlp.stanford.edu/software/stanford-french-corenlp-models-current.jar),
37+
[German models](http://nlp.stanford.edu/software/stanford-german-corenlp-models-current.jar),
38+
and [Spanish models](http://nlp.stanford.edu/software/stanford-spanish-corenlp-models-current.jar).
39+
2640
For information about making contributions to Stanford CoreNLP, see the file [CONTRIBUTING.md](CONTRIBUTING.md).
2741

28-
Questions about CoreNLP can either be posted on StackOverflow with the tag [stanford-nlp](http://stackoverflow.com/questions/tagged/stanford-nlp),
42+
Questions about CoreNLP can either be posted on StackOverflow with the tag [stanford-nlp](http://stackoverflow.com/questions/tagged/stanford-nlp),
2943
or on the [mailing lists](http://nlp.stanford.edu/software/corenlp.shtml#Mail).

build.xml

Lines changed: 4 additions & 2 deletions
@@ -160,7 +160,7 @@
160160
<target name="itest" depends="classpath,compile"
161161
description="Run core integration tests">
162162
<echo message="${ant.project.name}" />
163-
<junit fork="yes" maxmemory="8g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
163+
<junit fork="yes" maxmemory="10g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
164164
<classpath refid="classpath"/>
165165
<classpath path="${build.path}"/>
166166
<classpath path="${data.path}"/>
@@ -389,7 +389,9 @@
389389
<zipfileset prefix="WEB-INF/data"
390390
file="/u/nlp/data/lexparser/arabicFactored.ser.gz"/>
391391
<zipfileset prefix="WEB-INF/data"
392-
file="/u/nlp/data/lexparser/xinhuaFactored.ser.gz"/>
392+
file="/u/nlp/data/lexparser/frenchFactored.ser.gz"/>
393+
<zipfileset prefix="WEB-INF/data"
394+
file="/u/nlp/data/lexparser/chineseFactored.ser.gz"/>
393395
<zipfileset prefix="WEB-INF/data/chinesesegmenter"
394396
file="/u/nlp/data/gale/segtool/stanford-seg/classifiers-2010/05202008-ctb6.processed-chris6.lex.gz"/>
395397
<zipfileset prefix="WEB-INF/data/chinesesegmenter"

data/edu/stanford/nlp/upos/ENUniversalPOS.tsurgeon

Lines changed: 5 additions & 0 deletions
@@ -98,6 +98,11 @@ NN=target <... {/\\%/}
9898

9999
relabel target SYM
100100

101+
% fused det-noun pronouns -> PRON
102+
NN=target < (/^(?i:(somebody|something|someone|anybody|anything|anyone|everybody|everything|everyone|nobody|nothing))$/)
103+
104+
relabel target PRON
105+
101106
% NN -> NOUN (otherwise)
102107
NN=target <... {/.*/}
103108

doc/corenlp/README.txt

Lines changed: 3 additions & 0 deletions
@@ -42,6 +42,9 @@ LICENSE
4242
CHANGES
4343
---------------------------------
4444

45+
2016-10-30 3.7.0 KBP Annotator, improved coreference, Arabic
46+
pipeline
47+
4548
2015-12-09 3.6.0 Improved coreference, OpenIE integration,
4649
Stanford CoreNLP server
4750

doc/corenlp/pom-full.xml

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@
22
<modelVersion>4.0.0</modelVersion>
33
<groupId>edu.stanford.nlp</groupId>
44
<artifactId>stanford-corenlp</artifactId>
5-
<version>3.6.0</version>
5+
<version>3.7.0</version>
66
<packaging>jar</packaging>
77
<name>Stanford CoreNLP</name>
88
<description>Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, and indicate which noun phrases refer to the same entities. It provides the foundational building blocks for higher level text understanding applications.</description>
@@ -14,8 +14,8 @@
1414
</license>
1515
</licenses>
1616
<scm>
17-
<url>http://nlp.stanford.edu/software/stanford-corenlp-2015-12-06.zip</url>
18-
<connection>http://nlp.stanford.edu/software/stanford-corenlp-2015-12-06.zip</connection>
17+
<url>http://nlp.stanford.edu/software/stanford-corenlp-2016-10-30.zip</url>
18+
<connection>http://nlp.stanford.edu/software/stanford-corenlp-2016-10-30.zip</connection>
1919
</scm>
2020
<developers>
2121
<developer>
@@ -88,7 +88,7 @@
8888
<configuration>
8989
<artifacts>
9090
<artifact>
91-
<file>${project.basedir}/stanford-corenlp-3.6.0-models.jar</file>
91+
<file>${project.basedir}/stanford-corenlp-3.7.0-models.jar</file>
9292
<type>jar</type>
9393
<classifier>models</classifier>
9494
</artifact>

doc/lexparser/README.txt

Lines changed: 1 addition & 1 deletion
@@ -182,7 +182,7 @@ cross-linguistically valid representation. Note that some constructs such as pre
182182
phrases are now analyzed differently and that the set of relations was updated. Please
183183
look at the Universal Dependencies documentation for more information:
184184

185-
http://universaldependencies.github.io/docs/
185+
http://www.universaldependencies.org
186186

187187
The parser also still supports the original Stanford Dependencies representation
188188
as described in the StanfordDependenciesManual.pdf. Use the flag

doc/lexparser/README_dependencies.txt

Lines changed: 31 additions & 10 deletions
@@ -1,15 +1,15 @@
1-
UNIVERSAL/STANFORD DEPENDENCIES. Stanford Parser v3.5.2
1+
UNIVERSAL/STANFORD DEPENDENCIES. Stanford Parser v3.7.0
22
-----------------------------------------------------------
33

44
IMPORTANT: Starting with version 3.5.2 the default dependencies
55
representation output by the Stanford Parser is the new Universal
66
Dependencies Representation. Universal Dependencies were developed
77
with the goal of being a cross-linguistically valid representation.
8-
Note that some constructs such as prepositional phrases are now
8+
Note that some constructions such as prepositional phrases are now
99
analyzed differently and that the set of relations was updated. The
1010
online documentation of English Universal Dependencies at
1111

12-
http://universaldependencies.github.io/docs/#language-en
12+
http://www.universaldependencies.org
1313

1414
should be consulted for the current set of dependency relations.
1515

@@ -20,7 +20,10 @@ manual. Use the flag
2020

2121
-originalDependencies
2222

23-
to obtain the original Stanford Dependencies.
23+
to obtain the original Stanford Dependencies. Note, however, that we
24+
are no longer maintaining the SD converter or representation and we
25+
therefore recommend to use the Universal Dependencies representation
26+
for any new projects.
2427

2528

2629
The manual for the English version of the Stanford Dependencies
@@ -49,18 +52,36 @@ For an overview of the original English Universal Dependencies schemes, please l
4952
at:
5053

5154
Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen,
52-
Filip Ginter, Joakim Nivre and Christopher D. Manning. 2014. Universal Stanford
55+
Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford
5356
dependencies: A cross-linguistic typology. 9th International Conference on
5457
Language Resources and Evaluation (LREC 2014).
55-
http://nlp.stanford.edu/pubs/USD_LREC14_paper_camera_ready.pdf
56-
57-
Please note, though, that some of the relations discussed in this paper
58+
http://nlp.stanford.edu/~manning/papers/USD_LREC14_UD_revision.pdf
59+
60+
and
61+
62+
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič,
63+
Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira,
64+
Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A Multilingual
65+
Treebank Collection. In Proceedings of the Tenth International Conference on Language
66+
Resources and Evaluation (LREC 2016).
67+
http://nlp.stanford.edu/pubs/nivre2016ud.pdf
68+
69+
Please note, though, that some of the relations discussed in the first paper
5870
were subsequently updated and please refer to the online documentation at
5971

60-
http://universaldependencies.github.com/docs/
72+
http://www.universaldependencies.org
6173

6274
for up-to-date documentation of the set of relations.
6375

76+
For an overview of the enhanced and enhanced++ dependency representations, please look
77+
at:
78+
79+
Sebastian Schuster and Christopher D. Manning. 2016. Enhanced English Universal
80+
Dependencies: An Improved Representation for Natural Language Understanding Tasks.
81+
In Proceedings of the Tenth International Conference on Language Resources and
82+
Evaluation (LREC 2016).
83+
http://nlp.stanford.edu/~sebschu/pubs/schuster-manning-lrec2016.pdf
84+
6485
For an overview of the original typed dependencies scheme, please look
6586
at:
6687

@@ -86,7 +107,7 @@ CHANGES IN ENGLISH TYPED DEPENDENCIES CODE -- v3.5.2
86107
Switch to Universal Dependencies as the default representation.
87108
Please see the Universal Dependencies documentation at
88109

89-
http://universaldependencies.github.io/docs/
110+
http://www.universaldependencies.org
90111

91112
for more information on the new relations.
92113

doc/lexparser/StanfordDependenciesManual.bib

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -553,4 +553,11 @@ @InProceedings{chang-tseng-jurafsky-manning:2009:SSST
553553
year = {2009},
554554
address = {Boulder, Colorado},
555555
url = {pubs/ssst09-chang.pdf}
556-
}
556+
}
557+
558+
@inproceedings{schuster2016enhanced,
559+
author = {Schuster, Sebastian and Manning, Christopher D.},
560+
booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
561+
title = {Enhanced {E}nglish {U}niversal {D}ependencies: An Improved Representation for Natural Language Understanding Tasks},
562+
year = {2016}
563+
}
463 Bytes
Binary file not shown.

doc/lexparser/StanfordDependenciesManual.tex

Lines changed: 13 additions & 11 deletions
@@ -10,6 +10,7 @@
1010
baseline=-0.6ex, inner sep=-0.1cm, edge horizontal padding=3pt, edge unit distance=1.5ex}
1111
\usepackage{natbib}
1212
\bibpunct{(}{)}{,}{a}{}{;}
13+
\usepackage{color}
1314

1415
\setlength{\textwidth}{16cm}
1516
\setlength{\oddsidemargin}{-0.04cm}
@@ -34,19 +35,21 @@
3435
% Revised for the Stanford Parser v.\ 3.3 in November 2013
3536
% Revised for the Stanford Parser v.\ 3.3 in December 2013
3637
%Revised for the Stanford Parser v.\ 3.5.1 in February 2015
37-
Revised for the Stanford Parser v.\ 3.5.2 in April 2015
38+
%Revised for the Stanford Parser v.\ 3.5.2 in April 2015
39+
Revised for the Stanford Parser v.\ 3.7.0 in September 2016
3840
}
3941

4042
\begin{document}
4143
\maketitle
4244

45+
\color{red}
4346
Please note that this manual describes the original Stanford
44-
Dependencies representation. As of version 3.5.2 the default representation
47+
Dependencies representation. As of version 3.5.2, the default representation
4548
output by the Stanford Parser and Stanford CoreNLP is the new Universal Dependencies (UD)
46-
representation. Take a look at the Universal Dependencies documentation
47-
at \textsf{http://universaldependencies.github.com/docs/} for a description of UD
48-
relations.
49-
49+
representation, and we no longer maintain the original Stanford Dependencies representation. For a description of the UD
50+
representation, take a look at the Universal Dependencies documentation at \textsf{http://www.universaldependencies.org} and
51+
the discussion of the \textit{enhanced} and \textit{enhanced++} UD representations by \citet{schuster2016enhanced}.
52+
\color{black}
5053
\section{Introduction}
5154

5255
The Stanford typed dependencies representation was designed to provide
@@ -98,9 +101,8 @@ \section{Introduction}
98101
available for Chinese, but it is not further discussed here. Starting
99102
in 2014, there has been work to extend Stanford Dependencies to be
100103
generally applicable cross-linguistically. Initial work appeared in
101-
\citet{marneffe14universal}, and the current proposal for Universal Dependencies (UD) can be found at
102-
\url{http://universaldependencies.github.io/docs/}. This work is not
103-
(yet) reflected in this manual or in our software.
104+
\citet{marneffe14universal}, and the current guidelines for Universal Dependencies (UD) can be found at
105+
\url{http://www.universaldependencies.org}.
104106
For SD, Section~\ref{def} of the manual defines the grammatical relations and the taxonomic hierarchy over
105107
them appears in section~\ref{hierarchy}. This is then followed by a description of the several variant
106108
dependency representations available, aimed at different use cases
@@ -1068,7 +1070,7 @@ \subsubsection*{$\star$ \textbf{edu.stanford.nlp.parser.lexparser.LexicalizedPar
10681070
\texttt{String[] sent = { "This", "is", "an", "easy", "sentence", "." }; \\
10691071
Tree parse = lp.apply(Sentence.toWordList(sent)); \\
10701072
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse); \\
1071-
Collection$\langle$TypedDependency$\rangle$ tdl = gs.typedDependenciesCCprocessed(); \\
1073+
Collection$\langle$TypedDependency$\rangle$ tdl = gs.typedDependencies(); \\
10721074
System.out.println(tdl); }
10731075
\end{quote}
10741076

@@ -1250,7 +1252,7 @@ \section{Further references for Stanford Dependencies}\label{refs}
12501252
team of collaborators has led to a new synthesis spanning
12511253
tokenization, morphological features, parts of speech, and
12521254
dependencies, known as Universal Dependencies:
1253-
\url{http://universaldependencies.github.io/docs/}. Since version 3.5.2
1255+
\url{http://www.universaldependencies.org}. Since version 3.5.2
12541256
the default representation output by our parser is the Universal Dependencies
12551257
representation.
12561258

doc/tagger/README-Models.txt

Lines changed: 1 addition & 5 deletions
@@ -105,15 +105,11 @@ University of Stuttgart and the Seminar für Sprachwissenschaft of the
105105
University of Tübingen. See:
106106
http://www.ims.uni-stuttgart.de/projekte/CQPDemos/Bundestag/help-tagset.html
107107
This model uses features from the distributional similarity clusters
108-
built over the HGC.
108+
built over the HGC (Huge German Corpus).
109109
Performance:
110110
96.90% on the first half of the remaining 20% of the Negra corpus (dev set)
111111
(90.33% on unknown words)
112112

113-
german-dewac.tagger
114-
This model uses features from the distributional similarity clusters
115-
built from the deWac web corpus.
116-
117113
german-fast.tagger
118114
Lacks distributional similarity features, but is several times faster
119115
than the other alternatives.
