Commit f0b44ad

Grace Muzny authored and Stanford NLP committed
updates to CQSC extraction
1 parent 88b5102 commit f0b44ad

193 files changed: +94870 -99914 lines changed


README.md

Lines changed: 7 additions & 21 deletions
@@ -5,39 +5,25 @@ Stanford CoreNLP provides a set of natural language analysis tools written in Ja
 
 The Stanford CoreNLP code is written in Java and licensed under the GNU General Public License (v3 or later). Note that this is the full GPL, which allows many free uses, but not its use in proprietary software that you distribute to others.
 
-#### Build Instructions
+#### How To Compile (with ant)
 
-Several times a year we distribute a new version of the software, which corresponds to a stable commit.
+1. cd CoreNLP ; ant
 
-During the time between releases, one can always use the latest, under development version of our code.
+#### How To Create A Jar
 
-Here are some helfpul instructions to use the latest code:
-
-1. Make sure you have ant installed.
-2. Compile the code with this command: `cd CoreNLP ; ant`
-3. Then run this command to build a jar with the latest version of the code: `cd CoreNLP/classes ; jar -cf ../stanford-corenlp.jar edu`
-4. This will create a new jar called stanford-corenlp.jar in the CoreNLP folder which contains the latest code
-5. The dependencies that work with the latest code are in CoreNLP/lib and CoreNLP/liblocal, so make sure to include those in your CLASSPATH.
-6. Also make sure to download the latest versions of the [corenlp-models](http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar),
-and [english-models](http://nlp.stanford.edu/software/stanford-english-corenlp-models-current.jar), and include them in your CLASSPATH. If you
-are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in.
+1. compile the code
+2. cd CoreNLP/classes ; jar -cf ../stanford-corenlp.jar edu
 
 You can find releases of Stanford CoreNLP on [Maven Central](http://search.maven.org/#browse%7C11864822).
 
 You can find more explanation and documentation on [the Stanford CoreNLP homepage](http://nlp.stanford.edu/software/corenlp.shtml#Demo).
 
 The most recent models associated with the code in the HEAD of this repository can be found [here](http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar).
 
-Some of the larger (English) models -- like the shift-reduce parser and WikiDict -- are not distributed with our default models jar.
+Some of the larger (English) models -- like the shift-reduce parser and WikiDict -- are not distributed with our default models jar.
 The most recent version of these models can be found [here](http://nlp.stanford.edu/software/stanford-english-corenlp-models-current.jar).
 
-We distribute resources for other languages as well, including [Arabic models](http://nlp.stanford.edu/software/stanford-arabic-corenlp-models-current.jar),
-[Chinese models](http://nlp.stanford.edu/software/stanford-chinese-corenlp-models-current.jar),
-[French models](http://nlp.stanford.edu/software/stanford-french-corenlp-models-current.jar),
-[German models](http://nlp.stanford.edu/software/stanford-german-corenlp-models-current.jar),
-and [Spanish models](http://nlp.stanford.edu/software/stanford-spanish-corenlp-models-current.jar).
-
 For information about making contributions to Stanford CoreNLP, see the file [CONTRIBUTING.md](CONTRIBUTING.md).
 
-Questions about CoreNLP can either be posted on StackOverflow with the tag [stanford-nlp](http://stackoverflow.com/questions/tagged/stanford-nlp),
+Questions about CoreNLP can either be posted on StackOverflow with the tag [stanford-nlp](http://stackoverflow.com/questions/tagged/stanford-nlp),
 or on the [mailing lists](http://nlp.stanford.edu/software/corenlp.shtml#Mail).

build.xml

Lines changed: 2 additions & 4 deletions
@@ -160,7 +160,7 @@
 <target name="itest" depends="classpath,compile"
 description="Run core integration tests">
 <echo message="${ant.project.name}" />
-<junit fork="yes" maxmemory="10g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
+<junit fork="yes" maxmemory="8g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
 <classpath refid="classpath"/>
 <classpath path="${build.path}"/>
 <classpath path="${data.path}"/>
@@ -389,9 +389,7 @@
 <zipfileset prefix="WEB-INF/data"
 file="/u/nlp/data/lexparser/arabicFactored.ser.gz"/>
 <zipfileset prefix="WEB-INF/data"
-file="/u/nlp/data/lexparser/frenchFactored.ser.gz"/>
-<zipfileset prefix="WEB-INF/data"
-file="/u/nlp/data/lexparser/chineseFactored.ser.gz"/>
+file="/u/nlp/data/lexparser/xinhuaFactored.ser.gz"/>
 <zipfileset prefix="WEB-INF/data/chinesesegmenter"
 file="/u/nlp/data/gale/segtool/stanford-seg/classifiers-2010/05202008-ctb6.processed-chris6.lex.gz"/>
 <zipfileset prefix="WEB-INF/data/chinesesegmenter"

data/edu/stanford/nlp/upos/ENUniversalPOS.tsurgeon

Lines changed: 0 additions & 5 deletions
@@ -98,11 +98,6 @@ NN=target <... {/\\%/}
 
 relabel target SYM
 
-% fused det-noun pronouns -> PRON
-NN=target < (/^(?i:(somebody|something|someone|anybody|anything|anyone|everybody|everything|everyone|nobody|nothing))$/)
-
-relabel target PRON
-
 % NN -> NOUN (otherwise)
 NN=target <... {/.*/}
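
To make the effect of the deleted rule concrete, here is a minimal Java sketch that replays its regular expression with `java.util.regex`. It is not CoreNLP or Tsurgeon code: the class `FusedPronounRule` and the method `relabelNN` are hypothetical names made up for this illustration, and real Tsurgeon rules operate on tree nodes rather than bare strings. The `NOUN` fallback mirrors the "NN -> NOUN (otherwise)" rule that the hunk keeps.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of CoreNLP: it only replays the regular
// expression from the deleted Tsurgeon rule on plain strings.
final class FusedPronounRule {

    // Same pattern as in the deleted rule; (?i:...) makes the alternation
    // case-insensitive, and ^...$ requires the whole word to match.
    static final Pattern FUSED_PRONOUN = Pattern.compile(
        "^(?i:(somebody|something|someone|anybody|anything|anyone"
        + "|everybody|everything|everyone|nobody|nothing))$");

    // Returns the universal POS tag that the old rule set would give a word
    // tagged NN: PRON for fused det-noun pronouns, NOUN otherwise.
    static String relabelNN(String word) {
        return FUSED_PRONOUN.matcher(word).matches() ? "PRON" : "NOUN";
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Someone", "nothing", "thing"}) {
            System.out.println(w + " -> " + relabelNN(w));
        }
    }
}
```

With the rule deleted, as in this commit, words such as "someone" tagged NN would presumably fall through to the generic NN -> NOUN rule instead.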

doc/corenlp/README.txt

Lines changed: 0 additions & 3 deletions
@@ -42,9 +42,6 @@ LICENSE
 CHANGES
 ---------------------------------
 
-2016-10-30 3.7.0 KBP Annotator, improved coreference, Arabic
-pipeline
-
 2015-12-09 3.6.0 Improved coreference, OpenIE integration,
 Stanford CoreNLP server

doc/corenlp/pom-full.xml

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@
 <modelVersion>4.0.0</modelVersion>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>3.7.0</version>
+<version>3.6.0</version>
 <packaging>jar</packaging>
 <name>Stanford CoreNLP</name>
 <description>Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, and indicate which noun phrases refer to the same entities. It provides the foundational building blocks for higher level text understanding applications.</description>
@@ -14,8 +14,8 @@
 </license>
 </licenses>
 <scm>
-<url>http://nlp.stanford.edu/software/stanford-corenlp-2016-10-30.zip</url>
-<connection>http://nlp.stanford.edu/software/stanford-corenlp-2016-10-30.zip</connection>
+<url>http://nlp.stanford.edu/software/stanford-corenlp-2015-12-06.zip</url>
+<connection>http://nlp.stanford.edu/software/stanford-corenlp-2015-12-06.zip</connection>
 </scm>
 <developers>
 <developer>
@@ -88,7 +88,7 @@
 <configuration>
 <artifacts>
 <artifact>
-<file>${project.basedir}/stanford-corenlp-3.7.0-models.jar</file>
+<file>${project.basedir}/stanford-corenlp-3.6.0-models.jar</file>
 <type>jar</type>
 <classifier>models</classifier>
 </artifact>

doc/lexparser/README.txt

Lines changed: 1 addition & 1 deletion
@@ -182,7 +182,7 @@ cross-linguistically valid representation. Note that some constructs such as pre
 phrases are now analyzed differently and that the set of relations was updated. Please
 look at the Universal Dependencies documentation for more information:
 
-http://www.universaldependencies.org
+http://universaldependencies.github.io/docs/
 
 The parser also still supports the original Stanford Dependencies representation
 as described in the StanfordDependenciesManual.pdf. Use the flag

doc/lexparser/README_dependencies.txt

Lines changed: 10 additions & 31 deletions
@@ -1,15 +1,15 @@
-UNIVERSAL/STANFORD DEPENDENCIES. Stanford Parser v3.7.0
+UNIVERSAL/STANFORD DEPENDENCIES. Stanford Parser v3.5.2
 -----------------------------------------------------------
 
 IMPORTANT: Starting with version 3.5.2 the default dependencies
 representation output by the Stanford Parser is the new Universal
 Dependencies Representation. Universal Dependencies were developed
 with the goal of being a cross-linguistically valid representation.
-Note that some constructions such as prepositional phrases are now
+Note that some constructs such as prepositional phrases are now
 analyzed differently and that the set of relations was updated. The
 online documentation of English Universal Dependencies at
 
-http://www.universaldependencies.org
+http://universaldependencies.github.io/docs/#language-en
 
 should be consulted for the current set of dependency relations.
 
@@ -20,10 +20,7 @@ manual. Use the flag
 
 -originalDependencies
 
-to obtain the original Stanford Dependencies. Note, however, that we
-are no longer maintaining the SD converter or representation and we
-therefore recommend to use the Universal Dependencies representation
-for any new projects.
+to obtain the original Stanford Dependencies.
 
 
 The manual for the English version of the Stanford Dependencies
@@ -52,36 +49,18 @@ For an overview of the original English Universal Dependencies schemes, please l
 at:
 
 Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen,
-Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford
+Filip Ginter, Joakim Nivre and Christopher D. Manning. 2014. Universal Stanford
 dependencies: A cross-linguistic typology. 9th International Conference on
 Language Resources and Evaluation (LREC 2014).
-http://nlp.stanford.edu/~manning/papers/USD_LREC14_UD_revision.pdf
-
-and
-
-Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič,
-Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira,
-Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A Multilingual
-Treebank Collection. In Proceedings of the Tenth International Conference on Language
-Resources and Evaluation (LREC 2016).
-http://nlp.stanford.edu/pubs/nivre2016ud.pdf
-
-Please note, though, that some of the relations discussed in the first paper
+http://nlp.stanford.edu/pubs/USD_LREC14_paper_camera_ready.pdf
+
+Please note, though, that some of the relations discussed in this paper
 were subsequently updated and please refer to the online documentation at
 
-http://www.universaldependencies.org
+http://universaldependencies.github.com/docs/
 
 for an up to date documention of the set of relations.
 
-For an overview of the enhanced and enhanced++ dependency representations, please look
-at:
-
-Sebastian Schuster and Christopher D. Manning. 2016. Enhanced English Universal
-Dependencies: An Improved Representation for Natural Language Understanding Tasks.
-In Proceedings of the Tenth International Conference on Language Resources and
-Evaluation (LREC 2016).
-http://nlp.stanford.edu/~sebschu/pubs/schuster-manning-lrec2016.pdf
-
 For an overview of the original typed dependencies scheme, please look
 at:
 
@@ -107,7 +86,7 @@ CHANGES IN ENGLISH TYPED DEPENDENCIES CODE -- v3.5.2
 Switch to Universal Dependencies as the default representation.
 Please see the Universal Dependencies documentation at
 
-http://www.universaldependencies.org
+http://universaldependencies.github.io/docs/
 
 for more information on the new relations.

doc/lexparser/StanfordDependenciesManual.bib

Lines changed: 1 addition & 8 deletions
@@ -553,11 +553,4 @@ @InProceedings{chang-tseng-jurafsky-manning:2009:SSST
 year = {2009},
 address = {Boulder, Colorado},
 url = {pubs/ssst09-chang.pdf}
-}
-
-@inproceedings{schuster2016enhanced,
-author = {Schuster, Sebastian and Manning, Christopher D.},
-booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
-title = {Enhanced {E}nglish {U}niversal {D}ependencies: An Improved Representation for Natural Language Understanding Tasks},
-year = {2016}
-}
+}
-463 Bytes
Binary file not shown.

doc/lexparser/StanfordDependenciesManual.tex

Lines changed: 11 additions & 13 deletions
@@ -10,7 +10,6 @@
 baseline=-0.6ex, inner sep=-0.1cm, edge horizontal padding=3pt, edge unit distance=1.5ex}
 \usepackage{natbib}
 \bibpunct{(}{)}{,}{a}{}{;}
-\usepackage{color}
 
 \setlength{\textwidth}{16cm}
 \setlength{\oddsidemargin}{-0.04cm}
@@ -35,21 +34,19 @@
 % Revised for the Stanford Parser v.\ 3.3 in November 2013
 % Revised for the Stanford Parser v.\ 3.3 in December 2013
 %Revised for the Stanford Parser v.\ 3.5.1 in February 2015
-%Revised for the Stanford Parser v.\ 3.5.2 in April 2015
-Revised for the Stanford Parser v.\ 3.7.0 in September 2016
+Revised for the Stanford Parser v.\ 3.5.2 in April 2015
 }
 
 \begin{document}
 \maketitle
 
-\color{red}
 Please note that this manual describes the original Stanford
-Dependencies representation. As of version 3.5.2, the default representation
+Dependencies representation. As of version 3.5.2 the default representation
 output by the Stanford Parser and Stanford CoreNLP is the new Universal Dependencies (UD)
-representation, and we no longer maintain the original Stanford Dependencies representation. For a description of the UD
-representation, take a look at the Universal Dependencies documentation at \textsf{http:/www.universaldependencies.org} and
-the discussion of the \textit{enhanced} and \textit{enhanced++} UD representations by \citet{schuster2016enhanced}.
-\color{black}
+representation. Take a look at the Universal Dependencies documentation
+at \textsf{http://universaldependencies.github.com/docs/} for a description of UD
+relations.
+
 \section{Introduction}
 
 The Stanford typed dependencies representation was designed to provide
@@ -101,8 +98,9 @@ \section{Introduction}
 available for Chinese, but it is not further discussed here. Starting
 in 2014, there has been work to extend Stanford Dependencies to be
 generally applicable cross-linguistically. Initial work appeared in
-\citet{marneffe14universal}, and the current guidelines for Universal Dependencies (UD) can be found at
-\url{http://www.universaldependencies.org}.
+\citet{marneffe14universal}, and the current proposal for Universal Dependencies (UD) can be found at
+\url{http://universaldependencies.github.io/docs/}. This work is not
+(yet) reflected in this manual or in our software.
 For SD, Section~\ref{def} of the manual defines the grammatical relations and the taxonomic hierarchy over
 them appears in section~\ref{hierarchy}. This is then followed by a description of the several variant
 dependency representations available, aimed at different use cases
@@ -1070,7 +1068,7 @@ \subsubsection*{$\star$ \textbf{edu.stanford.nlp.parser.lexparser.LexicalizedPar
 \texttt{String[] sent = { "This", "is", "an", "easy", "sentence", "." }; \\
 Tree parse = lp.apply(Sentence.toWordList(sent)); \\
 GrammaticalStructure gs = gsf.newGrammaticalStructure(parse); \\
-Collection$\langle$TypedDependency$\rangle$ tdl = gs.typedDependencies(); \\
+Collection$\langle$TypedDependency$\rangle$ tdl = gs.typedDependenciesCCprocessed(); \\
 System.out.println(tdl); }
 \end{quote}
 
@@ -1252,7 +1250,7 @@ \section{Further references for Stanford Dependencies}\label{refs}
 team of collaborators has led to a new synthesis spanning
 tokenization, morphological features, parts of speech, and
 dependencies, known as Universal Dependencies:
-\url{http://www.universaldependencies.org}. Since version 3.5.2
+\url{http://universaldependencies.github.io/docs/}. Since version 3.5.2
 the default representation output by our parser is the Universal Dependencies
 representation.
