stanfordnlp
diff --git a/README.md b/README.md
Lines changed: 21 additions & 7 deletions
diff --git a/build.xml b/build.xml
Lines changed: 4 additions & 2 deletions
diff --git a/data/edu/stanford/nlp/upos/ENUniversalPOS.tsurgeon b/data/edu/stanford/nlp/upos/ENUniversalPOS.tsurgeon
Lines changed: 5 additions & 0 deletions
diff --git a/doc/corenlp/README.txt b/doc/corenlp/README.txt
Lines changed: 3 additions & 0 deletions
diff --git a/doc/corenlp/pom-full.xml b/doc/corenlp/pom-full.xml
Lines changed: 4 additions & 4 deletions
diff --git a/doc/lexparser/README.txt b/doc/lexparser/README.txt
Lines changed: 1 addition & 1 deletion
diff --git a/doc/lexparser/README_dependencies.txt b/doc/lexparser/README_dependencies.txt
Lines changed: 31 additions & 10 deletions
diff --git a/doc/lexparser/StanfordDependenciesManual.bib b/doc/lexparser/StanfordDependenciesManual.bib
Lines changed: 8 additions & 1 deletion
diff --git a/doc/lexparser/StanfordDependenciesManual.pdf b/doc/lexparser/StanfordDependenciesManual.pdf
463 Bytes
diff --git a/doc/lexparser/StanfordDependenciesManual.tex b/doc/lexparser/StanfordDependenciesManual.tex
Lines changed: 13 additions & 11 deletions
diff --git a/doc/tagger/README-Models.txt b/doc/tagger/README-Models.txt
Lines changed: 1 addition & 5 deletions
@@ -5,25 +5,39 @@ Stanford CoreNLP provides a set of natural language analysis tools written in Ja
 
 The Stanford CoreNLP code is written in Java and licensed under the GNU General Public License (v3 or later). Note that this is the full GPL, which allows many free uses, but not its use in proprietary software that you distribute to others.
 
-#### How To Compile (with ant)
+#### Build Instructions
 
-1. cd CoreNLP ; ant
+Several times a year we distribute a new version of the software, which corresponds to a stable commit.
 
-#### How To Create A Jar 
+Between releases, you can always use the latest development version of our code.
 
-1. compile the code
-2. cd CoreNLP/classes ; jar -cf ../stanford-corenlp.jar edu
+Here are some helpful instructions for using the latest code:
+
+1. Make sure you have ant installed.
+2. Compile the code with this command: `cd CoreNLP ; ant`
+3. Then run this command to build a jar with the latest version of the code: `cd CoreNLP/classes ; jar -cf ../stanford-corenlp.jar edu`
+4. This will create a new jar called stanford-corenlp.jar in the CoreNLP folder which contains the latest code
+5. The dependencies that work with the latest code are in CoreNLP/lib and CoreNLP/liblocal, so make sure to include those in your CLASSPATH.
+6. Also make sure to download the latest versions of the [corenlp-models](http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar), 
+and [english-models](http://nlp.stanford.edu/software/stanford-english-corenlp-models-current.jar), and include them in your CLASSPATH.  If you
+are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in.
 
 You can find releases of Stanford CoreNLP on [Maven Central](http://search.maven.org/#browse%7C11864822).
 
 You can find more explanation and documentation on [the Stanford CoreNLP homepage](http://nlp.stanford.edu/software/corenlp.shtml#Demo).
 
 The most recent models associated with the code in the HEAD of this repository can be found [here](http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar).
 
-Some of the larger (English) models -- like the shift-reduce parser and WikiDict -- are not distributed with our default models jar. 
+Some of the larger (English) models -- like the shift-reduce parser and WikiDict -- are not distributed with our default models jar.
 The most recent version of these models can be found [here](http://nlp.stanford.edu/software/stanford-english-corenlp-models-current.jar).
 
+We distribute resources for other languages as well, including [Arabic models](http://nlp.stanford.edu/software/stanford-arabic-corenlp-models-current.jar),
+[Chinese models](http://nlp.stanford.edu/software/stanford-chinese-corenlp-models-current.jar),
+[French models](http://nlp.stanford.edu/software/stanford-french-corenlp-models-current.jar),
+[German models](http://nlp.stanford.edu/software/stanford-german-corenlp-models-current.jar),
+and [Spanish models](http://nlp.stanford.edu/software/stanford-spanish-corenlp-models-current.jar).
+
 For information about making contributions to Stanford CoreNLP, see the file [CONTRIBUTING.md](CONTRIBUTING.md).
 
-Questions about CoreNLP can either be posted on StackOverflow with the tag [stanford-nlp](http://stackoverflow.com/questions/tagged/stanford-nlp), 
+Questions about CoreNLP can either be posted on StackOverflow with the tag [stanford-nlp](http://stackoverflow.com/questions/tagged/stanford-nlp),
   or on the [mailing lists](http://nlp.stanford.edu/software/corenlp.shtml#Mail).
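
Review note on the build instructions in this hunk: after building `stanford-corenlp.jar` and adding it to the CLASSPATH, a quick check that the jar is actually visible can save some debugging. The following is a minimal sketch, not part of CoreNLP; `ClasspathCheck` and `isOnClasspath` are hypothetical names introduced here, and the only CoreNLP identifier used is the `edu.stanford.nlp.pipeline.StanfordCoreNLP` class name. It does not verify that the models jars are present.

```java
/** Illustrative classpath sanity check; a hypothetical helper, not part of CoreNLP. */
final class ClasspathCheck {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or one of its own dependencies could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Main pipeline class, which lives in the code jar built by the steps above.
        String pipeline = "edu.stanford.nlp.pipeline.StanfordCoreNLP";
        System.out.println("CoreNLP code jar on CLASSPATH: " + isOnClasspath(pipeline));
    }
}
```

Running it with the same CLASSPATH you intend to use for CoreNLP should print `true` on the last line if the freshly built jar was picked up.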
@@ -160,7 +160,7 @@
   <target name="itest" depends="classpath,compile"
           description="Run core integration tests">
     <echo message="${ant.project.name}" />
-    <junit fork="yes" maxmemory="8g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
+    <junit fork="yes" maxmemory="10g" printsummary="off" outputtoformatters="false" forkmode="perTest" haltonfailure="true">
       <classpath refid="classpath"/>
       <classpath path="${build.path}"/>
       <classpath path="${data.path}"/>
@@ -389,7 +389,9 @@
         <zipfileset prefix="WEB-INF/data"
                     file="/u/nlp/data/lexparser/arabicFactored.ser.gz"/>
         <zipfileset prefix="WEB-INF/data"
-                    file="/u/nlp/data/lexparser/xinhuaFactored.ser.gz"/>
+                    file="/u/nlp/data/lexparser/frenchFactored.ser.gz"/>
+        <zipfileset prefix="WEB-INF/data"
+                    file="/u/nlp/data/lexparser/chineseFactored.ser.gz"/>
         <zipfileset prefix="WEB-INF/data/chinesesegmenter"
                     file="/u/nlp/data/gale/segtool/stanford-seg/classifiers-2010/05202008-ctb6.processed-chris6.lex.gz"/>
         <zipfileset prefix="WEB-INF/data/chinesesegmenter"
 
@@ -98,6 +98,11 @@ NN=target <... {/\\%/}
 
 relabel target SYM
 
+% fused det-noun pronouns -> PRON
+NN=target < (/^(?i:(somebody|something|someone|anybody|anything|anyone|everybody|everything|everyone|nobody|nothing))$/)
+
+relabel target PRON
+
 % NN -> NOUN (otherwise)
 NN=target <... {/.*/}
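
Review note on the new fused det-noun rule in this hunk: the node pattern is a Java regular expression, so its behavior can be checked in isolation with `java.util.regex`. The sketch below assumes the `^...$` anchors make the rule a whole-word match and that `(?i:...)` makes it case-insensitive; `FusedPronounCheck` and `isFusedPronoun` are hypothetical names used only for illustration, and the sketch does not run Tregex or Tsurgeon itself.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: mirrors the regex from the new rule using plain java.util.regex. */
final class FusedPronounCheck {

    // Same expression as in the rule, minus the surrounding /.../ delimiters.
    private static final Pattern FUSED_PRONOUN = Pattern.compile(
        "^(?i:(somebody|something|someone|anybody|anything|anyone|"
        + "everybody|everything|everyone|nobody|nothing))$");

    /** Returns true if the word matches the fused det-noun pattern (hypothetical helper). */
    static boolean isFusedPronoun(String word) {
        return FUSED_PRONOUN.matcher(word).find();
    }

    public static void main(String[] args) {
        for (String w : List.of("Someone", "NOTHING", "noone", "some", "everything")) {
            System.out.println(w + " -> " + (isFusedPronoun(w) ? "matches (PRON)" : "no match"));
        }
    }
}
```

Words that do not match fall through to the later `NN` rules in the file, so the order of the rules matters: this rule has to appear before the catch-all `NN -> NOUN` rule.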
 
 
@@ -42,6 +42,9 @@ LICENSE
 CHANGES
 ---------------------------------
 
+2016-10-30    3.7.0     KBP Annotator, improved coreference, Arabic 
+                        pipeline 
+
 2015-12-09    3.6.0     Improved coreference, OpenIE integration, 
                         Stanford CoreNLP server 
 
 
@@ -2,7 +2,7 @@
   <modelVersion>4.0.0</modelVersion>
   <groupId>edu.stanford.nlp</groupId>
   <artifactId>stanford-corenlp</artifactId>
-  <version>3.6.0</version>
+  <version>3.7.0</version>
   <packaging>jar</packaging>
   <name>Stanford CoreNLP</name>
   <description>Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, and indicate which noun phrases refer to the same entities. It provides the foundational building blocks for higher level text understanding applications.</description>
@@ -14,8 +14,8 @@
     </license>
   </licenses>
   <scm>
-    <url>http://nlp.stanford.edu/software/stanford-corenlp-2015-12-06.zip</url>
-    <connection>http://nlp.stanford.edu/software/stanford-corenlp-2015-12-06.zip</connection>
+    <url>http://nlp.stanford.edu/software/stanford-corenlp-2016-10-30.zip</url>
+    <connection>http://nlp.stanford.edu/software/stanford-corenlp-2016-10-30.zip</connection>
   </scm>
   <developers>
     <developer>
@@ -88,7 +88,7 @@
             <configuration>
               <artifacts>
                 <artifact>
-                  <file>${project.basedir}/stanford-corenlp-3.6.0-models.jar</file>
+                  <file>${project.basedir}/stanford-corenlp-3.7.0-models.jar</file>
                   <type>jar</type>
                   <classifier>models</classifier>
                 </artifact>
 
@@ -182,7 +182,7 @@ cross-linguistically valid representation. Note that some constructs such as pre
 phrases are now analyzed differently and that the set of relations was updated. Please 
 look at the Universal Dependencies documentation for more information:
 
-      http://universaldependencies.github.io/docs/
+      http://www.universaldependencies.org
 
 The parser also still supports the original Stanford Dependencies representation 
 as described in the StanfordDependenciesManual.pdf. Use the flag
 
@@ -1,15 +1,15 @@
-UNIVERSAL/STANFORD DEPENDENCIES.  Stanford Parser v3.5.2
+UNIVERSAL/STANFORD DEPENDENCIES.  Stanford Parser v3.7.0
 -----------------------------------------------------------
 
 IMPORTANT: Starting with version 3.5.2 the default dependencies
 representation output by the Stanford Parser is the new Universal
 Dependencies Representation. Universal Dependencies were developed
 with the goal of being a cross-linguistically valid representation.
-Note that some constructs such as prepositional phrases are now 
+Note that some constructions such as prepositional phrases are now 
 analyzed differently and that the set of relations was updated. The
 online documentation of English Universal Dependencies at
 
-    http://universaldependencies.github.io/docs/#language-en
+    http://www.universaldependencies.org
 
 should be consulted for the current set of dependency relations.
 
@@ -20,7 +20,10 @@ manual. Use the flag
 
     -originalDependencies
 
-to obtain the original Stanford Dependencies.
+to obtain the original Stanford Dependencies. Note, however, that we
+are no longer maintaining the SD converter or representation and we
+therefore recommend using the Universal Dependencies representation
+for any new projects.
 
 
 The manual for the English version of the Stanford Dependencies
@@ -49,18 +52,36 @@ For an overview of the original English Universal Dependencies schemes, please l
 at:
 
   Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen,
-  Filip Ginter, Joakim Nivre and Christopher D. Manning. 2014. Universal Stanford
+  Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford
   dependencies: A cross-linguistic typology. 9th International Conference on
   Language Resources and Evaluation (LREC 2014).
-  http://nlp.stanford.edu/pubs/USD_LREC14_paper_camera_ready.pdf
-
-Please note, though, that some of the relations discussed in this paper
+  http://nlp.stanford.edu/~manning/papers/USD_LREC14_UD_revision.pdf
+  
+  and
+  
+  Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič,
+  Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira,
+  Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A Multilingual 
+  Treebank Collection. In Proceedings of the Tenth International Conference on Language 
+  Resources and Evaluation (LREC 2016).
+  http://nlp.stanford.edu/pubs/nivre2016ud.pdf
+  
+Please note, though, that some of the relations discussed in the first paper
 were subsequently updated and please refer to the online documentation at
 
-    http://universaldependencies.github.com/docs/
+    http://www.universaldependencies.org
 
 for up-to-date documentation of the set of relations.
 
+For an overview of the enhanced and enhanced++ dependency representations, please look 
+at:
+
+  Sebastian Schuster and Christopher D. Manning. 2016. Enhanced English Universal 
+  Dependencies: An Improved Representation for Natural Language Understanding Tasks. 
+  In Proceedings of the Tenth International Conference on Language Resources and 
+  Evaluation (LREC 2016).
+  http://nlp.stanford.edu/~sebschu/pubs/schuster-manning-lrec2016.pdf
+
 For an overview of the original typed dependencies scheme, please look
 at:
 
@@ -86,7 +107,7 @@ CHANGES IN ENGLISH TYPED DEPENDENCIES CODE -- v3.5.2
 Switch to Universal Dependencies as the default representation.
 Please see the Universal Dependencies documentation at
 
-      http://universaldependencies.github.io/docs/
+      http://www.universaldependencies.org
 
 for more information on the new relations.
 
 
@@ -553,4 +553,11 @@ @InProceedings{chang-tseng-jurafsky-manning:2009:SSST
 year      = {2009},
 address   = {Boulder, Colorado},
 url       = {pubs/ssst09-chang.pdf}
-}
+}
+
+@inproceedings{schuster2016enhanced,
+author = {Schuster, Sebastian and Manning, Christopher D.},
+booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
+title = {Enhanced {E}nglish {U}niversal {D}ependencies: An Improved Representation for Natural Language Understanding Tasks},
+year = {2016}
+}
@@ -10,6 +10,7 @@
     baseline=-0.6ex, inner sep=-0.1cm, edge horizontal padding=3pt, edge unit distance=1.5ex}
 \usepackage{natbib}
 \bibpunct{(}{)}{,}{a}{}{;}
+\usepackage{color}
 
 \setlength{\textwidth}{16cm}
 \setlength{\oddsidemargin}{-0.04cm}
@@ -34,19 +35,21 @@
 % Revised for the Stanford Parser v.\ 3.3 in November 2013
 % Revised for the Stanford Parser v.\ 3.3 in December 2013
 %Revised for the Stanford Parser v.\ 3.5.1 in February 2015
-Revised for the Stanford Parser v.\ 3.5.2 in April 2015
+%Revised for the Stanford Parser v.\ 3.5.2 in April 2015
+Revised for the Stanford Parser v.\ 3.7.0 in September 2016
 }
 
 \begin{document}
 \maketitle
 
+\color{red}
 Please note that this manual describes the original Stanford 
-Dependencies representation. As of version 3.5.2 the default representation
+Dependencies representation. As of version 3.5.2, the default representation
 output by the Stanford Parser and Stanford CoreNLP is the new Universal Dependencies (UD)
-representation. Take a look at the Universal Dependencies documentation 
-at \textsf{http://universaldependencies.github.com/docs/} for a description of UD 
-relations.
-
+representation, and we no longer maintain the original Stanford Dependencies representation. For a description of the UD 
+representation, take a look at the Universal Dependencies documentation at \textsf{http://www.universaldependencies.org} and
+the discussion of the \textit{enhanced} and \textit{enhanced++} UD representations by \citet{schuster2016enhanced}.
+\color{black}
 \section{Introduction}
 
 The Stanford typed dependencies representation was designed to provide
@@ -98,9 +101,8 @@ \section{Introduction}
 available for Chinese, but it is not further discussed here.  Starting
 in 2014, there has been work to extend Stanford Dependencies to be
 generally applicable cross-linguistically. Initial work appeared in
-\citet{marneffe14universal}, and the current proposal for Universal Dependencies (UD) can be found at
-\url{http://universaldependencies.github.io/docs/}. This work is not
-(yet) reflected in this manual or in our software.
+\citet{marneffe14universal}, and the current guidelines for Universal Dependencies (UD) can be found at
+\url{http://www.universaldependencies.org}.
 For SD, Section~\ref{def} of the manual defines the grammatical relations and the taxonomic hierarchy over
 them appears in section~\ref{hierarchy}.  This is then followed by a description of the several variant
 dependency representations available, aimed at different use cases
@@ -1068,7 +1070,7 @@ \subsubsection*{$\star$ \textbf{edu.stanford.nlp.parser.lexparser.LexicalizedPar
 \texttt{String[] sent = { "This", "is", "an", "easy", "sentence", "." }; \\
 Tree parse = lp.apply(Sentence.toWordList(sent)); \\
 GrammaticalStructure gs = gsf.newGrammaticalStructure(parse); \\
-Collection$\langle$TypedDependency$\rangle$ tdl = gs.typedDependenciesCCprocessed(); \\
+Collection$\langle$TypedDependency$\rangle$ tdl = gs.typedDependencies(); \\
 System.out.println(tdl); }
 \end{quote}
 
@@ -1250,7 +1252,7 @@ \section{Further references for Stanford Dependencies}\label{refs}
 team of collaborators has led to a new synthesis spanning
 tokenization, morphological features, parts of speech, and
 dependencies, known as Universal Dependencies: 
-\url{http://universaldependencies.github.io/docs/}.  Since version 3.5.2 
+\url{http://www.universaldependencies.org}.  Since version 3.5.2 
 the default representation output by our parser is the Universal Dependencies
 representation.
 
 
@@ -105,15 +105,11 @@ University of Stuttgart and the Seminar für Sprachwissenschaft of the
 University of Tübingen. See: 
 http://www.ims.uni-stuttgart.de/projekte/CQPDemos/Bundestag/help-tagset.html
 This model uses features from the distributional similarity clusters
-built over the HGC.
+built over the HGC (Huge German Corpus).
 Performance:
 96.90% on the first half of the remaining 20% of the Negra corpus (dev set)
 (90.33% on unknown words)
 
-german-dewac.tagger
-This model uses features from the distributional similarity clusters
-built from the deWac web corpus.
-
 german-fast.tagger
 Lacks distributional similarity features, but is several times faster
 than the other alternatives.