CoreNLP.jl

An interface to Stanford's CoreNLP toolkit via the corenlp-python bindings.

DEPRECATION NOTICE This package is not currently maintained for current versions of Julia. If you are interested in getting it working, feel free to open issues.

Features

State-of-the-art dependency parsing, part-of-speech tagging, coreference resolution, tokenization, lemmatization, and named-entity recognition for English text.

Installation

You must have Python 2.7 installed in a place where PyCall can find it. The Enthought distribution has proven problematic, but Anaconda works fine.

  1. Install CoreNLP.jl from the Julia REPL:
Pkg.clone("CoreNLP")
  2. Download the CoreNLP package and unzip it to a location of your choosing. Direct download link: Version 3.3.1.
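If CoreNLP.jl later fails to load, the usual culprit is PyCall picking up a Python other than the 2.7 installation mentioned above. As a quick sanity check (a minimal sketch, not part of this package), you can ask PyCall which Python it is using:

> using PyCall
> PyCall.pyversion # Should report a 2.7.x version. If it doesn't, reconfigure PyCall to point at your Python 2.7 install.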

Usage

Interactive mode (uses CoreNLP's command-line interface behind the scenes; suitable for parsing roughly 20 sentences or fewer):

> using CoreNLP
> corenlp_init("/Users/malmaud/stanford-corenlp-full-2014-01-04") # Replace this with the directory where you extracted Stanford's CoreNLP toolkit. This may take a few minutes to execute, since the large statistical language models are loaded into memory.
> my_parse = parse("This is a simple sentence. We will see how well it parses.")
> my_parse.sentences[2].dep_parse # The dependency parse for the second sentence. The first two numbers of each DepNode are the indices of the child and parent word tokens.
DepParse([DepNode(3,0,"root"),DepNode(1,3,"nsubj"),DepNode(2,3,"aux"),DepNode(4,5,"advmod"),DepNode(5,7,"advmod"),DepNode(6,7,"nsubj"),DepNode(7,3,"ccomp")])
> my_parse.sentences[2].words[3] # The third word token in the second sentence, with all its annotations
Word("see","see","O","VB")
> my_parse.corefs[1].mentions # The set of all mentions that correspond to my_parse.corefs[1].repr (the representative mention), each identified by a (sentence, word-start, word-end) address. The last coordinate is the index of the root word of the coreference.
2-element Array{Mention,1}:
 Mention(1,1,1,1)
 Mention(2,6,6,6)
> pprint(my_parse) # Pretty-printing

This outputs

Coreferencing "a simple sentence (Mention(1,3,5,5))":
This (Mention(1,1,1,1))
it (Mention(2,6,6,6))

Sentence 1:
Words:
Word("This","this","O","DT")
Word("is","be","O","VBZ")
Word("a","a","O","DT")
Word("simple","simple","O","JJ")
Word("sentence","sentence","O","NN")
Word(".",".","O",".")
Dependency parse:
sentence <=(root) ROOT
This <=(nsubj) sentence
is <=(cop) sentence
a <=(det) sentence
simple <=(amod) sentence

Sentence 2:
Words:
Word("We","we","O","PRP")
Word("will","will","O","MD")
Word("see","see","O","VB")
Word("how","how","O","WRB")
Word("well","well","O","RB")
Word("it","it","O","PRP")
Word("parses","parse","O","VBZ")
Word(".",".","O",".")
Dependency parse:
see <=(root) ROOT
We <=(nsubj) see
will <=(aux) see
how <=(advmod) well
well <=(advmod) parses
it <=(nsubj) parses
parses <=(ccomp) see
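
Beyond pprint, you can walk the annotation objects directly. The following is a minimal sketch, not part of the documented API: it rebuilds the "child <=(rel) parent" lines for the second sentence and prints the text covered by each coreference mention. It assumes the fields of DepNode and Mention are stored in the order they print above (child index, parent index, relation; and sentence, word-start, word-end, root word), that a Word's first field is the surface token, and that a parent index of 0 denotes ROOT.

sentence = my_parse.sentences[2]
for node in getfield(sentence.dep_parse, 1)          # DepParse wraps a single array of DepNodes (assumed)
    child_idx  = getfield(node, 1)
    parent_idx = getfield(node, 2)
    relation   = getfield(node, 3)
    child  = getfield(sentence.words[child_idx], 1)  # surface token of the child word (assumed first field)
    parent = parent_idx == 0 ? "ROOT" : getfield(sentence.words[parent_idx], 1)
    println(child, " <=(", relation, ") ", parent)
end

for coref in my_parse.corefs, m in coref.mentions
    s = my_parse.sentences[getfield(m, 1)]           # sentence containing the mention
    span = [getfield(w, 1) for w in s.words[getfield(m, 2):getfield(m, 3)]]  # inclusive word-start:word-end
    println(join(span, " "))
end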

Batch mode (process a directory of files):

> using CoreNLP
> batch_parse("/data/my_files", "/Users/malmaud/stanford-corenlp-full-2014-01-04", memory="8g")

This processes each text file in the folder /data/my_files and returns an array of Annotation objects, one for each file. The memory keyword argument controls how much memory is allocated to the Java virtual machine. Each invocation of batch_parse reloads CoreNLP into memory.
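
A minimal sketch of consuming the result, assuming each returned Annotation exposes the same sentences / words structure seen in interactive mode (the loop body and variable names here are illustrative, not part of the documented API):

annotations = batch_parse("/data/my_files", "/Users/malmaud/stanford-corenlp-full-2014-01-04", memory="8g")
for ann in annotations
    n_words = sum([length(s.words) for s in ann.sentences])  # total word tokens in this file
    println("Parsed ", length(ann.sentences), " sentences (", n_words, " word tokens)")
end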
