Clojure

README.md

corenlp

Current build: CircleCI

Natural language processing in Clojure/ClojureScript based on the Stanford-CoreNLP parser.

Warning: Under heavy rewrite. Please refrain from trying to use this until it is complete!

Usage

Tokenization

(use 'corenlp)
(def text "This is a simple sentence.")
(tokenize text)

Part-of-Speech Tagging

(use 'corenlp)
(pos-tag (tokenize "Colorless green ideas sleep furiously."))
;; => [#<TaggedWord Colorless/JJ> #<TaggedWord green/JJ> ...]

Returns a list of TaggedWord objects. Call .tag() on a TaggedWord instance to get its tag. For more information, see the relevant Javadoc

Parsing

To parse a sentence:

(use 'corenlp)
(parse (tokenize text))

You will get back a LabeledScoredTreeNode which you can plug in to other Stanford CoreNLP functions or can convert to a standard Treebank string with:

(str (parse (tokenize text)))

Stanford Dependencies

(dependency-graph "I like cheese.")

will parse the sentence and return the dependency graph as a loom graph, which you can then traverse with standard graph algorithms like shortest path, etc. You can also view it:

(def graph (dependency-graph "I like cheese."))
(use 'loom.io)
(view graph)

This requires GraphViz to be installed.

License

Copyright (C) 2011-2016 Contributors (Clojure code only)

Distributed under the Eclipse Public License, the same as Clojure.

Contributors

  • Cory Giles
  • Hans Engel
  • Damien Stanton