Please sign in to comment.
I think NER is done.
- Loading branch information...
Showing with 320 additions and 81 deletions.
- +17 −0 README_ner.md
- +3 −2 corenlp.thrift
- +15 −8 gen-py/corenlp/StanfordCoreNLP-remote
- +260 −51 gen-py/corenlp/StanfordCoreNLP.py
- BIN gen-py/corenlp/StanfordCoreNLP.pyc
- BIN gen-py/corenlp/__init__.pyc
- BIN gen-py/corenlp/ttypes.pyc
- +8 −7 scripts/ner_client.py
- +11 −3 src/StanfordCoreNLPHandler.java
- +6 −1 src/ner/StanfordNERThrift.java
- +0 −9 src/parser/StanfordParserThrift.java
- BIN stanford-corenlp-wrapper.jar
|@@ -0,0 +1,17 @@|
|+How to Get Named Entities from the Stanford Named Entity Recognizer (NER) via this Apache Thrift Server|
|+## How to Interact with the Methods and Data Structures|
|+The core return type here is a data structure called `NamedEntity` which has four members:|
|+* `entity`: A string containing the actual named entity itself, potentially a multi-word expression if that's what Stanford NER recognized.|
|+* `tag`: A string containing the tag assigned to this named entity (PERSON, LOCATION, etc.). Should always be upper-case.|
|+* `startOffset`: All named entities exist in some sentence. This integer represents the starting character offset of this named entity in its sentence.|
|+* `endOffset`: Like `startOffset`, only tells you the character offset of the last character of the named entity in its sentence.|
|+In order to get these `NamedEntity` objects, you have three choices, depending on what kind of data you'd like to recognize named entities in. The return type for ALL of these is a Java `ArrayList`/Python list containing `NamedEntity` objects corresponding to entities recognized across the ENTIRETY of your text, no matter how many sentences, parse trees, etc. were passed in. If you'd like to recognize named entities in:|
|+* arbitrary (potentially several sentences worth of), untokenized, un-parsed, un-tagged text, and you're cool with CoreNLP handling all of those tasks for you, call `get_entities_from_text(text)`. `text` is a Java `String`/Python `str` or `unicode`.|
|+* one sentence worth of tokens (the output from some sentence and then word tokenizer), call `get_entities_from_tokens(tokens)`, where `tokens` is a Java `List<String>`/Python list containing `str`/`unicode`. Since Stanford NER requires either parse trees or POS-tagged text, the Stanford Parser will be called.|
|+* one or more parse trees in Stanford Parser's "oneline" output format, call `get_entities_from_trees(trees)`, where `trees` is a Java `List<String>`/Python list containing `str`/`unicode`.|
Oops, something went wrong.