## Stanford NLP

- [usage](https://stanfordnlp.github.io/stanfordnlp/installation_download.html)
- [supported languages](https://stanfordnlp.github.io/stanfordnlp/installation_download.html)
- [python support](https://github.com/Lynten/stanford-corenlp)

In [27]:
import stanfordnlp
import json

# stanfordnlp.download('en')   # Uncomment on first run to download the English models for the neural pipeline

In [28]:
nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
doc = nlp("Biff Tannen")

for sent in doc.sentences:
    sent.print_dependencies()

Use device: gpu
---
Loading: tokenize
With settings: 
{'model_path': '/home/casey/stanfordnlp_resources/en_ewt_models/en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}


RuntimeError: CUDA error: unspecified launch failure

Note: the pipeline fails while loading the tokenizer model on the GPU. If the CUDA setup cannot be fixed, `stanfordnlp.Pipeline(use_gpu=False)` should run the same pipeline on the CPU (the `use_gpu` flag is assumed from the stanfordnlp `Pipeline` signature; check the installed version).

In [30]:
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('/home/casey/git/stanford-corenlp-full-2017-06-09', lang='en')

sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print('Tokenize:', nlp.word_tokenize(sentence))
print('Part of Speech:', nlp.pos_tag(sentence))
print('Named Entities:', nlp.ner(sentence))
print('Constituency Parsing:', nlp.parse(sentence))
print('Dependency Parsing:', nlp.dependency_parse(sentence))

nlp.close()  # shut down the Java backend started by the wrapper

Tokenize: ['Guangdong', 'University', 'of', 'Foreign', 'Studies', 'is', 'located', 'in', 'Guangzhou', '.']
Part of Speech: [('Guangdong', 'NNP'), ('University', 'NNP'), ('of', 'IN'), ('Foreign', 'NNP'), ('Studies', 'NNPS'), ('is', 'VBZ'), ('located', 'JJ'), ('in', 'IN'), ('Guangzhou', 'NNP'), ('.', '.')]
Named Entities: [('Guangdong', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('Foreign', 'ORGANIZATION'), ('Studies', 'ORGANIZATION'), ('is', 'O'), ('located', 'O'), ('in', 'O'), ('Guangzhou', 'LOCATION'), ('.', 'O')]
Constituency Parsing: (ROOT
  (S
    (NP
      (NP (NNP Guangdong) (NNP University))
      (PP (IN of)
        (NP (NNP Foreign) (NNPS Studies))))
    (VP (VBZ is)
      (ADJP (JJ located)
        (PP (IN in)
          (NP (NNP Guangzhou)))))
    (. .)))
Dependency Parsing: [('ROOT', 0, 7), ('compound', 2, 1), ('nsubjpass', 7, 2), ('case', 5, 3), ('compound', 5, 4), ('nmod', 2, 5), ('auxpass', 7, 6), ('case', 9, 8), ('nmod', 7, 9), ('punct', 7, 
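
The dependency triples above appear to be `(relation, head index, dependent index)`, with 1-based token positions and `0` standing for the root. Under that assumption, a minimal sketch that maps the indices back to words could look like this. The helper name `label_dependencies` is hypothetical, and only the triples that appear complete in the output are used:

```python
# Tokens copied from the "Tokenize" output above.
tokens = ['Guangdong', 'University', 'of', 'Foreign', 'Studies',
          'is', 'located', 'in', 'Guangzhou', '.']

# Only the triples that are complete in the "Dependency Parsing" output.
triples = [('ROOT', 0, 7), ('compound', 2, 1), ('nsubjpass', 7, 2),
           ('case', 5, 3), ('compound', 5, 4), ('nmod', 2, 5),
           ('auxpass', 7, 6), ('case', 9, 8), ('nmod', 7, 9)]


def label_dependencies(tokens, triples):
    """Replace 1-based token indices with words (assumes 0 means ROOT)."""
    def word(index):
        return 'ROOT' if index == 0 else tokens[index - 1]

    return [(rel, word(head), word(dep)) for rel, head, dep in triples]


for rel, head, dep in label_dependencies(tokens, triples):
    print(f'{rel:10} {head} -> {dep}')
```

Reading the relations as word pairs makes it easier to check the parse by eye, for example that `located` is the root and `Guangzhou` attaches to it.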

#### Running from Server

- download [core-nlp-full](https://stanfordnlp.github.io/CoreNLP/download.html)
- from the extracted (or cloned) folder, start the server on port 9000 with a 15-second (15000 ms) timeout: `java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000`
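
Once the server is running, it can also be queried without the wrapper library. The sketch below uses only the Python standard library and assumes the server listens on `http://localhost:9000` as started above. The helper names `build_server_url` and `annotate_text` are hypothetical, and the request shape (raw text as the POST body, a JSON `properties` query parameter) follows the CoreNLP server documentation as I understand it:

```python
import json
import urllib.parse
import urllib.request


def build_server_url(host='http://localhost', port=9000, properties=None):
    """Build the request URL; annotator settings go in a JSON 'properties' parameter."""
    props = properties or {'annotators': 'tokenize,ssplit,pos',
                           'outputFormat': 'json'}
    query = urllib.parse.urlencode({'properties': json.dumps(props)})
    return f'{host}:{port}/?{query}'


def annotate_text(text, url, timeout=15):
    """POST the raw text to the server and parse the JSON reply into a dict."""
    request = urllib.request.Request(url, data=text.encode('utf-8'))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


# Usage (requires the server from the step above to be running):
# result = annotate_text('Guangdong University of Foreign Studies is located in Guangzhou.',
#                        build_server_url())
```

Keeping URL construction separate from the network call makes it possible to inspect the request before sending it.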


#### annotators: tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref
[annotators info](https://github.com/Lynten/stanford-corenlp)

In [24]:
from IPython.display import JSON

nlp = StanfordCoreNLP('http://localhost', port=9000)
text = 'Guangdong University of Foreign Studies is located in Guangzhou. ' \
       'GDUFS is active in a full range of international cooperation and exchanges in education. '

props = {'annotators': 'tokenize,ssplit,pos', 'pipelineLanguage': 'en', 'outputFormat': 'json'}

obj = nlp.annotate(text, properties=props)
nlp.close()

json.loads(obj)  # parse the JSON string returned by the server into a Python dict

{'sentences': [{'index': 0,
   'tokens': [{'index': 1,
     'word': 'Guangdong',
     'originalText': 'Guangdong',
     'characterOffsetBegin': 0,
     'characterOffsetEnd': 9,
     'pos': 'NNP',
     'before': '',
     'after': ' '},
    {'index': 2,
     'word': 'University',
     'originalText': 'University',
     'characterOffsetBegin': 10,
     'characterOffsetEnd': 20,
     'pos': 'NNP',
     'before': ' ',
     'after': ' '},
    {'index': 3,
     'word': 'of',
     'originalText': 'of',
     'characterOffsetBegin': 21,
     'characterOffsetEnd': 23,
     'pos': 'IN',
     'before': ' ',
     'after': ' '},
    {'index': 4,
     'word': 'Foreign',
     'originalText': 'Foreign',
     'characterOffsetBegin': 24,
     'characterOffsetEnd': 31,
     'pos': 'NNP',
     'before': ' ',
     'after': ' '},
    {'index': 5,
     'word': 'Studies',
     'originalText': 'Studies',
     'characterOffsetBegin': 32,
     'characterOffsetEnd': 39,
     'pos': 'NNPS',
     'before': ' ',
     