# Stanford CoreNLP Server usage

Running Stanford CoreNLP: https://stanfordnlp.github.io/CoreNLP/index.html  
Using it from nltk: https://github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK

1. Download the latest version of CoreNLP from the website above (4.2.2 is used here).
2. Start the server as described on the GitHub wiki page linked above.
3. Use nltk as shown below.

Start the Java server (run this from the unpacked CoreNLP directory so that `-cp "*"` picks up its jars):
```bash
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,ner,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000 & 
```

Stopping the server:

- On macOS: find the process ID with `ps aux | grep StanfordCoreNLPServer`, then kill that process.
- On Windows: press Ctrl+C in the terminal running the server (this worked in practice).
- See also: https://stackoverflow.com/questions/55896197/an-elegant-way-to-shut-down-the-stanford-corenlp-server-on-macos

In [1]:
from nltk.parse import CoreNLPParser
from nltk.parse.corenlp import CoreNLPDependencyParser

# Lexical Parser
parser = CoreNLPParser(url='http://localhost:9000')
# POS tagger
pos_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
# NER Tagger
ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
# Neural Dependency Parser
dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')

# Example sentence used throughout
sentence = "Alice was a beautiful girl, because she didn't do homework."

### Tokenize

In [2]:
tokens = list(parser.tokenize(sentence))
tokens

['Alice',
 'was',
 'a',
 'beautiful',
 'girl',
 ',',
 'because',
 'she',
 'did',
 "n't",
 'do',
 'homework',
 '.']

### POS tagger (part-of-speech)

In [3]:
pos_tags = list(pos_tagger.tag(tokens))
pos_tags

[('Alice', 'NNP'),
 ('was', 'VBD'),
 ('a', 'DT'),
 ('beautiful', 'JJ'),
 ('girl', 'NN'),
 (',', ','),
 ('because', 'IN'),
 ('she', 'PRP'),
 ('did', 'VBD'),
 ("n't", 'RB'),
 ('do', 'VB'),
 ('homework', 'NN'),
 ('.', '.')]
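
The tags above follow the Penn Treebank convention, where noun tags start with `NN` and verb tags with `VB`. As a minimal sketch (the helper name `words_with_tag_prefix` is hypothetical and not part of nltk), tokens can be filtered by coarse word class like this:

```python
# Output of pos_tagger.tag(tokens), copied from the cell above.
pos_tags = [('Alice', 'NNP'), ('was', 'VBD'), ('a', 'DT'), ('beautiful', 'JJ'),
            ('girl', 'NN'), (',', ','), ('because', 'IN'), ('she', 'PRP'),
            ('did', 'VBD'), ("n't", 'RB'), ('do', 'VB'), ('homework', 'NN'),
            ('.', '.')]


def words_with_tag_prefix(tagged_tokens, prefix):
    """Return the words whose POS tag starts with the given prefix."""
    return [word for word, tag in tagged_tokens if tag.startswith(prefix)]


nouns = words_with_tag_prefix(pos_tags, 'NN')  # NN, NNS, NNP, NNPS
verbs = words_with_tag_prefix(pos_tags, 'VB')  # VB, VBD, VBG, VBN, VBP, VBZ
print(nouns)
print(verbs)
```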

### NER tagger (named entity recognition)

In [4]:
ner_tags = list(ner_tagger.tag(tokens))
ner_tags

[('Alice', 'PERSON'),
 ('was', 'O'),
 ('a', 'O'),
 ('beautiful', 'O'),
 ('girl', 'O'),
 (',', 'O'),
 ('because', 'O'),
 ('she', 'O'),
 ('did', 'O'),
 ("n't", 'O'),
 ('do', 'O'),
 ('homework', 'O'),
 ('.', 'O')]
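
The tagger labels each token separately, so a multi-token name comes back as several tagged tokens. The sketch below shows one possible way to merge them into entity spans; `group_entities` is a hypothetical helper, and it assumes that adjacent tokens with the same non-`O` tag belong to one entity, which would wrongly merge two distinct entities of the same type that stand next to each other.

```python
from itertools import groupby

# Output of ner_tagger.tag(tokens), copied from the cell above.
ner_tags = [('Alice', 'PERSON'), ('was', 'O'), ('a', 'O'), ('beautiful', 'O'),
            ('girl', 'O'), (',', 'O'), ('because', 'O'), ('she', 'O'),
            ('did', 'O'), ("n't", 'O'), ('do', 'O'), ('homework', 'O'),
            ('.', 'O')]


def group_entities(tagged_tokens):
    """Merge runs of adjacent tokens that share the same non-'O' tag."""
    entities = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != 'O':
            entities.append((' '.join(token for token, _ in run), tag))
    return entities


entities = group_entities(ner_tags)
print(entities)
```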

### Dependency parsing

In [5]:
parses = dep_parser.parse(tokens)
[list(parse.triples()) for parse in parses]

[[(('girl', 'NN'), 'nsubj', ('Alice', 'NNP')),
  (('girl', 'NN'), 'cop', ('was', 'VBD')),
  (('girl', 'NN'), 'det', ('a', 'DT')),
  (('girl', 'NN'), 'amod', ('beautiful', 'JJ')),
  (('girl', 'NN'), 'punct', (',', ',')),
  (('girl', 'NN'), 'advcl', ('do', 'VB')),
  (('do', 'VB'), 'mark', ('because', 'IN')),
  (('do', 'VB'), 'nsubj', ('she', 'PRP')),
  (('do', 'VB'), 'aux', ('did', 'VBD')),
  (('do', 'VB'), 'advmod', ("n't", 'RB')),
  (('do', 'VB'), 'obj', ('homework', 'NN')),
  (('girl', 'NN'), 'punct', ('.', '.'))]]
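
Each triple has the form `(governor, relation, dependent)`, with governor and dependent given as `(word, tag)` pairs. As a rough sketch, the triples can be grouped by governor to look up, say, the subject and object of a verb. `dependents_by_governor` is a hypothetical helper, and keying on the word form alone is an assumption that breaks down when the same word occurs twice in a sentence.

```python
from collections import defaultdict

# Triples of the first parse, copied from the output above.
triples = [
    (('girl', 'NN'), 'nsubj', ('Alice', 'NNP')),
    (('girl', 'NN'), 'cop', ('was', 'VBD')),
    (('girl', 'NN'), 'det', ('a', 'DT')),
    (('girl', 'NN'), 'amod', ('beautiful', 'JJ')),
    (('girl', 'NN'), 'punct', (',', ',')),
    (('girl', 'NN'), 'advcl', ('do', 'VB')),
    (('do', 'VB'), 'mark', ('because', 'IN')),
    (('do', 'VB'), 'nsubj', ('she', 'PRP')),
    (('do', 'VB'), 'aux', ('did', 'VBD')),
    (('do', 'VB'), 'advmod', ("n't", 'RB')),
    (('do', 'VB'), 'obj', ('homework', 'NN')),
    (('girl', 'NN'), 'punct', ('.', '.')),
]


def dependents_by_governor(triples):
    """Map each governor word to its list of (relation, dependent word)."""
    grouped = defaultdict(list)
    for (gov_word, _gov_tag), relation, (dep_word, _dep_tag) in triples:
        grouped[gov_word].append((relation, dep_word))
    return dict(grouped)


grouped = dependents_by_governor(triples)
# Subject and object of the verb "do".
subject = [word for rel, word in grouped['do'] if rel == 'nsubj']
obj = [word for rel, word in grouped['do'] if rel == 'obj']
print(subject, obj)
```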

### General parsing & drawing

In [6]:
annotated_tree = list(parser.raw_parse(sentence))
print(annotated_tree)
annotated_tree[0].pretty_print()

[Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NNP', ['Alice'])]), Tree('VP', [Tree('VBD', ['was']), Tree('NP', [Tree('DT', ['a']), Tree('JJ', ['beautiful']), Tree('NN', ['girl'])]), Tree(',', [',']), Tree('SBAR', [Tree('IN', ['because']), Tree('S', [Tree('NP', [Tree('PRP', ['she'])]), Tree('VP', [Tree('VBD', ['did']), Tree('RB', ["n't"]), Tree('VP', [Tree('VB', ['do']), Tree('NP', [Tree('NN', ['homework'])])])])])])]), Tree('.', ['.'])])])]
                                   ROOT                                   
                                    |                                      
                                    S                                     
   _________________________________|___________________________________   
  |                                 VP                                  | 
  |     ____________________________|_____                              |  
  |    |          |           |          SBAR                           | 
  |    |          |          

## Own functions

Reason: nltk (3.6.2) does not expose every CoreNLP server feature, so a small helper class, `CoreNLPHelper`, covers the missing ones.

In [7]:
from CoreNLPHelper import CoreNLPHelper
nlp_helper = CoreNLPHelper(core_nlp_server_url='http://localhost:9000')
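
The source of `CoreNLPHelper` is not shown here. Features that nltk does not wrap can in principle be reached by posting text to the server with a `properties` query parameter, as the CoreNLP server documentation describes. The following is only a sketch of that idea using the standard library; `build_request` and `annotate` are hypothetical names, not the helper's actual methods, and the explicit `text/plain` content type is an assumption made so that urllib does not fall back to its form-encoded default.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(text, annotators, server_url='http://localhost:9000'):
    """Build a POST request asking the server to run the given annotators."""
    properties = {'annotators': ','.join(annotators), 'outputFormat': 'json'}
    query = urlencode({'properties': json.dumps(properties)})
    return Request(
        f'{server_url}/?{query}',
        data=text.encode('utf-8'),
        headers={'Content-Type': 'text/plain; charset=utf-8'},
        method='POST',
    )


def annotate(text, annotators, server_url='http://localhost:9000', timeout=15):
    """Send the request and decode the JSON reply (needs a running server)."""
    request = build_request(text, annotators, server_url)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


# Building the request does not contact the server.
request = build_request("A stone was found by researchers.",
                        ['tokenize', 'ssplit', 'pos', 'parse', 'sentiment'])
print(request.full_url)
```

Calling `annotate(...)` with the server from the start of this notebook running should return the parsed JSON document; the exact keys depend on the annotators requested.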

### Active/passive identification

In [8]:
active_sentence = "Researchers found a stone so I am happy."
passive_sentence = "A stone was found by researchers."

print(nlp_helper.is_sentence_passive(active_sentence))
print(nlp_helper.is_sentence_passive(passive_sentence))

False
True
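
How `is_sentence_passive` works internally is not shown. One plausible approach, sketched here as an assumption, is to look for passive-specific dependency relations. Universal Dependencies v2 marks them with a `:pass` subtype (`nsubj:pass`, `aux:pass`, `csubj:pass`), and the sketch assumes the server emits such labels. The function name `has_passive_relation` and the second relation list are hypothetical illustrations, not server output.

```python
def has_passive_relation(relations):
    """Return True if any dependency relation carries the ':pass' subtype."""
    return any(relation.endswith(':pass') for relation in relations)


# Relations taken from the dependency parse of the Alice sentence above.
active_relations = ['nsubj', 'cop', 'det', 'amod', 'punct', 'advcl',
                    'mark', 'nsubj', 'aux', 'advmod', 'obj', 'punct']

# Hand-written relations of the kind expected for a passive clause
# (hypothetical input, not copied from a real server response).
passive_relations = ['det', 'nsubj:pass', 'aux:pass', 'case', 'obl', 'punct']

print(has_passive_relation(active_relations))
print(has_passive_relation(passive_relations))
```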


### Sentiment

In [9]:
print(f"Sentiment value is: {nlp_helper.get_sentiment_value(sentence)}")
print("Sentiment distribution:")
nlp_helper.get_sentiment_distribution(sentence)

Sentiment value is: 3
Sentiment distribution:


[0.01376237703559,
 0.06090227273432,
 0.13703558107487,
 0.50267238875958,
 0.28562738039563]
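
The distribution has five entries, one per sentiment class, from very negative (index 0) to very positive (index 4). In this output the reported value 3 matches the index of the largest probability. A minimal sketch of that relationship follows; the label strings are chosen for readability and are not necessarily the exact strings the server returns.

```python
# Sentiment distribution copied from the output above.
distribution = [0.01376237703559,
                0.06090227273432,
                0.13703558107487,
                0.50267238875958,
                0.28562738039563]

# Human-readable names for the five classes (illustrative spelling).
LABELS = ['Very negative', 'Negative', 'Neutral', 'Positive', 'Very positive']

# Index of the most probable class.
value = max(range(len(distribution)), key=distribution.__getitem__)
print(value, LABELS[value])

# Probability-weighted average class, a finer-grained score on the 0-4 scale.
expected_value = sum(index * p for index, p in enumerate(distribution))
print(round(expected_value, 2))
```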

### Coreference

In [10]:
corefs = nlp_helper.get_coreferences(sentence)
corefs

{'2': [{'id': 0,
   'text': 'Alice',
   'type': 'PROPER',
   'number': 'SINGULAR',
   'gender': 'FEMALE',
   'animacy': 'ANIMATE',
   'startIndex': 1,
   'endIndex': 2,
   'headIndex': 1,
   'sentNum': 1,
   'position': [1, 1],
   'isRepresentativeMention': True},
  {'id': 2,
   'text': 'she',
   'type': 'PRONOMINAL',
   'number': 'SINGULAR',
   'gender': 'FEMALE',
   'animacy': 'ANIMATE',
   'startIndex': 8,
   'endIndex': 9,
   'headIndex': 8,
   'sentNum': 1,
   'position': [1, 3],
   'isRepresentativeMention': False}]}
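
Each chain lists its mentions with 1-based `startIndex` and exclusive `endIndex` token positions, and one mention is flagged as representative. As a sketch of how this could be used, the snippet below replaces pronominal mentions with the representative mention's text. `resolve_pronouns` is a hypothetical helper, the `corefs` dict is a trimmed copy of the output above with only the fields used, and the approach assumes single-sentence input.

```python
# Tokens from the tokenize cell and a trimmed copy of the coreference output.
tokens = ['Alice', 'was', 'a', 'beautiful', 'girl', ',', 'because', 'she',
          'did', "n't", 'do', 'homework', '.']
corefs = {'2': [
    {'id': 0, 'text': 'Alice', 'type': 'PROPER', 'startIndex': 1,
     'endIndex': 2, 'sentNum': 1, 'isRepresentativeMention': True},
    {'id': 2, 'text': 'she', 'type': 'PRONOMINAL', 'startIndex': 8,
     'endIndex': 9, 'sentNum': 1, 'isRepresentativeMention': False},
]}


def resolve_pronouns(tokens, corefs, sent_num=1):
    """Replace pronominal mentions with their chain's representative text."""
    resolved = list(tokens)
    for chain in corefs.values():
        representative = next(m for m in chain if m['isRepresentativeMention'])
        for mention in chain:
            if mention['type'] == 'PRONOMINAL' and mention['sentNum'] == sent_num:
                # Convert the 1-based start / exclusive end to a Python slice.
                start, end = mention['startIndex'] - 1, mention['endIndex'] - 1
                # One list element replaces a one-token pronoun, so the
                # positions of later tokens do not shift.
                resolved[start:end] = [representative['text']]
    return resolved


resolved = resolve_pronouns(tokens, corefs)
print(' '.join(resolved))
```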