## Surprisal Models
**Brief**:<br>
This will be the main file for all of our model loading, data organizing, searching, etc.<br>
**Sections**:
1. [Starting A StanfordCoreNLP Server](#1)
    - [Background](#1_a)
    - [How To Run The Server](#1_b)
    - [Resources](#1_c)
    - [Code](#1_d)
2. [Applying Models](#2)
___
<a id='1'>

### Starting A StanfordCoreNLP Server
<a id='1_a'>

**Background**:<br>
We've spent a lot of time looking at the StanfordCoreNLP software, and we ultimately decided to use both the parser and the tagger (really, the parser runs the tagger automatically, but we'll see that later). While the software is very useful, it's written in Java, which is not quite as nice to play with as Python for a number of reasons (a smaller NLP library ecosystem, more boilerplate, etc). The problem, then, is finding a way to use a Java program as if it were a Python program.<br><br>
The two main options I looked into were a traditional import and a private server. One way to solve our problem is a traditional "wrapper" library, which is essentially a translator between Java and Python. Unfortunately, the Stanford team itself doesn't actually make these wrappers (they would have to make them for a *lot* of languages). The existing wrappers, specifically the <u>stanfordcorenlp</u> library, weren't available through Anaconda (the platform through which we are running this exciting program right now), so I went another route.<br><br>
The direction I chose was to host a server that runs the out-of-the-box Java program, and to access it through a Python API. This involves a small amount of command line setup, but it saves the trouble of changing environment variables or using directories in Python.<br><br>
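
To make the server route a bit more concrete, here is a minimal sketch of what talking to a running CoreNLP server looks like at the HTTP level, using only the standard library. It assumes the server is listening on `http://localhost:9000` and replies in CoreNLP's JSON output format. The helper names (`build_request`, `annotate`, `extract_pos_tags`) and the abbreviated `sample_response` are made up for illustration and are not part of any library. In practice NLTK's `CoreNLPParser` does this work for us.

```python
import json
import urllib.parse
import urllib.request


def build_request(text, url="http://localhost:9000",
                  annotators="tokenize,ssplit,pos,parse"):
    """Build (without sending) a POST request for a CoreNLP server.

    The annotator list and output format travel in the `properties`
    query parameter; the raw text is the request body.
    """
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        f"{url}/?{query}",
        data=text.encode("utf-8"),
        # Assumption: plain-text body, so the server does not URL-decode it.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def annotate(text, **kwargs):
    """Send the text to the server and decode the JSON reply.

    Only works while a server is actually running at the given URL.
    """
    request = build_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_pos_tags(annotation):
    """Collect (word, tag) pairs from every sentence in a decoded reply."""
    return [
        (token["word"], token["pos"])
        for sentence in annotation["sentences"]
        for token in sentence["tokens"]
    ]


# Hand-written, heavily abbreviated stand-in for a server reply
# (illustrative only; a real reply carries many more fields).
sample_response = {
    "sentences": [
        {"tokens": [{"word": "France", "pos": "NNP"},
                    {"word": "is", "pos": "VBZ"}]}
    ]
}
print(extract_pos_tags(sample_response))
```

With the server up, `extract_pos_tags(annotate("The King of France is bald."))` would return the tags for the whole sentence, which is one way to see that requesting the parser also gets us the tagger.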
<a id='1_b'>

**How To Run The Server**:<br>
*Note*: Huge thanks to Khalid Alnajjar; his guide is linked in the [Resources](#1_c).<br>
I've never actually hosted any sort of server before, so here's a quick summary:
1. Download and extract CoreNLP somewhere.
2. On the command line, `cd` into that directory.
    - *stanford-corenlp-full-2018-10-05* should be the folder.
3. Run this command to host the server:
    - `java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000`
4. Pick up [here](#1_d).
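
The launch command from step 3 can also be assembled and started from Python, which keeps the port and timeout in one place. This is only a sketch: `build_server_command` and `start_server` are hypothetical helper names, the flags are copied from the command above, and it assumes `java` is on the PATH and that `corenlp_dir` points at the extracted CoreNLP folder.

```python
import subprocess


def build_server_command(port=9000, timeout_ms=30000, memory="4g",
                         annotators="tokenize,ssplit,pos,lemma,parse,sentiment"):
    """Return the argument list for the CoreNLP server command from step 3."""
    return [
        "java", f"-mx{memory}",
        "-cp", "*",  # passed literally; Java expands the classpath wildcard itself
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-annotators", annotators,
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


def start_server(corenlp_dir, **options):
    """Start the server in the background, running from inside corenlp_dir."""
    return subprocess.Popen(build_server_command(**options), cwd=corenlp_dir)


# Show the command that would be run with the default settings.
print(" ".join(build_server_command()))
```

`start_server(...)` returns a `Popen` handle; calling `.terminate()` on it shuts the server down when you're done.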
<a id='1_c'>

**Resources**:<br>
1. Documentation for the parser (NLTK's `nltk.parse.stanford` module source): http://www.nltk.org/_modules/nltk/parse/stanford.html
2. For more on running the server: https://www.khalidalnajjar.com/setup-use-stanford-corenlp-server-python/
3. StanfordCoreNLP's GitHub page: https://github.com/stanfordnlp/CoreNLP

<a id='1_d'>

**Code**:

In [6]:
# Imports
import nltk
import pickle
from nltk import StanfordPOSTagger
from nltk.parse import stanford
from nltk.parse import CoreNLPParser
from IPython.core.interactiveshell import InteractiveShell
# Display the value of every expression in a cell, not just the last one
InteractiveShell.ast_node_interactivity = "all"

In [7]:
# Build a parser client that talks to the local CoreNLP server (which must already be running)
parser = CoreNLPParser(url='http://localhost:9000')

In [9]:
# Test the parser on a sentence
list(parser.raw_parse('The King of France is Bald.'))
# To see the list of available attributes and methods:
# dir(parser)

[Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NP', [Tree('DT', ['The']), Tree('NNP', ['King'])]), Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('NNP', ['France'])])])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Bald'])])]), Tree('.', ['.'])])])]

TypeError: make_tree() missing 1 required positional argument: 'result'