# Syntax (Lab 6)

In [1]:
__author__ = "Alex Wang"
__version__ = "DSGA 1012, NYU, Spring 2019 term"

In this lab, we will familiarize ourselves with constituency and dependency parsing grammars.

## Playing with Parsing

We will be using online parsers that output tree structures based on Penn Treebank tags, and dependency structures based on Universal Dependencies tags.
Access the online parser here: http://nlp.stanford.edu:8080/parser/. It supports five languages: Arabic, Chinese, English, French, and Spanish.
For all questions below, if relevant, you can pick the language of your preference from the available choices.
If you want a direct visualization of English dependency structures, go to http://nlp.stanford.edu:8080/corenlp/.
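
The constituency output is a bracketed tree in Penn Treebank style, so you can tally phrase types from a copied parse instead of counting by eye. The following is only a minimal sketch: the function name `phrase_labels` and the example tree string are made up for illustration (the tree is hand-written, not real parser output), and it assumes a node counts as a phrase when its first child is another bracketed node.

```python
import re

def phrase_labels(bracketed):
    """Collect labels of nodes whose first child is another bracketed node."""
    # Split the string into "(", ")" and plain symbols (labels or words).
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    labels = set()
    for i, tok in enumerate(tokens[:-2]):
        # "(" LABEL "(" means LABEL dominates another node, so it is a phrase,
        # whereas "(" TAG word ")" is a part-of-speech tag over a single word.
        if tok == "(" and tokens[i + 2] == "(":
            labels.add(tokens[i + 1])
    # ROOT is a wrapper symbol rather than a phrase type, so drop it.
    labels.discard("ROOT")
    return labels

# Hand-written example tree (an assumption, not copied from the parser).
example = "(ROOT (S (NP (DT The) (NN dog)) (VP (VBZ barks)) (. .)))"
print(sorted(phrase_labels(example)))
```

Taking the union of the returned sets over several parses gives a quick count of how many distinct phrase types your sentences cover.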


1. Give a set of sentences that together show at least 10 different universal dependency relation types and 10 different PTB phrase types. You can find a table of the two sets of types in the [slides](https://docs.google.com/presentation/d/14qlVKnTLTrKIrzSbtsuTpbj_7Jzsxgbs84iN2S92O-E/edit#slide=id.g1396e691b2_0_1) or in the respective annotation guidelines ([UD](http://universaldependencies.org/u/dep/), [PTB](http://www.surdeanu.info/mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html)).


2. Consider [_garden path sentences_](https://en.wikipedia.org/wiki/Garden-path_sentence): sentences whose most likely initial interpretation turns out to be incorrect. For example, in the sentence `Time flies like an arrow; fruit flies like a banana`, we are initially led to read the second `flies` as a verb, like the first. However, this interpretation does not make semantic sense because fruit cannot fly. Upon re-reading, we realize that the second `flies` is a noun. For each of the following garden path sentences, give a clear explanation of the meaning, evaluate the sentence in the parser, and comment on how well the parser performs on these sentences.
    - `The old man the boat.`
    - `The prime number few.`
    - `Because he always jogs a mile seems a short distance to him.`


3. Lexical ambiguity arises when a word has multiple meanings that depend on the context. Consider these seemingly nonsensical but grammatical and meaningful sentences that exploit a word's multiple senses:
    - `Will, will Will will Will Will's will?`
    - `Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.`
    - `Police police Police police police police Police police.`
    - `Can can can can can can can can can can.`
    
    For each sentence, give a possible interpretation and parse, then evaluate the sentence using the parser and comment on the parser's performance. (Bonus: you may be experiencing [this](https://en.wikipedia.org/wiki/Semantic_satiation)).
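
For question 1, the dependency output of the online parser lists one relation per line in the form `relation(head-index, dependent-index)`, which is easy to tally with a regular expression. This is a rough sketch under that assumption: the pattern `DEP_LINE`, the function `relation_counts`, and the sample lines are all illustrative (the sample is hand-written, not real parser output).

```python
import re
from collections import Counter

# Matches lines such as "nsubj(barks-3, dog-2)"; the relation name may
# contain a colon, as in "nmod:poss".
DEP_LINE = re.compile(r"^([\w:]+)\((.+)-(\d+), (.+)-(\d+)\)$")

def relation_counts(text):
    """Count how often each relation name appears in the pasted output."""
    counts = Counter()
    for line in text.splitlines():
        match = DEP_LINE.match(line.strip())
        if match:  # silently skip blank or unrecognized lines
            counts[match.group(1)] += 1
    return counts

# Hand-written sample in the assumed format.
sample = """
det(dog-2, The-1)
nsubj(barks-3, dog-2)
root(ROOT-0, barks-3)
punct(barks-3, .-4)
"""
counts = relation_counts(sample)
print(len(counts), "distinct relation types:", sorted(counts))
```

Pasting the output for all of your sentences into `sample` shows whether you have reached 10 distinct relation types.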

## Programmatic Parsing

We have a few options for obtaining parsers programmatically (in addition to training our own, of course!):

- [spaCy](https://spacy.io/usage/linguistic-features): general NLP toolkit with POS tagger and dependency parser, as well as many other tools.
- [SyntaxNet](https://github.com/tensorflow/models/tree/master/research/syntaxnet): Google's models for POS tagging and dependency parsing built on Parsey McParseface and ParseySaurus. [Tutorial here](https://github.com/tensorflow/models/blob/master/research/syntaxnet/g3doc/syntaxnet-tutorial.md).
- [StanfordNLP](https://stanfordnlp.github.io/stanfordnlp/): Stanford's NLP toolkit with POS tagger and dependency parser, as well as other tools.
- [Kitaev and Klein](https://github.com/nikitakit/self-attentive-parser): Berkeley's constituency parser.

We'll walk through an example of using spaCy.

In [2]:
# Shell commands in a notebook cell are prefixed with "!".
# Note: on spaCy 3 and later, the "en" shortcut is no longer available;
# download "en_core_web_sm" instead.
!pip install spacy
!python -m spacy download en


In [3]:
import spacy
nlp = spacy.load('en')  # on spaCy 3 and later, load 'en_core_web_sm' instead

In [4]:
sentence = u"My favorite part of the day is lab."
doc = nlp(sentence)

In [5]:
# POS tagging

print("\t".join([token.text for token in doc]))
print("\t".join([token.pos_ for token in doc]))

My	favorite	part	of	the	day	is	lab	.
ADJ	ADJ	NOUN	ADP	DET	NOUN	VERB	NOUN	PUNCT


In [22]:
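# Dependency parsing (Universal Dependencies)
# First row: each token. Second row: the head (governor) of that token.
# The root of the sentence is its own head.

print("\t".join([token.text for token in doc]))
print("\t".join([token.head.text for token in doc]))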

My	favorite	part	of	the	day	is	lab	.
part	part	is	part	day	of	is	is	is
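
The two rows above fully determine the dependency tree, and it can help to see that tree as an indented outline. The following is a minimal sketch that hard-codes the printed rows rather than calling spaCy; the helper names `index_of`, `children`, and `render` are illustrative, and matching heads by text only works because every token in this sentence is distinct.

```python
# Token and head rows copied from the output above.
tokens = "My favorite part of the day is lab .".split()
heads = "part part is part day of is is is".split()

# Assumption: token texts are unique, so a head's text identifies one index.
index_of = {text: i for i, text in enumerate(tokens)}
assert len(index_of) == len(tokens)

# Build a parent -> children map; the root is the token that heads itself.
children = {i: [] for i in range(len(tokens))}
root = None
for i, head_text in enumerate(heads):
    head = index_of[head_text]
    if head == i:
        root = i
    else:
        children[head].append(i)

def render(node, depth=0):
    """Return the subtree under `node` as a list of indented lines."""
    lines = ["  " * depth + tokens[node]]
    for child in children[node]:
        lines.extend(render(child, depth + 1))
    return lines

print("\n".join(render(root)))
```

With a live `doc`, using `token.i` and `token.head.i` in place of the text lookup removes the uniqueness assumption.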


In [23]:
from spacy import displacy
displacy.render(doc, style="dep", jupyter=True)

### Closing note

Parsing is a well-studied task, and clearly one on which models perform well enough to be used in production environments. However, don't think of parsing as entirely solved; it is still an active research field, with [frequent](https://arxiv.org/abs/1812.11760) [publications](https://arxiv.org/abs/1805.01052) and near-[annual competitions](http://universaldependencies.org/conll18/).