# Proof-of-Concept 3: Determine the Features of Words in a Sentence

## How to use this PoC:
To run it, do either of the following:
* In the drop-down menu, click **Kernel --> Restart & Run All --> Restart and Run All Cells**.
* In the icon toolbar, click **the Fast-Forward button --> Restart and Run All Cells**.

After it finishes running, you may have to scroll back up to the top.

## Attribution:
**Author**: Steven Kyle Crawford

Special thanks to the spaCy team and numerous authors.

## Description:
This notebook illustrates the process of determining the parts of speech, the tags, and the dependencies of each word in a sentence. This will assist in the creation and validation of rules for syntax detection.

This notebook demonstrates:
* Tokenizing the words in a sentence
* Printing a readable table of tokens containing:
    * The word
    * The part-of-speech (POS)
    * The tag
    * The dependency

## Helpful links:
* [spaCy linguistic features glossary](https://github.com/explosion/spaCy/blob/master/spacy/glossary.py#L20)
* [spaCy token attributes](https://spacy.io/api/token#attributes)

## Procedure:

### Step 0) Install the dependencies
* spaCy and its dependencies
* tabulate (for pretty printing data tables)

In [1]:
# # Run this only once to avoid unnecessary redownloading
# # To enable or disable, highlight all lines and <Ctrl> + /
# !pip install -U spacy
# !pip install -U spacy-lookups-data
# !python -m spacy download en_core_web_sm
# !pip install -U tabulate
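
Before uncommenting the install commands, it can help to check what is already present. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and it assumes the model downloaded above is importable as a package named `en_core_web_sm`.

```python
import importlib.util

# Import names of the packages installed in Step 0.
# Assumption: the downloaded model is importable as "en_core_web_sm".
REQUIRED_MODULES = ("spacy", "spacy_lookups_data", "tabulate", "en_core_web_sm")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing: " + ", ".join(missing))
else:
    print("All dependencies found.")
```

If anything is reported as missing, uncomment and run the matching install line above.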

### Step 1) Load the language model

In [2]:
import spacy


nlp = spacy.load('en_core_web_sm')

### Step 2) Tokenize the words of a sentence

In [3]:
def tokenize_sentence(sentence):
    """Tokenize a sentence.

    Given a string, return a spaCy Doc, which iterates over Token instances.
    """

    return nlp(sentence)


sentence = "These words are all tagged."
tokens = tokenize_sentence(sentence)
for token in tokens:
    print(f"{token.pos_:7} {token.text}")

DET     These
NOUN    words
AUX     are
ADV     all
VERB    tagged
PUNCT   .
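
The fixed width of 7 in the format spec is enough for the labels shown here. As a small sketch of an alternative, the width can be computed from the longest label instead. The helper `format_pairs` is hypothetical, and the `(POS, word)` pairs are copied by hand from the output above so that the example runs without spaCy.

```python
def format_pairs(pairs):
    """Left-align each label to the width of the longest label."""
    width = max((len(label) for label, _ in pairs), default=0)
    return [f"{label:{width}} {text}" for label, text in pairs]


# (POS, word) pairs copied from the printed output above
pairs = [
    ("DET", "These"),
    ("NOUN", "words"),
    ("AUX", "are"),
    ("ADV", "all"),
    ("VERB", "tagged"),
    ("PUNCT", "."),
]

for line in format_pairs(pairs):
    print(line)
```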


### Step 3) Pretty print the table of tokens

In [4]:
from tabulate import tabulate


def print_token_table(sentence, pos=False, tag=True, dependency=True, lemma=False):
    """Pretty print the linguistic features of each word in a sentence.

    If pos is True, then print the part-of-speech (POS). Defaults to False.
    If tag is True, then print the tag. Defaults to True.
    If dependency is True, then print the dependency. Defaults to True.
    If lemma is True, then print the lemma. Defaults to False.

    Given a string, return None.
    Depends on tabulate.
    """

    # Print the sentence
    print(sentence + "\n")

    # Create the table headers
    headers = []
    headers.append("Word")
    if pos:
        headers.append("POS")
        headers.append("POS Definition")
    if tag:
        headers.append("Tag")
        headers.append("Tag Definition")
    if dependency:
        headers.append("Dep.")
        headers.append("Dep. Definition")
    if lemma:
        headers.append("Lemma")

    # Create the table data
    tagged_words = nlp(sentence)
    data = []
    for word in tagged_words:
        entry = []
        entry.append(word.text)
        if pos:
            entry.append(word.pos_)
            entry.append(spacy.explain(word.pos_))
        if tag:
            entry.append(word.tag_)
            entry.append(spacy.explain(word.tag_))
        if dependency:
            entry.append(word.dep_)
            entry.append(spacy.explain(word.dep_))
        if lemma:
            entry.append(word.lemma_)
        data.append(entry)

    # Print the table
    print(tabulate(data, headers=headers, tablefmt="github") + "\n\n")
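
The paired `if` blocks for headers and entries could also be driven by a single column specification. The sketch below is one possible variant, not the notebook's method: `COLUMNS` and `build_table` are hypothetical names, the definition columns from `spacy.explain` are left out for brevity, and the tokens are hand-written stand-ins (values copied from the output in the Interactive Example) so that it runs without spaCy.

```python
from types import SimpleNamespace

# Option name -> (table header, token attribute name).
# The attribute names mirror the spaCy Token attributes used above.
COLUMNS = {
    "pos": ("POS", "pos_"),
    "tag": ("Tag", "tag_"),
    "dependency": ("Dep.", "dep_"),
    "lemma": ("Lemma", "lemma_"),
}


def build_table(tokens, show=("tag", "dependency")):
    """Return (headers, rows) for the selected columns."""
    selected = [COLUMNS[name] for name in show]
    headers = ["Word"] + [header for header, _ in selected]
    rows = [
        [token.text] + [getattr(token, attr) for _, attr in selected]
        for token in tokens
    ]
    return headers, rows


# Stand-in tokens (assumption: real spaCy tokens expose the same attributes)
tokens = [
    SimpleNamespace(text="Who", tag_="WP", dep_="nsubj"),
    SimpleNamespace(text="first", tag_="RB", dep_="advmod"),
    SimpleNamespace(text="seduced", tag_="VBD", dep_="ROOT"),
]

headers, rows = build_table(tokens)
print(headers)
for row in rows:
    print(row)
```

The resulting `headers` and `rows` can be passed to `tabulate` in the same way as `data` and `headers` above.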

### Step 4) Use it

In [5]:
sentence = "Nine times the space that measures day and night To mortal men, he, with his horrid crew, Lay vanquished, rolling in the fiery gulf, Confounded, though immortal."

print_token_table(sentence, pos=False, tag=True, dependency=True)

Nine times the space that measures day and night To mortal men, he, with his horrid crew, Lay vanquished, rolling in the fiery gulf, Confounded, though immortal.

| Word       | Tag   | Tag Definition                            | Dep.      | Dep. Definition                   |
|------------|-------|-------------------------------------------|-----------|-----------------------------------|
| Nine       | CD    | cardinal number                           | nummod    | numeric modifier                  |
| times      | NNS   | noun, plural                              | nummod    | numeric modifier                  |
| the        | DT    | determiner                                | det       | determiner                        |
| space      | NN    | noun, singular or mass                    | ROOT      |                                   |
| that       | WDT   | wh-determiner                             | nsubj     | nominal subject                   |
| measures   | VBZ   | verb, 3rd

## Interactive Example:

### Try changing these settings
Press Ctrl + Enter to rerun the cell (code block).

In [6]:
# Change this: don't forget the ""
sentence = "Who first seduced them to that foul revolt?"

# Change this: True or False
show_parts_of_speech = False
show_tags = True
show_dependencies = True


# Don't change this
print_token_table(sentence, pos=show_parts_of_speech, tag=show_tags, dependency=show_dependencies)

Who first seduced them to that foul revolt?

| Word    | Tag   | Tag Definition                            | Dep.   | Dep. Definition        |
|---------|-------|-------------------------------------------|--------|------------------------|
| Who     | WP    | wh-pronoun, personal                      | nsubj  | nominal subject        |
| first   | RB    | adverb                                    | advmod | adverbial modifier     |
| seduced | VBD   | verb, past tense                          | ROOT   |                        |
| them    | PRP   | pronoun, personal                         | dobj   | direct object          |
| to      | IN    | conjunction, subordinating or preposition | prep   | prepositional modifier |
| that    | DT    | determiner                                | det    | determiner             |
| foul    | JJ    | adjective                                 | amod   | adjectival modifier    |
| revolt  | NN    | noun, singular or mass                    | pobj   | 

## Other Examples:

### Example 1: Julius Caesar (Shakespeare)

In [7]:
sentence = "Speake, what Trade art thou?"
print_token_table(sentence, pos=False, tag=True, dependency=True)

Speake, what Trade art thou?

| Word   | Tag   | Tag Definition                    | Dep.     | Dep. Definition    |
|--------|-------|-----------------------------------|----------|--------------------|
| Speake | VB    | verb, base form                   | ROOT     |                    |
| ,      | ,     | punctuation mark, comma           | punct    | punctuation        |
| what   | WP    | wh-pronoun, personal              | dobj     | direct object      |
| Trade  | NNP   | noun, proper singular             | compound | compound           |
| art    | NN    | noun, singular or mass            | compound | compound           |
| thou   | NN    | noun, singular or mass            | ccomp    | clausal complement |
| ?      | .     | punctuation mark, sentence closer | punct    | punctuation        |
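
When validating rules, it may be convenient to read a printed table back into Python and query it. The following is a rough sketch under that assumption; `parse_github_table` is a hypothetical helper, and the sample rows are copied from the table above. Here it lists the words that received a noun tag, which includes the archaic verb *art* and the pronoun *thou*.

```python
def parse_github_table(text):
    """Parse a GitHub-style pipe table into a list of {header: cell} dicts."""

    def cells(line):
        line = line.strip()
        # Drop the outer pipes, then split on the inner ones
        if line.startswith("|"):
            line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        return [cell.strip() for cell in line.split("|")]

    lines = [line for line in text.splitlines() if line.strip()]
    headers = cells(lines[0])
    # lines[1] is the |----|----| separator row, so data starts at index 2
    return [dict(zip(headers, cells(line))) for line in lines[2:]]


TABLE = """
| Word   | Tag   | Tag Definition                    | Dep.     | Dep. Definition    |
|--------|-------|-----------------------------------|----------|--------------------|
| art    | NN    | noun, singular or mass            | compound | compound           |
| thou   | NN    | noun, singular or mass            | ccomp    | clausal complement |
"""

rows = parse_github_table(TABLE)
noun_tagged = [row["Word"] for row in rows if row["Tag"].startswith("NN")]
print(noun_tagged)  # ['art', 'thou']
```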




### Example 2: Syntactic Ambiguity and Limitations
* "A pretty little girl"
    * A girl who is cute and little.
    * A girl who is somewhat little.

* [The conundrum](https://www.grammarphobia.com/blog/2019/01/pretty.html)
* *Pretty* can be either of these (tag/dependency):
    * JJ/amod (adjective/adjectival modifier)
        * When *pretty* is the only adjective
        * When *pretty* is followed by a comma and another adjective
    * RB/advmod (adverb/adverbial modifier)
        * When *pretty* is followed by another adjective
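
The rule of thumb in the list above can be written down as a small function to compare against the tagger's output. This is only a sketch of that heuristic, not spaCy's behavior: `expected_pretty_reading` and `ADJECTIVE_TAGS` are hypothetical names, and the input is a hand-written list of `(word, tag)` pairs.

```python
from typing import List, Optional, Tuple

# Penn Treebank adjective tags (base, comparative, superlative)
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}


def expected_pretty_reading(tagged: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the expected (tag, dependency) for 'pretty', or None if absent."""
    words = [word.lower() for word, _ in tagged]
    if "pretty" not in words:
        return None
    following = tagged[words.index("pretty") + 1:]
    # Directly followed by another adjective: read 'pretty' as an adverb
    if following and following[0][1] in ADJECTIVE_TAGS:
        return ("RB", "advmod")
    # Only adjective, or separated from the next adjective by a comma
    return ("JJ", "amod")


print(expected_pretty_reading([("A", "DT"), ("pretty", "?"), ("girl", "NN")]))
print(expected_pretty_reading([("A", "DT"), ("pretty", "?"), (",", ","), ("little", "JJ"), ("girl", "NN")]))
print(expected_pretty_reading([("A", "DT"), ("pretty", "?"), ("little", "JJ"), ("girl", "NN")]))
```

The tag supplied for *pretty* itself is ignored (shown as `?`), since the point is to predict it from the words that follow.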

#### Example 2a: A pretty girl

In [8]:
sentence = "A pretty girl"
print_token_table(sentence, pos=False, tag=True, dependency=False)

A pretty girl

| Word   | Tag   | Tag Definition         |
|--------|-------|------------------------|
| A      | DT    | determiner             |
| pretty | JJ    | adjective              |
| girl   | NN    | noun, singular or mass |




#### Example 2b: A pretty, little girl

In [9]:
sentence = "A pretty, little girl"
print_token_table(sentence, pos=False, tag=True, dependency=False)

A pretty, little girl

| Word   | Tag   | Tag Definition          |
|--------|-------|-------------------------|
| A      | DT    | determiner              |
| pretty | JJ    | adjective               |
| ,      | ,     | punctuation mark, comma |
| little | JJ    | adjective               |
| girl   | NN    | noun, singular or mass  |




#### Example 2c: A pretty little girl

In [10]:
sentence = "A pretty little girl"
print_token_table(sentence, pos=False, tag=True, dependency=False)

A pretty little girl

| Word   | Tag   | Tag Definition         |
|--------|-------|------------------------|
| A      | DT    | determiner             |
| pretty | RB    | adverb                 |
| little | JJ    | adjective              |
| girl   | NN    | noun, singular or mass |




### Example 3: Emma (Austen) - An NLP Nightmare

In [11]:
sentence = "Her mother had died too long ago for her to have more than an indistinct remembrance of her caresses; and her place had been supplied by an excellent woman as governess, who had fallen little short of a mother in affection."
print_token_table(sentence, pos=False, tag=True, dependency=True)

Her mother had died too long ago for her to have more than an indistinct remembrance of her caresses; and her place had been supplied by an excellent woman as governess, who had fallen little short of a mother in affection.

| Word        | Tag   | Tag Definition                            | Dep.      | Dep. Definition           |
|-------------|-------|-------------------------------------------|-----------|---------------------------|
| Her         | PRP$  | pronoun, possessive                       | poss      | possession modifier       |
| mother      | NN    | noun, singular or mass                    | nsubj     | nominal subject           |
| had         | VBD   | verb, past tense                          | aux       | auxiliary                 |
| died        | VBN   | verb, past participle                     | ROOT      |                           |
| too         | RB    | adverb                                    | advmod    | adverbial modifier        |
| long        | RB 