# Proof-of-Concept 2: Detect Syntax Patterns in Sentences

## How to use this PoC:
To run it, do one of the following:

* In the drop-down menu, click **Kernel --> Restart & Run All --> Restart and Run All Cells**.
* In the icon toolbar, click **the Fast-Forward button --> Restart and Run All Cells**.

After it runs, you may have to scroll back up to the top.

## Attribution:
* **Author of notebook**: Steven Kyle Crawford
* **Author of [the spaCy implementation, the passive rule, and the sample data](https://github.com/scottkleinman/python-tutorials/blob/master/passive_voice_detection.ipynb)**: Dr. Scott Kleinman (CSUN Center for Digital Humanities)

Special thanks to Dr. Kleinman, the spaCy team, and numerous authors.

## Description:
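This notebook shows how to detect syntax patterns in sentences with a rule-based spaCy matcher. The rule is reasonably accurate on the samples below but still produces some false negatives (see Step 9).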


This notebook demonstrates:
* detecting passive voice in a sentence
* detecting passive voice in a list of sentences (e.g., the sentences of a paragraph)

## Helpful links:
* [spaCy docs](https://spacy.io/usage/spacy-101)
* [spaCy part-of-speech (POS) tags](https://github.com/explosion/spaCy/blob/master/spacy/glossary.py#L20)
* [NLTK docs](http://www.nltk.org/)
* [Using NLTK corpora with spaCy](https://sp1920.github.io/nltk-spacy.pdf)

## Procedure:

### Step 0) Install the dependencies

In [1]:
# # Run this only once to avoid unnecessary redownloading
# # To enable or disable, highlight all lines and <Ctrl> + /
# !pip install -U spacy
# !pip install -U spacy-lookups-data
# !python -m spacy download en_core_web_sm

### Step 1) Load the language model
`spacy.load` returns a Language object "containing all components and data needed to process text" (spaCy docs, 2020). Using the medium (md) or large (lg) model instead of the small (sm) one may improve accuracy (Kleinman, 2020).

In [2]:
import spacy


nlp = spacy.load('en_core_web_sm')
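
If a larger model happens to be installed, it can be preferred automatically. The following is a minimal sketch and not part of the original notebook: `first_installed_model` is a hypothetical helper, and the candidate list assumes the standard English model package names. It relies on `spacy.util.is_package`, which checks whether a name maps to an installed package.

```python
import spacy
from spacy.util import is_package


def first_installed_model(candidates):
    """Return the name of the first installed model package, or None.

    Given a list of strings, return a string or None.
    """
    for name in candidates:
        if is_package(name):
            return name
    return None


# Assumed preference order: largest model first.
preferred = ['en_core_web_lg', 'en_core_web_md', 'en_core_web_sm']
model_name = first_installed_model(preferred)

if model_name is not None:
    nlp = spacy.load(model_name)
    print("Loaded", model_name)
else:
    print("No English model found; run: python -m spacy download en_core_web_sm")
```

Larger models take longer to load, so the small model may still be the better choice for a quick demonstration.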

### Step 2) Create the matcher

In [3]:
from spacy.matcher import Matcher


matcher = Matcher(nlp.vocab)

### Step 3) Create a rule for the syntax pattern
Rule created by: [Dr. Kleinman](https://render.githubusercontent.com/view/ipynb?commit=c741a022f02ac8dcdb316f32f00b9d4dbb7aa4d0&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f73636f74746b6c65696e6d616e2f707974686f6e2d7475746f7269616c732f633734316130323266303261633864636462333136663332663030623964346462623761613464302f706173736976655f766f6963655f646574656374696f6e2e6970796e62&nwo=scottkleinman%2Fpython-tutorials&path=passive_voice_detection.ipynb&repository_id=226424899&repository_type=Repository#Simply-Implementation-with-spaCy)

See [the list of tags](https://spacy.io/api/annotation#pos-tagging) for more information.

In [4]:
passive_rule = [
    {'DEP': 'nsubjpass'},
    {'DEP': 'aux', 'OP': '*'},
    {'DEP': 'auxpass'},
    {'TAG': 'VBN'}
]

### Step 4) Add the rule to the matcher

In [5]:
# As of spaCy 2.2.2, Matcher deprecated the old argument order:
# matcher.add("GoogleNow", on_match, *patterns)
#
# The new argument order:
# matcher.add("GoogleNow", patterns, on_match=on_match)

matcher.add('Passive', [passive_rule])
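
The optional `on_match` keyword mentioned in the comment above takes a callback that spaCy calls once per match with the arguments `(matcher, doc, i, matches)`. The sketch below assumes spaCy 3.x and uses a blank English pipeline with a purely lexical pattern so that no model is required; `record_match`, `seen`, and the `WasPlusWord` pattern are hypothetical names chosen for illustration, not part of the passive rule.

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline only tokenizes, so nothing needs to be downloaded.
blank_nlp = spacy.blank('en')
seen = []


def record_match(matcher, doc, i, matches):
    """Callback invoked once per match; i is the index into matches."""
    match_id, start, end = matches[i]
    seen.append(doc[start:end].text)


demo_matcher = Matcher(blank_nlp.vocab)
# Pattern: the word "was" followed by any single token.
demo_matcher.add('WasPlusWord', [[{'LOWER': 'was'}, {}]], on_match=record_match)

doc = blank_nlp("The problem was solved.")
demo_matcher(doc)
print(seen)
```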

### Step 5) Use the matcher on the sentence
Inspired by [Dr. Kleinman's implementation](https://render.githubusercontent.com/view/ipynb?commit=c741a022f02ac8dcdb316f32f00b9d4dbb7aa4d0&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f73636f74746b6c65696e6d616e2f707974686f6e2d7475746f7269616c732f633734316130323266303261633864636462333136663332663030623964346462623761613464302f706173736976655f766f6963655f646574656374696f6e2e6970796e62&nwo=scottkleinman%2Fpython-tutorials&path=passive_voice_detection.ipynb&repository_id=226424899&repository_type=Repository#Simply-Implementation-with-spaCy).

In [6]:
def is_passive(sentence):
    """Return True if the passive rule matches anywhere in the sentence.
    Otherwise, return False.
    The global nlp and matcher objects are reused to avoid recreating them on every call.

    Given a string, return a boolean.
    """

    doc = nlp(sentence)
    matches = matcher(doc)  # List of (match_id, start, end) tuples

    return bool(matches)
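
Each match is a `(match_id, start, end)` tuple, where `match_id` is the hash of the rule name and `start` and `end` are token offsets into the `Doc`. The sketch below, assuming spaCy 3.x, shows how those values can be turned back into readable text. The `Doc` is annotated by hand, so the tags, dependency labels, and heads are assumptions about what a parser would produce and not actual `en_core_web_sm` output; `demo_matcher` and `details` are hypothetical names.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()
demo_matcher = Matcher(vocab)
demo_matcher.add('Passive', [[
    {'DEP': 'nsubjpass'},
    {'DEP': 'aux', 'OP': '*'},
    {'DEP': 'auxpass'},
    {'TAG': 'VBN'},
]])

# Hand-written annotations for "The game had been won ."
# heads holds the index of each token's syntactic head (the root points to itself).
doc = Doc(
    vocab,
    words=['The', 'game', 'had', 'been', 'won', '.'],
    tags=['DT', 'NN', 'VBD', 'VBN', 'VBN', '.'],
    deps=['det', 'nsubjpass', 'aux', 'auxpass', 'ROOT', 'punct'],
    heads=[1, 4, 4, 4, 4, 4],
)

# Resolve the hashed match_id to its rule name and slice out the matched span.
details = [
    (vocab.strings[match_id], start, end, doc[start:end].text)
    for match_id, start, end in demo_matcher(doc)
]
print(details)
```

Returning the spans instead of a boolean can help when checking why a sentence was labeled passive.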

### Step 6) Put it all together

In [7]:
def print_sentence_and_voice(sentence):
    """Print one sentence and its voice (active or passive).

    Given a string, return None.
    """

    if is_passive(sentence):
        print("PASSIVE:", sentence)
    else:
        print("ACTIVE:", sentence)


def print_sentences_and_voices(sentences):
    """Print many sentences and their voice (active or passive).
    
    Given a list of strings, return None.
    """

    for sentence in sentences:
        print_sentence_and_voice(sentence)
        print("")
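
`print_sentences_and_voices` expects a list of sentence strings. If the input is a raw paragraph instead, one way to produce such a list is spaCy's rule-based `sentencizer` component. This is a sketch under the assumption of spaCy 3.x (where `add_pipe` takes the component name as a string); `splitter` and `split_into_sentences` are hypothetical names that do not appear in the original notebook.

```python
import spacy

# A blank English pipeline has a tokenizer but no trained components,
# so nothing needs to be downloaded.
splitter = spacy.blank('en')
splitter.add_pipe('sentencizer')  # splits on sentence-final punctuation


def split_into_sentences(paragraph):
    """Split a paragraph into sentences.

    Given a string, return a list of strings.
    """
    return [sent.text.strip() for sent in splitter(paragraph).sents]


paragraph = "The dog is fed by Thomas. The family went to the beach."
sentences = split_into_sentences(paragraph)
print(sentences)
```

The resulting list can be passed to `print_sentences_and_voices`. A trained pipeline also exposes `doc.sents` through its parser, which may split more accurately than the punctuation rule.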

### Step 7) Define the text sample
Source: [Dr. Kleinman](https://render.githubusercontent.com/view/ipynb?commit=c741a022f02ac8dcdb316f32f00b9d4dbb7aa4d0&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f73636f74746b6c65696e6d616e2f707974686f6e2d7475746f7269616c732f633734316130323266303261633864636462333136663332663030623964346462623761613464302f706173736976655f766f6963655f646574656374696f6e2e6970796e62&nwo=scottkleinman%2Fpython-tutorials&path=passive_voice_detection.ipynb&repository_id=226424899&repository_type=Repository#Configuration)

In [8]:
mostly_active_sentences = [
    "Harry ate six shrimp at dinner.",
    "Beautiful giraffes roam the savannah.",
    "Sue changed the flat tire.",
    "We are going to watch a movie tonight.",
    "I ran the obstacle course in record time.",
    "The crew paved the entire stretch of highway.",
    "Mom read the novel in one day.",
    "The critic wrote a scathing review.",
    "I will clean the house every Saturday.",
    "The staff is required to watch a safety video every year.",
]

mostly_passive_sentences = [
    "A camera is bought by him.",
    "Water is drunk by her.",
    "He is known to me.",
    "A tub is filled with water.",
    "Sugar is sold in kilograms.",
    "There is a considerable range of expertise demonstrated by the spam senders.",
    "It was determined by the committee that the report was inconclusive.",
    "We were invited by our neighbors to attend their party.",
    "Groups help participants realize that most of their problems and secrets are shared by others in the group.",
    "The proposed initiative will be bitterly opposed by abortion rights groups."
]

text_sample = []
text_sample.extend(mostly_active_sentences)
text_sample.extend(mostly_passive_sentences)

### Step 8) Use it on the text sample

In [9]:
print_sentences_and_voices(text_sample)

ACTIVE: Harry ate six shrimp at dinner.

ACTIVE: Beautiful giraffes roam the savannah.

ACTIVE: Sue changed the flat tire.

ACTIVE: We are going to watch a movie tonight.

ACTIVE: I ran the obstacle course in record time.

ACTIVE: The crew paved the entire stretch of highway.

ACTIVE: Mom read the novel in one day.

ACTIVE: The critic wrote a scathing review.

ACTIVE: I will clean the house every Saturday.

PASSIVE: The staff is required to watch a safety video every year.

PASSIVE: A camera is bought by him.

ACTIVE: Water is drunk by her.

PASSIVE: He is known to me.

PASSIVE: A tub is filled with water.

PASSIVE: Sugar is sold in kilograms.

ACTIVE: There is a considerable range of expertise demonstrated by the spam senders.

PASSIVE: It was determined by the committee that the report was inconclusive.

PASSIVE: We were invited by our neighbors to attend their party.

ACTIVE: Groups help participants realize that most of their problems and secrets are shared by others in the group.

ACTIVE: The proposed initiative will be bitterly opposed by abortion rights groups.


### Step 9) Note false negatives/positives
Both sentences below are passive but were labeled ACTIVE, so they are false negatives:
* ACTIVE: The proposed initiative will be bitterly opposed by abortion rights groups.
* ACTIVE: Water is drunk by her.
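
Token patterns match consecutive tokens, so the adverb "bitterly" between "be" and "opposed" is enough to break the rule in the first sentence. One possible extension, shown here as a sketch and not part of Dr. Kleinman's rule, allows optional adverbial modifiers or negation before the participle. The example assumes spaCy 3.x and uses a shortened version of the sentence with hand-written annotations (assumptions about what a parser would produce), so it runs without a model; `relaxed_rule` is a hypothetical name.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

original_rule = [
    {'DEP': 'nsubjpass'},
    {'DEP': 'aux', 'OP': '*'},
    {'DEP': 'auxpass'},
    {'TAG': 'VBN'},
]

# Hypothetical variant: tolerate adverbial modifiers and negation
# between the passive auxiliary and the participle.
relaxed_rule = [
    {'DEP': 'nsubjpass'},
    {'DEP': 'aux', 'OP': '*'},
    {'DEP': 'auxpass'},
    {'DEP': {'IN': ['advmod', 'neg']}, 'OP': '*'},
    {'TAG': 'VBN'},
]

vocab = Vocab()
strict_matcher = Matcher(vocab)
strict_matcher.add('Passive', [original_rule])
relaxed_matcher = Matcher(vocab)
relaxed_matcher.add('Passive', [relaxed_rule])

# Hand-written annotations for "The initiative will be bitterly opposed ."
doc = Doc(
    vocab,
    words=['The', 'initiative', 'will', 'be', 'bitterly', 'opposed', '.'],
    tags=['DT', 'NN', 'MD', 'VB', 'RB', 'VBN', '.'],
    deps=['det', 'nsubjpass', 'aux', 'auxpass', 'advmod', 'ROOT', 'punct'],
    heads=[1, 5, 5, 5, 5, 5, 5],
)

strict_hit = bool(strict_matcher(doc))    # original rule
relaxed_hit = bool(relaxed_matcher(doc))  # variant with optional adverbs
print("original rule:", strict_hit, "| relaxed rule:", relaxed_hit)
```

Whether the relaxed rule helps on real text depends on how the loaded model tags and parses each sentence, so it should be checked against the full sample before replacing the original rule.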

## Interactive Example:

### Try changing these settings:

In [10]:
# Change this: don't forget the ""
sentence = "Who first seduced them to that foul revolt?"


# Don't change this
print_sentence_and_voice(sentence)

## Other Examples:

### Example 1: Active/Passive Voice Worksheet
Source: [EnglishForEveryone.org](https://www.englishforeveryone.org/viewpdf.html?pdf=/PDFs/Active%20-%20Passive%20Voice.pdf&title=Active/\Passive%20Voice%20Worksheet)

In [12]:
sentences = [
    "Thomas feeds his dog.",
    "The dog is fed by Thomas.", 
    "The family went to the beach.",
    "The letter was written by Marshall.",
    "The game had been won by the blue team.",
    "The problem was solved.",
    "The stunt man risked his life.",
    "The fire was extinguished.",
    "The car was being cleaned by its owner.",
    "It gets cold here during the winter.",
]

print_sentences_and_voices(sentences)

ACTIVE: Thomas feeds his dog.

PASSIVE: The dog is fed by Thomas.

ACTIVE: The family went to the beach.

PASSIVE: The letter was written by Marshall.

PASSIVE: The game had been won by the blue team.

PASSIVE: The problem was solved.

ACTIVE: The stunt man risked his life.

PASSIVE: The fire was extinguished.

PASSIVE: The car was being cleaned by its owner.

ACTIVE: It gets cold here during the winter.



### Example 2: Genesis - The First Day (KJV)

In [13]:
sentences = [
    "In the beginning God created the heaven and the earth.",
    "And the earth was without form, and void.", 
    "And darkness was upon the face of the deep.",
    "And the Spirit of God moved upon the face of the waters.",
    "And God said, Let there be light.",
    "And there was light.",
    "And God saw the light, that it was good.",
    "And God divided the light from the darkness.",
    "And God called the light Day, and the darkness he called Night.",
    "And the evening and the morning were the first day.",
]

print_sentences_and_voices(sentences)

ACTIVE: In the beginning God created the heaven and the earth.

ACTIVE: And the earth was without form, and void.

ACTIVE: And darkness was upon the face of the deep.

ACTIVE: And the Spirit of God moved upon the face of the waters.

ACTIVE: And God said, Let there be light.

ACTIVE: And there was light.

ACTIVE: And God saw the light, that it was good.

ACTIVE: And God divided the light from the darkness.

ACTIVE: And God called the light Day, and the darkness he called Night.

ACTIVE: And the evening and the morning were the first day.



### Example 3: Emma (Austen)

In [14]:
sentences = [
    "Emma Woodhouse, handsome, clever, and rich, with a comfortable home and happy disposition, seemed to unite some of the best blessings of existence; and had lived nearly twenty-one years in the world with very little to distress or vex her.",
    "She was the youngest of the two daughters of a most affectionate, indulgent father; and had, in consequence of her sister's marriage, been mistress of his house from a very early period.", 
    "Her mother had died too long ago for her to have more than an indistinct remembrance of her caresses; and her place had been supplied by an excellent woman as governess, who had fallen little short of a mother in affection.",
    "Sixteen years had Miss Taylor been in Mr. Woodhouse's family, less as a governess than a friend, very fond of both daughters, but particularly of Emma.",
    "Between _them_ it was more the intimacy of sisters.",
]

print_sentences_and_voices(sentences)

ACTIVE: Emma Woodhouse, handsome, clever, and rich, with a comfortable home and happy disposition, seemed to unite some of the best blessings of existence; and had lived nearly twenty-one years in the world with very little to distress or vex her.

ACTIVE: She was the youngest of the two daughters of a most affectionate, indulgent father; and had, in consequence of her sister's marriage, been mistress of his house from a very early period.

PASSIVE: Her mother had died too long ago for her to have more than an indistinct remembrance of her caresses; and her place had been supplied by an excellent woman as governess, who had fallen little short of a mother in affection.

ACTIVE: Sixteen years had Miss Taylor been in Mr. Woodhouse's family, less as a governess than a friend, very fond of both daughters, but particularly of Emma.

ACTIVE: Between _them_ it was more the intimacy of sisters.

