# Ling 450/807 SFU - Assignment 1: Indirect Quotes

This notebook presents a method for extracting indirect quotes and provides an assessment of the method's performance. 

## Import packages

In [7]:
import spacy
import re
from spacy.matcher import Matcher
from spacy import displacy
nlp = spacy.load("en_core_web_sm")

## Reading Input and Data Cleaning

In [8]:
def find_sents(text):
    doc = nlp(text)
    sentences = list(doc.sents)
    return(sentences)

In [None]:
# reads in five files (selected from the direct quote assessment, for consistency)
i = 1
sents = {}
while i <= 5: 
    with open ("A1_5/File_{}.txt".format(i), "r", encoding='utf-8') as f:
        text = f.read()
        doc = nlp(text)
        sents['file_{}'.format(i)] = find_sents(doc)
        i+=1

In [232]:
# 10 example sentences
lst = ['She said he was late.', 'She told us she was late.', 'She said that she was late.', 'She told us that she was late.', 
        'She requested to leave.', 'She asked us to wait for her.', 'She was telling us about being late.', 'What she told us was that he was late.', 
        'That she was late is all he told us.', 'According to her, he was late.']

# processing using nlp()
test_sentences = [nlp(e) for e in lst]

In [233]:
# # exploring dependency parsing
# for t in test_sentences:
#     displacy.render(t, style='dep', jupyter=True)

## Indirect Quotes


In [223]:
# preliminary list of reporting verbs (needs improvement)
reporting_verbs = ['say', 'mention', 'claim', 'tell', 'argue', 'request', 'ask'] # indicate (how to improve

def indirect_quotes(doc):
    '''
    input: Doc object
    output: None

    prints out the speaker and the indirect quote in each sentence, if both are present.
    '''
    for token in doc:
        # removes quotes that have quotation marks in them, in order to account for direct quotes (THIS IS NOT A GOOD WAY TO DO IT)
        if '\"' in doc.text or '\'' in doc.text: 
                return
        # identifies the speaker by checking whether a token is a subject and whether the head of that token is a reporting verb
        if token.dep_ == 'nsubj' and token.head.lemma_ in reporting_verbs:
            speaker = token
        # identifies the complement clause of the reporting verb or the complement clause of the copula
        if token.head.lemma_ in reporting_verbs + ['be'] and token.dep_ in ["ccomp", "xcomp", "pcomp"]:
            quote = " ".join([(t.text) for t in token.subtree])
        # if an indirect quote has been found, the quote will be returned, and the other remaining tokens will not be looked at
        try:
            print(doc, '\n', 'speaker:', speaker, '\n', 'quote:', quote, '\n\n')
            return
        # if an indirect quote has not been found, the next token will be looked at
        except:
            pass

## Assessment

### Test 1

In [230]:
doc = nlp("Scheer said he is confident the party has learned from that mistake")
indirect_quotes(doc)

Scheer said he is confident the party has learned from that mistake 
 speaker: Scheer 
 quote: he is confident the party has learned from that mistake 




### Test 2

It correctly identifie the speaker and indirect quote for sentences, (1)-(6) and (8), as expected:

In [234]:
for t in test_sentences:
    indirect_quotes(t)

She said he was late. 
 speaker: She 
 quote: he was late 


She told us she was late. 
 speaker: She 
 quote: she was late 


She said that she was late. 
 speaker: She 
 quote: that she was late 


She told us that she was late. 
 speaker: She 
 quote: that she was late 


She requested to leave. 
 speaker: She 
 quote: to leave 


She asked us to wait for her. 
 speaker: She 
 quote: to wait for her 


What she told us was that he was late. 
 speaker: she 
 quote: that he was late 




### Test 3

test and explain!!

In [150]:
for file in sents.keys():
    print(file)
    for sent in sents[file]:
        indirect_quotes(sent)

file_1
Kim and Clark Moran received a letter this week from Immigration, Refugees and Citizenship Canada informing them that the federal department has concerns about two-year-old Ayo, whom the couple claims they adopted from an orphanage in Nigeria and gained custody of in August. 
  
 speaker: couple 
 quote: they adopted from an orphanage in Nigeria and gained custody of in August 


Kim said the family is working with an immigration lawyer as well as the adoption agency, whose name she would not reveal, as they weigh their options. 
  
 speaker: Kim 
 quote: the family is working with an immigration lawyer as well as the adoption agency , whose name she would not reveal , as they weigh their options 


file_2
Wang told reporters she believes she has what it takes to take on Singh, because she has lived in the riding for 20 years and has strong connections in the community. 
  
 speaker: Wang 
 quote: she believes she has what it takes to take on Singh , because she has lived in the



# Indirect Quotes without reporting verbs

In [17]:
i = 1
sents = {}
while i <= 5: 
    with open(f"A1_data/file_1.txt", "r", encoding='utf-8') as f:
        text = f.read()
        doc = nlp(text)
        sents['file_{}'.format(i)] = find_sents(doc)
        i+=1

In [46]:
def indirect_quotes_no_reporting_verbs(doc):
    
    for token in doc:
        speaker, quote = None, None
        
        # identifies "According to" 
        if token.text.lower() == "according" and token.nbor(1).text.lower() == "to":
            speaker = token.nbor(2)  
            quote_start = token.i + 4  
            quote = " ".join([t.text for t in doc[quote_start:]])

        # identifies "that"
        if token.text.lower() == "that" and token.dep_ == "mark":
            speaker = token.nbor(-1)
            quote_start = token.i + 1  
            quote = " ".join([t.text for t in doc[quote_start:]])


        # identifies subject + complement 
        if token.dep_ == "nsubj" and token.head.dep_ in ["ROOT", "cop"]:
            speaker = token
            quote_start = token.i + 3
            quote = " ".join([t.text for t in doc[quote_start:]])

        if speaker and quote:
            print(doc, '\n', 'speaker:', speaker.text, '\n', 'quote:', quote, '\n\n')
            return



Test 1

In [52]:
test_sentences =[
    "According to Kim, the Canadian high commission in Nigeria doesn't have an immigration office, so all adoptions out of that country have to be processed in Ghana. ",
    "The fact that we are being accused right now of an unethical adoption is crazy",
    "An Abbotsford, B.C. couple that has been waiting nearly two years to bring their newly adopted son home from Africa has learned that the Canadian government is not prepared to grant the child citizenship. "]

In [53]:
test_docs = [nlp(sent) for sent in test_sentences]
for doc in test_docs:
    indirect_quotes_no_reporting_verbs(doc)

According to Kim, the Canadian high commission in Nigeria doesn't have an immigration office, so all adoptions out of that country have to be processed in Ghana.  
 speaker: Kim 
 quote: the Canadian high commission in Nigeria does n't have an immigration office , so all adoptions out of that country have to be processed in Ghana . 


The fact that we are being accused right now of an unethical adoption is crazy 
 speaker: fact 
 quote: are being accused right now of an unethical adoption is crazy 


An Abbotsford, B.C. couple that has been waiting nearly two years to bring their newly adopted son home from Africa has learned that the Canadian government is not prepared to grant the child citizenship.  
 speaker: couple 
 quote: been waiting nearly two years to bring their newly adopted son home from Africa has learned that the Canadian government is not prepared to grant the child citizenship . 




In [54]:
for file in sents.keys():
    print(file)
    for sent in sents[file]:
        indirect_quotes_no_reporting_verbs(sent)


file_1
An Abbotsford, B.C. couple that has been waiting nearly two years to bring their newly adopted son home from Africa has learned that the Canadian government is not prepared to grant the child citizenship. 
  
 speaker: Abbotsford 
 quote: nearly two years to bring their newly adopted son home from Africa has learned that the Canadian government is not prepared to grant the child citizenship . 
  


Kim and Clark Moran received a letter this week from Immigration, Refugees and Citizenship Canada informing them that the federal department has concerns about two-year-old Ayo, whom the couple claims they adopted from an orphanage in Nigeria and gained custody of in August. 
  
 speaker: Kim 
 quote: custody of in August . 
  


file_2
An Abbotsford, B.C. couple that has been waiting nearly two years to bring their newly adopted son home from Africa has learned that the Canadian government is not prepared to grant the child citizenship. 
  
 speaker: Abbotsford 
 quote: nearly two ye