### Detecting and analysing character speech.

First we need to be able to reliably detect sentence that contain speech. As a first approach: sentences immediately preceeded or follwed by "he/she/they said", where said could be any speech verb...

The we are interested in counting:
* use of question marks, exclamantion marks
* total word count

Future (and more complex):
* female characters speaking to each other
* interruptions 

#### Notes:

* Enormous crocodile: didn't have to edit. More vocabulary. More data for less work.
* Speech might need a bespoke app for tagging the speaker.

In [1]:
# coding: ascii

In [2]:
import os
import pdfplumber
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import spacy
from spacy import displacy
from spacy.lang.en.examples import sentences 

%matplotlib inline

In [3]:
nlp = spacy.load("en_core_web_lg")

In [4]:
labels = pd.read_excel('../Book List Exel April.xlsx', sheet_name='Sheet1')
labels = labels.rename(columns={'Author ': 'Author'})

In [5]:
os.chdir('../text_pdfs')

In [6]:
df = pd.DataFrame()

def grab_text(title, labels):
    
    start = labels.loc[labels.Title==title]['Starting Page']
    if len(start)==0:
        print(title, "no start")
        start = 0
    else:
        start = start.values[0]
    end = labels.loc[labels.Title==title]['Ending Page']
    if len(end)==0:
        print(title, "no end")
        end = 0
    else:
        end = end.values[0]
    
    title = title + '.pdf'
    all_text = ''
    with pdfplumber.open(title) as pdf:
        for i, page in enumerate(pdf.pages):
            if i+1 >= start and i < end:
                single_page_text = page.extract_text()

                if single_page_text is not None:
                    all_text = all_text + '\n' + single_page_text
                
    return all_text

df['Title'] = [file.split('.')[0] for file in os.listdir() if file.split('.')[1]=='pdf']
df['Text'] = [grab_text(title, labels) for title in df.Title]

In [7]:
df.head()

Unnamed: 0,Title,Text
0,The Gruffalo,\nA mouse took a stroll through the deep dark ...
1,Peace at Last,\n \nThe hour was late. \n \nMr Bear was tired...
2,Kipper's Toybox,\nSomeone or something had been \nnibbling a h...
3,The Hungry Caterpillar,\n \nIn the light of the moon a little egg lay...
4,Harry and the Dinosaurs Go Wild,\nIt was a long drive to the safari park but i...


In [8]:
for title, text in zip(df.Title, df.Text):
    text = text.replace('\n', ' ')
    doc = nlp(text)
    sentence_list = list(doc.sents)
    
    protagonist = labels.loc[labels.Title==title]['Protagonist Name'].values
    protagonist_gender = labels.loc[labels.Title==title]['Protagonist Gender'].values

    for possible_subject in doc:
            if possible_subject.dep_ == 'nsubj' and possible_subject.head.pos_ == 'VERB':
                
                if possible_subject.head.lemma_ == "say":
                    print('pre-previous:', sentence_list[sentence_list.index(possible_subject.sent)-2])
                    print('previous:', sentence_list[sentence_list.index(possible_subject.sent)-1])
                    print('current:', possible_subject.head.sent)
                    print('next:', sentence_list[sentence_list.index(possible_subject.sent)+1])
                    print(' ')
                    

pre-previous: “Roasted fox!
previous: I’m off!”
current: Fox said.   
next: “Goodbye, little mouse," and away he sped.   
 
pre-previous: It’s    a gruffalo!" .
previous: “My favourite food!"
current: the Gruffalo said.
next: “You'll    taste good on a slice of bread!”   
 
pre-previous: the Gruffalo said.
previous: “You'll    taste good on a slice of bread!”   
current: Good?” said the mouse.
next: “Don’t call me good!
 
pre-previous: the Gruffalo said.
previous: “You'll    taste good on a slice of bread!”   
current: Good?” said the mouse.
next: “Don’t call me good!
 
pre-previous: I'm    the scariest creature in this wood.
previous: Just walk behind me and soon you’ll see, Everyone is    afraid of me."   
current: All right, said the Gruffalo, bursting w ith laughter.
next: “You go ahead and I’ll follow after. "   
 
pre-previous: All right, said the Gruffalo, bursting w ith laughter.
previous: “You go ahead and I’ll follow after. "   
current: They walked and walked till the Gruffa

 
pre-previous: and on.
previous: A week later, Sock took a stroll round the block And found her new    friend looking thin.   
current: “He’s gone off and left me!’' said Tabby McTat.   
next: Then Sock said, “My people, Prunella and Pat, Would gladly find    room for a fine tabby cat.”   
 
pre-previous: A week later, Sock took a stroll round the block And found her new    friend looking thin.   
previous: “He’s gone off and left me!’' said Tabby McTat.   
current: Then Sock said, “My people, Prunella and Pat, Would gladly find    room for a fine tabby cat.”   
next: ' She was right and they took McTat in.'     
 
pre-previous: But he dreamed of his friend    with the old checked hat       
previous: And always woke up with a meW-
current: And often he said, “What’s happened to Fred?”
next: And his paws  took him hack to the square. '
 
pre-previous: And he pulled out this… and he pulled out that….And people threw    coins in the tall black hat,
previous: But the busker was never the

pre-previous: Harry wanted to take his dinosaurs, but they  were hiding all over the place.
previous: He called all  their names.  
current: He said, “Get in the bucket, my Stegosaurus.”  
next: And out came Stegosaurus from under the  pillow.  
 
pre-previous: He said, “Get in the bucket, my Stegosaurus.”  
previous: And out came Stegosaurus from under the  pillow.  
current: He said, “Get in the bucket, my Triceratops.”  
next: And out came Triceratops from inside the  drawer.  
 
pre-previous: He didn’t want to go  because he had a lot of teeth.  
previous: He thought Mr Drake might do drilling on them.  
current: Harry said, “Don’t worry, because when we get there, I shall  press a magic button on my bucket, and that will make you  grow big.”    
next: In the waiting room, the nurse said, “Hello,  Harry.
 
pre-previous: He thought Mr Drake might do drilling on them.  
previous: Harry said, “Don’t worry, because when we get there, I shall  press a magic button on my bucket, and that

pre-previous: He tried his  hardest every day to win a golden star.   
previous: All the dragons in Year One were learning how to fly.
current: “High!” said Madam Dragon.
next: “Way up in the sky!
 
pre-previous: then crashed into a tree.
previous: Just then, a little girl ca me by.
current: “Oh, please don’t cry,” she said.
next: “Perhaps you’d like a nice sticky  plaster for your head?”
 
pre-previous: “Oh, please don’t cry,” she said.
previous: “Perhaps you’d like a nice sticky  plaster for your head?”
current: “What a good idea!” said Zog.
next: Then up and off he  flew, His plaster  gleaming pinkly as he zigzagged through the blue.
 
pre-previous: “Oh, please don’t cry,” she said.
previous: “Perhaps you’d like a nice sticky  plaster for your head?”
current: “What a good idea!” said Zog.
next: Then up and off he  flew, His plaster  gleaming pinkly as he zigzagged through the blue.
 
pre-previous: A year went by, and in Year Two the dragons lear ned to roar.
previous: •  
current: “

pre-previous: The sea    is deep and the world is wide!   
previous: How I long to sail!”
current: Said the tiny snail.       
next: These are the other snails in the flock,  Who all stuck tight to the smooth black rock  And said to the snail with the itchy foot,  “Be quiet!
 
pre-previous: The tiny snail   
previous: On the tail of the whale.           
current: And she gazed at the sky, the sea, the land,   The waves and the caves and the golden sand, She gazed and gazed, amazed by it all, And she said    to the whale, “I feel so small.”   
next: But- then came the day
 
pre-previous: The teacher turns pale.
previous: ‘Look!’
current: say the  children.
next: ‘It’s leaving a trail.’
 
pre-previous: As the whale and the snail travel safely away . . .       
previous: Back to the dock   
current: And the flock on the rock,   Who said, “How time’s flown!”   
next: And, “Haven’t you grown!”   
 
pre-previous: As the whale and the snail travel safely away . . .       
previous: Back to th

pre-previous:  It was Monday morning
previous: and it was Leopard's first  day at Jungle School.
current: Miss Bird said he could sit  with Monkey, Little Lion and Giraffe.
next: She said  Monkey, Little Lion and Giraffe could be Leopard's  friends and help him to feel welcome.  
 
pre-previous: and it was Leopard's first  day at Jungle School.
previous: Miss Bird said he could sit  with Monkey, Little Lion and Giraffe.
current: She said  Monkey, Little Lion and Giraffe could be Leopard's  friends and help him to feel welcome.  
next: But Giraffe was cross.
 
pre-previous: He did not Want to make Leopard feel  welcome.  
previous: At playtime, Monkey, Little Lion and Giraffe played  football.
current: Monkey said Leopard could play, too.
next: But  Giraffe got cross.
 
pre-previous: Monkey said Leopard could play, too.
previous: But  Giraffe got cross.
current: He said Leopard could not  play.
next: He told Leopard to go away.
 
pre-previous: Leopard was  sad.
previous: He did not like

pre-previous: “Careful, Santa!” quacked the ducks.   
previous: “You nearly squashed us.”
current: “How awful,” said Santa as he gathered    up the presents.
next: “Sorry about that.   
 
pre-previous: he cried.   
previous: “Careful, Santa!” called the squirrel.
current: “I am trying to be careful,” said Santa, as he struggled to    get free.
next: Very, very slowly, Santa climbed back down again,    dropping some presents as he went.
 
pre-previous: warned Santa’s cat.  
previous: “Eeek! I can’t stop!”
current: said Santa, as he zoomed towards the    snowman . . .
next: “Sorry, Snowman, I didn’t mean to bump you,” Santa said, as he  dusted himself down and popped the last of the presents into his    sack.
 
pre-previous: “Eeek! I can’t stop!”
previous: said Santa, as he zoomed towards the    snowman . . .
current: “Sorry, Snowman, I didn’t mean to bump you,” Santa said, as he  dusted himself down and popped the last of the presents into his    sack.
next: “That’s it!”
 
pre-previous:

IndexError: list index out of range

In [9]:
from unidecode import unidecode

In [10]:
for title, text in zip(df.Title, df.Text):
    text = text.replace('\n', ' ')
    doc = nlp(text)
    sentence_list = list(doc.sents)
    break

In [11]:
test=[s for s in sentence_list if 's Snake' in str(s)][0]

In [14]:
unidecode(str(test))

'"It\'s Snake," said the mouse.'