# `692 Team 1 Proj-1 : Crime Novel Plot Analysis with Regex - Agatha Christie`

## Objective
The goal of this project is to conduct a plot and protagonist/antagonist analysis of famous crime novels. We analyze five publicly available crime novels by Agatha Christie, taken from Project Gutenberg (http://www.gutenberg.org/). The novels chosen are: 

- The Murder on the Links 
- The Mysterious Affair at Styles 
- The Secret Adversary 
- The Man in the Brown Suit 
- The Secret of Chimneys 

Note: Feel free to use any background resource for the understanding of the plot, protagonist and antagonist names, and other details. Look for spoilers, details, etc. Our goal is not to predict the crime, but to computationally analyze the structure of the plot.


## Data Collection and Preparation 

### Background Research for Data Collection and Preparation 

##### Data Collection
Location for Plain text UTF-8 files for novels: 
- The Mysterious Affair at Styles https://www.gutenberg.org/files/863/863-0.txt
- The Murder on the Links https://www.gutenberg.org/files/58866/58866-0.txt
- The Secret Adversary https://www.gutenberg.org/files/1155/1155-0.txt
- The Man in the Brown Suit https://www.gutenberg.org/files/61168/61168-0.txt
- The Secret of Chimneys https://www.gutenberg.org/files/65238/65238-0.txt

Note: One benefit of the plain-text version is that it has no page numbers to clean; the HTML version does.

##### Data Preparation
- There are inconsistencies between the novel formats. Some of them start with a prologue and others don't. 
- Some books have 'START OF THE PROJECT' near the beginning and others have 'START OF THIS PROJECT'; in either case the table of contents appears after that marker.
- Some of them use the heading 'Table of Contents', others just say 'Contents'.
- Some number their chapters with Roman numerals, others don't.
- Some use the word 'chapter', others just list a chapter number and title.
- The novel text files have license and other information at the end.
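
The marker differences noted above can be absorbed by a single pattern with an alternation. The following is a minimal sketch, not the notebook's own cleaning code: it assumes the marker lines read `*** START OF THE/THIS PROJECT GUTENBERG ...` and `*** END OF THE/THIS PROJECT GUTENBERG ...`, and both the helper name `strip_gutenberg_boilerplate` and the sample text are hypothetical.

```python
import re

# Accept both variants: "START OF THIS PROJECT ..." and "START OF THE PROJECT ..."
START_MARKER = re.compile(r"\*\*\*\s*START OF TH(?:E|IS) PROJECT GUTENBERG[^\n]*\n")
END_MARKER = re.compile(r"\*\*\*\s*END OF TH(?:E|IS) PROJECT GUTENBERG")

def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END marker lines (hypothetical helper)."""
    start = START_MARKER.search(text)
    begin = start.end() if start else 0
    # Only look for the END marker after the START marker
    end = END_MARKER.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()

# Made-up sample standing in for a downloaded file
sample = (
    "Header and license preamble\r\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK SAMPLE ***\r\n"
    "CONTENTS\r\nCHAPTER I.\r\nBody text.\r\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK SAMPLE ***\r\n"
    "License text\r\n"
)
print(repr(strip_gutenberg_boilerplate(sample)))
```

The table of contents is still present after this step, so the start of the plot has to be located separately.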

These factors need to be considered in data cleaning. 
A few key particulars for each book are listed below: 

- The Murder on the Links 
  - This phrase is present at the beginning: \*** START OF THIS PROJECT 
  - The novel plot starts at the second instance of '1 A Fellow Traveller'.
  - The novel ends at 'End of Project Gutenberg's The Murder on the Links, by Agatha Christie', followed by \*** END OF THIS PROJECT GUTENBERG ... at the end.
  - Each chapter starts with a number followed by the chapter title.
  
  
- The Mysterious Affair at Styles 
  - This phrase is present at the beginning: \*** START OF THE PROJECT 
  - The novel plot starts at the second instance of 'chapter I.' (the period is important here).
  - The novel ends at 'THE END', followed by \*** END OF THE PROJECT GUTENBERG EBOOK... at the end.
  - Each chapter starts with 'Chapter' followed by the chapter number in Roman numerals, then a new line, then the chapter title.
  

- The Secret Adversary 
  - This phrase is present at the beginning: \*** START OF THIS PROJECT 
  - The novel plot starts at the second instance of 'PROLOGUE'.
  - The novel ends at 'End of the Project Gutenberg EBook of The Secret Adversary, by Agatha Christie', followed by \*** END OF THIS PROJECT GUTENBERG ... at the end. 
  - Each chapter starts with 'Chapter' followed by the chapter number in Roman numerals, followed by the chapter title.


- The Man in the Brown Suit 
  - This phrase is present at the beginning: \*** START OF THIS PROJECT 
  - The novel plot starts at the second instance of 'PROLOGUE'.
  - The novel ends at 'End of Project Gutenberg's The Man in the Brown Suit, by Agatha Christie', followed by \*** END OF THIS PROJECT GUTENBERG ... at the end. 
  - Each chapter starts with 'Chapter' followed by the chapter number in Roman numerals.


- The Secret of Chimneys 
  - This phrase is present at the beginning: \*** START OF THE PROJECT 
  - The novel plot starts at the second instance of '1 (new line) Anthony Cade Signs on' (here the number is followed by a new line before the title).
  - The novel ends at 'Transcriber's Notes:', followed by \*** END OF THE PROJECT GUTENBERG... at the end.
  - Each chapter starts with a number, followed by a new line, followed by the chapter title.
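
Because the first chapter heading appears once in the table of contents and once where the story begins, the start of the plot is the second match of that heading. Below is a sketch of that idea under stated assumptions: `slice_novel` is a hypothetical helper (the notebook itself handles this inside `ch_carve`), and the sample text is made up.

```python
import re

def slice_novel(text, start_pattern, end_pattern):
    """Keep the text from the second match of start_pattern up to the first
    later match of end_pattern (hypothetical helper)."""
    starts = list(re.finditer(start_pattern, text))
    if len(starts) < 2:
        raise ValueError("expected the start heading in the contents and in the body")
    begin = starts[1].start()
    # Search for the end marker only after the start of the plot
    end = re.compile(end_pattern).search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish]

# Made-up sample: contents list, then the body, then the closing line
sample = (
    "CONTENTS\n1 A Fellow Traveller\n2 An Appeal for Help\n\n"
    "1 A Fellow Traveller\nStory text begins here.\n"
    "End of Project Gutenberg's sample\n"
)
body = slice_novel(sample, r"1 A Fellow Traveller", r"End of Project Gutenberg")
print(repr(body))
```

Raising an error when fewer than two matches are found makes a changed file layout visible immediately instead of silently keeping the table of contents.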


##### Data Tokenization
Note: since we are not allowed to use NLTK or spaCy for tokenization, we have to do it in plain Python as well. 
We can use split(), but that is very basic and does not produce tokens in a linguistic sense. The re package adds support for regex, and learning regex better is the point of this project, so we recommend using re.split with a custom regex. Reference for the re package: 
https://docs.python.org/3/library/re.html
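
As an illustration of the re.split approach, here is a minimal sketch of a word tokenizer. The name `tokenize` and the choice of delimiter class are assumptions made for this example, not the tokenizer used in the rest of the notebook.

```python
import re

# Split on any run of characters that is not a word character or an apostrophe
# (straight or curly), so that "poirot’s" stays a single token.
# Note: \w also matches the underscore.
TOKEN_SPLIT = re.compile(r"[^\w'\u2019]+")

def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (hypothetical helper)."""
    tokens = TOKEN_SPLIT.split(sentence.lower())
    # re.split yields empty strings when the text starts or ends with a delimiter
    return [tok for tok in tokens if tok]

print(tokenize("\u201cCheer up, Poirot, the luck will change.\u201d"))
print(tokenize("Poirot\u2019s egg-spoon"))
```

One limitation of this delimiter choice: a curly apostrophe used as a closing single quote stays attached to the preceding word, and hyphenated words are split in two.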

##### Helpful Reference Links
- https://stackoverflow.com/questions/7243750/download-file-from-web-in-python-3
- https://docs.python.org/3/howto/urllib2.html
- https://python.plainenglish.io/how-to-tokenize-sentences-without-using-any-nlp-library-in-python-a381b75f7d22 
- https://stackoverflow.com/questions/21361073/tokenize-words-in-a-list-of-sentences-python



### Data Collection and Preparation Code



##### Helper Functions for retrieving and cleaning corpus


In [5]:
#Author: Luke+Veronica
#Description: Functions for retrieving and cleaning corpus
import urllib.request, re

# this function accepts a book title as a parameter and fetches the index based on the title 
def get_index(title):
  last_reg=re.compile(r"\w+$")
  last_word=re.findall(last_reg,title)[0]
  if last_word =="Links":
    #"The Murder on the Links"
    index=1
  elif last_word=="Styles":
    #"The Mysterious Affair at Styles"
    index=2
  elif last_word=="Adversary":
    #"The Secret Adversary"
    index=3
  elif last_word=="Suit":
    #"The Man in the Brown Suit"
    index=4
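  elif last_word=="Chimneys":
    #"The Secret of Chimneys"
    index=5
  else:
    # unknown title: fail loudly instead of returning an unbound index
    raise ValueError("Unknown title: "+title)
  return index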


# this function accepts a book index as a parameter and get the text for the book from project gutenberg 
def get_text(index):
  if index==1:
    #"The Murder on the Links"
    url = "https://www.gutenberg.org/files/58866/58866-0.txt"
  elif index==2:
    #"The Mysterious Affair at Styles"
    url="https://www.gutenberg.org/files/863/863-0.txt"
  elif index==3:
    #"The Secret Adversary"
    url="https://www.gutenberg.org/files/1155/1155-0.txt"
  elif index==4:
    #"The Man in the Brown Suit"
    url="https://www.gutenberg.org/files/61168/61168-0.txt"
  elif index==5:
    #"The Secret of Chimneys"
    url="https://www.gutenberg.org/files/65238/65238-0.txt"
  response = urllib.request.urlopen(url)
  data = response.read()      # a `bytes` object
  text = data.decode('utf-8')
  return text


# this function accepts a book index  and returns an appropriate regex that can carve out chapters for that book
def get_ch_regex(index):
  if index==1:
    ch_carve=re.compile(r'\n\d\d?\s[\'\"\u201c]?[A-Z].*\n')
  elif index==2:
    ch_carve=re.compile(r'CHAPTER\s[IVX]+\.\r\n.*\r\n')
  elif index==3:
    ch_carve=re.compile(r'\r\n\r\n\r\nCHAPTER.*\r\n')
  elif index==4:
    ch_carve=re.compile(r'CHAPTER\s\w+\r\n')  
  elif index==5:
    ch_carve=re.compile(r'\d\d?\r\n\r\n[A-OQ-Z].*\r\n')
  return ch_carve


# this function accepts a book index and chapter contents as parameters and trims out any Project gutenberg related artifacts that are not part of the novel
def trim_contents(ch_contents_dict,index):
  last=len(ch_contents_dict)
  if index==1:
    ch_contents_dict[last]=ch_contents_dict[last].split('\nEnd of Project Gutenberg')[0]
  elif index==2:
    ch_contents_dict[last]=ch_contents_dict[last].split('\nTHE END')[0]
  elif index==3:
    ch_contents_dict[last-1]=ch_contents_dict[last-1].split('\nEnd of the Project Gutenberg')[0]
  elif index==4:
    ch_contents_dict[last-1]=ch_contents_dict[last-1].split('THE END')[0]
  elif index==5:
    ch_contents_dict[last]=re.split(r"TRANSCRIBER",ch_contents_dict[last])[0]
  return ch_contents_dict


# this function accepts chapter as a parameter and removes white spaces
def remove_white(chapter):
  regex=r'[\r\n\u200a_]+'
  chapter = re.sub(regex,' ',chapter)
  return chapter


# this function accepts chapter as a parameter and carves out sentences
def sent_carve(chapter):
  #chapter=re.split(r'(?<![A-H|J-Z])[\.\?!](?![\'\"\u2019\u201a\u201c\u275c\u275f\u201e\u201d\u0022\u275e]\s[a-z])(?![\'\"\u2019\u201a\u201c\u275c\u275f\u201e\u201d\u0022\u275e]\sI said)[\'\"\u2018\u2019\u201c\u201d\)\]]*\s*(?<!\w\.\w)(?<![A-Z][a-z][a-z])(?<![A-Z][a-z])\s+',chapter,flags=re.UNICODE)
  chapter=re.split(r'(?<![^A-Z][A-H|J-Z])(?<!Mr|Ms|Dr)(?<!Mrs)(?<!Mlle)(?<!Melle)(?<!\w\.\w)[\.\?!](?![\'\"\u2019\u201a\u201c\u275c\u275f\u201e\u201d\u0022\u275e]\s[a-z])[\'\"\u2018\u2019\u201c\u201d\)\]]*\s*|\u2014\u201d\s*',chapter,flags=re.UNICODE)
  chapter=chapter[:-1]
  chapter={num:contents.lower() for (num,contents) in enumerate(chapter)}
  return chapter


# this function accepts a book title and carves out chapters and returns a dictionary of book title, chapter contents and chapter title
def ch_carve(title):

  index=get_index(title)
  text=get_text(index)
  ch_regex=get_ch_regex(index)
  if index ==3:
    text=re.split("CHAPTER XXVIII.     AND AFTER\r\n\r\n\r\n\r\nPROLOGUE",text)[1]
  if index ==4:
    text=re.split("PROLOGUE",text)[1]
  ch_titles=re.findall(ch_regex,text)
  ch_titles_dict={num+1:remove_white(title.strip()) for (num,title) in enumerate(ch_titles)}
  if index==3 or index ==4:
    ch_titles_dict.update( {0 :"PROLOGUE"} )
  chapters=re.split(ch_regex,text)
  if index==3 or index==4:
    ch_contents_dict = {num:contents for (num,contents) in enumerate(chapters)}  
  elif index ==1 or index ==2 or index==5:
    chapters=chapters[1:]
    ch_contents_dict = {num+1:contents for (num,contents) in enumerate(chapters)}
  ch_contents_dict=trim_contents(ch_contents_dict,index)
  return {"title":title,"contents":ch_contents_dict,"chapters":ch_titles_dict}


# this function calls other helper functions and gets the corpus we will be working with
def get_corpus():
  #tentatively planning to index books from 1 to match chapters
  titles=["The Mysterious Affair at Styles","The Murder on the Links","The Secret Adversary","The Man in the Brown Suit","The Secret of Chimneys"]
  corpus={ get_index(title):ch_carve(title) for title in titles}
  return corpus


# this function calls other helper functions and cleans the corpus
def clean_corpus(corpus):
  for keyb,value in corpus.items():
    for  keyc,value in value["contents"].items():
      corpus[keyb]["contents"][keyc]=sent_carve(remove_white(value))
    
  return corpus


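# this function accepts a chapter, splits it into sentences, lowercases them, and joins them back into one space-separated string (a "blob")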
def sent_blob(chapter):
  temp=''  
  #chapter=re.split(r'(?<![A-H|J-Z])[\.\?!](?![\'\"\u2019\u201a\u201c\u275c\u275f\u201e\u201d\u0022\u275e]\s[a-z])(?![\'\"\u2019\u201a\u201c\u275c\u275f\u201e\u201d\u0022\u275e]\sI said)[\'\"\u2018\u2019\u201c\u201d\)\]]*\s*(?<!\w\.\w)(?<![A-Z][a-z][a-z])(?<![A-Z][a-z])\s+',chapter,flags=re.UNICODE)
  chapter=re.split(r'(?<![^A-Z][A-H|J-Z])(?<!Mr|Ms|Dr)(?<!Mrs)(?<!Mlle)(?<!Melle)(?<!\w\.\w)[\.\?!](?![\'\"\u2019\u201a\u201c\u275c\u275f\u201e\u201d\u0022\u275e]\s[a-z])[\'\"\u2018\u2019\u201c\u201d\)\]]*\s*|\u2014\u201d\s*',chapter,flags=re.UNICODE)
  chapter=chapter[:-1]
  for ch in chapter:
    temp=temp+" "+ch.lower()    
  return temp


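# this function replaces punctuation (and any whitespace around it) with a single space
def remove_punc(blob):
  blob=re.sub(r"[\u201c\u201d\?,;:\.!\u2018\u2019\u201a\u275b\u275c\u275f\s-]+" ,' ',blob)
  return blob


# this function collapses runs of whitespace into a single space
def tighten(blob):
  return re.sub(r"\s+"," ",blob)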


# this function builds one lowercased, punctuation-free blob per book (all chapters concatenated) and stores it under the "blob" key
def blob_corpus(dirty_corpus):
  for keyb,value in dirty_corpus.items():
    blob=''
    for  keyc,value in value["contents"].items():
      blob=blob+" "+tighten(remove_punc(remove_white(sent_blob(dirty_corpus[keyb]["contents"][keyc]))))
    dirty_corpus[keyb]["blob"]=blob  
  return dirty_corpus


##### Functions for retrieving and cleaning corpus


In [6]:
#Author: Luke 
#Description: collect and clean corpus

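dirty_corpus=get_corpus()
# build the blobs first: clean_corpus replaces each chapter string with a dict of sentences in place
dirty_corpus=blob_corpus(dirty_corpus)
corpus=clean_corpus(dirty_corpus)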

#print(corpus[1]["title"],len(corpus[1]["chapters"]),len(corpus[1]["contents"]))
#print(corpus[1]["contents"][28])
#print(corpus[2]["contents"][13])
#print(corpus[1]["blob"])





## Data Analysis
The goal of the analysis is to measure how often the protagonists and the perpetrator(s) occur across each novel (per chapter, and per sentence within a chapter), along with mentions of the crime and other circumstances surrounding the antagonists. The ultimate objective is to use basic NLP tools to observe any patterns in plot structure across the works of one or all of the authors. Specifically, the analysis questions below need to be answered. 
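
A per-chapter frequency can be sketched as a count of matching sentences in each chapter. The example below assumes the shape `{chapter_number: {sentence_number: sentence_text}}` for one book's contents, which is the shape the cleaned corpus uses; `mentions_per_chapter` and the toy data are hypothetical.

```python
import re
from collections import Counter

def mentions_per_chapter(contents, pattern):
    """Count, per chapter, the sentences in which pattern matches (hypothetical helper)."""
    counts = Counter()
    for ch_index, sentences in contents.items():
        for sentence in sentences.values():
            if re.search(pattern, sentence):
                counts[ch_index] += 1
    return counts

# Toy data standing in for one book (not taken from the novels)
toy_contents = {
    1: {0: "poirot smiled", 1: "the train left", 2: "hastings nodded at poirot"},
    2: {0: "nothing happened", 1: "poirot waited"},
}
counts = mentions_per_chapter(toy_contents, re.compile(r"poirot|hastings"))
print(dict(counts))
```

Counting sentences rather than raw matches means a sentence naming both Poirot and Hastings contributes one to the chapter total.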

Note: To effectively conduct this analysis, you should find resources, and read the plot summaries of each novel, so you can make your search more effective. If plot summaries are not available, use regex to search for clues, and report how well/how fast that approach worked. 


###  Background Research for Data Analysis
Details of each book compiled by reading plot summaries, books themselves, articles and fanpages:

- The Murder on the Links 
  - Lead detective(s): Hercule Poirot, Arthur Hastings
  - Other detectives/assistants:  Monsieur Giraud, Monsieur Hautet
  - Victim: Paul Renauld
  - Suspects: Jack Renauld
  - Perpetrator: Marthe Daubreuil.
  - Other important characters:  Paul Renauld, Eloise Renauld, Jack Renauld, Madame Daubreuil, Gabriel Stonor, Georges Conneau, Madame Beroldy, Marthe Daubreuil, Bella Duveen, Dulcie Duveen (Cinderella)
  - Crime: Murder, Stabbing
  - motif: murder mystery
https://en.wikipedia.org/wiki/The_Murder_on_the_Links
https://agathachristie.fandom.com/wiki/The_Murder_on_the_Links


- The Mysterious Affair at Styles 
  - Lead detective: Hercule Poirot, Arthur Hastings
  - Other detectives/assistants: 
  - Victim: Emily Inglethorp
  - Suspects: Alfred Inglethorp, Cavendish
  - Perpetrator(s): Alfred Inglethorp, Evelyn Howard
  - Other important characters: John Cavendish
  - Crime: Murder, Poisoning
  - motif: murder mystery
https://agathachristie.fandom.com/wiki/The_Mysterious_Affair_at_Styles


- The Secret Adversary  (complicated, Needs to be looked at more)
  - Lead detective: Tommy and Tuppence (Tommy Beresford and Prudence "Tuppence" Cowley)
  - Other detectives/assistants: 
  - Victim: Jane Finn, Mrs. Vandemeyer
  - Suspects: Mr. Brown,  Julius Hersheimmer
  - Perpetrator: Sir James Peel Edgerton
  - Other important characters: Jane Finn
  - Crime: Espionage, Kidnapping
  - motif: thriller focus rather than detection


- The Man in the Brown Suit (complicated, Needs to be looked at more)
  - Lead detective: Anne Beddingfeld
  - Other detectives/assistants: 
  - Victim: Nadina aka Anita Grünberg, L. B. Carton
  - Suspects: Harry
  - Perpetrator: Sir Eustace Pedler
  - Other important characters: Nadina, Count Sergius Paulovitch, the Colonel, Suzanne Blair, Colonel Race, Guy Pagett, Harry Rayburn, Rev. Chichester, Miss Pettigrew, Harry Parker
  - Crime: diamond theft, murders, kidnapping
  - motif: thriller focus rather than detection


- The Secret of Chimneys (complicated, Needs to be looked at more)
  - Lead detective: Anthony Cade aka Prince Nicholas
  - Other detectives/assistants: Superintendent Battle, Monsieur Lemoine of the Sûreté, Mr. Fish aka American agent
  - Victim: Perceived: Count Stanislaus aka Prince Michael Obolovitch
  - Suspects: Anthony Cade, Prince Nicholas, King Victor
  - Perpetrator: Mlle Brun aka Queen Varaga aka Angèle Mory, M Lemoine aka King Victor
  - Other important characters: King Nicholas IV, Queen Varaga aka Angèle Mory, Herman Isaacstein, Prince Michael Obolovitch, George Lomax, Count Stylptitch, Jimmy McGrath, Virginia Revel, Captain O'Neill, Mr Holmes, Hiram P. Fish, Prince Nicholas, Mademoiselle Brun, Bill Eversleigh, Monsieur Lemoine of the Sûreté, Professor Wynwood, Boris Anchoukoff
  - Crime: sensitive document theft, murders, treasure hunt, espionage
  - motif: thriller focus rather than detection






### Data Analysis Code



#### Helper functions for answering analysis questions in the objective


In [19]:
#Author: Luke 
#Description Helper functions for answering questions

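# this function accepts a book index and returns a compiled regex matching the lead detective(s); patterns are lowercase because the corpus is lowercased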
def get_det(index):
  if index==1:
    det=re.compile(r"hercules?(?!a)|poirot|arthur|hastings")
  elif index==2:
    det=re.compile(r"hercules?(?!a)|poirot|arthur|hastings")
  elif index==3:
    # Probably need to tweak this to capture the different versions of PTC
    det=re.compile(r"tuppence|beresford|prudence|cowley")
  elif index==4:
    det=re.compile(r"anne|beddingfeld")
  elif index==5:
    det=re.compile(r"anthony|cade")
  else:
    det=re.compile(r"nobody")
  return det


# this function accepts a book index and returns a compiled regex matching the perpetrator(s)
def get_perp(index):
  if index==1:
    #checked book for abbreviations Mlle,mlle,Melle,melle - none occured in text
    perp=re.compile(r"mademoiselle marthe daubreuil|mademoiselle marthe|mademoiselle daubreuil|marthe daubreuil|marthe")
    #perp=re.compile(r"mademoiselle( marthe)? daubreuil|marthe daubreuil|marthe")
  elif index==2:
    perp=re.compile(r"alfred inglethorp|mr\. inglethorp|alfred|second cousin|evelyn howard|evelyn|miss howard|evie")
  elif index==3:
    perp=re.compile(r"(?:james )?(?:peel )?edgerton")
  elif index==4:
    perp=re.compile(r"eustace|pedler")
  elif index==5:
    perp=re.compile(r"mlle|brun|queen|varaga|angèle|mory|m lemoine|king|victor")
   #ignoring the aliases for now 
   #perp= re.compile(r"mlle|brun|queen|varaga|angèle|mory|m lemoine|king|victor")
  else:
    # V: shouldn't get here
    perp=re.compile(r"someone else")
  return perp


# this function accepts a book index and returns a compiled regex matching mentions of the crime
def get_crime(index):
  if index==1:
    crime=re.compile(r"murdered|body (was|had been) discovered|discovered.*body|death.*occurred|occurred.*death|examination.*body|body.*examination|only.*committed the crime")
  elif index==2:
    # FIXME
    crime=re.compile(r"convulsion|night of the murder|(mrs\. inglethorp|emily|old lady|wife|mother).*poison|poison.*(mrs\. inglethorp|emily|old lady|wife|mother)|motionless")
  elif index==3:
    # V: could look for spy also
    crime=re.compile(r"kidnapping|espionage")
  elif index==4:
    crime=re.compile(r"murder|kidnapping|theft")
  elif index==5:
    crime=re.compile(r"murder|espionage|theft")
  else:
    # V: shouldn't get here
    crime=re.compile(r"nothing happened")
  return crime


# this function accepts a book index and returns a compiled regex matching the main suspect(s)
def get_sus(index):
  if index==1:
    sus=re.compile(r"jack")
  elif index==2:
    # FIXME
    sus=re.compile(r"john cavendish|(?<!purchased by )mr\. cavendish|john")
  elif index==3:
    sus=re.compile(r"mr\. brown|julius hersheimmer")
  elif index==4:
    sus=re.compile(r"harry")
  elif index==5:
    sus=re.compile(r"king|victor")
  else:
    # V: shouldn't get here
    sus=re.compile(r"no suspect")
  return sus


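# this function accepts a book index and a regex and returns [chapter, sentence, text] for every sentence that matches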
def get_occur(index,regex):
  occur=[]
  for ch_index,ch_contents in corpus[index]["contents"].items():
    for sent_index,sent_contents in ch_contents.items():
      matches=re.search(regex,sent_contents)
      if matches is not None:
        occur.append([ch_index,sent_index,sent_contents])
        #print("Chapter: ",ch_index, "Sentence: ", sent_index, "Contents: ",sent_contents)
  return occur


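# this function returns [chapter, sentence, text] for every sentence in which both the detective regex and the perpetrator regex match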
def get_co_occur(index, det,perp):
  co_occur=[]
  for ch_index,ch_contents in corpus[index]["contents"].items():
    for sent_index,sent_contents in ch_contents.items():
      dmatches=re.search(det,sent_contents)
      pmatches=re.search(perp,sent_contents)
      if dmatches is not None and pmatches is not None:
        co_occur.append([ch_index,sent_index,sent_contents])
        #print("Chapter: ",ch_index, "Sentence: ", sent_index, "Contents: ",sent_contents)
  return co_occur


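# this function returns, for each match of perp in the book's blob, the three words before and the three words after the match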
def get_3words(book,perp):
  blob=corpus[book]["blob"]
  answer=[]
  splits=re.finditer(perp,blob)
  for iter in splits:
    before=re.split(r"\s+",blob[0:iter.start()-1])
    if len(before)>2:
      before=[before[-3],before[-2],before[-1]]
    elif len(before)==2:
      before=[" ", before[-2],before[-1]]
    elif len(before)==1:
      before=[" "," ",before[0]]
    elif len(before)==0:
      before=[" "]
 #   print(before)
    after=re.split(r"\s+",blob[iter.end()+1:])
    if len(after)>2:
      after=[after[0],after[1],after[2]]
    elif len(after)==2:
      after=[after[0],after[1]," "]
    elif len(after)==1:
      after=[after[0]," "," "]
    elif len(after)==0:
      after=[" "]
    answer.append(before+after)
  return answer

  #splits=[re.finditer(r"\s+",sp) for sp in splits]

    
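# this function returns a sentence and its neighbors as [sentence number, text] pairs: the two preceding sentences at the end of a chapter, the two following at the start, otherwise one on each side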
def get_3sentences(book,ch,sent):
  near3=[]
  if sent==max(corpus[book]["contents"][ch].keys()):
    near3=[[sent-2,corpus[book]["contents"][ch][sent-2]],[sent-1,corpus[book]["contents"][ch][sent-1]],[sent,corpus[book]["contents"][ch][sent]]]
  elif sent==min(corpus[book]["contents"][ch].keys()):
    near3=[[sent,corpus[book]["contents"][ch][sent]],[sent+1,corpus[book]["contents"][ch][sent+1]],[sent+2,corpus[book]["contents"][ch][sent+2]]]
  else:
    near3=[[sent-1,corpus[book]["contents"][ch][sent-1]],[sent,corpus[book]["contents"][ch][sent]],[sent+1,corpus[book]["contents"][ch][sent+1]]]
  return near3



#### Functions for answering analysis questions in the objective 

The plot summary answers derived from Regex are located below each book heading



###### The Murder on the Links 


In [20]:
#Author: Luke
#Description: demo of code for answering questions/book 1 code
  
det=get_det(1)
perp=get_perp(1)
crime=get_crime(1)
sus=get_sus(1)
det_occur=get_occur(1,det)
perp_occur=get_occur(1,perp)
co=get_co_occur(1,det,perp)
crime=get_occur(1,crime)
sus_occur=get_occur(1,sus)
print(det_occur)
print(perp_occur)
print(co)
print(crime)
print(sus_occur)
print(perp_occur)
perp_neighbors=get_3words(1,perp)
for n in perp_neighbors:
  print(n)
print(len(perp_neighbors))

[[1, 4, 'i had been transacting some business in paris and was returning by the morning service to london where i was still sharing rooms with my old friend, the belgian ex-detective, hercule poirot'], [1, 127, '“that was poirot’s first big case'], [2, 1, 'my friend poirot, exact to the minute as usual, was just tapping the shell of his second egg'], [2, 8, 'elsewhere, i have described hercule poirot'], [2, 18, '” i slipped into my seat, and remarked idly, in answer to poirot’s greeting, that an hour’s sea passage from calais to dover could hardly be dignified by the epithet “terrible'], [2, 19, 'poirot waved his egg-spoon in vigorous refutation of my remark'], [2, 41, 'poirot shook his head seriously'], [2, 59, 'poirot threw me a withering glance'], [2, 60, '“what an intelligence has my friend hastings!” he exclaimed sarcastically'], [2, 64, 'poirot shook his head with a dissatisfied air'], [2, 71, '“cheer up, poirot, the luck will change'], [2, 74, 'poirot smiled, and taking up the n



#####  The Mysterious Affair at Styles


In [13]:
# Block for book 2
#Author: Luke
  
det=get_det(2)
perp=get_perp(2)
crime=get_crime(2)
sus=get_sus(2)
det_occur=get_occur(2,det)
perp_occur=get_occur(2,perp)
co=get_co_occur(2,det,perp)
crime=get_occur(2,crime)
sus_occur=get_occur(2,sus)
print(det_occur)
print(perp_occur)
print(co)
print(crime)
print(sus_occur)
print(perp_occur)
perp_neighbors=get_3words(2,perp)
for n in perp_neighbors:
  print(n)
print(len(perp_neighbors))
#notes: avoided using "murdered" for crime as characters often use the word
#murder in theoretical conversations and wild accusations
#when discussing the actual event, the more specific word "poison" is used
#poison needs to be further refined to avoid instances where acquisition of poison is discussed,
#hence adding a check whether emily/mrs. inglethorp/old lady/wife/mother is mentioned in the same sentence
#"mr. cavendish" once refers to John's deceased father, so that instance is excluded in the suspect regex

[[1, 1, 'nevertheless, in view of the world-wide notoriety which attended it, i have been asked, both by my friend poirot and the family themselves, to write an account of the whole story'], [1, 31, '“i can tell you, hastings, it’s making life jolly difficult for us'], [1, 63, 'as we turned in at the lodge gates, john said: “i’m afraid you’ll find it very quiet down here, hastings'], [1, 82, 'mr. hastings—miss howard'], [1, 105, '“my wife, hastings,” said john'], [1, 124, '“why, if it isn’t too delightful to see you again, mr. hastings, after all these years'], [1, 125, 'alfred, darling, mr. hastings—my husband'], [1, 133, 'he placed a wooden hand in mine and said: “this is a pleasure, mr. hastings'], [1, 144, 'presently mrs. inglethorp turned to give some instructions about letters to evelyn howard, and her husband addressed me in his painstaking voice: “is soldiering your regular profession, mr. hastings'], [1, 195, 'this is mr. hastings—miss murdoch'], [1, 212, 'sisters  are , you k


### Code - Do we need this anymore?


In [2]:
# Author: @verolero86

# W.I.P. - first stab takes care of white space characters
#import numpy as np # to grab unique elements 

#def find_white_space(book,nchars,debug):
  #result_book = re.findall(r'\s',book[0:nchars]);

#  if debug == True:
#    print(repr(book1_chapter1[0:nchars]))

 # print(f"Number of white space characters = {len(result_book)}")
 # print(f"Unique types of white space characters found = {np.unique(result_book)}")

  #return len(result_book)

#def remove_specific_white_space(book,regex):
#  result_book = re.sub(regex,' ',book)

 # return result_book

#def clean_data(book):
  #ws_regex=r'[\r\n\u200a]'
  #result_book = remove_specific_white_space(book,ws_regex).lower()
  #return result_book

# Set a subset of characters for easier parsing (-1 for all in chapter)
#num_ws_b1c1 = find_white_space(corpus[1]["contents"][1],-1,True);
#print(num_ws_b1c1)

#num_ws_b1c2 = find_white_space(corpus[1]["contents"][2],-1,False);
#print(num_ws_b1c2)

#num_ws_b2c1 = find_white_space(corpus[2]["contents"][1],-1,False);
#print(num_ws_b2c1)

# Cleaning up unwanted white space and breaking up into sentences.
#b1c1 = corpus[1]["contents"][1]
#regex=r'[\r\n\u200a]'
#b1c1_no_ws = remove_specific_white_space(b1c1,regex)
#print(f"No \\r and \\n anymore: {repr(b1c1_no_ws[0:400])}")
#result_sentences = re.findall(r'[^\.\!\?]*[\.\!\?]',b1c1_no_ws);
#print(repr(result_sentences[0]))
#print(repr(result_sentences))
#sentence_regex=r'[\.\?!](?![\'\"\u2019\u201a\u201c\u275c\u275f\u201e\u201d\u0022\u275e]\s[a-z])(?![\'\"\u2019\u201a\u201c\u275c\u275f\u201e\u201d\u0022\u275e]\sI said)[\'\"\u2018\u2019\u201c\u201d\)\]]*\s*(?<!\w\.\w.)(?<![A-Z][a-z][a-z]\.)(?<![A-Z][a-z]\.)(?<![A-Z]\.)\s+'
#result2=re.split(r'[\.\?!](?![\'\"\u2019\u201a\u201c\u275c\u275f\u201e\u201d\u0022\u275e]\s[a-z])(?![\'\"\u2019\u201a\u201c\u275c\u275f\u201e\u201d\u0022\u275e]\sI said)[\'\"\u2018\u2019\u201c\u201d\)\]]*\s*(?<!\w\.\w.)(?<![A-Z][a-z][a-z]\.)(?<![A-Z][a-z]\.)(?<![A-Z]\.)\s+',b1c1_no_ws,flags=re.UNICODE)
#result2=re.split(sentence_regex,b1c1_no_ws,flags=re.UNICODE)
#print(b1c1_no_ws)
#b1c1_clean = clean_data(b1c1)
#result3=re.split(sentence_regex,b1c1_clean,flags=re.UNICODE)
#print(result2)
#print(result3)



#####   The Secret Adversary 


In [None]:
bookid=3
print(corpus[bookid]["title"],len(corpus[bookid]["chapters"]),len(corpus[bookid]["contents"]))
det=get_det(bookid)
perp=get_perp(bookid)
crime=get_crime(bookid)
sus=get_sus(bookid)
det_occur=get_occur(bookid,det)
perp_occur=get_occur(bookid,perp)
co=get_co_occur(bookid,det,perp)
crime=get_occur(bookid,crime)
sus_occur=get_occur(bookid,sus)
#print(det)
print("det_occur = ", det_occur)
#print(perp)
print("perp_occur = ", perp_occur)
print("co = ", co)
print("crime = ", crime)
print("sus_occur = ", sus_occur)
#print(get_context(1,3,116))




##### The Man in the Brown Suit 


In [None]:
det=get_det(4)
perp=get_perp(4)
crime=get_crime(4)
sus=get_sus(4)
det_occur=get_occur(4,det)
perp_occur=get_occur(4,perp)
co=get_co_occur(4,det,perp)
crime=get_occur(4,crime)
sus_occur=get_occur(4,sus)
print(det_occur)
print(perp_occur)
print(co)
print(crime)
print(sus_occur)
#print(get_context(1,7,184))
print(len(perp_occur))
print(perp_occur)


#####  The Secret of Chimneys 


In [None]:
det=get_det(5)
perp=get_perp(5)
crime=get_crime(5)
sus=get_sus(5)
det_occur=get_occur(5,det)
perp_occur=get_occur(5,perp)
co=get_co_occur(5,det,perp)
crime=get_occur(5,crime)
sus_occur=get_occur(5,sus)
print(det_occur)
print(perp_occur)
print(co)
print(crime)
print(sus_occur)
#print(get_context(1,7,184))  # get_context is not defined in this notebook
print(len(perp_occur))
print(perp_occur)




## Results



### Q1. When does the detective (or a pair) occur for the first time -  chapter #, the sentence(s) # in a chapter
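
This question reduces to taking the earliest (chapter, sentence) pair among the matches. A small sketch, assuming occurrences in the `[chapter, sentence, text]` list format that `get_occur` returns; `first_occurrence` is a hypothetical helper and the entries are shortened toy data.

```python
def first_occurrence(occurrences):
    """Return the entry with the smallest (chapter, sentence) pair, or None if empty."""
    if not occurrences:
        return None
    return min(occurrences, key=lambda entry: (entry[0], entry[1]))

# Toy entries in the [chapter, sentence, text] format (text shortened for illustration)
toy_occurrences = [
    [2, 1, "my friend poirot ..."],
    [1, 127, "... poirot ..."],
    [1, 4, "... hercule poirot"],
]
chapter, sentence, _ = first_occurrence(toy_occurrences)
print(f"first mention: chapter {chapter}, sentence {sentence}")
```

Sorting by the pair rather than relying on list order keeps the answer correct even if the occurrences were collected out of order.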


### The Murder on the Links 

#### Story told from the viewpoint of Arthur Hastings, who is Hercule Poirot's sidekick.
#### Hercule Poirot first appears in the 4th sentence of the first chapter.
"i had been transacting some business in paris and was returning by the morning service to london where i was still sharing rooms with my old friend, the belgian ex-detective, hercule poirot"

###  The Mysterious Affair at Styles 

#### Story told from the viewpoint of Arthur Hastings, Hercule Poirot's sidekick.
#### Hercule Poirot first appears in the 1st sentence of the 1st chapter.
"nevertheless, in view of the world-wide notoriety which attended it, i have been asked, both by my friend poirot and the family themselves, to write an account of the whole story"

### The Secret Adversary 

### The Man in the Brown Suit 

### The Secret of Chimneys 

### Q2. When is the crime first mentioned - the type of the crime and the details - chapter #, the sentence(s) # in a chapter

### The Murder on the Links 


#### The murder of Paul Renauld is revealed at the end of chapter 2, in sentence 323.
"m. renauld was murdered this morning"
#### The time when the body is discovered is reported in chapter 3, sentence 53.
"the body was discovered this morning about nine o’clock"
#### The method of the murder is described in chapter 3, sentence 90.
"going to call her mistress as usual, a younger maid, léonie, was horrified to discover her gagged and bound, and almost at the same moment news was brought that m. renauld’s body had been discovered, stone dead, stabbed in the back"
#### Who found the body is answered in chapter 6, sentence 59.
"it was some of the men working on them who discovered the body early this morning"
#### The time of death is reported in chapter 12, sentence 65.
"they declared, after examination of the body, that death had taken place between ten and seven hours previously"
#### Finally, the murderer is revealed in chapter 28, sentence 95 (via co-occurrence with Hercule Poirot).
#### Note: Poirot exposes her as the murderer earlier (end of chapter 27), but definite evidence within a single sentence that Poirot is making his final accusation does not occur until chapter 28, sentence 95, near the end of a long conversation between Poirot and Hastings about how Poirot knew that Marthe was the murderess.

###  The Mysterious Affair at Styles 

#### The murder of Emily Inglethorp occurs in chapter 3.
#### The strychnine poisoning triggers a slow death scene spanning sentences 47-79.
"mrs. inglethorp was lying on the bed, her whole form agitated by violent convulsions, in one of which she must have overturned the table beside her...then she fell back motionless on the pillows"
#### The method of the murder is first described in chapter 4, sentence 124.
"the present contention is that mrs. inglethorp died of strychnine poisoning, presumably administered in her coffee"

Note: murder details from later in the book still need to be added.

### The Secret Adversary 

### The Man in the Brown Suit 

### The Secret of Chimneys 

### Q3. When is the perpetrator first mentioned - chapter #, the sentence(s) # in a chapter

### The Murder on the Links 

#### Mademoiselle Daubreuil (Marthe) is first mentioned in chapter 7, sentence 145.
"mademoiselle daubreuil,' said m. hautet, sweeping off his hat, 'we regret infinitely to disturb you, but the exigencies of the law—you comprehend"

###  The Mysterious Affair at Styles 

#### Alfred Inglethorp and Evelyn Howard are first mentioned in chapter 1 in sentences 32 and 40 respectively

### The Secret Adversary 

### The Man in the Brown Suit 

### The Secret of Chimneys 

### Q4. What are the 3 words that occur around the perpetrator on each mention (i.e., the three words preceding, and the three words following the mention of a perpetrator)
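One way to produce such six-word windows is to find every match of the perpetrator's name and collect the word tokens on either side. The sketch below is an assumption about how this could be done, not a record of the code that generated the lists in this section; `context_windows` and the sample sentence are hypothetical.

```python
import re

def context_windows(text, name_pattern, width=3):
    """For each mention, return the `width` words before and after it.

    The matched name itself is left out, so each window holds at most
    2 * width tokens. Words are runs of \\w characters, which means a
    possessive such as "marthe's" leaves a stray 's' token behind.
    """
    name_re = re.compile(name_pattern, flags=re.IGNORECASE)
    windows = []
    for match in name_re.finditer(text):
        before = re.findall(r"\w+", text[:match.start()])[-width:]
        after = re.findall(r"\w+", text[match.end():])[:width]
        windows.append(before + after)
    return windows

# Made-up sentence used only to exercise the function.
sample = "i am afraid, marthe, said m. hautet gravely."
windows = context_windows(sample, r"\bmarthe\b")
print(windows)
```

Re-tokenizing the text around every match is wasteful on a full novel; tokenizing once and indexing into the token list would be the natural refinement.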

### The Murder on the Links 

['she', 'was', 'afraid', 'said', 'm', 'hautet']

['turned', 'to', 'her', 'dear', 'but', 'the']

['to', 'speak', 'before', 'as', 'my', 'daughter']

['us', 'it', 'was', 'i', 'beg', 'your']

['our', 'amélie', 'explained', 'with', 'a', 'blush']

['your', 'heart', 'on', 'she', 'is', 'not']

['the', 'quarrel', 'was', 'renauld', 'sprang', 'round']

['admitted', 'i', 'love', 'and', 'i', 'wish']

['boy', 'you', 'too', 'is', 'as', 'good']

['have', 'nothing', 'against', 'in', 'any', 'way']

['your', 'intentions', 'towards', 'he', 'resumed', 'he']

['he', 'had', 'against', 'to', 'that', 'he']

['i', 'was', 'marrying', 'and', 'not', 'her']

['i', 'wrote', 'to', 'telling', 'her', 'what']

['yesterday', 'it', 'was', 'today', 'it', 'is']

['to', 'rag', 'me', 'is', 'a', 'very']

['him', 'out', 'with', 'but', 'i', 'fear']

['beautiful', 'girl', 'like', 'and', 'the', 'result']

['always', 'think', 'of', 'as', 'the', 'girl']

['of', 'the', 'beautiful', 'chéri', 'she', 'was']

['you', 'know', 'it', 'jack', 'renauld', 'replied']

['deep', 'anxiety', 'underlying', 's', 'tones—but', 'i']

['the', 'reason', 'of', 's', 'poignant', 'anxiety']

['thing', 'was', 'certain', 'had', 'known', 'all']

['see', 'my', 'fiancée', 'i', 'was', 'on']

['over', 'his', 'shoulder', 'may', 'find', 'herself']

['the', 'moment', 'that', 'looking', 'slightly', 'startled']

['house', 'maman', 'whispered', 'i', 'must', 'go']

['absolute', 'truth', 'unwittingly', 'told', 'us', 'the']

['if', 'he', 'saw', 'on', 'the', 'night']

['me', 'to', 'see', 'before', 'he', 'could']

['did', 'not', 'see', 'whom', 'did', 'he']

['wish', 'to', 'marry', 'son', 'leaves', 'for']

['garden', 'witnessed', 'by', 'letter', 'written', 'to']

['wish', 'to', 'marry', 'son', 'leaves', 'for']

['returning', 'to', 'see', 'come', 'face', 'to']

['the', 'ears', 'of', 'i', 'shook', 'my']

['to', 'help', 'us', 'i', 'handed', 'it']

['the', 'villa', 'marguerite', 'was', 'at', 'the']

['poirot', 'watching', 'her', 'frowned', 'screening', 'some']

['s', 'real', 'name', 'looked', 'at', 'him']

['departure', 'for', 'england', 'listened', 'spellbound', 'when']

['merlinville', 'to', 'rejoin', 'and', 'his', 'mother']

['after', 'i', 'met', 'and', 'realized', 'i']

['its', 'coming', 'to', 's', 'ears', 'and']

['order', 'to', 'see', 'before', 'going', 'to']

['in', 'person', 'to', 'eh', 'finished', 'poirot']

['are', 'jack', 'and', 'i', 'exclaimed', 'looking']

['better', 'not', 'but', 'and', 'i', 'in']

['murmured', 'poirot', 'to', 'where', 'can', 'we']

['the', 'charge', 'of', 'and', 'her', 'mother']

['the', 'profile', 'of', 'ah', 'said', 'poirot']

['into', 'the', 'bedroom', 'was', 'embroidering', 'by']

['the', 'profile', 'of', 'as', 'she', 'bent']

['beautiful', 'face', 'of', '', 'i', 'have']

['that', 'we', 'found', 's', 'body', 'in']

['them', 'yes', 'from', 's', 'own', 'lips']

['easily', 'you', 'overheard', 's', 'conversation', 'with']

['possible', 'motive', 'could', 'have', 'for', 'murdering']

['the', 'standpoint', 'of', 'marthe', 'daubreuil', 'overhears']

['of', 'marthe', 'daubreuil', 'overhears', 'what', 'passes']

['the', 'mind', 'of', 'in', 'fact', 'i']

['me', 'infallibly', 'to', 'the', 'dagger', 'jack']

['third', 'one', 'to', 'so', 'then', 'to']

['of', 'note', 'against', '(1)', 'marthe', 'daubreuil']

['marthe', 'daubreuil', '(1)', 'could', 'have', 'overheard']

['s', 'plans', '(2)', 'had', 'a', 'direct']

['s', 'death', '(3)', 'was', 'the', 'daughter']

['actual', 'blow', '(4)', 'was', 'the', 'only']

['the', 'crime', 'was', 'but', 'i', 'had']

['by', 'jack', 'to', 'why', 'then', 'bella']

['steps', 'to', 'force', 'into', 'the', 'open']

['as', 'i', 'thought', 'made', 'a', 'last']

['brains', 'that', 'beautiful', 'and', 'her', 'object']

['the', 'floor', 'by', 's', 'body', 'i']

['go', 'quite', 'as', 'had', 'planned', 'to']

['last', 'chance', 'for', 'the', 'idea', 'of']

['begin', 'to', 'suspect', 'poirot', 'when', 'she']

['have', 'thought', 'of', 'from', 'the', 'beginning']

['a', 'siren', 'and', 'as', 'the', 'girl']

['to', 'the', 'truth', 'was', 'very', 'beautiful']

###  The Mysterious Affair at Styles 


['us', 'as', 'for', 'you', 'remember', 'evie']

['for', 'evie—you', 'remember', 'no', 'oh', 'i']

['a', 'great', 'sport—old', 'not', 'precisely', 'young']

['of', 'being', 'a', 'or', 'something', 'of']

['or', 'something', 'of', 's', 'though', 'she']

['that', 'she', 'and', 'were', 'engaged', 'the']

['for', 'that', 'fellow', 'he', 'checked', 'the']

['our', 'approach', 'hullo', 'here', 's', 'our']

['hero', 'mr', 'hastings', 'miss', 'howard', 'shook']

['mr', 'hastings—miss', 'howard', 'shook', 'hands', 'with']

['re', 'a', 'cynic', 'said', 'john', 'laughing']

['refreshed', 'well', 'said', 'drawing', 'off', 'her']

['princess', 'after', 'tea', 'i', 'll', 'write']

['are', 'so', 'thoughtful', 'dear', 'the', 'french']

['all', 'these', 'years', 'darling', 'mr', 'hastings—my']

['some', 'curiosity', 'at', 'darling', 'he', 'certainly']

['upon', 'the', 'company', 'in', 'particular', 'took']

['about', 'letters', 'to', 'and', 'her', 'husband']

['story', 'myself', 'remarked', 'lots', 'of', 'nonsense']

['of', 'a', 'mess', 's', 'had', 'a']

['a', 'row', 'with', 'and', 'she', 's']

['she', 's', 'off', 'off', 'john', 'nodded']

['and—oh', '—here', 's', 'herself', 'miss', 'howard']

['s', 'evie', 'herself', 'entered', 'her', 'lips']

['mind', 'my', 'dear', 'cried', 'mrs', 'cavendish']

['t', 'be', 'true', 'nodded', 'grimly', 'true']

['just', 'ask', 'your', 'how', 'much', 'time']

['did', 'she', 'say', 'made', 'an', 'extremely']

['expressive', 'grimace', 'darling', '—', 'dearest', 'alfred']

['alfred', '—', 'dearest', '—', 'wicked', 'calumnies']

['left', 'the', 'room', 's', 'face', 'changed']

['her', 'of', 'course', 'i', 'said', 'i']

['open', 'window', 'and', 'rose', 'and', 'moved']

['time', 'for', 'more', 'was', 'swallowed', 'up']

['in', 'england', 'than', 'he', 'took', 'the']

['the', 'one', 'that', '', 'exactly', 'said']

['felt', 'that', 'with', 'something', 'indefinable', 'had']

['a', 'letter', 'from', 'a', 'couple', 'of']


['gipsy', 'face', 'and', 's', 'warnings', 'but']

['a', 'few', 'moments', 'had', 'ushered', 'the']

['i', 'realized', 'that', 'was', 'not', 'with']

['on', 'the', 'doctor', 'alfred—', 'then', 'she']

['the', 'doctor', 'alfred', '', 'then', 'she']

['met', 'where', 'was', 'his', 'absence', 'was']

['mrs', 'inglethorp', 'and', 'and', 'of', 'the']

['we', 'passed', 'through', 's', 'room', 'and']

['one', 'was', 'to', 'and', 'one', 'was']

['had', 'gone', 'to', 'writing', 'notices', 'for']

['i', 'pass', 'over', 'who', 'acted', 'the']

['family—we', 'will', 'say', 'for', 'instance—would', 'you']

['her', 'money', 'to', 'i', 'asked', 'in']

['it', 'swept', 'past', 'cried', 'john', 'excuse']

['inquiringly', 'at', 'me', 'i', 'explained', 'ah']

['the', 'hall', 'where', 'was', 'endeavouring', 'to']

['she', 'had', 'known', 'only', 'too', 'well']

['eat', 'this', 'morning', 'asked', 'john', 'no']

['us', 'you', 'know', 'miss', 'howard', 'shook']


['you', 'know', 'evie', 'shook', 'hands', 'with']

['to', 'prison', 'who', 'of', 'course', 'my']

['course', 'my', 'dear', 'do', 'be', 'careful']

['fool', 'lawrence', 'retorted', 'of', 'course', 'alfred']

['howard', 'of', 'course', 'murdered', 'poor', 'emily—as']

['would', 'my', 'dear', 'don', 't', 'shout']

['fiddlesticks', 'the', 'snort', 'gave', 'was', 'truly']

['dash', 'it', 'all', 'i', 'can', 't']

['that', 'to', 'harbour', 'and', 'alfred', 'inglethorp']

['miss', 'howard', 'and', 'under', 'the', 'same']

['sat', 'down', 'facing', 'mademoiselle', 'he', 'said']

['you', 'to', 'hang', 'with', 'pleasure', 'she']

['hang', 'the', 'criminal', 'him', 'or', 'another']

['along', 'comes', 'mr', 'and', 'within', 'two']

['presto', 'believe', 'me', 'said', 'poirot', 'very']

['s', 'better', 'said', 'more', 'enthusiastically', 'but']

['that', 'have', 'wept', 'blinked', 'and', 'a']

['mr', 'inglethorp', 'and', 'she', 'looked', 'at']

['her', 'fortune', 'to', 'it', 'must', 'have']

['yes', 'i', 'said', 'without', 'doubt', 'poirot']

['wilful', 'murder', 'against', 'what', 'becomes', 'of']

['if', 'i', 'let', 'her', 'husband', 'be']

['walked', 'on', 'sharply', 'had', 'been', 'right']

['i', 'thought', 'of', 's', 'liberality', 'with']

['and', 'gasped', 'out', 'alfred——', 'could', 'the']

['gasped', 'out', 'alfred', '—', 'could', 'the']

['that', 'is', 'all', 'miss', 'howard', 'produced']

['all', 'miss', 'howard', 'produced', 'the', 'letter']

['17th', 'my', 'dear', 'can', 'we', 'not']

['to', 'me', 'said', 'shortly', 'it', 'shows']

['of', 'the', 'jury', 'was', 'obviously', 'quite']

['apprehension', 'thank', 'you', 'that', 'is', 'all']

['simultaneously', 'to', 'where', 'was', 'sitting', 'impassive']

['a', 'breathless', 'silence', 'was', 'called', 'did']

['mace', 's', 'statement', 'replied', 'imperturbably', 'mr']

['pardon', 'me', 'interrupted', 'you', 'have', 'been']

['last', 'convinced', 'of', 's', 'guilt', 'mr']

['not', 'enough', 'and', 'must', 'not', 'be']

['the', 'possibility', 'of', 's', 'innocence', 'why']

['important', 'fact', 'that', 'wears', 'peculiar', 'clothes']

['the', 'case', 'of', 'all', 'that', 'is']

['glasses', 'was', 'not', 'it', 'may', 'be']

['the', 'fate', 'of', 'and', 'thought', 'that']

['should', 'be', 'shielding', 'yet', 'that', 'is']

['there', 's', 'john—and', 'surely', 'they', 'were']

['an', 'unpleasant', 'shock', 's', 'evidence', 'unimportant']

['so', 'i', 'asked', 'had', 'always', 'seemed']

['it', 'concerns', 'mr', 'inglethorp', 'was', 'sitting']

['with', 'a', 'groan', 'sank', 'down', 'again']

['speak', 'for', 'you', 'sprang', 'up', 'again']

['untrue', 'one', 'interrupted', 'in', 'an', 'agitated']

['baleful', 'glance', 'at', 'now', 'sir', 'said']

['suspicion', 'in', 'clearing', 'continued', 'poirot', 'i']

['absurd—but', 'i', 'suspect', 'of', 'not', 'telling']

['all', 'she', 'knows', 'yes—you', 'll', 'laugh']

['i', 'learnt', 'that', 'had', 'been', 'on']

['or', 'degenerate', 'about', 'she', 'is', 'an']

['are', 'there', 'against', 's', 'having', 'deliberately']

['a', 'child', 'if', 'were', 'capable', 'of']

['her', 'vehemence', 'against', 'is', 'too', 'violent']

['insuperable', 'objection', 'to', 's', 'being', 'the']

['s', 'death', 'benefit', 'now', 'there', 'is']

['in', 'my', 'mind', 'occupied', 'very', 'much']

['was', 'not', 'in', 's', 'favour', 'i']

['we', 'will', 'acquit', 'then', 'it', 'is']

['his', 'belief', 'in', 's', 'innocence', 'had']

['thoughtfully', 'here', 'comes', 'said', 'poirot', 'suddenly']

['was', 'barely', 'civil', 'assented', 'to', 'poirot']

['monsieur', 'poirot', 'said', 'impatiently', 'what', 'is']

['with', 'pleasure—to', 'hang', 'ah', 'poirot', 'studied']

['studied', 'her', 'seriously', 'i', 'will', 'ask']

['tell', 'lies', 'replied', 'it', 'is', 'this']

['good', 'heavens', 'cried', 'haven', 't', 'i']

['what', 'little', 'idea', 'do', 'you', 'remember']

['my', 'instinct', 'against', 'no', 'said', 'poirot']

['no', 'no', 'cried', 'wildly', 'flinging', 'up']

['it', 'must', 'be', 'poirot', 'shook', 'his']

['about', 'it', 'continued', 'because', 'i', 'shan']

['you', 'will', 'watch', 'bowed', 'her', 'head']

['we', 'are', 'right', 'on', 'whose', 'side']

['she', 'broke', 'off', 'said', 'poirot', 'gravely']

['that', 'was', 'not', 'who', 'spoke', 'she']

['proudly', 'this', 'is', 'and', 'she', 'is']

['ignored', 'you', 'and', 'seem', 'to', 'know']

['who', 'had', 'murdered', 'with', 'a', 'croquet']

['between', 'poirot', 'and', 'was', 'this', 'what']

['monstrous', 'possibility', 'that', 'had', 'tried', 'not']

['in', 'no', 'wonder', 'had', 'suggested', 'hushing']

['paul', 'prys', 'grunted', 'lawrence', 'opined', 'that']

['there', 'is', 'john—and', '', 'cynthia', 'nodded']

['and', 'of', 'course', 'for', 'all', 'her']

['me', 'she', 'wants', 'to', 'stay', 'on']

['sudden', 'entrance', 'of', 'she', 'glanced', 'round']

['existence', 'i', 'set', 'to', 'search', 'for']

['it', 'was', 'not', 'who', 'was', 'quarrelling']

['not', 'cross', 'examined', 'was', 'called', 'and']

['it', 'would', 'be', 'who', 'would', 'attend']

['anything', 'like', 'that', 'was', 'called', 'and']

['it', 'i', 'believe', 'that', 'it', 'was']

['my', 'beloved', 'husband', 'ing', 'this', 'placed']

['of', 'her', 'pride', 'had', 'been', 'right']

['her', 'animosity', 'against', 'had', 'caused', 'her']

['the', 'signature', 'of', 'in', 'the', 'chemist']

['the', 'name', 'of', 'no', 'that', 'is']

['he', 'did', 'so', 'here', 'mademoiselle', 'cynthia']

['him', 'a', 'note', 'rose', 'immediately', 'from']

['low', 'voice', 'finally', 'consented', 'to', 'return']

['few', 'minutes', 'later', 'entered', 'the', 'room']

['throat', 'read', 'dearest', 'you', 'will', 'be']

['the', 'murderer', 'mr', '', 'poirot', 'you']

['once', 'more', 'while', 'and', 'miss', 'howard']

['alfred', 'inglethorp', 'and', 'were', 'in', 'custody']

['you', 'saw', 'mr', 'that', 'astute', 'gentleman']

['the', 'conclusion', 'that', 'wanted', 'to', 'be']

['that', 'it', 'was', 'who', 'went', 'to']

['chemist', 's', 'shop', 'but', 'certainly', 'who']

['to', 'think', 'that', 'was', 'the', 'master']

['by', 'that', 'time', 'will', 'have', 'engineered']

['six', 'o', 'clock', 'arranges', 'to', 'be']

['from', 'the', 'village', 'has', 'previously', 'made']

['six', 'o', 'clock', 'disguised', 'as', 'alfred']

['howard', 'disguised', 'as', 'enters', 'the', 'chemist']

['the', 'name', 'of', 'in', 'john', 's']

['all', 'goes', 'well', 'goes', 'back', 'to']

['back', 'to', 'middlingham', 'returns', 'to', 'styles']

['since', 'it', 'is', 'who', 'has', 'the']

['her', 'husband', 'and', 'though', 'unfortunately', 'the']

['vase', 'but', 'surely', 'had', 'ample', 'opportunities']

['him', 'yes', 'but', 'did', 'not', 'know']

['never', 'spoke', 'to', 'they', 'were', 'supposed']

['begin', 'to', 'suspect', 'when', 'i', 'discovered']

['7th—the', 'day', 'after', 's', 'departure', 'the']

['myself', 'why', 'does', 'suppress', 'the', 'letter']

['two', 'reasons', 'why', 'could', 'not', 'have']

['that', 'she', 'and', 'were', 'cousins', 'she']

### The Secret Adversary 

### The Man in the Brown Suit 

### The Secret of Chimneys 

### Q5. When and how the detective/detectives and the perpetrators co-occur - chapter #, the sentence(s) # in a chapter
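Co-occurrence here means that both names match within the same sentence. A minimal sketch of that test follows, assuming the novel is held as a list of chapters that each contain a list of sentence strings; the function name `co_occurrences`, the toy data, and the 1-based numbering are illustrative assumptions.

```python
import re

def co_occurrences(chapters, pattern_a, pattern_b):
    """List (chapter #, sentence #, sentence) where both patterns match."""
    re_a = re.compile(pattern_a, flags=re.IGNORECASE)
    re_b = re.compile(pattern_b, flags=re.IGNORECASE)
    hits = []
    for chap_no, sentences in enumerate(chapters, start=1):
        for sent_no, sentence in enumerate(sentences, start=1):
            # Both characters must be named inside the same sentence.
            if re_a.search(sentence) and re_b.search(sentence):
                hits.append((chap_no, sent_no, sentence))
    return hits

# Toy data, not the real novel text.
toy_chapters = [
    ["poirot examined the lawn.", "marthe watched from the window."],
    ["poirot bowed to marthe.", "hastings said nothing."],
]
hits = co_occurrences(toy_chapters, r"\bpoirot\b", r"\bmarthe\b")
print(hits)
```

Sentence-level matching is strict: a detective and a perpetrator who speak in adjacent sentences will not be counted, which is consistent with the late first co-occurrences reported below.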

### The Murder on the Links 

#### Arthur Hastings (sidekick) and Marthe (perpetrator) first co-occur in chapter 18 sentence 114.
'now, hastings, what was jack renauld doing here on that eventful evening, and if he did not see mademoiselle marthe whom did he see'
#### Hercule Poirot (detective) and Marthe (perpetrator) first co-occur in chapter 24 sentence 187
'marthe was at the door to meet us, and led poirot in, clinging with both hands to one of his'
#### Poirot and Marthe again co-occur in chapter 27 sentence 82
'“while you break it in person to mademoiselle marthe, eh?” finished poirot, with a twinkle'
#### They co-occur again in chapter 27 sentence 156
'“he is overdone,” murmured poirot to marthe'
#### As well as chapter 27 sentence 260
'poirot looked over his shoulder once at the lighted window and the profile of marthe as she bent over her work'
#### Hastings and Marthe co-occur in chapter 28 sentence 137
'“however, hastings, things did not go quite as mademoiselle marthe had planned'
#### Finally, Poirot and Marthe co-occur in chapter 28 sentence 145
'“when did you first begin to suspect marthe daubreuil, poirot'

###  The Mysterious Affair at Styles 

#### Arthur Hastings and Evelyn Howard first occur together in chapter 1 sentence 82
#### Arthur Hastings and Alfred Inglethorp first occur together in chapter 1 sentence 125
#### Arthur Hastings, Alfred Inglethorp and Evelyn Howard co-occur in chapter 1 sentence 144

#### Hercule Poirot and Alfred Inglethorp first co-occur in chapter 4 sentence 293
#### Hercule Poirot and Evelyn Howard first co-occur in chapter 5 sentence 233
#### Further co-occurrences:
##### chapter 5 sentence 377: Evelyn and Hercule
##### chapter 5 sentence 422: Evelyn and Hercule
##### chapter 5 sentence 438: Evelyn and Hercule
##### chapter 7 sentence 276: Alfred and Hercule
##### chapter 7 sentence 317: Alfred and Hercule
##### chapter 8 sentence 139: Alfred and Hercule
##### chapter 8 sentence 400: Evelyn and Hercule
##### chapter 8 sentence 404: Evelyn and Hercule
##### chapter 8 sentence 406: Evelyn and Hercule
##### chapter 8 sentence 501: Evelyn and Hercule
##### chapter 9 sentence 295: Evelyn and Hercule
##### chapter 12 sentence 246: Evelyn and Hercule
##### chapter 12 sentence 262: Alfred and Hercule
##### chapter 13 sentence 28: Alfred and Hercule

### The Secret Adversary 

### The Man in the Brown Suit 

### The Secret of Chimneys 

### Q6. When are other suspects first introduced - chapter #, the sentence(s) # in a chapter

### The Murder on the Links 

#### Jack Renauld is the red herring in this book.
#### He first appears in chapter 3 sentence 118.
'finally there are madame renauld and her son, m. jack renauld'

###  The Mysterious Affair at Styles 

#### John Cavendish is the red herring in this book.
#### He first appears in chapter 1 sentence 5.
"having no near relations or friends, i was trying to make up my mind what to do, when i ran across john cavendish"

### The Secret Adversary 

### The Man in the Brown Suit 

### The Secret of Chimneys 



## Discussion



## Additional/Extra Analysis



## Dev notes

In [None]:
#Dev Notes: will refactor fetch() to generate dict of titles and indices rather than take in index
# I.e., fetch() is backwards. should assign index based on title while fetching url


#artifacts I've spotted in data:
# I noticed an "[Illustration]" artifact in The Mysterious Affair at Styles.

#possible start for sentence splitting regex
#sentence_regex = r'([\.\?!][\'\"\u2018\u2019\u201c\u201d\)\]]*\s*(?<!\w\.\w.)(?<![A-Z][a-z][a-z]\.)(?<![A-Z][a-z]\.)(?<![A-Z]\.)\s+)'

# V: do we need to clean up contractions to be spelled out? e.g., "I'm" to "I am", "don't" to "do not".
#  
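As a companion to the sentence-splitting note above, here is a simplified sketch of a regex-based splitter. It is not the draft `sentence_regex` from the notes but a smaller pattern built on the same idea: split on whitespace after terminal punctuation, with fixed-width lookbehinds guarding against titles and single-letter initials. The name `split_sentences`, the short title list, and the rule that the next sentence starts with a capital letter or an opening quote are assumptions, so this would need to run before any lowercasing.

```python
import re

# Split on whitespace that follows ., ? or ! (optionally a closing quote),
# unless the period belongs to a title or a one-letter initial such as "M.".
SENTENCE_BOUNDARY = re.compile(
    r"(?<!Mr\.)(?<!Mrs\.)(?<!Dr\.)(?<![A-Z]\.)"       # abbreviation guards
    r"(?:(?<=[.?!])|(?<=[.?!][\"'\u201d\u2019]))"     # end mark, maybe a quote
    r"\s+"
    r"(?=[\"'\u201c\u2018A-Z])"                       # next sentence start
)

def split_sentences(text):
    """Return the sentences in `text`, using SENTENCE_BOUNDARY as separator."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]

# Made-up passage used only to exercise the splitter.
sample = 'Mr. Renauld was dead. "Who found him?" asked M. Hautet. Poirot smiled.'
sentences = split_sentences(sample)
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```

The initial guard also blocks a split after a sentence that ends in a lone capital such as "I.", so counts produced this way can differ slightly from a hand count.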