# Interactive Essay

by Katarzyna Przerada

## Introduction: Shirley Jackson

**Shirley Jackson** (December 14, 1916 – August 8, 1965) was an American writer, best known for her short story [***"The Lottery"***](https://www.cusd200.org/cms/lib/IL01001538/Centricity/Domain/361/jackson_lottery.pdf), published in 1948. Miles Hyman, Jackson’s grandson, describes it as

>*“a perfect mechanism that leaves little room for our invention. Some texts twist, spin, and stimulate the imagination, but “The Lottery” is not the case. Its largely hermetic structure is wholly rational, and its words are combined with clockwork precision”*. 

To this day, the story remains one of the most popular American literary texts; however, it constitutes only a small part of Jackson's creative output. Throughout her life, she wrote six novels and over one hundred short stories. Recognized primarily for her gothic fiction, she also wrote excellent prose on subjects like:
- racism
- identity
- womanhood
- relationships between adults and children
- the significance of home

![Shirley Jackson](https://upload.wikimedia.org/wikipedia/commons/d/d1/Jackson_shirley.jpg)

## The story of focus: ***"Got a Letter from Jimmy"***

Published in 1949, *“Got a Letter from Jimmy”* explores women's capacity to deal with **anxiety, hopelessness, and anger** caused by living under patriarchal domination. The reader is limited to the thoughts of a clearly despairing wife, whose husband has received a letter from a man called Jimmy. The woman begins to obsess over the letter and tries to convince her partner to open it, since he plans to send it back unopened.

### 1 Preprocessing the story

Text preprocessing can be defined as cleaning and transforming text data into a usable format. To do that, I will focus on three processes:
- tokenization
- part-of-speech tagging
- stop word removal

#### 1.1 NLTK

In [1]:
import nltk

To preprocess the story I will use **NLTK**, the Natural Language Toolkit. It is a platform for building Python programs that process human language.

The story can be accessed via [tentoinfinity.com](https://tentoinfinity.com/2013/10/23/got-a-letter-from-jimmy-by-shirley-jackson/).

In [59]:
import requests
page = requests.get("https://tentoinfinity.com/2013/10/23/got-a-letter-from-jimmy-by-shirley-jackson/")
page

<Response [200]>

In [5]:
page.status_code

200

#### 1.2 Beautiful Soup

Beautiful Soup is a Python library for pulling data out of HTML and XML files.

In [6]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')

In [6]:
#print(soup.prettify())

In [16]:
soup.title

<title>Got a Letter from Jimmy by Shirley Jackson | Ten to Infinity</title>

In [17]:
paras = soup.find_all('p')

In [19]:
#print(paras)

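The **get_text()** method extracts all the text from the page on which the story is posted. Note that this covers the whole page, so the result also contains the blog's title, navigation, and account-form text, not only the story.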

In [26]:
full_story = soup.get_text()
#print(full_story)

The code below builds a list of the page's paragraphs as plain strings, without empty lines or surrounding whitespace.

In [23]:
plain_story = []
for para in paras:
    processed_para = para.get_text()
    processed_para = processed_para.strip()
    if len(processed_para) > 1:
        plain_story.append(processed_para)
print(plain_story)

['As a reader I have my biases (who doesn’t), and one of them is that I am often left frustrated with short stories that emphasize their mundanity. With such a small space to work with, and not allowed to justify their existence with a dizzying high concept, most of them don’t work for me. They may be witty or eloquent, but I’m frequently left wishing I knew more of the characters before they come onstage. Often they feel like they would have been better as a scene in a novel than left to flounder on their own.', 'Unduly harsh? Perhaps. But it gives me all the more pleasure to talk about today’s story, which doesn’t need a single additional word to be thoroughly immersed in the situation.', 'It starts with a husband and wife finishing dinner in silence when, unexpectedly: “Got a letter from Jimmy today,” he said, when he was unfolding his napkin.', 'So you got it at last, she thought, so he finally broke down and wrote you, maybe now it will be all right, everything settled and friendl
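The filtering loop above can be tried offline on a small, made-up HTML snippet. The sketch below is only an illustration: it uses the standard library's `html.parser` as a stand-in for Beautiful Soup, and the `ParagraphCollector` class and `sample_html` string are hypothetical names and data, not part of the notebook.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the stripped text of every <p> element, skipping empty ones."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_data(self, data):
        # Text outside <p> (navigation, headers) is ignored.
        if self._in_paragraph:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            text = "".join(self._buffer).strip()
            # Same length check as in the loop above.
            if len(text) > 1:
                self.paragraphs.append(text)
            self._in_paragraph = False


# Invented snippet: one navigation bar, two real paragraphs, one empty one.
sample_html = (
    "<nav>Home | Log in</nav>"
    "<p>  Got a letter from <em>Jimmy</em> today.  </p>"
    "<p> </p>"
    "<p>She thought about it.</p>"
)

collector = ParagraphCollector()
collector.feed(sample_html)
collector.close()
print(collector.paragraphs)
```

The navigation text never reaches the list because it sits outside any `<p>` element, which is also why a paragraph-based list tends to be cleaner than the text of the whole page.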

#### 1.3 Tokenization

Tokenization is the process of breaking long-form text into smaller units, such as sentences and words, called **tokens**. It turns an unstructured string into a list of discrete items that later steps can count, tag, or convert into numerical features. It is usually the first step in modelling text data.

In [28]:
from nltk.tokenize import word_tokenize

In [30]:
tokenized_story = word_tokenize(full_story)
print(tokenized_story)

['Got', 'a', 'Letter', 'from', 'Jimmy', 'by', 'Shirley', 'Jackson', '|', 'Ten', 'to', 'Infinity', 'Got', 'a', 'Letter', 'from', 'Jimmy', 'by', 'Shirley', 'Jackson', 'Posted', 'by', '10toinfinityOctober', '23', ',', '2013', 'Home', 'As', 'a', 'reader', 'I', 'have', 'my', 'biases', '(', 'who', 'doesn', '’', 't', ')', ',', 'and', 'one', 'of', 'them', 'is', 'that', 'I', 'am', 'often', 'left', 'frustrated', 'with', 'short', 'stories', 'that', 'emphasize', 'their', 'mundanity', '.', 'With', 'such', 'a', 'small', 'space', 'to', 'work', 'with', ',', 'and', 'not', 'allowed', 'to', 'justify', 'their', 'existence', 'with', 'a', 'dizzying', 'high', 'concept', ',', 'most', 'of', 'them', 'don', '’', 't', 'work', 'for', 'me', '.', 'They', 'may', 'be', 'witty', 'or', 'eloquent', ',', 'but', 'I', '’', 'm', 'frequently', 'left', 'wishing', 'I', 'knew', 'more', 'of', 'the', 'characters', 'before', 'they', 'come', 'onstage', '.', 'Often', 'they', 'feel', 'like', 'they', 'would', 'have', 'been', 'better', 
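In the output above, contractions written with a typographic apostrophe are split into three tokens (`'doesn'`, `'’'`, `'t'`). One possible workaround is to normalize the apostrophe before tokenizing. The sketch below illustrates the idea with a small regular-expression tokenizer from the standard library; `simple_tokenize` is a hypothetical helper, not an NLTK function, and it also converts closing single quotes, which may or may not be acceptable for a given text.

```python
import re


def simple_tokenize(text):
    """Split text into word and punctuation tokens, keeping contractions whole."""
    # Map the typographic apostrophe (U+2019) to a plain ASCII apostrophe.
    normalized = text.replace("\u2019", "'")
    # Either a word with optional internal apostrophes, or one non-space symbol.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", normalized)


sample = (
    "As a reader I have my biases (who doesn\u2019t), "
    "and most of them don\u2019t work for me."
)
tokens = simple_tokenize(sample)
print(tokens)
```

The same normalization step could be applied to `full_story` before passing it to `word_tokenize`, although NLTK would then split contractions in its own way.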

#### 1.4 Part of speech tagging

***POS tagging*** assigns each word in a text its part of speech, based on both its definition and its context.

In [22]:
posTag = []
for sentence in plain_story:
    tokenized = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokenized)
    for item in tagged:
        posTag.append(item)

print(posTag)

[('As', 'IN'), ('a', 'DT'), ('reader', 'NN'), ('I', 'PRP'), ('have', 'VBP'), ('my', 'PRP$'), ('biases', 'NNS'), ('(', '('), ('who', 'WP'), ('doesn', 'VBP'), ('’', 'NNP'), ('t', 'NN'), (')', ')'), (',', ','), ('and', 'CC'), ('one', 'CD'), ('of', 'IN'), ('them', 'PRP'), ('is', 'VBZ'), ('that', 'IN'), ('I', 'PRP'), ('am', 'VBP'), ('often', 'RB'), ('left', 'VBN'), ('frustrated', 'VBN'), ('with', 'IN'), ('short', 'JJ'), ('stories', 'NNS'), ('that', 'WDT'), ('emphasize', 'VBP'), ('their', 'PRP$'), ('mundanity', 'NN'), ('.', '.'), ('With', 'IN'), ('such', 'JJ'), ('a', 'DT'), ('small', 'JJ'), ('space', 'NN'), ('to', 'TO'), ('work', 'VB'), ('with', 'IN'), (',', ','), ('and', 'CC'), ('not', 'RB'), ('allowed', 'VBN'), ('to', 'TO'), ('justify', 'VB'), ('their', 'PRP$'), ('existence', 'NN'), ('with', 'IN'), ('a', 'DT'), ('dizzying', 'JJ'), ('high', 'JJ'), ('concept', 'NN'), (',', ','), ('most', 'JJS'), ('of', 'IN'), ('them', 'PRP'), ('don', 'VBP'), ('’', 'JJ'), ('t', 'NN'), ('work', 'NN'), ('for', 
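Once the words are tagged, the tags can be tallied or used as a filter. The sketch below assumes a short list of `(word, tag)` pairs copied from the output above; `tagged_sample` is a hypothetical variable, and the real `posTag` list could be used in its place.

```python
from collections import Counter

# A handful of (word, tag) pairs taken from the tagger output shown above.
tagged_sample = [
    ("As", "IN"), ("a", "DT"), ("reader", "NN"), ("I", "PRP"),
    ("have", "VBP"), ("my", "PRP$"), ("biases", "NNS"), ("often", "RB"),
    ("left", "VBN"), ("frustrated", "VBN"), ("with", "IN"),
    ("short", "JJ"), ("stories", "NNS"),
]

# How often each tag occurs in the sample.
tag_counts = Counter(tag for _, tag in tagged_sample)

# Penn Treebank noun tags all start with "NN" (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in tagged_sample if tag.startswith("NN")]

print(tag_counts.most_common(3))
print(nouns)
```

Filtering by tag prefix is a simple way to look at only the nouns, verbs, or adjectives of a story before counting them.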

#### 1.5 Stopwords

Stopwords are the most common words in a language, such as:

- articles
- prepositions
- pronouns
- conjunctions

They carry little meaning on their own and add little context to the text.


In [34]:
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
#print(stop_words)

##### Deleting stopwords

By removing these words, we strip low-information tokens from the text and give more weight to the words that carry meaning. Removing stop words also reduces the number of tokens, which can shorten training time when the data is later fed to a model.

In [35]:
no_stopwords=[]
for token in tokenized_story:    
    if token not in stop_words:
        no_stopwords.append(token)
print("A story without stopwords:",no_stopwords)

A story without stopwords: ['Got', 'Letter', 'Jimmy', 'Shirley', 'Jackson', '|', 'Ten', 'Infinity', 'Got', 'Letter', 'Jimmy', 'Shirley', 'Jackson', 'Posted', '10toinfinityOctober', '23', ',', '2013', 'Home', 'As', 'reader', 'I', 'biases', '(', '’', ')', ',', 'one', 'I', 'often', 'left', 'frustrated', 'short', 'stories', 'emphasize', 'mundanity', '.', 'With', 'small', 'space', 'work', ',', 'allowed', 'justify', 'existence', 'dizzying', 'high', 'concept', ',', '’', 'work', '.', 'They', 'may', 'witty', 'eloquent', ',', 'I', '’', 'frequently', 'left', 'wishing', 'I', 'knew', 'characters', 'come', 'onstage', '.', 'Often', 'feel', 'like', 'would', 'better', 'scene', 'novel', 'left', 'flounder', '.', 'Unduly', 'harsh', '?', 'Perhaps', '.', 'But', 'gives', 'pleasure', 'talk', 'today', '’', 'story', ',', '’', 'need', 'single', 'additional', 'word', 'thoroughly', 'immersed', 'situation', '.', 'It', 'starts', 'husband', 'wife', 'finishing', 'dinner', 'silence', ',', 'unexpectedly', ':', '“', 'Got
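The output above still contains words such as `'As'`, `'I'`, and `'With'`. NLTK's English stop word list is lowercase, so a capitalized token does not match it. A minimal sketch of a case-insensitive comparison follows; `sample_stop_words` is a small hypothetical stand-in for `stopwords.words("english")`, used here so the example runs without downloading the corpus.

```python
# Hypothetical stand-in for the NLTK list: a few lowercase entries only.
sample_stop_words = {"as", "a", "i", "have", "my", "with", "they", "the", "it"}

tokens = ["As", "a", "reader", "I", "have", "my", "biases", "With", "small", "space"]

# Comparing the token as-is misses capitalized stop words.
case_sensitive = [t for t in tokens if t not in sample_stop_words]

# Lowercasing only for the comparison keeps the original spelling in the result.
case_insensitive = [t for t in tokens if t.lower() not in sample_stop_words]

print(case_sensitive)
print(case_insensitive)
```

In the notebook, the same effect could be had by testing `token.lower() not in stop_words` inside the loop.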

##### Removing punctuation

In [43]:
import string

# punctuations
punctuations=list(string.punctuation)

no_punctuation=[]

for pun in no_stopwords:
    if pun not in punctuations:
        no_punctuation.append(pun)
        
print("A story without punctuation:",no_punctuation)

A story without punctuation: ['Got', 'Letter', 'Jimmy', 'Shirley', 'Jackson', 'Ten', 'Infinity', 'Got', 'Letter', 'Jimmy', 'Shirley', 'Jackson', 'Posted', '10toinfinityOctober', '23', '2013', 'Home', 'As', 'reader', 'I', 'biases', '’', 'one', 'I', 'often', 'left', 'frustrated', 'short', 'stories', 'emphasize', 'mundanity', 'With', 'small', 'space', 'work', 'allowed', 'justify', 'existence', 'dizzying', 'high', 'concept', '’', 'work', 'They', 'may', 'witty', 'eloquent', 'I', '’', 'frequently', 'left', 'wishing', 'I', 'knew', 'characters', 'come', 'onstage', 'Often', 'feel', 'like', 'would', 'better', 'scene', 'novel', 'left', 'flounder', 'Unduly', 'harsh', 'Perhaps', 'But', 'gives', 'pleasure', 'talk', 'today', '’', 'story', '’', 'need', 'single', 'additional', 'word', 'thoroughly', 'immersed', 'situation', 'It', 'starts', 'husband', 'wife', 'finishing', 'dinner', 'silence', 'unexpectedly', '“', 'Got', 'letter', 'Jimmy', 'today', '”', 'said', 'unfolding', 'napkin', 'So', 'got', 'last', 
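The output above still contains `'’'`, `'“'`, and `'”'`, because `string.punctuation` lists only ASCII characters, and multi-character tokens such as `'...'` are not single entries in that list. One way to catch them is to check the Unicode category of every character in a token. The sketch below is an illustration under that assumption; `is_punctuation` is a hypothetical helper and the token list is a made-up sample.

```python
import string
import unicodedata


def is_punctuation(token):
    """Return True if every character is punctuation or a symbol."""
    return all(
        # Unicode categories starting with "P" are punctuation, "S" are symbols.
        ch in string.punctuation or unicodedata.category(ch)[0] in ("P", "S")
        for ch in token
    )


tokens = [
    "today", "\u2019", "story", "\u201c", "Got", "letter", "\u201d",
    "...", "|", "know",
]
kept = [t for t in tokens if not is_punctuation(t)]
print(kept)
```

Because the check runs character by character, it removes tokens made only of punctuation while leaving words that merely contain an apostrophe or hyphen untouched.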

### 2 Text Analysis

#### 2.1 Frequency list

In [47]:
from nltk.probability import FreqDist

frequency = FreqDist(no_punctuation)
print(frequency.most_common(30))

[('’', 19), ('The', 10), ('Shirley', 8), ('Jackson', 8), ('Jimmy', 6), ('“', 6), ('”', 6), ('Got', 5), ('I', 5), ('said', 5), ('thought', 5), ('Log', 5), ('Letter', 4), ('story', 4), ('It', 4), ('husband', 4), ('wife', 4), ('letter', 4), ('know', 4), ('collection', 4), ('...', 4), ('Email', 4), ('account', 4), ('email', 4), ('Follow', 4), ('Ten', 3), ('Infinity', 3), ('left', 3), ('short', 3), ('work', 3)]
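The list above counts `'Letter'` and `'letter'`, or `'Email'` and `'email'`, as different words. Lowercasing the tokens before counting merges them. `FreqDist` is a subclass of `collections.Counter`, so the sketch below uses `Counter` directly to stay free of dependencies; the token list is an invented sample, not the story's actual tokens.

```python
from collections import Counter

# Invented sample containing the same word in different capitalizations.
tokens = [
    "Got", "letter", "Jimmy", "got", "Letter",
    "The", "the", "husband", "wife", "husband",
]

# Lowercase each token so that "Letter" and "letter" share one count.
counts = Counter(token.lower() for token in tokens)
print(counts.most_common(3))
```

Passing the lowercased tokens to `FreqDist` instead of `Counter` should behave the same way for `most_common`.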


#### 2.2 Sentiment Analysis

**Sentiment analysis**, also called **opinion mining**, makes it possible to analyze the emotions that underlie a given text. It helps to ***understand the opinion and feelings in a text*** and classify them as **positive**, **negative** or **neutral**. The analyzer also returns a compound score: the summed valence of the individual words, normalized to a range between -1 (most negative) and +1 (most positive).

In [54]:
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
sia.polarity_scores(full_story)

{'neg': 0.042, 'neu': 0.812, 'pos': 0.146, 'compound': 0.9986}

As presented above, the polarity of the text is predominantly neutral, and the positive score is higher than the negative one. Having interpreted this story, I can agree with the results. Shirley Jackson's style is referred to as **quotidian Gothic**. It shows a deep interplay of repression, fear, and disgust inside a more or less 'normal' world.
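The compound value of 0.9986 is close to its upper limit, which is partly an effect of text length. As far as I can tell from the VADER implementation that `SentimentIntensityAnalyzer` is based on, the raw sum of word valences is squashed with the formula `x / sqrt(x² + alpha)`, with `alpha = 15` by default. The sketch below reproduces only that normalization step; the raw sums are invented values, not numbers taken from the story.

```python
import math


def normalize(raw_sum, alpha=15):
    """Map an unbounded valence sum into the open range (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)


# Invented raw sums: a short sentence versus an increasingly long text.
for raw_sum in (0.5, 2.0, 10.0, 70.0):
    print(raw_sum, round(normalize(raw_sum), 4))
```

Because a long text accumulates many small positive values, its compound score tends toward 1 even when most of the text is neutral. Scoring the story paragraph by paragraph, for example over `plain_story`, might therefore give a more differentiated picture than a single score for the whole page.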

In [56]:
from nltk.tree import *