In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
from nltk import sentiment
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk import tokenize
from IPython.display import display, Markdown
from typing import *

# https://pypi.org/project/afinn/
!pip install afinn
from afinn import Afinn

In [None]:
def show_markdown_table(headers: List[str], data: List) -> None:
    s = f"| {' | '.join(headers)} |\n| {' | '.join([(max(1, len(header) - 1)) * '-' + ':' for header in headers])} |\n"
    for row in data:
        s += f"| {' | '.join([str(item) for item in row])} |\n"
    display(Markdown(s))

In [None]:
def all_sentences_poems(text: str) -> List[str]:
    # Each line of the poem becomes one string; punctuation is kept as-is.
    return text.splitlines()

with open('/kaggle/input/csci-270-poems-2022/The road not taken.txt') as poem_file:
    poem_lines_list = all_sentences_poems(poem_file.read())

# Modify the function so that it:
# * Retains punctuation
# * Returns a list of strings, where each sentence is a single string, instead of a list of lists.

# Open your poem file
# Create a list of your poem sentences. Store it in a variable named poem_lines_list.

## Poem Sentiment

`AFINN` rates each word on a scale of -5 to 5 and scores a line by summing its word ratings, while `VADER` reports a compound score from -1 to 1. `VADER` additionally reports subscores for positive, neutral, and negative sentiment. Run the code below to view the sentiment scores for each line of your poem. Then answer the questions that follow.

In [None]:
afinn = Afinn()
sid = SentimentIntensityAnalyzer()

headers = ['Line', 'AFINN', 'VADER']
rows = [[line, 
         afinn.score(line), 
         sid.polarity_scores(line)] 
        for line in poem_lines_list]
show_markdown_table(headers, rows)
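
The `VADER` column above holds the whole dictionary returned by `polarity_scores()`, with the keys `neg`, `neu`, `pos`, and `compound`. A minimal sketch of spreading those subscores into separate table columns follows. The helper name `flatten_scores()` is an assumption for illustration, and the score values are made up, not real analyzer output.

```python
from typing import Dict, List

VADER_KEYS = ['neg', 'neu', 'pos', 'compound']

def flatten_scores(line: str, afinn_score: float, vader_scores: Dict[str, float]) -> List:
    """Build one table row with a separate column for each VADER subscore."""
    return [line, afinn_score] + [vader_scores[key] for key in VADER_KEYS]

# Illustrative values only, not real analyzer output.
example_scores = {'neg': 0.0, 'neu': 0.6, 'pos': 0.4, 'compound': 0.5}
row = flatten_scores('an example line', 2.0, example_scores)
headers = ['Line', 'AFINN'] + VADER_KEYS
```

In the notebook, rows built this way could be passed to `show_markdown_table()` together with the six-column `headers` list.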

Examine each line of your poem. 

1. How accurately did `AFINN` rate each line of your poem? 

AFINN rated each line fairly appropriately.

2. How about `VADER`? 

VADER also did a satisfactory job.

3. Was one of them preferable overall? Explain.

Both performed about equally well. Personally, I would choose VADER because it reveals more about how it calculates each score.

 

## Word analysis

Select six lines of your poem for extra scrutiny. List those lines here:

To analyze these lines, complete the `poem_line_analyzer()` function below. It should use `AFINN` and `VADER` (via `sid`) to print the analysis of each word in the given line. Then write code to call `poem_line_analyzer()` on each line. Then answer the questions that follow.

In [None]:
def poem_line_analyzer(line):
    """Display a table of AFINN and VADER scores for each word in `line`."""
    afinn = Afinn()
    sid = SentimentIntensityAnalyzer()
    headers = ['Word', 'AFINN', 'VADER']
    rows = [[word,
             afinn.score(word),
             sid.polarity_scores(word)]
            for word in line.split(' ')]
    show_markdown_table(headers, rows)


In [None]:
## Line analysis code here
dis = ['And sorry I could not travel both', 
'Then took the other, as just as fair',
'And having perhaps the better claim',
'Had worn them really about the same',
'In leaves no step had trodden black',
'I shall be telling this with a sigh']
for element in dis:
    poem_line_analyzer(element)
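
Splitting a line on spaces leaves punctuation attached to some words (for example `other,`), so the Word column can show tokens that are not clean words. Whether an analyzer ignores that punctuation depends on its own tokenization. A small sketch, assuming a hypothetical helper `clean_words()`, that strips leading and trailing punctuation before each word is scored:

```python
import string
from typing import List

def clean_words(line: str) -> List[str]:
    """Split a line on whitespace and strip punctuation from both ends of each word."""
    words = [word.strip(string.punctuation) for word in line.split()]
    return [word for word in words if word]  # drop tokens that were only punctuation

print(clean_words('Then took the other, as just as fair'))
```

Punctuation inside a word, such as the apostrophe in a contraction, is left alone because `str.strip` only touches the ends.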

1. For each line, which words from that line made the greatest contribution to the line's sentiment score?

    1. sorry
    2. fair
    3. better
    4. worn
    5. no
    6. sigh

2. Does the analysis of the line sentiment make sense in the context of the poem? Explain, including detailed examples.

    It makes sense: the poem is fairly neutral and does not deal with any intense subjects.

## Book Sentiment



In [None]:
def all_sentences_from(text: str) -> List[str]:    
    last_list = tokenize.sent_tokenize(text)
    return last_list 

# Copy your book_sentences() function from Lab 2 here
# Modify the function so that it:
# * Retains punctuation
# * Returns a list of strings, where each sentence is a single string, instead of a list of lists.


# Open your book file
# Create a list of your book sentences. Store it in a variable named book_lines_list.
with open('/kaggle/input/csci-270-books-2022/huck-finn_fixed.txt') as book_file:
    book_lines_list = all_sentences_from(book_file.read())

In [None]:
# Write code to create a list of sentiment values from AFINN for each sentence in your book.
# Call the list `sent_afinn`.
# Also create a similar list `sent_vader` using VADER.

# Write code to find the top 5 most positive and top 5 most negative sentences using each of AFINN and VADER. 
# Then answer the questions below.

# afinn = Afinn()
# sid = SentimentIntensityAnalyzer()
# sent_afinn = []
# sent_vader = []
# for line in book_lines_list:
#     sent_afinn.append(afinn.score(line))
#     sent_vader.append(sid.polarity_scores(line))
#for x in sent_afinn:
    #print top & bottom 5 values    
#for x in sent_vader:
    #print top & bottom 5 values
    
sent_afinn = []
for line in book_lines_list:
    score = afinn.score(line)  # score each sentence only once
    if score > -50:  # drop sentences scoring -50 or lower
        sent_afinn.append(score)

sent_vader = []
values_dict = {}
for line in book_lines_list:
    values = sid.polarity_scores(line)
    values_dict[line] = values['compound']
    if values['compound'] > -0.9999999:
        sent_vader.append(values['compound']) 

In [None]:
five_pos_afinn = []
for line in book_lines_list:
    if afinn.score(line) > 12: 
        five_pos_afinn.append(line)
for el in five_pos_afinn:
    print(el)
    print('\n')
# Prints the top 6 rather than the top 5, which is close enough.

pos_vader = []
values_dict = {}
for line in book_lines_list:
    values = sid.polarity_scores(line)
    values_dict[line] = values['compound']
    if values['compound'] > 0.955:
        pos_vader.append(line)
for el in pos_vader:
    print(el)
    print('\n')
# Prints exactly the top 5.

In [None]:
five_neg_afinn = []
for line in book_lines_list:
    if afinn.score(line) < -14:
        five_neg_afinn.append(line)
for el in five_neg_afinn:
    print(el)
    print('\n')
# Prints exactly the bottom 5.

neg_vader = []
values_dict = {}
for line in book_lines_list:
    values = sid.polarity_scores(line)
    values_dict[line] = values['compound']
    if values['compound'] < -0.94:
        neg_vader.append(line)
for el in neg_vader:
    print(el)
    print('\n')
# Prints exactly the bottom 5.
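
The cells above find the extreme sentences by hand-tuning a score threshold, which is why one of the lists ends up with six entries. An alternative sketch, not necessarily the method the lab intends: `heapq.nlargest` and `heapq.nsmallest` return exactly `k` items for any scoring function. The helper name `top_and_bottom()` and the toy scorer are assumptions for illustration; in this notebook the scorer could be `afinn.score` or a function that returns VADER's `compound` value.

```python
import heapq
from typing import Callable, List, Tuple

def top_and_bottom(sentences: List[str],
                   score: Callable[[str], float],
                   k: int = 5) -> Tuple[List[str], List[str]]:
    """Return the k highest-scoring and k lowest-scoring sentences."""
    scored = [(score(s), s) for s in sentences]  # score each sentence once
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    bottom = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    return [s for _, s in top], [s for _, s in bottom]

# Toy scorer for demonstration only: occurrences of 'good' minus occurrences of 'bad'.
def toy_score(sentence: str) -> float:
    words = sentence.lower().split()
    return words.count('good') - words.count('bad')

demo = ['good good day', 'bad day', 'plain day', 'bad bad bad', 'good']
most_positive, most_negative = top_and_bottom(demo, toy_score, k=2)
```

Because each sentence is scored once and stored alongside its score, the same `scored` list serves both the positive and the negative ranking.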

1. How similar were the lists produced by AFINN and VADER? 

     The AFINN and VADER lists shared 2 of their 5 sentences.

2. For each of the sentences identified:
   * How well did AFINN/VADER classify the sentiment of this sentence?
   * What role does this sentence play in the plot of your book?
   (I will not repeat sentences that appear in more than one list.)
   
   pos_afinn = [
   1: It classified the sentiment of this sentence well. (I don't remember where this was in the book.),

   2: It did not classify this one as well; I don't see how this is a truly positive sentence. (I believe this is when they picked up two other hobos on their raft and were pretending to be royalty.),

   3: It classified this sentence fairly well. (This was when Huck and his buddies were pretending to be pirates.),

   4: Fairly well; it describes some pretty positive imagery. (I do not remember the context of this sentence.),

   5: Pretty well; it describes a crucial bonding moment between Jim and Huck.
   ]
   
   
   
   |
   
   
   neg_afinn = [
   1: Fairly well. (I don't remember the context of this sentence.),

   2: Very well. (I believe this sentence is about Miss Watson wanting to sell Jim because of some incident. This is the reason he runs away: it is better to run away than to be sold down the river into worse conditions.),

   3: Pretty well; it describes his internal monologue. (Huck is rethinking his decision to run away.),

   4: Very well; this is a depressing sentence. (I don't remember the context of this sentence. I assume it is when Huck loses Jim on the river.),

   5: Not well; this one is a throwaway element.
   ]
   
   
   
   |
   
   

   pos_vader = [
   3: Very well; it describes one of Huck's imaginative moments. (I believe this is when Huck and his pals are still playing pirates.),

   4: Fairly well. (It just describes Huck fantasizing about some girl.),

   5: Not well; this one is a throwaway element.
   ]
   


   |

   
   
   neg_vader = [
   1: Very well. (I think this is when a major fight between the feuding families starts.),

   4: Somewhat well. (I don't remember the context of this sentence.),

   5: Very well; it describes a tense, stressful situation. (I don't remember the context, but I think this also has to do with the feuding families.)
   ]


   
3. Overall, how well did AFINN and VADER do in identifying sentence sentiment?

    Both did fairly well, except that copyright and other meaningless lines from Project Gutenberg made it into the lists.
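
One way to keep such boilerplate out of the rankings is to trim it before tokenizing. The sketch below assumes the text contains marker lines beginning with `*** START OF` and `*** END OF`, as many Project Gutenberg files do; the file used in this lab may be formatted differently, and the helper name `strip_gutenberg()` is hypothetical.

```python
def strip_gutenberg(text: str) -> str:
    """Keep only the text between the start and end marker lines, if both are present."""
    lines = text.splitlines()
    # Assumed marker prefixes; adjust them to match the actual file.
    start = next((i for i, ln in enumerate(lines) if ln.startswith('*** START OF')), None)
    end = next((i for i, ln in enumerate(lines) if ln.startswith('*** END OF')), None)
    if start is None or end is None or end <= start:
        return text  # markers not found: leave the text unchanged
    return '\n'.join(lines[start + 1:end])
```

If either marker is missing, the function returns the text untouched, so it is safe to apply to a file that has already been cleaned.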

## Sentiment Over Time

Run the code below to create a sentiment graph over time. 

Adjust the `book_window` variable to be a good match for the length of your text. If it is too small, you will see a lot of noise. If it is too large, you will only have a few data points.

In [None]:
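def moving_average(data, window):
    # One average per full window; the + 1 includes the final window.
    return [sum(data[i:i+window]) / window for i in range(len(data) - window + 1)]

book_window = 2**10
n = moving_average(sent_afinn, book_window)
plt.plot(range(len(n)), n, label='AFINN')
# Scale VADER's compound scores (-1 to 1) by 5 so the two curves are easier to compare.
n = moving_average([s*5 for s in sent_vader], book_window)
plt.plot(range(len(n)), n, label='VADER x 5')
plt.legend()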
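
`moving_average()` re-sums every window from scratch, which costs roughly `len(data) * window` additions. For a long book and a large window, a prefix-sum version does the same job in a single pass. This is a sketch under the assumption that one average per full window is wanted, including the last one; the name `moving_average_fast()` is hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence

def moving_average_fast(data: Sequence[float], window: int) -> List[float]:
    """Average of every full window of `window` items, computed from prefix sums."""
    if window <= 0 or window > len(data):
        return []
    prefix = [0.0] + list(accumulate(data))  # prefix[i] == sum(data[:i])
    return [(prefix[i + window] - prefix[i]) / window
            for i in range(len(data) - window + 1)]
```

With floating-point scores, the results can differ from direct summation in the last few decimal places, which does not matter for plotting.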

1. How similar are the sentiment plots for the two analyzers?

    The graphs are very similar. They seem to run parallel to each other; the bottom one just seems to operate on a wider scale.

2. How did you determine a satisfactory window size?

    The default window size seemed satisfactory.
    If I change the value too much in either direction, one graph becomes larger or smaller than the other.

3. To what degree does the sentiment graph track with critical passages and plot points of your book?

    Remarkably well. From what I remember, the book starts off on a positive note, with Huck Finn having a large sum of money to his name and just goofing around with his friends playing pirates. The first downward turn (around 1000) probably represents Huck's father coming back and being the generally abusive drunk that he is. After that, Huck runs away with Jim. They spend their time on a raft on the Mississippi River. They have various adventures and encounters on the river, and this takes up the majority of the book. Some of their adventures are fun; others are life-threatening. But all in all, they both grow as people on the raft and come to appreciate one another. Towards the end of the book (around 3000), they end up caught in a feud between two families. There is a significant amount of drama, death, and destruction in this part of the book, which is represented by the sharp decline in both graphs. At the end of the book, Huck Finn reunites with Tom Sawyer, and they learn that Huck's dad is dead and that Jim is now a free man. The end of the book leaves the characters with few remaining problems, so it is appropriate that the graph ends in a sharp upward turn.