# <font face="times"><font size="6pt"><p style = 'text-align: center;'> The City University of New York, Queens College

<font face="times"><font size="6pt"><p style = 'text-align: center;'><b>Introduction to Computational Social Science</b><br/><br/>

<p style = 'text-align: center;'><font face="times"><b>Lesson 10 | Natural Language Processing II: Web Scraping and Text Analysis </b><br/><br/>

<p style = 'text-align: center;'><font face="times"><b>5 Checkpoints</b><br/><br/>



***
***

# Begin Lesson 10
# Using Text as Data

Now that we've covered the basics, let's see what we can really do. In this notebook, we're going to learn how to

- Extract data from the web
- Run sentiment analysis on text and identify each term's part of speech




***
***

## Extracting Text from HTML

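Now, we'll start with a basic and commonly faced task: extracting text content from an HTML page. Python's built-in `urllib.request` module gives us the tools we need to fetch a web page from a given URL, but as we'll see, the output is full of HTML markup that we don't want to deal with.

`urllib.request` ships with Python's standard library, so it needs no installation. The cell below installs `urllib3`, a separate third-party package that the code in this lesson does not actually use.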

In [1]:
!pip3.6 install --user urllib3

Looking in links: /usr/share/pip-wheels


In [2]:
from urllib.request import urlopen

Let's test it out with a page from a news site called "venturebeat.com" (although this ought to work with many different websites).

In [3]:
url = "http://venturebeat.com/2014/07/04/facebooks-little-social-experiment-got-you-bummed-out-get-over-it/"
html = urlopen(url).read()

Let's see what this looks like (just a sneak peek, so we'll only look at the first 500 characters).

In [4]:
html[:500]

b'<!DOCTYPE html>\n<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:addthis="http://www.addthis.com/help/api-spec"> <![endif]-->\n<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:addthis="http://www.addthis.com/help/api-spec"> <![endif]-->\n<!--[if IE 8]>    <html class="no-js ie8 o'

That doesn't make any sense, unless you know `html`. Thankfully for us, we have options in `Python` that will help us extract useful information. 
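Notice that the output above starts with `b'`: `urlopen(...).read()` returns raw bytes, not a regular string. Here is a minimal sketch of turning those bytes into text. The helper names `decode_body` and `fetch_text`, the UTF-8 fallback, and the 10-second timeout are illustrative assumptions, not part of the lesson's code.

```python
from urllib.request import urlopen


def decode_body(raw_bytes, charset=None):
    """Turn the raw bytes of a response into a Python string."""
    # Fall back to UTF-8 (an assumption) when no charset is given.
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return raw_bytes.decode(charset or "utf-8", errors="replace")


def fetch_text(url, timeout=10):
    """Fetch a page and return its HTML as a string (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        # The charset, if any, comes from the Content-Type response header.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# This part runs without a network connection: decode a small bytes literal.
print(decode_body(b'<!DOCTYPE html>\n<html lang="en-US">'))
```

`BeautifulSoup` accepts bytes directly, so decoding is optional for the steps that follow; it mostly helps when you want to print or search the page as ordinary text.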

***
***

# Checkpoint 1 of 5
## Now you try!

### Pick any news article (or any website with a lot of text). Use `urlopen` to read in the page's `HTML` and look at the first 500 characters. 

### **Note:** You'll need a stable internet connection to do this if you're not working in the cloud. 

### What do you see?


In [5]:
url_2 = "https://www.forbes.com/sites/jackbrewster/2020/04/03/coronavirus-by-the-numbers-worldwide-cases-top-1-million-louisiana-michigan-connecticut-indiana-georgia-and-illinois-are-new-hotspots-in-us/#4967afda7fbc"
html_2 = urlopen(url_2).read()

In [6]:
html_2[:500]

b'<!DOCTYPE html><html lang="en"><head><title>Coronavirus By The Numbers: Worldwide Cases Top 1 Million, Louisiana, Michigan, Connecticut, Indiana, Georgia And Illinois Are Next Hotspots In U.S.</title><meta charset="utf-8"><meta http-equiv="Content-Language" content="en_US"><link rel="shortcut icon" href="https://i.forbesimg.com/favicon.ico"><meta name="referrer" content="no-referrer-when-downgrade"><link rel="canonical" itemprop="url" href="https://www.forbes.com/sites/jackbrewster/2020/04/03/co'

In [None]:
# It is displaying the title of the page

***
***

***
## Stripping-out HTML formatting

Fortunately, we can use a class called `BeautifulSoup` to get the raw text out of an HTML-formatted string. BeautifulSoup is a Python library for pulling data out of HTML and XML files. It parses HTML content into an easily navigable nested data structure.

It's still not perfect, though, since the output will contain page navigation and all kinds of other junk that we don't want, especially if our goal is to focus on the body content from a news article, for example.
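To get a feel for what a "navigable nested data structure" means, here is a small sketch on a hand-written page. The sample HTML is made up for illustration, and it uses Python's built-in `html.parser` (rather than the `xml` parser used in this lesson) so that it runs without any extra parser installed.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page standing in for a real download (hypothetical content).
sample_html = """
<html>
  <head><title>Sample Article</title></head>
  <body>
    <div id="nav"><a href="/home">Home</a></div>
    <p>First paragraph of the story.</p>
    <p>Second paragraph with a <a href="https://example.com/more">link</a>.</p>
  </body>
</html>
"""

# "html.parser" is Python's built-in HTML parser, so no extra install is needed.
sample_soup = BeautifulSoup(sample_html, "html.parser")

page_title = sample_soup.title.string                           # text inside <title>
paragraphs = [p.get_text() for p in sample_soup.find_all("p")]  # text of each <p>
links = [a["href"] for a in sample_soup.find_all("a")]          # every link target

print(page_title)
print(paragraphs)
print(links)
```

Notice that the navigation link (`/home`) shows up alongside the article's own link. That is the kind of "junk" we still have to filter out when we only care about the body of an article.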

In [7]:
import bs4
from bs4 import BeautifulSoup

For `BeautifulSoup` to work, just pass in the string (in this case, called `html`) and specify which parser to use (in this case, `'xml'`).

In [8]:
text = BeautifulSoup(html,'xml')

Now, take a look at how nicely it converted the raw text into actual `XML`. It still looks overwhelming and it's hard for us to interpret, but it's a step in the right direction. 

**Note:** It's going to be long!

In [9]:
text

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:addthis="http://www.addthis.com/help/api-spec"> <![endif]--><!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:addthis="http://www.addthis.com/help/api-spec"> <![endif]--><!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:addthis="http://www.addthis.com/help/api-spec"> <![endif]--><!--[if gt IE 8]><!--><html class="no-js" lang="en-US" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://opengraphprotocol.org/schema/"> <!--<![endif]-->
<head profile="http://gmpg.org/xfn/11">
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
<meta cont

***
***

# Checkpoint 2 of 5
## Now you try!

### With the `HTML` text you extracted from the previous checkpoint, now apply `BeautifulSoup` and convert it to `xml`. 

### How does it compare to the `HTML` you saw before?

In [10]:
text_2 = BeautifulSoup(html_2,'xml')

In [11]:
text_2

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html lang="en"><head><title>Coronavirus By The Numbers: Worldwide Cases Top 1 Million, Louisiana, Michigan, Connecticut, Indiana, Georgia And Illinois Are Next Hotspots In U.S.</title><meta charset="utf-8"><meta content="en_US" http-equiv="Content-Language"><link href="https://i.forbesimg.com/favicon.ico" rel="shortcut icon"><meta content="no-referrer-when-downgrade" name="referrer"><link href="https://www.forbes.com/sites/jackbrewster/2020/04/03/coronavirus-by-the-numbers-worldwide-cases-top-1-million-louisiana-michigan-connecticut-indiana-georgia-and-illinois-are-new-hotspots-in-us/" itemprop="url" rel="canonical"><link href="https://www.forbes.com/sites/jackbrewster/2020/04/03/coronavirus-by-the-numbers-worldwide-cases-top-1-million-louisiana-michigan-connecticut-indiana-georgia-and-illinois-are-new-hotspots-in-us/amp/" rel="amphtml"><link href="https://www.forbes.com/sites/jackbrewster/feed/" rel="alternate" title="Coronavirus

In [None]:
# The HTML before checkpoint #2 seems more organized than the one I did above.
# With HTML_2 I extracted from previous checkpoint and converted into xml, I see there is a lot of detail
# in regards to the formatting of the content.

***
***

***

## Identifying the Main Content
If we just want the body content from the article, we'll need to use two additional packages. The first is a package called `Readability`, which pulls the main body content out of an HTML document and subsequently "cleans it up." 

Using Readability and BeautifulSoup together, we can quickly get exactly the text we're looking for out of the HTML, (*mostly*) free of page navigation, comments, ads, etc. Then we'll be ready to start analyzing this text content.

***NOTE***: In order for `readability` to work, we'll need to download the module using `pip`. This may take a few minutes. 

In [12]:
!pip3.6 install --user readability

Looking in links: /usr/share/pip-wheels
Collecting readability
  Downloading https://files.pythonhosted.org/packages/26/70/6f8750066255d4d2b82b813dd2550e0bd2bee99d026d14088a7b977cd0fc/readability-0.3.1.tar.gz
Building wheels for collected packages: readability
  Building wheel for readability (setup.py) ... done
  Created wheel for readability: filename=readability-0.3.1-cp36-none-any.whl size=35464 sha256=a023e1e8c1ea516624700986d9a0d4f4e8c08d47de00a1a81fbafd85b2727b3e
  Stored in directory: /home/ahegu/.cache/pip/wheels/36/3f/65/bc327f4cdd5bff9ff510834e07522f94389e28858311b33b41
Successfully built readability
Installing collected packages: readability
Successfully installed readability-0.3.1


In [13]:
!pip3.6 install --user readability-lxml

Looking in links: /usr/share/pip-wheels
Collecting readability-lxml
  Downloading https://files.pythonhosted.org/packages/af/a7/8ea52b2d3de4a95c3ed8255077618435546386e35af8969744c0fa82d0d6/readability-lxml-0.7.1.tar.gz
Building wheels for collected packages: readability-lxml
  Building wheel for readability-lxml (setup.py) ... done
  Created wheel for readability-lxml: filename=readability_lxml-0.7.1-cp36-none-any.whl size=16480 sha256=f0249aef73371f20b1d176c5750f53b5ba5ef4e5b83d841dfc008777d42db8b1
  Stored in directory: /home/ahegu/.cache/pip/wheels/94/48/e5/d944e616d8b0734c3b9cf30a21f4afcf855a1e2b85f82f34fb
Successfully built readability-lxml
Installing collected packages: readability-lxml
Successfully installed readability-lxml-0.7.1


In [14]:
import readability   

In [15]:
from readability.readability import Document #Note, we need to call it from readability.readability, a strange quirk to how this module was originally named. 

Let's create a `Document` object by passing in our string `html`, and then extract the summary and title of the article using its `summary()` and `title()` methods, respectively. 

And let's store it as `readable_article` and `readable_title`. 

In [16]:
readable_article = Document(html).summary()
readable_title = Document(html).title()

Let's take a peek at the summary, but let's only look at the first 500 characters.

In [17]:
readable_article[0:500]

'<html><body><div><div class="article-content">\n\t\t\t\t\t<p>OP-ED — You would think by the reaction some are having to it that Facebook’s recent admission that <a href="https://venturebeat.com/2014/07/02/facebooks-mood-manipulation-experiment-may-be-under-investigation-in-u-k-ireland/">it experimented with some people’s feeds</a> is tantamount to Watergate.</p>\n<p>You would think there had been some terrible violation of privacy or a breach of confidential user data. Instead, 700,000 people read a sl'

It has a lot of `html` code embedded around it, and a lot of unhelpful tags. Here, `BeautifulSoup` can clear out much of this from the string.

The summary is still a string of HTML, so we tell `BeautifulSoup` to parse it with the `lxml` parser by passing `'lxml'` as the second argument.

In [18]:
soup = BeautifulSoup(readable_article,'lxml')

Now, let's print out the title and the summary of the article that we cleaned up with `BeautifulSoup`. Here, we can use the `.text` attribute, which extracts the readable text from `soup`. Let's restrict it to the first 500 characters. 

In [19]:
print('*** The Title is *** \n\"' + readable_title + '\"\n')
print('*** The Content is *** \n\"' + soup.text[:500])

*** The Title is *** 
"Facebook’s little social experiment got you bummed out? Get over it | VentureBeat"

*** The Content is *** 
"
OP-ED — You would think by the reaction some are having to it that Facebook’s recent admission that it experimented with some people’s feeds is tantamount to Watergate.
You would think there had been some terrible violation of privacy or a breach of confidential user data. Instead, 700,000 people read a slightly different version of their news feed than the rest of us.
Here’s what happened: Over the course of a one-week period back in 2012, Facebook altered the balance of content in the news fe


As you can see, it's now much easier on the eyes!
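The extracted text can still carry stray tabs and blank lines left over from the page layout. A minimal sketch for tidying that up with plain string methods follows; the helper name `tidy_text` and the sample string are made up for illustration.

```python
def tidy_text(raw_text):
    """Collapse runs of whitespace within each line and drop empty lines."""
    cleaned_lines = []
    for line in raw_text.splitlines():
        # split() with no arguments breaks on any run of spaces or tabs.
        collapsed = " ".join(line.split())
        if collapsed:
            cleaned_lines.append(collapsed)
    return "\n".join(cleaned_lines)


# A short stand-in for soup.text, with layout debris left in on purpose.
sample_text = "\n\t\t\tYou would think   there had been\n\n\nsome terrible\t violation of privacy.\n"
print(tidy_text(sample_text))
```

With the variables from this lesson, the call would look like `tidy_text(soup.text)`.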

***
***

# Checkpoint 3 of 5 
## Now you try! 

### Apply `readability` to the **original** `HTML` text you extracted in checkpoint 1. Print out the title and content of the webpage. 

### How does this compare to what you read in checkpoint 2?

In [20]:
readable_article_2 = Document(html_2).summary()
readable_title_2 = Document(html_2).title()

In [21]:
readable_article_2[0:500]

'<html><body><div><div class="article-body fs-article fs-responsive-text current-article"><figure class="embed-base image-embed embed-1" role="presentation"> \n  \n <figcaption> \n  <fbs-accordion class="expandable" current="-1"> \n   <p class="color-body light-text">NEW YORK, NY - MARCH 31: A temporary hospital is built in Central Park on the East Meadow lawn <span class="plus" data-ga-track="caption expand">... [+]</span><span class="expanded-caption"> during the Coronavirus pandemic on March 31, 2'

In [22]:
soup_2= BeautifulSoup(readable_article_2,'lxml')

In [23]:
print('*** The Title is *** \n\"' + readable_title_2 + '\"\n')
print('*** The Content is *** \n\"' + soup_2.text[:500])

*** The Title is *** 
"Coronavirus By The Numbers: Worldwide Cases Top 1 Million, Louisiana, Michigan, Connecticut, Indiana, Georgia And Illinois Are Next Hotspots In U.S."

*** The Content is *** 
"


NEW YORK, NY - MARCH 31: A temporary hospital is built in Central Park on the East Meadow lawn ... [+] during the Coronavirus pandemic on March 31, 2020 in New York City. The facility is a partnership between Mt. Sinai Hospital and Christian humanitarian aid organization Samaritan’s Purse, equipped with 68 beds to treat COVID-19 patients. (Photo by Noam Galai/Getty Images)

Getty Images


(This story was updated at 9:15 a.m. on Friday, April 3) 
Topline: The coronavirus outbreak —  which like


In [None]:
# Checkpoint #2 includes additional information about the text formatting and display. 
# Checkpoint #3, instead, focuses solely on the content of the website.

***
***

***
***

## Part of Speech (PoS) Tagging 

We now look at an example of part of speech tagging using NLTK. Looking at the part of speech for terms is helpful for a variety of purposes. For instance, in the case of sentiment analysis--which looks at whether a term is used in a positive or negative way--understanding whether the term is used as a noun, a modifier (adjective), or a verb can help us understand the rhetorical style of a particular text. 

Here, we're not going to rehash (or try to recall) our grade-school grammar classes. Essentially, using `nltk`, the PoS tagger will process a sentence (as a list of tokens) and provide what it believes the part of speech is for each term. 

Below, I've outlined a general set of grammatical categories. Hopefully, a lot of these seem familiar (e.g., noun, adjective, verb, pronoun, etc.). Note that `nltk`'s default tagger reports Penn Treebank codes instead (e.g., `NN`, `JJ`, `VBD`), which follow a different naming scheme.

### Tagset

    N = noun
    NP = noun phrase
    Adj = adjective
    AdjP = adjective phrase
    Adv = adverb
    Prep = preposition
    PP = prepositional phrase
    Quant = quantifier
    Ord = ordinal numeral
    Card = cardinal numeral
    Rel-Cl = relative clause
    Rel-Pro = relative pronoun
    V = verb
    S = sentence
    Det = determiner
    Dem-Det = demonstrative determiner
    Wh-Det = wh-determiner
    PPron = personal pronoun
    PoPron = possessive pronoun

So, let's test it out on a sample sentence: 

"WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."

Let's import `nltk` and save this sentence as a string.

In [24]:
import nltk 

In [25]:
sentence = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."

We first need to split this sentence into individual tokens. Let's use nltk's `word_tokenize()` function. Once that's done, we can use `nltk.pos_tag()` to tag the part of speech of each term in this sentence. 

In [26]:
pos_sentence = nltk.pos_tag(nltk.word_tokenize(sentence))

Let's see what this looks like. `pos_sentence` is a list of `tuples`, so let's use a for loop and print out each term and its PoS. 

In [27]:
for term, part_of_speech in pos_sentence:
    print("Term: " + term+ " | Part of Speech: " + part_of_speech)

Term: WASHINGTON | Part of Speech: NNP
Term: -- | Part of Speech: :
Term: In | Part of Speech: IN
Term: the | Part of Speech: DT
Term: wake | Part of Speech: NN
Term: of | Part of Speech: IN
Term: a | Part of Speech: DT
Term: string | Part of Speech: NN
Term: of | Part of Speech: IN
Term: abuses | Part of Speech: NNS
Term: by | Part of Speech: IN
Term: New | Part of Speech: NNP
Term: York | Part of Speech: NNP
Term: police | Part of Speech: NN
Term: officers | Part of Speech: NNS
Term: in | Part of Speech: IN
Term: the | Part of Speech: DT
Term: 1990s | Part of Speech: CD
Term: , | Part of Speech: ,
Term: Loretta | Part of Speech: NNP
Term: E. | Part of Speech: NNP
Term: Lynch | Part of Speech: NNP
Term: , | Part of Speech: ,
Term: the | Part of Speech: DT
Term: top | Part of Speech: JJ
Term: federal | Part of Speech: JJ
Term: prosecutor | Part of Speech: NN
Term: in | Part of Speech: IN
Term: Brooklyn | Part of Speech: NNP
Term: , | Part of Speech: ,
Term: spoke | Part of Speech: VBD
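Once we have the tags, we can start doing things with them. As one illustration, the sketch below joins runs of adjacent proper nouns (`NNP`) into phrases, which is a rough way to spot names and places. The pairs are copied by hand from the output above, and the helper name `proper_noun_phrases` is made up for this example.

```python
from itertools import groupby

# A few (term, tag) pairs copied from the tagger output above.
tagged_sample = [
    ("by", "IN"), ("New", "NNP"), ("York", "NNP"), ("police", "NN"),
    ("officers", "NNS"), ("in", "IN"), ("the", "DT"), ("1990s", "CD"),
    (",", ","), ("Loretta", "NNP"), ("E.", "NNP"), ("Lynch", "NNP"),
    (",", ","), ("the", "DT"), ("top", "JJ"),
]


def proper_noun_phrases(tagged_terms):
    """Join runs of adjacent NNP-tagged terms into single phrases."""
    phrases = []
    # groupby splits the list into consecutive runs that share the same key,
    # here: "is this term tagged NNP or not?"
    for is_proper, run in groupby(tagged_terms, key=lambda pair: pair[1] == "NNP"):
        if is_proper:
            phrases.append(" ".join(term for term, _ in run))
    return phrases


print(proper_noun_phrases(tagged_sample))
```

On the full sentence, the same call would be `proper_noun_phrases(pos_sentence)`.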


If you are unsure of what each of these tags means, you can always use `nltk` to return what it is and examples, as shown below:

In [28]:
nltk.help.upenn_tagset('NNP')

NNP: noun, proper, singular
    Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos
    Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA
    Shannon A.K.C. Meltex Liverpool ...
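Many Penn Treebank codes share a prefix: noun tags start with `NN`, verb tags with `VB`, adjective tags with `JJ`, and adverb tags with `RB`. Here is a small sketch that uses those prefixes to group tags into broad categories and count them. The mapping and its plain-English labels are a simplification chosen for this example, not something `nltk` provides.

```python
from collections import Counter

# Prefix-to-category mapping (illustrative labels, not an nltk feature).
COARSE_CATEGORIES = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "RB": "adverb",
}


def coarse_category(tag):
    """Map a Penn Treebank tag to a broad label using its first two letters."""
    return COARSE_CATEGORIES.get(tag[:2], "other")


# A few (term, tag) pairs copied from the tagger output above.
tagged_terms = [("the", "DT"), ("top", "JJ"), ("federal", "JJ"),
                ("prosecutor", "NN"), ("in", "IN"), ("Brooklyn", "NNP"),
                (",", ","), ("spoke", "VBD")]

category_counts = Counter(coarse_category(tag) for _, tag in tagged_terms)
print(category_counts)
```

Counting categories like this is one simple way to compare the style of two texts, for example how adjective-heavy each one is.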


***
***

# Checkpoint 4 of 5
## Now you try!

### Read in some sentence. It can come from anywhere! Use the part-of-speech tagger from `nltk` and a for loop to identify every term's part of speech. 

### You can use the `nltk.help.upenn_tagset()` function to identify what the part-of-speech code means (e.g., verb, adverb, pronoun, etc.). 

### How well did it do?

In [30]:
sentence_2 = "The year 1866 was signalized by a remarkable incident, a mysterious and inexplicable phenomenon, which doubtless no one has yet forgotten."

In [31]:
pos_sentence_2 = nltk.pos_tag(nltk.word_tokenize(sentence_2))

In [32]:
for term, part_of_speech in pos_sentence_2:
    print("Term: " + term+ " | Part of Speech: " + part_of_speech)

Term: The | Part of Speech: DT
Term: year | Part of Speech: NN
Term: 1866 | Part of Speech: CD
Term: was | Part of Speech: VBD
Term: signalized | Part of Speech: VBN
Term: by | Part of Speech: IN
Term: a | Part of Speech: DT
Term: remarkable | Part of Speech: JJ
Term: incident | Part of Speech: NN
Term: , | Part of Speech: ,
Term: a | Part of Speech: DT
Term: mysterious | Part of Speech: JJ
Term: and | Part of Speech: CC
Term: inexplicable | Part of Speech: JJ
Term: phenomenon | Part of Speech: NN
Term: , | Part of Speech: ,
Term: which | Part of Speech: WDT
Term: doubtless | Part of Speech: VBZ
Term: no | Part of Speech: DT
Term: one | Part of Speech: NN
Term: has | Part of Speech: VBZ
Term: yet | Part of Speech: RB
Term: forgotten | Part of Speech: VBN
Term: . | Part of Speech: .


In [33]:
nltk.help.upenn_tagset('JJ')

JJ: adjective or numeral, ordinal
    third ill-mannered pre-war regrettable oiled calamitous first separable
    ectoplasmic battery-powered participatory fourth still-to-be-named
    multilingual multi-disciplinary ...


***
***

***
***

## Sentiment Analysis

Now that we know how to determine the PoS of each term in a sentence, let's turn to sentiment analysis using Empath (empath.stanford.edu), a dictionary tool that counts words in various categories (e.g., positive sentiment, negative sentiment). 

First, we need to import the library and create a lexicon. 

(**Note:** This module isn't readily available through Anaconda. If you use Anaconda in future work, you can instead import it from a file in this directory (i.e., folder).)

You can actually play around with it here: empath.stanford.edu

We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like “bleed” and “punch” to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. 

Empath can generate new lexical categories and analyze text over 200 built-in human-validated categories. 

First, let's install `empath` and then import it. 

In [40]:
!pip3 install --user empath #not working

Looking in links: /usr/share/pip-wheels


In [47]:
pip install --user empath

Looking in links: /usr/share/pip-wheels
Processing /home/ahegu/.cache/pip/wheels/84/ea/2f/2bc54d4f9985ce61753ebc5b00cb2df51d855589267c667308/empath-0.89-cp36-none-any.whl
Installing collected packages: empath
Successfully installed empath-0.89
Note: you may need to restart the kernel to use updated packages.


In [48]:
import empath

In [49]:
from empath import Empath
lexicon = Empath()

Let's start analyzing a sentence. 

Setting `normalize` to `True` normalizes the counts by the length of the sentence (its number of tokens). Here, let's tokenize the sentence as we did last week using the `nltk` function `word_tokenize()`.

Let's test it out with this sentence:

"Bullshit, you can't even post FACTS on this sub- like Clinton lying about sniper fire."

In [50]:
sentiment_dictionary = lexicon.analyze(nltk.word_tokenize("Bullshit, you can't even post FACTS on this sub- like Clinton lying about sniper fire."), normalize=True)

Now, let's go through this `sentiment_dictionary` and just look at the values that are greater than zero. We can use a list comprehension to extract them, as shown here:

In [51]:
[(k,v) for k,v in sentiment_dictionary.items() if v > 0]

[('social_media', 0.05555555555555555),
 ('internet', 0.05555555555555555),
 ('military', 0.05555555555555555),
 ('deception', 0.05555555555555555),
 ('war', 0.05555555555555555),
 ('fire', 0.05555555555555555),
 ('warmth', 0.05555555555555555),
 ('weapon', 0.1111111111111111)]

Not bad. It picked up on words like "fire," "sub," "lying" to associate with "social media," "deception," and "weapon." 
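Where do numbers like `0.0555...` come from? They are consistent with each category's word count being divided by the number of tokens in the sentence. Here is a quick check of that arithmetic, assuming `word_tokenize` splits the sentence into the 18 tokens listed below (the list is typed in by hand rather than produced by `nltk`).

```python
# Tokens as word_tokenize is assumed to produce them (typed in by hand).
tokens = ["Bullshit", ",", "you", "ca", "n't", "even", "post", "FACTS", "on",
          "this", "sub-", "like", "Clinton", "lying", "about", "sniper",
          "fire", "."]

num_tokens = len(tokens)

one_match = 1 / num_tokens    # a category matched by a single token
two_matches = 2 / num_tokens  # a category matched by two tokens

print(num_tokens, one_match, two_matches)
```

Under that assumption, a score of about 0.056 means one matching token out of 18, and the 0.111 for "weapon" means two. Since the divisor is the token count, the same word counts for more in a short sentence than in a long one.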

Let's try it out with another sentence:

"Totally agree. Planning to beat your opponent is not a sign of corruption. That's politics. "

In [52]:
sentiment_dictionary = lexicon.analyze(nltk.word_tokenize("Totally agree. Planning to beat your opponent is not a sign of corruption. That's politics. "), normalize=True)

In [53]:
[(k,v) for k,v in sentiment_dictionary.items() if v > 0]

[('wedding', 0.05263157894736842),
 ('crime', 0.05263157894736842),
 ('dispute', 0.05263157894736842),
 ('government', 0.05263157894736842),
 ('violence', 0.05263157894736842),
 ('dominant_heirarchical', 0.05263157894736842),
 ('communication', 0.05263157894736842),
 ('trust', 0.05263157894736842),
 ('deception', 0.05263157894736842),
 ('fight', 0.05263157894736842),
 ('music', 0.05263157894736842),
 ('war', 0.05263157894736842),
 ('speaking', 0.05263157894736842),
 ('listen', 0.05263157894736842),
 ('economics', 0.05263157894736842),
 ('politics', 0.10526315789473684),
 ('negative_emotion', 0.05263157894736842),
 ('competing', 0.05263157894736842),
 ('law', 0.05263157894736842),
 ('giving', 0.05263157894736842)]
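With this many categories, it helps to sort them so the strongest ones come first. Here is a minimal sketch; the helper name `top_categories` is made up for this example, and the sample dictionary reuses a few values from the output above plus one zero-valued entry added only to show the filter at work.

```python
def top_categories(scores, n=3):
    """Return the n highest-scoring (category, score) pairs, dropping zeros."""
    nonzero = [(category, score) for category, score in scores.items() if score > 0]
    # Sort by score, largest first; ties keep their original order.
    return sorted(nonzero, key=lambda pair: pair[1], reverse=True)[:n]


# Values copied from the output above, plus a hypothetical zero entry.
sample_scores = {
    "crime": 0.05263157894736842,
    "politics": 0.10526315789473684,
    "cooking": 0.0,
    "government": 0.05263157894736842,
}

print(top_categories(sample_scores, n=2))
```

With the variables from this lesson, the call would be `top_categories(sentiment_dictionary)`.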

***
***

# Checkpoint 5 of 5

## Now you try!

### Let's explore the tool with some more examples. What happens in cases of sarcasm, negation, or very informal text?

### Identify three sentences: One that is sarcastic, one that negates, and an informal text with slang. 

### Repeat the same steps as above with your three sentences. 

### What categories do you pick up? Which are the top categories for each sentence?

In [57]:
#sarcasm
sentiment_dictionary_2 = lexicon.analyze(nltk.word_tokenize("Everyone has the right to be stupid, but you are abusing the privilege."), normalize=True)

In [58]:
[(k,v) for k,v in sentiment_dictionary_2.items() if v > 0]

[('royalty', 0.06666666666666667),
 ('wealthy', 0.06666666666666667),
 ('ridicule', 0.06666666666666667),
 ('violence', 0.06666666666666667),
 ('dominant_heirarchical', 0.06666666666666667),
 ('gain', 0.06666666666666667),
 ('power', 0.06666666666666667),
 ('negative_emotion', 0.06666666666666667),
 ('giving', 0.06666666666666667)]

In [59]:
#negation
sentiment_dictionary_3 = lexicon.analyze(nltk.word_tokenize("There is no point in using the word 'impossible' to describe something that has clearly happened."), normalize=True)

In [60]:
[(k,v) for k,v in sentiment_dictionary_3.items() if v > 0]

[('communication', 0.05555555555555555),
 ('speaking', 0.1111111111111111),
 ('disappointment', 0.05555555555555555),
 ('writing', 0.05555555555555555)]

In [65]:
#informal text with slang
sentiment_dictionary_4 = lexicon.analyze(nltk.word_tokenize("It was, like, five bucks, so I was like “okay”."), normalize=True)

In [66]:
[(k,v) for k,v in sentiment_dictionary_4.items() if v > 0]

[]