
This is a sentiment analysis project based on news headlines. The top-news headlines are scraped from the NBC News homepage with BeautifulSoup, and NLTK's VADER sentiment analyzer then assigns each headline a negative, positive, and compound sentiment score. Finally, pandas DataFrames are used to display the headlines and their scores as tables.
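
The compound score is not an average of the negative and positive proportions. In the reference VADER implementation (which NLTK's `SentimentIntensityAnalyzer` follows), the valences of the lexicon words in a sentence are summed after rule-based adjustments (negation, intensifiers, capitalization, punctuation), and that raw sum x is squashed into [-1, 1] with x / sqrt(x^2 + alpha), where alpha = 15 by default. The sketch below reproduces only that final normalization step; `vader_normalize` is a hypothetical helper name, not part of NLTK's API.

```python
import math

def vader_normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded sum of word valences into [-1, 1].

    Sketch of the normalization VADER applies to get its compound score:
    x / sqrt(x^2 + alpha). The clamp is defensive; the formula itself
    already stays strictly inside (-1, 1).
    """
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

# A raw valence sum of 0 stays neutral; larger sums approach +/-1 asymptotically.
for raw in (-4.0, 0.0, 1.0, 4.0):
    print(raw, round(vader_normalize(raw), 4))
```

For instance, a raw sum of 1 maps to 1 / sqrt(16) = 0.25.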


In [1]:
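from bs4 import BeautifulSoup as bs
import requests
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import pandas as pd

# VADER needs its lexicon; nltk skips the download if it is already up to date.
nltk.download('vader_lexicon')

# Fetch the NBC News homepage; fail loudly on HTTP errors instead of parsing an error page.
r = requests.get("https://www.nbcnews.com", timeout=10)
r.raise_for_status()

# Name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning.
soup = bs(r.content, "html.parser")

# Pretty-printed HTML, handy for looking up class names if the site layout changes.
contents = soup.prettify()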



[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


In [2]:
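header_list = []

# Main "tease card" headlines. attrs is a dict mapping attribute name to value;
# a multi-word class string only matches elements whose class attribute is exactly that string.
top_info = soup.find_all('h2', {'class': 'tease-card__headline tease-card__title tease-card__title--news relative'})
for row in top_info:
  header_list.append(row.get_text())

# Related-story links listed under the main cards.
bottom_info = soup.find_all('li', {'class': 'related-content__item'})
for row in bottom_info:
  header_list.append(row.get_text())

# For these list items, keep only the text before the first '/'.
other_info = soup.find_all('li', {'class': 'styles_item__sANtw'})
for row in other_info:
  header_list.append(row.get_text().split('/')[0])

# Additional headline blocks.
more_info = soup.find_all('div', {'class': 'styles_info__XOx8D'})
for row in more_info:
  header_list.append(row.get_text())

# Drop very short strings, which are unlikely to be headlines, then remove duplicates.
# Note: set() does not preserve the original page order.
header_list = list(filter(lambda i: len(i) >= 20, header_list))
header_list = list(set(header_list))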
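
The same extraction can be written with CSS selectors via `soup.select`, which matches an element if *one* of its classes matches, rather than requiring the full class string. Two further details are worth considering: passing a separator to `get_text` keeps nested labels from running together (which is likely how titles such as "SpecialsResidents were pitched..." in the output below arise), and `dict.fromkeys` removes duplicates while keeping first-seen order. This is a minimal sketch: `extract_headlines` and the HTML fragment are hypothetical stand-ins, not the live NBC News markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the NBC News homepage.
SAMPLE_HTML = """
<h2 class="tease-card__headline tease-card__title">Storm forces evacuations along the coast</h2>
<li class="related-content__item">Short</li>
<li class="related-content__item">Storm forces evacuations along the coast</li>
<li class="related-content__item">Officials reopen the bridge after repairs</li>
"""

def extract_headlines(html: str, selectors: list[str], min_len: int = 20) -> list[str]:
    """Collect text from every element matching the CSS selectors.

    Strings shorter than min_len are dropped, and duplicates are removed
    while keeping first-seen order (unlike list(set(...))).
    """
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for selector in selectors:
        for tag in soup.select(selector):
            # Join nested text nodes with a space and trim surrounding whitespace.
            texts.append(tag.get_text(" ", strip=True))
    kept = [t for t in texts if len(t) >= min_len]
    return list(dict.fromkeys(kept))

headlines = extract_headlines(SAMPLE_HTML, ["h2.tease-card__headline", "li.related-content__item"])
print(headlines)
```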


In [3]:
df = pd.DataFrame(header_list, columns=["Title"])

vader = SentimentIntensityAnalyzer()

# Score each title once, then pull the three values we need from each result dict.
scores = df['Title'].apply(vader.polarity_scores)
df['Negative'] = scores.apply(lambda s: s['neg'])
df['Positive'] = scores.apply(lambda s: s['pos'])
df['Compound Score'] = scores.apply(lambda s: s['compound'])

# sort_values returns a new DataFrame, so it only takes effect when assigned.
# Uncomment to list the most positive headlines first:
# df = df.sort_values(by='Compound Score', ascending=False)
df = df.drop_duplicates(keep='first')
df

Index,Title,Negative,Positive,Compound Score
0,VIDEO: NFL hall of famer Brett Favre questione...,0.264,0.0,-0.5106
1,SpecialsResidents were pitched on building a s...,0.0,0.0,0.0
2,Father charged after 4-year-old allegedly brou...,0.344,0.0,-0.4939
3,Argentine vice president survives assassinatio...,0.474,0.0,-0.743
4,Residents were pitched on building a spaceport...,0.0,0.0,0.0
5,"In a blistering speech, Biden says 'democracy ...",0.0,0.0,0.0
6,"After 45 years in space, the Voyager probes ar...",0.0,0.0,0.0
7,"New Covid boosters, which target BA.5, haven't...",0.0,0.116,0.2732
8,"How, where and when to get updated Covid boost...",0.0,0.0,0.0
9,Michael M. Santiago,0.0,0.0,0.0
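
Because `polarity_scores` returns all four scores (`neg`, `neu`, `pos`, `compound`) in one dict, the per-title dicts can also be expanded straight into columns instead of being picked apart one key at a time. A minimal sketch, assuming any scorer with that return shape; `score_titles` and `fake_scorer` are hypothetical, and the stub stands in for `SentimentIntensityAnalyzer().polarity_scores` so the example runs without the VADER lexicon.

```python
import pandas as pd

def score_titles(titles, scorer):
    """Build a DataFrame with one row per title and one column per score.

    `scorer` is any callable returning a dict with 'neg', 'neu', 'pos' and
    'compound' keys (e.g. NLTK's SentimentIntensityAnalyzer().polarity_scores).
    Each title is scored exactly once.
    """
    titles = list(titles)
    scores = pd.DataFrame([scorer(t) for t in titles])
    df = pd.concat([pd.DataFrame({"Title": titles}), scores], axis=1)
    return df.rename(columns={"neg": "Negative", "neu": "Neutral",
                              "pos": "Positive", "compound": "Compound Score"})

# Stub scorer for illustration only; real scores would come from VADER.
def fake_scorer(title):
    bad = "crash" in title.lower()
    return {"neg": 0.4 if bad else 0.0, "neu": 0.6 if bad else 1.0,
            "pos": 0.0, "compound": -0.5 if bad else 0.0}

print(score_titles(["Train crash injures dozens", "City council meets Tuesday"], fake_scorer))
```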


In [4]:
# Count headlines on each side of zero; a score of exactly 0 counts as neutral.
negative_counter = int((df['Compound Score'] < 0).sum())
positive_counter = int((df['Compound Score'] > 0).sum())

if negative_counter > positive_counter:
  print("Today, there were more negative news than positive with {} negative news stories and {} positive news stories.\nThe remaining stories were calculated to be neutral".format(negative_counter, positive_counter))
elif negative_counter < positive_counter:
  print("Today, there were more positive news than negative with {} positive news stories and {} negative news stories.\nThe remaining stories were calculated to be neutral".format(positive_counter, negative_counter))
else:
  # Previously a tie printed nothing at all.
  print("Today, there were equal numbers of positive and negative news stories ({} each).\nThe remaining stories were calculated to be neutral".format(positive_counter))



Today, there were more negative news than positive with 24 negative news stories and 10 positive news stories.
The remaining stories were calculated to be neutral
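
The counting cell treats any score above 0 as positive and any score below 0 as negative. The VADER project's README instead suggests a small dead band (compound >= 0.05 positive, <= -0.05 negative, otherwise neutral), which keeps barely scored headlines out of either bucket. Below is a stdlib sketch with the threshold as a parameter: 0.0 reproduces this notebook's rule, and because the comparisons are strict, it differs from the README's convention only for scores exactly at the threshold. `label` and `summarize` are hypothetical helper names; the sample scores are the ten compound scores from the first table above.

```python
from collections import Counter

def label(compound: float, threshold: float = 0.0) -> str:
    """Classify one compound score; threshold=0.0 matches the notebook's rule."""
    if compound > threshold:
        return "positive"
    if compound < -threshold:
        return "negative"
    return "neutral"

def summarize(scores, threshold: float = 0.0):
    """Return label counts plus a verdict that also covers the tie case."""
    counts = Counter(label(s, threshold) for s in scores)
    pos, neg = counts["positive"], counts["negative"]
    if neg > pos:
        verdict = "more negative than positive"
    elif pos > neg:
        verdict = "more positive than negative"
    else:
        verdict = "an equal number of positive and negative"
    return counts, verdict

# Compound scores from the first ten rows of the scored table.
sample = [-0.5106, 0.0, -0.4939, -0.743, 0.0, 0.0, 0.0, 0.2732, 0.0, 0.0]
print(summarize(sample))
print(summarize(sample, threshold=0.3))  # the 0.2732 headline becomes neutral
```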


In [5]:
negative_df = df.loc[df['Compound Score'] < 0]
print("Here are all the negative news story headlines: ")
negative_df

Here are all the negative news story headlines: 


Index,Title,Negative,Positive,Compound Score
0,VIDEO: NFL hall of famer Brett Favre questione...,0.264,0.0,-0.5106
2,Father charged after 4-year-old allegedly brou...,0.344,0.0,-0.4939
3,Argentine vice president survives assassinatio...,0.474,0.0,-0.743
11,Mom sues Alabama youth facility where son died...,0.389,0.082,-0.8126
13,Russia-Ukraine ConflictUkraine and Russia trad...,0.409,0.0,-0.8402
15,"Monkeypox cases are falling, but experts warn ...",0.209,0.0,-0.2263
20,'a dismal situation'U.S. life expectancy dropp...,0.116,0.0,-0.3612
21,WorldArgentine vice president survives assassi...,0.474,0.0,-0.743
22,Health newsMore evidence links highly processe...,0.454,0.109,-0.836
28,Michael Swensen for NBC Newskentucky floodingR...,0.274,0.0,-0.7506
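
The filtered table keeps the original row order, and cell 3 does not keep a sorted copy of the scores. Chaining `sort_values` onto the boolean mask gives a ranked view with the most negative headline first. A small sketch on four rows copied from the tables above (titles left truncated as displayed):

```python
import pandas as pd

# Rows copied from the scored table above; titles are truncated as displayed there.
df = pd.DataFrame({
    "Title": [
        "VIDEO: NFL hall of famer Brett Favre questione...",
        "Father charged after 4-year-old allegedly brou...",
        "Argentine vice president survives assassinatio...",
        "New Covid boosters, which target BA.5, haven't...",
    ],
    "Compound Score": [-0.5106, -0.4939, -0.743, 0.2732],
})

# Keep negative rows only, then sort ascending so the most negative comes first.
negative_df = df[df["Compound Score"] < 0].sort_values("Compound Score")
print(negative_df)
```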


In [6]:
positive_df = df.loc[df['Compound Score'] > 0]
print("Here are all the positive news story headlines: ")
positive_df

Here are all the positive news story headlines: 


Index,Title,Negative,Positive,Compound Score
7,"New Covid boosters, which target BA.5, haven't...",0.0,0.116,0.2732
14,How monkeypox spoiled gay men's plans for an i...,0.0,0.262,0.4939
25,TODAY FoodThe great Gatorade debate: What colo...,0.0,0.291,0.6249
31,Courtesy of LuisLatinoHe served the U.S. but w...,0.0,0.099,0.1901
37,CDC recommends Pfizer's and Moderna's new Covi...,0.0,0.213,0.2263
38,"Driver accused of dragging woman, 78, out of c...",0.105,0.129,0.128
43,New Orleans Saints safety Marcus Maye accused ...,0.122,0.156,0.1531
45,Getty ImagesCulture MattersInfluencer Oli Lond...,0.0,0.278,0.6124
46,Culture MattersNetflix agrees to give Iñárritu...,0.0,0.114,0.2023
51,TODAYWhat are Serena Williams' chances to win ...,0.0,0.412,0.6808
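
When only the strongest headlines matter, `DataFrame.nlargest` (and its counterpart `nsmallest` for the negative table) returns the top rows by a column without a separate sort step. A sketch on five rows copied from the positive table above:

```python
import pandas as pd

# Rows copied from the positive-headline table above; titles truncated as displayed.
df = pd.DataFrame({
    "Title": [
        "New Covid boosters, which target BA.5, haven't...",
        "How monkeypox spoiled gay men's plans for an i...",
        "TODAY FoodThe great Gatorade debate: What colo...",
        "Courtesy of LuisLatinoHe served the U.S. but w...",
        "TODAYWhat are Serena Williams' chances to win ...",
    ],
    "Compound Score": [0.2732, 0.4939, 0.6249, 0.1901, 0.6808],
})

# The three most positive headlines, highest compound score first.
top = df[df["Compound Score"] > 0].nlargest(3, "Compound Score")
print(top)
```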
