## BlogMe: Sentiment and Keyword Analysis

### Tableau public link: https://public.tableau.com/app/profile/shravani.mahadeshwar/viz/BlogMeSentimentAnalysis_16845658940930/Dashboard1?publish=yes

### Importing necessary libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

### Importing data

In [2]:
data = pd.read_excel("articles.xlsx")

In [3]:
data.head()

Unnamed: 0,article_id,source_id,source_name,author,title,description,url,url_to_image,published_at,content,top_article,engagement_reaction_count,engagement_comment_count,engagement_share_count,engagement_comment_plugin_count
0,0,reuters,Reuters,Reuters Editorial,NTSB says Autopilot engaged in 2018 California...,The National Transportation Safety Board said ...,https://www.reuters.com/article/us-tesla-crash...,https://s4.reutersmedia.net/resources/r/?m=02&...,2019-09-03T16:22:20Z,WASHINGTON (Reuters) - The National Transporta...,0.0,0.0,0.0,2528.0,0.0
1,1,the-irish-times,The Irish Times,Eoin Burke-Kennedy,Unemployment falls to post-crash low of 5.2%,Latest monthly figures reflect continued growt...,https://www.irishtimes.com/business/economy/un...,https://www.irishtimes.com/image-creator/?id=1...,2019-09-03T10:32:28Z,The States jobless rate fell to 5.2 per cent l...,0.0,6.0,10.0,2.0,0.0
2,2,the-irish-times,The Irish Times,Deirdre McQuillan,"Louise Kennedy AW2019: Long coats, sparkling t...",Autumn-winter collection features designer’s g...,https://www.irishtimes.com/\t\t\t\t\t\t\t/life...,https://www.irishtimes.com/image-creator/?id=1...,2019-09-03T14:40:00Z,Louise Kennedy is showing off her autumn-winte...,1.0,,,,
3,3,al-jazeera-english,Al Jazeera English,Al Jazeera,North Korean footballer Han joins Italian gian...,Han is the first North Korean player in the Se...,https://www.aljazeera.com/news/2019/09/north-k...,https://www.aljazeera.com/mritems/Images/2019/...,2019-09-03T17:25:39Z,"Han Kwang Song, the first North Korean footbal...",0.0,0.0,0.0,7.0,0.0
4,4,bbc-news,BBC News,BBC News,UK government lawyer says proroguing parliamen...,"The UK government's lawyer, David Johnston arg...",https://www.bbc.co.uk/news/av/uk-scotland-4956...,https://ichef.bbci.co.uk/news/1024/branded_new...,2019-09-03T14:39:21Z,,0.0,0.0,0.0,0.0,0.0


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10437 entries, 0 to 10436
Data columns (total 15 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   article_id                       10437 non-null  int64  
 1   source_id                        10437 non-null  object 
 2   source_name                      10437 non-null  object 
 3   author                           9417 non-null   object 
 4   title                            10435 non-null  object 
 5   description                      10413 non-null  object 
 6   url                              10436 non-null  object 
 7   url_to_image                     9781 non-null   object 
 8   published_at                     10436 non-null  object 
 9   content                          9145 non-null   object 
 10  top_article                      10435 non-null  float64
 11  engagement_reaction_count        10319 non-null  float64
 12  engagement_comment

In [5]:
data.describe()

Unnamed: 0,article_id,top_article,engagement_reaction_count,engagement_comment_count,engagement_share_count,engagement_comment_plugin_count
count,10437.0,10435.0,10319.0,10319.0,10319.0,10319.0
mean,5218.0,0.122089,381.39529,124.032949,196.236263,0.011629
std,3013.046714,0.327404,4433.344792,965.351188,1020.680229,0.268276
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,2609.0,0.0,0.0,0.0,1.0,0.0
50%,5218.0,0.0,1.0,0.0,8.0,0.0
75%,7827.0,0.0,43.0,12.0,47.5,0.0
max,10436.0,1.0,354132.0,48490.0,39422.0,15.0


In [6]:
data = data.drop(['engagement_comment_plugin_count'], axis=1)

In [7]:
data.columns

Index(['article_id', 'source_id', 'source_name', 'author', 'title',
       'description', 'url', 'url_to_image', 'published_at', 'content',
       'top_article', 'engagement_reaction_count', 'engagement_comment_count',
       'engagement_share_count'],
      dtype='object')

In [8]:
# Number of articles per source
data.groupby(['source_id'])['source_name'].count()

source_id
1                             1
abc-news                   1139
al-jazeera-english          499
bbc-news                   1242
business-insider           1048
cbs-news                    952
cnn                        1132
espn                         82
newsweek                    539
reuters                    1252
the-irish-times            1232
the-new-york-times          986
the-wall-street-journal     333
Name: source_name, dtype: int64

In [9]:
# Total number of reactions per publisher
data.groupby(['source_id'])['engagement_reaction_count'].sum()

source_id
1                                0.0
abc-news                    343779.0
al-jazeera-english          140410.0
bbc-news                    545396.0
business-insider            216545.0
cbs-news                    459741.0
cnn                        1218206.0
espn                             0.0
newsweek                     93167.0
reuters                      16963.0
the-irish-times              26838.0
the-new-york-times          790449.0
the-wall-street-journal      84124.0
Name: engagement_reaction_count, dtype: float64
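
The two per-source summaries above can also be computed in a single pass. The following is a minimal sketch using pandas named aggregation on a small made-up frame; the sample rows and the output column names `articles` and `reactions` are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Toy rows standing in for the articles data (values are made up)
sample = pd.DataFrame({
    "source_id": ["cnn", "cnn", "reuters", "espn"],
    "source_name": ["CNN", "CNN", "Reuters", "ESPN"],
    "engagement_reaction_count": [10.0, 5.0, 2.0, None],
})

# One groupby, two aggregations: article count and total reactions
summary = sample.groupby("source_id").agg(
    articles=("source_name", "count"),
    reactions=("engagement_reaction_count", "sum"),
)
print(summary)
```

Having both figures side by side makes it easier to see that a source's total reactions depend partly on how many articles it published.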

In [10]:
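def wordflag(keyword):
    """Return a list of 0/1 flags: 1 where the article title contains keyword."""
    keyword_flag = []
    for heading in data['title']:
        # Missing titles are NaN (a float), so only string titles are searched
        if isinstance(heading, str) and keyword in heading:
            keyword_flag.append(1)
        else:
            keyword_flag.append(0)
    return keyword_flag

# Flag titles that mention "crash" (case-sensitive substring match)
k1 = wordflag("crash")
data['Keyword_Flag'] = pd.Series(k1)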
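
The same keyword flag can probably be built without an explicit loop. This is a minimal sketch using `Series.str.contains` on a toy series; the `titles` values below are stand-ins for `data['title']`, not rows from the dataset.

```python
import pandas as pd

# Toy titles standing in for data['title']; None mimics a missing title
titles = pd.Series(["Unemployment falls to post-crash low", "Walking Free", None])

# regex=False does a plain substring search; na=False counts missing titles as non-matches
flags = titles.str.contains("crash", regex=False, na=False).astype(int)
print(flags.tolist())
```

Applied to the notebook's frame, the equivalent would be `data['title'].str.contains("crash", regex=False, na=False).astype(int)`. Like the loop version, the match is case-sensitive.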

### VADER Sentiment Analysis

In [11]:
#!pip install vaderSentiment

In [12]:
# Score each title with VADER's SentimentIntensityAnalyzer
neg_title = []
pos_title = []
neu_title = []
# Create the analyzer once instead of rebuilding it for every row
sent_intensity = SentimentIntensityAnalyzer()
for text in data['title']:
    if isinstance(text, str):
        sent = sent_intensity.polarity_scores(text)
        neg = sent['neg']
        pos = sent['pos']
        neu = sent['neu']
    else:
        # Missing or non-string titles get 0 in every category
        neg = 0
        pos = 0
        neu = 0
    neg_title.append(neg)
    pos_title.append(pos)
    neu_title.append(neu)

In [13]:
neg_title = pd.Series(neg_title)
pos_title = pd.Series(pos_title)
neu_title = pd.Series(neu_title)
data['title_neg_sentiment'] = neg_title
data['title_pos_sentiment'] = pos_title
# Holds the neutral ('neu') score; note the column name is spelled 'new'
data['title_new_sentiment'] = neu_title
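
The scoring and column-building steps could also be wrapped in one helper. The sketch below is an illustration only: `score_titles` and `fake_scorer` are hypothetical names, and the stand-in scorer exists so the example runs without vaderSentiment installed. It returns a dict with the `neg`, `neu`, `pos`, and `compound` keys that VADER's `polarity_scores` produces.

```python
import pandas as pd

def score_titles(titles, scorer):
    """Build neg/pos/neu columns by applying scorer to each string title."""
    # Non-string titles (e.g. NaN) get 0 in every category, as in the loop version
    zero = {"neg": 0.0, "pos": 0.0, "neu": 0.0}
    rows = [scorer(t) if isinstance(t, str) else zero for t in titles]
    return pd.DataFrame(rows, index=titles.index)[["neg", "pos", "neu"]]

# Hypothetical stand-in scorer that treats every title as fully neutral
def fake_scorer(text):
    return {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0}

demo = score_titles(pd.Series(["Walking Free", None]), fake_scorer)
print(demo)
```

In the notebook, the real scorer would be passed instead, for example `score_titles(data['title'], SentimentIntensityAnalyzer().polarity_scores)`.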

In [14]:
data.tail(10)

Unnamed: 0,article_id,source_id,source_name,author,title,description,url,url_to_image,published_at,content,top_article,engagement_reaction_count,engagement_comment_count,engagement_share_count,Keyword_Flag,title_neg_sentiment,title_pos_sentiment,title_new_sentiment
10427,10427,cnn,CNN,"Dakin Andone, CNN","30,000 Chicago school staff will strike on Oct...","More than 30,000 Chicago Public Schools teache...",https://www.cnn.com/2019/10/03/us/chicago-teac...,https://cdn.cnn.com/cnnnext/dam/assets/1910030...,2019-10-03T13:21:27Z,"(CNN)More than 30,000 Chicago Public Schools t...",0.0,17.0,4.0,33.0,0,0.091,0.0,0.909
10428,10428,abc-news,ABC News,Melanie Curtsinger,Disney Cruise Line Begins Sailings from New Yo...,"Just last week, the Disney Magic arrived to Ne...",https://disneyparks.disney.go.com/blog/2019/10...,https://cdn1.parksmedia.wdprapps.disney.com/me...,2019-10-03T17:00:59Z,"Just last week, the Disney Magic arrived to Ne...",0.0,405.0,71.0,62.0,0,0.0,0.0,1.0
10429,10429,cbs-news,CBS News,CBS News,Picturing male breast cancer,Photographer David Jay portrays breast cancer ...,https://www.cbsnews.com/pictures/picturing-mal...,https://cbsnews3.cbsistatic.com/hub/i/r/2014/0...,2019-10-03T14:53:44Z,For his latest installation of the SCAR Projec...,0.0,0.0,0.0,0.0,0,0.595,0.0,0.405
10430,10430,cbs-news,CBS News,CBS News,Walking Free,"Three murders, three trials, 13 years of repor...",https://www.cbsnews.com/video/walking-free-2/,https://cbsnews2.cbsistatic.com/hub/i/r/2019/1...,2019-10-03T16:41:36Z,,1.0,0.0,0.0,0.0,0,0.0,0.767,0.233
10431,10431,business-insider,Business Insider,Akin Oyedele,GOLDMAN SACHS: Buy these 11 stocks poised to s...,Goldman Sachs has updated its quarterly list o...,https://www.businessinsider.com/stock-picks-fr...,https://image.businessinsider.com/5d9617d6d598...,2019-10-03T17:39:25Z,There are numerous bargains waiting to be pick...,0.0,2.0,0.0,1.0,0,0.0,0.111,0.889
10432,10432,abc-news,ABC News,The Associated Press,Drop in US service sector activity raises econ...,"Get breaking national and world news, broadcas...",https://abcnews.go.com/Business/wireStory/drop...,https://s.abcnews.com/images/US/WireAP_eb147c9...,2019-10-03T16:30:16Z,Growth in the U.S. economys vast services sect...,0.0,0.0,0.0,0.0,0,0.208,0.0,0.792
10433,10433,reuters,Reuters,Sumeet Chatterjee,Banker defections pose challenge for Credit Su...,The announcement by Julius Baer this week that...,https://www.reuters.com/article/us-credit-suis...,https://s3.reutersmedia.net/resources/r/?m=02&...,2019-10-03T15:59:52Z,ZURICH/HONG KONG (Reuters) - The announcement ...,0.0,0.0,0.0,627.0,0,0.219,0.342,0.439
10434,10434,cnn,CNN,"Lauren M. Johnson, CNN","A 5-year-old cancer survivor donates 3,000 toy...",Weston Newswanger is just a normal 5-year-old ...,https://www.cnn.com/2019/10/03/us/five-year-ol...,https://cdn.cnn.com/cnnnext/dam/assets/1910021...,2019-10-03T11:20:06Z,,0.0,4072.0,179.0,466.0,0,0.221,0.126,0.653
10435,10435,cbs-news,CBS News,CBS News,Fateful Connection,A detective is haunted by the case of two wome...,https://www.cbsnews.com/video/fateful-connecti...,https://cbsnews1.cbsistatic.com/hub/i/r/2019/1...,2019-10-03T16:40:03Z,,0.0,0.0,0.0,0.0,0,0.0,0.0,1.0
10436,10436,cbs-news,CBS News,CBS News,"Love, Hate & Obsession",Who wanted one-time millionaire Lanny Horwitz ...,https://www.cbsnews.com/video/love-hate-obsess...,https://cbsnews1.cbsistatic.com/hub/i/r/2017/0...,2019-10-03T16:35:13Z,,0.0,0.0,0.0,0.0,0,0.54,0.372,0.088


In [15]:
# Export the data
data.to_excel('Blog_Me_data.xlsx', sheet_name='data', index=False)