### News Mood Sentiment Analysis

#### By: Naser Erwemi

https://www.linkedin.com/in/aerwemi/

https://github.com/aerwemi

## Background

__Twitter__ has become a wildly sprawling jungle of information, 140 characters at a time. Somewhere between 350 million and 500 million tweets are estimated to be sent out _per day_. With such an explosion of data, on Twitter and elsewhere, it becomes more important than ever to tame it in some way, to concisely capture the essence of the data.

## News Mood

A Python script that performs sentiment analysis of the Twitter activity of various news outlets and presents the findings visually.

Your final output should provide a visualized summary of the sentiments expressed in Tweets sent out by the following news organizations: __BBC, CBS, CNN, Fox, New York Times, and The Washington Post__.



The first plot will meet the following requirements:

* It is a scatter plot of the sentiments of the last __100__ tweets sent out by each news organization, ranging from -1.0 to 1.0, where a score of 0 expresses a neutral sentiment, -1 the most negative sentiment possible, and +1 the most positive sentiment possible.
* Each plot point reflects the _compound_ sentiment of a tweet.
* Plot points are sorted by their relative timestamp.
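
The ordering step for the scatter plot can be sketched with the standard library alone. This is a minimal illustration, not the notebook's own plotting code: `scatter_points` is a hypothetical helper, the dictionary keys `Date` and `comScore` mirror the fields stored later in this notebook, and every sample value except the first tweet's is made up.

```python
from datetime import datetime

# Format of Twitter's `created_at` field, e.g. "Thu Jul 12 14:47:18 +0000 2018"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def scatter_points(tweets):
    """Return (tweets_ago, compound) pairs with the newest tweet at index 0."""
    ordered = sorted(
        tweets,
        key=lambda t: datetime.strptime(t["Date"], TWITTER_TIME_FORMAT),
        reverse=True,  # newest first, so the index counts "tweets ago"
    )
    return [(index, t["comScore"]) for index, t in enumerate(ordered)]


# Illustrative data only; the second and third rows are invented.
sample = [
    {"Date": "Thu Jul 12 14:47:18 +0000 2018", "comScore": 0.6597},
    {"Date": "Thu Jul 12 16:02:00 +0000 2018", "comScore": -0.25},
    {"Date": "Wed Jul 11 09:30:00 +0000 2018", "comScore": 0.0},
]
points = scatter_points(sample)
print(points)
```

Each pair can then be passed to a scatter call, with the first element on the x-axis and the compound score on the y-axis.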

The second plot will be a bar plot visualizing the _overall_ sentiments of the last 100 tweets from each organization. For this plot, you will again aggregate the compound sentiments analyzed by VADER.
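
The text does not say how the compound scores are aggregated. One plausible reading, assumed here, is the arithmetic mean per outlet. The helper name `mean_compound_by_outlet` and the sample scores are hypothetical; the keys `NewsSource` and `comScore` follow the fields used later in this notebook.

```python
from collections import defaultdict
from statistics import mean


def mean_compound_by_outlet(tweets):
    """Group compound scores by outlet and average each group."""
    grouped = defaultdict(list)
    for tweet in tweets:
        grouped[tweet["NewsSource"]].append(tweet["comScore"])
    return {outlet: mean(scores) for outlet, scores in grouped.items()}


# Illustrative scores only, not real results.
sample = [
    {"NewsSource": "BBC", "comScore": 0.5},
    {"NewsSource": "BBC", "comScore": 0.25},
    {"NewsSource": "CNN", "comScore": -0.5},
    {"NewsSource": "CNN", "comScore": 0.0},
]
overall = mean_compound_by_outlet(sample)
print(overall)
```

The resulting dictionary maps directly onto a bar plot: one bar per key, with the mean as the bar height.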

The tools of the trade you will need for your task as a data analyst include the following: tweepy, pandas, matplotlib, seaborn, textblob, and VADER.

Included analysis:

* Pull the last 100 tweets from each outlet.
* Run sentiment analysis with compound, positive, neutral, and negative scoring for each tweet.
* Build a DataFrame holding each tweet's source account, its text, its date, and its compound, positive, neutral, and negative sentiment scores.
* Export the DataFrame to a CSV file.
* Save PNG images for each plot.
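
As a rough sketch of the export step, the snippet below writes tweet records to CSV. It deliberately swaps in the standard-library `csv` module for the pandas DataFrame named above so that it runs without extra dependencies. The helper `tweets_to_csv`, the column order in `FIELDS`, and the sample text are assumptions made for illustration.

```python
import csv
import io

# Assumed column order; the names match the fields stored later in this notebook.
FIELDS = ["NewsSource", "Date", "Text", "comScore", "posScore", "neuScore", "negScore"]


def tweets_to_csv(tweets, out):
    """Write tweet dicts to a file-like object, ignoring keys not in FIELDS."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(tweets)


# Illustrative row; the text is invented to show that commas get quoted.
rows = [{
    "NewsSource": "BBC",
    "Date": "Thu Jul 12 14:47:18 +0000 2018",
    "Text": "example tweet, with a comma",
    "comScore": 0.6597, "posScore": 0.31, "neuScore": 0.512, "negScore": 0.178,
}]
buffer = io.StringIO()
tweets_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")`. The `extrasaction="ignore"` setting drops keys such as MongoDB's `_id` that are not listed in `FIELDS`.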

# Study Summary and Conclusions


BBC and CBS have far fewer tweets and followers than the other outlets.

BBC and CBS tweets are more positive than those of the other outlets. For most outlets, scores are roughly normally distributed, with most tweets scoring neutral.

BBC and CBS have similar polarity distributions.

CNN, Fox, New York Times, and The Washington Post have similar polarity distributions.

CNN, Fox, New York Times, and The Washington Post are more negative than BBC and CBS, with distributions slightly skewed toward the negative side.

Time series analysis may help explain tweet frequency, since BBC and CBS tweet less often than the other outlets.
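
A first step toward such a time series analysis could be to count tweets per outlet per day. The sketch below is one possible approach, not something this study performed: `tweets_per_day` is a hypothetical helper, and all sample rows except the BBC timestamp are invented.

```python
from collections import Counter
from datetime import datetime

# Format of Twitter's `created_at` field, e.g. "Thu Jul 12 14:47:18 +0000 2018"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_day(tweets):
    """Count tweets for each (outlet, ISO date) pair."""
    counts = Counter()
    for tweet in tweets:
        day = datetime.strptime(tweet["Date"], TWITTER_TIME_FORMAT).date()
        counts[(tweet["NewsSource"], day.isoformat())] += 1
    return counts


# Illustrative timestamps only.
sample = [
    {"NewsSource": "BBC", "Date": "Thu Jul 12 14:47:18 +0000 2018"},
    {"NewsSource": "CNN", "Date": "Thu Jul 12 15:00:00 +0000 2018"},
    {"NewsSource": "CNN", "Date": "Thu Jul 12 18:30:00 +0000 2018"},
    {"NewsSource": "CNN", "Date": "Fri Jul 13 08:00:00 +0000 2018"},
]
daily = tweets_per_day(sample)
print(daily)
```

Comparing these daily counts across outlets would show directly how much less often one account tweets than another.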


# MongoDB

In [56]:
import pymongo
import time

In [57]:
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

In [58]:
client

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

In [59]:
# Declare the database
db = client.NewsMood2
collection = db.NewsMood2

In [63]:
import tweepy
from config import consumer_key, consumer_secret, access_token, access_token_secret

# Import and Initialize Sentiment Analyzer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

In [66]:
analyzer.polarity_scores('that is so good')

{'compound': 0.5777, 'neg': 0.0, 'neu': 0.445, 'pos': 0.555}
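
To turn a compound score like the one above into a label, the VADER documentation suggests typical cutoffs of 0.05 and -0.05. The helper below is a small sketch using those cutoffs; the name `label_compound` and its `threshold` parameter are hypothetical and are not part of the vaderSentiment API.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# 0.5777 is the compound score shown above; the other two values are illustrative.
labels = [label_compound(score) for score in (0.5777, 0.0, -0.3)]
print(labels)
```

The threshold is kept as a parameter because the cutoff is a convention and not a fixed rule, and a study of news tweets may reasonably choose a wider neutral band.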

In [64]:
# Setup Tweepy API Authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())

In [68]:
# List of the news outlets on Twitter

#news_outlets=['CNN', 'FoxNews' , 'nytimes', 'washingtonpost'] # added WP as it is similar to nytimes
#news_outlets=['BBC', 'CBS', 'CNN', 'FoxNews' , 'nytimes', 'washingtonpost']
news_outlets=['BBC']
counter=0

# Create a dict to hold extracted data
#sentiment_scores={}

# Delete previously stored tweets so each run starts from an empty collection
collection.delete_many({})

for news_outlet in news_outlets:


    # Loop over timeline pages; range(1,2) requests only the first page
    time.sleep(1)
    for tweet_page in range(1,2):

        # Get the tweets from this outlet's timeline (for each page specified)
        time.sleep(1)
        public_tweets = api.user_timeline(news_outlet, page=tweet_page)

        # Loop through all tweets
        time.sleep(1)
        
        for tweet in public_tweets:
            

            tweetText = tweet["text"]
            tweetDate = tweet['created_at']

            # Score the tweet once and reuse the result for all four fields
            scores = analyzer.polarity_scores(tweetText)
            comScore = scores['compound']
            negScore = scores['neg']
            posScore = scores['pos']
            neuScore = scores['neu']
            
            
            thisTweet ={
                'NewsSource' :news_outlet,
                "Date"       :tweetDate,
                "Text"       :tweetText,
                "comScore"   :comScore,
                "negScore"   :negScore,
                "posScore"   :posScore,
                "neuScore"   :neuScore
            }
            
            collection.insert_one(thisTweet)
            
            
            # print most neg news
            if negScore > .55:
                print(tweetText)
                print(news_outlet)
                print(f'Negative News Score (0 to 1) : {negScore}')

                print(":-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(:-(")
            # print most pos news
            if posScore > .55:
                print(tweetText)
                print(news_outlet)
                print(f'Positive News Score (0 to 1): {posScore}')

                print(":-):-):-):-):-):-):-):-):-):-):-):-):-):-):-):-):-):-):-):-):-):-):-):-)")
                
            time.sleep(1)
            counter+=1
print(f'Tweets extracted {counter}')
print('**************************************************************************************************')
thisTweet

Tweets extracted 20
**************************************************************************************************


{'Date': 'Thu Jul 12 14:47:18 +0000 2018',
 'NewsSource': 'BBC',
 'Text': 'RT @BBCOne: Love and trust, shattered by secrets and lies. Keeping Faith from @BBCWales starring @TeamEveMyles starts tonight at 9pm on BBC…',
 '_id': ObjectId('5b4932d03a8902249c09e077'),
 'comScore': 0.6597,
 'negScore': 0.178,
 'neuScore': 0.512,
 'posScore': 0.31}