## Sentiment Analysis Demo

This notebook presents, in visual form, a sentiment analysis of key buzzwords. Each buzzword represents a sector in which startups operate.

#### The outputs of this analysis are:

- a snapshot of the general interest in specific sectors, based on Twitter and news content, and
- a time series of the interest in those sectors over the past 30 days, based on news content only


In [18]:
import pandas as pd
import sentiment_pipeline as sp
import configuration

### Configure Inputs
For ease of setup, a configuration script (the configuration module imported above) takes "ConfigurationData" as input.

"ConfigurationData" is a CSV file containing the credentials for the News and Twitter APIs. configuration.config() reads it and returns the relevant keys and connections.
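The layout of "ConfigurationData" is not documented here, so the sketch below assumes a hypothetical one-credential-per-row CSV with columns service,key,value. The function name read_credentials is also hypothetical and is not the notebook's configuration.config(). The sketch only shows how such a file could be turned into the two dictionaries, tw and news, that later cells index with consumer_key, consumer_secret, access_token_key, access_token_secret and secret_api.

```python
import csv
import io

# Hypothetical CSV layout (an assumption): one credential per row, columns service,key,value
EXAMPLE_CSV = """service,key,value
twitter,consumer_key,CK
twitter,consumer_secret,CS
twitter,access_token_key,ATK
twitter,access_token_secret,ATS
news,secret_api,NEWS_KEY
"""

# Keys the notebook cells below read from tw and news
REQUIRED = {
    "twitter": {"consumer_key", "consumer_secret", "access_token_key", "access_token_secret"},
    "news": {"secret_api"},
}


def read_credentials(csv_file):
    """Return (tw, news) credential dicts read from a CSV file object."""
    creds = {"twitter": {}, "news": {}}
    for row in csv.DictReader(csv_file):
        service = row["service"].strip().lower()
        if service in creds:
            creds[service][row["key"].strip()] = row["value"].strip()
    # Fail early if a credential used later in the notebook is missing
    for service, keys in REQUIRED.items():
        missing = keys - set(creds[service])
        if missing:
            raise KeyError(f"{service} credentials missing: {sorted(missing)}")
    return creds["twitter"], creds["news"]


tw, news = read_credentials(io.StringIO(EXAMPLE_CSV))
print(tw["consumer_key"], news["secret_api"])
```

With a real file, the same function would be called as read_credentials(open("ConfigurationData.csv", newline="")), where the file name is also an assumption.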

We selected the buzzwords below because they are realistic "niche" sectors in which new disruptive startups could be operating.

In [19]:
tw, news = configuration.config()

In [20]:
buzzwords = ['Gamification', 'Social shopping', 'The "quantified self"']

In [21]:
buzzwords

['Gamification', 'Social shopping', 'The "quantified self"']

### News and Twitter Sentiment Analysis

#### Name of source code file: "sentiment_pipeline.py"
The sentiment analysis gives the assessor a view of the sentiment towards each sector/buzzword on Twitter and in the news. Sentiment is measured with a polarity score ranging from -1 to 1: a tweet or news item scoring -1 is strongly negative, 0 is neutral and 1 is strongly positive.
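To make the -1 to 1 scale concrete, the sketch below summarises a list of per-item polarity scores into an average and a count of negative, neutral and positive items. TextBlob, listed among the dependencies, exposes such a per-text score as TextBlob(text).sentiment.polarity, but the source does not confirm how sentiment_pipeline.py computes or aggregates its scores. The function names and the ±0.05 neutral band are hypothetical choices for illustration.

```python
from statistics import mean


def polarity_label(score, neutral_band=0.05):
    """Map a polarity score in [-1, 1] to a label; the neutral band is an assumption."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity must lie in [-1, 1], got {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def summarise_polarity(scores):
    """Return the average polarity and the number of items per label."""
    counts = {"negative": 0, "neutral": 0, "positive": 0}
    for score in scores:
        counts[polarity_label(score)] += 1
    return mean(scores), counts


# Illustrative scores only, not results from this notebook
avg, counts = summarise_polarity([-0.8, 0.0, 0.3, 0.6])
print(round(avg, 3), counts)
```

Reporting the counts alongside the mean helps distinguish a genuinely neutral sector from one where strongly positive and strongly negative content cancel out.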

The model is run on the latest 100 tweets (in English, no older than 1 week) for a given sector/buzzword, and on 100 news articles from the past month. To change these parameters, amend the function news_test_set_content(string_buzzword, secret_api) in sentiment_pipeline.py; the available parameters are described in the News API documentation: https://newsapi.org/docs/endpoints/everything
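As an illustration of those parameters, the sketch below builds a request URL for the News API /v2/everything endpoint covering the past month and asking for 100 results (the maximum pageSize per request). It is not the code of news_test_set_content; the function name build_everything_url and the choices language="en" and sortBy="publishedAt" are assumptions.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

NEWS_API_EVERYTHING = "https://newsapi.org/v2/everything"


def build_everything_url(buzzword, api_key, days=30, today=None, page_size=100):
    """Build a /v2/everything URL for articles mentioning `buzzword` (hypothetical helper)."""
    today = today or date.today()
    params = {
        "q": buzzword,
        "from": (today - timedelta(days=days)).isoformat(),  # oldest publication date
        "language": "en",          # assumption: English articles only
        "sortBy": "publishedAt",   # assumption: newest articles first
        "pageSize": page_size,     # News API returns at most 100 results per request
        "apiKey": api_key,
    }
    return f"{NEWS_API_EVERYTHING}?{urlencode(params)}"


print(build_everything_url("Gamification", "YOUR_KEY", today=date(2024, 1, 31)))
```

The resulting URL could be passed to requests.get(url).json(); News API returns the matching items under the "articles" key of that JSON response.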

To use the sentiment analysis model, the following steps should be taken:
1. Obtain News API key (https://newsapi.org/)
2. Obtain Twitter API keys and tokens; there are two keys and two tokens (https://developer.twitter.com/en/docs)
3. Install the following packages if they are not already installed:
    - python-twitter (https://pypi.org/project/python-twitter/)
    - NLTK (https://pypi.org/project/nltk/), then run: nltk.download('popular')
    - pandas, numpy, requests, TextBlob, plotly, matplotlib and dateutil (https://pypi.org/project/python-dateutil/1.4/); time, re and datetime are part of the Python standard library and need no installation
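Before running the notebook, a quick way to see which of the packages above are still missing is to ask the import system whether each module can be found, without importing it. The sketch below uses importlib.util.find_spec; the list maps each package to its import name (python-twitter imports as twitter, python-dateutil as dateutil), and the helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names of the third-party packages listed above
REQUIRED_MODULES = ["twitter", "nltk", "pandas", "numpy", "requests",
                    "textblob", "plotly", "matplotlib", "dateutil"]


def missing_modules(names):
    """Return the top-level module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing:", missing or "none")
```

This only checks that the packages are installed; the NLTK data downloaded with nltk.download('popular') is not covered by this check.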

#### Warning: only a limited number of requests is available (500/day)
#### When the limit is reached, the code will stop working and you will have to wait 12 hours before it can be used again
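To avoid wasting requests, one option is a small client-side counter that refuses to call the API once a quota has been used within a time window. The sketch below is a hypothetical RequestBudget class using a sliding window; it is not part of sentiment_pipeline.py, and the API's own reset rules are enforced server-side and may differ from this window.

```python
import time


class RequestBudget:
    """Hypothetical client-side guard that refuses calls once a quota is used up."""

    def __init__(self, limit=500, window_seconds=24 * 3600, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.calls = []  # timestamps of requests made inside the current window

    def consume(self):
        """Record one request and return how many remain; raise if the quota is used up."""
        now = self.clock()
        # Forget requests older than the window (simple sliding window)
        self.calls = [t for t in self.calls if now - t < self.window]
        if len(self.calls) >= self.limit:
            raise RuntimeError("Request quota reached; wait before calling the API again")
        self.calls.append(now)
        return self.limit - len(self.calls)
```

In use, budget = RequestBudget() would be created once and budget.consume() called immediately before each News API request.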

#### 5.1 Obtain Polarity Scores

In [13]:
# Compute Twitter and news polarity for each buzzword
polarity = []
for buzzword in buzzwords:
    polarity.append(sp.polarity_twitter_news(buzzword,
                                             tw['consumer_key'], tw['consumer_secret'],
                                             tw['access_token_key'], tw['access_token_secret'],
                                             news['secret_api']))

#### 5.2 Polarity Visualisation

In [14]:
sp.visualisation_polarity_each_buzzword(polarity, buzzwords)

#### 5.3 Polarity Level Check
The following code flags any sector whose news polarity level is too low and, in that case, provides the assessor with a list of news articles with negative content.

In [15]:
sp.news_flag_low_polarity(buzzwords, polarity, news['secret_api'])

 News Polarity level for Gamification sector is fine
 News Polarity level for Social shopping sector is fine
 News Polarity level for The "quantified self" sector is fine
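The flagging step above can be sketched as a threshold check. The source does not state what counts as "too low", so the threshold of 0.0, the input shapes (a mean news polarity per buzzword and a list of (title, polarity) pairs per buzzword) and the function name are assumptions; only the "is fine" message mirrors the output printed above.

```python
def flag_low_polarity(news_polarity, articles, threshold=0.0):
    """
    Hypothetical sketch of a low-polarity check.

    news_polarity: {buzzword: mean news polarity}
    articles: {buzzword: [(title, polarity), ...]}
    Returns (messages, negative_articles), where negative_articles lists
    the titles with polarity < 0 for each flagged buzzword.
    """
    messages, negatives = [], {}
    for buzzword, score in news_polarity.items():
        if score < threshold:
            messages.append(f"News Polarity level for {buzzword} sector is low ({score:.2f})")
            negatives[buzzword] = [title for title, p in articles.get(buzzword, []) if p < 0]
        else:
            messages.append(f"News Polarity level for {buzzword} sector is fine")
    return messages, negatives


# Placeholder inputs for illustration only
msgs, negative_articles = flag_low_polarity(
    {"Gamification": 0.12, "Social shopping": -0.2},
    {"Social shopping": [("Article A", -0.5), ("Article B", 0.3)]},
)
for line in msgs:
    print(line)
```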


#### 5.4 Time Series Visualisation (intended for News API Pro)

The free News API plan has two restrictions that affect this step:

- Limited usage: only a limited number of requests is available per day, so running the code below too often will exhaust the quota.
- Limited history: news older than 1 month cannot be fetched.

News API Pro lifts these restrictions.

To look at how polarity has evolved over the past days, the user is first asked to choose a number of days:

In [16]:

# Re-prompt until the user enters an integer between 1 and 30
while True:
    try:
        duration_days = int(input('Choose number of days for polarity evolution (Max 30 days): '))
    except ValueError:
        print('You must enter an integer')
        continue
    if 1 <= duration_days <= 30:
        break
    print('Enter a number between 1 and 30, otherwise the time series will not work')


Choose number of days for polarity evolution (Max 30 days): 30


In [17]:
sp.polarity_time_series_visualisation(buzzwords, duration_days, news['secret_api'])
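One way to build the time series in step 5.4 is to split the chosen period into one-day windows and score the news for each day separately. The sketch below only generates those (from, to) date pairs and enforces the 30-day limit of the free plan; the helper name daily_windows and the one-query-per-day approach are assumptions, not a description of polarity_time_series_visualisation.

```python
from datetime import date, timedelta

MAX_DAYS_FREE_PLAN = 30  # the free News API plan cannot fetch news older than about 1 month


def daily_windows(duration_days, today=None):
    """Return (day, next_day) pairs covering the last `duration_days` days, oldest first."""
    if not 1 <= duration_days <= MAX_DAYS_FREE_PLAN:
        raise ValueError(f"duration_days must be between 1 and {MAX_DAYS_FREE_PLAN}")
    today = today or date.today()
    start = today - timedelta(days=duration_days)
    return [(start + timedelta(days=i), start + timedelta(days=i + 1))
            for i in range(duration_days)]


for day_from, day_to in daily_windows(3, today=date(2024, 1, 31)):
    print(day_from, day_to)
```

Under this sketch's one-request-per-day-per-buzzword assumption, 3 buzzwords over 30 days would use 90 requests, which shows why repeated runs quickly approach the daily quota.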