# Analysing and Predicting Public Perception on Social Media

Understanding public perception of a brand can be a challenge. With the rise of social media, good data on public opinion about specific topics and brands is widely available, and Twitter is a particularly rich source of it. By scraping Twitter data, we will apply sentiment analysis to tweets about particular brands and topics. We will then train a Time Series model on past sentiment trends to help predict the future sentiment trajectory. (Nike vs Adidas: Twitter sentiment trend analysis + prediction)
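The pipeline described above is: score each tweet, aggregate the scores into a time series, then forecast. A miniature version can be sketched under some assumptions. Each tweet is assumed to already carry a VADER-style `compound` score in a column of that name. The helper names `daily_sentiment` and `linear_trend_forecast` are hypothetical. The straight-line fit is a naive stand-in, not the Time Series model this notebook builds.

```python
import numpy as np
import pandas as pd


def daily_sentiment(df, time_col="date", score_col="compound"):
    """Average per-tweet sentiment scores into one value per calendar day."""
    series = df.set_index(pd.to_datetime(df[time_col]))[score_col]
    return series.resample("D").mean().dropna()


def linear_trend_forecast(series, steps=3):
    """Fit a straight line to a daily series and extrapolate `steps` days ahead.

    Naive baseline only; a proper time series model would replace this.
    """
    x = np.arange(len(series))
    slope, intercept = np.polyfit(x, series.to_numpy(), deg=1)
    future_x = np.arange(len(series), len(series) + steps)
    future_index = pd.date_range(series.index[-1] + pd.Timedelta(days=1), periods=steps, freq="D")
    return pd.Series(slope * future_x + intercept, index=future_index)


# Made-up data: two tweets per day over four days
toy = pd.DataFrame({
    "date": pd.date_range("2019-07-20", periods=8, freq=pd.Timedelta(hours=12)),
    "compound": [0.1, 0.3, 0.2, 0.4, 0.3, 0.5, 0.4, 0.6],
})
daily = daily_sentiment(toy)
print(daily)
print(linear_trend_forecast(daily, steps=2))
```

The daily means here rise by 0.1 per day, so the extrapolated line simply continues that rise. Real sentiment series are noisier, which is why a dedicated time series model is worth the extra effort.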

> Importing our standard libraries and loading the autoreload extension.

In [1]:
import sys
sys.path.append("twint/")

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

%load_ext autoreload
%autoreload 2

# Scraping Twitter with TWINT

We'll begin by scraping Twitter with the TWINT module, since Twitter's standard search API is very limited. TWINT lets us search for target tweets by keyword, within a date range, and by many other criteria, with far fewer restrictions than the official API, so almost the entire Twittersphere is available to us. We can then perform sentiment analysis on the tweets we collect.
  
We've installed TWINT through the command line and appended it to our system path in the cell above. Next, we will import the module, set up its configuration, and start running queries.
  

In [3]:
# load TWINT and set up its configuration
import twint
c = twint.Config()

In [9]:
# Patch asyncio so TWINT can run inside Jupyter's already-running event loop (avoids RuntimeError).
import nest_asyncio
nest_asyncio.apply()

In [10]:
c.Search = "bitcoin"
c.Limit = 1 # TWINT fetches tweets in batches of 20, so a limit of 1 still returns one full batch of 20
c.Pandas = True
twint.run.Search(c)

1153788185316478976 2019-07-23 18:05:14 EDT <cryptowealth10> Russia Prepares to Test Cryptocurrencies in Four of Its Regions -   http://ht.ly/eUeT30ozSd0  #bitcoin #valuable #reality #ethereum #cryptocurrency #blockchain #altcoins #income #cryptomining #eth #btc #cryptonews #digitalmoney
1153788178278436865 2019-07-23 18:05:12 EDT <digitalcoinnewz> HEREISTITLE  https://www.digitalcoinnews.com/a6tyrqfg/  #cryptocurrency #bitcoin #cryptonews #btcnews
1153788172347691008 2019-07-23 18:05:11 EDT <1jl4com> Leo Dias surge “estranho” em programa e choca público: ..  @1jl4com - Metropoles - Twitter - News - Noticias - Bitcoin - CryptoCurrency  http://bit.ly/2M7oJ5e 
1153788168937725959 2019-07-23 18:05:10 EDT <CryptoTraderPro> Crypto Panic: Not Fake News: TD Ameritrade CEO Confirms REAL Demand for Bitcoin  http://dlvr.it/R8yRgr  🙋Crypto Cashflow via →  http://cashdaily.pro 
1153788155125854210 2019-07-23 18:05:07 EDT <1jl4com> Boris Johnson é eleito para ser próximo primeiro-ministro ..  @1jl4

### Great!
> We have tweets in our output! Now let's load them into a pandas DataFrame we can work with.

In [11]:
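# Helpers for TWINT's in-memory pandas output (filled in when c.Pandas = True)
def available_columns():
    """List the columns available in TWINT's tweet DataFrame."""
    return twint.output.panda.Tweets_df.columns

def twint_to_pandas(columns):
    """Return only the requested columns of TWINT's tweet DataFrame."""
    return twint.output.panda.Tweets_df[columns]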

In [12]:
# see what columns are available
available_columns()

Index(['cashtags', 'conversation_id', 'created_at', 'date', 'day', 'geo',
       'hashtags', 'hour', 'id', 'link', 'name', 'near', 'nlikes', 'nreplies',
       'nretweets', 'place', 'quote_url', 'retweet', 'search', 'timezone',
       'tweet', 'user_id', 'user_id_str', 'username'],
      dtype='object')

In [13]:
# create Pandas dataframe with desired columns
df = twint_to_pandas(['conversation_id', 'created_at', 'id', 'user_id', 'username', 'tweet', 'hashtags', 'date', 'day', 'nlikes', 'nretweets'])
print(df.shape)
df.head()

(20, 11)


|   | conversation_id | created_at | id | user_id | username | tweet | hashtags | date | day | nlikes | nretweets |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1153788185316478976 | 1563919514000 | 1153788185316478976 | 983317355899031552 | cryptowealth10 | Russia Prepares to Test Cryptocurrencies in Fo... | [#bitcoin, #valuable, #reality, #ethereum, #cr... | 2019-07-23 18:05:14 | 2 | 0 | 0 |
| 1 | 1153788178278436865 | 1563919512000 | 1153788178278436865 | 990599195684954113 | digitalcoinnewz | HEREISTITLE https://www.digitalcoinnews.com/a... | [#cryptocurrency, #bitcoin, #cryptonews, #btcn... | 2019-07-23 18:05:12 | 2 | 0 | 0 |
| 2 | 1153788172347691008 | 1563919511000 | 1153788172347691008 | 904160286860562432 | 1jl4com | Leo Dias surge “estranho” em programa e choca ... | [] | 2019-07-23 18:05:11 | 2 | 0 | 0 |
| 3 | 1153788168937725959 | 1563919510000 | 1153788168937725959 | 15977038 | CryptoTraderPro | Crypto Panic: Not Fake News: TD Ameritrade CEO... | [] | 2019-07-23 18:05:10 | 2 | 0 | 0 |
| 4 | 1153788155125854210 | 1563919507000 | 1153788155125854210 | 904160286860562432 | 1jl4com | Boris Johnson é eleito para ser próximo primei... | [] | 2019-07-23 18:05:07 | 2 | 0 | 0 |


### Success!
> We now have a DataFrame with 20 tweets matching the keyword "bitcoin", along with some additional information about each tweet.
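Not every tweet returned by a keyword search is necessarily about that keyword. Some accounts, such as `1jl4com` above, append lists of topic words like "Bitcoin" to unrelated headlines. Before scoring, a quick case-insensitive check on the tweet text can show how many results actually mention the term. The sketch below uses pandas' `Series.str.contains`. The helper name `mentions_keyword` and the sample sentences are hypothetical.

```python
import pandas as pd


def mentions_keyword(texts, keyword):
    """Boolean mask: True where the text contains `keyword`, ignoring case."""
    # regex=False treats the keyword as a literal string, not a regular expression
    return texts.str.contains(keyword, case=False, regex=False)


# Made-up example texts
sample = pd.Series([
    "Bitcoin hits a new high",
    "Thinking about #bitcoin again",
    "Football transfer news",
])
mask = mentions_keyword(sample, "bitcoin")
print(mask.tolist())  # [True, True, False]
print(f"{mask.mean():.0%} of tweets mention the keyword")
```

On the notebook's data this would be called as `mentions_keyword(df['tweet'], 'bitcoin')`.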

> Now let's make our code a bit more modular so that we can run repeated queries.

In [14]:
# load system utilities 
%load_ext autoreload
%autoreload 2

import sys, os
from os import path
sys.path.append("twint/")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [15]:
# Patch asyncio so TWINT can run inside Jupyter's already-running event loop (avoids RuntimeError).
import nest_asyncio
nest_asyncio.apply()

In [16]:
# Context manager that silences stdout, since TWINT prints every tweet it scrapes
class HiddenPrints:
    def __enter__(self):
        self._original_stdout = sys.stdout
        sys.stdout = open(os.devnull, 'w')

    def __exit__(self, exc_type, exc_val, exc_tb):
        sys.stdout.close()
        sys.stdout = self._original_stdout
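The standard library's `contextlib.redirect_stdout` (Python 3.4+) swaps `sys.stdout` in the same way. It also restores the original stream on exit, so an equivalent helper can be written in a few lines. The name `hidden_prints` is hypothetical, and this is a sketch rather than a required change.

```python
import contextlib
import io
import os


@contextlib.contextmanager
def hidden_prints():
    """Send anything printed inside the `with` block to os.devnull."""
    with open(os.devnull, "w") as devnull, contextlib.redirect_stdout(devnull):
        yield


# Usage: only the print outside hidden_prints() reaches the captured stream
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    with hidden_prints():
        print("this line is swallowed")
    print("this line is kept")
print(repr(captured.getvalue()))  # 'this line is kept\n'
```

Opening the null device in a `with` statement also guarantees the file handle is closed, even if the code inside the block raises.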

In [20]:
# function to easily get tweets
def get_tweets(search_term, limit=100):
    """Scrape up to `limit` tweets matching `search_term` and return them as a DataFrame."""
    c = twint.Config()
    c.Search = search_term
    c.Limit = limit
    c.Pandas = True
    c.Pandas_clean = True
    
    result_columns = ['id', 'username', 'tweet', 'hashtags', 'date', 'day', 'nlikes', 'nretweets']
    with HiddenPrints():
        twint.run.Search(c)  # TWINT's per-tweet printing is discarded
    return twint.output.panda.Tweets_df[result_columns]
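Because `get_tweets` takes the search term as an argument, the per-brand crawls can also be driven from a list of terms. The sketch below assumes a fetch function with the same `(search_term, limit=...)` call shape as `get_tweets`. The helper `collect_tweets` and the stand-in `fake_fetch` are hypothetical; `fake_fetch` only exists so the example runs offline.

```python
import pandas as pd


def collect_tweets(fetch, search_terms, limit=100):
    """Call `fetch` once per search term and stack the results into one DataFrame.

    A `search_term` column records which query each row came from.
    """
    frames = []
    for term in search_terms:
        df = fetch(term, limit=limit).copy()  # copy so the fetched frame is left untouched
        df["search_term"] = term
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


def fake_fetch(search_term, limit=100):
    """Offline stand-in for get_tweets that returns up to two made-up rows."""
    rows = [f"I love {search_term}", f"{search_term} is overpriced"]
    return pd.DataFrame({"tweet": rows[:limit]})


combined = collect_tweets(fake_fetch, ["nike", "adidas"], limit=2)
print(combined)
```

With the real scraper this would be called as `collect_tweets(get_tweets, ['nike', 'adidas'], limit=10000)`. The `search_term` column would then make per-brand grouping, for example with `groupby('search_term')`, straightforward.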

In [None]:
# run this if you want to overwrite the original adidas tweets that are stored in memory
# tweets = get_tweets("bitcoin", limit=10000)

In [19]:
tweets.head() # this is saved in memory from the original Adidas tweets crawl

|   | conversation_id | created_at | id | user_id | username | tweet | hashtags | date | day | nlikes | nretweets |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1153789629167853568 | 1563919858000 | 1153789629167853568 | 112351018 | fltrotta | Basta Adidas, no hay sueldo que aguante https... | [] | 2019-07-23 18:10:58 | 6 | 0 | 0 |
| 1 | 1153789580560044036 | 1563919847000 | 1153789580560044036 | 16151454 | rynoel | Celebrating Simeon’s birthday in the best way ... | [#arsenalinusa, #adidas, #yagunnersya, #dareto... | 2019-07-23 18:10:47 | 5 | 0 | 0 |
| 2 | 1153748179885547522 | 1563919843000 | 1153789567452901376 | 111188883 | betoocontreras | H&M, Zara, ASOS, GAP, Adidas, Calvin Klein, Fo... | [] | 2019-07-23 18:10:43 | 5 | 0 | 0 |
| 3 | 1153789567104770048 | 1563919843000 | 1153789567104770048 | 1772606058 | JustFreshKicks | Pre-Order via JD Sports Manchester United x ad... | [] | 2019-07-23 18:10:43 | 5 | 2 | 1 |
| 4 | 1153689431007621120 | 1563919837000 | 1153789541775314945 | 47786516 | jezlai | enam buah premis yang beroperasi menjual baran... | [] | 2019-07-23 18:10:37 | 5 | 0 | 0 |


In [21]:
adidas_tweets = get_tweets("adidas", limit=10000)
print(adidas_tweets.shape)
adidas_tweets.head()

(10019, 8)


|   | id | username | tweet | hashtags | date | day | nlikes | nretweets |
|---|---|---|---|---|---|---|---|---|
| 0 | 1153793884054380544 | CurtisLewis7 | Nike > Adidas | [] | 2019-07-23 18:27:53 | 3 | 0 | 0 |
| 1 | 1153793878530486272 | AmericaDeCol | Ni para el jueves están, será que es adidas y ... | [] | 2019-07-23 18:27:51 | 3 | 0 | 0 |
| 2 | 1153793845886275584 | MrSlingsh0t | Thanks for the response. I assume you’re a 10.... | [] | 2019-07-23 18:27:43 | 3 | 0 | 0 |
| 3 | 1153793758158188544 | SneakersNation_ | Por el aniversario número 30 de “Paul’s Boutiq... | [] | 2019-07-23 18:27:23 | 3 | 0 | 0 |
| 4 | 1153793752571371520 | __GDB | Another note: adidas thought it would be cool ... | [] | 2019-07-23 18:27:21 | 3 | 1 | 0 |


In [23]:
adidas_tweets.tail()

|   | id | username | tweet | hashtags | date | day | nlikes | nretweets |
|---|---|---|---|---|---|---|---|---|
| 10014 | 1153420611064127488 | RIGO_NY | 🤣🤣🤣🤣🤣 o Bale o el mencionado Lucas son adidas ... | [] | 2019-07-22 17:44:37 | 2 | 0 | 0 |
| 10015 | 1153420599005499396 | FlashMQT | Adidas shouting out child rapists now? Wow, r... | [] | 2019-07-22 17:44:34 | 2 | 0 | 0 |
| 10016 | 1153420572933709825 | karahead_ | Does Adidas just have one very overwhelmed kit... | [] | 2019-07-22 17:44:28 | 2 | 15 | 1 |
| 10017 | 1153420466792607744 | Chollomaton | adidas Harden Short2 Pantalón Corto de Balonce... | [] | 2019-07-22 17:44:03 | 2 | 0 | 0 |
| 10018 | 1153420459075145733 | on_steals | You can score select sizes under 11 for the “A... | [] | 2019-07-22 17:44:01 | 2 | 1 | 1 |


In [22]:

nike_tweets = get_tweets("nike", limit=10000)
print(nike_tweets.shape)
nike_tweets.head()

(10012, 8)


|   | id | username | tweet | hashtags | date | day | nlikes | nretweets |
|---|---|---|---|---|---|---|---|---|
| 0 | 1153795835068481537 | VirgoJ24 | Ima see if nike will at least let me get store... | [] | 2019-07-23 18:35:38 | 2 | 0 | 0 |
| 1 | 1153795826239414273 | BobCratchitt | What you mean you can but the same shirt witho... | [] | 2019-07-23 18:35:36 | 2 | 0 | 0 |
| 2 | 1153795816902946816 | aknowsense | from the website: http://soccernx.com offer... | [] | 2019-07-23 18:35:33 | 2 | 0 | 0 |
| 3 | 1153795813681684482 | JeanEsquives | Vendo camiseta nueva con etiquetas de Alianza ... | [] | 2019-07-23 18:35:33 | 2 | 0 | 0 |
| 4 | 1153795736590413834 | BrandonDuenas5 | Nike outlets ain’t the same anymore | [] | 2019-07-23 18:35:14 | 1 | 0 | 0 |


# VADER Module for Sentiment Analysis

> First, let's test VADER on a simple line of text and analyze the results.

In [None]:
#!pip install vaderSentiment

In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()

In [None]:
def sentiment_analyzer_scores(sentence):
    score = analyser.polarity_scores(sentence)
    print('sentence: "{}"'.format(sentence))
    print('scores: {}'.format(str(score))) 
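`polarity_scores` returns a dict with four entries:

- `neg`, `neu` and `pos` are the proportions of the text falling in each category.
- `compound` is a single normalized score in [-1, 1].

The vaderSentiment README suggests treating compound >= 0.05 as positive, compound <= -0.05 as negative, and anything in between as neutral. The helper below applies those cut-offs. Its name, `label_sentiment`, is hypothetical, and the scores fed to it are arbitrary examples rather than outputs for any particular sentence.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label.

    Uses the +/-0.05 cut-offs suggested in the vaderSentiment README.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for example_score in (0.6369, 0.0, -0.3612):
    print(example_score, label_sentiment(example_score))
```

Applied to a whole column, e.g. `scores.apply(label_sentiment)` on a Series of compound values, this gives per-tweet labels that are easy to count per day or per brand.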

In [None]:
sentiment_analyzer_scores("Nike is the best.")
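The `compound` value is not an average of `neg`, `neu` and `pos`. In the vaderSentiment source, VADER sums the rule-adjusted valences of the lexicon words it finds. It then squashes that unbounded total into (-1, 1) with x / sqrt(x² + α), where α = 15 by default. The sketch below re-implements only that final normalization step, for illustration. It mirrors the library's internal `normalize` helper but is not a substitute for `polarity_scores`.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded sum of word valences into (-1, 1), as VADER does for `compound`."""
    return score / math.sqrt(score * score + alpha)


print(normalize(1.0))     # 0.25, since 1 / sqrt(16) = 0.25
print(normalize(4.0))     # 4 / sqrt(31), about 0.718
print(normalize(-1000.0)) # approaches -1 but never reaches it
```

Because sqrt(x² + α) is always larger than |x|, the result stays strictly between -1 and 1. Large totals saturate, so a tweet with many strongly positive words scores only slightly higher than one with a few.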

> Let's try another simple example.

In [None]:
sentiment_analyzer_scores("Adidas sucks, but I like their sustainability initiative.")
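This sentence mixes a negative clause and a positive one. VADER treats "but" as a contrastive conjunction. In the vaderSentiment source, the valences of words before "but" are multiplied by 0.5 and those after it by 1.5, so the clause after "but" dominates. The toy sketch below combines that rule with the normalization step. The three-word lexicon and its values are made up for illustration and are not VADER's. Real VADER also applies many more rules, so these numbers will not match `polarity_scores`.

```python
import math

# Made-up valences for illustration only; these are NOT VADER's lexicon values
TOY_LEXICON = {"sucks": -1.5, "like": 1.5, "best": 3.0}


def toy_compound(sentence, lexicon=TOY_LEXICON, alpha=15):
    """Sum word valences with a VADER-style 'but' rule, then normalize into (-1, 1)."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    valences = [lexicon.get(w, 0.0) for w in words]
    if "but" in words:
        b = words.index("but")
        # Down-weight sentiment before 'but' (x0.5) and up-weight sentiment after it (x1.5)
        valences = [v * 0.5 if i < b else v * 1.5 if i > b else v
                    for i, v in enumerate(valences)]
    total = sum(valences)
    return total / math.sqrt(total * total + alpha)


print(toy_compound("Adidas sucks, but I like their sustainability initiative."))  # positive
print(toy_compound("Adidas sucks and I like their sustainability initiative."))   # 0.0
```

Without "but", the negative and positive words cancel out in this toy lexicon. With "but", the positive clause after it outweighs the negative one before it, which is the behavior the real sentence above is probing.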