# TwtrConvo

TwtrConvo is a Python package that uses tweepy, pandas, TextBlob, and plotly to generate an overall sentiment for a company, given its ticker symbol. It queries for tweets using tweepy and your Twitter API keys, organizes the tweets in a pandas DataFrame, parses the text and computes its sentiment using regex and TextBlob, and finally displays the statistics graphically using plotly.

nbviewer link: https://nbviewer.jupyter.org/github/LAdaKid/TwtrConvo/blob/master/README.ipynb

## Tweets module (tweets.py)

The tweets module acts as a wrapper layer around tweepy, with two main functions:

    get_tweets
    get_replies
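
The internals of these functions aren't shown here. In the Twitter v1.1 API, a reply carries the ID of the tweet it answers in its `in_reply_to_status_id` field, so one plausible way for `get_replies` to attach replies to the queried tweets is to filter candidate statuses on that field. The sketch below assumes that approach; `group_replies` is a hypothetical helper, and plain dicts with toy data stand in for tweepy status objects.

```python
from collections import defaultdict


def group_replies(tweets, candidates):
    """Attach each candidate status to the queried tweet it replies to.

    Hypothetical helper (not part of TwtrConvo): tweets and candidates are
    plain dicts standing in for tweepy status objects, using the v1.1
    field names "id" and "in_reply_to_status_id".
    """
    tweet_ids = {t["id"] for t in tweets}
    replies = defaultdict(list)
    for status in candidates:
        parent = status.get("in_reply_to_status_id")
        # Keep only statuses that answer one of the queried tweets
        if parent in tweet_ids:
            replies[parent].append(status)
    return dict(replies)


# Toy data: two queried tweets and four candidate statuses
tweets = [{"id": 1, "text": "first tweet"}, {"id": 2, "text": "second tweet"}]
candidates = [
    {"id": 10, "in_reply_to_status_id": 1, "text": "reply to the first tweet"},
    {"id": 11, "in_reply_to_status_id": 99, "text": "reply in another thread"},
    {"id": 12, "in_reply_to_status_id": None, "text": "not a reply"},
    {"id": 13, "in_reply_to_status_id": 1, "text": "another reply"},
]
print({k: [r["id"] for r in v] for k, v in group_replies(tweets, candidates).items()})
```

Statuses that answer some other thread, or that aren't replies at all, are simply dropped.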

### Setup

To use this module, you will first need to set up your Twitter API keys. If you don't have Twitter API keys yet, get them by following this guide:

https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html

Once you have your Twitter API keys, add them to your environment under these variable names:

    TWITTER_CONSUMER_KEY
    TWITTER_CONSUMER_SECRET
    TWITTER_ACCESS_TOKEN
    TWITTER_ACCESS_TOKEN_SECRET

This gives the TwtrConvo "tweets" module access to the Twitter API so that it can query for tweets.
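
A missing key usually only shows up later as an authentication error from the API. A minimal sketch of failing fast instead is shown below; `load_twitter_credentials` is a hypothetical helper, not part of TwtrConvo, and it takes an optional mapping so it can be tested without touching the real environment.

```python
import os

# The four variable names listed above
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(environ=None):
    """Return the four credentials as a dict, raising if any are unset or empty.

    Hypothetical helper: `environ` defaults to os.environ but can be any mapping.
    """
    if environ is None:
        environ = os.environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

In tweepy 3.x, these values are typically passed to `tweepy.OAuthHandler(consumer_key, consumer_secret)` followed by `auth.set_access_token(access_token, access_token_secret)`.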

## TwtrConvo module (twtrconvo.py)

The twtrconvo module houses the main logic of the package, including the methods to build or load the dataset and the function that ranks the queried tweets. Let's step through each portion of the main function to show how the statistical analysis of the company's overall Twitter sentiment is built.

### Import package and load dataset

```python
import os
import TwtrConvo

ticker = 'TSLA'

# Normally you would use the "build_dataset" method to get tweets and replies. However,
# with the default of 500 tweets, this generally maxes out your hourly Twitter API query
# limit if the ticker has a lot of interaction on Twitter and you call build_dataset
# multiple times within an hour. For this reason, you can also load previously built
# data and use it to conduct the statistical analysis.

# tweet_df, reply_df = TwtrConvo.twtrconvo.build_dataset(ticker)

tweet_df, reply_df = TwtrConvo.twtrconvo.load_dataset(
    os.path.join(os.getcwd(), 'datasets', ticker)
)

print(tweet_df.head())
```

```
   index                   id         username  \
0     30  1123422707872223232   GerberKawasaki   
1     20  1123424897047367681  Paul_M_Huettner   
2     46  1123416816506683392   GerberKawasaki   
3     14  1123426124246085632   GerberKawasaki   
4     16  1123425794859196416      WallStCynic   

                                               tweet  \
0  Chamath Palihapitiya: Musk's Tesla is the 'cle...   
1  🚨🚨BREAKING🚨🚨\n\nFidelity's big funds sell near...   
2  Famed investor Ron Baron expresses support for...   
3  No big deal. Contempt of court. Lol. #tesla $t...   
4  Wait, what? Commissioner Jackson dissented on ...   

                                                text  favorites  retweets  \
0  Chamath Palihapitiya Musk's Tesla is the 'clea...         42         6   
1  BREAKING Fidelity's big funds sell nearly 1 mi...         77        34   
2  Famed investor Ron Baron expresses support for...         35         3   
3       No big deal Contempt of court Lol tesla tsla  
```
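
The comment in the first cell describes building a dataset once and then reloading it while the Twitter rate-limit window resets. A minimal sketch of that cache-or-build pattern using only the standard library follows. `load_or_build` is hypothetical, the CSV file format is an assumption (the README doesn't describe how `load_dataset` stores its data), and `build_fn` stands in for a network-bound call such as `build_dataset`.

```python
import csv
import os


def load_or_build(path, build_fn):
    """Return (rows, from_cache): read `path` if it exists, else build and save it.

    Hypothetical helper: `build_fn` must return a list of dicts that share the
    same keys. Rows read back from CSV contain strings, not the original types.
    """
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f)), True
    rows = build_fn()  # the expensive, rate-limited step
    if rows:
        # UTF-8 keeps emoji in tweet text intact on disk
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    return rows, False
```

Only the first call pays for the API queries; later calls within the same hour read from disk.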

### Top Five Tweets and their stats

```python
# head(5) also works when fewer than five tweets were collected
for _, tweet in tweet_df.head(5).iterrows():
    print(
        tweet['username'],
        '({} Favorites, {} Retweets, Net Influence {}) : \n'.format(
            tweet['favorites'], tweet['retweets'], tweet['net_influence']),
        tweet['tweet'], '\n')
```

```
GerberKawasaki (42 Favorites, 6 Retweets, Net Influence 54242) : 
 Chamath Palihapitiya: Musk's Tesla is the 'clear winner' in electric cars $tsla  https://t.co/FmbX3SQ5rh 

Paul_M_Huettner (77 Favorites, 34 Retweets, Net Influence 4428) : 
 🚨🚨BREAKING🚨🚨

Fidelity's big funds sell nearly 1 million shares of $TSLA / $TSLAQ, 24% of their entire Tesla position in March! That's 6 million shares, or 66%, in the last year.

Blue Chip Growth sold 29% in March alone while OTC completely liquidated. Contra cut 40%. https://t.co/4XblJddQEO 

GerberKawasaki (35 Favorites, 3 Retweets, Net Influence 54242) : 
 Famed investor Ron Baron expresses support for Tesla, Elon Musk, and Model 3 demand - $tsla  https://t.co/4stRJ3hVYx 

GerberKawasaki (14 Favorites, 4 Retweets, Net Influence 54242) : 
 No big deal. Contempt of court. Lol. #tesla $tsla  https://t.co/K969pp984D 

WallStCynic (14 Favorites, 5 Retweets, Net Influence 26788) : 
 Wait, what? Commissioner Jackson dissented on the Musk settlement? $
```

### Word Frequency

The first thing we'd like to look at is word frequency within the top-ranked tweets and their replies. This can reveal patterns and point out key words that affect the current social sentiment we observe. We'll use TextBlob and our functions "get_blob" and "get_word_count" to do this, then display the word count data using plotly pie charts.

```python
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

# Get text blobs and word counts for the tweets and their replies
tweet_blob = TwtrConvo.twtrconvo.get_blob(ticker, tweet_df)
tweet_word_count = TwtrConvo.twtrconvo.get_word_count(tweet_blob)
reply_blob = TwtrConvo.twtrconvo.get_blob(ticker, reply_df)
reply_word_count = TwtrConvo.twtrconvo.get_word_count(reply_blob)

# The pie charts default to the top ten words of each word count unless n is
# specified otherwise
fig = TwtrConvo.plots.create_pie_chart(tweet_word_count, reply_word_count)

iplot(fig)
```
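
The `text` column in the dataset output above appears to be the raw `tweet` with URLs, punctuation (apart from apostrophes), and emoji stripped. The sketch below reproduces that cleaning with regex and then counts words with `collections.Counter`, returning the top ten by default. It is a stand-in, not TwtrConvo's `get_blob`/`get_word_count`: the exact regexes, the tiny stop-word list, and excluding the ticker itself are all assumptions.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
# Drop everything except ASCII letters, digits, apostrophes and whitespace
NON_WORD_RE = re.compile(r"[^A-Za-z0-9'\s]")

# Hypothetical, deliberately tiny stop-word list; a real one would be longer
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "on", "no"}


def clean_text(tweet):
    """Strip URLs, punctuation and emoji, then collapse whitespace."""
    text = URL_RE.sub("", tweet)
    text = NON_WORD_RE.sub("", text)
    return " ".join(text.split())


def word_count(tweets, ticker, n=10):
    """Return the n most common lowercase words, ignoring the ticker and stop words."""
    excluded = STOPWORDS | {ticker.lower()}
    counts = Counter(
        word
        for tweet in tweets
        for word in clean_text(tweet).lower().split()
        if word not in excluded
    )
    return counts.most_common(n)


print(clean_text("No big deal. Contempt of court. Lol. #tesla $tsla  https://t.co/K969pp984D"))
```

Applied to the fourth tweet, `clean_text` yields the same string as its `text` column. TextBlob can also do the counting step through its `word_counts` property.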

### Sentiment Gauge

Using TextBlob, we'll get a general sentiment of all the tweets and display it on a gauge using plotly.

```python
fig = TwtrConvo.plots.create_sentiment_gauge(tweet_blob)
iplot(fig)
```
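
TextBlob reports `sentiment.polarity` as a float from -1.0 (most negative) to 1.0 (most positive). The sketch below shows how such scores might be turned into a gauge reading: average the polarities, map the mean onto a 180-degree dial, and attach a label. `gauge_reading` and its `neutral_band` threshold are hypothetical. The README doesn't say whether `create_sentiment_gauge` averages per-tweet scores or scores one combined blob; this sketch assumes per-tweet scores.

```python
from statistics import mean


def gauge_reading(polarities, neutral_band=0.05):
    """Return (mean polarity, needle angle in degrees, label).

    polarities: floats in [-1.0, 1.0], e.g. TextBlob(text).sentiment.polarity.
    The angle runs from 180 (fully negative) through 90 (neutral) to 0
    (fully positive). neutral_band is a hypothetical threshold.
    """
    if not polarities:
        raise ValueError("need at least one polarity score")
    score = mean(polarities)
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity scores must lie in [-1, 1]")
    # Linear map: -1 -> 180 degrees, 0 -> 90 degrees, 1 -> 0 degrees
    angle = 90.0 * (1.0 - score)
    if score > neutral_band:
        label = "positive"
    elif score < -neutral_band:
        label = "negative"
    else:
        label = "neutral"
    return score, angle, label
```

Averaging per tweet gives every tweet equal weight, whereas scoring one concatenated blob lets long tweets dominate; which behaviour is preferable depends on the analysis.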

### Boxplots

Let's take a look at the spread of retweets and favorites for the top-ranked tweets.

```python
fig = TwtrConvo.plots.create_boxplot(tweet_df)
iplot(fig)
```
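
A box plot summarizes each column with its quartiles, whiskers, and outliers. The sketch below computes those statistics under the usual Tukey convention (whiskers reach the most extreme points within 1.5 × IQR of the box) for the favorites and retweets of the five tweets printed earlier. `box_stats` is hypothetical. It uses `statistics.quantiles(..., method="inclusive")`, and `create_boxplot` or plotly may use a slightly different quartile convention.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles, 1.5*IQR whiskers and outliers for a list of numbers."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme points still inside the fences
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Favorites and retweets of the five tweets printed earlier
print(box_stats([42, 77, 35, 14, 14]))
print(box_stats([6, 34, 3, 4, 5]))
```

With these numbers, the 77 favorites still fall within the upper whisker, while the 34 retweets on Paul_M_Huettner's tweet are flagged as an outlier.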

### Distribution Plot

Next, let's look at the overall net influence of the top tweets. To do this, we'll look at its distribution with the bins 0, 500, 1000, and 5000.

```python
fig = TwtrConvo.plots.create_distplot(tweet_df)
iplot(fig)
```
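
The binning behind such a distribution can be sketched with `bisect`. This assumes that 0, 500, 1000, and 5000 are bin edges, with a final open-ended bin. The README doesn't say whether `create_distplot` treats them as edges or as bin sizes. `bin_counts` is a hypothetical helper, and the sample values are the net influence figures of the five tweets printed earlier.

```python
from bisect import bisect_right


def bin_counts(values, edges=(0, 500, 1000, 5000)):
    """Count values in half-open bins [edges[i], edges[i+1]) plus a last bin edges[-1]+."""
    counts = [0] * len(edges)
    for v in values:
        if v < edges[0]:
            raise ValueError(f"value {v} is below the first edge {edges[0]}")
        # bisect_right returns the index of the first edge greater than v;
        # the value belongs to the bin that starts one edge to the left
        counts[bisect_right(edges, v) - 1] += 1
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])] + [f"{edges[-1]}+"]
    return dict(zip(labels, counts))


print(bin_counts([54242, 4428, 54242, 54242, 26788]))
```

Using `bisect_right` keeps each bin half-open, so a value of exactly 500 lands in the 500-1000 bin rather than 0-500.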