# TwtrConvo

TwtrConvo is a Python package that uses tweepy, pandas, TextBlob, and plotly to gauge the overall Twitter sentiment toward a company, given its ticker symbol.  It queries for tweets using tweepy and Twitter API keys, organizes the tweets in pandas DataFrames, parses the text and computes the sentiment using TextBlob and regex, and finally displays the resulting statistics graphically using plotly.

nbviewer link: https://nbviewer.jupyter.org/github/LAdaKid/TwtrConvo/blob/master/README.ipynb

## Tweets module (tweets.py)

The tweets module acts as a wrapper layer around tweepy with the main functions:

    get_tweets
    get_replies

### Setup

To use this module you will first need to set up your Twitter API keys.  If you don't have Twitter API keys, get them by following this guide:

https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html

Once you get your Twitter API keys you will need to add them to your environment with the variable names:

    TWITTER_CONSUMER_KEY
    TWITTER_CONSUMER_SECRET
    TWITTER_ACCESS_TOKEN
    TWITTER_ACCESS_TOKEN_SECRET

This will allow the TwtrConvo "tweets" module access to the Twitter API in order to query for tweets.
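
As a rough sketch of what this setup enables, the snippet below reads the four variables from the environment and reports any that are missing before handing them to tweepy. The helper names `load_twitter_keys` and `make_api` are hypothetical and not part of TwtrConvo; the tweepy calls (`OAuthHandler`, `set_access_token`, `API`) are the usual OAuth 1a flow, and the package's own authentication code may differ.

```python
import os

# Variable names taken from the list above.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_keys(environ=os.environ):
    """Return the four keys as a dict, or raise if any are unset or empty."""
    missing = [name for name in REQUIRED_KEYS if not environ.get(name)]
    if missing:
        raise EnvironmentError("Missing Twitter API keys: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_KEYS}


def make_api(keys):
    """Build an authenticated tweepy client (hypothetical helper)."""
    import tweepy  # imported lazily so the key check works without tweepy

    auth = tweepy.OAuthHandler(
        keys["TWITTER_CONSUMER_KEY"], keys["TWITTER_CONSUMER_SECRET"]
    )
    auth.set_access_token(
        keys["TWITTER_ACCESS_TOKEN"], keys["TWITTER_ACCESS_TOKEN_SECRET"]
    )
    return tweepy.API(auth)
```

Calling `load_twitter_keys()` before any query gives a clear error message that names the unset variables, rather than an authentication failure from the API.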

## TwtrConvo module (twtrconvo.py)

The twtrconvo module houses the main logic of the package, including the methods to build or load the dataset and the function that ranks the queried tweets. Let's step through each portion of the main function and show each step in creating the statistical analysis of the company's overall Twitter sentiment.

### Import package and load dataset

In [18]:
import os
import TwtrConvo

ticker = 'TSLA'

# Generally you would use the "build_dataset" method to get tweets and replies. However, with
# the default of 500 tweets, this can max out your hourly queries to the Twitter API if the
# ticker has a lot of interaction on Twitter and you call build_dataset multiple times within
# an hour.  For this reason, you are also able to load previously built data and use it to
# conduct statistical analysis.

#tweet_df, reply_df = TwtrConvo.twtrconvo.build_dataset(ticker)

tweet_df, reply_df = TwtrConvo.twtrconvo.load_dataset(
    os.path.join(os.getcwd(), 'datasets', ticker)
)

print(tweet_df.head())

   index                   id         username  \
0     81  1123030520558960640          FedPorn   
1     60  1123034536252727296         SamAntar   
2     44  1123040250807422976   NetflixAndLamp   
3     69  1123033053008240640  whistlerian1834   
4     47  1123039259911389184         SamAntar   

                                               tweet  \
0  If I buy a $TSLA do I get back a subsidy that ...   
1  Crazy Eddie Memoirs: It wouldn’t change a thin...   
2  My favorite software companies are the ones th...   
3  $tsla 1/ Here's a summary of my intuition on #...   
4  I don’t love or hate Elon Musk. For me, he’s s...   

                                                text  favorites  retweets  \
0  If I buy a TSLA do I get back a subsidy that I...         24         4   
1  Crazy Eddie Memoirs It wouldn t change a thing...         15         3   
2  My favorite software companies are the ones th...         47         6   
3  tsla 1 Here's a summary of my intuition on crc...  

### Top Five Tweets and their stats

In [19]:
for i in range(5):
    tweet = tweet_df.iloc[i]
    print(
        tweet['username'],
        '({} Favorites, {} Retweets, Net Influence {}) : \n'.format(
            tweet['favorites'], tweet['retweets'], tweet['net_influence']),
        tweet['tweet'], '\n')

FedPorn (24 Favorites, 4 Retweets, Net Influence 14730) : 
 If I buy a $TSLA do I get back a subsidy that I subsidized? https://t.co/J21hrFjRba 

SamAntar (15 Favorites, 3 Retweets, Net Influence 10486) : 
 Crazy Eddie Memoirs: It wouldn’t change a thing if Wall St. analysts had the opportunity to read 10-Qs before an earnings call because 99.9% of them are stupid. $TSLA $TSLAQ https://t.co/IyyZsC9Lbn 

NetflixAndLamp (47 Favorites, 6 Retweets, Net Influence 1643) : 
 My favorite software companies are the ones that don't actually make software but have single digit gross margins and negative double digit net margins with massive capital intensity. Those are definitely my favorite software companies. They're the best. Bigly. No doubt. $TSLA https://t.co/QgftVO0eGI 

whistlerian1834 (45 Favorites, 9 Retweets, Net Influence 1501) : 
 $tsla 1/ Here's a summary of my intuition on #crcl: I don't believe there's a singular disclosure that is prohibiting a raise. I don't believe the SEC is ho

### Word Frequency

The first thing we'd like to look at is word frequency within the top-ranked tweets and their replies.  This can reveal patterns and point out key words that affect the current social sentiment we observe.  We'll use TextBlob and our functions "get_blob" and "get_word_count" to do this, then display the word count data using plotly pie charts.

In [20]:
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

# Get text blobs and word count
tweet_blob = TwtrConvo.twtrconvo.get_blob(ticker, tweet_df)
tweet_word_count = TwtrConvo.twtrconvo.get_word_count(tweet_blob)
reply_blob = TwtrConvo.twtrconvo.get_blob(ticker, reply_df)
reply_word_count = TwtrConvo.twtrconvo.get_word_count(reply_blob)

# The pie chart will default to the top ten words for each word count unless n is
# specified to be different
fig = TwtrConvo.plots.create_pie_chart(tweet_word_count, reply_word_count)

iplot(fig)
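
To make the word-count step concrete, here is a minimal sketch of counting word frequencies and keeping the top n. It deliberately swaps TextBlob for the standard library (`re` and `collections.Counter`) so it runs with no extra dependencies. `top_words` is a hypothetical name and the sample strings are made up for illustration; the real `get_word_count` may tokenize or filter words differently.

```python
import re
from collections import Counter


def top_words(texts, n=10, ignore=()):
    """Count lowercase word occurrences across texts and return the n most common."""
    ignored = {word.lower() for word in ignore}
    counts = Counter()
    for text in texts:
        # Keep runs of letters and apostrophes; digits and symbols such as
        # "$" and "#" act as separators.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in ignored)
    return counts.most_common(n)


# Illustrative sample text, not taken from a real dataset.
sample = [
    "My favorite software companies are the ones that don't make software $TSLA",
    "Those are definitely my favorite software companies",
]
print(top_words(sample, n=3, ignore=["tsla"]))
```

Passing the ticker in `ignore` keeps the symbol itself, which appears in nearly every tweet, from dominating the chart.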

### Sentiment Gauge

Using TextBlob, we will get a general sentiment of all the tweets and display it on a gauge using plotly.

In [21]:
fig = TwtrConvo.plots.create_sentiment_gauge(tweet_blob)
iplot(fig)
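
For reference, TextBlob exposes sentiment as `blob.sentiment.polarity`, a float between -1 (negative) and 1 (positive). The sketch below shows one way such a score could be mapped onto a 0 to 100 gauge scale. The helper names and the linear mapping are assumptions for illustration, not necessarily what `create_sentiment_gauge` does.

```python
def polarity_to_gauge(polarity):
    """Linearly map a polarity in [-1, 1] to a gauge value in [0, 100]."""
    clamped = max(-1.0, min(1.0, polarity))
    return (clamped + 1.0) / 2.0 * 100.0


def text_gauge_value(text):
    """Score raw text with TextBlob and convert it to a gauge value."""
    from textblob import TextBlob  # imported lazily; requires textblob installed

    return polarity_to_gauge(TextBlob(text).sentiment.polarity)


print(polarity_to_gauge(0.0))   # neutral text sits at the midpoint: 50.0
print(polarity_to_gauge(-1.0))  # most negative: 0.0
```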

### Boxplots

Let's take a look at the spread of retweets and favorites among the top-ranked tweets.

In [25]:
fig = TwtrConvo.plots.create_boxplot(tweet_df)
iplot(fig)