# Some Tweepy Twitter Tweets
## Brianna Hill
First things first, I did not do a whole lot of exploring with this, but I did discover some cool stuff!
<br><br>
I generally followed [this tutorial](http://blog.impiyush.com/2015/03/data-analysis-using-twitter-api-and.html) and then did some things on my own.
<br><br>
It would be super interesting to run this analysis during the premiere of a new episode on a Thursday night, since there is always a lot of live-tweeting from both the cast and audience members!

In [1]:
# import the necessary packages
import tweepy
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# consumer keys, redacted in submission
consumerKey = 'XXXXXXXX'
consumerSecret = 'XXXXXXXX'

# create authentication using tweepy
auth = tweepy.OAuthHandler(consumer_key=consumerKey, consumer_secret=consumerSecret)

# connect to Twitter API using the authentication created above
api = tweepy.API(auth)

In [3]:
# searching the GreysAnatomy tag!!
result = api.search(q='%23GreysAnatomy')

len(result)     # default length displayed

15

In [4]:
# create list of most recent 1000 tweets in the tag (I decided to go a bit big)
results = []

for tweet in tweepy.Cursor(api.search, q='%23GreysAnatomy').items(1000):
    results.append(tweet)

len(results)

1000
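
The jump from 15 results to 1000 comes from paging: a single search call returns one page, while `tweepy.Cursor` keeps asking for more pages until `.items(1000)` is satisfied. Here is a minimal sketch of that general pattern. `fake_search` and `iter_items` are made-up names (not part of Tweepy), and this is not how Tweepy is implemented internally; it just runs without API keys.

```python
from itertools import islice

PAGE_SIZE = 15  # same as the default page length seen above


def fake_search(page):
    """Hypothetical stand-in for one API call: returns one page of fake tweet IDs."""
    start = page * PAGE_SIZE
    return list(range(start, start + PAGE_SIZE))


def iter_items():
    """Yield items one at a time, fetching the next page when the current one runs out."""
    page = 0
    while True:
        for item in fake_search(page):
            yield item
        page += 1


one_page = fake_search(0)                       # like a single search call
first_1000 = list(islice(iter_items(), 1000))   # like Cursor(...).items(1000)

print(len(one_page), len(first_1000))
```

With the real API each extra page is another request, so bigger item counts run into the rate limit sooner.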

In [5]:
# function to create a DataFrame of the tweets with the specified info about each
# (the function is from the tutorial, but I added the tweet language column)
def toDataFrame(tweets):

    DataSet = pd.DataFrame()

    DataSet['tweetID'] = [tweet.id for tweet in tweets]
    DataSet['tweetText'] = [tweet.text for tweet in tweets]
    DataSet['tweetRetweetCt'] = [tweet.retweet_count for tweet 
    in tweets]
    DataSet['tweetFavoriteCt'] = [tweet.favorite_count for tweet 
    in tweets]
    DataSet['tweetSource'] = [tweet.source for tweet in tweets]
    DataSet['tweetCreated'] = [tweet.created_at for tweet in tweets]
    DataSet['tweetLang'] = [tweet.lang for tweet in tweets]


    DataSet['userID'] = [tweet.user.id for tweet in tweets]
    DataSet['userScreen'] = [tweet.user.screen_name for tweet 
    in tweets]
    DataSet['userName'] = [tweet.user.name for tweet in tweets]
    DataSet['userCreateDt'] = [tweet.user.created_at for tweet 
    in tweets]
    DataSet['userDesc'] = [tweet.user.description for tweet in tweets]
    DataSet['userFollowerCt'] = [tweet.user.followers_count for tweet 
    in tweets]
    DataSet['userFriendsCt'] = [tweet.user.friends_count for tweet 
    in tweets]
    DataSet['userLocation'] = [tweet.user.location for tweet in tweets]
    DataSet['userTimezone'] = [tweet.user.time_zone for tweet 
    in tweets]

    return DataSet

In [6]:
DataSet = toDataFrame(results)     # create DataFrame
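
A different way to get the same kind of table is to build one dict per tweet and hand the whole list to pandas. This is only a sketch of that alternative, not the tutorial's method. `to_rows` is a name I made up, only a few of the columns are included, and the `SimpleNamespace` objects are fake stand-ins for Tweepy `Status` objects so the cell runs without the API.

```python
from types import SimpleNamespace

import pandas as pd


def to_rows(tweets):
    """Build one dict per tweet; pandas turns the list of dicts into columns."""
    return pd.DataFrame([
        {
            'tweetID': t.id,
            'tweetText': t.text,
            'tweetLang': t.lang,
            'userScreen': t.user.screen_name,
        }
        for t in tweets
    ])


# made-up objects standing in for Tweepy Status objects
fake = [
    SimpleNamespace(id=1, text='hello #GreysAnatomy', lang='en',
                    user=SimpleNamespace(screen_name='fan_one')),
    SimpleNamespace(id=2, text='olá #GreysAnatomy', lang='pt',
                    user=SimpleNamespace(screen_name='fan_two')),
]

small = to_rows(fake)
print(small)
```

The upside is that each tweet is read once and adding a column is one more dict entry.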

In [7]:
DataSet.tail()     # check DataFrame creation

Unnamed: 0,tweetID,tweetText,tweetRetweetCt,tweetFavoriteCt,tweetSource,tweetCreated,tweetLang,userID,userScreen,userName,userCreateDt,userDesc,userFollowerCt,userFriendsCt,userLocation,userTimezone
995,920825531867762688,RT @ltsGREYSquotes: The power pose. #GreysAnat...,615,0,Twitter for iPhone,2017-10-19 01:34:49,en,310340635,A_Milla_Killa,Slamiller🍍🤙🏾,2011-06-03 15:45:21,future sign language interpreter and radiologi...,469,520,,Central Time (US & Canada)
996,920825509419896833,RT @ltsGREYSquotes: The power pose. #GreysAnat...,615,0,Twitter for iPhone,2017-10-19 01:34:44,en,573691021,sophiapuente19,Soph,2012-05-07 14:32:42,,75,111,,Pacific Time (US & Canada)
997,920825506118946818,"For the first time in my life, I’m completely ...",0,3,Twitter for iPhone,2017-10-19 01:34:43,en,1666071397,laura_manoc,Laura,2013-08-12 21:14:37,Certified Child Life Specialist. Do small thin...,188,363,"Columbus, OH",Atlantic Time (Canada)
998,920825458555572225,I've just watched episode S14E04 of Grey's Ana...,0,0,"TV Time, TV show tracker",2017-10-19 01:34:32,en,68801680,boldtypex,thaís,2009-08-25 21:03:54,"another day, another drama",1199,428,,Brasilia
999,920825457158901760,RT @CamillaLNews: When someone tells you there...,10,0,Twitter for iPhone,2017-10-19 01:34:31,en,132281843,serinnade_,Rena Giraffasaurus,2010-04-12 20:19:42,Lector: Cavete - Would you like you if you met...,343,235,DSLR Shutter Home,Eastern Time (US & Canada)


In [8]:
tz = DataSet[DataSet.userTimezone.notnull()]

print("Users with specified timezones:", len(tz))

Users with specified timezones: 625


In [9]:
# count tweets per timezone (value_counts() skips null values by default)
timezones = DataSet['userTimezone'].value_counts()

timezones[:10]     # display top 10 timezones & tweet counts

Pacific Time (US & Canada)    137
Brasilia                       63
Central Time (US & Canada)     59
Eastern Time (US & Canada)     46
Atlantic Time (Canada)         28
Santiago                       21
Amsterdam                      20
Paris                          19
Quito                          16
London                         16
Name: userTimezone, dtype: int64
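
Since only 625 of the 1000 tweets had a timezone, the counts above leave out 375 rows. A small sketch of how the nulls could be put back in the picture, using a made-up Series instead of the real column:

```python
import pandas as pd

# made-up values; None plays the role of a user with no timezone set
tz_demo = pd.Series(['Brasilia', None, 'Brasilia', 'Paris', None, None])

counts = tz_demo.value_counts()                  # nulls are skipped
with_null = tz_demo.value_counts(dropna=False)   # nulls get their own row
share_known = tz_demo.notnull().mean()           # fraction of rows with a timezone

print(counts)
print(with_null)
print(share_known)
```

On the real data, `DataSet['userTimezone'].notnull().mean()` should come out to 0.625 given the counts above.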

In [10]:
# count number of tweets per user (who is tweeting a lot about this?)
user_count = DataSet['userScreen'].value_counts()

user_count[:10]     # display Twitter handles of top 10 users

CHRISTINABUXTO1    20
Mowowww             9
griggsjolex         8
Chy_Badass          8
blackburneyesss     7
Dianas891           5
sauterelles3        5
DianaAnhgeles       4
annaroocha_         4
kepswizzle          4
Name: userScreen, dtype: int64

In [11]:
# what platforms are being used to tweet?
# how many tweets are being sent from each platform?
DataSet['tweetSource'].value_counts()

Twitter for iPhone               496
Twitter for Android              338
Twitter Web Client                62
TV Time, TV show tracker          45
Twitter Lite                      17
Twitter for iPad                  16
Twitter for Windows                7
Facebook                           2
Twitter for Mac                    2
TweetDeck                          2
The Social Jukebox                 1
Grey's Anatomy Bot                 1
Instagram                          1
Hootsuite                          1
Mobile Web (M2)                    1
Sprinklr                           1
WordPress.com                      1
Echofon                            1
twittbot.net                       1
RoundTeam                          1
Twibble.io                         1
What Can We Watch? (TV Shows)      1
Tweetbot for iΟS                   1
Name: tweetSource, dtype: int64

In [12]:
# what languages were the tweets sent in?
# how many tweets per language?
DataSet['tweetLang'].value_counts()

en     559
und    210
pt     106
fr      59
es      31
it      28
ro       3
ar       1
th       1
da       1
nl       1
Name: tweetLang, dtype: int64
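
Raw counts are easier to compare as shares. The sketch below just re-uses the numbers printed above (no new data). My understanding is that `und` is the code Twitter assigns when it cannot determine a language, for example a tweet that is only a hashtag and a link, so I treat it as "unknown" here; that reading is an assumption on my part.

```python
# counts copied from the output above
lang_counts = {'en': 559, 'und': 210, 'pt': 106, 'fr': 59, 'es': 31, 'it': 28,
               'ro': 3, 'ar': 1, 'th': 1, 'da': 1, 'nl': 1}

total = sum(lang_counts.values())
shares = {lang: n / total for lang, n in lang_counts.items()}

# tweets where a language was actually identified
identified = total - lang_counts['und']

print(total, identified)
print({lang: round(share, 3) for lang, share in shares.items()})
```

In pandas the same shares come from `DataSet['tweetLang'].value_counts(normalize=True)`.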

In [13]:
# I re-created the function from above to look specifically at retweet information
# it took some trial & error to figure out which tweet properties were going to be useful
def toRTDF(tweets):

    rtDF = pd.DataFrame()

    rtDF['tweetID'] = [tweet.id for tweet in tweets]
    rtDF['tweetText'] = [tweet.text for tweet in tweets]
    
    rtDF['tweetFavorite'] = [tweet.favorite for tweet in tweets]
    rtDF['tweetFaved'] = [tweet.favorited for tweet in tweets]
    rtDF['tweetFavoriteCt'] = [tweet.favorite_count for tweet 
    in tweets]
    
    rtDF['tweetRT'] = [tweet.retweet for tweet in tweets]
    rtDF['tweetRTed'] = [tweet.retweeted for tweet in tweets]
    rtDF['tweetRetweetCt'] = [tweet.retweet_count for tweet 
    in tweets]
    rtDF['tweetRTs'] = [tweet.retweets for tweet in tweets]
    
    rtDF['tweetSource'] = [tweet.source for tweet in tweets]
    rtDF['tweetCreated'] = [tweet.created_at for tweet in tweets]
    rtDF['tweetLang'] = [tweet.lang for tweet in tweets]


    rtDF['userID'] = [tweet.user.id for tweet in tweets]
    rtDF['userScreen'] = [tweet.user.screen_name for tweet 
    in tweets]
    rtDF['userName'] = [tweet.user.name for tweet in tweets]
    rtDF['userLocation'] = [tweet.user.location for tweet in tweets]
    rtDF['userTimezone'] = [tweet.user.time_zone for tweet 
    in tweets]

    return rtDF

In [14]:
# find the desired retweet info of the results & put into DataFrame
rtDF = toRTDF(results)

In [15]:
rtDF.tail()     # confirm DataFrame was made

Unnamed: 0,tweetID,tweetText,tweetFavorite,tweetFaved,tweetFavoriteCt,tweetRT,tweetRTed,tweetRetweetCt,tweetRTs,tweetSource,tweetCreated,tweetLang,userID,userScreen,userName,userLocation,userTimezone
995,920825531867762688,RT @ltsGREYSquotes: The power pose. #GreysAnat...,<bound method Status.favorite of Status(_api=<...,False,0,<bound method Status.retweet of Status(_api=<t...,False,615,<bound method Status.retweets of Status(_api=<...,Twitter for iPhone,2017-10-19 01:34:49,en,310340635,A_Milla_Killa,Slamiller🍍🤙🏾,,Central Time (US & Canada)
996,920825509419896833,RT @ltsGREYSquotes: The power pose. #GreysAnat...,<bound method Status.favorite of Status(_api=<...,False,0,<bound method Status.retweet of Status(_api=<t...,False,615,<bound method Status.retweets of Status(_api=<...,Twitter for iPhone,2017-10-19 01:34:44,en,573691021,sophiapuente19,Soph,,Pacific Time (US & Canada)
997,920825506118946818,"For the first time in my life, I’m completely ...",<bound method Status.favorite of Status(_api=<...,False,3,<bound method Status.retweet of Status(_api=<t...,False,0,<bound method Status.retweets of Status(_api=<...,Twitter for iPhone,2017-10-19 01:34:43,en,1666071397,laura_manoc,Laura,"Columbus, OH",Atlantic Time (Canada)
998,920825458555572225,I've just watched episode S14E04 of Grey's Ana...,<bound method Status.favorite of Status(_api=<...,False,0,<bound method Status.retweet of Status(_api=<t...,False,0,<bound method Status.retweets of Status(_api=<...,"TV Time, TV show tracker",2017-10-19 01:34:32,en,68801680,boldtypex,thaís,,Brasilia
999,920825457158901760,RT @CamillaLNews: When someone tells you there...,<bound method Status.favorite of Status(_api=<...,False,0,<bound method Status.retweet of Status(_api=<t...,False,10,<bound method Status.retweets of Status(_api=<...,Twitter for iPhone,2017-10-19 01:34:31,en,132281843,serinnade_,Rena Giraffasaurus,DSLR Shutter Home,Eastern Time (US & Canada)
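
The `<bound method ...>` entries in `tweetFavorite`, `tweetRT` and `tweetRTs` show up because `favorite`, `retweet` and `retweets` are methods on the `Status` object (actions you can call), not data fields. A quick way to tell the two apart before building columns is `callable()`. `FakeStatus` below is a made-up class standing in for a real tweet:

```python
class FakeStatus:
    """Made-up stand-in for a Tweepy Status object."""

    def __init__(self):
        self.favorited = False      # plain data attribute
        self.favorite_count = 3     # plain data attribute

    def favorite(self):
        """A method: an action, not data."""
        return 'would like the tweet'


tweet = FakeStatus()
candidates = ['favorite', 'favorited', 'favorite_count']

# keep only the names that hold data rather than code
data_fields = [name for name in candidates if not callable(getattr(tweet, name))]

print(data_fields)
```

Running the same check over the attribute names of one real tweet would have saved some of the trial and error.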


In [16]:
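# tweetFaved is False for every row: `favorited` only says whether the account I
# authenticated with liked the tweet, not whether anyone did, so this column is useless here
# (I'm only looking at retweets from here on, so the other favorites columns can go too)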
rtDF['tweetFaved'].value_counts()

False    1000
Name: tweetFaved, dtype: int64

In [17]:
# delete columns containing info about favorites as well as columns with code
# (ex: tweetFavorite, tweetRT, etc.)
rtDF = rtDF.drop(['tweetFavorite','tweetFaved', 'tweetFavoriteCt', 'tweetRT', 'tweetRTs'], axis=1)

rtDF.tail()     # confirm column deletion

Unnamed: 0,tweetID,tweetText,tweetRTed,tweetRetweetCt,tweetSource,tweetCreated,tweetLang,userID,userScreen,userName,userLocation,userTimezone
995,920825531867762688,RT @ltsGREYSquotes: The power pose. #GreysAnat...,False,615,Twitter for iPhone,2017-10-19 01:34:49,en,310340635,A_Milla_Killa,Slamiller🍍🤙🏾,,Central Time (US & Canada)
996,920825509419896833,RT @ltsGREYSquotes: The power pose. #GreysAnat...,False,615,Twitter for iPhone,2017-10-19 01:34:44,en,573691021,sophiapuente19,Soph,,Pacific Time (US & Canada)
997,920825506118946818,"For the first time in my life, I’m completely ...",False,0,Twitter for iPhone,2017-10-19 01:34:43,en,1666071397,laura_manoc,Laura,"Columbus, OH",Atlantic Time (Canada)
998,920825458555572225,I've just watched episode S14E04 of Grey's Ana...,False,0,"TV Time, TV show tracker",2017-10-19 01:34:32,en,68801680,boldtypex,thaís,,Brasilia
999,920825457158901760,RT @CamillaLNews: When someone tells you there...,False,10,Twitter for iPhone,2017-10-19 01:34:31,en,132281843,serinnade_,Rena Giraffasaurus,DSLR Shutter Home,Eastern Time (US & Canada)


In [18]:
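# tweetRTed is False for every row too: `retweeted` only says whether the account I
# authenticated with retweeted the tweet (the real counts are in tweetRetweetCt),
# so that is useless as well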
rtDF['tweetRTed'].value_counts()

False    1000
Name: tweetRTed, dtype: int64

In [19]:
# delete column & confirm deletion
rtDF = rtDF.drop(['tweetRTed'], axis=1)

rtDF.tail()

Unnamed: 0,tweetID,tweetText,tweetRetweetCt,tweetSource,tweetCreated,tweetLang,userID,userScreen,userName,userLocation,userTimezone
995,920825531867762688,RT @ltsGREYSquotes: The power pose. #GreysAnat...,615,Twitter for iPhone,2017-10-19 01:34:49,en,310340635,A_Milla_Killa,Slamiller🍍🤙🏾,,Central Time (US & Canada)
996,920825509419896833,RT @ltsGREYSquotes: The power pose. #GreysAnat...,615,Twitter for iPhone,2017-10-19 01:34:44,en,573691021,sophiapuente19,Soph,,Pacific Time (US & Canada)
997,920825506118946818,"For the first time in my life, I’m completely ...",0,Twitter for iPhone,2017-10-19 01:34:43,en,1666071397,laura_manoc,Laura,"Columbus, OH",Atlantic Time (Canada)
998,920825458555572225,I've just watched episode S14E04 of Grey's Ana...,0,"TV Time, TV show tracker",2017-10-19 01:34:32,en,68801680,boldtypex,thaís,,Brasilia
999,920825457158901760,RT @CamillaLNews: When someone tells you there...,10,Twitter for iPhone,2017-10-19 01:34:31,en,132281843,serinnade_,Rena Giraffasaurus,DSLR Shutter Home,Eastern Time (US & Canada)


In [20]:
# get rid of the rows where the retweet count is 0 since I'm looking at RTs
rtDF = rtDF[rtDF.tweetRetweetCt != 0]

rtDF.tail()     # confirm new DataFrame

Unnamed: 0,tweetID,tweetText,tweetRetweetCt,tweetSource,tweetCreated,tweetLang,userID,userScreen,userName,userLocation,userTimezone
992,920825873837649921,RT @ltsGREYSquotes: The power pose. #GreysAnat...,615,Twitter for iPhone,2017-10-19 01:36:11,en,3296881862,stephanietianaa,steph 🏳️‍🌈,so-cal,
993,920825856418869248,RT @MemeeVida: #DiaDoMédico Parabéns p todo e...,314,Twitter for Android,2017-10-19 01:36:06,pt,1414078926,hemy_marques,🌸 Hemy 🌸,"São Sebastião do Caí, Brasil",Brasilia
995,920825531867762688,RT @ltsGREYSquotes: The power pose. #GreysAnat...,615,Twitter for iPhone,2017-10-19 01:34:49,en,310340635,A_Milla_Killa,Slamiller🍍🤙🏾,,Central Time (US & Canada)
996,920825509419896833,RT @ltsGREYSquotes: The power pose. #GreysAnat...,615,Twitter for iPhone,2017-10-19 01:34:44,en,573691021,sophiapuente19,Soph,,Pacific Time (US & Canada)
999,920825457158901760,RT @CamillaLNews: When someone tells you there...,10,Twitter for iPhone,2017-10-19 01:34:31,en,132281843,serinnade_,Rena Giraffasaurus,DSLR Shutter Home,Eastern Time (US & Canada)


In [21]:
len(rtDF)     # how many tweets are we left with?

782
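
One caveat about these 782 rows: a nonzero retweet count can mean two different things. The row can be a retweet itself (text starting with `RT @`), in which case, as far as I understand the API, the count belongs to the original tweet; that would explain the same 615 showing up on several rows. Or it can be an original tweet that other people retweeted. A sketch of splitting the two, on made-up rows:

```python
import pandas as pd

# made-up rows shaped like rtDF
demo = pd.DataFrame({
    'tweetText': ['RT @GreysABC: The power pose.', 'Loved tonight!',
                  'RT @fan: wow', 'My own popular tweet'],
    'tweetRetweetCt': [615, 0, 10, 4],
})

is_retweet = demo['tweetText'].str.startswith('RT @')

retweets = demo[is_retweet]                                         # rows that are retweets
originals_with_rts = demo[~is_retweet & (demo['tweetRetweetCt'] > 0)]  # originals that got retweeted

print(len(retweets), len(originals_with_rts))
```

Matching on the text prefix is a rough heuristic; it would misfire on a tweet that someone typed starting with "RT @" by hand.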

In [22]:
# it looked like the same tweet was appearing multiple times, let's see
# what the top 10 tweets are in the set based on their text!
rtDF['tweetText'].value_counts()[:10]

RT @ltsGREYSquotes: #GreysAnatomy https://t.co/uHs7Ajardg                                                                                       191
RT @GreysABC: Eventually you have a new normal. #GreysAnatomy https://t.co/10Chnvapa2                                                            95
RT @GreysABrasil: A Mid season finale de #GreysAnatomy será no episódio 14x08 e irá ao ar no dia 16 de Novembro, e retornará somente em 201…     26
RT @GreysABC: The power pose. #GreysAnatomy https://t.co/0WtwWtPWXW                                                                              24
RT @GreysABrasil: A última dança Mertina... a gente nunca esquece. #GreysAnatomy https://t.co/sphkNmxwWP                                         23
RT @TheJCappers: No #GreysAnatomy today. Back next week. https://t.co/Sv3mTOwqCO                                                                 22
RT @shondarhimes: If you haven't see @gordonejames break down some secrets of #GreysAnatomy...Where u been? http
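
Since the same retweet shows up many times, it can help to collapse the table to one row per distinct text and keep how often each one appeared. A sketch on made-up rows (the column name `timesInSample` is my own):

```python
import pandas as pd

# made-up rows shaped like rtDF
demo = pd.DataFrame({
    'tweetText': ['RT @a: one', 'RT @a: one', 'RT @b: two', 'RT @a: one'],
    'tweetRetweetCt': [615, 615, 10, 615],
})

times_seen = demo['tweetText'].value_counts()

# keep the first row for each distinct text, then attach how often it appeared
unique = demo.drop_duplicates(subset='tweetText').copy()
unique['timesInSample'] = unique['tweetText'].map(times_seen)

print(unique)
```

This keeps the retweet count next to the number of times the tweet appeared in the sample, which are two different measures of popularity.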

In [23]:
# I think I (originally) got a lot of tweets from a Brazil airing of an episode, let's
# look at the locations and see how many tweets are coming from Brazil
# based on the location of the user
brasil = list(set([x for x in rtDF.userLocation if ('Brasil' in x) or ('Brazil' in x)]))
brasil

['Minas Brasil, Belo Horizonte',
 'São Paulo, Brasil',
 'Manaus, Brasil',
 'Florianópolis, Brasil',
 'A Brazilian fan in Capshawland',
 'Itanhaém - São Paulo - Brasil ',
 'Maceió, Brasil',
 'Trindade, Brasil',
 'Brasil/Fiim de Mundo',
 'Campo Grande, Brasil',
 'Goiânia, Brasil',
 'Santa Gertrudes, Brasil',
 'Pernambuco, Brasil',
 'São Sebastião do Caí, Brasil',
 ' Belo Horizonte Brasil ',
 'Tubarão, Brasil',
 'Joinville, Brasil',
 'SP, Brasil',
 'Espírito Santo, Brasil',
 'Porto Alegre, Brasil',
 'Minas Gerais, Brasil',
 'Capão da Canoa, Brasil',
 'Brasil',
 'Guarulhos, Brasil',
 'Rio de Janeiro, Brasil',
 'Araxá, Brasil']

In [24]:
# how many Brazilian locations?
len(brasil)

26

In [25]:
# how many tweets came from Brazil? (that we know location of)
count = 0

for index, row in rtDF.iterrows():
    if row.userLocation in brasil:
        count += 1

count

39
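
The list-then-loop count above can also be done in one step with a boolean mask. This sketch uses a made-up Series rather than the real `userLocation` column. Two things to note: the match is a case-sensitive substring match, so "Brazilian" counts too (same as the `in` check above), and `na=False` treats missing locations as non-matches instead of raising an error.

```python
import pandas as pd

# made-up locations; None plays the role of a missing value
loc = pd.Series(['São Paulo, Brasil', '', 'Columbus, OH',
                 'A Brazilian fan', None, 'Brasil'])

# True where the location text mentions Brasil or Brazil
mask = loc.str.contains('Brasil|Brazil', na=False)

brasil_count = int(mask.sum())
print(brasil_count)
```

On the real data this would be `rtDF['userLocation'].str.contains('Brasil|Brazil', na=False).sum()`, which I would expect to match the loop's count.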