Application
=======
![twitter is an application](figs/application.png)

Interface
=====
![twitter has an interface](figs/interface.png)

Which can be accessed using a programming language
=====================================================
![twitter api screenshot](figs/twitterapi.png)

The documentation for the Twitter Application Programming Interface (API) can be found at: https://developer.twitter.com/en/docs


There are many ways to talk to twitter using Python, but for this workshop we will start by using the [tweepy](http://www.tweepy.org/) library. If you haven't installed it yet, [open a terminal](https://github.com/GCDigitalFellows/installdri.github.io/blob/master/anaconda.md) and type:
```bash
conda install -c conda-forge tweepy -y
```

What are authentication keys and access tokens?
===============
Just like people need usernames and passwords, so do programs that talk to websites. Twitter uses a protocol called [OAuth authentication](http://tweepy.readthedocs.io/en/v3.6.0/auth_tutorial.html). Manage your keys and tokens at https://apps.twitter.com/

![app management page](figs/register.png)

In [3]:
# import tweepy and my private file with my access credentials
import tweepy

# replace my authentication credentials with yours
import my_tokens
consumer_key = my_tokens.twitter_consumer_key
consumer_secret = my_tokens.twitter_consumer_secret
access_token = my_tokens.twitter_access_token
access_token_secret = my_tokens.twitter_access_token_secret
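
Judging from how it is used above, `my_tokens` is a private Python file that defines four string variables. As one possible alternative, here is a minimal sketch that reads the same four values from environment variables. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_twitter_credentials` are assumptions made for illustration; neither tweepy nor Twitter requires them.

```python
import os

# Hypothetical environment variable names; pick whatever names you like.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any are missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

The returned values would then be used wherever `consumer_key`, `consumer_secret`, `access_token`, and `access_token_secret` appear in this notebook.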

In [2]:
consumer_key

'JGZ88stPTbUE0aS2PQ3hcDDME'

In [4]:
# connect to twitter
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# the API object exposes twitter's endpoints as Python methods (e.g. api.search)
api = tweepy.API(auth)

How does twitter search?
=============

* https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets

* tweepy interface: http://docs.tweepy.org/en/v3.5.0/api.html#API.search

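parameters (as listed in the tweepy documentation; the examples below pass `count`, the name the Twitter search endpoint uses for the number of results per request):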
* q – **required**, the search query string
* lang – Restricts tweets to the given language, given by an ISO 639-1 code.
* locale – Specify the language of the query you are sending. This is intended for language-specific clients and the default should work in the majority of cases.
* rpp – The number of tweets to return per page, up to a max of 100.
* page – The page number (starting at 1) to return, up to a max of roughly 1500 results (based on rpp * page).
* since_id – Returns only statuses with an ID greater than (that is, more recent than) the specified ID.
* geocode – Returns tweets by users located within a given radius of the given latitude/longitude. The location is preferentially taken from the Geotagging API, but will fall back to their Twitter profile. The parameter value is specified by “latitude,longitude,radius”, where radius units must be specified as either “mi” (miles) or “km” (kilometers). Note that you cannot use the near operator via the API to geocode arbitrary locations; however you can use this geocode parameter to search near geocodes directly.
* show_user – When true, prepends “<user>:” to the beginning of the tweet. This is useful for readers that do not display Atom’s author field. The default is false.
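
The `geocode` format is easy to get wrong, so here is a small sketch that builds the “latitude,longitude,radius” string and collects a few of the parameters above in a dictionary. The helper `make_geocode` is hypothetical (it is not part of tweepy), and the coordinates are approximate values chosen only as an example.

```python
def make_geocode(latitude, longitude, radius, unit="mi"):
    """Build a "latitude,longitude,radius" string with a mi or km unit."""
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    return "{},{},{}{}".format(latitude, longitude, radius, unit)


# Approximate coordinates, used here only as an example.
search_params = {
    "q": "#digitalgc",
    "lang": "en",
    "geocode": make_geocode(40.7486, -73.984, 5, "km"),
}
print(search_params["geocode"])  # prints 40.7486,-73.984,5km
```

Assuming an authenticated `api` object, these parameters could then be passed along as keyword arguments, for example `api.search(**search_params)`.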


In [5]:
# let's get tweets with the hashtag "#digitalgc" from the past week
# twitter api only allows searches for the past week
# api.search returns at most 100 tweets per call
digitalgc_tweets = api.search(q="#digitalgc",
                              count=100, lang="en",
                              since="2018-01-01")


In [10]:
# print 1st search result to see what we have
tweet = digitalgc_tweets[0]
tweet 

Status(_api=<tweepy.api.API object at 0x112637c18>, _json={'created_at': 'Wed Mar 14 14:41:36 +0000 2018', 'id': 973932161563848705, 'id_str': '973932161563848705', 'text': "Wish I could go to the podcasting event but I'll be at a conference! Y'all should go and report back. #digitalgc… https://t.co/ljbk1sR8fm", 'truncated': True, 'entities': {'hashtags': [{'text': 'digitalgc', 'indices': [102, 112]}], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/ljbk1sR8fm', 'expanded_url': 'https://twitter.com/i/web/status/973932161563848705', 'display_url': 'twitter.com/i/web/status/9…', 'indices': [114, 137]}]}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 2370577693, 'id_str': '23705776

# How do we just get the fields we're interested in?
Look at the response object, which is documented in the [tweet data dictionary](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object). I'm interested in:
* created_at
* text
* retweet_count
* favorite_count
* user.name
* user.screen_name

In [11]:
# these are attributes of our tweet
print(tweet.created_at)
print(tweet.text)
print(tweet.retweet_count)
print(tweet.favorite_count)
print(tweet.user.name)
print(tweet.user.screen_name)

2018-03-14 14:41:36
Wish I could go to the podcasting event but I'll be at a conference! Y'all should go and report back. #digitalgc… https://t.co/ljbk1sR8fm
0
1
Christina Katopodis
nemersonian
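
Notice that `tweet.created_at` prints as a date and time, while the raw `_json` output earlier stores it as the string `'Wed Mar 14 14:41:36 +0000 2018'`; tweepy appears to convert it for us. As a sketch of what that involves, the snippet below parses the same string by hand and pulls a nested field out of a trimmed copy of the raw data. The dictionary `raw_tweet` holds only a few fields copied from the output above; it is not the full response.

```python
from datetime import datetime

# A trimmed copy of the raw JSON shown in the output above.
raw_tweet = {
    "created_at": "Wed Mar 14 14:41:36 +0000 2018",
    "id": 973932161563848705,
    "entities": {"hashtags": [{"text": "digitalgc", "indices": [102, 112]}]},
}

# Timestamp layout: weekday, month, day, time, UTC offset, year.
created = datetime.strptime(raw_tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")

# Nested fields are reached by chaining keys, much like tweet.user.name above.
hashtags = [tag["text"] for tag in raw_tweet["entities"]["hashtags"]]

print(created.isoformat())  # prints 2018-03-14T14:41:36+00:00
print(hashtags)             # prints ['digitalgc']
```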


In [13]:
# let's extract and store that information from all the tweets
# We're using Python dictionaries so that each value stays labeled with the field it came from
tweet_list = []
for tweet in digitalgc_tweets:
    td = dict()
    td['created'] = tweet.created_at
    td['text'] = tweet.text
    td['retweets'] = tweet.retweet_count
    td['favorites'] = tweet.favorite_count
    td['user'] = tweet.user.name
    tweet_list.append(td)
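
Since the same extraction is repeated later in this notebook, one option is to wrap it in a function. The sketch below does that and tries it on a stand-in object so it runs without a Twitter connection. The name `tweet_to_dict` and the values in `fake_tweet` are made up for illustration; real tweets would come from `api.search` as above.

```python
from types import SimpleNamespace


def tweet_to_dict(tweet):
    """Copy the fields of interest from one tweet into a plain dictionary."""
    return {
        "created": tweet.created_at,
        "text": tweet.text,
        "retweets": tweet.retweet_count,
        "favorites": tweet.favorite_count,
        "user": tweet.user.name,
    }


# Stand-in object with made-up values, so the sketch runs without Twitter.
fake_tweet = SimpleNamespace(
    created_at="2018-03-14 14:41:36",
    text="hello #digitalgc",
    retweet_count=0,
    favorite_count=1,
    user=SimpleNamespace(name="Example User", screen_name="example"),
)

tweet_list = [tweet_to_dict(t) for t in [fake_tweet]]
print(tweet_list[0]["user"])  # prints Example User
```

With real search results, the list comprehension would loop over `digitalgc_tweets` instead of `[fake_tweet]`.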

In [14]:
# let's turn that list into a spreadsheet-like table (a pandas DataFrame)
import pandas as pd

tweets = pd.DataFrame(tweet_list)
tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53 entries, 0 to 52
Data columns (total 5 columns):
created      53 non-null datetime64[ns]
favorites    53 non-null int64
retweets     53 non-null int64
text         53 non-null object
user         53 non-null object
dtypes: datetime64[ns](1), int64(2), object(2)
memory usage: 2.1+ KB


In [15]:
# let's look at the first 5 rows
tweets.head()

|   | created | favorites | retweets | text | user |
|---|---------|-----------|----------|------|------|
| 0 | 2018-03-14 14:41:36 | 1 | 0 | Wish I could go to the podcasting event but I'... | Christina Katopodis |
| 1 | 2018-03-14 13:12:37 | 0 | 3 | RT @nemersonian: Folks introduce themselves at... | Lisa Marie Rhody |
| 2 | 2018-03-14 12:05:03 | 0 | 3 | RT @jojokarlin: Don't miss @psmyth01's new blo... | Christina Katopodis |
| 3 | 2018-03-14 03:30:39 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | GC LAILAC |
| 4 | 2018-03-14 02:41:32 | 0 | 3 | RT @jojokarlin: Don't miss @psmyth01's new blo... | Stephen Zweibel |


In [16]:
# let's find the most retweeted tweets
tweets.sort_values(by="retweets", ascending=False).head()

|    | created | favorites | retweets | text | user |
|----|---------|-----------|----------|------|------|
| 18 | 2018-03-13 19:41:40 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | Humanities Center GC |
| 17 | 2018-03-13 19:45:48 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | Alise Tifentale |
| 15 | 2018-03-13 20:30:14 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | Danica Savonick |
| 14 | 2018-03-13 20:43:47 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | Gerry Martini |
| 13 | 2018-03-13 21:24:57 | 0 | 12 | RT @psmfCUNY: Join us for an interdisciplinary... | GC Art History |


In [None]:
# What if the search had more than 100 results?
# we need a Cursor to page further back through the results (keep in mind that the API also has rate limits)
tweet_list = []
for tweet in tweepy.Cursor(api.search, q="#digitalgc", lang="en", count=100, since="2018-01-01").items():
    td = dict()
    td['created'] = tweet.created_at
    td['text'] = tweet.text
    td['retweets'] = tweet.retweet_count
    td['favorites'] = tweet.favorite_count
    td['user'] = tweet.user.name
    tweet_list.append(td)
    
tweets = pd.DataFrame(tweet_list)
# let's save our results
tweets.to_csv("digitalgc_tweets.csv")
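
To make the idea of paging concrete, here is a toy sketch of the kind of loop a cursor runs on our behalf: request a page, remember the oldest ID seen, then ask only for tweets older than that. `fake_search` and its in-memory list of IDs are invented stand-ins for the real API, and tweepy's actual `Cursor` implementation may differ in its details.

```python
# Pretend archive of 250 tweet IDs, newest (largest ID) first.
ALL_IDS = list(range(250, 0, -1))


def fake_search(count=100, max_id=None):
    """Hypothetical stand-in for a search call: up to `count` IDs <= max_id."""
    matching = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return matching[:count]


def fetch_all(page_size=100):
    """Keep requesting pages until one comes back empty."""
    collected = []
    max_id = None
    while True:
        page = fake_search(count=page_size, max_id=max_id)
        if not page:
            break
        collected.extend(page)
        # Next page: only tweets strictly older than the oldest one seen so far.
        max_id = min(page) - 1
    return collected


results = fetch_all()
print(len(results))  # prints 250
```

With the real API every page is a separate network request that counts against the rate limit, so it can help to cap the total, for example with `.items(500)` instead of `.items()` on the cursor.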

# Now try with a hashtag that interests you

# What if we want results that are older than a week?
The Twitter search API only lets you obtain tweets from the last week, so anything older requires a scraping library. This one is already built and fairly robust:
* https://github.com/jonbakerfish/TweetScraper