# Twitter API

## Lecture Notes

##### Why Twitter?


"While physics and math may tell us how the universe began, they are not much use in predicting human behavior because there are far too many equations to solve."

-Stephen Hawking
    

There are a lot of other great APIs out there as well that you should keep in mind.

[Wikipedia](https://www.mediawiki.org/wiki/API:Main_page)  
[Stack exchange](https://api.stackexchange.com/docs)  
[Spotify](http://spotipy.readthedocs.io/en/latest/)

Often there's already a Python package with a much easier interface for using the API, like spotipy, linked above.

#### REST API vs Streaming API

REST:

- Query user accounts using OAuth
- Allows you to access 'historical' tweets
- Only lets you access tweets going back 7 days

STREAM:

- Essentially a long-running request (left open), authenticated with OAuth
- Access a real-time stream of data
       
Check out the [Twitter developer documentation](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets) for more info.

#### REST API

In [1]:
from __future__ import print_function
import pandas as pd

In [2]:
#!pip install requests_oauthlib

In [3]:
import requests
from requests_oauthlib import OAuth1

# OAuth ~ a simple way to publish & interact with data

In [4]:
#!pip install cnfg

In [5]:
# Importing our Config - note that with this setup we can safely conceal
# our personal keys but still share generalizable code

# Replace the path below with the correct path to your config file
import cnfg
config = cnfg.load("/home/ubuntu/Documents/.twitter_config")

oauth = OAuth1(config["consumer_key"],
               config["consumer_secret"],
               config["access_token"],
               config["access_token_secret"])
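
If you would rather not depend on `cnfg`, the same "keys live outside the notebook" idea can be sketched with the standard library. This assumes the config file is a JSON document holding the four keys used above; `load_twitter_config` and `REQUIRED_KEYS` are hypothetical names for illustration, not part of `cnfg` or Twitter's API.

```python
import json

# The four credentials the OAuth1 call above expects.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def load_twitter_config(path):
    """Load credentials from a JSON file and fail early if any are missing.

    Hypothetical helper: assumes the file is plain JSON with the four keys above.
    """
    with open(path) as fh:
        config = json.load(fh)

    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError("missing Twitter credentials: " + ", ".join(missing))
    return config


# Usage (path is a placeholder; point it at your own file):
# config = load_twitter_config("/path/to/.twitter_config")
```

Failing early on a missing key tends to give a clearer message than the authentication error you would otherwise get back from the API.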

In [6]:
response = requests.get("https://api.twitter.com/1.1/statuses/user_timeline.json",
                        auth=oauth)

tweets = response.json()

for key in tweets[0].keys():
    print(key)
    

created_at
id
id_str
text
truncated
entities
extended_entities
source
in_reply_to_status_id
in_reply_to_status_id_str
in_reply_to_user_id
in_reply_to_user_id_str
in_reply_to_screen_name
user
geo
coordinates
place
contributors
is_quote_status
retweet_count
favorite_count
favorited
retweeted
possibly_sensitive
lang


In [7]:
# I only have a handful of tweets (I'm not really a Twitter user)
for tweet in tweets:
    print(tweet['favorite_count'])
    print(tweet['text'])

0
https://t.co/RxGwfEhT2c
0
@TheVeganSociety No more choosing phobia.
0
That's exact my position regarding self-driving car/automation. That's why I became so passionate about education.… https://t.co/WtrrN0IQPG
0
AI may be coming for your job, but Andrew Ng thinks a new New Deal can help - via @techreview https://t.co/7La9yRwJux
0
RT @TriMyData: Here is your #MakeoverMonday recap for week 32.
Well done everyone on impactful vizzes and #MMVizReview!

https://t.co/SPa1t…
0
RT @drfeifei: Finally! Very excited! Democratize #AI and disseminate AI knowledge! https://t.co/Vn1VSeD6lM
0
I just published “Rejection Hurts But Pain Keeps Us Alive” https://t.co/3YJSvVT75t
3
@codechick1 Hey Kristine, I read you posts on Quora on @holbertonschool. I want to enroll, but feel insecure to quit my well paid job.
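
Each tweet in the response is a plain dictionary, so ordinary Python is enough to reshape it. Below is a minimal sketch that parses the `created_at` string into a `datetime` and ranks tweets by `favorite_count`. The helper names and the sample tweets are made up for illustration; the time format is inferred from strings such as `'Sun Feb 25 01:32:11 +0000 2018'` that the API returns.

```python
from datetime import datetime

# Format of the v1.1 'created_at' field, e.g. 'Sun Feb 25 01:32:11 +0000 2018'.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Turn Twitter's 'created_at' string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


def summarize(tweet):
    """Keep only a few fields from a raw tweet dictionary."""
    return {
        "id": tweet["id"],
        "created_at": parse_created_at(tweet["created_at"]),
        "favorite_count": tweet["favorite_count"],
        "text": tweet["text"],
    }


def most_favorited(tweets, n=3):
    """Return the n tweets with the highest favorite_count, highest first."""
    rows = [summarize(tweet) for tweet in tweets]
    return sorted(rows, key=lambda row: row["favorite_count"], reverse=True)[:n]


# Made-up stand-ins for response.json(); real tweets carry many more keys.
sample = [
    {"id": 1, "created_at": "Sun Feb 25 01:32:11 +0000 2018",
     "favorite_count": 0, "text": "first"},
    {"id": 2, "created_at": "Sun Feb 25 01:32:09 +0000 2018",
     "favorite_count": 3, "text": "second"},
]
for row in most_favorited(sample, n=1):
    print(row["id"], row["favorite_count"], row["created_at"].isoformat())
```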


Now let's use the Twitter API to search for tweets about our favorite topic. (To limit the results to 5, add `"count": 5` to the parameters.)

In [13]:
parameters = {"q": "vegan"}
response = requests.get("https://api.twitter.com/1.1/search/tweets.json",
                        params = parameters,
                        auth=oauth)

from pprint import pprint
pprint(response.json()['search_metadata'])

TypeError: request() got an unexpected keyword argument 'tweet_mode'
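
A `TypeError` like the one above is what `requests` raises when a Twitter option such as `tweet_mode` is passed as a keyword argument to `requests.get()` instead of inside the `params` dictionary. The sketch below shows one way to assemble the query options in a single place; `build_search_params` is a hypothetical helper, and `count`, `tweet_mode`, and `lang` are the standard search parameters as I understand them from Twitter's documentation.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_params(query, count=5, extended=True, lang=None):
    """Collect Twitter search options into one dict for the `params` argument."""
    params = {"q": query, "count": count}
    if extended:
        # Ask for untruncated tweets; the text then arrives under 'full_text'.
        params["tweet_mode"] = "extended"
    if lang:
        params["lang"] = lang
    return params


params = build_search_params("vegan")
# Show the query string that requests would build from these parameters.
print(SEARCH_URL + "?" + urlencode(params))

# Usage with the objects defined earlier in the notebook:
# response = requests.get(SEARCH_URL, params=params, auth=oauth)
```

Everything Twitter-specific goes in `params`; only `requests` options such as `auth` or `timeout` are passed as keyword arguments.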

In [14]:
tweets = response.json()['statuses']

print('PAGE 1')
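for tweet in tweets:
    # Tweets with attached media carry an 'extended_entities' field;
    # for everything else, fall back to printing the id and text.
    if 'extended_entities' in tweet:
        print(tweet['extended_entities'])
    else:
        print(tweet['id'], tweet['text'])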

PAGE 1
968686192899313664 @savannimalz 6 months vegan ♥️♥️♥️♥️ best decision I’ve ever made
{'media': [{'id': 967447213705080832, 'id_str': '967447213705080832', 'indices': [36, 59], 'media_url': 'http://pbs.twimg.com/media/DW0QHXfVMAAD6Tt.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DW0QHXfVMAAD6Tt.jpg', 'url': 'https://t.co/ZnpdXl8VVy', 'display_url': 'pic.twitter.com/ZnpdXl8VVy', 'expanded_url': 'https://twitter.com/miliondollameat/status/967447219199709184/photo/1', 'type': 'photo', 'sizes': {'medium': {'w': 749, 'h': 768, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 663, 'h': 680, 'resize': 'fit'}, 'large': {'w': 749, 'h': 768, 'resize': 'fit'}}, 'source_status_id': 967447219199709184, 'source_status_id_str': '967447219199709184', 'source_user_id': 395104995, 'source_user_id_str': '395104995'}]}
968686135428837376 @todd_c_sq #MeToo - except I have kids 🤓and helping all beings is what makes life worth living 💕 kick ass vegan food helps 

In [10]:
search_url = "https://api.twitter.com/1.1/search/tweets.json"
next_page_url = search_url + response.json()['search_metadata']['next_results']

response = requests.get(next_page_url, auth=oauth)

print('PAGE 2')
for tweet in response.json()['statuses']:
    print(tweet['text'])

PAGE 2
RT @kalenminaj: BREAKING NEWS: Mariah Carey has officially become a gas, making her the skinniest person alive. She says she achieved this…
RT @Notherepauline: Lisa :

- Vegan / activiste / féministe / ecolo
- Vote pour Mélenchon 
- Un peu chiante 
- Son fond d'écran c'est le da…
RT @Lovelypuppies_: My first day of work! 🐶⛱️🚣‍♀️
#Puppies #dogs #PetsAreFamily #lovedogs #labradorretriever #RescueDogs #LoveYourPetDay #V…
RT @Lovelypuppies_: My first day of work! 🐶⛱️🚣‍♀️
#Puppies #dogs #PetsAreFamily #lovedogs #labradorretriever #RescueDogs #LoveYourPetDay #V…
RT @isobelwenham: u know what, if you don't have the willpower to go vegan or even veggie, fine, fairs but if ur really gunna complain abou…
@1andonlyfran Lmfao literally as I wrote it I was like #notspons
RT @MarlowFM: Don't miss #London2012 Olympic Champion @EtienneStott on @MarlowFM tomorrow (Monday) 7pm-9pm on the Watt Next Show with Dave…
Please donate to 501(c)3 @GoVeganRadio - it will  help toward creating a reggaeVEGAN
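
The page 2 request generalizes to a loop: keep following `next_results` until it is missing or a page limit is reached. Here is a minimal sketch of that idea. `iter_pages` and `fetch_json` are hypothetical names; `fetch_json` stands for any callable that takes a URL and returns the decoded JSON, which keeps the loop independent of `requests`.

```python
def iter_pages(fetch_json, search_url, first_query, max_pages=5):
    """Yield the 'statuses' list of each result page, following 'next_results'.

    fetch_json: callable taking a full URL and returning the decoded JSON dict.
    first_query: the initial query string, e.g. '?q=vegan'.
    """
    query = first_query
    for _ in range(max_pages):
        payload = fetch_json(search_url + query)
        yield payload.get("statuses", [])

        # 'next_results' is a ready-made query string; it is absent on the last page.
        query = payload.get("search_metadata", {}).get("next_results")
        if not query:
            break


# Usage with the notebook's objects (not run here):
# fetch = lambda url: requests.get(url, auth=oauth).json()
# for number, statuses in enumerate(iter_pages(fetch, search_url, "?q=vegan"), start=1):
#     print('PAGE', number, len(statuses))
```

Capping the loop with `max_pages` is a design choice to stay well inside the rate limit while experimenting.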

#### STREAMING API ~ TWEEPY

In [14]:
!pip install tweepy

Collecting tweepy
  Downloading tweepy-3.5.0-py2.py3-none-any.whl
Installing collected packages: tweepy
Successfully installed tweepy-3.5.0


In [26]:
import tweepy

auth = tweepy.OAuthHandler(config["consumer_key"],
                           config["consumer_secret"])
auth.set_access_token(config["access_token"],
                      config["access_token_secret"])

api=tweepy.API(auth)

In [27]:
max_tweets = 1

# Tweepy's Cursor handles pagination for us

for tweet in tweepy.Cursor(api.search, q="vegan").items(max_tweets):
    print(tweet)

Status(_api=<tweepy.api.API object at 0x7f6d4679d908>, _json={'created_at': 'Sun Feb 25 01:32:11 +0000 2018', 'id': 967572902500433920, 'id_str': '967572902500433920', 'text': "Me: let's have lunch at a vegan spot.\nBro: alright I'm down.\n*Post lunch*\nBro: let's go to Knott's.\nMe: alright I'm… https://t.co/aTbQPW9iKx", 'truncated': True, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/aTbQPW9iKx', 'expanded_url': 'https://twitter.com/i/web/status/967572902500433920', 'display_url': 'twitter.com/i/web/status/9…', 'indices': [117, 140]}]}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 775054538591997952, 'id_str': '775054538591997952', 'name': 'pris

In [28]:
results=[]

for tweet in tweepy.Cursor(api.search,q="vegan").items(10):
    results.append(tweet)

#### Import tweets into Pandas

In [29]:
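# pandas was imported as pd in the first cell
def structure_results(results):
    """Collect the id, text, timestamp, and place of each tweet into a DataFrame."""
    id_list = [tweet.id for tweet in results]
    data = pd.DataFrame(id_list, columns=['id'])

    # Note: in Python 3, .encode('utf-8') turns each text into bytes,
    # which is why the text column displays as b'...'.
    data["text"] = [tweet.text.encode('utf-8') for tweet in results]
    data["datetime"] = [tweet.created_at for tweet in results]
    data["Location"] = [tweet.place for tweet in results]

    return data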

In [30]:
data=structure_results(results)
data.head()

|   | id | text | datetime | Location |
|---|----|------|----------|----------|
| 0 | 967572902500433920 | `b"Me: let's have lunch at a vegan spot.\nBro: ...` | 2018-02-25 01:32:11 | |
| 1 | 967572901175021569 | `b'I need more vegetarian/vegan friends bc I ha...` | 2018-02-25 01:32:10 | `Place(_api=<tweepy.api.API object at 0x7f6d467...` |
| 2 | 967572895521263616 | `b'RT @miliondollameat: all vegan baby https://...` | 2018-02-25 01:32:09 | |
| 3 | 967572886088187905 | `b'\xf0\x9f\x98\x86\xf0\x9f\x98\x86 from @caile...` | 2018-02-25 01:32:07 | |
| 4 | 967572885761155072 | `b'RT @EllenXiaoting: The TRUTH WILL BLOW YOU A...` | 2018-02-25 01:32:07 | |
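
The text column shows up as bytes and the Location column holds whole `Place` objects, which makes both awkward to analyze. A possible alternative, sketched below, flattens each status into a plain dictionary first. `status_to_row` is a hypothetical helper, the demo objects are made-up stand-ins for tweepy statuses, and the `full_name` attribute on a place is an assumption based on the field of that name in Twitter's place JSON.

```python
from datetime import datetime
from types import SimpleNamespace


def status_to_row(status):
    """Flatten a tweepy-style status into a dict of plain values."""
    place = getattr(status, "place", None)
    return {
        "id": status.id,
        "text": status.text,  # already a str in Python 3, so no .encode() needed
        "datetime": status.created_at,
        # Assumption: place objects expose the 'full_name' field as an attribute.
        "location": place.full_name if place is not None else None,
    }


# Made-up stand-ins for tweepy Status objects, for illustration only.
demo = [
    SimpleNamespace(id=1, text="no place attached",
                    created_at=datetime(2018, 2, 25, 1, 32, 9), place=None),
    SimpleNamespace(id=2, text="place attached",
                    created_at=datetime(2018, 2, 25, 1, 32, 10),
                    place=SimpleNamespace(full_name="Example City, EX")),
]
rows = [status_to_row(status) for status in demo]
print(rows[1]["location"])

# With real results: data = pd.DataFrame([status_to_row(t) for t in results])
```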


For more on tweepy and using it to set up a true stream, check out the [tweepy doc](http://docs.tweepy.org/en/v3.4.0/streaming_how_to.html) as well as this [nice guide from dataquest](https://www.dataquest.io/blog/streaming-data-python/). 

Combining these steps with writing to a mongo database as below, you can set up a live data acquisition pipeline that continually grabs data from Twitter as tweets hit and stores them locally for later static analysis/modeling. We'll show an example at the very bottom of the notebook.

#### Import Tweets into MongoDB

Install mongo locally with brew:    
```
brew update    
brew install mongodb
```
After downloading mongo, we want to create a place for mongo data files to live.  Run:    
```
sudo mkdir -p /data/db
```
Make sure that the /data/db directory has the right permissions:

```
sudo chown $(whoami) /data/db
```

(`whoami` prints your Mac username, which is the owner we want here. `sudo` will prompt for your password.)


Run mongo daemon:
```
mongod
```

(To use the mongo shell directly, run `mongo` in a separate terminal.)

Once the mongo daemon is running, we can create a **database** "example" with a **collection** "datascience" as below. Note that the database and collection only exist once we insert tweets into them, since mongo is lazy by design.

In [32]:
!pip install pymongo

Collecting pymongo
  Downloading pymongo-3.6.0-cp36-cp36m-manylinux1_x86_64.whl (378kB)
    100% |████████████████████████████████| 378kB 2.8MB/s eta 0:00:01
Installing collected packages: pymongo
Successfully installed pymongo-3.6.0


In [33]:
import json
from pymongo import MongoClient


client = MongoClient(port=27017)
db = client.example
tweets = db.datascience

In [35]:
for tweet in results:
    data={}
    data['tweet']=tweet.text.encode('utf-8') 
    data['datetime']=tweet.created_at
    tweets.insert_one(data)

In [36]:
tweets.find_one()

{'_id': ObjectId('5a921430527f160bea578b1c'),
 'datetime': datetime.datetime(2018, 2, 25, 1, 32, 11),
 'tweet': b"Me: let's have lunch at a vegan spot.\nBro: alright I'm down.\n*Post lunch*\nBro: let's go to Knott's.\nMe: alright I'm\xe2\x80\xa6 https://t.co/aTbQPW9iKx"}

In [37]:
tweets

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'example'), 'datascience')

In [38]:
# count() works in the pymongo 3.6 installed above; from 3.7 on, prefer tweets.count_documents({})
tweets.count()

10
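
One thing to watch: every run of the insert loop adds the same tweets again, because mongo assigns a fresh `_id` to each inserted document. A possible way around that, sketched here, is to use the tweet's own id as `_id` and upsert. `status_to_document` and `upsert_statuses` are hypothetical helpers; `collection` is assumed to be a pymongo collection such as `tweets`, and `replace_one(..., upsert=True)` is the pymongo call doing the work.

```python
def status_to_document(status):
    """Build a mongo document keyed by the tweet id."""
    return {
        "_id": status.id,  # reusing the tweet id makes repeated saves idempotent
        "tweet": status.text,
        "datetime": status.created_at,
    }


def upsert_statuses(collection, statuses):
    """Insert new tweets and overwrite ones already stored; return how many were processed."""
    processed = 0
    for status in statuses:
        document = status_to_document(status)
        # upsert=True inserts when no document with this _id exists yet.
        collection.replace_one({"_id": document["_id"]}, document, upsert=True)
        processed += 1
    return processed


# Usage with the notebook's objects (needs mongod running):
# upsert_statuses(tweets, results)
```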

Finally, here is a quick example of pulling it all together into a live data acquisition feed/store. This will run continually and grab new tweets as they hit if it doesn't error, so let's leave it for about a minute and see what we get. You should have mongod still running for this.

If you want to use this for a project, you'll want to add some robustness to the stream listener - check out the tweepy doc for more info.

In [None]:
client = MongoClient(port=27017)
db = client.example_live
livetweets = db.datascience

livetweets.drop() # only if we want to start from scratch

class StreamListener(tweepy.StreamListener):

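    def on_status(self, status):

        # skip retweets: a retweet carries the original tweet in `retweeted_status`
        # (`status.retweeted` only says whether *we* retweeted it)
        if hasattr(status, 'retweeted_status'):
            return

        # store tweet and creation date
        data = {}
        data['tweet'] = status.text
        data['datetime'] = status.created_at

        # insert into db; report failures instead of silently dropping them
        try:
            livetweets.insert_one(data)
        except Exception as exc:
            print('insert failed:', exc)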

stream_listener = StreamListener()
stream = tweepy.Stream(auth=api.auth, listener=stream_listener)
stream.filter(track=['data science'])

In [None]:
livetweets.count()

In [None]:
livetweets.find_one()

Nice!
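
As a starting point for the robustness mentioned above, here is a minimal sketch of a reconnect policy. In tweepy 3.x a listener can define `on_error(self, status_code)` and return `False` to disconnect, for example on status 420 (rate limiting); the code that restarts the stream then needs to decide how long to wait. `reconnect_delay` is a hypothetical helper, and the specific numbers are assumptions loosely based on Twitter's reconnect guidance, not values taken from this notebook.

```python
def reconnect_delay(status_code, attempt):
    """Seconds to wait before reconnect number `attempt` (starting at 0).

    Assumed policy: back off exponentially from 60 seconds when rate limited (420),
    and from 5 seconds, capped at 320, for any other HTTP error.
    """
    if status_code == 420:
        return 60 * 2 ** attempt
    return min(5 * 2 ** attempt, 320)


# Print the waits for the first three attempts under each branch of the policy.
for attempt in range(3):
    print(attempt, reconnect_delay(420, attempt), reconnect_delay(503, attempt))
```

Waiting longer after each rate-limit response matters because reconnecting too eagerly tends to extend the limit rather than clear it.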