# Twitter Demo: Using Twitter API

---
<img src="data/How-to-Scrape-Data-from-Twitter.jpg" style="width: 713px; height: 475px;" />
 *Blog post by [scraping expert](https://scrapingexpert.com/how-to-scrape-data-from-twitter/)* 
 
 
 ### Professor Crystal Chang

This notebook will demonstrate how to use the Twitter API to collect recent tweets from any user.

*Estimated Time: 30 minutes*

---


In [16]:
# Run this cell to set up your notebook
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import zipfile

# Ensure that Pandas shows at least 280 characters in columns, so we can see full tweets
pd.set_option('max_colwidth', 280)

import re

# Downloading Recent Tweets

---

Since we will be looking at Twitter data, we first need to download it from Twitter!

Twitter provides an API for downloading tweet data in large batches.  The `tweepy` package makes it fairly easy to use.

In [17]:
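# Install tweepy first by running `pip install "tweepy<4"` in your terminal.
# This notebook uses the tweepy 3.x API (for example TweepError, which tweepy 4.0 replaced with TweepyException).
# Load more libraries to work with the API
import tweepy
from pathlib import Path
import json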

There are instructions on using `tweepy` [here](http://tweepy.readthedocs.io/en/v3.5.0/getting_started.html), but don't worry too much if it looks intimidating! 

Twitter's API requires authentication keys. To receive these keys, you need to apply for a Twitter developer account.

Follow the instructions below to get your Twitter API keys.  **Read the instructions completely if you want to use Twitter API.**

1. [Create a Twitter account](https://twitter.com).  If you have an existing account, feel free to use it, or create a throw-away account if you do not want to use your personal one.
2. Under account settings, add your phone number to the account.
3. [Create a Twitter developer account](https://dev.twitter.com/resources/signup).  Attach it to your Twitter account.
4. Once you're logged into your developer account, [create an application](https://apps.twitter.com/app/new).  You can call it whatever you want, and you can write any URL when it asks for a web site.  You don't need to provide a callback URL.
5. On the page for that application, find your Consumer Key and Consumer Secret.
6. On the same page, create an Access Token.  Record the resulting Access Token and Access Token Secret.
7. Edit the file [keys.json](keys.json) and replace the placeholders with your keys.  
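Before running anything that contacts Twitter, it can help to confirm that `keys.json` holds all four values from steps 5 and 6. A minimal sketch, assuming the key names used by the code later in this notebook (`consumer_key`, `consumer_secret`, `access_token`, `access_token_secret`); the helper `check_keys` is hypothetical and not part of `tweepy`:

```python
import json

# Key names assumed to match the ones the notebook reads from keys.json
REQUIRED_KEYS = ["consumer_key", "consumer_secret", "access_token", "access_token_secret"]

# Placeholder contents for keys.json; replace each value with your own key
TEMPLATE = {name: "<your " + name + " here>" for name in REQUIRED_KEYS}

def check_keys(keys):
    """Hypothetical helper: return the names of keys that are missing or still placeholders."""
    problems = []
    for name in REQUIRED_KEYS:
        value = keys.get(name, "")
        if not value or value.startswith("<"):
            problems.append(name)
    return problems

# Print the template so it can be pasted into keys.json
print(json.dumps(TEMPLATE, indent=4))
```

After you fill in `keys.json`, loading it with `json.load` and passing the result to `check_keys` should return an empty list.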


**If you decide to follow along with this demo, you can copy and paste this blurb when you reach the "Use case details" page:**

1. I am using Twitter's API to learn more about text analysis -- more specifically sentiment analysis.
2. I only plan to use tweets to understand more about sentiment analysis, and I intend to explore the different polarities of various tweets.
3. No, I do not plan on Tweeting content. 
4. Source code will only be distributed to a class.

## DISCLAIMER !!!!


### Protect your Twitter Keys
<span style="color:red">
If someone has your authentication keys, they can access your Twitter account and post as you! Be careful about where you keep your key information.  
</span>
Typically, you would store sensitive information like this in a separate file and read it programmatically (through code). That way, you can share the rest of your code without sharing your keys. For this demo, all key information will be placed in `keys.json`.


### Be careful about which functions you call!

<span style="color:red">
This API can retweet tweets, follow and unfollow people, and modify your Twitter settings.  Be careful which functions you invoke! </span> For example, you can accidentally retweet a tweet with `retweet` when you really meant to look up its `retweet_count`.
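The difference is that `retweet_count` is just a field in the JSON Twitter returns for each tweet, so reading it changes nothing on Twitter, while `api.retweet(...)` is a method that actually retweets from your account. A minimal sketch of the safe, read-only side, using a made-up tweet dictionary that contains only a few of the real fields (`get_retweet_count` is a hypothetical helper):

```python
# A made-up tweet dictionary with a few of the fields Twitter returns
tweet = {"id": 1, "full_text": "Hello, world!", "retweet_count": 42, "favorite_count": 7}

def get_retweet_count(tweet):
    """Read-only: look up how many times a tweet has been retweeted."""
    return tweet["retweet_count"]

print(get_retweet_count(tweet))  # → 42

# By contrast, api.retweet(tweet["id"]) would retweet it from your account.
# Don't call it unless you mean to!
```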

### Avoid making too many API calls.

<span style="color:red">
Twitter limits developers to a certain rate of requests for data.  If you make too many requests in a short period of time, you'll have to wait a while (around 15 minutes) before you can make more.  </span> 
Follow the code examples carefully and don't rerun cells without thinking. Instead, save the data you've collected to a file and reload it from there.
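The safest habit is to fetch data once, save it, and reload the saved copy on every later run. A minimal sketch of that pattern, where `fetch` stands in for any function that calls the API; the names `load_or_fetch` and `fake_fetch` are hypothetical:

```python
import json
import tempfile
from pathlib import Path

def load_or_fetch(path, fetch):
    """Return data cached at `path`; call `fetch()` (an API call) only if no cache exists."""
    path = Path(path)
    if path.is_file():
        with open(path) as f:
            return json.load(f)
    data = fetch()              # the only place an API request would happen
    with open(path, "w") as f:
        json.dump(data, f)      # save immediately so reruns don't hit the API
    return data

calls = []

def fake_fetch():
    """Stands in for a real API call and records that it was made."""
    calls.append(1)
    return [{"id": 1, "full_text": "hello"}]

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "demo_tweets.json"
    first = load_or_fetch(cache, fake_fetch)   # calls fake_fetch and saves the result
    second = load_or_fetch(cache, fake_fetch)  # reads the saved file instead

print(len(calls))  # → 1
```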


**I currently have my keys set up on a "throw away" account. Do not spam the next 2 code cells!!**

In [21]:
import json
key_file = 'keys.json'
# Load your keys from keys.json (which you should have filled in with your keys in step 7)
with open(key_file) as f:
    keys = json.load(f)

This cell tests your Twitter authentication. If your keys are correct, it runs without errors and prints the Twitter username associated with them.

In [22]:
import tweepy
from tweepy import TweepError  # tweepy 3.x; tweepy 4.x replaced it with tweepy.TweepyException
import logging

try:
    auth = tweepy.OAuthHandler(keys["consumer_key"], keys["consumer_secret"])
    auth.set_access_token(keys["access_token"], keys["access_token_secret"])
    api = tweepy.API(auth)
    print("Your username is:", api.auth.get_username())
except TweepError as e:
    logging.warning("There was a Tweepy error. Double check your API keys and try again.")
    logging.warning(e)

Your username is: PACS190_Demo


Don't worry too much if this looks intimidating! Feel free to look back on this notebook in the future to read what each function does.

In [23]:
def load_keys(path):
    """Loads your Twitter authentication keys from a file on disk.
    
    Args:
        path (str): The path to the key file. 
    
    Returns:
        dict: A dictionary mapping the key names to key values."""
    
    with open(path) as f:
        keys = json.load(f)
    return keys

In [24]:
def download_recent_tweets_by_user(user_account_name, keys_path):
    """Downloads recent tweets by the indicated Twitter user.

    If a file named <user_account_name>_tweets.json already exists,
    the tweets are loaded from it instead of being downloaded again,
    so repeated calls don't use up your API rate limit.

    Args:
        user_account_name (str): The name of the Twitter account
          whose tweets will be downloaded.
        keys_path (str): The path to a JSON file with Twitter
          authentication keys (strings), similar to this:
            {
                "consumer_key": "<your Consumer Key here>",
                "consumer_secret":  "<your Consumer Secret here>",
                "access_token": "<your Access Token here>",
                "access_token_secret": "<your Access Token Secret here>"
            }

    Returns:
        list: A list of Dictionary objects, each representing one tweet."""
    save_path = user_account_name + '_tweets.json'

    # Reuse tweets saved by an earlier run instead of calling the API again
    if Path(save_path).is_file():
        with open(save_path) as f:
            return json.load(f)

    keys = load_keys(keys_path)
    try:
        auth = tweepy.OAuthHandler(keys["consumer_key"], keys["consumer_secret"])
        auth.set_access_token(keys["access_token"], keys["access_token_secret"])
        api = tweepy.API(auth)
        print("Your username is:", api.auth.get_username())
    except TweepError as e:
        logging.warning("There was a Tweepy error. Double check your API keys and try again.")
        logging.warning(e)
        raise  # without a working `api` object, nothing can be downloaded

    # Cursor pages through the user's timeline; tweet_mode='extended'
    # returns the untruncated text in each tweet's 'full_text' field
    return [t._json for t in tweepy.Cursor(api.user_timeline, screen_name=user_account_name,
                                           tweet_mode='extended').items()]
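`tweepy.Cursor` handles paging for you: Twitter's timeline endpoint returns tweets one page at a time, newest first, and the next page is requested with `max_id` set just below the oldest tweet ID seen so far. A rough sketch of that loop, assuming a hypothetical `fetch_page(max_id)` function in place of `api.user_timeline` so it runs offline; it illustrates the idea and is not tweepy's actual implementation:

```python
def paginate(fetch_page):
    """Collect every tweet by following max_id from page to page.

    fetch_page(max_id) is a hypothetical stand-in for api.user_timeline: it
    returns tweet dicts with IDs <= max_id (or the newest tweets when max_id
    is None), newest first, and an empty list when no tweets remain.
    """
    tweets = []
    max_id = None
    while True:
        page = fetch_page(max_id)
        if not page:
            break
        tweets.extend(page)
        # Ask for tweets strictly older than the oldest one we already have
        max_id = min(t["id"] for t in page) - 1
    return tweets


# Offline demo: 10 fake tweets with IDs 10 down to 1, served 3 per page
ALL_IDS = list(range(10, 0, -1))

def fake_fetch_page(max_id, page_size=3):
    ids = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return [{"id": i} for i in ids[:page_size]]

print([t["id"] for t in paginate(fake_fetch_page)])  # → [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```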

In [25]:
def save_tweets(tweets, path):
    """Saves a list of tweets to a file in the local filesystem.
    
    This function makes no guarantee about the format of the saved
    tweets, **except** that calling load_tweets(path) after
    save_tweets(tweets, path) will produce the same list of tweets
    and that only the file at the given path is used to store the
    tweets.

    Args:
        tweets (list): A list of tweet objects (type Dictionary) to
          be saved.
        path (str): The place where the tweets will be saved.

    Returns:
        None"""
    with open(path, "w+") as f:        
        json.dump(tweets, f)

In [26]:
def load_tweets(path):
    """Loads tweets that have previously been saved.
    
    Calling load_tweets(path) after save_tweets(tweets, path)
    will produce the same list of tweets.
    
    Args:
        path (str): The place where the tweets were saved.

    Returns:
        list: A list of Dictionary objects, each representing one tweet."""

    with open(path, "r") as f:
        tweets = json.load(f)
    return tweets


In [27]:
def get_tweets_with_cache(user_account_name, keys_path):
    """Get recent tweets from one user, loading from a disk cache if available.
    
    The first time you call this function, it will download tweets by
    a user.  Subsequent calls will not re-download the tweets; instead
    they'll load the tweets from a save file in your local filesystem.
    All this is done using the functions you defined in the previous cells.
    
    Args:
        user_account_name (str): The Twitter handle of a user, without the @.
        keys_path (str): The path to a JSON keys file in your filesystem.

    Returns:
        list: A list of Dictionary objects, each representing one tweet.
    """
    
    path_name = user_account_name + '_tweets.json'
    
    # Download and save only if this user's tweets are not already on disk
    if not Path(path_name).is_file():
        save_tweets(download_recent_tweets_by_user(user_account_name, keys_path), path_name)
    return load_tweets(path_name)

In [28]:
trump_tweets = get_tweets_with_cache("realdonaldtrump", key_file)
print("Number of tweets downloaded:", len(trump_tweets))

Your username is: PACS190_Demo
Number of tweets downloaded: 3240


In [29]:
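# One row per tweet, one column per field of the tweet JSON
created = pd.DataFrame(trump_tweets)
# Parse Twitter's timestamp strings (e.g. "Mon Oct 08 23:43:34 +0000 2018") into timezone-aware datetimes
created['date'] = pd.to_datetime(created['created_at'], format="%a %b %d %H:%M:%S %z %Y")
# Year and month of each tweet, e.g. "2018-10"
created['month'] = created['date'].dt.strftime('%Y-%m')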

The cell above uses an object we haven't worked with before, called a **DataFrame**. DataFrames organize data in a table and come in handy for:
- comparing many documents at a time
- doing analysis involving **metadata** (data about the data, like when it was produced or where it came from).
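For example, once each tweet is a row, a metadata question such as "which app were these tweets posted from, and how often were they retweeted?" becomes a one-line `groupby`. A minimal sketch with three made-up tweets; the values are illustrative, not real data, and plain app names are used for `source` even though the real field is an HTML link:

```python
import pandas as pd

# Three made-up tweets with a few of the fields Twitter returns
tweets = [
    {"id": 1, "full_text": "First tweet", "source": "Twitter for iPhone", "retweet_count": 10},
    {"id": 2, "full_text": "Second tweet", "source": "Twitter for iPhone", "retweet_count": 30},
    {"id": 3, "full_text": "Third tweet", "source": "Twitter Web Client", "retweet_count": 5},
]

df = pd.DataFrame(tweets)  # one row per tweet, one column per field

# Metadata question: tweets per app, and their average retweet count
summary = df.groupby("source")["retweet_count"].agg(["count", "mean"])
print(summary)
```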

Use `list` to see what attributes (columns) this dataframe has!

In [30]:
list(created)

['contributors',
 'coordinates',
 'created_at',
 'display_text_range',
 'entities',
 'extended_entities',
 'favorite_count',
 'favorited',
 'full_text',
 'geo',
 'id',
 'id_str',
 'in_reply_to_screen_name',
 'in_reply_to_status_id',
 'in_reply_to_status_id_str',
 'in_reply_to_user_id',
 'in_reply_to_user_id_str',
 'is_quote_status',
 'lang',
 'place',
 'possibly_sensitive',
 'quoted_status',
 'quoted_status_id',
 'quoted_status_id_str',
 'quoted_status_permalink',
 'retweet_count',
 'retweeted',
 'retweeted_status',
 'source',
 'truncated',
 'user',
 'date',
 'month']

You can select certain columns that you wish to work with without looking at the entire dataframe.
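Here is the same idea on a small made-up DataFrame. Two details can help avoid surprises: `.copy()` makes the selection an independent DataFrame, so adding columns to it later doesn't trigger pandas' `SettingWithCopyWarning`, and `rename` changes only the columns you name instead of requiring the full list. A minimal sketch (`created_demo` is a hypothetical stand-in for `created`):

```python
import pandas as pd

# A tiny made-up table standing in for `created`
created_demo = pd.DataFrame({
    "id": [1, 2],
    "full_text": ["hello", "world"],
    "lang": ["en", "en"],
    "retweet_count": [3, 4],
})

# Keep only the columns we need, as an independent copy
subset = created_demo[["id", "retweet_count", "full_text"]].copy()

# Rename a single column without retyping the others
subset = subset.rename(columns={"full_text": "text"})

subset["retweets_x2"] = subset["retweet_count"] * 2  # adding a column to the copy is safe
print(list(subset))  # → ['id', 'retweet_count', 'text', 'retweets_x2']
```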

In [31]:
trump_df = created[['id', 'retweet_count', 'source', 'full_text', 'date']].copy()  # copy, so adding columns later doesn't modify `created`
trump_df.columns = ['id', 'retweet_count', 'source', 'text', 'date']  # you can also rename columns
trump_df.head()  # take a peek at the first 5 rows

| | id | retweet_count | source | text | date |
|---|---|---|---|---|---|
| 0 | 1049473255151755264 | 9898 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | https://t.co/4ySIkmfllE | 2018-10-09 01:34:56+00:00 |
| 1 | 1049445228694962176 | 11000 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | https://t.co/k2bOxapRtR | 2018-10-08 23:43:34+00:00 |
| 2 | 1049385141557030912 | 8549 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | Great to see @AGPamBondi launch a cutting-edge statewide school safety APP in Florida today - named by Parkland Survivors. BIG PRIORITY and Florida is getting it done! #FortifyFL | 2018-10-08 19:44:49+00:00 |
| 3 | 1049383326975373312 | 11120 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | Every day, our police officers race into darkened allies, deserted streets, &amp; onto the doorsteps of the most hardened criminals. They see the worst of humanity &amp; they respond with the best of the American Spirit. America’s LEOs have earned the everlasting gratitude of... | 2018-10-08 19:37:36+00:00 |
| 4 | 1049380830395609090 | 12174 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | We thank you. We salute you. We honor you. And we promise you: we will ALWAYS have your BACK – now and FOREVER! #IACP2018 https://t.co/nvUUIuvouj | 2018-10-08 19:27:41+00:00 |


Notice that the values in the `date` column, such as `2018-10-09 01:34:56+00:00`, end with `+00:00`, which indicates that the times are in UTC.
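Those `+00:00` offsets come from Twitter's raw `created_at` strings, which look like `Tue Oct 09 01:34:56 +0000 2018`. A minimal sketch of parsing one such string with an explicit format, where `%z` reads the `+0000` UTC offset:

```python
import pandas as pd

raw = "Tue Oct 09 01:34:56 +0000 2018"  # the format of Twitter's created_at field

# %a weekday, %b month, %d day, %H:%M:%S time, %z UTC offset, %Y year
ts = pd.to_datetime(raw, format="%a %b %d %H:%M:%S %z %Y")
print(ts)  # → 2018-10-09 01:34:56+00:00
```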

We'll convert the tweet times to US Eastern Time, the timezone of New York and Washington D.C., since those are the places we would expect the most tweet activity from Trump.
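The choice of time zone name matters here. pandas accepts both `"EST"`, a fixed UTC-5 offset, and `"America/New_York"`, which switches between EST and EDT with daylight saving time. A minimal sketch on the first tweet's timestamp shows that the two differ by an hour in October:

```python
import pandas as pd

utc = pd.Timestamp("2018-10-09 01:34:56+00:00")  # first tweet in the table, in UTC

fixed_est = utc.tz_convert("EST")              # always UTC-5, ignores daylight saving
new_york = utc.tz_convert("America/New_York")  # follows daylight saving (EDT in October)

print(fixed_est)  # → 2018-10-08 20:34:56-05:00
print(new_york)   # → 2018-10-08 21:34:56-04:00
```

Both values describe the same instant; only the local clock reading differs.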

In [32]:
# Convert from UTC to New York time. "America/New_York" follows daylight saving
# time (EDT, UTC-4, in October), unlike "EST", which is fixed at UTC-5.
# The column keeps the name est_time so the next module can find it.
trump_df = trump_df.assign(est_time=trump_df['date'].dt.tz_convert("America/New_York"))

trump_df = trump_df[['id', 'retweet_count', 'source', 'text', 'est_time']]

trump_df

| | id | retweet_count | source | text | est_time |
|---|---|---|---|---|---|
| 0 | 1049473255151755264 | 9898 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | https://t.co/4ySIkmfllE | 2018-10-08 21:34:56-04:00 |
| 1 | 1049445228694962176 | 11000 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | https://t.co/k2bOxapRtR | 2018-10-08 19:43:34-04:00 |
| 2 | 1049385141557030912 | 8549 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | Great to see @AGPamBondi launch a cutting-edge statewide school safety APP in Florida today - named by Parkland Survivors. BIG PRIORITY and Florida is getting it done! #FortifyFL | 2018-10-08 15:44:49-04:00 |
| 3 | 1049383326975373312 | 11120 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | Every day, our police officers race into darkened allies, deserted streets, &amp; onto the doorsteps of the most hardened criminals. They see the worst of humanity &amp; they respond with the best of the American Spirit. America’s LEOs have earned the everlasting gratitude of... | 2018-10-08 15:37:36-04:00 |
| 4 | 1049380830395609090 | 12174 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | We thank you. We salute you. We honor you. And we promise you: we will ALWAYS have your BACK – now and FOREVER! #IACP2018 https://t.co/nvUUIuvouj | 2018-10-08 15:27:41-04:00 |
| 5 | 1049373138130280449 | 12872 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | America’s police officers have earned the everlasting gratitude of our Nation. In moments of danger &amp; despair you are the reason we never lose hope – because there are men &amp; women in uniform who face down evil &amp; stand for all that is GOOD and JUST and DECENT and R... | 2018-10-08 14:57:07-04:00 |
| 6 | 1049367031433383936 | 7930 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | It was my great honor to address the International Association of Chiefs of Police Annual Convention in Orlando, Florida. Thank you! #IACP2018 #LESM https://t.co/Z0nY5bSNr6 | 2018-10-08 14:32:51-04:00 |
| 7 | 1049321480780099584 | 7397 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | Departing Washington, D.C. for the International Association of Chiefs of Police Annual Convention in Orlando, Florida. Look forward to seeing everyone soon! #IACP2018 https://t.co/EwSd7IU9t1 | 2018-10-08 11:31:51-04:00 |
| 8 | 1049292375330361345 | 17573 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | Christopher Columbus’s spirit of determination &amp; adventure has provided inspiration to generations of Americans. On #ColumbusDay, we honor his remarkable accomplishments as a navigator, &amp; celebrate his voyage into the unknown expanse of the Atlantic Ocean. https://t.c... | 2018-10-08 09:36:11-04:00 |
| 9 | 1049000243168206848 | 18646 | `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` | RT @FLOTUS: Thank you Kenya 🇰🇪 🇺🇸 https://t.co/JrHncob8Qp | 2018-10-07 14:15:22-04:00 |


**Let's save this DataFrame to work on our next module!**

In [33]:
trump_df.to_csv('data/trumptweets.csv', encoding='utf-8', index=False)
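CSV files store timestamps and IDs as plain text, so it is worth checking what comes back when the next module reads the file. A minimal sketch of a round trip on a made-up two-row DataFrame written to a temporary file; it assumes the reader re-parses the time column with `utc=True`, which keeps rows with different UTC offsets (EST in winter, EDT in summer) in a single datetime column:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up rows with different UTC offsets (EDT in October, EST in January)
times = pd.to_datetime(["2018-10-09 01:34:56+00:00", "2018-01-15 14:00:00+00:00"], utc=True)
demo = pd.DataFrame({
    "id": [1049473255151755264, 1049445228694962176],
    "text": ["first", "second"],
    "est_time": times.tz_convert("America/New_York"),
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_tweets.csv"
    demo.to_csv(path, encoding="utf-8", index=False)
    restored = pd.read_csv(path, encoding="utf-8")

# Timestamps come back as strings; parse them as UTC, then convert back to New York time
restored["est_time"] = pd.to_datetime(restored["est_time"], utc=True).dt.tz_convert("America/New_York")

print(restored.dtypes)
```

The 19-digit tweet IDs still fit in a 64-bit integer, so `read_csv` returns them unchanged.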

Notebook developed by: Tina Nguyen

Data Science Modules: http://data.berkeley.edu/education/modules