# Twitter Example

This notebook contains a heavily-commented minimal example of how to access Twitter using API keys and pull all tweets relating to a given search term.

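The library used to interact with Twitter is [Tweepy](http://docs.tweepy.org/en/v3.5.0/index.html). This notebook explains only how to get tweets for a single search term over the past seven days; if you want to do something more complicated, such as continuously updating a file as new tweets arrive in real time, the documentation linked above is the best place to look. The code here uses the Tweepy 3.x interface described in that documentation; later major versions renamed or removed some of the calls used below (for example `API.search` and `wait_on_rate_limit_notify`).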

### 1. External setup

1. Install Tweepy by running `conda install -c conda-forge tweepy` in the Anaconda Prompt

2. Create a [Twitter](https://twitter.com) account and verify it with a phone number

3. Apply for a developer account at [developer.twitter.com](https://developer.twitter.com)

4. Create an "app" on Twitter. This just means that you come up with a name and a justification for having Twitter API keys. "I want to experiment with Twitter's API" has always worked for me as a reason.

5. Give your app both read and write permissions.

### 2. Import libraries

In [1]:
import pandas as pd # Creates & manipulates dataframes
import tweepy # Interacts with Twitter

### 3. Authenticate a Twitter handler

When you create a Twitter app, you will be given a set of API keys. You should have the following:

- API key
- API secret key
- Access token
- Access token secret

In the code cell below, replace the placeholder values with your keys. Your keys should be inside quotation marks, so that Python interprets them as strings.

In [2]:
API_KEY = "YOUR_API_KEY"
API_SECRET = "YOUR_API_SECRET_KEY"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"
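
Keys typed directly into a notebook are easy to leak when the notebook is shared. As one possible alternative, the sketch below reads the four values from environment variables. The variable names (`TWITTER_API_KEY` and so on) and the `load_keys` helper are assumptions made for illustration, not something Twitter or Tweepy requires.

```python
import os

# Hypothetical environment variable names; use whatever names you export.
ENV_NAMES = {
    "API_KEY": "TWITTER_API_KEY",
    "API_SECRET": "TWITTER_API_SECRET",
    "ACCESS_TOKEN": "TWITTER_ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_keys(environ=os.environ):
    """Return the four credentials as a dict, or raise if any are missing."""
    missing = [env for env in ENV_NAMES.values() if not environ.get(env)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[env] for name, env in ENV_NAMES.items()}
```

With the variables set, `keys = load_keys()` followed by `API_KEY = keys["API_KEY"]` (and likewise for the other three) would replace the hard-coded assignments.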

The cell below creates and authenticates an object - `API` - that can be used to interact with Twitter.

In [3]:
# Create an authentication handler
auth = tweepy.OAuthHandler(API_KEY, API_SECRET)

# Feed the handler all the necessary information
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# Use the handler to authenticate an API object
API = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

In the cell above, `wait_on_rate_limit` tells the `API` object not to make too many requests in a short space of time; this stops Twitter from flagging your program as harmful. Whenever `API` hits the request limit, it will stop making requests until the request counter has reset. `wait_on_rate_limit_notify` makes it print a message while it is waiting, so a long pause is not mistaken for a crash.
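
To make the waiting behaviour concrete, here is a minimal sketch of the underlying arithmetic. It is an illustration only, not Tweepy's actual implementation; `seconds_to_wait` and its parameters are hypothetical names, and it assumes the client knows how many requests remain and the Unix time at which the counter resets.

```python
import time


def seconds_to_wait(remaining, reset_epoch, now=None):
    """Return how long to pause before the next request.

    remaining   -- requests left in the current rate-limit window
    reset_epoch -- Unix time at which the window resets
    now         -- current Unix time (defaults to the system clock)
    """
    if now is None:
        now = time.time()
    # Requests are still available, so no pause is needed
    if remaining > 0:
        return 0.0
    # Otherwise wait until the reset time (never a negative duration)
    return max(0.0, reset_epoch - now)
```

For example, with no requests left and a reset 100 seconds away, `seconds_to_wait(0, 1000, now=900)` gives a 100-second pause.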

### 4. Access tweets relating to a specific search

Using the free API, you can only access Tweets from the last seven days.
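
The search call in this section takes its date range as `since` and `until` strings in `YYYY-MM-DD` form. Rather than typing those by hand, they could be computed from today's date, as in the sketch below. `search_window` is a hypothetical helper, and the code assumes `until` is exclusive (it matches tweets created before that date), which is why it is set to tomorrow.

```python
from datetime import date, timedelta


def search_window(today=None, days=7):
    """Return (since, until) as 'YYYY-MM-DD' strings covering the last `days` days."""
    if today is None:
        today = date.today()
    since = today - timedelta(days=days)
    # Assumption: `until` is exclusive, so use tomorrow to include today's tweets
    until = today + timedelta(days=1)
    return since.isoformat(), until.isoformat()
```

The two strings can then be passed as the `since` and `until` arguments of the search.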

In [None]:
# Make a holder to store tweets
tweets = []

# Create a string to search for
query = '#bitcoin OR #btc'

# Get English-language tweets matching the query between the two dates
# (the free API only reaches back about seven days from today)
for status in tweepy.Cursor(API.search, q=query, lang="en",
                            since="2019-08-01", until="2019-08-10").items():
    
    # Add each tweet to the holder
    tweets.append(status)

The cell above shows how to access Tweets using a cursor object. Essentially, a cursor allows you to page through search results on Twitter, getting one tweet at a time until there are no more to show.

The loop takes each tweet in turn as `status` and adds it to the holder, `tweets`.
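
The paging idea can be sketched without touching Twitter at all. The generator below is a simplified illustration of the concept, not Tweepy's own code: `iterate_items`, `fake_fetch` and `fake_pages` are invented names, and the "pages" are made-up strings standing in for tweets.

```python
def iterate_items(fetch_page):
    """Yield items one at a time from a paged source.

    fetch_page(page_number) is a hypothetical function returning a list of
    results for that page, or an empty list when nothing is left.
    """
    page_number = 0
    while True:
        page = fetch_page(page_number)
        if not page:
            return  # no more pages, so stop iterating
        for item in page:
            yield item
        page_number += 1


# Stand-in data: three "pages" of fake tweet texts
fake_pages = [["tweet 1", "tweet 2"], ["tweet 3", "tweet 4"], ["tweet 5"]]


def fake_fetch(page_number):
    """Return the requested fake page, or an empty list past the end."""
    return fake_pages[page_number] if page_number < len(fake_pages) else []


collected = list(iterate_items(fake_fetch))
```

The caller sees a flat stream of items and never handles page boundaries, which is the convenience the cursor provides in the cell above. When experimenting, passing a number to `.items()`, for example `.items(200)`, caps how many tweets are fetched.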

In [5]:
# See how many tweets there are
len(tweets)

30904

### 5. Store the Tweets in a dataframe

Now we need to store our tweets in a dataframe, extracting key information into columns.

Currently, all tweets are stored as Tweepy Status objects, containing not just the text but a huge number of values.

[An example of the Tweepy status object](https://gist.github.com/dev-techmoe/ef676cdd03ac47ac503e856282077bf2)

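For the purposes of this example, only nine values are going to be extracted per tweet, but the same structure could easily be adapted to get more or different information.

The nine values to be extracted per tweet are as follows:

- `created_at` - the date & time the tweet was posted
- `text` - the text of the tweet
- `retweet_count` - the number of times the tweet was retweeted
- `favorite_count` - the number of "favourites" a tweet has (the attribute uses the American spelling)
- `user.verified` - whether the author's account is verified
- `user.followers_count` - the number of followers the author has
- `user.id_str` - the author's user ID, as a string
- `entities['hashtags']` - the hashtags used in the tweet
- `user.location` - the location given in the author's profile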
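
As a rough sketch of the extraction step, the function below maps one status to one row, using the same attributes as the code cell that fills the dataframe. `status_to_row` is a hypothetical helper, and `fake_status` is an invented stand-in built with `SimpleNamespace` so the example runs without Twitter access; it is not real data.

```python
from types import SimpleNamespace


def status_to_row(status):
    """Map one status-like object to a dict keyed by dataframe column name."""
    return {
        "created": status.created_at,
        "text": status.text,
        "retweets": status.retweet_count,
        "favourites": status.favorite_count,
        "verified": status.user.verified,
        "followers": status.user.followers_count,
        "user_id": status.user.id_str,
        "hashtags": status.entities["hashtags"],
        "user_location": status.user.location,
    }


# Invented stand-in for a Tweepy Status object (not real data)
fake_status = SimpleNamespace(
    created_at="2019-08-10 12:00:00",
    text="Example tweet #bitcoin",
    retweet_count=3,
    favorite_count=1,
    user=SimpleNamespace(verified=False, followers_count=42,
                         id_str="123", location="Nowhere"),
    entities={"hashtags": [{"text": "bitcoin", "indices": [14, 22]}]},
)

row = status_to_row(fake_status)
```

With real data, `pd.DataFrame([status_to_row(s) for s in tweets])` would build the whole dataframe in one step, which tends to be much faster than adding rows one at a time.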

In [6]:
# Make an empty dataframe with column headings

df = pd.DataFrame(columns=["created","text","retweets","favourites","verified",
                           "followers","user_id","hashtags","user_location"])

# Inspect the empty dataframe
df.head()

|  | created | text | retweets | favourites | verified | followers | user_id | hashtags | user_location |
|---|---|---|---|---|---|---|---|---|---|


In [7]:
# Fill the dataframe with tweets by looping through each tweet and
# attaching its information as a new row

# For each tweet in the holder, along with its position i,
for i, status in enumerate(tweets):

    # Make a new row in df with the desired values from status
    df.loc[i] = [status.created_at, status.text,
                 status.retweet_count, status.favorite_count,
                 status.user.verified, status.user.followers_count,
                 status.user.id_str,
                 status.entities['hashtags'], status.user.location]

In [9]:
# Check the dataframe
df.sample(10)

|  | created | text | retweets | favourites | verified | followers | user_id | hashtags | user_location |
|---|---|---|---|---|---|---|---|---|---|
| 1447 | 2019-08-11 12:16:39 | `RT @Cryptofhm: 🚨 2000 $WIN #crypto #win #Givea...` | 202 | 0 | False | 86 | 1125496958892769280 | `[{'text': 'crypto', 'indices': [27, 34]}, {'te...` |  |
| 11486 | 2019-08-10 23:55:22 | `RT @coincloudATM: Are you ready for our next h...` | 235 | 0 | False | 115 | 3233045975 | `[{'text': 'BITCOIN', 'indices': [50, 58]}]` | Lost somewhere |
| 7034 | 2019-08-11 06:15:16 | `$BTC 💵 price: $11386.31 1.00000BTC \n1h: +0.05...` | 0 | 0 | False | 1323 | 910085909936246785 | `[{'text': 'Bitcoin', 'indices': [80, 88]}]` | Earth, Blockchain |
| 14059 | 2019-08-10 20:49:38 | `Light Canvas Style $XRP Bag\n.\nFor Sale @ htt...` | 0 | 2 | False | 326 | 4330729395 | `[{'text': 'xrpthestandard', 'indices': [67, 82...` | Canada |
| 20486 | 2019-08-10 14:59:23 | `RT @FinWhaleX: FinWhaleX platform is a real go...` | 282 | 0 | False | 14450 | 860082193317601283 | `[]` | Việt Nam |
| 6170 | 2019-08-11 07:21:04 | `RT @CryptoJeans: 🏆MEGA #giveaway🏆 \n\n💸100$ in...` | 207 | 0 | False | 4427 | 352350305 | `[{'text': 'giveaway', 'indices': [23, 32]}]` | Jaipur/Bangalore/Delhi |
| 29366 | 2019-08-10 07:04:15 | `#Crypto #News: "New Jersey Governor Signs Bill...` | 0 | 0 | False | 1449 | 3351117989 | `[{'text': 'Crypto', 'indices': [0, 7]}, {'text...` | World Wide Web |
| 26603 | 2019-08-10 09:53:02 | `RT @ante_hero: Bankroll Weekly Giveaway 7\n\n3...` | 37 | 0 | False | 559 | 1549890307 | `[]` | Uruagu nnewi,anambra state |
| 14327 | 2019-08-10 20:31:04 | `Decentralized Crypto Token (DCTO) is airdroppi...` | 1 | 1 | False | 991 | 925408907513745408 | `[]` |  |
| 10620 | 2019-08-11 01:04:54 | `RT @bitcoinization: Where am I?\n\nFirst to ge...` | 1 | 0 | False | 783 | 931492393 | `[{'text': 'WhereAmI', 'indices': [102, 111]}, ...` |  |


In [10]:
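# Count tweets per day of the month as a quick check of the date range
test = df.copy()

# Convert the created column to datetimes and keep only the day number
test['day'] = pd.to_datetime(test['created']).dt.day
test['day'].value_counts()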

10    19467
11    11437
Name: day, dtype: int64

### 6. Save the data as a .csv

Due to rate limits and the speed at which Tweets can be downloaded from Twitter, it's probably best to keep the code which accesses Twitter separate from the code that analyses Twitter data; that way, if you need to run the analysis sections multiple times, you don't have to wait for the data each time.

Because of this, the final section of this notebook simply stores the dataframe as a `.csv` file, allowing it to be loaded into other notebooks. 

In [16]:
# Save the whole dataframe to the same folder as the notebook file
df.to_csv("bitcoin_or_crypto.csv")
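
One thing to expect when the file is loaded elsewhere: the `hashtags` column held Python lists, and a `.csv` stores each of them as text, so it comes back as a string. A possible way to recover the hashtag words is sketched below using `ast.literal_eval` from the standard library. `hashtag_texts` is a hypothetical helper, and the example string is copied from the sample output earlier in this notebook.

```python
import ast


def hashtag_texts(cell):
    """Turn the stored string form of a hashtags list back into hashtag texts."""
    # literal_eval safely parses Python literals (lists, dicts, strings, numbers)
    entries = ast.literal_eval(cell)
    return [entry["text"] for entry in entries]


# A hashtags value as it would appear in the saved file
example = "[{'text': 'BITCOIN', 'indices': [50, 58]}]"
tags = hashtag_texts(example)
```

In an analysis notebook this might be used as `df = pd.read_csv("bitcoin_or_crypto.csv", index_col=0)` followed by `df["hashtags"].apply(hashtag_texts)`.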