## Collecting tweets using the Twitter API


In this section we are going to see how to connect to the Twitter API to collect tweets and save them.

"In computer programming, an **Application Programming Interface (API)** is a set of subroutine definitions, protocols, and tools for building application software." [wikipedia](https://en.wikipedia.org/wiki/Application_programming_interface)

The Twitter API is the tool we use to collect tweets from Twitter.

Twitter offers two different APIs:
- The Streaming API (https://dev.twitter.com/streaming/public), which gives access to a sample (~1%) of the public data flowing through Twitter.

- The REST API (https://dev.twitter.com/rest/public), which provides programmatic access to read and write Twitter data.

To use the Twitter API from Python, we will use the library [tweepy](http://www.tweepy.org/), which simplifies access to the API.

To install it, run the following command in your terminal or execute the cell below:
```
pip install tweepy
```



In [None]:
# this will install tweepy on your machine
!pip install tweepy

Create a Twitter app and find your consumer token and secret

1. go to https://apps.twitter.com/
2. click `Create New App`
3. fill in the details
4. click on `manage keys and access tokens`
5. click `create my access token`
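6. create a new file in the lesson's folder named `keys.json` and copy your *Consumer Key (API Key)*, *Consumer Secret (API Secret)*, *Access Token* and *Access Token Secret* into it under the names `consumer_key`, `consumer_secret`, `access_token` and `access_token_secret`, which are the names read by the next cell: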
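
The authentication code in this lesson reads four entries from `keys.json`. As a minimal sketch, assuming those four key names and using placeholder values only (never real credentials), the expected layout can be printed and a loaded file checked as follows. The helper names `make_template` and `missing_keys` are hypothetical and not part of tweepy.

```python
import json

# Key names read later in this lesson; the values below are placeholders only.
REQUIRED_KEYS = ['consumer_key', 'consumer_secret',
                 'access_token', 'access_token_secret']

def make_template():
    """Return a dict with every required key set to a placeholder string."""
    return {name: 'PASTE_YOUR_' + name.upper() + '_HERE' for name in REQUIRED_KEYS}

def missing_keys(keys):
    """Return the required key names that are absent or empty in `keys`."""
    return [name for name in REQUIRED_KEYS if not keys.get(name)]

# Show the expected layout of keys.json (printed, not written to disk).
print(json.dumps(make_template(), indent=4))
```

Calling `missing_keys(keys)` right after loading the file gives an early, readable hint if one of the entries was forgotten.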

In [None]:
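import json

# load the credentials saved in keys.json
with open('keys.json', 'r') as fopen:
    keys = json.load(fopen)

# print only the names of the entries, so the secrets are not displayed in the notebook
print(list(keys.keys()))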

### Authenticate with the Twitter API


In [None]:
import tweepy

auth = tweepy.OAuthHandler(keys['consumer_key'], keys['consumer_secret'])
auth.set_access_token(keys['access_token'], keys['access_token_secret'])

# create the api object that we will use to interact with Twitter
api = tweepy.API(auth)

In [None]:
# example of posting a tweet (this publishes a status from your account):
tweet = api.update_status('Hey @BovetAlexandre!')

In [None]:
# see all the information contained in a tweet:
print(tweet)

## Collecting tweets from the Streaming API
Source: http://tweepy.readthedocs.io/en/v3.5.0/streaming_how_to.html

### Step 1: Creating a StreamListener

This simple stream listener prints the text of each status. The `on_data` method of Tweepy's `StreamListener` conveniently passes data from statuses to the `on_status` method.
We create a class `MyStreamListener` that inherits from `StreamListener` and overrides `on_status`:

In [None]:
#override tweepy.StreamListener to make it print tweet content when new data arrives
class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        print(status.text)
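
To make the `on_data` to `on_status` hand-off more concrete, here is a simplified, standard-library-only sketch of the dispatch pattern. It is not tweepy's actual implementation: the class `ToyListener` is hypothetical, and the rule "a message with a `text` field is a status" is an assumption made for illustration.

```python
import json

class ToyListener:
    """Simplified stand-in for a stream listener (not tweepy's actual code)."""

    def __init__(self):
        self.seen = []

    def on_data(self, raw_data):
        # Each message from the stream arrives as a JSON string.
        message = json.loads(raw_data)
        # Assumption: only messages carrying a 'text' field are statuses.
        if 'text' in message:
            return self.on_status(message)
        # Other messages (e.g. deletion notices) are ignored here.
        return True

    def on_status(self, status):
        # This is the method a subclass would override.
        self.seen.append(status['text'])
        print(status['text'])
        return True

listener = ToyListener()
listener.on_data('{"text": "hello world"}')            # routed to on_status
listener.on_data('{"delete": {"status": {"id": 1}}}')  # ignored
```

Overriding only `on_status`, as `MyStreamListener` does, means the parsing and routing step stays in the base class.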

### Step 2: Creating a Stream

Using the `api` object we created and the `StreamListener`, we can create a `Stream` object:

In [None]:
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)

### Step 3: Starting a Stream

A number of Twitter streams are available through Tweepy. Most cases will use filter, the user_stream, or the sitestream. For more information on the capabilities and limitations of the different streams, see the [Twitter Streaming API Documentation](https://dev.twitter.com/streaming/overview/request-parameters).

In this example we will use `filter` to stream all tweets containing at least one of a few climate-related keywords, such as `COP24`. The `track` parameter is a list of search terms to stream.

In [None]:
# this will start tracking tweets that contain any of the keywords listed in `track`.
# to stop it, interrupt the kernel.
# try with different keywords
# you have to run the cell below to disconnect the stream before rerunning this one
myStream.filter(track=['#COP24','COP24','#ClimateChange','#ParisAgreement'])

In [None]:
myStream.disconnect()
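
Before starting a stream, it can help to sanity-check the `track` list. The sketch below is a hypothetical helper (`check_track` is not part of tweepy). Its default limits, at most 400 phrases of 1 to 60 bytes each, are what Twitter documented for the standard streaming API; treat them as assumptions and adjust them if your access level differs.

```python
def check_track(terms, max_terms=400, max_bytes=60):
    """Return a list of human-readable problems found in a track list."""
    problems = []
    if len(terms) > max_terms:
        problems.append('too many terms: %d > %d' % (len(terms), max_terms))
    for term in terms:
        size = len(term.encode('utf-8'))
        if ',' in term:
            # A comma separates phrases, so it should not appear inside one.
            problems.append('%r contains a comma; split it into separate terms' % term)
        elif not 1 <= size <= max_bytes:
            problems.append('%r is %d bytes; expected 1 to %d' % (term, size, max_bytes))
    return problems

print(check_track(['#COP24', 'COP24', '#ClimateChange', '#ParisAgreement']))
print(check_track(['realdonaldtrump,trump']))
```

An empty list means no problem was found by these basic checks; it does not guarantee that the API will accept the request.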

In [None]:
# track tweets in English that contain 'realdonaldtrump' or 'trump'
myStream.filter(track=['realdonaldtrump', 'trump'], languages=['en'])

In [None]:
myStream.disconnect()

In [None]:
# streaming tweets from a given location
# we need to provide a list of longitude,latitude pairs specifying a set of bounding boxes,
# with the south-west corner of each box first and the north-east corner second
# for example for New York
myStream.filter(locations=[-74, 40, -73, 41])

In [None]:
myStream.disconnect()
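
Because the order of the four numbers in a bounding box is easy to get wrong, a small helper can make it explicit. This is a minimal sketch: `bounding_box` is a hypothetical function, not part of tweepy, and it assumes boxes that do not cross the 180° meridian.

```python
def bounding_box(west, south, east, north):
    """Return [west, south, east, north] after basic sanity checks."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError('longitudes must be within [-180, 180]')
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError('latitudes must be within [-90, 90]')
    if west >= east or south >= north:
        raise ValueError('the south-west corner must come before the north-east corner')
    return [west, south, east, north]

new_york = bounding_box(-74, 40, -73, 41)
san_francisco = bounding_box(-122.75, 36.8, -121.75, 37.8)

# Several boxes are passed as one flat list, four numbers per box.
both = new_york + san_francisco
print(both)
```

The resulting list can be passed as the `locations` argument, for example `myStream.filter(locations=both)`.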

### Saving the stream to a file
Let's define a new `StreamListener` that will save the collected data to a file.

In [None]:
# override tweepy.StreamListener to make it save data to a file
# and limit the maximum number of tweets we want to collect
class StreamSaver(tweepy.StreamListener):
    def __init__(self, filename, max_num_tweets=2000, api=None):
        self.filename = filename
        self.num_tweets = 0
        self.max_num_tweets = max_num_tweets
        tweepy.StreamListener.__init__(self, api=api)

    def on_data(self, data):
        # append the raw JSON string to the file
        with open(self.filename, 'a') as tf:
            tf.write(data)

        self.num_tweets += 1

        # print progress every 100 tweets
        if self.num_tweets % 100 == 0:
            print(self.num_tweets)

        # returning False stops the stream once the limit is reached
        if self.num_tweets >= self.max_num_tweets:
            return False

    def on_error(self, status_code):
        print(status_code)
        # 420 means we are being rate limited: returning False disconnects the stream
        if status_code == 420:
            return False

In [None]:
# create the new StreamListener and stream object that will save collected tweets to a file
saveStream = StreamSaver(filename='tweets.txt', max_num_tweets=1000)
mySaveStream = tweepy.Stream(auth = api.auth, listener=saveStream)


In [None]:
mySaveStream.filter(track=['#COP24','COP24','#ClimateChange','#ParisAgreement'])
mySaveStream.disconnect()
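
Once tweets are saved, they can be read back one JSON object per line. The sketch below uses only the standard library. The function `read_tweets` is hypothetical, the rule "a tweet has a `text` field" is an assumption, and the demo file contains made-up lines, not real Twitter data.

```python
import json
import os
import tempfile

def read_tweets(filename):
    """Yield one dict per tweet stored as a JSON line in `filename`."""
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            message = json.loads(line)
            # Assumption: real tweets carry a 'text' field; other stream
            # messages (limit or deletion notices) do not.
            if 'text' in message:
                yield message

# Demo on a small made-up file (not real Twitter data).
sample_lines = [
    '{"id": 1, "text": "first example"}',
    '',
    '{"limit": {"track": 5}}',
    '{"id": 2, "text": "second example"}',
]
path = os.path.join(tempfile.mkdtemp(), 'sample_tweets.txt')
with open(path, 'w') as f:
    f.write('\n'.join(sample_lines) + '\n')

texts = [tweet['text'] for tweet in read_tweets(path)]
print(texts)
```

To read the file produced by this lesson, replace `path` with `'tweets.txt'`. Reading line by line keeps memory use low even for large collections.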