# Twitter Thesis Project: collecting tweets 

## Introduction
This is a 'Jupyter notebook': a document that combines explanatory text with code you can run, and that you can use to develop your own software applications. In this notebook we will use the Python programming language and the Twitter API ('application programming interface') to automatically collect and analyse tweets.

This notebook contains cells, i.e., snippets of either code or normal text. We use the text cells (like the current one) to explain what is going on. You can edit a cell by clicking on it. After you have made your changes, either click 'Run' above or press Shift+Enter to execute the cell. For a text cell, this just displays the text in the correct format (try it with this cell!); for a code cell, the code is executed.

The next cell will be a code cell where we ask the computer to print a simple sentence for us. Try to change this sentence and then execute the code. 


In [1]:
print('hello world')

hello world


**Important note: to use this program, you have to execute all the code in all the cells in the correct order.**

If you want to learn more about using Jupyter notebooks, look for a tutorial online (e.g., https://www.dataquest.io/blog/jupyter-notebook-tutorial/). Most of the questions you have or the problems you encounter can also be solved with a simple Google search using the right keywords.

**However, if you have other questions or any problems that you really don't know how to solve, please contact us on Slack and we'll be happy to help or to schedule a meeting.**

## Connecting to the Twitter API

In [1]:
from twython import Twython
#if this results in an error, you need to install twython first. See guideline document.
print('import successful')

import successful


First, we need to connect to Twitter using the correct keys (these work like passwords). Twitter limits how many tweets you can collect every 15 minutes, among other reasons to make sure its servers are not overloaded. Running the code below 'logs you in' to the Twitter application. If all goes well, the output shows how many calls ('questions we can ask') we can still make in the current 15-minute window: 'limit' is the maximum number of calls, 'remaining' is the number of calls left, and 'reset' is the moment (as a Unix timestamp) at which the counter starts again. With each call, you can collect up to 100 tweets.
For example: {'/search/tweets': {'limit': 450, 'remaining': 443, 'reset': 1568288620}}

In [2]:
APP_KEY = 'YOUR_API_KEY'             #API key: fill in your own key and never share it publicly
APP_SECRET = 'YOUR_API_SECRET_KEY'   #API secret key: keep this private as well
twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()

twitter = Twython(APP_KEY, access_token=ACCESS_TOKEN)
twitter.get_application_rate_limit_status()['resources']['search']



{'/search/tweets': {'limit': 450, 'remaining': 450, 'reset': 1570027350}}
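
The rate-limit output can be turned into something easier to read. The cell below is a minimal sketch, assuming the output has the same shape as the example in the text above. The helper name `summarise_rate_limit` is made up for this illustration, and the figure of 100 tweets per call is taken from the explanation above.

```python
from datetime import datetime, timezone


def summarise_rate_limit(status, tweets_per_call=100):
    """Summarise the '/search/tweets' entry of the rate-limit output."""
    info = status['/search/tweets']
    # 'reset' is a Unix timestamp: seconds since 1 January 1970 (UTC)
    reset_time = datetime.fromtimestamp(info['reset'], tz=timezone.utc)
    return {
        'calls_left': info['remaining'],
        'max_tweets_left': info['remaining'] * tweets_per_call,
        'resets_at_utc': reset_time.isoformat(),
    }


# Example output copied from the explanation above
example = {'/search/tweets': {'limit': 450, 'remaining': 443, 'reset': 1568288620}}
summary = summarise_rate_limit(example)
print(summary)
```

In the notebook itself, you could pass the result of `twitter.get_application_rate_limit_status()['resources']['search']` to the helper instead of the example dictionary.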

## Streaming Tweets

In this part of the notebook, we will start 'streaming' tweets: collecting newly created tweets that match certain criteria. These tweets are then saved in a CSV file, a file format that you can open with a spreadsheet program such as Excel or Numbers.

Every time you want to start streaming, run the code in the cells below. It might take a while before the first tweet is found, so there is nothing wrong if no tweet shows up for a while. If a lot of tweets are streamed (for example, when you use a keyword like 'Trump' or 'Brexit'), make sure to halt the program in time, e.g., with the stop button in the toolbar.

New tweets are automatically added to the file with the filename specified below. You can change the filename (but do keep the extension '.csv'). The file is created as soon as the first tweet that matches the criteria is found, and later tweets are added to the same file, even if you restarted the application in between. The file is stored in the same folder as this notebook.
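
To check what has been collected, the file can also be read back in Python. The cell below is a minimal sketch, assuming the semicolon delimiter and the 'Date', 'Place' and 'Text' header that the streaming code further down in this notebook writes. So that it runs without any real data, it first creates a small demo file; the name `demo_tweets.csv` and its single row are made up for this example.

```python
import csv
import os
import tempfile

# Made-up demo file, so the example runs without any collected tweets.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_tweets.csv')
with open(demo_path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f, delimiter=';')
    writer.writerow(['Date', 'Place', 'Text'])
    writer.writerow(['Thu Sep 12 11:00:00 +0000 2019', '',
                     'An example tweet; with a semicolon'])

# Read the file back: DictReader uses the header row as dictionary keys.
with open(demo_path, encoding='utf-8', newline='') as f:
    rows = list(csv.DictReader(f, delimiter=';'))

print(len(rows), 'tweet(s) in file')
print(rows[0]['Text'])
```

To read your own data, replace `demo_path` in the second `open(...)` call with the filename you chose. Note that a semicolon inside the tweet text is not a problem: the `csv` module puts quotes around such a field when writing and removes them again when reading.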


In [23]:
from twython import TwythonStreamer
import csv
import os.path

filename = 'collected_tweets_keyword_brussel.csv' #change the filename here 


In the code cell below, we first specify what happens when we find a tweet that matches our criteria. Currently, the code tells us when a new tweet is collected. If it is not a retweet, its date, place and text are written to the file.

There's a lot more information you can access for each tweet. If you want to save more than the date, place and text (e.g., the name of the user), please go to https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object and consult the section 'Tweet Data Dictionary'. List all the properties you want to save to the file, and contact us so we can update this part of the code.
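
To give an idea of what such an update could look like, here is a minimal sketch that picks a few properties out of one tweet. It is an illustration, not the code this notebook uses: the function name `tweet_to_row` and the example tweet are made up, and the property names (`user` with `screen_name`, `place` with `full_name`) are the ones listed in the Tweet Data Dictionary.

```python
def tweet_to_row(tweet):
    """Pick the properties to save from one tweet (a Python dictionary)."""
    place = tweet.get('place')  # None when no place is attached to the tweet
    return [
        tweet['created_at'],
        place['full_name'] if place else '',
        tweet['user']['screen_name'],
        tweet['text'],
    ]


# Made-up tweet, trimmed down to the properties used here
example_tweet = {
    'created_at': 'Thu Sep 12 11:00:00 +0000 2019',
    'place': {'full_name': 'Brussels, Belgium', 'country_code': 'BE'},
    'user': {'screen_name': 'example_user'},
    'text': 'Hello from Brussels!',
}
print(tweet_to_row(example_tweet))
```

Saving only the name of the place, instead of the whole place object, also keeps the resulting file easier to read in a spreadsheet.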

In [30]:
class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        # The stream also sends messages that are not tweets (e.g. deletion notices); skip those.
        if 'text' not in data:
            return
        print("-------new tweet collected!")

        if 'retweeted_status' in data:
            print("but it's a retweet...")
        else:
            row = [data['created_at'], data['place'], data['text']]
            print(row)

            file_exists = os.path.isfile(filename)
            # 'a' = append: newly collected tweets are added to the end of your dataset.
            # newline='' prevents empty lines between rows on Windows.
            with open(filename, 'a', encoding='utf-8', newline='') as f:
                writer = csv.writer(f, delimiter=';')
                if not file_exists:  # if it's a new file, we should create a header
                    writer.writerow(['Date', 'Place', 'Text'])  # this is the document header
                writer.writerow(row)

    # headers=None: some Twython versions also pass the response headers to this method.
    def on_error(self, status_code, data, headers=None):
        # print(data)
        print(status_code)  # the HTTP status code of the error
        # self.disconnect()
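
One way to 'halt the program in time' automatically is to stop after a fixed number of tweets. The cell below is only a sketch of that idea: the class `TweetLimit` is made up for this example and is not part of Twython. It only does the counting, so it can be tried out without a connection to Twitter.

```python
class TweetLimit:
    """Keeps count of collected tweets and tells us when to stop."""

    def __init__(self, max_tweets):
        self.max_tweets = max_tweets
        self.count = 0

    def register(self):
        """Count one tweet; return True once the limit is reached."""
        self.count += 1
        return self.count >= self.max_tweets


# Try it out with five pretend tweets and a limit of three
limit = TweetLimit(max_tweets=3)
for _ in range(5):
    if limit.register():
        print('limit reached after', limit.count, 'tweets')
        break

# Inside MyStreamer.on_success you could then write (not run here):
#     if limit.register():
#         self.disconnect()
```

The call `self.disconnect()` is the same one that is commented out in `on_error` above; it closes the connection to the stream.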

In the next cell, we set up the connection to the Twitter stream.

In [None]:
OAUTH_TOKEN = 'YOUR_ACCESS_TOKEN'                 #access token: fill in your own and never share it publicly
OAUTH_TOKEN_SECRET = 'YOUR_ACCESS_TOKEN_SECRET'   #access token secret: keep this private as well


stream = MyStreamer(APP_KEY, APP_SECRET,
                    OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

When you execute the next cell, the streaming will start. This is also the place where you can edit the criteria used to 'filter' the stream. There are different types of filters, which you can use at the same time:



**follow** (optional): A comma-separated list of user IDs, indicating the users to return statuses for in the stream.

**track** (optional): Keywords to track. Phrases of keywords are specified by a comma-separated list.

**locations** (optional): Specifies a set of bounding boxes to track.

For more details, see https://developer.twitter.com/en/docs/tweets/filter-realtime/api-reference/post-statuses-filter
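
The three filters are all passed to Twitter as comma-separated text. The cell below is a minimal sketch of how such values could be built from ordinary Python lists. The helper `build_filter` is made up for this example, and the bounding box around Brussels is a rough assumption, not an official boundary. According to the Twitter documentation linked above, a bounding box is given as longitude and latitude pairs with the southwest corner first, and a tweet is returned if it matches any one of the filters.

```python
def build_filter(track=None, follow=None, locations=None):
    """Build the keyword arguments for stream.statuses.filter()."""
    params = {}
    if track:
        # a comma between keywords means: match any of these keywords
        params['track'] = ','.join(track)
    if follow:
        params['follow'] = ','.join(str(user_id) for user_id in follow)
    if locations:
        # each box: (west longitude, south latitude, east longitude, north latitude)
        params['locations'] = ','.join(
            str(value) for box in locations for value in box
        )
    return params


params = build_filter(
    track=['brussel', 'bruxelles'],
    locations=[(4.24, 50.76, 4.49, 50.92)],  # rough, assumed box around Brussels
)
print(params)

# With a connected stream you would then start streaming with (not run here):
#     stream.statuses.filter(**params)
```

Filters that are left out are simply not sent, so `build_filter(track=['brussel'])` gives the same kind of filter as the cell below.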


In [3]:
stream.statuses.filter(track='brussel')

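NameError: name 'stream' is not defined

This error appears when the cell above is run before the cell that creates `stream`. Run all the cells in order, from top to bottom, and then run this cell again.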