# Twitter Thesis Project: collecting tweets 

## Introduction
This is a Jupyter notebook: an interactive document in which you can write and run code to develop your own software applications. In this notebook we will use the Python programming language and the Twitter API (application programming interface) to automatically collect and analyse tweets.

This notebook contains cells, i.e., snippets of either code or normal text. We use the text cells (like the current one) to explain what is going on. You can edit a cell by clicking on it. After you have made your changes, you can either click 'run' above or press shift+enter to execute the cell. For a text cell, this just displays the text in the correct format (try it with this cell!); for a code cell, the code is executed.

The next cell is a code cell in which we ask the computer to print a simple sentence. Try changing this sentence and then execute the code.


In [1]:
print('hello world')

hello world


**Important note: to use this program, you have to execute the code in all the cells in the correct order.**

If you want to learn more about using Jupyter notebooks, look for a tutorial online (e.g., https://www.dataquest.io/blog/jupyter-notebook-tutorial/). Most questions or problems you run into can also be solved with a simple Google search using the right keywords.

**However, if you have other questions or any problems that you really don't know how to solve, please send an email to hpinson@vub.be and I'll be happy to help or to schedule a meeting.**

## Connecting to the Twitter API

In [5]:
from twython import Twython
# If this results in an error, you need to install twython first. See the guideline document.
print('import successful')

import successful


First, we need to connect to Twitter using the correct keys. There is a limit on how many tweets you can collect every 15 minutes (among other reasons, this keeps the Twitter servers from being overloaded). Running the code below 'logs you in' to the Twitter application. If all goes well, the output shows how many calls ('questions we can ask') we can still make in the current 15-minute window. With each call, you can collect up to 100 tweets.

Example output: {'/search/tweets': {'limit': 450, 'remaining': 443, 'reset': 1568288620}}

In [6]:
APP_KEY = 'yN3VbAb8QZdzD5GPkVuOHLfMN'         #API key
APP_SECRET = 'YRdyk39bx9iRPQBhK2Nh1fT32JdGYTrEhqxcEbcpLMIxbT7wKh'   #API secret key
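# Step 1: exchange the key and secret for an access token (OAuth 2 login for the application)
twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()

# Step 2: create the client we will actually use, authenticated with that token
twitter = Twython(APP_KEY, access_token=ACCESS_TOKEN)
# Show how many search calls are left in the current 15-minute window
twitter.get_application_rate_limit_status()['resources']['search']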



{'/search/tweets': {'limit': 450, 'remaining': 450, 'reset': 1569503547}}
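
The numbers in this output are not very readable on their own: `reset` is a Unix timestamp (seconds since 1 January 1970, UTC) and `remaining` counts calls, not tweets. The following is a minimal sketch of how one entry could be turned into something easier to interpret. The helper name `summarize_rate_limit` is made up for this illustration, and the 100 tweets per call is the maximum mentioned above.

```python
from datetime import datetime, timezone


def summarize_rate_limit(status, tweets_per_call=100):
    """Turn one rate-limit entry into easier-to-read numbers."""
    # 'reset' is a Unix timestamp; convert it to a UTC date and time
    reset_time = datetime.fromtimestamp(status['reset'], tz=timezone.utc)
    return {
        'calls_left': status['remaining'],
        # upper bound only: a call can return fewer than the maximum
        'max_tweets_left': status['remaining'] * tweets_per_call,
        'resets_at_utc': reset_time.strftime('%Y-%m-%d %H:%M:%S'),
    }


# Entry copied from the output shown above
example = {'limit': 450, 'remaining': 450, 'reset': 1569503547}
summary = summarize_rate_limit(example)
print(summary)
```

With a live connection, the same helper could be applied to `twitter.get_application_rate_limit_status()['resources']['search']['/search/tweets']`.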

## Searching Tweets

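In this section we ask Twitter for tweets that match a search query. The function `twitter.search` takes the query as `q` and the number of tweets to return as `count` (at most 100 per call). Adding `-filter:retweets` to the query excludes retweets. The result is a dictionary: the tweets themselves are in the list stored under the key `'statuses'`, and the text of each tweet is stored under `'text'`. The cells below search for tweets containing 'brussel' and 'antwerpen' and then print the text of every tweet found.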

In [7]:
#result = twitter.search(q='python', result_type='popular')
#result = twitter.search(q='vrije universiteit brussel', count='100')

# Search for 100 tweets (the maximum) that contain the word 'brussel', excluding retweets.
resultBrussels = twitter.search(q='brussel -filter:retweets', count='100')

resultAntwerpen = twitter.search(q='antwerpen -filter:retweets', count='100')
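
Since a single call returns at most 100 tweets, collecting more requires several calls. One common approach with this kind of search endpoint is to pass `max_id` so that each call only returns tweets older than the ones already collected. The sketch below illustrates that idea under some assumptions: `collect_tweets` and `fake_search` are hypothetical names, each tweet is assumed to carry a numeric `'id'`, and `fake_search` is only a stand-in so that the example runs without credentials.

```python
def collect_tweets(search, query, max_calls=3, per_call=100):
    """Collect up to max_calls * per_call tweets by paging backwards in time.

    `search` is any function that accepts the keyword arguments q, count and
    max_id (like twitter.search) and returns a dictionary with 'statuses'.
    """
    collected = []
    max_id = None
    for _ in range(max_calls):
        params = {'q': query, 'count': per_call}
        if max_id is not None:
            params['max_id'] = max_id
        statuses = search(**params)['statuses']
        if not statuses:
            break  # no older tweets are available
        collected.extend(statuses)
        # next time, only ask for tweets older than the oldest one seen so far
        max_id = min(s['id'] for s in statuses) - 1
    return collected


def fake_search(q, count, max_id=None):
    """Stand-in for twitter.search: pretends 250 matching tweets exist."""
    all_ids = range(250, 0, -1)  # newest first
    ids = [i for i in all_ids if max_id is None or i <= max_id][:count]
    return {'statuses': [{'id': i, 'text': f'{q} tweet {i}'} for i in ids]}


tweets = collect_tweets(fake_search, 'brussel -filter:retweets', max_calls=5)
print(len(tweets))
```

With the real client this would be called as `collect_tweets(twitter.search, 'brussel -filter:retweets')`. Keep in mind that every call counts against the rate limit described earlier.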

In [12]:
# Print the text of every tweet found, followed by a separator line
for x in resultBrussels['statuses']:
    print(x['text'])
    print('___')
  


Sociale tolken worden zes maal duurder: ‘Dit zal de integratie niet ten goede komen’ https://t.co/QcKWdzdoXV via… https://t.co/65gFfZSoUe
___
@Zelina_VegaWWE Wouldn’t Brussel sprouts be kinda on theme as a name?
___
zijn er niet uit wie Brussel nu moet monitoren
zesde minister voor N-VA wenkt
kunst en cultuur in #Vlaanderen zal h… https://t.co/L0E3lS1z06
___
Typisch voorbeeld van 'airgernis' : Twee grote artikelen over de alarmerende situatie van onze wereld, en een klein… https://t.co/z0ncvipW6n
___
misschien toch les skippen om naar Antwerpen te gaan omdat ik geen vrienden heb in Brussel die mee naar Brussel bro… https://t.co/SL2gAsBAz6
___
wie gaat straks Brussel Brost??
___
* Jalapeños 
* Dates
* Figs
* Tater tots
* Avocado
* Pineapple spears
* Little smokies 
* Big smokies 
* Oysters
*… https://t.co/F3bjlfyguQ
___
Today's midnight snack list excitedly headlines twice-baked brussel sprouts with a side of yummy e. coli!
___
Tlief in diepe rouw. Haar favoriet om te fotograferen als h

In [14]:
for x in resultAntwerpen['statuses']:
    print('--')
    print(x['text'])

--
#A12 Boom richting Antwerpen , ter hoogte van Bevrijdingstunnel Ongeval afgehandeld (26/09/19 14:51 - 26/09/19 15:06)
--
@medinalisya @caia_rose Nene bedoel da van Antwerpen cousin
--
@asaphxsampa @caia_rose G ik ben er al geweest, das mijn stad 😂 das totaal anders dan Antwerpen
--
#A12 Ongeval afgehandeld richting Antwerpen thv de Bevrijdingstunnel.
--
Boost for Talents #Antwerpen: 15 jongeren aan start van intensief begeleidingstraject doorheen secundaire en hogere… https://t.co/rjF7WX0A9k
--
@zdghft @TersEsther @Stad_Antwerpen @PZAntwerpen Bedankt om door te geven. We kijken de verkeerslichten na.
--
misschien toch les skippen om naar Antwerpen te gaan omdat ik geen vrienden heb in Brussel die mee naar Brussel bro… https://t.co/SL2gAsBAz6
--
@TersEsther @wegenenverkeer @Stad_Antwerpen @PZAntwerpen Tijdens de spits is het een ramp, 20 minuten van aan het z… https://t.co/HoDQnAfd8m
--
Bruno Valkeniers legt vandaag de eed af als nieuw provincieraadslid @prov_antwerpen Hij volgt Mari

## Streaming Tweets

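Searching returns tweets that have already been posted. Streaming instead keeps a connection to Twitter open and receives new tweets as they are posted. To use it, we define our own streamer class based on `TwythonStreamer`: its `on_success` method is called for every message that arrives, and its `on_error` method is called when something goes wrong. In the code below, retweets are skipped; for every other tweet the date, place, and text are printed and appended as a row to a CSV file. Finally, `stream.statuses.filter(track='brexit')` starts collecting tweets that contain the keyword 'brexit'. The stream keeps running until you stop it, for example by interrupting the kernel.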

In [23]:
from twython import TwythonStreamer
import csv
import os.path

filename = 'collected_tweets_keyword_brexit.csv' #change the filename here 


In [26]:
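class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        # Called for every message received from the stream
        print("-------new tweet collected!")

        if 'retweeted_status' in data:
            print("but it's a retweet...")
        else:
            print([data['created_at'], data['place'], data['text']])

            # Check this before opening: opening in append mode creates the file
            file_exists = os.path.isfile(filename)

            # 'a' = append, so newly collected tweets are added to your dataset.
            # The encoding is set in open(); csv.writer() does not accept an encoding argument.
            with open(filename, 'a', newline='', encoding='utf-8') as f:
                writer = csv.writer(f, delimiter=';')
                if not file_exists:
                    writer.writerow(['Date', 'Place', 'Text'])  # header row, written only once
                writer.writerow([data['created_at'], data['place'], data['text']])

    def on_error(self, status_code, data):
        # Called when something goes wrong; print the status code of the error
        #print(data)
        print(status_code)
        # self.disconnect()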
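
The streamer above stores `data['place']` and `data['text']` as they arrive. As far as the layout of streamed tweets goes, `place` is either `None` or a dictionary, and long tweets are shortened in `text`, with the full version stored separately. The sketch below shows one way to tidy a tweet before writing it to the CSV file. The helper name `tweet_to_row` is made up for this example, and the keys `full_name` and `extended_tweet` / `full_text` are assumptions about the tweet layout that you should check against your own collected data.

```python
def tweet_to_row(data):
    """Return [date, place, text] for an original tweet, or None for a retweet."""
    if 'retweeted_status' in data:
        return None  # same rule as in the streamer: skip retweets

    # 'place' is either None or a dictionary describing the location
    place = data.get('place')
    place_name = place.get('full_name', '') if place else ''

    # Prefer the full text of a long tweet if it is present (assumed key names)
    text = data.get('extended_tweet', {}).get('full_text', data.get('text', ''))

    return [data.get('created_at', ''), place_name, text]


# Made-up examples of what a streamed message could look like
retweet = {'created_at': 'Thu Sep 26 14:27:40 +0000 2019', 'place': None,
           'text': 'RT ...', 'retweeted_status': {}}
short_tweet = {'created_at': 'Thu Sep 26 14:27:42 +0000 2019', 'place': None,
               'text': 'A short tweet about brexit'}
long_tweet = {'created_at': 'Thu Sep 26 14:27:45 +0000 2019',
              'place': {'full_name': 'Brussels, Belgium'},
              'text': 'The start of a long tweet...',
              'extended_tweet': {'full_text': 'The complete text of a long tweet about brexit'}}

for example in (retweet, short_tweet, long_tweet):
    print(tweet_to_row(example))
```

Inside `on_success`, the row returned by this helper could be passed to `writer.writerow(...)` whenever it is not `None`. Using `.get(...)` also avoids a `KeyError` when a message lacks one of these fields.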

In [27]:
OAUTH_TOKEN = '1100028871259377670-qtcMTW2ereJ3A0KIvFguWu0ZmW0n8k'
OAUTH_TOKEN_SECRET = 'wnPYmWOds9xD1i1CM9K8gfzMNZ26QoBmXW4JSSA81faRF'


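# Streaming needs four credentials: the app key and secret plus an access token and its secret
stream = MyStreamer(APP_KEY, APP_SECRET,
                    OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

# Start collecting tweets that contain the keyword 'brexit'; this keeps running until it is interrupted
stream.statuses.filter(track='brexit')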

-------new tweet collected!
but it's a retweet...
-------new tweet collected!
but it's a retweet...
-------new tweet collected!
but it's a retweet...
-------new tweet collected!
but it's a retweet...
-------new tweet collected!
['Thu Sep 26 14:27:42 +0000 2019', None, 'Parents write to your primary schools.\n“Dear X,\nPlease accept this letter as a confirmation that I insist that my c… https://t.co/8dGtNuG8dP']


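TypeError: 'encoding' is an invalid keyword argument for this function

Note: `csv.writer` does not accept an `encoding` argument, which is what raises this `TypeError`. The encoding has to be set in the `open()` call instead.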
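
Once the streamer has written some tweets to the CSV file, the file can be read back into Python for analysis. The following is a minimal sketch using the same `;` delimiter and the `Date`, `Place`, `Text` columns used above. The function name `load_tweets` and the sample file are made up for this illustration; in practice you would pass your own `filename`.

```python
import csv
import os
import tempfile


def load_tweets(path):
    """Read a tweet CSV file into a list of dictionaries, one per tweet."""
    # newline='' lets the csv module handle line breaks inside tweet texts
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f, delimiter=';'))


# Write a small sample file in the same layout as the collected tweets
sample_path = os.path.join(tempfile.mkdtemp(), 'sample_tweets.csv')
with open(sample_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, delimiter=';')
    writer.writerow(['Date', 'Place', 'Text'])
    writer.writerow(['Thu Sep 26 14:27:42 +0000 2019', '',
                     'first line\nsecond line; with a semicolon'])

rows = load_tweets(sample_path)
print(len(rows))
print(rows[0]['Text'])
```

Because the `csv` module puts quotes around fields that contain a line break or a semicolon, tweet texts containing those characters are read back unchanged.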