# Using Twitter Streaming API to collect users in a geographical region

In this notebook, we show how to use the Twitter Streaming API, which lets us collect a sample of real-time tweets (around 1% of the total flow). We added a geographical filter on the Paris region (Île-de-France) and ran this request for a few weeks in order to collect user IDs from that region.

In [1]:
import json
import tweepy
import os

# Authentication

You need to enter the tokens from your Twitter developer account (see https://developer.twitter.com/en/portal/dashboard).

In [2]:
path_to_keys='../data/keys'
path_to_data='../data/streaming'

In [3]:
# Load the consumer key and secret
with open(os.path.join(path_to_keys,'auth_naila.json')) as f:
    auth_key = json.load(f)

# Load the access token and secret
with open(os.path.join(path_to_keys,'key_naila.json')) as f:
    token_key = json.load(f)

In [4]:
import pandas as pd

# Get these values from your dev.twitter application settings.
CONSUMER_KEY = auth_key['consumer_key']
CONSUMER_SECRET = auth_key['consumer_secret']
ACCESS_TOKEN = token_key['access_token']
ACCESS_TOKEN_SECRET = token_key['access_token_secret']

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
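
The two key files are expected to contain the four fields read above. As a minimal sketch, assuming the same JSON layout, the check below fails early when a field is missing. `load_credentials` and `REQUIRED_KEYS` are hypothetical names introduced for illustration; they are not part of tweepy.

```python
import json
import os

# Field names taken from the cell above; the helper itself is hypothetical.
REQUIRED_KEYS = {
    'auth': ('consumer_key', 'consumer_secret'),
    'token': ('access_token', 'access_token_secret'),
}

def load_credentials(auth_path, token_path):
    """Load both JSON key files and check that the expected fields are present."""
    credentials = {}
    for kind, path in (('auth', auth_path), ('token', token_path)):
        with open(path) as f:
            content = json.load(f)
        missing = [k for k in REQUIRED_KEYS[kind] if k not in content]
        if missing:
            raise KeyError('{} is missing: {}'.format(
                os.path.basename(path), ', '.join(missing)))
        # Keep only the fields needed for OAuth
        credentials.update({k: content[k] for k in REQUIRED_KEYS[kind]})
    return credentials
```

The returned dictionary can then be passed to `tweepy.OAuthHandler` and `set_access_token` exactly as in the cell above.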

# Download data from Twitter API

In [5]:
from tweepy import Stream
from tweepy.streaming import StreamListener  # available in Tweepy 3.x; Tweepy 4.0 merged it into Stream
from http.client import IncompleteRead
import time

You can add geographical, language, and keyword filters (a keyword filter streams only tweets containing those keywords).

In [6]:
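language_id = "in"  # language code for the optional `languages` filter
#most_freq_words = ['some_word', 'word',...]

# Bounding boxes for geolocations, in the order
# [west longitude, south latitude, east longitude, north latitude]
# Online tool to create boxes (copy and paste as raw CSV): http://boundingbox.klokantech.com/
GEOBOX_PARIS = [2.2526417626,48.8163260795,2.4220143351,48.9021588775]
GEOBOX_WORLD = [-180,-90,180,90]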
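
To check what a bounding box actually covers, here is a minimal sketch that tests whether a point falls inside it. It assumes the box is ordered as [west longitude, south latitude, east longitude, north latitude]. `point_in_box` is a hypothetical helper, and the two test coordinates are approximate values chosen for illustration.

```python
def point_in_box(lon, lat, box):
    """Return True if (lon, lat) lies inside box = [west, south, east, north]."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north

GEOBOX_PARIS = [2.2526417626, 48.8163260795, 2.4220143351, 48.9021588775]

# Approximate coordinates, for illustration only
print(point_in_box(2.3499, 48.8530, GEOBOX_PARIS))  # Notre-Dame de Paris -> True
print(point_in_box(2.1204, 48.8049, GEOBOX_PARIS))  # Palace of Versailles -> False
```

Versailles is in Île-de-France but outside this box, which suggests that `GEOBOX_PARIS` covers roughly the city of Paris rather than the whole region. Streamed tweets may also be matched through the bounding box of their tagged place rather than through exact coordinates, so a point test like this one is only a rough check.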

In [None]:
class MyListener(StreamListener):

    def on_data(self, data):
        # Append each raw message to a file named after the current UTC hour
        filename = 'twitter-stream-geobox-paris-'+time.strftime("%Y-%m-%d-%H", time.gmtime())+'.json'
        try:
            with open(os.path.join(path_to_data, filename), 'a') as f:
                f.write(data)
        except Exception as e:
            # Log the error but keep the stream alive
            print('Error on_data: ', str(e))
        return True

    def on_error(self, status):
        # Print the HTTP status code and keep the stream alive
        print(status)
        return True

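while True:

    try:
        # Connect/reconnect the stream
        twitter_stream = Stream(auth, MyListener())
        twitter_stream.filter(locations=GEOBOX_PARIS)

        #twitter_stream.filter(languages=[language_id],track=most_freq_words)
        #twitter_stream.filter(languages=['fr'])

    except KeyboardInterrupt:
        # Allow the collection to be stopped manually
        break

    except Exception as e:
        # Typically a dropped connection (e.g. IncompleteRead): wait briefly, then reconnect
        print('Stream error: ', str(e))
        time.sleep(5)
        continue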
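
Once the stream has run for a while, the user IDs can be extracted from the hourly files. The following is a minimal sketch, assuming each file holds one JSON message per line and that tweets carry a `user` object with an `id_str` field, as in the standard tweet JSON. `collect_user_ids` is a hypothetical helper name.

```python
import glob
import json
import os

def collect_user_ids(path_to_data):
    """Return the set of user IDs found in the hourly stream files."""
    user_ids = set()
    pattern = os.path.join(path_to_data, 'twitter-stream-geobox-paris-*.json')
    for filename in sorted(glob.glob(pattern)):
        with open(filename) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                try:
                    message = json.loads(line)
                except ValueError:
                    continue  # skip truncated or corrupted lines
                # Non-tweet messages (e.g. limit notices) have no 'user' field
                user = message.get('user') if isinstance(message, dict) else None
                if user and 'id_str' in user:
                    user_ids.add(user['id_str'])
    return user_ids
```

For example, `user_ids = collect_user_ids(path_to_data)` would gather the IDs from every file written by the listener above. Using a set removes duplicates when the same user tweets several times.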