# Retrieving data from the Twitter API

The aim of this part is to gather as many tweets as possible from the major political parties in Switzerland and from prominent members of those parties. Our analysis covers the 6 parties with the most representatives in the Swiss National Council since the [2019 elections](https://en.wikipedia.org/wiki/2019_Swiss_federal_election): the UDC, PS, PLR, Les Verts, PDC and Vert'libéraux.

In [1]:
# Imports. Tweepy is a library for interacting with the Twitter API from Python.
import tweepy 
import json
import csv
import pandas as pd
import os

To get data from the Twitter API, one needs a developer account. We load the credentials obtained from Twitter below.

In [2]:
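# Read the credentials from environment variables rather than hard-coding secrets in the notebook.
# The variable names are our own (hypothetical) convention; export them before starting Jupyter.
consumer_key_1 = os.environ["TWITTER_CONSUMER_KEY"]
consumer_secret_1 = os.environ["TWITTER_CONSUMER_SECRET"]
access_key_1 = os.environ["TWITTER_ACCESS_KEY"]
access_secret_1 = os.environ["TWITTER_ACCESS_SECRET"]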

We list the Twitter accounts of the 6 major political parties in both languages. Note that all parties except the PDC have separate Twitter accounts for communication in French and German; the PDC does all its Twitter communication in both languages from the same account.

In [3]:
# Twitter usernames of the 6 major political parties
frenchAccounts = ['UDCch', 'PSSuisse', 'PLR_Suisse', 'LesVertsSuisses', 'vertliberaux']
germanAccounts = ['SVPch', 'spschweiz', 'FDP_Liberalen', 'GrueneCH', 'grunliberale']
germanAndFrench = ['CVP_PDC']
users = frenchAccounts + germanAccounts + germanAndFrench

The function below gets the tweets of a given list of users. The Twitter API only returns the 3200 most recent tweets of a user's timeline, so we try to get as many tweets as possible, up to that limit, from each account. The features we gather are the following:

* ``id`` : The ID of the tweet
* ``timestamp`` : The time the tweet was published
* ``partyname`` : The name of the party as displayed on its Twitter account
* ``username`` : The unique Twitter username (handle) of the account
* ``tweet_text`` : The content of the tweet, with line breaks replaced by spaces
* ``all_hashtags`` : A list of all hashtags included in the tweet
* ``all_mentions`` : A list of all user mentions in the tweet
* ``all_urls`` : A list of all URLs in the tweet
* ``retweet_count`` : The number of retweets of the tweet
* ``favorite_count`` : The number of favorites (likes) of the tweet
* ``range`` : The number of characters in the displayed tweet text (end index of ``display_text_range``)
* ``lang`` : The language of the tweet, as detected by Twitter
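The per-tweet extraction performed inside `get_party_tweets` can be pulled into a small helper, which makes the mapping from the API's JSON fields to the columns above easy to check offline. This is our own illustration: `tweet_to_row` is a hypothetical name, and `sample_tweet` is a made-up, heavily trimmed tweet in the shape returned by the v1.1 `user_timeline` endpoint with `tweet_mode="extended"`.

```python
# Made-up, heavily trimmed tweet in the shape of a v1.1 user_timeline result
# requested with tweet_mode="extended" (hypothetical values).
sample_tweet = {
    "id": 1,
    "created_at": "Mon Oct 21 08:00:00 +0000 2019",
    "user": {"name": "Example Party", "screen_name": "example_party"},
    "full_text": "Hello\nSwitzerland #Wahlen19 @someone https://t.co/x",
    "entities": {
        "hashtags": [{"text": "Wahlen19"}],
        "user_mentions": [{"screen_name": "someone"}],
        "urls": [{"expanded_url": "https://example.org"}],
    },
    "retweet_count": 3,
    "favorite_count": 7,
    "display_text_range": [0, 51],
    "lang": "en",
}


def tweet_to_row(tweet):
    """Flatten one tweet dict into the column order of the party_tweets.csv header."""
    entities = tweet["entities"]
    return [
        tweet["id"],
        tweet["created_at"],
        tweet["user"]["name"],
        tweet["user"]["screen_name"],
        tweet["full_text"].replace("\n", " "),  # keep each tweet on a single CSV line
        [e["text"] for e in entities["hashtags"]],
        [e["screen_name"] for e in entities["user_mentions"]],
        [e["expanded_url"] for e in entities["urls"]],
        tweet["retweet_count"],
        tweet["favorite_count"],
        tweet["display_text_range"][1],  # end index of the displayed text
        tweet["lang"],
    ]


row = tweet_to_row(sample_tweet)
print(row[4])  # Hello Switzerland #Wahlen19 @someone https://t.co/x
print(row[5:8])  # [['Wahlen19'], ['someone'], ['https://example.org']]
```

Note that `csv.writer` converts non-string values with `str()`, so the three list columns end up in the file as text such as `['Wahlen19']`; when reading the CSV back, `ast.literal_eval` can turn them into lists again.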


In [4]:
# Function that retrieves the maximum possible number of tweets from the given accounts into a CSV file.
def get_party_tweets(users, consumer_key, consumer_secret, access_key, access_secret):
    
    # Authenticate to the Twitter API
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())
    
    # Open the CSV file to write to (newline='' as recommended by the csv module documentation)
    with open('data/twitter_data/party_tweets.csv', 'w', encoding="utf-8", newline='') as file:

        w = csv.writer(file)

        # Write the header row
        w.writerow(['id', 'timestamp', 'partyname', 'username', 'tweet_text', 'all_hashtags', 'all_mentions', 'all_urls', 'retweet_count', 'favorite_count', 'range', 'lang'])
        
        # Iterate over all usernames
        for username in users:
            # The index i is the page number used internally by the Twitter API
            for i in range(18):
                # 200 is the maximum number of tweets per page: 200*18 = 3600 (above 3200 just to be sure)
                tweets = api.user_timeline(screen_name=username, count=200, tweet_mode="extended", page=i)
                # Write the attributes of each tweet to the CSV
                for tweet in tweets:
                    w.writerow([tweet['id'],
                                tweet['created_at'],
                                tweet['user']['name'],
                                tweet['user']['screen_name'],
                                tweet['full_text'].replace('\n', ' '),
                                [e['text'] for e in tweet['entities']['hashtags']],
                                [e['screen_name'] for e in tweet['entities']['user_mentions']],
                                [e['expanded_url'] for e in tweet['entities']['urls']],
                                tweet['retweet_count'],
                                tweet['favorite_count'],
                                tweet['display_text_range'][1],
                                tweet['lang']])
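Requesting fixed page numbers is one way to walk a timeline. Twitter's v1.1 guide on working with timelines instead recommends cursoring with `max_id`: each request asks for tweets whose ID is at most one less than the oldest ID already received, which avoids gaps and duplicates when new tweets are posted during the crawl. The sketch below illustrates that technique offline and is not what the notebook runs; `fetch_timeline` and `fake_fetch_page` are hypothetical names, and `fake_fetch_page` stands in for a call such as `api.user_timeline(screen_name=..., count=200, max_id=..., tweet_mode="extended")`.

```python
def fetch_timeline(fetch_page, max_tweets=3200, page_size=200):
    """Collect up to max_tweets tweets, newest first, by walking backwards with max_id.

    fetch_page(count, max_id) is a hypothetical callable that must return a list of
    tweet dicts sorted from newest to oldest, containing only ids <= max_id.
    """
    collected = []
    max_id = None  # None means "start from the most recent tweet"
    while len(collected) < max_tweets:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:  # no older tweets are available
            break
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one received so far.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected[:max_tweets]


# Fake timeline of 450 tweets with ids 450 (newest) down to 1 (oldest), for offline testing.
FAKE_TIMELINE = [{"id": i} for i in range(450, 0, -1)]


def fake_fetch_page(count, max_id):
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t["id"] <= max_id]
    return eligible[:count]


tweets = fetch_timeline(fake_fetch_page)
print(len(tweets))  # 450
```

Because each request is anchored on an ID rather than an offset, the same tweet is never returned twice, and the loop stops as soon as an empty page comes back instead of always issuing a fixed number of requests.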

In [5]:
get_party_tweets(users, consumer_key_1, consumer_secret_1, access_key_1, access_secret_1)

Below are the Twitter usernames of prominent members of each party.

In [6]:
PS_accounts = ['ChristianLevrat', 'NordmannRoger', 'beat_jans', 'jcschwaab', 'MarinaCarobbio', 'PascaleBruderer', 'SusanneSlo', 'enussbi', 'cedricwermuth', 'PaulRechsteiner', 'BaGysi', 'CarloSommaruga', 'danieljositsch', 'yferi', 'zanettiroberto', 'eviallemann', 'margretkiener', 'JayBadran', 'MathiasReynard', 'ada_marra']
PDC_accounts = ['SchmidFederer', 'ybuttet', 'ClaudeBegle', 'LeoMuellerLU', 'MarcoRomanoPPD', 'BulliardMarbach', 'fregazzi', 'Ch_Lohr', 'PirminBischof', 'gerhardpfister', 'lombardi1956', 'GraberKonrad', 'Violapamherd', 'RuthHumbel', 'MullerAltermatt', 'martin_candinas', 'Elisabeth_S_S', 'KathyRiklin', 'DdeBuman', 'engler_stefan']  
UDC_accounts = ['thomas_aeschi', 'AdrianAmstutz', 'BrandHeinz', 'jfrime', 'AmaudruzCeline', 'NatalieRickli', 'UGiezendanner', 'verenaherzog', 'SVPBrunner', 'lukasreimann', 'LuziStamm']
Les_Verts_accounts = ['bglaettli', 'RobertCramer_GE', 'SibelArslanBS', 'bastiengirod', 'RegulaRytz', 'nr_mayagraf', 'adelethorens', 'Chrige_Haesler', 'FrickerJonas']
Les_Verts_liberaux_accounts = ['tiana_moser', 'Martin_Baeumle', 'kathrinbertschy', 'beatflach', 'Juerg_Grossen', 'I_Chevalley']
PLR_accounts = ['nantermod', 'OliFrancais', 'CLuscher', 'ChristaMarkwald', 'IsabelleMoret', 'DorisFiala', 'RaphaelComteCE', 'SchneeDani67', 'fderder', 'ThierryBurkart', 'Marcel_Dobler', 'LaurentWehrli', 'Damian_Mueller_', 'FluriKurt', 'ignaziocassis', 'cwasi', 'PetraGoessi', 'RuediNoser']
all_member_accounts = PS_accounts + PDC_accounts + UDC_accounts + Les_Verts_accounts + Les_Verts_liberaux_accounts + PLR_accounts

In [7]:
# This is a helper function for us to identify the party name of the member
def get_party_name(username):
    if username in PS_accounts:
        return 'PS Suisse'
    elif username in PDC_accounts:
        return 'CVP PDC PPD PCD'
    elif username in UDC_accounts:
        return 'UDC Suisse'
    elif username in Les_Verts_accounts:
        return 'Les VERTS suisses 🌻'
    elif username in Les_Verts_liberaux_accounts:
        return "Vert'libéraux Suisse"
    elif username in PLR_accounts:
        return 'PLR Suisse'
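The if/elif chain in `get_party_name` scans each list in turn and implicitly returns `None` for an unknown username. An equivalent lookup can be built once as a dictionary from username to party. The sketch below is a hypothetical alternative (`party_accounts`, `PARTY_BY_USERNAME` and `lookup_party` are our own names), shown with shortened account lists; in the notebook the dictionary values would be `PS_accounts`, `UDC_accounts` and so on.

```python
# Shortened, hypothetical account lists; in the notebook these would be PS_accounts, UDC_accounts, ...
party_accounts = {
    "PS Suisse": ["ChristianLevrat", "ada_marra"],
    "UDC Suisse": ["thomas_aeschi", "LuziStamm"],
}

# Invert {party: [usernames]} into {username: party} once, so that each lookup is a
# single dictionary access instead of a scan through every list.
PARTY_BY_USERNAME = {
    username: party
    for party, usernames in party_accounts.items()
    for username in usernames
}


def lookup_party(username):
    """Return the party of a member account, or None if the account is unknown."""
    return PARTY_BY_USERNAME.get(username)


print(lookup_party("ada_marra"))  # PS Suisse
print(lookup_party("unknown_account"))  # None
```

One behavioral difference: a username listed under two parties would map to the last party in the dictionary, whereas the if/elif chain returns the first match.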

In [8]:
# Function that retrieves the tweets of party members
def get_member_tweets(users, consumer_key, consumer_secret, access_key, access_secret, file_name):
    
    # Authenticate to the Twitter API
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())
    
    # Open the CSV file to write to (newline='' as recommended by the csv module documentation)
    with open(file_name, 'w', encoding="utf-8", newline='') as file:

        w = csv.writer(file)

        # Write the header row
        w.writerow(['id', 'timestamp', 'member_name', 'party_name', 'username', 'tweet_text', 'all_hashtags', 'all_mentions', 'all_urls', 'retweet_count', 'favorite_count', 'range', 'lang'])
        
        # Iterate over all usernames
        for username in users:
            
            # Get the name of the member's party
            party_name = get_party_name(username)
            
            # Same pagination as in get_party_tweets: 18 pages of up to 200 tweets
            for i in range(18):
                tweets = api.user_timeline(screen_name=username, count=200, tweet_mode="extended", page=i)
                for tweet in tweets:
                    w.writerow([tweet['id'],
                                tweet['created_at'],
                                tweet['user']['name'],
                                party_name,
                                tweet['user']['screen_name'],
                                tweet['full_text'].replace('\n', ' '),
                                [e['text'] for e in tweet['entities']['hashtags']],
                                [e['screen_name'] for e in tweet['entities']['user_mentions']],
                                [e['expanded_url'] for e in tweet['entities']['urls']],
                                tweet['retweet_count'],
                                tweet['favorite_count'],
                                tweet['display_text_range'][1],
                                tweet['lang']])

Running the next cell to retrieve the tweets of all members raises an error. This is expected, since we request too many tweets: Twitter limits the number of requests a client can send within each 15-minute window. We therefore query the remaining usernames in a second run and merge the CSV files obtained.
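Rather than splitting the accounts by hand after a failure, a request can be retried once the rate-limit window has reset. Tweepy's `tweepy.API` accepts `wait_on_rate_limit=True`, which makes it sleep until the limit replenishes instead of raising. The sketch below shows the same idea as a generic retry wrapper, assuming a 15-minute wait; `call_with_retry` is a hypothetical helper, `RateLimited` is a hypothetical stand-in for the exception Tweepy raises (`RateLimitError` in the version used here), and `sleep` is injectable so the example runs without actually waiting.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the rate-limit exception raised by Tweepy."""


def call_with_retry(func, *args, wait_seconds=15 * 60, max_attempts=3, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), sleeping and retrying each time it raises RateLimited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except RateLimited:
            if attempt == max_attempts:
                raise  # still rate-limited after the last attempt: give up
            sleep(wait_seconds)  # wait for the rate-limit window to reset


# Offline check: a fake request that is rate-limited twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited("Rate limit exceeded")
    return "ok"


waits = []
result = call_with_retry(flaky_request, sleep=waits.append)
print(result)  # ok
print(waits)  # [900, 900]
```

In `get_member_tweets`, each request could then be written as `call_with_retry(api.user_timeline, screen_name=username, count=200, tweet_mode="extended", page=i)`, with `RateLimited` replaced by the exception class of the installed Tweepy version.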

In [9]:
get_member_tweets(all_member_accounts, consumer_key_1, consumer_secret_1, access_key_1, access_secret_1, 'data/twitter_data/member_tweets.csv' )

RateLimitError: [{'message': 'Rate limit exceeded', 'code': 88}]

Looking at the CSV file, we see that the rate limit was hit while retrieving the tweets of 'DdeBuman'. We therefore launch a second query with the PDC members whose tweets are incomplete or missing ('DdeBuman' and 'engler_stefan'), followed by the accounts of the remaining parties.
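Finding where the first run stopped can also be automated: the last username written to the partial CSV is the account that was being processed when the error occurred, so the second batch starts with it. `remaining_accounts` is a hypothetical helper sketched for illustration and tested on a made-up CSV with trimmed columns. Applied to `all_member_accounts` and the partial `member_tweets.csv`, it would produce the same list as `member_tweets_batch_2`, provided the last row written belongs to 'DdeBuman'.

```python
import csv
import io


def remaining_accounts(all_accounts, partial_csv):
    """Return the accounts still to query, starting with the last one written to partial_csv.

    The last account may have been cut off by the rate limit, so it is queried again;
    its duplicate tweets are dropped later, when the files are merged.
    """
    last_username = None
    for row in csv.DictReader(partial_csv):
        last_username = row["username"]
    if last_username is None:
        # Nothing was written yet: query every account.
        return list(all_accounts)
    start = all_accounts.index(last_username)
    return list(all_accounts[start:])


# Made-up partial output with trimmed columns: the run stopped while writing 'DdeBuman'.
partial = io.StringIO(
    "id,username\n"
    "1,KathyRiklin\n"
    "2,DdeBuman\n"
)
accounts = ["KathyRiklin", "DdeBuman", "engler_stefan", "thomas_aeschi"]
print(remaining_accounts(accounts, partial))
# ['DdeBuman', 'engler_stefan', 'thomas_aeschi']
```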

In [10]:
member_tweets_batch_2 = ['DdeBuman', 'engler_stefan'] + UDC_accounts + Les_Verts_accounts + Les_Verts_liberaux_accounts + PLR_accounts

In [11]:
get_member_tweets(member_tweets_batch_2, consumer_key_1, consumer_secret_1, access_key_1, access_secret_1, 'data/twitter_data/member_tweets_2.csv')

There is no error this time, and we obtain the maximum number of tweets from all the accounts. We now merge the two member tweet files, `member_tweets.csv` and `member_tweets_2.csv`, into a single one by loading them into pandas DataFrames and dropping duplicate tweets.

In [12]:
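df1 = pd.read_csv("data/twitter_data/member_tweets.csv")
df2 = pd.read_csv("data/twitter_data/member_tweets_2.csv")

full_df = pd.concat([df1, df2])
# Tweets of 'DdeBuman' appear in both files and their retweet/favorite counts may differ
# between the two runs, so deduplicate on the tweet id; keep='last' keeps the second run.
unique_df = full_df.drop_duplicates(subset='id', keep='last')
unique_df.to_csv('data/twitter_data/merged_member_tweets.csv', index=False)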
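A full-row `drop_duplicates()` only removes rows whose every column is identical, but the tweets of 'DdeBuman' were fetched twice, and their retweet and favorite counts may have changed between the two runs. Deduplicating on the tweet `id` avoids this; with pandas that is `drop_duplicates(subset='id', keep='last')`. The sketch below shows the same keep-the-last-copy logic with the standard `csv` module; `merge_by_id` is a hypothetical name and the two input files are made up.

```python
import csv
import io


def merge_by_id(*csv_files):
    """Merge CSV files, keeping only the last row seen for each tweet id."""
    latest = {}
    for csv_file in csv_files:
        for row in csv.DictReader(csv_file):
            # A later file overwrites an earlier copy of the same tweet; the tweet keeps
            # the position where its id was first seen.
            latest[row["id"]] = row
    return list(latest.values())


first_run = io.StringIO("id,username,retweet_count\n10,DdeBuman,4\n11,KathyRiklin,2\n")
second_run = io.StringIO("id,username,retweet_count\n10,DdeBuman,5\n12,engler_stefan,1\n")

merged = merge_by_id(first_run, second_run)
print([(row["id"], row["retweet_count"]) for row in merged])
# [('10', '5'), ('11', '2'), ('12', '1')]
```

Tweet 10 appears only once in the result, with the retweet count from the second run.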

Finally, we remove the intermediate CSV files.

In [13]:
os.remove('data/twitter_data/member_tweets.csv')
os.remove('data/twitter_data/member_tweets_2.csv')