# Twitter Election Integrity Author Interactions

**Objective:** This notebook identifies every tweet, among those removed in Twitter's election integrity takedowns, that mentions, replies to, or retweets a given Twitter account.
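The matching rule in the objective can be sketched as a small predicate. This is a minimal illustration, not the notebook's own code: it assumes each tweet row is a dict keyed by column name (as `csv.DictReader` yields) and that the `user_mentions` column stores ids as text such as `[id1, id2]`; `is_interaction` is a hypothetical name.

```python
def is_interaction(row, account_id):
    """Return True if a tweet row replies to, retweets, or mentions account_id.

    row is assumed to be a dict keyed by column name (as csv.DictReader yields).
    The user_mentions format "[id1, id2]" is an assumption about the data.
    """
    # Replies and retweets store a single user id per tweet
    if row.get('in_reply_to_userid') == account_id:
        return True
    if row.get('retweet_userid') == account_id:
        return True
    # Mentions store a list of user ids serialized as text; compare whole ids
    cell = row.get('user_mentions') or ''
    mentioned = [part.strip().strip("'\"") for part in cell.strip('[]').split(',')]
    return account_id in mentioned


# Example rows (made up for illustration, not taken from the datasets)
reply = {'in_reply_to_userid': '807095', 'retweet_userid': '', 'user_mentions': '[]'}
mention = {'in_reply_to_userid': '', 'retweet_userid': '', 'user_mentions': '[12345, 807095]'}
unrelated = {'in_reply_to_userid': '1807095', 'retweet_userid': '', 'user_mentions': '[1807095]'}
print(is_interaction(reply, '807095'))      # True
print(is_interaction(mention, '807095'))    # True
print(is_interaction(unrelated, '807095'))  # False
```

Comparing whole ids, rather than searching for the id as a substring, keeps an id such as `1807095` from matching `807095`.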

**Prerequisites:** 
- In the same directory as this notebook, there must be a folder called "dataSources" containing the CSV files of tweets from the accounts removed by Twitter. The files used are listed in the first code cell below and can be downloaded [here](https://about.twitter.com/en_us/values/elections-integrity.html#data). 
- You should also know the user ID of the account of interest. If you only know the account handle, you can look up the user ID [here](http://gettwitterid.com).
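Because the scan can take a long time on the larger files, it may help to confirm the downloads are in place before starting. A minimal sketch; `missing_files` is a hypothetical helper, and the two paths shown are simply the first entries of the notebook's file list.

```python
import os


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files (hypothetical helper)."""
    return [path for path in paths if not os.path.isfile(path)]


# Usage: report missing downloads before a multi-million-row scan begins
expected = ['dataSources/ira_tweets_csv_hashed.csv',
            'dataSources/iranian_tweets_csv_hashed.csv']
missing = missing_files(expected)
if missing:
    print("Missing data files:", missing)
```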

**Outcomes:** After running this notebook, there should be a folder called "foundAuthorInteractions_<account_id>" (for example, "foundAuthorInteractions_807095") containing one CSV for each file in the dataSources folder. Each output CSV contains all tweets that mention, reply to, or retweet the given account.
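Each output file is named after its input file. A sketch of that mapping using `os.path` rather than fixed string offsets; `output_path` is a hypothetical helper that reproduces the notebook's naming scheme (base name without ".csv", followed by "AuthorInteractions.csv").

```python
import os


def output_path(source_csv, account_id):
    """Build the output CSV path for one source file (hypothetical helper).

    Takes the source file's base name without ".csv", appends
    "AuthorInteractions.csv", and places it in foundAuthorInteractions_<account_id>.
    """
    base = os.path.splitext(os.path.basename(source_csv))[0]
    return os.path.join('foundAuthorInteractions_' + str(account_id),
                        base + 'AuthorInteractions.csv')


print(output_path('dataSources/ira_tweets_csv_hashed.csv', '807095'))
# on POSIX: foundAuthorInteractions_807095/ira_tweets_csv_hashedAuthorInteractions.csv
```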

In [1]:
# Paths to the takedown CSV files to scan
filenames = ['dataSources/ira_tweets_csv_hashed.csv',
             'dataSources/iranian_tweets_csv_hashed.csv',
             'dataSources/russia_201901_1_tweets_csv_hashed.csv',
             'dataSources/iran_201901_1_tweets_csv_hashed.csv',
             'dataSources/venezuela_201901_1_tweets_csv_hashed.csv',
             'dataSources/venezuela_201901_2_tweets_csv_hashed.csv',
             'dataSources/bangladesh_201901_1_tweets_csv_hashed.csv'
            ]

# The user ID of the account you're interested in, as a string (807095 is @nytimes)
account_id = '807095'
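The ID is kept as a string because `csv.reader` returns every field as a string, so a comparison against an integer never matches. A small demonstration, with a hypothetical `normalize_account_id` helper for accepting either form:

```python
import csv
import io


def normalize_account_id(value):
    """Return a numeric Twitter user id as a string (hypothetical helper).

    csv.reader yields every field as a str, so comparing a row value with
    the int 807095 is always False; comparing with '807095' works.
    """
    text = str(value).strip()
    if not text.isdigit():
        raise ValueError("expected a numeric user id, got %r" % (value,))
    return text


row = next(csv.reader(io.StringIO("807095,12345\n")))
print(row[0] == 807095)                        # False: the field is a str
print(row[0] == normalize_account_id(807095))  # True
```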

In [2]:
# Imports of necessary libraries
import pandas as pd
import csv
import os

In [3]:
# Make folder for saving author interactions csvs (no error if it already exists)
os.makedirs('foundAuthorInteractions_' + str(account_id), exist_ok=True)
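The next cell scans each file row by row with the `csv` module. If memory allows reading a million rows at a time, pandas can do the same filtering in chunks. This is a sketch of an alternative, not what the notebook runs: `find_interactions` is a hypothetical name, it checks only the reply and retweet columns, and `dtype=str` keeps IDs as text so they compare correctly with `account_id`.

```python
import io

import pandas as pd


def find_interactions(source, account_id, chunksize=1_000_000):
    """Return rows whose in_reply_to_userid or retweet_userid equals account_id.

    Reads the CSV in chunks (bounded memory), keeps every column as str,
    and filters each chunk with a boolean mask.
    """
    parts = []
    columns = None
    for chunk in pd.read_csv(source, dtype=str, chunksize=chunksize):
        columns = chunk.columns
        # Blank cells become NaN, which never equals account_id
        mask = ((chunk['in_reply_to_userid'] == account_id)
                | (chunk['retweet_userid'] == account_id))
        if mask.any():
            parts.append(chunk[mask])
    if not parts:
        # No matches: return an empty frame that still has the file's columns
        return pd.DataFrame(columns=columns)
    return pd.concat(parts, ignore_index=True)


# Tiny made-up input standing in for one of the takedown CSVs
sample = io.StringIO(
    "tweetid,in_reply_to_userid,retweet_userid\n"
    "1,807095,\n"
    "2,,807095\n"
    "3,12345,\n"
)
print(find_interactions(sample, '807095', chunksize=2)['tweetid'].tolist())  # ['1', '2']
```

Reading with `dtype=str` also stops pandas from parsing an ID column that contains blanks as floats, which would turn `807095` into `807095.0`.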

In [4]:
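# Check every dataset for tweets that reply to, retweet, or mention the account
output_dir = 'foundAuthorInteractions_' + str(account_id)


def mentioned_ids(cell):
    # Assumes user_mentions cells look like "[id1, id2]" (ids optionally quoted)
    return [m.strip().strip("'\"") for m in cell.strip('[]').split(',')]


for filename in filenames:
    # Update user on progress
    print("Currently processing " + filename + "...")

    # Collect matching rows in a list and build the DataFrame once at the end;
    # repeated DataFrame.append is quadratic and was removed in pandas 2.0
    header = []
    matches = []

    # Open csv of focus (the csv module expects newline='')
    with open(filename, newline='', encoding='utf-8') as csvfile:
        readCSV = csv.reader(csvfile, delimiter=',')

        # Set up a counter to later update user on progress
        count = 0

        # Check every row for interactions with the user
        for row in readCSV:

            if count == 0:
                # The first row is the header: find the columns of interest by name
                header = row
                in_reply_to_userid_index = row.index('in_reply_to_userid')
                retweet_userid_index = row.index('retweet_userid')
                # Skip the mention check if a file has no user_mentions column
                mentions_index = row.index('user_mentions') if 'user_mentions' in row else None

            elif (row[in_reply_to_userid_index] == account_id
                  or row[retweet_userid_index] == account_id
                  or (mentions_index is not None
                      and account_id in mentioned_ids(row[mentions_index]))):
                matches.append(row)

            if count % 1000000 == 0:
                print("Read " + str(count) + " rows so far")

            count += 1

    # Save as csv, named after the source file without its folder and ".csv"
    df = pd.DataFrame(matches, columns=header)
    source_name = os.path.splitext(os.path.basename(filename))[0]
    df.to_csv(os.path.join(output_dir, source_name + 'AuthorInteractions.csv'), index=False)

    print('Finished this data source!')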
        

Currently processing dataSources/ira_tweets_csv_hashed.csv...
Read 0 rows so far
Read 1000000 rows so far
Read 2000000 rows so far
Read 3000000 rows so far
Read 4000000 rows so far
Read 5000000 rows so far
Read 6000000 rows so far
Read 7000000 rows so far
Read 8000000 rows so far
Finished this data source!
Currently processing dataSources/iranian_tweets_csv_hashed.csv...
Read 0 rows so far
Read 1000000 rows so far
Finished this data source!
Currently processing dataSources/russia_201901_1_tweets_csv_hashed.csv...
Read 0 rows so far
Finished this data source!
Currently processing dataSources/iran_201901_1_tweets_csv_hashed.csv...
Read 0 rows so far
Read 1000000 rows so far
Read 2000000 rows so far
Read 3000000 rows so far
Read 4000000 rows so far
Finished this data source!
Currently processing dataSources/venezuela_201901_1_tweets_csv_hashed.csv...
Read 0 rows so far
Read 1000000 rows so far
Read 2000000 rows so far
Read 3000000 rows so far
Read 4000000 rows so far
Read 5000000 rows so 