# Twitter Interactions Graph

**Objective:** This notebook creates a csv representing the interactions between accounts. This csv can then be used in social network analysis software like Gephi. 
<br><br>
Notably, this notebook uses the tweet content, rather than metadata, to determine the network. Specifically, if the first four characters of a tweet are "RT @" or "QT @", a directed edge is created from the tweet author to the mentioned account. One limitation: a tweet that a user manually began with "RT @" or "QT @" cannot be distinguished from a genuine retweet or quote. This approach suits platforms like Sysomos that let researchers download Twitter data but omit the metadata identifying retweeted or quoted tweets.
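
The prefix rule above can be sketched as a small standalone function. `extract_target` is a hypothetical helper name, not something defined in this notebook, and it assumes the handle ends at the first whitespace and may be followed by a colon (as in "RT @user: ...").

```python
def extract_target(text):
    """Return the handle a tweet retweets or quotes, or None if it is neither."""
    # Only tweets whose first four characters are "RT @" or "QT @" count
    if text[:4] not in ('RT @', 'QT @'):
        return None
    # The handle runs from position 4 up to the next whitespace (assumption)
    parts = text[4:].split(None, 1)
    if not parts:
        return None
    # Drop the colon that usually follows the handle
    return parts[0].rstrip(':')


print(extract_target('RT @alice: hello world'))   # alice
print(extract_target('QT @bob: nice thread'))     # bob
print(extract_target('RT is short for retweet'))  # None
```

A tweet that merely starts with "RT" but has no "@" in the fourth position is ignored, while a hand-typed "RT @someone" is still counted, which is the limitation noted above.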

**Prerequisites:** 
- In the same directory as this notebook, there must be a csv with the name given in `filename` that contains general Twitter data, including the columns "Author ID" and "Contents".
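
Before running the notebook, it may help to confirm that the input has the two columns the code looks up by name, "Author ID" and "Contents". The sketch below is one way to do that; `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names, and the sample data is made up for illustration.

```python
import csv
import io

# Column names the notebook looks up in the header row
REQUIRED_COLUMNS = ('Author ID', 'Contents')


def missing_columns(csvfile):
    """Return the required column names absent from the file's header row."""
    # An empty file yields an empty header, so every column is reported missing
    header = next(csv.reader(csvfile), [])
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Illustrative in-memory file; a real run would pass open(filename, newline='')
sample = io.StringIO('Author ID,Date,Contents\nuser1,2020-01-01,RT @alice: hi\n')
print(missing_columns(sample))  # []
```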

**Outcomes:** After running this notebook, there should be an output csv, `user_graph.csv`, containing the fields "source" and "target". The relationship between these elements is as follows: "[source] retweeted or quoted [target]." Thus, the graph is directed, where source -> target.
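
Because each retweet or quote produces its own row, the same source -> target pair can appear many times in the output. One optional follow-up, sketched below under the assumption that repeated pairs should be collapsed into a single weighted edge, counts how often each pair occurs. `weighted_edges` is a hypothetical helper and the sample rows are invented.

```python
import csv
import io
from collections import Counter


def weighted_edges(csvfile):
    """Count how many times each (source, target) pair occurs in an edge-list csv."""
    reader = csv.DictReader(csvfile)
    return Counter((row['source'], row['target']) for row in reader)


# Illustrative in-memory edge list in the same shape as the notebook's output
sample = io.StringIO('source,target\nu1,alice\nu1,alice\nu2,bob\n')
for (source, target), weight in weighted_edges(sample).items():
    print(source, target, weight)
# u1 alice 2
# u2 bob 1
```

Whether to collapse duplicates is an analysis choice: keeping one row per interaction preserves the raw events, while weights make repeated interactions visible as edge strength.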

In [1]:
# Pointer to file of interest
filename = './file.csv'

In [2]:
# Imports of necessary libraries
import pandas as pd
import csv

In [3]:
# Create graph

# Open csv of focus
with open(filename, newline='') as csvfile:
    readCSV = csv.reader(csvfile, delimiter = ',')
    
    # Set up a counter to later update user on progress
    count = 0
    
    # Instantiate for later use
    userid1_index = 0
    tweetcontent_index = 0
    userid1 = []
    userid2 = []
    
    # Check every row for interactions with the user
    for row in readCSV:
        
        # Get the index of the columns of our interest
        if(count == 0):
            userid1_index = row.index('Author ID')
            tweetcontent_index = row.index('Contents')
        
        else:
            contents = row[tweetcontent_index]
            
            # Check if a RT or QT
            if contents[0:4] in ('RT @', 'QT @'):
                # Add the username of account that did the RT or QT to the source array
                userid1.append(row[userid1_index])
                
                # Add the username of account that was RT'd or QT'd to the target array.
                # The username ends at the first space; if there is no space,
                # it runs to the end of the tweet.
                end = contents.find(' ', 4)
                if end == -1:
                    end = len(contents)
                # Strip the colon that follows the username, as in "RT @user: ..."
                userid2.append(contents[4:end].rstrip(':'))
    
        # Print count to update on progress
        if(count % 5000 == 0):
            print(count)
            
        count += 1
    
    df = pd.DataFrame.from_dict({'source': userid1,
                                 'target': userid2})
    

0
5000
10000
15000
20000
25000


In [4]:
# Save to csv
df.to_csv('user_graph.csv', index=False)