# Quick Twitter to Gephi Conversion Notebook
The functions below speed up converting a Twitter dataset, generated using our earlier notebooks, into a Gephi-compatible retweet network.

## Usage
Run the cell below to load `pandas` and the pre-written functions, then follow the steps that come after it.

In [None]:
import pandas as pd

def flatten_nested_dicts(df):
    dicts = df.to_dict(orient='records')
    flattened = pd.json_normalize(dicts)
    return flattened

def create_rt_edge_list(tweet_df):
    """Creates an edge list where the Source is the retweeter, and the Target is the original tweet author. (Source)-[RETWEETED]-(Target)"""

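    # Keep only tweets that reference another tweet, one row per reference.
    subset = tweet_df[['author_id','referenced_tweets']].dropna()
    edge_data = subset.explode('referenced_tweets').copy()
    # Expand each reference dict into 'referenced_tweets.id' and 'referenced_tweets.type' columns.
    edge_data = flatten_nested_dicts(edge_data)
    edge_data = edge_data[['author_id','referenced_tweets.id', 'referenced_tweets.type']]
    edge_data = edge_data[edge_data['referenced_tweets.type'] == 'retweeted']

    # Look up the author of each retweeted tweet; this only succeeds if the
    # original tweet is itself in the dataset.
    user_data = tweet_df[['id','author_id']]
    edge_data = edge_data.merge(user_data, how='left',left_on='referenced_tweets.id', right_on='id')
    edge_data = edge_data.drop(columns=['referenced_tweets.id','id','referenced_tweets.type'])

    # Count retweets per (retweeter, original author) pair. Rows with no
    # matched original author are dropped by the groupby.
    edge_data['weight'] = 1
    edge_data = edge_data.groupby(['author_id_x','author_id_y'], as_index=False).sum()
    edge_data = edge_data.rename(columns={'author_id_x':'Source','author_id_y':'Target'})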
    return edge_data

def create_node_list(tweet_df, edges):
    """Creates a node list. Requires both the original tweets dataframe and your newly created edge list."""

    # One row per user, then keep only users that appear in the edge list.
    node_data = tweet_df[['user_id','user_username','user_public_metrics']].drop_duplicates('user_id')
    nodes_in_network = pd.concat([edges['Source'],edges['Target']], axis=0).drop_duplicates()

    node_data = node_data[node_data['user_id'].isin(nodes_in_network)].copy()
    # Expand the public metrics dict into one column per metric.
    node_data = flatten_nested_dicts(node_data)
    gephi_node_labels = {'user_id':'ID','user_username':'Label',
                         'user_public_metrics.followers_count':'followers_count',
                         'user_public_metrics.following_count':'following_count',
                         'user_public_metrics.tweet_count':'tweet_count',
                         'user_public_metrics.listed_count':'listed_count'}
    node_data = node_data.rename(columns=gephi_node_labels)
    return node_data

def save_to_disk(edge_list, node_list):
    """Saves node and edge lists to disk in a Gephi-friendly format."""
    edge_list.to_csv('my_edge_list.csv', index=False)
    node_list.to_csv('my_node_list.csv', index=False)
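
To see what `create_rt_edge_list` does to the `referenced_tweets` column, the sketch below repeats its first steps on a tiny hand-made dataframe. The tweet and author IDs (`t1`, `alice`, and so on) are invented for illustration, and the toy data assumes each `referenced_tweets` entry is a list of dicts with `type` and `id` keys, which is what the function above expects.

```python
import pandas as pd

# Toy stand-in for the tweet dataset; every ID here is made up.
toy_tweets = pd.DataFrame({
    'id': ['t1', 't2', 't3'],
    'author_id': ['alice', 'bob', 'carol'],
    'referenced_tweets': [
        None,                                   # an original tweet
        [{'type': 'retweeted', 'id': 't1'}],    # bob retweets alice's tweet
        [{'type': 'replied_to', 'id': 't1'}],   # a reply, filtered out below
    ],
})

# Same steps as the start of create_rt_edge_list.
subset = toy_tweets[['author_id', 'referenced_tweets']].dropna()
exploded = subset.explode('referenced_tweets')
flat = pd.json_normalize(exploded.to_dict(orient='records'))
retweets = flat[flat['referenced_tweets.type'] == 'retweeted']

# Join back to the dataset to find who wrote the retweeted tweet.
authors = toy_tweets[['id', 'author_id']]
edges = retweets.merge(authors, how='left',
                       left_on='referenced_tweets.id', right_on='id')
print(edges[['author_id_x', 'author_id_y']])
```

Here `author_id_x` is the retweeter and `author_id_y` is the original author, which the function later renames to `Source` and `Target`.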


Change `FILENAME` to the name of your tweet dataset file, and make sure that file is in the same folder as this notebook.
Then run the cells below to create the edge list and the node list, each as a pandas dataframe.

In [None]:
FILENAME = 'trhr.json'

tweets = pd.read_json(FILENAME)
tweets.info()
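
The functions in this notebook read a fixed set of columns from the tweet dataframe, so it can help to confirm they are present before going further. The following is a minimal sketch: `missing_columns` is a hypothetical helper, not part of the notebook, and the toy dataframe stands in for `tweets`.

```python
import pandas as pd

# Columns read by create_rt_edge_list and create_node_list.
REQUIRED_COLUMNS = ['id', 'author_id', 'referenced_tweets',
                    'user_id', 'user_username', 'user_public_metrics']

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Hypothetical helper: list the required columns that df lacks."""
    return [col for col in required if col not in df.columns]

# Toy dataframe standing in for `tweets`; it has no user_* columns.
toy = pd.DataFrame({'id': ['t1'], 'author_id': ['u1'],
                    'referenced_tweets': [None]})
print(missing_columns(toy))
```

In the notebook itself you would call `missing_columns(tweets)`; an empty list means every expected column is there.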

In [None]:
edge_list = create_rt_edge_list(tweets)
edge_list

In [None]:
node_list = create_node_list(tweets,edge_list)
node_list

If you are happy with your two lists, run the cell below to save them to disk. You should then have two new files, `my_edge_list.csv` and `my_node_list.csv`, which can be imported into Gephi.

In [None]:
save_to_disk(edge_list,node_list)
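
One quick way to check a saved file is to read it back and look at its header row and row count. The sketch below does this with a toy edge list written to a temporary folder; the IDs and weights are invented, and in the notebook you would point `pd.read_csv` at `my_edge_list.csv` directly.

```python
import os
import tempfile

import pandas as pd

# Toy edge list with made-up IDs, in the shape create_rt_edge_list returns.
toy_edges = pd.DataFrame({'Source': ['1001', '1002'],
                          'Target': ['1003', '1003'],
                          'weight': [2, 1]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'my_edge_list.csv')
    toy_edges.to_csv(path, index=False)
    # dtype=str keeps the IDs as text instead of parsing them as numbers.
    check = pd.read_csv(path, dtype=str)

print(list(check.columns))
print(len(check))
```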