*Note: this notebook is meant to accompany a forthcoming book chapter by __Deen Freelon__ titled "Partition-specific network analysis of digital trace data: Research questions and tools." It may or may not stand on its own. Also, you'll need to run the code in the predecessor to the current notebook first.*

# Prelude: Prepping the data...

Before running the code for any of the RQs, run the following code. You can paste it into its own .py file or download and run this notebook locally.

Also, see the comments in the tsm.py source code for explanations of the functions' output.

In [None]:
import tsm
edge_list = tsm.t2e('wiunion_rts_hydrated.csv','RTS_ONLY')
partitioned_network = tsm.get_top_communities(edge_list)

The variable *partitioned_network* is a louvainObject, which holds several pieces of information about the partitioned network. For example, if you type:

In [None]:
partitioned_network.n_communities

Python will display the total number of communities the Louvain method discovered. And if you type:

In [None]:
partitioned_network.modularity

Python will display the partition's modularity value. A complete description of all the attributes of a louvainObject can be found in the comments of TSM itself.
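
If you want a feel for what a modularity value measures, here is a minimal sketch that computes it by hand for a toy network. It assumes an undirected, unweighted edge list, which is a simplification: it is not how TSM computes the value, and the names `modularity`, `toy_edges` and `toy_partition` are made up for this example.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted network.

    Q = sum over communities of (internal edges / m) - (community degree / 2m)^2,
    where m is the total number of edges.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # summed degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge (hypothetical toy data).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
toy_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

toy_q = modularity(toy_edges, toy_partition)
print(round(toy_q, 3))
```

Putting every node in a single community gives a modularity of zero, so values well above zero indicate that the partition captures more within-community ties than you would expect by chance.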

I highly recommend inspecting the contents of your network partition to figure out which community is which, because the results will vary from run to run.

# RQ1
*How do different subgraphs in a partitioned network relate to one another?*

Uncomment the last line of code below to save your proximity grid to a CSV file.

In [None]:
partnet_ei = tsm.calc_ei(partitioned_network.node_list,edge_list,'PROX')
partnet_prox = tsm.prox_grid(partnet_ei)
for row in partnet_prox:
    print(row)

# tsm.save_csv('wiunion_proxgrid.csv',partnet_prox)
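
The function name `calc_ei` suggests an E-I (external-internal) index, so here is a rough sketch of that general idea for a single community. This is an assumption on my part about the underlying measure: what the 'PROX' mode and `prox_grid` actually compute is documented in the tsm.py comments, and the names `ei_index`, `toy_edges` and `toy_communities` are hypothetical.

```python
def ei_index(edges, community_of, community):
    """E-I index of one community: (external - internal) / (external + internal).

    An edge is internal if both endpoints belong to the community and
    external if exactly one endpoint does. Returns None if the community
    has no edges at all.
    """
    internal = external = 0
    for src, dst in edges:
        src_in = community_of.get(src) == community
        dst_in = community_of.get(dst) == community
        if src_in and dst_in:
            internal += 1
        elif src_in or dst_in:
            external += 1
    total = internal + external
    if total == 0:
        return None
    return (external - internal) / total

# Two triangles joined by one bridge edge (hypothetical toy data).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
toy_communities = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

toy_ei = ei_index(toy_edges, toy_communities, 0)
print(toy_ei)
```

The index runs from -1 (all ties stay inside the community) to +1 (all ties leave it), so strongly negative values point to an insular community.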

# RQ2
*Which nodes are heavily connected to by distinct subgraphs?* 

In [None]:
partnet_inter = tsm.get_intermediaries(partitioned_network.node_list,edge_list,0.5,0.001)
for inter in partnet_inter:
    print(inter)
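
To make the idea behind RQ2 concrete, here is a small sketch that flags nodes receiving edges from more than one community. It is not TSM's algorithm, and it does not model the two numeric arguments passed to `get_intermediaries` above; the function `cross_community_targets`, its `min_edges` cutoff and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict

def cross_community_targets(edges, community_of, min_edges=2):
    """Return nodes that receive at least `min_edges` incoming edges
    from each of two or more different communities.

    The result maps each such node to {community: incoming edge count}.
    """
    incoming = defaultdict(Counter)
    for src, dst in edges:
        if src in community_of:
            incoming[dst][community_of[src]] += 1

    targets = {}
    for node, counts in incoming.items():
        strong = {c: n for c, n in counts.items() if n >= min_edges}
        if len(strong) >= 2:
            targets[node] = strong
    return targets

# "hub" is retweeted twice by each community; "a2" only once by each.
toy_edges = [("a1", "hub"), ("a2", "hub"), ("b1", "hub"), ("b2", "hub"),
             ("a1", "a2"), ("b1", "a2")]
toy_communities = {"a1": 0, "a2": 0, "b1": 1, "b2": 1, "hub": 0}

toy_targets = cross_community_targets(toy_edges, toy_communities)
print(toy_targets)
```

Raising `min_edges` makes the sketch stricter about what counts as being "heavily" connected to by a community.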

# RQ3
*How do subgraphs change over time?*

In [None]:
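# tweet_data is not defined in this notebook; it is assumed to come from the
# predecessor notebook, with the two fields t2e needs in positions 0 and 1
# of each row and the tweet's date in position 2.
unique_dates = sorted({tweet[2] for tweet in tweet_data})

# Group the dates into buckets of seven unique dates (one week each if no
# day is missing). Dates left over after the last full bucket are dropped.
num_buckets = len(unique_dates)//7
wk_buckets = [unique_dates[n*7:(n+1)*7] for n in range(num_buckets)]

# Collect the first two fields of every tweet that falls in each bucket.
tweets_by_wk = []

for bucket in wk_buckets:
    tweets_by_wk.append([[tweet[0],tweet[1]] for tweet in tweet_data if tweet[2] in bucket])

# Partition each week's retweet network separately.
partitions_by_wk = []

for bucket in tweets_by_wk:
    partitions_by_wk.append(tsm.get_top_communities(tsm.t2e(bucket,'RTS_ONLY')))

# Match each week's communities against those of the following week.
community_matches = []

for this_wk,next_wk in zip(partitions_by_wk,partitions_by_wk[1:]):
    community_matches.append(tsm.match_communities(this_wk.node_list,next_wk.node_list))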
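
If you're wondering what matching communities across time slices involves, here is a minimal sketch that pairs each earlier community with its most similar later community using plain Jaccard similarity on node membership. The similarity measure TSM uses may differ (check the tsm.py comments), and the function names, the 0.3 threshold and the toy data below are all hypothetical.

```python
def jaccard(nodes_a, nodes_b):
    """Jaccard similarity of two node collections: |A & B| / |A | B|."""
    a, b = set(nodes_a), set(nodes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_matches(earlier, later, threshold=0.3):
    """For each earlier community, find the later community with the highest
    Jaccard similarity; keep the pair only if it meets the threshold.

    `earlier` and `later` map community labels to sets of node names.
    """
    matches = {}
    for label_a, nodes_a in earlier.items():
        scored = [(jaccard(nodes_a, nodes_b), label_b)
                  for label_b, nodes_b in later.items()]
        if not scored:
            continue
        score, label_b = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            matches[label_a] = (label_b, score)
    return matches

# Community labels are arbitrary, so week 1's "0" reappears as week 2's "1".
toy_week1 = {"0": {"a", "b", "c"}, "1": {"d", "e"}, "2": {"p", "q"}}
toy_week2 = {"0": {"d", "e", "f"}, "1": {"a", "b", "c", "g"}}

toy_matches = best_matches(toy_week1, toy_week2)
for label, (match, score) in toy_matches.items():
    print(label, '\t', match, '\t', round(score, 2))
```

Community "2" drops out because it shares no nodes with any week 2 community, which is why a threshold is useful: without one, every community would get a "best" match no matter how weak.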

The following code displays, for each pair of consecutive time slices, all best community matches that meet or exceed the threshold.

In [None]:
for match in community_matches:
    print('--------')
    for i in match.shared_nodes:
        print(i,'\t',match.best_matches[i],'\t',match.shared_nodes[i])