### Goal

As we have learned, the network dispersion between two individuals is a useful signal for identifying whether a romantic relationship may be present. In this activity you will learn how to calculate this dispersion value.

You will then apply this measurement to a close friend among your Facebook friends in an attempt to "guess" who their romantic partner may be.

In [1]:
import json
from collections import deque

### Collect Data

To get started, collect your own data from https://lostcircles.com/. Download JSON (no pics) after your network finishes loading. Run the following function to get an adjacency list. Note that the JSON file records some edges in only one direction, so the function adds both directions to the adjacency list.

We have included the code for generating your personal adjacency list below. Fill in your own file path once you have the file, and then run the cell to fill the "adj_list" variable.

In [None]:
path = 'YOUR_FILE_PATH_TO_LOSTCIRCLES_FILE_HERE.json' # Replace this with your own file path

def process_json(path): # Parse the JSON file into an adjacency list and a node list
    with open(path) as f: # Close the file once it has been parsed
        data = json.load(f)
    links = data["links"]
    nodes = data["nodes"]
    adj_list = {}
    for edge in links:
        
        if edge["source"] in adj_list:
            if edge["target"] not in adj_list[edge["source"]]:
                adj_list[edge["source"]].append(edge["target"])
        else:
            adj_list[edge["source"]] = [edge["target"]]
            
        
        if edge["target"] in adj_list:
            if edge["source"] not in adj_list[edge["target"]]:
                adj_list[edge["target"]].append(edge["source"])
        else:
            adj_list[edge["target"]] = [edge["source"]]
            
    return adj_list, nodes

adj_list, nodes = process_json(path)
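
To see why both directions are added, here is a minimal sketch on a toy edge list. The function name `build_adj_list` and the data in `toy_links` are hypothetical and only illustrate the idea; they assume edges shaped like the "source"/"target" entries read by `process_json` above.

```python
def build_adj_list(links):
    """Build an undirected adjacency list from edge dicts with 'source'/'target' keys."""
    adj = {}
    for edge in links:
        s, t = edge["source"], edge["target"]
        # setdefault creates the neighbor list the first time a node is seen
        if t not in adj.setdefault(s, []):
            adj[s].append(t)
        if s not in adj.setdefault(t, []):
            adj[t].append(s)
    return adj

# Hypothetical toy data: most edges are recorded in one direction only
toy_links = [
    {"source": 0, "target": 1},
    {"source": 1, "target": 2},
    {"source": 2, "target": 0},
    {"source": 1, "target": 0},  # same friendship as the first edge, reversed
]
toy_adj = build_adj_list(toy_links)
print(toy_adj)
```

Every node ends up listing both of its neighbors, and the reversed duplicate edge does not create a repeated entry.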

### Selecting Some Friends

Next you should select two close friends from your Facebook network. In this section we are going to attempt to calculate the dispersion value between two individuals. 

For the best results, the two friends you select should have many mutual friends who are also your friends.

To select these two friends, fill in the name variables below. Make sure to spell their names correctly and exactly as they are shown on Facebook. After that, run the cell. The result will be the two variables fIndex1 and fIndex2, which hold the index in the adjacency list of the first friend and of the second friend.

In [None]:
friend1 = "Friend Name" # --- FILL IN: Choose friend names from facebook
friend2 = "Friend Name"

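def getFriendIndex(name):
    # Return the position in `nodes` of the first entry whose name matches `name`,
    # or None if no entry matches
    for c, node in enumerate(nodes):
        if node['name'] == name:
            return c
    return None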
    
fIndex1 = getFriendIndex(friend1) #Index in adjacency list of friend 1
fIndex2 = getFriendIndex(friend2) #Index in adjacency list of friend 2
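
As a quick sanity check of the lookup, the following sketch runs the same idea on a toy node list. The names `find_friend_index` and `toy_nodes` are hypothetical; the sketch assumes, as the cell above does, that each node is a dict with a "name" key and that a node's position in the list is the index used in the adjacency list.

```python
def find_friend_index(nodes, name):
    """Return the position of the first node whose 'name' matches, or None."""
    for index, node in enumerate(nodes):
        if node["name"] == name:
            return index
    return None  # the name was misspelled or is not in the network

# Hypothetical toy data
toy_nodes = [{"name": "Ada Lovelace"}, {"name": "Alan Turing"}]

found = find_friend_index(toy_nodes, "Alan Turing")
missing = find_friend_index(toy_nodes, "alan turing")  # matching is case-sensitive
print(found, missing)
```

If a lookup returns `None`, check the spelling and capitalization of the name before moving on, since later cells use the index directly.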

### Calculating Dispersion Between Two People

In this section we will calculate the dispersion between the two friends you have found. To do this, we have included the skeletons of three essential algorithms. Your task is to complete these three algorithms. Once you have done that you will be able to find the dispersion value between your selected friends.

In [None]:
# --- Finish this function ---
def common_neighbor(adj_list, node1, node2):  # Find the list of common neighbors of node1 and node2
    # declare a list to store results
    # loop through the neighbors of node1
    #   if neighbor is in neighbors of node2
    #      add neighbor to result list
    # return results

    
def distance(threshold, adj_list, u, v): 
    # Use BFS to check whether the distance between u and v is within threshold.
    # Returns 1 when dist > threshold, 0 when dist <= threshold
    queue = deque([u]) # Queue data structure
    explored = {u} # Set data structure for O(1) lookup
    count = 0
    while(len(queue) != 0 and count < threshold):
        cur_layer = len(queue)
        
        for i in range(cur_layer):
            cur = queue.popleft()
            for nei in adj_list[cur]:
                if nei == v:
                    return 0
                if nei not in explored:
                    queue.append(nei)
                    explored.add(nei)
                    
        count += 1
    return 1


# --- Finish this function ---
def dispersion(adj_list, node1, node2, threshold = 1, normalized = False): #Calculate the dispersion
    result = ______ # FILL IN: initialize dispersion result variable
    common_nei = ______ # FILL IN: get common neighbors of node1 and node2 from previous function
    
    #loop through common neighbors with index i
    #   loop through common neighbors with index j
    #          result += distance between the i-th and j-th common neighbors (use distance() with threshold)
    
    if normalized:
        if len(common_nei) <= 1:
            return 0
        return result/len(common_nei)

    # return result
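
To build intuition for what the finished function should measure, here is a small hand-checkable sketch. It is not the required solution: `toy_dispersion` and `toy_adj` are hypothetical, each pair of common neighbors is counted once, and the sketch only covers the simplest case in which two common neighbors count as "far apart" when they are not directly linked (which is what a threshold of 1 means for `distance` above).

```python
from itertools import combinations

def toy_dispersion(adj, u, v):
    """Count pairs of common neighbors of u and v that are not directly linked."""
    common = [n for n in adj[u] if n in adj[v]]
    # combinations yields each unordered pair of common neighbors exactly once
    return sum(1 for s, t in combinations(common, 2) if t not in adj[s])

# Hypothetical toy network: a, b and c are mutual friends of u and v;
# a and b know each other, while c knows neither of them
toy_adj = {
    "u": ["v", "a", "b", "c"],
    "v": ["u", "a", "b", "c"],
    "a": ["u", "v", "b"],
    "b": ["u", "v", "a"],
    "c": ["u", "v"],
}
toy_result = toy_dispersion(toy_adj, "u", "v")
print(toy_result)  # the unlinked pairs are (a, c) and (b, c)
```

The more the mutual friends of two people come from separate, unconnected circles, the higher this count becomes.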

### Calculate Max Dispersion

In this section you will find the person who has the highest dispersion with one of your friends in your network. To do this, fill out the recursive_dispersion function below. We have provided two helper functions, dispersionEqn and find_max_dispersion, to assist in your calculations.
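
It may help to read the `dispersionEqn` helper in the next cell as a formula. The following is only a restatement of that code, with symbol names chosen here for illustration: $x_w^{(k)}$ stands for `values[w][iteration_num]`, $C_{uv}$ for the common neighbors of $u$ and $v$, and $d(s,t)$ for the 0/1 result of `distance(threshold, adj_list, s, t)`.

$$
x_v^{(k+1)} =
\begin{cases}
\dfrac{\sum_{w \in C_{uv}} \left(x_w^{(k)}\right)^2 + 2 \sum_{\{s,t\} \subseteq C_{uv},\, s \neq t} d(s,t)\, x_s^{(k)}\, x_t^{(k)}}{|C_{uv}|} & \text{if } |C_{uv}| > 1 \\[2ex]
0 & \text{otherwise}
\end{cases}
$$

Here the second sum runs over each unordered pair of distinct common neighbors once. When every $x_w^{(k)}$ equals 1, the numerator reduces to $|C_{uv}|$ plus twice the number of common-neighbor pairs that are further apart than the threshold.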

In [None]:
def dispersionEqn(adj_list, u, v, values, iteration_num, threshold):
    common_nei = common_neighbor(adj_list, u, v)
    if len(common_nei) <= 1:
        return 0
    
    result = 0
    for nei in common_nei:
        result += values[nei][iteration_num] * values[nei][iteration_num]
    
    for i in range(len(common_nei) - 1):
        for j in range(i + 1, len(common_nei)):
            result += 2 * distance(threshold, adj_list, common_nei[i], common_nei[j]) * values[common_nei[i]][iteration_num] * values[common_nei[j]][iteration_num]
    
    return result/len(common_nei)


def find_max_dispersion(dispersions):
    # Return the key (node index) whose most recent dispersion value is the largest
    result = 0
    max_dispersion = 0
    for key in dispersions.keys():
        if dispersions[key][-1] > max_dispersion: # compare the last value of each list
            max_dispersion = dispersions[key][-1]
            result = key
    return result


# --- Finish this function ---
def recursive_dispersion(adj_list, node, max_iterations = 1, threshold = 1):
    disp_values = _____ # declare empty dict. This will hold the dispersion values for 
                        # each user. You are building these values up one by one
    
    # Loop through the adjacency list. For any node that is not the target node, add it to our values
    # dictionary and set its dispersion values to the list [1] (for now). The value must be a list
    # because dispersionEqn reads values[nei][iteration_num]
    
    iteration_num = 0
    while iteration_num < max_iterations:
        # loop through all neighbors in the values dict (loop through its keys)
        #    for each neighbor, calculate the dispersion using the dispersionEqn helper function.
        #    Append this dispersion value to the end of that neighbor's list in the values dict
        # increment the number of iterations
    
    return disp_values

Once you have finished the recursive_dispersion function above, try running the following cell with a friend of your choice. Your algorithm should return a different friend, the one who has the highest dispersion value with your input friend.

In [None]:
target_friend = "Friend Name"
friendIndex = getFriendIndex(target_friend)
num_recursions = 3 # Number of iterations the recursive dispersion function will run.
                   # The paper we read stated that 3 iterations provide a good balance between accuracy and speed

threshold = 1 # Distance threshold: two mutual friends add to the dispersion only when they are more than
              # this many steps apart. A higher threshold means only mutual friends that are further apart count
    
dispersions = recursive_dispersion(adj_list, friendIndex, num_recursions, threshold)

# Final result: The output of this will state which person has the highest dispersion value with the target friend you 
# selected.
nodes[find_max_dispersion(dispersions)]