# Part 2: Email Behaviour Data Analysis

---

### Install Python packages (pip only)

In [1]:
#e.g., %pip install networkx


### Import Python packages

In [2]:
import networkx as nx
import json

---

### Task 1 of 2

Examine the file "emails_cmt224.edgelist", which represents email behaviour at an organisation. Each line contains two numbers, 𝑢 and 𝑣, separated by a blank space. Consider each number as an identifier for an individual in the organisation, with each line representing that individual 𝑢 sent at least one email to individual 𝑣 at some point. Model the data using an appropriate network representation and answer the following questions:
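To make the file format concrete, here is a minimal, dependency-free sketch of how such lines map onto a directed structure. The three sample lines are hypothetical and are not taken from the real file; ids are kept as strings, which is also what `nx.read_edgelist` does when no `nodetype` is given.

```python
# Hypothetical sample lines in the "u v" format described above.
sample_lines = ["1 2", "2 1", "1 3"]

# Map each sender to the set of individuals they emailed (direction matters).
successors = {}
for line in sample_lines:
    u, v = line.split()
    successors.setdefault(u, set()).add(v)
    successors.setdefault(v, set())  # make sure pure recipients also appear as nodes

print(successors)
```

Because "1 2" and "2 1" are distinct lines, a directed representation keeps the information about who emailed whom, which the later reciprocity questions depend on.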

##### Q1. Using the largest strongly connected component (where at least one path exists between each individual and all others), could the connectivity be said to reflect a small-world phenomenon when compared with a comparable random network?

In [3]:
#CODE:
# Emails are directed (sender -> recipient), so model the data as a directed graph.
emailsGraph = nx.read_edgelist('emails_cmt224.edgelist', create_using=nx.DiGraph)

# Largest strongly connected component, copied out as a standalone graph.
emailsGraphSC = sorted(nx.strongly_connected_components(emailsGraph), key=len, reverse=True)
largestconnected = emailsGraph.subgraph(emailsGraphSC[0]).copy()

# Random directed graph with the same number of nodes and edges (fixed seed for repeatability).
# Note: average_shortest_path_length raises an error if this graph is not strongly connected.
randomNetwork = nx.gnm_random_graph(largestconnected.number_of_nodes(), largestconnected.number_of_edges(), directed=True, seed=123)

largestconnectedShort = nx.average_shortest_path_length(largestconnected)
largestconnectedCluster = nx.average_clustering(largestconnected)
randomNetworkShort = nx.average_shortest_path_length(randomNetwork)
randomNetworkCluster = nx.average_clustering(randomNetwork)


print('Average shortest path length in the largest strongly connected component: %.2f' % largestconnectedShort)
print('Average clustering in the largest strongly connected component: %.2f' % largestconnectedCluster)

print('Average shortest path length in the random network: %.2f' % randomNetworkShort)
print('Average clustering in the random network: %.3f' % randomNetworkCluster)



Average shortest path length in the largest strongly connected component: 2.55
Average clustering in the largest strongly connected component: 0.39
Average shortest path length in the random network: 2.27
Average clustering in the random network: 0.038
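One way to read the four printed values is to compare them as ratios. The sketch below uses the rounded figures from the output above and a commonly used small-world heuristic (clustering ratio divided by path-length ratio); the variable names are illustrative, and the result rests on a single random graph with one seed, so it should be treated as indicative only.

```python
# Values copied from the printed output above (already rounded).
observed_path, observed_clustering = 2.55, 0.39
random_path, random_clustering = 2.27, 0.038

# Small-world networks tend to have much higher clustering than a comparable
# random graph while keeping a similar average shortest path length.
clustering_ratio = observed_clustering / random_clustering
path_ratio = observed_path / random_path

# Heuristic small-world coefficient: values well above 1 point towards small-world structure.
sigma = clustering_ratio / path_ratio

print('Clustering ratio: %.2f' % clustering_ratio)
print('Path length ratio: %.2f' % path_ratio)
print('Small-world coefficient (heuristic): %.2f' % sigma)
```

Here clustering is roughly ten times that of the random graph while the path length is only about 12% longer, which is the pattern the small-world description refers to.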


##### Q2. Are the majority of connections in the entire network mutual connections where emails have been exchanged at least once, or asymmetric? In comparison, how many individuals have a higher or lower ratio of mutual connections than the entire network?

In [4]:
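#CODE:
# Fraction of directed edges in the whole network that are reciprocated.
overallMutual = nx.reciprocity(emailsGraph)
# Per-individual reciprocity, returned as a dict keyed by node.
eachPerson = nx.reciprocity(emailsGraph, emailsGraph.nodes())

# Count individuals whose ratio is at least the network-wide ratio (ties count as higher).
count = 0
for node, val in eachPerson.items():
    if val >= overallMutual:
        count += 1

print('Individuals with a higher mutual connection ratio than the entire network: ', count)
print('Individuals with a lower mutual connection ratio than the entire network: ', len(eachPerson) - count)


Individuals with a higher mutual connection ratio than the entire network:  408
Individuals with a lower mutual connection ratio than the entire network:  578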
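As a sanity check on what is being counted, the following dependency-free sketch computes the same two kinds of ratio by hand on a tiny, hypothetical edge set. It assumes the usual definitions (share of directed edges that are reciprocated overall, and twice the number of mutual neighbours over in-degree plus out-degree per node), which is how I understand `nx.reciprocity` to behave for directed graphs.

```python
# Hypothetical toy network: a and b email each other, a also emails c.
edges = {("a", "b"), ("b", "a"), ("a", "c")}

# Overall reciprocity: share of directed edges whose reverse edge also exists.
mutual_edges = [(u, v) for (u, v) in edges if (v, u) in edges]
overall = len(mutual_edges) / len(edges)

def node_reciprocity(node, edges):
    """Share of a node's in/out connections that are mutual."""
    successors = {v for (u, v) in edges if u == node}
    predecessors = {u for (u, v) in edges if v == node}
    total = len(successors) + len(predecessors)
    return 2 * len(successors & predecessors) / total

per_node = {n: node_reciprocity(n, edges) for n in ("a", "b", "c")}

print('Overall: %.2f' % overall)
print(per_node)
```

Node `a` lands exactly on the network-wide value here, which shows why the choice between `>` and `>=` in the comparison decides how ties are counted.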


##### Q3. Considering induced, connected subgraphs of 3 individuals (triads) only, calculate the ratio of triads containing ONLY mutual connections in the entire network (i.e., the number of triad occurrences with only edges pointing in both directions, triads 201 and 300 / the total number of triad occurrences). From this, determine whether this ratio is reflective (i.e., of similar value) to the overall ratio of mutual connections in the network. What does this suggest about how mutual connections are connected together in the network?

In [5]:
#CODE:
def calculate_normalised_connected_triadic_census(graph):
    # Calculate the triadic census (counts for all 16 triad types).
    tc = nx.triadic_census(graph)
    # Drop the triad types that are not connected (fewer than two linked pairs).
    del tc["003"]
    del tc["012"]
    del tc["102"]
    # Normalise the remaining counts so that they sum to 1.
    factor = 1.0 / sum(tc.values())
    for k in tc:
        tc[k] = tc[k] * factor
    return tc

mydict = calculate_normalised_connected_triadic_census(emailsGraph)
# Triads 201 and 300 are the only connected triads made up solely of mutual edges.
totalTargetTriads = mydict['201'] + mydict['300']
print('Percentage of connected triads of type 201 or 300: %.2f' % (totalTargetTriads * 100))


Percentage of connected triads of type 201 or 300: 32.31
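The ratio above can be illustrated by classifying triads by hand. This is a rough, dependency-free sketch on a hypothetical four-node network, not the method `nx.triadic_census` uses internally: a triad counts as connected when at most one of its three pairs is unlinked, and as mutual-only (types 201 and 300) when none of its linked pairs is one-directional.

```python
from itertools import combinations

# Hypothetical toy network: a<->b, b<->c, and a one-way email c->d.
edges = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d")}
nodes = sorted({n for edge in edges for n in edge})

def dyad_type(u, v):
    """Classify a pair as 'mutual', 'asymmetric' or 'null'."""
    forward, backward = (u, v) in edges, (v, u) in edges
    if forward and backward:
        return "mutual"
    if forward or backward:
        return "asymmetric"
    return "null"

connected = 0
mutual_only = 0
for triple in combinations(nodes, 3):
    dyads = [dyad_type(u, v) for u, v in combinations(triple, 2)]
    if dyads.count("null") > 1:
        continue  # types 003, 012 and 102 are not connected
    connected += 1
    if "asymmetric" not in dyads:
        mutual_only += 1  # type 201 (two mutual pairs) or 300 (three mutual pairs)

ratio = mutual_only / connected
print('Mutual-only share of connected triads: %.2f' % ratio)
```

In this toy case only `(a, b, c)` and `(b, c, d)` are connected, and only the first is mutual-only, giving a ratio of 0.5.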


---
### Task 2 of 2

Examine the JSON file "emails_cmt224_departments.json" (departments file). Keys in the departments file represent individuals using the same ids as in the "emails_cmt224.edgelist" file in Part 2, Task 1 and the values represent a department id that the individual can be attributed to. Using the contents of the departments file in combination with the network in Part 2, Task 1, answer the following questions:

##### Q1. Using the connections that individuals have in the network, are they more likely to mix with others in their department or those with a similar number of connections?

In [6]:
#CODE:
# Attach each individual's department as a node attribute, then compare
# assortativity by department with assortativity by degree.
with open('emails_cmt224_departments.json') as f:
    emailsDepartments = json.load(f)
for node in emailsGraph:
    emailsGraph.nodes[node]['department'] = emailsDepartments[node]

degree = nx.degree_assortativity_coefficient(emailsGraph)
attr = nx.attribute_assortativity_coefficient(emailsGraph, 'department')

print('Degree assortativity: %.3f' % degree)
print('Department assortativity: %.2f' % attr)



Degree assortativity: 0.029
Department assortativity: 0.31
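To show what the department coefficient measures, here is a hand-rolled sketch of the standard mixing-matrix formula, r = (Σ e_kk − Σ a_k·b_k) / (1 − Σ a_k·b_k), on hypothetical data. It is assumed, not verified here, to match what `nx.attribute_assortativity_coefficient` computes for a directed graph; the department labels and edges are invented for illustration.

```python
from collections import Counter

# Hypothetical toy data: two departments, four individuals, four directed emails.
department = {"1": "A", "2": "A", "3": "B", "4": "B"}
edges = [("1", "2"), ("2", "1"), ("3", "4"), ("1", "3")]

# Share of edges going from each sender department to each recipient department.
pair_counts = Counter((department[u], department[v]) for u, v in edges)
total = sum(pair_counts.values())
e = {pair: n / total for pair, n in pair_counts.items()}

labels = sorted(set(department.values()))
source_share = {k: sum(e.get((k, j), 0.0) for j in labels) for k in labels}
target_share = {k: sum(e.get((i, k), 0.0) for i in labels) for k in labels}

# Observed within-department share versus the share expected under random mixing.
same = sum(e.get((k, k), 0.0) for k in labels)
expected = sum(source_share[k] * target_share[k] for k in labels)
r = (same - expected) / (1 - expected)

print('Department assortativity (toy data): %.2f' % r)
```

A value near 1 means emails stay almost entirely within departments, while a value near 0 means department membership tells us little about who emails whom; the same reading applies to the degree coefficient.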


##### Q2. Are all departments with 10 or more members more tightly connected amongst themselves in comparison to all individuals across the overall network irrespective of their department? In this context, 'more tightly connected' is defined as having more mutual AND clustered connections. In addition to answering the overall question as yes or no, provide a list of departments this is true for (if any) and not true for (if any).

In [7]:
#CODE:
nodeAttrList = nx.get_node_attributes(emailsGraph, 'department')

# Group individuals by department.
groupedAttrList = {}
for individual, department in nodeAttrList.items():
    groupedAttrList.setdefault(department, []).append(individual)

# Keep only departments with 10 or more members.
top_department = {}
for department, individualList in groupedAttrList.items():
    if len(individualList) >= 10:
        top_department[department] = individualList

# Induced subgraph for each qualifying department.
subGraphlist = []
for department, individualList in top_department.items():
    subGraphlist.append([department, emailsGraph.subgraph(individualList).copy()])

# Network-wide baselines for mutual and clustered connections.
overallRep = nx.overall_reciprocity(emailsGraph)
overallTransitivity = nx.transitivity(emailsGraph)

# A department is 'tightly connected' only if it exceeds both baselines.
tightlyConnected = []
weaklyConnected = []
for department, subgraph in subGraphlist:
    if nx.overall_reciprocity(subgraph) > overallRep and nx.transitivity(subgraph) > overallTransitivity:
        tightlyConnected.append([department, subgraph])
    else:
        weaklyConnected.append([department, subgraph])

print('Top departments count is: ', len(top_department), ' And they are: ', top_department.keys())
print('Tightly connected departments count: ', len(tightlyConnected), ' And they are: ', [ele[0] for ele in tightlyConnected])

print('Not tightly connected departments count: ', len(weaklyConnected), ' And they are: ', [ele[0] for ele in weaklyConnected])


Top departments count is:  28  And they are:  dict_keys(['1', '15', '3', '0', '7', '14', '16', '20', '19', '36', '21', '38', '22', '34', '17', '37', '35', '10', '4', '5', '13', '6', '9', '8', '23', '11', '2', '27'])
Tightly connected departments count:  21  And they are:  ['0', '7', '16', '20', '19', '36', '21', '38', '22', '34', '17', '37', '35', '10', '4', '5', '13', '9', '8', '11', '2']
Not tightly connected departments count:  7  And they are:  ['1', '15', '3', '14', '6', '23', '27']
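The grouping and size filter used above can be seen in isolation in the small sketch below. The mapping and the threshold of 3 are hypothetical (the notebook uses 10); the point is only to show the step of inverting an individual-to-department mapping and keeping the larger groups.

```python
from collections import defaultdict

# Hypothetical individual -> department mapping, in the same shape as the departments file.
department_of = {"1": "A", "2": "A", "3": "A", "4": "B"}
min_size = 3  # illustrative threshold; the notebook uses 10

# Invert the mapping: department -> list of members.
members = defaultdict(list)
for person, dept in department_of.items():
    members[dept].append(person)

# Keep only departments that meet the size threshold.
large = {dept: people for dept, people in members.items() if len(people) >= min_size}

print(large)
```

Each retained member list can then be passed to `emailsGraph.subgraph(...)` to obtain the induced department subgraph, as the notebook code does.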
