# Reading in the network data

The network data is stored in a tab-separated values (TSV) file, with one line per edge. Each line holds the two node identifiers joined by that edge.

The dolphin social network data, and various other network data sets, are available here:

[Koblenz Network Collection](http://konect.uni-koblenz.de/)

You can download the `dolphins.tsv` file from Blackboard. It is the same as the version available from the Koblenz Network Collection, but with the header line removed.

In [2]:
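# Create an empty edge set
edges = set()

with open("dolphins.tsv", "r") as f:
    for line in f:
        # split() with no argument splits on any whitespace (tabs or spaces)
        fields = line.split()
        # Skip blank or malformed lines, which would otherwise break the unpacking
        if len(fields) < 2:
            continue
        a, b = fields[:2]
        edges.add((int(a), int(b)))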
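Unpacking `a, b = l.split("\t")` raises a `ValueError` whenever a line does not split into exactly two fields, for example a blank line at the end of the file or a line separated by spaces rather than a tab. The sketch below illustrates a more tolerant parser on a few made-up lines held in memory. The sample data and the `parse_edges` helper are hypothetical and are not part of `dolphins.tsv`.

```python
import io

def parse_edges(lines):
    """Return a set of (int, int) edges, skipping lines with fewer than two fields."""
    parsed = set()
    for line in lines:
        fields = line.split()  # splits on tabs or spaces and drops the newline
        if len(fields) < 2:
            continue  # blank or malformed line
        parsed.add((int(fields[0]), int(fields[1])))
    return parsed

# Made-up sample: a tab-separated edge, a space-separated edge, and a blank line
sample = io.StringIO("1\t2\n2 3\n\n")
sample_edges = parse_edges(sample)
print(sorted(sample_edges))  # [(1, 2), (2, 3)]
```

Which kind of line triggers the error in a particular copy of the file is not something this sketch can determine; printing `repr(line)` for the offending line is one way to find out.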

# Convert edge list (set) to neighbours

We will represent the network as a dictionary that maps each node (the key) to the set of its neighbours (the value). This way nodes can have arbitrary names, not just integers.

In [None]:
# Create an empty dictionary
network = {}

for (a,b) in edges:
    #Check if key is in dictionary
    if a in network:
        network[a].add(b)
    else:
        network[a]={b}
    if b in network:
        network[b].add(a)
    else:
        network[b]={a}

In [None]:
network[4] # Neighbours of node 4
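Because dictionary keys can be any hashable value, the same construction works for named nodes. Here is a minimal sketch on a made-up network (the names and the `toy_` variables are illustrative only), using `dict.setdefault` as a compact alternative to an explicit `if`/`else`:

```python
# Hypothetical toy edge set with string node names
toy_edges = {("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")}

toy_network = {}
for a, b in toy_edges:
    # setdefault returns the existing neighbour set, or inserts an empty one first
    toy_network.setdefault(a, set()).add(b)
    toy_network.setdefault(b, set()).add(a)

print(sorted(toy_network["cat"]))  # ['ann', 'bob', 'dan']
```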

# Find the degree of each node

In [None]:
degrees = []

for neighbours in network.values():
    degrees.append(len(neighbours))

In [None]:
degrees[:5] # Degrees of the first five nodes (in dictionary order)

In [None]:
max(degrees) # Maximum degree of any node

In [None]:
min(degrees) # Minimum degree of any node

In [None]:
sum(degrees)/len(degrees) # Average node degree
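One quick sanity check on a degree list: every edge contributes to the degree of both of its endpoints, so the degrees must sum to twice the number of edges. This holds provided the edge set has no self-loops and lists each edge in only one direction. A sketch on a made-up network (all names here are illustrative):

```python
# Hypothetical toy network with four edges
toy_edges = {(1, 2), (1, 3), (2, 3), (3, 4)}

toy_network = {}
for a, b in toy_edges:
    toy_network.setdefault(a, set()).add(b)
    toy_network.setdefault(b, set()).add(a)

toy_degrees = [len(neighbours) for neighbours in toy_network.values()]

# Each edge is counted once at each of its two endpoints
assert sum(toy_degrees) == 2 * len(toy_edges)
print(sum(toy_degrees) / len(toy_degrees))  # 2.0
```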

# Local clustering coefficients of nodes

First we define a helper function to count the number of edges between some set of nodes in our network.

In [None]:
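def count_edges(net, ns):
    """Count the edges of net whose endpoints are both in the node set ns."""
    c = 0
    for n in ns:
        # Neighbours of n that are also in ns
        es = net[n] & ns
        c += len(es)
    # Each edge was counted once from each endpoint, so halve the total
    return c // 2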

The local clustering coefficient of a node is calculated as:

$C = \frac{2E_N}{k(k-1)}$

where $E_N$ is the total number of edges between neighbours of the node, and $k$ is the degree of the node.

Since the node has $k$ neighbours, there are $\frac{k(k-1)}{2}$ possible edges between them, so $C$ is the fraction of those possible edges that actually exist.

In [None]:
clust_coeffs = []

for neighbours in network.values():
    k = len(neighbours)
    if k>1:
        e = count_edges(network,neighbours)
        lcc = e / (0.5*k*(k-1))
        clust_coeffs.append(lcc)
    else:
        # Fewer than two neighbours: no edges are possible, so use 0 by convention
        clust_coeffs.append(0)

sum(clust_coeffs)/len(clust_coeffs) # Average clustering coefficient
# Jupyter note -- the value of the last statement in a cell is displayed
# below the cell
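To check the calculation by hand, here is a sketch on a made-up four-node network. The `toy_network` data and the `local_clustering` helper are illustrative and are not part of the dolphin analysis. Node 3 has neighbours 1, 2 and 4, so $k = 3$ and there are 3 possible edges between them; only the edge between 1 and 2 exists, so $C = 1/3$.

```python
def local_clustering(net, node):
    """Local clustering coefficient of node; 0.0 when its degree is below 2."""
    neighbours = net[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    # Each edge between neighbours is seen once from each of its endpoints,
    # so this sum equals 2 * E_N
    twice_edges = sum(len(net[n] & neighbours) for n in neighbours)
    return twice_edges / (k * (k - 1))

# Hypothetical toy network: a triangle 1-2-3 with node 4 attached to node 3
toy_network = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

print(local_clustering(toy_network, 1))  # 1.0
print(local_clustering(toy_network, 4))  # 0.0
```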