# Social Network Analysis - Exercise Sheet 3


### Bipartite Graph Generators

The random generation of graphs is important for the task of network analysis.
There is a multitude of random generators for graphs available, the most prominent being the Erdős–Rényi model.
In social network analysis, the Watts–Strogatz model (small-world networks) and the Barabási–Albert model (scale-free networks with preferential attachment) have also become prominent, because they generate networks that mimic at least some aspects of empirically observed networks more adequately.

However, none of these models can be used as is to generate bipartite graphs, and bipartite graphs were analyzed by projecting them onto both partitions and analyzing the projected graphs using standard graph measures.
This way of analyzing bipartite networks drew criticism for not resulting in 

Therefore, it was necessary to come up with more appropriate generation models for this important class of networks.

In [None]:
# Execute this cell to show the PDF containing the paper or open the PDF separately
from IPython.display import IFrame
IFrame("./texts/Bipartite_graphs_as_models_of_complex_networks.pdf", width='100%', height=800)

### Preferential Attachment

Preferential attachment means that the more connected a node is, the more likely it is to receive new links. Nodes with a higher degree have a stronger ability to attract links added to the network.

Formally:
The probability that a new node is connected to the node $i$ is $$p_i = \frac{k_i}{\sum_j k_j}$$
where $k_i$ is the degree of node $i$ and the sum is made over all pre-existing nodes $j$.


*Hints:*
* Only look at bottom nodes not yet connected to the new top node.
* You can use *stats.rv_discrete* to create the probability distribution for the new edge.
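
The formula and the first hint can be illustrated with a minimal sketch. It is not the generator from the paper: the helper `preferential_probabilities` and the node names and degrees are hypothetical, and the sketch uses `random.choices` from the standard library where the second hint suggests `stats.rv_discrete`.

```python
import random


def preferential_probabilities(degrees, exclude=()):
    """Return {node: p_i} with p_i = k_i / sum_j k_j over the eligible nodes."""
    # Drop nodes that must not be chosen, e.g. bottom nodes that are
    # already connected to the new top node.
    eligible = {n: k for n, k in degrees.items() if n not in exclude}
    total = sum(eligible.values())
    if total == 0:
        raise ValueError("no eligible node with positive degree")
    return {n: k / total for n, k in eligible.items()}


# Hypothetical degrees of four existing bottom nodes.
degrees = {"b0": 3, "b1": 1, "b2": 4, "b3": 2}

# Assume the new top node is already linked to b2, so b2 is excluded.
probs = preferential_probabilities(degrees, exclude={"b2"})

# Draw one bottom node with probability proportional to its degree.
rng = random.Random(0)
nodes = list(probs)
choice = rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]

print(probs)
print(choice)
```

Excluding nodes before normalizing keeps the remaining probabilities summing to 1, which is what a discrete distribution object would also require.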

## Exercise
1. **Implementation of the generator:** Implement the algorithm to generate bipartite random graphs described in section 4.2. "Growing bipartite model with preferential attachment" in the paper "Bipartite graphs as models of complex networks" (2006).
The generator should accept discrete scipy [distributions](https://docs.scipy.org/doc/scipy/reference/stats.html) as input for the degree distributions. Bipartite graphs should be represented in networkx by [coloring](https://networkx.github.io/documentation/networkx-1.10/reference/algorithms.bipartite.html), i.e., using a node attribute `bipartite`, where `bipartite=0` identifies top nodes and `bipartite=1` identifies bottom nodes.
2. **Implementation of projection and analysis:** Implement a function to project bipartite graphs onto the top and bottom sets using the networkx functionality. Implement functions Connected, ASP, ALCC, Density and ADeg. You can either use your own implementation or networkx functionality. *Hint: When using networkx most of the functions become one-liners.* 
3. **Evaluation of generated and projected bipartite graphs:** Generate three networks for the given parameters (parameters can be found in the corresponding section) and analyze them with respect to connectedness, average shortest path, average local clustering coefficient, density and average degree. Project each network onto both partitions and analyze the projected network with respect to the same properties.
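
The coloring convention from item 1 can be sketched on a toy graph. The node names and edges below are made up for illustration and are not output of the generator.

```python
import networkx as nx

G = nx.Graph()

# Top nodes carry bipartite=0, bottom nodes carry bipartite=1.
G.add_nodes_from(["t0", "t1"], bipartite=0)
G.add_nodes_from(["b0", "b1", "b2"], bipartite=1)

# Edges only run between a top node and a bottom node.
G.add_edges_from([("t0", "b0"), ("t0", "b1"), ("t1", "b1"), ("t1", "b2")])

# Recover the two node sets from the attribute.
top = {n for n, d in G.nodes(data=True) if d["bipartite"] == 0}
bottom = set(G) - top

print(sorted(top), sorted(bottom))
print(nx.is_bipartite(G))
```

Storing the side as a node attribute lets later steps recover both node sets without guessing them from the edge structure.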

##### Hints
* Submit your zipped code via [moodle](https://moodle.uni-kassel.de/course/view.php?id=11038) by 15.12.2023, 23:55 CET
* You can use the [NetworkX](https://networkx.github.io/documentation/stable/) library. 
* Check which networkx version you are working with (`print(nx.__version__)`) and use only the documentation corresponding to that version!
* Below the Implementation section is a Test section that can be used to check your code.

### Implementation
Implement your solution in this section.
Use the predefined methods.
You can add more methods if you want.

In [None]:
import scipy.stats as stats
import networkx as nx
from networkx.algorithms import bipartite
import numpy as np
import random as rd


def bipartite_graph_generator(top_distribution=stats.binom(10, 0.25), number_of_top_nodes=25, overlap_ratio=0.5):
    assert(overlap_ratio <=1)
    assert(overlap_ratio >=0)

    G = nx.Graph()
 
    # TODO

    return G

## Tests 
This section contains test cases that can be used to check whether the implemented generator works correctly.

The left histogram shows the approximate distribution of the top vertices, the right histogram of the bottom vertices.
The left histogram has the exact distribution plotted in red for reference.
Small deviations are normal since the test cases only generate fairly small graphs.
The right side should show an approximate power-law distribution.

In [None]:
# Draw the generated Graph
import matplotlib.pyplot as plt
def analyize_distributions_bipartite_graph(G, distribution=None, attribute='bipartite'):
    # Separate by groups
    t = list(n for n,d in G.nodes(data=True) if d[attribute]==0)
    b = list(n for n,d in G.nodes(data=True) if d[attribute]==1)
    
    # define proper bin sizes
    t_max = max([G.degree(n) for n in t])
    t_bins = [-0.5 + i  for i in range(0,t_max+2)]
    b_max = max([G.degree(n) for n in b])
    b_bins = [-0.5 + i  for i in range(0,b_max+2)]

    # create the plots
    fig, ax = plt.subplots(1,2,figsize=(12,3))
    
    # reference histogram in red
    if distribution:
        ax[0].hist(distribution.rvs(size=5000),bins=t_bins, color='red', density=True)
    
    ax[0].hist([G.degree(node) for node in t], bins=t_bins, rwidth=0.9, density=True)
    ax[0].set_xlabel('Node Degree')
    ax[0].set_ylabel('Frequency')
    ax[0].set_title("Histogram of top node degrees")
    ax[1].hist([G.degree(node) for node in b], bins=b_bins, rwidth=0.9, density=True)
    ax[1].set_xlabel('Node Degree')
    ax[1].set_ylabel('Frequency')
    ax[1].set_title("Histogram of bottom node degrees")
    plt.show()
    return
    
def draw_bipartite_graph(G,attribute='bipartite', sort_top=True, sort_bottom=True):
    # Separate by group
    t = list(n for n,d in G.nodes(data=True) if d[attribute]==0)
    b = list(n for n,d in G.nodes(data=True) if d[attribute]==1)
    
    # create the color map
    color_map = []
    for node,d in G.nodes(data=True) :
        if d[attribute]==0:
            color_map.append('orange')
        else: color_map.append('lightgreen')  

    pos = {}
    # Update position for node from each group
    # sort the nodes by degree
    if sort_top:
        t = sorted(t, key=lambda node: G.degree(node),reverse=True)
    if sort_bottom:
        b = sorted(b, key=lambda node: G.degree(node),reverse=True)
    
    pos.update((node, (index, 2)) for index, node in enumerate(t))
    pos.update((node, (index, 1)) for index, node in enumerate(b))

    nx.draw(G, pos=pos, node_color = color_map, with_labels = True)
    plt.show()
    return

### Test 1

In [None]:
test_distr = stats.rv_discrete(name='custm', values=([1, 2, 2, 2, 5], [1/5 for i in range(5)]))
G = bipartite_graph_generator(test_distr,25,0.75)
analyize_distributions_bipartite_graph(G,distribution=test_distr)
draw_bipartite_graph(G,sort_top=False)

### Test 2

In [None]:
binomial = stats.binom(15, 0.5)
G = bipartite_graph_generator(binomial,75,0.85)
analyize_distributions_bipartite_graph(G,distribution=binomial)

### Test 3

In [None]:
geometric = stats.geom(0.25)
G = bipartite_graph_generator(geometric,75,0.85)
analyize_distributions_bipartite_graph(G,distribution=geometric)

### Test 4

In [None]:
uniform_trunc_fibonacci = stats.rv_discrete(name='custm', values=([1, 1, 2, 3, 5, 8, 13], [1/7 for i in range(7)]))
G = bipartite_graph_generator(uniform_trunc_fibonacci,75,0.85)
analyize_distributions_bipartite_graph(G,distribution=uniform_trunc_fibonacci)

## 2 Implementation of projection and analysis

**Implement** the functions for the analysis of graphs.
You can use your own implementation or the [networkx](https://networkx.github.io/documentation/stable/) implementations.
*Hint: When using networkx the functions can mostly be written as one-liners.*
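
As a rough illustration of the one-liner hint, the following sketch computes the measures on a toy path graph. The graph is an arbitrary example, and the average degree is computed here as $2m/n$, which is one common definition and an assumption about what the sheet intends.

```python
import networkx as nx

# Toy example: the path 0 - 1 - 2 - 3 (not one of the graphs from this sheet).
P = nx.path_graph(4)

is_conn = nx.is_connected(P)
# Raises an error on disconnected graphs, so check connectedness first.
asp = nx.average_shortest_path_length(P)
alcc = nx.average_clustering(P)
dens = nx.density(P)
# Each edge contributes to the degree of two nodes.
avg_deg = 2 * P.number_of_edges() / P.number_of_nodes()

print(is_conn, asp, alcc, dens, avg_deg)
```

A path contains no triangles, so its average clustering coefficient is 0.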


**Implement** a projection of the graph onto both the top and bottom node sets. Use the [networkx implementation](https://networkx.github.io/documentation/networkx-1.9/reference/algorithms.bipartite.html) to do this.
*Hint: You can call [G.nodes(data=True)](https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.Graph.nodes.html) to get the nodes together with their attribute dictionaries.*
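
A projection with networkx might look like the following sketch on a hand-made toy graph. The node names are hypothetical, and `bipartite.projected_graph` is only one of the projection functions networkx offers.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hand-made toy graph using the bipartite=0/1 coloring.
G = nx.Graph()
G.add_nodes_from(["t0", "t1"], bipartite=0)
G.add_nodes_from(["b0", "b1", "b2"], bipartite=1)
G.add_edges_from([("t0", "b0"), ("t0", "b1"), ("t1", "b1"), ("t1", "b2")])

# Split the nodes by their attribute value.
top = [n for n, d in G.nodes(data=True) if d["bipartite"] == 0]
bottom = [n for n, d in G.nodes(data=True) if d["bipartite"] == 1]

# Two nodes of the same set are linked in the projection
# if they share at least one neighbor in the other set.
top_proj = bipartite.projected_graph(G, top)
bottom_proj = bipartite.projected_graph(G, bottom)

print(sorted(top_proj.edges()))
print(sorted(bottom_proj.edges()))
```

Here `t0` and `t1` become adjacent because both are linked to `b1`, while `b0` and `b2` stay unconnected because they share no top node.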


In [None]:
def Connected(G):
    # connectedness of G
    connected = # TODO
    return connected

def ASP(G):
    # average shortest path in G
    if Connected(G):
        ASP = # TODO
        return ASP
    else:
        return float('inf')

def ALCC(G):
    # average clustering coefficient of G
    if Connected(G):
        ALCC = # TODO
        return ALCC
    else:
        return -1
    
def NumNodes(G):
    # number of nodes of G
    return G.number_of_nodes()

def NumEdges(G):
    # number of edges of G
    return G.number_of_edges()

def Density(G):
    # density of G
    Density = # TODO
    return Density

def ADeg(G):
    # average degree of G
    ADeg = # TODO
    return ADeg


def project_bipartite_graph(G, attribute='bipartite'):
    # computes the projections onto the top and bottom node sets of a bipartite graph

    # TODO
    
    return top_graph, bottom_graph

In [None]:
# compute all functions for a graph
functions = [Connected, NumNodes, NumEdges, ASP, ALCC, Density, ADeg]
def functions_on_graph(G,functions):
    return list(map(lambda f:(f.__name__,f(G)),functions))


## 3 Evaluation of generated and projected bipartite graphs

**Generate** bipartite graphs given the following parameters:

1. **Graph 1**
    * Distribution: Binomial(15, 0.5) -> stats.binom(15, 0.5)
    * Number of Top Nodes: 150
    * Overlap Ratio: 0.85
    
    
2. **Graph 2**
    * Distribution: Geometric(0.1) -> stats.geom(0.1)
    * Number of Top Nodes: 100
    * Overlap Ratio: 0.95
    
    
3. **Graph 3**
    * Distribution: Discrete Distribution with P(3)=0.1; P(6)=0.5; P(9)=0.3; P(15)=0.1 -> see Test 4 for an example of a custom discrete distribution
    * Number of Top Nodes: 150
    * Overlap Ratio: 0.5
    
---
    
**Analyze** the graphs with respect to:
* connectedness
* average shortest path
* density
* average degree

---

**Project** the graph onto both the top and bottom node sets and **analyze** the projected graphs with respect to:
* connectedness
* average shortest path
* average local clustering coefficient
* density
* average degree

---

**Try to generate connected graphs**. 
Note that a correctly implemented generator does not guarantee connectedness.
Compare the results of the projections and the original graph.
Write a **short** summary of your observations, e.g., what is similar, what is not, what changes for the projections etc.

### Graph 1

In [None]:
binomial = stats.binom(15, 0.5)
G = bipartite_graph_generator(binomial,150,0.85)
# TODO

#### Observations:

    # TODO

### Graph 2

In [None]:
geom = # TODO
G = bipartite_graph_generator(# TODO)
# TODO

#### Observations: 

    # TODO

### Graph 3

In [None]:
some_discrete_dist = # TODO
G = bipartite_graph_generator(# TODO)
# TODO

#### Observations:

    # TODO