# SD212: Graph mining

# Lab 1: Sampling

The objective of this lab is to explore different ways to sample nodes in graphs. You should already have completed the lab on sparse matrices.

## Import

In [None]:
from IPython.display import SVG

In [None]:
import numpy as np
from scipy import sparse

In [None]:
from sknetwork.data import load_netset
from sknetwork.visualization import svg_graph

In [None]:
from sknetwork.linalg import normalize

## Graphs

We will sample nodes from [Openflights](https://openflights.org), the graph of daily flights between airports.

In [None]:
graph = load_netset("openflights")

In [None]:
adjacency_weighted = graph.adjacency
names = graph.names
position = graph.position

In [None]:
# display graph (without edges)
image = svg_graph(adjacency_weighted, position, height=400, width=800, 
                  display_node_weight=True, display_edges=False)

In [None]:
SVG(image)

In [None]:
# remove weights
adjacency = (adjacency_weighted > 0).astype(int)

## To do

* How many nodes are there in this graph?
* How many edges?
* What is the degree of Paris-Orly airport?
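As a reminder of how these quantities can be read off a sparse adjacency matrix, here is a minimal sketch on a toy graph. The graph, `toy_adjacency`, and `toy_names` are hypothetical and only serve as an illustration. The sketch assumes an undirected graph, so that the matrix is symmetric and each edge is stored twice.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy graph: node 1 is linked to nodes 0, 2 and 3.
edges = np.array([[0, 1], [1, 2], [1, 3]])
row = np.concatenate([edges[:, 0], edges[:, 1]])
col = np.concatenate([edges[:, 1], edges[:, 0]])
toy_adjacency = sparse.csr_matrix(
    (np.ones(len(row), dtype=int), (row, col)), shape=(4, 4)
)
toy_names = np.array(["A", "B", "C", "D"])

n_nodes = toy_adjacency.shape[0]
# Assumption: symmetric matrix, so each undirected edge appears twice.
n_edges = toy_adjacency.nnz // 2
# Degrees as a matrix-vector product (no loop).
degrees = toy_adjacency.dot(np.ones(n_nodes, dtype=int))

# Look up a node by its name, then read its degree.
index = np.flatnonzero(toy_names == "B")[0]
print(n_nodes, n_edges, degrees[index])
```

The same pattern should carry over to `adjacency` and `names`, once you have checked how the airport you are looking for is spelled in `names`.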

## To do

Sample 10 nodes at random and give their names, using the following methods:
* uniform node sampling
* uniform edge sampling
* uniform neighbor sampling

**Hint:** Use ``np.random.choice``.
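To illustrate the hint, the sketch below shows the two calling patterns of `np.random.choice` that are likely to be useful here: a uniform draw, and a draw with explicit probabilities passed through the `p` argument. The arrays `items` and `weights` are made up for the example. Draws are made with replacement, which is the default.

```python
import numpy as np

np.random.seed(0)  # only to make the illustration reproducible

items = np.array(["a", "b", "c", "d"])  # hypothetical items

# Uniform sampling, with replacement (default behavior).
uniform_draws = np.random.choice(items, size=10)

# Sampling indices in proportion to some non-negative weights.
weights = np.array([1, 3, 1, 1])
probs = weights / weights.sum()
weighted_draws = np.random.choice(len(items), size=10, p=probs)

print(uniform_draws)
print(items[weighted_draws])
```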

In [None]:
def sample_nodes(adjacency, n_samples=10):
    '''Sample nodes uniformly at random.
    
    Parameters
    ----------
    adjacency : sparse.csr_matrix
        Adjacency matrix.
    n_samples : int
        Number of samples.
        
    Returns
    -------
    nodes : np.ndarray
        Sampled nodes.
    '''
    # to be modified
    # no loop allowed
    return None

In [None]:
def sample_from_edges(adjacency, n_samples=10):
    '''Sample nodes from edges, selected uniformly at random.
    
    Parameters
    ----------
    adjacency : sparse.csr_matrix
        Adjacency matrix.
    n_samples : int
        Number of samples.
        
    Returns
    -------
    nodes : np.ndarray
        Sampled nodes.
    '''
    # to be modified
    # no loop allowed
    return None

In [None]:
def sample_from_neighbors(adjacency, n_samples=10):
    '''Sample nodes from neighbors, selected uniformly at random.
    
    Parameters
    ----------
    adjacency : sparse.csr_matrix
        Adjacency matrix.
    n_samples : int
        Number of samples.
        
    Returns
    -------
    nodes : np.ndarray
        Sampled nodes.
    '''
    # to be modified
    # no loop allowed
    transition_matrix = normalize(adjacency)
    return None

## To do

Compute the average degree of a node sampled with each of the above sampling methods.

**Note:** You must give the exact value, computed from the graph itself (i.e., do not estimate it by sampling nodes).
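One possible way to get exact values is to write down the probability of each node under a given sampling method, then take the expectation of the degree. The sketch below does this on a hypothetical toy graph, under the following assumptions: the graph is undirected with no isolated node, edge sampling returns one end of a uniformly chosen edge, and neighbor sampling returns a uniformly chosen neighbor of a uniformly chosen node. The transition matrix is built with `scipy` only, as a stand-in for what `normalize(adjacency)` is expected to return.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy graph: node 1 is linked to nodes 0, 2 and 3.
row = np.array([0, 1, 1, 2, 1, 3])
col = np.array([1, 0, 2, 1, 3, 1])
toy_adjacency = sparse.csr_matrix((np.ones(6), (row, col)), shape=(4, 4))

n = toy_adjacency.shape[0]
degrees = toy_adjacency.dot(np.ones(n))

# Uniform node sampling: each node has probability 1 / n.
p_node = np.ones(n) / n

# Uniform edge sampling: a node is picked in proportion to its degree.
p_edge = degrees / degrees.sum()

# Uniform neighbor sampling: one step of the random walk
# started from the uniform distribution.
transition = sparse.diags(1 / degrees).dot(toy_adjacency)
p_neighbor = transition.T.dot(p_node)

# Expected degree of the sampled node under each distribution.
avg_node = p_node.dot(degrees)
avg_edge = p_edge.dot(degrees)
avg_neighbor = p_neighbor.dot(degrees)
print(avg_node, avg_edge, avg_neighbor)  # 1.5, 2.0 and 2.5 up to rounding
```

On this toy graph, both edge sampling and neighbor sampling are biased towards the high-degree node, which is why their average degrees exceed the plain average.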

## Weighted graphs

We now take the weights into account (here the daily number of flights between airports).

## To do

* How many daily flights are there?
* What are the top 3 airports by number of flights?
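As an illustration, the sketch below computes node weights, the total weight, and a top-3 ranking on a hypothetical weighted toy graph (`toy_weighted` and `toy_names` are made up). It assumes a symmetric matrix in which each undirected edge is stored twice, hence the division by 2. Whether this convention matches the way flights are counted in Openflights is something to check on the data.

```python
import numpy as np
from scipy import sparse

# Hypothetical weighted toy graph, stored symmetrically.
row = np.array([0, 1, 1, 2, 1, 3])
col = np.array([1, 0, 2, 1, 3, 1])
data = np.array([5, 5, 2, 2, 1, 1])
toy_weighted = sparse.csr_matrix((data, (row, col)), shape=(4, 4))
toy_names = np.array(["A", "B", "C", "D"])

# Weight of each node: sum of the weights of its edges.
node_weights = toy_weighted.dot(np.ones(4, dtype=int))  # [5, 8, 2, 1]

# Assumption: each edge weight appears twice in the matrix.
total_weight = toy_weighted.sum() // 2  # 8

# Indices of the 3 nodes of largest weight, in decreasing order.
top = np.argsort(-node_weights)[:3]
print(total_weight, toy_names[top])
```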

## To do

Sample 10 nodes at random and give their names, using the following methods:
* weighted edge sampling
* weighted neighbor sampling

In [None]:
def sample_from_edges_weighted(adjacency, n_samples=10):
    '''Sample nodes from edges, selected in proportion to weights.
    
    Parameters
    ----------
    adjacency : sparse.csr_matrix
        Weighted adjacency matrix.
    n_samples : int
        Number of samples.
        
    Returns
    -------
    nodes : np.ndarray
        Sampled nodes.
    '''
    # to be modified
    # no loop allowed
    return None

In [None]:
def sample_from_neighbors_weighted(adjacency, n_samples=10):
    '''Sample nodes from neighbors, selected in proportion to weights.
    
    Parameters
    ----------
    adjacency : sparse.csr_matrix
        Weighted adjacency matrix.
    n_samples : int
        Number of samples.
        
    Returns
    -------
    nodes : np.ndarray
        Sampled nodes.
    '''
    # to be modified
    # no loop allowed
    return None

## To do

Compute the average weight (i.e., average number of daily flights) of a node sampled with each of the above sampling methods.

**Note:** Again, you must give the exact values.
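A possible approach, sketched here on a hypothetical weighted toy graph, is to compute the sampling distribution of each method and take the expectation of the node weight. The assumptions are: the graph is undirected with no isolated node, weighted edge sampling picks a node in proportion to its weight, and weighted neighbor sampling picks a node uniformly, then one of its neighbors in proportion to the edge weights.

```python
import numpy as np
from scipy import sparse

# Hypothetical weighted toy graph, stored symmetrically.
row = np.array([0, 1, 1, 2, 1, 3])
col = np.array([1, 0, 2, 1, 3, 1])
data = np.array([5.0, 5.0, 2.0, 2.0, 1.0, 1.0])
toy_weighted = sparse.csr_matrix((data, (row, col)), shape=(4, 4))

n = toy_weighted.shape[0]
node_weights = toy_weighted.dot(np.ones(n))  # [5, 8, 2, 1]

# Weighted edge sampling: probability proportional to node weight.
p_edge_weighted = node_weights / node_weights.sum()

# Weighted neighbor sampling: uniform start, then one weighted step.
transition = sparse.diags(1 / node_weights).dot(toy_weighted)
p_neighbor_weighted = transition.T.dot(np.ones(n) / n)

# Expected weight of the sampled node under each distribution.
avg_edge_weighted = p_edge_weighted.dot(node_weights)
avg_neighbor_weighted = p_neighbor_weighted.dot(node_weights)
print(avg_edge_weighted, avg_neighbor_weighted)  # 5.875 and 6.9375 up to rounding
```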