###### Introduction to Network Analysis 2023/24 (xii)

## Random-walk sampling, network comparison

In [3]:
import networkx as nx

### I. Estimation by random-walk sampling

You are given five networks in Pajek format.

+ Java class dependency network ([java.net](http://lovro.fri.uni-lj.si/ina/nets/java.net))
+ *nec* overlay map of the Internet ([nec.net](http://lovro.fri.uni-lj.si/ina/nets/nec.net))
+ Sample of Facebook social network ([facebook.net](http://lovro.fri.uni-lj.si/ina/nets/facebook.net))
+ Enron e-mail communication network ([enron.net](http://lovro.fri.uni-lj.si/ina/nets/enron.net))
+ A small part of Google web graph ([www_google.net](http://lovro.fri.uni-lj.si/ina/nets/www_google.net))



In [2]:
# !wget http://lovro.fri.uni-lj.si/ina/nets/java.net
# !wget http://lovro.fri.uni-lj.si/ina/nets/nec.net
# !wget http://lovro.fri.uni-lj.si/ina/nets/facebook.net
# !wget http://lovro.fri.uni-lj.si/ina/nets/enron.net
# !wget http://lovro.fri.uni-lj.si/ina/nets/www_google.net

1. **(code)** Represent the networks with simple undirected graphs and reduce them to their largest connected component.



In [15]:
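# your code here
import random
import tqdm


def get_clean_graph(name):
    """Read a Pajek file as a simple undirected graph and keep its largest connected component."""
    # nx.Graph (rather than nx.MultiGraph) collapses parallel edges and drops edge directions
    G = nx.Graph(nx.read_pajek(name))
    # a simple graph has no self-loops either
    G.remove_edges_from(list(nx.selfloop_edges(G)))
    s = max(nx.connected_components(G), key=len)
    # copy the view so the result is an independent graph
    return G.subgraph(s).copy()


def biased_k(G: nx.Graph, percent=15):
    """Biased average-degree estimate from a random walk over `percent` % of the nodes."""
    # number of sampled nodes (with repetitions)
    s = int(len(G) * (percent / 100))
    cs = 0
    # start the walk at a uniformly random node
    n = random.choice(list(G.nodes()))
    for _ in tqdm.tqdm(range(s)):
        cs += G.degree(n)
        # step to a uniformly random neighbor of the current node
        n = random.choice(list(G.neighbors(n)))

    return cs / s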


In [17]:
G = get_clean_graph("www_google.net")

In [18]:
biased_k(G)

100%|██████████| 128370/128370 [01:52<00:00, 1142.95it/s]


178.06267040585806

2. **(code)** Implement random-walk sampling and apply it to each network until you have sampled 15% of the nodes (with repetitions). Let $s$ be the number of sampled nodes and $k_1,\dots,k_s$ their degree sequence. Estimate the average degree of the network $\langle k\rangle$ using the biased average $$\frac{\sum_ik_i}{s}$$ and also the corrected estimate $$\frac{s}{\sum_ik_i^{-1}}.$$
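As a starting point, here is a minimal sketch that computes both estimates from a single walk. It is one possible approach, not a reference solution. The function name `random_walk_degree_estimates`, its parameters, and the plain adjacency-dictionary input are assumptions made to keep the example free of dependencies; for a networkx graph `G`, such a dictionary could be built with `{n: list(G[n]) for n in G}`.

```python
import random


def random_walk_degree_estimates(adj, fraction=0.15, seed=None):
    """Return (biased, corrected) average-degree estimates from one random walk.

    adj maps each node to a list of its neighbors and is assumed to describe
    a simple, connected, undirected graph (hypothetical input format).
    """
    rng = random.Random(seed)
    s = max(1, int(len(adj) * fraction))  # number of samples, with repetitions
    node = rng.choice(list(adj))          # uniformly random start node
    degree_sum = 0                        # running sum of k_i
    inverse_sum = 0.0                     # running sum of 1 / k_i
    for _ in range(s):
        k = len(adj[node])
        degree_sum += k
        inverse_sum += 1 / k
        node = rng.choice(adj[node])      # step to a uniformly random neighbor
    return degree_sum / s, s / inverse_sum


# Toy check on a 100-node ring, where every node has degree 2.
ring = {i: [(i - 1) % 100, (i + 1) % 100] for i in range(100)}
biased, corrected = random_walk_degree_estimates(ring, seed=0)
print(biased, corrected)
```

On a regular graph such as this ring the two estimates coincide; they differ only when degrees vary, which is the case for the five networks above.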



In [None]:
# your code here

3. **(discuss)** Compare both estimates to the true average degree $\langle k\rangle$.
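
One way to frame the comparison is the following sketch of the standard argument, assuming a simple, connected, undirected graph with $n$ nodes and $m$ edges, so that the true average degree is $\langle k\rangle = 2m/n$. A random walk on such a graph visits node $i$ with long-run frequency $\pi_i = k_i/2m$, so for a long walk the biased average approaches

$$\sum_i \pi_i k_i = \frac{\sum_i k_i^2}{2m} = \frac{\langle k^2\rangle}{\langle k\rangle} \geq \langle k\rangle,$$

while the average of the inverse degrees approaches

$$\sum_i \frac{\pi_i}{k_i} = \frac{n}{2m} = \frac{1}{\langle k\rangle}.$$

Under these assumptions the corrected estimate $s/\sum_ik_i^{-1}$ should land close to $\langle k\rangle$, whereas the biased average overestimates it, and the gap grows with the heterogeneity of the degree distribution. With a finite walk of only 15% of the nodes, both values still fluctuate from run to run.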