Skip to content
Course exercise: sampling online graphs
Python Shell
Find file
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.
.gitignore
README
download_data.sh
generate.graph
graphsize.py
graphsize_test.py
mhrw.py
mhrw_sample.py
plotter.py
rwrw.py
rwrw_sample.py
uis_wr.py
uis_wr_test.py
wis_wr.py

README

Lab exercise in Advanced Topics in Distributed Systems: Sampling Massive Online Graphs

---

Data source
- http://snap.stanford.edu/data/p2p-Gnutella31.html 
- or ./download_data.sh 

---

Exercise

Implement graph size estimation using the following sampling techniques:
(a) Uniform Independent Sample Without Replacement, 
(b) Uniform Independent Sample With Replacement, 
(c) Weighted Independent Sample With Replacement (with weights equal to node degrees), 
(d) Random Walk (Metropolis-Hastings and Reweighted)
    1) using all nodes, 
    2) every k-th node,
    3) the weights are node degrees 

What do you observe?

---

Requirements
1) Python
2) networkx (easy_install networkx)
3) gnuplot

---

Where to start?

Each sampling technique is contained in its own file (except UIS_WOR which doesn't make sense since there wont be any collisions). For example, to start a full MHRW sample run: 
     python mhrw_sample.py

UIS_WR and WIS_WR samples are self-contained. Run as:
     python uis_wr.py or python wis_wr.py

RWRW is broken. 

---  

Generate graphs

Run 'gnuplot generate.graph' to create some (good looking) graphs.
Something went wrong with that request. Please try again.