# Networks: structure, evolution & processes
**Internet Analytics - Lab 2**

---

**Group:** *Your group letter.*

**Names:**

* *Name 1*
* *Name 2*
* *Name 3*

---

#### Instructions

*This is a template for part 2 of the lab. Clearly write your answers, comments and interpretations in Markdown cells. Don't forget that you can add $\LaTeX$ equations in these cells. Feel free to add or remove any cell.*

*Please properly comment your code. Code readability will be considered for grading. To avoid long cells of code in the notebook, you can also embed long Python functions and classes in a separate module. Don’t forget to hand in your module if that is the case. In multiple exercises, you are required to come up with your own method to solve various problems. Be creative and clearly motivate and explain your methods. Creativity and clarity will be considered for grading.*

---

## 2.2 Network sampling

#### Exercise 2.7: Random walk on the Facebook network

In [1]:
# ... WRITE YOUR CODE HERE...
import requests
import random

In [2]:
def random_walk(s, N):
    """Estimate the average user age with a simple random walk of N steps from user s."""
    URL_TEMPLATE = 'http://iccluster028.iccluster.epfl.ch:5050/v1.0/facebook?user={user_id}'
    ages = 0          # running sum of the ages of the visited users
    dict_users = {}   # cache of user records already fetched from the API
    
    # Fetch the starting user
    url = URL_TEMPLATE.format(user_id=s)
    response = requests.get(url)
    dict_users[s] = response.json()
    
    # Walk randomly through the graph
    for _ in range(N):
        data = dict_users[s]
        ages += data['age']
        
        # Move to a friend chosen uniformly at random
        s = random.choice(data['friends'])
        
        # Fetch the new user only if it is not already cached
        if s not in dict_users:
            url = URL_TEMPLATE.format(user_id=s)
            response = requests.get(url)
            dict_users[s] = response.json()
            
    # max(N, 1) guards against a division by zero when N == 0
    return ages / max(N, 1)

N = 1000
avg_age = random_walk('a5771bce93e200c36f7cd9dfd0e5deaa', N)

In [3]:
print("The average age is {:.1f} years old.".format(avg_age))

The average age is 25.6 years old.


The average age obtained is around 26 years over a walk of 1000 steps (the same user may be visited more than once).

#### Exercise 2.8

Our estimate of about 26 years is far from the true average age of the users. A random walk visits each node with a probability proportional to its degree, so users with many friends are sampled more often. As seen in the course, younger users tend to have more friends, which biases the estimate toward young ages. Moreover, our starting user is 13 years old, so the walk is likely to begin among young people. We now compensate for this bias using the formula seen in class.
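
For reference, the correction can be written out as follows. This is a sketch that assumes the friendship graph is undirected and connected; the symbols $a_i$ (age of user $i$), $d_i$ (number of friends of user $i$), $x_t$ (user visited at step $t$), $n$ (number of users) and $\hat{\mu}$ (the estimate) are notation introduced here, not taken from the lab handout.

```latex
% Stationary distribution of a simple random walk on an undirected graph
\pi_i = \frac{d_i}{\sum_j d_j}

% Reweighted estimator: each visit is weighted by the inverse degree
\hat{\mu} = \frac{\sum_{t=1}^{N} a_{x_t} / d_{x_t}}{\sum_{t=1}^{N} 1 / d_{x_t}}

% Why it works: under \pi the degree cancels in both numerator and denominator
\frac{\sum_i \pi_i \, a_i / d_i}{\sum_i \pi_i / d_i}
  = \frac{\sum_i a_i}{n}
```

Weighting each visit by $1/d_i$ cancels the factor $d_i$ with which the walk over-samples well-connected users, so the ratio converges to the plain average age.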

In [4]:
def better_random_walk(s, N):
    """Estimate the average user age with a random walk of N steps from user s,
    weighting each visit by 1/degree to undo the bias toward high-degree users."""
    URL_TEMPLATE = 'http://iccluster028.iccluster.epfl.ch:5050/v1.0/facebook?user={user_id}'
    weighted_ages = 0     # running sum of age / degree
    inverse_degrees = 0   # running sum of 1 / degree
    dict_users = {}       # cache of user records already fetched from the API
    
    # Fetch the starting user
    url = URL_TEMPLATE.format(user_id=s)
    response = requests.get(url)
    dict_users[s] = response.json()
    
    # Walk randomly through the graph
    for _ in range(N):
        data = dict_users[s]
        
        age = data['age']
        friends = data['friends']
        degree = len(friends)
        
        # Compensate for the bias: weight this visit by the inverse degree
        weighted_ages += age / degree
        inverse_degrees += 1 / degree
        
        # Move to a friend chosen uniformly at random
        s = random.choice(friends)
        
        # Fetch the new user only if it is not already cached
        if s not in dict_users:
            url = URL_TEMPLATE.format(user_id=s)
            response = requests.get(url)
            dict_users[s] = response.json()
    
    return weighted_ages / inverse_degrees

N = 10000

avg_age = better_random_walk('a5771bce93e200c36f7cd9dfd0e5deaa', N)

In [5]:
print("The average age is {:.1f} years old.".format(avg_age))

The average age is 44.7 years old.


Our new estimate of around 45 years is much closer to the true average because we have compensated for the degree bias. Note that this run used N = 10000 steps, ten times more than in Exercise 2.7.
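
The effect of the correction can also be checked offline on a small synthetic graph where the true average is known. The sketch below is only an illustration: the toy graph, the ages, and the helper names `build_toy_graph` and `walk_estimates` are all hypothetical and unrelated to the Facebook API used above. One young hub is connected to every other node, so a plain random walk visits it far too often.

```python
import random


def build_toy_graph(n_others=20):
    """Hypothetical toy graph: a hub (node 0, age 20) linked to every other
    node, while the other nodes (age 60) also form a ring among themselves."""
    nodes = {0: {"age": 20, "friends": list(range(1, n_others + 1))}}
    for i in range(1, n_others + 1):
        left = i - 1 if i > 1 else n_others
        right = i + 1 if i < n_others else 1
        nodes[i] = {"age": 60, "friends": [0, left, right]}
    return nodes


def walk_estimates(nodes, start, n_steps, rng):
    """Run one random walk and return (naive average, degree-corrected average)."""
    age_sum = 0.0
    weighted_age_sum = 0.0
    inverse_degree_sum = 0.0
    current = start
    for _ in range(n_steps):
        user = nodes[current]
        degree = len(user["friends"])
        age_sum += user["age"]
        weighted_age_sum += user["age"] / degree
        inverse_degree_sum += 1 / degree
        # Move to a neighbour chosen uniformly at random
        current = rng.choice(user["friends"])
    return age_sum / n_steps, weighted_age_sum / inverse_degree_sum


toy = build_toy_graph()
true_mean = sum(user["age"] for user in toy.values()) / len(toy)
naive, corrected = walk_estimates(toy, start=0, n_steps=20000, rng=random.Random(0))
print(f"true: {true_mean:.1f}, naive: {naive:.1f}, corrected: {corrected:.1f}")
```

In this toy graph the hub holds 20 of the 80 degree units, so the naive average should settle near 0.25 × 20 + 0.75 × 60 = 50, while the true mean is 1220 / 21 ≈ 58.1. The corrected estimate should land close to the true mean, mirroring the jump observed between Exercises 2.7 and 2.8.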