# Networks: structure, evolution & processes
**Internet Analytics - Lab 2**

---

**Group:** *F*

**Names:**

* *Dessimoz Franck*
* *Lefebvre Hippolyte*
* *Micheli Vincent*

---

#### Instructions

*This is a template for part 2 of the lab. Clearly write your answers, comments and interpretations in Markdown cells. Don't forget that you can add $\LaTeX$ equations in these cells. Feel free to add or remove any cell.*

*Please properly comment your code. Code readability will be considered for grading. To avoid long cells of code in the notebook, you can also embed long Python functions and classes in a separate module. Don’t forget to hand in your module if that is the case. In multiple exercises, you are required to come up with your own method to solve various problems. Be creative and clearly motivate and explain your methods. Creativity and clarity will be considered for grading.*

---

## 2.2 Network sampling

#### Exercise 2.7: Random walk on the Facebook network

In [2]:
import requests
import numpy as np

# Base url of the API
URL_TEMPLATE = 'http://iccluster051.iccluster.epfl.ch:5050/v1.0/facebook?user={user_id}'
# Target user id
user_id = 'f30ff3966f16ed62f5165a229a19b319'
# The actual url to call 
url = URL_TEMPLATE.format(user_id=user_id)
# Execute the HTTP Get request
response = requests.get(url)
# Format the json response as a Python dict
data = response.json()

def walk_age(start, n):
    """Estimate the average age as the plain mean over n random-walk steps."""
    u = start
    age_sum = 0

    for _ in range(n):
        age_sum += u["age"]
        # Move to a uniformly random friend of the current user
        next_id = np.random.choice(u["friends"])
        u = requests.get(URL_TEMPLATE.format(user_id=next_id)).json()

    return age_sum / n



# Number of random-walk steps (users sampled)
visited = 1000
walk1 = walk_age(data, visited)
print("Estimated average age of a facebook user is {age} over a sample of {visited} users".format(age=walk1, visited=visited))

Estimated average age of a facebook user is 22.254 over a sample of 1000 users


Our estimate is far from the true average (around 20 years off). A simple random walk does not sample users uniformly: on an undirected friendship graph it visits each user, in the long run, with probability proportional to their number of friends, so well-connected users are over-represented. We suppose that young people are more connected than older people, so every time a random friend is chosen it is more likely that this friend is young, which pulls the estimate down. Moreover, at the beginning of the walk there might be some locality bias, since the first user (aged 19) is more likely to be connected to other young users; this effect fades as the walk moves away from its starting point.
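The degree bias described above can be illustrated offline on a tiny synthetic graph. This is only a sketch under stated assumptions: `TOY_FRIENDS`, `TOY_AGES` and `toy_walk` are hypothetical names and values made up for illustration, not data or code from the lab API.

```python
import random

# Hypothetical toy friendship graph (undirected); not data from the lab API.
TOY_FRIENDS = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
    "e": ["a"],
}
# Hypothetical ages: the best-connected user is also the youngest.
TOY_AGES = {"a": 20, "b": 30, "c": 30, "d": 60, "e": 60}


def toy_walk(friends, start, n_steps, seed=0):
    """Return the list of users visited by a simple random walk."""
    rng = random.Random(seed)
    visited_users = []
    user = start
    for _ in range(n_steps):
        visited_users.append(user)
        # Move to a uniformly random friend of the current user
        user = rng.choice(friends[user])
    return visited_users


visits = toy_walk(TOY_FRIENDS, "d", 100_000)
true_mean = sum(TOY_AGES.values()) / len(TOY_AGES)
walk_mean = sum(TOY_AGES[u] for u in visits) / len(visits)
share_of_a = visits.count("a") / len(visits)

print("true mean age:", true_mean)
print("random-walk mean age:", round(walk_mean, 2))
print("share of visits to 'a':", round(share_of_a, 3))
```

In this toy graph user `a` holds 4 of the 10 edge endpoints, so the walk should spend roughly 40% of its time there, and the plain walk average should settle near 32 even though the true mean age is 40.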

#### Exercise 2.8

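To obtain a more accurate estimate of the average age, we change our algorithm as follows.

We use the formula seen in class, which down-weights high-degree nodes by dividing each visited user's contribution by their degree, i.e.

$ \hat{a} = \frac{\sum_{i=1}^{n} \frac{age(x_i)}{deg(x_i)}}{\sum_{i=1}^{n} \frac{1}{deg(x_i)}} $

where $x_1, \dots, x_n$ are the users visited by the walk and $deg(x_i)$ is the number of friends of $x_i$.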
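As a small worked instance of the formula above, the sketch below applies the same reweighting to a hand-written list of (age, degree) pairs, without any network calls. The helper name `reweighted_mean` and the sample values are hypothetical and chosen only for illustration.

```python
def reweighted_mean(samples):
    """Degree-corrected mean of (value, degree) pairs collected along a walk."""
    numerator = sum(value / degree for value, degree in samples)
    denominator = sum(1 / degree for _, degree in samples)
    return numerator / denominator


# Hypothetical walk samples: (age, number of friends)
samples = [(20, 4), (30, 2), (60, 1)]
plain_mean = sum(age for age, _ in samples) / len(samples)
corrected_mean = reweighted_mean(samples)

print("plain mean:", plain_mean)
print("degree-corrected mean:", corrected_mean)
```

Here the numerator is 20/4 + 30/2 + 60/1 = 80 and the denominator is 1/4 + 1/2 + 1 = 1.75, so the corrected estimate is about 45.7, compared with a plain mean of about 36.7. The low-degree 60-year-old counts for more because the walk is less likely to reach them.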

In [3]:
def unbiased_walk_age(start, n):
    """Estimate the average age with each visited user weighted by 1/degree."""
    u = start
    weighted_age_sum = 0
    inv_degree_sum = 0

    for _ in range(n):
        degree = len(u["friends"])
        inv_degree_sum += 1 / degree
        weighted_age_sum += u["age"] / degree
        # Move to a uniformly random friend of the current user
        next_id = np.random.choice(u["friends"])
        u = requests.get(URL_TEMPLATE.format(user_id=next_id)).json()

    return weighted_age_sum / inv_degree_sum

walk2 = unbiased_walk_age(data, visited)
print("Estimated average age of a facebook user is {age} over a sample of {visited} users".format(age=walk2, visited=visited))

Estimated average age of a facebook user is 41.34135280673381 over a sample of 1000 users
