# TP1 (Student version)

We can use the following libraries.

In [None]:
import matplotlib.pyplot as plt
import math
import sys
import json
print(sys.version)

## Exercise 1: getting started

### Question 1

Manually create a few small graphs (about a dozen nodes each) and store each one in a text file with one edge per line, in the format:

x y

where `x` and `y` are the integer IDs of the edge's two endpoints. You will use these graphs to test your code.
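For instance, a small test graph can be written with a few lines of Python. This is only a sketch: the file name `toy_graph.txt` and the helper `write_edge_list` are hypothetical, and the edges are made up (12 nodes, plus one duplicated edge and one self-loop so that the later questions have something to handle).

```python
def write_edge_list(edges, file_name):
    """Write one edge per line in the 'x y' format (hypothetical helper)."""
    with open(file_name, "w") as graph_file:
        for node1, node2 in edges:
            graph_file.write("{} {}\n".format(node1, node2))


# A made-up test graph on nodes 0..11: a ring plus two chords,
# one duplicated edge (0 1) and one self-loop (5 5).
toy_edges = [(i, (i + 1) % 12) for i in range(12)]
toy_edges += [(0, 6), (3, 9), (0, 1), (5, 5)]
write_edge_list(toy_edges, "toy_graph.txt")
```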

### Question 2

Download the following graphs:

http://snap.stanford.edu/data/email-Eu-core.html

http://snap.stanford.edu/data/com-Amazon.html

http://snap.stanford.edu/data/com-LiveJournal.html

Also, download the graph email_data_ebel.txt from http://lioneltabourier.fr/teaching_en.html

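You can use all of these graphs to check the results of your programs. The SNAP graphs are distributed as gzip-compressed files (`.txt.gz`), and some of them begin with comment lines starting with `#`, so decompress them and make sure your programs skip such lines.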


### Question 3

Write a program that reads a graph from a text file and counts its nodes and edges without storing the graph in memory. If the same link appears several times, count it each time.

In [None]:
def count_node_link(file_name):
    """Count distinct nodes and edge lines (duplicates included) in an edge-list file."""
    nodes = set()  # only the node IDs are kept, never the edges
    count_edges = 0
    with open(file_name, "r") as graph_file:
        for line in graph_file:
            # Skip blank lines and '#' comment headers (present in some SNAP files)
            if not line.strip() or line.startswith("#"):
                continue
            node1, node2 = (int(token) for token in line.split())
            nodes.add(node1)
            nodes.add(node2)
            count_edges += 1
    return len(nodes), count_edges

In [None]:
print(count_node_link("res/email-Eu-core.txt"))
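`count_node_link` still keeps a set of node IDs in memory. If you know that the node IDs are exactly the integers 0 to n-1 (an assumption you must check for each dataset), a stricter variant only needs the largest ID seen. The name `count_node_link_contiguous` and the demo file are hypothetical; if the IDs have gaps, this variant over-counts the nodes.

```python
def count_node_link_contiguous(file_name):
    """Count nodes and edges, assuming node IDs are exactly 0..n-1 (hypothetical variant).

    Only two integers are kept in memory: the largest ID seen and the edge count.
    """
    max_id, count_edges = -1, 0
    with open(file_name, "r") as graph_file:
        for line in graph_file:
            # Skip blank lines and '#' comment headers
            if not line.strip() or line.startswith("#"):
                continue
            node1, node2 = (int(token) for token in line.split())
            max_id = max(max_id, node1, node2)
            count_edges += 1  # a duplicated line is counted each time
    return max_id + 1, count_edges


with open("contiguous_demo.txt", "w") as demo:
    demo.write("# a comment line\n0 1\n1 2\n0 1\n")
print(count_node_link_contiguous("contiguous_demo.txt"))  # (3, 3)
```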

### Question 4

Write a program that computes the degree (i.e. the number of incident edges) of a given node, again without storing the graph in memory. If the same link appears several times, each occurrence increases the degree. A self-loop increases the degree only once.

In [None]:
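def compute_degree(file_name, node):
    """Return the degree of `node`: duplicates count each time, a self-loop counts once."""
    degree = 0
    with open(file_name, "r") as graph_file:
        for line in graph_file:
            # Skip blank lines and '#' comment headers
            if not line.strip() or line.startswith("#"):
                continue
            node1, node2 = (int(token) for token in line.split())
            if node1 == node:
                degree += 1
            # The second test avoids counting a self-loop (node, node) twice
            if node2 == node and node2 != node1:
                degree += 1
    return degree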

In [None]:
print(compute_degree("res/email-Eu-core.txt", 1))
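Calling `compute_degree` once per node rereads the whole file each time. When the degrees of all nodes are needed, a single pass with a `collections.Counter` is enough: it stores one counter per node but still no edges. This is a sketch under the same conventions as above (duplicates counted each time, a self-loop counted once); `compute_all_degrees` and the demo file are hypothetical.

```python
from collections import Counter


def compute_all_degrees(file_name):
    """Return a Counter mapping each node to its degree, reading the file once."""
    degrees = Counter()
    with open(file_name, "r") as graph_file:
        for line in graph_file:
            # Skip blank lines and '#' comment headers
            if not line.strip() or line.startswith("#"):
                continue
            node1, node2 = (int(token) for token in line.split())
            degrees[node1] += 1
            if node2 != node1:  # a self-loop increases the degree only once
                degrees[node2] += 1
    return degrees


with open("degree_demo.txt", "w") as demo:
    demo.write("0 1\n0 1\n1 1\n1 2\n")
# Node 1: 2 (duplicated edge 0 1) + 1 (self-loop) + 1 (edge 1 2) = 4
print(compute_all_degrees("degree_demo.txt")[1])  # 4
```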

## Exercise 2: loading a graph in memory

### Question 5

Write a program that reads a graph from a text file and loads it into a Python dictionary of lists, mapping each node to the list of its neighbors.
This implementation of the adjacency list format will be the standard format that we will use to store a graph in this course.
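To make the format concrete, here is the dictionary of lists built from a tiny in-memory edge list rather than from a file (the helper name `graph_from_edges` is hypothetical). Each undirected edge is stored in the lists of both endpoints, and a duplicated edge appears twice, which is exactly what Question 6 cleans up.

```python
def graph_from_edges(edges):
    """Build an adjacency dict of lists from (x, y) pairs, storing both directions."""
    graph = {}
    for node1, node2 in edges:
        graph.setdefault(node1, []).append(node2)
        graph.setdefault(node2, []).append(node1)
    return graph


print(graph_from_edges([(0, 1), (1, 2), (0, 1)]))
# {0: [1, 1], 1: [0, 2, 0], 2: [1]}
```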

In [None]:
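def graph_from_file(file_name):
    """Load an undirected edge list as a dict mapping each node to its list of neighbors."""
    graph = {}
    with open(file_name, "r") as graph_file:
        for line in graph_file:
            # Skip blank lines and '#' comment headers
            if not line.strip() or line.startswith("#"):
                continue
            node1, node2 = (int(token) for token in line.split())
            # Each undirected edge is stored in both adjacency lists
            graph.setdefault(node1, []).append(node2)
            graph.setdefault(node2, []).append(node1)
    return graph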

In [None]:
print(json.dumps(graph_from_file("res/email-Eu-core.txt"), indent = 4))

### Question 6

Write a program that removes self-loops and duplicate edges from the graph and writes the result to a new text file.

In [None]:
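def remove_loop_dupes(graph):
    """Remove duplicate neighbors and self-loops from an adjacency-list graph, in place."""
    for node in graph:
        # dict.fromkeys drops repeated neighbors while keeping first-seen order
        graph[node] = list(dict.fromkeys(graph[node]))
        try:
            graph[node].remove(node)  # at most one self-loop is left after deduplication
        except ValueError:
            pass


def graph_to_file(graph, file_name):
    """Write each undirected edge once, as 'x y' with x < y."""
    with open(file_name, "w") as graph_file:
        for node1 in graph:
            for node2 in graph[node1]:
                # Each edge appears in both adjacency lists: keep only one direction
                if node1 < node2:
                    graph_file.write("{} {}\n".format(node1, node2))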

In [None]:
import os

email_graph = graph_from_file("res/email-Eu-core.txt")
remove_loop_dupes(email_graph)
os.makedirs("output", exist_ok=True)  # make sure the output directory exists
graph_to_file(email_graph, "output/test.txt")
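A different approach skips the adjacency lists entirely: normalize each edge as the pair `(min(x, y), max(x, y))` and keep the pairs in a set. This is an alternative sketch, not the method above; `clean_edge_file` and the file names are hypothetical. Memory grows with the number of distinct edges, and the output is sorted only to make the file reproducible.

```python
def clean_edge_file(input_name, output_name):
    """Copy an edge list, dropping self-loops and duplicate undirected edges."""
    edges = set()
    with open(input_name, "r") as graph_file:
        for line in graph_file:
            # Skip blank lines and '#' comment headers
            if not line.strip() or line.startswith("#"):
                continue
            node1, node2 = (int(token) for token in line.split())
            if node1 != node2:  # drop self-loops
                # (x, y) and (y, x) are the same undirected edge
                edges.add((min(node1, node2), max(node1, node2)))
    with open(output_name, "w") as out_file:
        for node1, node2 in sorted(edges):
            out_file.write("{} {}\n".format(node1, node2))
    return len(edges)


with open("dirty_demo.txt", "w") as demo:
    demo.write("0 1\n1 0\n2 2\n1 2\n0 1\n")
print(clean_edge_file("dirty_demo.txt", "clean_demo.txt"))  # 2
```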

### Question 7

Try the data structure from Question 5 on the various graphs downloaded in Question 2. What can you conclude about its scalability, i.e. what graph sizes can this data structure handle?
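One way to approach this question is to measure the time and peak memory needed to build the dictionary of lists for growing random graphs, then extrapolate to the sizes of the downloaded graphs. The sketch below uses the standard `time` and `tracemalloc` modules; the random graphs (uniformly random endpoints, average degree about 10) and the helper name `measure_build` are assumptions made for illustration, and the figures it prints depend on your machine and Python version.

```python
import random
import time
import tracemalloc


def measure_build(num_nodes, num_edges, seed=0):
    """Return (seconds, peak_bytes) needed to build a dict of lists from random edges."""
    rng = random.Random(seed)
    edges = [(rng.randrange(num_nodes), rng.randrange(num_nodes))
             for _ in range(num_edges)]
    # Tracing starts after the edge list exists, so mostly the graph itself is measured
    tracemalloc.start()
    start = time.perf_counter()
    graph = {}
    for node1, node2 in edges:
        graph.setdefault(node1, []).append(node2)
        graph.setdefault(node2, []).append(node1)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak), in bytes
    tracemalloc.stop()
    return elapsed, peak


for n in (1_000, 10_000, 100_000):
    seconds, peak = measure_build(n, 5 * n)
    print("{:>7} nodes: {:.3f} s, {:.1f} MB peak".format(n, seconds, peak / 1e6))
```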

## Exercise 3: degree distribution

### Question 8
Write a program that computes the degree distribution of a graph and stores it in a Python dictionary of the form:

degree: number of nodes with that degree

In [None]:
def compute_degree_dist(graph):
    degree_dist = {}
    for node in graph:
        degree = len(graph[node])
        if degree not in degree_dist:
            degree_dist[degree] = 0
        degree_dist[degree] += 1
    return degree_dist
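The same distribution can be obtained more compactly with `collections.Counter`, which counts how many times each degree value occurs. This is an equivalent sketch, not a required change; the toy graph and the name `compute_degree_dist_counter` are made up. Two sanity checks follow from the definitions: the counts sum to the number of nodes, and since every edge line adds two entries to the dictionary of lists, the degrees sum to twice the number of edges.

```python
from collections import Counter


def compute_degree_dist_counter(graph):
    """Map each degree to the number of nodes having that degree."""
    return dict(Counter(len(neighbors) for neighbors in graph.values()))


toy_graph = {1: [2, 3], 2: [1], 3: [1]}  # the path 2 - 1 - 3
dist = compute_degree_dist_counter(toy_graph)
print(dist)  # {2: 1, 1: 2}
assert sum(dist.values()) == len(toy_graph)  # one count per node
assert sum(d * c for d, c in dist.items()) == 2 * 2  # two edges, each counted at both ends
```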

### Question 9

Plot the degree distribution on a log-log scale (using matplotlib, for example).
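A common reason for logarithmic axes: if the number of nodes of degree $k$ follows a power law, the points line up along a straight line on a log-log plot. This is a standard observation rather than a claim made in this TP, and the notation $n_k$, $C$, $\gamma$ is introduced here for illustration.

$$
n_k = C\,k^{-\gamma}
\quad\Longrightarrow\quad
\log n_k = \log C - \gamma \log k
$$

So on log-log axes the slope of such a line is $-\gamma$.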

In [None]:
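def plot_degree_dist(graph, log=True):
    """Scatter-plot the number of nodes of each degree, on log-log axes by default."""
    degree_dist = compute_degree_dist(graph)
    degrees = sorted(degree_dist)  # plain lists rather than dict views for matplotlib
    plt.scatter(degrees, [degree_dist[d] for d in degrees])
    if log:
        # Points with degree 0 cannot be placed on a logarithmic axis
        plt.xscale("log")
        plt.yscale("log")
    plt.xlabel("degree")
    plt.ylabel("number of nodes")
    plt.show()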

In [None]:
plot_degree_dist(graph_from_file("res/email-Eu-core.txt"))