# Lab: Introduction to graphs 
# Student version

We can use the following libraries.

In [1]:
import matplotlib.pyplot as plt
import math
import sys
print(sys.version)

3.13.3 (tags/v3.13.3:6280bb5, Apr  8 2025, 14:47:33) [MSC v.1943 64 bit (AMD64)]


## Exercise 1: getting things started with undirected graphs

### Question 1

Manually create an undirected graph with approximately a dozen nodes and store it in a text file _test_graph.txt_, with one edge per line in the format:

$\texttt{x y}$

where x and y are node identifiers separated by a space. You will use this file to test your code.

In [3]:
with open("test_graph.txt", "r") as f:
    for line in f:
        print(line.strip())


1 2
1 3
1 4
2 5
2 6
3 7
3 8
4 9
5 10
6 11
7 8
9 12


### Question 2

Download from Moodle the undirected graph called RoadNet-PA (be careful, it is quite large). This dataset represents the road network of Pennsylvania (a state of the United States). Note that

$\texttt{0 1}$

means that nodes 0 and 1 are connected. If the edge $\texttt{0 1}$ exists, then $\texttt{1 0}$ exists too, since both denote the same edge in an undirected graph.

This dataset allows you to check the results of your programs.

In [5]:
def load_graph(path):
    graph = {}

    with open(path, "r") as f:
        for line in f:
            # Skip SNAP comment lines (start with '#') and blank lines
            if line.startswith("#") or not line.strip():
                continue

            u, v = line.strip().split()

            # Convert to int for efficiency
            u = int(u)
            v = int(v)

            # Undirected: add both ways
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)

    return graph

graph = load_graph("RoadNet-PA.txt")

# Quick sanity check
print("Number of nodes:", len(graph))

Number of nodes: 1088092


### Question 3

Write a program that reads a graph from a text file and counts its nodes and edges (without storing the graph in memory).

In [7]:
def count_graph(path):
    nodes = set()
    edges = 0

    with open(path, "r") as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue

            u, v = line.strip().split()
            u = int(u)
            v = int(v)

            edges += 1
            nodes.add(u)
            nodes.add(v)

    return len(nodes), edges


# Example
n, m = count_graph("RoadNet-PA.txt")
print("Nodes:", n)
print("Edges:", m)

Nodes: 1088092
Edges: 1541898


### Question 4

Write a program that computes the degree (i.e. the number of incident edges) of a given node of a graph, without storing the graph in memory.

In [8]:
def degree_of(path, target):
    degree = 0
    
    with open(path, "r") as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue

            u, v = line.strip().split()
            u = int(u)
            v = int(v)

            # Increment if the target appears
            if u == target or v == target:
                degree += 1

    return degree


# Example:
print(degree_of("RoadNet-PA.txt", 42))

    

3


## Exercise 2: loading a graph in memory

### Question 5

Write a program that reads a graph from a text file and loads it as a Python **dictionary of lists**.
This implementation of the adjacency list format is the standard format that we will use to store graphs in this course.

In [None]:
def load_graph_as_adj_list(path):
    graph = {}

    with open(path, "r") as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue

            u, v = line.strip().split()
            u = int(u)
            v = int(v)

            # undirected: add both ways
            if u not in graph:
                graph[u] = []
            if v not in graph:
                graph[v] = []

            graph[u].append(v)
            graph[v].append(u)

    return graph


# Example usage
adj = load_graph_as_adj_list("RoadNet-PA.txt")
print("Nodes:", len(adj))
print("Degree of node 0:", len(adj[0]))


### Question 6

Try the data structure of Question 5 on the graph downloaded in Question 2. Can you load it in memory? Can you print it on the screen?

Conclude on scalability (that is, what graph size you can handle with this data structure).
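One way to reason about scalability is to estimate how much memory a dictionary of lists occupies and to preview only a few entries instead of printing everything. The sketch below is a rough illustration, not a precise measurement: the helpers `approx_adj_list_bytes` and `preview` and the graph `toy` are hypothetical names introduced here, and the estimate ignores the integer objects stored inside the lists.

```python
import sys
from itertools import islice

def approx_adj_list_bytes(adj):
    """Rough lower bound on the memory used by a dict-of-lists graph.

    Counts the dictionary, its keys, and the list containers; the integer
    objects stored inside the lists are ignored.
    """
    total = sys.getsizeof(adj)
    for node, neighbours in adj.items():
        total += sys.getsizeof(node) + sys.getsizeof(neighbours)
    return total

def preview(adj, k=3):
    """Return the first k (node, neighbours) pairs instead of printing everything."""
    return list(islice(adj.items(), k))

# Toy graph (hypothetical data): a path 1 - 2 - 3 - 4
toy = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(approx_adj_list_bytes(toy), "bytes (approximate)")
print(preview(toy, 2))  # [(1, [2]), (2, [1, 3])]
```

Applied to a graph with about a million nodes, the same estimate gives an order of magnitude for the memory footprint, while `preview` avoids flooding the screen with the full dictionary.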

## Exercise 3: degree distribution

### Question 7
Write a program that computes the degree distribution of a graph and stores it in a Python dictionary of the form:

deg: number of occurrences
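A minimal sketch of one possible approach, assuming the graph is already loaded as a dictionary of lists: count how many nodes share each list length. The function name `degree_distribution` is introduced here for illustration, and the edge list is the test graph from Question 1.

```python
from collections import Counter

def degree_distribution(adj):
    """Return {degree: number of nodes with that degree} for a dict-of-lists graph."""
    return dict(Counter(len(neighbours) for neighbours in adj.values()))

# Test graph from Question 1, built in memory as a dictionary of lists
edges = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 7),
         (3, 8), (4, 9), (5, 10), (6, 11), (7, 8), (9, 12)]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

dist = degree_distribution(adj)
print(sorted(dist.items()))  # [(1, 3), (2, 6), (3, 3)]
```

As a sanity check, the counts sum to the number of nodes (12), and the degrees weighted by their counts sum to twice the number of edges (24).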

### Question 8

Plot the degree distribution on a logarithmic scale (using matplotlib, for example). Be careful to choose an adequate plotting style.
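The following sketch shows one plausible plotting style: unconnected markers on log-log axes, since joining the points with lines can suggest a continuity that a degree distribution does not have. The names `loglog_points`, `plot_degree_distribution`, and `example_dist` are assumptions made for this example, and the values in `example_dist` are invented. Nodes of degree 0 are dropped because they cannot be placed on a logarithmic axis.

```python
def loglog_points(dist):
    """Sorted degrees and matching counts, keeping only degrees > 0."""
    degrees = sorted(d for d in dist if d > 0)
    counts = [dist[d] for d in degrees]
    return degrees, counts

def plot_degree_distribution(dist, title="Degree distribution"):
    """Scatter the distribution on log-log axes and return (fig, ax)."""
    import matplotlib.pyplot as plt  # imported here so the helper above has no dependency

    degrees, counts = loglog_points(dist)
    fig, ax = plt.subplots()
    ax.loglog(degrees, counts, marker="o", linestyle="none")
    ax.set_xlabel("degree")
    ax.set_ylabel("number of nodes")
    ax.set_title(title)
    return fig, ax

# Invented distribution, for illustration only
example_dist = {0: 4, 1: 3, 2: 6, 3: 3}
xs, ys = loglog_points(example_dist)
print(xs, ys)  # [1, 2, 3] [3, 6, 3]
```

In the notebook, calling `plot_degree_distribution(example_dist)` followed by `plt.show()` would display the figure.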

## Exercise 4: the case of directed graphs

### Question 9

Download the directed graph called AS-Caida-dir from Moodle. This is a graph of the Internet at the AS (autonomous system) level. Unlike the previous graph, it is directed:

$\texttt{0 1}$ means that the traffic can go from 0 to 1, but not from 1 to 0.

By adapting your code for undirected graphs to directed graphs, load it in memory in a double adjacency list format.
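A possible sketch, assuming that "double adjacency list" means storing, for every node, one list of successors (out-neighbours) and one list of predecessors (in-neighbours). The function name `load_directed_graph` and the toy lines in `sample` are hypothetical. The function accepts any iterable of lines, so an open file object can be passed directly.

```python
def load_directed_graph(lines):
    """Build (out_adj, in_adj) from lines of the form 'u v', meaning u -> v."""
    out_adj, in_adj = {}, {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments

        u, v = map(int, line.split())

        # Make sure both endpoints appear in both dictionaries,
        # even if they have no successor or no predecessor.
        for node in (u, v):
            out_adj.setdefault(node, [])
            in_adj.setdefault(node, [])

        out_adj[u].append(v)  # v is a successor of u
        in_adj[v].append(u)   # u is a predecessor of v
    return out_adj, in_adj

# Toy directed graph (hypothetical data)
sample = ["# toy directed graph", "0 1", "0 2", "2 1", ""]
out_adj, in_adj = load_directed_graph(sample)
print(out_adj)  # {0: [1, 2], 1: [], 2: [1]}
print(in_adj)   # {0: [], 1: [0, 2], 2: [0]}
```

To read from disk, the same function could be called as `load_directed_graph(f)` inside a `with open(path, "r") as f:` block.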

### Question 10

Plot both the in-degree distribution and the out-degree distribution of this graph on a log-log scale.
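One way to approach this, sketched under the assumption that the graph is held as two dictionaries of lists (successors and predecessors): compute one distribution per dictionary and draw them side by side. The names `in_out_degree_distributions` and `plot_in_out` and the toy dictionaries are introduced here for illustration only.

```python
from collections import Counter

def in_out_degree_distributions(out_adj, in_adj):
    """Return (in_dist, out_dist), each of the form {degree: number of nodes}."""
    in_dist = dict(Counter(len(preds) for preds in in_adj.values()))
    out_dist = dict(Counter(len(succs) for succs in out_adj.values()))
    return in_dist, out_dist

def plot_in_out(in_dist, out_dist):
    """Draw both distributions on log-log axes and return (fig, axes)."""
    import matplotlib.pyplot as plt  # imported here so the counting code has no dependency

    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    panels = zip(axes, (in_dist, out_dist), ("in-degree", "out-degree"))
    for ax, dist, name in panels:
        xs = sorted(d for d in dist if d > 0)  # degree 0 cannot be shown on a log axis
        ax.loglog(xs, [dist[d] for d in xs], marker="o", linestyle="none")
        ax.set_xlabel(name)
        ax.set_title(f"{name} distribution")
    axes[0].set_ylabel("number of nodes")
    return fig, axes

# Toy directed graph (hypothetical data): edges 0 -> 1, 0 -> 2, 2 -> 1
out_adj = {0: [1, 2], 1: [], 2: [1]}
in_adj = {0: [], 1: [0, 2], 2: [0]}
in_dist, out_dist = in_out_degree_distributions(out_adj, in_adj)
print(in_dist, out_dist)
```

Sharing the vertical axis between the two panels makes the in-degree and out-degree counts directly comparable.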

## Exercise 5: going further

### Question 11

1) Recall the time complexity of checking whether a node $j$ is a neighbour of node $i$ in the adjacency list format.

2) Now suppose that the list of neighbours of each node is sorted by node index. Can you propose a more efficient way to check whether node $j$ is a neighbour of node $i$?

3) Implement your proposal.
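One candidate answer, given here as a sketch rather than the expected solution: a linear scan of the list of node $i$ costs $O(\deg(i))$, whereas a binary search in a sorted list costs $O(\log \deg(i))$. The example below uses the standard `bisect` module; the function names `sort_adjacency_lists` and `is_neighbour` and the small graph are assumptions made for illustration.

```python
from bisect import bisect_left

def sort_adjacency_lists(adj):
    """Sort every neighbour list in place (done once, after loading)."""
    for neighbours in adj.values():
        neighbours.sort()

def is_neighbour(adj, i, j):
    """Binary search for j in the sorted neighbour list of i."""
    neighbours = adj.get(i, [])
    pos = bisect_left(neighbours, j)  # leftmost position where j could be inserted
    return pos < len(neighbours) and neighbours[pos] == j

# Small graph with unsorted lists (hypothetical data)
adj = {1: [4, 2, 3], 2: [6, 1, 5], 3: [8, 1, 7]}
sort_adjacency_lists(adj)

print(is_neighbour(adj, 1, 3))   # True
print(is_neighbour(adj, 1, 5))   # False
print(is_neighbour(adj, 99, 1))  # False (node 99 is not in the graph)
```

Sorting all lists costs $O(\deg(i) \log \deg(i))$ per node, so this pays off when many neighbour queries follow a single load. Storing neighbours in a Python `set` would be another option, with expected constant-time lookups at the price of more memory.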