## 17.3 Edge list representation

The simplest way to store a graph is to have two separate collections,
one for its nodes and one for its edges. This is called an **edge list** representation.
The word 'list' doesn't refer to Python's list type,
but rather to listing all edges explicitly.
Any suitable data structure can be used for the collections:
they're usually arrays, linked lists or sets.

In Python, it's best to use sets, one with the nodes,
the other with pairs of nodes. Let's consider the same example graphs.

<p id="fig-17.3.1"></p>

*[Figure 17.3.1](../33_Figures/Figures_17_3.ipynb#Figure-17.3.1)*

![Image 17_1_un_directed.png](17_1_un_directed.png)

The undirected graph on the left could be represented with sets

- `{'Alice', 'Bob', 'Chan', 'David'}`
- `{('Alice', 'Bob'), ('Bob', 'Chan'), ('Alice', 'David'), ('David', 'Bob')}`.

An undirected edge between nodes A and B is stored as a single pair
to save memory: it doesn't matter whether it's (A, B) or (B, A).
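Here is a minimal sketch of the undirected graph above as two Python sets. Because each edge is stored in only one orientation, checking whether two nodes are adjacent must try both orders. The function name `adjacent` is hypothetical and only for illustration.

```python
# Edge list representation of the undirected example graph
nodes = {'Alice', 'Bob', 'Chan', 'David'}
edges = {('Alice', 'Bob'), ('Bob', 'Chan'), ('Alice', 'David'), ('David', 'Bob')}

def adjacent(node1: str, node2: str) -> bool:
    """Check if node1 and node2 are joined by an undirected edge.

    Each edge is stored once, either as (A, B) or as (B, A),
    so both orders must be looked up.
    """
    return (node1, node2) in edges or (node2, node1) in edges

print(adjacent('Bob', 'Alice'))   # True: stored as ('Alice', 'Bob')
print(adjacent('Chan', 'David'))  # False: no edge in either order
```

Each lookup is a set membership check, so the extra check for the reversed pair costs only a constant amount of additional time.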
Since edges are undirected, one could consider representing each as
a set of two nodes: {{'Alice', 'Bob'}, {'Bob', 'Chan'}, ...}.
[Python's sets](../08_Unordered/08_5_summary.ipynb#8.5.3-Sets) require items to be hashable,
but sets themselves aren't hashable,
so evaluating `{{'Alice', 'Bob'}, {'Bob', 'Chan'}}` raises a `TypeError`.
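The following short sketch illustrates the problem: Python accepts the syntax of a set whose items are sets, but building it fails because the inner sets are unhashable. The exact wording of the error message varies between Python versions, so it isn't shown here.

```python
# The syntax is accepted, but building the outer set fails at run time
# because its items (the inner sets) are unhashable.
try:
    edge_sets = {{'Alice', 'Bob'}, {'Bob', 'Chan'}}
    created = True
except TypeError as error:
    created = False
    print('TypeError:', error)
```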

The digraph on the right would be represented with sets

- `{1, 2, 3, 4}`
- `{(1, 2), (2, 1), (4, 1), (4, 2)}`.
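As a sketch, the digraph can be stored in the same way. Checking for a directed edge is a single set lookup, whereas finding the in-neighbours of a node means going through every edge, which takes Θ(*e*) time. The helper names `has_edge` and `in_neighbours` are hypothetical.

```python
# Edge list representation of the example digraph
nodes = {1, 2, 3, 4}
edges = {(1, 2), (2, 1), (4, 1), (4, 2)}

def has_edge(source: int, target: int) -> bool:
    """Check for the directed edge (source, target) with one set lookup."""
    return (source, target) in edges

def in_neighbours(node: int) -> set[int]:
    """Return the nodes that have an edge pointing to node.

    Every edge is inspected, so this takes linear time in the number of edges.
    """
    return {source for (source, target) in edges if target == node}

print(has_edge(1, 2), has_edge(2, 4))  # True False
print(sorted(in_neighbours(1)))        # [2, 4]
print(in_neighbours(3))                # set(): node 3 has no edges
```

Node 3 appears only in the set of nodes, because no edge involves it.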

Using Python's `set` type has two advantages. The first is conceptual:
the representation directly follows the definition of graphs as
a set of nodes and a set of edges.
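The second is practical: the type supports constant-time membership checks,
additions and deletions, whereas Python's lists take linear time to check
membership and to delete an item, and hence also to add an item without
creating a duplicate.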

If nodes can't be represented by a hashable type
(strings, integers, Booleans and tuples thereof),
then Python's lists are the second-best choice for an edge list representation.
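As a hypothetical illustration, suppose each node is a dictionary with a person's details. Dictionaries are unhashable, so neither they nor tuples containing them can be put in a set. Lists still work, but `in` compares the new pair with each stored pair, so adding an edge without duplicates takes linear time. The function `add_edge` is an assumption made for this sketch.

```python
# Hypothetical nodes: dictionaries are mutable and hence unhashable,
# so they can't be set items, nor can tuples that contain them.
alice = {'name': 'Alice'}
bob = {'name': 'Bob'}

nodes = [alice, bob]
edges = []  # list of (node, node) pairs

def add_edge(node1: dict, node2: dict) -> None:
    """Append the edge (node1, node2) unless it's already in the list.

    The membership check compares the pair with each stored pair,
    so it takes linear time in the number of edges.
    """
    if (node1, node2) not in edges:
        edges.append((node1, node2))

add_edge(alice, bob)
add_edge(alice, bob)  # duplicate: ignored
print(len(edges))     # 1
```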

### 17.3.1 Exercises

#### Exercise 17.3.1

Bob thinks that the collection of nodes helps implement graph operations
in an easy and efficient way, but is not strictly necessary,
since the nodes can be obtained by going through the collection of edges.
He proposes not storing the collection of nodes for very large graphs,
to save memory.

Is he right? Is the collection of nodes redundant?
Can we implement the graph ADT just with the collection of edges?

_Write your answer here._

[Hint](../31_Hints/Hints_17_3_01.ipynb)
[Answer](../32_Answers/Answers_17_3_01.ipynb)

#### Exercise 17.3.2

For each operation below, describe very briefly how it's implemented
for digraphs and give the worst-case complexity in terms of
the number of nodes *n* and the number of edges *e*.
You may wish to look again at the complexity of
[set operations](../08_Unordered/08_5_summary.ipynb#8.5.3-Sets).
I've done some operations for you.

Operation | Implementation | Complexity
:-|:-|:-
add node A | add to set of nodes | Θ(1)
remove node A |   |
has edge (A, B) | check membership in the set of edges | Θ(1)
add edge (A, B) |   |
remove edge (A, B) |   |
in-neighbours of A | find all B such that (B, A) is in the set of edges | Θ(*e*)
out-neighbours of A | |
in-degree of A |   |

[Hint](../31_Hints/Hints_17_3_02.ipynb)
[Answer](../32_Answers/Answers_17_3_02.ipynb)

#### Exercise 17.3.3

Our definition of graph doesn't allow **loops** (edges from a node to itself)
but sometimes they are necessary to model a network.
It's therefore useful to know how to handle them.

Consider a director/actor network. It can be modelled with a digraph
where the nodes represent people and where edge (A, B) means that
A has directed B in some play or film.
If person A acted in a play or film they directed,
then the digraph must include edge (A, A).

Can an edge list representation accommodate loops? How?

_Write your answer here._

[Answer](../32_Answers/Answers_17_3_03.ipynb)

⟵ [Previous section](17_2_concepts.ipynb) | [Up](17-introduction.ipynb) | [Next section](17_4_adj_matrix.ipynb) ⟶