In [1]:
import algolab
from graphs import *
algolab.init()


<center>
<span class="algolab-title">Chapter 4: Graphs</span>
</center>
<br/>
<center>
<span style="font-size:30px">DRAFT</span>
</center>

# Graph theory

See Alberto Montresor's theory slides here: http://disi.unitn.it/~montreso/sp/slides/06-grafi.pdf

See [Graphs in the book](https://interactivepython.org/runestone/static/pythonds/Graphs/toctree.html)

In particular, see:
* [Vocabulary and definitions](https://interactivepython.org/runestone/static/pythonds/Graphs/VocabularyandDefinitions.html)


To keep it short, a graph is a set of vertices linked by edges. 

## Directed graphs

First off, [download](graphs.py) the Python skeleton to modify.

In this worksheet we are going to use so-called directed graphs (`DiGraph` for brevity), that is, graphs that have _directed_ edges: each edge can be pictured as an arrow linking a source node _a_ to a target node _b_. With such an arrow, you can go from _a_ to _b_, but you cannot go from _b_ to _a_ unless there is another edge in the reverse direction.

* A `DiGraph` for us can also have no edges or no vertices at all.
* A vertex for us can be anything: a string like _'abc'_, the number _3_, etc.
* In our model, edges simply link vertices and have no weights.
* The `DiGraph` is represented as an adjacency list, mapping each vertex to the vertices it is linked to.
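
As a rough illustration of the last point, the adjacency-list idea can be sketched with a plain dictionary. The variable name `edges` and the sample vertices are made up for this example and are not part of `graphs.py`.

```python
# A tiny directed graph written as an adjacency list:
#   a -> b, a -> c, b -> c, and c has no outgoing edges.
edges = {
    'a': ['b', 'c'],
    'b': ['c'],
    'c': [],
}

# Following an arrow means looking up the source vertex.
assert 'b' in edges['a']       # there is an edge a -> b
assert 'a' not in edges['b']   # but there is no edge b -> a

# Counting the edges means summing the lengths of all the lists.
num_edges = sum(len(targets) for targets in edges.values())
```

Note that a vertex without outgoing edges, like `'c'`, still appears as a key with an empty list.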

In [22]:
%%HTML

<p class="algolab-question">
QUESTION: Is the <code>DiGraph</code> model thus better suited to dense or to sparse graphs?
</p>

 
Let's look at the constructor:

```python
class DiGraph:
    def __init__(self):
        # The class just holds the dictionary _edges: its keys are the vertices, and
        # each vertex is associated with a list of the vertices it is linked to.

        self._edges = {}
        
    def add_vertex(self, vertex):
        """ Adds vertex to the DiGraph. A vertex can be any object.
            
            If the vertex already exists, does nothing.
        """
        if vertex not in self._edges:            
            self._edges[vertex] = []            
```            

As you can see, the constructor just initializes `_edges`, so the only way to create a `DiGraph` is with a call like:

In [3]:
g = DiGraph()

`DiGraph` provides an `__str__` method to have a nice printout:

In [4]:
print g

DiGraph()


You can then add vertices to the graph like so:

In [5]:
g.add_vertex('a')
g.add_vertex('b')
g.add_vertex('c')

In [6]:
print g

a: []
b: []
c: []



Adding a vertex twice does nothing:

In [7]:
g.add_vertex('a')
print g

a: []
b: []
c: []



Once you have added the vertices, you can start adding directed edges among them with the method `add_edge`:

```python
    def add_edge(self, vertex1, vertex2):
        """ Adds an edge to the graph, from vertex1 to vertex2
        
            If either vertex doesn't exist, raises an Exception.
            If there is already such an edge, exits silently.            
        """
        
        if vertex1 not in self._edges:
            raise Exception("Couldn't find source vertex:" + str(vertex1))

        if vertex2 not in self._edges:
            raise Exception("Couldn't find target vertex:" + str(vertex2))        
            
        if vertex2 not in self._edges[vertex1]:
            self._edges[vertex1].append(vertex2)

```

In [8]:
g.add_edge('a', 'c')
print g

a: ['c']
b: []
c: []



In [9]:
g.add_edge('a', 'b')
print g

a: ['c', 'b']
b: []
c: []



Adding an edge twice makes no difference:

In [10]:
g.add_edge('a', 'b')
print g

a: ['c', 'b']
b: []
c: []



Notice that a `DiGraph` can have self-loops too, that is, edges going from a vertex to itself:

In [11]:
g.add_edge('b', 'b')
print g

a: ['c', 'b']
b: ['b']
c: []



To obtain the vertices, you can use the method `verteces`:

In [12]:
print g.verteces()

set(['a', 'c', 'b'])


Notice it returns a _set_: vertices are stored as keys in a dictionary, so they are not supposed to be in any particular order. When you print the whole graph, though, you see them listed in order, one per line, for clarity:

In [13]:
print g

a: ['c', 'b']
b: ['b']
c: []



Vertices in each adjacency list are instead stored and displayed in the order in which they were inserted.

To obtain the vertices a given vertex is linked to, you can use the method `adj(self, vertex)`:
```python
    def adj(self, vertex):
        """ Returns the vertices adjacent to vertex. 
            
            NOTE: vertices are returned in a NEW list.
            Modifying the list will have NO effect on the graph!
        """
        if vertex not in self._edges:
            raise Exception("Couldn't find a vertex " + str(vertex))
        
        return self._edges[vertex][:]

```

In [19]:
lst = g.adj('a')
print lst

['c', 'b']


Let's check that we actually get back a new list (so modifying the returned list won't change the graph):

In [20]:
lst.append('d')
print lst

['c', 'b', 'd']


In [21]:
print g.adj('a')

['c', 'b']


INFO: This technique of giving back copies is also called _defensive copying_: it prevents users from modifying the internal data structures of a class instance in an uncontrolled manner. For example, if we allowed them direct access to the internal adjacency lists, they could add duplicate edges, which we don't allow in our model. If instead we only allow users to add edges by calling `add_edge`, we are sure the constraints of our model will always remain satisfied.
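
To make the difference concrete, here is a minimal sketch that uses a plain list in place of the graph's internal data. The names `internal`, `alias` and `snapshot` are made up for this example.

```python
internal = ['c', 'b']      # stands in for the list stored inside the graph

alias = internal           # no copy: both names refer to the same list
snapshot = internal[:]     # slicing builds a new list with the same items

snapshot.append('d')       # only the copy changes
alias.append('b')          # this changes internal too, creating a duplicate

print(internal)
print(snapshot)
```

The slice makes a shallow copy: the list is new, but the vertex objects inside it are shared with the original.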


# 1) Building graphs


```python

    
```

# 2) Manipulate graphs


```python

def reverse(self):
    """ Reverses the direction of all the edges """

```
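
As a hint for the exercise above, the idea of reversing edges can be sketched on a plain adjacency dictionary instead of on `DiGraph` itself. The function name `reversed_edges` is hypothetical, and the sketch assumes every target vertex also appears as a key, as the `DiGraph` model guarantees.

```python
def reversed_edges(edges):
    """Return a NEW adjacency dictionary with every edge flipped."""
    # Start with every vertex present, so vertices without edges are kept.
    result = {vertex: [] for vertex in edges}
    for source in edges:
        for target in edges[source]:
            # The edge source -> target becomes target -> source.
            result[target].append(source)
    return result

flipped = reversed_edges({'a': ['c', 'b'], 'b': ['b'], 'c': []})
print(flipped)
```

A self-loop such as `b -> b` is its own reverse, so it survives the flip unchanged.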

```python

def remove_self_loops(self):
    """ Removes all of the self loops """
```
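
In the same spirit, removing self-loops can be sketched on a plain adjacency dictionary. The function name `without_self_loops` is hypothetical. Unlike the method to implement, this sketch returns a new dictionary instead of modifying the graph in place.

```python
def without_self_loops(edges):
    """Return a NEW adjacency dictionary with the edges v -> v dropped."""
    result = {}
    for vertex in edges:
        # Keep every target except the vertex itself.
        result[vertex] = [t for t in edges[vertex] if t != vertex]
    return result

cleaned = without_self_loops({'a': ['c', 'b'], 'b': ['b'], 'c': []})
print(cleaned)
```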

TODO: graph union, intersection, ...


# 3) Query graphs 

Today we query graphs the "Do it yourself" way with Depth First Search (DFS) or Breadth First Search (BFS). 

If you have a big graph and complex query needs, there are off-the-shelf query languages and databases (for example, Cypher and Neo4j).


## 3.1) Play with DFS and BFS

Create small graphs (like linked lists a->b->c, triangles, small complete graphs, trees) and try to predict the visit sequence (vertex order, with discovery and finish times) you would get by running a DFS or BFS. Then write tests that assert you actually get those sequences when running the provided DFS and BFS.
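
To show the kind of prediction and test meant here, the following sketch runs both visits on plain adjacency dictionaries. The functions `dfs_times` and `bfs_order` are hypothetical stand-ins written for this example, not the DFS and BFS provided with the worksheet, so the provided ones may number times or order neighbours differently. Here the clock starts at 1 and neighbours are visited in list order.

```python
from collections import deque

def dfs_times(edges, start):
    """Recursive DFS from start; returns (discovery, finish) dictionaries."""
    discovery, finish = {}, {}
    clock = [0]                  # a list, so the nested function can update it

    def visit(vertex):
        clock[0] += 1
        discovery[vertex] = clock[0]
        for neighbour in edges[vertex]:
            if neighbour not in discovery:
                visit(neighbour)
        clock[0] += 1
        finish[vertex] = clock[0]

    visit(start)
    return discovery, finish

def bfs_order(edges, start):
    """BFS from start; returns the vertices in the order they are discovered."""
    order = [start]
    seen = set([start])
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in edges[vertex]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order

# Prediction for the chain a->b->c: each vertex finishes after its successor.
chain = {'a': ['b'], 'b': ['c'], 'c': []}
discovery, finish = dfs_times(chain, 'a')
assert discovery == {'a': 1, 'b': 2, 'c': 3}
assert finish == {'a': 6, 'b': 5, 'c': 4}

# Prediction for a triangle: BFS reaches both neighbours of a before going on.
triangle = {'a': ['b', 'c'], 'b': ['c'], 'c': ['a']}
assert bfs_order(triangle, 'a') == ['a', 'b', 'c']
```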


# 4) Do cool stuff with theory 

- find connected components
- determine if a graph is acyclic
- find node distances
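
Two of these ideas can be sketched on plain adjacency dictionaries: node distances with a BFS, and an acyclicity check with a DFS that marks the vertices on the current path. The function names `distances_from` and `is_acyclic` are hypothetical and not part of `graphs.py`; this is one possible approach under those assumptions, not necessarily the one intended for the exercises.

```python
from collections import deque

def distances_from(edges, start):
    """Number of edges on a shortest path from start to each reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in edges[vertex]:
            if neighbour not in dist:
                dist[neighbour] = dist[vertex] + 1
                queue.append(neighbour)
    return dist

def is_acyclic(edges):
    """True if the directed graph has no cycle (a self-loop counts as a cycle)."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, finished
    colour = {vertex: WHITE for vertex in edges}

    def has_cycle_from(vertex):
        colour[vertex] = GREY
        for neighbour in edges[vertex]:
            if colour[neighbour] == GREY:
                return True        # back edge to a vertex on the current path
            if colour[neighbour] == WHITE and has_cycle_from(neighbour):
                return True
        colour[vertex] = BLACK
        return False

    return not any(colour[v] == WHITE and has_cycle_from(v) for v in edges)

diamond = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
print(distances_from(diamond, 'a'))
print(is_acyclic(diamond))
```

Vertices that cannot be reached from `start` simply do not appear in the dictionary returned by `distances_from`.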

    

In [17]:
algolab.run(VisitTest)

..
----------------------------------------------------------------------
Ran 2 tests in 0.003s

OK


In [18]:
algolab.run(DiGraphTest)

.......
----------------------------------------------------------------------
Ran 7 tests in 0.062s

OK
