In [4]:
import networkx as nx
import numpy as np

# Creating and mutating a graph object

Create an empty graph object.

In [5]:
g = nx.Graph()

### Adding nodes

Add a node to the graph.

In [6]:
g.add_node('a')

Check how many nodes are in the graph.

In [7]:
len(g)

1

Alternatively:

In [8]:
g.number_of_nodes()

1

Generate a list of nodes in the network.

In [6]:
g.nodes()

['a']

Add a "bunch" (a.k.a. a list) of nodes to the graph object.

In [8]:
g.add_nodes_from(['b','c','d'])

Look at all the nodes in the graph now. Order isn't preserved, but this doesn't matter.

In [9]:
g.nodes()

['d', 'a', 'b', 'c']

Check again how many nodes are in the graph.

In [10]:
len(g)

4

### Adding edges
An *edge* is a connection/relationship/interaction between two of the nodes. Use the `.add_edge()` method on the graph object to put edges into the network. `add_edge` needs a source and a target node. If one of the nodes doesn't already exist in the graph, it will be added automatically.

In [11]:
g.add_edge('a','b')

Look at the list of edges in the network.

In [12]:
g.edges()

[('a', 'b')]
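To see the automatic node creation in isolation, here is a minimal sketch on a separate throwaway graph. The name `demo` and the node labels `'x'` and `'y'` are illustrative only and are not part of this notebook's `g`.

```python
import networkx as nx

# A separate, illustrative graph so the notebook's `g` is left untouched.
demo = nx.Graph()
demo.add_node('x')

# 'y' is not in the graph yet; add_edge creates it on the fly.
demo.add_edge('x', 'y')

nodes_after = sorted(demo.nodes())
print(nodes_after)
```

Sorting the node list just makes the printed order predictable.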

We can also remove an edge from the network.

In [13]:
g.remove_edge('a','b')

Check to make sure it was removed.

In [14]:
g.edges()

[]

Add a bunch of edges and check to make sure they're in there. Adding edges that already exist doesn't (shouldn't!) change anything.

In [9]:
g.add_edges_from([('a','b'),('a','c'),('a','d')])
g.edges()

[('c', 'a'), ('a', 'b'), ('a', 'd')]

In [23]:
g.add_edge('d','a')
g.edges()

[('d', 'a'), ('a', 'c'), ('a', 'b')]
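One way to confirm that re-adding an existing edge is a no-op is to compare the edge count before and after. This is a small sketch on a fresh graph (`h` is an illustrative name), not a cell from the original notebook.

```python
import networkx as nx

h = nx.Graph()
h.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])
before = h.number_of_edges()

# ('d', 'a') is the same undirected edge as ('a', 'd'), so nothing new is added.
h.add_edge('d', 'a')
after = h.number_of_edges()

print(before, after)
```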

# Make a digraph

A directed graph ("digraph") is a graph where the direction of the edges matters: a connection from A to B doesn't mean that a connection from B to A also exists. Examples of directed links include follow relationships on social media platforms like Twitter or Instagram.

Make a `dg` DiGraph object that's empty.

In [11]:
dg = nx.DiGraph()

Add some edges again, but remember this time the ordering matters: a connection from A to B doesn't mean a B to A connection also exists.

In [12]:
dg.add_edges_from([('a','b'),('a','c'),('a','d')])
dg.nodes()

['c', 'a', 'd', 'b']

In an undirected Graph object, there's no difference between a (C,A) edge and an (A,C) edge, but in a directed graph these are distinct.

In [13]:
dg.add_edge('c','a')
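The difference can be checked directly with `has_edge`. The sketch below uses two small illustrative graphs (`undirected` and `directed` are names made up for this example) rather than the notebook's `g` and `dg`.

```python
import networkx as nx

undirected = nx.Graph()
undirected.add_edge('a', 'c')

directed = nx.DiGraph()
directed.add_edge('a', 'c')

# Undirected: the edge is found in either orientation.
undirected_both = undirected.has_edge('a', 'c') and undirected.has_edge('c', 'a')

# Directed: only the orientation that was added exists.
directed_forward = directed.has_edge('a', 'c')
directed_reverse = directed.has_edge('c', 'a')

print(undirected_both, directed_forward, directed_reverse)
```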

You can also add self-loops, where a node is connected to itself. This doesn't make much sense when we're thinking about (online or offline) social systems: you can't follow or be friends with yourself. However, in many other systems, especially biological systems, these self-loops are important to model.

In [14]:
dg.add_edge('a','a')
dg.edges()

[('c', 'a'), ('a', 'c'), ('a', 'b'), ('a', 'd'), ('a', 'a')]
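If you need to pick the self-loops out of a larger edge list, a plain filter on the edges is enough. This is a sketch on an illustrative graph `d`; it deliberately avoids any version-specific helper functions.

```python
import networkx as nx

d = nx.DiGraph()
d.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'a')])

# A self-loop is an edge whose source and target are the same node.
loops = [(u, v) for u, v in d.edges() if u == v]
print(loops)
```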

In [38]:
nx.adjacency_matrix(dg).todense()

matrix([[0, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 1, 0, 0]], dtype=int64)

In [40]:
nx.incidence_matrix(dg).todense()

matrix([[ 1.,  0.,  0.,  0.,  0.],
        [ 1.,  1.,  1.,  0.,  1.],
        [ 0.,  0.,  1.,  0.,  0.],
        [ 0.,  1.,  0.,  0.,  1.]])

# Compare graph representations

There are three classic ways to represent a graph as a data structure:

* **Edge list** - A list of all the edges in the network. This has the benefit of being relatively efficient at storing data, but can be inefficient for doing some kinds of calculations since we need to loop through the list.
* **Adjacency matrix** - A square matrix where the rows and columns list the nodes in the same order. The values in each row record whether (or how strongly) a tie exists between the nodes. Because the absence of an edge is still recorded as a value, this is very memory/space inefficient, but the matrix representation lets us do some computations very quickly.
* **Adjacency list** - A structure where each row corresponds to a node and the values are a list of its neighbors. This is a good trade-off between the performance and space considerations of edge lists and adjacency matrices.
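To make the trade-offs concrete, the sketch below builds all three structures by hand for the same small undirected graph: an edge list, an adjacency matrix, and a per-node neighbor list (the structure described in the last bullet). It is pure Python for illustration and is not how `networkx` stores anything; the variable names are made up for this example.

```python
# The same small undirected graph in three hand-built representations.
edge_list = [('a', 'b'), ('a', 'c'), ('a', 'd')]

# Fix a node order so matrix rows/columns are well defined.
nodes = sorted({n for edge in edge_list for n in edge})
index = {node: i for i, node in enumerate(nodes)}

# Adjacency matrix: one cell per node pair, zeros included.
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edge_list:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # undirected, so mirror the entry

# Neighbor lists: each node maps to the nodes it touches.
neighbors = {node: [] for node in nodes}
for u, v in edge_list:
    neighbors[u].append(v)
    neighbors[v].append(u)

# Rough storage cost: number of stored items in each structure.
sizes = (len(edge_list), len(nodes) ** 2, sum(len(v) for v in neighbors.values()))
print(sizes)
```

Even on four nodes the matrix stores 16 cells for 3 edges, while checking whether two nodes are connected is a single lookup in the matrix but a scan through the edge list.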

### Edgelist
An edgelist is the simplest of the three representations, and it is what the `edges()` method returns:

In [24]:
g.edges()

[('d', 'a'), ('a', 'c'), ('a', 'b')]

Check the directed graph's edgelist.

In [27]:
dg.edges()

[('c', 'a'), ('a', 'c'), ('a', 'b'), ('a', 'd'), ('a', 'a')]

### Adjacency matrix
We can also call the `adjacency_matrix` function on the graph object. This returns a sparse SciPy matrix, which stores only the nonzero entries (much like an edgelist), so we call `todense()` to get the full matrix.

In [29]:
nx.adjacency_matrix(g).todense()

matrix([[0, 1, 0, 0],
        [1, 0, 1, 1],
        [0, 1, 0, 0],
        [0, 1, 0, 0]], dtype=int64)

The order of the rows and columns corresponds to the order of the nodes in the nodelist. C is first in the nodelist, so it's also the first row in the adjacency matrix, etc.

In [15]:
g.nodes()

['c', 'a', 'd', 'b']
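Because the row order follows the graph's own node order, it can help to pass an explicit `nodelist` so the layout is predictable. The sketch below assumes a `networkx` version that provides `nx.to_numpy_array` (2.0 or later), which is newer than the version that produced the outputs in this notebook; `h` and `order` are illustrative names.

```python
import networkx as nx

h = nx.Graph()
h.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])

# An explicit node order fixes which node each row and column refers to.
order = ['a', 'b', 'c', 'd']
dense = nx.to_numpy_array(h, nodelist=order, dtype=int)
print(dense)
```

With this ordering the first row and first column both belong to `'a'`, the only node connected to all the others.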

Also check the directed graph we made.

In [26]:
nx.adjacency_matrix(dg).todense()

matrix([[0, 1, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0]], dtype=int64)

### Adjacency list

We can use the `generate_adjlist` function on the graph object to create the adjacency list. This returns a generator object, which we unpack with `list()`. The first element in each item is the node and the values after it are the nodes to which it's connected.

In [24]:
list(nx.adjlist.generate_adjlist(g,delimiter=' '))

['c a', 'a b d', 'd', 'b']

In [25]:
list(nx.adjlist.generate_adjlist(dg,delimiter=' '))

['c a', 'a c b d a', 'd', 'b']
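Notice that in the undirected output `'d'` and `'b'` appear with no neighbors: for an undirected graph, each edge is written only once, under the first endpoint visited. If you want every node's complete neighbor list instead, one option is to build it from `neighbors()`. This is a sketch on a fresh graph (`h`, `compact`, and `full` are illustrative names), and the node order may differ from the outputs above depending on the `networkx` version.

```python
import networkx as nx

h = nx.Graph()
h.add_edges_from([('a', 'b'), ('a', 'c'), ('a', 'd')])

# Compact form: each undirected edge is listed once.
compact = list(nx.generate_adjlist(h, delimiter=' '))

# Full form: every node with all of its neighbors.
full = {node: sorted(h.neighbors(node)) for node in h}

print(compact)
print(full)
```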