# Implementing the model in Python

Before we implement the model, we need to set up our environment so that we can create our graph with *igraph*.

## Getting set up

In [None]:
#! pip install -r requirements.txt

To begin creating this graph in Python, we can import the nodes from *musae_facebook_target.csv*, using the standard Python csv library:

In [None]:
import csv
with open('./data/musae_facebook_target.csv', 'r', encoding='utf-8') as csv_file:
    reader = csv.reader(csv_file)
    data = [line for line in reader]
    print(data[:10])
    print(len(data))

Here, we open the *csv* file with *utf-8* encoding, as some node name strings contain non-ASCII characters. We use *csv.reader* to read the file, and convert the result into a list of lists with a list comprehension, a compact construct that builds a new list by looping over an existing iterable. Finally, we confirm that the *csv* is loaded correctly by examining the first few rows and checking the length of the imported list, which should be 22,471: one header row plus 22,470 data rows.
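To see the shape that *csv.reader* produces without touching the real file, here is a minimal sketch that parses a small in-memory sample. The three data rows are made up for illustration, and the header names are an assumption about the file's columns; only the column positions (id first, page name third, page type fourth) come from the code in this chapter.

```python
import csv
import io

# Hypothetical sample; the real file has the same column order but other rows.
sample = io.StringIO(
    "id,facebook_id,page_name,page_type\n"
    "0,1001,Example Page A,tvshow\n"
    "1,1002,\"Example, Page B\",government\n"
    "2,1003,Example Page C,company\n"
)

# The same comprehension as above: one list of strings per csv row.
sample_data = [line for line in csv.reader(sample)]

# Equivalent explicit loop, to show what the comprehension abbreviates.
looped = []
for line in csv.reader(io.StringIO(sample.getvalue())):
    looped.append(line)

print(sample_data[0])    # the header row
print(len(sample_data))  # header plus three data rows
```

Note that every value, including the id, arrives as a string, and that the quoted page name containing a comma is kept as a single field.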

## Adding nodes and attributes

In *igraph*, we could add nodes one at a time to begin creating our graph, as we did in the previous chapter. However, as with many operations in Python, it is faster to add nodes listwise, in a single operation.

To prepare our data for this, we need lists of node names, and each node attribute. We can prepare these lists using more list comprehensions:

In [None]:
node_ids = [int(row[0]) for row in data[1:]]
page_names = [row[2] for row in data[1:]]
page_types = [row[3] for row in data[1:]]

Note that the [1:] list slice on data removes the csv header row from each list, as we do not want to include it as a node.

We need to confirm that the *id* column of our data increases sequentially. As mentioned in chapter 1, *igraph* uses a sequentially increasing integer index for every node added to the graph. If our id column follows the same scheme, adding nodes will be a simple process. To confirm that this is the case, we can compare the id column to a Python *range()*:

In [None]:
assert node_ids == list(range(len(node_ids)))

This assert checks that the list of node ids in our data is identical to the integers produced by a *range()* from 0 up to, but not including, the len() of the node_ids list. It should raise no AssertionError, as the two are identical.
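If the assert ever failed, it would not say where the ids diverge. As an illustration only, the hypothetical helper `first_id_mismatch` below (not part of *igraph* or of this chapter's code) reports the first position where an id differs from its index, using made-up id lists:

```python
def first_id_mismatch(ids):
    """Return the first index whose id differs from the index, or None."""
    for index, node_id in enumerate(ids):
        if node_id != index:
            return index
    return None

print(first_id_mismatch([0, 1, 2, 3]))  # sequential ids, so no mismatch
print(first_id_mismatch([0, 1, 3, 4]))  # id 2 is missing
```

The first call prints None and the second prints 2, the position at which the sequence breaks.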

This means importing our nodes into *igraph*, in this case, is as simple as creating a new, undirected, empty graph and telling igraph how many nodes we would like:

In [None]:
import igraph as ig
g = ig.Graph(directed=False)
g.add_vertices(len(node_ids))


We can confirm how many nodes have been added by accessing the *vs* attribute of the *Graph()* object, and check that this is equal to the length of the *node_ids* list using another *assert*:

In [None]:
print(len(g.vs))
assert len(node_ids) == len(g.vs)

This will show that the number of nodes is 22470, one less than the number of rows in the original csv file, which accounts for the removed header. Additionally, the assert will compare the length of both objects and raise an error if these values are not equal (expressed with the == equality symbol). 

Now that nodes have been added, we can add our attributes to the nodes in a listwise operation, using the attribute lists *page_names* and *page_types* that were prepared earlier:

In [None]:
g.vs['page_name'] = page_names
g.vs['page_type'] = page_types

Here we use the *vs* attribute of the graph to write the page names and page types in order, from the node with id 0 to the node with id 22469. Because the order of our attributes and ids was preserved when preparing these lists earlier, this is the easiest way to quickly add all of our node attributes.

We can confirm that node attributes have been written to the graph with:

In [None]:
print(g.vs[0]['page_name'])
print(g.vs[0]['page_type'])

This should print the page name and page type from the first data row of our original csv.

## Adding edges

Now that our nodes have been added to the graph, we can begin to connect them together. All the information we need to do this is contained in *musae_facebook_edges.csv*, so let's import this file:

In [None]:
with open('./data/musae_facebook_edges.csv', 'r') as csv_file_2:
    reader = csv.reader(csv_file_2)
    edge_data = [row for row in reader]
    print(edge_data[:10])
    print(len(edge_data))


As with the nodes earlier, we import the csv edge list using Python’s built-in csv library (imported earlier). This file contains no special characters, so we don’t need to specify the encoding.

Again, we examine the first few rows of the imported list of lists, by printing them, along with the number of rows to get an idea of how many edges we are adding to the graph. 

Notice that this file also contains a header, which we do not want to inadvertently include as an edge in our graph. Also, in *igraph*, nodes are referred to by their integer ID, so we need to convert our list elements to integers, ready for edge addition. We can do this, and remove the header, using a list comprehension:

In [None]:
edges = [[int(row[0]), int(row[1])] for row in edge_data[1:]]
print(edges[:10])

We then confirm that the edge list has been converted to integers correctly, by again printing to examine the first ten elements.
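Because edges refer to nodes by integer index, each endpoint should fall within the range of nodes already in the graph. A minimal sketch of such a pre-check follows; `parse_edges` is a hypothetical helper, and the rows, including the header names, are made up to resemble the output of *csv.reader*:

```python
def parse_edges(rows, node_count):
    """Convert csv rows (header first) to integer pairs and check the range."""
    parsed = []
    for row in rows[1:]:  # skip the header row
        source, target = int(row[0]), int(row[1])
        if not (0 <= source < node_count and 0 <= target < node_count):
            raise ValueError(
                f"edge {row} refers to a node outside 0..{node_count - 1}"
            )
        parsed.append((source, target))
    return parsed

# Hypothetical rows shaped like the output of csv.reader.
rows = [["id_1", "id_2"], ["0", "2"], ["1", "2"]]
print(parse_edges(rows, node_count=3))
```

With three nodes the call returns the two integer pairs; passing `node_count=2` instead would raise a ValueError, because node 2 would not exist.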

Now that the data is prepared, the edges can all be added to our graph at once with the *Graph.add_edges()* method:

In [None]:
g.add_edges(edges)

We can confirm that the edges have been added by accessing the *es* attribute of the graph, and counting them:

In [None]:
print(len(g.es))

This should be equal to the number of rows in the csv file, minus one for the header (171,002 edges).

Let's also confirm that an edge we know should be in the graph has been added correctly. Looking at the first non-header row of *musae_facebook_edges.csv*, we can see there should be an edge between node 0 and node 18427. We can access the first edge added to the graph by indexing the *es* attribute:

In [None]:
first_edge = g.es[0]

This edge should connect the nodes with IDs 0 and 18427. We can validate this by printing the source and target attributes of our newly created first_edge variable:

In [None]:
print(first_edge.source)
print(first_edge.target)

Finally, to relate this back to the real dataset, we can check which Facebook pages this edge represents, by accessing the nodes’ *page_name* attributes:

In [None]:
print(g.vs[0]['page_name'])
print(g.vs[18427]['page_name'])

This shows us that the corresponding Facebook pages for these nodes are ‘The Voice of China 中国好声音’ and ‘The Voice Global’, and that these Facebook pages share a mutual like.