### Implementing the model in Python
To begin creating this graph in Python, we can import the nodes from `musae_facebook_target.csv` using the `csv` module from Python's standard library.

In [2]:
import csv

with open('./data/musae_facebook_target.csv', 'r', encoding='utf-8') as csv_file:
    reader = csv.reader(csv_file)  # csv.reader yields one list per row; the list comprehension below collects them into a list of lists
    data = [line for line in reader]
    print(data[:10])
    print(len(data))

[['id', 'facebook_id', 'page_name', 'page_type'], ['0', '145647315578475', 'The Voice of China 中国好声音', 'tvshow'], ['1', '191483281412', 'U.S. Consulate General Mumbai', 'government'], ['2', '144761358898518', 'ESET', 'company'], ['3', '568700043198473', 'Consulate General of Switzerland in Montreal', 'government'], ['4', '1408935539376139', 'Mark Bailey MP - Labor for Miller', 'politician'], ['5', '134464673284112', 'Victor Dominello MP', 'politician'], ['6', '282657255260177', 'Jean-Claude Poissant', 'politician'], ['7', '239338246176789', 'Deputado Ademir Camilo', 'politician'], ['8', '544818128942324', 'T.C. Mezar-ı Şerif Başkonsolosluğu', 'government']]
22471
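As an alternative sketch, `csv.DictReader` from the same standard library module maps each row to the header names, so columns can be read by name rather than by position. The snippet below parses a small in-memory sample copied from the output above (through `io.StringIO`) instead of the real file; the names `sample` and `rows` are illustrative only.

```python
import csv
import io

# A small in-memory sample copied from the first rows of the output above.
sample = (
    "id,facebook_id,page_name,page_type\n"
    "0,145647315578475,The Voice of China 中国好声音,tvshow\n"
    "1,191483281412,U.S. Consulate General Mumbai,government\n"
    "2,144761358898518,ESET,company\n"
)

# DictReader consumes the header row and yields one dict per data row.
rows = list(csv.DictReader(io.StringIO(sample)))

print(len(rows))
print(rows[1]["page_name"])
print(rows[1]["page_type"])
```

With this approach the header never appears in `rows`, so no `[1:]` slice is needed later, at the cost of building one dictionary per row.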


## Adding nodes and attributes

In igraph, we could add nodes one at a time to begin creating our graph, as we did in Chapter 1, Introducing Graphs in the Real World. However, as with many operations in Python, it is faster to add nodes list-wise, in a single operation.

In [6]:
node_ids = [int(row[0]) for row in data[1:]]
page_names = [row[2] for row in data[1:]] 
page_types = [row[3] for row in data[1:]]

print(page_names)
print(page_types)
print(node_ids)

# Note that the [1:] slice on data removes the csv header row, which we don't want to include as a node.

['tvshow', 'government', 'company', 'government', 'politician', 'politician', 'politician', 'politician', 'government', 'government', 'government', 'politician', 'government', 'government', 'government', 'government', 'company', 'politician', 'government', 'company', 'politician', 'company', 'government', 'company', 'company', 'company', 'government', 'tvshow', 'company', 'company', 'company', 'government', 'government', 'politician', 'politician', 'company', 'company', 'government', 'government', 'government', 'politician', 'government', 'company', 'government', 'tvshow', 'company', 'politician', 'government', 'company', 'company', 'government', 'politician', 'tvshow', 'company', 'government', 'tvshow', 'government', 'company', 'government', 'politician', 'politician', 'company', 'politician', 'company', 'government', 'company', 'company', 'politician', 'tvshow', 'tvshow', 'tvshow', 'politician', 'government', 'politician', 'government', 'government', 'company', 'politician', 'politic

In [7]:
# We want to confirm that the id column of our data increases sequentially from 0.
# igraph assigns a sequentially increasing integer index to every node added to the graph.

assert node_ids == list(range(len(node_ids)))
# If the two lists are identical, the assert passes silently; otherwise it raises an AssertionError.

# Alternatively, print the result of the comparison
print(node_ids == list(range(len(node_ids))))
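If the check above ever fails, it helps to know where the IDs stop being sequential. The following is a minimal sketch of such a diagnostic; the helper name `first_id_gap` is hypothetical and is not part of igraph or of the dataset tooling.

```python
def first_id_gap(node_ids):
    """Return the first position where node_ids[i] != i, or None if the IDs are sequential."""
    for position, node_id in enumerate(node_ids):
        if node_id != position:
            return position
    return None


# Toy inputs, not taken from the dataset.
print(first_id_gap([0, 1, 2, 3]))  # sequential IDs
print(first_id_gap([0, 1, 3, 4]))  # ID 2 is missing
```

The first call prints `None` and the second prints `2`, the position at which the expected ID is missing.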

In [23]:
# Importing our nodes into igraph.
import igraph as ig
g = ig.Graph(directed=False)
g.add_vertices(len(node_ids))

# count the number of nodes
print(len(g.vs))
assert len(node_ids) == len(g.vs)
 
# Note that this prints 22470, one less than the number of rows in the original csv file, which accounts for the removal of the header.

22470


In [24]:
# Now that the nodes have been added, we can add our attributes to them in a list-wise operation using page_names and page_types
g.vs['page_name'] = page_names
g.vs['page_type'] = page_types

# Here, we use the vs attribute of the graph to write the page names in order, from the node with ID 0 to the node with ID 22469

print(g.vs[1]['page_name'])
print(g.vs[1]['page_type'])

# Using the print statement, we can confirm that the node attributes have been written to the graph.

U.S. Consulate General Mumbai
government


## Adding edges
Edges are relationships between nodes.  

In [16]:
# All the information we need to do this is contained in musae_facebook_edges.csv

with open('./data/musae_facebook_edges.csv', 'r') as csv_file_2:
    reader = csv.reader(csv_file_2)
    edge_data = [row for row in reader]
    print(edge_data[:10])
    print(len(edge_data))

[['id_1', 'id_2'], ['0', '18427'], ['1', '21708'], ['1', '22208'], ['1', '22171'], ['1', '6829'], ['1', '16590'], ['1', '20135'], ['1', '8894'], ['1', '15785']]
171003


In [19]:
# Notice that the file contains a header that we do not want to include as an edge in our graph. Also, igraph refers to nodes by their integer ID,
# so we need to convert our list elements to integers, ready for edge addition.
edges = [[int(row[0]), int(row[1])] for row in edge_data[1:]]
print(edges[:10])

# We then confirm that the edge list has been converted to integers correctly by printing the first 10 elements.

[[0, 18427], [1, 21708], [1, 22208], [1, 22171], [1, 6829], [1, 16590], [1, 20135], [1, 8894], [1, 15785], [1, 10281]]
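An edge list can contain IDs that are missing from the node table, self-loops, or repeated pairs, so it can be worth inspecting it before adding it to the graph. Below is a minimal sketch of such an inspection in plain Python; the function name `check_edges` and the toy input are assumptions for illustration, and whether any of these cases is actually a problem depends on the analysis.

```python
def check_edges(edges, n_nodes):
    """Summarise potentially surprising entries in an integer edge list."""
    out_of_range = [
        e for e in edges
        if not (0 <= e[0] < n_nodes and 0 <= e[1] < n_nodes)
    ]
    self_loops = [e for e in edges if e[0] == e[1]]

    # Treat [a, b] and [b, a] as the same edge, since the graph is undirected.
    seen = set()
    duplicates = []
    for a, b in edges:
        key = (min(a, b), max(a, b))
        if key in seen:
            duplicates.append([a, b])
        seen.add(key)

    return {
        "out_of_range": out_of_range,
        "self_loops": self_loops,
        "duplicates": duplicates,
    }


# Toy edge list for a graph with 3 nodes (IDs 0, 1 and 2).
toy_edges = [[0, 1], [1, 0], [2, 2], [1, 5]]
report = check_edges(toy_edges, n_nodes=3)
print(report)
```

For the real data, the same function could be called with the `edges` list and `len(node_ids)` from the cells above.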


#### Adding edges 
Now that the data is prepared, the edges can all be added to our graph at once.

In [25]:
g.add_edges(edges)
print(len(g.es))

# This should equal the number of rows in the .csv file minus one: 171002

171002


In [27]:
# Let's confirm that an edge we know should be in the graph has been added correctly.
first_edge = g.es[0]
print(first_edge.source)
print(first_edge.target)

0
18427


In [30]:
print(g.vs[0]['page_name'])
print(g.vs[18427]['page_name'])

# This shows us that the Facebook pages corresponding to these nodes are The Voice of China and The Voice Global

The Voice of China 中国好声音
The Voice Global


### Writing generic graph import methods
We created our graph from the datasets in many small stages. We may want to speed up this process the next time we import a similar graph, which we can do by writing some more generic Python methods.


In [31]:
def read_csv(csv_path):
    """
    Import a csv file
    :param csv_path: path to the csv to import
    :return: A list of lists read from the csv.
    """
    import csv
    import os

    assert os.path.exists(csv_path), \
        f"File could not be found at {csv_path}"
    
    with open(csv_path, 'r', encoding='utf-8') as csv_file:
        reader = csv.reader(csv_file)
        data = [row for row in reader]
    
    return data


## Next, using our imported list of lists, we are going to add nodes and edges to our graph.
# Beginning with nodes...

def add_nodes(g, nodes, attributes):
    """
    Add nodes to the graph

    :param g: An igraph Graph() object.
    :param nodes: A list of lists containing nodes and node attributes, with a header. The first element of each list in nodes should be the node ID.
    :param attributes: A list of attribute names that appear in the header (index 0) of the nodes list. Each named attribute will be added to the graph.
    """
    assert nodes[0][0] == 'id', \
        f'The first column in the imported csv should be the ID header, "id". Instead, it '\
        f'is {nodes[0][0]}.'
    
    node_ids = [int(row[0]) for row in nodes[1:]]

    assert node_ids == list(range(len(node_ids))), \
        f'Node IDs should increase sequentially in the imported csv, from 0 to the number of' \
        f' nodes minus one, {len(node_ids) - 1}.'
    
    assert isinstance(attributes, list), \
        f'Attributes to add to the graph should be a list. Instead, attributes is of type {type(attributes)}.'
    
    g.add_vertices(len(node_ids))
    headers = nodes[0]

    for attribute in attributes:
        attr_index = headers.index(attribute)
        g.vs[attribute] = [row[attr_index] for row in nodes[1:]]
        
    return g
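The `assert` statements above are skipped when Python runs with the `-O` flag, so one hedged alternative is to perform the same checks with explicit exceptions. The sketch below only validates the node table and does not touch igraph; the helper name `validate_node_table` and the toy rows are hypothetical.

```python
def validate_node_table(nodes, attributes):
    """Raise ValueError if the node table does not have the expected layout.

    Returns the list of integer node IDs when all checks pass.
    """
    headers = nodes[0]
    if headers[0] != "id":
        raise ValueError(f'Expected the first header to be "id", got {headers[0]!r}.')

    # Every requested attribute must be a column in the header row.
    missing = [name for name in attributes if name not in headers]
    if missing:
        raise ValueError(f"Attributes not found in the header: {missing}")

    node_ids = [int(row[0]) for row in nodes[1:]]
    if node_ids != list(range(len(node_ids))):
        raise ValueError("Node IDs must increase sequentially from 0.")

    return node_ids


# Toy node table with a header row and two nodes.
toy_nodes = [["id", "page_name"], ["0", "ESET"], ["1", "Victor Dominello MP"]]
print(validate_node_table(toy_nodes, ["page_name"]))
```

A function like this could be called at the top of `add_nodes`, keeping the validation separate from the code that modifies the graph.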

In [32]:
def add_edges(g, edges):
    '''
    Add edges to the graph, where nodes are already present.
 
    :param g: An igraph Graph() object.
    :param edges: A list of lists containing edges, with a header.
    '''
    
    assert len(edges[0]) == 2, \
        f'Each element in the imported edges csv should be of length 2, representing an edge'\
        f' between two linked nodes. Instead, the first element is of length {len(edges[0])}.'
 
    edges_to_add = [[int(row[0]), int(row[1])] for row in edges[1:]]
    g.add_edges(edges_to_add)
 
    return g


In [33]:
import igraph
def graph_from_attributes_and_edgelist(node_attr_csv, edgelist_csv, attributes):
    g = igraph.Graph(directed=False)

    nodes = read_csv(node_attr_csv)
    edges = read_csv(edgelist_csv)

    g = add_nodes(g, nodes, attributes)
    g = add_edges(g, edges)

    return g

In [35]:
node_attr_csv = './data/musae_facebook_target.csv'
edgelist_csv = './data/musae_facebook_edges.csv'
attributes = ['page_name', 'page_type']

g = graph_from_attributes_and_edgelist(node_attr_csv, edgelist_csv, attributes)

In [36]:
print(g.vs[0]['page_name'])
print(g.vs[0]['page_type'])
first_edge = g.es[0]
print(first_edge.source)
print(first_edge.target)
print(len(g.es))
print(g.vs[0]['page_name'])
print(g.vs[18427]['page_name'])

The Voice of China 中国好声音
tvshow
0
18427
171002
The Voice of China 中国好声音
The Voice Global


Now that we are satisfied that our graph data model is set up as intended, we can begin to use it to answer network-based questions. Having the tools to examine the graph in detail will allow us to turn a critical eye to the dataset and the way we have represented it in our graph schema.