<center><img src="https://github.com/DACSS-CSSmeths/guidelines/blob/main/pics/small_logo_ccs_meths.jpg?raw=true" width="700"></center>







# Graphs

Let me show you a graph (from [Wikipedia](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics))):

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/440px-6n-graf.svg.png"/>

As you can see, it is simply a representation of two sets:

1. A set of **vertices** or **nodes**. In the image above you see the nodes _1_, _2_, _3_, _4_, _5_, and _6_.
2. A set of **edges** or **links**. In the image above, the links are connecting pairs of nodes. 

Altogether, a _graph_ reveals some _relationship_ among the _nodes_. The graph structure will allow us to explore and understand that relationship. 
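Before using any library, the two sets can be written down in plain Python. This is only a sketch to make the idea concrete: the names `nodes`, `edges` and `neighbors` are mine, and the pairs are read off the image above.

```python
# The two sets behind the picture above (ids read off the image)
nodes = {1, 2, 3, 4, 5, 6}
edges = {(1, 2), (1, 5), (2, 5), (2, 3), (3, 4), (4, 5), (4, 6)}

# every edge must connect two members of the node set
assert all(u in nodes and v in nodes for u, v in edges)

# the relationship can be explored from the sets alone:
# collect the neighbors of each node
neighbors = {n: set() for n in nodes}
for u, v in edges:
    neighbors[u].add(v)  # links are mutual here,
    neighbors[v].add(u)  # so record both directions

print(neighbors[4])
```

A graph library does this bookkeeping for you, and much more.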

## Creating Graphs

The graph above can be represented computationally in Python using **networkx**:

In [None]:
import networkx as nx

# create graph
G = nx.Graph()

# the list of edges, edges as tuples
listOfEdges=[(1, 2), (1, 5),(2,5),(2,3),(3,4),(4,5),(4,6)]

# create nodes and edges
G.add_edges_from(listOfEdges)

The last code chunk created your first graph!

## Basic Elements

**G** is the graph object:

In [None]:
#you don't see much...just what it is:
G

In [None]:
# You see nodes and their attributes (nothing yet)
G.nodes.data()

In [None]:
# You just see node ids
G.nodes()

In [None]:
# You see edges
G.edges()

In [None]:
# You see edges and their attributes (nothing yet)
G.edges.data()
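Besides listing nodes and edges, you may want to count them. The sketch below rebuilds the same edges under a hypothetical name (`demo_G`) so it runs on its own; `number_of_nodes()`, `number_of_edges()` and `degree()` are standard networkx methods.

```python
import networkx as nx

# same edges as G above, stored under a hypothetical name
demo_G = nx.Graph()
demo_G.add_edges_from([(1, 2), (1, 5), (2, 5), (2, 3), (3, 4), (4, 5), (4, 6)])

n_nodes = demo_G.number_of_nodes()  # how many nodes
n_edges = demo_G.number_of_edges()  # how many edges
degrees = dict(demo_G.degree())     # number of links per node, as a dict

print(n_nodes, n_edges)
print(degrees)
```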

## Drawing

As you can see, the graph is created by adding pairs of nodes. Once you complete that stage, you can draw the graph:

In [None]:
# draw
nx.draw(G=G,
        with_labels=True,
        node_color='yellow',
        edgecolors='black')



## Directed Graphs

The graph we created and drew is an **undirected** graph, that is, the relationships between a pair of nodes are **symmetric**: the links cannot represent direction because the relationship is _inherently mutual_ between the nodes. For example, the relationship *to be a neighbor of* is symmetric.

The following graph is **directed** (also from Wikipedia):

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/Directed_graph_no_background.svg/340px-Directed_graph_no_background.svg.png"/>


You can create this directed graph this way:

In [None]:
# create DIRECTED graph
dG = nx.DiGraph()

# create nodes and edges
dG.add_edges_from([(1, 2), (1, 3),(3,2),(3,4),(4,3)])

# drawing
nx.draw(dG,with_labels=True,node_color='white',edgecolors='black')

Directed links are also called **arcs**. Notice that the _DiGraph_ we created represents an **asymmetric** relationship: the relationship a node has with another node does not need to be mutual, but it can be (see nodes _3_ and _4_). If the arcs represent **cares for someone**, the graph shows that the feeling is not reciprocated in most cases. If a relationship can never be mutual, so it can only go in one direction, it is called **antisymmetric** ("to be a parent of" is of this kind).
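You can check reciprocity with code instead of by eye. This is a minimal sketch, assuming the same arcs as above stored under a hypothetical name (`demo_dG`): an arc is mutual when the reverse arc also exists.

```python
import networkx as nx

# same arcs as the directed graph above, under a hypothetical name
demo_dG = nx.DiGraph()
demo_dG.add_edges_from([(1, 2), (1, 3), (3, 2), (3, 4), (4, 3)])

# an arc (u, v) is reciprocated when the arc (v, u) also exists
mutual = [(u, v) for u, v in demo_dG.edges() if demo_dG.has_edge(v, u)]
one_way = [(u, v) for u, v in demo_dG.edges() if not demo_dG.has_edge(v, u)]

print(mutual)
print(one_way)
```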

## Attributes

Nodes can have attributes. This is how you add attributes manually:

In [None]:
# adding attributes, just female/male for simplicity
dG.nodes[1]["sex"]='male'
dG.nodes[2]["sex"]='female'
dG.nodes[3]["sex"]='female'
dG.nodes[4]["sex"]='male'

In [None]:
# seeing attributes
nx.get_node_attributes(dG, "sex")

Notice the above structure is a Python dictionary, {'key':'value'}.
Python dictionaries are important for network data management. Here, for example, the node is the key, and the attribute is the value. Knowing how to **build** a dictionary is needed to add attributes via coding, instead of manually as we did above.
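As a first taste of building such a dictionary, here is a small sketch that pairs a list of node ids with a list of values using `zip`. The names `demo_nodes_G`, `node_ids`, `sexes` and `sex_by_node` are hypothetical; `nx.set_node_attributes` is the standard networkx function.

```python
import networkx as nx

# a small directed graph under a hypothetical name
demo_nodes_G = nx.DiGraph([(1, 2), (1, 3), (3, 2), (3, 4), (4, 3)])

node_ids = [1, 2, 3, 4]
sexes = ['male', 'female', 'female', 'male']

# zip pairs each node id with its value; dict turns the pairs into {node: value}
sex_by_node = dict(zip(node_ids, sexes))

# the dictionary adds the attribute to all nodes in one call
nx.set_node_attributes(demo_nodes_G, sex_by_node, "sex")
print(demo_nodes_G.nodes.data())
```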

Above, we had few nodes, so we added an attribute node by node. Let me set the color of the node based on _sex_, this time NOT manually:

* using **items()**: This will help you create a dictionary (dict):

In [None]:
# requesting attribute as "items"
nx.get_node_attributes(dG, "sex").items()

* create a dictionary using **comprehensions**:

In [None]:
# assigning color conditionally on sex
# write 'red' as value for  'node', if the 'sex' of 'node' is 'female', else, write 'blue'
# do this for every pair 'node','sex' in 'nx.get_node_attributes(dG, "sex").items()'
{node:'red' if sex=='female' else 'blue' for node,sex in nx.get_node_attributes(dG, "sex").items()}

And that is how we built a dict.

* Now, use the dict to add the attribute:

In [None]:
# dict saved as 'colorNodes'
colorNodes={node:'red' if sex=='female' else 'blue' for node,sex in nx.get_node_attributes(dG, "sex").items()}

# use 'colorNodes' to create attribute "color"
nx.set_node_attributes(dG, colorNodes, "color")

# the attribute is now in the node data:
dG.nodes.data()

You can also recover the attributes like this:

In [None]:
# only the values (the colors), in node order
nx.get_node_attributes(dG, "color").values()

This last code can serve when drawing:

In [None]:
# using node attributes

nx.draw(dG,
        with_labels=True,
        node_color=nx.get_node_attributes(dG, "color").values())

Of course, edges can have attributes too:

In [None]:
dG.edges[(1, 2)]['weight']=1
dG.edges[(1, 3)]['weight']=3
dG.edges[(3, 2)]['weight']=5
dG.edges[(3, 4)]['weight']=10
dG.edges[(4, 3)]['weight']=0.5

In [None]:
# see them
dG.edges.data()
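If the weight is known when you create the edges, networkx also offers `add_weighted_edges_from`, which takes triples `(source, target, weight)`. A small sketch under a hypothetical name (`demo_w`), using the same weights as above:

```python
import networkx as nx

demo_w = nx.DiGraph()

# each triple is (source, target, weight)
demo_w.add_weighted_edges_from([(1, 2, 1), (1, 3, 3), (3, 2, 5),
                                (3, 4, 10), (4, 3, 0.5)])

# the values are stored in the edge attribute "weight"
weights = nx.get_edge_attributes(demo_w, "weight")
print(weights)
```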

Let me add another attribute conditional on another attribute:

In [None]:
# first, preview the dict: 'magenta' if the weight is at most 1, else 'yellow'
{edge:'magenta' if weight<=1 else 'yellow' for edge,weight in nx.get_edge_attributes(dG, "weight").items()}

In [None]:
# then, save the dict and use it to create the edge attribute "color"
colorEdges={edge:'magenta' if weight<=1 else 'yellow' for edge,weight in nx.get_edge_attributes(dG, "weight").items()}
nx.set_edge_attributes(dG,values=colorEdges,name='color')

In [None]:
# see edges and attributes
dG.edges.data()

Let's use edge attributes:

In [None]:
# use the edge attributes
# add labels to edges
            
pos = nx.spring_layout(dG) # position of the nodes

nx.draw(dG,
        pos, # using "position"
        with_labels=True,
        node_color=nx.get_node_attributes(dG, "color").values())

# adding labels
final_dG=nx.draw_networkx_edge_labels(dG,pos,edge_labels=nx.get_edge_attributes(dG,'weight'))

We can also use:

* The color of edges:

In [None]:
nx.get_edge_attributes(dG,'color').values()

* the width of edges:

In [None]:
nx.get_edge_attributes(dG,'weight').values()

See that here:

In [None]:
pos = nx.circular_layout(dG) 

# draw nodes first
nx.draw_networkx_nodes(dG,pos,
                       node_color=nx.get_node_attributes(dG,'color').values())
# draw edges
nx.draw_networkx_edges(dG, pos,width=list(nx.get_edge_attributes(dG,'weight').values()), # values as list
                       edge_color= nx.get_edge_attributes(dG,'color').values())
# draw node labels
nx.draw_networkx_labels(dG, pos)

final_dG=nx.draw_networkx_edge_labels(dG,pos,label_pos=0.25,
                               edge_labels=nx.get_edge_attributes(dG,'weight'))

## The bipartite network

This is a different way to see a relationship. Think about _being an actor_ in a film:

* These would be the ones acting:

In [None]:
actor=['Leonardo DiCaprio', 'Tom Hanks', 'Tom Hanks', 'Leonardo DiCaprio', 'Al Pacino', 'Matt Damon',
 'Christian Bale', 'Robert De Niro', 'Al Pacino', 'Dustin Hoffman', 'Dustin Hoffman', 'Jack Nicholson',
 'Christian Bale', 'Jack Nicholson', 'Matt Damon', 'Leonardo DiCaprio', 'Tom Hardy', 'Robert De Niro',
 'Robin Williams', 'Tom Hardy', 'Robin Williams', 'Robin Williams', 'Christian Bale', 'Leonardo DiCaprio',
 'Morgan Freeman', 'Morgan Freeman','Robert De Niro', 'Al Pacino','Robert De Niro', 'Al Pacino']

* These would be the movies:

In [None]:
movie=['The Departed', 'Saving Private Ryan', 'Catch Me If You Can', 'The Revenant',
 'The Godfather Part II', 'Good Will Hunting', 'The Dark Knight Rises', 'Awakenings', 'Insomnia',
 'Empire of the Sun', 'Catch Me If You Can', 'The Bucket List', 'Batman Begins', 'The Departed', 'Saving Private Ryan',
 'Catch Me If You Can', 'The Revenant', 'The Godfather Part II', 'Good Will Hunting',
 'The Dark Knight Rises', 'Awakenings', 'Insomnia', 'Empire of the Sun', 'Catch Me If You Can',
 'The Bucket List', 'Batman Begins','The Irishman','The Irishman','Heat','Heat']

We can make pairs like this:

In [None]:
actor_movie=[('Leonardo DiCaprio', 'The Departed'),
 ('Tom Hanks', 'Saving Private Ryan'),
 ('Tom Hanks', 'Catch Me If You Can'),
 ('Leonardo DiCaprio', 'The Revenant'),
 ('Al Pacino', 'The Godfather Part II'),
 ('Matt Damon', 'Good Will Hunting'),
 ('Christian Bale', 'The Dark Knight Rises'),
 ('Robert De Niro', 'Awakenings'),
 ('Al Pacino', 'Insomnia'),
 ('Dustin Hoffman', 'Empire of the Sun'),
 ('Dustin Hoffman', 'Catch Me If You Can'),
 ('Jack Nicholson', 'The Bucket List'),
 ('Christian Bale', 'Batman Begins'),
 ('Jack Nicholson', 'The Departed'),
 ('Matt Damon', 'Saving Private Ryan'),
 ('Leonardo DiCaprio', 'Catch Me If You Can'),
 ('Tom Hardy', 'The Revenant'),
 ('Robert De Niro', 'The Godfather Part II'),
 ('Robin Williams', 'Good Will Hunting'),
 ('Tom Hardy', 'The Dark Knight Rises'),
 ('Robin Williams', 'Awakenings'),
 ('Robin Williams', 'Insomnia'),
 ('Christian Bale', 'Empire of the Sun'),
 ('Leonardo DiCaprio', 'Catch Me If You Can'),
 ('Morgan Freeman', 'The Bucket List'),
 ('Morgan Freeman', 'Batman Begins'),
('Robert De Niro', 'Heat'),
 ('Al Pacino', 'Heat'),
('Robert De Niro', 'The Irishman'),
 ('Al Pacino', 'The Irishman')]

# here
actor_movie
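Typing the pairs is not necessary when the two lists are already aligned position by position: `zip` builds the pairs for you. A sketch with a short, made-up excerpt of the lists (the names `actor_demo`, `movie_demo` and `pairs_demo` are hypothetical):

```python
# two aligned lists: position i of one list matches position i of the other
actor_demo = ['Leonardo DiCaprio', 'Tom Hanks', 'Tom Hanks']
movie_demo = ['The Departed', 'Saving Private Ryan', 'Catch Me If You Can']

# zip walks both lists in parallel and yields (actor, movie) tuples
pairs_demo = list(zip(actor_demo, movie_demo))
print(pairs_demo)
```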

The previous list of pairs looks like a list of edges. So, let's make and draw a network:

In [None]:
# create DIRECTED graph
dG_actmovie = nx.DiGraph()

# create nodes and edges
dG_actmovie.add_edges_from(actor_movie)

# drawing
nx.draw_circular(dG_actmovie,with_labels=True,node_color='white',edgecolors='black')

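This creates a graph, but it is not very useful: it treats actors and movies as if they were the same kind of node.

When you have a structure 'childNode'->'parentNode', where the nodes belong to two different sets and every link connects a node from one set to a node from the other, you have a **bipartite graph**. Let me show you how to create one: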

In [None]:
from networkx.algorithms import bipartite 

# this is not new
bp_actmovie = nx.Graph()

# this is new:
bp_actmovie.add_nodes_from(actor, bipartite=0) # Add the node attribute "bipartite"
bp_actmovie.add_nodes_from(movie, bipartite=1)

# this is not new
bp_actmovie.add_edges_from(actor_movie)

We have a graph. It does not look any different:

In [None]:
bp_actmovie

But the **bipartite** from **networkx.algorithms** will prove useful:

* Differentiate the node roles in the bipartite graph:

In [None]:
childNode,parentNode = bipartite.sets(bp_actmovie)

In [None]:
parentNode

In [None]:
childNode
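A word of caution: as far as I know, `bipartite.sets` needs a connected graph to decide the two sets by itself, and the networkx documentation describes a `top_nodes` argument for the other cases. The sketch below uses a tiny made-up graph in two separate pieces (`demo_bp` and its node names are hypothetical) and recovers the sets from the "bipartite" node attribute we stored when adding the nodes.

```python
import networkx as nx
from networkx.algorithms import bipartite

# a tiny bipartite graph made of two separate pieces (hypothetical data)
demo_bp = nx.Graph()
demo_bp.add_nodes_from(["Actor A", "Actor B"], bipartite=0)
demo_bp.add_nodes_from(["Movie 1", "Movie 2"], bipartite=1)
demo_bp.add_edges_from([("Actor A", "Movie 1"), ("Actor B", "Movie 2")])

# confirm no edge connects two nodes of the same set
is_bp = bipartite.is_bipartite(demo_bp)

# use the stored attribute to say which nodes form the first set
top = {n for n, d in demo_bp.nodes(data=True) if d["bipartite"] == 0}
actors_set, movies_set = bipartite.sets(demo_bp, top_nodes=top)

print(is_bp, actors_set, movies_set)
```

Relying on the attribute also makes it explicit which set is which, instead of depending on the order in which the nodes were added.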

* Draw a bipartite graph

In [None]:
# remember
nx.get_node_attributes(bp_actmovie,'bipartite')

Assign color to node:

In [None]:
# dict saved as 'colorNodes'
colorNodes={node:'yellow' if bp==0 else 'pink' for node,bp in nx.get_node_attributes(bp_actmovie, "bipartite").items()}
nx.set_node_attributes(bp_actmovie, colorNodes, "color")

Draw the graph:

In [None]:
# this is new
pos = nx.bipartite_layout(bp_actmovie, childNode,align='horizontal',aspect_ratio=1)

# not new
nx.draw(bp_actmovie, pos=pos, with_labels=True, node_color = nx.get_node_attributes(bp_actmovie,'color').values())

The plot above is difficult to read; we can make some changes:

In [None]:
nx.draw(bp_actmovie, pos,node_color = nx.get_node_attributes(bp_actmovie,'color').values())

# this is new
text =nx.draw_networkx_labels(bp_actmovie, pos=pos, font_size=7)
for _, t in text.items():
    t.set_rotation(45) 

Most important of all: project the bipartite graph into a regular graph, where two actors are linked if they share at least one movie:

In [None]:
actors_proyected=bipartite.weighted_projected_graph(bp_actmovie, childNode)
actors_proyected.edges.data()

Now, we can see relationships between actors, based on the movies they appeared in together:

In [None]:
nx.draw(actors_proyected,with_labels=True)
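To see where the weights of the projection come from, here is a sketch with made-up nodes (`demo_proj`, the actors "A", "B", "C" and the movies "M1", "M2" are all hypothetical). With the default settings of `weighted_projected_graph`, the weight of a link counts the neighbors the two nodes share, here the movies two actors have in common.

```python
import networkx as nx
from networkx.algorithms import bipartite

demo_proj = nx.Graph()
actors_demo = ["A", "B", "C"]
demo_proj.add_nodes_from(actors_demo, bipartite=0)
demo_proj.add_nodes_from(["M1", "M2"], bipartite=1)

# A and B act together in M1 and M2; C only acts in M2
demo_proj.add_edges_from([("A", "M1"), ("B", "M1"),
                          ("A", "M2"), ("B", "M2"), ("C", "M2")])

# keep only the actors; link two actors if they share a movie
projected = bipartite.weighted_projected_graph(demo_proj, actors_demo)

print(projected.edges.data())
```

Here "A" and "B" share two movies, while "C" shares one movie with each of them.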

Here, you can see the edge width based on the weight attribute:

In [None]:
pos = nx.circular_layout(actors_proyected) 
# draw nodes first
nx.draw_networkx_nodes(actors_proyected,pos)
# draw edges
nx.draw_networkx_edges(actors_proyected,pos, 
                       width=list(nx.get_edge_attributes(actors_proyected,'weight').values()))
# draw node labels
nx.draw_networkx_labels(actors_proyected, pos)

nx.draw_networkx_edge_labels(actors_proyected,pos,label_pos=0.25,
                               edge_labels=nx.get_edge_attributes(actors_proyected,'weight'));

# Reading from files

We will rarely type the data in from the keyboard as we did above; most of the time we will read it from an external file.

I have the same data we were using above in spreadsheets, including the attributes for the data that will create the directed graph:

In [None]:
from IPython.display import IFrame
IFrame("https://docs.google.com/spreadsheets/d/e/2PACX-1vQvLe4eaHdN5QbzXTodOVynN5oW5st_d7_fmaWHmrlUcvopi2kR2P0j0Q96C8r0W6JcdOPXOzVfIoSD/pubhtml",550,350)

What do we have here:

* **G** was undirected; now it will be created from **edgelist_u** OR **adjacency_u**.
* **dG** was directed; now it will be created from **edgelist_d** OR **adjacency_d**.
* **bp_actmovie** was bipartite; now it will be created from **bipartite**.

Notice also that we added attributes to dG, for the nodes and the edges. The edge attributes are another column in the edgelist, while in the adjacency matrix the weights are written in the cells of the matrix itself. The attributes for the nodes ('sex') are in another table, **attributes_d**. As you see, adjacency matrices are not good for storing edge attributes besides the weight.

All those tables are in an Excel file, which we can open as data frames.

* The data to replicate **G**:

In [None]:
# reading in
import pandas as pd
LinkToData="https://github.com/DACSS-CSSmeths/Networks_intro/raw/refs/heads/main/graphdata/graphFormats.xlsx"
edgelist_u = pd.read_excel(LinkToData,
                           sheet_name='edgelist_u') # name of the sheet

# see the data frame
edgelist_u

Networkx can create a network directly if your data frame has these column names (_source_ and _target_):

In [None]:
graph_edgelist_u=nx.from_pandas_edgelist(edgelist_u)
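If your columns have other names, `from_pandas_edgelist` lets you say which ones to use through its `source` and `target` arguments. A minimal sketch with a made-up data frame (`links`, `from_node` and `to_node` are hypothetical names):

```python
import pandas as pd
import networkx as nx

# an edge list whose columns are NOT named 'source' and 'target'
links = pd.DataFrame({"from_node": [1, 1, 2],
                      "to_node": [2, 5, 5]})

# tell networkx which column plays each role
demo_from_df = nx.from_pandas_edgelist(links,
                                       source="from_node",
                                       target="to_node")
print(demo_from_df.edges())
```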

We may open the adjacency matrix the same way:

In [None]:
adjacency_u = pd.read_excel(LinkToData,
                            index_col=0, # VERY IMPORTANT!!!!!!!!!!!!!!!!!
                            sheet_name='adjacency_u') 
# see the data frame
adjacency_u

And here, we turn it into a graph:

In [None]:
graph_adjacency_u = nx.from_pandas_adjacency(adjacency_u)
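Why is `index_col=0` so important? The first column of the sheet holds the node names, not data, and networkx expects the row names and the column names of the matrix to match. The sketch below shows the idea with a small made-up CSV text instead of the Excel file (an assumption made only to keep the example self-contained); in this CSV stand-in the column names arrive as text, so I convert them to integers, a step that may not be needed with your Excel sheet.

```python
import io
import pandas as pd
import networkx as nx

# a 3x3 adjacency matrix as text; the first column holds the node names
csv_text = ",1,2,3\n1,0,1,0\n2,1,0,1\n3,0,1,0\n"

# without index_col, the node names become an extra data column
no_index = pd.read_csv(io.StringIO(csv_text))

# with index_col=0, the node names become the row labels
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# make the column labels the same type as the row labels (integers)
with_index.columns = with_index.columns.astype(int)

demo_adj_G = nx.from_pandas_adjacency(with_index)
print(no_index.shape, with_index.shape)
print(demo_adj_G.edges())
```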

* The data to replicate **dG**:

In [None]:
edgelist_d = pd.read_excel(LinkToData,sheet_name='edgelist_d') 
edgelist_d

In [None]:
adjacency_d = pd.read_excel(LinkToData,sheet_name='adjacency_d',index_col=0) 
adjacency_d

Now, turning those data frames into graphs:

In [None]:
graph_edgelist_d=nx.from_pandas_edgelist(edgelist_d,edge_attr=True,
                                         create_using=nx.DiGraph) # here!!
# see edges
graph_edgelist_d.edges.data()

In [None]:
graph_adjacency_d = nx.from_pandas_adjacency(adjacency_d,
                                             create_using=nx.DiGraph) # here!!
graph_adjacency_d.edges.data()

Here, you see what I meant about the advantage of edgelists to represent attributes over adjacency matrices.
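The same point can be reproduced without the file. In this sketch (the data frame `edges_df` and the graph names are hypothetical), an edge list with two attribute columns keeps both attributes, while the adjacency matrix built from it only has room for the weight.

```python
import pandas as pd
import networkx as nx

# an edge list with TWO edge attributes: weight and color
edges_df = pd.DataFrame({
    "source": [1, 1, 3, 3, 4],
    "target": [2, 3, 2, 4, 3],
    "weight": [1, 3, 5, 10, 0.5],
    "color": ["magenta", "yellow", "yellow", "yellow", "magenta"],
})

# from the edge list: every extra column becomes an edge attribute
from_edges = nx.from_pandas_edgelist(edges_df, edge_attr=True,
                                     create_using=nx.DiGraph)

# as adjacency matrix: each cell can hold one number only (the weight)
adj_df = nx.to_pandas_adjacency(from_edges, weight="weight")
from_adj = nx.from_pandas_adjacency(adj_df, create_using=nx.DiGraph)

print(from_edges.edges[(3, 4)])  # weight and color
print(from_adj.edges[(3, 4)])    # weight only
```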

Above, I showed you that the edge attributes were included. 

How do we add the attributes to the nodes?

Let's pick one graph, _graph_adjacency_d_, to add the node attributes:

In [None]:
graph_adjacency_d.nodes.data()

In [None]:
# read the table with attributes
attr_d=pd.read_excel(LinkToData,sheet_name='attributes_d') 
attr_d

We can build a dictionary of attributes as usual:

In [None]:
{n:s for n,s in zip(attr_d.node,attr_d.sex)}

So this is the way!

In [None]:
attrDic_sex={n:s for n,s in zip(attr_d.node,attr_d.sex)}
attrDic_col={n:s for n,s in zip(attr_d.node,attr_d.color)}
nx.set_node_attributes(graph_adjacency_d, attrDic_sex, "sex")
nx.set_node_attributes(graph_adjacency_d, attrDic_col, "color")

Ready:

In [None]:
graph_adjacency_d.nodes.data()
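When the table has several attribute columns, you may prefer to add them all in one step. This is a sketch on made-up data (`attr_demo` and `demo_attr_G` are hypothetical; I assume a table with the columns _node_, _sex_ and _color_): pandas turns the table into a dictionary of dictionaries, which `set_node_attributes` accepts when no attribute name is given.

```python
import pandas as pd
import networkx as nx

# a made-up attribute table with one row per node
attr_demo = pd.DataFrame({"node": [1, 2, 3, 4],
                          "sex": ["male", "female", "female", "male"],
                          "color": ["blue", "red", "red", "blue"]})

demo_attr_G = nx.DiGraph([(1, 2), (1, 3), (3, 2), (3, 4), (4, 3)])

# {node: {'sex': ..., 'color': ...}} for every row of the table
attrs_by_node = attr_demo.set_index("node").to_dict("index")

# no attribute name needed: each inner dict carries its own names
nx.set_node_attributes(demo_attr_G, attrs_by_node)
print(demo_attr_G.nodes.data())
```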

Up to here:

* We created undirected graph **G** by typing the nodes and edges. We created the same graph in two ways by reading data from a link, using edgelist and an adjacency matrix; we named them **graph_edgelist_u** and **graph_adjacency_u**, respectively.
* We created directed graph **dG** by typing the nodes and edges. We created the same graph in two ways by reading data from a link, using an edgelist and an adjacency matrix; we named them **graph_edgelist_d** and **graph_adjacency_d**, respectively.

* The data to replicate **bp_actmovie**: This may need some work.

In general, the data may come like this:

In [None]:
graph_bp = pd.read_excel(LinkToData,
                         sheet_name='bipartite') 

graph_bp

To follow the same steps as before, notice 'graph_bp.cast' is a column (series):

In [None]:
graph_bp.cast

In [None]:
# a cell
graph_bp.cast[0]

In [None]:
# the previous as list

graph_bp.cast[0].split(', ')

Here, the column with each cell as a list:

In [None]:
graph_bp.cast.str.split(', ')

Now, notice the magic of **explode()**:

In [None]:
graph_bp.cast.str.split(', ').explode()

The best part is that the indexes are kept when 'exploding' the series of lists. The pandas **concat()** function uses those indexes to put each actor next to the right movie:

In [None]:
data_forBP=pd.concat([graph_bp.movie,
                      graph_bp.cast.str.split(', ').explode()],axis=1)
data_forBP
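The same reshaping can also be done on the data frame itself with `DataFrame.explode`, which repeats the other columns for you. A sketch on made-up rows (`films` and `long_form` are hypothetical names; I assume the columns are called _movie_ and _cast_, as above):

```python
import pandas as pd

# one row per movie, with the cast as a single comma-separated text
films = pd.DataFrame({
    "movie": ["Heat", "Insomnia"],
    "cast": ["Robert De Niro, Al Pacino", "Al Pacino, Robin Williams"],
})

# split the text into a list, then create one row per element of the list
long_form = films.assign(cast=films["cast"].str.split(", ")).explode("cast")

# each (actor, movie) row is now ready to become an edge
pairs_from_df = list(zip(long_form["cast"], long_form["movie"]))
print(pairs_from_df)
```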

Now, we have the two lists of nodes:

In [None]:
actors_file=data_forBP.cast.to_list()
movies_file=data_forBP.movie.to_list()

And the edges:

In [None]:
actor_movie_file=[(a,m) for a,m in zip(actors_file,movies_file)]

In [None]:
bp_actmovie_file = nx.Graph()
bp_actmovie_file.add_nodes_from(actors_file, bipartite=0) 
bp_actmovie_file.add_nodes_from(movies_file, bipartite=1)

bp_actmovie_file.add_edges_from(actor_movie_file)
childNode_file,parentNode_file = bipartite.sets(bp_actmovie_file)

Let's see the plot:

In [None]:
colorNodes={node:'yellow' if bp==0 else 'pink' for node,bp in nx.get_node_attributes(bp_actmovie_file, "bipartite").items()}
nx.set_node_attributes(bp_actmovie_file, colorNodes, "color")
pos = nx.bipartite_layout(bp_actmovie_file, childNode_file,align='horizontal',aspect_ratio=1)
nx.draw(bp_actmovie_file, pos,node_color = nx.get_node_attributes(bp_actmovie_file,'color').values())
text =nx.draw_networkx_labels(bp_actmovie_file, pos=pos, font_size=7)
for _, t in text.items():
    t.set_rotation(45) 

The most important: getting a network of the actors ('children' set): 

In [None]:
actors_proyected_file=bipartite.weighted_projected_graph(bp_actmovie_file, childNode_file)
nx.draw(actors_proyected_file,with_labels=True)

Remember we have weights in the edges:

In [None]:
actors_proyected_file.edges.data()

In [None]:
pos = nx.circular_layout(actors_proyected_file) 
# draw nodes first
nx.draw_networkx_nodes(actors_proyected_file,pos)
# draw edges
nx.draw_networkx_edges(actors_proyected_file,pos, 
                       width=list(nx.get_edge_attributes(actors_proyected_file,'weight').values()))
# draw node labels
nx.draw_networkx_labels(actors_proyected_file, pos)

nx.draw_networkx_edge_labels(actors_proyected_file,pos,label_pos=0.25,
                               edge_labels=nx.get_edge_attributes(actors_proyected_file,'weight'));

## Exporting

You should always export a graph once it is created, so that it can be opened in a different tool:

In [None]:
nx.write_gml(G, "css_G.gml")
nx.write_gml(dG, "css_dG.gml")
nx.write_gml(actors_proyected, "actors_dG.gml")
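It is a good habit to read the file back and check that nothing was lost. The sketch below does a round trip on a tiny made-up graph written to a temporary folder (`demo_out`, `demo_back` and the file name are hypothetical); `nx.read_gml` is the standard networkx reader for this format.

```python
import os
import tempfile
import networkx as nx

# a tiny graph with one weighted link (made-up weight)
demo_out = nx.Graph()
demo_out.add_edge("Al Pacino", "Robert De Niro", weight=3)

# write to a temporary folder, then read the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_actors.gml")
    nx.write_gml(demo_out, path)
    demo_back = nx.read_gml(path)

# nodes, edges and edge attributes should survive the round trip
same_weight = demo_back["Al Pacino"]["Robert De Niro"]["weight"]
print(demo_back.edges.data())
```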