<div align="right">Python 3.6 Jupyter Notebook</div>

# Network analysis using NetworkX

<div class="alert alert-warning">
<b>This notebook contains advanced exercises that will help you deepen your understanding of Network Analysis.</b> You will be able to achieve 100% for this notebook by successfully completing Exercises 1, 2, 3, 4, 5, 6, and 7.</div>


### Your completion of the notebook exercises will be graded based on your ability to do the following:
 
> **Understand**: Do your pseudo-code and comments show evidence that you recall and understand technical concepts?

> **Apply**: Are you able to execute code (using the supplied examples) that performs the required functionality on supplied or generated data sets? 

> **Analyze**: Are you able to pick the relevant method or library to resolve specific stated questions?

> **Evaluate**: Are you able to interpret the results and justify your interpretation based on the observed data?


#### Notebook objectives
By the end of this notebook, you will be expected to:
> 
  - Prepare a data set for graph analysis, using NetworkX;
  - Evaluate and compare structural properties of a graph object;
  - Interpret what information the structural properties provide in the physical world; and
  - Develop a basic understanding of small-world networks.
  
####  List of exercises:
> **Exercise 1**: Compute the number of call interactions between a pair of nodes.

>**Exercise 2**: Evaluate structure qualitatively in a graph based on visualization.

>**Exercise 3**: Create a graph object using the SMS data set.

>**Exercise 4 [Advanced]**: Compare the centrality structural properties evaluated on a graph.

>**Exercise 5**: Describe the effect on the average clustering coefficient when nodes of lower degree are removed.

>**Exercise 6**: List the two criteria of a small-world network.

>**Exercise 7**: Identify small-world networks, given the values for the characteristic path length and clustering coefficient.

# Notebook introduction 

<img src="./img/social_network_analysis.png" width=350 height=350>

The use of phone logs to infer relationships between volume of communication and other parameters has been an area of major research interest. In his seminal paper,  which was the first application of phone logs, George Kingsley Zipf (1949) investigated the influence of distance on communication. Many studies have since followed. Big data is characterized by significant increases in structured and unstructured data generated by mobile phones that are sampled and captured at high velocities. Its emergence, and the availability of computer processing technologies that are able to store and process these data sets efficiently, has made it possible to expand these studies in order to improve our understanding of human behavior with unprecedented resolution. Mobile phone data allows the inference of real social networks using call detail records, or CDRs (i.e., phone calls, short message service (SMS) and multimedia message (MMS) communications). These records are combined with GPS and WiFi datasets, browsing habits, application logs, and tower data to reveal a superposition of several social actors.

According to Blondel et al. (2015):

> *The mobile nature of a mobile phone brings two advantages: first, the temporal patterns of communications [are] reflected in great detail due to the fact that the owner of the device usually carries the device with them and therefore the possibility of receiving the call exists in almost all cases, and second, the positioning data of a mobile phone allows tracking the displacements of its owner*.

Unlike self-reported surveys – which are often subjective, limited to a very small subset of the population, and have been the only avenue used to gather data in the past – mobile phone CDRs contain information on verifiable communications between millions of people at a time. Further enrichment from geolocation data, which invariably is also collected alongside CDRs, as well as other external data that is available for the target segment (typically demographics), makes mobile phone CDRs an extremely rich and informative source of data for scientists and analysts. 

These interactions via mobile phones can be represented by a large network where nodes represent individuals, and links are drawn between individuals that have had a phone call, or exchanged messages or other media. 

The study of the structure of such networks provides useful insights into their organization, and can assist in improving communication infrastructure, understanding human behavior, traffic planning, and marketing initiatives, among others. According to Gautier Krings (2012), these applications are informed by the extraction and analysis of different kinds of information from large networks, including the following:

1. **Associating every node with geographical coordinates.** This makes it possible to study how geography influences the creation of links. More specifically, the intensity of communication between nodes decreases as a power of the geographical distance that separates them.

2. **Studying how links in networks change over time (i.e., dynamical networks).** In these networks, nodes enter or leave the network and the strength of their connections rises and wanes during the observation period. Of particular interest is the influence of time scales on the emergence of different structural properties of dynamical networks.

3. **Detecting communities in networks.** Communities are groups of nodes that are densely connected to each other. 



**Load libraries and set global parameters for Matplotlib**

In [None]:
# Load the relevant libraries to your notebook. 
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

import numpy as np       
from networkx.drawing.nx_agraph import graphviz_layout
import graphviz
import pygraphviz
import random
from IPython.display import Image, display
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Set global parameters for plotting. 
%matplotlib inline
plt.rcParams['figure.figsize'] = (10, 8)

In [None]:
def pydot(G):
    pdot = nx.drawing.nx_pydot.to_pydot(G)
    display(Image(pdot.create_png()))

# 1. Graph structures using NetworkX
In this notebook, you will continue working with the empirical dataset from the "Friends and Family" study used in Module 2. 

### 1.1 Data preparation

As before, the first step is preparing the data for analysis. In the following, you will load the data into a DataFrame object, filter and retain the records of interest, and select the fields or data columns to use when creating graph objects.

#### 1.1.1 Load the data into a DataFrame
In this data, each record or row is typical of what is available in a CDR, i.e., the actors involved, the starting time of the interaction, the duration of the interaction, who initiated it, and who was the recipient, among other details not included here (such as the geolocation of the sender and receiver). 

In [None]:
# Read the CallLog.csv file, print the number of records loaded as well as the first 5 rows.
calls = pd.read_csv('CallLog.csv')
print('Loaded {0} rows of call log.'.format(len(calls)))
calls.head()

#### 1.1.2 Row filtering

In the data set, calls to outsiders appear as entries where a participant's ID is "`NaN`". These are not relevant to the current exercise and need to be removed before you proceed. Remove all calls where one of the participant IDs is missing. First, check the number of records in your DataFrame using the Pandas [DataFrame shape attribute](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.shape.html).

In [None]:
# Initial number of records.
calls.shape[0]

Next, review the data using the ``info()`` method.

In [None]:
calls.info()

Next, you will clean the data by removing interactions involving outsiders as discussed above. Removing missing values is very common in data analysis, and Pandas has a convenient method, appropriately named [dropna()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), designed to automate this cleaning process.
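
As a minimal sketch of how the `subset` argument behaves, the toy example below uses a hypothetical three-column DataFrame (`toy_calls` and its values are made up for illustration; only the ID column names match this notebook). Rows are dropped only when one of the listed columns is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy call log; the ID column names mirror those used in this notebook.
toy_calls = pd.DataFrame({
    'participantID.A': ['p1', 'p2', np.nan, 'p3'],
    'participantID.B': ['p2', np.nan, 'p1', 'p1'],
    'duration': [10.0, 25.0, 5.0, np.nan],
})

# Only the two ID columns are checked, so the last row survives
# even though its duration is missing.
toy_clean = toy_calls.dropna(subset=['participantID.A', 'participantID.B'])
print(len(toy_calls), '->', len(toy_clean))
```

Without `subset`, `dropna()` would also discard the last row because of its missing duration.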

In [None]:
# Drop rows with NaN in either of the participant ID columns.
calls = calls.dropna(subset = ['participantID.A', 'participantID.B'])
print('{} rows remaining after dropping missing values from selected columns.'.format(len(calls)))
calls.head(n=5)

#### 1.1.3 Column selection

For the purpose of this study, you should only focus on the social actors involved in the call interaction. Therefore, you can remove all columns not relevant to the network being analyzed.

In [None]:
# Create a new object containing only the columns of interest.
interactions = calls[['participantID.A', 'participantID.B']]

Finally, exclude rows where the actors are the same.

In [None]:
# Get a list of rows with different participants.
row_with_different_participants = interactions['participantID.A'] != interactions['participantID.B']

# Update "interactions" to contain only the rows identified. 
interactions = interactions.loc[row_with_different_participants,:]
interactions.head()

### 1.2 Creating graph objects with NetworkX
The call interactions captured above are directed, meaning that edges (u,v) and (v,u) are different.
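
The difference between the two graph types can be seen on a toy example. The sketch below uses two hypothetical calls between the same pair of nodes, one in each direction (the node labels are made up for illustration).

```python
import networkx as nx

# Two hypothetical calls between the same pair, one in each direction.
toy_edges = [('u', 'v'), ('v', 'u')]

directed = nx.DiGraph(toy_edges)
undirected = nx.Graph(toy_edges)

print(directed.number_of_edges())    # (u, v) and (v, u) are kept as separate edges
print(undirected.number_of_edges())  # both calls collapse onto a single edge
```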

First, let's try to capture the number of interactions between social actors, irrespective of who initiated the call. This will be done using an undirected graph. You will need to capture the number of interactions between any pair of actors with a link in the graph. Therefore, the graph object that needs to be created is a weighted undirected graph.

NetworkX can build graphs directly from a Pandas DataFrame object. The following demonstration illustrates how to build an unweighted and undirected graph in this way.

In [None]:
# Create an unweighted undirected graph using NetworkX's from_pandas_edgelist method.
# The column participantID.A is used as the source and participantID.B as the target.
G = nx.from_pandas_edgelist(interactions, 
                             source='participantID.A', 
                             target='participantID.B', 
                             create_using=nx.Graph())

Review basic information on your graph.

In [None]:
# Print the number of nodes in our network.
print('The undirected graph object G has {0} nodes.'.format(G.number_of_nodes()))

# Print the number of edges in our network.
print('The undirected graph object G has {0} edges.'.format(G.number_of_edges()))

In the following cells, the neighbors for five of the nodes are saved in a Python dict, with the node label as key, and then printed.
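
One possible alternative to a manual counter is `itertools.islice`, sketched below on a hypothetical toy graph (`toy_graph`, `toy_max_nodes`, and `toy_ndict` are illustrative names, not part of the notebook's data).

```python
from itertools import islice

import networkx as nx

# Hypothetical toy graph standing in for the call graph.
toy_graph = nx.path_graph(10)
toy_max_nodes = 5

# islice stops after exactly toy_max_nodes nodes, so no manual counter is needed.
toy_ndict = {node: tuple(toy_graph.neighbors(node))
             for node in islice(toy_graph.nodes(), toy_max_nodes)}
print(toy_ndict)
```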

In [None]:
# Declare a variable for number of nodes to get neighbors of.
max_nodes = 5

In [None]:
# Variable initialization.
count = 0
ndict = {}

# Loop through G and get each node's neighbors, storing them in ndict. Do this for a maximum of 'max_nodes' nodes.
for node in list(G.nodes()):
    ndict[node] = tuple(G.neighbors(node))
    count = count + 1
    if count >= max_nodes:
        break

In [None]:
print(ndict)

In [None]:
# Print only the first item in the dict.
print([list(ndict)[0], ndict[list(ndict)[0]]])

Your original objective is to create a **weighted undirected** graph for call interactions, with the weights representing the number of interactions between two distinct participants. As illustrated above, you can use the "from_pandas_edgelist" method to build an undirected graph between the pairs of actors by passing the graph type to the "create_using" argument. To get the correct weights in the undirected graph, however, you will need to add the weight information separately, because the method gives you no control over how the attributes of the edges (u,v) and (v,u) are combined when they collapse onto a single undirected edge. Below is a description of how to add the necessary weights to the undirected graph.

The first task is to compute the number of interactions between participants. You will use the Pandas "groupby" DataFrame method to achieve this.
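
The counting step can be sketched on a hypothetical toy DataFrame (`toy_interactions` and its values are made up for illustration). This sketch uses `reset_index(name='counts')`, which is one equivalent way of naming the count column.

```python
import pandas as pd

# Hypothetical toy interactions; each row is one call from A to B.
toy_interactions = pd.DataFrame({
    'participantID.A': ['p1', 'p1', 'p2', 'p1'],
    'participantID.B': ['p2', 'p2', 'p1', 'p3'],
})

# size() counts the rows in each (A, B) group; reset_index(name=...) turns the
# resulting Series back into a DataFrame with a named count column.
toy_counts = (toy_interactions
              .groupby(['participantID.A', 'participantID.B'])
              .size()
              .reset_index(name='counts'))
print(toy_counts)
```

Note that (p1, p2) and (p2, p1) are counted as separate groups, because the grouping is direction-sensitive.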

In [None]:
# Get the count of interactions between participants and display the top 5 rows.
grp_interactions = pd.DataFrame(interactions.groupby(['participantID.A', 'participantID.B']).size(), 
                                columns=['counts']).reset_index()

grp_interactions.head(5)

In [None]:
nx.from_pandas_edgelist?

In [None]:
# Create a directed graph with an edge_attribute labeled counts.
g = nx.from_pandas_edgelist(grp_interactions, 
                             source='participantID.A', 
                             target='participantID.B', 
                             edge_attr='counts', 
                             create_using=nx.DiGraph())

Instantiate a weighted undirected graph, and populate edge information using the edges list from the directed graph.

In [None]:
# Set all the weights to 0 at this stage. We will add the correct weight information in the next step.
G = nx.Graph()
G.add_edges_from(g.edges(), counts=0)

Now, iterate through each link from the directed graph, adding the attribute weight (counts) to the corresponding link in the undirected graph.
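
The accumulation step can be checked by hand on a small case. The sketch below applies the same zero-then-accumulate approach to a hypothetical directed graph with three edges (all names and counts are made up for illustration).

```python
import networkx as nx

# Hypothetical directed counts: 3 calls from a to b, 2 from b to a, 4 from a to c.
toy_directed = nx.DiGraph()
toy_directed.add_weighted_edges_from(
    [('a', 'b', 3), ('b', 'a', 2), ('a', 'c', 4)], weight='counts')

# Start every undirected edge at zero, then accumulate both directions onto it.
toy_undirected = nx.Graph()
toy_undirected.add_edges_from(toy_directed.edges(), counts=0)
for src, dst, data in toy_directed.edges(data=True):
    toy_undirected[src][dst]['counts'] += data['counts']

print(toy_undirected['a']['b']['counts'])
print(toy_undirected['a']['c']['counts'])
```

The a-b edge ends up with the sum of both directions, while the a-c edge keeps its single directed count.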

In [None]:
for u, v, d in g.edges(data=True):
    G[u][v]['counts'] += d['counts']

Look at some of the edges and their corresponding weights.

In [None]:
# Print a sample of the edges, with corresponding attribute data.
max_number_of_edges = 5
count = 0
for n1,n2,attr in G.edges(data=True): # unpacking
    print(n1,n2,attr)
    count = count + 1
    if count >= max_number_of_edges:
        break     

You can verify whether the steps you executed above have worked using the following:

In [None]:
# Verify our attribute data is correct using a selected (u,v) pair from the data.
u = 'fa10-01-77'
v = 'fa10-01-78'
print('Number of undirected call interactions between {0} and {1} is {2}.'.format(u,
                                                                    v,
                                                            G.get_edge_data(v,u)['counts']))

In [None]:
# Compare our data set to the interactions data set.
is_uv_pair = ((interactions['participantID.A'] == u) & (interactions['participantID.B'] == v)) 
is_vu_pair = ((interactions['participantID.A'] == v) & (interactions['participantID.B'] == u))
print('Number of undirected call interactions between {0} and {1} is {2}'.format(u,
                                            v, 
                                            interactions[is_uv_pair | is_vu_pair].shape[0]))

Based on the comparison above, it can be said with confidence that your graph object captures the interactions as expected.

<br>
<div class="alert alert-info">
<b>Exercise 1 Start.</b>
</div>

### Instructions
> Calculate the number of call interactions between participant sp10-01-52 and participant fa10-01-81 captured in your graph, using any of the above approaches.


In [None]:
# Number of call interactions between participant sp10-01-52 and participant fa10-01-81.
u = 'sp10-01-52'
v = 'fa10-01-81'
print('Number of undirected call interactions between {0} and {1} is {2}.'.format(u,
                                                                  v,
                                                            G.get_edge_data(v,u)['counts']))

In [None]:
# Compare our data set to the interactions data set.
is_uv_pair = ((interactions['participantID.A'] == u) & (interactions['participantID.B'] == v)) 
is_vu_pair = ((interactions['participantID.A'] == v) & (interactions['participantID.B'] == u))
print('Number of undirected call interactions between {0} and {1} is {2}'.format(u,
                                            v, 
                                            interactions[is_uv_pair | is_vu_pair].shape[0]))

<br>
<div class="alert alert-info">
<b>Exercise 1 End.</b>
</div>

> **Exercise complete**:
    
> This is a good time to "Save and Checkpoint".

### 1.3 Graph visualization
The next step is to visualize the graph object – a topic that you briefly touched on in Notebook 1. NetworkX is not primarily a graph drawing package, but provides basic drawing capabilities using Matplotlib. More advanced graph visualization packages can be used. However, these are outside of the scope of this course.

[NetworkX documentation](https://networkx.github.io/documentation/stable/reference/introduction.html#drawing) (2015) states:
> Proper graph visualization is hard, and we highly recommend that people visualize their graphs with tools dedicated to that task. Notable examples of dedicated and fully-featured graph visualization tools are Cytoscape, Gephi, Graphviz and, for LaTeX typesetting, PGF/TikZ.

A graph is an abstract mathematical object without a specific representation in the Cartesian coordinate space, and graph visualization is therefore not a well-defined problem with a unique solution. Depending on which structures in the graph object are of interest, several layout algorithms exist that can be used to optimize node positioning for display. Whenever you want to visualize a graph, you first have to find a mapping from vertices to Cartesian coordinates, preferably in a way that is aesthetically pleasing. A separate branch of graph theory, namely graph drawing, attempts to solve this problem via several graph layout algorithms.
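
In NetworkX, such a mapping is a plain dictionary from node to coordinates. The sketch below prints the positions that `nx.spring_layout` assigns to a hypothetical 4-node cycle; it assumes a NetworkX version in which `spring_layout` accepts a `seed` argument.

```python
import networkx as nx

# Hypothetical toy graph: a 4-cycle.
toy_cycle = nx.cycle_graph(4)

# A layout is just a dict mapping each node to an (x, y) coordinate pair.
# The seed fixes the random initial positions so the layout is reproducible.
toy_pos = nx.spring_layout(toy_cycle, seed=42)
for node, (x, y) in toy_pos.items():
    print(node, round(float(x), 3), round(float(y), 3))
```

Any dictionary of this shape can be passed as the `pos` argument of `nx.draw_networkx`.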

You will use the interface provided by [Graphviz](http://graphviz.org/) for node positioning in most of your visualization in this course, because considering other possibilities may distract from the core objectives. Two of the node positioning algorithms that can be accessed using the Graphviz interface provided by NetworkX are the following:
-  **dot**: "hierarchical" or layered drawings of directed graphs. This is the default to use if edges have directionality. The dot algorithm produces a ranked layout of a graph honoring edge directions. It is particularly appropriate for displaying hierarchies or directed acyclic graphs.
- **neato**: "spring model" layouts.  This is the default to use if the graph is not too large (about 100 nodes), and you don't know anything else about it. Neato attempts to minimize a global energy function, which is equivalent to statistical multidimensional scaling. An ideal spring is placed between every pair of nodes, such that its length is set to the shortest path distance between the endpoints. The springs push the nodes so their geometric distance in the layout approximates their path distance in the graph.

Below is a visual display of your weighted undirected call graph, using different visualization approaches.

#### 1.3.1 Graphviz layout using the "dot" engine

In [None]:
pos = graphviz_layout(G, prog='dot') # you can also try using the "neato" engine
nx.draw_networkx(G, pos=pos, with_labels=False)
_ = plt.axis('off')

#### 1.3.1a Graphviz layout using the "neato" engine (my own attempt)

In [None]:
pos = graphviz_layout(G, prog='neato') # using the "neato" engine
nx.draw_networkx(G, pos=pos, node_color='blue', with_labels=False)
_ = plt.axis('off')

#### 1.3.2 Graph visualization using NetworkX's in-built "spring layout"

In [None]:
layout = nx.spring_layout(G)
nx.draw_networkx(G, pos=layout, node_color='green', with_labels=False)
_ = plt.axis('off')

#### 1.3.3 Graph visualization with Pydot rendering

In [None]:
pydot(G)

<br>
<div class="alert alert-info">
<b>Exercise 2 Start.</b>
</div>

### Instructions 

> Based on the various visualizations explored above, what can you tell about these networks and the types of interactions they capture? Please provide written feedback (a sentence or two) based on your insights of the call log data in the markdown cell below.

> **Hint**:
- In your answer, indicate if there appears to be some structure in the graph, or if the connections between nodes appear random (i.e., do some nodes have more links than others)? Do the participants cluster into identifiable communities or not?

<br>
<div class="alert alert-info">
<b>Exercise 2 End.</b>
</div>

> **Exercise complete**:
    
> This is a good time to "Save and Checkpoint".

<br>
<div class="alert alert-info">
<b>Exercise 3 Start.</b>
</div>

### Instructions

You will now need to reproduce the steps above for SMS records.

> 1. Load the file "SMSLog.csv" from the data folder in your home directory, into a variable "sms".
> 2. Create a weighted undirected graph using the number of interactions between participants as weights.
> 3. Assign the graph to variable "H" (do not overwrite "G" as you will still use it below).
> 4. Ignore all interactions where one of the parties is missing or unknown (i.e. "`NaN`").
> 5. Disregard any self-interactions.
> 6. Display a visualization of the obtained graph network, using the spring_layout algorithm for node positioning.


>**Hints**:
>
> - Make sure that you use different variables when loading the datasets, and remember that you can always insert additional cells in the notebook, should you prefer to break up steps or perform additional investigations.

> - It is good practice to make clear comments (start the line with #) in your code when sharing your work or if you need to review it at a later stage. Make sure that you add comments to enable your tutor to understand your thinking process.

> - The number of cells below are only indicative. You can insert additional cells as required.

In [None]:
# Read the SMSLog.csv file, print the number of records loaded as well as the first 5 rows.
SMS = pd.read_csv('../data/SMSLog.csv')
print('Loaded {0} rows of SMS log.'.format(len(SMS)))
SMS.head()

In [None]:
# Initial number of records.
SMS.shape[0]

In [None]:
# Data review using the info() method
SMS.info()

In [None]:
# Drop rows with NaN in either of the participant ID columns.
SMS = SMS.dropna(subset = ['participantID.A', 'participantID.B'])
print('{} rows remaining after dropping missing values from selected columns.'.format(len(SMS)))
SMS.head(n=5)

In [None]:
# Create a new object "reciprocal_actions" containing only the columns of interest.
reciprocal_actions = SMS[['participantID.A', 'participantID.B']]

In [None]:
# Get a list of rows with different participants.
row_with_different_participants = reciprocal_actions['participantID.A'] != reciprocal_actions['participantID.B']

# Update "reciprocal_actions" to contain only the rows identified. 
reciprocal_actions = reciprocal_actions.loc[row_with_different_participants,:]
reciprocal_actions.head()

In [None]:
# Create an unweighted undirected graph using NetworkX's from_pandas_edgelist method.
# The column participantID.A is used as the source and participantID.B as the target.
H = nx.from_pandas_edgelist(reciprocal_actions, 
                             source='participantID.A', 
                             target='participantID.B', 
                             create_using=nx.Graph())

In [None]:
# Print the number of nodes in our network.
print('The undirected graph object H has {0} nodes.'.format(H.number_of_nodes()))

# Print the number of edges in our network.
print('The undirected graph object H has {0} edges.'.format(H.number_of_edges()))

In [None]:
# Declare a variable for number of nodes to get neighbors of.
max_nodes = 5

In [None]:
# Variable initialization.
count = 0
ndict = {}

# Loop through H and get each node's neighbors, storing them in ndict. Do this for a maximum of 'max_nodes' nodes.
for node in list(H.nodes()):
    ndict[node] = tuple(H.neighbors(node))
    count = count + 1
    if count >= max_nodes:
        break

In [None]:
print(ndict)

In [None]:
# Print only the first item in the dict.
print([list(ndict)[0], ndict[list(ndict)[0]]])

In [None]:
# Get the count of SMS reciprocal_actions between participants and display the top 5 rows.
grp_reciprocal_actions = pd.DataFrame(reciprocal_actions.groupby(['participantID.A', 'participantID.B']).size(), 
                                columns=['counts']).reset_index()

grp_reciprocal_actions.head(5)

In [None]:
nx.from_pandas_edgelist?

In [None]:
# Create a directed graph with an edge_attribute labeled counts.
h = nx.from_pandas_edgelist(grp_reciprocal_actions, 
                             source='participantID.A', 
                             target='participantID.B', 
                             edge_attr='counts', 
                             create_using=nx.DiGraph())

In [None]:
# Set all the weights to 0 at this stage. We will add the correct weight information in the next step.
H = nx.Graph()
H.add_edges_from(h.edges(), counts=0)

In [None]:
for u, v, d in h.edges(data=True):
    H[u][v]['counts'] += d['counts']

In [None]:
# Print a sample of the edges, with corresponding attribute data.
max_number_of_edges = 5
count = 0
for n1,n2,attr in H.edges(data=True): # unpacking
    print(n1,n2,attr)
    count = count + 1
    if count >= max_number_of_edges:
        break 

In [None]:
# Verify our attribute data is correct using a selected (u,v) pair from the data.
u = 'fa10-01-67'
v = 'fa10-01-68'
print('Number of undirected SMS reciprocal_actions between {0} and {1} is {2}.'.format(u,
                                                                    v,
                                                            H.get_edge_data(v,u)['counts']))

In [None]:
# Compare our data set to the reciprocal_actions data set.
is_uv_pair = ((reciprocal_actions['participantID.A'] == u) & (reciprocal_actions['participantID.B'] == v)) 
is_vu_pair = ((reciprocal_actions['participantID.A'] == v) & (reciprocal_actions['participantID.B'] == u))
print('Number of undirected SMS reciprocal_actions between {0} and {1} is {2}'.format(u,
                                            v, 
                                            reciprocal_actions[is_uv_pair | is_vu_pair].shape[0]))

In [None]:
# Visualization of the obtained graph 'H' network, using the spring_layout algorithm for node positioning.
layout = nx.spring_layout(H)
nx.draw_networkx(H, pos=layout, node_color='red', with_labels=False)
_ = plt.axis('off')

<br>
<div class="alert alert-info">
<b>Exercise 3 End.</b>
</div>

> **Exercise complete**:
    
> This is a good time to "Save and Checkpoint".

## 2. Computing and visualizing structural properties of networks
Physical networks exhibit different behaviors. Since graph objects are abstractions of these behaviors, you might expect these graphs to be different. To characterize these differences, you need more than visualizations that are pleasing to the eye. To this end, a number of characteristics have been developed to characterize the structural properties of graphs. These properties help you understand and characterize physical networks with more mathematical rigor. You will now explore characteristics discussed in the video content.

### 2.1 Degree distribution
The degree of a node in a network is the number of connections it has to other nodes, and the degree distribution (also referred to as the neighbor distribution) is the probability distribution of these degrees over the whole network. Specifically, the degree distribution $p(k)$ is the probability that a randomly-chosen node has $k$ connections (or neighbors).
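
As a minimal sketch of this definition, the example below computes $p(k)$ for a hypothetical star graph with one hub and four leaves (the graph and variable names are illustrative only).

```python
from collections import Counter

import networkx as nx

# Hypothetical toy graph: a star with one hub (node 0) and four leaves.
toy_star = nx.star_graph(4)

# Count how many nodes have each degree, then normalize by the number of nodes.
degree_counts = Counter(deg for _, deg in toy_star.degree())
n_nodes = toy_star.number_of_nodes()
p = {k: count / n_nodes for k, count in sorted(degree_counts.items())}
print(p)
```

Four of the five nodes have degree 1 and one node has degree 4, so the probabilities sum to 1.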

#### 2.1.1 Degree distribution histogram
A degree distribution histogram is a plot of the frequency of occurrence of the number of connections or neighbors, based on the relationships (edges) between entities (nodes), as represented by a graph object.

Continuing with the call data (graph G), from the preceding sections, you will now compute and plot the degree distribution of the data.

In [None]:
# Extract the degree values for all the nodes of G
degrees = []
for (nd,val) in G.degree():
    degrees.append(val)

In [None]:
# Plot the degree distribution histogram.
out = plt.hist(degrees, bins=50)
plt.title("Degree Histogram")
plt.ylabel("Frequency Count")
plt.xlabel("Degree")

#### 2.1.2 Logarithmic plot of the degree distribution

In many cases, the degree histogram is best represented using a log-log plot. A log-log plot is a type of graph that uses a logarithmic scale on both the x-axis and y-axis. Log-log plots are best used when you are interested in the multiplicative factors (ratios) between data points rather than the additive differences, when the data spans several orders of magnitude, or when it follows a power-law relationship (which appears as a straight line on log-log axes).
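
To see why log-log axes are convenient, consider a power law $y = c\,x^{-\alpha}$. Taking logarithms gives $\log y = \log c - \alpha \log x$, which is a straight line with slope $-\alpha$. The sketch below checks this numerically on hypothetical, noise-free data with $c = 100$ and $\alpha = 2$ (values chosen purely for illustration).

```python
import numpy as np

# Hypothetical power-law data: y = c * x**(-alpha) with c = 100 and alpha = 2.
x = np.arange(1, 51, dtype=float)
y = 100.0 * x ** -2.0

# On log-log axes a power law becomes log(y) = log(c) - alpha * log(x),
# so a straight-line fit to the logged data recovers the exponent as the slope.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(round(slope, 6), round(np.exp(intercept), 6))
```

Real degree data is noisy and finite, so a fitted slope on empirical counts should be read as a rough indication only.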

In [None]:
# Logarithmic plot of the degree distribution.
values = sorted(set(degrees))
hist = [list(degrees).count(x) for x in values]
out = plt.loglog(values, hist, marker='o')
plt.title("Degree Histogram")
plt.ylabel("Log(Frequency Count)")
plt.xlabel("Log(Degree)")

### 2.2 Node centrality
Centrality measures provide relative measures of importance in a network. There are many different centrality measures, and each measures a different type of importance. In the video content, you were introduced to the following centrality measures:

1. **Degree centrality:** Number of connections. An important node is involved in a large number of interactions. For directed graphs, the in-degree and out-degree concepts are used. The in-degree of a node v is the number of edges with v as the terminal vertex, and the out-degree of v is the number of edges with v as the initial vertex.

2. **Closeness centrality:** Reciprocal of the average length of the shortest paths between a specific node and all other nodes in the graph. An important node is typically close to, and can communicate quickly with, the other nodes in the network.

3. **Betweenness centrality:** Measures the extent to which a particular vertex lies on the shortest paths between other vertices. An important node will lie on a high proportion of the shortest paths between other nodes in the network.

4. **Eigenvector centrality:** An important node is connected to important neighbors.  

The following schematic is a demonstration and comparison of the first three of the centrality metrics discussed  above. In this figure, Node X always has the highest centrality measure, although it measures different behaviors in each case.

<img src="./img/social_network_analysis_centrality.png" width=1050 height=450>

NetworkX provides the functionality to evaluate these metrics for graph objects, which will be described in the next section.
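
As a quick orientation, the sketch below evaluates all four measures on a hypothetical five-node graph (a triangle with a two-node tail; the graph and names are made up for illustration) and reports the top-ranked node under each. The `max_iter=1000` setting is a cautious choice for this sketch, not a requirement.

```python
import networkx as nx

# Hypothetical toy graph: a triangle (a, b, c) with a tail a - d - e.
toy = nx.Graph([('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'd'), ('d', 'e')])

measures = {
    'degree': nx.degree_centrality(toy),
    'closeness': nx.closeness_centrality(toy),
    'betweenness': nx.betweenness_centrality(toy),
    'eigenvector': nx.eigenvector_centrality(toy, max_iter=1000),
}

# Report the top-ranked node under each measure.
for name, scores in measures.items():
    top = max(scores, key=scores.get)
    print(name, top, round(scores[top], 3))
```

In this small graph the same node ranks first under every measure; in larger networks the rankings often differ, which is what makes comparing them informative.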

#### 2.2.1 Degree centrality

In [None]:
# Plot degree centrality.
call_degree_centrality = nx.degree_centrality(G)
colors =[call_degree_centrality[node] for node in G.nodes()]
pos = graphviz_layout(G, prog='dot')
nx.draw_networkx(G, pos, node_color=colors, node_size=300, with_labels=False)
_ = plt.axis('off')

The visual above uses different colors on nodes to highlight their degree centrality. Blue and purple nodes have a low value, and the yellow and green nodes indicate the nodes with the highest centrality values in the network. Although it is possible to add label information on the nodes, it can become too busy and, therefore, make it difficult to read the visual. In the following example, the data is arranged according to the degree centrality measure so that the node with the highest degree centrality measure appears at the top, followed by the node with the next highest degree centrality measure, and so forth (that is, in descending order). 