# Part 1

In [1]:
import networkx as nx

G1 = nx.read_gml('friendships.gml')

### 1. Find the degree centrality, closeness centrality, and normalized betweenness centrality (excluding endpoints) of node 100.

In [2]:
""" Author: Pedro Gonzalez-Guevara"""

degreeCentrality = nx.degree_centrality(G1)[100]
closenessCentrality = nx.closeness_centrality(G1)[100]
normalizedBetweennessCentrality = nx.betweenness_centrality(G1, normalized=True, endpoints=False)[100]

print(F"Degree Centrality of node 100 is                : {degreeCentrality}")
print(F"Closeness Centrality of node 100 is             : {closenessCentrality}")
print(F"Normalized Betweenness Centrality of node 100 is: {normalizedBetweennessCentrality}")

Degree Centrality of node 100 is                : 0.0026501766784452294
Closeness Centrality of node 100 is             : 0.2654784240150094
Normalized Betweenness Centrality of node 100 is: 7.142902633244772e-05
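
The first two of these values come from NetworkX, but their definitions are simple enough to check by hand. The sketch below is a minimal pure-Python version for a connected, undirected graph stored as a plain adjacency dict; the toy path graph `0-1-2-3` and the helper names are hypothetical and not part of the notebook. It follows the definitions NetworkX documents for this case: degree centrality is deg(v)/(n - 1), and closeness centrality is (n - 1) divided by the sum of shortest-path distances from v to every other node.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def degree_centrality(adj):
    # Fraction of the other n - 1 nodes that each node touches directly.
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}

def closeness_centrality(adj):
    # Assumes a connected graph: (n - 1) / sum of distances to all other nodes.
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}

# Hypothetical toy graph: the path 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(degree_centrality(adj))     # {0: 0.3333333333333333, 1: 0.6666666666666666, 2: 0.6666666666666666, 3: 0.3333333333333333}
print(closeness_centrality(adj))  # {0: 0.5, 1: 0.75, 2: 0.75, 3: 0.5}
```

Read this way, node 100's degree centrality of roughly 0.00265 means it is directly connected to only about a quarter of one percent of the other nodes.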


### 2. Suppose you are employed by an online shopping website and are tasked with selecting one user in network G1 to send an online shopping voucher to. We expect that the user who receives the voucher will send it to their friends in the network. You want the voucher to reach as many nodes as possible. The voucher can be forwarded to multiple users at the same time, but its travel distance is limited to one step: if the voucher travels more than one step in this network, it is no longer valid. Apply your knowledge of network centrality to select the best candidate for the voucher.

In [3]:
def findNodeWithHighestDegreeCentrality():
    """ Find the node with the highest degree centrality
    
    Parameters
    ----------
        None
    
    Returns
    -------
        highestDegreeCentrality: Tuple consisting of a node and its degree centrality.
        
    Author: Pedro Gonzalez-Guevara
    """
    degreeCentralityList = nx.degree_centrality(G1)
    highestDegreeCentrality = (0.0, 0.0)
    
    for (node, degreeCentrality) in degreeCentralityList.items():
        if degreeCentrality >= highestDegreeCentrality[1]:
            highestDegreeCentrality = (node, degreeCentrality)
    return highestDegreeCentrality

node, degreeCentrality = findNodeWithHighestDegreeCentrality()  # compute once, unpack both values
print(F"Node #{node} has the highest degree centrality at {degreeCentrality}")
print("and thus it can reach as many nodes as possible with only one step")


Node #105 has the highest degree centrality at 0.0636042402826855
and thus it can reach as many nodes as possible with only one step
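
Degree centrality is the right tool here because, with a one-step limit, the voucher reaches exactly the seed's direct friends: a seed's reach is its degree, and degree centrality only divides that count by the constant n - 1. Below is a minimal sketch on a hypothetical toy graph (`one_hop_reach` and `adj` are illustrative names), which also uses the built-in `max()` with a `key` function as a shorter alternative to the hand-written loop above. In this toy graph `a` and `c` tie with three friends each; `max()` returns the first maximal node it sees, whereas the `>=` loop in the notebook keeps the last one.

```python
def one_hop_reach(adj, seed):
    # Nodes that end up holding a valid voucher: the seed's direct friends.
    return set(adj[seed])

# Hypothetical toy friendship graph stored as an adjacency dict.
adj = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "e"],
    "d": ["a"],
    "e": ["c"],
}

best = max(adj, key=lambda v: len(one_hop_reach(adj, v)))
print(best, sorted(one_hop_reach(adj, best)))  # a ['b', 'c', 'd']
```

On the real graph the same idiom would be `dc = nx.degree_centrality(G1)` followed by `max(dc, key=dc.get)`.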


### 3. Now the limit on the voucher’s travel distance has been removed. Because the network is connected, every node in the network will eventually receive the voucher regardless of who you pick. However, we now want to ensure that the voucher reaches the nodes with the lowest average number of hops.

### How would you change your selection strategy? Write code to determine the best candidate in the network under this condition.

In [4]:
def findNodeWithHighestClosenessCentrality():
    """ Find the node with the highest closeness centrality
    
    Parameters
    ----------
        None
    
    Returns
    -------
        highestClosenessCentrality: Tuple consisting of a node and its closeness centrality.
        
    Author: Pedro Gonzalez-Guevara
    """
    closenessCentralityList = nx.closeness_centrality(G1)
    highestClosenessCentrality = (0.0, 0.0)
    
    for (node, closenessCentrality) in closenessCentralityList.items():
        if closenessCentrality >= highestClosenessCentrality[1]:
            highestClosenessCentrality = (node, closenessCentrality)
    return highestClosenessCentrality

node, closenessCentrality = findNodeWithHighestClosenessCentrality()  # compute once, unpack both values
print(F"Node #{node} has the highest closeness centrality at {closenessCentrality}")
print("Therefore, it can distribute the voucher with the lowest average number of hops")

Node #23 has the highest closeness centrality at 0.3847722637661455
Therefore, it can distribute the voucher with the lowest average number of hops
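
Closeness is the right measure once the hop limit is gone. If every holder forwards the voucher to all friends each round, the round in which a node first receives it equals its shortest-path distance from the seed, so the average delivery round is the average distance, which for a connected graph is exactly 1 / closeness. The hypothetical simulation below (the toy graph and the helper names `delivery_rounds` and `average_hops` are illustrative only) picks the seed that minimizes that average directly, assuming a connected graph with more than one node.

```python
def delivery_rounds(adj, seed):
    """Round in which each node first receives the voucher when every holder
    forwards it to all friends each round (the seed holds it at round 0)."""
    received = {seed: 0}
    frontier = [seed]
    rnd = 0
    while frontier:
        rnd += 1
        next_frontier = []
        for v in frontier:
            for w in adj[v]:
                if w not in received:
                    received[w] = rnd
                    next_frontier.append(w)
        frontier = next_frontier
    return received

def average_hops(adj, seed):
    rounds = delivery_rounds(adj, seed)
    return sum(rounds.values()) / (len(rounds) - 1)  # exclude the seed itself

# Hypothetical connected toy graph: the path 0-1-2-3-4 plus an extra edge 2-5.
adj = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
best = min(adj, key=lambda v: average_hops(adj, v))
print(best, average_hops(adj, best))  # 2 1.4
```

Here node 2 reaches the other five nodes in 7 hops in total, an average of 1.4, which corresponds to a closeness of 5/7 ≈ 0.714.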


### 4. Assume the restriction on the voucher’s travel distance is still removed, but now a competitor has developed a strategy to remove a person from the network in order to disrupt the distribution of your company’s voucher. Your competitor is specifically targeting people who often act as bridges of information flow between other pairs of people. Identify the single riskiest person to be removed under your competitor’s strategy.

In [5]:
def findNodeWithHighestBetweennessCentrality():
    """ Find the node with the highest normalized betweenness centrality
    
    Parameters
    ----------
        None
    
    Returns
    -------
        highestBetweennessCentrality: Tuple consisting of a node and its betweenness centrality.
        
    Author: Pedro Gonzalez-Guevara
    """
    betweennessCentralityList = nx.betweenness_centrality(G1, normalized=True, endpoints=False)
    highestBetweennessCentrality = (0.0, 0.0)
    
    for (node, betweennessCentrality) in betweennessCentralityList.items():
        if betweennessCentrality >= highestBetweennessCentrality[1]:
            highestBetweennessCentrality = (node, betweennessCentrality)
    return highestBetweennessCentrality

node, betweennessCentrality = findNodeWithHighestBetweennessCentrality()  # compute once, unpack both values
print(F"Node #{node} has the highest betweenness centrality at {betweennessCentrality}")
print("Therefore, it is the riskiest node to be removed, since many shortest paths pass through this node")

Node #333 has the highest betweenness centrality at 0.03939111019601939
Therefore, it is the riskiest node to be removed, since many shortest paths pass through this node
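
Betweenness counts how many shortest paths between other pairs of nodes pass through a node, which is exactly the "bridge" role the competitor targets. Below is a compact pure-Python sketch of Brandes' algorithm for an unweighted, undirected graph stored as a hypothetical adjacency dict, with endpoints excluded and scores divided by (n - 1)(n - 2). To our understanding this mirrors the scaling NetworkX applies to undirected graphs with `normalized=True, endpoints=False`, but it is an illustration, not NetworkX's code.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph (adjacency dict).
    Endpoints are excluded; scores are divided by (n - 1)(n - 2)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair was counted once from each end; this scale absorbs that.
    scale = 1 / ((n - 1) * (n - 2))
    return {v: c * scale for v, c in bc.items()}

# Hypothetical "bowtie": triangles {0, 1, 2} and {2, 3, 4} share node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
bc = betweenness_centrality(adj)
print(max(bc, key=bc.get), bc[2])  # 2 0.6666666666666666
```

In the bowtie, every shortest path between the two triangles must pass through node 2, so it covers 4 of the 6 pairs of other nodes (4/6 ≈ 0.667), while every other node scores 0.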


# Part 2

## Load Network from GML.

Author: Miguel Agueda-Cabral

In [6]:
G2 = nx.read_gml('blogs.gml')

## 5. Apply the Scaled Page Rank Algorithm to this network. Find the Page Rank of node `realclearpolitics.com` with damping value 0.85.

Author: Miguel Agueda-Cabral

In [10]:
pr_scores = nx.pagerank(G2, alpha=0.85)  # Compute PageRank on G2 with damping factor 0.85.
target = "realclearpolitics.com"  # Set target node name.
target_score = pr_scores[target]  # Retrieve score of target.
print(F"The PageRank score for {target} is {target_score}")

The PageRank score for realclearpolitics.com is 0.004636694781649094
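
In `nx.pagerank` the damping factor is the `alpha` parameter, whose default is 0.85. For intuition, here is a minimal power-iteration sketch of scaled PageRank in pure Python: with probability `alpha` a random surfer follows one of the current page's out-links chosen uniformly, otherwise it jumps to a uniformly random page, and pages with no out-links (dangling nodes) spread their rank over all pages. This is an illustrative reimplementation on a hypothetical four-page graph, not the NetworkX source, and it assumes every link target also appears as a key.

```python
def pagerank(out_links, alpha=0.85, tol=1e-10, max_iter=200):
    """Scaled PageRank by power iteration.

    out_links maps every node to the list of nodes it links to. Every link
    target must also be a key. Dangling nodes (no out-links) spread their
    rank uniformly over all nodes."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Random jump (1 - alpha) and dangling mass are shared equally by all nodes.
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        # Each node passes alpha times its rank, split evenly over its out-links.
        for v in nodes:
            for w in out_links[v]:
                new[w] += alpha * rank[v] / len(out_links[v])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < n * tol
        rank = new
        if converged:
            break
    return rank

# Hypothetical toy web: "d" links out, but nothing links to "d".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
pr = pagerank(links, alpha=0.85)
print(sorted(pr, key=pr.get, reverse=True))  # ['c', 'a', 'b', 'd']
```

Page `d` receives only the random-jump share (1 - 0.85)/4 = 0.0375, because nothing links to it.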


## 6. Apply the Scaled Page Rank Algorithm to this network with damping value 0.85. Find the 5 nodes with highest Page Rank.

Author: Miguel Agueda-Cabral

In [16]:
ordered_pr = sorted(pr_scores, key=lambda x: pr_scores[x], reverse=True)
top_five = ordered_pr[:5]

print("The top 5 nodes ranked by Page Rank are:")
for i, name in enumerate(top_five):
    print(F"\t{i+1}. {name}: {pr_scores[name]}")

The top 5 nodes ranked by Page Rank are:
	1. dailykos.com: 0.017901443885198386
	2. atrios.blogspot.com: 0.015178631721614684
	3. instapundit.com: 0.012627090660729751
	4. blogsforbush.com: 0.012508582138399098
	5. talkingpointsmemo.com: 0.012393033204751033
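
Sorting every node is fine at this scale, but only the top five are needed. `heapq.nlargest(k, iterable, key=...)` returns the same result as `sorted(..., key=..., reverse=True)[:k]` without sorting the whole collection. The scores below are hypothetical stand-ins for `pr_scores`.

```python
import heapq

# Hypothetical scores standing in for the pr_scores dict computed above.
scores = {"a": 0.10, "b": 0.40, "c": 0.25, "d": 0.05, "e": 0.20}

# Iterating a dict yields its keys; scores.get ranks each key by its value.
top_three = heapq.nlargest(3, scores, key=scores.get)
print(top_three)  # ['b', 'c', 'e']
```

In the notebook this would read `heapq.nlargest(5, pr_scores, key=pr_scores.get)`, and the same call works for `hubs` and `auths` in the HITS cells.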


## 7. Apply the HITS Algorithm to the network to find the hub and authority scores of node `realclearpolitics.com`.

In [17]:
hubs, auths = nx.hits(G2)

print(F"{target} has hub score: {hubs[target]}\t authority score: {auths[target]}")

realclearpolitics.com has hub score: 0.0003243556140916672	 authority score: 0.003918957645699851
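
HITS assigns each page two scores: a good authority is linked to by good hubs, and a good hub links to good authorities. The sketch below alternates those two updates by power iteration and rescales each score vector to sum to 1, which appears consistent with the small values `nx.hits` reports (its `normalized` option divides by the sum). It is a hypothetical pure-Python illustration on a toy graph of three hub pages and three authority pages, not the NetworkX implementation, and it assumes the graph has at least one link. The results are named `toy_hubs` and `toy_auths` so they do not overwrite the notebook's `hubs` and `auths`.

```python
def hits(out_links, max_iter=100, tol=1e-10):
    """Hub and authority scores by power iteration, each rescaled to sum to 1.

    out_links maps every node to the list of nodes it links to; every link
    target must also be a key, and there must be at least one link."""
    nodes = list(out_links)
    hubs = {v: 1.0 / len(nodes) for v in nodes}
    auths = dict.fromkeys(nodes, 0.0)
    for _ in range(max_iter):
        # Authority update: sum of the hub scores of pages linking in.
        auths = dict.fromkeys(nodes, 0.0)
        for v in nodes:
            for w in out_links[v]:
                auths[w] += hubs[v]
        # Hub update: sum of the authority scores of pages linked to.
        new_hubs = {v: sum(auths[w] for w in out_links[v]) for v in nodes}
        # Rescale both vectors so each sums to 1.
        total_a = sum(auths.values())
        total_h = sum(new_hubs.values())
        auths = {v: a / total_a for v, a in auths.items()}
        new_hubs = {v: h / total_h for v, h in new_hubs.items()}
        converged = sum(abs(new_hubs[v] - hubs[v]) for v in nodes) < tol
        hubs = new_hubs
        if converged:
            break
    return hubs, auths

# Hypothetical toy graph: pages h1..h3 link out, pages a1..a3 are only linked to.
links = {
    "h1": ["a1", "a2", "a3"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "a1": [], "a2": [], "a3": [],
}
toy_hubs, toy_auths = hits(links)
print(sorted(toy_hubs, key=toy_hubs.get, reverse=True)[:3])    # ['h1', 'h2', 'h3']
print(sorted(toy_auths, key=toy_auths.get, reverse=True)[:3])  # ['a1', 'a2', 'a3']
```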


## 8. Apply the HITS Algorithm to this network to find the 5 nodes with highest hub scores.

Author: Miguel Agueda-Cabral

In [18]:
ordered_hubs = sorted(hubs, key=lambda x: hubs[x], reverse=True)
top_five = ordered_hubs[:5]

print("The top 5 nodes ranked by HITS hub score are:")
for i, name in enumerate(top_five):
    print(F"\t{i+1}. {name}: {hubs[name]}")

The top 5 nodes ranked by HITS hub score are:
	1. politicalstrategy.org: 0.006860032844631305
	2. madkane.com/notable.html: 0.006198130021198813
	3. liberaloasis.com: 0.0061346896013232885
	4. stagefour.typepad.com/commonprejudice: 0.0059907290973003745
	5. bodyandsoul.typepad.com: 0.005939626690753934


## 9. Apply the HITS Algorithm to this network to find the 5 nodes with highest authority scores.

Author: Miguel Agueda-Cabral

In [19]:
ordered_auths = sorted(auths, key=lambda x: auths[x], reverse=True)
top_five = ordered_auths[:5]

print("The top 5 nodes ranked by HITS authority score are:")
for i, name in enumerate(top_five):
    print(F"\t{i+1}. {name}: {auths[name]}")

The top 5 nodes ranked by HITS authority score are:
	1. dailykos.com: 0.015042267072353255
	2. talkingpointsmemo.com: 0.014450907816427404
	3. atrios.blogspot.com: 0.014083800022788357
	4. washingtonmonthly.com: 0.011953445820310986
	5. talkleft.com: 0.009705131061986168
