# 1: Degree and Closeness Centrality.
## Degree Centrality.
- Undirected Graph

In [1]:
import networkx as nx

G = nx.karate_club_graph()
# Relabel the nodes 1..34 (instead of 0..33) to match the figure:
G = nx.convert_node_labels_to_integers(G, first_label=1)
# Now get the degree centrality of every node in the network:
degCent = nx.degree_centrality(G)

# Print the degree centrality of nodes 34 and 33 (degree / (N - 1)).
print(degCent[34])  # 17/33
print(degCent[33])  # 12/33

0.5151515151515151
0.36363636363636365
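In an undirected graph, degree centrality is the node's degree divided by N - 1, the largest degree a node can have in a simple graph with N nodes; that is where the 17/33 and 12/33 above come from. Below is a minimal plain-Python sketch of the same formula (no NetworkX). It runs on a small illustrative edge list, which is the Quiz 3 graph used later.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality of each node in an undirected simple graph.

    Each node's degree is divided by N - 1, where N is the number of
    nodes that appear in the edge list.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


edges = [('A', 'B'), ('C', 'A'), ('B', 'D'), ('C', 'E'), ('D', 'E'),
         ('D', 'G'), ('C', 'D'), ('E', 'G'), ('G', 'F')]
cent = degree_centrality(edges)
print(cent['D'])  # D has 4 of 6 possible neighbours -> 0.6666666666666666
```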


- Directed Graph.

In [2]:
G = nx.DiGraph()
G.add_edges_from([('A', 'B'), ('C', 'A'), ('A', 'E'), ('G', 'A'), ('A', 'N'),
                  ('B', 'C'), ('D', 'B'), ('B', 'E'), ('C', 'D'), ('E', 'C'),
                  ('D', 'E'), ('E', 'D'), ('F', 'G'), ('I', 'F'), ('J', 'F'),
                  ('H', 'G'), ('I', 'G'), ('G', 'J'), ('I', 'H'), ('H', 'I'),
                  ('I', 'J'), ('J', 'O'), ('O', 'J'), ('K', 'M'), ('K', 'L'),
                  ('O', 'K'), ('O', 'L'), ('N', 'L'), ('L', 'M'), ('N', 'O')])

In a directed graph you can choose to measure degree centrality by `in_degree` or by `out_degree`. Let's try both, starting with in-degree centrality.

In [3]:
# in degree.
indegCent = nx.in_degree_centrality(G)
print(indegCent['A'])  # 2/14
print(indegCent['L'])  # 3/14

# out degree.
outdegCent = nx.out_degree_centrality(G)
print('\n',outdegCent['A'])  # 3/14
print(outdegCent['L'])  # 1/14

0.14285714285714285
0.21428571428571427

 0.21428571428571427
0.07142857142857142
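In a directed graph the same normalisation by N - 1 is applied separately to in-degree and out-degree. The plain-Python sketch below recounts the four values above directly from the edge list. Note that `Counter` returns 0 for nodes that have no in-edges or no out-edges.

```python
from collections import Counter

# Edge list copied from the directed example above (15 nodes, 'A' to 'O').
edges = [('A', 'B'), ('C', 'A'), ('A', 'E'), ('G', 'A'), ('A', 'N'),
         ('B', 'C'), ('D', 'B'), ('B', 'E'), ('C', 'D'), ('E', 'C'),
         ('D', 'E'), ('E', 'D'), ('F', 'G'), ('I', 'F'), ('J', 'F'),
         ('H', 'G'), ('I', 'G'), ('G', 'J'), ('I', 'H'), ('H', 'I'),
         ('I', 'J'), ('J', 'O'), ('O', 'J'), ('K', 'M'), ('K', 'L'),
         ('O', 'K'), ('O', 'L'), ('N', 'L'), ('L', 'M'), ('N', 'O')]

nodes = {node for edge in edges for node in edge}
n = len(nodes)
in_deg = Counter(v for _, v in edges)   # number of edges pointing in
out_deg = Counter(u for u, _ in edges)  # number of edges pointing out

# Normalise by N - 1, the largest possible in- or out-degree.
in_cent = {v: in_deg[v] / (n - 1) for v in nodes}
out_cent = {v: out_deg[v] / (n - 1) for v in nodes}

print(in_cent['A'], in_cent['L'])    # 2/14 and 3/14
print(out_cent['A'], out_cent['L'])  # 3/14 and 1/14
```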


## Closeness Centrality.
- Undirected Graph

In [4]:
G = nx.karate_club_graph()
# Relabel the nodes 1..34 (instead of 0..33) to match the figure:
G = nx.convert_node_labels_to_integers(G , first_label=1)

# Get the Closeness Centrality score for every node in the network.
closeCent = nx.closeness_centrality(G)
closeCent[32]

0.5409836065573771

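**Verifying the rule:** `closeness(v) = (N - 1) / (sum of d(v, u) over all other nodes u)`, where N is the number of nodes and d(v, u) is the shortest-path distance from v to u.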

In [5]:
# Count how many edges it takes to reach every node from node 32.
# shortest_path_length(G, 32) returns the distance from node 32 to every
# other node; for an unweighted graph it is computed with breadth-first search.

denominator = sum(nx.shortest_path_length(G, 32).values())
print(denominator)

numerator = len(G.nodes()) - 1
print(numerator)

close_32 = numerator / denominator
close_32

61
33


0.5409836065573771
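The same check can be done without NetworkX. A breadth-first search gives the hop distance from a node to every other node, and closeness is N - 1 divided by the sum of those distances. The minimal sketch below assumes a connected, unweighted, undirected graph. To keep the edge list short, it uses the small Quiz 3 graph instead of the karate club.

```python
from collections import defaultdict, deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """(N - 1) / sum of distances from v; assumes a connected graph."""
    dist = bfs_distances(adj, v)
    return (len(adj) - 1) / sum(dist.values())


edges = [('A', 'B'), ('C', 'A'), ('B', 'D'), ('C', 'E'), ('D', 'E'),
         ('D', 'G'), ('C', 'D'), ('E', 'G'), ('G', 'F')]
adj = defaultdict(set)
for u, w in edges:
    adj[u].add(w)
    adj[w].add(u)

print(closeness(adj, 'G'))  # 6 / 10 = 0.6
print(closeness(adj, 'D'))  # 6 / 8 = 0.75
```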

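**How to measure the closeness centrality of a node when it cannot reach all other nodes.**

Example: a `Directed` graph in which some nodes, such as `L`, cannot reach every other node. There are two options:

- Compute closeness only over the nodes that `L` can reach.
- Also multiply that value by the fraction of the other N - 1 nodes that `L` can reach. This stops a node that reaches only one neighbour from being rated as maximally central.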

In [6]:
G = nx.DiGraph()
G.add_edges_from([('A', 'B'), ('C', 'A'), ('A', 'E'), ('G', 'A'), ('A', 'N'),
                  ('B', 'C'), ('D', 'B'), ('B', 'E'), ('C', 'D'), ('E', 'C'),
                  ('D', 'E'), ('E', 'D'), ('F', 'G'), ('I', 'F'), ('J', 'F'),
                  ('H', 'G'), ('I', 'G'), ('G', 'J'), ('I', 'H'), ('H', 'I'),
                  ('I', 'J'), ('J', 'O'), ('O', 'J'), ('K', 'M'), ('K', 'L'),
                  ('O', 'K'), ('O', 'L'), ('N', 'L'), ('L', 'M'), ('N', 'O')])

In [7]:
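# Option 1: closeness computed only over the nodes that L can reach.
closeCent = nx.closeness_centrality(G, normalized=False)
print(closeCent['L'])

# Option 2: Option 1 scaled by the fraction of nodes that L can reach.
# (`normalized` is the old NetworkX 1.x name; newer releases call it
#  `wf_improved` and use incoming distances for directed graphs, so there the
#  equivalent call is nx.closeness_centrality(G.reverse(), wf_improved=True).)
closeCent = nx.closeness_centrality(G, normalized=True)
print(closeCent['L'])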

1.0
0.07142857142857142
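Both options can be reproduced by hand. In this plain-Python sketch, the helper names are illustrative and are not NetworkX functions. A breadth-first search follows the edge directions out of `L`:

- Option 1 divides the number of reachable nodes by the sum of their distances.
- Option 2 multiplies that result by the fraction of the other N - 1 nodes that are reachable.

```python
from collections import defaultdict, deque

# Edge list copied from the directed example above.
edges = [('A', 'B'), ('C', 'A'), ('A', 'E'), ('G', 'A'), ('A', 'N'),
         ('B', 'C'), ('D', 'B'), ('B', 'E'), ('C', 'D'), ('E', 'C'),
         ('D', 'E'), ('E', 'D'), ('F', 'G'), ('I', 'F'), ('J', 'F'),
         ('H', 'G'), ('I', 'G'), ('G', 'J'), ('I', 'H'), ('H', 'I'),
         ('I', 'J'), ('J', 'O'), ('O', 'J'), ('K', 'M'), ('K', 'L'),
         ('O', 'K'), ('O', 'L'), ('N', 'L'), ('L', 'M'), ('N', 'O')]

succ = defaultdict(list)  # successors: nodes each node points to
nodes = set()
for u, v in edges:
    succ[u].append(v)
    nodes.update((u, v))


def out_distances(source):
    """BFS hop distances following edge directions out of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in succ[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_directed(v, scale_by_reach):
    dist = out_distances(v)
    reached = len(dist) - 1        # nodes reachable from v, excluding v
    total = sum(dist.values())
    if total == 0:                 # v reaches nobody
        return 0.0
    c = reached / total            # Option 1: reachable nodes only
    if scale_by_reach:             # Option 2: times fraction of nodes reached
        c *= reached / (len(nodes) - 1)
    return c


print(closeness_directed('L', scale_by_reach=False))  # 1.0 (L only reaches M)
print(closeness_directed('L', scale_by_reach=True))   # 1/14
```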


# 2: Betweenness Centrality.

In [8]:
G = nx.karate_club_graph()
# Relabel the nodes 1..34 (instead of 0..33) to match the figure:
G = nx.convert_node_labels_to_integers(G , first_label=1)

In [9]:
btwnCent = nx.betweenness_centrality(G, normalized=True , endpoints=False)
btwnCent

{1: 0.4376352813852815,
 2: 0.05393668831168831,
 3: 0.14365680615680615,
 4: 0.011909271284271283,
 5: 0.0006313131313131313,
 6: 0.02998737373737374,
 7: 0.029987373737373736,
 8: 0.0,
 9: 0.05592682780182782,
 10: 0.0008477633477633478,
 11: 0.0006313131313131313,
 12: 0.0,
 13: 0.0,
 14: 0.045863395863395856,
 15: 0.0,
 16: 0.0,
 17: 0.0,
 18: 0.0,
 19: 0.0,
 20: 0.03247504810004811,
 21: 0.0,
 22: 0.0,
 23: 0.0,
 24: 0.017613636363636363,
 25: 0.0022095959595959595,
 26: 0.0038404882154882162,
 27: 0.0,
 28: 0.022333453583453587,
 29: 0.0017947330447330447,
 30: 0.0029220779220779218,
 31: 0.014411976911976905,
 32: 0.13827561327561327,
 33: 0.14524711399711404,
 34: 0.30407497594997596}

In [10]:
# Find the nodes with the highest `btwnCent`; there are two equivalent methods:
# method_1
sorted(btwnCent.items() , key= lambda x: x[1] , reverse=True)[:5]

[(1, 0.4376352813852815),
 (34, 0.30407497594997596),
 (33, 0.14524711399711404),
 (3, 0.14365680615680615),
 (32, 0.13827561327561327)]

In [11]:
# method_2
import operator
sorted(btwnCent.items() , key= operator.itemgetter(1) , reverse=True)[:5]

[(1, 0.4376352813852815),
 (34, 0.30407497594997596),
 (33, 0.14524711399711404),
 (3, 0.14365680615680615),
 (32, 0.13827561327561327)]

- **Here we try an `Approximation` approach that avoids computing btwnCent over all pairs of nodes. Instead of using every possible pair (s, t) in the network, it samples `k` nodes (here `k=10`) and uses only those as sources s. The result is therefore an estimate that can change from run to run.**
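To make the sampling idea concrete, here is a sketch of Brandes' algorithm, the standard way to compute betweenness with one BFS per source node. It takes an optional `k` that keeps only `k` randomly chosen sources and rescales the sum by N/k. This mirrors the idea described above, but it is an illustrative implementation for unweighted, undirected graphs, not NetworkX's code. `betweenness` is a hypothetical helper name.

```python
import random
from collections import defaultdict, deque


def betweenness(adj, k=None, seed=None):
    """Normalised node betweenness of an undirected, unweighted graph.

    adj maps each node to its neighbours. Brandes' algorithm: one BFS per
    source, then pair dependencies are accumulated backwards. If k is given,
    only k randomly sampled sources are used and the total is scaled by N / k,
    giving an estimate instead of the exact value.
    """
    nodes = list(adj)
    n = len(nodes)
    sources = nodes if k is None else random.Random(seed).sample(nodes, k)
    bc = dict.fromkeys(nodes, 0.0)
    for s in sources:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order, preds = [], defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair is seen from both endpoints, so dividing by
    # (N-1)(N-2) equals dividing by the (N-1)(N-2)/2 possible pairs.
    scale = 1 / ((n - 1) * (n - 2))
    if k is not None:
        scale *= n / k
    return {v: b * scale for v, b in bc.items()}


# The Quiz 3 graph from further below, as an adjacency list.
edges = [('A', 'B'), ('C', 'A'), ('B', 'D'), ('C', 'E'), ('D', 'E'),
         ('D', 'G'), ('C', 'D'), ('E', 'G'), ('G', 'F')]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

exact = betweenness(adj)
print(round(exact['D'], 4))   # 0.3889, matching the Quiz 3 output
approx = betweenness(adj, k=3, seed=42)
print(round(approx['D'], 4))  # an estimate from 3 sampled sources
```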

In [12]:
btwnCent_approx = nx.betweenness_centrality(G, normalized=True ,
                                     endpoints=False , k=10)
# sort the dictionary above.
sorted(btwnCent_approx.items() , key= lambda x: x[1] , reverse=True)[:5]

[(1, 0.3890058321308321),
 (34, 0.3346951659451659),
 (33, 0.22431066618566617),
 (32, 0.15064604377104376),
 (3, 0.1457500601250601)]

- **Another option is to pick s and t from given `subsets` of nodes: sources come from the first list and targets from the second. This reduces computation time and lets us inspect btwnCent between two specific groups of nodes.**

In [13]:
btwnCent_subset = nx.betweenness_centrality_subset(G,[34,33,21,30,16,27,15,23,10],
                                            [1,4,13,11,6,12,17,7],
                                                normalized=True)
# sort the dictionary above.
sorted(btwnCent_subset.items() , key= lambda x: x[1] , reverse=True)[:5]

[(1, 0.04899515993265994),
 (34, 0.028807419432419434),
 (3, 0.018368205868205867),
 (33, 0.01664712602212602),
 (9, 0.014519450456950456)]

## What about the importance of edges?

In [14]:
btwnCent_edge = nx.edge_betweenness_centrality(G, normalized=True)

# sort the dictionary above.
sorted(btwnCent_edge.items() , key= lambda x: x[1] , reverse=True)[:5]

[((1, 32), 0.12725999490705373),
 ((1, 7), 0.07813428401663695),
 ((1, 6), 0.07813428401663694),
 ((1, 3), 0.07778768072885717),
 ((1, 9), 0.07423959482783016)]

- So all of the top edges are attached to node 1 which, as you may remember, is the instructor of the karate club.

**Edge betweenness restricted to subsets of source and target nodes, as we did with nodes.**

In [15]:
btwnCent_edge_subset = nx.edge_betweenness_centrality_subset(G,
                                            [34,33,21,30,16,27,15,23,10],
                                            [1,4,13,11,6,12,17,7],
                                            normalized=True)

# sort the dictionary above.
sorted(btwnCent_edge_subset.items() , key= lambda x: x[1] , reverse=True)[:5]

[((1, 9), 0.01366536513595337),
 ((1, 32), 0.01366536513595337),
 ((14, 34), 0.012207509266332794),
 ((1, 3), 0.01211343123107829),
 ((1, 6), 0.012032085561497326)]

- The top five edges by betweenness for this particular choice of source and target nodes are the most important ones for connecting the two groups. Most of them go from a node inside the source or target set to a node outside it. That makes sense, because these are the edges that end up on the shortest paths between the sources and the targets.


- These edges also tend to be attached to very important nodes in the network. One is node 1, the instructor of the karate club. The other is node 34, the club's administrator, who led the other faction after the club split in two.

# 3,4: `Basic` & `Scaled` Page Rank.

![for_coursera](for_coursera.png)
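- **Basic PageRank** (note: `nx.pagerank` uses a damping factor of `alpha=0.85` by default, so this call already includes random jumps; the unscaled basic PageRank corresponds to a damping factor of 1)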

In [16]:
G = nx.DiGraph()
G.add_edges_from([('A', 'B'), ('C', 'B'), ('B', 'D'), ('B', 'G'), ('B', 'C'),
                  ('B', 'F'), ('D', 'A'), ('D', 'E'), ('D', 'C'), ('E', 'A'),
                  ('F', 'G'), ('G', 'F')])
                  
basic_page_rank = nx.pagerank(G)
# sort the dictionary above.
sorted(basic_page_rank.items() , key= lambda x: x[1] , reverse=True)[:7]

[('F', 0.3280507927524911),
 ('G', 0.328050792752491),
 ('B', 0.13072972919003595),
 ('A', 0.06543693487722958),
 ('C', 0.06315158949959035),
 ('D', 0.04920893254397847),
 ('E', 0.03537122838418331)]

- **Scaled PageRank** (damping factor `alpha=0.8`)

In [17]:
scaled_page_rank = nx.pagerank(G, alpha=0.8)
# sort the dictionary above.
sorted(scaled_page_rank.items() , key= lambda x: x[1] , reverse=True)[:7]

[('F', 0.2950175048721824),
 ('G', 0.29501750487218237),
 ('B', 0.15216425904765085),
 ('A', 0.0797510311959225),
 ('C', 0.07473913572031664),
 ('D', 0.05900450477059524),
 ('E', 0.044306059521149965)]
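The two calls differ only in the damping factor `alpha`. Below is a minimal power-iteration sketch of PageRank in plain Python; it is an illustration, not NetworkX's implementation. In each round, a node receives (1 - alpha)/N from random jumps. It also receives alpha times an equal share of the score of every node that points to it. Nodes without out-links are assumed to spread their score over all nodes. With alpha=1 there are no random jumps, which gives the basic PageRank. In that case the F and G pair, which only link to each other, ends up absorbing all of the score.

```python
def pagerank(edges, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank for a directed graph given as an edge list."""
    nodes = sorted({u for edge in edges for u in edge})
    n = len(nodes)
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)

    pr = {u: 1.0 / n for u in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        new = {u: (1.0 - alpha) / n for u in nodes}  # random-jump share
        for u in nodes:
            targets = out[u] if out[u] else nodes    # dangling: spread evenly
            share = alpha * pr[u] / len(targets)
            for v in targets:
                new[v] += share
        change = sum(abs(new[u] - pr[u]) for u in nodes)
        pr = new
        if change < tol:
            break
    return pr


edges = [('A', 'B'), ('C', 'B'), ('B', 'D'), ('B', 'G'), ('B', 'C'),
         ('B', 'F'), ('D', 'A'), ('D', 'E'), ('D', 'C'), ('E', 'A'),
         ('F', 'G'), ('G', 'F')]

scaled = pagerank(edges, alpha=0.85)  # same damping as nx.pagerank's default
basic = pagerank(edges, alpha=1.0)    # no random jumps
print(sorted(scaled.items(), key=lambda x: x[1], reverse=True)[:3])
print(round(basic['F'], 6), round(basic['G'], 6))  # 0.5 0.5
```

The random jumps are what keep the scores of A to E from draining away completely into the F-G loop.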

# 5: Hubs and Authorities.
![61](61.png)

In [18]:
G = nx.DiGraph()
G.add_edges_from([('A', 'D'), ('B', 'C'), ('B', 'E'), ('C', 'A'), ('D', 'C'),
                  ('D', 'B'), ('E', 'F'), ('E', 'B'), ('E', 'C'), ('E', 'D'),
                  ('F', 'C'), ('F', 'H'), ('G', 'C'), ('G', 'A'), ('H', 'A')])

# Then use nx.hits(), which returns the `Hubs` and `Authority` scores, in that order, as a pair of dicts.

In [19]:
# Hubs score:
nx.hits(G)[0]

{'A': 0.04305010866633368,
 'B': 0.14444089275625216,
 'C': 0.029508489628427694,
 'D': 0.18749100142258587,
 'E': 0.26762580012702586,
 'F': 0.14444089275625216,
 'G': 0.15393432501469498,
 'H': 0.029508489628427694}

In [20]:
# Authority score:
nx.hits(G)[1]

{'A': 0.08751958758098452,
 'B': 0.18704574147053715,
 'C': 0.36903609552875044,
 'D': 0.1276828398619346,
 'E': 0.05936290160860254,
 'F': 0.10998993234058818,
 'G': 0.0,
 'H': 0.05936290160860254}
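HITS can be sketched as a power iteration. In each round:

- A node's authority score becomes the sum of the hub scores of the nodes pointing to it.
- Its hub score becomes the sum of the authority scores of the nodes it points to.
- Both vectors are rescaled to sum to 1. This is the normalisation the outputs above appear to use, since each set of scores adds up to 1.

The minimal plain-Python sketch below runs a fixed number of rounds. NetworkX's own implementation and stopping rule differ, so the values can disagree in the last digits.

```python
def hits(edges, iterations=100):
    """Hub and authority scores by repeated update and normalisation."""
    nodes = sorted({u for edge in edges for u in edge})
    hub = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority update: sum of hub scores of the nodes pointing to v.
        auth = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            auth[v] += hub[u]
        total = sum(auth.values())
        auth = {v: a / total for v, a in auth.items()}
        # Hub update: sum of authority scores of the nodes u points to.
        hub = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            hub[u] += auth[v]
        total = sum(hub.values())
        hub = {u: h / total for u, h in hub.items()}
    return hub, auth


edges = [('A', 'D'), ('B', 'C'), ('B', 'E'), ('C', 'A'), ('D', 'C'),
         ('D', 'B'), ('E', 'F'), ('E', 'B'), ('E', 'C'), ('E', 'D'),
         ('F', 'C'), ('F', 'H'), ('G', 'C'), ('G', 'A'), ('H', 'A')]
hubs, authorities = hits(edges)
print(sorted(authorities.items(), key=lambda x: x[1], reverse=True)[:3])
```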

# 6: Centrality Examples.
![for_coursera2](for_coursera2.png)


In [21]:
G = nx.DiGraph()
G.add_edges_from([('1', '5'), ('1', '2'), ('2', '1'), ('2', '4'),
                  ('2', '3'), ('3', '4'), ('3', '2'), ('3', '1'), ('4', '3'),
                  ('4', '1'), ('5', '1'), ('5', '6'), ('6', '5'), ('6', '7'),
                  ('7', '6'), ('7', '9'), ('7', '8'), ('8', '6'), ('8', '7'),
                  ('8', '9'), ('9', '6'), ('9', '8')])

## 1)
**Degree Centrality**

In [22]:
degCent = nx.degree_centrality(G)
sorted(degCent.items() , key= lambda x: x[1] , reverse=True)

[('1', 0.75),
 ('6', 0.75),
 ('2', 0.625),
 ('3', 0.625),
 ('7', 0.625),
 ('8', 0.625),
 ('5', 0.5),
 ('4', 0.5),
 ('9', 0.5)]

**Closeness Centrality**

In [23]:
closeCent = nx.closeness_centrality(G , normalized= True)
sorted(closeCent.items() , key= lambda x: x[1] , reverse=True)

[('5', 0.4444444444444444),
 ('1', 0.42105263157894735),
 ('6', 0.42105263157894735),
 ('2', 0.36363636363636365),
 ('3', 0.36363636363636365),
 ('7', 0.36363636363636365),
 ('8', 0.36363636363636365),
 ('4', 0.34782608695652173),
 ('9', 0.34782608695652173)]

## 2)
**Betweenness Centrality**

`Q`: Try to answer the following question without calculating anything: which node has the highest betweenness centrality?

- Node 5 has the highest betweenness centrality because every shortest path between {1, 2, 3, 4} and {6, 7, 8, 9} has to go through node 5. In other words, node 5 is a bridge between the two groups, so it lies on more shortest paths than any other node in the network.
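You can check the bridge argument directly: deleting node 5 should leave no path between {1, 2, 3, 4} and {6, 7, 8, 9}. Below is a small BFS sketch on the edge list above, in plain Python. The helper `reachable` is hypothetical and is not a NetworkX function.

```python
from collections import deque

# Edge list copied from the directed example above.
edges = [('1', '5'), ('1', '2'), ('2', '1'), ('2', '4'),
         ('2', '3'), ('3', '4'), ('3', '2'), ('3', '1'), ('4', '3'),
         ('4', '1'), ('5', '1'), ('5', '6'), ('6', '5'), ('6', '7'),
         ('7', '6'), ('7', '9'), ('7', '8'), ('8', '6'), ('8', '7'),
         ('8', '9'), ('9', '6'), ('9', '8')]


def reachable(edges, start, removed=None):
    """Nodes reachable from start along edge directions, skipping `removed`."""
    succ = {}
    for u, v in edges:
        if removed not in (u, v):
            succ.setdefault(u, []).append(v)
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in succ.get(u, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


print(sorted(reachable(edges, '1')))               # all nine nodes
print(sorted(reachable(edges, '1', removed='5')))  # ['1', '2', '3', '4']
```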



In [24]:
btwnCent = nx.betweenness_centrality(G, normalized=True , endpoints=False)
sorted(btwnCent.items() , key= lambda x: x[1] , reverse=True)

[('5', 0.5714285714285714),
 ('1', 0.5446428571428571),
 ('6', 0.5446428571428571),
 ('2', 0.21428571428571427),
 ('7', 0.21428571428571427),
 ('3', 0.008928571428571428),
 ('8', 0.008928571428571428),
 ('4', 0.0),
 ('9', 0.0)]

#### NOTE:
- After nodes 5, 1 and 6 come nodes 2 and 7. Unlike closeness centrality, betweenness captures the fact that node 2 holds a key position compared to node 3. If nodes 1, 5, 6, 7, 8 or 9 want to reach node 4, they have to go through node 2, not through node 3.


- So betweenness comes out very similar to closeness, but it captures the structural difference between nodes 2 and 3. Closeness centrality does not: both nodes get the same closeness score (0.364).

## 3,4)
**`Basic` & `Scaled` Page Rank**

In [25]:
scaled_page_rank = nx.pagerank(G, alpha=0.8)
sorted(scaled_page_rank.items() , key= lambda x: x[1] , reverse=True)

[('1', 0.1624924244197534),
 ('6', 0.1624924244197534),
 ('5', 0.15221518761879843),
 ('2', 0.1086093364937095),
 ('7', 0.1086093364937095),
 ('3', 0.08021505931151808),
 ('8', 0.08021505931151808),
 ('4', 0.07257558596561971),
 ('9', 0.07257558596561971)]

### NOTE:
- The nodes with the highest PageRank in this network are nodes 1 and 6, followed by node 5. Betweenness ranks node 5 first, but PageRank puts nodes 1 and 6 ahead of it.


- Why might this be? Node 5 passes all of its PageRank to nodes 1 and 6. Nodes 1 and 6, in contrast, give only part of their PageRank to node 5 and the rest to other nodes. This is part of the reason why node 5 comes after nodes 1 and 6.


- So, in this case, PageRank comes out very similar to betweenness, except that it swaps node 5 with nodes 1 and 6.

## 5)
**Hubs and Authorities**

In [26]:
Auth = nx.hits(G)[1]
sorted(Auth.items() , key= lambda x: x[1] , reverse=True)

[('1', 0.21121135254127466),
 ('6', 0.21121135254127466),
 ('9', 0.11077926439430932),
 ('4', 0.1107792643943093),
 ('3', 0.10043208814696536),
 ('8', 0.10043208814696536),
 ('7', 0.0652811600977279),
 ('2', 0.06528116009772789),
 ('5', 0.02459226963944558)]

In [27]:
Hub = nx.hits(G)[0]
sorted(Hub.items() , key= lambda x: x[1] , reverse=True)

[('5', 0.1484870508510458),
 ('2', 0.1484870508510458),
 ('7', 0.1484870508510458),
 ('3', 0.13613104446713567),
 ('8', 0.13613104446713567),
 ('4', 0.1095467049192504),
 ('9', 0.1095467049192504),
 ('1', 0.03159167433704516),
 ('6', 0.03159167433704516)]

So the node with the lowest authority score here is node 5, even though it scored very high on many of the other centrality measures.

So, `why might this be the case?`

- The HITS algorithm gives every node both an authority score and a hub score, and to understand what it is saying you have to look at the two scores together. Look at the hub scores in this network: nodes 2, 5 and 7 have the highest hub scores. These are exactly the nodes whose low authority scores seemed puzzling.


- That is the explanation: HITS sees nodes 1 and 6 as the authorities, and nodes 2, 5 and 7 as the hubs that point to them. So, to interpret the scores, you really have to take them together.

# QUIZ 3.
![quiz_Q1](quiz_Q1.png)

In [28]:
# 1
G = nx.Graph()
G.add_edges_from([('A', 'B'), ('C', 'A'), ('B', 'D'),('C','E'), 
                  ('D', 'E'),('D', 'G'), ('C', 'D'),('E', 'G'),
                  ('G', 'F')])
degCent = nx.degree_centrality(G)
degCent

{'A': 0.3333333333333333,
 'B': 0.3333333333333333,
 'C': 0.5,
 'D': 0.6666666666666666,
 'E': 0.5,
 'F': 0.16666666666666666,
 'G': 0.5}

![quiz_Q2](quiz_Q2.png)

In [29]:
# 2
closeCent = nx.closeness_centrality(G)
print(nx.shortest_path_length(G,'G'))
print(sum(nx.shortest_path_length(G,'G').values()))
print((7-1)/sum(nx.shortest_path_length(G,'G').values()))
closeCent

{'G': 0, 'D': 1, 'E': 1, 'F': 1, 'B': 2, 'C': 2, 'A': 3}
10
0.6


{'A': 0.46153846153846156,
 'B': 0.5454545454545454,
 'C': 0.6,
 'D': 0.75,
 'E': 0.6666666666666666,
 'F': 0.4,
 'G': 0.6}

In [30]:
# 3
btwnCent = nx.betweenness_centrality(G, normalized=True , endpoints=False)
btwnCent

{'A': 0.03333333333333333,
 'B': 0.07777777777777777,
 'C': 0.18888888888888888,
 'D': 0.38888888888888884,
 'E': 0.1111111111111111,
 'F': 0.0,
 'G': 0.3333333333333333}

In [31]:
# 4
btwnCent_edge = nx.edge_betweenness_centrality(G, normalized=False)
btwnCent_edge

{('A', 'B'): 2.666666666666666,
 ('A', 'C'): 4.333333333333333,
 ('B', 'D'): 5.666666666666667,
 ('C', 'D'): 3.666666666666666,
 ('C', 'E'): 3.666666666666666,
 ('D', 'E'): 2.0,
 ('D', 'G'): 6.333333333333333,
 ('E', 'G'): 3.6666666666666665,
 ('G', 'F'): 6.0}

![quiz_Q5](quiz_Q5.png)
![quiz_Q6](quiz_Q6.png)
![quiz_Q7](quiz_Q7.png)

In [32]:
# 7
G = nx.DiGraph()
G.add_edges_from([('A', 'B'), ('B', 'A'), ('A', 'C'),
                  ('C', 'D'), ('D', 'C')])
scaled_page_rank = nx.pagerank(G, alpha=0.95)
scaled_page_rank

{'A': 0.04442211856731087,
 'B': 0.03360050631947267,
 'C': 0.46639949368052747,
 'D': 0.4555778814326893}

![quiz_Q8](quiz_Q8.png)

In [33]:
# 8
G = nx.DiGraph()
G.add_edges_from([('A', 'B'), ('A', 'C'), ('B', 'C'),
                  ('C', 'A'), ('D', 'C')])

basic_page_rank = nx.pagerank(G)
basic_page_rank

{'A': 0.372526246091333,
 'B': 0.19582365458881654,
 'C': 0.39415009931985023,
 'D': 0.037500000000000006}

![quiz_Q9](quiz_Q9.png)

In [34]:
# 9
hubs, authorities = nx.hits(G)
# Hubs score:
print(hubs)
# Authority score:
print(authorities)

{'A': 0.41421356195612885, 'B': 0.2928932185186695, 'C': 1.0065322483445805e-09, 'D': 0.2928932185186695}
{'A': 2.429983801460681e-09, 'B': 0.29289321810158986, 'C': 0.7071067794684264, 'D': 0.0}


![quiz_Q10](quiz_Q10.png)

**In a social media network such as Twitter, where the users are nodes and following relationships are directed edges.**

Q: `Who are likely examples of nodes that have high PageRank but low in-degree centrality`, and `why`?


A: Someone who is followed by only a few users, as long as those users themselves have a high PageRank:

- For example, a famous actor whom everyone follows might follow his cousin, whom no one else follows. The cousin has an in-degree of just 1. However, the actor has a high PageRank and passes part of it on, so the cousin ends up with a high PageRank as well.
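The answer can be illustrated with a tiny, made-up follower graph, in which all node names are hypothetical:

- Five fans follow an actor, and three of those fans also follow a neighbour.
- The neighbour follows one fan back.
- The actor follows only his cousin, and the cousin follows the actor back.

The `pagerank` helper below is a short power-iteration sketch with the usual damping factor of 0.85, not NetworkX's implementation.

```python
def pagerank(edges, alpha=0.85, iterations=200):
    """Power-iteration PageRank on a directed edge list."""
    nodes = sorted({u for edge in edges for u in edge})
    n = len(nodes)
    out = {u: [v for x, v in edges if x == u] for u in nodes}
    pr = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        new = dict.fromkeys(nodes, (1.0 - alpha) / n)  # random-jump share
        for u in nodes:
            targets = out[u] or nodes  # dangling node: spread evenly
            for v in targets:
                new[v] += alpha * pr[u] / len(targets)
        pr = new
    return pr


fans = ['F1', 'F2', 'F3', 'F4', 'F5']
edges = [(f, 'Actor') for f in fans]          # every fan follows the actor
edges += [(f, 'Neighbor') for f in fans[:3]]  # three fans follow a neighbour
edges += [('Actor', 'Cousin'), ('Cousin', 'Actor'), ('Neighbor', 'F1')]

pr = pagerank(edges)
in_degree = {v: sum(1 for _, w in edges if w == v) for v in pr}
for node in ['Actor', 'Cousin', 'Neighbor']:
    print(node, in_degree[node], round(pr[node], 3))
```

Under these assumptions the cousin has an in-degree of 1 but a PageRank of about 0.37. That is far above the neighbour, who has an in-degree of 3 but a PageRank of only about 0.07. The reason is that the cousin's single incoming link comes from the highest-ranked node.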
