<a href="https://colab.research.google.com/github/ZhihaoDC/TFG/blob/main/marvel_social_network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Setup Environment

In [1]:
from google.colab import drive
drive.mount('/gdrive')

%cd /gdrive/My Drive/TFG

!git pull https://github.com/ZhihaoDC/TFG

Mounted at /gdrive
/gdrive/My Drive/TFG
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 6 (delta 2), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (6/6), done.
From https://github.com/ZhihaoDC/TFG
 * branch            HEAD       -> FETCH_HEAD
Updating da6a999..12b779e
Fast-forward
 README.md                   |    6 +-
 marvel_social_network.ipynb | 1367 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 1365 insertions(+), 8 deletions(-)


#Import Libraries

In [3]:
!pip install pyvis

Collecting pyvis
  Downloading https://files.pythonhosted.org/packages/07/d1/e87844ec86e96df7364f21af2263ad6030c0d727660ae89935c7af56a540/pyvis-0.1.9-py3-none-any.whl
Collecting jsonpickle>=1.4.1
  Downloading https://files.pythonhosted.org/packages/bb/1a/f2db026d4d682303793559f1c2bb425ba3ec0d6fd7ac63397790443f2461/jsonpickle-2.0.0-py2.py3-none-any.whl
Installing collected packages: jsonpickle, pyvis
Successfully installed jsonpickle-2.0.0 pyvis-0.1.9


In [232]:
import pandas as pd
import numpy as np
import networkx as nx
from pyvis.network import Network
import IPython

import matplotlib.pyplot as plt
# import matplotlib.colors as mcolors
# import seaborn as sns

# import statistics
# import math
# import itertools
# import re #regular expressions

# import plotly.express as px
# import plotly.figure_factory as ff

#Read files

*   **nodes**: Node name and type.
*   **edges**: Heroes and the comic in which they appear.
*   **heroes**: Edges between heroes that appear in the same comic.

In [207]:
nodes = pd.read_csv('./datasets/marvel-social-network/nodes.csv') #Node name and type
edges = pd.read_csv('./datasets/marvel-social-network/edges.csv') #Heroes and the comic in which they appear
heroes = pd.read_csv('./datasets/marvel-social-network/hero-network.csv') # Edges between heroes that appear in the same comic

In [208]:
##Show basic info on dataframes:

# nodes.info() #190.090 rows, 2 columns, no null values
# print('\n')
# edges.info() #96.104 rows, 2 columns, no null values
# print('\n')
# heroes.info() #574.467 rows, 2 columns, no null values

In [209]:
##Peek into the dataframes:

# nodes.head(10)
# edges.head(10)
heroes.head(10)

|   | hero1 | hero2 |
|---|---|---|
| 0 | LITTLE, ABNER | PRINCESS ZANDA |
| 1 | LITTLE, ABNER | BLACK PANTHER/T'CHAL |
| 2 | BLACK PANTHER/T'CHAL | PRINCESS ZANDA |
| 3 | LITTLE, ABNER | PRINCESS ZANDA |
| 4 | LITTLE, ABNER | BLACK PANTHER/T'CHAL |
| 5 | BLACK PANTHER/T'CHAL | PRINCESS ZANDA |
| 6 | STEELE, SIMON/WOLFGA | FORTUNE, DOMINIC |
| 7 | STEELE, SIMON/WOLFGA | ERWIN, CLYTEMNESTRA |
| 8 | STEELE, SIMON/WOLFGA | IRON MAN/TONY STARK |
| 9 | STEELE, SIMON/WOLFGA | IRON MAN IV/JAMES R. |


#Data cleaning

In [210]:
#Remove leading and trailing spaces

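# Column-wise .str.strip() gives the same result as applymap here (all columns are strings, no nulls)
# and avoids DataFrame.applymap, which is deprecated in recent pandas versions
nodes = nodes.apply(lambda col: col.str.strip())
edges = edges.apply(lambda col: col.str.strip())
heroes = heroes.apply(lambda col: col.str.strip())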

#Graph exploration

In [211]:
#Number of edge rows in which IronMan/TonyStark is listed as hero1 (duplicate rows included)
heroes.loc[ heroes['hero1']=='IRON MAN/TONY STARK' ].shape[0]

5850

In [212]:
# Number of rows with Spiderman/PeterParker as hero1 and IronMan/TonyStark as hero2 (one orientation only)
heroes.loc[ (heroes['hero1']=='SPIDER-MAN/PETER PAR') & (heroes['hero2']=='IRON MAN/TONY STARK') ].shape[0]

40

In [213]:
# Number of rows with IronMan/TonyStark as hero1 and Spiderman/PeterParker as hero2 (the opposite orientation)
heroes.loc[ (heroes['hero1'] == 'IRON MAN/TONY STARK') & (heroes['hero2'] == 'SPIDER-MAN/PETER PAR') ].shape[0]

54

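These two values differ because of the structure of the edge list: each co-appearance is stored as an ordered pair, so the same two heroes can be listed as (hero1, hero2) or as (hero2, hero1), and each query above counts only one orientation. One possible reading is that a row (hero1=Spider-Man, hero2=Iron Man) records an occurrence of Spider-Man appearing in an Iron Man comic, but this interpretation is not confirmed here.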
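If the order carries no meaning for the analysis, the pair can be counted regardless of orientation by mapping every row to an unordered key. The plain-Python sketch below does this by sorting the two names of each row; the hero names and rows are a hypothetical toy edge list, not taken from the dataset. On the real data, the orientation-independent count for the Spider-Man/Iron Man pair is simply the sum of the two queries above, 40 + 54 = 94.

```python
from collections import Counter


def pair_counts(edge_rows):
    """Count co-appearance rows per unordered hero pair.

    Each row is a (hero1, hero2) tuple; sorting the two names makes
    ('A', 'B') and ('B', 'A') map to the same key.
    """
    return Counter(tuple(sorted(row)) for row in edge_rows)


# Hypothetical toy edge list (not the real dataset)
rows = [
    ("SPIDER-MAN", "IRON MAN"),
    ("IRON MAN", "SPIDER-MAN"),
    ("IRON MAN", "SPIDER-MAN"),
    ("IRON MAN", "THOR"),
]

counts = pair_counts(rows)
print(counts[("IRON MAN", "SPIDER-MAN")])  # 3: both orientations combined
print(counts[("IRON MAN", "THOR")])        # 1
```

With pandas, the same canonicalisation can be applied to the whole DataFrame at once with `np.sort(heroes[['hero1', 'hero2']].to_numpy(), axis=1)`.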

In [214]:
#Number of distinct heroes paired with IronMan/TonyStark when he is listed as hero1 (duplicate rows dropped)
ironman_h1 = heroes.loc[heroes['hero1'] == 'IRON MAN/TONY STARK'].drop_duplicates()
ironman_h1.shape[0]

1131

In [215]:
#Number of distinct heroes listed as hero1 when IronMan/TonyStark is hero2 (duplicate rows dropped)
ironman_h2 = heroes.loc[heroes['hero2'] == 'IRON MAN/TONY STARK'].drop_duplicates()
ironman_h2.shape[0]

1106

In [216]:
#Number of distinct heroes sharing a row with IronMan/TonyStark in either column (union of the two partner lists)
ironman_merge = pd.merge(ironman_h1, ironman_h2, how='outer', left_on='hero2', right_on='hero1')
ironman_merge.shape[0]

1521
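Because each side of the outer merge has unique keys after `drop_duplicates`, its row count is the size of the union of Iron Man's partners in both columns, which is also his degree in an undirected graph. A plain-Python sketch of that union, using a hypothetical toy edge list with duplicated and reversed rows:

```python
def distinct_partners(edge_rows, hero):
    """Return the set of heroes sharing at least one row with `hero`,
    whether `hero` appears as hero1 or as hero2."""
    partners = set()
    for hero1, hero2 in edge_rows:
        if hero1 == hero:
            partners.add(hero2)
        elif hero2 == hero:
            partners.add(hero1)
    return partners


# Hypothetical toy edge list (not the real dataset)
rows = [
    ("IRON MAN", "THOR"),
    ("IRON MAN", "THOR"),  # exact duplicate
    ("THOR", "IRON MAN"),  # reversed orientation
    ("HULK", "IRON MAN"),
    ("IRON MAN", "WASP"),
    ("HULK", "THOR"),      # does not involve Iron Man
]

partners = distinct_partners(rows, "IRON MAN")
print(sorted(partners))  # ['HULK', 'THOR', 'WASP']
print(len(partners))     # 3, the node's degree in the undirected graph
```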

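Since we are interested in an undirected graph, we ignore the order of hero1 and hero2 for the moment and only drop exact duplicate rows; reversed pairs are merged when the graph is built.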

In [217]:
heroes = heroes.drop_duplicates()

#Generate graph

In [218]:
#Generate Undirected Graph structure

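# from_pandas_edgelist builds an nx.Graph by default, which is already
# undirected and unweighted: rows (A, B) and (B, A) collapse into one edge
graph = nx.from_pandas_edgelist(heroes, source='hero1', target='hero2')

print(nx.info(graph))  # nx.info was removed in NetworkX 3.0; use print(graph) there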

Name: 
Type: Graph
Number of nodes: 6421
Number of edges: 167112
Average degree:  52.0517


The density of a simple graph is defined as the ratio of the number of edges $|E|$ to the maximum possible number of edges.
For undirected simple graphs, the density is:

$D = \frac{|E|}{\binom{|V|}{2}} = \frac{2|E|}{|V|(|V|-1)}$

where $|E|$ is the number of edges and $|V|$ is the number of vertices in the graph.

The maximum number of edges for an undirected simple graph is $\binom{|V|}{2} = \frac{|V|(|V|-1)}{2}$, so the maximal density is 1 (for complete graphs) and the minimal density is 0 (for graphs with no edges).

In [219]:
def graph_density(n_vertex, n_edges):
  return (2*n_edges / (n_vertex * (n_vertex - 1)) )
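A quick sanity check of the formula on graphs whose density is known in advance. The guard for fewer than two vertices is an addition not present in the cell above (as written, `graph_density(1, 0)` raises `ZeroDivisionError`); returning 0 there is an assumed convention, similar to the one used by NetworkX's `nx.density`.

```python
from itertools import combinations


def graph_density(n_vertex, n_edges):
    """Density of a simple undirected graph: 2|E| / (|V| (|V| - 1))."""
    if n_vertex < 2:
        return 0.0  # no edge is possible, so avoid dividing by zero
    return 2 * n_edges / (n_vertex * (n_vertex - 1))


# Complete graph K5: all C(5, 2) = 10 possible edges are present
k5_edges = list(combinations(range(5), 2))
print(graph_density(5, len(k5_edges)))  # 1.0

# Path 0-1-2-3-4: 4 of the 10 possible edges
print(graph_density(5, 4))  # 0.4

# Same counts as the Marvel graph above
print(graph_density(6421, 167112))  # 0.008107742265085212
```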

In [220]:
density = graph_density(graph.number_of_nodes(), graph.number_of_edges())
print(density)

0.008107742265085212
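Plugging the node and edge counts reported by `nx.info` into the density formula reproduces this value. It also shows how density relates to the reported average degree $\bar{k} = 2|E|/|V|$:

$$
D = \frac{2 \cdot 167112}{6421 \cdot 6420} = \frac{334224}{41222820} \approx 0.00811,
\qquad
D = \frac{\bar{k}}{|V| - 1} = \frac{52.0517}{6420} \approx 0.00811
$$

In other words, a typical hero shares a comic with less than 1% of the other heroes in the network.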


In [221]:
#Check graph generation

# #Previously seen:
# #Number of times IronMan has been involved in his or other heroes' comics
# ironman_merge = pd.merge(ironman_h1, ironman_h2, how='outer', left_on='hero2', right_on='hero1')
# ironman_merge.shape[0] 
# #Out: 1521

len(list(graph.edges('IRON MAN/TONY STARK')))

1521

In [222]:
graph_nodes = graph.nodes()
graph_nodes

NodeView(('LITTLE, ABNER', 'PRINCESS ZANDA', "BLACK PANTHER/T'CHAL", 'STEELE, SIMON/WOLFGA', 'FORTUNE, DOMINIC', 'ERWIN, CLYTEMNESTRA', 'IRON MAN/TONY STARK', 'IRON MAN IV/JAMES R.', 'RAVEN, SABBATH II/EL', 'CARNIVORE/COUNT ANDR', 'GHOST', 'ZIMMER, ABE', 'FU MANCHU', 'TARR, BLACK JACK', 'SMITH, SIR DENIS NAY', 'SHANG-CHI', 'STARSHINE II/BRANDY', 'ROM, SPACEKNIGHT', 'MAN-THING/THEODORE T', 'WU, LEIKO', 'DOCTOR DREDD', 'RESTON, CLIVE', 'JACKSON, STEVE', 'MYSTIQUE/RAVEN DARKH', 'BLOB/FRED J. DUKES', 'TORPEDO III/BROCK JO', 'PYRO/ALLERDYCE JOHNN', 'AVALANCHE/DOMINIC PE', 'ROGUE /', 'DESTINY II/IRENE ADL', 'HYBRID/JAMES JIMMY M', 'CLARK, SARAH', 'KILLBURN, MACK', 'JONES, TAMMY ANNE', 'JONES, LORRAINE LORR', 'JONES, NELL', 'JONES, DANIEL DANNY', 'CLARK, JOHN', 'SUB-MARINER/NAMOR MA', 'VASHTI', 'SEAWEED MAN', 'NOVA/RICHARD RIDER', 'FIRESTAR/ANGELICA JO', 'THUNDERBALL/DR. ELIO', 'NAMORITA/NITA PRENTI', 'SPEEDBALL/ROBBIE BAL', 'HULK/DR. ROBERT BRUC', 'NIGHT THRASHER/DUANE', 'SPIDER-MAN/PETER PA

In [223]:
graph_edges = graph.edges()
graph_edges

EdgeView([('LITTLE, ABNER', 'PRINCESS ZANDA'), ('LITTLE, ABNER', "BLACK PANTHER/T'CHAL"), ('LITTLE, ABNER', 'CARNIVORE/COUNT ANDR'), ('LITTLE, ABNER', 'IRON MAN/TONY STARK'), ('LITTLE, ABNER', 'GOLDEN-BLADE'), ('LITTLE, ABNER', 'DIXON, GENERAL'), ('LITTLE, ABNER', 'IRON MAN IV/JAMES R.'), ('LITTLE, ABNER', 'JOCASTA'), ('LITTLE, ABNER', 'FUJIKAWA, RUMIKO'), ('LITTLE, ABNER', 'MADAME MENACE/SUNSET'), ('LITTLE, ABNER', 'JACOBS, GLENDA'), ('LITTLE, ABNER', 'WAR MACHINE II/PARNE'), ('LITTLE, ABNER', 'SAPPER'), ('LITTLE, ABNER', 'HOGAN, VIRGINIA PEPP'), ('LITTLE, ABNER', 'BINARY/CAROL DANVERS'), ('LITTLE, ABNER', 'FIN FANG FOOM/MIDGAR'), ('LITTLE, ABNER', 'MANN, DR. J. VERNON'), ('LITTLE, ABNER', 'THOR/DR. DONALD BLAK'), ('LITTLE, ABNER', 'TEMPEST II/NICOLETTE'), ('LITTLE, ABNER', 'JARVIS, EDWIN'), ('LITTLE, ABNER', 'INFERNO III/SAMANTHA'), ('LITTLE, ABNER', 'DECAY II/YOSHIRO HAC'), ('PRINCESS ZANDA', "BLACK PANTHER/T'CHAL"), ('PRINCESS ZANDA', 'CARNIVORE/COUNT ANDR'), ('PRINCESS ZANDA', 'MA

#Generate subgraph

In [224]:
sort_degrees = sorted(graph.degree, key=lambda node_degree: node_degree[1], reverse=True)
print(sort_degrees)

[('CAPTAIN AMERICA', 1905), ('SPIDER-MAN/PETER PAR', 1737), ('IRON MAN/TONY STARK', 1521), ('THING/BENJAMIN J. GR', 1416), ('MR. FANTASTIC/REED R', 1377), ('WOLVERINE/LOGAN', 1368), ('HUMAN TORCH/JOHNNY S', 1361), ('SCARLET WITCH/WANDA', 1322), ('THOR/DR. DONALD BLAK', 1289), ('BEAST/HENRY &HANK& P', 1265), ('VISION', 1238), ('INVISIBLE WOMAN/SUE', 1236), ('HAWK', 1175), ('WASP/JANET VAN DYNE', 1091), ('ANT-MAN/DR. HENRY J.', 1082), ('CYCLOPS/SCOTT SUMMER', 1078), ('SHE-HULK/JENNIFER WA', 1071), ('STORM/ORORO MUNROE S', 1070), ('ANGEL/WARREN KENNETH', 1070), ('DR. STRANGE/STEPHEN', 1065), ('HULK/DR. ROBERT BRUC', 1055), ('PROFESSOR X/CHARLES', 1032), ('WONDER MAN/SIMON WIL', 1031), ('COLOSSUS II/PETER RA', 1023), ('MARVEL GIRL/JEAN GRE', 1003), ('HERCULES [GREEK GOD]', 989), ('JARVIS, EDWIN', 986), ('SUB-MARINER/NAMOR MA', 979), ('DAREDEVIL/MATT MURDO', 967), ('ICEMAN/ROBERT BOBBY', 944), ('FURY, COL. NICHOLAS', 922), ('JAMESON, J. JONAH', 920), ('BLACK WIDOW/NATASHA', 920), ('QUICKSIL
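Sorting all 6421 degrees is cheap at this size, but when only the top k nodes are needed, `heapq.nlargest` selects them without a full sort and orders ties the same way as `sorted(..., reverse=True)`. The plain-Python sketch below also builds the induced subgraph, keeping only edges whose two endpoints are both selected, which is what `graph.subgraph` computes in NetworkX. The adjacency dict is a hypothetical toy graph, and the helper names are illustrative.

```python
import heapq


def top_k_by_degree(adjacency, k):
    """Return the k nodes with the highest degree (ties keep input order)."""
    return heapq.nlargest(k, adjacency, key=lambda node: len(adjacency[node]))


def induced_edges(adjacency, nodes):
    """Edges of the subgraph induced by `nodes`: both endpoints must be kept."""
    keep = set(nodes)
    edges = set()
    for u in keep:
        for v in adjacency[u]:
            if v in keep:
                edges.add(frozenset((u, v)))  # unordered, so u-v equals v-u
    return edges


# Hypothetical undirected toy graph stored as a symmetric adjacency dict
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
    "E": {"F"},
    "F": {"E"},
}

top = top_k_by_degree(adjacency, 3)
print(top)                                 # ['A', 'C', 'B']
print(len(induced_edges(adjacency, top)))  # 3: A-B, A-C, B-C
```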

In [241]:
#Create subgraph with k greatest degree nodes

#Get top k greatest degree nodes
k = 100
greatest_kdeg = [node for node, degree in sort_degrees[:k]]

#Generate subgraph
subgraph = graph.subgraph(greatest_kdeg)
print(nx.info(subgraph))

Name: 
Type: Graph
Number of nodes: 100
Number of edges: 4327
Average degree:  86.5400
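Applying the same density formula to this subgraph, using the counts printed above, shows how much more tightly connected the 100 highest-degree heroes are than the network as a whole:

$$
D_{\text{sub}} = \frac{2 \cdot 4327}{100 \cdot 99} = \frac{8654}{9900} \approx 0.874
$$

compared with $D \approx 0.0081$ for the full graph: on average, each of these heroes co-appears with about 86.5 of the 99 others.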


In [244]:
# net = Network(notebook=True)
# net.from_nx(subgraph)
# net.repulsion(node_distance=500)
# net.inherit_edge_colors(True)
# net.save_graph('marvel-network-subgraph.html')
# IPython.display.HTML(filename='marvel-network-subgraph.html')

In [237]:
# plt.figure(figsize=(10,10))
# nx.draw_networkx(subgraph)
# plt.show()

#Girvan-Newman