In [211]:
import networkx as nx
import pandas as pd
import numpy as np

**Data Cleaning**

In [212]:
# Import data.
edges = pd.read_csv('dataset/archive/edges.csv')
hero_net = pd.read_csv('dataset/archive/hero-network.csv')
nodes = pd.read_csv('dataset/archive/nodes.csv')

# Strip trailing whitespace, then a trailing '/', from hero names.
hero_net['hero1'] = hero_net['hero1'].apply(lambda row: row.rstrip().rstrip('/'))
hero_net['hero2'] = hero_net['hero2'].apply(lambda row: row.rstrip().rstrip('/'))

# Truncate names to at most 20 characters.
edges['hero'] = edges['hero'].apply(lambda row: row[:20])
nodes['node'] = nodes['node'].apply(lambda row: row[:20])

# Remove rows where both heroes are the same (self-loops).
hero_net.drop(hero_net[hero_net.hero1 == hero_net.hero2].index, inplace=True)
hero_net = hero_net.reset_index(drop=True)
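A minimal sketch of the cleaning steps on a few hypothetical rows (the names below are invented for illustration, not taken from the dataset). It uses pandas' vectorized `.str` methods as an alternative to `apply` with a lambda; for columns that contain only strings the result should be the same.

```python
import pandas as pd

# Hypothetical rows in the shape of hero-network.csv; names are made up.
hero_net = pd.DataFrame({
    'hero1': ['IRON MAN/TONY STARK ', 'THOR/', 'HULK'],
    'hero2': ['THOR', 'THOR', 'HULK/ '],
})

# Strip trailing whitespace, then a trailing '/', in both name columns.
for col in ['hero1', 'hero2']:
    hero_net[col] = hero_net[col].str.rstrip().str.rstrip('/')

# Keep only rows whose two heroes differ (drops self-loops) and renumber.
hero_net = hero_net[hero_net['hero1'] != hero_net['hero2']].reset_index(drop=True)
print(hero_net)

# Truncation to at most 20 characters, vectorized with .str slicing.
names = pd.Series(['SPIDER-MAN/PETER PARKER', 'THOR'])
truncated = names.str[:20].tolist()
print(truncated)
```

Here the rows `THOR/`-`THOR` and `HULK`-`HULK/ ` only become self-loops after stripping, which is why the stripping runs before the self-loop filter.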

**First Graph**

This function uses the `hero_net` dataframe (built from hero-network.csv).

The number of *nodes* equals the number of unique heroes in the dataset, and *edges* are weighted with $w_{AB} = \frac{1}{n_{AB}}$, where $n_{AB}$ is the number of edges between node A and node B in the dataset (in either order) and $w_{AB}$ is the weight of the single edge that replaces them in the graph.

The resulting graph is `undirected` and `weighted`, with no self-loops or multiple edges.
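To make the weight formula concrete, here is a small worked example on hypothetical rows where heroes A and B co-occur three times (in both column orders) and A and C once, so $w_{AB} = 1/3$ and $w_{AC} = 1$. Sorting the two names inside each row before grouping is one way to count unordered pairs; the toy data is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: (A, B) appears three times in mixed order, (A, C) once.
rows = pd.DataFrame({'hero1': ['A', 'B', 'A', 'A'],
                     'hero2': ['B', 'A', 'B', 'C']})

# Sort the names within each row so (A, B) and (B, A) become the same pair.
pairs = pd.DataFrame(np.sort(rows[['hero1', 'hero2']].to_numpy(), axis=1),
                     columns=['hero1', 'hero2'])

# n_AB = number of rows per unordered pair; w_AB = 1 / n_AB.
n = pairs.groupby(['hero1', 'hero2']).size()
weights = (1 / n).to_dict()
print(weights)
```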

In [203]:
def first_graph(dataset):

    # Sort the two names within each row so that (A, B) and (B, A)
    # are treated as the same undirected pair.
    pairs = pd.DataFrame(np.sort(dataset[['hero1', 'hero2']].values, axis=1),
                         columns=['hero1', 'hero2'])

    # Count the rows of each unordered pair (n_AB), keeping one row per pair.
    # Grouping on both columns keeps every count aligned with its own pair.
    weighted = pairs.groupby(['hero1', 'hero2']).size().reset_index(name='n_ab')

    # Edge weight w_AB = 1 / n_AB, rounded to 5 decimals.
    weighted['weight'] = (1 / weighted['n_ab']).round(5)

    # Generate the undirected, weighted graph (one edge per unique pair).
    graph = nx.from_pandas_edgelist(weighted, 'hero1', 'hero2', edge_attr='weight')

    return graph
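As a quick check of the properties stated for the first graph, the sketch below builds a graph from a small hypothetical, already-deduplicated weighted edge list and inspects it. It relies on `nx.from_pandas_edgelist` defaulting to `nx.Graph`, which is undirected and allows at most one edge per node pair.

```python
import networkx as nx
import pandas as pd

# Hypothetical deduplicated pairs with weights w_AB = 1 / n_AB.
weighted = pd.DataFrame({'hero1': ['A', 'A'],
                         'hero2': ['B', 'C'],
                         'weight': [1 / 3, 1.0]})

# Default create_using is nx.Graph: undirected, no parallel edges.
graph = nx.from_pandas_edgelist(weighted, 'hero1', 'hero2', edge_attr='weight')

print(graph.is_directed(), graph.is_multigraph())
print(nx.number_of_selfloops(graph))
# Undirected: the edge can be looked up from either endpoint.
print(graph['B']['A']['weight'])
```

Because `nx.Graph` keeps at most one edge per pair, a duplicated pair in the input would not raise an error: the later row's attributes would overwrite the earlier one's. That is why the weights are aggregated per pair before the graph is built.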

**Second Graph**

This function only needs the `edges` dataframe (hero-comic pairs). The result is an undirected, unweighted graph that links each hero to the comics they appear in.

In [None]:
def second_graph(dataset):
    graph = nx.from_pandas_edgelist(dataset, 'hero', 'comic')
    return graph
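Since the second graph links heroes to comics, it should be bipartite. Below is a minimal sketch on hypothetical rows shaped like `edges` (columns `hero` and `comic`; all names are invented). Tagging nodes with a `type` attribute is an assumption added here, loosely mirroring a hero/comic node-type column such as the one assumed in nodes.csv; `second_graph` itself does not do this.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite

# Hypothetical hero-comic rows in the shape of edges.csv.
edges = pd.DataFrame({'hero': ['HULK', 'THOR', 'THOR'],
                      'comic': ['COMIC 1', 'COMIC 1', 'COMIC 2']})

graph = nx.from_pandas_edgelist(edges, 'hero', 'comic')

# Assumption: label each node with its side of the bipartition.
nx.set_node_attributes(graph, {h: 'hero' for h in edges['hero']}, 'type')
nx.set_node_attributes(graph, {c: 'comic' for c in edges['comic']}, 'type')

print(bipartite.is_bipartite(graph))
# Degree of a hero node = number of distinct comics it appears in.
print(graph.degree('THOR'))
```

This relies on no hero name coinciding with a comic name. If one did, `from_pandas_edgelist` would merge the two into a single node.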

---