# Network Visualization Notebook

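This notebook provides code to visualize citation networks. It expects two cleaned, standardized CSV files describing the connections between authors: a node file (`nodes.csv`, with `Id` and `label` columns) and an edge file (`edges.csv`, with `source` and `target` columns, where each row records that `source` cites `target`).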

Import the necessary modules:

In [None]:
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import plotly.express as px

Load in the node CSV file and quickly inspect it:

In [None]:
node_df = pd.read_csv("nodes.csv")
node_df.head()

Load in the edge CSV file and quickly inspect it:

In [None]:
edge_df = pd.read_csv("edges.csv")
edge_df.head()
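Before building the graph, it can help to confirm that both files have the columns the code below relies on. The following is a minimal sketch that assumes the node file has `Id` and `label` columns and the edge file has `source` and `target` columns (the names used in the graph-building code). The small inline CSVs stand in for `nodes.csv` and `edges.csv`, and `check_inputs` is a hypothetical helper name, not part of the original notebook.

```python
import io

import pandas as pd

# Small stand-ins for nodes.csv and edges.csv (illustrative data only)
nodes_csv = io.StringIO("Id,label\n1,Author A\n2,Author B\n3,Author C\n")
edges_csv = io.StringIO("source,target\n1,2\n3,2\n1,4\n")

node_df = pd.read_csv(nodes_csv)
edge_df = pd.read_csv(edges_csv)


def check_inputs(node_df, edge_df):
    """Hypothetical helper: verify the required columns exist and return
    edge endpoints that have no matching row in the node table."""
    missing_node_cols = {"Id", "label"} - set(node_df.columns)
    missing_edge_cols = {"source", "target"} - set(edge_df.columns)
    if missing_node_cols or missing_edge_cols:
        raise ValueError(
            f"missing columns: nodes={missing_node_cols}, edges={missing_edge_cols}"
        )
    endpoints = set(edge_df["source"]) | set(edge_df["target"])
    return endpoints - set(node_df["Id"])


unknown = check_inputs(node_df, edge_df)
print(sorted(int(x) for x in unknown))  # [4]: node 4 appears only in the edge file
```

Endpoints missing from the node file are still added to the graph by `G.add_edge`, but they will have no entry in the label dictionary built later.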

The following code adds the nodes, then the edges, to a directed graph. It then removes isolated nodes (authors with no incoming or outgoing connections) from a working copy, `G_copy`, leaving the original graph `G` intact.

In [None]:
G = nx.DiGraph()

# add each author as a node, keeping its display label as a node attribute
for row in node_df.itertuples(index=False):
    G.add_node(row.Id, label=row.label)

# add a directed edge for each citation (source cites target)
for row in edge_df.itertuples(index=False):
    G.add_edge(row.source, row.target)

# remove isolated nodes (no incoming or outgoing edges) from a working copy
G_copy = G.copy()
isolate_nodes = list(nx.isolates(G_copy))
G_copy.remove_nodes_from(isolate_nodes)
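The isolate-removal step can be checked on a toy graph. In this sketch (node names are illustrative), node `"d"` has no edges, so `nx.isolates` yields only that node; wrapping it in `list` materializes the generator before any nodes are removed.

```python
import networkx as nx

# Toy citation graph: "a" cites "b" and "c"; "d" has no edges at all
G = nx.DiGraph()
G.add_nodes_from(["a", "b", "c", "d"])
G.add_edges_from([("a", "b"), ("a", "c")])

# For a directed graph, an isolate has no in-neighbors and no out-neighbors
isolated = list(nx.isolates(G))
print(isolated)  # ['d']

G_copy = G.copy()
G_copy.remove_nodes_from(isolated)
print(sorted(G_copy.nodes()))  # ['a', 'b', 'c']
```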

You can define a threshold to filter authors by their out-degree (the number of other authors they cite). This block sets that threshold and removes every node whose out-degree in `G_copy` falls below it.

In [None]:
# filter authors based on how many others they cite (out-degree)
threshold = 4

# build the full list first, then remove, so the graph is not mutated mid-iteration
nodes_to_remove = [node for node, out_deg in G_copy.out_degree() if out_deg < threshold]
G_copy.remove_nodes_from(nodes_to_remove)
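Because the filter looks only at out-degree, it also drops authors who are heavily cited but cite fewer than `threshold` others, and it can leave behind nodes that become isolated once their neighbors are removed. The toy sketch below (hypothetical node names, and a threshold of 2 rather than 4 so the small graph keeps some nodes) shows both effects and one optional way to clean up the new isolates.

```python
import networkx as nx

# Toy graph (hypothetical authors); an edge u -> v means "u cites v"
G = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "a"), ("b", "c"),
                ("x", "c"), ("x", "e"), ("e", "f")])
print(G.in_degree("c"))  # 3: "c" is the most-cited author

threshold = 2  # smaller than the notebook's 4 so the toy graph keeps some nodes
nodes_to_remove = [n for n, d in G.out_degree() if d < threshold]
G.remove_nodes_from(nodes_to_remove)
print(sorted(nodes_to_remove))  # ['c', 'e', 'f']: "c" is dropped despite being cited most

# "x" only cited authors that were removed, so it is now isolated
new_isolates = list(nx.isolates(G))
print(new_isolates)  # ['x']

# Optional clean-up pass after filtering
G.remove_nodes_from(new_isolates)
print(sorted(G.nodes()))  # ['a', 'b']
```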

This block builds the label dictionary used for drawing, keeping only the nodes that remain after filtering.

In [None]:
valid_nodes = set(G_copy.nodes())
valid_node_df = node_df.loc[node_df['Id'].isin(valid_nodes)]
# map node Id -> display label for the remaining nodes only
n_labels = dict(zip(valid_node_df['Id'], valid_node_df['label']))
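If each label is stored as a node attribute when the node is added (for example `G.add_node(row.Id, label=row.label)`), the labels can instead be read straight from the filtered graph with `nx.get_node_attributes`, so they stay in sync with whichever nodes remain. A minimal sketch on an illustrative node table that uses the notebook's column names:

```python
import networkx as nx
import pandas as pd

# Illustrative node table using the notebook's column names
node_df = pd.DataFrame({"Id": [1, 2, 3], "label": ["Author A", "Author B", "Author C"]})

G = nx.DiGraph()
for row in node_df.itertuples(index=False):
    # store the display label on the node itself
    G.add_node(row.Id, label=row.label)
G.add_edges_from([(1, 2), (3, 2)])

G_copy = G.copy()  # copies node attributes too
G_copy.remove_node(3)  # stand-in for the filtering steps

# Only nodes still present in G_copy appear in the result
n_labels = nx.get_node_attributes(G_copy, "label")
print(n_labels)  # {1: 'Author A', 2: 'Author B'}
```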

We also scale each node's size by its degree (in-degree plus out-degree, since the graph is directed). The scaling factor is chosen so that a node with the mean degree gets a size of 300, NetworkX's default node size.

In [None]:
# generate scaled sizes for nodes
degrees = dict(nx.degree(G_copy))
scaling_factor = 300/np.mean(list(degrees.values())) # 300 is default size for nodes
node_sizes = [d*scaling_factor for d in degrees.values()]
max_size = max(node_sizes)
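A quick check of the scaling on hypothetical degree values: with degrees 2, 4 and 6 the mean is 4, so the scaling factor is 300 / 4 = 75 and the sizes come out as 150, 300 and 450. The mean-degree node receives exactly the default size.

```python
import numpy as np

# Hypothetical node -> total degree mapping (in-degree + out-degree)
degrees = {"a": 2, "b": 4, "c": 6}

scaling_factor = 300 / float(np.mean(list(degrees.values())))  # 300 / 4 = 75.0
node_sizes = [d * scaling_factor for d in degrees.values()]
print(node_sizes)  # [150.0, 300.0, 450.0]
```

The size list is positional, so it relies on `dict(G_copy.degree())` iterating nodes in the same order as `G_copy.nodes()`, which is the order `nx.draw` uses when no `nodelist` is given.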

This code draws the filtered network with the Kamada-Kawai layout, using the labels and the degree-based sizes and colors defined above, and saves the figure as graph.png in the working directory.

In [None]:
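plt.figure(figsize=(100, 60))  # figsize is in inches; a large canvas leaves room for many labels
nx.draw(
    G_copy,
    # alternative layout: pos=nx.spring_layout(G_copy, k=0.05, iterations=20),
    pos=nx.kamada_kawai_layout(G_copy),  # requires SciPy
    with_labels=True,
    labels=n_labels,
    node_size=node_sizes,   # size encodes degree
    node_color=node_sizes,  # color also encodes degree via the colormap
    cmap=plt.cm.RdYlGn,
    vmin=0,
    vmax=max_size,
    alpha=0.7,
    font_color='blue',
    font_size=18,
    edge_color='black',
)
plt.savefig("graph.png")  # save before plt.show(); the figure may be closed afterwards
plt.show()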
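Layouts can be slow on large graphs, and the commented-out `spring_layout` alternative is random unless seeded. A minimal sketch on a hypothetical toy graph that computes the positions once with a fixed `seed` (so repeated runs give the same picture), draws onto an explicit axes, and saves to a temporary file:

```python
import os
import tempfile

import matplotlib.pyplot as plt
import networkx as nx

G = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")])

# Compute the layout once; the seed makes spring_layout reproducible
pos = nx.spring_layout(G, k=0.05, iterations=20, seed=42)

fig, ax = plt.subplots(figsize=(6, 4))
nx.draw(G, pos=pos, ax=ax, with_labels=True, node_color="lightgray",
        edge_color="black", font_size=12)

out_path = os.path.join(tempfile.mkdtemp(), "graph.png")
fig.savefig(out_path, bbox_inches="tight")
plt.close(fig)  # free the figure once it is saved
print(os.path.exists(out_path))  # True
```

Reusing the same `pos` dictionary also lets several drawings of one network (for example before and after filtering) place shared nodes in the same spots.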

### Other Metrics

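We can also compare authors by their in-degree and out-degree. In this network, an author's in-degree is the number of other authors who cite them, and their out-degree is the number of other authors they cite. Because the graph is a `DiGraph`, repeated citations between the same pair of authors collapse into a single edge, so the "Citations" counts below are counts of distinct authors, computed on the filtered graph `G_copy`.

This code builds a DataFrame of authors and their in-degree values.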

In [None]:
nodes_in_degree = dict(G_copy.in_degree())
sorted_nodes_in = sorted(nodes_in_degree.items(), key=lambda x: x[1], reverse=True)
# 'Id' holds the node identifier, 'Citations' its in-degree
authors_in_df = pd.DataFrame.from_records(sorted_nodes_in, columns=['Id', 'Citations'])
authors_in_df['Author'] = authors_in_df['Id'].map(n_labels)
authors_in_df.head()
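Because `G` is a `DiGraph`, a second `add_edge` call for the same pair does not create a new edge, so the counts above are numbers of distinct citing authors. If the edge file lists the same author pair more than once and total citations are wanted instead, a `MultiDiGraph` keeps every row as its own edge. The sketch below contrasts the two on hypothetical edges.

```python
import networkx as nx

# Hypothetical edge rows: "a" cites "c" twice, "b" cites "c" once
edges = [("a", "c"), ("a", "c"), ("b", "c")]

simple = nx.DiGraph(edges)      # duplicate (a, c) collapses into one edge
multi = nx.MultiDiGraph(edges)  # each row becomes its own parallel edge

print(simple.in_degree("c"))  # 2 distinct citing authors
print(multi.in_degree("c"))   # 3 citation rows
```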

We visualize the 25 most-cited authors (highest in-degree) with the following bar chart:

In [None]:
in_df = authors_in_df.nlargest(25, 'Citations')
fig = px.bar(in_df, x='Author', y='Citations', title="Top 25 Most-Cited Authors")
fig.show(renderer='svg')  # static SVG output; needs the kaleido package installed
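`nlargest` breaks ties at the cut-off by keeping the first rows it encounters (`keep='first'`, the default), so authors tied at rank 25 may be left off the chart. Passing `keep='all'` keeps every tied row, at the cost of possibly returning more than the requested number. A small illustration with made-up values and n = 2:

```python
import pandas as pd

# Made-up counts: "B" and "C" are tied for second place
df = pd.DataFrame({"Author": ["A", "B", "C"], "Citations": [5, 3, 3]})

top_first = df.nlargest(2, "Citations")            # default keep="first"
top_all = df.nlargest(2, "Citations", keep="all")  # keeps every tied row

print(top_first["Author"].tolist())  # ['A', 'B']
print(top_all["Author"].tolist())    # ['A', 'B', 'C']
```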

We repeat the process for out-degree, that is, the number of other authors each author cites:

In [None]:
nodes_out_degree = dict(G_copy.out_degree())
sorted_nodes_out = sorted(nodes_out_degree.items(), key=lambda x: x[1], reverse=True)
# 'Id' holds the node identifier, 'Citations' its out-degree
authors_out_df = pd.DataFrame.from_records(sorted_nodes_out, columns=['Id', 'Citations'])
authors_out_df['Author'] = authors_out_df['Id'].map(n_labels)
authors_out_df.head()
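The in-degree and out-degree cells differ only in which degree view they call. A hypothetical helper such as `degree_table` below (not part of the original notebook) builds either table from a graph and a label dictionary, which avoids keeping two copies of the same code in sync. The toy graph and labels are illustrative.

```python
import networkx as nx
import pandas as pd


def degree_table(graph, labels, direction="in"):
    """Hypothetical helper: return a DataFrame of node Id, degree and
    author label, sorted from highest to lowest degree."""
    view = graph.in_degree() if direction == "in" else graph.out_degree()
    df = pd.DataFrame(list(view), columns=["Id", "Citations"])
    df["Author"] = df["Id"].map(labels)
    return df.sort_values("Citations", ascending=False, ignore_index=True)


# Toy graph: an edge u -> v means "u cites v"
G = nx.DiGraph([(1, 2), (3, 2), (1, 3)])
labels = {1: "Author A", 2: "Author B", 3: "Author C"}

print(degree_table(G, labels, "in")["Author"].tolist())   # ['Author B', 'Author C', 'Author A']
print(degree_table(G, labels, "out")["Author"].tolist())  # ['Author A', 'Author C', 'Author B']
```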

In [None]:
out_df = authors_out_df.nlargest(25, 'Citations')
fig = px.bar(out_df, x='Author', y='Citations', title="Top 25 Authors Citing Others")
fig.show(renderer='svg')

The following cell displays the names of any authors that appear in both top-25 lists (in-degree and out-degree).

In [None]:
authors_out = set(out_df['Author'])
authors_in = set(in_df['Author'])
authors_out.intersection(authors_in)
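The set intersection gives only names. To also see each shared author's counts in both lists, the two top-25 frames can be joined on `Author`: `pd.merge` with an inner join keeps only authors present in both. This is sketched here with hypothetical stand-ins for `in_df` and `out_df`.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's in_df and out_df
in_df = pd.DataFrame({"Author": ["A", "B", "C"], "Citations": [9, 7, 5]})
out_df = pd.DataFrame({"Author": ["C", "D", "A"], "Citations": [8, 6, 4]})

# Inner join keeps authors that appear in both top lists;
# suffixes distinguish the two "Citations" columns
both = pd.merge(in_df, out_df, on="Author", how="inner", suffixes=("_in", "_out"))
both = both.sort_values("Author", ignore_index=True)
print(both)
```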