# Graphing Revenge: A Network Analysis of ‘The Count of Monte Cristo’

## Table of Contents

<ol>
    <li><b>Preliminary information about data</b>
        <ul>
            <li>Scope of the project</li>
            <li>Dependencies</li>
        </ul>
    </li>
    <li><b>Data extraction and data cleaning</b>
        <ul>
            <li>Import novel sections</li>
            <li>Tokenisation and characters extraction</li>
        </ul>
    </li>
    <li><b>Construction of a network based on co-occurrence</b></li>
    <li><b>Defining the Narrative Window</b>
        <ul>
            <li>Collinear window strategy</li>
            <li>Coplanar co-occurrence window strategy</li>
        </ul>
    </li>
    <li><b>Results</b></li>
</ol>

# 1. Preliminary information about data

## Scope of the project

This study applies network analysis to Alexandre Dumas’ novel <b>‘The Count of Monte Cristo’</b>, published in serialized format between 1844 and 1846. The analysis detects character interactions based on co-occurrence and uses them to construct a social network for each of the five volumes of the novel, following the division of the version provided by <a href="https://www.gutenberg.org/ebooks/1184" target="_blank">Project Gutenberg</a>.
The network analysis aims to explore the novel's core themes of <b>revenge and redemption</b> by observing how they are reflected in the character networks. To this end, it aims to show how the social network of Dumas’ novel favors Dantès’ plan of revenge, identifying the most relevant characters and tracing how their centrality evolves in relation to their relationship with Edmond throughout the novel. The related article is available in the dedicated <a href="" target="_blank">GitHub Repository</a>.

## Dependencies

In [7]:
!pip install nameparser
!pip install nltk
!pip install mplcursors
!pip install watermark

Collecting watermark
  Downloading watermark-2.4.3-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading watermark-2.4.3-py2.py3-none-any.whl (7.6 kB)
Installing collected packages: watermark
Successfully installed watermark-2.4.3


In [8]:
import re
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from statistics import mode, median, mean
import networkx as nx
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import mplcursors
import plotly.express as px
import ALL_regex_1_20, ALL_regex_21_40, ALL_regex_41_85_88_117, ALL_regex_86_87
import watermark

In [17]:
%watermark --iversions

networkx  : 2.8.4
nltk      : 3.7
mplcursors: 0.5.3
plotly    : 5.9.0
re        : 2.2.1
matplotlib: 3.7.0
watermark : 2.4.3
numpy     : 1.23.5
pandas    : 1.5.3



# 2. Data extraction and data cleaning
After the text file of the novel was downloaded from Project Gutenberg, it was divided into <b>nine sections</b> to allow a more precise identification of character names, each of which is then replaced with a unique name using regular expressions.

## Import novel sections

In [None]:
# VOLUME 1
with open("/Users/alessandrafailla/Desktop/Network_analysis_test/montecristo_chapter_division2/ch1_20_countmontecristo.txt") as f:
    montecristo_1_20=f.read()
    
with open("/Users/alessandrafailla/Desktop/Network_analysis_test/montecristo_chapter_division2/ch21_27_countmontecristo.txt") as f1:
    montecristo_21_27=f1.read()
    
# VOLUME 2    
with open("/Users/alessandrafailla/Desktop/Network_analysis_test/montecristo_chapter_division2/ch28_40_countmontecristo.txt") as f2:
    montecristo_28_40=f2.read()
    
with open("/Users/alessandrafailla/Desktop/Network_analysis_test/montecristo_chapter_division2/ch41_47_countmontecristo.txt") as f2a:
    montecristo_41_47=f2a.read()

# VOLUME 3 
with open("/Users/alessandrafailla/Desktop/Network_analysis_test/montecristo_chapter_division2/ch48_73_countmontecristo.txt") as f3:
    montecristo_48_73=f3.read()

# VOLUME 4
with open("/Users/alessandrafailla/Desktop/Network_analysis_test/montecristo_chapter_division2/ch74_85_countmontecristo.txt") as f4:
    montecristo_74_85=f4.read()
    
with open("/Users/alessandrafailla/Desktop/Network_analysis_test/montecristo_chapter_division2/ch86_87_countmontecristo.txt") as f4a:
    montecristo_86_87=f4a.read()
    
with open("/Users/alessandrafailla/Desktop/Network_analysis_test/montecristo_chapter_division2/ch88_95_countmontecristo.txt") as f4b:
    montecristo_88_95=f4b.read()
    
# VOLUME 5
with open("/Users/alessandrafailla/Desktop/Network_analysis_test/montecristo_chapter_division2/ch96_117_countmontecristo.txt") as f5:
    montecristo_96_117=f5.read()

Character names are searched for within each section and replaced with unique IDs. <b>Regular expressions</b> covering a total of 46 characters were developed, taking into consideration the possible variations of each name.
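
As an illustration of this substitution step, the sketch below applies a small pattern-to-ID dictionary to a sample sentence. The patterns and IDs are hypothetical and are not taken from the `ALL_regex_*` modules, whose contents are not shown in this notebook; only the shape of the dictionary (pattern as key, ID as value) follows the loop used in the next cell.

```python
import re

# Hypothetical patterns and IDs, for illustration only; the real ones live in
# the ALL_regex_* modules, which are not reproduced here.
regex_dict = {
    r"\b(?:Edmond\s+Dantès|Edmond|Dantès)\b": "EDMOND_DANTES",
    r"\b(?:M\.\s+)?Danglars\b": "DANGLARS",
}

sample = "Edmond Dantès greeted M. Danglars; Dantès then left."

# Apply every pattern in turn, as the notebook does for each novel section
for pattern, char_id in regex_dict.items():
    sample = re.sub(pattern, char_id, sample)

print(sample)
```

Listing the longest variant first inside each alternation lets a full name such as "Edmond Dantès" be consumed as a single mention rather than as two separate ones.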

In [None]:
for key, value in ALL_regex_1_20.regex_dict.items():
    montecristo_1_20 = re.sub(key, value, montecristo_1_20)
    
for key, value in ALL_regex_21_40.regex_dict.items():
    montecristo_21_27 = re.sub(key, value, montecristo_21_27)
    montecristo_28_40 = re.sub(key, value, montecristo_28_40)
    
for key, value in ALL_regex_41_85_88_117.regex_dict.items():
    montecristo_41_47 = re.sub(key, value, montecristo_41_47)
    montecristo_48_73 = re.sub(key, value, montecristo_48_73)
    montecristo_74_85 = re.sub(key, value, montecristo_74_85)
    montecristo_88_95 = re.sub(key, value, montecristo_88_95)
    montecristo_96_117 = re.sub(key, value, montecristo_96_117) 

for key, value in ALL_regex_86_87.regex_dict.items():
    montecristo_86_87 = re.sub(key, value, montecristo_86_87)

Each section, in which every character occurrence has now been replaced by its ID, is saved to disk.

In [None]:
file_path_1_20 = "/Users/alessandrafailla/Desktop/Network_analysis_test/definitive_sub_chapters/definitive_montecristo_1_20.txt"
file_path_21_27 = "/Users/alessandrafailla/Desktop/Network_analysis_test/definitive_sub_chapters/definitive_montecristo_21_27.txt"
file_path_28_40 = "/Users/alessandrafailla/Desktop/Network_analysis_test/definitive_sub_chapters/definitive_montecristo_28_40.txt"
file_path_41_47 = "/Users/alessandrafailla/Desktop/Network_analysis_test/definitive_sub_chapters/definitive_montecristo_41_47.txt"
file_path_48_73 = "/Users/alessandrafailla/Desktop/Network_analysis_test/definitive_sub_chapters/definitive_montecristo_48_73.txt"
file_path_74_85 = "/Users/alessandrafailla/Desktop/Network_analysis_test/definitive_sub_chapters/definitive_montecristo_74_85.txt"
file_path_86_87 = "/Users/alessandrafailla/Desktop/Network_analysis_test/definitive_sub_chapters/definitive_montecristo_86_87.txt"
file_path_88_95 = "/Users/alessandrafailla/Desktop/Network_analysis_test/definitive_sub_chapters/definitive_montecristo_88_95.txt"
file_path_96_117 = "/Users/alessandrafailla/Desktop/Network_analysis_test/definitive_sub_chapters/definitive_montecristo_96_117.txt"

with open(file_path_1_20, 'w') as file1:
    file1.write(montecristo_1_20)

with open(file_path_21_27, 'w') as file2:
    file2.write(montecristo_21_27)

with open(file_path_28_40, 'w') as file3:
    file3.write(montecristo_28_40)

with open(file_path_41_47, 'w') as file4:
    file4.write(montecristo_41_47)

with open(file_path_48_73, 'w') as file5:
    file5.write(montecristo_48_73)

with open(file_path_74_85, 'w') as file6:
    file6.write(montecristo_74_85)

with open(file_path_86_87, 'w') as file7:
    file7.write(montecristo_86_87)

with open(file_path_88_95, 'w') as file8:
    file8.write(montecristo_88_95)
    
with open(file_path_96_117, 'w') as file9:
    file9.write(montecristo_96_117)



## Tokenisation and characters extraction

### Full text tokenisation

The novel, currently in string format, is converted into a list of tokens using the **nltk** (Natural Language Toolkit) library. Sections are joined to obtain the full text version of the novel.

In [None]:
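# Lowercase the text, tokenise it with nltk and drop punctuation tokens
def wordTokens(text):
    wtokens = nltk.word_tokenize(text.lower())
    # `w not in <string>` is a substring test: any token that appears verbatim
    # inside the punctuation string below is discarded
    wtokens = [w for w in wtokens if w not in '!"#$%&\'()*+, -./:;<=>?@[\\]^_`{|}~”“’—‘']
    return wtokens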

count_montecristo_full_final = montecristo_1_20 + montecristo_21_27 + montecristo_28_40 + montecristo_41_47 + montecristo_48_73 + montecristo_74_85 + montecristo_86_87 + montecristo_88_95 + montecristo_96_117
tokens_count_montecristo = wordTokens(count_montecristo_full_final.lower())

### Novel volumes tokenisation

Sections are joined to provide the five volumes of the novel. Then, each volume is tokenised to obtain a list of tokens.

In [None]:
montecristo_section_1 = montecristo_1_20 + montecristo_21_27
montecristo_section_2 = montecristo_28_40 + montecristo_41_47
montecristo_section_3 = montecristo_48_73
montecristo_section_4 = montecristo_74_85 + montecristo_86_87 + montecristo_88_95
montecristo_section_5 = montecristo_96_117

In [None]:
tokens_montecristo_section_1 = wordTokens(montecristo_section_1.lower())
tokens_montecristo_section_2 = wordTokens(montecristo_section_2.lower())
tokens_montecristo_section_3 = wordTokens(montecristo_section_3.lower())
tokens_montecristo_section_4 = wordTokens(montecristo_section_4.lower())
tokens_montecristo_section_5 = wordTokens(montecristo_section_5.lower())

### Extract characters and character indices

A list and a dictionary of character names are built from the values of the regular expression dictionaries. The character dictionary has the lowercased names as keys and empty dictionaries as values. We also extract the indices at which each character is mentioned in the tokenised text; these are later used to find co-occurrences.
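
To make the idea of character indices concrete, here is a minimal sketch on a toy token list. It uses a vectorised NumPy lookup instead of the explicit loop in `position_dict` below, and the names `tokens`, `characters` and `positions` are illustrative assumptions rather than objects from this notebook.

```python
import numpy as np

# Toy token list; "dantes" and "mercedes" stand in for the real character IDs
tokens = ["dantes", "saw", "mercedes", "and", "dantes", "smiled"]
characters = ["dantes", "mercedes", "villefort"]

token_array = np.array(tokens)

# For each character, the positions of the tokens equal to its ID
positions = {name: np.flatnonzero(token_array == name) for name in characters}

for name, idx in positions.items():
    print(name, idx)
```

A character that is never mentioned ends up with an empty array, which is also what `position_dict` produces for characters absent from a given volume.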

In [None]:
# Collect the unique character IDs from the regex dictionaries (lowercased to match the tokens)

char_dict={}
char_list = []

for char1 in ALL_regex_1_20.regex_dict.values():
    char_dict[char1.lower()]={}

for char2 in ALL_regex_21_40.regex_dict.values():
    if char2.lower() not in char_dict:
        char_dict[char2.lower()]={}
        
for char3 in ALL_regex_41_85_88_117.regex_dict.values():
    if char3.lower() not in char_dict:
        char_dict[char3.lower()]={}

for char4 in ALL_regex_86_87.regex_dict.values():
    if char4.lower() not in char_dict:
        char_dict[char4.lower()]={}


for char in char_dict.keys():
    char_list.append(char)

In [None]:
# Returns a dictionary containing indices of character mentions within the tokenised text
def position_dict(chars, my_words):
    pos_dict = {}
    for i in chars.keys():
        k = []
        for ix, j in enumerate(my_words):
            if j == i.lower():
                k.append(ix)
        pos_dict[i] = np.array(k)
    return pos_dict

In [None]:
indices_dict = position_dict(char_dict, tokens_count_montecristo)
indices_dict_vol1 = position_dict(char_dict, tokens_montecristo_section_1)
indices_dict_vol2 = position_dict(char_dict, tokens_montecristo_section_2)
indices_dict_vol3 = position_dict(char_dict, tokens_montecristo_section_3)
indices_dict_vol4 = position_dict(char_dict, tokens_montecristo_section_4)
indices_dict_vol5 = position_dict(char_dict, tokens_montecristo_section_5)

# 3. Construction of network based on co-occurrence

The functions defined in this section build a character interaction dictionary based on the **co-occurrence** of character mentions in the tokenised novel. After discarding weakly attested pairs (those co-occurring fewer than 3 times in the collinear approach, and 3 times or fewer in the coplanar one), we create a list of tuples, each representing the interaction between two characters. We provide a method to merge the interactions retrieved through the collinear and coplanar methods, so that interactions detected only by the collinear approach are integrated into the coplanar results. Finally, we provide a **graph constructor** that builds a graph from the list of tuples representing interactions.
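
The co-occurrence count at the heart of `links_dic_f` relies on NumPy broadcasting: subtracting a row vector from a column vector yields every pairwise distance between the mentions of two characters. The sketch below isolates that step with made-up positions and a made-up window; only the broadcasting technique itself comes from the notebook.

```python
import numpy as np

# Hypothetical token positions of two characters in a tokenised text
positions_a = np.array([10, 50, 400])
positions_b = np.array([12, 45, 60, 900])
window = 10

# Column vector minus row vector gives all pairwise distances (3 x 4 matrix)
distances = np.abs(positions_a[:, np.newaxis] - positions_b)

# Every pair of mentions closer than the window counts as one co-occurrence
co_occurrences = int(np.sum(distances <= window))
print(distances.shape, co_occurrences)
```

Note that a single mention can take part in several pairs (here the mention at position 50 is within the window of both 45 and 60), so the resulting weight counts pairs of mentions rather than distinct passages of text.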

In [None]:
# Returns a dictionary containing links among characters, given a distance threshold (narrative window)
# Characters' IDs are used as keys, while nested dictionaries containing names of characters and
# number of interactions are the values. Only pairs co-occurring more than 3 times are kept,
# and characters left without any interaction are removed
def links_dic_f(indices_dic, threshold):
    link_dic = {}
    rem_set = set()
    for first_char, ind_arr1 in indices_dic.items():
        dic = {}
        for second_char, ind_arr2 in indices_dic.items():
            if first_char == second_char:
                continue
            matr = np.abs(ind_arr1[np.newaxis].T - ind_arr2) <= threshold
            s = np.sum(matr)
            if s > 3:
                dic[second_char] = s
        link_dic[first_char] = dic
    
    for key in link_dic:
        if link_dic[key] == {}:
            rem_set.add(key)

    for key in rem_set:
        del link_dic[key]

    return link_dic


# Returns a list of tuples that represent the connection between characters (the edges of the graph)
def edge_tuples_f(link_dic):
    edges_tuples = []
    for key in link_dic:
        for item, value in link_dic[key].items():
            tup = (key, item, value)
            edges_tuples.append(tup)
            
    return edges_tuples


# Adds nodes if they don't exist already, adds edge between characters and the number of interactions
def add_nodes_and_weighted_edges_from_list(graph, edges_list):
    for edge in edges_list:
        character1, character2, num_interactions = edge
        if character1 not in graph:
            graph.add_node(character1)
        if character2 not in graph:
            graph.add_node(character2)
        graph.add_edge(character1, character2, weight=num_interactions)
        graph.add_edge(character2, character1, weight=num_interactions)  # Add edge in both directions

    return graph

def update_combined_window(collinear_interactions_dict, coplanar_interact_dict):
    for key in collinear_interactions_dict.keys():
        if key not in coplanar_interact_dict:
            coplanar_interact_dict[key]=collinear_interactions_dict[key]
        else:
            for val in collinear_interactions_dict[key]:
                if val not in coplanar_interact_dict[key]:
                    coplanar_interact_dict[key][val] = collinear_interactions_dict[key][val]
                    
    return coplanar_interact_dict



# 4. Defining the Narrative Window

To perform a network analysis on the novel, we first need to define **criteria to identify an interaction** between two characters in the text. To do so, we need to identify a **narrative window**, that is, a number of words marking the boundaries within which one interaction can occur in the text.
We rely on two different approaches to define the narrative window, the **collinear and the co-planar approach**, which are then used in a complementary way to find an optimal solution.

## Collinear window strategy

Applying a **collinear approach** to find the optimal narrative window consists of searching for consecutive character mentions within the text. Thus, given one character mention, we record an interaction only with the next different character mentioned within the window.
We build dictionaries of collinear interactions for **narrative windows ranging from 10 to 500** tokens. We then compute the **edge density** of the graph obtained from each interaction dictionary and plot the resulting values, with edge density on the y axis and window size on the x axis. Finally, we take as narrative window the value on the x axis where the plot starts to flatten.
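
One way to make "where the plot starts to flatten" operational is to look for the first window size after which the density stays unchanged for a given number of consecutive steps. The helper below is a simplified sketch of that criterion; `first_plateau`, its parameters and the density values are hypothetical and chosen only for illustration.

```python
def first_plateau(windows, densities, flat_run):
    """Return the first window followed by `flat_run` unchanged density steps, else None."""
    run = 0
    for i in range(1, len(densities)):
        if densities[i] == densities[i - 1]:
            run += 1
            if run == flat_run:
                # Step back to the window at which the flat run started
                return windows[i - flat_run]
        else:
            run = 0
    return None


# Made-up density curve over window sizes 10..17
windows = list(range(10, 18))
densities = [0.10, 0.12, 0.15, 0.15, 0.15, 0.15, 0.16, 0.16]

plateau = first_plateau(windows, densities, flat_run=3)
print(plateau)
```

Returning `None` when no sufficiently long flat run exists makes the "no plateau in this range" case explicit, so the caller can decide whether to widen the range of windows or relax the required run length.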

In [None]:
def coll_appr(text, char_lst, window):
    dict_interazioni ={}
    for ix, i in enumerate(text):
        if i in char_lst:
            end = min(ix+window, len(text))
            for j in text[ix+1:end]:
                if j in char_lst and j != i:
                    if i not in dict_interazioni:
                        dict_interazioni[i] = {j: 1}
                        break
                    else:
                        if j not in dict_interazioni[i]:
                            dict_interazioni[i][j] = 1
                            break
                        else:
                            dict_interazioni[i][j] += 1
                            break

    # Remove interactions occurring less than 3 times
    interactions_to_remove = []
    for char, interactions in dict_interazioni.items():
        for partner, count in interactions.items():
            if count < 3:
                interactions_to_remove.append((char, partner))
    for char, partner in interactions_to_remove:
        del dict_interazioni[char][partner]
                            
    return dict_interazioni

In [None]:
def collinear_window_iterations(window_min, window_max, tokens, char_list):
    dict_interactions_coll = {}
    dict_of_graphs = {}
    range_of_windows = range(window_min, window_max)
    for number in range_of_windows:
        G = nx.Graph()
        coll_interactions_link = coll_appr(tokens, char_list, number)
        dict_interactions_coll[number] = coll_interactions_link
        G = add_nodes_and_weighted_edges_from_list(G, edge_tuples_f(coll_interactions_link))
        dict_of_graphs[number] = G
    return dict_interactions_coll, dict_of_graphs

In [None]:
coll_dict_interactions, dict_Gfull_collinear = collinear_window_iterations(10, 500, tokens_count_montecristo, char_list)
coll_dict1_interactions, dict_Gvol1_collinear = collinear_window_iterations(10, 500, tokens_montecristo_section_1, char_list)
coll_dict2_interactions, dict_Gvol2_collinear = collinear_window_iterations(10, 500, tokens_montecristo_section_2, char_list)
coll_dict3_interactions, dict_Gvol3_collinear = collinear_window_iterations(10, 500, tokens_montecristo_section_3, char_list)
coll_dict4_interactions, dict_Gvol4_collinear = collinear_window_iterations(10, 500, tokens_montecristo_section_4, char_list)
coll_dict5_interactions, dict_Gvol5_collinear = collinear_window_iterations(10, 500, tokens_montecristo_section_5, char_list)

In [None]:
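# Calculate and return the edge density of a graph
def calculate_density(graph):
    # The node count is fixed at 46, the number of characters covered by the
    # regular expressions, rather than taken from the graph itself
    num_nodes = 46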
    num_edges = len(graph.edges)

    if num_nodes <= 1:
        return 0.0 

    density = (2.0 * num_edges) / (num_nodes * (num_nodes - 1))

    return density

# Plot edge density to find the value at which the plot plateaus
def plot_edge_density(dict_graphs, flat_value = 100):
    densities = []
    threshold_list = []

    for threshold, graph in dict_graphs.items():
        density = calculate_density(graph)
        densities.append(density)
        threshold_list.append(threshold)
    
    # Create a plotly figure
    fig = px.line(x=threshold_list, y=densities, labels={'x': 'Threshold', 'y': 'Edge Density'})

    # Add interactive hover labels
    fig.update_traces(mode='lines+markers', hovertemplate="Threshold: %{x}<br>Edge Density: %{y:.4f}")

    # Show the figure
    fig.show()
    
    
    # Change in density between consecutive window sizes
    delta_density = []
    for i, j in zip(densities[1:], densities[:-1]):
        delta_density.append(i - j)

    # Look for the first run of `flat_value` consecutive zero changes
    idx = 0
    current_value = 0
    while idx < len(delta_density) and not current_value == flat_value:
        if delta_density[idx] == 0:
            current_value += 1
        else:
            current_value = 0
        idx += 1

    # Window size at which the flat run starts (None if no plateau is found,
    # which previously left `plateau` undefined at the return statement)
    plateau = None
    if current_value == flat_value:
        plateau = idx - flat_value + int(threshold_list[0])

    # Create a plotly figure
    fig = px.line(x=threshold_list[:-1], y=delta_density, labels={'x': 'Threshold', 'y': 'Change in Edge Density'})

    # Add interactive hover labels
    fig.update_traces(mode='lines+markers', hovertemplate="Threshold: %{x}<br>Change in Edge Density: %{y:.4f}")

    # Show the figure
    fig.show()

    return plateau
    
# Plot the values of density for the given threshold range
#plot_edge_density(dict_of_graphs)
plateau_1 = plot_edge_density(dict_Gvol1_collinear)
plateau_2 = plot_edge_density(dict_Gvol2_collinear)
plateau_3 = plot_edge_density(dict_Gvol3_collinear)
plateau_4 = plot_edge_density(dict_Gvol4_collinear)
plateau_5 = plot_edge_density(dict_Gvol5_collinear)

coll_narr_window = (plateau_1 + plateau_2 + plateau_3 + plateau_4 + plateau_5)/5
print("Collinear average window size:", coll_narr_window)

## Coplanar Co-occurrence Window Strategy

The <b>coplanar</b> strategy considers all character mentions within a designated window of text, even if they are not consecutive. Because edge density keeps increasing as the window grows, the plateau criterion used for the collinear approach cannot be applied, and the window size has to be derived differently. We therefore examine the “gaps”, i.e. the number of tokens between consecutive mentions of the same character, and treat them as boundaries of character interaction events, generating window sizes from statistically derived upper limits. Specifically, the <b>interquartile range</b> (iqr = q3 - q1) is used to define the probable upper limit (q3 + 1.5 * iqr); values beyond it are considered suspected outliers and left out. This yields three candidate window sizes: q3, the upper limit, and their midpoint. The smallest discretized window size value, Q3 = 178, was adopted for this analysis.
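
The derivation of the three candidate windows can be followed on a small worked example. The gap values below are invented for illustration and the variable names are assumptions; the formulas mirror those used in `generate_window_sizes`.

```python
import numpy as np

# Hypothetical gaps (in tokens) between consecutive mentions of a character
gaps = np.array([5, 8, 12, 20, 35, 60, 90, 150, 400, 2500])

# Quartiles and interquartile range of the gap distribution
q1, q3 = np.percentile(gaps, [25, 75])
iqr = q3 - q1

# Upper fence: gaps beyond it are treated as suspected outliers
upper_fence = q3 + 1.5 * iqr

# Three candidate window sizes, from the most to the least conservative
window_small = q3
window_medium = (q3 + upper_fence) / 2
window_large = upper_fence

print(window_small, window_medium, window_large)
```

Because the fences depend only on the quartiles, a single very long gap (such as the 2500-token one above) does not inflate the candidate windows.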

In [None]:
def calculate_gaps(indices_dic):
    # Calculate the gaps between consecutive mentions
    gaps = []
    for character_indices in indices_dic.values():
        gaps.extend(np.diff(character_indices))
    return gaps

def generate_window_sizes(gaps):
    # Analyze the distribution of gaps
    q1 = np.percentile(gaps, 25)
    q3 = np.percentile(gaps, 75)
    iqr = q3 - q1

    # Define lower and upper bounds for gaps
    inf_dg = q1 - 1.5 * iqr
    sup_dg = q3 + 1.5 * iqr

    # Remove suspected outliers
    filtered_gaps = [gap for gap in gaps if inf_dg <= gap <= sup_dg]

    # Generate window sizes
    wp1 = q3
    wp2 = (sup_dg + q3) / 2
    wp3 = sup_dg

    return wp1, wp2, wp3

# Calculate gaps
gaps = calculate_gaps(indices_dict)
gaps1 = calculate_gaps(indices_dict_vol1)
gaps2 = calculate_gaps(indices_dict_vol2)
gaps3 = calculate_gaps(indices_dict_vol3)
gaps4 = calculate_gaps(indices_dict_vol4)
gaps5 = calculate_gaps(indices_dict_vol5)


# Generate window sizes
window_sizes = generate_window_sizes(gaps)
window_sizes1 = generate_window_sizes(gaps1)
window_sizes2 = generate_window_sizes(gaps2)
window_sizes3 = generate_window_sizes(gaps3)
window_sizes4 = generate_window_sizes(gaps4)
window_sizes5 = generate_window_sizes(gaps5)


print("Window Sizes:", window_sizes)
print("Window Sizes vol1:", window_sizes1)
print("Window Sizes vol2:", window_sizes2)
print("Window Sizes vol3:", window_sizes3)
print("Window Sizes vol4:", window_sizes4)
print("Window Sizes vol5:", window_sizes5)


In [None]:
definitive_window_wp1 = (window_sizes1[0] + window_sizes2[0] + window_sizes3[0] + window_sizes4[0] + window_sizes5[0])/5
definitive_window_wp2 = (window_sizes1[1] + window_sizes2[1] + window_sizes3[1] + window_sizes4[1] + window_sizes5[1])/5
definitive_window_wp3 = (window_sizes1[2] + window_sizes2[2] + window_sizes3[2] + window_sizes4[2] + window_sizes5[2])/5

definitive_window_wp1, definitive_window_wp2, definitive_window_wp3

# 5. Results

## Graphs constructors

The following functions construct network graphs to visualize character interactions in a narrative text. The <code>collinear_graph</code> function focuses on collinear relationships, while the <code>coplanar_graph</code> function emphasizes coplanar relationships. Both functions utilize the <code>final_graph</code> function to create and export the graphs for visualization.

In [None]:
# Graph constructor
def final_graph(interact_dct, edges, num):
    G = nx.Graph()
    G.add_nodes_from(interact_dct)
    G.add_weighted_edges_from(edges)
    outfile= "Graph_"+num+".gexf"
    nx.write_gexf(G, outfile)
    return G

# Collinear graph
def collinear_graph(tokens, characters, window, file_name: str):
    collinear_interactions = coll_appr(tokens, characters, window)
    collinear_edges = edge_tuples_f(collinear_interactions)
    collinear_graph = final_graph(collinear_interactions, collinear_edges, file_name)
    return collinear_graph

# Co-planar graph
def coplanar_graph(indices, characters, window, filename: str):
    coplanar_interactions = links_dic_f(indices, window)
    coplanar_edges = edge_tuples_f(coplanar_interactions)
    coplanar_graph = final_graph(coplanar_interactions, coplanar_edges, filename)
    return coplanar_graph

## Centrality measures

These functions calculate and visualize centrality measures for nodes in a given graph. The <code>calc_centralities</code> function computes four centrality metrics (degree centrality, betweenness centrality, eigenvector centrality, and closeness centrality) for the nodes in the graph and returns them as a DataFrame. The <code>plot_centrality</code> function plots the top 10 nodes ranked by a specified centrality metric in one panel of a 2x2 figure. Finally, the <code>centralities_function</code> combines these two functions to calculate centrality measures, save them to a CSV file, generate the centrality plots, and save the figure as a PNG file, all associated with a given graph.
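
As a quick sanity check of what these measures capture, the sketch below computes three of them on a four-node toy graph. The node names and weights are made up, and eigenvector centrality is left out only to keep the example short.

```python
import networkx as nx
import pandas as pd

# Toy weighted graph; names and weights are made up for illustration
G = nx.Graph()
G.add_weighted_edges_from([
    ("dantes", "faria", 5),
    ("dantes", "danglars", 3),
    ("dantes", "villefort", 4),
    ("danglars", "villefort", 2),
])

# One column per centrality measure, one row per character
centralities = pd.DataFrame({
    "DGC": nx.degree_centrality(G),
    "BTC": nx.betweenness_centrality(G),
    "CLC": nx.closeness_centrality(G),
})

# Rank characters by degree centrality, as plot_centrality does per metric
top = centralities.sort_values("DGC", ascending=False).head(3)
print(top)
```

In this toy graph the hub node is linked to every other node, so it ranks first by degree; it also lies on the only paths that reach the peripheral node, which is what betweenness centrality rewards.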

In [None]:
def calc_centralities(graph):
    
    dgc = nx.degree_centrality(graph)
    dgc = pd.DataFrame.from_dict(dgc, orient='index', columns=["DGC"])
    btc = nx.betweenness_centrality(graph)
    btc = pd.DataFrame.from_dict(btc, orient='index', columns=["BTC"])
    evc = nx.eigenvector_centrality(graph, weight='weight', max_iter=600)
    evc = pd.DataFrame.from_dict(evc, orient='index', columns=["EVC"])
    clc = nx.closeness_centrality(graph)
    clc = pd.DataFrame.from_dict(clc, orient='index', columns=["CLC"])
    df = pd.concat([dgc, btc, evc, clc], axis=1)
    return df


def plot_centrality(centr, df, title, n, col_list):
    
    ax = plt.subplot(2, 2, n)
    s = df.sort_values(centr, ascending=False)[:10]
    x = list(s[centr].index)[::-1]
    y = list(s[centr])[::-1]
    
    for i, v in enumerate(y):
        bars = ax.barh(x[i], v, color=col_list[n-1])
        ax.bar_label(bars, fmt="%.2f", label_type="center")
    
    plt.title(title, size=22)
    ax.get_xaxis().set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.tick_params(axis='y', length = 0, labelsize=14)

    
def centralities_function(G, name):
    my_centr=calc_centralities(G)
    my_centr.to_csv("Centr_"+name+".csv")
    figfile_centr="Centr_"+name+".png"
    col_list = ["peachpuff", "plum", "orange", "CornflowerBlue"]
    fig, ax = plt.subplots(2,2, figsize=(12, 10))
    plt.tight_layout(w_pad=15)
    plot_centrality("DGC", my_centr, 'Degree Centrality', 1, col_list)
    plot_centrality("CLC", my_centr, 'Closeness Centrality', 2, col_list)
    plot_centrality("BTC", my_centr, 'Betweenness Centrality', 3, col_list)
    plot_centrality("EVC", my_centr, 'Eigenvector Centrality', 4, col_list)
    plt.savefig(figfile_centr, dpi=300, bbox_inches='tight')
    return my_centr, figfile_centr

# Results

This section presents the results for each volume of the novel, as well as for the complete novel. Graphs resulting from the <b>coplanar</b>, <b>collinear</b>, and <b>combined</b> approaches are provided, together with a graphical representation of the highest centrality measures for the combined approach, which is the one adopted for the analysis.

## Volume 1

In [None]:
# Interaction dictionaries (collinear and co-planar)
coll_int_v1 = coll_appr(tokens_montecristo_section_1, char_list, 227)
copl_int_v1_wp1 = links_dic_f(indices_dict_vol1, 178)

# Collinear edges and graph
coll_edges_v1 = edge_tuples_f(coll_int_v1)
collinear_graph_vol1 = collinear_graph(tokens_montecristo_section_1, char_list, 227, "_collinear_vol1")

# Co-planar graph for the smallest coplanar window size
coplanar_graph_vol1_wp1 = coplanar_graph(indices_dict_vol1, char_dict, 178, "_coplanar_vol1_wp1")

# Interaction dictionary from integration of coplanar and collinear, for the smallest coplanar window size
combined_v1_wp1 = update_combined_window(coll_int_v1, copl_int_v1_wp1)

# Edges and graph for combined strategy, for the smallest coplanar window size
combined_edges_v1_wp1 = edge_tuples_f(combined_v1_wp1)

combined_graph_v1_wp1 = final_graph(combined_v1_wp1, combined_edges_v1_wp1, "v1_combined_1_20")

# Centrality measures for combined strategy, for the smallest coplanar window size
combined_centr_v1_wp1 = calc_centralities(combined_graph_v1_wp1)
top_combined_centr_v1_wp1 = centralities_function(combined_graph_v1_wp1, 'v1_combined_1_20')

## Volume 2

In [None]:
# Interaction dictionaries (collinear and co-planar)
coll_int_v2 = coll_appr(tokens_montecristo_section_2, char_list, 227)
copl_int_v2_wp1 = links_dic_f(indices_dict_vol2, 178)

# Collinear edges and graph
coll_edges_v2 = edge_tuples_f(coll_int_v2)
collinear_graph_vol2 = collinear_graph(tokens_montecristo_section_2, char_list, 227, "_collinear_vol2")

# Co-planar graph for the smallest coplanar window size
coplanar_graph_vol2_wp1 = coplanar_graph(indices_dict_vol2, char_dict, 178, "_coplanar_vol2_wp1")

# Interaction dictionary from integration of coplanar and collinear, for the smallest coplanar window size
combined_v2_wp1 = update_combined_window(coll_int_v2, copl_int_v2_wp1)

# Edges and graph for combined strategy, for the smallest coplanar window size
combined_edges_v2_wp1 = edge_tuples_f(combined_v2_wp1)

combined_graph_v2_wp1 = final_graph(combined_v2_wp1, combined_edges_v2_wp1, "v2_combined_1_20")

# Centrality measures for combined strategy, for the smallest coplanar window size
combined_centr_v2_wp1 = calc_centralities(combined_graph_v2_wp1)
top_combined_centr_v2_wp1 = centralities_function(combined_graph_v2_wp1, 'v2_combined_1_20')

## Volume 3

In [None]:
# Interaction dictionaries (collinear and co-planar)
coll_int_v3 = coll_appr(tokens_montecristo_section_3, char_list, 227)
copl_int_v3_wp1 = links_dic_f(indices_dict_vol3, 178)

# Collinear edges and graph
coll_edges_v3 = edge_tuples_f(coll_int_v3)
collinear_graph_vol3 = collinear_graph(tokens_montecristo_section_3, char_list, 227, "_collinear_vol3")

# Co-planar graph for the smallest coplanar window size
coplanar_graph_vol3_wp1 = coplanar_graph(indices_dict_vol3, char_dict, 178, "_coplanar_vol3_wp1")

# Interaction dictionary from integration of coplanar and collinear, for the smallest coplanar window size
combined_v3_wp1 = update_combined_window(coll_int_v3, copl_int_v3_wp1)

# Edges and graph for combined strategy, for the smallest coplanar window size
combined_edges_v3_wp1 = edge_tuples_f(combined_v3_wp1)
combined_graph_v3_wp1 = final_graph(combined_v3_wp1, combined_edges_v3_wp1, "v3_combined_1_20")

# Centrality measures for combined strategy, for the smallest coplanar window size
combined_centr_v3_wp1 = calc_centralities(combined_graph_v3_wp1)
top_combined_centr_v3_wp1 = centralities_function(combined_graph_v3_wp1, 'v3_combined_1_20')

## Volume 4

In [None]:
# Interaction dictionaries (collinear and coplanar)
coll_int_v4 = coll_appr(tokens_montecristo_section_4, char_list, 227)
copl_int_v4_wp1 = links_dic_f(indices_dict_vol4, 178)

# Collinear edges and graph
coll_edges_v4 = edge_tuples_f(coll_int_v4)
collinear_graph_vol4 = collinear_graph(tokens_montecristo_section_4, char_list, 227, "_collinear_vol4")

# Co-planar graph for the smallest coplanar window size
coplanar_graph_vol4_wp1 = coplanar_graph(indices_dict_vol4, char_dict, 178, "_coplanar_vol4_wp1")

# Interaction dictionary from integration of coplanar and collinear, for the smallest coplanar window size
combined_v4_wp1 = update_combined_window(coll_int_v4, copl_int_v4_wp1)

# Edges and graph for combined strategy, for the smallest coplanar window size
combined_edges_v4_wp1 = edge_tuples_f(combined_v4_wp1)
combined_graph_v4_wp1 = final_graph(combined_v4_wp1, combined_edges_v4_wp1, "v4_combined_1_20")

# Centrality measures for combined strategy, for the smallest coplanar window size
combined_centr_v4_wp1 = calc_centralities(combined_graph_v4_wp1)
top_combined_centr_v4_wp1 = centralities_function(combined_graph_v4_wp1, 'v4_combined_1_20')

## Volume 5

In [None]:
# Interaction dictionaries (collinear and co-planar)
coll_int_v5 = coll_appr(tokens_montecristo_section_5, char_list, 227)
copl_int_v5_wp1 = links_dic_f(indices_dict_vol5, 178)

# Collinear edges and graph
coll_edges_v5 = edge_tuples_f(coll_int_v5)
collinear_graph_vol5 = collinear_graph(tokens_montecristo_section_5, char_list, 227, "_collinear_vol5")

# Coplanar graph for the smallest coplanar window size
coplanar_graph_vol5_wp1 = coplanar_graph(indices_dict_vol5, char_dict, 178, "_coplanar_vol5_wp1")

# Interaction dictionary from integration of coplanar and collinear, for the smallest coplanar window size
combined_v5_wp1 = update_combined_window(coll_int_v5, copl_int_v5_wp1)

# Edges and graph for combined strategy, for the smallest coplanar window size
combined_edges_v5_wp1 = edge_tuples_f(combined_v5_wp1)
combined_graph_v5_wp1 = final_graph(combined_v5_wp1, combined_edges_v5_wp1, "v5_combined_1_20")

# Centrality measures for combined strategy, for the smallest coplanar window size
combined_centr_v5_wp1 = calc_centralities(combined_graph_v5_wp1)
top_combined_centr_v5_wp1 = centralities_function(combined_graph_v5_wp1, 'v5_combined_1_20')

## Full-text graph

In [None]:
# Interaction dictionaries (collinear and coplanar)
coll_int_full = coll_appr(tokens_count_montecristo, char_list, 227)
copl_int_full_wp1 = links_dic_f(indices_dict, 178)

# Collinear edges and graph
coll_edges_full = edge_tuples_f(coll_int_full)
collinear_graph_full = collinear_graph(tokens_count_montecristo, char_list, 227, "_collinear_full")

# Coplanar graph for the smallest coplanar window size
coplanar_graph_full_wp1 = coplanar_graph(indices_dict, char_dict, 178, "_coplanar_full_wp1")

# Interaction dictionary from integration of coplanar and collinear, for the smallest coplanar window size
combined_full_wp1 = update_combined_window(coll_int_full, copl_int_full_wp1)

# Edges and graph for combined strategy, for the smallest coplanar window size
combined_edges_full_wp1 = edge_tuples_f(combined_full_wp1)
combined_graph_full_wp1 = final_graph(combined_full_wp1, combined_edges_full_wp1, "full_combined_1_20")

# Centrality measures for combined strategy, for the smallest coplanar window size
combined_centr_full_wp1 = calc_centralities(combined_graph_full_wp1)
top_combined_centr_full_wp1 = centralities_function(combined_graph_full_wp1, 'full_combined_1_20')

## Plotting centrality measures

The degree centrality of Edmond Dantès' enemies is plotted across the five volumes to give additional insight into how their roles develop over the course of the narrative.

In [None]:
def plot_degree_centrality(data):
    # Rows of the table are characters, columns are volumes
    characters = data.index.tolist()
    volumes = data.columns.tolist()
    x_positions = range(1, len(volumes) + 1)

    # One line per character, with each point annotated by its rounded value
    plt.figure(figsize=(8, 5))
    for character in characters:
        centrality_values = data.loc[character].tolist()
        plt.plot(x_positions, centrality_values, label=character)
        for x, value in zip(x_positions, centrality_values):
            plt.annotate(str(round(value, 2)), (x, value), textcoords="offset points", xytext=(0, 3), ha='center')

    # Customizing plot
    plt.xlabel('Volume')
    plt.ylabel('Degree Centrality')
    plt.title("Degree Centrality of Dantès' Enemies across Volumes")
    plt.xticks(x_positions, volumes)
    plt.legend()
    plt.grid(True)
    plt.show()

# Sample data
data = {
    'v1': [0.60, 0.67, 0.67, 0.73],
    'v2': [0.40, 0.37, 0.14, 0.09],
    'v3': [0.58, 0.61, 0.39, 0.11],
    'v4': [0.60, 0.66, 0.57, 0.29],
    'v5': [0.59, 0.56, 0.06, 0.12]
}

characters = ['Danglars', 'Villefort', 'Fernand', 'Caderousse']

df = pd.DataFrame(data, index=characters)

# Calling the function with the sample data
plot_degree_centrality(df)
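
The table passed to `plot_degree_centrality` has one row per character and one column per volume. As a minimal sketch of how such a table could be assembled directly from per-volume graphs, the snippet below uses `nx.degree_centrality` on small toy graphs. The helper name `degree_centrality_table` and the toy graphs are hypothetical and only illustrate the shape of the data; they are not the notebook's actual pipeline or results.

```python
import networkx as nx
import pandas as pd


def degree_centrality_table(graphs, characters):
    """Build a characters-by-volumes table of degree centrality.

    `graphs` maps a volume label to a networkx graph (hypothetical input);
    `characters` lists the row labels to keep.
    """
    columns = {}
    for label, graph in graphs.items():
        centrality = nx.degree_centrality(graph)
        # A character missing from a volume's graph is given 0.0 (an assumption)
        columns[label] = [centrality.get(name, 0.0) for name in characters]
    return pd.DataFrame(columns, index=characters)


# Toy graphs standing in for the per-volume networks (illustrative only)
toy_v1 = nx.Graph([("Dantès", "Danglars"), ("Dantès", "Fernand"), ("Danglars", "Fernand")])
toy_v2 = nx.Graph([("Dantès", "Danglars"), ("Dantès", "Faria")])

table = degree_centrality_table({"v1": toy_v1, "v2": toy_v2}, ["Danglars", "Fernand"])
print(table)
```

A table built this way has the same layout as `df` above, so it could be passed to `plot_degree_centrality` without further reshaping.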