### Setting up the network



With the character lists in place and the books cleaned, we are ready to build the network graphs that we wish to analyze. The later analyses need several different graphs, so the first useful one to construct is the complete network of each book. We also want to distinguish between the chapters of each book, so that we can keep track of where each appearance of a character and each connection between characters takes place.

First, let's consider a dictionary of graphs, each representing the complete network of one book. For this we use the functions shown below:
`get_edge_chapter_weight(character1, character2, chapter)` counts the pages of a chapter on which both characters appear, where each character is passed as a list of aliases. `get_edge_book_weight(character1, character2, book)` calls it for every chapter of a book and returns the edge weight between the two characters for each chapter. `get_node_size(character, book)` likewise loops over every chapter of a book and counts, per chapter, the pages on which the character appears.

In [None]:
from collections import defaultdict


def get_edge_chapter_weight(character1: list, character2: list, chapter: list):
    # Count the pages of a chapter on which an alias of each character occurs
    weight = 0
    for page in chapter:
        page_text = ' '.join(page)
        if any(char in page_text for char in character1) and any(char in page_text for char in character2):
            weight += 1

    return weight

def get_edge_book_weight(character1: list, character2: list, book: dict):
    # Edge weight per chapter; chapters are assumed to be keyed 1..len(book)
    weight = defaultdict(int)
    for chapternr in range(1, len(book)+1):
        weight[chapternr] = get_edge_chapter_weight(character1, character2, book[chapternr])
    return weight

def get_node_size(character: list, book: dict):
    # Number of pages per chapter on which any alias of the character occurs
    size = defaultdict(int)
    for chapternr, chapter in book.items():
        for page in chapter:
            page_text = ' '.join(page)
            if any(char in page_text for char in character):
                size[chapternr] += 1
    return size
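
To make the counting concrete, here is a minimal sketch on an invented two-chapter book. The toy data, the alias lists and the `mentions` helper are assumptions made for illustration only; the helper applies the same page-level substring test as the functions above.

```python
# Toy book in the assumed format: {chapter_nr: [page, ...]},
# where each page is a list of sentences (invented data).
toy_book = {
    1: [["Harry met Ron on the train.", "They shared sweets."],
        ["Hermione walked in."]],
    2: [["Harry and Hermione studied.", "Ron slept."],
        ["Mr. Potter was late."]],
}

# Each character is represented by a list of aliases (hypothetical lists).
harry = ["Harry", "Mr. Potter"]
ron = ["Ron"]


def mentions(aliases, page):
    # True if any alias occurs as a substring of the joined page text
    text = " ".join(page)
    return any(alias in text for alias in aliases)


# Pages per chapter on which both characters occur (edge weight)
edge = {nr: sum(mentions(harry, p) and mentions(ron, p) for p in chapter)
        for nr, chapter in toy_book.items()}

# Pages per chapter on which Harry occurs (node size)
size = {nr: sum(mentions(harry, p) for p in chapter)
        for nr, chapter in toy_book.items()}

print(edge)  # {1: 1, 2: 1}
print(size)  # {1: 1, 2: 2}
```

Note that the test is a case-sensitive substring match, so a short alias such as `"Ron"` would also match inside a longer word like `"Ronan"`. Whether that matters depends on how the alias lists were built.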

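These functions are then combined in a loop over all seven books. For each book, the full network is built by adding a node with its per-chapter sizes for every character, and an edge with per-chapter weights for every pair of characters that appear together at least once. This process is seen in the loop below: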

In [None]:
import pickle as pkl

import networkx as nx
from tqdm import tqdm

from Utils.network_utils import get_node_size, get_edge_book_weight

Boook_networks = {}

for book_nr in range(1,8):
    # Load formatted book (dict of chapters with list of lists of pages and sentences)
    with open(f"1.Dataset_files/Books_formatted/Book{book_nr}.pkl","rb") as f:
        book = pkl.load(f)

    print(f"Book {book_nr}, Number of chapters: {len(book)}, number of pages: {sum([len(chapter) for chapter in book.values()])}, number of sentences: {sum([sum([len(page) for page in chapter]) for chapter in book.values()])}")


    # Build the network for this book
    G = nx.Graph()

    # Add nodes to graph
    for character in tqdm(characters, desc='Adding character nodes to graph...'):
        character['sizes'] = get_node_size(character['aliases'], book)
        G.add_node(character['name'], sizes=character['sizes'], attr=character['aliases'])

    # Add edges to graph
    for i, char1 in enumerate(tqdm(characters, desc='Adding edges to graph...')):
        for char2 in characters[i+1:]:
            edge_weight = get_edge_book_weight(char1['aliases'], char2['aliases'], book)
            if sum(edge_weight.values()) != 0:
                G.add_edge(char1['name'], char2['name'], weight=edge_weight)

    # Save graph
    Boook_networks[book_nr] = G.copy()

In summary, this dictionary holds one network per book. In each network the nodes are characters and the edges are co-appearances of two characters on the same page. Each node is keyed by the character's name and carries the attributes `sizes` (the number of appearances per chapter) and `attr` (the character's aliases), while each edge carries a `weight` attribute with the number of co-appearances per chapter. Because both are stored per chapter, the network can be used to find the network state at a specific point in time in the books. This will be explored below.
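
As a rough illustration of what "network state at a specific point in time" can mean, the sketch below collapses per-chapter counts into cumulative totals up to a chosen chapter. The helper names `cumulative` and `snapshot` and the example numbers are hypothetical, and plain dictionaries stand in for the graph so that the example runs on its own.

```python
def cumulative(per_chapter, last_chapter):
    # Sum the per-chapter counts for all chapters up to and including last_chapter
    return sum(count for nr, count in per_chapter.items() if nr <= last_chapter)


def snapshot(node_sizes, edge_weights, last_chapter):
    # Keep only the characters and connections that have appeared so far
    nodes = {name: cumulative(sizes, last_chapter) for name, sizes in node_sizes.items()}
    nodes = {name: total for name, total in nodes.items() if total > 0}
    edges = {pair: cumulative(weights, last_chapter) for pair, weights in edge_weights.items()}
    edges = {pair: total for pair, total in edges.items() if total > 0}
    return nodes, edges


# Invented per-chapter data in the same shape as the `sizes` and `weight` attributes
node_sizes = {"Harry": {1: 3, 2: 5}, "Ron": {1: 1, 2: 4}, "Dobby": {2: 2}}
edge_weights = {("Harry", "Ron"): {1: 1, 2: 3}, ("Harry", "Dobby"): {1: 0, 2: 2}}

nodes, edges = snapshot(node_sizes, edge_weights, last_chapter=1)
print(nodes)  # {'Harry': 3, 'Ron': 1}
print(edges)  # {('Harry', 'Ron'): 1}
```

With one of the graphs built above, the two input dictionaries could be read out as `dict(G.nodes(data='sizes'))` and `{(u, v): w for u, v, w in G.edges(data='weight')}`.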