# The Fairy Network!

In [125]:
import pandas as pd
import numpy as np
import itertools
from pyvis import network as net
from pyvis.network import Network
import networkx as nx
import igraph

Fortunately, because of the way the transcripts are written, every dialogue is written in a single line. Therefore, we can read the file line by line to get the dialogues. However, it is important to note that a single line may not be a dialogue but a scene description, or in other cases dialogues may have scene descrptions inside of them, meaning that just because a name is present in a line doesn't mean that somebody is referring to that character. (maybe strip those away?)

In [126]:
Transcripts = {}
for i in range(1,15):
    with open(f"Transcripts/Episode {i}.txt", "r", encoding= "ISO-8859-1") as file:
        try:
            Transcripts["Episode " + f"{i}"] = file.readlines()
        except UnicodeDecodeError:
            print(f"Decoding error in Episode {i}")


# Note: Episodes 13 and 14 couldn't be read due to decoding issues 
# under the standard encoding UTF-8. Encoding ISO-8859-1 which is also
# common seemed to work properly.
 

In [127]:
Transcripts["Episode 1"][0:5]

['(Bugle playing Reveille is heard; Timmy is asleep, snoring; camera points to Cosmo and Wanda in fish form)\n',
 'Wanda: Ready, Cosmo?\n',
 'Cosmo: Ready, Wanda.\n',
 'Cosmo and Wanda: 1, (turn to their fairy form) 2, 3!\n',
 'Cosmo: (flies near Timmy) Wakey-wakey, Timmy!\n']

## Cleaning Transcripts
We will now strip all those lines that describe a scene from the transcripts. A quick inspection suggests that such lines begin either with a parenthesis or a square bracket. Thus if a line satisfies this condition it will be discarded from the transcript.

In [128]:
for i in range(1,15):
    for line in Transcripts[f"Episode {i}"]:
        if line[0] == "(":
            Transcripts[f"Episode {i}"].remove(line)
        elif line[0] == "[":
            Transcripts[f"Episode {i}"].remove(line)

Transcripts["Episode 1"][0:5]

['Wanda: Ready, Cosmo?\n',
 'Cosmo: Ready, Wanda.\n',
 'Cosmo and Wanda: 1, (turn to their fairy form) 2, 3!\n',
 'Cosmo: (flies near Timmy) Wakey-wakey, Timmy!\n',
 "Wanda: Oh, come on, little fella, even though we're your (lifts Timmy up in the air with her wand) fairy godparents...\n"]

## Creating the Network

### Nodes
First we need to find all the potential nodes for the network, and to do so we have to keep the following in mind:

 - There will be dialogues of unimportant characters which we don't want to take into account.
 - Some characters may have more than one way to refer to them (e.g. Mr. Turner is called "Dad" by Timmy)
 - Coupled with the previous one, there may be inconsistencies in the way characters are referred to 
by the writers between episodes.

In order to get all the possible candidates for nodes, notice that every line that corresponds to a dialogue starts with the name of the character that speaks, followed by a colon and then the actual dialogue. We will use this to find all characters that speak.

In [129]:
# Find the first instance of a colon (":") and everything that comes before that may be considered a potential node.

def Retrieve_speaker(dialogue):
    colon = dialogue.find(":")
    if colon != -1:
        character = dialogue[:colon]
    else:
        character = "No character"
    return character

def Characters_in_episode(episode_num):
    episode = Transcripts["Episode " + str(episode_num)]
    characters = []
    for line in episode:
        speaker = Retrieve_speaker(line)
        if speaker not in characters:
            characters.append(speaker)
    return characters

Characters_in_episode(11)
# We may be missing Chompy in this episode as well as Phillip in other ones.



['Timmy',
 'Wanda',
 'Cosmo',
 'Vicky',
 'Mayor',
 'No character',
 'Male police officer',
 'Miss Dimmsdale',
 'Girl #1',
 'Girl #2',
 'Mrs. Turner',
 'Mr. Turner',
 'Journalist #1',
 "Timmy's subconscious",
 'Chet Ubetcha',
 'Crowd']

In [130]:
def Possible_Characters():
    characters = []
    for i in range(1,15):
        characters += Characters_in_episode(i)
    characters = list(set(characters)) # drop duplicate instances of characters
    return characters
characters_clean = Possible_Characters()
len(characters_clean)

162

In [131]:
characters_clean[0:10]

['Bus Driver',
 "70's Cosmo & Wanda",
 'Yugopotamian Kids',
 'Baby New Year',
 'Cosmo and Wanda',
 'Girl #1',
 'Dr. Bender and Wendell',
 'Fairy #3',
 'Tour guide',
 'Teddy Bear']

As it can be seen, we may encounter issues like: unison dialogues (e.g "Tad and Chad"), irrelevant characters (e.g. "Kids") or even nonsensical characters (e.g. "Everyone"). Because of this, it is important to handle these cases, and to try to do so in an automated way as much as possible (161 characters is still quite large to do it by hand). We will do this in two stages, a first general stage which will be done now and a second "episode dependent" stage later on. 

For the first stage we will remove every character whose name has either the string "Kid" or "#" inside, since either one of these probably makes reference to an irrelevant character. Coupled with this, we will also remove strings that have conjunctions such as "and" and "&" to mitigate redundance.

In [132]:
unwanted_strings = ["#", "Kid", " and ", " & "]
characters_clean = [character for character in characters_clean if not
                     any(string in character for string in unwanted_strings)]


In [133]:
len(characters_clean)

123

### Edges

The graph will be directed, and an edge will be considered from node A to node B when character A names character B in a dialogue.

In [134]:
#edges = []
edges_clean = pd.DataFrame({"Character A": [], "Character B": []})
for i in range(1, 15):
    edges_clean[f"Weight Ep. {i}"] = []



def Edges_dialogue(dialogue):
    # Variables chars_A and chars_B below are lists of characters involved
    # in the dialogue. Notice that there may be more than one speaker
    # (unison dialogues) and more than one character referenced to 
    # in any dialogue. 
    chars_A = Retrieve_speaker(dialogue).replace(" & ", " and ") 
    chars_A = chars_A.split(" and ") # Split characters in unison dialogues
    chars_A = [character for character in chars_A if character in characters_clean]


    chars_B = [] 
    for character in characters_clean:
        if dialogue.find(character) != -1: # Check whether a particular character is being referenced
            chars_B.append(character)

    if len(chars_B) > 0:
        edges = list(itertools.product(chars_A, chars_B)) # Construct all directed pairs
    else:
        edges = []
    return edges


def Edges_episode(episode_num):
    edges_ep = []
    for dialogue in Transcripts[f"Episode {episode_num}"]:
        edges_ep += Edges_dialogue(dialogue)
    return edges_ep

for i in range(1,15):
    for edge in Edges_episode(i):
        # First, check using a boolean mask whether a particular connection 
        # is already on the edges dataframe
        mask = (edges_clean["Character A"] == edge[0]) & (edges_clean["Character B"] == edge[1])
        if not edges_clean[mask].any(axis = None):
            # If it isn't, then add it to the df
            new_row = pd.DataFrame([{"Character A": edge[0], "Character B": edge[1],
                                      **{f"Weight Ep. {j}": 1 if j == i else 0 for j in range(1, 15)}}])
            edges_clean = pd.concat([edges_clean, new_row], ignore_index=True)
        else:
            # If it is, add to the weight of the particular episode
            edges_clean.loc[mask, f"Weight Ep. {i}"] += 1    


Self-references are usually uninteresing since they come from scene descriptions not actual monologues. Thus we rule those out.

In [135]:
edges_clean = edges_clean.loc[edges_clean["Character A"] != edges_clean["Character B"]]
edges_clean.reset_index(drop = True)

Unnamed: 0,Character A,Character B,Weight Ep. 1,Weight Ep. 2,Weight Ep. 3,Weight Ep. 4,Weight Ep. 5,Weight Ep. 6,Weight Ep. 7,Weight Ep. 8,Weight Ep. 9,Weight Ep. 10,Weight Ep. 11,Weight Ep. 12,Weight Ep. 13,Weight Ep. 14
0,Wanda,Cosmo,7.0,6.0,2.0,1.0,1.0,2.0,3.0,4.0,2.0,11.0,2.0,2.0,2.0,1.0
1,Cosmo,Wanda,8.0,2.0,1.0,0.0,4.0,5.0,4.0,5.0,4.0,9.0,3.0,2.0,2.0,1.0
2,Cosmo,Timmy,6.0,1.0,4.0,2.0,2.0,11.0,5.0,4.0,1.0,2.0,5.0,6.0,3.0,1.0
3,Wanda,Timmy,10.0,6.0,1.0,2.0,0.0,11.0,6.0,4.0,2.0,1.0,4.0,9.0,4.0,4.0
4,Timmy (1/4),Timmy,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
275,Maria,Santa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
276,Santa,Wanda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
277,Santa,Cosmo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
278,All Holiday mascots,All,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [136]:
edges_clean["Weight Season"] = edges_clean.sum(numeric_only=True, axis = 1)


In [137]:
edges_clean.head(10)

Unnamed: 0,Character A,Character B,Weight Ep. 1,Weight Ep. 2,Weight Ep. 3,Weight Ep. 4,Weight Ep. 5,Weight Ep. 6,Weight Ep. 7,Weight Ep. 8,Weight Ep. 9,Weight Ep. 10,Weight Ep. 11,Weight Ep. 12,Weight Ep. 13,Weight Ep. 14,Weight Season
1,Wanda,Cosmo,7.0,6.0,2.0,1.0,1.0,2.0,3.0,4.0,2.0,11.0,2.0,2.0,2.0,1.0,46.0
2,Cosmo,Wanda,8.0,2.0,1.0,0.0,4.0,5.0,4.0,5.0,4.0,9.0,3.0,2.0,2.0,1.0,50.0
4,Cosmo,Timmy,6.0,1.0,4.0,2.0,2.0,11.0,5.0,4.0,1.0,2.0,5.0,6.0,3.0,1.0,53.0
5,Wanda,Timmy,10.0,6.0,1.0,2.0,0.0,11.0,6.0,4.0,2.0,1.0,4.0,9.0,4.0,4.0,64.0
9,Timmy (1/4),Timmy,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
11,No character,Timmy,4.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
13,Timmy,All,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,1.0,1.0,5.0
14,Mr. Turner,Mrs. Turner,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,4.0
17,Vicky,Mrs. Turner,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
19,Wanda,Vicky,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0


In [139]:
G = nx.from_pandas_edgelist(edges_clean, "Character A", "Character B", "Weight Season")
nt = net.Network(notebook= True, cdn_resources='remote')
nt.from_nx(G)

nt.show("Fairy Network season 1.html")

Fairy Network season 1.html
