# Introduction
In Part III and Part IV, the text of the seven Books was analyzed and presented as lineplots and wordclouds respectively.

In Part III, we looked at the sentiment of the sentences found in the text, and in Part IV we examined the frequency and prominence of strings/characters with wordclouds.

In this final Part, we use the text data to chart relationships between characters with network analysis. 

This will be done via:
- Loading the DataFrame from Part I
- Scraping Wikipedia to obtain a list of characters
- Building a list of co-occurring characters for Book 1
- Drawing a network graph based on the list
- Repeating the steps for Books 2-7
- Creating a GIF to show how the networks evolve over time

### Step 1: Import libraries
Before we start, let's import the following:
- pandas as pd
- matplotlib.pyplot as plt
- networkx as nx
- sent_tokenize from nltk

In [None]:
# Step 1: Import libraries

### Step 2: Read the CSV from Part I
You know the drill - load the CSV that you got at the end of Part I.

In [None]:
# Step 2: Read the CSV into a DataFrame

## Obtain a list of characters
Before we proceed with any analysis, we have to obtain a list of Harry Potter characters in the book universe.

We can either:
- manually list the characters
- scrape the list from somewhere
- acquire the list from somewhere

We'll look into scraping the list of characters, because who has the time for the first and third method? 

### Step 3: Implement a Wikipedia scraper
Don't worry, you won't have to implement it from scratch.

In fact, you can head straight to this <a href="https://github.com/motizukilucas/harry-potter-wikipedia-scraper/blob/master/hp_character_name_scraper.py">link</a> to copy the entire code for scraping.

A few things to note:
1. Remove .encode('utf-8') at line 18
2. Comment out lines 31-34, i.e. we don't need to write our list of characters out

You'll end up with something like this:

![ListOfAllHPCharacters](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectHarryPotter/ListOfAllHPCharacters.png)

In [None]:
# Step 3: Implement the code in the Github repo

### Step 4: Get a set of unique name strings
The full names won't appear often, so we'll have to split them into individual names, e.g., ["Hannah Abbott"] into ["Hannah", "Abbott"]. 

Here are the steps:
- Split the strings in the list into individual strings 
- Replace "(Mad-Eye)" with "Mad-Eye" in the list
- Turn the list into a set to eliminate duplicates

Your set should have only 245 items.
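
As a rough sketch of these three steps on a tiny, made-up list (the `characters` list below is a hypothetical stand-in for the scraped output from Step 3, so the resulting set is much smaller than 245 items):

```python
# Hypothetical stand-in for the list of full names scraped in Step 3
characters = [
    "Hannah Abbott",
    "Alastor (Mad-Eye) Moody",
    "Harry Potter",
    "Lily Potter",
]

# Split each full name on whitespace and collect the individual strings
names = []
for full_name in characters:
    names.extend(full_name.split())

# Strip the brackets around Mad-Eye
names = [name.replace("(Mad-Eye)", "Mad-Eye") for name in names]

# A set keeps only one copy of each string, e.g. a single "Potter"
name_set = set(names)
print(sorted(name_set))
```

The variable names `names` and `name_set` are only illustrative; use whatever fits your notebook.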

In [None]:
# Step 4: Get set of names, and don't forget to change Mad-Eye

## Network analysis preparation for one text
Now that we have all the pieces in place, we can create a network for the first book.

![NetworkAnalysisApproach](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectHarryPotter/NetworkAnalysisApproachExample.png)

In our project, we define characters to co-occur when they are found in the same sentence.

In the following approach, these are the steps:
1. <font color='red'>Split a full text into sentences</font>
2. <font color='green'>Loop through each word in the sentence and detect whether the name is in the list of names</font>
3. <font color='blue'>Create a list containing the names that co-occurred in the sentence</font>

### [Optional] Read the reading
A highly recommended step - this resource will be immensely helpful in implementing Step 7: https://towardsdatascience.com/populating-a-network-graph-with-named-entities-fb8e3d2a380f

### Step 5: Get the text from the first Book
Declare a variable that contains the full text from the first book.

In [None]:
# Step 5: Get Book 1 text

### Step 6: Tokenize the text into sentences
Use sent_tokenize from the nltk library to split the full text into sentences.

You will end up with a list of sentences.

In [None]:
# Step 6: Split the full text into sentences

### Step 7: Extract entities from the sentences
Now that we have a list of sentences, loop through each sentence and check whether each word in it is a name found in the set of character names from Step 4.

![BookOneEntities](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectHarryPotter/BookOneEntities.png)

The "Extract Entities" section in the reading above is immensely helpful, with a few modifications.
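
If it helps to see the idea on toy data first, here is a minimal sketch. The `name_set` and `sentences` below are made up for illustration; in the project they come from Step 4 and Step 6.

```python
# Hypothetical inputs standing in for the outputs of Step 4 and Step 6
name_set = {"Harry", "Hagrid", "Ron", "Hermione"}
sentences = [
    "Harry followed Hagrid to the boat.",
    "It was raining.",
    "Ron waved at Harry and Hermione.",
]

entities = []  # one list of names per sentence
for sentence in sentences:
    found = []
    for word in sentence.split(" "):
        # Keep the word only if it is exactly one of the known names
        if word in name_set:
            found.append(word)
    entities.append(found)

print(entities)
```

Note that splitting on spaces leaves punctuation attached to words, so "Hermione." in the last sentence is not matched. Whether to strip punctuation first is a design choice, and doing so may change your counts relative to the expected outputs shown in this notebook.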

<details>
    <summary><strong>Click here once if you're stuck and need pseudocode</strong></summary>
    <ol>
        <li>Declare an empty list (List 1)</li>
        <li>Use a for loop to loop through the sentences from Step 6. In each loop:</li>
        <ul>
            <li>Take the current sentence, and split it based on " ", getting a list of words</li>
            <li>Declare an empty list (List 2)</li>
            <li>Use a for loop to loop through the list of words. In each loop:</li>
            <ul>
                <li>Check if the current word is in the list of names from Step 4</li>
                <ul>
                    <li>If it is, append the word to List 2</li>
                </ul>
            </ul>
            <li>Append List 2 into List 1</li>
        </ul>
    </ol>
</details>    


<details>
    <summary><strong>Click here once for another hint from the reading above</strong></summary>
    <img src = 'https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectHarryPotter/EntityExtractionHint2.png'>
</details>

In [None]:
# Step 7: Extract entities from the sentences

### Step 8: Remove short and empty lists
Now that you have your list of lists of names, it's time to remove lists that are either empty or have only one item inside.

E.g., [] will be removed, and ["Harry"] will be removed as well.

![CleanedEntitiesBookOne](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectHarryPotter/CleanedEntitiesBookOne.png)

If all goes well, your list should contain only lists that have two or more items inside.
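
One possible way to do the filtering is a list comprehension. The `entities` list here is a small made-up example, not the real Book 1 output:

```python
# Hypothetical list of lists, shaped like the output of Step 7
entities = [["Harry", "Hagrid"], [], ["Ron"], ["Ron", "Harry", "Hermione"]]

# Keep only the lists with at least two names
cleaned = [names for names in entities if len(names) >= 2]
print(cleaned)
```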

In [None]:
# Step 8: Clean your list of entities

## Creating a network graph from the prepared lists
That was tough. Now, let's construct a graph and then plot it. 

The "Create a Network Graph" section in the reading above will be a very useful resource to follow.

### Step 9: Create a Graph object
We will need to create a Graph object first before we can start adding nodes and edges.

Declare a variable that contains a Graph object.

In [None]:
# Step 9: Create a Graph object

### Step 10: Add nodes into the graph
Loop through the list you obtained from Step 8, and add the names in each inner list as nodes in your Graph.

You will need to call the .add_nodes_from method from the Graph object. Again, the reading is a useful resource.

In [None]:
# Step 10: Add nodes into the graph

### Step 11: Add edges into the graph
After the nodes are added, time to add edges to the graph. 

In general, the process of adding edges between node 1 and node 2 is something like this:

```
G.add_edges_from([(node_1, node_2)])
```
There's a code snippet in the article that will be immensely helpful.
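
Here is a small sketch of creating a Graph, adding nodes, and adding edges on toy data. It assumes one particular pairing rule: every pair of names found in the same sentence gets an edge. The snippet in the reading may pair names differently, which would change your edge counts, so treat this as an illustration rather than the expected solution.

```python
from itertools import combinations

import networkx as nx

# Hypothetical cleaned list, shaped like the output of Step 8
cleaned = [["Harry", "Hagrid"], ["Ron", "Harry", "Hermione"]]

G = nx.Graph()
for names in cleaned:
    G.add_nodes_from(names)
    # add_edges_from expects an iterable of (node_1, node_2) tuples;
    # combinations(names, 2) yields every pair within the sentence (assumed rule)
    G.add_edges_from(combinations(names, 2))

print(sorted(G.nodes))
print(G.number_of_edges())
```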

In [None]:
# Step 11: Add edges to the graph

### Step 12: Plot the network graph
Now that you have both the edges and the nodes, it's time to plot it! 

We'll draw our graph with the nodes arranged in concentric circles, so we'll use nx.draw_shell. A few parameters to note:
- node_size = 50
- with_labels = True
- verticalalignment = 'bottom'
- font_color = 'red'
- font_size = 15

For best viewing, make sure you configure the plot size as well to something like (16,16), and don't forget to add a title too.

We'll start with these parameters, and we'll add more parameters later.
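
A minimal sketch of the plotting call with the parameters listed above, on a toy graph. The graph contents and the title text are placeholders, and creating the axes up front with `plt.subplots` is just one way to set the figure size:

```python
import matplotlib.pyplot as plt
import networkx as nx

# Toy graph standing in for the Book 1 graph from Step 11
G = nx.Graph()
G.add_edges_from([("Harry", "Hagrid"), ("Harry", "Ron"), ("Ron", "Hermione")])

# Create a large figure and draw the graph onto its axes
fig, ax = plt.subplots(figsize=(16, 16))
nx.draw_shell(
    G,
    ax=ax,
    node_size=50,
    with_labels=True,
    verticalalignment="bottom",
    font_color="red",
    font_size=15,
)
ax.set_title("Character network (toy example)")
```

In a notebook, the figure is rendered when the cell finishes running.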

In [None]:
# Step 12: Plot the network graph

<details>
    <summary><strong>Click once to see what we got</strong></summary>
    <img src = "https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectHarryPotter/BookOneNetworkGraphV1.png">
</details>

### Step 13: Add weights to the edges
The network looks good, i.e. it maps the relationship between characters. 

However, there's no weight to the relationship, i.e. how strong the relationship is. 

We'll need to <font color='red'><strong>repeat Steps 9-11, and modify Step 11</strong></font> by adding four new lines to add weights into the edges.

To add weights to edges, you can do this:
```
G[node_1][node_2]['weight'] = some_value
```

To update the weights in edges, you can do this:
```
G[node_1][node_2]['weight'] += some_value_2
```
A few considerations:
- use a try/except block
- initialize the weight of a new pair to 1
- after that, add 0.2 to the weight every time an existing pair co-occurs again
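
To see how the weights accumulate, here is a sketch on toy data. As an assumption for illustration, it pairs every two names in a sentence; your own Step 11 code may pair names differently.

```python
from itertools import combinations

import networkx as nx

# Hypothetical cleaned list: Harry and Hagrid co-occur twice
cleaned = [["Harry", "Hagrid"], ["Harry", "Hagrid"], ["Harry", "Ron"]]

G = nx.Graph()
for names in cleaned:
    G.add_nodes_from(names)
    for node_1, node_2 in combinations(names, 2):
        G.add_edges_from([(node_1, node_2)])
        try:
            # The pair already has a weight: bump it by 0.2
            G[node_1][node_2]["weight"] += 0.2
        except KeyError:
            # First time this pair is seen: no 'weight' key yet
            G[node_1][node_2]["weight"] = 1

print(G["Harry"]["Hagrid"]["weight"])
print(G["Harry"]["Ron"]["weight"])
```

Calling `add_edges_from` on an existing edge leaves its attributes untouched, which is why the weight survives from one sentence to the next.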

In [None]:
# Step 13a: Create a new Graph object

In [None]:
# Step 13b: Add nodes to the Graph

In [None]:
# Step 13c: Add edges to the nodes, with a try/except block to add weights

<details>
    <summary><strong>Click here once if you're stuck with Step 13c</strong></summary>
    <p>We don't usually do this, i.e. give you code directly, but after the line on .add_edges_from, immediately add the code below. Replace node_1 and node_2 with other things:</p>
    <p style="font-family: monospace">try:</p>
    <p style="font-family: monospace">&nbsp;&nbsp;&nbsp;&nbsp;G[node_1][node_2]['weight'] += 0.2</p>
    <p style="font-family: monospace">except KeyError:</p>
    <p style="font-family: monospace">&nbsp;&nbsp;&nbsp;&nbsp;G[node_1][node_2]['weight'] = 1</p>    
</details>

### Step 14: Check weights between nodes
Let's check if the addition of weights went well.

Retrieving the weight of edges is relatively straightforward:
```
G[node_1][node_2]['weight']
```
Check the weight of the edge between "Harry" and "Hagrid". You should get 8.6. If you get 8.600000001 it's ok as well.
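
Assuming the weight is updated once per co-occurrence as in Step 13 (start at 1, then add 0.2 each time), a weight of 8.6 corresponds to 39 co-occurrences, since 1 + 38 × 0.2 = 8.6. The extra trailing digits come from floating-point addition, so a tolerant comparison is safer than `==`. A small sketch:

```python
import math

weight = 1
for _ in range(38):  # 38 repeat co-occurrences after the first one
    weight += 0.2

print(weight)
# Compare with a tolerance instead of exact equality
print(math.isclose(weight, 8.6))
```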

In [None]:
# Step 14: Check the weight between Harry and Hagrid nodes

### Step 15: Get all weights from the edges
If the weight between Harry and Hagrid looks right, let's retrieve the weights of all the edges.

To do this, use NetworkX's get_edge_attributes function (nx.get_edge_attributes), passing the Graph object and 'weight' as arguments.

![BookOneEdgeWeights](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectHarryPotter/BookOneEdgeWeights.png)

You will get a dictionary containing the node pairs and the respective edge weights.

In [None]:
# Step 15: Get all weights from the edges

### Step 16: Get values only from the weight dictionary
Currently, your weights are still in the form of:
```
weights = {(node_1, node_2): weight_1, (node_3, node_4): weight_2, ...}
```
Retrieve the values from the dictionary with the .values method, and turn the values into a list using the <font color='green'>list</font> function.
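
A short sketch of both steps on a toy graph (the nodes and weights are made up for illustration):

```python
import networkx as nx

# Toy weighted graph standing in for the Book 1 graph
G = nx.Graph()
G.add_edge("Harry", "Hagrid", weight=1.2)
G.add_edge("Harry", "Ron", weight=1)

# Dictionary mapping (node_1, node_2) pairs to their weights
weights = nx.get_edge_attributes(G, "weight")

# Plain list of the weights, in the same order as G.edges
weight_list = list(weights.values())
print(weights)
print(weight_list)
```

Because the list follows the order of `G.edges`, it lines up with the edges when passed as a `width` parameter to the drawing functions.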

In [None]:
# Step 16: Get a list of weights

### Step 17: Repeat Step 12 to plot the network graph with weights
Repeat Step 12 to plot the network graph, but with the weights by adding the following parameter:
- width = the_list_you_got_from_Step_16

In [None]:
# Step 17: Plot network graph with weights

<details>
    <summary><strong>Click once to see what we got</strong></summary>
    <img src = "https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectHarryPotter/1.png">
</details>

## Network analysis for all Books
Now that you've figured out how to plot the network graph of one Book, it's time to do the rest.

### Step 18: Repeat Steps 5-17 for all Books
Loop through all the Steps that you did for Book 1, and stitch everything together. 

To recap, each loop goes like this:
- Tokenize the full text into sentences
- Search for characters in each sentence
- Clean the list of lists of characters
- Create a Graph object
- Add nodes
- Add edges with weights
- Plot Graph

[Optional] After plotting, use plt.savefig to save your graph as well. With the seven graphs saved, we can create a GIF to see how the network evolves throughout the books.
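
One way to stitch the steps together is to wrap the graph-building part in a helper function and loop over the Books. The sketch below makes several assumptions: the `books` dictionary holds made-up cleaned name lists (in the project, these come from tokenizing and cleaning each Book's text), the helper name `build_weighted_graph` is hypothetical, every pair of names in a sentence gets an edge, and the images are saved into a temporary folder as `1.png`, `2.png`, and so on.

```python
import os
import tempfile
from itertools import combinations

import matplotlib.pyplot as plt
import networkx as nx


def build_weighted_graph(entity_lists):
    """Build a weighted graph from cleaned lists of co-occurring names."""
    G = nx.Graph()
    for names in entity_lists:
        G.add_nodes_from(names)
        for node_1, node_2 in combinations(names, 2):
            G.add_edges_from([(node_1, node_2)])
            try:
                G[node_1][node_2]["weight"] += 0.2
            except KeyError:
                G[node_1][node_2]["weight"] = 1
    return G


# Hypothetical cleaned entity lists per Book (stand-ins for real data)
books = {
    1: [["Harry", "Hagrid"], ["Harry", "Ron"]],
    2: [["Harry", "Dobby"], ["Harry", "Ron"], ["Harry", "Ron"]],
}

output_dir = tempfile.mkdtemp()  # placeholder; use your own folder
saved_files = []
for book_number, entity_lists in books.items():
    G = build_weighted_graph(entity_lists)
    widths = list(nx.get_edge_attributes(G, "weight").values())

    fig, ax = plt.subplots(figsize=(16, 16))
    nx.draw_shell(
        G,
        ax=ax,
        node_size=50,
        with_labels=True,
        verticalalignment="bottom",
        font_color="red",
        font_size=15,
        width=widths,
    )
    ax.set_title(f"Book {book_number}")

    # Save the figure, then close it to free memory before the next Book
    path = os.path.join(output_dir, f"{book_number}.png")
    fig.savefig(path)
    plt.close(fig)
    saved_files.append(path)

print(saved_files)
```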

In [None]:
# Step 18: Loop through all the Books and plot a Graph       
    
    # Optional: add a .savefig method call to save your image
    

### [Optional] Create a GIF of the seven images
If you saved the images of the networks in your folder, you can stitch them into a GIF! 

We'll have to:
- import imageio
- create the gif using the seven images
- slow down the gif (fps=0.75)

Resources for reference: 
- https://towardsdatascience.com/basics-of-gifs-with-pythons-matplotlib-54dd544b6f30
- https://stackoverflow.com/questions/43160619/speed-up-existing-gif-with-imageio-python

In [None]:
# Import imageio

In [None]:
# Implement the gif creator (first reading)

In [None]:
# Slow down the gif, with fps=0.75 (second reading)

<details>
    <summary><strong>Click here once to see our GIF</strong></summary>
    <img src = "https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectHarryPotter/mygif_slow.gif">
</details>

# The End
We're finally done with the project series! 

In this Part, we created not just one but seven different network graphs. We also had the chance to create a GIF to see how the networks evolve over time.

For the project recap, you have:
1. Collected and cleaned text from seven books
2. Calculated metrics on the books for visualization
3. Performed sentiment analysis on the texts 
4. Created wordclouds on the texts
5. Charted network graphs of all of the characters

We hope this project series has UpLevelled you and your skills.

What you've learned here is just the tip of the iceberg, and a launchpad for bigger and better things to come.

Come join us in our Telegram community over at https://bit.ly/UpLevelSG and our Facebook page at https://fb.com/UpLevelSG

<strong>Most importantly, UpLevel wouldn't be what it is today without learners like yourself, so help us grow by spreading the word and getting more subscribers <font color='red'><3</font></strong>