# Network Analysis
- Take the Shakespeare play you've previously been analyzing and construct a network visualization
- If you've been analyzing Macbeth, choose another play -- I will ask you to re-do the assignment if you use the same example from the lecture notebooks
- You are welcome to use any or all of NetworkX, Bokeh, or Dash Cytoscape
- You are also free to determine the information from the play that you use for nodes and edges
- You must clarify (in markdown cells) what information is contained in the graph and how you are measuring it
- Also include a description (via markdown cells) of the network's density and the "most important" nodes
- You can choose what "most important" means, but use a quantitative metric and include this metric's value(s) in your description
- Tailor the graph's aesthetics to enhance the visualization

## Setting up libraries and Dash

In [1]:
import dash
import dash_cytoscape as cyto
from dash import html, dcc
from dash.dependencies import Input, Output
from jupyter_dash import JupyterDash
from jupyter_dash.comms import _send_jupyter_config_comm_request

In [2]:
_send_jupyter_config_comm_request()

In [37]:
JupyterDash.infer_jupyter_proxy_config()

In [38]:
with open('romeo-and-juliet.txt') as f:
    x = f.read()

## Cleaning text

I first removed the publisher notes from the end of the text, then proceeded to remove the prologue from the beginning of the text to extract only the Acts of the play.

In [39]:
removed_publisher = x.split('THE END')[0]
acts = removed_publisher.split('ACT')[1:6]

In [40]:
len(acts)

5

## Setting up constants and dictionaries

In [72]:
characters = [
    'Escalus',
    'Paris',
    'Montague',
    'Capulet',
    'An old Man',
    'Romeo',
    'Tybalt',
    'Mercutio',
    'Benvolio',
    'Tybalt',
    'Friar Laurence',
    'Friar John',
    'Balthasar',
    'Abram',
    'Sampson',
    'Gregory',
    'Peter',
    'Lady Montague',
    'Lady Capulet',
    'Juliet',
    'Nurse to Juliet'
]

In [73]:
connections = {}
for i in range(len(characters)-1):
    for j in range(i+1,len(characters)):
        connections[(characters[i],characters[j])] = 0

In [74]:
charnum = {}
for k in characters:
    charnum[k] = 0

## Processing text
_Dictionary format -- connections : { character: # of times appeared }_


The for loops go through every scene in all the acts, then look at every character and count the number of times that character appears in that scene.

In [75]:
for act in acts:
    for scene in act.split("Scene"):
        dict = {}
        for char in characters:
            if scene.count(char) != 0:
                dict[char] = scene.count(char)
                charnum[char] += scene.count(char)
        for a in range(len(dict)-1):
            for b in range(a+1,len(dict)):
                if dict[list(dict.keys())[a]] <= dict[list(dict.keys())[b]]:
                    connections[(list(dict.keys())[a],list(dict.keys())[b])] += dict[list(dict.keys())[a]]
                else: 
                    connections[(list(dict.keys())[a],list(dict.keys())[b])] += dict[list(dict.keys())[b]]

Removing dictionary entries that are zero.

In [76]:
charnum = {name:count for name,count in charnum.items() if count != 0}
connections = {(source,target):weight for (source,target),weight in connections.items()
              if weight != 0}

Convert the dictionaries into graph items, using a correction factor for clarity.

In [77]:
graphitems = []

for k,v in charnum.items():
    dashnode = {'data': {'id': k,
                         'label': k.title(),
                         'size': str(v)}}
    graphitems.append(dashnode)

correction_factor = max(connections.values())

for k,v in connections.items():
    if v != 0:
        dashedge = {'data': {'source': k[0],
                             'target': k[1],
                             'weight': v/correction_factor}}
        graphitems.append(dashedge)

In [78]:
app = JupyterDash(__name__)

app.layout = html.Div([
    cyto.Cytoscape(
        layout={'name': 'cose'},
        elements=graphitems,
        style={'width': '100%', 'height': '750px'},
        stylesheet=[
            {
                'selector': 'node',
                'style': {
                    'content':'data(label)',
                    'text-halign':'center',
                    'text-valign':'center',
                    'width': 'data(size)',
                    'height': 'data(size)',
                    'font-size':5,
                    'color': 'black',
                    'background-color': '#30bf7a',
                    'text-outline-color': 'white',
                    'text-outline-width': 0.2,
                    'shape':'circle'
                }
            },
            {
                'selector':'edge',
                'style': {
                    'width':'data(weight)',
                    'line-color': '#f5b236',
                  }
            },
        ]
    )
])

app.run_server(debug=True)

Dash app running on https://jupyter.idre.ucla.edu/user/anhmvc@ucla.edu/proxy/8050/


## Notes and Comments on Network Visualization

- Every node represents **a character** and its size represents **the number of times they appear in the play**.
- Every edge represents **the number of times any two characters appear together in a scene**.
- For example: string "romeo fights tybalt so that juliet can fall in love with romeo", the node "romeo" appears 2 times, the node "tybalt" and "juliet" each appears once, and the edge (romeo, juliet) appears 2 times, edge (juliet, tybalt) and edge (romeo, tybalt) appear once.

Regarding the network density, when looking at the visualization:
- The most dominant nodes are "Romeo" and "Tybalt", followed by "Juliet", "Capulet", "Montague", "Mercutio", and "Paris". 
- The thickest edges are between "Romeo" & "Tybalt", and "Romeo" & "Juliet".
- It should be noted that Tybalt actually appears in the play a lot more than Juliet, this may give rise to the not-so-surprising gender inequality during the times that the play was written. It highlights especially that even though Juliet was supposed to be the main character along with Romeo, she doesn't get much lines and presence despite her role in the story.