Code and data for extracting co-occurrence networks from Shakespeare's plays

This repository contains the code used to extract co-occurrence networks from a tagged corpus of Shakespeare's plays.

The networks have been analysed using persistent homology, a technique from computational topology. Please refer to our paper

Shall I compare thee to a network? – Visualizing the Topological Structure of Shakespeare's Plays

for more details.


  • The folder Corpus contains the original corpus that was used to calculate co-occurrence networks. Additional information about the amount of speech between certain characters has been added. Please refer to for the original data.
  • The folder Networks contains the co-occurrence networks for all the plays that we used in the paper. Networks are categorized into speech-based and time-based filtrations. Please refer to the paper for more details.
  • The folder Plays contains the corrected variants of the plays, sorted into three broad categories.
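As a rough illustration of what a co-occurrence network is, the following minimal sketch counts how often two characters speak in the same scene. It is not the extraction script from this repository and does not implement the speech-based or time-based filtrations from the paper. The input format (one list of speaker names per scene) and the function name `cooccurrence_network` are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(scenes):
    """Count, for every pair of characters, the scenes in which both speak.

    `scenes` is an assumed input format: one list of speaker names per scene.
    Returns a Counter that maps an (a, b) pair to its edge weight.
    """
    weights = Counter()
    for speakers in scenes:
        # Deduplicate so that a pair counts at most once per scene; sorting
        # gives every undirected edge a single canonical (a, b) key.
        for a, b in combinations(sorted(set(speakers)), 2):
            weights[(a, b)] += 1
    return weights


# Toy data, invented for illustration only.
scenes = [
    ["ROMEO", "BENVOLIO", "ROMEO"],
    ["ROMEO", "JULIET"],
    ["JULIET", "NURSE", "ROMEO"],
]

network = cooccurrence_network(scenes)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

Other weightings, such as the amount of speech exchanged between two characters, would replace the `+= 1` with a different increment.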


The main script takes the filename of a tagged play and automatically produces a co-occurrence network using the speech-based filtration described in the paper. The network is stored in the current directory. To batch-process all plays automatically, you could, for example, use:

find ./Plays/ -name "*.txt" -exec ./ {} \;

This traverses the folder Plays and executes the extraction script once for every .txt file. If you want the time-based filtration instead, pass the parameter -t after the filename, i.e.:

find ./Plays/ -name "*.txt" -exec ./ {} -t \;

Again, this produces one network per play. Note that any existing networks in the current folder will be overwritten.
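One way to avoid overwriting existing networks is to run the extraction from a separate output directory. The sketch below does this from Python. It relies only on the behaviour described above: the script takes a filename, accepts `-t`, and writes to the current directory. The function name `batch_extract` and the `command` argument are hypothetical placeholders. You have to supply the actual command that starts the extraction script.

```python
import subprocess
from pathlib import Path


def batch_extract(plays_dir, command, out_dir, time_based=False):
    """Run an extraction command once per tagged play, writing into out_dir.

    `command` is a hypothetical placeholder: the argument list that starts
    the extraction script, e.g. [sys.executable, "/absolute/path/to/script"].
    Use an absolute script path, because the working directory changes.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    processed = []
    # resolve() yields absolute play paths; sorted() fixes the order.
    for play in sorted(Path(plays_dir).resolve().rglob("*.txt")):
        args = list(command) + [str(play)]
        if time_based:
            args.append("-t")  # same position as in the find call above
        # The script stores its output in the current directory, so running
        # it with cwd=out_dir keeps new networks away from existing ones.
        subprocess.run(args, cwd=out_dir, check=True)
        processed.append(play)
    return processed
```

Because of `check=True`, the loop stops with an exception as soon as the script fails for one play, so a broken input does not go unnoticed.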


A demo of all the extracted networks is available. The demo uses a simple force-directed graph layout to visualize the network.
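For readers unfamiliar with force-directed layouts, the following is a minimal sketch of the general idea, using forces in the style of Fruchterman and Reingold: all nodes repel each other, and connected nodes attract each other. It is not the code used by the demo. The function name, the parameters, and the constants (initial step size, cooling factor) are assumptions chosen for illustration.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Place nodes in 2D: all pairs repel, connected pairs attract."""
    nodes = list(nodes)
    if not nodes:
        return {}
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred distance between neighbours
    step = 0.1  # maximum movement per iteration, reduced over time

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion: every pair of nodes pushes apart with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f

        # Attraction: every edge pulls its endpoints together with d^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f

        # Move each node along its net force, capped at the current step.
        for n in nodes:
            d = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            scale = min(d, step) / d
            pos[n][0] += disp[n][0] * scale
            pos[n][1] += disp[n][1] * scale
        step *= 0.99  # cool down so that the layout settles

    return pos


layout = force_directed_layout(
    ["ROMEO", "JULIET", "NURSE"],
    [("ROMEO", "JULIET"), ("JULIET", "NURSE")],
)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The pairwise repulsion loop is quadratic in the number of nodes, which is acceptable for the cast of a single play but would need a spatial index for much larger graphs.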


The data and the code are released under an MIT licence. Please refer to the file LICENSE for more information.
