
This repository contains the code used to extract co-occurrence networks from a tagged corpus of Shakespeare's plays.

The networks have been analysed using persistent homology, a technique from computational topology. Please refer to our paper

Shall I compare thee to a network? – Visualizing the Topological Structure of Shakespeare's Plays

for more details.


  • The folder Corpus contains the original corpus that was used to calculate co-occurrence networks. Additional information about the amount of speech between certain characters has been added. Please refer to for the original data.
  • The folder Networks contains the co-occurrence networks for all the plays that we used in the paper. Networks are categorized into speech-based and time-based filtrations. Please refer to the paper for more details.
  • The folder Plays contains the corrected variants of the plays, sorted into three broad categories.


The main script is called Given the filename of a tagged play, it automatically produces a co-occurrence network using the speech-based filtration described in the paper. The network is stored in the current directory. To batch-process all plays automatically, you could, for example, use:

find ./Plays/ -name "*.txt" -exec ./ {} \;

This traverses the folder Plays and executes the extraction script for every file. If you want the time-based filtration instead, use the parameter -t, i.e.:

find ./Plays/ -name "*.txt" -exec ./ {} -t \;

Again, this will result in a set of networks. Note that any existing networks in the current folder will be overwritten.
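Because networks are written to the current directory and overwritten on each run, it may be convenient to keep the speech-based and time-based results in separate folders. The following is only a sketch, not part of this repository: the function name `extract_all` and the variables `EXTRACT` (absolute path to the extraction script) and `PLAYS` (absolute path to the folder Plays) are assumptions introduced for illustration.

```shell
# Hypothetical helper: run the extraction script on every play, writing the
# resulting networks into a dedicated output directory.
# Assumptions: EXTRACT and PLAYS hold absolute paths, because the function
# changes into the output directory before calling the script.
extract_all() {
  out_dir=$1
  shift  # any remaining arguments (e.g. -t) are passed on to the script

  if [ -z "$EXTRACT" ] || [ -z "$PLAYS" ]; then
    echo "EXTRACT and PLAYS must be set" >&2
    return 1
  fi

  mkdir -p "$out_dir" || return 1

  # Use a subshell so the caller's working directory stays unchanged.
  (
    cd "$out_dir" || exit 1
    find "$PLAYS" -name "*.txt" -exec "$EXTRACT" {} "$@" \;
  )
}
```

With both variables set, `extract_all ./speech` would collect the speech-based networks in one folder and `extract_all ./time -t` the time-based ones in another, so neither run overwrites the other.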


A demo of all the extracted networks is available. The demo uses a simple force-directed graph layout to visualize the network.


The data and the code are released under an MIT licence. Please refer to the file LICENSE for more information.