An application to extract dialog from Shakespeare plays, as encoded into TEI by the Folger Library.
Python
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
character-lists
README.md
characters.txt
ham.xml
mac.xml
oth.xml
parse.py

README.md

#About This program accepts a list of characters and outputs those characters' dialog to standard output. This was designed to extract dialog from Shakespeare plays marked up in TEI XML, but could be adapted for other authors. Unless a character list filename is specified by the -c option, it looks for a file in the current working directory called characters.txt. It can accept any number of XML files, so you can run it on, say, all of Shakespeare's plays at once, using your operating system's wildcard (*).

#Usage From the command-line, run:

python parse.py [-c characters.txt] file1.xml [file2.xml] 

#Examples To get all of Ophelia's speech from the play Hamlet, and store the dialog in a file called opelia-dialog.txt, run:

echo "Ophelia" > characters.txt
python parse.py ham.xml > ophelia-dialog.txt

To get all the dialog from all of Shakespeare's fools, and store it in a file called fools-dialog.txt, first make sure your directory contains XML files for all of Shakespeare's plays, then run:

echo "Touchstone,kil-fool.,Trinculo,Costard,Feste,Launcelot,aww-clo.,tim-fool.,Bottom,Thersites,oth-clo.,DromioS,DromioE,Speed,Launce,ham-sec.-clo.,ham-first-clo.,juc-cit1.,juc-cit2.,juc-cit3.,juc-cit4.,juc-cits.,PompeyBum,win-clo.,Grumio,mac-porter.,mac-port.,Peter1,Cloten,Falstaff1" > characters.txt
python parse.py *.xml > fools-dialog.txt

#Characters.txt Your characters.txt file should contain a comma-separated list of the XML IDs for all the characters whose dialog you want to extract. Newlines and spaces are OK, since the program will ignore those. See example files in this directory.

#Contents of this Directory

  • parse.py: The python script that extracts the dialog.
  • characters.txt: This is just a sample characters.txt file showing the ways you can list the characters whose dialog you want to extract. Use the XML IDs for character names.
  • ham.xml, mac.xml, oth.xml: Sample TEI XML Shakespeare plays.
  • /character-lists: This directory contains sample character list files.

#License This software is released under the GPLv3.