

This repository contains code used to explore the New York Public Library Ensemble website:

The repository contains four folders. The Main Code folder contains the Python code files. The File Outputs folder contains the files output by the main Python code files. The Pairings folder contains the code and output files used to match directors with actors, directors with headliners, playwrights with actors, playwrights with headliners, and producers with headliners. The Visualizations folder contains all of the files used to create the visualizations for the project.

To run the main code, install the Python library Beautiful Soup (the beautifulsoup4 package on PyPI).

Main Code
- Gets links to individual playbill pages using Beautiful Soup and outputs playbill_links.json
- Extracts data from individual playbill pages and outputs playbill_page_data.json
- Counts each metadata field and outputs counts.json. Also outputs each metadata field to its own CSV file: headliners.csv, show_titles.csv, show_dates.csv, theater_names.csv, locations.csv, production_staff.csv, cast_members.csv, advertisements.csv.
- Combs each playbill page and zips together the dictionaries for show title and type, production staff role and name, actor and character, and advertisement company and address
- Extracts only the ad types from individual playbill pages and outputs ad_types.json
- Counts types of ads and outputs ad_types_counts.csv and ads_dump.json
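The counting step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes each playbill record is a dictionary mapping a metadata field name to a list of values, which may not match the real layout of playbill_page_data.json. The function names `count_fields` and `write_outputs` and the sample records are hypothetical.

```python
import csv
import json
from collections import Counter
from pathlib import Path


def count_fields(records):
    """Count how often each value appears within each metadata field."""
    counts = {}
    for record in records:
        for field, values in record.items():
            counts.setdefault(field, Counter()).update(values)
    return counts


def write_outputs(counts, out_dir):
    """Write counts.json plus one <field>.csv per metadata field."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "counts.json", "w", encoding="utf-8") as f:
        json.dump({field: dict(c) for field, c in counts.items()}, f, indent=2)
    for field, counter in counts.items():
        with open(out_dir / f"{field}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow([field, "count"])
            # most_common() yields (value, count) pairs, highest count first
            writer.writerows(counter.most_common())


# Hypothetical sample records; the real data layout is an assumption.
sample = [
    {"headliners": ["A. Actor"], "theater_names": ["Example Theatre"]},
    {"headliners": ["A. Actor", "B. Player"], "theater_names": ["Example Theatre"]},
]
counts = count_fields(sample)
```

Calling `write_outputs(counts, "File Outputs")` would then produce counts.json together with headliners.csv and theater_names.csv for the sample above.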

File Outputs
- playbill_links_cleaned.json - playbill_links.json contains five links for which there was no corresponding JSON file. Because this project was a proof-of-concept exercise and did not need to be comprehensive, these five links have been removed in playbill_links_cleaned.json. The five links are: