A Simpsons dataset, so that you can filter out clip show episodes
Python
.circleci
characters fix some character keys Jan 14, 2019
episodes Merge pull request #19 from colinxfleming/s5 Aug 11, 2019
.gitignore add a gitignore Feb 10, 2018
README.md
compile_data.py turn empty arrays to []s instead of Nones so we can typecheck Jan 14, 2019
remaining_report.py
requirements.txt Add working compiler Feb 10, 2018
simpsons_data.json Recompile data Aug 11, 2019
test_build.py report on stuff... but at what cost? new qa Jan 14, 2019

README.md

A dataset of Simpsons episodes

Please enjoy this dataset of Simpsons episodes and characters, scraped from SimpsonsWorld.

Contents

This repository stores episodes and characters as YAML files. These are compiled into a single JSON file, which is loaded into Simpsons Optimizer.
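The compile step can be sketched roughly as follows. This is not the contents of compile_data.py: it assumes the YAML files have already been parsed into Python dicts (parsing is omitted), and the function names compile_data and normalize_episode are hypothetical. It follows the repository's note that empty character lists are stored as [] rather than None, so consumers can type-check the field.

```python
from __future__ import annotations

import json


def normalize_episode(raw: dict) -> dict:
    """Return a copy of an episode record whose missing or null
    characters field has been replaced by an empty list."""
    episode = dict(raw)
    # A YAML key with no value parses as None; store [] instead so
    # consumers can always treat characters as a list.
    if episode.get("characters") is None:
        episode["characters"] = []
    return episode


def compile_data(characters: list[dict], episodes: list[dict]) -> str:
    """Combine already-parsed character and episode records into one
    JSON document with the top-level keys characters and episodes.
    (Hypothetical helper; YAML loading is assumed to happen elsewhere.)"""
    payload = {
        "characters": characters,
        "episodes": [normalize_episode(e) for e in episodes],
    }
    return json.dumps(payload, indent=2)
```

Normalizing at compile time keeps the None-versus-[] question out of every downstream consumer.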

Contributing

  • Edit the episode or character entries in the YAML files
  • Open a pull request

Data model

Episodes are shaped as follows:

title: String. Episode title.
season: Integer. Season number.
episode: Integer. Episode number.
description: String. Episode description from SimpsonsWorld.
simpsonsworld_id: BigInt. Episode video identifier from SimpsonsWorld.
good: Boolean. Indicator of whether or not the episode is good.
characters: Array of strings. Strings are character short_names. 
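As a sketch of how a consumer might load and type-check one episode record, assuming the field names above, the class below mirrors the episode shape. The Episode class and its from_dict and validate methods are hypothetical and not part of this repository; BigInt is mapped to Python's unbounded int.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical typed view of one episode record."""
    title: str
    season: int
    episode: int
    description: str
    simpsonsworld_id: int  # BigInt in the data model; Python ints are unbounded
    good: bool
    characters: list[str] = field(default_factory=list)  # character short_names

    @classmethod
    def from_dict(cls, raw: dict) -> "Episode":
        ep = cls(
            title=raw["title"],
            season=raw["season"],
            episode=raw["episode"],
            description=raw["description"],
            simpsonsworld_id=raw["simpsonsworld_id"],
            good=raw["good"],
            # Treat a missing or null characters field as an empty list.
            characters=raw.get("characters") or [],
        )
        ep.validate()
        return ep

    def validate(self) -> None:
        for name in ("title", "description"):
            if not isinstance(getattr(self, name), str):
                raise TypeError(f"{name} must be a string")
        # bool is a subclass of int, so exclude it for the integer fields.
        for name in ("season", "episode", "simpsonsworld_id"):
            value = getattr(self, name)
            if not isinstance(value, int) or isinstance(value, bool):
                raise TypeError(f"{name} must be an integer, got {value!r}")
        if not isinstance(self.good, bool):
            raise TypeError(f"good must be a boolean, got {self.good!r}")
        if not isinstance(self.characters, list) or not all(
            isinstance(c, str) for c in self.characters
        ):
            raise TypeError("characters must be a list of strings")
```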

Characters are shaped as follows:

short_name: String. Lowercase common name or nickname, unique reference key.
name: String. Full name.
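Because episodes refer to characters by short_name, a quick consistency check can catch typos in either set of YAML files. The sketch below is hypothetical (check_character_refs is not part of this repository) and assumes both collections are lists of dicts shaped as described above. It enforces the two stated rules for short_name (lowercase, unique) and reports episode references to undefined characters.

```python
from __future__ import annotations


def check_character_refs(characters: list[dict],
                         episodes: list[dict]) -> list[tuple[str, str]]:
    """Return (episode title, short_name) pairs whose short_name is not
    defined in characters. Raises ValueError if a short_name is not
    lowercase or is defined more than once, since it is a unique key."""
    known: set[str] = set()
    for character in characters:
        key = character["short_name"]
        if key != key.lower():
            raise ValueError(f"short_name must be lowercase: {key!r}")
        if key in known:
            raise ValueError(f"duplicate short_name: {key!r}")
        known.add(key)

    missing = []
    for episode in episodes:
        # A null characters field is treated as "no characters listed".
        for key in episode.get("characters") or []:
            if key not in known:
                missing.append((episode["title"], key))
    return missing
```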

Episode coverage

There are a lot of episodes and seasons to go through to mark the good ones. Here's what's done so far:

  • Season 1: Episodes cataloged; Characters cataloged
  • Season 2: Episodes cataloged; Characters cataloged
  • Season 3: Episodes cataloged; Characters cataloged
  • Season 4: Episodes cataloged; Characters uncataloged
  • Season 5: Episodes uncataloged; Characters uncataloged
  • Season 6: Episodes cataloged; Characters uncataloged
  • Season 7: Episodes cataloged; Characters uncataloged
  • Season 8: Episodes cataloged; Characters uncataloged
  • Season 9: Episodes cataloged; Characters uncataloged
  • Season 10: Episodes cataloged; Characters uncataloged
  • Season 11: Episodes cataloged; Characters uncataloged
  • Season 12: Episodes cataloged; Characters uncataloged
  • Season 13: Episodes cataloged; Characters uncataloged
  • Season 14: Episodes cataloged; Characters uncataloged
  • Season 15: Episodes cataloged; Characters uncataloged
  • Season 16: Episodes uncataloged; Characters uncataloged
  • Season 17: Episodes uncataloged; Characters uncataloged
  • Season 18: Episodes uncataloged; Characters uncataloged
  • Season 19: Episodes uncataloged; Characters uncataloged
  • Season 20: Episodes uncataloged; Characters uncataloged
  • Season 21: Episodes uncataloged; Characters uncataloged
  • Season 22: Episodes uncataloged; Characters uncataloged
  • Season 23: Episodes uncataloged; Characters uncataloged
  • Season 24: Episodes uncataloged; Characters uncataloged
  • Season 25: Episodes uncataloged; Characters uncataloged
  • Season 26: Episodes uncataloged; Characters uncataloged
  • Season 27: Episodes uncataloged; Characters uncataloged
  • Season 28: Episodes uncataloged; Characters uncataloged
  • Season 29: Episodes uncataloged; Characters uncataloged
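A per-season tally like the list above could, in principle, be approximated from the compiled data. The sketch below is a hypothetical helper (coverage_by_season is not part of this repository), and it uses a heuristic that is an assumption: an episode counts as having characters cataloged when its characters list is non-empty. It assumes episodes is a list of records with season and characters fields.

```python
from __future__ import annotations

from collections import defaultdict


def coverage_by_season(episodes: list[dict]) -> dict[int, dict[str, int]]:
    """Count, per season, how many episodes exist and how many have at
    least one character tagged. Heuristic only: an empty or null
    characters list is treated as 'not yet cataloged'."""
    report: dict[int, dict[str, int]] = defaultdict(
        lambda: {"episodes": 0, "with_characters": 0}
    )
    for ep in episodes:
        stats = report[ep["season"]]
        stats["episodes"] += 1
        if ep.get("characters"):
            stats["with_characters"] += 1
    # Return a plain dict ordered by season number.
    return dict(sorted(report.items()))
```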

Usage

Running python compile_data.py produces a JSON file with the top-level keys characters and episodes.

A current compiled copy (simpsons_data.json) is kept in the base directory for your convenience.
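Since the point of the dataset is to filter out episodes you don't want to watch, here is a minimal sketch of loading the compiled file and keeping only episodes flagged good. It assumes the file is named simpsons_data.json and that episodes is a list of records shaped as in the data model; load_data and good_episodes are hypothetical helper names.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_data(path: str | Path = "simpsons_data.json") -> dict:
    """Load the compiled JSON file (top-level keys: characters, episodes)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def good_episodes(data: dict, season: int | None = None) -> list[dict]:
    """Return episodes whose good flag is true, optionally limited to one
    season, ordered by season and then episode number."""
    picked = [
        ep for ep in data["episodes"]
        if ep.get("good") and (season is None or ep["season"] == season)
    ]
    return sorted(picked, key=lambda ep: (ep["season"], ep["episode"]))
```

For example, good_episodes(load_data(), season=3) would return the season 3 episodes flagged good, in broadcast order by episode number.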

Suggestions welcome

Please open an issue if you have anything you'd like to add.
