A dataset for textual analysis on arguably the best written comedy television show ever.

4m4n5/the-seinfeld-chronicles


Context

Dataset for people who love data science and Seinfeld.


Content

  • Details about all the episodes.
  • Includes attributes such as Director, Episode Name, and Air Date.
  • Complete Scripts of all the episodes.

An upcoming update will include:

  • Stage locations and cast

Data Source

The data is scraped from the fan website http://www.seinology.com/.


Possible Explorations

  • Train language models on the corpus.
  • Compare the vocabulary with other works on television, film or literature.
  • Find correlations between language complexity and popularity.
  • Train models to generate scripts based on the data.
  • Analyze obscure words used in the vocabulary of the series.

These are just basic examples; the sky is the limit.
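
As a starting point for the vocabulary ideas above, here is a minimal sketch in Python that counts words, computes a type-token ratio (a rough measure of vocabulary richness), and lists words that occur only once. The function names `tokenize` and `vocabulary_stats` are hypothetical, and `sample_lines` holds made-up placeholder sentences, not rows from this dataset; swap in the script text however you load it.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def vocabulary_stats(lines):
    """Return (word counts, type-token ratio, sorted words seen exactly once)."""
    counts = Counter(tok for line in lines for tok in tokenize(line))
    total = sum(counts.values())
    # Type-token ratio: distinct words divided by total words.
    ttr = len(counts) / total if total else 0.0
    # Words that appear only once are a crude proxy for "obscure" words.
    rare = sorted(word for word, n in counts.items() if n == 1)
    return counts, ttr, rare


# Placeholder lines (assumption: NOT taken from the dataset).
sample_lines = [
    "The soup is cold.",
    "The soup is hot!",
    "Nobody ordered the salad.",
]

counts, ttr, rare = vocabulary_stats(sample_lines)
print(counts.most_common(3))
print(round(ttr, 3))
print(rare)
```

The type-token ratio is sensitive to corpus length, so when comparing against other works it is probably fairer to compute it over samples of equal size.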


Acknowledgements

The data has been crawled from the http://www.seinology.com/ website.


Contributing

Changes and suggestions for improvement are welcome. Feel free to comment with additions you think would be useful, or open a PR on the GitHub project.

Want to buy me a coffee? paypal.me/AShrivastava961
