ayushxx7/Zipfs_Law

Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table.
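In symbols, this is one common way to write the statement above. The names are introduced here for illustration only: $f(r)$ is the frequency of the word at rank $r$, and $C$ is a constant that depends on the corpus.

$$
f(r) \approx \frac{C}{r}
\quad\Longrightarrow\quad
\log f(r) \approx \log C - \log r
$$

Taking logs turns the inverse proportionality into a straight line, so on log-log axes a frequency table that follows the law should sit close to a line with slope around -1.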

I first heard of this law from a Vsauce video (https://www.youtube.com/watch?v=fCn8zs912OE) and was, quite frankly, baffled and mindf*ked beyond thought.

Once I was convinced it held for so many things, I decided to verify it myself. Enter South Park. I realised that South Park was probably a really good source for data analysis, so I used the subtitles for seasons one to eighteen as my raw input.

The SRT files are read, punctuation is removed, and so are the [br] breaks in the subs (specific to my download, I guess).
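A minimal sketch of that cleaning step, not the repository's actual code. It assumes the usual SRT layout (a numeric cue index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then dialogue). The name `extract_words` is hypothetical, and lowercasing the words is an added assumption so that "The" and "the" count as one word.

```python
import re
import string

# Assumed SRT timing line, e.g. "00:00:01,000 --> 00:00:03,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->")
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def extract_words(srt_text):
    """Return the lowercase dialogue words found in the text of one SRT file."""
    words = []
    for line in srt_text.splitlines():
        line = line.strip()
        # Skip blank lines, cue numbers and timing lines.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        # Replace the [br] markers first, because the brackets are punctuation too.
        line = line.replace("[br]", " ")
        line = line.translate(PUNCT_TABLE)
        words.extend(line.lower().split())
    return words
```

Two side effects worth knowing about: stripping punctuation turns "don't" into "dont", and a dialogue line that consists only of digits is dropped along with the cue numbers.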

A CSV file of word frequencies is built from the data using Python's Counter (from collections).
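Roughly, the counting step could look like the sketch below. The function name `write_frequency_csv` and the `rank, word, count` column layout are assumptions for illustration; the original CSV may be laid out differently.

```python
import csv
from collections import Counter


def write_frequency_csv(words, csv_path):
    """Count the words and write rank, word, count rows, most frequent first."""
    counts = Counter(words)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["rank", "word", "count"])  # assumed header
        # most_common() yields (word, count) pairs sorted by descending count.
        for rank, (word, count) in enumerate(counts.most_common(), start=1):
            writer.writerow([rank, word, count])
    return counts
```

Storing the rank explicitly is a convenience: the plotting step can then read the file without having to sort it again.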

This CSV file is then used to plot the frequency table twice: once on a normal (linear) graph and once on a log graph.
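A hedged sketch of the plotting step with matplotlib. It assumes the CSV has a `count` column (a hypothetical layout), and it assumes the log graph uses log-log axes. The helper names `load_counts` and `plot_frequencies` are made up for this example.

```python
import csv


def load_counts(csv_path):
    """Read the assumed 'count' column of the CSV, sorted from high to low."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        counts = [int(row["count"]) for row in csv.DictReader(handle)]
    return sorted(counts, reverse=True)


def plot_frequencies(counts, out_path="zipf.png"):
    """Plot count against rank on linear axes and again on log-log axes."""
    # Imported here so load_counts() can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    ranks = list(range(1, len(counts) + 1))
    fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
    ax_linear.plot(ranks, counts)
    ax_linear.set(title="Linear axes", xlabel="rank", ylabel="count")
    ax_log.loglog(ranks, counts)
    ax_log.set(title="Log-log axes", xlabel="rank", ylabel="count")
    fig.tight_layout()
    fig.savefig(out_path)
```

On linear axes the curve hugs both axes and is hard to read. On log-log axes a Zipf-like distribution shows up as a roughly straight, downward-sloping line.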

Happy to say, the law holds :) I was able to verify it, and along the way I learned a lot about regex, matplotlib, the csv reader and the ease of Python as a whole :)
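For a number to go with the picture, one option (not something the original project does, as far as the write-up says) is to fit a straight line to log(count) against log(rank) and look at the slope. The sketch below uses plain least squares; `loglog_slope` is a hypothetical helper, and it needs at least two strictly positive counts.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank)."""
    ordered = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(count) for count in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance
```

A slope near -1 matches the textbook form of the law. Real corpora usually deviate somewhat, especially among the rarest words, so treat the value as a rough check and not a proof.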
