Zipf's Law

The pyzipf package tallies the occurrences of words in text files and plots each word's rank against its frequency, together with a line showing the theoretical distribution predicted by Zipf's Law.

Motivation

Zipf's Law is often stated as an observational pattern seen in the relationship between the frequency and rank of words in a text:

"…the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc." (Wikipedia)
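
Put another way, the word at rank r is expected to occur roughly 1/r times as often as the most frequent word. A minimal sketch of that relationship follows; the function name zipf_expected_counts and the example count of 1200 are illustrative assumptions, not part of pyzipf:

```python
def zipf_expected_counts(top_count, n_ranks):
    """Return the counts Zipf's Law predicts for ranks 1..n_ranks.

    Hypothetical helper for illustration: the count at rank r is
    top_count / r, where top_count is the count of the top word.
    """
    return [top_count / rank for rank in range(1, n_ranks + 1)]


# If the most frequent word appears 1200 times, the next three words
# are predicted to appear about 600, 400 and 300 times.
expected = zipf_expected_counts(top_count=1200, n_ranks=4)
print(expected)  # [1200.0, 600.0, 400.0, 300.0]
```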

Many books are available to download in plain text format from sites such as Project Gutenberg, so we created this package to qualitatively explore how well different books align with the word frequencies predicted by Zipf's Law.

Installation

pip install pyzipf

Usage

After installing this package, the following three commands will be available from the command line:

countwords for counting the occurrences of words in a text.
collate for collating multiple word count files together.
plotcounts for visualizing the word counts.

A typical usage scenario would include running the following from your terminal:

countwords dracula.txt > dracula.csv
countwords moby_dick.txt > moby_dick.csv
collate dracula.csv moby_dick.csv > collated.csv
plotcounts collated.csv --outfile zipf-drac-moby.jpg
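
For readers who want a feel for what the counting and collating steps do conceptually, here is a rough sketch using only the Python standard library. It is not pyzipf's implementation: the function names count_words and collate_counts and the simple tokenization rule are assumptions made for illustration.

```python
import re
from collections import Counter


def count_words(text):
    """Count word occurrences in a string, ignoring case.

    Assumption: a "word" is a run of letters and apostrophes;
    the real countwords command may tokenize differently.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def collate_counts(*counters):
    """Sum several word-count tallies into a single tally."""
    total = Counter()
    for counts in counters:
        total.update(counts)  # Counter.update adds counts together
    return total


first = count_words("The cat saw the dog.")
second = count_words("The dog ran.")
combined = collate_counts(first, second)
print(combined.most_common(2))  # [('the', 3), ('dog', 2)]
```

Sorting the combined tally by count gives each word's rank, which is the quantity plotted against frequency.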

Additional information on each command is available in its docstring or by appending the -h flag, e.g. countwords -h.

Contributors

Amira Khan <amira@zipf.org> @amira-khan
Sami Virtanen <sami@zipf.org> @sami-virtanen

Contributing

Interested in contributing? Check out the CONTRIBUTING.md file for guidelines on how to contribute. Please note that this project is released with a Contributor Code of Conduct (CONDUCT.md). By contributing to this project, you agree to abide by its terms. Both of these files can be found in our GitHub repository.
