Zipf's Law in the Greek language


A script that fetches data, processes it, and builds word lists. It manipulates the lists to find word frequencies, sorts the words by rank, and calculates the related data to show that Zipf's Law holds for the Greek language. It also creates the corresponding graph plot.
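The counting and ranking step described above can be sketched in Python as follows. This is a minimal illustration and not the script's own code: the function name `rank_frequencies` and the tokenization rule (lowercase the text, treat every run of letters as a word) are assumptions made for the example.

```python
import re
from collections import Counter


def rank_frequencies(text):
    """Return (rank, word, frequency) tuples, most frequent word first."""
    # Assumption: a token is a maximal run of Unicode letters, case-folded.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    # Sort by descending frequency; break ties alphabetically so the
    # ranking is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(ordered, start=1)
    ]


sample = "το σπίτι και το δέντρο και το βουνό"
for rank, word, freq in rank_frequencies(sample):
    print(rank, word, freq)
```

Words with equal frequency receive consecutive ranks here; a different tie-breaking rule would change the order of the tail but not the overall shape of the rank-frequency curve.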


  • Total documents: 10.000
  • Total words: 4.984.085
  • Vocabulary size: 174.258 (unique words)
  • Words occurring more than 10 times: 31.133
  • Words occurring once: 70.247
  • Final b for all words is -1.06015791025300522471
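Zipf's law says that the frequency f of a word falls off with its rank r roughly as f ≈ C · r^b, with b close to -1, so b is the slope of the rank-frequency curve on log-log axes. How the script itself derives b is not shown here. The sketch below uses an ordinary least-squares fit of log f against log r, which is one common estimator and an assumption for this example. The function name `estimate_b` is hypothetical.

```python
import math


def estimate_b(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    `frequencies` must hold at least two positive values sorted in
    descending order, so that the item at index i has rank i + 1.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic frequencies following f = 1000 / r exactly: the fitted
# slope should come out at -1 up to floating-point error.
ideal = [1000 / rank for rank in range(1, 51)]
print(round(estimate_b(ideal), 6))
```

A fit over all ranks weights the long tail of rare words heavily, so the result can differ from an estimate that averages per-word values.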

Review Paper

Grab the paper here or read it online here

Plot Graph



usage: retrieve options

	Options are:
		-a	all, same as -t -m -b -g [Note: no fetch]
		-f	fetch files
		-t	tokenize files
		-m	sort and map tokens - get rank and freq
		-b	calculate b
		-g	create graph plot
		-h	help, print this help message


Results are placed in /tmp/zipf/results

  • Holds all words and their frequency sorted by their rank
  • Report-like file. Holds all words, their frequency, their relative frequency, and each word's b-value (also referred to as 'a'), sorted by the word's rank, plus the average b-value.
  • rank.freq.plot Includes just the values (rank and frequency) fed to the graph.
  • zipf_plot_greek.png The graph image


Collected data are placed in /tmp/zipf/dumpfiles
Processed data are placed in /tmp/zipf/tokens
Data are collected from the Greek Wikipedia using its random page generator. By default, the script collects 10.000 random pages.


Dependencies include

  • elinks An advanced and well-established feature-rich text mode web browser.
  • gnuplot A portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms


Zipf Law for the Greek Language by Ivan Kanakarakis is licensed under the GNU GPLv3 license.