Skip to content

DavidBuchanan314/english-letter-freqs

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 

english-letter-freqs

Useful generation scripts and precomputed LUTs useful for performing frequency analysis on English text.

Whenever writing some frequency analysis code, I am always frustrated by the lack of code-friendly copy-paste-able tables.

Note: The tables below are a derivative work of the corpus, Alice In Wonderland, which is licensed under the Project Gutenberg License. The license terms are rather permissive, but make sure you read it before incorporating these tables into your code. My table generation code is licensed under the MIT license, so feel free to run it on your own corpus.

The two data structures below represent the same data in different formats.

byte_freqs = [
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.022298, 0.000000, 0.000000, 0.022298, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.166921, 0.002692, 0.000806, 0.000006, 0.000012, 0.000006, 0.000000, 0.017219,
	0.000454, 0.000454, 0.000525, 0.000000, 0.015315, 0.004441, 0.007198, 0.000143,
	0.000131, 0.000406, 0.000072, 0.000078, 0.000060, 0.000078, 0.000042, 0.000036,
	0.000060, 0.000066, 0.001528, 0.001158, 0.000000, 0.000000, 0.000000, 0.001206,
	0.000012, 0.004309, 0.000746, 0.001116, 0.001444, 0.001868, 0.000800, 0.001152,
	0.001844, 0.004972, 0.000078, 0.000525, 0.000943, 0.001325, 0.001086, 0.001397,
	0.001027, 0.000507, 0.001265, 0.001725, 0.003408, 0.000663, 0.000310, 0.001528,
	0.000036, 0.000848, 0.000006, 0.000024, 0.000000, 0.000024, 0.000000, 0.000024,
	0.000000, 0.054212, 0.009675, 0.016813, 0.031203, 0.090035, 0.013417, 0.016419,
	0.045247, 0.046572, 0.001325, 0.007174, 0.030159, 0.013399, 0.046978, 0.055173,
	0.010719, 0.000806, 0.038198, 0.041666, 0.069420, 0.023080, 0.005437, 0.016091,
	0.001015, 0.014575, 0.000472, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000,
	0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000
]
char_freqs = {
	'\n':	0.022298,
	'\r':	0.022298,
	' ':	0.166921,
	'!':	0.002692,
	'"':	0.000806,
	'#':	0.000006,
	'$':	0.000012,
	'%':	0.000006,
	"'":	0.017219,
	'(':	0.000454,
	')':	0.000454,
	'*':	0.000525,
	',':	0.015315,
	'-':	0.004441,
	'.':	0.007198,
	'/':	0.000143,
	'0':	0.000131,
	'1':	0.000406,
	'2':	0.000072,
	'3':	0.000078,
	'4':	0.000060,
	'5':	0.000078,
	'6':	0.000042,
	'7':	0.000036,
	'8':	0.000060,
	'9':	0.000066,
	':':	0.001528,
	';':	0.001158,
	'?':	0.001206,
	'@':	0.000012,
	'A':	0.004309,
	'B':	0.000746,
	'C':	0.001116,
	'D':	0.001444,
	'E':	0.001868,
	'F':	0.000800,
	'G':	0.001152,
	'H':	0.001844,
	'I':	0.004972,
	'J':	0.000078,
	'K':	0.000525,
	'L':	0.000943,
	'M':	0.001325,
	'N':	0.001086,
	'O':	0.001397,
	'P':	0.001027,
	'Q':	0.000507,
	'R':	0.001265,
	'S':	0.001725,
	'T':	0.003408,
	'U':	0.000663,
	'V':	0.000310,
	'W':	0.001528,
	'X':	0.000036,
	'Y':	0.000848,
	'Z':	0.000006,
	'[':	0.000024,
	']':	0.000024,
	'_':	0.000024,
	'a':	0.054212,
	'b':	0.009675,
	'c':	0.016813,
	'd':	0.031203,
	'e':	0.090035,
	'f':	0.013417,
	'g':	0.016419,
	'h':	0.045247,
	'i':	0.046572,
	'j':	0.001325,
	'k':	0.007174,
	'l':	0.030159,
	'm':	0.013399,
	'n':	0.046978,
	'o':	0.055173,
	'p':	0.010719,
	'q':	0.000806,
	'r':	0.038198,
	's':	0.041666,
	't':	0.069420,
	'u':	0.023080,
	'v':	0.005437,
	'w':	0.016091,
	'x':	0.001015,
	'y':	0.014575,
	'z':	0.000472
}

About

Useful generation scripts and precomputed LUTs useful for performing frequency analysis on English text.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages