Zipf's Law and Segmenting by Vowels

The Python script seg_by_vowel.py segments the Brown Corpus into chunks using several different delimiters:

space
a, e, i, o and u

The space delimiter chunks the Brown Corpus into pieces equivalent to orthographic words. The vowel delimiters chunk the corpus into non-word sequences, which can even include whitespace characters.
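
The difference between the two kinds of delimiter can be illustrated with a minimal sketch. This is not the code in seg_by_vowel.py: the function names `segment` and `rank_frequency` and the short sample sentence are hypothetical, and the sketch assumes that the delimiter is dropped from the chunks, that empty chunks are discarded, and that matching is case-sensitive. A real run would use the Brown Corpus text in place of the sample.

```python
from collections import Counter


def segment(text, delimiter):
    """Split text on one delimiter character, discarding empty chunks.

    Assumption: the delimiter itself is not kept in the chunks.
    """
    return [chunk for chunk in text.split(delimiter) if chunk]


def rank_frequency(chunks):
    """Return (rank, chunk, count) tuples, most frequent chunk first."""
    counts = Counter(chunks)
    return [
        (rank, chunk, n)
        for rank, (chunk, n) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical stand-in for the corpus text.
sample = "the cat sat on the mat and the dog sat too"

# Splitting on a space yields orthographic words.
words = segment(sample, " ")

# Splitting on a vowel yields chunks that can span word boundaries,
# so they may contain spaces.
e_chunks = segment(sample, "e")

for rank, chunk, n in rank_frequency(words):
    print(rank, repr(chunk), n)
```

The rank and count pairs produced this way are the quantities that a Zipf-style rank-frequency plot compares, whichever delimiter produced the chunks.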

JoFrhwld/zipf_by_vowels

Folders and files

Latest commit

History

Repository files navigation

Zipfs Law and Segmenting by Vowels

About

Resources

Stars

Watchers

Forks

Languages