JoFrhwld/zipf_by_vowels

Zipf's Law and Segmenting by Vowels

The Python script seg_by_vowel.py segments the Brown Corpus into chunks using several different delimiters:

  • space
  • a, e, i, o and u

The space delimiter chunks the Brown Corpus into pieces equivalent to orthographic words. Each vowel delimiter chunks the corpus into non-word sequences that can span word boundaries, so they may include whitespace characters.
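
To illustrate the idea, here is a minimal sketch of delimiter-based segmentation followed by a rank-frequency count. It is not the code in seg_by_vowel.py: the function names `segment` and `rank_frequency`, the toy `sample` sentence standing in for the Brown Corpus, and the choice to drop empty chunks are all assumptions made for this example.

```python
from collections import Counter


def segment(text, delimiter):
    """Split text on one delimiter character, dropping empty chunks.

    Dropping empty chunks is an assumption of this sketch.
    """
    return [chunk for chunk in text.split(delimiter) if chunk]


def rank_frequency(chunks):
    """Return (rank, chunk, count) tuples, most frequent chunk first."""
    counts = Counter(chunks)
    return [
        (rank, chunk, n)
        for rank, (chunk, n) in enumerate(counts.most_common(), start=1)
    ]


# Toy text standing in for the Brown Corpus (hypothetical input).
sample = "the cat sat on the mat and the dog sat on the log"

for delim in [" ", "a", "e", "i", "o", "u"]:
    table = rank_frequency(segment(sample, delim))
    # Show the three most frequent chunks for each delimiter.
    print(repr(delim), table[:3])
```

With the space delimiter the chunks are words such as `the` and `cat`; with a vowel delimiter such as `a` the chunks cross word boundaries, for example `the c` and `t s`. Plotting count against rank on log-log axes for each delimiter is one way to compare the resulting distributions against Zipf's Law.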
