chriskempson/japanese-subtitles-word-kanji-frequency-lists

Japanese Subtitles Word & Kanji Frequency Lists

Word and kanji frequency lists derived from subtitles of Japanese dramas, anime and films.

The data set comprises 12,277 subtitle files taken from https://github.com/Matchoo95/JP-Subtitles. The frequency lists were generated with JParser and cb's Japanese Text Analysis Tool.

Format of Word Frequency Report:

  • Field 1: Number of times word was encountered
  • Field 2: Word
  • Field 3: Frequency Group
  • Field 4: Frequency Rank
  • Field 5: Percentage (Field 1 / Total number of words)
  • Field 6: Cumulative percentage
  • Field 7: Part-of-speech
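
As a rough illustration of the field layout above, the sketch below parses one line of the word report into a typed record. It assumes the fields are tab-separated and that the percentages are stored as plain numbers without a `%` sign; this README does not state either, so adjust `sep` and the conversions to match the actual files. The `WordEntry` class, the `parse_word_line` function and the sample line are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class WordEntry:
    count: int                    # Field 1: number of times word was encountered
    word: str                     # Field 2: word
    frequency_group: str          # Field 3: kept as text, its meaning is not documented here
    frequency_rank: int           # Field 4: assumed to be an integer
    percentage: float             # Field 5: Field 1 / total number of words
    cumulative_percentage: float  # Field 6
    part_of_speech: str           # Field 7


def parse_word_line(line: str, sep: str = "\t") -> WordEntry:
    """Parse one report line. The tab separator is an assumption."""
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    count, word, group, rank, pct, cum_pct, pos = fields
    return WordEntry(int(count), word, group, int(rank),
                     float(pct), float(cum_pct), pos)


# Made-up sample line, not taken from the real data set.
sample = "100\tの\t1\t1\t5.0\t5.0\t助詞"
entry = parse_word_line(sample)
print(entry.word, entry.count, entry.part_of_speech)
```

The kanji report could be read the same way by dropping the seventh field.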

Format of Kanji Frequency Report:

  • Field 1: Number of times kanji was encountered
  • Field 2: Kanji
  • Field 3: Frequency Group
  • Field 4: Frequency Rank
  • Field 5: Percentage (Field 1 / Total number of kanji)
  • Field 6: Cumulative percentage
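
To show how Fields 4 to 6 relate to the raw counts, here is a minimal sketch that derives rank, percentage and cumulative percentage from a table of kanji counts. It is not the code used by JParser or cb's Japanese Text Analysis Tool. It assumes that rank 1 is the most frequent kanji, that the percentage is scaled to 0-100, and that the cumulative percentage is the running total in rank order. The frequency group is omitted because this README does not define it. The `kanji_report` function and the sample counts are invented for illustration.

```python
def kanji_report(counts: dict[str, int]) -> list[tuple[int, str, int, float, float]]:
    """Return rows of (count, kanji, rank, percentage, cumulative percentage)."""
    total = sum(counts.values())
    # Sort by descending count so that rank 1 is the most frequent kanji (assumption).
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for rank, (kanji, n) in enumerate(ordered, start=1):
        running += n
        percentage = 100.0 * n / total        # Field 5: Field 1 / total number of kanji
        cumulative = 100.0 * running / total  # Field 6: running share up to this rank
        rows.append((n, kanji, rank, percentage, cumulative))
    return rows


# Invented counts, not taken from the real data set.
for row in kanji_report({"日": 5, "人": 3, "本": 2}):
    print(row)
```

Read this way, the cumulative percentage tells you how much of the kanji in the subtitle corpus is covered once every kanji up to a given rank is known.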
