
HSK Character Profiler is a Python tool that analyzes Chinese character proficiency and text readability based on HSK lists, with customizable settings. Developed as part of a Master's thesis in Computational Linguistics.


Ancastal/HSK-Character-Profiler


HSK Character Profiler

24/12/2023 Edit: This repo will soon be merged with a more up-to-date repository.

The HSK Character Profiler is a Python command-line tool developed as part of a Master's thesis in Computational Linguistics titled "Evaluating the Effectiveness of Machine Translation for Literary Works: A Comparative Study of English and Chinese Corpora."

The tool analyzes a text file containing Chinese characters against the proficiency levels of the HSK (Hanyu Shuiping Kaoshi) system. It identifies the HSK level of each character in the file and generates a report with the number of characters found at each HSK level and the average HSK level of the text.
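The report described above can be illustrated with a minimal sketch. This is not the tool's actual code: the function name `profile_text`, the `HSK_LEVELS` mapping, and the handful of characters in it are placeholders for illustration, not the official lists. The sketch also assumes that characters missing from every list are counted separately and left out of the average.

```python
from collections import Counter

# Placeholder data: a few characters per level, for illustration only.
# Real HSK lists contain hundreds of characters per level.
HSK_LEVELS = {
    1: set("我你好是的"),
    2: set("因为所以"),
    3: set("环境"),
}


def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def profile_text(text, levels=HSK_LEVELS):
    """Return (count per level, unlisted character count, average level)."""
    # Visit levels from highest to lowest so that a character listed at
    # several levels ends up mapped to its lowest one.
    char_to_level = {}
    for level in sorted(levels, reverse=True):
        for ch in levels[level]:
            char_to_level[ch] = level

    counts = Counter()
    unlisted = 0
    for ch in text:
        if not is_cjk(ch):
            continue  # ignore punctuation, Latin letters, digits, whitespace
        level = char_to_level.get(ch)
        if level is None:
            unlisted += 1
        else:
            counts[level] += 1

    total = sum(counts.values())
    # Assumption: the average covers only characters found in some list.
    average = sum(lvl * n for lvl, n in counts.items()) / total if total else None
    return dict(counts), unlisted, average
```

Calling `profile_text` on a string returns the per-level counts, the number of Chinese characters not found in any list, and the mean level of the listed characters (or `None` when no listed character occurs).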

The HSK Character Profiler is customizable: users can change both the input file and the HSK character sets used for analysis. It uses NLTK and Jieba, two widely used NLP libraries, for character segmentation and analysis.
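One way to supply custom HSK character sets is to read them from plain-text files. The sketch below assumes a hypothetical layout that the source does not specify: one UTF-8 file per level, named `hsk1.txt`, `hsk2.txt`, and so on, each containing the characters for that level in any arrangement. The function name `load_hsk_sets` is also an assumption.

```python
import re
from pathlib import Path


def load_hsk_sets(directory):
    """Read every hsk<N>.txt file in `directory` into {N: set of characters}."""
    levels = {}
    for path in sorted(Path(directory).glob("hsk*.txt")):
        match = re.fullmatch(r"hsk(\d+)", path.stem)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        text = path.read_text(encoding="utf-8")
        # Keep only CJK ideographs so that whitespace and separators are ignored.
        levels[int(match.group(1))] = {
            ch for ch in text if "\u4e00" <= ch <= "\u9fff"
        }
    return levels
```

The returned dictionary maps each level number to a set of characters, so swapping in a different word list only requires pointing the loader at another directory.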

This tool can be particularly useful for Chinese language learners, teachers, and researchers who need to assess the difficulty level of a text or determine the appropriate HSK level for a specific vocabulary list or learning material.
