A Chinese Webpage Title Text Categorization Tool (short-text classification)

Serbipunk/webpage_categorization

Dependencies:

  • gcc >= 4.9
  • All other libraries are bundled with the project: jieba (Chinese text segmentation), libSVM, and SQLite.

Hints:

  • For short-text categorization, inputs of 20 words are recommended for best performance.

  • This is just a practice project that I completed as a postgraduate student, so please forgive the messy code.

  • The output ID maps to a category as follows:

      1. economy (economy and finance)
      2. education
      3. entertainment (entertainment and gossip)
      4. sports
      5. IT (technology)
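
The ID-to-category relation above can be turned into a small lookup when consuming the tool's output. The following is only a sketch: `category_name` is a hypothetical helper that is not part of this project, and it assumes the classifier reports the category as an integer from 1 to 5 in the order listed.

```cpp
#include <string>

// Hypothetical helper (not part of this project): maps an output id
// in the range 1-5 to the category name from the list above.
// Any other id yields "unknown".
std::string category_name(int id) {
    static const char* const kNames[] = {
        "economy", "education", "entertainment", "sports", "IT"
    };
    const int count = static_cast<int>(sizeof(kNames) / sizeof(kNames[0]));
    if (id < 1 || id > count) {
        return "unknown";
    }
    // Ids are 1-based, array indices are 0-based.
    return kNames[id - 1];
}
```

For example, `category_name(4)` returns `"sports"`, and an out-of-range id such as `0` returns `"unknown"` instead of reading past the array.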
    
