nameseer ('name-seer') is a python library for Thai name classification. It can determine whether a given name is a Thai person name or a Thai corporate name. nameseer uses linguistics features within the name to determine whether the name is a Thai person name or not. nameseer comes with a pre-trained model that was trained on 700,000 company names and 700,000 person names, with 99.4% accuracy when tested on 400,000 unseen names.
nameseer is based on the name-ethnicity classifier, orginally proposed here:
Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.
Paper URL : https://ojs.aaai.org/index.php/AAAI/article/download/8324/8183
- abydos
- scikit-learn
- nltk
- pythainlp
- python >= 3.7.6
nameseer
can be installed using pip
pip install nameseer
Once installed, you can use nameseer
within your python code to classify whether a Thai name is a person name or a corporate name.
>>> from nameseer import NameClassifier
>>> nc = NameClassifier.load_pretrained_model()
>>> nc.classify_names(['ประยุทธ์ จันทร์โอชา','แอดวานซ์อินโฟร์ เซอร์วิส'])
['person', 'company']
Treeratpituk, Pucktada (2022). Nameseer: a Thai person name classifier. Mar 29, 2022. See https://github.com/botx/nameseer
Pucktada Treeratpituk, Bank of Thailand (pucktadt@bot.or.th)
This project is licensed under the Apache Software License 2.0 - see the LICENSE file for details