A simple API to classify race and gender by name.
This tool attempts to infer a person's race and gender from their name using a classification algorithm. Attempting to identify a person's race from their name may seem ethically questionable, but it is important for analysing racial transformation over time. The classifier was trained on data from South Africa, so it may not transfer well to other countries. The race classes are borrowed from the official categories used in South Africa for demographic classification.
The classifier tends to work well with African names. Chinese names have good precision but very poor recall: when the classifier predicts Chinese it is usually right, but it misses most Chinese names. Results for Indian names are fair. Coloured names are often confused with both White and Indian names, resulting in poor performance for that class.
5-fold cross-validation results are shown below:
Train size | Test size | Race | Precision | Recall | F1 |
---|---|---|---|---|---|
3000 | 1500 | White | 0.768 | 0.7459 | 0.7567 |
3000 | 1500 | Indian | 0.8885 | 0.8234 | 0.8547 |
3000 | 1500 | African | 0.9136 | 0.9839 | 0.9474 |
3000 | 1500 | Chinese | 0.9629 | 0.3030 | 0.4609 |
3000 | 1500 | Coloured | 0.6438 | 0.7280 | 0.6833 |
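
The F1 column can be sanity-checked against the precision and recall columns, since F1 is conventionally the harmonic mean of the two. The sketch below is not part of this tool: the `f1_score` helper and the `REPORTED` dictionary are hypothetical names used only for illustration, and the numbers are copied from the table above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per class, copied from the results table.
REPORTED = {
    "White": (0.768, 0.7459, 0.7567),
    "Indian": (0.8885, 0.8234, 0.8547),
    "African": (0.9136, 0.9839, 0.9474),
    "Chinese": (0.9629, 0.3030, 0.4609),
    "Coloured": (0.6438, 0.7280, 0.6833),
}

# Print the recomputed F1 next to the reported value for each class.
for race, (precision, recall, reported) in REPORTED.items():
    computed = f1_score(precision, recall)
    print(f"{race:<9} computed F1 = {computed:.4f}   reported F1 = {reported:.4f}")
```

The recomputed values agree with the reported ones to within 0.001. The Chinese row illustrates why precision alone is misleading: a precision of 0.9629 combined with a recall of 0.3030 gives an F1 below 0.5.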