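In NLP, identifying gender from a name is an interesting task.
Here we use a heuristic: the last few characters of a name are indicative of gender.
For example, a name ending in 'la' is very likely a female name, such as 'Angela' or 'Layla'.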
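
As a minimal sketch of this heuristic, the snippet below extracts the trailing characters of a few names. The helper name `last_letters` is hypothetical and used only for illustration; it needs nothing beyond the standard library.

```python
# Hypothetical helper: return the last n characters of a name, lowercased.
def last_letters(name, n=2):
    # Slicing with a negative index is safe even when the name is shorter than n:
    # the whole name is returned in that case.
    return name[-n:].lower()

for name in ['Angela', 'Layla', 'Leonardo']:
    print(name, '->', last_letters(name))
```

Names that share a suffix, such as 'Angela' and 'Layla', map to the same feature value, which is what lets a classifier generalize from one name to another.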

import random
from nltk.corpus import names
from nltk import NaiveBayesClassifier
from nltk.classify import accuracy as nltk_accuracy

# Feature extractor: use the last `num_letters` characters of the name, lowercased
def gender_features(word, num_letters=2):
    return {'feature': word[-num_letters:].lower()}

# Build the labeled name list from the NLTK names corpus
labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
                 [(name, 'female') for name in names.words('female.txt')])
# Seed the random number generator, then shuffle the data
random.seed(7)
random.shuffle(labeled_names)
# Names to classify
input_names = ['Leonardo', 'Amy', 'Sam']

# The best number of trailing characters is not known in advance, so try 1 to 4;
# each iteration uses that many trailing characters as the feature
for i in range(1, 5):
    print('\nNumber of letters:', i)
    featuresets = [(gender_features(n, i), gender) for (n, gender) in labeled_names]
    # Hold out the first 500 examples for testing and train on the rest
    train_set, test_set = featuresets[500:], featuresets[:500]
    # Train a Naive Bayes classifier
    classifier = NaiveBayesClassifier.train(train_set)
    # Evaluate the classifier for this value of the parameter
    print('Accuracy ==>', str(100 * round(nltk_accuracy(classifier, test_set), 2)) + '%')
    # Predict the label for each new input name
    for name in input_names:
        print(name, '==>', classifier.classify(gender_features(name, i)))


Number of letters: 1
Accuracy ==> 76.0%
Leonardo ==> male
Amy ==> female
Sam ==> male

Number of letters: 2
Accuracy ==> 79.0%
Leonardo ==> male
Amy ==> female
Sam ==> male

Number of letters: 3
Accuracy ==> 77.0%
Leonardo ==> male
Amy ==> female
Sam ==> female

Number of letters: 4
Accuracy ==> 71.0%
Leonardo ==> male
Amy ==> female
Sam ==> female
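
To make the role of the suffix feature more concrete, here is a rough, self-contained sketch of a single-feature Naive Bayes classifier built from plain counts. It is an illustration only: the six training names are a made-up toy list, the function names are hypothetical, and the add-one smoothing is a simplifying assumption rather than the estimator NLTK necessarily uses internally.

```python
import math
from collections import Counter, defaultdict

def suffix(name, n=2):
    # Same idea as gender_features: the last n characters, lowercased
    return name[-n:].lower()

def train(labeled, n=2):
    # Count how often each label occurs and how often each suffix occurs per label
    label_counts = Counter()
    suffix_counts = defaultdict(Counter)
    for name, label in labeled:
        label_counts[label] += 1
        suffix_counts[label][suffix(name, n)] += 1
    return label_counts, suffix_counts

def classify(name, model, n=2, alpha=1.0):
    label_counts, suffix_counts = model
    total = sum(label_counts.values())
    # All suffixes seen in training, plus one slot for unseen suffixes
    vocab_size = len({s for c in suffix_counts.values() for s in c}) + 1
    s = suffix(name, n)
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        prior = count / total
        # Add-alpha smoothing (an assumption of this sketch)
        likelihood = (suffix_counts[label].get(s, 0) + alpha) / (count + alpha * vocab_size)
        score = math.log(prior) + math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration only
toy_names = [('Angela', 'female'), ('Layla', 'female'), ('Stella', 'female'),
             ('Leonardo', 'male'), ('Marco', 'male'), ('Pedro', 'male')]
model = train(toy_names)
print('Camila ==>', classify('Camila', model))
print('Ricardo ==>', classify('Ricardo', model))
```

With only one feature, the prediction comes down to which label has seen the name's suffix more often, weighted by how common each label is. That is also why longer suffixes can hurt accuracy: they become too specific and are seen too rarely in the training data.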
