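Sentiment analysis is the process of determining whether a given piece of text is positive or negative.
In some scenarios, "neutral" is added as a third option.
Sentiment analysis is used to gauge user sentiment in many settings, such as marketing campaigns, social media, and e-commerce customer feedback.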
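
One simple way to add a "neutral" option on top of a binary positive/negative classifier is to treat low-confidence predictions as neutral. The sketch below only illustrates that idea and is not a method prescribed by any library: `label_sentiment` and its `neutral_band` parameter are hypothetical names, and the default band width of 0.1 is an arbitrary assumption.

```python
def label_sentiment(prob_positive, neutral_band=0.1):
    """Map P(positive) to one of three labels (hypothetical helper).

    Probabilities within `neutral_band` of 0.5 are treated as neutral;
    the band width is an assumed, tunable value.
    """
    if not 0.0 <= prob_positive <= 1.0:
        raise ValueError("prob_positive must be between 0 and 1")
    if abs(prob_positive - 0.5) <= neutral_band:
        return "Neutral"
    return "Positive" if prob_positive > 0.5 else "Negative"


for p in (0.9, 0.55, 0.2):
    print(p, label_sentiment(p))
```

A wider band labels more texts as neutral; a band of 0 reduces this to the plain two-class decision.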

In [1]:
# Requires the movie_reviews corpus; if it is missing, run nltk.download('movie_reviews') once
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

# Define a function that extracts features: each word present maps to True
def extract_features(word_list):
    return {word: True for word in word_list}

# Load the file IDs of the positive and negative reviews
positive_fileids = movie_reviews.fileids('pos')
negative_fileids = movie_reviews.fileids('neg')
# Build a (features, label) pair for every positive and negative review
features_positive = [(extract_features(movie_reviews.words(fileids=[f])), 'Positive')
                     for f in positive_fileids]
features_negative = [(extract_features(movie_reviews.words(fileids=[f])), 'Negative')
                     for f in negative_fileids]
# Split the data into training (80%) and test (20%) sets
threshold_factor = 0.8
threshold_positive = int(threshold_factor * len(features_positive))
threshold_negative = int(threshold_factor * len(features_negative))
# Assemble the training and test sets from both classes
features_train = features_positive[:threshold_positive] + features_negative[:threshold_negative]
features_test = features_positive[threshold_positive:] + features_negative[threshold_negative:]
print("\nNumber of training datapoints:", len(features_train))
print("Number of test datapoints:", len(features_test))
# Train a Naive Bayes classifier
classifier = NaiveBayesClassifier.train(features_train)
print("\nAccuracy of the classifier:", nltk.classify.util.accuracy(classifier, features_test))
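# The classifier object records the most informative words found during training.
# These words show which terms most strongly signal a positive or a negative review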
print("\nTop 10 most informative words:")
for item in classifier.most_informative_features()[:10]:
    print(item[0])
    
# Some sample input sentences
input_reviews = [
        "It is an amazing movie", 
        "This is a dull movie. I would never recommend it to anyone.",
        "The cinematography is pretty great in this movie", 
        "The direction was terrible and the story was all over the place" 
    ]
# Run the classifier on these sentences
print("\nPredictions:")
for review in input_reviews:
    print("\nReview:", review)
    probdist = classifier.prob_classify(extract_features(review.split()))
    pred_sentiment = probdist.max()
    print("Predicted sentiment:", pred_sentiment)
    print("Probability:", round(probdist.prob(pred_sentiment), 2))


Number of training datapoints: 1600
Number of test datapoints: 400

Accuracy of the classifier: 0.735

Top 10 most informative words:
outstanding
insulting
vulnerable
ludicrous
uninvolving
avoids
astounding
fascination
symbol
seagal

Predictions:

Review: It is an amazing movie
Predicted sentiment: Positive
Probability: 0.61

Review: This is a dull movie. I would never recommend it to anyone.
Predicted sentiment: Negative
Probability: 0.77

Review: The cinematography is pretty great in this movie
Predicted sentiment: Positive
Probability: 0.67

Review: The direction was terrible and the story was all over the place
Predicted sentiment: Negative
Probability: 0.63
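
One detail in the prediction loop is worth a closer look: `review.split()` keeps capitalization and leaves punctuation attached (for example `movie.`), while the `movie_reviews` tokens appear to be lowercase with punctuation split off. Tokens that do not match the training vocabulary contribute nothing useful to the prediction. The following is a minimal sketch of one way to normalize the input, assuming that corpus convention holds; `tokenize_like_corpus` is a hypothetical helper, not part of NLTK, and its regular expression mirrors the word/punctuation split used by NLTK's `WordPunctTokenizer`.

```python
import re


def tokenize_like_corpus(text):
    """Hypothetical helper: lowercase the text and split punctuation from words."""
    # \w+ matches runs of word characters; [^\w\s]+ matches runs of punctuation
    return re.findall(r"\w+|[^\w\s]+", text.lower())


def extract_features(word_list):
    # Same presence-based features as in the cell above
    return {word: True for word in word_list}


review = "This is a dull movie. I would never recommend it to anyone."
naive_tokens = review.split()
normalized_tokens = tokenize_like_corpus(review)

print(naive_tokens)        # contains 'This' and 'movie.'
print(normalized_tokens)   # contains 'this', 'movie' and a separate '.'
```

To try it with the trained classifier, pass `extract_features(tokenize_like_corpus(review))` to `classifier.prob_classify`; the resulting probabilities may differ from the output shown above.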
