In [1]:
# Imports the Google Cloud client library
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

# Instantiates a client
client = language.LanguageServiceClient()

## Sentiment analysis on the posts

We read the posts file and use the Google Cloud Natural Language API to get a sentiment score for each document (an initial post or one of its comments) and for each entity (word or phrase) detected within each document.

In [2]:
import pandas as pd

df = pd.read_json('data/1000_posts.json')

entities_result = []     # one list of entities per analyzed text
annotations_result = []  # one {"score", "magnitude"} dict per analyzed text

# Only the first 100 posts are analyzed.
for index, row in df.iloc[0:100, :].iterrows():
    # Each post contributes its initial text plus all of its comments.
    texts = [row['init_post']] + row['comments']
    for text in texts:
        document = types.Document(
            content=text.lower(),
            type=enums.Document.Type.PLAIN_TEXT,
            language='en')
        # Entity-level sentiment
        entities = client.analyze_entity_sentiment(document).entities
        entities_result.append(entities)

        # Document-level sentiment
        annotations = client.analyze_sentiment(document=document)
        annotations_result.append({
            "score": annotations.document_sentiment.score,
            "magnitude": annotations.document_sentiment.magnitude})

We aggregate the results per entity, collecting every score and magnitude that each word or phrase received across the entire corpus.

In [3]:
processed_docword_sentiment = {}
for entities in entities_result:
    for entity in entities:
        # Collect every score and magnitude observed for this entity name.
        entry = processed_docword_sentiment.setdefault(
            entity.name, {'score': [], 'magnitude': []})
        entry['score'].append(entity.sentiment.score)
        entry['magnitude'].append(entity.sentiment.magnitude)

In [4]:
import json

# ensure_ascii=False writes non-ASCII characters as-is, so the encoding is set explicitly.
with open('data/document_entity_sentiment.json', 'w', encoding='utf-8') as outfile:
    json.dump(processed_docword_sentiment, outfile, indent=4, ensure_ascii=False)

with open('data/document_sentiment.json', 'w', encoding='utf-8') as outfile:
    json.dump(annotations_result, outfile, indent=4, ensure_ascii=False)

In [5]:
word_data = processed_docword_sentiment
doc_data = annotations_result

## Sentiment analysis results

We start by computing the average sentiment of the documents, weighting each document's score by its magnitude.

In [8]:
# Weight each document's score by its magnitude, then average over all documents.
score_sum = sum(item['score'] * item['magnitude'] for item in doc_data)

print("Average sentiment score over the posts is {:.6f}".format(score_sum / len(doc_data)))

Average sentiment score over the posts is -0.078245


We can see that the average sentiment score is slightly below 0, so the posts lean slightly negative overall. However, the value is still very close to 0, which suggests mixed feelings among the posts. This is consistent with the fact that, although people do complain about the game, they are also trying to improve it by providing feedback rather than writing only to express dissatisfaction.
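To make the weighting concrete, here is a minimal sketch of the same computation (the mean of score * magnitude) on made-up values. The `toy_docs` list and the `weighted_mean_sentiment` helper are illustrative assumptions, not part of the notebook's data or code.

```python
# Toy document-level results (made-up values, not taken from the corpus).
toy_docs = [
    {"score": 0.5, "magnitude": 2.0},   # positive, strongly expressed
    {"score": -0.5, "magnitude": 4.0},  # negative, strongly expressed
    {"score": 0.0, "magnitude": 0.0},   # neutral
]


def weighted_mean_sentiment(docs):
    """Mean of score * magnitude over a list of {'score', 'magnitude'} dicts."""
    if not docs:
        raise ValueError("docs must not be empty")
    return sum(d["score"] * d["magnitude"] for d in docs) / len(docs)


toy_average = weighted_mean_sentiment(toy_docs)
print("{:.3f}".format(toy_average))
```

A single strongly worded negative document can outweigh a positive one with the same absolute score, and that effect is the purpose of multiplying by magnitude.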

In [10]:
import numpy as np

aggregated_score = []
score_dict = {}
for item, values in word_data.items():
    # Mean of score * magnitude over all mentions of this entity
    weighted = np.array(values['score']) * np.array(values['magnitude'])
    mean_score = sum(weighted) / len(weighted)
    aggregated_score.append((item, mean_score))
    score_dict[item] = mean_score

In [20]:
score_dict['classic']

0.08693181591277778

In [12]:
print("The top-10 most positive words are:")
print(sorted(aggregated_score, key=lambda tup: -tup[1])[0:10])
print("The top-10 most negative words are:")
print(sorted(aggregated_score, key=lambda tup: tup[1])[0:10])

The top-10 most positive words are:
[('gameplay type', 2.279999957680701), ('talent streamlining', 1.7099999332427984), ('motivator', 1.7099999332427984), ('forgiveness', 1.7099999332427984), ("za'qul", 1.7099999332427984), ('awesomeness', 1.6649999237060555), ('trolololol', 1.6199999141693127), ('though.', 1.6199999141693127), ('bagsnon boe', 1.6199999141693127), ('summation', 1.6199999141693127)]
The top-10 most negative words are:
[('johnny', -2.4299999785423267), ('class balance philosophy', -2.079999954700469), ('attack frenzy', -1.920000104904176), ('self torture', -1.7099999332427984), ('goblins lack', -1.7099999332427984), ('garbage content', -1.7099999332427984), ('reputation farming', -1.7099999332427984), ('pity bonus', -1.6199999141693127), ('lolmode', -1.6199999141693127), ('feed', -1.6199999141693127)]


The top-10 words are not very informative. The bottom-10 are more interesting, since they include two features of the game that have been regularly debated: 'class balance philosophy', which is about balancing the power of each class in the game, and 'reputation farming', which is currently a very time-consuming part of the game.
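Several entries in both lists share exactly the same value (for example 1.7099999332427984), which suggests that they may each come from a single mention. That is an assumption and was not checked in this notebook. Under that assumption, one possible refinement is to rank only the entities that are mentioned at least a few times. The sketch below illustrates the idea on made-up data shaped like `word_data`; the `rank_entities` helper and its `min_mentions` parameter are hypothetical.

```python
# Hypothetical per-entity data in the same shape as word_data (made-up values).
toy_word_data = {
    "rare praise": {"score": [0.9], "magnitude": [1.9]},
    "raid": {"score": [0.5, -0.5, 0.5], "magnitude": [1.0, 1.0, 1.0]},
    "grind": {"score": [-0.5, -0.5], "magnitude": [2.0, 1.0]},
}


def rank_entities(data, min_mentions=2):
    """Return (name, mean score * magnitude, mentions) tuples, highest mean first.

    Entities with fewer than min_mentions mentions are skipped.
    """
    ranked = []
    for name, values in data.items():
        mentions = len(values["score"])
        if mentions < min_mentions:
            continue
        total = sum(s * m for s, m in zip(values["score"], values["magnitude"]))
        ranked.append((name, total / mentions, mentions))
    return sorted(ranked, key=lambda t: t[1], reverse=True)


toy_ranking = rank_entities(toy_word_data, min_mentions=2)
for name, mean_score, mentions in toy_ranking:
    print("{}: {:.3f} ({} mentions)".format(name, mean_score, mentions))
```

The threshold is a trade-off: a higher `min_mentions` gives more stable averages but hides rarely mentioned topics.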

In [36]:
names = ['blizzard','blizz']

print("The scores for blizzard are:")
for item in names:
    print("Score for {} is: {:.3f}".format(item,score_dict[item]))
    
print('\n')  
classic = ['classic']

print("The score for wow classic is:")
print("Score for {} is: {:.3f}".format('classic',score_dict['classic']))
print('\n')

expansions = ['bfa','legion','wod','cataclysm','mop','wotlk','tbc','vanilla']

print("The score for the expansions are:")
for item in expansions:
    print("Score for {} is: {:.3f}".format(item,score_dict[item]))
    
print('\n')
game_features = ['leveling','dungeon','raid','pvp','pve',
        'azerite wq', 'azerite']
print("The score for the game features are:")
for item in game_features:
    print("Score for {} is: {:.3f}".format(item,score_dict[item]))

The scores for blizzard are:
Score for blizzard is: -0.116
Score for blizz is: -0.101


The score for wow classic is:
Score for classic is: 0.087


The score for the expansions are:
Score for bfa is: -0.022
Score for legion is: -0.059
Score for wod is: 0.022
Score for cataclysm is: 0.029
Score for mop is: 0.010
Score for wotlk is: 0.052
Score for tbc is: -0.017
Score for vanilla is: -0.018


The score for the game features are:
Score for leveling is: 0.035
Score for dungeon is: -0.004
Score for raid is: -0.016
Score for pvp is: -0.015
Score for pve is: 0.035
Score for azerite wq is: -0.180
Score for azerite is: -0.230


**Blizzard**: Blizzard seems somewhat unpopular in these posts: 'blizzard' and 'blizz' both have a negative sentiment score (-0.116 and -0.101), lower than that of any expansion.

**Expansions**: The different WoW expansions have quite different sentiment scores. WoW Classic has the highest score (0.087), which seems consistent with the current success of the game. In contrast, the current expansion ('bfa') has a negative score (-0.022). There seems to be a pattern where the intermediate expansions ('wotlk', 'cataclysm', 'mop', 'wod') have a positive sentiment score, while the oldest ('vanilla', 'tbc') and the most recent ('legion', 'bfa') have a negative one.

**Game features**: 'pve' (which stands for player vs environment) and 'leveling' have the highest scores among the features examined (0.035 each), which is consistent with the fact that players have judged these elements quite positively. In contrast, 'azerite' (-0.230) and 'azerite wq' (-0.180) have the lowest scores: they are very controversial aspects of the game (azerite is some sort of currency for progress within the game).