# Darks Souls II Reviews (2025)

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import altair as alt
import re

## Steam Reviews as of 3/30/24:

In [2]:
df = pd.read_csv('reviews.csv')
reviews = df.copy()
reviews = reviews.set_index('recommendationid')
reviews.drop(columns={'Unnamed: 0', 'in_early_access'}, inplace=True)

Converting date of review from unix:

In [None]:
reviews['month_name'] = pd.to_datetime(reviews.update_date, unit='s').dt.month_name()
reviews['month']      = pd.to_datetime(reviews.update_date, unit='s').dt.month
reviews['year']       = pd.to_datetime(reviews.update_date, unit='s').dt.year
reviews['day']        = pd.to_datetime(reviews.update_date, unit='s').dt.day

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  reviews['month_name'] = pd.to_datetime(reviews.update_date, unit='s').dt.month_name()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  reviews['month'] = pd.to_datetime(reviews.update_date, unit='s').dt.month
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  reviews['year'] = pd.to_datetime(reviews.upda

Focusing on just the English reviews:

In [4]:
reviews = reviews[reviews.language == 'english']

## Cleaning up the reviews

In [None]:
reviews['review'] = reviews.review.str.lower()

In [None]:
from nltk.corpus import stopwords

# Removing urls:
r = [re.sub(r'http\S+', '', review).lower().strip() if pd.notna(review) else review for review in reviews.review]

# Removing esc sequences, punctuation, and numbers:
    # There's some ASCII art in some of the reviews
r = [re.sub(r'[^A-Za-z]', ' ', review).strip() if pd.notna(review) else review for review in r]

# Removing multiple and trailing whitespaces:
r = [re.sub(r' +', ' ', review).strip() if pd.notna(review) else review for review in r]

reviews['review'] = r

In [7]:
reviews

Unnamed: 0_level_0,review,language,init_date,update_date,voted_up,month,year
recommendationid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
190511148,dont give up skeleton,english,1742267451,1742267451,True,March,2025
190504311,boia,english,1742259081,1742259081,True,March,2025
190502415,i love this game to pieces it s the worst soul...,english,1742256864,1742256864,True,March,2025
190501465,i probably shouldn t recommend it because of h...,english,1742255757,1742255757,True,March,2025
190500200,peak souls,english,1742254339,1742254339,True,March,2025
...,...,...,...,...,...,...,...
15162268,try tongue but hole,english,1427932431,1428081346,True,April,2015
15162220,so far so good played it for mins so far with ...,english,1427932153,1427932153,True,April,2015
15162161,still haven t died bonedrinker rufus keep the ...,english,1427931845,1427931845,True,April,2015
15162057,needs more cow bell,english,1427931196,1427931196,True,April,2015


## Sentiment Analysis:
- Seeing why people were positive or negative about the game
    - Comments on story, gameplay, etc

For sake of analysis specifically on the actual reviews, drop any rows that have no reviews:

In [8]:
reviews = reviews.dropna(subset=['review'])
reviews.shape

(45563, 7)

Top 10 Most Common Words in the Reviews:

In [9]:
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

In [10]:
tfidf = TfidfVectorizer(sublinear_tf=True,
                        analyzer='word',
                        max_features=4000,
                        tokenizer=word_tokenize,
                        stop_words=stopwords.words("english"))

In [11]:
review_txt = reviews.review.values.flatten()
tfidf_array = tfidf.fit_transform(review_txt).toarray()
tfidf_df = pd.DataFrame(tfidf_array)
tfidf_df.columns = tfidf.get_feature_names_out()



- Most common word among the reviews isn't very informative - including some of the other popular words
    - Looking at subsets of the reviews could be useful

### Topic Modeling:
- Exploring certain aspects on why people like the game
    - Also get critiques of the game in positive reviews (if any but there sure is considering DS2's reputation in the community)

- Exploring why people don't like the game:
    - Also get positive aspects within this subset of the reviews
    
- Algorithms I can use to perform topic modeling:
    1. Latent Dirichlet Allocation (LDA) 
    2. Non-negative Matrix Factorization (NMF)

Splitting the reviews by how many do and don't recommend buying the game:

In [12]:
pos_reviews = reviews[reviews['voted_up'] == True]
neg_reviews = reviews[reviews['voted_up'] == False]

In [13]:
pos_reviews.shape, neg_reviews.shape

((37387, 7), (8176, 7))

Function to display the output of the models:

In [14]:
def display_topics(model, feature_names, no_top_words):
    topic_dict = {}
    for topic_idx, topic in enumerate(model.components_):
        topic_dict["Topic %d words" % (topic_idx + 1)]= ['{}'.format(feature_names[i])
                        for i in topic.argsort()[:-no_top_words - 1:-1]]
        topic_dict["Topic %d weights" % (topic_idx + 1)]= ['{:.1f}'.format(topic[i])
                        for i in topic.argsort()[:-no_top_words - 1:-1]]
    return pd.DataFrame(topic_dict)

LDA: Probabilistic graphical modeling, and uses CountVectorizer as input

In [15]:
from sklearn.decomposition import LatentDirichletAllocation

In [16]:
count_vector = CountVectorizer()

tf = count_vector.fit_transform(reviews.review)
tf_feat_names = count_vector.get_feature_names_out()

pos_tf = count_vector.fit_transform(pos_reviews.review) 
pos_tf_feat_names = count_vector.get_feature_names_out()

neg_tf = count_vector.fit_transform(neg_reviews.review)
neg_tf_feat_names = count_vector.get_feature_names_out()

In [17]:
lda = LatentDirichletAllocation(n_components=3, random_state=42069)
lda.fit(tf)

In [18]:
no_top_words = 10
display_topics(lda, tf_feat_names, no_top_words)

Unnamed: 0,Topic 1 words,Topic 1 weights,Topic 2 words,Topic 2 weights,Topic 3 words,Topic 3 weights
0,souls,6058.9,the,116528.4,game,19391.9
1,dark,4617.4,and,58230.1,this,13047.2
2,the,2977.7,to,51953.0,it,11285.6
3,best,2909.3,of,50020.8,you,9087.7
4,of,1579.7,it,42364.5,and,8744.9
5,keep,1546.8,game,38244.9,the,8636.9
6,hate,1448.9,you,37730.2,to,8483.3
7,iron,1409.7,is,36991.0,is,7518.9
8,ii,1186.0,in,31495.6,good,5802.5
9,scholar,1011.1,souls,27086.8,my,4755.7


- Interpreted topics that were identified:
    1. People saying how good the game is
    2. Bosses/enemies
    3. People expressing their likes or dislikes of the game

In [19]:
pos_lda = LatentDirichletAllocation(n_components=3, random_state=42069)
pos_lda.fit(pos_tf)

In [20]:
display_topics(pos_lda, pos_tf_feat_names, no_top_words)

Unnamed: 0,Topic 1 words,Topic 1 weights,Topic 2 words,Topic 2 weights,Topic 3 words,Topic 3 weights
0,game,3593.1,the,30013.8,the,62263.0
1,good,1928.2,it,23354.0,and,34203.6
2,this,1393.9,game,22980.7,to,31791.2
3,love,1085.0,souls,17288.7,of,27665.7
4,you,1057.6,this,14294.6,you,24623.5
5,best,1011.0,and,13895.2,is,18550.4
6,die,999.6,is,13270.1,in,18204.5
7,gud,902.4,dark,12090.4,it,16716.3
8,souls,856.1,of,11997.4,game,15781.2
9,died,810.6,but,10458.6,that,14376.1


- Interpreted topics that were identified:
    1. People expressing that they loved the game (expected since I'm looking at the subset of reviews that recommend the game)
    2. (similar to 1st topic)
    3. Bosses/enemies

In [21]:
neg_lda = LatentDirichletAllocation(n_components=3, random_state=42069)
neg_lda.fit(neg_tf)

In [22]:
display_topics(neg_lda, neg_tf_feat_names, no_top_words)

Unnamed: 0,Topic 1 words,Topic 1 weights,Topic 2 words,Topic 2 weights,Topic 3 words,Topic 3 weights
0,the,22002.4,game,4214.4,the,10185.6
1,to,13441.0,the,3650.0,game,5093.0
2,you,11680.2,this,3568.2,and,4993.8
3,and,11648.9,it,2675.9,it,3950.8
4,of,10080.3,to,2360.9,of,3659.4
5,is,7374.6,and,2211.3,to,3415.1
6,in,7020.6,is,2033.2,this,3194.7
7,it,6956.2,you,1823.6,is,3077.1
8,game,6776.6,souls,1713.2,souls,2797.3
9,that,6142.6,dark,1352.7,in,1813.9


- Interpreted topics that were identified:
    1. Bosses/enemies
    2. Controls/PC port of the game
    3. Players' comments on that it's the worst Dark Souls game they've played

NMF: Linear algebra and uses the TF-IDF vectorizer as input

In [23]:
from sklearn.decomposition import NMF

In [24]:
nmf = NMF(n_components=3, random_state=42069)
nmf.fit(tfidf_array)

In [25]:
display_topics(nmf, tfidf.get_feature_names_out(), no_top_words)

Unnamed: 0,Topic 1 words,Topic 1 weights,Topic 2 words,Topic 2 weights,Topic 3 words,Topic 3 weights
0,souls,4.6,good,5.3,game,3.4
1,dark,3.7,game,0.6,great,1.0
2,best,2.7,pretty,0.3,ds,0.9
3,ii,0.4,still,0.1,fun,0.8
4,games,0.3,really,0.0,like,0.8
5,series,0.3,souls,0.0,play,0.7
6,better,0.3,actually,0.0,bad,0.6
7,still,0.3,yeah,0.0,still,0.6
8,worst,0.2,luck,0.0,one,0.5
9,ever,0.2,get,0.0,first,0.5


- Interpreted topics that were identified:
    1. Positive experiences from the game
    2. (similar to 1st topic)
    3. Mixed reception of the game (love and hate)

In [26]:
pos_txt = pos_reviews.review.values.flatten()
pos_tfidf_array = tfidf.fit_transform(pos_txt).toarray()
nmf.fit(pos_tfidf_array)



In [27]:
display_topics(nmf, tfidf.get_feature_names_out(), no_top_words)

Unnamed: 0,Topic 1 words,Topic 1 weights,Topic 2 words,Topic 2 weights,Topic 3 words,Topic 3 weights
0,souls,4.4,good,5.1,game,3.5
1,dark,3.5,game,0.5,great,1.1
2,best,2.9,pretty,0.3,fun,0.8
3,ii,0.4,still,0.1,ds,0.7
4,games,0.3,souls,0.0,like,0.6
5,series,0.3,really,0.0,still,0.6
6,ever,0.3,actually,0.0,play,0.5
7,one,0.3,yeah,0.0,love,0.5
8,still,0.3,get,0.0,hate,0.5
9,better,0.3,dark,0.0,first,0.5


- Interpreted topics that were identified:
    1. Positive outloooks on the game
    2. similar to 1st topic
    3. People expressing their opinion on the game, ranging from good to bad

In [28]:
neg_txt = neg_reviews.review.values.flatten()
neg_tfidf_array = tfidf.fit_transform(neg_txt).toarray()
nmf.fit(neg_tfidf_array)



In [29]:
display_topics(nmf, tfidf.get_feature_names_out(), no_top_words)

Unnamed: 0,Topic 1 words,Topic 1 weights,Topic 2 words,Topic 2 weights,Topic 3 words,Topic 3 weights
0,game,1.9,bad,3.3,souls,2.5
1,ds,0.8,game,0.6,dark,2.3
2,like,0.7,really,0.1,worst,0.4
3,play,0.7,design,0.0,play,0.2
4,good,0.6,kinda,0.0,like,0.2
5,one,0.6,genuinely,0.0,instead,0.2
6,shit,0.5,port,0.0,ii,0.2
7,even,0.5,souls,0.0,games,0.2
8,get,0.5,general,0.0,series,0.1
9,enemies,0.5,hitboxes,0.0,buy,0.1


- Interpreted topics that were identified:
    1. Vague but concerned w/ enemies
    2. Very negative perspectives on the game
    3. Negative experience regarding bosses, hitboxes, and game design

## Looking at reviews after DLC release:

In [31]:
dlc_reviews = reviews[(reviews.year >= 2014) & (reviews.month >= 7) & (reviews.day >= 22)]
dlc_reviews = dlc_reviews[dlc_reviews.review.isna() == False]
dlc_reviews = dlc_reviews[dlc_reviews.review.str.contains(r'\bdlc\b')]
dlc_reviews

Unnamed: 0_level_0,review,language,init_date,update_date,voted_up,month,year,month_name,day
recommendationid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
184244595,i like it even though people kind of hate this...,english,1735584348,1735584348,True,12,2024,December,30
184165897,i started the game trying not to think about a...,english,1735515416,1735515416,False,12,2024,December,29
183613472,unironically a masterpiece my favourite of the...,english,1735091925,1735091925,True,12,2024,December,25
183467284,the game is really really overhated although t...,english,1734952064,1734952064,True,12,2024,December,23
183432730,i genuinely don t understand the amount of hat...,english,1734911344,1734911344,True,12,2024,December,22
...,...,...,...,...,...,...,...,...,...
15664976,hollow good buy if you don t already own game ...,english,1430634124,1440388162,True,8,2015,August,24
15470237,i m changing my review to currently reflect th...,english,1429572560,1448541454,True,11,2015,November,26
15296462,from an avid but not crazy player of dark soul...,english,1428698609,1451422989,False,12,2015,December,29
15244026,the pc community split is the main reason it s...,english,1428369155,1695742605,False,9,2023,September,26


In [34]:
dlc_reviews.iloc[1]['review']

'i started the game trying not to think about all the negatives that i heard about it and yet i can say with sadness that it s probably the only from s game i ll never play again i finished the vanilla game and also all the dlc the game it s not as disgusting as everyone says but it really does show how poorly designed it is and how it s a step down in almost every way from the predecessor i played almost every game fromsoftware has ever made and had planned to get all achievements and yet ds managed to make me desist there s a lot to say about the game so i ll start with the good parts the game has some mechanics that are most welcome bonfire ascetics and a variety of playstyles that i think is a good competition even with ds the soundtrack as always is masterfully crafted and always made me stop for a moment to fully appreciate it majula the locations are amazing and always feel masterfully made on a aesthetic point and gave the world an incredible feeling majula is one of if not the