# <center>The Rise and Fall of Fall Guys</center>

# Introduction

The video game Fall Guys has drawn a lot of criticism and hate lately. The game was very popular shortly after its release, but after many complaints its popularity seems to have plummeted, with some people calling it a "dead" game. I want to see what people are saying, how they feel about the game, and whether opinions have changed since its initial release. To do this, I will analyze customer reviews from the Steam store written since the game was released.

For context, Fall Guys: Ultimate Knockout was released for PC on August 4, 2020. PC users can purchase the game from the [Steam Store](https://store.steampowered.com/app/1097150/Fall_Guys_Ultimate_Knockout/). On Steam, users who bought the game or received it as a gift can leave a review and give it a "thumbs up" or "thumbs down", which indicates whether they recommend the game.

<i>All of the data for this project was collected on February 13, 2021.</i>

# Outline

- Part 1: Understanding the Data
- Part 2: Observing overall feedback: recommended or not, and words mentioned in reviews
- Part 3: Total playtime overall, by recommendation, and by other fields...
- Part 4: `Review_Timestamp`: how people felt about the game at the time they reviewed it
- Part 5: Trend of the `Last_Played` field, which can be used to check when a player "quit playing" Fall Guys

# Part 1: Understanding the Data

## The data

`reviews.csv` contains every English-language review written on Steam for Fall Guys. It includes the amount of time the reviewer spent playing Fall Guys, whether the reviewer recommended the game, and when the review was written.

Column | Definition
--- | ---
Review | User review of the game
Recommended | True if the reviewer gave a positive recommendation, false otherwise
Total_Playtime | Total playtime (in minutes) at the time the review was written
Review_Timestamp | Date and time the review was written, formatted as m/d/y h:mm in GMT
Last_Played | Unix timestamp (in seconds) of when the user last played the game

More information on the Steam review fields: [https://partner.steamgames.com/doc/store/getreviews](https://partner.steamgames.com/doc/store/getreviews)

## Importing libraries and loading the data

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [12]:
# Loading the data
df = pd.read_csv('data/reviews.csv', encoding='ISO-8859-1')

df.drop("Steam ID", axis=1, inplace=True) # Steam ID identifies each user, which we don't need

In [13]:
df

| | Review | Recommended | Total_Playtime | Review_Timestamp | Last_Played |
| --- | --- | --- | --- | --- | --- |
| 0 | Wall Guys Sucks | True | 275 | 2/13/21 8:25 | 1613204675 |
| 1 | 1 | False | 4164 | 2/13/21 8:04 | 1608127700 |
| 2 | very nice game i bought it yesterday i love it... | True | 33 | 2/13/21 7:34 | 1613203192 |
| 3 | its ok | True | 318 | 2/13/21 6:51 | 1613192434 |
| 4 | ehyehvvchvvcdgyvv dhdhwdfhycfydgvdyyeegt heecduvh | True | 8158 | 2/13/21 6:49 | 1613202336 |
| ... | ... | ... | ... | ... | ... |
| 128652 | Honestly, one of the most refreshing and bold ... | True | 375 | 8/4/20 7:20 | 1602679579 |
| 128653 | Was it so hard to focus more on additional map... | False | 13 | 8/4/20 7:18 | 1596528167 |
| 128654 | [h1]9/10[/h1]\n it is really cool that more an... | True | 2116 | 8/4/20 7:12 | 1605827503 |
| 128655 | https://www.youtube.com/watch?v=anPEhQvsBr4\n\... | True | 3712 | 8/4/20 7:12 | 1601014874 |


## Cleaning the data

The `Last_Played` field is stored as a Unix timestamp (seconds since January 1, 1970 UTC), which is hard for humans to read. To make the field more readable, we'll convert it to year-month-day.

In [41]:
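# convert Last_Played from Unix timestamp (seconds) to yyyy-mm-dd
# unit='s' is required: without it, pandas reads the integers as nanoseconds
df['Last_Played'] = pd.to_datetime(df['Last_Played'], unit='s').dt.date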

In [55]:
df['Last_Played'].head()

0    2021-02-13
1    2020-12-16
2    2021-02-13
3    2021-02-13
4    2021-02-13
Name: Last_Played, dtype: object
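
As a quick check of the conversion, here is a minimal sketch using two `Last_Played` values copied from the raw table above. The variable names are only for illustration. It shows that the integers have to be read as seconds (`unit='s'`); left at the default, pandas interprets them as nanoseconds and every date lands on 1970-01-01.

```python
import pandas as pd

# Two Last_Played values from the raw data (Unix timestamps in seconds).
raw_last_played = pd.Series([1613204675, 1608127700])

# Default behavior: integers are treated as nanoseconds since 1970-01-01.
as_nanoseconds = pd.to_datetime(raw_last_played).dt.date

# With unit='s': integers are treated as seconds since 1970-01-01 UTC.
as_seconds = pd.to_datetime(raw_last_played, unit='s').dt.date

print(as_nanoseconds.tolist())
print(as_seconds.tolist())
```

The second result matches the first two rows of the converted column shown above.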

To keep the `Review_Timestamp` and `Last_Played` fields consistent, we'll drop the hour:minute part of the timestamp and format the date the same way as the `Last_Played` field (hour:minute is not needed for our analysis).

In [44]:
# convert Review_Timestamp to yyyy-mm-dd
df['Review_Timestamp'] = pd.to_datetime(df.Review_Timestamp).dt.date

In [56]:
df['Review_Timestamp'].head()

0    2021-02-13
1    2021-02-13
2    2021-02-13
3    2021-02-13
4    2021-02-13
Name: Review_Timestamp, dtype: object
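
The call above lets pandas infer the date format. As an optional alternative, the sketch below passes the format explicitly, assuming every timestamp follows the m/d/y h:mm pattern seen in the raw table. The sample values are copied from that table; the variable names are illustrative.

```python
import pandas as pd

# Two Review_Timestamp strings in the raw m/d/y h:mm layout.
raw_review_timestamps = pd.Series(['2/13/21 8:25', '8/4/20 7:12'])

# An explicit format removes any month/day ambiguity and raises an error
# if a value does not match the expected layout.
parsed = pd.to_datetime(raw_review_timestamps, format='%m/%d/%y %H:%M')
review_dates = parsed.dt.date

print(review_dates.tolist())
```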

We will also convert the `Total_Playtime` values from minutes to hours to make them easier to interpret.

In [52]:
# conversion from minute to hour
df['Total_Playtime'] = round(df['Total_Playtime'] / 60, 1)

In [73]:
# Total Playtime in hours
df['Total_Playtime'].head()

0      4.6
1     69.4
2      0.6
3      5.3
4    136.0
Name: Total_Playtime, dtype: float64

## Exploring the dataset

After cleaning up our data, let's take a closer look at our dataset.

In [70]:
df

| | Review | Recommended | Total_Playtime | Review_Timestamp | Last_Played |
| --- | --- | --- | --- | --- | --- |
| 0 | Wall Guys Sucks | True | 4.6 | 2021-02-13 | 2021-02-13 |
| 1 | 1 | False | 69.4 | 2021-02-13 | 2020-12-16 |
| 2 | very nice game i bought it yesterday i love it... | True | 0.6 | 2021-02-13 | 2021-02-13 |
| 3 | its ok | True | 5.3 | 2021-02-13 | 2021-02-13 |
| 4 | ehyehvvchvvcdgyvv dhdhwdfhycfydgvdyyeegt heecduvh | True | 136.0 | 2021-02-13 | 2021-02-13 |
| ... | ... | ... | ... | ... | ... |
| 128652 | Honestly, one of the most refreshing and bold ... | True | 6.2 | 2020-08-04 | 2020-10-14 |
| 128653 | Was it so hard to focus more on additional map... | False | 0.2 | 2020-08-04 | 2020-08-04 |
| 128654 | [h1]9/10[/h1]\n it is really cool that more an... | True | 35.3 | 2020-08-04 | 2020-11-19 |
| 128655 | https://www.youtube.com/watch?v=anPEhQvsBr4\n\... | True | 61.9 | 2020-08-04 | 2020-09-25 |


We use the `describe` method below to summarize the number of hours users played Fall Guys. Notice the huge gap between the 75th percentile and the maximum total playtime. Someone has played Fall Guys for approximately 4329.1 hours, or about 180 days! There are only 193 days between the official release date of Fall Guys and the day this data was extracted, so this player has logged a shocking number of hours.

In [72]:
df.describe()

| | Total_Playtime |
| --- | --- |
| count | 128657.0 |
| mean | 40.766164 |
| std | 73.712498 |
| min | 0.0 |
| 25% | 10.4 |
| 50% | 21.4 |
| 75% | 46.0 |
| max | 4329.1 |
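
The day counts quoted for the maximum playtime can be checked with a short calculation. This is only a sketch of the arithmetic: the release date, collection date, and maximum come from the text and the table above, and the variable names are illustrative.

```python
from datetime import date

release_date = date(2020, 8, 4)      # official release date
collection_date = date(2021, 2, 13)  # date the reviews were collected

# Calendar days between release and data collection.
days_available = (collection_date - release_date).days

max_hours = 4329.1                   # maximum Total_Playtime in hours
max_days = max_hours / 24            # the same playtime expressed in days
hours_per_day = max_hours / days_available

print(days_available)           # 193
print(round(max_days, 1))       # 180.4
print(round(hours_per_day, 1))  # 22.4
```

In other words, the top reviewer would have averaged roughly 22.4 hours of playtime for every day the game had been available.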


In [71]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128657 entries, 0 to 128656
Data columns (total 5 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   Review            128330 non-null  object 
 1   Recommended       128657 non-null  bool   
 2   Total_Playtime    128657 non-null  float64
 3   Review_Timestamp  128657 non-null  object 
 4   Last_Played       128657 non-null  object 
dtypes: bool(1), float64(1), object(3)
memory usage: 4.0+ MB
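
Note that both date columns have dtype `object`, because `.dt.date` stores plain Python `date` objects. That is fine for display. If a later step needs date arithmetic or grouping by month, one option (an alternative, not what this notebook does) is `.dt.normalize()`, which also drops the time of day but keeps a datetime dtype. A minimal sketch on a few sample timestamps:

```python
import pandas as pd

# Three sample timestamps in the raw m/d/y h:mm layout.
sample = pd.DataFrame(
    {'Review_Timestamp': ['2/13/21 8:25', '2/13/21 8:04', '8/4/20 7:12']}
)
parsed = pd.to_datetime(sample['Review_Timestamp'], format='%m/%d/%y %H:%M')

as_object = parsed.dt.date           # Python date objects, dtype object
as_datetime = parsed.dt.normalize()  # time set to midnight, datetime dtype kept

# A datetime column still supports the .dt accessor, e.g. reviews per month.
reviews_per_month = as_datetime.dt.to_period('M').value_counts().sort_index()

print(as_object.dtype, as_datetime.dtype)
print(reviews_per_month)
```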


In [75]:
df.isna().sum()

Review              327
Recommended           0
Total_Playtime        0
Review_Timestamp      0
Last_Played           0
dtype: int64

There are 327 missing values in the `Review` field, about 0.25% of all rows. This might be because the user didn't write any review text but was still able to give a positive or negative recommendation. In fact, if we check the rows with missing reviews, the other fields are all present, so we'll keep these rows in our dataset.

In [78]:
# isna() already returns a boolean mask, so no comparison with True is needed
df[df['Review'].isna()]

| | Review | Recommended | Total_Playtime | Review_Timestamp | Last_Played |
| --- | --- | --- | --- | --- | --- |
| 35 | NaN | True | 21.6 | 2021-02-13 | 2021-02-13 |
| 63 | NaN | True | 10.5 | 2021-02-12 | 2021-02-12 |
| 132 | NaN | True | 73.4 | 2021-02-11 | 2021-02-12 |
| 355 | NaN | True | 181.6 | 2021-02-07 | 2021-02-12 |
| 693 | NaN | True | 54.0 | 2021-02-03 | 2021-02-12 |
| ... | ... | ... | ... | ... | ... |
| 119205 | NaN | True | 24.1 | 2020-08-06 | 2020-08-24 |
| 119775 | NaN | True | 56.0 | 2020-08-06 | 2021-01-15 |
| 119861 | NaN | True | 8.7 | 2020-08-06 | 2020-09-08 |
| 120939 | NaN | True | 10.7 | 2020-08-05 | 2020-12-27 |
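
To see how the empty reviews split between positive and negative recommendations, the same boolean mask can be reused. The sketch below runs on a small made-up DataFrame (the values are hypothetical, not taken from `reviews.csv`), so the counts illustrate the technique only.

```python
import pandas as pd

# Hypothetical sample: three of the five rows have no review text.
sample = pd.DataFrame({
    'Review': ['its ok', None, 'fun', None, None],
    'Recommended': [True, True, False, True, False],
})

missing = sample['Review'].isna()  # True where the review text is missing

n_missing = int(missing.sum())
# Summing a boolean column counts its True values.
n_missing_recommended = int(sample.loc[missing, 'Recommended'].sum())
n_missing_not_recommended = n_missing - n_missing_recommended
share_missing = missing.mean()     # fraction of rows without review text

print(n_missing, n_missing_recommended, n_missing_not_recommended, share_missing)
```

Applied to the real data, replacing `sample` with `df` would give the same breakdown for the 327 rows without review text.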


# Part 2: Comparing overall feedback