# Twitch Top 200 Watched Games (2016-2021)

This dataset contains data about the top 200 games watched on the streaming platform Twitch from January 2016 until October 2021.

I decided to work with this dataset because I'm a passionate gamer as well as a consumer of Twitch content.

__Size:__ 14000 rows x 12 columns

## Columns

__Rank:__ A number that identifies a game's position in the ranking for a given month and year

__Game:__ Title of the game

__Month:__ Month of the ranking, as a number from 1 to 12

__Year:__ Year of the ranking

__Hours_watched:__ Sum of hours the viewers spent watching a game during that month

__Hours_Streamed:__ Sum of hours streamers spent streaming a game during that month

__Peak_viewers:__ Peak number of concurrent viewers a specific game reached in that month

__Peak_channels:__ Peak number of concurrent streamers playing a specific game in that month

__Streamers:__ Number of streamers that played a specific game in any given month

__Avg_viewers:__ Monthly average viewership of a specific game

__Avg_channels:__ How many streamers, on average, played a certain game during any given month

__Avg_viewer_ratio:__ Average viewer-to-channel ratio, i.e. Avg_viewers / Avg_channels
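
As a quick sanity check on this definition, the sketch below recomputes the ratio for the first three January 2016 rows printed in the `df.head(10)` output of this notebook. The `sample` frame is typed in by hand from those rows (it is not loaded from the CSV), and the `Recomputed_ratio` and `Abs_diff` column names are my own. Since `Avg_viewers` and `Avg_channels` are stored as rounded integers, the recomputed value can differ slightly from the stored one.

```python
import pandas as pd

# First three rows of the January 2016 ranking, copied by hand from the
# df.head(10) output in this notebook (not read from the CSV file).
sample = pd.DataFrame({
    "Game": ["League of Legends", "Counter-Strike: Global Offensive", "Dota 2"],
    "Avg_viewers": [127021, 64378, 60815],
    "Avg_channels": [1833, 1117, 583],
    "Avg_viewer_ratio": [69.29, 57.62, 104.26],
})

# Recompute the ratio from the two rounded averages.
sample["Recomputed_ratio"] = (sample["Avg_viewers"] / sample["Avg_channels"]).round(2)

# Absolute difference between the stored and the recomputed ratio.
sample["Abs_diff"] = (sample["Recomputed_ratio"] - sample["Avg_viewer_ratio"]).abs()

print(sample[["Game", "Avg_viewer_ratio", "Recomputed_ratio", "Abs_diff"]])
```

For these three rows the two values agree to within a few hundredths, which is consistent with the definition given above.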

This notebook can be found at: https://github.com/ManuGr/dataScienceProject.git

Since the presentation of this analysis, I've added a comment for each of the plots present in the notebook, as well as the axis labels.

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [2]:
df = pd.read_csv("Twitch_game_data.csv")
df.head(10)

Unnamed: 0,Rank,Game,Month,Year,Hours_watched,Hours_Streamed,Peak_viewers,Peak_channels,Streamers,Avg_viewers,Avg_channels,Avg_viewer_ratio
0,1,League of Legends,1,2016,94377226,1362044 hours,530270,2903,129172,127021,1833,69.29
1,2,Counter-Strike: Global Offensive,1,2016,47832863,830105 hours,372654,2197,120849,64378,1117,57.62
2,3,Dota 2,1,2016,45185893,433397 hours,315083,1100,44074,60815,583,104.26
3,4,Hearthstone,1,2016,39936159,235903 hours,131357,517,36170,53749,317,169.29
4,5,Call of Duty: Black Ops III,1,2016,16153057,1151578 hours,71639,3620,214054,21740,1549,14.03
5,6,Minecraft,1,2016,10231056,490002 hours,64432,1538,88820,13769,659,20.88
6,7,World of Warcraft,1,2016,8771452,342978 hours,46130,1180,33375,11805,461,25.57
7,8,Z1: Battle Royale,1,2016,7894571,205569 hours,41588,460,21396,10625,276,38.4
8,9,Talk Shows & Podcasts,1,2016,7688369,53235 hours,84051,148,10779,10347,71,144.42
9,10,FIFA 16,1,2016,6988475,203646 hours,145728,756,46462,9405,274,34.32


### Changes to the dataset

I've made some modifications to the original dataset that should make the analysis easier, namely:
- there seems to be a single row that contains a NaN value in the "Game" column, which I replaced with an empty string
- I added a "Date" column that combines the "Month" and "Year" columns
- the "Hours_Streamed" column was a string column containing the number of hours followed by the word "hours"; I removed the word and cast the column to a numeric type

In [3]:
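# Replace missing values (the NaN in "Game") with an empty string
df = df.fillna('')

# Combine "Month" and "Year" into a "YYYY-MM" string, parsing with an explicit format
date_column = df["Month"].astype(str) + "/" + df["Year"].astype(str)
df["Date"] = pd.to_datetime(date_column, format="%m/%Y").dt.strftime('%Y-%m')

# Strip the literal " hours" suffix and convert the column to numbers
df["Hours_Streamed"] = pd.to_numeric(df["Hours_Streamed"].str.replace(" hours", "", regex=False))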

df

Unnamed: 0,Rank,Game,Month,Year,Hours_watched,Hours_Streamed,Peak_viewers,Peak_channels,Streamers,Avg_viewers,Avg_channels,Avg_viewer_ratio,Date
0,1,League of Legends,1,2016,94377226,1362044,530270,2903,129172,127021,1833,69.29,2016-01
1,2,Counter-Strike: Global Offensive,1,2016,47832863,830105,372654,2197,120849,64378,1117,57.62,2016-01
2,3,Dota 2,1,2016,45185893,433397,315083,1100,44074,60815,583,104.26,2016-01
3,4,Hearthstone,1,2016,39936159,235903,131357,517,36170,53749,317,169.29,2016-01
4,5,Call of Duty: Black Ops III,1,2016,16153057,1151578,71639,3620,214054,21740,1549,14.03,2016-01
...,...,...,...,...,...,...,...,...,...,...,...,...,...
13995,196,Battlefield V,10,2021,657132,95847,20899,283,13189,884,129,6.86,2021-10
13996,197,Naraka: Bladepoint,10,2021,655856,34220,10748,125,2959,882,46,19.17,2021-10
13997,198,Hearts of Iron IV,10,2021,655665,14621,9832,60,1503,882,19,44.84,2021-10
13998,199,RISK: The Game of Global Domination,10,2021,648689,68,92496,2,18,873,0,9539.54,2021-10


### Twitch Growth

I will mostly be working with the "Game", "Date", "Hours_watched" and "Hours_Streamed" columns. Thus, I first extract a dataframe that only contains those columns.

In [4]:
simple_df = df[["Game", "Hours_watched", "Hours_Streamed", "Date"]]
simple_df

Unnamed: 0,Game,Hours_watched,Hours_Streamed,Date
0,League of Legends,94377226,1362044,2016-01
1,Counter-Strike: Global Offensive,47832863,830105,2016-01
2,Dota 2,45185893,433397,2016-01
3,Hearthstone,39936159,235903,2016-01
4,Call of Duty: Black Ops III,16153057,1151578,2016-01
...,...,...,...,...
13995,Battlefield V,657132,95847,2021-10
13996,Naraka: Bladepoint,655856,34220,2021-10
13997,Hearts of Iron IV,655665,14621,2021-10
13998,RISK: The Game of Global Domination,648689,68,2021-10


### Content per month

In this first plot I aim to show the amount of content streamed and watched each month.

While the number of hours watched is far greater than the number of hours streamed (about 69 times greater for League of Legends in January 2016, for example), their behaviour is very similar throughout the years.

In [None]:
fig = make_subplots(rows=2, cols=1, shared_xaxes=True, shared_yaxes=True)

fig.add_trace(go.Histogram(x=simple_df["Date"], y=simple_df["Hours_watched"], name="Hours Watched", histfunc='sum', opacity=0.75, marker_color='#330C73'), row=1, col=1)
fig.add_trace(go.Histogram(x=simple_df["Date"], y=simple_df["Hours_Streamed"], name="Hours Streamed", histfunc='sum', opacity=0.75, marker_color='#EB89B5'), row=2, col=1)

fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_yaxes(title_text="Hours (h)", row=1, col=1)
fig.update_yaxes(title_text="Hours (h)", row=2, col=1)
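
The monthly sums that the two histograms compute with `histfunc='sum'` can also be obtained as plain numbers with a `groupby`. The sketch below is only an illustration: the `sample` frame holds six rows copied by hand from the tables printed earlier in this notebook, so its totals cover those rows only and not the full top 200 of each month. The `Watched_per_streamed` column name is my own.

```python
import pandas as pd

# Six rows copied by hand from the tables printed earlier in this notebook:
# three from January 2016 and three from October 2021.
sample = pd.DataFrame({
    "Game": [
        "League of Legends", "Counter-Strike: Global Offensive", "Dota 2",
        "Battlefield V", "Naraka: Bladepoint", "Hearts of Iron IV",
    ],
    "Date": ["2016-01"] * 3 + ["2021-10"] * 3,
    "Hours_watched": [94377226, 47832863, 45185893, 657132, 655856, 655665],
    "Hours_Streamed": [1362044, 830105, 433397, 95847, 34220, 14621],
})

# Sum both measures per month, as the histograms do for each date bin.
monthly = sample.groupby("Date")[["Hours_watched", "Hours_Streamed"]].sum()

# Hours watched for every hour streamed, per month.
monthly["Watched_per_streamed"] = monthly["Hours_watched"] / monthly["Hours_Streamed"]

print(monthly)
```

Applied to `simple_df` instead of `sample`, the same `groupby` should give the totals plotted above for every month in the dataset.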

### Individual Games' Growth

At this point I only want to work with five games that have been in the top 200 since 2016, namely: Counter-Strike: Global Offensive, Dota 2, Grand Theft Auto V, League of Legends, and Rust. I chose these games because I thought they offered the most interesting analysis.

Side Note: I decided to change the index solely for aesthetic reasons. Having the index skip 200 numbers didn't look good.

In [None]:
selected_games = ["League of Legends", "Counter-Strike: Global Offensive", "Dota 2",
                  "Grand Theft Auto V", "Rust"]
# Keep only the rows for the selected games
simpler_df = simple_df[simple_df["Game"].isin(selected_games)].copy()
# Renumber the rows from 1 so the index no longer skips values
simpler_df.index = range(1, len(simpler_df) + 1)
simpler_df

### Counter-Strike: Global Offensive

Counter-Strike: Global Offensive (aka CSGO) is a first-person shooter that pits terrorists against counter-terrorists.

From the line plot we can see that CSGO has a somewhat linear behaviour with peaks at different times of the year. These peaks are associated with some of the most famous tournament circuits that the community around the game organises, which tend to attract a larger number of viewers than usual. This is a trend that characterises many of the top 200 games on Twitch.

In [None]:
px.line(simpler_df, x="Date", y="Hours_watched", color="Game",
        title="Counter-Strike: Global Offensive",
        color_discrete_map={
                "Counter-Strike: Global Offensive": "blue", "Dota 2": "silver", "Grand Theft Auto V": "silver", "League of Legends": "silver", "Rust": "silver"
        },
        labels={
            "Hours_watched": "Hours Watched (h)"
        })
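
The line plots in this section all use the same call and differ only in which game is drawn in blue. A minimal sketch of a helper that builds the colour mapping is shown below; `highlight_color_map` and `GAMES` are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
def highlight_color_map(games, highlighted, highlight_color="blue", muted_color="silver"):
    """Map every game to the muted colour, except the highlighted one."""
    if highlighted not in games:
        raise ValueError(f"{highlighted!r} is not one of the plotted games")
    return {
        game: (highlight_color if game == highlighted else muted_color)
        for game in games
    }


# The five games analysed in this section.
GAMES = [
    "Counter-Strike: Global Offensive",
    "Dota 2",
    "Grand Theft Auto V",
    "League of Legends",
    "Rust",
]

color_map = highlight_color_map(GAMES, "Dota 2")
print(color_map)
```

The resulting dictionary can be passed as `color_discrete_map=highlight_color_map(GAMES, "Dota 2")` in the `px.line` call, so only the title and the highlighted game need to change from plot to plot.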

### Dota 2

Dota 2 is a Multiplayer Online Battle Arena (aka MOBA). It pits 2 teams of 5 people against each other, and their objective is to conquer the enemy team's base.

The plot shows that Dota 2's viewership is very similar to CSGO's. However, Dota 2's peaks are much more cyclical than CSGO's, happening mostly in August. This is because Dota 2's most appealing tournament, The International, is typically held in August. Its large prize pool attracts a great number of players, and the event also draws a lot of viewers.

In [None]:
px.line(simpler_df, x="Date", y="Hours_watched", color="Game",
        title="Dota 2",
        color_discrete_map={
                "Counter-Strike: Global Offensive": "silver", "Dota 2": "blue", "Grand Theft Auto V": "silver", "League of Legends": "silver", "Rust": "silver"
        },
        labels={
            "Hours_watched": "Hours Watched (h)"
        })
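
One way to check the claim about cyclical peaks numerically is to look up, for each year, the month with the most hours watched. The sketch below is an assumption about how this could be done, not the method used for this analysis: `peak_month_per_year` is a hypothetical helper, and the `toy` frame contains made-up placeholder values rather than rows from the dataset.

```python
import pandas as pd


def peak_month_per_year(frame, game):
    """Return the month with the highest Hours_watched for each year of one game."""
    rows = frame[frame["Game"] == game]
    # Index label of the row with the maximum Hours_watched within each year.
    peak_index = rows.groupby("Year")["Hours_watched"].idxmax()
    return rows.loc[peak_index, ["Year", "Month", "Hours_watched"]].reset_index(drop=True)


# Made-up placeholder values, only to show the shape of the input and output.
toy = pd.DataFrame({
    "Game": ["Dota 2"] * 6,
    "Year": [2016, 2016, 2016, 2017, 2017, 2017],
    "Month": [7, 8, 9, 7, 8, 9],
    "Hours_watched": [10, 30, 20, 15, 40, 25],
})

peaks = peak_month_per_year(toy, "Dota 2")
print(peaks)
```

With the notebook's full `df`, which has the `Game`, `Year`, `Month` and `Hours_watched` columns, the call would be `peak_month_per_year(df, "Dota 2")`.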

### Grand Theft Auto V

Grand Theft Auto V (aka GTAV) is an action-adventure game set in the fictional American state of San Andreas.

GTAV was highly acclaimed when it was released back in 2013, but it quickly lost steam and became a niche game. However, at the start of 2019, Rockstar (GTA's publisher) launched a big update to the online component of the game. This update attracted a lot of well-known streamers, who decided to take it on while role-playing with their friends. This role-playing side has since attracted a lot of viewers to the game.

In [None]:
px.line(simpler_df, x="Date", y="Hours_watched", color="Game",
        title="Grand Theft Auto V",
        color_discrete_map={
                "Counter-Strike: Global Offensive": "silver", "Dota 2": "silver", "Grand Theft Auto V": "blue", "League of Legends": "silver", "Rust": "silver"
        },
        labels={
            "Hours_watched": "Hours Watched (h)"
        })

### League of Legends

League of Legends (aka LoL) is also a MOBA, and it shares very similar mechanics with Dota 2.

While its behaviour is similar to that of both CSGO and Dota 2, it is very different in that its tournament scene is much bigger than the other two. There are four major regional leagues running throughout the year in North America, Europe, China, and South Korea, as well as hundreds of minor and regional leagues, which attract a large number of viewers all year round. On top of that, there's a small peak in October that corresponds to the League of Legends World Championship, a yearly event in which the strongest teams from the four major regions, and some of the strongest teams from the minor regions, compete to decide which team is the best in the world.

In [None]:
px.line(simpler_df, x="Date", y="Hours_watched", color="Game",
        title="League of Legends",
        color_discrete_map={
                "Counter-Strike: Global Offensive": "silver", "Dota 2": "silver", "Grand Theft Auto V": "silver", "League of Legends": "blue", "Rust": "silver"
        },
        labels={
            "Hours_watched": "Hours Watched (h)"
        })

### Rust

Rust was also released in 2013. However, it flew under the radar, especially compared to the huge success that GTAV had in the same year. Rust has managed to stay relevant enough to remain in the top 200 most watched games on Twitch, but it never had as much exposure as it did at the start of 2021. This was when a group of streamers known as OfflineTV decided to pick up the game and play it on a private server where they would role-play with their characters. This move attracted many other famous streamers, as well as an astounding number of viewers, hence its enormous peak in January 2021.

In [None]:
px.line(simpler_df, x="Date", y="Hours_watched", color="Game",
        title="Rust",
        color_discrete_map={
                "Counter-Strike: Global Offensive": "silver", "Dota 2": "silver", "Grand Theft Auto V": "silver", "League of Legends": "silver", "Rust": "blue"
        },
        labels={
            "Hours_watched": "Hours Watched (h)"
        })