# Steam Top 100 Played Games #

This project is a Python application that creates an interactive dashboard using Dash and Plotly Express to visualize and analyze the "Steam Top 100 Played Games" dataset. The dashboard provides insights into game genres, prices, and player statistics.

### Code Organization

The code is organized into four main sections:

#### 1. Import Libraries
- **pandas**: For data manipulation and cleaning.
- **plotly.express**: For creating interactive and themed visualizations.
- **dash (`Dash`, `dcc`, `html`)**: For building the interactive web application.
- **collections.Counter**: For counting occurrences of genres.

#### 2. Load and Clean Dataset
- **Loading Data**: The dataset is loaded from a CSV file.
- **Data Cleaning**:
  - **Price**: Converts `Free To Play` to `0.0`, removes the `£` symbol, and converts values to float.
  - **Current Players & Peak Today**: Removes commas from numeric strings and converts them to integers.
  - **Genre Tags**: Splits the genre tags into lists by comma, removes the `+` symbol and extra spaces, and filters out empty strings.
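
The cleaning rules above can be tried on a small sample before they are applied to the full CSV. The following is a minimal sketch: the three rows are made up to mimic the raw column formats, and the `split_tags` helper is a hypothetical name that does not appear in the notebook.

```python
import pandas as pd

# Hypothetical three-row sample that mimics the raw CSV columns
raw = pd.DataFrame({
    "Price": ["Free To Play", "\u00a323.93", "\u00a34.29"],
    "Current Players": ["1,485,535", "258,475", "142,322"],
    "Genre Tags": ["FPS, Shooter, +", "Action RPG,  RPG ,", "Utilities"],
})

# Price: map the free label to "0.0", drop the pound sign, cast to float
raw["Price"] = (
    raw["Price"]
    .replace("Free To Play", "0.0")
    .str.replace("\u00a3", "", regex=False)
    .astype(float)
)

# Player counts: drop thousands separators, cast to a 64-bit integer
raw["Current Players"] = (
    raw["Current Players"].str.replace(",", "", regex=False).astype("int64")
)


def split_tags(text):
    """Split on commas, strip '+' and whitespace, drop empty entries."""
    cleaned = (tag.replace("+", "").strip() for tag in text.split(","))
    return [tag for tag in cleaned if tag]


raw["Genre Tags"] = raw["Genre Tags"].apply(split_tags)
print(raw)
```

Writing the tag cleanup as a named function keeps the strip-and-filter logic in one place, so it is evaluated once per tag.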

#### 3. Analyze Dataset
- **Genre Tags Analysis**:
  - Counts the frequency of each genre across the dataset using `Counter`.
  - Creates a DataFrame (`genre_df`) and extracts the top 10 genres (`top_genres`).
- **Analysis of Top Genre Tags and Game Price**:
  - Explodes the genre tags and computes the average game price per genre.
  - Filters to include only the top 10 genres and sorts by average price in descending order.
- **Analysis of Top Genre Tags and Peak Today**:
  - Computes the average `Peak Today` value per genre.
  - Filters to include only the top 10 genres and sorts by average `Peak Today` in descending order.
- **Top 10 Free and Paid Games by Peak Today**:
  - Identifies the top 10 free games (`Price = 0`) and top 10 paid games (`Price > 0`) based on `Peak Today`.
- **Top Genres in Free and Paid Games**:
  - Filters free and paid games separately.
  - Counts genre occurrences in each subset and extracts the top 10 genres for free and paid games.

#### 4. Create Visualizations and a Dash App
The visualizations are created using Plotly Express with a dark theme and arranged in the following sequence:

- **Top Genre Tags**: A bar chart of the top 10 overall game genres.
- **Top Genre Tags and Game Price**: A bar chart showing the average game price per top genre.
- **Top Genre Tags and Peak Today**: A bar chart displaying the average `Peak Today` value per top genre.
- **Distribution of Genre Tags**: A histogram showing how many genre tags appear in a given number of games.
- **Top Genres in Free Games**: A bar chart of the top 10 genres in free games.
- **Top Genres in Paid Games**: A bar chart of the top 10 genres in paid games.
- **Top 10 Free to Play Games by Peak Today**: A bar chart for the free games with the highest peak player counts.
- **Top 10 Paid Games by Peak Today**: A bar chart for the paid games with the highest peak player counts.
- **Distribution of Current Players by Game Price**: A histogram that sums current players within each price bin.
- **Relationship Between Price and Current Players**: A scatter plot with a marginal histogram for price and a marginal rug plot for current players, showing how the two variables relate.

### Choice of Visualization Types
- **Bar Charts**: Used for categorical comparisons (e.g., genre popularity, price, and player count). Bar charts are ideal for clearly showing the differences in frequency and averages across categories.
- **Histograms**: Used for distribution analysis (e.g., genre distribution, player distribution by price). Histograms show how observations are spread across ranges of a numeric variable.
- **Scatter Plots with Marginal Plots**: Used for correlation analysis (e.g., price vs. current players). The scatter plot highlights outliers and patterns between two numerical features, while the marginal histogram and rug plot show each variable's own distribution.
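
A scatter plot gives a visual impression of the relationship; a correlation coefficient can put a number next to it. The sketch below is not part of the notebook. It uses six price and player values copied from the tables in this document, so the result illustrates the calculation and says nothing reliable about the full dataset.

```python
import pandas as pd

# Six rows copied from the tables in this document (illustration only)
toy = pd.DataFrame({
    "Price": [0.0, 0.0, 4.29, 14.80, 23.93, 49.99],
    "Current Players": [1485535, 765150, 142322, 207117, 258475, 84290],
})

# Pearson measures linear association and is sensitive to outliers
pearson = toy["Price"].corr(toy["Current Players"])

# Spearman is Pearson applied to ranks, so extreme values carry less weight
spearman = toy["Price"].rank().corr(toy["Current Players"].rank())

print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

Ranking first is one way to reduce the influence of a few very large player counts on the coefficient.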

Finally, all these visualizations are integrated into a Dash application with a dark theme. The app layout is defined using Dash components and runs in debug mode for development.


### 1. Import libraries ###

In [1]:
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html
from collections import Counter

### 2. Load and clean dataset ###

In [2]:
df = pd.read_csv('Steam Top 100 Played Games - List.csv')
df

Unnamed: 0,Rank,Name,Thumbnail URL,Store Link,Price,Current Players,Peak Today,Genre Tags
0,1,Counter-Strike 2,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/730/Counter...,Free To Play,1485535,1489929,"FPS, Shooter, Multiplayer, Competitive, Action..."
1,2,PUBG: BATTLEGROUNDS,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/578080/PUBG...,Free To Play,765150,765150,"Survival, Shooter, Battle Royale, Multiplayer,..."
2,3,Dota 2,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/570/Dota_2?...,Free To Play,698757,715295,"Free to Play, MOBA, Multiplayer, Strategy, eSp..."
3,4,Marvel Rivals,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/2767030/Mar...,Free To Play,312427,565653,"Free to Play, Multiplayer, Hero Shooter, Third..."
4,5,Path of Exile 2,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/2694490/Pat...,£23.93,258475,288757,"Action RPG, Hack and Slash, RPG, Action, Souls..."
...,...,...,...,...,...,...,...,...
95,96,Sid Meier's Civilization® V,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/8930/Sid_Me...,£19.99,17916,21754,"Turn-Based Strategy, Strategy, Turn-Based, Mul..."
96,97,Counter-Strike,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/10/CounterS...,£7.19,17900,19275,"Action, FPS, Multiplayer, Shooter, Classic, Te..."
97,98,Cities: Skylines,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/255710/Citi...,£24.99,17866,18067,"City Builder, Simulation, Building, Management..."
98,99,Fallout 4,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/377160/Fall...,£15.99,17009,20939,"Open World, Post-apocalyptic, Singleplayer, RPG"


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Rank             100 non-null    int64 
 1   Name             100 non-null    object
 2   Thumbnail URL    100 non-null    object
 3   Store Link       100 non-null    object
 4   Price            100 non-null    object
 5   Current Players  100 non-null    object
 6   Peak Today       100 non-null    object
 7   Genre Tags       100 non-null    object
dtypes: int64(1), object(7)
memory usage: 6.4+ KB


Data cleaning and transformation

In [4]:
# a) Price:
#    - Replace 'Free To Play' with '0.0'
#    - Remove the '£' symbol (Unicode: \u00a3)
#    - Convert the resulting values to float
df['Price'] = df['Price'].replace('Free To Play', '0.0').str.replace("\u00a3", "", regex=False).astype(float)

# b) Current Players:
#    - Remove commas from the number strings
#    - Convert the resulting strings to integer
df['Current Players'] = df['Current Players'].str.replace(',', '').astype(int)

# c) Peak Today:
#    - Remove commas from the number strings
#    - Convert the resulting strings to integer
df['Peak Today'] = df['Peak Today'].str.replace(',', '').astype(int)

# d) Genre Tags:
#    - Split the genre tags string by comma to get a list of tags
#    - Remove the '+' symbol and extra spaces from each tag
#    - Filter out any empty strings from the resulting list
df['Genre Tags'] = df['Genre Tags'].apply(
    lambda x: [tag.replace('+', '').strip() for tag in x.split(',') if tag.replace('+', '').strip() != '']
)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Rank             100 non-null    int64  
 1   Name             100 non-null    object 
 2   Thumbnail URL    100 non-null    object 
 3   Store Link       100 non-null    object 
 4   Price            100 non-null    float64
 5   Current Players  100 non-null    int32  
 6   Peak Today       100 non-null    int32  
 7   Genre Tags       100 non-null    object 
dtypes: float64(1), int32(2), int64(1), object(4)
memory usage: 5.6+ KB
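
The `int32` dtype reported above is probably a platform effect: with NumPy releases before 2.0 on Windows, `astype(int)` resolves to a 32-bit integer. If a fixed width matters, the dtype can be named explicitly. A minimal sketch on a hypothetical sample:

```python
import pandas as pd

# Hypothetical sample of comma-formatted counts
counts = pd.Series(["1,485,535", "765,150", "17,009"])

# Naming the width explicitly avoids the platform-dependent default of `int`
cleaned = counts.str.replace(",", "", regex=False).astype("int64")

print(cleaned.dtype)
```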


In [5]:
df.head()

Unnamed: 0,Rank,Name,Thumbnail URL,Store Link,Price,Current Players,Peak Today,Genre Tags
0,1,Counter-Strike 2,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/730/Counter...,0.0,1485535,1489929,"[FPS, Shooter, Multiplayer, Competitive, Actio..."
1,2,PUBG: BATTLEGROUNDS,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/578080/PUBG...,0.0,765150,765150,"[Survival, Shooter, Battle Royale, Multiplayer..."
2,3,Dota 2,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/570/Dota_2?...,0.0,698757,715295,"[Free to Play, MOBA, Multiplayer, Strategy, eS..."
3,4,Marvel Rivals,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/2767030/Mar...,0.0,312427,565653,"[Free to Play, Multiplayer, Hero Shooter, Thir..."
4,5,Path of Exile 2,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/2694490/Pat...,23.93,258475,288757,"[Action RPG, Hack and Slash, RPG, Action, Soul..."


### 3. Analyze dataset ###

Genre Tags Analysis

In [6]:
# Flatten the list of genre tags from all games and count their occurrences using Counter
genre_counts = Counter([genre for sublist in df['Genre Tags'] for genre in sublist])
# Create a DataFrame for genre counts and sort in descending order
genre_df = pd.DataFrame(genre_counts.items(), columns=['Genre', 'Count']).sort_values(by='Count', ascending=False)
top_genres = genre_df.head(10)
top_genres

Unnamed: 0,Genre,Count
2,Multiplayer,71
4,Action,64
86,Singleplayer,55
54,Open World,45
11,Co-op,42
48,Adventure,39
25,Simulation,38
68,Sandbox,32
8,First-Person,32
12,Strategy,31
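
When only the ranking is needed, `Counter.most_common` can return the top entries without an intermediate DataFrame. The sketch below uses three hypothetical tag lists in place of `df['Genre Tags']`.

```python
from collections import Counter

# Hypothetical tag lists standing in for df['Genre Tags']
tag_lists = [
    ["FPS", "Shooter", "Multiplayer"],
    ["Survival", "Shooter", "Multiplayer"],
    ["MOBA", "Multiplayer", "Strategy"],
]

# Counter.update accepts any iterable, so the lists can be fed one at a time
counts = Counter()
for tags in tag_lists:
    counts.update(tags)

# most_common(n) returns the n highest counts as (tag, count) pairs
top_two = counts.most_common(2)
print(top_two)
```

The DataFrame version in the cell above remains useful because the later Plotly charts take a DataFrame as input.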


Analysis of Top Genre Tags and Game Price

In [7]:
# Analysis of Top Genre Tags and Game Price
# Explode the 'Genre Tags' column so that each tag becomes a separate row, then group by tag
# Calculate the average game price for each genre
top_genre_prices = df.explode("Genre Tags").groupby("Genre Tags")["Price"].mean().reset_index()
# Filter the result to include only the top 10 genres and sort by Price in descending order
top_genre_prices = top_genre_prices[top_genre_prices["Genre Tags"].isin(top_genres["Genre"])].sort_values(by="Price", ascending=False)

top_genre_prices

Unnamed: 0,Genre Tags,Price
158,Open World,24.830889
194,Sandbox,22.9475
200,Singleplayer,22.144727
199,Simulation,21.910263
210,Strategy,20.670968
14,Adventure,18.823333
146,Multiplayer,18.329577
46,Co-op,17.520238
9,Action,16.31875
93,First-Person,15.532188
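
Because `explode` repeats a game's row once per tag, a game with several tags contributes its price to each of those genre averages. A toy example with hypothetical games makes this visible:

```python
import pandas as pd

# Hypothetical games; each list is one game's cleaned genre tags
games = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Price": [0.0, 20.0, 40.0],
    "Genre Tags": [["Action", "Co-op"], ["Action"], ["Co-op", "Strategy"]],
})

# Three games become five rows: one row per (game, tag) pair
exploded = games.explode("Genre Tags")

# Game A's price of 0.0 counts toward both the Action and the Co-op average
avg_price = exploded.groupby("Genre Tags")["Price"].mean()

print(len(exploded))
print(avg_price)
```

The per-genre averages are therefore not independent of each other, which is worth keeping in mind when comparing bars in the chart.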


Analysis of Top Genre Tags and Peak Today

In [8]:
# Compute the average 'Peak Today' value for each genre
top_genre_peak = df.explode("Genre Tags").groupby("Genre Tags")["Peak Today"].mean().reset_index()
# Filter the result to include only the top 10 genres and sort by Peak Today in descending order
top_genre_peak = top_genre_peak[top_genre_peak["Genre Tags"].isin(top_genres["Genre"])].sort_values(by="Peak Today", ascending=False)
top_genre_peak['Peak Today'] = top_genre_peak['Peak Today'].astype(int)
top_genre_peak

Unnamed: 0,Genre Tags,Peak Today
210,Strategy,146217
46,Co-op,134206
93,First-Person,120666
9,Action,112357
146,Multiplayer,105275
199,Simulation,85193
14,Adventure,58098
194,Sandbox,54126
158,Open World,53417
200,Singleplayer,43919
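
An average can be pulled upward by a single very popular title, so it may help to look at the median alongside the mean. In the sketch below the three Strategy values are the `Peak Today` figures shown in this document for Dota 2, Sid Meier's Civilization V, and Football Manager 2024; the Sandbox values are hypothetical.

```python
import pandas as pd

# Strategy values are taken from this document; Sandbox values are made up
exploded = pd.DataFrame({
    "Genre Tags": ["Strategy", "Strategy", "Strategy", "Sandbox", "Sandbox"],
    "Peak Today": [715295, 21754, 76937, 50000, 60000],
})

# agg computes both statistics per genre in one pass
summary = exploded.groupby("Genre Tags")["Peak Today"].agg(["mean", "median"])

print(summary)
```

A large gap between the two columns suggests that a few titles dominate the genre's average.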


Analysis of Top 10 Free to Play Games by Peak Today

In [9]:
# Filter the dataset for free games (Price equals 0)
# Select the top 10 free games based on the 'Peak Today' value
top_free_games = df[df["Price"] == 0].nlargest(10, "Peak Today")
top_free_games

Unnamed: 0,Rank,Name,Thumbnail URL,Store Link,Price,Current Players,Peak Today,Genre Tags
0,1,Counter-Strike 2,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/730/Counter...,0.0,1485535,1489929,"[FPS, Shooter, Multiplayer, Competitive, Actio..."
1,2,PUBG: BATTLEGROUNDS,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/578080/PUBG...,0.0,765150,765150,"[Survival, Shooter, Battle Royale, Multiplayer..."
2,3,Dota 2,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/570/Dota_2?...,0.0,698757,715295,"[Free to Play, MOBA, Multiplayer, Strategy, eS..."
3,4,Marvel Rivals,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/2767030/Mar...,0.0,312427,565653,"[Free to Play, Multiplayer, Hero Shooter, Thir..."
6,7,NARAKA: BLADEPOINT,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/1203220/NAR...,0.0,201876,208643,"[Battle Royale, Multiplayer, Martial Arts, PvP..."
9,10,Apex Legends™,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/1172470/Ape...,0.0,150754,151447,"[Free to Play, Battle Royale, Multiplayer, FPS..."
12,13,Delta Force,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/2507950/Del...,0.0,111888,111888,"[Free to Play, FPS, Multiplayer, Extraction Sh..."
22,23,Banana,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/2923300/Ban...,0.0,67467,101034,"[Free to Play, Clicker, Singleplayer, 2D, Casu..."
13,14,War Thunder,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/236390/War_...,0.0,93436,95217,"[Free to Play, Simulation, Vehicular Combat, C..."
23,24,Call of Duty®,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/1938090/Cal...,0.0,66028,92839,"[FPS, Multiplayer, Shooter, Singleplayer, Action]"


Analysis of Top 10 Paid Games by Peak Today

In [10]:
# Filter the dataset for paid games (Price greater than 0)
# Select the top 10 paid games based on the 'Peak Today' value
top_paid_games = df[df["Price"] > 0].nlargest(10, "Peak Today")
top_paid_games

Unnamed: 0,Rank,Name,Thumbnail URL,Store Link,Price,Current Players,Peak Today,Genre Tags
4,5,Path of Exile 2,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/2694490/Pat...,23.93,258475,288757,"[Action RPG, Hack and Slash, RPG, Action, Soul..."
5,6,Grand Theft Auto V,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/271590/Gran...,14.8,207117,207117,"[Open World, Action, Multiplayer, Crime, Mature]"
7,8,Rust,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/252490/Rust...,34.99,168002,205358,"[Survival, Crafting, Multiplayer, Open World, ..."
8,9,Palworld,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/1623730/Pal...,24.99,165613,165613,"[Open World, Survival, Creature Collector, Mul..."
10,11,Wallpaper Engine,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/431960/Wall...,4.29,142322,144523,"[Mature, Utilities, Anime, Software, Design & ..."
11,12,Stardew Valley,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/413150/Star...,10.99,117949,119555,"[Farming Sim, Pixel Graphics, Life Sim, Multip..."
14,15,Baldur's Gate 3,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/1086940/Bal...,49.99,84290,108839,"[RPG, Character Customisation. Choices Matter]"
19,20,Tom Clancy's Rainbow Six® Siege,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/359550/Tom_...,17.99,71468,81646,"[FPS, PvP, eSports, Multiplayer, Shooter, Tact..."
26,27,HELLDIVERS™ 2,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/553850/HELL...,34.99,59489,81203,"[Online Co-Op, PvE, Multiplayer, Action, Co-op]"
15,16,Football Manager 2024,https://shared.cloudflare.steamstatic.com//sto...,https://store.steampowered.com/app/2252570/Foo...,17.99,75905,76937,"[Simulation, Sports, Strategy, Management, Foo..."
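
The free and paid selections differ only in the price filter, so they could share one helper. The following is a minimal sketch on hypothetical data; `top_by_peak` and the five sample rows are illustrative names and values, not part of the notebook.

```python
import pandas as pd

# Hypothetical games with cleaned Price and Peak Today columns
games = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Price": [0.0, 0.0, 9.99, 19.99, 0.0],
    "Peak Today": [500, 900, 300, 700, 100],
})


def top_by_peak(frame, mask, n):
    """Return the n rows with the highest 'Peak Today' among rows where mask is True."""
    return frame[mask].nlargest(n, "Peak Today")


top_free = top_by_peak(games, games["Price"] == 0, 2)
top_paid = top_by_peak(games, games["Price"] > 0, 2)

print(top_free["Name"].tolist(), top_paid["Name"].tolist())
```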


Analysis of Top Genres in Free Games

In [15]:
# Filter the dataset for free games and count genre occurrences
free_games = df[df['Price'] == 0]
free_genre_counts = Counter([genre for sublist in free_games['Genre Tags'] for genre in sublist])
free_genre_df = pd.DataFrame(free_genre_counts.items(), columns=["Genre", "Count"]).sort_values(by="Count", ascending=False)
top_free_genres = free_genre_df.head(10)
top_free_genres

Unnamed: 0,Genre,Count
27,Free to Play,27
2,Multiplayer,24
4,Action,23
9,PvP,15
1,Shooter,13
11,Co-op,13
0,FPS,12
8,First-Person,12
44,Massively Multiplayer,11
59,Singleplayer,10


Analysis of Top Genres in Paid Games

In [12]:
# Filter the dataset for paid games and count genre occurrences
paid_games = df[df['Price'] > 0]
paid_genre_counts = Counter([genre for sublist in paid_games["Genre Tags"] for genre in sublist])
paid_genre_df = pd.DataFrame(paid_genre_counts.items(), columns=["Genre", "Count"]).sort_values(by="Count", ascending=False)
top_paid_genres = paid_genre_df.head(10)
top_paid_genres

Unnamed: 0,Genre,Count
11,Multiplayer,47
50,Singleplayer,45
3,Action,41
20,Open World,37
35,Simulation,29
32,Co-op,29
9,Adventure,29
27,Sandbox,28
67,Strategy,22
2,RPG,21
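
The two top-10 lists can be compared directly with set operations. The sketch below copies the genre names from the two output tables above; the variable names are illustrative.

```python
# Genre names copied from the free and paid top-10 tables above
top_free = ["Free to Play", "Multiplayer", "Action", "PvP", "Shooter",
            "Co-op", "FPS", "First-Person", "Massively Multiplayer", "Singleplayer"]
top_paid = ["Multiplayer", "Singleplayer", "Action", "Open World", "Simulation",
            "Co-op", "Adventure", "Sandbox", "Strategy", "RPG"]

# Genres that rank in the top 10 for both groups
shared = sorted(set(top_free) & set(top_paid))
# Genres that rank in the top 10 for only one group
free_only = sorted(set(top_free) - set(top_paid))
paid_only = sorted(set(top_paid) - set(top_free))

print("Shared:", shared)
print("Free only:", free_only)
print("Paid only:", paid_only)
```

In the notebook itself, the same comparison could be run on `top_free_genres['Genre']` and `top_paid_genres['Genre']`.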


### 4. Create a Dash app ###

Create visualizations using Plotly Express with a dark theme

In [16]:
# Visualization of Top Genre Tags
top_genre_fig = px.bar(
    top_genres,
    x='Genre',
    y='Count',
    title='Top 10 Game Genres',
    labels={
        'Genre': 'Game Genre',
        'Count': 'Number of Games'
    },
    template='plotly_dark'
)

# Visualization of Top Genre Tags and Game Price
top_genre_price_fig = px.bar(
    top_genre_prices,
    x='Genre Tags',
    y='Price',
    title='Average Game Price by Top Genre Tags',
    labels={
        'Genre Tags': 'Game Genre',
        'Price': 'Average Price (GBP)'
    },
    template='plotly_dark'
)

# Visualization of Top Genre Tags and Peak Today
top_genre_peak_fig = px.bar(
    top_genre_peak,
    x='Genre Tags',
    y='Peak Today',
    title='Average Peak Today by Top Genre Tags',
    labels={
        'Genre Tags': 'Game Genre',
        'Peak Today': 'Average Peak Players'
    },
    template='plotly_dark'
)

# Distribution of Genre Tags
genre_fig = px.histogram(
    genre_df,
    x='Count',
    title='Distribution of Game Genre Counts',
    labels={'Count': 'Number of Games'},
    template='plotly_dark'
)

# Visualization of Top 10 Free to Play Games by Peak Today
top_free_games_fig = px.bar(
    top_free_games,
    x='Name',
    y='Peak Today',
    title='Top 10 Free To Play Games by Peak Today',
    labels={
        'Name': 'Game Title',
        'Peak Today': 'Peak Players Today'
    },
    template='plotly_dark'
)

# Visualization of Top 10 Paid Games by Peak Today
top_paid_games_fig = px.bar(
    top_paid_games,
    x='Name',
    y='Peak Today',
    title='Top 10 Paid Games by Peak Today',
    labels={
        'Name': 'Game Title',
        'Peak Today': 'Peak Players Today'
    },
    template='plotly_dark'
)

# Visualization of Top Genres in Free Games
top_free_genres_fig = px.bar(
    top_free_genres,
    x='Genre',
    y='Count',
    title='Top 10 Game Genres in Free Games',
    labels={
        'Genre': 'Game Genre',
        'Count': 'Number of Games'
    },
    template='plotly_dark'
)

# Visualization of Top Genres in Paid Games
top_paid_genres_fig = px.bar(
    top_paid_genres,
    x='Genre',
    y='Count',
    title='Top 10 Game Genres in Paid Games',
    labels={
        'Genre': 'Game Genre',
        'Count': 'Number of Games'
    },
    template='plotly_dark'
)

# Histogram of players by price
players_by_price_fig = px.histogram(
    df,
    x='Price',
    y='Current Players',
    histfunc='sum',
    title='Distribution of Current Players by Game Price',
    labels={
        'Price': 'Price (in GBP)',
        'Current Players': 'Current Players'
    },
    nbins=20,
    template='plotly_dark'
)

# Scatter plot with marginal distributions
marginal_distributions = px.scatter(
    df,
    x='Price',
    y='Current Players',
    marginal_x='histogram',
    marginal_y='rug',
    title='Relationship Between Price and Current Players',
    labels={
        'Price': 'Price (in GBP)',
        'Current Players': 'Current Players'
    },
    template='plotly_dark'
)
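
The histogram with `histfunc='sum'` adds up `Current Players` within each price bin. The same aggregation can be reproduced in pandas to inspect the numbers behind the bars. This is a sketch under assumptions: the six rows are copied from tables in this document, and the fixed 10 GBP bin edges are chosen for illustration, whereas Plotly picks its own edges.

```python
import pandas as pd

# Six rows copied from the tables in this document (illustration only)
toy = pd.DataFrame({
    "Price": [0.0, 0.0, 4.29, 14.80, 23.93, 49.99],
    "Current Players": [1485535, 765150, 142322, 207117, 258475, 84290],
})

# Assumed bin edges in GBP; intervals are closed on the left, e.g. [0, 10)
bins = [0, 10, 20, 30, 40, 50]
price_band = pd.cut(toy["Price"], bins=bins, right=False)

# observed=False keeps empty bins so they show up with a sum of 0
players_per_band = toy.groupby(price_band, observed=False)["Current Players"].sum()

print(players_per_band)
```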

In [18]:
# Dash App Setup
app = Dash(__name__)

app.layout = html.Div(
    children=[
        html.H1(
            'Steam Top 100 Played Games Dashboard',
            style={
                'textAlign': 'center',
                'color': 'white'
            }
        ),
        dcc.Graph(id="top-genre-bar", figure=top_genre_fig),
        dcc.Graph(id="top-genre-price", figure=top_genre_price_fig),
        dcc.Graph(id="top-genre-peak", figure=top_genre_peak_fig),
        dcc.Graph(id='genre-dist', figure=genre_fig),
        dcc.Graph(id='top-free-genres', figure=top_free_genres_fig),
        dcc.Graph(id='top-paid-genres', figure=top_paid_genres_fig),
        dcc.Graph(id='top-free-games', figure=top_free_games_fig),
        dcc.Graph(id='top-paid-games', figure=top_paid_games_fig),
        dcc.Graph(id='histogram-price-players', figure=players_by_price_fig),
        dcc.Graph(id='scatter-price-players', figure=marginal_distributions)        
    ],
    style={
        'backgroundColor': '#111111',
        'color': 'white',
        'padding': '20px'
    }
)

# Run the Dash app in debug mode
# (app.run replaces the older app.run_server in recent Dash releases)
if __name__ == "__main__":
    app.run(debug=True)