# Project title (to be decided)

## Contents

1. Number of games by genre.
2. Most played games.
3. Correlation between release date and reviews/plays.

## Introduction

This project statistically analyzes a large video game database by answering questions about the games' characteristics, popularity, and evolution over time, in order to extract key insights about their genres, their popularity, and how they change over the years.

In [None]:
# Install dependencies

#!pip install pandas matplotlib ipywidgets kaggle

## 1. Number of games by genre

We use the "Genres" column.

In [None]:
# Load the data
import os
import pandas as pd
from IPython.display import display

file_path = "Datasets/Data/backloggd_games.csv"

if not os.path.exists(file_path):
    # Stop here: every following cell needs `df`
    raise FileNotFoundError(f"{file_path} not found. Check the path or place the file in the correct location.")

df = pd.read_csv(file_path)
print(df.shape)     # (rows, columns)
print(df.columns)   # column names
display(df.head())  # preview of the first 5 rows

In [None]:
# Drop missing values, cast to string, and remove the list characters [, ] and '
all_genres = df['Genres'].dropna().astype(str).str.replace(r'[\[\]\']', '', regex=True)

# Split on commas, put each genre on its own row, and strip whitespace
all_genres = all_genres.str.split(',').explode().str.strip()

# Remove empty genres (games whose genre list was empty)
all_genres = all_genres[all_genres != ""]

print(all_genres)
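
The regex approach above treats each cell as plain text, so a genre name that itself contains a comma or an apostrophe would be split or altered. Assuming the "Genres" cells are stored as Python list literals such as `['Adventure', 'RPG']` (the brackets and quotes stripped above suggest this, but it is an assumption), here is a sketch that parses each cell with `ast.literal_eval` instead. `parse_genre_list` and the sample data are hypothetical.

```python
import ast

import pandas as pd


def parse_genre_list(cell):
    """Parse a cell like "['Adventure', 'RPG']" into a list of genre names.

    Assumes the cell holds a Python list literal; anything else yields [].
    """
    if pd.isna(cell):
        return []
    try:
        value = ast.literal_eval(str(cell))
    except (ValueError, SyntaxError):
        return []  # not a valid Python literal
    if not isinstance(value, (list, tuple)):
        return []
    return [str(g).strip() for g in value if str(g).strip()]


# Hypothetical sample mimicking the assumed format of the "Genres" column
sample = pd.DataFrame({"Genres": ["['Adventure', 'RPG']", "['RPG']", "[]", None]})

# One row per (game, genre); empty lists explode to NaN and are dropped
sample_genres = sample["Genres"].apply(parse_genre_list).explode().dropna()
sample_counts = sample_genres.value_counts()
print(sample_counts)
```

On the notebook's data, the equivalent would be `df['Genres'].apply(parse_genre_list).explode().dropna()`.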


In [None]:
# Count how many games carry each genre
genre_counts = all_genres.value_counts()
print("Number of games by genre (top 10):\n", genre_counts.head(10))
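Because `explode` keeps the original row index, a game with several genres adds one count to each of them, so the counts sum to more than the number of games. Below is a small sketch, on a hypothetical toy Series shaped like `all_genres`, that turns the counts into the share of games tagged with each genre. The `toy_` names are illustrative only, chosen so they do not overwrite the notebook's variables.

```python
import pandas as pd

# Hypothetical toy data in the same shape as `all_genres`:
# one row per (game, genre) pair, indexed by the game's original row number
toy_genres = pd.Series(["Adventure", "RPG", "RPG", "Shooter"], index=[0, 0, 1, 2])

toy_counts = toy_genres.value_counts()   # number of games per genre
n_games = toy_genres.index.nunique()     # games with at least one genre

# Share of games tagged with each genre; shares can add up to more than 1
toy_share = toy_counts / n_games
print(toy_share.round(2))
```

Applied to the real data, this would read `genre_counts / all_genres.index.nunique()`.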

In [None]:
import matplotlib.pyplot as plt
from ipywidgets import IntSlider, interactive_output, VBox
from IPython.display import display 

# Widget to select how many genres to display (bounded by the number of distinct genres)
top_n_slider1 = IntSlider(value=min(10, len(genre_counts)), min=1, max=len(genre_counts), step=1, description='Top Game Genres:')

# Plot the n most frequent genres
def plot_top_n1(n):
    plt.figure(figsize=(12, 6))
    genre_counts.head(n).plot(kind='bar')
    plt.title(f"Top {n} game genres")
    plt.xlabel("Genre")
    plt.ylabel("Number of games")
    plt.xticks(rotation=45, ha='right')  # keep long genre names readable
    plt.tight_layout()
    plt.show()


# Binding and display
display(VBox([top_n_slider1, interactive_output(plot_top_n1, {'n': top_n_slider1})]))

## 2. Most played games

We use the "Plays" and "Title" columns.

In [2]:
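import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import IntSlider, interactive_output, VBox
from IPython.display import display

df = pd.read_csv("Datasets/Data/backloggd_games.csv")

# Convert play counts written with K/M abbreviations (e.g. "1.2K", "3M") to integers
def parse_play_count(x):
    if pd.isna(x):
        return None
    s = str(x).replace(',', '').strip().upper()
    multiplier = 1
    if s.endswith('K'):
        multiplier, s = 1_000, s[:-1]
    elif s.endswith('M'):
        multiplier, s = 1_000_000, s[:-1]
    try:
        # round() guards against small floating-point errors before converting to int
        return int(round(float(s) * multiplier))
    except (ValueError, OverflowError):
        return None  # unparseable value

# Parse, then keep only rows with a title and a valid play count
df['Cumulative_number_plays'] = df['Plays'].apply(parse_play_count)
df = df[df['Title'].notna() & df['Cumulative_number_plays'].notna()].copy()
df['Cumulative_number_plays'] = df['Cumulative_number_plays'].astype('int64')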
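
The same conversion can also be written without a Python-level loop. The following sketch uses pandas string methods; `parse_play_counts` is a hypothetical helper, and its regular expression assumes values look like a number with an optional decimal part followed by an optional `K` or `M` suffix. That format is inferred from `parse_play_count` rather than checked against the full dataset.

```python
import pandas as pd


def parse_play_counts(plays: pd.Series) -> pd.Series:
    """Vectorised conversion of values such as '1.2K', '3M' or '12,345' to integers.

    Missing or unparseable values become <NA> (nullable Int64 dtype).
    """
    s = plays.astype(str).str.replace(",", "", regex=False).str.strip().str.upper()
    # Split each value into its numeric part and an optional K/M suffix
    parts = s.str.extract(r"^(?P<number>\d+(?:\.\d+)?)(?P<suffix>[KM]?)$")
    multiplier = parts["suffix"].map({"": 1, "K": 1_000, "M": 1_000_000})
    values = pd.to_numeric(parts["number"], errors="coerce") * multiplier
    return values.round().astype("Int64")


plays = pd.Series(["1.2K", "3M", "850", "12,345", None, "n/a"])
result = parse_play_counts(plays)
print(result.tolist())
```

On the notebook's data this would be `parse_play_counts(df['Plays'])`; rows with `<NA>` can then be dropped with `notna()` as in the cell above.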


In [None]:

# Widget to select how many games to display
top_n_slider2 = IntSlider(value=10, min=1, max=100, step=1, description='Top Played Games:')

# Plot the n games with the most plays
def plot_top_n2(n):
    top_df = (
        df[['Title', 'Cumulative_number_plays']]
        .sort_values(by='Cumulative_number_plays', ascending=False)
        .head(n)
    )

    plt.figure(figsize=(12, 10))
    plt.bar(top_df['Title'], top_df['Cumulative_number_plays'])
    plt.title(f"Top {n} most played games")
    plt.xlabel("Game title")
    plt.ylabel("Number of plays")
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()

# Binding and display
display(VBox([top_n_slider2, interactive_output(plot_top_n2, {'n': top_n_slider2})]))
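
If the dataset lists the same title more than once (for example, separate releases), `plt.bar` draws every occurrence at the same categorical position, so one bar can hide another. The sketch below assumes it is acceptable to keep only each title's highest play count: it aggregates by title and then uses `DataFrame.nlargest`. `top_played` and the sample data are hypothetical.

```python
import pandas as pd


def top_played(games: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return the n titles with the most plays, one row per title."""
    per_title = games.groupby("Title", as_index=False)["Cumulative_number_plays"].max()
    return per_title.nlargest(n, "Cumulative_number_plays")


# Hypothetical sample with a duplicated title
games = pd.DataFrame({
    "Title": ["Alpha", "Beta", "Alpha", "Gamma"],
    "Cumulative_number_plays": [500, 300, 450, 100],
})
top = top_played(games, 2)
print(top)
```

Inside `plot_top_n2`, `top_df` could then be built as `top_played(df, n)`.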

