# Gamer Engagement Classification
Authors: Cayke Felipe dos Anjos and James Warsing

<img src="images/eldenring.jpg" alt="Elden Ring" width="1200"/>

# Overview
This project analyzes gamer engagement data to offer strategic business recommendations for a new game studio. We train statistical models to predict which game genres and difficulty levels are most engaging, to guide game production. Engagement is closely tied to profitability: engaged players tend to bring in new players, and they are more likely to buy in-game features or Downloadable Content (DLC), which increases revenue. As a result, this project provides three business recommendations on which genres and difficulty levels a future game should have to appeal to a variety of gamer profiles.

## Business Problem

The company is expanding its portfolio by investing in a new game studio. Launching a new game in today's competitive entertainment industry requires a solid understanding of what drives game success and attracts audiences. The game industry is known for its substantial risks and high capital demands, but high-investment games can also deliver very high returns. The incredibly difficult role-playing game "Elden Ring" cost around $\$200$ million and sold over 25 million copies, and the action game "Grand Theft Auto V" cost around $\$265$ million and is estimated to have generated almost $\$8$ billion in sales. Both are good examples of how successful this industry can be. However, bad investments also happen: the first-person shooter "Immortals of Aveum" cost $\$125$ million but sold only around $\$2$ million, which led to massive layoffs at the studio.

Our project analyzes a gamer engagement dataset. Using data analysis techniques and statistical modelling, we seek to identify the features that best predict high player engagement. The goal is to provide three concrete business recommendations that maximize engagement and reduce business risk, ensuring a strong entry into the market.

Questions this analysis aims to answer:
* What are the top features that correlate with gamer engagement?
* How different are the audiences, and how do their engagement patterns differ?
* What genres are most engaging for multiple audiences?
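As a rough illustration of the first question, the sketch below ranks the numeric columns by their Spearman (rank) correlation with engagement, encoded as Low=0, Medium=1, High=2. It uses only the nine preview rows shown in the dataset output of this notebook, so the numbers say nothing about the full dataset; the ordinal encoding and the choice of Spearman correlation are our assumptions, not necessarily the method used in the rest of the project.

```python
import pandas as pd

# The nine preview rows of the dataset (rows 0-4 and 40029-40032), numeric columns only.
sample = pd.DataFrame({
    "PlayTimeHours": [16.271119, 5.525961, 8.223755, 5.265351, 15.531945,
                      20.619662, 13.539280, 0.240057, 14.017818],
    "InGamePurchases": [0, 0, 0, 1, 0, 0, 0, 1, 1],
    "SessionsPerWeek": [6, 5, 16, 9, 2, 4, 19, 10, 3],
    "AvgSessionDurationMinutes": [108, 144, 142, 85, 131, 75, 114, 176, 128],
    "Level": [79, 11, 35, 57, 95, 85, 71, 29, 70],
    "AchievementsUnlocked": [25, 10, 41, 47, 37, 14, 27, 1, 10],
    "Engagement": ["Medium", "Medium", "High", "Medium", "Medium",
                   "Medium", "High", "High", "Medium"],
})

# Encode the ordinal target as Low=0 < Medium=1 < High=2 (assumed encoding).
sample["EngagementCode"] = sample["Engagement"].map({"Low": 0, "Medium": 1, "High": 2})

features = sample.drop(columns=["Engagement", "EngagementCode"])

# Spearman correlation is the Pearson correlation of the ranks (ties get average ranks),
# so ranking first and using the default Pearson method avoids needing SciPy.
ranking = (
    features.rank()
    .corrwith(sample["EngagementCode"].rank())
    .sort_values(key=abs, ascending=False)  # strongest association first, either sign
)
print(ranking)
```

On the full data the same expression would be applied to the numeric columns of `df`.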

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [9]:
df = pd.read_csv('data/online_gaming_behavior_dataset.csv')
df.rename(columns={'GameGenre':'Genre','GameDifficulty':'Difficulty','PlayerLevel':'Level','EngagementLevel':'Engagement'}, inplace=True)
df
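Because `rename` changes the column names for every later cell, it can help to check that none of the old names survive. A minimal sketch, assuming a tiny in-memory stand-in for the CSV: the raw header is inferred from the rename mapping and the `df.info()` output, and only two preview rows are included.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/online_gaming_behavior_dataset.csv (header + two preview rows).
csv_text = """PlayerID,Age,Gender,Location,GameGenre,PlayTimeHours,InGamePurchases,GameDifficulty,SessionsPerWeek,AvgSessionDurationMinutes,PlayerLevel,AchievementsUnlocked,EngagementLevel
9000,43,Male,Other,Strategy,16.271119,0,Medium,6,108,79,25,Medium
9001,29,Female,USA,Strategy,5.525961,0,Medium,5,144,11,10,Medium
"""

renames = {
    "GameGenre": "Genre",
    "GameDifficulty": "Difficulty",
    "PlayerLevel": "Level",
    "EngagementLevel": "Engagement",
}

# rename returns a new DataFrame (no inplace needed); errors="raise" fails loudly
# if a source column is missing instead of silently skipping it.
df = pd.read_csv(io.StringIO(csv_text)).rename(columns=renames, errors="raise")

# The old names no longer exist, so later cells must use the new ones (e.g. 'Genre').
stale = set(renames) & set(df.columns)
assert not stale, f"stale column names: {stale}"
print(list(df.columns))
```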

|  | PlayerID | Age | Gender | Location | Genre | PlayTimeHours | InGamePurchases | Difficulty | SessionsPerWeek | AvgSessionDurationMinutes | Level | AchievementsUnlocked | Engagement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9000 | 43 | Male | Other | Strategy | 16.271119 | 0 | Medium | 6 | 108 | 79 | 25 | Medium |
| 1 | 9001 | 29 | Female | USA | Strategy | 5.525961 | 0 | Medium | 5 | 144 | 11 | 10 | Medium |
| 2 | 9002 | 22 | Female | USA | Sports | 8.223755 | 0 | Easy | 16 | 142 | 35 | 41 | High |
| 3 | 9003 | 35 | Male | USA | Action | 5.265351 | 1 | Easy | 9 | 85 | 57 | 47 | Medium |
| 4 | 9004 | 33 | Male | Europe | Action | 15.531945 | 0 | Medium | 2 | 131 | 95 | 37 | Medium |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 40029 | 49029 | 32 | Male | USA | Strategy | 20.619662 | 0 | Easy | 4 | 75 | 85 | 14 | Medium |
| 40030 | 49030 | 44 | Female | Other | Simulation | 13.539280 | 0 | Hard | 19 | 114 | 71 | 27 | High |
| 40031 | 49031 | 15 | Female | USA | RPG | 0.240057 | 1 | Easy | 10 | 176 | 29 | 1 | High |
| 40032 | 49032 | 34 | Male | USA | Sports | 14.017818 | 1 | Medium | 3 | 128 | 70 | 10 | Medium |


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40034 entries, 0 to 40033
Data columns (total 13 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   PlayerID                   40034 non-null  int64  
 1   Age                        40034 non-null  int64  
 2   Gender                     40034 non-null  object 
 3   Location                   40034 non-null  object 
 4   Genre                      40034 non-null  object 
 5   PlayTimeHours              40034 non-null  float64
 6   InGamePurchases            40034 non-null  int64  
 7   Difficulty                 40034 non-null  object 
 8   SessionsPerWeek            40034 non-null  int64  
 9   AvgSessionDurationMinutes  40034 non-null  int64  
 10  Level                      40034 non-null  int64  
 11  AchievementsUnlocked       40034 non-null  int64  
 12  Engagement                 40034 non-null  object 
dtypes: float64(1), int64(7), object(5)
memory usag
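The null counts and dtypes reported by `df.info()` can also be checked programmatically, which is handy as an assertion before modelling. A sketch on a small hypothetical frame with one deliberately missing value (these values are made up, not taken from the dataset):

```python
import pandas as pd

# Hypothetical frame: Age has one missing value so the check has something to find.
df = pd.DataFrame({
    "Age": [43, 29, None],
    "Genre": ["Strategy", "Strategy", "Sports"],
    "PlayTimeHours": [16.271119, 5.525961, 8.223755],
})

# Nulls per column (df.info() reports the complementary non-null count).
missing = df.isna().sum()
print(missing[missing > 0])

# Separate numeric from non-numeric columns, mirroring the dtype summary of df.info().
numeric_cols = df.select_dtypes(include="number").columns.tolist()
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, non_numeric_cols)
```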

In [5]:
df['Location'].value_counts()

Location
USA       16000
Europe    12004
Asia       8095
Other      3935
Name: count, dtype: int64

In [7]:
df['Genre'].value_counts()

Genre
Sports        8048
Action        8039
Strategy      8012
Simulation    7983
RPG           7952
Name: count, dtype: int64
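To start on the question of which genres are most engaging, one option is the share of players with High engagement in each genre. A sketch using only the Genre and Engagement values of the nine preview rows; with one to three players per genre these shares are purely illustrative, and on the full data the same expression would be applied to `df`.

```python
import pandas as pd

# Genre and Engagement of the nine preview rows (rows 0-4 and 40029-40032).
sample = pd.DataFrame({
    "Genre": ["Strategy", "Strategy", "Sports", "Action", "Action",
              "Strategy", "Simulation", "RPG", "Sports"],
    "Engagement": ["Medium", "Medium", "High", "Medium", "Medium",
                   "Medium", "High", "High", "Medium"],
})

# Boolean "is High" per player, averaged within each genre = share of High engagement.
high_share = (
    sample["Engagement"].eq("High")
    .groupby(sample["Genre"])
    .mean()
    .sort_values(ascending=False)
)
print(high_share)
```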

In [12]:
df['Gender'].value_counts()

Gender
Male      23959
Female    16075
Name: count, dtype: int64

In [13]:
df['Difficulty'].value_counts()

Difficulty
Easy      20015
Medium    12011
Hard       8008
Name: count, dtype: int64
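A row-normalized crosstab shows, for each difficulty, how players split across engagement levels. A sketch on the nine preview rows (illustrative only); the `reindex` call is our addition to keep the Easy/Medium/Hard and Low/Medium/High order and to show the Low column, which happens not to occur in these rows.

```python
import pandas as pd

# Difficulty and Engagement of the nine preview rows (rows 0-4 and 40029-40032).
sample = pd.DataFrame({
    "Difficulty": ["Medium", "Medium", "Easy", "Easy", "Medium",
                   "Easy", "Hard", "Easy", "Medium"],
    "Engagement": ["Medium", "Medium", "High", "Medium", "Medium",
                   "Medium", "High", "High", "Medium"],
})

# normalize="index": within each difficulty, the share of players at each engagement level.
table = pd.crosstab(sample["Difficulty"], sample["Engagement"], normalize="index")

# Put rows and columns in their natural order; missing levels become 0.
table = table.reindex(
    index=["Easy", "Medium", "Hard"],
    columns=["Low", "Medium", "High"],
    fill_value=0,
)
print(table)
```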

In [14]:
df['Engagement'].value_counts()

Engagement
Medium    19374
High      10336
Low       10324
Name: count, dtype: int64
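The engagement classes are imbalanced: Medium is roughly as common as Low and High combined. A quick computation from the counts above gives the class shares and the accuracy of a trivial model that always predicts the majority class, a useful floor for any classifier.

```python
import pandas as pd

# Engagement counts as reported by value_counts() above.
counts = pd.Series({"Medium": 19374, "High": 10336, "Low": 10324})

shares = counts / counts.sum()
print(shares.round(4))

# Accuracy of always predicting the most frequent class.
baseline_accuracy = shares.max()
print(f"Majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

Medium covers about 48.4% of players, so a model whose accuracy is close to that figure has learned little; per-class metrics such as recall on High are more informative here.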

In [15]:
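# Convert the low-cardinality text columns to pandas categoricals
df[['Gender','Difficulty','Engagement']] = df[['Gender','Difficulty','Engagement']].astype('category')
# Fix the category order so the integer codes are meaningful: Male=0, Female=1
df['Gender'] = df['Gender'].cat.reorder_categories(['Male','Female'])
# Easy=0 < Medium=1 < Hard=2
df['Difficulty'] = df['Difficulty'].cat.reorder_categories(['Easy','Medium','Hard'])
# Low=0 < Medium=1 < High=2
df['Engagement'] = df['Engagement'].cat.reorder_categories(['Low','Medium','High'])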
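Difficulty and Engagement are ordinal. Fixing the category order determines the integer codes; declaring the dtype as ordered additionally lets pandas compare labels and take their minimum or maximum. A minimal sketch on a made-up four-value Series (whether the project needs ordered comparisons is our assumption):

```python
import pandas as pd

engagement = pd.Series(["Medium", "High", "Low", "Medium"])

# An ordered categorical records that Low < Medium < High.
engagement = engagement.astype(
    pd.CategoricalDtype(categories=["Low", "Medium", "High"], ordered=True)
)

print(engagement.cat.codes.tolist())      # codes follow the category order
print((engagement >= "Medium").tolist())  # comparisons respect that order
print(engagement.max())                   # max is defined only for ordered categoricals
```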

In [16]:
df['Gender'].cat.codes

0        0
1        1
2        1
3        0
4        0
        ..
40029    0
40030    1
40031    1
40032    0
40033    0
Length: 40034, dtype: int8

In [17]:
df['Difficulty'].cat.codes

0        1
1        1
2        0
3        0
4        1
        ..
40029    0
40030    2
40031    0
40032    1
40033    0
Length: 40034, dtype: int8

In [18]:
df['Engagement'].cat.codes

0        1
1        1
2        2
3        1
4        1
        ..
40029    1
40030    2
40031    2
40032    1
40033    1
Length: 40034, dtype: int8
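Genre and Location are still plain text columns, and unlike Difficulty they have no natural order, so integer codes would impose a false ranking. A common alternative is one-hot encoding with `pd.get_dummies`. The sketch below applies it to four preview rows (rows 0, 2, 3 and 40031); the column name `DifficultyCode` is hypothetical.

```python
import pandas as pd

# Four preview rows: Genre and Location are nominal, Difficulty is ordinal.
sample = pd.DataFrame({
    "Genre": ["Strategy", "Sports", "Action", "RPG"],
    "Location": ["Other", "USA", "USA", "USA"],
    "Difficulty": pd.Categorical(["Medium", "Easy", "Easy", "Easy"],
                                 categories=["Easy", "Medium", "Hard"], ordered=True),
})

# Ordinal column: keep the integer codes (Easy=0, Medium=1, Hard=2).
sample["DifficultyCode"] = sample["Difficulty"].cat.codes

# Nominal columns: one 0/1 indicator column per observed value.
encoded = pd.get_dummies(sample, columns=["Genre", "Location"], dtype=int)
print(encoded.drop(columns="Difficulty"))
```

For linear models, `drop_first=True` avoids perfectly collinear indicator columns; tree-based models generally do not need it.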