<div><h1><center>Among Us - Exploratory Data Analysis 🎮🚀</center></h1></div>

## OVERVIEW 🌟
This notebook is an exploratory data analysis (EDA) of per-player gameplay statistics from the popular multiplayer game **Among Us**. Released in 2018, the game surged in popularity during the summer of 2020, which produced a wealth of data on player performance and game dynamics.

<img src = "https://wonder-day.com/wp-content/uploads/2020/10/wonder-day-among-us-wallpapers-21.jpg" style = "width:900px;">

## DATASET DESCRIPTION 📊
The dataset includes the following columns:

- **Game Completed Date**: 🗓️ The date when the game was completed.
- **Team**: 🤝 The team of the player (Crewmate or Imposter).
- **Outcome**: 🏆 The result of the game (Win, Loss, etc.).
- **Task Completed**: ✅ The number of tasks completed by a Crewmate during the game (`-` when the player was an Imposter).
- **All Tasks Completed**: ✔️ A `Yes`/`No` flag indicating whether all tasks were completed by the Crewmates during the game.
- **Murdered**: ⚰️ A `Yes`/`No` flag indicating whether the player was murdered as a Crewmate.
- **Imposter Kills**: 🔪 The number of Crewmates killed by Imposters during the game.
- **Game Length**: ⏳ The total duration of the game (for example `03m 39s`).
- **Ejected**: 🚪 A `Yes`/`No` flag indicating whether the player was ejected during the game.
- **Sabotages Fixed**: 🔧 The number of sabotages that were fixed by Crewmates.
- **Time to complete all tasks**: ⏲️ The total time taken by Crewmates to complete all tasks.
- **Rank Change**: 📈 The change in competitive rank after playing three games, influenced by performance in each game.
- **Region/Game Code**: 🌍 The server region and game code of the match.
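
In the raw files, several of these columns arrive as text: `-` marks a value that does not apply to the player's role, and flags are stored as `Yes`/`No`. Below is a minimal sketch of normalizing them. It uses a tiny hand-made frame whose rows are illustrative and not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hand-made sample rows that mimic the raw format (not real dataset rows)
sample = pd.DataFrame({
    'Team': ['Crewmate', 'Imposter'],
    'Task Completed': ['2', '-'],
    'Murdered': ['Yes', '-'],
    'Imposter Kills': ['-', '1'],
})

# Treat '-' as "not applicable" rather than as a real value
clean = sample.replace('-', np.nan)

# Numeric columns: NaN-aware conversion keeps "not applicable" distinct from 0
for col in ['Task Completed', 'Imposter Kills']:
    clean[col] = pd.to_numeric(clean[col])

# Yes/No columns: map to booleans, leaving missing values untouched
clean['Murdered'] = clean['Murdered'].map({'Yes': True, 'No': False})

print(clean)
```

Keeping `NaN` instead of `0` is a design choice: it preserves the difference between "zero tasks" and "tasks do not apply to this role".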

---

> ## IMPORTING LIBRARIES 📂

In [1]:
import numpy as np
import pandas as pd

import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt

import glob

In [2]:
path = '../input/among-us-dataset' 
all_files = glob.glob(path + "/*.csv")

In [3]:
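# Read each CSV file and tag its rows with a per-file user_id (1, 2, 3, ...)
li = []
for idx, file_path in enumerate(all_files, start = 1):
    player_df = pd.read_csv(filepath_or_buffer = file_path)
    player_df['user_id'] = idx
    li.append(player_df)

df = pd.concat(li)
df.head()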

Unnamed: 0,Game Completed Date,Team,Outcome,Task Completed,All Tasks Completed,Murdered,Imposter Kills,Game Length,Ejected,Sabotages Fixed,Time to complete all tasks,Rank Change,Region/Game Code,user_id
0,01/04/2021 at 11:41:52 pm EST,Crewmate,Loss,2,No,Yes,-,03m 39s,No,0.0,-,,NA / JRMRHF,1
1,01/04/2021 at 11:37:30 pm EST,Imposter,Win,-,-,-,1,08m 27s,No,,-,+++,NA / JRMRHF,1
2,01/04/2021 at 11:28:21 pm EST,Crewmate,Loss,3,No,No,-,09m 35s,No,0.0,-,---,NA / JRMRHF,1
3,01/04/2021 at 11:17:55 pm EST,Crewmate,Win,6,No,No,-,08m 35s,No,0.0,-,++,NA / JRMRHF,1
4,01/04/2021 at 11:08:40 pm EST,Crewmate,Loss,10,Yes,Yes,-,19m 51s,No,0.0,12m 58s,--,NA / JRMRHF,1
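
`glob.glob` returns paths in an arbitrary, platform-dependent order, so the `user_id` assigned to a given file can change between environments. Below is a sketch of a reproducible variant that sorts the paths and records the source file name. The file names `User1.csv` and `User2.csv` and the `source_file` column are made up for the example.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two tiny CSV files in a temporary folder (stand-ins for the real dataset files)
tmp_dir = tempfile.mkdtemp()
for name, outcome in [('User2.csv', 'Loss'), ('User1.csv', 'Win')]:
    pd.DataFrame({'Outcome': [outcome]}).to_csv(os.path.join(tmp_dir, name), index=False)

# Sort the paths so the same file always receives the same user_id
paths = sorted(glob.glob(os.path.join(tmp_dir, '*.csv')))
frames = [
    pd.read_csv(path).assign(user_id=idx, source_file=os.path.basename(path))
    for idx, path in enumerate(paths, start=1)
]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Note that plain `sorted` is lexicographic, so a name like `User10.csv` would sort before `User2.csv`. A numeric sort key would be needed if the numbering matters.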


---

> ## DATA PREPROCESSING 👨🏻‍🔧

In [4]:
df[['Region', 'Game_Code']] = df['Region/Game Code'].str.split(' / ', expand = True)
df.drop(columns = ['Region/Game Code'], inplace = True)

df.reset_index(drop = True, inplace = True)
df.head()

Unnamed: 0,Game Completed Date,Team,Outcome,Task Completed,All Tasks Completed,Murdered,Imposter Kills,Game Length,Ejected,Sabotages Fixed,Time to complete all tasks,Rank Change,user_id,Region,Game_Code
0,01/04/2021 at 11:41:52 pm EST,Crewmate,Loss,2,No,Yes,-,03m 39s,No,0.0,-,,1,,JRMRHF
1,01/04/2021 at 11:37:30 pm EST,Imposter,Win,-,-,-,1,08m 27s,No,,-,+++,1,,JRMRHF
2,01/04/2021 at 11:28:21 pm EST,Crewmate,Loss,3,No,No,-,09m 35s,No,0.0,-,---,1,,JRMRHF
3,01/04/2021 at 11:17:55 pm EST,Crewmate,Win,6,No,No,-,08m 35s,No,0.0,-,++,1,,JRMRHF
4,01/04/2021 at 11:08:40 pm EST,Crewmate,Loss,10,Yes,Yes,-,19m 51s,No,0.0,12m 58s,--,1,,JRMRHF


In [5]:
df['Game Completed Date'].str.split(' ', expand = True)[4].unique()

array(['EST'], dtype=object)

In [6]:
# Input format: MM/DD/YYYY at HH:MM:SS am/pm EST
df_fetch_datetime = df['Game Completed Date'].str.split(' ', expand = True)

# Column 0 holds the date, column 2 the 12-hour clock time, column 3 the am/pm marker.
# The marker must be parsed too; otherwise 11:41:52 pm would be stored as 11:41:52.
df['Game date'] = pd.to_datetime(df_fetch_datetime[0], format = '%m/%d/%Y').dt.strftime('%y-%m-%d')
df['Game time'] = pd.to_datetime(df_fetch_datetime[2] + ' ' + df_fetch_datetime[3], format = '%I:%M:%S %p').dt.strftime('%H:%M:%S')

df.drop(columns = 'Game Completed Date', inplace = True)

df.head()

Unnamed: 0,Team,Outcome,Task Completed,All Tasks Completed,Murdered,Imposter Kills,Game Length,Ejected,Sabotages Fixed,Time to complete all tasks,Rank Change,user_id,Region,Game_Code,Game date,Game time
0,Crewmate,Loss,2,No,Yes,-,03m 39s,No,0.0,-,,1,,JRMRHF,21-01-04,23:41:52
1,Imposter,Win,-,-,-,1,08m 27s,No,,-,+++,1,,JRMRHF,21-01-04,23:37:30
2,Crewmate,Loss,3,No,No,-,09m 35s,No,0.0,-,---,1,,JRMRHF,21-01-04,23:28:21
3,Crewmate,Win,6,No,No,-,08m 35s,No,0.0,-,++,1,,JRMRHF,21-01-04,23:17:55
4,Crewmate,Loss,10,Yes,Yes,-,19m 51s,No,0.0,12m 58s,--,1,,JRMRHF,21-01-04,23:08:40


In [7]:
df['Imposter Kills'] = df['Imposter Kills'].replace('-', 0).astype('int')
df['Task Completed'] = df['Task Completed'].replace('-', 0).astype('int')
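
`Game Length` is still a string such as `03m 39s` at this point. Below is a minimal sketch of turning it into seconds, assuming every value follows the `MMm SSs` pattern seen in the preview rows. The helper name `duration_to_seconds` is hypothetical.

```python
import re

def duration_to_seconds(text):
    """Convert a string like '03m 39s' to seconds; return None if it does not match."""
    match = re.fullmatch(r'\s*(\d+)m\s*(\d+)s\s*', str(text))
    if match is None:
        return None  # e.g. the '-' placeholder
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds

print(duration_to_seconds('03m 39s'))  # 219
print(duration_to_seconds('-'))        # None
```

In the notebook this could be applied with `df['Game Length'].map(duration_to_seconds)`.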

---

<div>
    <h1><center>📊 Exploratory Data Analysis</center></h1>
</div>   

> ## Individual Analysis - How the time to complete all tasks varies between won and lost games 👍🏻👎🏻

In [8]:
def user_task_completion(user_id : int) -> None:
    user_data = df[df['user_id'] == user_id].copy()
    first_match_id = user_data.index.min()
    
    user_data = user_data[user_data['All Tasks Completed'] == 'Yes']
    user_data.index += 1
    
    # Convert "Time to complete all tasks" strings such as "12m 58s" to minutes (float)
    mins_secs = user_data['Time to complete all tasks'].str.extract(r'(\d+)m\s*(\d+)s?').astype(float)
    user_data['Time to complete all tasks'] = mins_secs[0] + mins_secs[1] / 60
    
    # Win Games
    user_win = user_data[user_data['Outcome'] == 'Win']
    # Lost Games
    user_lost = user_data[user_data['Outcome'] == 'Loss']
    
    # Plotting
    fig = go.Figure()
    # User Win
    fig.add_trace(go.Scatter(
        x = user_win.index - first_match_id,
        y = user_win['Time to complete all tasks'],
        mode = 'markers+lines',
        marker = dict(size = 12),
        name = 'User Win'
    ))
    
    # User Lost
    fig.add_trace(go.Scatter(
        x = user_lost.index - first_match_id,
        y = user_lost['Time to complete all tasks'],
        mode = 'markers+lines',
        marker = dict(size = 12),
        name = 'User Lost'
    ))
    fig.update_layout(
            title=f"Player - {user_id} : Time to complete all tasks in won vs. lost games 👍🏻👎🏻",
            title_x = 0.5,
            xaxis=dict(title="Game Number", tickvals=np.arange(1, 101)),
            yaxis=dict(title="Time To Complete All Tasks (MIN)"),
            legend=dict(title="Game Outcome"),
            template="plotly_white",height=600,width=1050)
    fig.show()

> 📌 Note : Pass a player's `user_id` to the function to plot that player's games.

In [9]:
user_task_completion(6)

---

> ## Crewmate: Accelerated Task Completion for Victory 🏆 | Imposter: The Ultimate Killing Machine 🔪

In [10]:
# Group by player: wins with all tasks completed, and total kills as Imposter
crewmate_all_tasks = df[(df['All Tasks Completed'] == 'Yes') & (df['Outcome'] == 'Win')].groupby(by = ['user_id'])['Outcome'].count()
imposter_kills = df[df['Team'] == 'Imposter'].groupby(by = 'user_id')['Imposter Kills'].sum()

from plotly.subplots import make_subplots
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])

pull = np.zeros(29)
pull[7] = 0.3
fig.add_trace(go.Pie(labels=crewmate_all_tasks.index, values=crewmate_all_tasks.values, name='All Tasks Completed - Winning Count', pull = pull), 1, 1)

pull[7], pull[11] = 0, 0.3
fig.add_trace(go.Pie(labels=imposter_kills.index, values=imposter_kills.values, name='Imposter Kills', pull = pull), 1, 2)

fig.update_traces(hole=.4, hoverinfo=None, textposition='inside')

fig.update_layout(
    title_text="Crewmate: Accelerated Task Completion for Victory 🏆 | Imposter: The Ultimate Killing Machine 🔪",
    title_x = 0.5,
    height=600,
    width=1050,
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='Pro Crewmate', x=sum(fig.get_subplot(1, 1).x) / 2, y=0.5,
                      font_size=14, showarrow=False, xanchor="center"),
                 dict(text='Pro Imposter', x=sum(fig.get_subplot(1, 2).x) / 2, y=0.5,
                      font_size=14, showarrow=False, xanchor="center")])
fig.show()
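
The `pull` array above hard-codes both the number of players (29) and the positions of the slices to pull out. Assuming the intent is to highlight the player with the largest share, the array could be derived from the data as sketched below. The helper `pull_largest` and the sample Series are illustrative.

```python
import numpy as np
import pandas as pd

def pull_largest(values, amount=0.3):
    """Return a pull array that offsets only the largest slice."""
    pull = np.zeros(len(values))
    pull[int(np.argmax(np.asarray(values)))] = amount
    return pull

# Made-up per-player counts, indexed by user_id
wins = pd.Series([4, 9, 2], index=[1, 2, 3])
print(pull_largest(wins))
```

With this helper, each trace could receive its own array, for example `go.Pie(..., pull=pull_largest(crewmate_all_tasks))`, and the array length would always match the number of slices.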

---

> ## Ejection Analysis: Crewmates vs. Imposters ⏏️📊

In [11]:
crue_ej = []
crue_not_ej = []
imp_ej = []
imp_not_ej = []

for i in range(1, len(df['user_id'].unique()) + 1):
    user = df[df['user_id'] == i]
    
    # Crewmate
    crue_team_ej= user[(user['Team'] == 'Crewmate') & (user['Ejected'] == 'Yes')]
    crue_team_not_ej= user[(user['Team'] == 'Crewmate') & (user['Ejected'] == 'No')]
    crue_ej.append(crue_team_ej.shape[0])
    crue_not_ej.append(crue_team_not_ej.shape[0])

    # Imposter
    imp_team = user[(user['Team'] == 'Imposter') & (user['Ejected'] == 'Yes')]
    imp_team_not_ej = user[(user['Team'] == 'Imposter') & (user['Ejected'] == 'No')]
    imp_ej.append(imp_team.shape[0])
    imp_not_ej.append(imp_team_not_ej.shape[0])

# Creating a DataFrame
temp_df = pd.DataFrame({'crumate ejection': crue_ej, 'imposter ejection' : imp_ej, 'crumate no ejection': crue_not_ej, 'imposter no ejection': imp_not_ej})
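
The loop above counts rows per player, team, and ejection status. The same tally can be sketched with `pd.crosstab`, shown here on a small made-up frame (the column names match the notebook; the rows do not come from the dataset).

```python
import pandas as pd

games = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'Team':    ['Crewmate', 'Crewmate', 'Imposter', 'Imposter', 'Crewmate'],
    'Ejected': ['No', 'Yes', 'No', 'Yes', 'No'],
})

# Rows: players; columns: (Team, Ejected) pairs; combinations a player never had become 0
ejections = pd.crosstab(games['user_id'], [games['Team'], games['Ejected']])
print(ejections)

# Number of games in which player 1 was ejected as a Crewmate
print(ejections.loc[1, ('Crewmate', 'Yes')])
```

Because the result is indexed by `user_id`, it does not rely on the ids being consecutive integers starting at 1.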

In [12]:
fig = make_subplots(rows=1, cols=2, subplot_titles=("Crewmates", "Imposters"))

fig.add_trace(go.Scatter(
    x = temp_df.index + 1,
    y = temp_df['crumate no ejection'],
    mode = 'markers+lines',
    marker = dict(size = 10),
    name = 'Not Ejected'
), 1, 1)
fig.add_trace(go.Scatter(
    x = temp_df.index + 1,
    y = temp_df['crumate ejection'],
    mode = 'markers+lines',
    marker = dict(size = 10),
    name = 'Ejected'
), 1, 1)

fig.update_xaxes(title_text="Players", tickvals=temp_df.index + 1, row=1, col=1)
fig.update_yaxes(title_text="Number of Games", row=1, col=1)

fig.add_trace(go.Scatter(
    x = temp_df.index + 1,
    y = temp_df['imposter no ejection'],
    mode = 'markers+lines',
    marker = dict(size = 10),
    name = 'Not Ejected'
), 1, 2)
fig.add_trace(go.Scatter(
    x = temp_df.index + 1,
    y = temp_df['imposter ejection'],
    mode = 'markers+lines',
    marker = dict(size = 10),
    name = 'Ejected'
), 1, 2)

fig.update_xaxes(title_text="Players", tickvals=temp_df.index + 1, row=1, col=2)
fig.update_yaxes(title_text="Number of Games", row=1, col=2)

fig.update_layout(
        title="Ejection Analysis: Crewmates vs. Imposters ⏏️📊",
        title_x = 0.45,
        legend=dict(title="Ejection Status"),
        template="plotly_white",height=700,width=1500)

fig.show()

---

> ## Players' Winning vs. Losing Statistics 📊🏅

In [13]:
win, loss = [], []
for user in df['user_id'].unique():
    X = df[df['user_id'] == user]['Outcome'].value_counts()
    # .get avoids a KeyError when a player has no wins or no losses
    win.append(X.get('Win', 0))
    loss.append(X.get('Loss', 0))

win_loss_df = pd.DataFrame({'win_count': win, 'loss_count': loss})
win_loss_df['Player'] = 'Player - ' + (win_loss_df.index + 1).astype('str')

In [14]:
fig = go.Figure()

fig.add_trace(go.Bar(x=win_loss_df['win_count'], y=win_loss_df['Player'], orientation='h', name='Win'))

fig.add_trace(go.Bar(x= -win_loss_df['loss_count'], y=win_loss_df['Player'], orientation='h', name='Loss'))

fig.update_layout(
    title = "Players' Winning vs. Losing Statistics 📊🏅",
    title_x=0.45,
    xaxis = dict(title = 'Win/Loss Count'),
    yaxis = dict(title = 'Players'),
    barmode='relative',  # 'relative' places positive (win) and negative (loss) bars on opposite sides of zero
    bargap=0.5,         # Adjust the gap between bars
    width = 1050,
    height = 700,
    template="plotly_white"
)
              
fig.show()

---

> ## Imposters' Win-Loss Statistics Analysis 📊🏅

In [15]:
win, loss = [], []
for user in df['user_id'].unique():
    X = df[(df['user_id'] == user) & (df['Team'] == 'Imposter')]['Outcome'].value_counts()
    # .get avoids a KeyError when a player has no Imposter wins or losses
    win.append(X.get('Win', 0))
    loss.append(X.get('Loss', 0))

win_loss_df = pd.DataFrame({'win_count': win, 'loss_count': loss})
win_loss_df['Player'] = 'Player - ' + (win_loss_df.index + 1).astype('str')

In [16]:
fig = go.Figure()

fig.add_trace(go.Bar(x=win_loss_df['win_count'], y=win_loss_df['Player'], orientation='h', name='Win'))

fig.add_trace(go.Bar(x= -win_loss_df['loss_count'], y=win_loss_df['Player'], orientation='h', name='Loss'))

fig.update_layout(
    title = "Imposters' Win-Loss Statistics Analysis 📊🏅",
    title_x=0.45,
    xaxis = dict(title = 'Win/Loss Count'),
    yaxis = dict(title = 'Players'),
    barmode='relative',  # 'relative' places positive (win) and negative (loss) bars on opposite sides of zero
    bargap=0.5,         # Adjust the gap between bars
    width = 1050,
    height = 700,
    template="plotly_white"
)
              
fig.show()

---

> ## Crewmates' Win-Loss Statistics Analysis 📊🏅

In [17]:
win, loss = [], []
for user in df['user_id'].unique():
    X = df[(df['user_id'] == user) & (df['Team'] == 'Crewmate')]['Outcome'].value_counts()
    # .get avoids a KeyError when a player has no Crewmate wins or losses
    win.append(X.get('Win', 0))
    loss.append(X.get('Loss', 0))

win_loss_df = pd.DataFrame({'win_count': win, 'loss_count': loss})
win_loss_df['Player'] = 'Player - ' + (win_loss_df.index + 1).astype('str')

In [18]:
fig = go.Figure()

fig.add_trace(go.Bar(x=win_loss_df['win_count'], y=win_loss_df['Player'], orientation='h', name='Win'))

fig.add_trace(go.Bar(x= -win_loss_df['loss_count'], y=win_loss_df['Player'], orientation='h', name='Loss'))

fig.update_layout(
    title = "Crewmates' Win-Loss Statistics Analysis 📊🏅",
    title_x=0.45,
    xaxis = dict(title = 'Win/Loss Count'),
    yaxis = dict(title = 'Players'),
    barmode='relative',  # 'relative' places positive (win) and negative (loss) bars on opposite sides of zero
    bargap=0.5,         # Adjust the gap between bars
    width = 1050,
    height = 700,
    template="plotly_white"
)
              
fig.show()
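
The three win-loss cells above differ only in the `Team` filter. Below is a sketch of a shared helper (the name `win_loss_counts` is hypothetical) that returns 0 when a player has no wins or no losses in the selected role. The sample rows are made up for illustration.

```python
import pandas as pd

def win_loss_counts(data, team=None):
    """Count wins and losses per user_id, optionally restricted to one team."""
    if team is not None:
        data = data[data['Team'] == team]
    counts = (
        data.groupby('user_id')['Outcome']
        .value_counts()
        .unstack(fill_value=0)
        .reindex(columns=['Win', 'Loss'], fill_value=0)
    )
    return counts.rename(columns={'Win': 'win_count', 'Loss': 'loss_count'})

# Made-up rows for illustration
games = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 2],
    'Team':    ['Crewmate', 'Imposter', 'Crewmate', 'Crewmate', 'Imposter'],
    'Outcome': ['Win', 'Loss', 'Loss', 'Loss', 'Win'],
})
print(win_loss_counts(games))
print(win_loss_counts(games, team='Crewmate'))
```

One caveat of this sketch: a player with no games at all in the selected role is absent from the result rather than listed with zeros.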

---

> ## Overall Win-Loss Statistics for Crewmates and Imposters 📊🏅

In [19]:
X = df.copy()[df['Team'] == 'Crewmate']['Outcome'].value_counts()
y = df.copy()[df['Team'] == 'Imposter']['Outcome'].value_counts()
    
win_loss_df = pd.DataFrame([[X['Win'], X['Loss']], [y['Win'], y['Loss']]])
win_loss_df.index = ['Crewmate', 'Imposter']

In [20]:
fig = go.Figure()

fig.add_trace(go.Bar(x=win_loss_df.index, y=win_loss_df[0], orientation='v', name='Win', width = 0.45))

fig.add_trace(go.Bar(x= win_loss_df.index, y=win_loss_df[1], orientation='v', name='Loss', width = 0.45))

fig.update_layout(
    title = "Overall Win-Loss Statistics for Crewmates and Imposters 📊🏅",
    title_x=0.45,
    xaxis = dict(title = 'Team'),
    yaxis = dict(title = 'Count'),
#     barmode='relative',  # Use 'relative' to stack bars next to each other
    bargap=0.1,         # Adjust the gap between bars
    width = 1050,
    height = 700,
    template="plotly_white"
)
              
fig.show()

---

> ## Total Trophies Earned by Each Player 🏆✨

In [21]:
df['Rank Change'] = df['Rank Change'].fillna(0)
players_rank = list()
for i in range(1, 30):
    count = 1000
    user_data = df[df['user_id'] == i]['Rank Change']
    
    for rank in user_data:
        if rank == 0:
            continue
        elif(rank[0] == '+'):
            count += rank.count('+')
        else:
            count -= rank.count('-')
    players_rank.append(count)

> 📌 Note : Each player starts from a baseline of `1000` trophies; every `+` in `Rank Change` adds one trophy and every `-` removes one.
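
The trophy tally can also be written without explicit loops. The sketch below uses `str.count` and assumes `Rank Change` only ever contains runs of `+` or `-` (or is missing), as in the preview rows. The sample data is made up.

```python
import pandas as pd

BASELINE = 1000  # starting trophy count used in the notebook

games = pd.DataFrame({
    'user_id':     [1, 1, 1, 2, 2],
    'Rank Change': ['+++', '--', None, '---', '+'],
})

# Net change per game: number of '+' minus number of '-'; missing values count as 0
change = games['Rank Change'].fillna('')
net = change.str.count(r'\+') - change.str.count('-')

# Sum the net changes per player and add the baseline
trophies = BASELINE + net.groupby(games['user_id']).sum()
print(trophies)
```

The result is indexed by `user_id`, so it adapts to however many players the dataset contains.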

In [22]:
fig = go.Figure()
fig.add_trace(go.Scatter(
    x = np.arange(1, 30),
    y = players_rank,
    mode="markers+text",
    marker = dict(size = 12),
    # line=dict(width=1),
    text = [f"{rank} 🏆" for rank in players_rank],
    textposition="bottom center",
))
fig.update_layout(
    title = "Total Trophies Earned by Each Player 🏆✨",
    title_x = 0.45,
    xaxis = dict(title = "Players", tickvals = np.arange(1, 30)), 
    yaxis = dict(title = 'Total Trophies'),
    width = 1500,
    height = 700,
    template="plotly_white"
)
fig.show()

---

<div><h3><center>🚧 Work in progress</center></h3></div>