## Heroes Of Pymoli Data Analysis
* **Trend 1:** Male players dominate the purchase record: they account for 633 of the 780 purchases and about 82% of total revenue.
* **Trend 2:** The 20-24 age group has the highest number of purchases (336) and the highest revenue ($978.77).
* **Trend 3:** The five most popular items (highest purchase count) do not overlap with the five most profitable items (highest total purchase value).

In [1]:
# Import libraries
import pandas as pd
import os
import numpy as np

In [2]:
# Import data
file_path = os.path.join('.', 'purchase_data.json')
df = pd.read_json(file_path)
df.head()

| | Age | Gender | Item ID | Item Name | Price | SN |
|---|---|---|---|---|---|---|
| 0 | 38 | Male | 165 | Bone Crushing Silver Skewer | 3.37 | Aelalis34 |
| 1 | 21 | Male | 119 | Stormbringer, Dark Blade of Ending Misery | 2.32 | Eolo46 |
| 2 | 34 | Male | 174 | Primitive Blade | 2.46 | Assastnya25 |
| 3 | 21 | Male | 92 | Final Critic | 1.36 | Pheusrical25 |
| 4 | 23 | Male | 63 | Stormfury Mace | 1.27 | Aela59 |


In [3]:
# Check whether there is any null entry
df.isnull().sum()

Age          0
Gender       0
Item ID      0
Item Name    0
Price        0
SN           0
dtype: int64

### Player count

In [4]:
# Check whether same player occurs more than once
df['SN'].value_counts().head()

Undirrala66    5
Saedue76       4
Sondastan54    4
Qarwen67       4
Hailaphos89    4
Name: SN, dtype: int64

In [5]:
# Count players
player_total = len(df['SN'].unique()) # total number of players
player_count = pd.DataFrame([{'Total Players': player_total}])
player_count

| | Total Players |
|---|---|
| 0 | 573 |


### Purchasing Analysis (Total)

* Number of Unique Items
* Average Purchase Price
* Total Number of Purchases
* Total Revenue

In [6]:
number_unique_items = len(df['Item ID'].unique())
average_purchase_price = df['Price'].mean()
total_number_purchase = df['Item ID'].count()
total_revenue = df['Price'].sum()

purchase_analysis_total = pd.DataFrame([{
    'Number of Unique Items': number_unique_items,
    'Average Purchase Price': round(average_purchase_price, 2),
    'Total Number of Purchases': total_number_purchase,
    'Total Revenue': round(total_revenue, 2),
}])
purchase_analysis_total

| | Average Purchase Price | Number of Unique Items | Total Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 2.93 | 183 | 780 | 2286.33 |


### Gender Demographics

* Percentage and Count of Male Players
* Percentage and Count of Female Players
* Percentage and Count of Other / Non-Disclosed

In [7]:
# Group by gender
df_grouped_gender = df.groupby(['Gender']) # group by gender

# Gender demographics
df_gender = pd.DataFrame()
df_gender['Total Count'] = df_grouped_gender['SN'].nunique() # count unique players
df_gender['Percentage of Players (%)'] = round(df_gender['Total Count'] / player_total * 100, 2) # percentage
gender_demo = df_gender[['Percentage of Players (%)', 'Total Count']] # reorganize columns
gender_demo

| Gender | Percentage of Players (%) | Total Count |
|---|---|---|
| Female | 17.45 | 100 |
| Male | 81.15 | 465 |
| Other / Non-Disclosed | 1.4 | 8 |
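
The per-gender player counts depend on counting each player once, even when that player bought several items. The sketch below shows the same idea on a small, made-up purchase log (the `toy` frame and its values are hypothetical, not taken from `purchase_data.json`): duplicates are dropped by `SN` first, then genders are counted.

```python
import pandas as pd

# Hypothetical toy purchase log; the notebook itself reads purchase_data.json
toy = pd.DataFrame({
    'SN': ['a', 'a', 'b', 'c', 'c', 'c'],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Male', 'Male'],
})

# Keep one row per player so repeat purchases are not counted twice
players = toy.drop_duplicates(subset='SN')

# Count players per gender and convert the counts to percentages
counts = players['Gender'].value_counts()
percent = (counts / len(players) * 100).round(2)

gender_demo_toy = pd.DataFrame({
    'Percentage of Players (%)': percent,
    'Total Count': counts,
})
print(gender_demo_toy)
```

This should agree with `groupby('Gender')['SN'].nunique()` as long as every player reports a single gender, which the notebook's counts suggest (100 + 465 + 8 = 573).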


### Purchasing Analysis (Gender)

* Each of the following, broken down by gender:
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value
  * Normalized Totals

In [8]:
# Store data into a dataframe
purchasing_analysis_gender = pd.DataFrame() # Initiate the dataframe
# Select the column before aggregating so that non-numeric columns (e.g. 'SN') are never averaged
purchasing_analysis_gender['Purchase Count'] = df_grouped_gender['Item Name'].count()
purchasing_analysis_gender['Average Purchase Price'] = round(df_grouped_gender['Price'].mean(), 2)
purchasing_analysis_gender['Total Purchase Value'] = round(df_grouped_gender['Price'].sum(), 2)
purchasing_analysis_gender['Normalized Totals'] = round(purchasing_analysis_gender['Total Purchase Value'] \
                                                  / gender_demo['Total Count'], 2)

# Format price as currency
purchasing_analysis_gender['Average Purchase Price'] = \
                        purchasing_analysis_gender['Average Purchase Price'].map('${:,.2f}'.format)
purchasing_analysis_gender['Total Purchase Value'] = \
                        purchasing_analysis_gender['Total Purchase Value'].map('${:,.2f}'.format)
purchasing_analysis_gender['Normalized Totals'] = \
                        purchasing_analysis_gender['Normalized Totals'].map('${:,.2f}'.format)
purchasing_analysis_gender

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Normalized Totals |
|---|---|---|---|---|
| Female | 136 | $2.82 | $382.91 | $3.83 |
| Male | 633 | $2.95 | $1,867.68 | $4.02 |
| Other / Non-Disclosed | 11 | $3.25 | $35.74 | $4.47 |
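
"Normalized Totals" here means total purchase value divided by the number of unique players in the group (for example, $382.91 / 100 players gives $3.83 for Female). As a minimal sketch of an alternative layout, the same four metrics can be computed in one pass with pandas named aggregation. The `toy` frame and the snake_case column names below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy purchase log with the same columns the notebook uses
toy = pd.DataFrame({
    'SN': ['a', 'a', 'b', 'c'],
    'Gender': ['Male', 'Male', 'Female', 'Male'],
    'Price': [1.00, 3.00, 2.50, 2.00],
})

# Named aggregation: one output column per (source column, function) pair
summary = toy.groupby('Gender').agg(
    purchase_count=('Price', 'count'),
    average_price=('Price', 'mean'),
    total_value=('Price', 'sum'),
    players=('SN', 'nunique'),
)

# Normalized total = revenue per unique player in the group
summary['normalized_total'] = summary['total_value'] / summary['players']
print(summary)
```

Keeping the values numeric until the final display step makes them easier to reuse. The currency formatting with `map('${:,.2f}'.format)` turns the columns into strings.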


### Age Demographics

* Each of the following, broken into 5-year bins (i.e. <10, 10-14, 15-19, etc.):
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value
  * Normalized Totals

In [9]:
# Look for min and max of age
df['Age'].describe()

count    780.000000
mean      22.729487
std        6.930604
min        7.000000
25%       19.000000
50%       22.000000
75%       25.000000
max       45.000000
Name: Age, dtype: float64

In [10]:
# Create bins
bins_raw = np.arange(10,51,5)
bins = np.insert(bins_raw, 0, 0)

# Convert continuous variable to categorical variable
labels = ['<10','10-14','15-19','20-24','25-29','30-34','35-39','40-44','45-49']
age_cat = pd.cut(df['Age'], bins = bins, right = False, labels = labels)

# Create new dataframe with age categories
df_age = df.copy() # copy so that the original df is left unchanged
df_age['Age Categories'] = age_cat

# Store grouped data into a dataframe
df_grouped_age = df_age.groupby(['Age Categories']) # group by age categories
age_demo = pd.DataFrame()
age_demo['Percentage of Players (%)'] = round(df_grouped_age['SN'].nunique() / player_total * 100, 2)
age_demo['Total Count'] = df_grouped_age['SN'].nunique()
age_demo

| Age Categories | Percentage of Players (%) | Total Count |
|---|---|---|
| <10 | 3.32 | 19 |
| 10-14 | 4.01 | 23 |
| 15-19 | 17.45 | 100 |
| 20-24 | 45.2 | 259 |
| 25-29 | 15.18 | 87 |
| 30-34 | 8.2 | 47 |
| 35-39 | 4.71 | 27 |
| 40-44 | 1.75 | 10 |
| 45-49 | 0.17 | 1 |
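
The binning relies on `right = False`, which makes each interval closed on the left and open on the right, so an age of 10 falls in `10-14` rather than `<10`. The sketch below rebuilds the same bin edges and labels and applies them to a few hand-picked boundary ages (the `ages` values are chosen for illustration only).

```python
import numpy as np
import pandas as pd

# Same edges as the notebook: 0, 10, 15, ..., 50
bins = np.insert(np.arange(10, 51, 5), 0, 0)
labels = ['<10', '10-14', '15-19', '20-24', '25-29',
          '30-34', '35-39', '40-44', '45-49']

# Illustrative ages that sit on or next to bin boundaries
ages = pd.Series([7, 9, 10, 14, 15, 45])

# right=False gives half-open intervals [0, 10), [10, 15), ..., [45, 50)
cats = pd.cut(ages, bins=bins, right=False, labels=labels)
print(pd.DataFrame({'Age': ages, 'Age Categories': cats}))
```

With the default `right = True`, an age of 10 would land in the first bin, which would no longer match the `<10` label.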


### Purchasing Analysis (Age)
* Purchase Count
* Average Purchase Price
* Total Purchase Value
* Normalized Totals

In [11]:
# Store data into a dataframe
purchase_analysis_age = pd.DataFrame()
# Select the column before aggregating so that non-numeric columns (e.g. 'SN') are never averaged
purchase_analysis_age['Purchase Count'] = df_grouped_age['Item ID'].count()
purchase_analysis_age['Average Purchase Price'] = round(df_grouped_age['Price'].mean(), 2)
purchase_analysis_age['Total Purchase Value'] = round(df_grouped_age['Price'].sum(), 2)
purchase_analysis_age['Normalized Totals'] = round(purchase_analysis_age['Total Purchase Value'] \
                                                  / age_demo['Total Count'], 2)

# Format price as currency
purchase_analysis_age['Average Purchase Price'] = \
                        purchase_analysis_age['Average Purchase Price'].map('${:,.2f}'.format)
purchase_analysis_age['Total Purchase Value'] = \
                        purchase_analysis_age['Total Purchase Value'].map('${:,.2f}'.format)
purchase_analysis_age['Normalized Totals'] = \
                        purchase_analysis_age['Normalized Totals'].map('${:,.2f}'.format)
purchase_analysis_age

| Age Categories | Purchase Count | Average Purchase Price | Total Purchase Value | Normalized Totals |
|---|---|---|---|---|
| <10 | 28 | $2.98 | $83.46 | $4.39 |
| 10-14 | 35 | $2.77 | $96.95 | $4.22 |
| 15-19 | 133 | $2.91 | $386.42 | $3.86 |
| 20-24 | 336 | $2.91 | $978.77 | $3.78 |
| 25-29 | 125 | $2.96 | $370.33 | $4.26 |
| 30-34 | 64 | $3.08 | $197.25 | $4.20 |
| 35-39 | 42 | $2.84 | $119.40 | $4.42 |
| 40-44 | 16 | $3.19 | $51.03 | $5.10 |
| 45-49 | 1 | $2.72 | $2.72 | $2.72 |


### Top Spenders

* Identify the top 5 spenders in the game by total purchase value, then list (in a table):
  * SN
  * Purchase Count
  * Average Purchase Price
  * Total Purchase Value

In [12]:
# Group by player
df_player = df.groupby(['SN'])

# Store player-specific info into dataframe
spenders = pd.DataFrame()
spenders['Purchase Count'] = df_player['Item ID'].count()
spenders['Average Purchase Price ($)'] = round(df_player['Price'].mean(), 2)
spenders['Total Purchase Value ($)'] = round(df_player['Price'].sum(), 2)

# Sort and find top 5 spenders
spenders_sorted = spenders.sort_values(by = ['Total Purchase Value ($)'], ascending = False)
top_spenders = spenders_sorted.iloc[0:5]
top_spenders

| SN | Purchase Count | Average Purchase Price ($) | Total Purchase Value ($) |
|---|---|---|---|
| Undirrala66 | 5 | 3.41 | 17.06 |
| Saedue76 | 4 | 3.39 | 13.56 |
| Mindimnya67 | 4 | 3.18 | 12.74 |
| Haellysu29 | 3 | 4.24 | 12.73 |
| Eoda93 | 3 | 3.86 | 11.58 |
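
Sorting the whole table and slicing the first five rows works. As a hedged alternative, `DataFrame.nlargest` expresses "top N by a column" directly. The sketch below uses a made-up `toy` log and selects the top 2 rather than the top 5. The names and prices are hypothetical.

```python
import pandas as pd

# Hypothetical toy purchase log
toy = pd.DataFrame({
    'SN': ['a', 'a', 'b', 'c', 'c'],
    'Price': [1.00, 2.00, 4.50, 0.50, 0.50],
})

# Per-player purchase count, average price, and total value
per_player = toy.groupby('SN')['Price'].agg(['count', 'mean', 'sum'])

# Top 2 spenders by total purchase value, largest first
top = per_player.nlargest(2, 'sum')
print(top)
```

Applied to the notebook's `spenders` frame, the equivalent call would be `spenders.nlargest(5, 'Total Purchase Value ($)')`.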


### Most Popular Items

* Identify the 5 most popular items by purchase count, then list (in a table):
  * Item ID
  * Item Name
  * Purchase Count
  * Item Price
  * Total Purchase Value

In [13]:
# Group by Item ID
df_items = df.groupby(['Item ID', 'Item Name'])

# Store item-specific info into a dataframe
items = pd.DataFrame()
items['Purchase Count'] = df_items['Item ID'].count()
items['Item Price ($)'] = round(df_items['Price'].sum() / items['Purchase Count'], 2) # mean price per purchase
items['Total Purchase Value ($)'] = round(df_items['Price'].sum(), 2)

# Sort and find top 5 items by purchase count
items_sorted = items.sort_values(['Purchase Count'], ascending = False)
top_items = items_sorted.iloc[0:5]
top_items

| Item ID | Item Name | Purchase Count | Item Price ($) | Total Purchase Value ($) |
|---|---|---|---|---|
| 39 | Betrayal, Whisper of Grieving Widows | 11 | 2.35 | 25.85 |
| 84 | Arcane Gem | 11 | 2.23 | 24.53 |
| 31 | Trickster | 9 | 2.07 | 18.63 |
| 175 | Woeful Adamantite Claymore | 9 | 1.24 | 11.16 |
| 13 | Serenity | 9 | 1.49 | 13.41 |


### Most Profitable Items

* Identify the 5 most profitable items by total purchase value, then list (in a table):
  * Item ID
  * Item Name
  * Purchase Count
  * Item Price
  * Total Purchase Value

In [14]:
# Sort and find top 5 items by purchase value
items_sorted_by_value = items.sort_values(['Total Purchase Value ($)'], ascending = False)
top_items_by_revenue = items_sorted_by_value.iloc[0:5]
top_items_by_revenue

| Item ID | Item Name | Purchase Count | Item Price ($) | Total Purchase Value ($) |
|---|---|---|---|---|
| 34 | Retribution Axe | 9 | 4.14 | 37.26 |
| 115 | Spectral Diamond Doomblade | 7 | 4.25 | 29.75 |
| 32 | Orenmir | 6 | 4.95 | 29.7 |
| 103 | Singed Scalpel | 6 | 4.87 | 29.22 |
| 107 | Splitter, Foe Of Subtlety | 8 | 3.61 | 28.88 |
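
The claim that the most popular and most profitable items do not overlap can be checked with a set intersection. The sketch below hard-codes the Item IDs from the two top-5 tables in this notebook. Inside the notebook, the same sets could presumably be built from the index, for example with `top_items.index.get_level_values('Item ID')`.

```python
# Item IDs copied from the two top-5 tables in this notebook
top_popular_ids = {39, 84, 31, 175, 13}      # by purchase count
top_revenue_ids = {34, 115, 32, 103, 107}    # by total purchase value

# Items that appear in both rankings
overlap = top_popular_ids & top_revenue_ids
print(overlap)  # set()
```

One caveat worth keeping in mind: Retribution Axe (Item ID 34) has 9 purchases, the same count as the third- to fifth-ranked popular items, so which tied items make the popularity top 5 depends on how the sort breaks ties.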
