Heroes Of Pymoli Data Analysis

We have analyzed the in-game purchases of the obscure game Heroes of Pymoli. We have looked for demographic and purchase patterns by gender, age, and the item being purchased.

Spoiler alert: of the 576 active players, the vast majority (84%) are male. It is unclear whether they live in their moms' basements, but we feel it's a safe assumption.

44.79% of the gamers are in the age range 20-24. Presumably these are college students skipping class. Notably, 2.08% of the gamers are age 40 and over. Probably playing games at the office. This age group spends the least on average at only $3.19 per person. They are probably pretty bad at the game and don't progress far enough to buy the expensive items.

Gamer Lisosia93 is the biggest spender, with 5 in-game purchases totaling $18.96. One wonders if they are any good at the game or just paying to win.

The item "Final Critic" has been purchased the most, 13 times, for a total revenue of $59.99. It must be super helpful.

In summary, with only 576 players and total revenue of only $2,379.77, Heroes of Pymoli seems to be a flop. Time to shut it down.



In [54]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

Player Count: Display the total number of players

In [55]:
total_players_df = pd.DataFrame({'Total Players':[purchase_data["SN"].nunique()]})
total_players_df.style.hide(axis="index")

Total Players
576


Purchasing Analysis (Total)

Analyze the items purchased by showing

- The number of unique items purchased
- The average price paid for a purchased item
- The total number of purchases made
- The total revenue received from items sold

In [56]:
purch_analysis_tot_df = pd.DataFrame({"Number of Unique Items":[purchase_data["Item Name"].nunique()],
            "Average Price":[purchase_data["Price"].mean()],
            "Total Number of Purchases":[purchase_data["Purchase ID"].nunique()],
            "Total Revenue":[purchase_data["Price"].sum()]
           })
purch_analysis_tot_df.style.hide(axis="index").format({"Average Price": "${:,.2f}", "Total Revenue": "${:,.2f}"})

Number of Unique Items,Average Price,Total Number of Purchases,Total Revenue
179,$3.05,780,"$2,379.77"
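
As a rough sanity check on the totals above, the per-player figures can be derived directly from the summary values. This is a minimal sketch that copies the numbers from the outputs above; the variable names are illustrative and not part of the notebook.

```python
# Values copied from the Player Count and Purchasing Analysis (Total) outputs above
total_players = 576
total_purchases = 780
total_revenue = 2379.77

# Derived figures (illustrative names, not from the notebook)
avg_price = total_revenue / total_purchases
purchases_per_player = total_purchases / total_players
revenue_per_player = total_revenue / total_players

print(f"Average price: ${avg_price:.2f}")
print(f"Purchases per player: {purchases_per_player:.2f}")
print(f"Revenue per player: ${revenue_per_player:.2f}")
```

The average price computed this way should agree with the "Average Price" column, since the mean of the Price column is the total revenue divided by the number of purchases.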


Gender Demographics

Analyze the gender demographics of the players by showing:
- Number of players by gender
- Percentage of players by gender

In [57]:
#Create a df grouped by User ID to eliminate duplicate purchases for same User
user_grouped_purch_data = purchase_data.groupby(["SN"])

#Grab the first line of each User ID grouping in order to create a df where each line is a unique user
first_user_occurance = user_grouped_purch_data.first()

#Create a df column of counts by gender for the unique user df, named Total Count
#(naming the counts explicitly avoids relying on the default column name, which differs across pandas versions)
gender_demog_df = first_user_occurance["Gender"].value_counts().rename("Total Count").to_frame()

#Add a df column to hold the calc of percentage of players in each gender
gender_demog_df["Percentage of Players"]=gender_demog_df["Total Count"]/gender_demog_df["Total Count"].sum()

gender_demog_df.style.format({"Percentage of Players": "{:.2%}"})

Gender,Total Count,Percentage of Players
Male,484,84.03%
Female,81,14.06%
Other / Non-Disclosed,11,1.91%
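
The same idea (one row per player, then counts and shares by gender) can also be sketched with `drop_duplicates` and `value_counts(normalize=True)`. The rows below are made-up toy data, not the real purchase file, and this is an alternative illustration rather than the method used in the cell above.

```python
import pandas as pd

# Toy purchase log (made-up rows): player "a" appears twice
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d"],
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
})

# Keep one row per player so repeat buyers are not double counted
players = toy.drop_duplicates(subset="SN")

# Counts and fractional shares by gender
counts = players["Gender"].value_counts()
shares = players["Gender"].value_counts(normalize=True)

summary = pd.DataFrame({"Total Count": counts, "Percentage of Players": shares})
print(summary)
```

Without the de-duplication step, player "a" would be counted twice and the male share would be overstated.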


Purchasing Analysis (GENDER)

Analyze the player purchases made by gender by showing:

- Number of purchases by gender
- Average price of a purchase by gender
- Total value of purchases by gender
- Average total purchase per person by gender 

In [58]:
#Create a df grouped by Gender
gender_grouped_purch_data = purchase_data.groupby(["Gender"])

#Count the purchases by Gender
purch_by_gender = gender_grouped_purch_data["Purchase ID"].count()

#Calc the average purchase price by Gender
avg_purch_price_by_gender = gender_grouped_purch_data["Price"].sum()/gender_grouped_purch_data["Purchase ID"].count()

#Calc the total value of purchases by gender
tot_purch_value_by_gender = gender_grouped_purch_data["Price"].sum()

#Calc the average purchase per person by Gender
avg_purch_price_by_gender_per_person = gender_grouped_purch_data["Price"].sum()/gender_grouped_purch_data["SN"].nunique()

#Extract the genders
gamer_gender_df = pd.DataFrame(gender_grouped_purch_data.first())
gamer_gender = gamer_gender_df.index

# Create a summary data frame to hold the results
purch_analysis_gender_final = pd.DataFrame(
    list(zip(purch_by_gender, avg_purch_price_by_gender, tot_purch_value_by_gender, avg_purch_price_by_gender_per_person)),
    index=gamer_gender,
    columns=['Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Avg Total Purchase per Person'])

purch_analysis_gender_final.style.format({"Average Purchase Price": "${:,.2f}", "Total Purchase Value": "${:,.2f}", "Avg Total Purchase per Person": "${:,.2f}"})

Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,"$1,967.64",$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56
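
The four per-gender measures can also be computed in one pass with named aggregation. This is a hedged sketch on made-up toy rows; the output column names (`purchase_count`, `avg_price`, `total_value`, `players`, `avg_total_per_person`) are illustrative choices, not names from the notebook.

```python
import pandas as pd

# Toy purchase log (made-up rows)
toy = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["a", "a", "b", "c"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [1.00, 3.00, 2.00, 4.00],
})

# One groupby call produces every per-gender measure
by_gender = toy.groupby("Gender").agg(
    purchase_count=("Purchase ID", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Average total spend per unique player, not per purchase
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```

Dividing by the number of unique players rather than the number of purchases is what separates "Avg Total Purchase per Person" from "Average Purchase Price".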


Age Demographics

Analyze the age demographics of players by showing:

- Number of players in each 5-year age band
- Percentage of players in each 5-year age band

In [59]:
# make the bins and labels
bins = [0,9,14,19,24,29,34,39,200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
# work on a copy so the unique-user df is not modified by the new column
purchase_data_bin_age = first_user_occurance.copy()
purchase_data_bin_age["Age Bin"] = pd.cut(purchase_data_bin_age["Age"], bins=bins, labels=labels)

#Create a df grouped by age bin
purchase_data_grouped_by_bin = purchase_data_bin_age.groupby(["Age Bin"])

# Calculate the numbers and percentages by age group
bin_count = purchase_data_grouped_by_bin["Age Bin"].count()
tot_players_count = first_user_occurance.count()
bin_pct = bin_count/tot_players_count["Age"]

# Create a summary data frame to hold the results
age_demog_summary_df = pd.DataFrame(list(zip(labels, bin_count, bin_pct)), columns =['Age Ranges', 'Total Count', 'Percentage of Players']) 

# Optional: round the percentage column to two decimal points
# Display Age Demographics Table
age_demog_summary_df.set_index('Age Ranges', inplace=True)
age_demog_summary_df.style.format({"Percentage of Players": "{:.2%}"})

Age Ranges,Total Count,Percentage of Players
<10,17,2.95%
10-14,22,3.82%
15-19,107,18.58%
20-24,258,44.79%
25-29,77,13.37%
30-34,52,9.03%
35-39,31,5.38%
40+,12,2.08%
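
To make the bin boundaries concrete, here is a small sketch of how `pd.cut` assigns ages that sit on the edges. It reuses the `bins` and `labels` from the cell above; the sample ages are made up for illustration.

```python
import pandas as pd

# Same bin edges and labels as the notebook cell above
bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Made-up ages chosen to sit on the bin edges
ages = pd.Series([9, 10, 24, 25, 39, 40])

# By default each bin includes its right edge and excludes its left edge
binned = pd.cut(ages, bins=bins, labels=labels)
print(list(binned))
```

Because the left edge of the first bin is excluded by default, an age of exactly 0 would not fall into any bin unless `include_lowest=True` is passed.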


Purchasing Analysis by Age

Analyze the purchasing behavior of different age groups by showing:

- Total number of purchases by age group
- Average purchase price by age group
- Total amount of purchases by age group
- Average total purchase amount per person in each age group

In [60]:
#make bins and labels
bins = [0,9,14,19,24,29,34,39,200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
# work on a copy so the original purchase data is not modified by the new column
purch_data_age_bin = purchase_data.copy()
purch_data_age_bin["Age Bin"] = pd.cut(purch_data_age_bin["Age"], bins=bins, labels=labels)

#This is all purchase records grouped by Age Bin
purch_data_grouped_by_age_bin = purch_data_age_bin.groupby(["Age Bin"])

#Count the purchases by Bin
purch_by_bin = purch_data_grouped_by_age_bin["Purchase ID"].count()

#Calc the average purchase price by Bin
avg_purch_price_by_age_bin = purch_data_grouped_by_age_bin["Price"].sum()/purch_data_grouped_by_age_bin["Purchase ID"].count()

#Calc the total value of purchases by Bin
tot_purch_value_by_bin = purch_data_grouped_by_age_bin["Price"].sum()

#Calc the average purchase per person by Bin
avg_purch_price_by_bin_per_person = purch_data_grouped_by_age_bin["Price"].sum()/purch_data_grouped_by_age_bin["SN"].nunique()

# Create a summary data frame to hold the results
bin_purch_summary_df = pd.DataFrame(
    list(zip(labels, purch_by_bin, avg_purch_price_by_age_bin, tot_purch_value_by_bin, avg_purch_price_by_bin_per_person)),
    columns=['Age Ranges', 'Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Avg Total Purchase per Person'])

bin_purch_summary_df.set_index('Age Ranges', inplace=True)
bin_purch_summary_df.style.format({"Average Purchase Price": "${:,.2f}", "Total Purchase Value": "${:,.2f}", "Avg Total Purchase per Person": "${:,.2f}"})

Age Ranges,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
<10,23,$3.35,$77.13,$4.54
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,"$1,114.06",$4.32
25-29,101,$2.90,$293.00,$3.81
30-34,73,$2.93,$214.00,$4.12
35-39,41,$3.60,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19
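
The "Avg Total Purchase per Person" column is the total purchase value divided by the number of unique players in the age group. The sketch below re-derives it for three groups from the numbers printed in the two age tables above; the dictionary and variable names are illustrative only.

```python
# (unique players, total purchase value) copied from the two age tables above
age_groups = {
    "20-24": (258, 1114.06),
    "35-39": (31, 147.67),
    "40+": (12, 38.24),
}

# Total value divided by unique players, rounded to cents
per_person = {
    label: round(total / players, 2)
    for label, (players, total) in age_groups.items()
}
print(per_person)
```

These values should line up with the table, including the $3.19 for the 40+ group quoted in the summary.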


Top Spenders Analysis

Identify the top 5 spenders by amount and their:

- Gamer ID name
- The number of purchases they made
- The average price of their purchases
- The total amount of their purchases

In [61]:
purch_data_by_gamer = purchase_data.groupby(["SN"])

#Count the purchases by Gamer
purch_by_gamer = purch_data_by_gamer["Purchase ID"].count()

#Calc the average purchase price by Gamer
avg_purch_price_by_gamer = purch_data_by_gamer["Price"].sum()/purch_data_by_gamer["Purchase ID"].count()

#Calc the total value of purchases by Gamer
tot_purch_value_by_gamer = purch_data_by_gamer["Price"].sum()

#Extract the user IDs
gamer_SN_df = pd.DataFrame(purch_data_by_gamer.first())
gamer_SN = gamer_SN_df.index


# Create a summary data frame to hold the results
gamer_purch_summary_df = pd.DataFrame(
    list(zip(purch_by_gamer, avg_purch_price_by_gamer, tot_purch_value_by_gamer)),
    index=gamer_SN,
    columns=['Purchase Count', 'Average Purchase Price', 'Total Purchase Value'])

gamer_purch_summary_df.sort_values(by=['Total Purchase Value'], inplace=True, ascending=False)

gamer_top_spenders = gamer_purch_summary_df.head(5)
gamer_top_spenders.style.format({"Average Purchase Price": "${:,.2f}", "Total Purchase Value": "${:,.2f}"})

SN,Purchase Count,Average Purchase Price,Total Purchase Value
Lisosia93,5,$3.79,$18.96
Idastidru52,4,$3.86,$15.45
Chamjask73,3,$4.61,$13.83
Iral74,4,$3.40,$13.62
Iskadarya95,3,$4.37,$13.10
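
A compact way to express "aggregate per player, then take the top N by total" is `groupby().agg()` followed by `nlargest`. The sketch below uses made-up toy rows and illustrative column names; it is an alternative illustration, not the notebook's own code.

```python
import pandas as pd

# Toy purchase log (made-up rows)
toy = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["a", "a", "b", "c", "c"],
    "Price": [1.00, 2.00, 4.50, 2.00, 2.25],
})

# Per-player purchase count, average price, and total spend
spenders = toy.groupby("SN").agg(
    purchase_count=("Purchase ID", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
)

# Top 2 players by total spend
top = spenders.nlargest(2, "total_value")
print(top)
```

Here player "b" ranks first on a single purchase, which shows that the top spender is not necessarily the most frequent buyer.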


Most Popular Items Analysis

Identify the most popular items by number of purchases and their:
- Item ID
- Item Name
- Number of times the item was purchased
- Average price paid for the item
- Total amount of purchases for the item

In [62]:
#Create df grouped by Item ID
item_summary_df = purchase_data[["Item ID", "Item Name", "Price"]]
group_by_item = item_summary_df.groupby(["Item ID"])

#Extract the item IDs
item_ID_df = pd.DataFrame(group_by_item.first())


#Item purchase count
purch_by_item = group_by_item["Item ID"].count()
item_ID_df["Purchase Count"]=purch_by_item
 
#Item price
item_price = group_by_item["Price"].mean()
item_ID_df["Price"]=item_price

#Item total purchase value
tot_purch_value_item = group_by_item["Price"].sum()
item_ID_df["Total Purchase Value"]=tot_purch_value_item

item_ID_df=item_ID_df[['Item Name', 'Purchase Count', 'Price', 'Total Purchase Value']]
item_ID_df.rename(columns = {'Price':'Item Price'}, inplace=True)

item_ID_df.sort_values(by=['Purchase Count'], inplace=True, ascending=False)

most_pop_items = item_ID_df.head(5)
most_pop_items.style.format({"Item Price": "${:,.2f}", "Total Purchase Value": "${:,.2f}"})

Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
92,Final Critic,13,$4.61,$59.99
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
145,Fiery Glass Crusader,9,$4.58,$41.22
132,Persuasion,9,$3.22,$28.99
108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,$31.77
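
Several items in the table above are tied at 9 purchases, and sorting on a single key does not by itself fix the order among ties. One possible way to make the ranking deterministic is to add a second sort key, sketched below on made-up toy rows with illustrative column names. Choosing total purchase value as the tie-breaker is an assumption, not something the notebook specifies.

```python
import pandas as pd

# Toy purchase log (made-up rows): Sword and Bow are tied on purchase count
toy = pd.DataFrame({
    "Item ID": [1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Bow", "Bow", "Axe"],
    "Price": [2.00, 2.00, 3.00, 3.00, 5.00],
})

# Per-item purchase count, average price, and total value
items = toy.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_value=("Price", "sum"),
)

# Rank by count first, then break ties by total value (assumed tie-breaker)
ranked = items.sort_values(["purchase_count", "total_value"], ascending=False)
print(ranked)
```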


Most Profitable Item Analysis

Identify the top 5 most profitable items by total amount spent on that item, including:
- Item ID
- Item Name
- Number of times the item was purchased
- Total amount spent on the item

In [63]:
item_ID_df.sort_values(by=['Total Purchase Value'], inplace=True, ascending=False)
most_profit_items = item_ID_df.head(5)
most_profit_items.style.format({"Item Price": "${:,.2f}", "Total Purchase Value": "${:,.2f}"})

Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
92,Final Critic,13,$4.61,$59.99
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
82,Nirvana,9,$4.90,$44.10
145,Fiery Glass Crusader,9,$4.58,$41.22
103,Singed Scalpel,8,$4.35,$34.80
