# Heroes Of Pymoli Data Analysis

Analysis:
* Male players generated 82.68% of total revenue and made 83.59% of total purchases.
* No single player accounts for as much as 1% of total purchases or total revenue, so player activity is spread fairly evenly.
* The largest male demographic is ages 20-24, accounting for 38.97% of total purchases and 38.66% of total revenue.
* The largest female demographic is also ages 20-24, accounting for 7.18% of total purchases and 7.38% of total revenue.

* Top 5 items purchased:
    * Final Critic
    * Oathbreaker, Last Hope of the Breaking Storm
    * Fiery Glass Crusader
    * Nirvana
    * Extraction, Quickblade Of Trembling Hands
* Top 5 items purchased among males:
    * Final Critic
    * Oathbreaker, Last Hope of the Breaking Storm
    * Lightning, Etcher of the King
    * Persuasion
    * Malificent Bag
* Top 5 items purchased among females:
    * Nirvana
    * Thorn, Satchel of Dark Souls
    * Thorn, Conqueror of the Corrupted
    * Heartless Bone Dualblade
    * Oathbreaker, Last Hope of the Breaking Storm
    
However, some item names map to multiple item IDs, and the top 5 items account for only 6.03% of total purchases and 8.48% of total revenue, so player preference is spread widely across the 183 items offered.
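
Because a name can appear under more than one item ID, a ranking by name can differ from a ranking by ID. The sketch below illustrates the effect on a hypothetical seven-row frame (`toy`, with made-up items "Blade" and "Shield"); it is not the real purchase data.

```python
import pandas as pd

# Hypothetical purchase rows: "Blade" is sold under two different item IDs.
toy = pd.DataFrame(
    {
        "Item ID": [1, 1, 2, 2, 3, 3, 3],
        "Item Name": ["Blade", "Blade", "Blade", "Blade", "Shield", "Shield", "Shield"],
        "Price": [2.00, 2.00, 2.50, 2.50, 1.00, 1.00, 1.00],
    }
)

# Counting per (Item ID, Item Name) splits "Blade" across two rows...
by_id = toy.groupby(["Item ID", "Item Name"])["Price"].count()
# ...while counting per name alone merges them.
by_name = toy.groupby("Item Name")["Price"].count()

print(by_id.idxmax())    # most purchased (ID, name) pair
print(by_name.idxmax())  # most purchased name
```

Here "Shield" leads when grouping by ID and name, while "Blade" leads when grouping by name only, which is why the two kinds of top-5 list need not agree.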


In [4]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# define file path 
data_file = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(data_file)
# Check data integrity: every column should report the same non-null count
purchase_data.count()


Purchase ID    780
SN             780
Age            780
Gender         780
Item ID        780
Item Name      780
Price          780
dtype: int64

## Player Count

In [5]:
# Count unique screen names (SN) so repeat buyers are counted once
player_count = purchase_data["SN"].nunique()
pd.DataFrame([{"Total Players": player_count}])

,Total Players
0,576


## Purchasing Analysis (Total)

In [6]:
items = purchase_data["Item ID"].nunique()
avg_price = round(purchase_data["Price"].mean(),2)
tot_purch = purchase_data["Purchase ID"].count()
tot_rev = purchase_data["Price"].sum()

pa_summary = [{"Number of Unique Items": items, 
               "Average Price": avg_price, 
               "Number of Purchases": tot_purch, 
               "Total Revenue": tot_rev}]

df = pd.DataFrame(pa_summary)
def formatx(x):
    return "${:.2f}".format((x))
df["Average Price"] = df["Average Price"].apply(formatx)
df["Total Revenue"] = df["Total Revenue"].apply(formatx)

df

,Average Price,Number of Purchases,Number of Unique Items,Total Revenue
0,$3.05,780,183,$2379.77


## Gender Demographics

In [7]:
gender_group = purchase_data.groupby(["Gender"])
total_count_by_gender = gender_group["SN"].nunique()
# Share of unique players per gender, as a percentage rounded to two decimals
percent_of_tot_players = round(total_count_by_gender / purchase_data["SN"].nunique() * 100, 2)

gender_df = pd.DataFrame({"Total Count":total_count_by_gender, "Percentage of Players": percent_of_tot_players})
sort_gender_df = gender_df.sort_values(["Percentage of Players"], ascending=False)
def format_pct(y):
    return "{:.2f}%".format(y)
sort_gender_df["Percentage of Players"] = sort_gender_df["Percentage of Players"].apply(format_pct)
sort_gender_df



Gender,Total Count,Percentage of Players
Male,484,84.03%
Female,81,14.06%
Other / Non-Disclosed,11,1.91%



## Purchasing Analysis (Gender)

In [8]:
purchase_count_by_gender = gender_group["Purchase ID"].nunique()
avg_price_by_gender = round(gender_group["Price"].mean(),2).map("${:.2f}".format)
total_value = gender_group["Price"].sum().map("${:.2f}".format)
avg_purchase_per_player = round(gender_group["Price"].sum() / gender_group["SN"].nunique(),2).map("${:.2f}".format)

pa_df = pd.DataFrame({"Purchase Count":purchase_count_by_gender, 
                      "Average Purchase Price": avg_price_by_gender, 
                      "Total Purchase Value": total_value, 
                      "Avg Total Purchase Per Person": avg_purchase_per_player})

pa_df

Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase Per Person
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,$1967.64,$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56
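
The gender shares quoted in the summary can be rechecked from this table. The sketch below retypes the purchase counts and totals by hand (`by_gender` is an illustrative stand-in, not a variable from the notebook) and divides each row by its column total.

```python
import pandas as pd

# Figures copied by hand from the Purchasing Analysis (Gender) output;
# `by_gender` is a hypothetical stand-in, not a notebook variable.
by_gender = pd.DataFrame(
    {
        "Purchase Count": [113, 652, 15],
        "Total Purchase Value": [361.94, 1967.64, 50.19],
    },
    index=["Female", "Male", "Other / Non-Disclosed"],
)

# Each group's share of the column total, as a percentage.
shares = (by_gender / by_gender.sum() * 100).round(2)
print(shares.loc["Male"])
```

With these inputs the Male row comes out at 83.59% of purchases and 82.68% of revenue, matching the figures in the summary.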


## Age Demographics

In [9]:
bins = [0,9,14,19,24,29,34,39,100]
group_labels = ["<10","10-14","15-19","20-24","25-29","30-34","35-39","40+"]

purchase_data["Age Group"] = pd.cut(purchase_data["Age"], bins, labels=group_labels)
age_group = purchase_data.groupby("Age Group")
total_sn_count= age_group["SN"].nunique()
perc_of_players = round(total_sn_count/purchase_data["SN"].nunique()*100,2).map("{:.2f}%".format)

age_df =pd.DataFrame({"Total Player Count": total_sn_count, "Percentage of Players": perc_of_players})
age_df

Age Group,Total Player Count,Percentage of Players
<10,17,2.95%
10-14,22,3.82%
15-19,107,18.58%
20-24,258,44.79%
25-29,77,13.37%
30-34,52,9.03%
35-39,31,5.38%
40+,12,2.08%
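
The bin edges above depend on how `pd.cut` treats boundaries. As a quick check, the sketch below applies the same `bins` and `group_labels` to a handful of hypothetical ages (`sample_ages` is made up for illustration) that sit on either side of an edge.

```python
import pandas as pd

bins = [0, 9, 14, 19, 24, 29, 34, 39, 100]
group_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical ages chosen to sit on either side of a bin edge.
sample_ages = pd.Series([7, 9, 10, 24, 25, 40])

# pd.cut uses right-closed intervals by default, so 9 lands in (0, 9].
sample_groups = pd.cut(sample_ages, bins, labels=group_labels)
print(sample_groups.tolist())
# ['<10', '<10', '10-14', '20-24', '25-29', '40+']
```

With these edges, an age of exactly 0 or anything above 100 would fall outside every interval and come back as `NaN`.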


## Purchasing Analysis (Age)

In [10]:
purchase_count = age_group["Purchase ID"].nunique()
avg_price = round(age_group["Price"].mean(),2).map("${:.2f}".format)
tot_purchase = age_group["Price"].sum().map("${:.2f}".format)
avg_tot_purchase = round(age_group["Price"].sum() / age_group["SN"].nunique(),2).map("${:.2f}".format)

pa_age = pd.DataFrame({"Purchase Count": purchase_count, 
                       "Average Purchase Price": avg_price, 
                       "Total Purchase Value": tot_purchase, 
                       "Average Total Purchase Per Person": avg_tot_purchase })
pa_age

Age Group,Purchase Count,Average Purchase Price,Total Purchase Value,Average Total Purchase Per Person
<10,23,$3.35,$77.13,$4.54
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,$1114.06,$4.32
25-29,101,$2.90,$293.00,$3.81
30-34,73,$2.93,$214.00,$4.12
35-39,41,$3.60,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19
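
The summary quotes shares for gender and age combined (for example, males aged 20-24), which none of the cells here compute. One way such figures could be produced is to group on both columns. The sketch below does this on a hypothetical five-row frame `toy`, not the real data, so the printed percentages are illustrative only.

```python
import pandas as pd

# Hypothetical rows standing in for purchase_data with an "Age Group" column.
toy = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Male", "Female", "Female"],
        "Age Group": ["20-24", "20-24", "15-19", "20-24", "25-29"],
        "Price": [3.00, 2.00, 1.00, 2.50, 1.50],
    }
)

grouped = toy.groupby(["Gender", "Age Group"])["Price"]

# Share of all purchases and of all revenue for each gender/age combination.
share = pd.DataFrame(
    {
        "% of Purchases": grouped.count() / len(toy) * 100,
        "% of Revenue": grouped.sum() / toy["Price"].sum() * 100,
    }
)
print(share.round(2))
```

In this toy frame, males aged 20-24 make 2 of 5 purchases (40%) and 5.00 of 10.00 in revenue (50%).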


## Top Spenders

In [11]:
sn_group = purchase_data.groupby(["SN"])
sn_purch_count = sn_group["Purchase ID"].count()
avg_price_sn = sn_group["Price"].mean()
tot_purch_sn = sn_group["Price"].sum()


sn_df = pd.DataFrame({"Purchase Count": sn_purch_count,
                     "Average Purchase Price": avg_price_sn,
                    "Total Purchase Value": tot_purch_sn}).sort_values("Total Purchase Value", ascending=False)
def formatc(z):
    return "${:.2f}".format(z)
sn_df["Total Purchase Value"] = sn_df["Total Purchase Value"].apply(formatc)
sn_df["Average Purchase Price"] = sn_df["Average Purchase Price"].apply(formatc)


sn_df.head()

SN,Purchase Count,Average Purchase Price,Total Purchase Value
Lisosia93,5,$3.79,$18.96
Idastidru52,4,$3.86,$15.45
Chamjask73,3,$4.61,$13.83
Iral74,4,$3.40,$13.62
Iskadarya95,3,$4.37,$13.10
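
The claim that no player reaches 1% of purchases or revenue can be checked against the highest spender in this table. The sketch below is plain arithmetic on figures copied by hand from the outputs above; the variable names are illustrative and do not appear in the notebook.

```python
# Figures copied by hand from the notebook outputs.
total_revenue = 2379.77      # Total Revenue, Purchasing Analysis (Total)
total_purchases = 780        # Number of Purchases
top_spender_value = 18.96    # Lisosia93, Total Purchase Value
top_spender_count = 5        # Lisosia93, Purchase Count

revenue_share = top_spender_value / total_revenue * 100
purchase_share = top_spender_count / total_purchases * 100
print(f"{revenue_share:.2f}% of revenue, {purchase_share:.2f}% of purchases")
# 0.80% of revenue, 0.64% of purchases
```

Both shares for the highest spender are below 1%, so every player with a lower total is below 1% of revenue as well.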


## Most Popular Items

In [12]:
item_group = purchase_data.groupby(["Item ID", "Item Name"])

item_purch_count = item_group["Purchase ID"].nunique()
item_price = item_group["Price"].max()
tot_item_purch_val = item_group["Price"].sum()


item_df = pd.DataFrame({"Purchase Count": item_purch_count,
                        "Item Price" : item_price,
                        "Total Purchase Value": tot_item_purch_val})

item_df_sort = item_df.sort_values("Purchase Count", ascending =False)

def formatc(a):
    return "${:.2f}".format(a)
item_df_sort["Total Purchase Value"] = item_df_sort["Total Purchase Value"].apply(formatc)
item_df_sort["Item Price"] = item_df_sort["Item Price"].apply(formatc)

item_df_sort.head()

Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
145,Fiery Glass Crusader,9,$4.58,$41.22
108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,$31.77
82,Nirvana,9,$4.90,$44.10
19,"Pursuit, Cudgel of Necromancy",8,$1.02,$8.16


## Most Profitable Items

In [42]:
item_df_sort2 = item_df.sort_values("Total Purchase Value", ascending =False)

def formatc(a):
    return "${:.2f}".format(a)
item_df_sort2["Total Purchase Value"] = item_df_sort2["Total Purchase Value"].apply(formatc)
item_df_sort2["Item Price"] = item_df_sort2["Item Price"].apply(formatc)

item_df_sort2.head()

Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
82,Nirvana,9,$4.90,$44.10
145,Fiery Glass Crusader,9,$4.58,$41.22
92,Final Critic,8,$4.88,$39.04
103,Singed Scalpel,8,$4.35,$34.80
