# Heroes Of Pymoli Data Analysis

Of the 576 active players, the vast majority are male (84%). A smaller but notable share are female (14%).
The peak age demographic is 20-24 (40.28%), followed by 15-19 (26.04%) and 25-29 (10.24%).

In [50]:
#Dependencies and Setup
import pandas as pd
import numpy as np

#File to Load
file_to_load="./Resources/purchase_data.csv"

#Read Purchasing File and store into Pandas data frame
purchase_data=pd.read_csv(file_to_load)

## Player Count

In [51]:
#Count the number of unique players
player_count=len(purchase_data["SN"].unique())


print(f"The total number of players is {player_count}")


The total number of players is 576
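As a small aside, `Series.nunique()` gives the same distinct count without building the intermediate array. The sketch below uses a made-up four-row frame (the screen names and prices are hypothetical, not taken from `purchase_data.csv`).

```python
import pandas as pd

# Hypothetical miniature of purchase_data: one row per purchase.
sample = pd.DataFrame({
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC"],
    "Price": [3.53, 1.56, 4.88, 3.27],
})

# nunique() counts distinct values directly (NaN is excluded by default).
player_count_sketch = sample["SN"].nunique()

print(f"The total number of players is {player_count_sketch}")
```

Both forms should agree whenever the column has no missing values; `unique()` would include a `NaN` entry in its length, while `nunique()` skips it.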


## Purchasing Analysis (Total)

In [52]:
#Number of unique items
unique_items_count=len(purchase_data["Item Name"].unique())
#Average price (average price per transaction)
average_price='${:,.2f}'.format(purchase_data["Price"].mean())
#Number of purchases
number_of_purchases=purchase_data["Price"].count()
#Total revenue, as the sum of all transaction amounts
total_revenue='${:,.2f}'.format(purchase_data["Price"].sum())

In [53]:
#creating a dictionary and then converting to dataframe
purchasing_analysis_df=pd.DataFrame({"Number of Unique Items":[unique_items_count],
                                    "Average Price":[average_price],
                                    "Number of Purchases":[number_of_purchases],
                                    "Total Revenue":[total_revenue]})

In [54]:
#display the summary data frame
purchasing_analysis_df


|   | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 179 | $3.05 | 780 | $2,379.77 |
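The summary above stores the dollar figures as formatted strings, which means they can no longer be used in arithmetic. One possible alternative, sketched here on hypothetical prices, keeps a numeric frame and formats only a display copy. The names `summary` and `display_summary` are illustrative.

```python
import pandas as pd

# Hypothetical prices standing in for purchase_data["Price"].
prices = pd.Series([3.53, 1.56, 4.88, 3.27], name="Price")

# Numeric summary: safe to reuse in later calculations.
summary = pd.DataFrame({
    "Average Price": [prices.mean()],
    "Number of Purchases": [prices.count()],
    "Total Revenue": [prices.sum()],
})

# Display copy: dollar columns become strings only here.
display_summary = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    display_summary[col] = display_summary[col].map("${:,.2f}".format)

print(display_summary)
```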


## Gender Demographics

In [55]:
#selecting "SN", and "Gender", and creating groupby object on "SN"
gender_df=purchase_data[["SN","Gender"]].groupby(["SN"])
#last() keeps a single Gender value per player, so repeat buyers are not double counted;
#value_counts() then tallies players per gender, and the result is converted to a dataframe
gender_counts=pd.DataFrame(gender_df["Gender"].last().value_counts())
#renaming column
gender_counts.columns=["Total Count"]
#calculating percentage of each gender
gender_counts["Percentage of Players"]=gender_counts["Total Count"]/player_count*100
#formatting percentage
gender_counts["Percentage of Players"]=gender_counts["Percentage of Players"].map('{:,.2f}'.format)
#display dataframe
gender_counts.index.name="Gender"
gender_counts


| Gender | Total Count | Percentage of Players |
|---|---|---|
| Male | 484 | 84.03 |
| Female | 81 | 14.06 |
| Other / Non-Disclosed | 11 | 1.91 |
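The same per-player gender tally can be sketched with `drop_duplicates` followed by `value_counts`. This is an illustrative variant on hypothetical data, not the notebook's own approach. Note that `drop_duplicates` keeps the first row per player, whereas the cell above keeps the last.

```python
import pandas as pd

# Hypothetical purchases; player "A1" buys twice.
sample = pd.DataFrame({
    "SN": ["A1", "B2", "A1", "C3", "D4"],
    "Gender": ["Male", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# One row per player, then count players per gender.
players = sample.drop_duplicates(subset="SN")
counts = players["Gender"].value_counts()

gender_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": (counts / counts.sum() * 100).round(2),
})

print(gender_summary)
```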


## Purchasing Analysis (Gender)

In [58]:
#creating a groupby object
gender_analysis=purchase_data[["Gender","Price"]].groupby(["Gender"])
#aggregating by "Gender" to sum and count
gender_analysis_df=pd.DataFrame({"Purchase Count":gender_analysis["Price"].count(),
                                "Total Purchase Value":gender_analysis["Price"].sum()})
#creating average purchase price column
gender_analysis_df["Average Purchase Price"]=gender_analysis_df["Total Purchase Value"]/gender_analysis_df["Purchase Count"]


In [59]:
#merging unique individual counts with transaction counts
gender_analysis_df=pd.merge(gender_counts,gender_analysis_df,on="Gender",how="left")

In [60]:
#creating per person average field
gender_analysis_df["Avg Total Purchase per Person"]=gender_analysis_df["Total Purchase Value"]/gender_analysis_df["Total Count"]

In [61]:
#formatting dollar values
gender_analysis_df["Total Purchase Value"]=gender_analysis_df["Total Purchase Value"].map('${:,.2f}'.format)
gender_analysis_df["Average Purchase Price"]=gender_analysis_df["Average Purchase Price"].map('${:,.2f}'.format)
gender_analysis_df["Avg Total Purchase per Person"]=gender_analysis_df["Avg Total Purchase per Person"].map('${:,.2f}'.format)


In [62]:
#Dropping "Percentage of Players"
gender_analysis_df=gender_analysis_df[["Purchase Count","Average Purchase Price","Total Purchase Value","Avg Total Purchase per Person"]]

In [63]:
gender_analysis_df

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person |
|---|---|---|---|---|
| Male | 652 | $3.02 | $1,967.64 | $4.07 |
| Female | 113 | $3.20 | $361.94 | $4.47 |
| Other / Non-Disclosed | 15 | $3.35 | $50.19 | $4.56 |
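The purchase count, total, average and unique player count can also be computed in one pass with named aggregation, which avoids the separate merge. The sketch below assumes pandas 0.25 or later and uses hypothetical data; the output column names (`purchase_count`, `players`, and so on) are illustrative.

```python
import pandas as pd

# Hypothetical purchases; "A1" buys twice.
sample = pd.DataFrame({
    "SN": ["A1", "B2", "A1", "C3", "D4"],
    "Gender": ["Male", "Female", "Male", "Male", "Female"],
    "Price": [2.00, 3.00, 4.00, 1.00, 5.00],
})

# One groupby produces transaction-level and player-level figures together.
by_gender = sample.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    total_value=("Price", "sum"),
    avg_price=("Price", "mean"),
    players=("SN", "nunique"),
)

# Per-person average divides by unique players, not by transactions.
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]

print(by_gender)
```

The distinction between the two averages is the divisor: `avg_price` divides by the number of purchases, `avg_total_per_person` by the number of distinct players.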


## Age Demographics

In [64]:
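#bin edges for the age groups
#note: pd.cut closes each bin on the right by default, so these edges give (0,10], (10,15], ...
#and an age of exactly 10 is labelled "<10"; passing right=False to pd.cut would match the labels
bins = [0, 10, 15, 20, 25, 30, 35, 40, 100]

#names for the bins
group_names = ["<10","10-14","15-19","20-24","25-29","30-34","35-39","40+"]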

In [65]:
#using groupby to get unique combinations of "SN" and "Age"
age_demo_df=purchase_data.groupby(["SN","Age"],as_index=False).count()
age_demo_df=age_demo_df[["SN","Age"]]
#binning age in age groups
age_demo_df["Age Group"]=pd.cut(age_demo_df["Age"],bins,labels=group_names)
age_demo_df=age_demo_df[["Age Group"]]
#a counting variable
age_demo_df["Total Count"]=1
#aggregating by grouping variable for age group counts
age_demo_df=age_demo_df.groupby("Age Group").sum()


In [66]:
#calculate percentage of each group and apply format
age_demo_df["Percentage of Players"]=age_demo_df["Total Count"]/player_count*100
age_demo_df["Percentage of Players"]=age_demo_df["Percentage of Players"].map('{:,.2f}'.format)

In [67]:
age_demo_df

| Age Group | Total Count | Percentage of Players |
|---|---|---|
| <10 | 24 | 4.17 |
| 10-14 | 41 | 7.12 |
| 15-19 | 150 | 26.04 |
| 20-24 | 232 | 40.28 |
| 25-29 | 59 | 10.24 |
| 30-34 | 37 | 6.42 |
| 35-39 | 26 | 4.51 |
| 40+ | 7 | 1.22 |
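Because `pd.cut` uses right-closed intervals by default, the edges `[0, 10, 15, ...]` put an age of exactly 10 under the "<10" label and an age of exactly 15 under "10-14". The sketch below compares the default with `right=False` on a few hypothetical boundary ages. Whether the real dataset contains such ages is not checked here.

```python
import pandas as pd

bins = [0, 10, 15, 20, 25, 30, 35, 40, 100]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical ages sitting on or next to bin edges.
ages = pd.Series([9, 10, 14, 15, 24, 25, 40])

# Default: intervals are (0,10], (10,15], ... so an edge value joins the lower bin.
right_closed = pd.cut(ages, bins, labels=group_names)

# right=False: intervals are [0,10), [10,15), ... which matches the label text.
left_closed = pd.cut(ages, bins, labels=group_names, right=False)

comparison = pd.DataFrame({
    "Age": ages,
    "right=True": right_closed.astype(str),
    "right=False": left_closed.astype(str),
})
print(comparison)
```

With `right=False` the last interval becomes `[40, 100)`, so an age of exactly 100 would fall outside every bin; a larger top edge would avoid that.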


## Purchasing Analysis (Age)

In [68]:
#selecting required fields
age_analysis_df=purchase_data.loc[:,["SN","Age","Price"]]
#counting variable
age_analysis_df["Purchase Count"]=1
#binning age into age groups
age_analysis_df["Age Group"]=pd.cut(age_analysis_df["Age"],bins,labels=group_names)
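#using groupby to get unique SN, Age Group combinations
#observed=True keeps only combinations that occur; some pandas versions otherwise add a zero-size row for every SN x Age Group pair
age_unique_analysis_df=age_analysis_df.groupby(["SN","Age Group"],observed=True).size().reset_index()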
#using groupby to get SN counts in each age group
age_unique_analysis_df=age_unique_analysis_df.groupby(["Age Group"]).count().reset_index()
#dropping column not needed
age_unique_analysis_df=age_unique_analysis_df.iloc[:,[0,1]]
#renaming field
age_unique_analysis_df=age_unique_analysis_df.rename(columns={"SN":"Player Count"})
#selecting needed columns ("SN" is left out so the sum below only touches numeric fields)
age_analysis_df=age_analysis_df[["Age Group","Price","Purchase Count"]]
#groupby sum
age_analysis_df=age_analysis_df.groupby(["Age Group"],as_index=False).sum()
#merging to unique player counts
age_analysis_df=pd.merge(age_analysis_df,age_unique_analysis_df,on="Age Group",how="left")

In [69]:
age_unique_analysis_df

|   | Age Group | Player Count |
|---|---|---|
| 0 | <10 | 24 |
| 1 | 10-14 | 41 |
| 2 | 15-19 | 150 |
| 3 | 20-24 | 232 |
| 4 | 25-29 | 59 |
| 5 | 30-34 | 37 |
| 6 | 35-39 | 26 |
| 7 | 40+ | 7 |


In [70]:
#calculating fields and formatting
age_analysis_df["Average Purchase Price"]=age_analysis_df["Price"]/age_analysis_df["Purchase Count"]
age_analysis_df["Average Total Purchase per Person"]=age_analysis_df["Price"]/age_analysis_df["Player Count"]
age_analysis_df["Average Purchase Price"]=age_analysis_df["Average Purchase Price"].map('${:,.2f}'.format)
age_analysis_df["Price"]=age_analysis_df["Price"].map('${:,.2f}'.format)
age_analysis_df["Average Total Purchase per Person"]=age_analysis_df["Average Total Purchase per Person"].map('${:,.2f}'.format)
#renaming field
age_analysis_df=age_analysis_df.rename(columns={"Price":"Total Purchase Value"})
#ordering columns
age_analysis_df=age_analysis_df.iloc[:,[0,2,4,1,5]]

In [71]:
#removing default index, and replacing with age group
age_analysis_df=age_analysis_df.set_index("Age Group")

In [72]:
age_analysis_df

| Age Group | Purchase Count | Average Purchase Price | Total Purchase Value | Average Total Purchase per Person |
|---|---|---|---|---|
| <10 | 32 | $3.40 | $108.96 | $4.54 |
| 10-14 | 54 | $2.90 | $156.60 | $3.82 |
| 15-19 | 200 | $3.11 | $621.56 | $4.14 |
| 20-24 | 325 | $3.02 | $981.64 | $4.23 |
| 25-29 | 77 | $2.88 | $221.42 | $3.75 |
| 30-34 | 52 | $2.99 | $155.71 | $4.21 |
| 35-39 | 33 | $3.40 | $112.35 | $4.32 |
| 40+ | 7 | $3.08 | $21.53 | $3.08 |
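The age-group figures can also be sketched as a single grouped aggregation, using `nunique` for the player count instead of a second groupby and a merge. The data below is hypothetical, the column names are illustrative, and `observed=True` is passed explicitly so that empty age groups are left out regardless of the pandas version's default.

```python
import pandas as pd

bins = [0, 10, 15, 20, 25, 30, 35, 40, 100]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical purchases by three players.
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Age": [22, 22, 17, 31, 31, 31],
    "Price": [1.00, 3.00, 2.00, 4.00, 4.00, 1.00],
})
sample["Age Group"] = pd.cut(sample["Age"], bins, labels=group_names)

# Transactions, revenue and distinct players per age group in one pass.
by_age = sample.groupby("Age Group", observed=True).agg(
    purchase_count=("Price", "count"),
    total_value=("Price", "sum"),
    player_count=("SN", "nunique"),
)
by_age["avg_price"] = by_age["total_value"] / by_age["purchase_count"]
by_age["avg_total_per_person"] = by_age["total_value"] / by_age["player_count"]

print(by_age)
```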


## Top Spenders

In [73]:
#Selecting required fields
top_spenders_df=purchase_data.loc[:,["SN","Price"]]
#counting variable
top_spenders_df["Purchase Count"]=1
#groupby sum to get transaction counts, and transaction totals per individual
top_spenders_df=top_spenders_df.groupby("SN").sum()
#calculating new field, renaming field, sorting, ordering columns
top_spenders_df["Average Purchase Price"]=top_spenders_df["Price"]/top_spenders_df["Purchase Count"]
top_spenders_df=top_spenders_df.rename(columns={"Price":"Total Purchase Value"})
top_spenders_df=top_spenders_df.sort_values("Total Purchase Value",ascending = False)
top_spenders_df=top_spenders_df.iloc[:,[1,2,0]]
top_spenders_df["Total Purchase Value"]=top_spenders_df["Total Purchase Value"].map('${:,.2f}'.format)
top_spenders_df["Average Purchase Price"]=top_spenders_df["Average Purchase Price"].map('${:,.2f}'.format)

In [74]:
top_spenders_df.head()

| SN | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | $3.79 | $18.96 |
| Idastidru52 | 4 | $3.86 | $15.45 |
| Chamjask73 | 3 | $4.61 | $13.83 |
| Iral74 | 4 | $3.40 | $13.62 |
| Iskadarya95 | 3 | $4.37 | $13.10 |
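A shorter route to the same kind of ranking is to aggregate `Price` with several functions at once and then take the top rows with `nlargest`. The sketch below uses hypothetical players and prices and leaves the values numeric.

```python
import pandas as pd

# Hypothetical purchases by three players.
sample = pd.DataFrame({
    "SN": ["A1", "B2", "A1", "C3", "B2", "A1"],
    "Price": [2.00, 5.00, 3.00, 1.00, 4.00, 1.00],
})

# count, mean and sum of Price per player in one call.
spenders = sample.groupby("SN")["Price"].agg(["count", "mean", "sum"])
spenders.columns = ["Purchase Count", "Average Purchase Price", "Total Purchase Value"]

# nlargest sorts on the numeric total and keeps the top n rows.
top = spenders.nlargest(2, "Total Purchase Value")

print(top)
```

Ranking before formatting matters: once totals are strings such as "$9.00", sorting would compare text rather than amounts.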


## Most Popular Items

In [78]:
#selecting required fields
popular_df=purchase_data.loc[:,["Item ID","Item Name","Price"]]

In [79]:
#transaction count per item
purchase_count=popular_df.groupby(["Item ID","Item Name"]).count().rename(columns={"Price":"Purchase Count"})
#picking mean price as the price of an item
item_price=popular_df.groupby(["Item ID","Item Name"]).mean().rename(columns={"Price":"Item Price"})
#total purchase dollars per item
total_purchase_value=popular_df.groupby(["Item ID","Item Name"]).sum().rename(columns={"Price":"Total Purchase Value"})

In [80]:
#merging three dataframe into one
popular_df=pd.merge(purchase_count,item_price,on=(["Item ID","Item Name"]),how="left")
popular_df=pd.merge(popular_df,total_purchase_value,on=(["Item ID","Item Name"]),how="left")
#sorting by purchase counts
popular_df=popular_df.sort_values("Purchase Count",ascending=False)
#formatting dollar figures
popular_df["Total Purchase Value"]=popular_df["Total Purchase Value"].map('${:,.2f}'.format)
popular_df["Item Price"]=popular_df["Item Price"].map('${:,.2f}'.format)


In [81]:
popular_df.head()

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | $3.53 | $31.77 |
| 82 | Nirvana | 9 | $4.90 | $44.10 |
| 19 | Pursuit, Cudgel of Necromancy | 8 | $1.02 | $8.16 |
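The three groupbys and two merges above can be collapsed into one aggregation. The sketch below shows the idea on hypothetical items; the item names and prices are made up. The one numeric table can then be sorted either by purchase count or by total value.

```python
import pandas as pd

# Hypothetical purchases of three items.
sample = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 1, 2],
    "Item Name": ["Sword", "Axe", "Sword", "Bow", "Sword", "Axe"],
    "Price": [2.50, 4.00, 2.50, 1.00, 2.50, 4.00],
})

# count, mean and sum of Price per item in a single groupby.
items = sample.groupby(["Item ID", "Item Name"])["Price"].agg(["count", "mean", "sum"])
items.columns = ["Purchase Count", "Item Price", "Total Purchase Value"]

# The same numeric table supports both rankings.
most_popular = items.sort_values("Purchase Count", ascending=False)
most_profitable = items.sort_values("Total Purchase Value", ascending=False)

print(most_popular.head())
print(most_profitable.head())
```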


## Most Profitable Items

In [82]:
#combining 3 dataframes into one by merging
profitable_df=pd.merge(purchase_count,item_price,on=(["Item ID","Item Name"]),how="left")
profitable_df=pd.merge(profitable_df,total_purchase_value,on=(["Item ID","Item Name"]),how="left")
#sorting by total purchase value
profitable_df=profitable_df.sort_values("Total Purchase Value",ascending=False)
#formatting dollar figures
profitable_df["Total Purchase Value"]=profitable_df["Total Purchase Value"].map('${:,.2f}'.format)
profitable_df["Item Price"]=profitable_df["Item Price"].map('${:,.2f}'.format)
profitable_df.head()

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 82 | Nirvana | 9 | $4.90 | $44.10 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 92 | Final Critic | 8 | $4.88 | $39.04 |
| 103 | Singed Scalpel | 8 | $4.35 | $34.80 |
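The item tables use the mean of `Price` as the item price, which is only exact if every purchase of an item was made at the same price. A quick way to test that assumption is to count distinct prices per item, as sketched here on hypothetical data where one item deliberately has two prices.

```python
import pandas as pd

# Hypothetical purchases; "Axe" was sold at two different prices.
sample = pd.DataFrame({
    "Item ID": [1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Axe", "Axe", "Bow"],
    "Price": [2.50, 2.50, 4.00, 4.20, 1.00],
})

# Number of distinct prices recorded for each item.
distinct_prices = sample.groupby(["Item ID", "Item Name"])["Price"].nunique()

# Items with more than one price, where the mean is only an approximation.
inconsistent = distinct_prices[distinct_prices > 1]

print(inconsistent)
```

An empty result on the real data would confirm that the mean equals the listed price for every item.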
