<h1>Heroes of Pymoli Data Analysis</h1>

In [4]:
#dependencies
import pandas as pd

#load csv file
file_to_load = "Resources/purchase_data.csv"

#csv data to dataframe
purchase_data = pd.read_csv(file_to_load)

<h3>Player Count</h3>

In [5]:
#finds unique player id
totalplayers = purchase_data['SN'].nunique()

#makes dictionary of player number
playerdic = {'Total Players': totalplayers}

#outputs player number as dataframe
pd.DataFrame(playerdic, index =[0])

,Total Players
0,576


<h3>Purchasing Analysis (Total)</h3>

In [6]:
#creates summary dictionary from purchase_data
summary = {}
summary['Number of Unique Items'] = purchase_data['Item Name'].nunique()
summary['Average Price'] = purchase_data['Price'].mean()
summary['Total Number of Purchases'] = purchase_data['Purchase ID'].nunique()
summary['Total Revenue'] = purchase_data['Price'].sum()

In [7]:
#converts summary dictionary to dataframe
summarydf = pd.DataFrame(summary, index = [0])

#displays formatted dataframe 
summarydf.style.format({'Average Price':'${:.2f}', 'Total Revenue':'${:.2f}'})

,Number of Unique Items,Average Price,Total Number of Purchases,Total Revenue
0,179,$3.05,780,$2379.77


<h3>Gender Demographics</h3>

In [17]:
#groups by gender and counts unique players
gendernumber = purchase_data.groupby('Gender')['SN'].nunique()

#finds percentage of players per gender category
percent = gendernumber / totalplayers 

#creates dictionary of gender statistics
genderdic = {'Total Count': gendernumber, 'Percentage of Players': percent }

#converts to dataframe
df = pd.DataFrame(genderdic).sort_values("Total Count", ascending = False)

#displays and formats dataframe
df.style.format({'Percentage of Players':'{:.2%}'})

Gender,Total Count,Percentage of Players
Male,484,84.03%
Female,81,14.06%
Other / Non-Disclosed,11,1.91%


<h3>Purchasing Analysis (Gender)</h3>

In [18]:
#groups by gender to get purchase statistics
purchasecount = purchase_data.groupby('Gender')['Purchase ID'].count()
purchaseave = purchase_data.groupby('Gender')['Price'].mean()
purchasetotal = purchase_data.groupby('Gender')['Price'].sum()
perplayer = purchasetotal/gendernumber

#creates dictionary of gender purchase statistics
purchasedic = {'Purchase Count': purchasecount, 'Average Price': purchaseave, 'Total Price': purchasetotal, 'Per Player': perplayer}

#converts dictionary to dataframe
pdf = pd.DataFrame(purchasedic).sort_values("Purchase Count", ascending = False)

#displays formatted dataframe
pdf.style.format({'Average Price': '${:.2f}', 'Total Price': '${:.2f}','Per Player': '${:.2f}'})

Gender,Purchase Count,Average Price,Total Price,Per Player
Male,652,$3.02,$1967.64,$4.07
Female,113,$3.20,$361.94,$4.47
Other / Non-Disclosed,15,$3.35,$50.19,$4.56


<h3>Age Demographics</h3>

In [10]:
#list of age bins
bins = [0, 9, 14, 19, 24, 29, 34, 39, 100]

#list of bin names
range_names = ["<10", "10-14", "15-19", "20-24", "25-29","30-34","35-39","40+" ]

#appends bins to purchase_data
purchase_data["Age Range"] = pd.cut(purchase_data["Age"], bins, labels = range_names, include_lowest = True)

#groups by bins to obtain age group statistics
age = purchase_data.groupby('Age Range')['SN'].nunique()
percentage_age = age/totalplayers
agedic = {'Total Count': age, 'Percentage of Players': percentage_age}
age_summary = pd.DataFrame(agedic)
age_summary.style.format({'Percentage of Players':'{:.2%}'})

Age Range,Total Count,Percentage of Players
<10,17,2.95%
10-14,22,3.82%
15-19,107,18.58%
20-24,258,44.79%
25-29,77,13.37%
30-34,52,9.03%
35-39,31,5.38%
40+,12,2.08%


<h3>Purchasing Analysis (Age)</h3>

In [12]:
#groups by bins to obtain age group purchasing statistics
purchaseagecount = purchase_data.groupby('Age Range')['Purchase ID'].count()
purchaseageave = purchase_data.groupby('Age Range')['Price'].mean()
purchaseagetotal = purchase_data.groupby('Age Range')['Price'].sum()
perageplayer = purchaseagetotal/age

#creates dictionary of age purchase statistics
purchaseagedic = {'Purchase Count': purchaseagecount, 'Average Purchase Price': purchaseageave, 'Total Purchase Value': purchaseagetotal, 'Avg Total Purchase per Player': perageplayer}

#converts dictionary to dataframe
adf = pd.DataFrame(purchaseagedic)

#displays formatted dataframe
adf.style.format({'Average Purchase Price': '${:.2f}', 'Total Purchase Value': '${:.2f}','Avg Total Purchase per Player': '${:.2f}'})

Age Range,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Player
<10,23,$3.35,$77.13,$4.54
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,$1114.06,$4.32
25-29,101,$2.90,$293.00,$3.81
30-34,73,$2.93,$214.00,$4.12
35-39,41,$3.60,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19


<h3>Top Spenders</h3>

In [13]:
#groups by player id to obtain player purchasing statistics
tsmean = purchase_data.groupby('SN')['Price'].mean()
tstotal = purchase_data.groupby('SN')['Purchase ID'].count()
tssum = purchase_data.groupby('SN')['Price'].sum()

#creates dictionary of player purchase statistics
tsdic = {'Purchase Count': tstotal, 'Average Purchase Price': tsmean, 'Total Purchase Value': tssum}

#converts dictionary to dataframe, sorts by descending total purchase value, and keeps the top five entries
topdf = pd.DataFrame(tsdic).sort_values("Total Purchase Value", ascending = False).head()

#displays formatted dataframe
topdf.style.format({'Average Purchase Price': '${:.2f}', 'Total Purchase Value': '${:.2f}'})


SN,Purchase Count,Average Purchase Price,Total Purchase Value
Lisosia93,5,$3.79,$18.96
Idastidru52,4,$3.86,$15.45
Chamjask73,3,$4.61,$13.83
Iral74,4,$3.40,$13.62
Iskadarya95,3,$4.37,$13.10


<h3>Most Popular Items</h3>

In [14]:
#groups by item id and name to obtain item purchasing statistics
topitemsmean = purchase_data.groupby(['Item ID','Item Name'])['Price'].mean()
topitemscount = purchase_data.groupby(['Item ID','Item Name'])['Price'].count()
topitemstotal = purchase_data.groupby(['Item ID','Item Name'])['Price'].sum()

#creates dictionary of item purchase statistics
topitemsdic = {'Purchase Count': topitemscount, 'Item Price': topitemsmean, 'Total Purchase Value': topitemstotal}

#converts dictionary to dataframe, sorts by descending purchase count, and keeps the top five entries
topidf = pd.DataFrame(topitemsdic).sort_values("Purchase Count", ascending = False).head()

#displays formatted dataframe
topidf.style.format({'Item Price': '${:.2f}', 'Total Purchase Value': '${:.2f}'})

Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
92,Final Critic,13,$4.61,$59.99
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
145,Fiery Glass Crusader,9,$4.58,$41.22
132,Persuasion,9,$3.22,$28.99
108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,$31.77


<h3>Most Profitable Items</h3>

In [15]:
#converts dictionary to dataframe, sorts by descending total purchase value, and keeps the top five entries
toppf = pd.DataFrame(topitemsdic).sort_values("Total Purchase Value", ascending = False).head()

#displays formatted dataframe
toppf.style.format({'Item Price': '${:.2f}', 'Total Purchase Value': '${:.2f}'})

Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
92,Final Critic,13,$4.61,$59.99
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
82,Nirvana,9,$4.90,$44.10
145,Fiery Glass Crusader,9,$4.58,$41.22
103,Singed Scalpel,8,$4.35,$34.80


<h1>Data Observations</h1>
<br>

<h3>1. Age Demographics</h3><br>
Based on the age demographics, the 20-24 age group contains the highest percentage of players, almost 45% of the total. This group also has the third-highest average total purchase per player, after the 35-39 and under-10 groups. The total value of purchases made by this group is \$1114.06, almost 47% of total purchase value, which makes it a valuable demographic to target when marketing the game and in-game purchases. The average purchase price for this group was \$3.05, and the average total spend was only \$4.32 per player (365 purchases across 258 players, or about 1.4 purchases each), which shows that a typical player in this demographic currently makes only one or two purchases.
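
The shares quoted above can be rechecked with a few lines of arithmetic. This is a minimal sketch that hard-codes figures copied from the Age Demographics and Purchasing Analysis (Age) tables rather than reading `purchase_data`; the variable names are illustrative and do not appear in the notebook cells.

```python
# Figures copied from the output tables above (not recomputed from the CSV).
total_players = 576
total_revenue = 2379.77

group_players = 258      # unique players aged 20-24
group_purchases = 365    # purchases made by players aged 20-24
group_revenue = 1114.06  # total purchase value for players aged 20-24

# Share of the player base and of total revenue held by the 20-24 group.
player_share = group_players / total_players
revenue_share = group_revenue / total_revenue

# Average number of purchases and average spend per player in the group.
purchases_per_player = group_purchases / group_players
spend_per_player = group_revenue / group_players

print(f"Share of players:     {player_share:.2%}")
print(f"Share of revenue:     {revenue_share:.2%}")
print(f"Purchases per player: {purchases_per_player:.2f}")
print(f"Spend per player:     ${spend_per_player:.2f}")
```

Dividing the purchase count by the number of unique players is what separates "average price per purchase" from "average spend per player"; the two differ whenever some players buy more than once.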

<h3>2. Player Purchasing Statistics</h3><br>
Based on analysis of the top-purchasing players, no single player is disproportionately skewing the purchase data with a large number of purchases. Among the top 5 spenders, the maximum was 5 purchases, with a total value of \$18.96. This suggests that the purchasing statistics derived from this data are not dominated by outliers. It also indicates that the game's total revenue is not heavily reliant on a small group of people, so a loss of interest from the top spenders would not heavily impact revenue. However, it could also mean that players are not invested enough in the game to make repeated purchases.
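
One way to put a number on how little revenue depends on the top spenders is to compare their combined spend with total revenue. The sketch below hard-codes the five totals from the Top Spenders table and the overall figures from the summary tables; the variable names are illustrative.

```python
# Total Purchase Value per player, copied from the Top Spenders table above.
top_spenders = {
    "Lisosia93": 18.96,
    "Idastidru52": 15.45,
    "Chamjask73": 13.83,
    "Iral74": 13.62,
    "Iskadarya95": 13.10,
}
total_revenue = 2379.77  # from the Purchasing Analysis (Total) table
total_players = 576      # from the Player Count table

# Combined spend of the top five and their share of revenue and of players.
top5_revenue = sum(top_spenders.values())
top5_revenue_share = top5_revenue / total_revenue
top5_player_share = len(top_spenders) / total_players

print(f"Top 5 combined spend: ${top5_revenue:.2f}")
print(f"Share of revenue:     {top5_revenue_share:.2%}")
print(f"Share of players:     {top5_player_share:.2%}")
```

A revenue share far above the player share would point to dependence on a few heavy spenders; here both shares are small single-digit percentages.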

<h3>3. Item Purchasing Statistics</h3><br>
The average item price was \$3.05, and the top 5 most purchased items were all priced above the average, indicating that players were attracted to and willing to purchase the higher-priced items. However, no single item made up the bulk of purchases: the top item was purchased only 13 times. This could also indicate that the majority of players perceived the items to be priced too high for their in-game value. The top 10 most purchased items were all over \$2, which also suggests that cheaper items were not made desirable enough for players to purchase. Another factor could be that the purchased items appear to be specifically named weapons, which probably require a single purchase per player and remain a permanent feature for the player.
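
The claim that the five most purchased items are all priced above the average can be checked directly. This sketch hard-codes the prices from the Most Popular Items table and the average price from the summary table; the names used are illustrative.

```python
# Average price from the Purchasing Analysis (Total) table above.
average_price = 3.05

# Item prices copied from the Most Popular Items table above.
popular_item_prices = {
    "Final Critic": 4.61,
    "Oathbreaker, Last Hope of the Breaking Storm": 4.23,
    "Fiery Glass Crusader": 4.58,
    "Persuasion": 3.22,
    "Extraction, Quickblade Of Trembling Hands": 3.53,
}

# True only if every one of the listed items costs more than the average.
all_above_average = all(
    price > average_price for price in popular_item_prices.values()
)
cheapest_popular_price = min(popular_item_prices.values())

print(f"All above average: {all_above_average}")
print(f"Cheapest of the five: ${cheapest_popular_price:.2f}")
```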


<h2>Summary</h2><br>
In conclusion, it may be more profitable for the game to set a lower price on items that the 20-24 age demographic perceives as desirable, to elicit more frequent purchases, and to make these items non-permanent features that deliver a short-term benefit for players.
