### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [26]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [80]:
groupeddata = purchase_data.groupby(["SN"])
playercount = len(groupeddata)
# print(f"There are {playercount} unique players")
pc_df = pd.DataFrame(data=[playercount],columns=["Total Amount of Unique Players"])
pc_df

|   | Total Amount of Unique Players |
|---|---|
| 0 | 576 |
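
A more direct way to get the same kind of count is `Series.nunique()`. The sketch below runs on a small made-up frame; the `toy` data is hypothetical and not taken from `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical toy purchases: three rows, two distinct screen names
toy = pd.DataFrame({"SN": ["Ann1", "Bob2", "Ann1"], "Price": [1.0, 2.0, 3.0]})

# nunique() counts distinct values, so repeat buyers are counted once
player_count = toy["SN"].nunique()
summary = pd.DataFrame({"Total Amount of Unique Players": [player_count]})
print(summary)
```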


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [82]:
uniqueitems = purchase_data["Item ID"].unique()
averageprice = purchase_data["Price"]
numpurchases = purchase_data.groupby(["Purchase ID"])
totalrevenue = purchase_data["Price"]
PurchaseSum = pd.DataFrame({
    "Total Unique Items" :[len(uniqueitems)],
    "Average Purchase Price" :[round(averageprice.mean(),2)],
    "Total Number of Purchases" :[len(numpurchases)],
    "Total Revenue" :[sum(totalrevenue)]
})
PurchaseSum = PurchaseSum.style.format({ "Average Purchase Price":'${:,.2f}',"Total Revenue":'${:,.2f}'})
PurchaseSum


|   | Total Unique Items | Average Purchase Price | Total Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 183 | $3.05 | 780 | $2,379.77 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [83]:
genderdata = groupeddata['Gender'].max().value_counts()

malecount = genderdata['Male']
malepercent = (malecount/playercount)*100

femcount = genderdata['Female']
fempercent = (femcount/playercount)*100

othcount = genderdata['Other / Non-Disclosed']
othpercent = (othcount/playercount)*100

genderdf = pd.DataFrame({
    "Gender":['Male','Female','Other/Non-Disclosed'],
    "Total Count of Gender" :[malecount,femcount,othcount],
    "Percent" :[malepercent,fempercent,othpercent],
})

genderdf = genderdf.set_index('Gender')
genderdf = genderdf.style.format({'Percent': '{:,.2f}%'})
genderdf

| Gender | Total Count of Gender | Percent |
|---|---|---|
| Male | 484 | 84.03% |
| Female | 81 | 14.06% |
| Other/Non-Disclosed | 11 | 1.91% |
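
The same counts and percentages can be produced without per-gender variables by deduplicating players and calling `value_counts`. This is a minimal sketch on hypothetical toy data, not the notebook's actual cell.

```python
import pandas as pd

# Hypothetical toy purchases; "Ann1" buys twice
toy = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cy3", "Dee4"],
    "Gender": ["Female", "Male", "Female", "Male", "Male"],
})

# Keep one row per player so repeat buyers do not inflate the counts
players = toy.drop_duplicates("SN")

counts = players["Gender"].value_counts()
percents = players["Gender"].value_counts(normalize=True) * 100

# Both Series are indexed by gender, so they line up when combined
gender_summary = pd.DataFrame({
    "Total Count of Gender": counts,
    "Percent": percents.round(2),
})
print(gender_summary)
```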



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [30]:
maleonlydf = purchase_data[purchase_data['Gender']=='Male']
femonlydf = purchase_data[purchase_data['Gender']=='Female']
otheronlydf = purchase_data[purchase_data['Gender']=='Other / Non-Disclosed']

#male demographics
malepurchasecount = maleonlydf['Purchase ID'].count()
maleavgprice = maleonlydf['Price'].mean()
maletotalrev = maleonlydf['Price'].sum()
maleperperson = (maletotalrev/malecount)

#female demographics
femepurchasecount = femonlydf['Purchase ID'].count()
femavgprice = femonlydf['Price'].mean()
femtotalrev = femonlydf['Price'].sum()
femperperson = (femtotalrev/femcount)

#other demographics
otherpurchasecount = otheronlydf['Purchase ID'].count()
otheravgprice = otheronlydf['Price'].mean()
othertotalrev = otheronlydf['Price'].sum()
otherperperson = (othertotalrev/othcount)


genderdemodf = pd.DataFrame({
    "Gender":['Male','Female','Other/Non-Disclosed'],
    "Total Purchases":[malepurchasecount,femepurchasecount,otherpurchasecount],
    "Average Price":[maleavgprice,femavgprice,otheravgprice],
    "Total Revenue":[maletotalrev,femtotalrev,othertotalrev],
    # Total revenue divided by the number of unique players of that gender
    "Avg Total Purchase per Person":[maleperperson,femperperson,otherperperson]

 })
genderdemodf = genderdemodf.set_index('Gender')
genderdemodf = genderdemodf.style.format({'Average Price': '${:,.2f}','Total Revenue': '${:,.2f}','Avg Total Purchase per Person': '${:,.2f}'})
genderdemodf


| Gender | Total Purchases | Average Price | Total Revenue | Avg Total Purchase per Person |
|---|---|---|---|---|
| Male | 652 | $3.02 | $1,967.64 | $4.07 |
| Female | 113 | $3.20 | $361.94 | $4.47 |
| Other/Non-Disclosed | 15 | $3.35 | $50.19 | $4.56 |
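
The three per-gender blocks can also be written as a single grouped aggregation. The sketch below uses hypothetical toy data and snake_case column names chosen for illustration; the per-person figure divides revenue by the number of unique players in each group.

```python
import pandas as pd

# Hypothetical toy purchases; "Ann1" buys twice
toy = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cy3"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Price": [2.0, 1.0, 4.0, 3.0],
})

# Named aggregation: one output column per (source column, function) pair
by_gender = toy.groupby("Gender").agg(
    total_purchases=("Price", "count"),
    average_price=("Price", "mean"),
    total_revenue=("Price", "sum"),
    players=("SN", "nunique"),
)

# Average total spend per unique player in each group
by_gender["avg_total_per_person"] = by_gender["total_revenue"] / by_gender["players"]
print(by_gender)
```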


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [31]:
# One row per player: keep the first purchase row for each screen name (SN)
datatobin = pd.DataFrame([purchase_data['SN'],purchase_data['Age']]).transpose()
datatobin.drop_duplicates('SN', keep='first', inplace=True)

# Bin edges: [0, 10, 15, 20, 25, 30, 35, 40, 200]
binrange = [10 + (5*i) for i in range(7)]
bins = [0]+binrange+[200]
labels = ['<10','10-14','15-19','20-24','25-29','30-34','35-39','+40']
# NOTE: pd.cut defaults to right=True, so each bin is (lower, upper].
# A 10-year-old therefore lands in '<10' and a 15-year-old in '10-14'.
# Passing right=False would make the bins match their labels, but would
# likely change the counts shown below.
whatever = pd.cut(datatobin['Age'], bins = bins, labels = labels)

# binnedpd is the same object as purchase_data, so this adds a column to it.
# Assignment aligns on the index: only each player's first purchase row gets
# a bin label and the remaining rows get NaN, which is why the grouped count
# below is a count of unique players.
binnedpd = purchase_data
binnedpd['whatever'] = whatever

groupedbin = binnedpd.groupby(["whatever"])
binnedcount = groupedbin['SN'].count()
agepercent = (binnedcount / playercount)*100
agedf1 = pd.DataFrame({"Total Count":binnedcount,
                      "Percentage of Players":agepercent})
agedf1["Percentage of Players"] = agedf1["Percentage of Players"].map("{:.2f}%".format)
agedf1



| whatever | Total Count | Percentage of Players |
|---|---|---|
| <10 | 24 | 4.17% |
| 10-14 | 41 | 7.12% |
| 15-19 | 150 | 26.04% |
| 20-24 | 232 | 40.28% |
| 25-29 | 59 | 10.24% |
| 30-34 | 37 | 6.42% |
| 35-39 | 26 | 4.51% |
| +40 | 7 | 1.22% |
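
`pd.cut` treats bin edges as right-inclusive by default, so with edges at 10, 15, ... a 10-year-old falls in the bin labelled `<10`. The sketch below compares the default with `right=False` on a handful of made-up ages; the `ages` values are hypothetical and chosen to sit on the bin edges.

```python
import pandas as pd

# Hypothetical ages, chosen to sit on or next to the bin edges
ages = pd.Series([9, 10, 14, 15, 40, 45])
bins = [0, 10, 15, 20, 25, 30, 35, 40, 200]
labels = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '+40']

# Default right=True: each bin is (lower, upper]
right_closed = pd.cut(ages, bins=bins, labels=labels)
# right=False: each bin is [lower, upper), which matches the labels
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

comparison = pd.DataFrame({
    "Age": ages,
    "right=True": right_closed,
    "right=False": left_closed,
})
print(comparison)
```

With `right=False`, ages 10, 15, and 40 move up into `10-14`, `15-19`, and `+40` respectively.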


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [32]:
binrange = [10 + (5*i) for i in range(7)]
bins = [0]+binrange+[200]
labels = ['<10','10-14','15-19','20-24','25-29','30-34','35-39','+40']
PurchaseAnalysis = pd.cut(purchase_data['Age'], bins = bins, labels = labels)
binnedPA = purchase_data
binnedPA['PurchaseAnalysis'] = PurchaseAnalysis

lessthan10df = purchase_data[purchase_data['PurchaseAnalysis']=='<10']
tentofourteendf = purchase_data[purchase_data['PurchaseAnalysis']=='10-14']
fifteentonineteendf = purchase_data[purchase_data['PurchaseAnalysis']=='15-19']
twentytotwentyfourdf = purchase_data[purchase_data['PurchaseAnalysis']=='20-24']
twentyfivetotwentyninedf = purchase_data[purchase_data['PurchaseAnalysis']=='25-29']
thirtytothirtyfourdf = purchase_data[purchase_data['PurchaseAnalysis']=='30-34']
thirtyfivetothirtyninedf = purchase_data[purchase_data['PurchaseAnalysis']=='35-39']
overfortydf = purchase_data[purchase_data['PurchaseAnalysis']=='+40']

lessthan10pc = lessthan10df['Purchase ID'].count()
lessthan10avp = lessthan10df['Price'].mean()
lessthan10tr = lessthan10df['Price'].sum()
lessthan10pp = (lessthan10tr/lessthan10df['whatever'].count())

tentofourteenpc = tentofourteendf['Purchase ID'].count()
tentofourteenavp = tentofourteendf['Price'].mean()
tentofourteentr = tentofourteendf['Price'].sum()
tentofourteenpp = (tentofourteentr/tentofourteendf['whatever'].count())

# fifteentonineteendf
fifteentonineteenpc = fifteentonineteendf['Purchase ID'].count()
fifteentonineteenavp = fifteentonineteendf['Price'].mean()
fifteentonineteentr = fifteentonineteendf['Price'].sum()
fifteentonineteenpp = (fifteentonineteentr/fifteentonineteendf['whatever'].count())

# twentytotwentyfourdf
twentytotwentyfourpc = twentytotwentyfourdf['Purchase ID'].count()
twentytotwentyfouravp = twentytotwentyfourdf['Price'].mean()
twentytotwentyfourtr = twentytotwentyfourdf['Price'].sum()
twentytotwentyfourpp = (twentytotwentyfourtr/twentytotwentyfourdf['whatever'].count())

# twentyfivetotwentyninedf
twentyfivetotwentyninepc = twentyfivetotwentyninedf['Purchase ID'].count()
twentyfivetotwentynineavp = twentyfivetotwentyninedf['Price'].mean()
twentyfivetotwentyninetr = twentyfivetotwentyninedf['Price'].sum()
twentyfivetotwentyninepp = (twentyfivetotwentyninetr/twentyfivetotwentyninedf['whatever'].count())

# thirtytothirtyfourdf
thirtytothirtyfourpc = thirtytothirtyfourdf['Purchase ID'].count()
thirtytothirtyfouravp = thirtytothirtyfourdf['Price'].mean()
thirtytothirtyfourtr = thirtytothirtyfourdf['Price'].sum()
thirtytothirtyfourpp = (thirtytothirtyfourtr/thirtytothirtyfourdf['whatever'].count())

# thirtyfivetothirtyninedf
thirtyfivetothirtyninepc = thirtyfivetothirtyninedf['Purchase ID'].count()
thirtyfivetothirtynineavp = thirtyfivetothirtyninedf['Price'].mean()
thirtyfivetothirtyninetr = thirtyfivetothirtyninedf['Price'].sum()
thirtyfivetothirtyninepp = (thirtyfivetothirtyninetr/thirtyfivetothirtyninedf['whatever'].count())

# overfortydf
overfortypc = overfortydf['Purchase ID'].count()
overfortyavp = overfortydf['Price'].mean()
overfortytr = overfortydf['Price'].sum()
# Use the +40 group's own revenue (previously this divided the <10 revenue)
overfortypp = (overfortytr/overfortydf['whatever'].count())


PADF = pd.DataFrame({
    "Age Range":['<10','10-14','15-19','20-24','25-29','30-34','35-39','+40'],
    "Total Purchases":[lessthan10pc,tentofourteenpc,fifteentonineteenpc,twentytotwentyfourpc,twentyfivetotwentyninepc,thirtytothirtyfourpc,thirtyfivetothirtyninepc,overfortypc],
    "Average Price":[lessthan10avp,tentofourteenavp,fifteentonineteenavp,twentytotwentyfouravp,twentyfivetotwentynineavp,thirtytothirtyfouravp,thirtyfivetothirtynineavp,overfortyavp],
    "Total Revenue":[lessthan10tr,tentofourteentr,fifteentonineteentr,twentytotwentyfourtr,twentyfivetotwentyninetr,thirtytothirtyfourtr,thirtyfivetothirtyninetr,overfortytr],
    # Total revenue divided by the number of unique players in the age range
    "Avg Total Purchase per Person":[lessthan10pp,tentofourteenpp,fifteentonineteenpp,twentytotwentyfourpp,twentyfivetotwentyninepp,thirtytothirtyfourpp,thirtyfivetothirtyninepp,overfortypp]

 })
PADF = PADF.set_index('Age Range')
PADF = PADF.style.format({'Average Price': '${:,.2f}','Total Revenue': '${:,.2f}','Avg Total Purchase per Person': '${:,.2f}'})
PADF


| Age Range | Total Purchases | Average Price | Total Revenue | Avg Total Purchase per Person |
|---|---|---|---|---|
| <10 | 32 | $3.40 | $108.96 | $4.54 |
| 10-14 | 54 | $2.90 | $156.60 | $3.82 |
| 15-19 | 200 | $3.11 | $621.56 | $4.14 |
| 20-24 | 325 | $3.02 | $981.64 | $4.23 |
| 25-29 | 77 | $2.88 | $221.42 | $3.75 |
| 30-34 | 52 | $2.99 | $155.71 | $4.21 |
| 35-39 | 33 | $3.40 | $112.35 | $4.32 |
| +40 | 7 | $3.08 | $21.53 | $3.08 |
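
Writing one block per age range makes copy-paste slips easy. As a hedged alternative, the sketch below bins a hypothetical toy frame and computes every range in one grouped aggregation. Using `right=False` here is an assumption made so that each label matches its bin; it is not what the cell above does.

```python
import pandas as pd

# Hypothetical toy purchases; "Ann1" buys twice
toy = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cy3"],
    "Age": [9, 22, 9, 41],
    "Price": [2.0, 1.0, 4.0, 3.0],
})
bins = [0, 10, 15, 20, 25, 30, 35, 40, 200]
labels = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '+40']

# Assumption: left-closed bins, so '<10' really means ages below 10
toy["Age Range"] = pd.cut(toy["Age"], bins=bins, labels=labels, right=False)

# observed=True keeps only the age ranges that actually occur in the data
by_age = toy.groupby("Age Range", observed=True).agg(
    total_purchases=("Price", "count"),
    average_price=("Price", "mean"),
    total_revenue=("Price", "sum"),
    players=("SN", "nunique"),
)
by_age["avg_total_per_person"] = by_age["total_revenue"] / by_age["players"]
print(by_age)
```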


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [33]:
# Select Price before aggregating so non-numeric columns are never summed or
# averaged (groupby(...).mean() can raise a TypeError on text columns in pandas 2.x)
pricebyplayer = purchase_data.groupby("SN")["Price"]

topspendertotal = pricebyplayer.sum()
topspendercount = pricebyplayer.count()
topspenderavg = pricebyplayer.mean()


topspenderchart = pd.DataFrame({
     "Purchase Count":topspendercount,
     "Average Purchase Price":topspenderavg,
     "Total Purchase Value":topspendertotal
})
topspenderchart = topspenderchart.reset_index()
topspenderchart.drop_duplicates("SN",inplace=True)
topspenderchart.sort_values("Total Purchase Value",ascending=False,inplace=True)
topspenderchart = topspenderchart.set_index('SN')
topspenderchart = topspenderchart.head()
topspenderchart = topspenderchart.style.format({"Average Purchase Price": '${:,.2f}',"Total Purchase Value": '${:,.2f}'})
topspenderchart



| SN | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | $3.79 | $18.96 |
| Idastidru52 | 4 | $3.86 | $15.45 |
| Chamjask73 | 3 | $4.61 | $13.83 |
| Iral74 | 4 | $3.40 | $13.62 |
| Iskadarya95 | 3 | $4.37 | $13.10 |
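
For a top-N preview, `DataFrame.nlargest` combines the sort and the `head()` call. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy purchases for three players
toy = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cy3", "Cy3"],
    "Price": [2.0, 4.0, 1.0, 3.0, 3.5],
})

# One pass over the grouped Price column yields all three statistics
spend = toy.groupby("SN")["Price"].agg(["count", "mean", "sum"])
spend.columns = ["Purchase Count", "Average Purchase Price", "Total Purchase Value"]

# Keep the two biggest spenders, already sorted in descending order
top = spend.nlargest(2, "Total Purchase Value")
print(top)
```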


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [74]:
popularitemsdf = pd.DataFrame([purchase_data['Item ID'],purchase_data['Item Name'],purchase_data['Price']]).transpose()

totalpurchasevalue = popularitemsdf.groupby(['Item ID']).sum()["Price"]
purchasecount = popularitemsdf.groupby(['Item ID']).count()["Price"]

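# NOTE: purchasecount and totalpurchasevalue are indexed by Item ID, while the
# Item ID, Item Name, and Price columns below are indexed by purchase row.
# pandas aligns on index labels, so each row's count and total belong to the
# item whose ID equals that row's number, not to the item named in the row.
# Grouping by both Item ID and Item Name and aggregating Price would keep the
# columns consistent.
mpi_df = pd.DataFrame({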
    "Item ID":popularitemsdf['Item ID'],
    "Item Name":popularitemsdf['Item Name'],
    "Purchase Count":purchasecount,
    "Item Price":popularitemsdf['Price'],
    "Total Purchase Value":totalpurchasevalue
})
mpi_df.drop_duplicates("Item ID",inplace=True)
mpi_df.drop_duplicates("Item Name",inplace=True)
mpi_df.sort_values("Purchase Count",ascending=False,inplace=True)
mpi_df = mpi_df.set_index(["Item ID", "Item Name"])
profit_df = mpi_df
mpi_df = mpi_df.head()
mpi_df = mpi_df.style.format({"Item Price": '${:,.2f}',"Total Purchase Value": '${:,.2f}'})
mpi_df



| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 46 | Hopeless Ebon Dualblade | 9 | $1.33 | $41.22 |
| 85 | Malificent Bag | 9 | $1.75 | $31.77 |
| 160 | Azurewrath | 8 | $4.40 | $17.76 |
| 73 | Ritual Mace | 8 | $2.05 | $25.28 |
| 105 | Hailstorm Shadowsteel Scythe | 8 | $3.03 | $33.84 |
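
The figures in this table do not reconcile: 9 purchases at $1.33 would total $11.97, not $41.22. A likely cause is index alignment, since pandas matches Series by index label when building a DataFrame from a dict. The sketch below reproduces the pitfall on hypothetical toy data and then shows one way to avoid it by grouping on both keys. Taking the first price per item as the "Item Price" is an assumption that each item has a single price.

```python
import pandas as pd

# Hypothetical toy purchases: item 1 ("Sword") sells twice, item 0 ("Shield") once
toy = pd.DataFrame({
    "Item ID": [1, 0, 1],
    "Item Name": ["Sword", "Shield", "Sword"],
    "Price": [2.0, 5.0, 2.0],
})

# Pitfall: counts are indexed by Item ID, names by row number.
# Row 0 ("Sword") gets paired with the count for Item ID 0 ("Shield").
counts_by_id = toy.groupby("Item ID")["Price"].count()
misaligned = pd.DataFrame({"Item Name": toy["Item Name"], "Purchase Count": counts_by_id})

# Grouping by both keys keeps the name, count, price, and total together
items = toy.groupby(["Item ID", "Item Name"])["Price"].agg(["count", "first", "sum"])
items.columns = ["Purchase Count", "Item Price", "Total Purchase Value"]

most_popular = items.sort_values("Purchase Count", ascending=False)
most_profitable = items.sort_values("Total Purchase Value", ascending=False)
print(most_popular)
print(most_profitable)
```

In the grouped version, sorting by either column gives the popularity and profitability views from the same frame.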


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [75]:
profit_df.sort_values("Total Purchase Value",ascending=False,inplace=True)
profit_dfhead = profit_df.head()
profit_dfhead = profit_dfhead.style.format({"Item Price": '${:,.2f}',"Total Purchase Value": '${:,.2f}'})
profit_dfhead

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 46 | Hopeless Ebon Dualblade | 9 | $1.33 | $41.22 |
| 39 | Betrayal, Whisper of Grieving Widows | 8 | $3.94 | $39.04 |
| 105 | Hailstorm Shadowsteel Scythe | 8 | $3.03 | $33.84 |
| 85 | Malificent Bag | 9 | $1.75 | $31.77 |
| 50 | Dawn | 7 | $4.60 | $30.80 |
