### Heroes Of Pymoli Data Analysis
* Of the 576 unique players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 21-25 (40.28% of players), with secondary groups at 16-20 (26.04%) and 26-30 (10.24%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

|   | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


## Player Count

* Display the total number of players


In [2]:
tplay = purchase_data.SN.nunique()
tplay

576

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
uniitem = purchase_data["Item Name"].nunique()
uniitem

179

In [4]:
avgprice = purchase_data.Price.mean()


In [5]:
purchsum = pd.DataFrame({"Total Players":[tplay], "Unique Items":[uniitem], "Average Price": [avgprice]})
purchsum["Average Price"] = purchsum["Average Price"].map("${:.2f}".format)
purchsum.head()

|   | Total Players | Unique Items | Average Price |
|---|---|---|---|
| 0 | 576 | 179 | $3.05 |
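The instructions leave the list of metrics open ("etc."). The sketch below assumes the summary should also include the number of purchases and total revenue, and computes every figure from column-level reductions in one step. The toy rows are hypothetical; only the column names mirror `purchase_data.csv`. Unique items are counted by `Item ID` here. If two IDs share a name, that count would be higher than the `Item Name` count used above.

```python
import pandas as pd

# Hypothetical toy rows; only the column names mirror purchase_data.csv
purchase_data = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC"],
    "Item ID": [1, 2, 1, 3],
    "Price": [1.00, 5.00, 1.00, 2.00],
})

# Keep the raw numbers in one frame so they stay usable for further math
summary = pd.DataFrame({
    "Number of Unique Items": [purchase_data["Item ID"].nunique()],
    "Average Price": [purchase_data["Price"].mean()],
    "Number of Purchases": [purchase_data["Purchase ID"].count()],
    "Total Revenue": [purchase_data["Price"].sum()],
})

# Dollar-format a separate copy for display only
display_summary = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    display_summary[col] = display_summary[col].map("${:,.2f}".format)
print(display_summary)
```

Formatting a copy rather than overwriting the columns keeps `summary` numeric, so later cells can still sort or compute on it.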


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [6]:
#Groupby gender, unique count of SN
gen_gb = purchase_data.groupby(["Gender"])
gen_count = gen_gb.nunique()
gen_count.head()

| Gender | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| Female | 113 | 81 | 22 | 1 | 90 | 90 | 79 |
| Male | 652 | 484 | 39 | 1 | 178 | 178 | 144 |
| Other / Non-Disclosed | 15 | 11 | 8 | 1 | 13 | 13 | 12 |


In [7]:
#Counts by Gender
malecount = gen_count.loc["Male", "SN"]
femalecount = gen_count.loc["Female", "SN"]
othercount = gen_count.loc["Other / Non-Disclosed", "SN"]


In [8]:
#Calc percentages
maleperc = 100*malecount/tplay
femaleperc = 100*femalecount/tplay
otherperc = 100*othercount/tplay


In [9]:
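#Summary table
gendersum = pd.DataFrame({"Gender": ["Male", "Female", "Other / Non-Disclosed", "Total"],
                          "Unique Players": [malecount, femalecount, othercount, tplay],
                          "% of Players": [maleperc, femaleperc, otherperc, 100]})
#Format percent to two decimals, like the other summary tables
gendersum["% of Players"] = gendersum["% of Players"].map("{:.2f}%".format)
gendersum.head()

|   | Gender | Unique Players | % of Players |
|---|---|---|---|
| 0 | Male | 484 | 84.03% |
| 1 | Female | 81 | 14.06% |
| 2 | Other / Non-Disclosed | 11 | 1.91% |
| 3 | Total | 576 | 100.00% |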
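The per-gender counts and percentages above can also come from a single `value_counts` call. The sketch below uses hypothetical toy rows. It assumes each screen name (`SN`) always carries the same gender, so keeping one row per player is safe.

```python
import pandas as pd

# Hypothetical toy rows; PlayerA and PlayerB each bought twice
purchase_data = pd.DataFrame({
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC", "PlayerD", "PlayerB"],
    "Gender": ["Male", "Female", "Male", "Male", "Other / Non-Disclosed", "Female"],
})

# One row per player, so repeat buyers are counted once
players = purchase_data.drop_duplicates(subset="SN")
counts = players["Gender"].value_counts()

# Both columns share the gender index, so they line up automatically
gender_demo = pd.DataFrame({
    "Unique Players": counts,
    "% of Players": counts / counts.sum() * 100,
})
print(gender_demo)
```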



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [10]:
#total volume of purchases by gender
gen_purch = gen_gb.count()
malepur = gen_purch.loc["Male", "Purchase ID"]
femalepur = gen_purch.loc["Female", "Purchase ID"]
otherpur = gen_purch.loc["Other / Non-Disclosed", "Purchase ID"]


In [11]:
#Purchases per player by gender
malepurplay = malepur / malecount
femalepurplay = femalepur / femalecount
otherpurplay = otherpur / othercount


In [12]:
#avg purchase price by gender
gen_avgp = gen_gb.mean(numeric_only=True)  #average only numeric columns; recent pandas raises on text columns such as SN
maleavgp = gen_avgp.loc["Male", "Price"]
femaleavgp = gen_avgp.loc["Female", "Price"]
otheravgp = gen_avgp.loc["Other / Non-Disclosed", "Price"]
gen_avgp

| Gender | Purchase ID | Age | Item ID | Price |
|---|---|---|---|---|
| Female | 379.380531 | 21.345133 | 85.477876 | 3.203009 |
| Male | 392.516871 | 22.917178 | 93.095092 | 3.017853 |
| Other / Non-Disclosed | 334.6 | 24.2 | 80.8 | 3.346 |


In [13]:
#total spent by gender
gen_pr = gen_gb.sum()
malespent = gen_pr.loc["Male", "Price"]
femalespent = gen_pr.loc["Female", "Price"]
otherspent = gen_pr.loc["Other / Non-Disclosed", "Price"]

In [14]:
#avg spent per player by gender
maleavgspent = malespent / malecount
femaleavgspent = femalespent / femalecount
otheravgspent = otherspent / othercount


In [15]:
#New summary table
gender_aggs = pd.DataFrame({"Gender": ["Male", "Female", "Other / Non-Disclosed"],
                            "Total Purchases": [malepur, femalepur, otherpur],
                            "Purchases per Player": [malepurplay, femalepurplay, otherpurplay],
                            "Avg Purchase Price": [maleavgp, femaleavgp, otheravgp],
                            "Total Spent": [malespent, femalespent, otherspent],
                            "Avg Spent per Player": [maleavgspent, femaleavgspent, otheravgspent]})
#Cleaner formatting: two decimal places, with a $ on the currency columns
gender_aggs["Purchases per Player"] = gender_aggs["Purchases per Player"].map("{:.2f}".format)
gender_aggs["Avg Purchase Price"] = gender_aggs["Avg Purchase Price"].map("${:.2f}".format)
gender_aggs["Total Spent"] = gender_aggs["Total Spent"].map("${:.2f}".format)
gender_aggs["Avg Spent per Player"] = gender_aggs["Avg Spent per Player"].map("${:.2f}".format)
gender_aggs.head()

|   | Gender | Total Purchases | Purchases per Player | Avg Purchase Price | Total Spent | Avg Spent per Player |
|---|---|---|---|---|---|---|
| 0 | Male | 652 | 1.35 | $3.02 | $1967.64 | $4.07 |
| 1 | Female | 113 | 1.40 | $3.20 | $361.94 | $4.47 |
| 2 | Other / Non-Disclosed | 15 | 1.36 | $3.35 | $50.19 | $4.56 |
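The per-gender purchasing metrics above can be sketched with pandas named aggregation (`groupby(...).agg(new_name=(column, func))`). This computes every gender at once instead of one variable per gender. The toy rows and the snake_case column names below are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows with the columns the analysis needs
purchase_data = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4, 5],
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC", "PlayerD", "PlayerB"],
    "Gender": ["Male", "Female", "Male", "Male", "Other / Non-Disclosed", "Female"],
    "Price": [3.00, 1.50, 5.00, 2.00, 4.00, 2.50],
})

# One aggregation pass: each keyword names an output column
gender_aggs = purchase_data.groupby("Gender").agg(
    total_purchases=("Purchase ID", "count"),
    unique_players=("SN", "nunique"),
    avg_purchase_price=("Price", "mean"),
    total_spent=("Price", "sum"),
)

# Per-player ratios are derived from the aggregated columns
gender_aggs["purchases_per_player"] = gender_aggs["total_purchases"] / gender_aggs["unique_players"]
gender_aggs["avg_spent_per_player"] = gender_aggs["total_spent"] / gender_aggs["unique_players"]
print(gender_aggs)
```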


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [16]:
#Bin edges: 10 & under, then 5-year bins up to 40, then over 40
bins = [0, 10, 15, 20, 25, 30, 35, 40, 150]

#bin names
group_names = ["10 & Under", "11-15", "16-20", "21-25", "26-30", "31-35", "36-40", "Older than 40"]

purchase_data["Age_Group"] = pd.cut(purchase_data["Age"], bins, labels=group_names, include_lowest=True)
purchase_data.Age_Group.value_counts()

21-25            325
16-20            200
26-30             77
11-15             54
31-35             52
36-40             33
10 & Under        32
Older than 40      7
Name: Age_Group, dtype: int64
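By default, `pd.cut` closes each bin on the right (`right=True`). So an age of 15 belongs to `11-15`, and 16 starts `16-20`. Setting `include_lowest=True` makes the first bin include 0. The quick check below reuses the bins and labels from the cell above on a handful of boundary ages.

```python
import pandas as pd

# Same edges and labels as the notebook cell above
bins = [0, 10, 15, 20, 25, 30, 35, 40, 150]
group_names = ["10 & Under", "11-15", "16-20", "21-25", "26-30", "31-35", "36-40", "Older than 40"]

# Boundary ages chosen to show which side of each edge they fall on
ages = pd.Series([0, 10, 11, 15, 16, 20, 21, 40, 41])
labels = pd.cut(ages, bins, labels=group_names, include_lowest=True)

for age, label in zip(ages, labels):
    print(age, "->", label)
```

Because the data holds whole-number ages, the labels describe the bins exactly. A fractional age such as 10.5 would land in `11-15`.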

In [17]:
#Total per age group
age_gb = purchase_data.groupby(["Age_Group"])
agegroup = age_gb.nunique()
group1 = agegroup.loc["10 & Under", "SN"]
group2 = agegroup.loc["11-15", "SN"]
group3 = agegroup.loc["16-20", "SN"]
group4 = agegroup.loc["21-25", "SN"]
group5 = agegroup.loc["26-30", "SN"]
group6 = agegroup.loc["31-35", "SN"]
group7 = agegroup.loc["36-40", "SN"]
group8 = agegroup.loc["Older than 40", "SN"]


In [18]:
#Percent by age group
group_perc1 = group1 / tplay * 100
group_perc2 = group2 / tplay * 100
group_perc3 = group3 / tplay * 100
group_perc4 = group4 / tplay * 100
group_perc5 = group5 / tplay * 100
group_perc6 = group6 / tplay * 100
group_perc7 = group7 / tplay * 100
group_perc8 = group8 / tplay * 100

In [19]:
#Age Group Summary table
age_group_sum = pd.DataFrame ({"Age Group": group_names,
                               "Number of Players": [group1, group2, group3, group4, group5, group6, group7, group8],
                              "% of Players": [group_perc1, group_perc2, group_perc3, group_perc4, group_perc5,
                                               group_perc6, group_perc7, group_perc8]})
#Format percent
age_group_sum["% of Players"] = age_group_sum["% of Players"].map("{:.2f}%".format)
age_group_sum.head(10)

|   | Age Group | Number of Players | % of Players |
|---|---|---|---|
| 0 | 10 & Under | 24 | 4.17% |
| 1 | 11-15 | 41 | 7.12% |
| 2 | 16-20 | 150 | 26.04% |
| 3 | 21-25 | 232 | 40.28% |
| 4 | 26-30 | 59 | 10.24% |
| 5 | 31-35 | 37 | 6.42% |
| 6 | 36-40 | 26 | 4.51% |
| 7 | Older than 40 | 7 | 1.22% |
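The eight `groupN` and `group_percN` variables could instead come from binning one row per player and calling `value_counts`. The sketch below uses hypothetical toy rows and assumes a player's `Age` is the same on every purchase. On the categorical column, `value_counts(sort=False)` keeps the bins in label order and reports empty bins as 0.

```python
import pandas as pd

bins = [0, 10, 15, 20, 25, 30, 35, 40, 150]
group_names = ["10 & Under", "11-15", "16-20", "21-25", "26-30", "31-35", "36-40", "Older than 40"]

# Hypothetical toy rows; PlayerA appears twice
purchase_data = pd.DataFrame({
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC", "PlayerD"],
    "Age": [20, 40, 20, 9, 24],
})

# One row per player, then assign each player to an age bin
players = purchase_data.drop_duplicates(subset="SN").copy()
players["Age_Group"] = pd.cut(players["Age"], bins, labels=group_names, include_lowest=True)

# Categorical value_counts lists every bin, including empty ones
counts = players["Age_Group"].value_counts(sort=False)
age_demo = pd.DataFrame({
    "Number of Players": counts,
    "% of Players": (counts / counts.sum() * 100).map("{:.2f}%".format),
})
print(age_demo)
```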


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [20]:
#total volume of purchases by agegroup
age_purch = age_gb.count()
g1_pur = age_purch.loc["10 & Under", "Purchase ID"]
g2_pur = age_purch.loc["11-15", "Purchase ID"]
g3_pur = age_purch.loc["16-20", "Purchase ID"]
g4_pur = age_purch.loc["21-25", "Purchase ID"]
g5_pur = age_purch.loc["26-30", "Purchase ID"]
g6_pur = age_purch.loc["31-35", "Purchase ID"]
g7_pur = age_purch.loc["36-40", "Purchase ID"]
g8_pur = age_purch.loc["Older than 40", "Purchase ID"]


In [21]:
#Purchases per player by age group
g1_purplay = g1_pur / group1
g2_purplay = g2_pur / group2
g3_purplay = g3_pur / group3
g4_purplay = g4_pur / group4
g5_purplay = g5_pur / group5
g6_purplay = g6_pur / group6
g7_purplay = g7_pur / group7
g8_purplay = g8_pur / group8


In [22]:
#avg purchase price by agegroup
#numeric_only=True averages only the numeric columns; recent pandas raises on text columns such as SN
age_avgp = age_gb.mean(numeric_only=True)
g1_avgp = age_avgp.loc["10 & Under", "Price"]
g2_avgp = age_avgp.loc["11-15", "Price"]
g3_avgp = age_avgp.loc["16-20", "Price"]
g4_avgp = age_avgp.loc["21-25", "Price"]
g5_avgp = age_avgp.loc["26-30", "Price"]
g6_avgp = age_avgp.loc["31-35", "Price"]
g7_avgp = age_avgp.loc["36-40", "Price"]
g8_avgp = age_avgp.loc["Older than 40", "Price"]

In [23]:
#total spent by age group
age_spent = age_gb.sum()
g1_spent = age_spent.loc["10 & Under", "Price"]
g2_spent = age_spent.loc["11-15", "Price"]
g3_spent = age_spent.loc["16-20", "Price"]
g4_spent = age_spent.loc["21-25", "Price"]
g5_spent = age_spent.loc["26-30", "Price"]
g6_spent = age_spent.loc["31-35", "Price"]
g7_spent = age_spent.loc["36-40", "Price"]
g8_spent = age_spent.loc["Older than 40", "Price"]

In [24]:
#avg spent per player by age group
g1_spentplay = g1_spent / group1
g2_spentplay = g2_spent / group2
g3_spentplay = g3_spent / group3
g4_spentplay = g4_spent / group4
g5_spentplay = g5_spent / group5
g6_spentplay = g6_spent / group6
g7_spentplay = g7_spent / group7
g8_spentplay = g8_spent / group8

In [25]:
#Summary table for age group spending
age_aggs = pd.DataFrame({"Age Group": group_names,
                         "Total Purchases": [g1_pur, g2_pur, g3_pur, g4_pur, g5_pur, g6_pur, g7_pur, g8_pur],
                         "Purchases per Player": [g1_purplay, g2_purplay, g3_purplay, g4_purplay,
                                                  g5_purplay, g6_purplay, g7_purplay, g8_purplay],
                         "Avg Purchase Price": [g1_avgp, g2_avgp, g3_avgp, g4_avgp,
                                                g5_avgp, g6_avgp, g7_avgp, g8_avgp],
                         "Total Spent": [g1_spent, g2_spent, g3_spent, g4_spent,
                                         g5_spent, g6_spent, g7_spent, g8_spent],
                         "Avg Spent per Player": [g1_spentplay, g2_spentplay, g3_spentplay, g4_spentplay,
                                                  g5_spentplay, g6_spentplay, g7_spentplay, g8_spentplay]})
#Cleaner formatting: two decimal places, with a $ on the currency columns
age_aggs["Purchases per Player"] = age_aggs["Purchases per Player"].map("{:.2f}".format)
age_aggs["Avg Purchase Price"] = age_aggs["Avg Purchase Price"].map("${:.2f}".format)
age_aggs["Total Spent"] = age_aggs["Total Spent"].map("${:.2f}".format)
age_aggs["Avg Spent per Player"] = age_aggs["Avg Spent per Player"].map("${:.2f}".format)
age_aggs.head()

|   | Age Group | Total Purchases | Purchases per Player | Avg Purchase Price | Total Spent | Avg Spent per Player |
|---|---|---|---|---|---|---|
| 0 | 10 & Under | 32 | 1.33 | $3.40 | $108.96 | $4.54 |
| 1 | 11-15 | 54 | 1.32 | $2.90 | $156.60 | $3.82 |
| 2 | 16-20 | 200 | 1.33 | $3.11 | $621.56 | $4.14 |
| 3 | 21-25 | 325 | 1.40 | $3.02 | $981.64 | $4.23 |
| 4 | 26-30 | 77 | 1.31 | $2.88 | $221.42 | $3.75 |
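Writing one `.loc` line per bin and per metric makes it easy to paste the wrong label. A loop-free alternative lets `groupby` on the `Age_Group` column produce every metric for every bin at once. The sketch below uses hypothetical toy rows and hypothetical snake_case column names. Setting `observed=True` drops bins with no purchases, so the per-player division never divides by a zero player count.

```python
import pandas as pd

bins = [0, 10, 15, 20, 25, 30, 35, 40, 150]
group_names = ["10 & Under", "11-15", "16-20", "21-25", "26-30", "31-35", "36-40", "Older than 40"]

# Hypothetical toy rows with the columns the analysis needs
purchase_data = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC", "PlayerD"],
    "Age": [20, 40, 20, 9, 24],
    "Price": [3.00, 1.50, 5.00, 2.00, 4.00],
})
purchase_data["Age_Group"] = pd.cut(purchase_data["Age"], bins, labels=group_names, include_lowest=True)

# observed=True keeps only bins that actually have purchases
age_aggs = purchase_data.groupby("Age_Group", observed=True).agg(
    total_purchases=("Purchase ID", "count"),
    unique_players=("SN", "nunique"),
    avg_purchase_price=("Price", "mean"),
    total_spent=("Price", "sum"),
)
age_aggs["purchases_per_player"] = age_aggs["total_purchases"] / age_aggs["unique_players"]
age_aggs["avg_spent_per_player"] = age_aggs["total_spent"] / age_aggs["unique_players"]
print(age_aggs)
```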


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [34]:
#groupby sn. Create DFs for count and sum
sn_gb = purchase_data.groupby(["SN"])
#Total purchases per player
tpurch = sn_gb["Purchase ID"].count().reset_index(drop = False)
#Total spent per player
tprice = sn_gb.Price.sum().reset_index(drop = False)
tpurch.head()

|   | SN | Purchase ID |
|---|---|---|
| 0 | Adairialis76 | 1 |
| 1 | Adastirin33 | 1 |
| 2 | Aeda94 | 1 |
| 3 | Aela59 | 1 |
| 4 | Aelaria33 | 1 |


In [71]:
#Merge DFs
merge_df = pd.merge(tpurch, tprice, on = "SN", how = "inner")
#Rename Purchase ID to Total Purchases and Price to Total Spent
top_df = merge_df.rename(columns={"Purchase ID": "Total Purchases", "Price": "Total Spent"})
top_df.head()

|   | SN | Total Purchases | Total Spent |
|---|---|---|---|
| 0 | Adairialis76 | 1 | 2.28 |
| 1 | Adastirin33 | 1 | 4.48 |
| 2 | Aeda94 | 1 | 4.91 |
| 3 | Aela59 | 1 | 4.32 |
| 4 | Aelaria33 | 1 | 1.79 |


In [75]:
#Sort by total spent
top_df = top_df.sort_values(["Total Spent"], ascending=False)
#add average price
top_df["Average Price"] = top_df["Total Spent"] / top_df["Total Purchases"]
#new index
top_spent_df = top_df.reset_index(drop=True)
top_spent_df.head()


|   | SN | Total Purchases | Total Spent | Average Price |
|---|---|---|---|---|
| 0 | Lisosia93 | 5 | 18.96 | 3.792 |
| 1 | Idastidru52 | 4 | 15.45 | 3.8625 |
| 2 | Chamjask73 | 3 | 13.83 | 4.61 |
| 3 | Iral74 | 4 | 13.62 | 3.405 |
| 4 | Iskadarya95 | 3 | 13.1 | 4.366667 |


In [76]:
#format to $
top_spent_df["Total Spent"] = top_spent_df["Total Spent"].map("${:.2f}".format)
top_spent_df["Average Price"] = top_spent_df["Average Price"].map("${:.2f}".format)
top_spent_df.head()

|   | SN | Total Purchases | Total Spent | Average Price |
|---|---|---|---|---|
| 0 | Lisosia93 | 5 | $18.96 | $3.79 |
| 1 | Idastidru52 | 4 | $15.45 | $3.86 |
| 2 | Chamjask73 | 3 | $13.83 | $4.61 |
| 3 | Iral74 | 4 | $13.62 | $3.40 |
| 4 | Iskadarya95 | 3 | $13.10 | $4.37 |
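The count, sum, merge, and rename steps above can be collapsed into one chained expression. The sketch below uses hypothetical toy rows, and its snake_case column names are illustrative. `assign` adds the per-player average after aggregation.

```python
import pandas as pd

# Hypothetical toy rows; PlayerA and PlayerB each bought twice
purchase_data = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["PlayerA", "PlayerB", "PlayerA", "PlayerC", "PlayerB"],
    "Price": [3.00, 1.50, 5.00, 2.00, 4.00],
})

# Aggregate per player, derive the average, then rank by spending
top_spenders = (
    purchase_data.groupby("SN")
    .agg(total_purchases=("Purchase ID", "count"), total_spent=("Price", "sum"))
    .assign(average_price=lambda df: df["total_spent"] / df["total_purchases"])
    .sort_values("total_spent", ascending=False)
)
print(top_spenders.head())
```

Because both aggregates come from the same `groupby`, no merge is needed and the `SN` index stays aligned automatically.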


In [None]:
#summary table

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
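No code accompanies this section yet. Below is a minimal sketch of the listed steps, run on hypothetical toy items. It groups by both `Item ID` and `Item Name`, counts purchases, and takes the mean `Price` as the item price. Using the mean assumes an item always sells at the same price; under that assumption, `max` or `first` would work equally well.

```python
import pandas as pd

# Hypothetical toy purchases: cheap Potions sell often, one pricey Sword
purchase_data = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 1, 3],
    "Item Name": ["Potion", "Sword", "Potion", "Shield", "Potion", "Shield"],
    "Price": [1.00, 5.00, 1.00, 2.00, 1.00, 2.00],
})

# Step 1: keep only the item columns
items = purchase_data[["Item ID", "Item Name", "Price"]]

# Step 2: group by ID and name, then count, price, and total
popular = items.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)

# Step 3: most frequently bought items first
popular = popular.sort_values("purchase_count", ascending=False)
print(popular.head())
```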



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
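Below is a minimal sketch for this section, run on hypothetical toy items. It rebuilds the per-item table described under Most Popular Items and sorts it by total purchase value instead of purchase count. In this toy data, the most frequently bought item is also the cheapest, so it drops to the bottom.

```python
import pandas as pd

# Hypothetical toy purchases: cheap Potions sell often, one pricey Sword
purchase_data = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 1, 3],
    "Item Name": ["Potion", "Sword", "Potion", "Shield", "Potion", "Shield"],
    "Price": [1.00, 5.00, 1.00, 2.00, 1.00, 2.00],
})

# Per-item table: purchase count, item price, total purchase value
items = purchase_data.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)

# Sort by revenue instead of by count
profitable = items.sort_values("total_purchase_value", ascending=False)

# Dollar-format a display copy, leaving `profitable` numeric
profitable_display = profitable.copy()
for col in ["item_price", "total_purchase_value"]:
    profitable_display[col] = profitable_display[col].map("${:,.2f}".format)
print(profitable_display.head())
```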

