### Heroes Of Pymoli Data Analysis
* Of the 1163 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic falls between 20-24 (44.8%), with secondary groups falling between 15-19 (18.6%) and 25-29 (13.4%).

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [None]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

## Player Count

* Display the total number of players


In [None]:
tplay = purchase_data.SN.nunique()
tplay

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
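
The steps above can be sketched on a small in-memory table. The `toy` rows below are hypothetical stand-ins for `purchase_data.csv`, and the extra "Number of Purchases" and "Total Revenue" columns are one plausible reading of the "etc." in the first step, not a required output.

```python
import pandas as pd

# Hypothetical toy rows; only the column names mirror the real file
toy = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "Item ID": [10, 11, 10, 12, 11],
    "Price": [2.00, 3.50, 2.00, 4.50, 3.50],
})

# One-row summary holding the basic totals
summary = pd.DataFrame({
    "Unique Items": [toy["Item ID"].nunique()],
    "Average Price": [toy["Price"].mean()],
    "Number of Purchases": [toy["Purchase ID"].count()],
    "Total Revenue": [toy["Price"].sum()],
})

# Format a copy for display so the numeric values stay usable
shown = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    shown[col] = shown[col].map("${:,.2f}".format)
print(shown)
```

Formatting a copy rather than the original keeps `summary` numeric in case later cells need to reuse the values.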


In [None]:
#Find number of unique items
uniitem = purchase_data["Item Name"].nunique()
uniitem

In [None]:
#Find the average paid for all items
avgprice = purchase_data.Price.mean()


In [None]:
purchsum = pd.DataFrame({"Unique Items":[uniitem], "Average Price": [avgprice]})
purchsum["Average Price"] = purchsum["Average Price"].map("${:.2f}".format)
purchsum.head()

## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed
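
As a rough sketch of the counts and percentages listed above, the rows can be reduced to one per player before counting. The `toy` data is hypothetical, and the sketch assumes each `SN` always appears with the same gender.

```python
import pandas as pd

# Hypothetical toy rows: Ann1 made two purchases, everyone else one
toy = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cyd3", "Dee4"],
    "Gender": ["Female", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice
players = toy.drop_duplicates(subset="SN")
counts = players["Gender"].value_counts()

gender_demo = pd.DataFrame({
    "Unique Players": counts,
    "% of Players": (counts / counts.sum() * 100).round(2),
})
print(gender_demo)
```

Counting `toy["Gender"]` directly would count purchases rather than players, which is why the de-duplication step comes first.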




In [None]:
#Groupby gender, unique count of SN
gen_gb = purchase_data.groupby(["Gender"])
gen_count = gen_gb.nunique()
gen_count.head()

In [None]:
#Counts by Gender
malecount = gen_count.loc["Male", "SN"]
femalecount = gen_count.loc["Female", "SN"]
othercount = gen_count.loc["Other / Non-Disclosed", "SN"]


In [None]:
#Calc percentages
maleperc = 100*malecount/tplay
femaleperc = 100*femalecount/tplay
otherperc = 100*othercount/tplay


In [None]:
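#Summary table
gendersum = pd.DataFrame({"Gender": ["Male", "Female", "Other / Non-Disclosed", "Total"],
                          "Unique Players": [malecount, femalecount, othercount, tplay],
                          "% of Players": [maleperc, femaleperc, otherperc, 100]})
#Format every percentage the same way so the column does not mix floats and strings
gendersum["% of Players"] = gendersum["% of Players"].map("{:.2f}%".format)

gendersum.head()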


## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
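
One possible way to get all of these figures in a single pass is pandas named aggregation. This is a minimal sketch on hypothetical `toy` rows, and the output column names are illustrative choices rather than anything the assignment prescribes.

```python
import pandas as pd

# Hypothetical toy rows standing in for purchase_data
toy = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["Ann1", "Bob2", "Ann1", "Cyd3", "Dee4"],
    "Gender": ["Female", "Male", "Female", "Male", "Other / Non-Disclosed"],
    "Price": [2.00, 3.50, 2.00, 4.50, 3.50],
})

# Each keyword is an output column: (source column, aggregation)
by_gender = toy.groupby("Gender").agg(
    total_purchases=("Purchase ID", "count"),
    unique_players=("SN", "nunique"),
    avg_purchase_price=("Price", "mean"),
    total_spent=("Price", "sum"),
)

# Per-player spend divides by unique players, not by purchases
by_gender["avg_spent_per_player"] = (
    by_gender["total_spent"] / by_gender["unique_players"]
)
print(by_gender)
```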

In [None]:
#total volume of purchases by gender
gen_purch = gen_gb.count()
malepur = gen_purch.loc["Male", "Purchase ID"]
femalepur = gen_purch.loc["Female", "Purchase ID"]
otherpur = gen_purch.loc["Other / Non-Disclosed", "Purchase ID"]


In [None]:
#Purchases per player by gender
malepurplay = malepur / malecount
femalepurplay = femalepur / femalecount
otherpurplay = otherpur / othercount


In [None]:
#avg purchase price by gender (numeric_only skips text columns such as SN and Item Name)
gen_avgp = gen_gb.mean(numeric_only=True)
maleavgp = gen_avgp.loc["Male", "Price"]
femaleavgp = gen_avgp.loc["Female", "Price"]
otheravgp = gen_avgp.loc["Other / Non-Disclosed", "Price"]
gen_avgp

In [None]:
#total spent by gender (numeric_only avoids concatenating the text columns)
gen_pr = gen_gb.sum(numeric_only=True)
malespent = gen_pr.loc["Male", "Price"]
femalespent = gen_pr.loc["Female", "Price"]
otherspent = gen_pr.loc["Other / Non-Disclosed", "Price"]

In [None]:
#avg spent per player by gender
maleavgspent = malespent / malecount
femaleavgspent = femalespent / femalecount
otheravgspent = otherspent / othercount


In [None]:
#New summary table
gender_aggs = pd.DataFrame ({"Gender": ["Male", "Female", "Other / Non-Disclosed"],
                            "Total Purchases": [malepur, femalepur, otherpur],
                            "Purchases per Player": [malepurplay, femalepurplay, otherpurplay],
                             "Avg Purchase Price": [maleavgp, femaleavgp, otheravgp],
                             "Total Spent": [malespent, femalespent, otherspent],
                             "Avg Spent per Player": [maleavgspent, femaleavgspent, otheravgspent]})
#cleaner formatting
#Format ratios to two decimals and money columns to $
gender_aggs["Purchases per Player"] = gender_aggs["Purchases per Player"].map("{:.2f}".format)
gender_aggs["Avg Purchase Price"] = gender_aggs["Avg Purchase Price"].map("${:.2f}".format)
gender_aggs["Total Spent"] = gender_aggs["Total Spent"].map("${:.2f}".format)
gender_aggs["Avg Spent per Player"] = gender_aggs["Avg Spent per Player"].map("${:.2f}".format)
gender_aggs.head()

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
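
The binning step can be illustrated as follows. The bin edges and labels match the ones used in the code cell below, while the `toy` players are hypothetical. Note that `pd.cut` closes each interval on the right by default, so an age of exactly 20 lands in "16-20".

```python
import pandas as pd

# Hypothetical toy rows; Ann1 appears twice because she bought twice
toy = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cyd3", "Dee4"],
    "Age": [22, 17, 22, 41, 9],
})

bins = [0, 10, 15, 20, 25, 30, 35, 40, 150]
labels = ["10 & Under", "11-15", "16-20", "21-25",
          "26-30", "31-35", "36-40", "Older than 40"]

# One row per player, then assign each age to a labelled bin
players = toy.drop_duplicates(subset="SN").copy()
players["Age Group"] = pd.cut(players["Age"], bins=bins, labels=labels)

# Categorical counts keep empty groups, reported as zero
counts = players["Age Group"].value_counts(sort=False)
age_demo = pd.DataFrame({
    "Number of Players": counts,
    "% of Players": (counts / counts.sum() * 100).round(2),
})
print(age_demo)

# Boundary check: 20 falls in the right-closed (15, 20] bin
edge = pd.cut(pd.Series([20]), bins=bins, labels=labels).iloc[0]
```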


In [None]:
#bins: 10 and under, five-year bands up to 40, then over 40
bins = [0, 10, 15, 20, 25, 30, 35, 40, 150]

#bin names
group_names = ["10 & Under", "11-15", "16-20", "21-25", "26-30", "31-35", "36-40", "Older than 40"]

purchase_data["Age_Group"] = pd.cut(purchase_data["Age"], bins, labels=group_names, include_lowest=True)
purchase_data.Age_Group.value_counts()

In [None]:
#Total per age group (observed=False keeps every age label, even one with no players)
age_gb = purchase_data.groupby(["Age_Group"], observed=False)
agegroup = age_gb.nunique()
group1 = agegroup.loc["10 & Under", "SN"]
group2 = agegroup.loc["11-15", "SN"]
group3 = agegroup.loc["16-20", "SN"]
group4 = agegroup.loc["21-25", "SN"]
group5 = agegroup.loc["26-30", "SN"]
group6 = agegroup.loc["31-35", "SN"]
group7 = agegroup.loc["36-40", "SN"]
group8 = agegroup.loc["Older than 40", "SN"]


In [None]:
#Percent by age group
group_perc1 = group1 / tplay * 100
group_perc2 = group2 / tplay * 100
group_perc3 = group3 / tplay * 100
group_perc4 = group4 / tplay * 100
group_perc5 = group5 / tplay * 100
group_perc6 = group6 / tplay * 100
group_perc7 = group7 / tplay * 100
group_perc8 = group8 / tplay * 100

In [None]:
#Age Group Summary table
age_group_sum = pd.DataFrame ({"Age Group": group_names,
                               "Number of Players": [group1, group2, group3, group4, group5, group6, group7, group8],
                              "% of Players": [group_perc1, group_perc2, group_perc3, group_perc4, group_perc5,
                                               group_perc6, group_perc7, group_perc8]})
#Format percent
age_group_sum["% of Players"] = age_group_sum["% of Players"].map("{:.2f}%".format)
age_group_sum.head(10)

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
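
As an alternative to one variable per age group, the whole table can be sketched with a single grouped aggregation keyed on the `pd.cut` result. The `toy` rows and output column names are hypothetical; `observed=True` is used here so that only age groups that actually occur appear in the result.

```python
import pandas as pd

# Hypothetical toy rows standing in for purchase_data
toy = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["Ann1", "Bob2", "Ann1", "Cyd3", "Dee4"],
    "Age": [22, 17, 22, 41, 9],
    "Price": [2.00, 3.50, 2.00, 4.50, 3.50],
})

bins = [0, 10, 15, 20, 25, 30, 35, 40, 150]
labels = ["10 & Under", "11-15", "16-20", "21-25",
          "26-30", "31-35", "36-40", "Older than 40"]

# Group directly on the binned ages; no helper column is needed
age_group = pd.cut(toy["Age"], bins=bins, labels=labels)
by_age = toy.groupby(age_group, observed=True).agg(
    total_purchases=("Purchase ID", "count"),
    unique_players=("SN", "nunique"),
    avg_purchase_price=("Price", "mean"),
    total_spent=("Price", "sum"),
)
by_age["avg_spent_per_player"] = (
    by_age["total_spent"] / by_age["unique_players"]
)
print(by_age)
```

Because every row is computed by the same expression, a copy-and-paste slip in one age group's label cannot occur.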

In [None]:
#total volume of purchases by agegroup
age_purch = age_gb.count()
g1_pur = age_purch.loc["10 & Under", "Purchase ID"]
g2_pur = age_purch.loc["11-15", "Purchase ID"]
g3_pur = age_purch.loc["16-20", "Purchase ID"]
g4_pur = age_purch.loc["21-25", "Purchase ID"]
g5_pur = age_purch.loc["26-30", "Purchase ID"]
g6_pur = age_purch.loc["31-35", "Purchase ID"]
g7_pur = age_purch.loc["36-40", "Purchase ID"]
g8_pur = age_purch.loc["Older than 40", "Purchase ID"]


In [None]:
#Avg purchases per player by age group
g1_purplay = g1_pur / group1
g2_purplay = g2_pur / group2
g3_purplay = g3_pur / group3
g4_purplay = g4_pur / group4
g5_purplay = g5_pur / group5
g6_purplay = g6_pur / group6
g7_purplay = g7_pur / group7
g8_purplay = g8_pur / group8


In [None]:
#avg purchase price by agegroup (each variable reads its own age group's row)
age_avgp = age_gb.mean(numeric_only=True)
g1_avgp = age_avgp.loc["10 & Under", "Price"]
g2_avgp = age_avgp.loc["11-15", "Price"]
g3_avgp = age_avgp.loc["16-20", "Price"]
g4_avgp = age_avgp.loc["21-25", "Price"]
g5_avgp = age_avgp.loc["26-30", "Price"]
g6_avgp = age_avgp.loc["31-35", "Price"]
g7_avgp = age_avgp.loc["36-40", "Price"]
g8_avgp = age_avgp.loc["Older than 40", "Price"]

In [None]:
#total spent by age group (numeric_only avoids concatenating the text columns)
age_spent = age_gb.sum(numeric_only=True)
g1_spent = age_spent.loc["10 & Under", "Price"]
g2_spent = age_spent.loc["11-15", "Price"]
g3_spent = age_spent.loc["16-20", "Price"]
g4_spent = age_spent.loc["21-25", "Price"]
g5_spent = age_spent.loc["26-30", "Price"]
g6_spent = age_spent.loc["31-35", "Price"]
g7_spent = age_spent.loc["36-40", "Price"]
g8_spent = age_spent.loc["Older than 40", "Price"]

In [None]:
#avg spent per player by age group
g1_spentplay = g1_spent / group1
g2_spentplay = g2_spent / group2
g3_spentplay = g3_spent / group3
g4_spentplay = g4_spent / group4
g5_spentplay = g5_spent / group5
g6_spentplay = g6_spent / group6
g7_spentplay = g7_spent / group7
g8_spentplay = g8_spent / group8

In [None]:
#sum table for age group spendings
age_aggs = pd.DataFrame ({"Age Group": group_names,
                               "Total Purchases": [g1_pur, g2_pur, g3_pur, g4_pur, g5_pur, g6_pur, g7_pur, g8_pur],
                               "Purchases per Player": [g1_purplay, g2_purplay, g3_purplay, g4_purplay,
                                                        g5_purplay, g6_purplay, g7_purplay, g8_purplay],
                               "Avg Purchase Price": [g1_avgp, g2_avgp, g3_avgp, g4_avgp, g5_avgp, g6_avgp, g7_avgp, g8_avgp],
                               "Total Spent": [g1_spent, g2_spent, g3_spent, g4_spent, g5_spent, g6_spent, g7_spent, g8_spent],
                               "Avg Spent per Player": [g1_spentplay, g2_spentplay, g3_spentplay, g4_spentplay, g5_spentplay, g6_spentplay, g7_spentplay, g8_spentplay]})
#reformat
age_aggs["Purchases per Player"] = age_aggs["Purchases per Player"].map("{:.2f}".format)
age_aggs["Avg Purchase Price"] = age_aggs["Avg Purchase Price"].map("${:.2f}".format)
age_aggs["Total Spent"] = age_aggs["Total Spent"].map("${:.2f}".format)
age_aggs["Avg Spent per Player"] = age_aggs["Avg Spent per Player"].map("${:.2f}".format)
age_aggs.head(10)

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
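
A compact sketch of these steps aggregates `Price` per player and sorts on the numeric total before any formatting. The `toy` rows are hypothetical, and the column titles are illustrative.

```python
import pandas as pd

# Hypothetical toy rows standing in for purchase_data
toy = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["Ann1", "Bob2", "Ann1", "Cyd3", "Dee4"],
    "Price": [2.00, 3.50, 2.00, 4.50, 3.50],
})

# Count, mean and sum of Price per player, largest total first
spenders = (
    toy.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .sort_values("Total Purchase Value", ascending=False)
)
top2 = spenders.head(2)
print(top2)
```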



In [None]:
#groupby sn. Create DFs for count and sum
sn_gb = purchase_data.groupby(["SN"])
#Total purchases per player
tpurch = sn_gb["Purchase ID"].count().reset_index(drop = False)
#Total spent per player
tprice = sn_gb.Price.sum().reset_index(drop = False)
tpurch.head()

In [None]:
#Merge DFs
merge_df = pd.merge(tpurch, tprice, on = "SN", how = "inner")
#rename purchase id to total purchases, price to total spent
top_df = merge_df.rename(columns={"Purchase ID": "Total Purchases", "Price": "Total Spent"})
top_df.head()

In [None]:
#Sort by total spent
top_df = top_df.sort_values(["Total Spent"], ascending=False)
#add average price
top_df["Average Price"] = top_df["Total Spent"] / top_df["Total Purchases"]
#new index
top_spent_df = top_df.reset_index(drop=True)
top_spent_df.head()


In [None]:
#format to $
top_spent_df["Total Spent"] = top_spent_df["Total Spent"].map("${:.2f}".format)
top_spent_df["Average Price"] = top_spent_df["Average Price"].map("${:.2f}".format)
top5 = top_spent_df.head(5)
top5

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
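
The grouping and sorting described above could look roughly like this. The `toy` rows are hypothetical, the sketch assumes each item sells at a single price (so the mean equals the item price), and the tie-break on total value is an added choice rather than part of the instructions.

```python
import pandas as pd

# Hypothetical toy rows standing in for purchase_data
toy = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "Item ID": [10, 11, 10, 12, 11],
    "Item Name": ["Sword", "Shield", "Sword", "Bow", "Shield"],
    "Price": [2.00, 3.50, 2.00, 4.50, 3.50],
})

# One row per (Item ID, Item Name) pair
items = toy.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Purchase ID", "count"),
    item_price=("Price", "mean"),
    total_value=("Price", "sum"),
)

# Most purchased first; ties are broken by the higher total value
popular = items.sort_values(
    ["purchase_count", "total_value"], ascending=False
)
print(popular.head())
```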



In [None]:
#create df for item info
items_df = purchase_data[["Purchase ID", "Item ID", "Item Name", "Price"]]
items_df.head()

In [None]:
#group id and name
items_gb = items_df.groupby(["Item ID", "Item Name"])


In [None]:
#create df from counts
items_count_df = items_gb.count()
items_count_rename = items_count_df.rename(columns={"Purchase ID": "Number of Purchases"})
del items_count_rename["Price"]
items_count_rename.head()

In [None]:
#create df for totals
items_tot_df = items_gb.sum()
items_tot_rename = items_tot_df.rename(columns={"Price": "Total"})
del items_tot_rename["Purchase ID"]
items_tot_rename.head()


In [None]:
items_price = items_gb.mean()
del items_price["Purchase ID"]
items_price.head()


In [None]:
#merge
items_m = pd.merge(items_count_rename, items_price, on = ["Item ID", "Item Name"], how = "inner")
items_merge = pd.merge(items_m, items_tot_rename, on = ["Item ID", "Item Name"], how = "inner")
items_merge.head()

In [None]:
#sort by number of purchases, clean format
top_seller = items_merge.sort_values(["Number of Purchases"], ascending=False)
top_seller["Price"] = top_seller["Price"].map("${:.2f}".format)
top_seller["Total"] = top_seller["Total"].map("${:.2f}".format)
top_seller.head(10)


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
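
The order of the first two steps matters: sorting should happen while the totals are still numbers. The sketch below uses hypothetical totals to show how sorting after formatting compares the strings character by character and gives a misleading ranking.

```python
import pandas as pd

# Hypothetical per-item totals
totals = pd.DataFrame({
    "Item Name": ["Bow", "Sword", "Shield"],
    "Total": [9.00, 10.00, 50.50],
})

# Correct: sort on the numeric column, then format a copy for display
ranked = totals.sort_values("Total", ascending=False)
shown = ranked.copy()
shown["Total"] = shown["Total"].map("${:,.2f}".format)

# Misleading: sorting the formatted strings is lexicographic,
# so "$9.00" ranks above "$50.50"
wrong = shown.sort_values("Total", ascending=False)

print(shown)
print(wrong)
```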



In [None]:
#sort by top total, clean format
top_earner = items_merge.sort_values(["Total"], ascending=False)
top_earner["Price"] = top_earner["Price"].map("${:.2f}".format)
top_earner["Total"] = top_earner["Total"].map("${:.2f}".format)
top_earner.head(10)

In [None]:
#Write the summary tables to an xlsx file to copy into the report
#List of variables and sum tables to print
    # tplay, purchsum, gendersum, gender_aggs, age_group_sum,
    # age_aggs, top5, top_seller, top_earner

#writer = pd.ExcelWriter('pandas_multiple.xlsx', engine='xlsxwriter')

# Write each dataframe to a different worksheet.
#df1.to_excel(writer, sheet_name='Sheet1')
#df2.to_excel(writer, sheet_name='Sheet2')
#df3.to_excel(writer, sheet_name='Sheet3')

# Close the Pandas Excel writer and output the Excel file.
# (ExcelWriter.save() was removed in pandas 2.0; close() writes the file.)
#writer.close()