### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable share are female (14%).

* The peak age group is 20-24 (44.8%), followed by 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [22]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
prch_df = pd.read_csv(file_to_load)
prch_df.columns

Index(['Purchase ID', 'SN', 'Age', 'Gender', 'Item ID', 'Item Name', 'Price'], dtype='object')

## Player Count

* Display the total number of players


In [23]:
prch_df["SN"].nunique()

576

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [24]:
prch_summ = pd.DataFrame({"Number of Unique Items": prch_df["Item ID"].nunique(),
                            "Average Price": "${0:.2f}".format(prch_df["Price"].mean()), 
                            "Number of Purchases": prch_df["Purchase ID"].count(), 
                            "Total Revenue": "${0:.2f}".format(prch_df["Price"].sum())},index=[0])
prch_summ

| | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 183 | $3.05 | 780 | $2379.77 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [107]:
# Keep one row per player so repeat buyers are counted once
prch_unique = prch_df.drop_duplicates(subset = ["SN"])

# Count unique players in each gender category
male_count = prch_unique.loc[prch_unique["Gender"] == "Male"]["SN"].count()
female_count = prch_unique.loc[prch_unique["Gender"] == "Female"]["SN"].count()
other_count = prch_unique.loc[prch_unique["Gender"] == "Other / Non-Disclosed"]["SN"].count()
total_count = male_count + female_count + other_count

# Express each count as a share of all unique players
pcnt_male = "{0:.0f}%".format(male_count / total_count * 100)
pcnt_female = "{0:.0f}%".format(female_count / total_count * 100)
pcnt_other = "{0:.0f}%".format(other_count / total_count * 100)

gender_df = pd.DataFrame({"Gender": ["Male", "Female", "Other / Non-Disclosed"], 
                          "Total Count": [male_count, female_count, other_count], 
                          "Percentage of Players": [pcnt_male, pcnt_female, pcnt_other]})
gender_df

| | Gender | Total Count | Percentage of Players |
|---|---|---|---|
| 0 | Male | 484 | 84% |
| 1 | Female | 81 | 14% |
| 2 | Other / Non-Disclosed | 11 | 2% |
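
The same counts and percentages can probably be obtained more compactly with `value_counts`. The sketch below is only an illustration: `toy_df` is a made-up stand-in for `purchase_data.csv`, and `players` and `gender_sketch` are hypothetical names that do not appear in the notebook.

```python
import pandas as pd

# Made-up rows that reuse the notebook's column names (not the real data).
toy_df = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bo2", "Cy3", "Di4"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
    "Price": [1.00, 2.00, 3.00, 4.00, 5.00],
})

# One row per player, so repeat buyers are counted once.
players = toy_df.drop_duplicates(subset="SN")

# Raw counts and percentage shares per gender.
counts = players["Gender"].value_counts()
shares = players["Gender"].value_counts(normalize=True) * 100

# Both Series are indexed by gender, so they align automatically.
gender_sketch = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": shares.round(2),
})
print(gender_sketch)
```

Keeping the percentages numeric until display time leaves the table sortable; string formatting can be applied as a last step.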



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [106]:
# Number of purchases by gender
prch_count_male = prch_df["Purchase ID"].loc[prch_df["Gender"] == "Male"].count()
prch_count_female = prch_df["Purchase ID"].loc[prch_df["Gender"] == "Female"].count()
prch_count_other = prch_df["Purchase ID"].loc[prch_df["Gender"] == "Other / Non-Disclosed"].count()

# Average price per purchase by gender
prch_px_avg_male = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Gender"] == "Male"].mean())
prch_px_avg_female = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Gender"] == "Female"].mean())
prch_px_avg_other = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Gender"] == "Other / Non-Disclosed"].mean())

# Total amount spent by gender
prch_total_male = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Gender"] == "Male"].sum())
prch_total_female = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Gender"] == "Female"].sum())
prch_total_other = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Gender"] == "Other / Non-Disclosed"].sum())

# Average spend per player: total spent divided by the unique player count from the Gender Demographics cell
prch_ttl_avg_male = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Gender"] == "Male"].sum() / male_count)
prch_ttl_avg_female = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Gender"] == "Female"].sum() / female_count)
prch_ttl_avg_other = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Gender"] == "Other / Non-Disclosed"].sum() / other_count)

gender_prch_df = pd.DataFrame({"Gender": ["Male", "Female", "Other"], 
                               "Purchase Count": [prch_count_male, prch_count_female, prch_count_other],
                               "Average Purchase Price": [prch_px_avg_male, prch_px_avg_female, prch_px_avg_other], 
                               "Total Purchase Value": [prch_total_male, prch_total_female, prch_total_other], 
                               "Average Total Purchase Per Person": [prch_ttl_avg_male, prch_ttl_avg_female, prch_ttl_avg_other]})
gender_prch_df

| | Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Average Total Purchase Per Person |
|---|---|---|---|---|---|
| 0 | Male | 652 | $3.02 | $1967.64 | $4.07 |
| 1 | Female | 113 | $3.20 | $361.94 | $4.47 |
| 2 | Other | 15 | $3.35 | $50.19 | $4.56 |
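
As a rough alternative to filtering once per gender, a single `groupby` with named aggregation can produce every column in one pass. This is a sketch under stated assumptions: `toy_df` holds made-up rows with the notebook's column names, and `by_gender` and its snake_case column names are hypothetical.

```python
import pandas as pd

# Made-up rows that reuse the notebook's column names (not the real data).
toy_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["Ana1", "Ana1", "Bo2", "Cy3", "Di4"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
    "Price": [1.00, 2.00, 3.00, 4.00, 5.00],
})

# One aggregation per output column; "players" counts unique screen names.
by_gender = toy_df.groupby("Gender").agg(
    purchase_count=("Purchase ID", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Per-person spend is total spend divided by unique players, not by purchases.
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```

Dividing by `players` rather than `purchase_count` is what separates the per-person figure from the average purchase price.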


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [183]:
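prch_df["Age"].describe()
bins = [0, 7, 20, 25, 30, 35, 40, 45]
age_bins = ["<= 7", 
            "8 - 20", 
            "21 - 25", 
            "26 - 30", 
            "31 - 35", 
            "36 - 40", 
            "41 - 45"]

# Bin each purchase row by age; the column must exist before it is counted
prch_df["Age Summary"] = pd.cut(prch_df["Age"], bins, labels = age_bins)

# Total is taken over purchase rows in prch_df, not over unique players
age_total = prch_df["Age Summary"].count()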

A = prch_df.loc[prch_df["Age Summary"] == "<= 7"]["Age Summary"].count()
B = prch_df.loc[prch_df["Age Summary"] == "8 - 20"]["Age Summary"].count()
C = prch_df.loc[prch_df["Age Summary"] == "21 - 25"]["Age Summary"].count()
D = prch_df.loc[prch_df["Age Summary"] == "26 - 30"]["Age Summary"].count()
E = prch_df.loc[prch_df["Age Summary"] == "31 - 35"]["Age Summary"].count()
F = prch_df.loc[prch_df["Age Summary"] == "36 - 40"]["Age Summary"].count()
G = prch_df.loc[prch_df["Age Summary"] == "41 - 45"]["Age Summary"].count()

Aa = "{0:.2f}%".format((A/age_total)*100)
Bb = "{0:.2f}%".format((B/age_total)*100)
Cc = "{0:.2f}%".format((C/age_total)*100)
Dd = "{0:.2f}%".format((D/age_total)*100)
Ee = "{0:.2f}%".format((E/age_total)*100)
Ff = "{0:.2f}%".format((F/age_total)*100)
Gg = "{0:.2f}%".format((G/age_total)*100)

age_summ = pd.DataFrame({"Age Summary": age_bins,
                         "Total Count": [A, B, C, D, E, F, G], 
                         "Percentage of Players": [Aa, Bb, Cc, Dd, Ee, Ff, Gg]})
age_summ

| | Age Summary | Total Count | Percentage of Players |
|---|---|---|---|
| 0 | <= 7 | 9 | 1.15% |
| 1 | 8 - 20 | 277 | 35.51% |
| 2 | 21 - 25 | 325 | 41.67% |
| 3 | 26 - 30 | 77 | 9.87% |
| 4 | 31 - 35 | 52 | 6.67% |
| 5 | 36 - 40 | 33 | 4.23% |
| 6 | 41 - 45 | 7 | 0.90% |
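
The summary at the top of the notebook quotes five-year groups such as 20-24 and 15-19. One way to get groups of that shape, sketched here with made-up rows, is to pass `right=False` to `pd.cut` so each bin includes its lower edge and excludes its upper edge, and to count unique players rather than purchase rows. The names `edges`, `labels`, `players`, and `age_sketch` are hypothetical, and the upper edge of 200 is an arbitrary cap chosen for illustration.

```python
import pandas as pd

# Made-up rows that reuse the notebook's column names (not the real data).
toy_df = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bo2", "Cy3", "Di4", "Eve5"],
    "Age": [20, 20, 24, 25, 9, 41],
})

# Half-open bins [0, 10), [10, 15), ..., [40, 200): age 25 lands in "25-29".
edges = [0, 10, 15, 20, 25, 30, 35, 40, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Count each player once, however many purchases they made.
players = toy_df.drop_duplicates(subset="SN").copy()
players["Age Group"] = pd.cut(players["Age"], bins=edges, labels=labels, right=False)

age_counts = players["Age Group"].value_counts(sort=False)
age_sketch = pd.DataFrame({
    "Total Count": age_counts,
    "Percentage of Players": (age_counts / len(players) * 100).round(2),
})
print(age_sketch)
```

With the default `right=True`, the same edges would place a 25-year-old in the 20-24 bin, so the choice of closed side should match the labels.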


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [184]:
A_prch = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "<= 7"].count()
B_prch = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "8 - 20"].count()
C_prch = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "21 - 25"].count()
D_prch = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "26 - 30"].count()
E_prch = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "31 - 35"].count()
F_prch = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "36 - 40"].count()
G_prch = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "41 - 45"].count()

Aa_px_avg = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "<= 7"].mean())
Bb_px_avg = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "8 - 20"].mean())
Cc_px_avg = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "21 - 25"].mean())
Dd_px_avg = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "26 - 30"].mean())
Ee_px_avg = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "31 - 35"].mean())
Ff_px_avg = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "36 - 40"].mean())
Gg_px_avg = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "41 - 45"].mean())

Aa_px_ttl = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "<= 7"].sum())
Bb_px_ttl = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "8 - 20"].sum())
Cc_px_ttl = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "21 - 25"].sum())
Dd_px_ttl = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "26 - 30"].sum())
Ee_px_ttl = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "31 - 35"].sum())
Ff_px_ttl = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "36 - 40"].sum())
Gg_px_ttl = "${0:.2f}".format(prch_df["Price"].loc[prch_df["Age Summary"] == "41 - 45"].sum())

# NOTE: these are means of the Purchase ID column within each bin, not spend figures.
# A true per-person average would divide each bin's total spend by its number of unique SN values.
# Separate names are used so the purchase counts above are not overwritten.
A_id_mean = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "<= 7"].mean()
B_id_mean = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "8 - 20"].mean()
C_id_mean = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "21 - 25"].mean()
D_id_mean = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "26 - 30"].mean()
E_id_mean = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "31 - 35"].mean()
F_id_mean = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "36 - 40"].mean()
G_id_mean = prch_df["Purchase ID"].loc[prch_df["Age Summary"] == "41 - 45"].mean()

age_prch_summ = pd.DataFrame({"Age Summary": age_bins, 
                              "Purchase Count": [A_prch, 
                                                 B_prch, 
                                                 C_prch, 
                                                 D_prch, 
                                                 E_prch, 
                                                 F_prch, 
                                                 G_prch], 
                              "Average Purchase Price": [Aa_px_avg, 
                                                         Bb_px_avg, 
                                                         Cc_px_avg, 
                                                         Dd_px_avg, 
                                                         Ee_px_avg, 
                                                         Ff_px_avg, 
                                                         Gg_px_avg], 
                              "Total Purchase Value": [Aa_px_ttl,
                                                       Bb_px_ttl,
                                                       Cc_px_ttl,
                                                       Dd_px_ttl,
                                                       Ee_px_ttl,
                                                       Ff_px_ttl,
                                                       Gg_px_ttl], 
                              "Average Total Purchase Per Person": [A_id_mean,
                                                                    B_id_mean,
                                                                    C_id_mean,
                                                                    D_id_mean,
                                                                    E_id_mean,
                                                                    F_id_mean,
                                                                    G_id_mean]})
age_prch_summ

| | Age Summary | Purchase Count | Average Purchase Price | Total Purchase Value | Average Total Purchase Per Person |
|---|---|---|---|---|---|
| 0 | <= 7 | 9 | $3.65 | $32.89 | 402.888889 |
| 1 | 8 - 20 | 277 | $3.08 | $854.23 | 394.509025 |
| 2 | 21 - 25 | 325 | $3.02 | $981.64 | 386.338462 |
| 3 | 26 - 30 | 77 | $2.88 | $221.42 | 387.324675 |
| 4 | 31 - 35 | 52 | $2.99 | $155.71 | 362.057692 |
| 5 | 36 - 40 | 33 | $3.40 | $112.35 | 391.151515 |
| 6 | 41 - 45 | 7 | $3.08 | $21.53 | 540.857143 |
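
The per-bin figures could also be computed by grouping on the binned column, which makes it straightforward to divide total spend by the number of unique players in each bin. The sketch below reuses the notebook's bin edges and labels on made-up rows; `toy_df`, `by_age`, and the snake_case column names are hypothetical.

```python
import pandas as pd

# Made-up rows that reuse the notebook's column names (not the real data).
toy_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["Ana1", "Ana1", "Bo2", "Cy3"],
    "Age": [22, 22, 23, 30],
    "Price": [1.00, 2.00, 3.00, 4.00],
})

# Same edges and labels as the notebook; pd.cut bins are right-inclusive by default.
bins = [0, 7, 20, 25, 30, 35, 40, 45]
age_bins = ["<= 7", "8 - 20", "21 - 25", "26 - 30", "31 - 35", "36 - 40", "41 - 45"]
toy_df["Age Summary"] = pd.cut(toy_df["Age"], bins, labels=age_bins)

# observed=True keeps only the bins that actually contain rows.
by_age = toy_df.groupby("Age Summary", observed=True).agg(
    purchase_count=("Purchase ID", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Spend per player: total spend in the bin over unique screen names in the bin.
by_age["avg_total_per_person"] = by_age["total_value"] / by_age["players"]
print(by_age)
```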


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
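
The steps above might be sketched as follows. This is a minimal illustration on made-up rows, not the notebook's own solution; `toy_df`, `spenders`, and `top_spenders` are hypothetical names.

```python
import pandas as pd

# Made-up rows that reuse the notebook's column names (not the real data).
toy_df = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bo2", "Cy3", "Cy3", "Cy3"],
    "Price": [1.00, 2.00, 5.00, 1.50, 1.50, 1.00],
})

# Count, mean, and sum of Price for each screen name.
spenders = toy_df.groupby("SN")["Price"].agg(["count", "mean", "sum"])
spenders.columns = ["Purchase Count", "Average Purchase Price", "Total Purchase Value"]

# Sort while the values are still numeric, then preview the top rows.
top_spenders = spenders.sort_values("Total Purchase Value", ascending=False)
print(top_spenders.head())
```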



## Most Popular Items

In [218]:
# prch_df.sort_values(["SN","Price"],ascending=False).groupby("Price").head(3)
# prch_df.groupby(["SN", "Price", "Purchase ID"]).size().reset_index(name="counts")
# prch_df.groupby(["SN", "Price"]).size().sort_values(ascending=False)


SN              Price
Hada39          2.48     2
Zontibe81       3.79     1
Haerith37       1.66     1
Haisurra41      4.40     1
Haisrisuir60    4.19     1
                2.67     1
Haillyrgue51    4.60     1
                2.52     1
                2.38     1
Hailaphos89     3.81     1
Haestyphos66    1.97     1
Haerithp41      4.40     1
Haellysu29      3.77     1
Frichjask31     4.43     1
Haeladil46      3.81     1
Haedairiadru51  3.09     1
Hada39          3.61     1
Frichynde86     1.63     1
Frichossala54   3.08     1
Frichosiala98   3.54     1
Frichosia58     4.43     1
Frichocesta66   4.82     1
                3.55     1
Hala31          1.02     1
Halaecal66      2.22     1
                4.89     1
Heunadil74      1.54     1
Hiral75         1.98     1
Hilaerin92      3.61     1
Hiasurria41     1.61     1
                        ..
Lisotesta51     1.94     1
Lisossanya98    3.44     1
Lisossala30     4.35     1
Lisossa46       2.50     1
Lisossa25       1.71     1
Marils

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
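
One possible shape for these steps is shown below on made-up rows. It assumes each item sells at a single price, so the mean of `Price` within a group equals the item price. The names `toy_df`, `items`, and `most_popular`, and the item names themselves, are hypothetical.

```python
import pandas as pd

# Made-up rows that reuse the notebook's column names (not the real data).
toy_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 1.00],
})

# Group by both keys so the item name is carried into the result index.
items = (
    toy_df.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",  # assumes one price per item
        "sum": "Total Purchase Value",
    })
)

most_popular = items.sort_values("Purchase Count", ascending=False)
print(most_popular.head())
```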



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
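
A point worth illustrating for this step is ordering: sorting should happen while the values are numeric, with currency formatting applied afterwards, because formatted strings sort lexicographically ("$9.00" would rank above "$10.40"). The sketch below uses a small made-up item table; `items`, `most_profitable`, and `display_copy` are hypothetical names.

```python
import pandas as pd

# Made-up item summary with the column names used in the instructions above.
items = pd.DataFrame(
    {
        "Purchase Count": [3, 2, 4],
        "Item Price": [2.00, 4.50, 2.60],
        "Total Purchase Value": [6.00, 9.00, 10.40],
    },
    index=pd.MultiIndex.from_tuples(
        [(1, "Sword"), (2, "Shield"), (3, "Bow")],
        names=["Item ID", "Item Name"],
    ),
)

# Sort on the numeric column first.
most_profitable = items.sort_values("Total Purchase Value", ascending=False)

# Format a copy for display so the numeric table stays usable.
display_copy = most_profitable.copy()
for col in ["Item Price", "Total Purchase Value"]:
    display_copy[col] = display_copy[col].map("${:.2f}".format)
print(display_copy.head())
```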

