### Heroes Of Pymoli Data Analysis
* Of the 1163 active players, the vast majority are male (84%). There also exists, a smaller, but notable proportion of female players (14%).

* Our peak age demographic falls between 20-24 (44.8%) with secondary groups falling between 15-19 (18.60%) and 25-29 (13.4%).  
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
purchaseFile = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchaseData = pd.read_csv(purchaseFile)

## Player Count

* Display the total number of players


In [2]:
print(purchaseData.columns)




Index(['Purchase ID', 'SN', 'Age', 'Gender', 'Item ID', 'Item Name', 'Price'], dtype='object')


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
purchaseDf= pd.DataFrame(purchaseData)
numPurchases = purchaseDf.groupby("Purchase ID")
purchaseCount = len(numPurchases)
uniqueItems = purchaseDf.groupby("Item ID")
uniqueCount = len(uniqueItems["Item ID"])
uniquePlayer = purchaseDf.groupby("SN")
playerCount = len(uniquePlayer["SN"])
avgPrice = purchaseDf["Price"].mean()
#purchaseSummary['Unique ID'] = purchaseDf["Item ID"].value_counts()
revenue = purchaseDf.groupby("Item ID")["Price"].sum().sum()

purchaseSummary = pd.DataFrame({"Total Players": [playerCount],
                                "Unique Items" : [uniqueCount],
                               "Avg Purchase Price" : [avgPrice],
                               "Number of Purchases" : [purchaseCount],
                               "Total Revenue" : [revenue]})
purchaseSummary["Avg Purchase Price"] = purchaseSummary["Avg Purchase Price"].map("${:.2f}".format)
purchaseSummary["Total Revenue"] = purchaseSummary["Total Revenue"].map("${:.2f}".format)

print(purchaseSummary.T)                           





                            0
Total Players             576
Unique Items              183
Avg Purchase Price      $3.05
Number of Purchases       780
Total Revenue        $2379.77


In [4]:
genderDf = purchaseDf.drop_duplicates("SN", keep = "first") 
genderCount = genderDf["Gender"].count()
gendersGroup = genderDf.groupby("Gender")
male = gendersGroup["SN"].count()["Male"]
female = gendersGroup["SN"].count()["Female"]

others = gendersGroup["SN"].count()["Other / Non-Disclosed"]

malepct = male / playerCount
femalepct = female / playerCount
otherspct = others / playerCount

GenderSummary = pd.DataFrame({"Men": [male],
                             "Men Pct of Players":[malepct],
                             "Women of Players": [female],
                              "Women Pct of Players": [femalepct],
                             "Others": [others],
                              "Other Pct of Players": [otherspct]})
print(GenderSummary.T)


                               0
Men                   484.000000
Men Pct of Players      0.840278
Women of Players       81.000000
Women Pct of Players    0.140625
Others                 11.000000
Other Pct of Players    0.019097


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [5]:
genders = purchaseDf.groupby(["Gender", "Purchase ID"])

malePurchaseCount = len(genders["Purchase ID"].count()["Male"])
femalePurchseCount = len(genders["Purchase ID"].count()["Female"])
othersPurchaseCount = len(genders["SN"].count()["Other / Non-Disclosed"])
maleAvgPurchase = purchaseDf.groupby("Gender")["Price"].mean()["Male"].mean()
femaleAvgPurchase = purchaseDf.groupby("Gender")["Price"].mean()["Female"].mean()
otherAvgPurchase = purchaseDf.groupby("Gender")["Price"].mean()["Other / Non-Disclosed"].mean()
maleTtlPurchase = purchaseDf.groupby("Gender")["Price"].sum()["Male"].sum()
femaleTtlPurchase = purchaseDf.groupby("Gender")["Price"].sum()["Female"].sum()
otherTtlPurchase = purchaseDf.groupby("Gender")["Price"].sum()["Other / Non-Disclosed"].sum()
maleAvgTtlPurchase = purchaseDf.groupby("Gender")["Price"].sum()["Male"].sum() / male
femaleAvgTtlPurchase = purchaseDf.groupby("Gender")["Price"].sum()["Female"].sum() / female
otherAvgTtlPurchase = purchaseDf.groupby("Gender")["Price"].sum()["Other / Non-Disclosed"].sum() / others
GenderPurchaseSummary = pd.DataFrame({"Gender": ["Male", "Female", "Others"],
                                     "Purchase Count": [malePurchaseCount, femalePurchseCount, othersPurchaseCount],
                                     "Average Purchase Price": [maleAvgPurchase, femaleAvgPurchase, otherAvgPurchase],
                                     "Total Purchase Value": [maleTtlPurchase, femaleTtlPurchase, otherTtlPurchase ],
                                     "Avg Total Purchase Per Peson": [maleAvgTtlPurchase, femaleAvgTtlPurchase, otherAvgTtlPurchase]})
GenderPurchaseSummary["Total Purchase Value"] = GenderPurchaseSummary["Total Purchase Value"].map("${:.2f}".format)
GenderPurchaseSummary["Avg Total Purchase Per Peson"] = GenderPurchaseSummary["Avg Total Purchase Per Peson"].map("${:.2f}".format)


print(GenderPurchaseSummary)


   Gender  Purchase Count  Average Purchase Price Total Purchase Value  \
0    Male             652                3.017853             $1967.64   
1  Female             113                3.203009              $361.94   
2  Others              15                3.346000               $50.19   

  Avg Total Purchase Per Peson  
0                        $4.07  
1                        $4.47  
2                        $4.56  



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [6]:
purchaseDf.describe()["Age"]

count    780.000000
mean      22.714103
std        6.659444
min        7.000000
25%       20.000000
50%       22.000000
75%       25.000000
max       45.000000
Name: Age, dtype: float64

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [7]:
bins = [0, 9, 14, 19, 24, 29, 34, 39, 100]
group_age = ["<10", "10-14", "15-19", "20-24", "25-30", "30-34", "35-40", "40+"]
ageDf = purchaseDf

ageDf["Age Group"] = pd.cut(ageDf["Age"], bins, labels = group_age)
ageDf.unstack
ageDf = ageDf.drop_duplicates("SN", keep = "first") 
ageDf = ageDf.groupby("Age Group")

#ageCount = (ageDf["Age Group"].value_counts())
ageCount = ageDf["SN"].count()

agePct = ( ageCount / playerCount) 
agePct = agePct.map("{:,.2%}".format)
summary = pd.DataFrame({"Age Count": [ageCount],
                        "Percentage of Players": [agePct]})

print(summary)
summary.to_csv('ages.txt', index=False, sep=' ', header=None)


                                           Age Count  \
0  Age Group
<10       17
10-14     22
15-19    1...   

                               Percentage of Players  
0  Age Group
<10       2.95%
10-14     3.82%
15-1...  


## Purchasing Analysis (Age)

In [11]:

age_avg_purchase = purchaseDf.groupby("Age Group")["Price"].mean().map("${:.2f}".format)
age_count_purchase = purchaseDf.groupby("Age Group")["Price"].count()
age_ttl_purchase = purchaseDf.groupby("Age Group")["Price"].sum().map("${:.2f}".format) 
age_avg_person_purchase = (purchaseDf.groupby("Age Group")["Price"].sum() / ageCount).map("${:.2f}".format)  
age_avg_person_purchase
age_purchase_summary = pd.DataFrame({"Purchase Count": [age_count_purchase],
                                    "Average Purchase Price":[age_avg_purchase],
                                    "Total Purchase Value": [age_ttl_purchase],
                                    "Avg Total Purchase Per Person": [age_avg_person_purchase]})
print(age_purchase_summary)
age_purchase_summary.to_csv('age_purchase.txt', index=False, sep=' ', header=None)


                                      Purchase Count  \
0  Age Group
<10       23
10-14     28
15-19    1...   

                              Average Purchase Price  \
0  Age Group
<10      $3.35
10-14    $2.96
15-19 ...   

                                Total Purchase Value  \
0  Age Group
<10        $77.13
10-14      $82.78
...   

                       Avg Total Purchase Per Person  
0  Age Group
<10      $4.54
10-14    $3.76
15-19 ...  


* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame

