### Heroes Of Pymoli Data Analysis
* Of the 576 unique players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [2]:
# Dependencies and Setup
import pandas as pd
import numpy as np
from pandas import DataFrame

# File to Load (Remember to Change These)
file_to_load = "./Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [3]:
# Each row of purchase_data is one purchase, so len() counts purchases, not players
totalpurchases = len(purchase_data)
print("Total Purchases: " + str(totalpurchases))

Total Purchases: 780


In [4]:
# Count distinct screen names so repeat buyers are counted once
uniqueplayers = purchase_data["SN"].nunique()
print("Unique Players: " + str(uniqueplayers))

Unique Players: 576
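
Because each row of the purchase file is one purchase, a row count and a player count can differ. The following is a minimal sketch on made-up data; the screen names and prices are hypothetical and are not taken from `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical sample: three purchases made by two distinct players
sample = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB"],  # made-up screen names
    "Price": [3.53, 1.56, 4.88],              # made-up prices
})

purchase_rows = len(sample)              # every row is one purchase
unique_players = sample["SN"].nunique()  # distinct screen names only

print(f"Purchases: {purchase_rows}, unique players: {unique_players}")
```

Any per-player percentage should use the unique count as its denominator.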


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [5]:
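# Number of distinct items sold
uniqueitems = purchase_data["Item ID"].nunique()

# Mean price per purchase, formatted as currency
avgprice = "$" + str(round(purchase_data["Price"].mean(), 2))

# Mean buyer age across all purchases, truncated to a whole number
avgage = int(purchase_data["Age"].mean())

# Number of purchases (rows with a price)
purchasecount = purchase_data["Price"].count()

# Total revenue, formatted as currency
totalsales = "$" + str(purchase_data["Price"].sum())

summarytable = pd.DataFrame({"Unique Items": [uniqueitems],
                             "Average Price": [avgprice],
                             "Average Age": [avgage],
                             "Purchase Count": [purchasecount],
                             "Total Sales": [totalsales]})

summarytable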

| | Unique Items | Average Price | Average Age | Purchase Count | Total Sales |
|---|---|---|---|---|---|
| 0 | 179 | $3.05 | 22 | 780 | $2379.77 |
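
The optional formatting step could be sketched as follows, assuming the summary values are kept numeric until display time. The two figures are copied from the summary table above. The names `summary`, `currency_cols`, and `formatted` are introduced here for illustration only.

```python
import pandas as pd

# Numeric summary values, copied from the table above
summary = pd.DataFrame({
    "Average Price": [3.05],
    "Total Sales": [2379.77],
})

# Format a copy so the numeric originals remain usable for further math
currency_cols = ["Average Price", "Total Sales"]
formatted = summary.copy()
for col in currency_cols:
    # "${:,.2f}" adds a dollar sign, thousands separators, and two decimals
    formatted[col] = formatted[col].map("${:,.2f}".format)

print(formatted)
```

Keeping the formatting separate from the calculation avoids doing arithmetic on strings later in the notebook.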


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [6]:
# One row per player, so repeat buyers are not double-counted
uniqueplayerlist = purchase_data[["SN", "Gender"]].drop_duplicates()

# Number of players of each gender
gendercount = uniqueplayerlist["Gender"].value_counts()

# Share of all unique players, as a percentage
genderpercentage = ((gendercount / uniqueplayers) * 100).round(1)

# Summary data frame
gendersummarydf = pd.DataFrame({"Total": gendercount,
                                "Percentage": genderpercentage})
gendersummarydf

| Gender | Total | Percentage |
|---|---|---|
| Male | 484 | 84.0 |
| Female | 81 | 14.1 |
| Other / Non-Disclosed | 11 | 1.9 |
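
As an alternative sketch, `value_counts(normalize=True)` returns the shares directly, so no separately computed player total is needed. The sample rows below are hypothetical and only illustrate the shape of the calculation.

```python
import pandas as pd

# Hypothetical purchases: player "a" bought twice
players = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d", "e"],
    "Gender": ["Male", "Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player before counting
unique = players.drop_duplicates(subset="SN")

counts = unique["Gender"].value_counts()
# normalize=True yields fractions of the total; scale to percentages
percent = (unique["Gender"].value_counts(normalize=True) * 100).round(1)

demographics = pd.DataFrame({"Total": counts, "Percentage": percent})
print(demographics)
```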



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [8]:
# Group purchases by gender and aggregate only the Price column
Purchasinggb = purchase_data.groupby("Gender")["Price"]
PurchaseCount = Purchasinggb.count()
PurchaseAvg = Purchasinggb.mean()
TotalPurchaseValue = Purchasinggb.sum()
# Average total spend per player: total value divided by unique players of that gender
PurchaseTotal = TotalPurchaseValue / gendercount
PurchaseCount

Gender
Female                   113
Male                     652
Other / Non-Disclosed     15
Name: Price, dtype: int64

In [9]:
# PurchaseTotal is the per-player average; TotalPurchaseValue is the summed revenue
PurchasingAnalysis = pd.DataFrame({'Purchase Count': PurchaseCount,
                                   'Avg. Purchase Price': PurchaseAvg,
                                   'Total Purchase Value': TotalPurchaseValue,
                                   'Avg. Total per Person': PurchaseTotal})
PurchasingAnalysis

| Gender | Purchase Count | Avg. Purchase Price | Total Purchase Value | Avg. Total per Person |
|---|---|---|---|---|
| Female | 113 | 3.203009 | 361.94 | 4.468395 |
| Male | 652 | 3.017853 | 1967.64 | 4.065372 |
| Other / Non-Disclosed | 15 | 3.346 | 50.19 | 4.562727 |
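
The per-gender calculations can also be written as a single named aggregation. This is a sketch under assumptions: the data is made up, and column names such as `avg_total_per_person` are hypothetical choices, not names used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical purchases: player "a" bought twice
purchases = pd.DataFrame({
    "SN": ["a", "a", "b", "c"],
    "Gender": ["Male", "Male", "Male", "Female"],
    "Price": [1.0, 3.0, 2.0, 4.0],
})

# Named aggregation: output_column=(input_column, function)
by_gender = purchases.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Average spend per player is total value divided by distinct players
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```

Counting distinct players inside the same `groupby` keeps the denominator aligned with the grouped index, so no separately computed gender count is required.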


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal places


* Display Age Demographics Table


In [111]:
# One row per player so each player is counted once
nodupdf = purchase_data.loc[:, ['SN', 'Age']].drop_duplicates()
# Nine bin edges define eight bins, so eight labels are needed
bins = [0, 9.9, 14.9, 19.9, 24.9, 29.9, 34.9, 39.9, 990]
GroupNames = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
nodupdf["Age Ranges"] = pd.cut(nodupdf["Age"], bins, labels=GroupNames)
AgeDemographicsTotals = nodupdf['Age Ranges'].value_counts()
AgeDemographicsPercents = (AgeDemographicsTotals / uniqueplayers * 100).round(2)
AgeDemographics = pd.DataFrame({'Age Group Total': AgeDemographicsTotals,
                                'Percentage of Total': AgeDemographicsPercents})
AgeDemographics.sort_index()

| Age Range | Age Group Total | Percentage of Total |
|---|---|---|
| <10 | 17 | 2.95 |
| 10-14 | 22 | 3.82 |
| 15-19 | 107 | 18.58 |
| 20-24 | 258 | 44.79 |
| 25-29 | 77 | 13.37 |
| 30-34 | 52 | 9.03 |
| 35-39 | 31 | 5.38 |
| 40+ | 12 | 2.08 |
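
An alternative to fractional bin edges such as 9.9 is whole-number edges with half-open intervals. The sketch below is not the method used in the cell above; the ages are made up, and `edges`, `labels`, and `groups` are illustrative names.

```python
import pandas as pd

# Hypothetical ages chosen to sit on the bin boundaries
ages = pd.Series([7, 10, 19, 20, 24, 39, 40, 45])

# right=False makes each bin [low, high), so age 10 falls in "10-14"
edges = [0, 10, 15, 20, 25, 30, 35, 40, float("inf")]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

groups = pd.cut(ages, bins=edges, labels=labels, right=False)
print(groups.tolist())
```

With `right=False` the edges read the same as the labels, and an open-ended upper edge avoids picking an arbitrary maximum age.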


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [50]:
nodupdf = purchase_data.loc[:, ['SN', 'Age']].drop_duplicates()
bins = [0, 9.9, 14.9, 19.9, 24.9, 29.9, 34.9, 39.9, 990]
nodupdf["Age Ranges"] = pd.cut(nodupdf["Age"], bins, labels=GroupNames)

In [121]:
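# Group purchases by exact age (one group per distinct age, not per age bin)
Agepurchasinggb = purchase_data.groupby(['Age'])
# Number of distinct ages in the data, not a count of purchases
purchasecount = len(Agepurchasinggb)
purchasecount
# Mean number of purchases per distinct age; despite the "$", this is a count, not a price
AvgPP = "$" + str(Agepurchasinggb.count()['Price'].mean())
AvgPP
# Same expression as AvgPP, so it repeats that value rather than a purchase total
AgeAvgPT = "$" + str(Agepurchasinggb.count()['Price'].mean())
PurchasingAnalysis = pd.DataFrame({'Purchase Count': [purchasecount],
                                   'Avg. Purchase Price': [AvgPP],
                                   'Avg. Purchase Total': [AgeAvgPT]})
PurchasingAnalysis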

| | Purchase Count | Avg. Purchase Price | Avg. Purchase Total |
|---|---|---|---|
| 0 | 39 | $20.0 | $20.0 |
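
The steps listed for this section (bin every purchase by age, then compute purchase count, average price, and average total per person for each bin) could be sketched as follows. The data is made up, the bin edges and labels are assumptions, and the output column names are hypothetical.

```python
import pandas as pd

# Hypothetical purchases: player "a" bought twice
purchases = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d"],
    "Age": [8, 8, 22, 23, 41],
    "Price": [1.0, 2.0, 3.0, 5.0, 4.0],
})

# Assumed bin edges and labels; right=False gives [low, high) intervals
edges = [0, 10, 15, 20, 25, 30, 35, 40, float("inf")]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
purchases["Age Group"] = pd.cut(purchases["Age"], bins=edges, labels=labels, right=False)

# observed=True keeps only age groups that actually contain purchases
by_age = purchases.groupby("Age Group", observed=True).agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)
by_age["avg_total_per_person"] = by_age["total_value"] / by_age["players"]
print(by_age)
```

Grouping on the binned column rather than on `Age` itself gives one row per age range instead of one row per distinct age.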


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [152]:
# Select Price before aggregating so non-numeric columns are never averaged
topspendersgb = purchase_data.groupby('SN')['Price']
topspenderspcgb = topspendersgb.count()
topspendersavgppgb = topspendersgb.mean().round(2)
topspenderstpv = topspendersgb.sum()
TopSpendersDF = pd.DataFrame({'Purchase Count': topspenderspcgb,
                              'Avg. Purchase Price': topspendersavgppgb,
                              'Total Purchase Value': topspenderstpv})
TopSpendersDF.sort_values("Total Purchase Value", ascending=False).head()

| SN | Purchase Count | Avg. Purchase Price | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | 3.79 | 18.96 |
| Idastidru52 | 4 | 3.86 | 15.45 |
| Chamjask73 | 3 | 4.61 | 13.83 |
| Iral74 | 4 | 3.4 | 13.62 |
| Iskadarya95 | 3 | 4.37 | 13.1 |


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [181]:
# Group by item and aggregate only the Price column, so each result is a Series
MPIgb = purchase_data.groupby(['Item ID', 'Item Name'])['Price']
MPIpurchasecount = MPIgb.count()
MPIitemprice = MPIgb.mean().round(2)
MPItotalpurchasevalue = MPIgb.sum()
MPIdf = pd.DataFrame({'Purchase Count': MPIpurchasecount,
                      'Avg. Item Price': MPIitemprice,
                      'Total Purchase Value': MPItotalpurchasevalue})
# Most popular means most purchased, so sort by purchase count
MPIdf.sort_values("Purchase Count", ascending=False).head()

## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [182]:
MPIdf.sort_values(["Total Purchase Value"], ascending=[False]).head()

