### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchaseData = pd.read_csv(file_to_load)
purchaseData.head()


Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


## Player Count

* Display the total number of players


In [2]:



# One row per (Gender, SN) pair, i.e. one row per unique player
unique = purchaseData.groupby(["Gender", "SN"], as_index=False).count()
# Count the rows of that frame to get the number of players
totalnum = unique["Purchase ID"].count()
print(totalnum)
total = pd.DataFrame({"Total Players": [totalnum]})
total


576


Unnamed: 0,Total Players
0,576
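
The same count can probably be reached more directly with `Series.nunique()`. The sketch below assumes one screen name per player and uses a small made-up frame (`toy` and its values are hypothetical, not taken from `purchase_data.csv`).

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data.csv
toy = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Price": [1.50, 2.00, 3.25, 4.00],
})

# Each distinct screen name is one player, however many purchases they made
player_count = toy["SN"].nunique()
total = pd.DataFrame({"Total Players": [player_count]})
print(total)
```

Here `Ann1` appears twice but is counted once, so the frame reports three players.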


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
# Summary frame: number of distinct item names and the mean purchase price
itemdata = pd.DataFrame({"Unique Items":[len(purchaseData["Item Name"].unique())]})
itemdata["Average Price"] = round(purchaseData["Price"].mean(), 2)
itemdata

Unnamed: 0,Unique Items,Average Price
0,179,3.05
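
The instructions above also mention "etc.", which could include the number of purchases and total revenue. A minimal sketch of a fuller summary frame follows; the column names and the toy data are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data.csv
toy = pd.DataFrame({
    "Item ID": [1, 2, 1, 3],
    "Item Name": ["Fury", "Nirvana", "Fury", "Persuasion"],
    "Price": [1.50, 4.50, 1.50, 2.50],
})

# One-row summary: each value is wrapped in a list so it becomes a column
summary = pd.DataFrame({
    "Unique Items": [toy["Item ID"].nunique()],
    "Average Price": [round(toy["Price"].mean(), 2)],
    "Number of Purchases": [len(toy)],
    "Total Revenue": [round(toy["Price"].sum(), 2)],
})
print(summary)
```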


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [4]:


# Players per gender, counted on the one-row-per-player frame
gendercounts = unique["Gender"].value_counts()
genderdf = pd.DataFrame({"Total Count": gendercounts})
genderdf["Percentage of Players"] = round(100*genderdf["Total Count"]/totalnum, 2)
genderdf.head()

Unnamed: 0,Total Count,Percentage of Players
Male,484,84.03
Female,81,14.06
Other / Non-Disclosed,11,1.91
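
One way to make the "count each player once" step explicit is to drop duplicate screen names before counting. This is a sketch on hypothetical toy data, not the notebook's actual computation.

```python
import pandas as pd

# Hypothetical toy data: Ann1 bought twice but is a single player
toy = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3", "Dan4"],
    "Gender": ["Female", "Male", "Female", "Male", "Male"],
})

# Keep one row per player so repeat buyers are not counted twice
players = toy.drop_duplicates(subset="SN")
counts = players["Gender"].value_counts()

# Both columns share the gender index, so they line up row by row
gender_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": (100 * counts / len(players)).round(2),
})
print(gender_summary)
```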



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [22]:
# Purchases per gender (every row of purchaseData is one purchase)
purchases = pd.DataFrame({"Purchase Count": purchaseData["Gender"].value_counts()})

# Mean price per gender, listed in the same order as the value_counts index
gender = ["Male", "Female", "Other / Non-Disclosed"]
gendermean = [round(purchaseData.loc[purchaseData["Gender"] == i]["Price"].mean(), 2) for i in gender]

purchases["Average Purchase Price"] = gendermean
# Note: built from the rounded average, so it can differ by a few cents from the exact sum of prices
purchases["Total Purchase Value"] = purchases["Average Purchase Price"]*purchases["Purchase Count"]
# Divide by the unique player counts in genderdf (rows align on the gender index)
purchases["Average Total Purchase Price per Person"] = round(purchases["Total Purchase Value"]/genderdf["Total Count"], 2)

purchases.head()


Unnamed: 0,Purchase Count,Average Purchase Price,Total Purchase Value,Average Total Purchase Price per Person
Male,652,3.02,1969.04,4.07
Female,113,3.2,361.6,4.46
Other / Non-Disclosed,15,3.35,50.25,4.57
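
As an alternative sketch, a single `groupby(...).agg(...)` call can compute the count, mean, exact sum, and unique player count per gender in one pass, which avoids multiplying a rounded average. The toy data and the snake_case column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data.csv
toy = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3", "Bob2"],
    "Gender": ["Female", "Male", "Female", "Male", "Male"],
    "Price": [1.00, 2.00, 3.00, 4.00, 3.00],
})

# Named aggregation: new_column=(source_column, aggregation)
by_gender = toy.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Total spent divided by unique players, rounded only at the end
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
by_gender = by_gender.round(2)
print(by_gender)
```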


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [6]:
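# pd.cut intervals are right-inclusive: (0, 9], (9, 15], (15, 19], (19, 25], ...
# With these edges, ages 15, 25, 35 and 40 fall into the group below their label;
# edges of [0, 9, 14, 19, 24, 29, 34, 39, 200] would match the labels exactly.
bins = [0, 9, 15, 19, 25, 29, 35, 40, 200]
# ageData is an alias of purchaseData, so the new column is added to purchaseData as well
ageData = purchaseData
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
ageData["age group"] = pd.cut(ageData["Age"], bins, labels = labels)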

agebase = pd.DataFrame(ageData["age group"].value_counts())
agebase.reset_index(level=0, inplace=True)
# The first column of agebase holds the age-group labels, ordered by purchase count
arr = [i[0] for i in agebase.values]
# Number of unique players (distinct SN values) in each age group
agepurchaseu = [len(purchaseData.loc[purchaseData["age group"] == i]["SN"].unique()) for i in arr]
agebase["Players Count"] = agepurchaseu
agebase["Percentage of Players"] = round((100*agebase["Players Count"]/totalnum), 2)
agebase.rename(columns = {"age group":"Purchase Count"}, inplace = True)
agebase.rename(columns = {"index": "Age Group"}, inplace = True)


agebase

Unnamed: 0,Age Group,Purchase Count,Players Count,Percentage of Players
0,20-24,424,301,52.26
1,15-19,101,81,14.06
2,30-34,87,62,10.76
3,10-14,63,48,8.33
4,25-29,42,34,5.9
5,35-39,33,26,4.51
6,<10,23,17,2.95
7,40+,7,7,1.22
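
Because `pd.cut` treats each interval as right-inclusive by default, the upper edge of every bin should be the last age named in its label. The sketch below checks this on a handful of boundary ages; the `ages` values are made up for illustration.

```python
import pandas as pd

# Hypothetical boundary ages to check where each one lands
ages = pd.Series([9, 10, 14, 15, 19, 20, 24, 25, 40])

# Each upper edge is the last age in its label, e.g. (9, 14] is "10-14"
bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

groups = pd.cut(ages, bins=bins, labels=labels)
print(pd.DataFrame({"Age": ages, "Age Group": groups}))
```

With these edges, age 15 is labelled "15-19" and age 25 is labelled "25-29".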


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [7]:




# Total spent in each age group, in the same row order as agebase
ageprice = [purchaseData.loc[purchaseData["age group"] == i]["Price"].sum() for i in arr]
agebase["Purchase Total"] = ageprice
agebase["Average Purchase Price"] = round(agebase["Purchase Total"]/agebase["Purchase Count"], 2)
agebase["Average Purchase Total Per Player"] = round(agebase["Purchase Total"]/agebase["Players Count"], 2)

agebase


Unnamed: 0,Age Group,Purchase Count,Players Count,Percentage of Players,Purchase Total,Average Purchase Price,Average Purchase Total Per Player
0,20-24,424,301,52.26,1295.96,3.06,4.31
1,15-19,101,81,14.06,307.24,3.04,3.79
2,30-34,87,62,10.76,266.03,3.06,4.29
3,10-14,63,48,8.33,188.43,2.99,3.93
4,25-29,42,34,5.9,111.1,2.65,3.27
5,35-39,33,26,4.51,112.35,3.4,4.32
6,<10,23,17,2.95,77.13,3.35,4.54
7,40+,7,7,1.22,21.53,3.08,3.08
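
The binned column can also serve directly as a group key. This sketch assumes a shortened set of bins and hypothetical toy data; `observed=True` keeps only the age groups that actually occur, and the rows stay in label order rather than count order.

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data.csv
toy = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3"],
    "Age": [12, 22, 12, 23],
    "Price": [1.00, 2.00, 3.00, 4.00],
})

# Shortened bins for the example; upper edges are inclusive
bins = [0, 9, 14, 19, 24, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25+"]
toy["Age Group"] = pd.cut(toy["Age"], bins=bins, labels=labels)

# Group on the categorical column; empty groups are skipped
by_age = toy.groupby("Age Group", observed=True).agg(
    purchase_count=("Price", "count"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)
by_age["avg_total_per_player"] = (by_age["total_value"] / by_age["players"]).round(2)
print(by_age)
```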


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [8]:
# Array of all unique screen names (SN)
playerList = purchaseData["SN"].unique()
# Total amount spent by each player, in the same order as playerList
spending = [purchaseData.loc[purchaseData["SN"] == i]["Price"].sum() for i in playerList]
# Data frame of players and their total spend
playerdf = pd.DataFrame({"Player": playerList, "amount spent": spending})
# Number of purchases made by each player
purchaseCount = [len(purchaseData.loc[purchaseData["SN"] == i]) for i in playerList]
playerdf["Number of Purchases"] = purchaseCount
playerdfClean = playerdf.sort_values(by = 'amount spent', ascending = False)
playerdfClean.head()


Unnamed: 0,Player,amount spent,Number of Purchases
72,Lisosia93,18.96,5
253,Idastidru52,15.45,4
201,Chamjask73,13.83,3
120,Iral74,13.62,4
134,Iskadarya95,13.1,3
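
Filtering the frame once per player works, but it rescans the data for every screen name. A grouped aggregation is one possible shortcut; the sketch below uses hypothetical toy data and assumed column labels.

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data.csv
toy = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3", "Bob2", "Ann1"],
    "Price": [1.00, 2.00, 3.00, 4.50, 1.00, 2.00],
})

# Count, mean and sum of Price per player, then sort by total spend
spenders = (
    toy.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .sort_values("Total Purchase Value", ascending=False)
)
print(spenders.head())
```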


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [24]:
# Purchases per item name, most purchased first
itemDf = pd.DataFrame(purchaseData["Item Name"].value_counts())
itemDf.reset_index(level=0, inplace=True)

itemDf.rename(columns = {"Item Name": "Number of Times Purchased"}, inplace = True)
itemDf.rename(columns = {"index": "Item Name"}, inplace = True)
# Item names in the same row order as itemDf
arr = [i[0] for i in itemDf.values]

# First price recorded for each item name; items that share a name but differ
# in Item ID or price are merged into one row here
itemArray = [purchaseData.loc[purchaseData["Item Name"] == i]["Price"].unique()[0] for i in arr]
itemDf["Purchase Price"] = itemArray
itemDf.head()

Unnamed: 0,Item Name,Number of Times Purchased,Purchase Price
0,Final Critic,13,4.88
1,"Oathbreaker, Last Hope of the Breaking Storm",12,4.23
2,Persuasion,9,3.19
3,Nirvana,9,4.9
4,Fiery Glass Crusader,9,4.58
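
The instructions ask for grouping by both Item ID and Item Name, which keeps apart items that happen to share a name. A minimal sketch on hypothetical toy data (item names, IDs and prices are invented):

```python
import pandas as pd

# Hypothetical toy data: two different items are both named "Blade"
toy = pd.DataFrame({
    "Item ID": [1, 2, 1, 2, 2, 3],
    "Item Name": ["Blade", "Blade", "Blade", "Blade", "Blade", "Axe"],
    "Price": [2.00, 3.00, 2.00, 3.00, 3.00, 1.00],
})

# One row per (Item ID, Item Name) pair with count, unit price and total value
items = (
    toy.groupby(["Item ID", "Item Name"])
    .agg(
        purchase_count=("Price", "count"),
        item_price=("Price", "first"),
        total_value=("Price", "sum"),
    )
    .sort_values("purchase_count", ascending=False)
)
print(items.head())
```

Grouping on the name alone would report five purchases of "Blade" at a single price, hiding the fact that two differently priced items are involved.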


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [25]:
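# Ranks items by unit price; a ranking by total purchase value would sort on
# price times purchase count instead
MaxItem = itemDf.sort_values(by = 'Purchase Price', ascending = False)
MaxItem.head()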




Unnamed: 0,Item Name,Number of Times Purchased,Purchase Price
157,Stormfury Mace,2,4.99
42,"Mercy, Katana of Dismay",5,4.94
150,Stormfury Longsword,2,4.93
127,"Hellreaver, Heirloom of Inception",3,4.93
68,"Blazeguard, Reach of Eternity",5,4.91
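
Ranking by total purchase value needs a column that multiplies price by purchase count. The sketch below shows that step plus optional currency formatting on a display copy; the frame contents and column names are hypothetical.

```python
import pandas as pd

# Hypothetical per-item summary, not taken from purchase_data.csv
items = pd.DataFrame({
    "Item Name": ["Axe", "Blade", "Club"],
    "Purchase Count": [10, 3, 5],
    "Item Price": [1.00, 4.50, 2.50],
})

# Total value per item, then sort so the highest earner comes first
items["Total Purchase Value"] = items["Purchase Count"] * items["Item Price"]
most_profitable = items.sort_values("Total Purchase Value", ascending=False)

# Format a copy for display so the numeric frame stays usable for sorting
display_table = most_profitable.copy()
for col in ["Item Price", "Total Purchase Value"]:
    display_table[col] = display_table[col].map("${:,.2f}".format)
print(display_table)
```

In this toy frame the cheapest item has the most purchases, yet the most expensive one still leads on total value, which is why the two rankings can differ.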
