### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [110]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# Raw data file
file_to_load = "Resources/purchase_data.csv"

# Read purchasing file and store into pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


## Player Count

* Display the total number of players


In [111]:
# Count distinct screen names so repeat buyers are counted once
total_players = purchase_data["SN"].nunique()
print(f"There are {total_players} total players.")

There are 576 total players.
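
The distinction between players and purchases matters for every percentage later in the notebook. The sketch below uses a hypothetical four-row sample (the rows are made up for illustration, not taken from the real file) to show how the two counts diverge once a player buys more than once.

```python
import pandas as pd

# Hypothetical miniature of purchase_data: one row per purchase.
sample = pd.DataFrame({
    "SN": ["Lisim78", "Lisim78", "Iskosia90", "Ithergue48"],
    "Price": [3.53, 1.56, 1.44, 4.88],
})

# nunique() counts distinct screen names, so a repeat buyer counts once.
player_count = sample["SN"].nunique()

# len() counts rows, i.e. purchases.
purchase_count = len(sample)

print(f"There are {player_count} total players and {purchase_count} purchases.")
```

In the real data the same gap appears: 576 players against 780 purchase rows.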


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [112]:
unique_items = purchase_data["Item ID"].nunique()
avg_price = purchase_data["Price"].mean()
num_purchases = len(purchase_data)
total_rev = purchase_data["Price"].sum()

# Collect the headline figures into a one-row summary frame
purch_analysis = pd.DataFrame({
    "Number of Unique Items": [unique_items],
    "Average Price": [avg_price],
    "Number of Purchases": [num_purchases],
    "Total Revenue": [total_rev],
})
purch_analysis

| | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 183 | 3.050987 | 780 | 2379.77 |
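
The optional "cleaner formatting" step can be handled by mapping a format string over the currency columns. This is one possible sketch, not the only way to do it. It rebuilds the summary frame from the values shown above, and `formatted` is an illustrative name that does not appear in the notebook.

```python
import pandas as pd

# Summary values copied from the output table above.
summary = pd.DataFrame({
    "Number of Unique Items": [183],
    "Average Price": [3.050987],
    "Number of Purchases": [780],
    "Total Revenue": [2379.77],
})

# Format a copy so the numeric originals stay available for later math.
formatted = summary.copy()
for column in ["Average Price", "Total Revenue"]:
    formatted[column] = formatted[column].map("${:,.2f}".format)

print(formatted)
```

Formatting turns the columns into strings, so it is best done last, after any sorting or arithmetic.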


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [113]:
# Caveat: value_counts() below counts purchase rows (780), not unique players (576).
# Dividing those counts by total_players therefore overstates each share; the
# percentages are only correct if every player made exactly one purchase.

count = purchase_data["Gender"].value_counts()
male_count = count["Male"]
female_count = count["Female"]
unknown_count = count["Other / Non-Disclosed"] 

gender_dem = pd.DataFrame([
    {"Gender": "Male", "Percentage": male_count / total_players * 100,
     "Count": male_count},
    {"Gender": "Female", "Percentage": female_count / total_players * 100, "Count": female_count},
    {"Gender": "Other / Non-Disclosed", "Percentage": unknown_count / total_players * 100, "Count": unknown_count},
])
gender_dem[["Gender", "Percentage", "Count"]]

| | Gender | Percentage | Count |
|---|---|---|---|
| 0 | Male | 113.194444 | 652 |
| 1 | Female | 19.618056 | 113 |
| 2 | Other / Non-Disclosed | 2.604167 | 15 |
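
The percentages above add up to more than 100 because purchases are being divided by players. One way to get player-level figures is to drop repeat rows per screen name before counting. The sketch below uses a hypothetical six-row sample and assumes each `SN` always reports the same gender; `players` and `gender_summary` are illustrative names.

```python
import pandas as pd

# Hypothetical sample: A1 and C3 each bought twice.
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so each person is counted once.
players = sample.drop_duplicates(subset="SN")

gender_counts = players["Gender"].value_counts()
gender_summary = pd.DataFrame({
    "Count": gender_counts,
    "Percentage": (gender_counts / len(players) * 100).round(2),
})

print(gender_summary)
```

With the denominator and numerator both counted in players, the percentage column sums to 100.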



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [114]:
purchase_count_female = female_count
purchase_count_male = male_count
purchase_count_other = unknown_count

total_purchase_price_female = purchase_data.loc[purchase_data["Gender"] == "Female", "Price"].sum()
total_purchase_price_male = purchase_data.loc[purchase_data["Gender"] == "Male", "Price"].sum()
total_purchase_price_other = purchase_data.loc[purchase_data["Gender"] == "Other / Non-Disclosed", "Price"].sum()

avg_purchase_price_female = purchase_data.loc[purchase_data["Gender"] == "Female", "Price"].mean()
avg_purchase_price_male = purchase_data.loc[purchase_data["Gender"] == "Male", "Price"].mean()
avg_purchase_price_other = purchase_data.loc[purchase_data["Gender"] == "Other / Non-Disclosed", "Price"].mean()

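# NOTE: the purchase counts are row counts, so these ratios equal the average
# purchase price. A true per-person figure would divide by the number of unique
# players in each gender instead.
avg_spent_female = total_purchase_price_female / purchase_count_female
avg_spent_male = total_purchase_price_male / purchase_count_male
avg_spent_other = total_purchase_price_other / purchase_count_other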



purch_analysis_df = pd.DataFrame([
    {"Gender": "Female", "Purchase Count": purchase_count_female, "Average Purchase Price": avg_purchase_price_female, 
     "Total Purchase Value": total_purchase_price_female, "Avg Purchase Total per Person": avg_spent_female},
    {"Gender": "Male", "Purchase Count": purchase_count_male, "Average Purchase Price": avg_purchase_price_male, 
     "Total Purchase Value": total_purchase_price_male, "Avg Purchase Total per Person": avg_spent_male},
    {"Gender": "Other / Non-Disclosed", "Purchase Count": purchase_count_other, "Average Purchase Price": avg_purchase_price_other, 
     "Total Purchase Value": total_purchase_price_other, "Avg Purchase Total per Person": avg_spent_other},
])
purch_analysis_df[["Gender", "Purchase Count", "Average Purchase Price", "Total Purchase Value", "Avg Purchase Total per Person"]]


| | Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Purchase Total per Person |
|---|---|---|---|---|---|
| 0 | Female | 113 | 3.203009 | 361.94 | 3.203009 |
| 1 | Male | 652 | 3.017853 | 1967.64 | 3.017853 |
| 2 | Other / Non-Disclosed | 15 | 3.346 | 50.19 | 3.346 |
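
In the table above, "Avg Purchase Total per Person" matches "Average Purchase Price" exactly, because both divide total value by the number of purchases. A possible alternative is a single `groupby` that also counts unique players, so the per-person figure has its own denominator. The sample rows and the snake_case column names below are hypothetical.

```python
import pandas as pd

# Hypothetical sample: A1 and C3 each bought twice.
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Male", "Female"],
    "Price": [1.00, 3.00, 2.50, 4.00, 2.00, 1.50],
})

# One pass over the data: purchase stats plus the unique-player count.
by_gender = sample.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_purchase_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
    unique_players=("SN", "nunique"),
)

# Per-person spend divides by players, not by purchases.
by_gender["avg_total_per_person"] = (
    by_gender["total_purchase_value"] / by_gender["unique_players"]
)

print(by_gender)
```

In this sample the male average purchase price is 2.50 while the per-person total is 5.00, which shows the two columns measure different things.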


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [131]:
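# Establish bins for ages. Assuming ages are whole numbers, an upper edge of
# 9.90 captures ages up to 9, 14.90 captures 10-14, and so on.
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Label each purchase row with its age group. Rows are purchases, not unique
# players, so the counts below include repeat buyers.
purchase_data["bin"] = pd.cut(purchase_data["Age"], age_bins, labels=group_names)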

count_less_10 = len(purchase_data.loc[purchase_data["bin"] == "<10"])
count_10_14 = len(purchase_data.loc[purchase_data["bin"] == "10-14"])
count_15_19 = len(purchase_data.loc[purchase_data["bin"] == "15-19"])
count_20_24 = len(purchase_data.loc[purchase_data["bin"] == "20-24"])
count_25_29 = len(purchase_data.loc[purchase_data["bin"] == "25-29"])
count_30_34 = len(purchase_data.loc[purchase_data["bin"] == "30-34"])
count_35_39 = len(purchase_data.loc[purchase_data["bin"] == "35-39"])
count_more_40 = len(purchase_data.loc[purchase_data["bin"] == "40+"])


age_demographics = pd.DataFrame([
    {"Age": "<10", "Percentage of Players": count_less_10 / total_players * 100, "Total Count": count_less_10},
    {"Age": "10-14", "Percentage of Players": count_10_14 / total_players * 100, "Total Count": count_10_14},
    {"Age": "15-19", "Percentage of Players": count_15_19 / total_players * 100, "Total Count": count_15_19},
    {"Age": "20-24", "Percentage of Players": count_20_24 / total_players * 100, "Total Count": count_20_24},
    {"Age": "25-29", "Percentage of Players": count_25_29 / total_players * 100, "Total Count": count_25_29},
    {"Age": "30-34", "Percentage of Players": count_30_34 / total_players * 100, "Total Count": count_30_34},
    {"Age": "35-39", "Percentage of Players": count_35_39 / total_players * 100, "Total Count": count_35_39},
    {"Age": "40+", "Percentage of Players": count_more_40 / total_players * 100, "Total Count": count_more_40},
])

age_demographics

| | Age | Percentage of Players | Total Count |
|---|---|---|---|
| 0 | <10 | 3.993056 | 23 |
| 1 | 10-14 | 4.861111 | 28 |
| 2 | 15-19 | 23.611111 | 136 |
| 3 | 20-24 | 63.368056 | 365 |
| 4 | 25-29 | 17.534722 | 101 |
| 5 | 30-34 | 12.673611 | 73 |
| 6 | 35-39 | 7.118056 | 41 |
| 7 | 40+ | 2.256944 | 13 |
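
The counts in this table are purchase rows, which is why the percentages exceed 100 in total. A hedged sketch of the player-level version follows: it removes repeat rows per screen name, bins with `pd.cut`, and lets `value_counts` do the tallying instead of one variable per group. The sample data is hypothetical, and the integer bin edges are an alternative to the 9.90-style edges used in the notebook.

```python
import pandas as pd

# Hypothetical sample: A1 and D4 each bought twice.
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4", "D4", "E5"],
    "Age": [7, 7, 15, 22, 24, 24, 41],
})

# pd.cut uses right-inclusive intervals by default: (0, 9], (9, 14], ...
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# One row per player, then assign each player to an age group.
players = sample.drop_duplicates(subset="SN").copy()
players["Age Group"] = pd.cut(players["Age"], bins=age_bins, labels=group_names)

# sort=False keeps the groups in bin order rather than by frequency.
counts = players["Age Group"].value_counts(sort=False)
age_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": (counts / len(players) * 100).round(2),
})

print(age_summary)
```

Because the age group column is categorical, empty groups still appear with a count of zero, which keeps the table shape stable.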


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [132]:
purchase_count_less_10 = count_less_10
# TODO: repeat for the remaining age groups (10-14 through 40+)


total_purchase_price_less_10 = purchase_data.loc[purchase_data["bin"] == "<10", "Price"].sum()

avg_purchase_price_less_10 = purchase_data.loc[purchase_data["bin"] == "<10", "Price"].mean()


avg_spent_less_10 = total_purchase_price_less_10 / purchase_count_less_10




purch_analysis_df = pd.DataFrame([
    {"Age": "<10", "Purchase Count": purchase_count_less_10, "Average Purchase Price": avg_purchase_price_less_10, 
     "Total Purchase Value": total_purchase_price_less_10, "Avg Purchase Total per Person": avg_spent_less_10},
])
purch_analysis_df[["Age", "Purchase Count", "Average Purchase Price", "Total Purchase Value", "Avg Purchase Total per Person"]]


# Purchase Count	Average Purchase Price	Total Purchase Value	Average Purchase Total per Person
# 10-14	28	$2.96	$82.78	$2.96
# 15-19	136	$3.04	$412.89	$3.04
# 20-24	365	$3.05	$1,114.06	$3.05
# 25-29	101	$2.90	$293.00	$2.90
# 30-34	73	$2.93	$214.00	$2.93
# 35-39	41	$3.60	$147.67	$3.60
# 40+	13	$2.94	$38.24	$2.94
# <10	23	$3.35	$77.13	$3.35

| | Age | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Purchase Total per Person |
|---|---|---|---|---|---|
| 0 | <10 | 23 | 3.353478 | 77.13 | 3.353478 |
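
The cell above only fills in the `<10` row. Rather than repeating the same four statements for every age group, a `groupby` on the binned column can produce all rows at once. The following is a minimal sketch on hypothetical data; the snake_case column names are illustrative.

```python
import pandas as pd

# Hypothetical purchases with ages and prices.
sample = pd.DataFrame({
    "Age": [7, 8, 15, 22, 24, 41],
    "Price": [3.00, 4.00, 2.00, 1.00, 5.00, 2.50],
})

age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
sample["bin"] = pd.cut(sample["Age"], bins=age_bins, labels=group_names)

# observed=True keeps only age groups that actually occur in the data.
by_age = sample.groupby("bin", observed=True).agg(
    purchase_count=("Price", "count"),
    avg_purchase_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)

print(by_age)
```

A per-person column could be added the same way as in the gender analysis, by also aggregating the number of unique screen names per group.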


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [8]:

| SN | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | $3.79 | $18.96 |
| Idastidru52 | 4 | $3.86 | $15.45 |
| Chamjask73 | 3 | $4.61 | $13.83 |
| Iral74 | 4 | $3.40 | $13.62 |
| Iskadarya95 | 3 | $4.37 | $13.10 |
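
No code is shown for this table, so here is one way the listed steps might be implemented: group by screen name, aggregate, sort on the numeric total, and only then format for display. The sample rows are hypothetical, and `top_spenders` and `display_table` are illustrative names.

```python
import pandas as pd

# Hypothetical purchases by three players.
sample = pd.DataFrame({
    "SN": ["A1", "B2", "A1", "C3", "B2", "A1"],
    "Price": [1.00, 4.00, 2.00, 3.50, 4.50, 3.00],
})

# Aggregate per player and give the columns their display names.
top_spenders = (
    sample.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .sort_values("Total Purchase Value", ascending=False)
)

# Format a copy as currency after sorting, since strings sort differently.
display_table = top_spenders.copy()
for column in ["Average Purchase Price", "Total Purchase Value"]:
    display_table[column] = display_table[column].map("${:,.2f}".format)

print(display_table.head())
```

Sorting before formatting matters: once the totals are strings, "$9.00" would sort above "$18.96".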


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [9]:

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | $3.53 | $31.77 |
| 82 | Nirvana | 9 | $4.90 | $44.10 |
| 19 | Pursuit, Cudgel of Necromancy | 8 | $1.02 | $8.16 |
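
The steps listed for this section could be sketched as follows. The item names and prices are borrowed from the table above, but the number of rows per item is hypothetical. Using the mean as "Item Price" assumes an item's price never changes between purchases.

```python
import pandas as pd

# Hypothetical purchase rows; the row counts per item are made up.
sample = pd.DataFrame({
    "Item ID": [178, 82, 178, 19, 82, 178],
    "Item Name": [
        "Oathbreaker, Last Hope of the Breaking Storm",
        "Nirvana",
        "Oathbreaker, Last Hope of the Breaking Storm",
        "Pursuit, Cudgel of Necromancy",
        "Nirvana",
        "Oathbreaker, Last Hope of the Breaking Storm",
    ],
    "Price": [4.23, 4.90, 4.23, 1.02, 4.90, 4.23],
})

# Retrieve only the columns the summary needs.
items = sample[["Item ID", "Item Name", "Price"]]

# Group by ID and name together so both appear in the index.
item_summary = (
    items.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

most_popular = item_summary.sort_values("Purchase Count", ascending=False)
print(most_popular.head())
```

Keeping `item_summary` unsorted and unformatted makes it reusable for a second ranking by total purchase value.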


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [10]:

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 82 | Nirvana | 9 | $4.90 | $44.10 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 92 | Final Critic | 8 | $4.88 | $39.04 |
| 103 | Singed Scalpel | 8 | $4.35 | $34.80 |
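
Re-ranking by value is a single `sort_values` call on the numeric summary. The sketch below rebuilds the five rows shown in the Most Popular Items preview, as floats rather than formatted strings, and sorts them by total purchase value. The frame name `popular` is illustrative.

```python
import pandas as pd

# The five rows from the Most Popular Items preview, kept numeric.
popular = pd.DataFrame({
    "Item ID": [178, 145, 108, 82, 19],
    "Item Name": [
        "Oathbreaker, Last Hope of the Breaking Storm",
        "Fiery Glass Crusader",
        "Extraction, Quickblade Of Trembling Hands",
        "Nirvana",
        "Pursuit, Cudgel of Necromancy",
    ],
    "Purchase Count": [12, 9, 9, 9, 8],
    "Item Price": [4.23, 4.58, 3.53, 4.90, 1.02],
    "Total Purchase Value": [50.76, 41.22, 31.77, 44.10, 8.16],
}).set_index(["Item ID", "Item Name"])

# Sort on the numeric column; apply currency formatting only afterwards.
most_profitable = popular.sort_values("Total Purchase Value", ascending=False)

print(most_profitable)
```

Sorting only the preview rows cannot surface items such as Final Critic or Singed Scalpel, which appear in the table above but not in the popularity preview, so in practice the sort should run on the full item summary.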
