### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 21-25 (40.3% of players), with secondary groups at 16-20 (26.0%) and 26-30 (10.2%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

```python
# Dependencies and setup
import pandas as pd
import numpy as np

# File to load (update this path if the CSV lives elsewhere)
file_to_load = "Resources/purchase_data.csv"

# Read the purchasing file and store it in a pandas DataFrame
purchase_data = pd.read_csv(file_to_load)
```

## Player Count

* Display the total number of players


```python
purchase_data.head()
```

|   | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


```python
# Each screen name (SN) is one player; count each one only once
total_players = purchase_data["SN"].nunique()
total_players
```

```
576
```
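Because the file has one row per purchase, `len(purchase_data)` would count purchases rather than players. A minimal sketch on a small hypothetical purchase log (the screen names and prices below are made up) shows the difference:

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase, so a player who buys
# twice ("Player1" here) appears on two rows.
toy = pd.DataFrame({
    "SN": ["Player1", "Player2", "Player1", "Player3"],
    "Price": [3.53, 4.88, 1.56, 1.44],
})

purchase_rows = len(toy)                # counts purchases (rows)
distinct_players = toy["SN"].nunique()  # counts each screen name once

print(purchase_rows, distinct_players)  # 4 3
```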

## Purchasing Analysis (Total)

* Run basic calculations to obtain the number of unique items, average price, etc.
* Create a summary data frame to hold the results
* Optional: give the displayed data cleaner formatting
* Display the summary data frame

```python
# Number of unique items, average price, total purchases, and total revenue
num_of_prod = purchase_data["Item ID"].nunique()
ave_price = purchase_data["Price"].mean()
num_of_purch = purchase_data["Purchase ID"].count()
total_rev = purchase_data["Price"].sum()

# Collect the results in a one-row summary DataFrame
summary = pd.DataFrame({"Number of Products": [num_of_prod],
                        "Average Price": [ave_price],
                        "Total Purchases": [num_of_purch],
                        "Total Revenue": [total_rev]})
summary
```

|   | Number of Products | Average Price | Total Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 183 | 3.050987 | 780 | 2379.77 |
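For the optional formatting step, one common approach is to convert the money columns to strings for display only. The sketch below rebuilds the summary row from the values shown above and formats a copy of it, so the numeric `summary` stays usable for further arithmetic; `display_summary` is a hypothetical name.

```python
import pandas as pd

# One-row summary using the values shown in the output above
summary = pd.DataFrame({"Number of Products": [183],
                        "Average Price": [3.050987],
                        "Total Purchases": [780],
                        "Total Revenue": [2379.77]})

# Format a copy for display; the original frame stays numeric
display_summary = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    display_summary[col] = display_summary[col].map("${:,.2f}".format)

print(display_summary)
```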


## Gender Demographics

```python
# Drop duplicate screen names first so each player is counted once
unique_gen = purchase_data.drop_duplicates(subset="SN", keep="first")
unique_VC = unique_gen["Gender"].value_counts()
total_unique = unique_VC.sum()

# Percentage and count of male players
countMale = unique_VC["Male"]
percMale = countMale / total_unique

# Percentage and count of female players
countFemale = unique_VC["Female"]
percFemale = countFemale / total_unique

# Percentage and count of other / non-disclosed players
countOther = unique_VC["Other / Non-Disclosed"]
percOther = countOther / total_unique

# Build a one-row summary DataFrame
gender_summary = pd.DataFrame({"Total Players": [total_unique],
                               "Count of Males": [countMale],
                               "Percent of Males": [percMale],
                               "Count of Females": [countFemale],
                               "Percent of Females": [percFemale],
                               "Count of Non-Disclosed": [countOther],
                               "Percent of Non-Disclosed": [percOther]})
gender_summary
```

|   | Total Players | Count of Males | Percent of Males | Count of Females | Percent of Females | Count of Non-Disclosed | Percent of Non-Disclosed |
|---|---|---|---|---|---|---|---|
| 0 | 576 | 484 | 0.840278 | 81 | 0.140625 | 11 | 0.019097 |
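The same player counts can also be laid out with one row per gender, which scales to any number of gender categories without a separate variable for each. This is a sketch on hypothetical data; the column names `Total Count` and `Percentage of Players` are assumptions, not part of the original output.

```python
import pandas as pd

# Hypothetical purchase log; "SN" repeats when a player buys more than once
toy = pd.DataFrame({
    "SN": ["a", "b", "a", "c", "d"],
    "Gender": ["Male", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player, then count players per gender
players = toy.drop_duplicates(subset="SN")
counts = players["Gender"].value_counts()

# Align counts and percentages on the gender index
gender_demo = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": counts / counts.sum() * 100,
})
print(gender_demo)
```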



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender
* Create a summary data frame to hold the results
* Optional: give the displayed data cleaner formatting
* Display the summary data frame

```python
# Calculate purchase count, average purchase price, and total purchase value by gender
gen_purch = purchase_data.groupby("Gender")

gen_purch_df = pd.DataFrame({"Purchase Count": gen_purch["Price"].count(),
                             "Average Purchase Price": gen_purch["Price"].mean(),
                             "Total Purchase Price": gen_purch["Price"].sum()})
gen_purch_df
```

| Gender | Purchase Count | Average Purchase Price | Total Purchase Price |
|---|---|---|---|
| Female | 113 | 3.203009 | 361.94 |
| Male | 652 | 3.017853 | 1967.64 |
| Other / Non-Disclosed | 15 | 3.346 | 50.19 |
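The instructions also ask for the average purchase total per person, which the table above does not include. One way to get it, sketched here on hypothetical data, is to divide each group's total purchase value by its number of distinct screen names; the `Avg Total Purchase per Person` column name is an assumption.

```python
import pandas as pd

# Hypothetical purchase log: player "a" bought twice
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [2.0, 4.0, 3.0, 1.0],
})

by_gender = toy.groupby("Gender")
total_value = by_gender["Price"].sum()
distinct_players = by_gender["SN"].nunique()

gender_purchasing = pd.DataFrame({
    "Purchase Count": by_gender["Price"].count(),
    "Average Purchase Price": by_gender["Price"].mean(),
    "Total Purchase Value": total_value,
    # Spend per player, not per purchase
    "Avg Total Purchase per Person": total_value / distinct_players,
})
print(gender_purchasing)
```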


## Age Demographics

* Establish bins for ages
* Categorize the existing players using the age bins. Hint: use pd.cut()
* Calculate the numbers and percentages by age group
* Create a summary data frame to hold the results
* Optional: round the percentage column to two decimal places
* Display Age Demographics Table

```python
# List the distinct ages to help choose bin edges
all_ages = purchase_data["Age"].unique()
all_ages
```

```
array([20, 40, 24, 23, 22, 36, 35, 21, 30, 38, 29, 11,  7, 19, 37, 10,  8,
       18, 27, 33, 32, 25, 12, 34, 17, 15, 13, 26, 16, 28, 31, 39, 44, 41,
        9, 14, 42, 43, 45], dtype=int64)
```
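When choosing bin edges for `pd.cut()`, it helps to remember that intervals are right-closed by default, so each bin `(a, b]` holds ages greater than `a` and up to and including `b`. The sketch below uses illustrative integer edges (not the ones the notebook uses) for five-year groups and checks which label each sample age receives.

```python
import pandas as pd

ages = pd.Series([7, 9, 10, 14, 15, 19, 20, 24, 25, 39, 40, 45])

# Illustrative edges: (0, 9] -> <10, (9, 14] -> 10-14, ..., (39, 150] -> 40+
bins = [0, 9, 14, 19, 24, 29, 34, 39, 150]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

age_range = pd.cut(ages, bins=bins, labels=labels)
print(pd.DataFrame({"Age": ages, "Age Range": age_range}))
```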

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age
* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below
* Create a summary data frame to hold the results
* Optional: give the displayed data cleaner formatting
* Display the summary data frame

```python
# Bin edges: pd.cut intervals are right-closed, so (9, 15.9] holds ages 10-15,
# (15.9, 20.9] holds ages 16-20, and so on
bins = [0, 9, 15.9, 20.9, 25.9, 30.9, 35.9, 40.9, 999]
age_group = ["<10", "10-15", "16-20", "21-25", "26-30", "31-35", "36-40", "41+"]

# Sort each purchase into an age range
purchase_data["Age Range"] = pd.cut(purchase_data["Age"], bins, labels=age_group)

# Group purchases by age range (observed=False keeps every bin)
age_ranges = purchase_data.groupby("Age Range", observed=False)

# Count of distinct players in each age range
count_by_age = age_ranges["SN"].nunique()

# Percentage of all players in each age range
perc_by_age = (count_by_age / total_players) * 100

# Combine into one DataFrame
age_demos = pd.DataFrame({"Percentage by Age": perc_by_age, "Total Count": count_by_age})
age_demos
```

| Age Range | Percentage by Age | Total Count |
|---|---|---|
| <10 | 2.951389 | 17 |
| 10-15 | 8.333333 | 48 |
| 16-20 | 26.041667 | 150 |
| 21-25 | 40.277778 | 232 |
| 26-30 | 10.243056 | 59 |
| 31-35 | 6.423611 | 37 |
| 36-40 | 4.513889 | 26 |
| 41+ | 1.215278 | 7 |
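The purchasing table this section describes (purchase count, average purchase price, total purchase value, and average total per person by age range) is not computed above. A minimal sketch on hypothetical data follows; it reuses the notebook's bin edges, labels each bin by the integer ages it actually holds, and assumes the same column names as the gender table.

```python
import pandas as pd

# Hypothetical purchase log: player "a" bought twice
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d"],
    "Age": [22, 22, 17, 23, 41],
    "Price": [3.0, 1.0, 2.5, 4.0, 2.0],
})

bins = [0, 9, 15.9, 20.9, 25.9, 30.9, 35.9, 40.9, 999]
labels = ["<10", "10-15", "16-20", "21-25", "26-30", "31-35", "36-40", "41+"]
toy["Age Range"] = pd.cut(toy["Age"], bins, labels=labels)

# observed=False keeps empty age ranges as rows (count 0, NaN averages)
by_age = toy.groupby("Age Range", observed=False)
total_value = by_age["Price"].sum()

age_purchasing = pd.DataFrame({
    "Purchase Count": by_age["Price"].count(),
    "Average Purchase Price": by_age["Price"].mean(),
    "Total Purchase Value": total_value,
    "Avg Total Purchase per Person": total_value / by_age["SN"].nunique(),
})
print(age_purchasing)
```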


## Top Spenders

* Run basic calculations to obtain the results in the table below
* Create a summary data frame to hold the results
* Sort the total purchase value column in descending order
* Optional: give the displayed data cleaner formatting
* Display a preview of the summary data frame

```python
# Group purchases by screen name (SN)
buyers_records = purchase_data.groupby("SN")

# Number of purchases per buyer
buyers_count = buyers_records["Purchase ID"].count()

# Average purchase price per buyer
avg_purchase_price = buyers_records["Price"].mean()

# Total purchase value per buyer
buyers_total = buyers_records["Price"].sum()

# Combine into one DataFrame
top_spenders = pd.DataFrame({"Count of Purchases": buyers_count,
                             "Average Purchase Price": avg_purchase_price,
                             "Buyer's Purchase Value": buyers_total})

# Sort by total purchase value, highest first, and preview the top five
spenders_in_order = top_spenders.sort_values("Buyer's Purchase Value", ascending=False).head()
spenders_in_order
```

| SN | Count of Purchases | Average Purchase Price | Buyer's Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | 3.792 | 18.96 |
| Idastidru52 | 4 | 3.8625 | 15.45 |
| Chamjask73 | 3 | 4.61 | 13.83 |
| Iral74 | 4 | 3.405 | 13.62 |
| Iskadarya95 | 3 | 4.366667 | 13.1 |
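The same per-buyer table can be built in a single step with pandas named aggregation (available since pandas 0.25), passing the output column names through `**` because they contain spaces. This is a sketch on hypothetical data, not the notebook's original approach.

```python
import pandas as pd

# Hypothetical purchase log
toy = pd.DataFrame({
    "SN": ["a", "b", "a", "c", "b", "a"],
    "Purchase ID": [0, 1, 2, 3, 4, 5],
    "Price": [3.0, 4.5, 2.0, 1.0, 1.5, 4.0],
})

# Named aggregation: output column -> (input column, aggregation function)
top_spenders = (
    toy.groupby("SN")
       .agg(**{"Count of Purchases": ("Purchase ID", "count"),
               "Average Purchase Price": ("Price", "mean"),
               "Buyer's Purchase Value": ("Price", "sum")})
       .sort_values("Buyer's Purchase Value", ascending=False)
)
print(top_spenders.head())
```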


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns
* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value
* Create a summary data frame to hold the results
* Sort the purchase count column in descending order
* Optional: give the displayed data cleaner formatting
* Display a preview of the summary data frame

```python
# Keep only the item columns
df_items = purchase_data[["Item ID", "Item Name", "Price"]]

# Group by item ID and name
item_group = df_items.groupby(["Item ID", "Item Name"])

# Number of times each item was purchased
item_purch_count = item_group["Price"].count()

# Total purchase value per item
tot_purch_val = item_group["Price"].sum()

# Item price (total value divided by purchase count)
item_prices = tot_purch_val / item_purch_count

# Combine into one DataFrame
most_pop_items = pd.DataFrame({"Purchase Count": item_purch_count,
                               "Item Price": item_prices,
                               "Total Purchase Value": tot_purch_val})

# Sort from most to least popular and preview the top five
most_pop_formatted = most_pop_items.sort_values("Purchase Count", ascending=False).head()
most_pop_formatted
```

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | 4.23 | 50.76 |
| 145 | Fiery Glass Crusader | 9 | 4.58 | 41.22 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | 3.53 | 31.77 |
| 82 | Nirvana | 9 | 4.9 | 44.1 |
| 19 | Pursuit, Cudgel of Necromancy | 8 | 1.02 | 8.16 |
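A more compact variant of the item table uses named aggregation and `DataFrame.nlargest()`, which returns the top rows by a column without an explicit sort-then-head step. The item names and prices below are hypothetical.

```python
import pandas as pd

# Hypothetical purchase log; each item always sells at the same price
toy = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 1, 2],
    "Item Name": ["Blade", "Staff", "Blade", "Bow", "Blade", "Staff"],
    "Price": [4.0, 2.5, 4.0, 1.0, 4.0, 2.5],
})

# One row per (Item ID, Item Name) pair
items = toy.groupby(["Item ID", "Item Name"]).agg(
    **{"Purchase Count": ("Price", "count"),
       "Item Price": ("Price", "mean"),
       "Total Purchase Value": ("Price", "sum")}
)

# Up to five items with the highest purchase counts
most_popular = items.nlargest(5, "Purchase Count")
print(most_popular)
```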


## Most Profitable Items

* Sort the above table by total purchase value in descending order
* Optional: give the displayed data cleaner formatting
* Display a preview of the data frame

```python
# Sort by total purchase value in descending order (most profitable first)
most_profitable = most_pop_items.sort_values("Total Purchase Value", ascending=False).head()
most_profitable
```
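The `ascending` flag decides which end of the table `.head()` shows: `ascending=False` lists the most profitable items first, while `ascending=True` lists the least profitable first. A small sketch with a hypothetical, already-aggregated item table:

```python
import pandas as pd

# Hypothetical item table indexed like the one built above
items = pd.DataFrame(
    {"Purchase Count": [2, 1, 3],
     "Item Price": [2.5, 1.0, 4.0],
     "Total Purchase Value": [5.0, 1.0, 12.0]},
    index=pd.MultiIndex.from_tuples(
        [(2, "Staff"), (3, "Bow"), (1, "Blade")],
        names=["Item ID", "Item Name"],
    ),
)

# Highest total purchase value first
most_profitable = items.sort_values("Total Purchase Value", ascending=False).head()
# Lowest total purchase value first
least_profitable = items.sort_values("Total Purchase Value", ascending=True).head()
print(most_profitable)
```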


Observations from the data above:

1. Although male players far outnumber female players, female players spend more per purchase on average ($3.20 vs. $3.02).
2. Players aged 21-25 make up the largest group (232 players, 40.3%).
3. The most popular item is Oathbreaker, Last Hope of the Breaking Storm (12 purchases).