### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).  
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [11]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [2]:
#array of unique player screen names (reused in later cells)
total = purchase_data["SN"].unique()
print(f'Total number of players: {len(total)}')

Total number of players: 576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
#number of unique items
unique_items = len(purchase_data["Item ID"].unique())
#average purchase price
average = round(purchase_data["Price"].mean(),2)
#total number of purchases
total_purchases = len(purchase_data["Purchase ID"].unique())
#total revenue 
total_revenue = purchase_data["Price"].sum()

summary_data = pd.DataFrame({
    "Number of Unique Items":[unique_items],
    "Average Purchase Price":[average],
    "Total Purchases":[total_purchases],
    "Total Revenue":[total_revenue]
})
summary_data

Unnamed: 0,Number of Unique Items,Average Purchase Price,Total Purchases,Total Revenue
0,183,3.05,780,2379.77
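
For the optional "cleaner formatting" step, one possible approach is to format a copy of the summary for display and keep the numeric frame for further calculations. This is a minimal sketch that reuses the values shown in the output above; the `formatted` name is only illustrative.

```python
import pandas as pd

# Values copied from the summary output above
summary = pd.DataFrame({
    "Number of Unique Items": [183],
    "Average Purchase Price": [3.05],
    "Total Purchases": [780],
    "Total Revenue": [2379.77],
})

# Format the currency columns as strings on a copy, so the
# original numeric columns stay usable for later math
formatted = summary.copy()
for col in ["Average Purchase Price", "Total Revenue"]:
    formatted[col] = formatted[col].map("${:,.2f}".format)

print(formatted.to_string(index=False))
```

Formatting turns the columns into strings, so it is usually safest to do it as the last step before display.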


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [4]:
#identify number of unique genders 
all_gender = purchase_data.groupby("Gender")["SN"].unique()

#count of male, female, and other players
male = (len(all_gender["Male"]))
female = (len(all_gender["Female"]))
other = (len(all_gender["Other / Non-Disclosed"]))

#percentage male, female, and other players
perc_male = round((male/len(total))*100,2)
perc_female = round((female/len(total))*100,2)
perc_other = round((other/len(total))*100,2)

all_gender_df = pd.DataFrame({
    "Total Count": {"Male":male,"Female":female,"Other / Non-Disclosed":other},
    "Percentage of Players": {"Male":perc_male,"Female":perc_female,"Other / Non-Disclosed":perc_other}
})
all_gender_df.reindex(["Male","Female","Other / Non-Disclosed"])

Unnamed: 0,Total Count,Percentage of Players
Male,484,84.03
Female,81,14.06
Other / Non-Disclosed,11,1.91
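
The same counts and percentages could also be obtained without one variable per gender. The sketch below assumes one row per purchase with `SN` and `Gender` columns, as in `purchase_data`, but runs on a small made-up frame (`toy`) rather than the real file.

```python
import pandas as pd

# Hypothetical toy purchases: one row per purchase, players may repeat
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d", "d"],
    "Gender": ["Male", "Male", "Female", "Male",
               "Other / Non-Disclosed", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice
players = toy.drop_duplicates(subset="SN")

counts = players["Gender"].value_counts()
percents = (players["Gender"].value_counts(normalize=True) * 100).round(2)

gender_demo = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percents,
})
print(gender_demo)
```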



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
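
The steps above can be sketched with a single `groupby`. The key point is that "avg. purchase total per person" divides the total purchase value by the number of unique players in the group, not by the number of purchases. `purchase_summary` is a hypothetical helper and `toy` is made-up data; only the column names `SN`, `Gender` and `Price` come from this notebook.

```python
import pandas as pd

def purchase_summary(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Summarize purchases per group (hypothetical helper, not part of the assignment)."""
    grouped = df.groupby(by)
    summary = pd.DataFrame({
        "Purchase Count": grouped["Price"].count(),
        "Average Purchase Price": grouped["Price"].mean().round(2),
        "Total Purchase Value": grouped["Price"].sum(),
    })
    # Divide by unique players (not purchases) to get spend per person
    players = grouped["SN"].nunique()
    summary["Avg Total Purchase Per Person"] = (
        summary["Total Purchase Value"] / players
    ).round(2)
    return summary

# Made-up purchases: player "a" buys twice
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [1.00, 3.00, 2.50, 4.00],
})
result = purchase_summary(toy, "Gender")
print(result)
```

Here the three male purchases total 8.00 across two players, so the per-person figure is 4.00 while the average purchase price is 2.67.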

In [5]:
#purchase count by gender
male_purchase = purchase_data.loc[purchase_data["Gender"] == "Male"]
female_purchase = purchase_data.loc[purchase_data["Gender"] == "Female"]
other_purchase = purchase_data.loc[purchase_data["Gender"] == "Other / Non-Disclosed"]

#average purchase price by gender 
male_average = round(male_purchase["Price"].mean(),2)
female_average = round(female_purchase["Price"].mean(),2)
other_average = round(other_purchase["Price"].mean(),2)

#total purchase value
male_total = male_purchase["Price"].sum()
female_total = female_purchase["Price"].sum()
other_total = other_purchase["Price"].sum()

#average total purchase per person by gender
#total purchase value / number of unique players of that gender
players_by_gender = purchase_data.groupby("Gender")["SN"].nunique()
male_per_person = round(male_total/players_by_gender["Male"],2)
female_per_person = round(female_total/players_by_gender["Female"],2)
other_per_person = round(other_total/players_by_gender["Other / Non-Disclosed"],2)

#create and display summary dataframe
gender_summary = pd.DataFrame({
    "Purchase Count":{"Male":len(male_purchase),"Female":len(female_purchase),"Other / Non-Disclosed":len(other_purchase)},
    "Average Purchase Price":{"Male":male_average,"Female":female_average,"Other / Non-Disclosed":other_average},
    "Total Purchase Value":{"Male":male_total,"Female":female_total,"Other / Non-Disclosed":other_total},
    "Avg Total Purchase Per Person":{"Male":male_per_person,"Female":female_per_person,"Other / Non-Disclosed":other_per_person}
})
gender_summary.reindex(["Male","Female","Other / Non-Disclosed"])

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
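
To make the binning step concrete, here is a small sketch of how `pd.cut` assigns ages to the bins used in this notebook. By default the intervals are closed on the right, so an age of 9 lands in `(0, 9]` and an age of 10 in `(9, 14]`. The `ages` values are made up and chosen to sit on the bin edges.

```python
import numpy as np
import pandas as pd

bins = (0, 9, 14, 19, 24, 29, 34, 39, np.inf)
names = ["< 10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical ages chosen to sit on bin boundaries
ages = pd.Series([9, 10, 19, 20, 40])

# Right-closed intervals: each age is matched to the bin whose upper edge it does not exceed
age_range = pd.cut(ages, bins, labels=names)
print(age_range.tolist())

# Counts and percentages per age group, kept in bin order
counts = age_range.value_counts(sort=False)
percents = (counts / counts.sum() * 100).round(2)
print(pd.DataFrame({"Total Count": counts, "Percentage of Players": percents}))
```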


In [6]:
#find one age per player (first() returns a scalar per player, which pd.cut can bin directly)
ages = purchase_data.groupby("SN")["Age"].first()
#establish bins for ages and categories for bins 
bins = (0,9,14,19,24,29,34,39,np.inf)
names = ["< 10","10-14","15-19","20-24","25-29","30-34","35-39","40+"]
#categorize players using bins
ages_df = pd.DataFrame(ages)
ages_df["Age Range"] = pd.cut(ages_df["Age"],bins,labels=names)

#create variables to calculate numbers and percentages by age group
age_1 = len(ages_df.loc[ages_df["Age Range"] == "< 10"])
age_2 = len(ages_df.loc[ages_df["Age Range"] == "10-14"])
age_3 = len(ages_df.loc[ages_df["Age Range"] == "15-19"])
age_4 = len(ages_df.loc[ages_df["Age Range"] == "20-24"])
age_5 = len(ages_df.loc[ages_df["Age Range"] == "25-29"])
age_6 = len(ages_df.loc[ages_df["Age Range"] == "30-34"])
age_7 = len(ages_df.loc[ages_df["Age Range"] == "35-39"])
age_8 = len(ages_df.loc[ages_df["Age Range"] == "40+"])
#use a separate name so the array of unique players in `total` is not overwritten
total_players = len(total)

#create and display summary dataframe
age_summary = pd.DataFrame({
    "Total Count":{names[0]:age_1,names[1]:age_2,names[2]:age_3,
                   names[3]:age_4,names[4]:age_5,names[5]:age_6,
                   names[6]:age_7,names[7]:age_8},
    #calculate percentages by group
    "Percentage of Players":{names[0]:round((age_1/total_players*100),2),names[1]:round((age_2/total_players*100),2),names[2]:round((age_3/total_players*100),2),
                   names[3]:round((age_4/total_players*100),2),names[4]:round((age_5/total_players*100),2),names[5]:round((age_6/total_players*100),2),
                   names[6]:round((age_7/total_players*100),2),names[7]:round((age_8/total_players*100),2)}
})
age_summary.reindex(["< 10","10-14","15-19","20-24","25-29","30-34","35-39","40+"])

Unnamed: 0,Total Count,Percentage of Players
< 10,17,2.95
10-14,22,3.82
15-19,107,18.58
20-24,258,44.79
25-29,77,13.37
30-34,52,9.03
35-39,31,5.38
40+,12,2.08


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [7]:
#establish bins for ages and categories for bins 
bins = (0,9,14,19,24,29,34,39,np.inf)
names = ["< 10","10-14","15-19","20-24","25-29","30-34","35-39","40+"]
#categorize players using bins
ages_purchases = pd.DataFrame(purchase_data)
ages_purchases["Age Range"] = pd.cut(purchase_data["Age"],bins,labels=names)

#filter purchase data by age group
age_1_purchase = ages_purchases[ages_purchases["Age Range"] == names[0]]
age_2_purchase = ages_purchases[ages_purchases["Age Range"] == names[1]]
age_3_purchase = ages_purchases[ages_purchases["Age Range"] == names[2]]
age_4_purchase = ages_purchases[ages_purchases["Age Range"] == names[3]]
age_5_purchase = ages_purchases[ages_purchases["Age Range"] == names[4]]
age_6_purchase = ages_purchases[ages_purchases["Age Range"] == names[5]]
age_7_purchase = ages_purchases[ages_purchases["Age Range"] == names[6]]
age_8_purchase = ages_purchases[ages_purchases["Age Range"] == names[7]]

#average purchase price by age group 
age_1_average = round(age_1_purchase["Price"].mean(),2)
age_2_average = round(age_2_purchase["Price"].mean(),2)
age_3_average = round(age_3_purchase["Price"].mean(),2)
age_4_average = round(age_4_purchase["Price"].mean(),2)
age_5_average = round(age_5_purchase["Price"].mean(),2)
age_6_average = round(age_6_purchase["Price"].mean(),2)
age_7_average = round(age_7_purchase["Price"].mean(),2)
age_8_average = round(age_8_purchase["Price"].mean(),2)

#total purchase value by age group
age_1_total = age_1_purchase["Price"].sum()
age_2_total = age_2_purchase["Price"].sum()
age_3_total = age_3_purchase["Price"].sum()
age_4_total = age_4_purchase["Price"].sum()
age_5_total = age_5_purchase["Price"].sum()
age_6_total = age_6_purchase["Price"].sum()
age_7_total = age_7_purchase["Price"].sum()
age_8_total = age_8_purchase["Price"].sum()

#number of unique players in each age group
age_players = ages_purchases.groupby("Age Range",observed=False)["SN"].nunique()

age_purchase_summary = pd.DataFrame({
    "Purchase Count":{names[0]:len(age_1_purchase),names[1]:len(age_2_purchase),names[2]:len(age_3_purchase),
    names[3]:len(age_4_purchase),names[4]:len(age_5_purchase),names[5]:len(age_6_purchase),names[6]:len(age_7_purchase),names[7]:len(age_8_purchase)},
    "Average Purchase Price":{names[0]:age_1_average, names[1]:age_2_average, names[2]:age_3_average,
                              names[3]:age_4_average, names[4]:age_5_average, names[5]:age_6_average,
                              names[6]:age_7_average, names[7]:age_8_average},
    "Total Purchase Value":{names[0]:age_1_total, names[1]:age_2_total, names[2]:age_3_total,
                            names[3]:age_4_total, names[4]:age_5_total, names[5]:age_6_total,
                            names[6]:age_7_total, names[7]:age_8_total},
    #total purchase value / number of unique players in the age group
    "Avg Total Purchase Per Person":{names[0]:round(age_1_total/age_players[names[0]],2), names[1]:round(age_2_total/age_players[names[1]],2),
                            names[2]:round(age_3_total/age_players[names[2]],2), names[3]:round(age_4_total/age_players[names[3]],2),
                            names[4]:round(age_5_total/age_players[names[4]],2), names[5]:round(age_6_total/age_players[names[5]],2),
                            names[6]:round(age_7_total/age_players[names[6]],2), names[7]:round(age_8_total/age_players[names[7]],2)}
})
age_purchase_summary.reindex(["< 10","10-14","15-19","20-24","25-29","30-34","35-39","40+"])

Unnamed: 0,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase Per Person
< 10,23,3.35,77.13,4.54
10-14,28,2.96,82.78,3.76
15-19,136,3.04,412.89,3.86
20-24,365,3.05,1114.06,4.32
25-29,101,2.9,293.0,3.81
30-34,73,2.93,214.0,4.12
35-39,41,3.6,147.67,4.76
40+,13,2.94,38.24,3.19


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [31]:
#get data grouped by users 
grouped_users = purchase_data.groupby('SN')

#create new dataframe
top_spenders = pd.DataFrame(purchase_data[["SN"]])

#find purchase count 
top_spenders["Purchase Count"] = grouped_users["Item ID"].transform('count')

#find average purchase price
top_spenders["Purchase Price"] = round(grouped_users["Price"].transform('mean'),2)

#find total purchase value
top_spenders["Total Purchase Value"] = grouped_users["Price"].transform('sum')

#find the top 5 spenders
top_spenders.drop_duplicates(subset="SN").sort_values("Total Purchase Value",ascending=False).head(5)

Unnamed: 0,SN,Purchase Count,Purchase Price,Total Purchase Value
74,Lisosia93,5,3.79,18.96
290,Idastidru52,4,3.86,15.45
222,Chamjask73,3,4.61,13.83
128,Iral74,4,3.4,13.62
148,Iskadarya95,3,4.37,13.1
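
As an alternative to `transform` followed by `drop_duplicates`, the same table could be built with one aggregation per player, which returns exactly one row per `SN`. This is a sketch on made-up data (`toy`); only the `SN` and `Price` column names are taken from the notebook.

```python
import pandas as pd

# Hypothetical purchases for three players
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "c", "c"],
    "Price": [4.00, 2.00, 5.00, 1.00, 1.50, 2.00],
})

# Aggregate once per player, then rename and sort by total spend
top = (
    toy.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .sort_values("Total Purchase Value", ascending=False)
)
top["Average Purchase Price"] = top["Average Purchase Price"].round(2)
print(top.head(5))
```

Player `c` makes the most purchases but ranks last, because the sort is on total purchase value rather than purchase count.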


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
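
One way the steps above might be implemented is sketched below on made-up data. It assumes the item name column is called `Item Name`, as the instructions suggest, and uses the mean price per item as the "item price", which is an assumption in case an item appears at more than one price.

```python
import pandas as pd

# Hypothetical purchases; item names and prices are invented
toy = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3],
    "Item Name": ["Sword", "Sword", "Shield", "Bow", "Bow", "Bow"],
    "Price": [4.00, 4.00, 2.50, 1.00, 1.00, 1.00],
})

# Group by both ID and name so the name is kept in the index
item_summary = (
    toy.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

most_popular = item_summary.sort_values("Purchase Count", ascending=False)
print(most_popular.head(5))
```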



In [None]:
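#retrieve the Item ID, Item Name, and Item Price columns ("Item Name" column name assumed from the instructions)
items = purchase_data[["Item ID", "Item Name", "Price"]]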

## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
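
A sketch of this last step, assuming an item summary shaped like the one described in the previous section. The frame below is built by hand from invented values so the example stands alone. Sorting is done on the numeric column before any currency formatting, because formatted values such as `"$8.00"` are strings and would sort as text.

```python
import pandas as pd

# Hypothetical item summary, indexed by (Item ID, Item Name)
item_summary = pd.DataFrame(
    {
        "Purchase Count": [3, 2, 1],
        "Item Price": [1.00, 4.00, 2.50],
        "Total Purchase Value": [3.00, 8.00, 2.50],
    },
    index=pd.MultiIndex.from_tuples(
        [(3, "Bow"), (1, "Sword"), (2, "Shield")],
        names=["Item ID", "Item Name"],
    ),
)

# Sort numerically first
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)

# Then format a copy for display
display_table = most_profitable.copy()
for col in ["Item Price", "Total Purchase Value"]:
    display_table[col] = display_table[col].map("${:,.2f}".format)
print(display_table.head(5))
```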

