Heroes Of Pymoli Data Analysis
Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).

Note
Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [84]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)


Player Count: Display the total number of players

In [85]:
total_players_df = pd.DataFrame({'Total Players':[purchase_data["SN"].nunique()]})
total_players_df

Unnamed: 0,Total Players
0,576


Purchasing Analysis (Total)

Analyze the items purchased by showing

- The number of unique items purchased
- The average price paid for a purchased item
- The total number of purchases made
- The total revenue received from items sold

In [86]:
purch_analysis_tot_df = pd.DataFrame({"Number of Unique Items":[purchase_data["Item Name"].nunique()],
            "Average Price":[purchase_data["Price"].mean()],
            "Total Number of Purchases":[purchase_data["Purchase ID"].nunique()],
            "Total Revenue":[purchase_data["Price"].sum()]
           })
purch_analysis_tot_df

Unnamed: 0,Number of Unique Items,Average Price,Total Number of Purchases,Total Revenue
0,179,3.050987,780,2379.77


Gender Demographics

Analyze the gender demographics of the players by showing:
- Number of players by gender
- Percentage of players by gender

In [87]:
#Create a df grouped by screen name (SN) to eliminate duplicate purchases for the same user
user_grouped_purch_data = purchase_data.groupby(["SN"])

#Grab the first line of each SN grouping in order to create a df where each line is a unique user
first_user_occurance = user_grouped_purch_data.first()

#Count the unique users by gender
gender_counts = first_user_occurance["Gender"].value_counts()

#Build the summary df with explicit column names: count and share of players per gender
#(this avoids relying on the column name value_counts() produces, which differs across pandas versions)
gender_demog_df = pd.DataFrame({"Total Count": gender_counts,
                                "Percentage of Players": gender_counts/gender_counts.sum()})

gender_demog_df

#STILL NEED TO FORMAT!!

Unnamed: 0,Total Count,Percentage of Players
Male,484,0.840278
Female,81,0.140625
Other / Non-Disclosed,11,0.019097
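
The cell above is marked as still needing formatting. One possible way to do it, shown as a minimal sketch rather than the notebook's own method, is to format a copy of the table for display and keep the numeric values untouched. The counts are copied from the output above; the names `gender_counts`, `gender_demog` and `formatted` are illustrative assumptions.

```python
import pandas as pd

# Counts copied from the gender table above (illustrative stand-in for the real data)
gender_counts = pd.Series({"Male": 484, "Female": 81, "Other / Non-Disclosed": 11})

gender_demog = pd.DataFrame({
    "Total Count": gender_counts,
    "Percentage of Players": gender_counts / gender_counts.sum(),
})

# Format a copy for display only, so the numeric column stays usable for later math
formatted = gender_demog.copy()
formatted["Percentage of Players"] = formatted["Percentage of Players"].map("{:.2%}".format)

print(formatted)
```

Formatting converts the column to strings, which is why the sketch works on a copy instead of the original frame.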


Purchasing Analysis (GENDER)

Analyze the player purchases made by gender by showing:

- Number of purchases by gender
- Average price of a purchase by gender
- Total value of purchases by gender
- Average total purchase per person by gender 

In [88]:
#Create a df grouped by Gender
gender_grouped_purch_data = purchase_data.groupby(["Gender"])

#Count the purchases by Gender
purch_by_gender = pd.DataFrame(gender_grouped_purch_data["Purchase ID"].count())

#Calc the average purchase price by Gender
avg_purch_price_by_gender = pd.DataFrame(gender_grouped_purch_data["Price"].sum()/gender_grouped_purch_data["Purchase ID"].count())

#Calc the total value of purchases by gender
tot_purch_value_by_gender = pd.DataFrame(gender_grouped_purch_data["Price"].sum())

#Calc the average purchase per person by Gender
avg_purch_price_by_gender_per_person = pd.DataFrame(gender_grouped_purch_data["Price"].sum()/gender_grouped_purch_data["SN"].nunique())

#Clean up column headings
purch_by_gender.columns=["Purchase Count"]
avg_purch_price_by_gender.columns=["Average Purchase Price"]
tot_purch_value_by_gender.columns=["Total Purchase Value"]
avg_purch_price_by_gender_per_person.columns=["Avg Total Purchase per Person"]

#Merge into one dataframe
purch_analysis_gender_temp1 = pd.merge(purch_by_gender, avg_purch_price_by_gender, on='Gender')
purch_analysis_gender_temp2 = pd.merge(purch_analysis_gender_temp1, tot_purch_value_by_gender, on='Gender')
purch_analysis_gender_final = pd.merge(purch_analysis_gender_temp2, avg_purch_price_by_gender_per_person, on='Gender')

purch_analysis_gender_final

#STILL NEED TO FORMAT!!

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Female,113,3.203009,361.94,4.468395
Male,652,3.017853,1967.64,4.065372
Other / Non-Disclosed,15,3.346,50.19,4.562727
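
The same four columns can probably be built in one pass with pandas named aggregation instead of three merges. The sketch below is an alternative under stated assumptions, not the approach used above: the `toy` frame is made-up data with the same column names as `purchase_data`, and `Players` is a hypothetical helper column that is dropped at the end.

```python
import pandas as pd

# Made-up purchase records using the same column names as purchase_data
toy = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["a", "a", "b", "c"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [1.0, 3.0, 2.5, 2.0],
})

# Named aggregation: output column name -> (input column, aggregation)
summary = toy.groupby("Gender").agg(
    **{
        "Purchase Count": ("Purchase ID", "count"),
        "Average Purchase Price": ("Price", "mean"),
        "Total Purchase Value": ("Price", "sum"),
        "Players": ("SN", "nunique"),  # hypothetical helper column
    }
)

# Per-person average divides total value by unique players, not by purchases
summary["Avg Total Purchase per Person"] = summary["Total Purchase Value"] / summary["Players"]
summary = summary.drop(columns="Players")

print(summary)
```

The dictionary unpacking is only needed because the output column names contain spaces.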


Age Demographics

Analyze the age demographics of players by showing:

- Number of players in each 5 year age band
- Percentage of players in each 5 year age band

In [89]:
bins = [0,9,14,19,24,29,34,39,200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
#Copy the unique-user df so adding the bin column does not modify first_user_occurance
purchase_data_bin_age = first_user_occurance.copy()
purchase_data_bin_age["Age Bin"] = pd.cut(purchase_data_bin_age["Age"], bins=bins, labels=labels)

#Create a df grouped by age bin (observed=False keeps every bin, so rows line up with labels)
purchase_data_grouped_by_bin = purchase_data_bin_age.groupby(["Age Bin"], observed=False)

# Calculate the numbers and percentages by age group
bin_count = purchase_data_grouped_by_bin["Age"].count()
tot_players_count = first_user_occurance.count()
bin_pct = bin_count/tot_players_count["Age"]

# Create a summary data frame to hold the results
age_demog_summary_df = pd.DataFrame(list(zip(bin_count, bin_pct)), index=labels, columns=['Total Count', 'Percentage of Players'])

# Optional: round the percentage column to two decimal points
# Display Age Demographics Table
age_demog_summary_df


Unnamed: 0,Total Count,Percentage of Players
<10,17,0.029514
10-14,22,0.038194
15-19,107,0.185764
20-24,258,0.447917
25-29,77,0.133681
30-34,52,0.090278
35-39,31,0.053819
40+,12,0.020833
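
The bin edges above work because `pd.cut` uses right-closed intervals by default, so the edge 9 belongs to "<10" and the edge 14 belongs to "10-14". A small sketch with made-up ages (the `ages` series is an assumption for illustration) shows how boundary values are assigned:

```python
import pandas as pd

bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Made-up ages chosen to sit on or next to the bin edges
ages = pd.Series([7, 9, 10, 14, 15, 24, 25, 39, 40])

# Default right=True: each interval is (left, right], so an edge value falls in the lower band
binned = pd.cut(ages, bins=bins, labels=labels)

print(pd.DataFrame({"Age": ages, "Age Bin": binned}))
```

Note that with these defaults an age of exactly 0 would fall outside the first interval and become NaN.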


Purchasing Analysis by Age

Analyze the purchasing behavior of different age groups by showing:

- Total number of purchases by age groups
- Average purchase price by age groups
- Total amount of purchases by age groups
- Average total purchase amount per person in each age groups

In [90]:
bins = [0,9,14,19,24,29,34,39,200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
#Copy the purchase records so adding the bin column does not modify purchase_data
purch_data_age_bin = purchase_data.copy()
purch_data_age_bin["Age Bin"] = pd.cut(purch_data_age_bin["Age"], bins=bins, labels=labels)

#This is all purchase records grouped by Age Bin (observed=False keeps every bin, so rows line up with labels)
purch_data_grouped_by_age_bin = purch_data_age_bin.groupby(["Age Bin"], observed=False)

#Count the purchases by Bin
purch_by_bin = purch_data_grouped_by_age_bin["Purchase ID"].count()

#Calc the average purchase price by Bin
avg_purch_price_by_age_bin = purch_data_grouped_by_age_bin["Price"].sum()/purch_data_grouped_by_age_bin["Purchase ID"].count()

#Calc the total value of purchases by Bin
tot_purch_value_by_bin = purch_data_grouped_by_age_bin["Price"].sum()

#Calc the average purchase per person by Bin
avg_purch_price_by_bin_per_person = purch_data_grouped_by_age_bin["Price"].sum()/purch_data_grouped_by_age_bin["SN"].nunique()

# Create a summary data frame to hold the results
bin_purch_summary_df = pd.DataFrame(list(zip(labels, purch_by_bin, avg_purch_price_by_age_bin, tot_purch_value_by_bin, avg_purch_price_by_bin_per_person)),
            columns=['Age Ranges', 'Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Avg Total Purchase per Person'])

bin_purch_summary_df.set_index('Age Ranges', inplace=True)
bin_purch_summary_df


Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Age Ranges,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
<10,23,3.353478,77.13,4.537059
10-14,28,2.956429,82.78,3.762727
15-19,136,3.035956,412.89,3.858785
20-24,365,3.052219,1114.06,4.318062
25-29,101,2.90099,293.0,3.805195
30-34,73,2.931507,214.0,4.115385
35-39,41,3.601707,147.67,4.763548
40+,13,2.941538,38.24,3.186667


Top Spenders Analysis

Identify the top 5 spenders by total purchase amount and their:

- Gamer ID name
- The number of purchases they made
- The average price of their purchases
- The total amount of their purchases

In [91]:
purch_data_by_gamer = purchase_data.groupby(["SN"])

#Count the purchases by Gamer
purch_by_gamer = purch_data_by_gamer["Purchase ID"].count()

#Calc the average purchase price by Gamer
avg_purch_price_by_gamer = purch_data_by_gamer["Price"].sum()/purch_data_by_gamer["Purchase ID"].count()

#Calc the total value of purchases by Gamer
tot_purch_value_by_gamer = purch_data_by_gamer["Price"].sum()

#Extract the user IDs
gamer_SN_df = pd.DataFrame(purch_data_by_gamer.first())
gamer_SN = gamer_SN_df.index


# Create a summary data frame to hold the results
gamer_purch_summary_df = pd.DataFrame(list(zip(purch_by_gamer, avg_purch_price_by_gamer, tot_purch_value_by_gamer)), index = gamer_SN, columns =['Purchase Count', 'Average Purchase Price', 'Total Purchase Value'])

gamer_purch_summary_df.sort_values(by=['Total Purchase Value'], inplace=True, ascending=False)

gamer_top_spenders = gamer_purch_summary_df.head(5)
gamer_top_spenders

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value
SN,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Lisosia93,5,3.792,18.96
Idastidru52,4,3.8625,15.45
Chamjask73,3,4.61,13.83
Iral74,4,3.405,13.62
Iskadarya95,3,4.366667,13.1


Most Popular Items Analysis

Identify the most popular items by number of purchases and their:
- Item ID
- Item Name
- Number of times the item was purchased
- Average price paid for the item
- Total amount of purchases for the item

In [92]:
#Create df grouped by Item ID
item_summary_df = purchase_data[["Item ID", "Item Name", "Price"]]
group_by_item = item_summary_df.groupby(["Item ID"])

#Extract the item IDs
item_ID_df = pd.DataFrame(group_by_item.first())


#Item purchase count
purch_by_item = group_by_item["Item ID"].count()
item_ID_df["Purchase Count"]=purch_by_item
 
#Item price
item_price = group_by_item["Price"].mean()
item_ID_df["Price"]=item_price

#Item total purchase value
tot_purch_value_item = group_by_item["Price"].sum()
item_ID_df["Total Purchase Value"]=tot_purch_value_item

item_ID_df=item_ID_df[['Item Name', 'Purchase Count', 'Price', 'Total Purchase Value']]
item_ID_df.rename(columns = {'Price':'Item Price'}, inplace=True)

item_ID_df.sort_values(by=['Purchase Count'], inplace=True, ascending=False)

most_pop_items = item_ID_df.head(5)
most_pop_items


Unnamed: 0_level_0,Item Name,Purchase Count,Item Price,Total Purchase Value
Item ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
92,Final Critic,13,4.614615,59.99
178,"Oathbreaker, Last Hope of the Breaking Storm",12,4.23,50.76
145,Fiery Glass Crusader,9,4.58,41.22
132,Persuasion,9,3.221111,28.99
108,"Extraction, Quickblade Of Trembling Hands",9,3.53,31.77
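
Several items tie at 9 purchases (Nirvana, item 82 in the profitability table, also has 9), so which tied items make the top 5 depends on how the sort happens to order them. One possible way to make the ranking deterministic, offered as a sketch and not as the method used above, is to add a second sort key. The rows are copied from the two item tables in this notebook; using total purchase value as the tie-breaker is an assumption.

```python
import pandas as pd

# Rows copied from the item tables in this notebook
items = pd.DataFrame(
    {
        "Item Name": ["Final Critic", "Fiery Glass Crusader", "Persuasion", "Nirvana"],
        "Purchase Count": [13, 9, 9, 9],
        "Total Purchase Value": [59.99, 41.22, 28.99, 44.10],
    },
    index=pd.Index([92, 145, 132, 82], name="Item ID"),
)

# Sort by purchase count first, then break ties by total purchase value (assumed tie-breaker)
ranked = items.sort_values(by=["Purchase Count", "Total Purchase Value"], ascending=False)

print(ranked)
```

With the tie-breaker in place, Nirvana ranks ahead of the other 9-purchase items because its total purchase value is higher.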


Most Profitable Item Analysis

Identify the top 5 most profitable items by total amount spent on that item, including:
- Item ID
- Item Name
- Number of times the item was purchased
- Total amount spent on the item

In [93]:
item_ID_df.sort_values(by=['Total Purchase Value'], inplace=True, ascending=False)
most_profit_items = item_ID_df.head(5)
most_profit_items


Unnamed: 0_level_0,Item Name,Purchase Count,Item Price,Total Purchase Value
Item ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
92,Final Critic,13,4.614615,59.99
178,"Oathbreaker, Last Hope of the Breaking Storm",12,4.23,50.76
82,Nirvana,9,4.9,44.1
145,Fiery Glass Crusader,9,4.58,41.22
103,Singed Scalpel,8,4.35,34.8
