### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* The largest age group is Young Adult (ages 19-24, 47.74%), followed by Teen (18 and under, 22.4%) and Adult (ages 25-34, 22.4%), which are tied.  
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np
import csv
# File to Load (Remember to Change These)
file_to_load = "purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
df = pd.DataFrame(purchase_data)
df

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44
5,5,Yalae81,22,Male,81,Dreamkiss,3.61
6,6,Itheria73,36,Male,169,"Interrogator, Blood Blade of the Queen",2.18
7,7,Iskjaskst81,20,Male,162,Abyssal Shard,2.67
8,8,Undjask33,22,Male,21,Souleater,1.10
9,9,Chanosian48,35,Other / Non-Disclosed,136,Ghastly Adamantite Protector,3.58


## Player Count

* Display the total number of players


In [2]:
# purchase_data.columns
# ------------------------
# df.describe
# 780 Rows x 7 Columns
# ------------------------
unique = df["SN"].unique()
# unique
# -----------------------
total_players = len(unique)
# print("Total Players = " + str(total_players))
# 576
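
The same count can be obtained in one step with `Series.nunique()`. The sketch below is only an illustration: the `toy` frame and its values are made up for the example and are not taken from `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical toy purchases (NOT the real data); "Lisim78" buys twice.
toy = pd.DataFrame({
    "SN": ["Lisim78", "Lisovynya38", "Lisim78", "Ithergue48"],
    "Price": [3.53, 1.56, 4.88, 3.27],
})

# nunique() counts distinct screen names without building the array first.
total_players_toy = toy["SN"].nunique()
print(total_players_toy)
```

Counting screen names rather than rows matters because one player can appear on several purchase rows.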

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
# Number of unique items
unique_items = len(df["Item Name"].unique())
# ----------------------------
# Average Price
average_price = round((df["Price"].mean()),2)
# ----------------------------
# Number of Purchases
tx_total = len(df["SN"])
# ----------------------------
# Total Revenue
total_revenue = str(round(sum(df["Price"]),2))
# -----------------------------
# Summary Data Frame 1
summary_df = pd.DataFrame({"Number of Unique Items": [unique_items], 
                           "Average Price": [average_price], 
                           "Number of Purchases": [tx_total],
                           "Total Revenue": [total_revenue]
                          })
summary_df

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,179,3.05,780,2379.77
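
For the optional "cleaner formatting" step, one possible approach is to map a currency format string over the money columns. This is a minimal sketch that rebuilds the one-row summary from the values shown above and assumes the money columns are numeric; the name `formatted` is introduced only for this example.

```python
import pandas as pd

# Values copied from the summary table above.
summary = pd.DataFrame({
    "Number of Unique Items": [179],
    "Average Price": [3.05],
    "Number of Purchases": [780],
    "Total Revenue": [2379.77],
})

# Format a copy so the numeric values stay available for later calculations.
formatted = summary.copy()
for column in ["Average Price", "Total Revenue"]:
    formatted[column] = formatted[column].map("${:,.2f}".format)

print(formatted)
```

Formatting a copy is a deliberate choice here: once a column holds strings such as `$2,379.77`, it can no longer be summed or averaged.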


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [4]:
# Data Frames: Gender
only_males_df = df.loc[df["Gender"]=="Male",:]
only_females_df = df.loc[df["Gender"]=="Female",:]
only_others_df = df.loc[df["Gender"]=="Other / Non-Disclosed",:]
# ------------------------------
# Unique Users: Gender
unique_males = only_males_df["SN"].unique()
unique_females = only_females_df["SN"].unique()
unique_others = only_others_df["SN"].unique()
#------------------------
# Gender Counts
male_count = len(unique_males)
female_count = len(unique_females)
other_count = len(unique_others)
# ---------------
# Gender Percents
male_percent = round(100*male_count/total_players, 2)
female_percent = round(100*female_count/total_players,2)
other_percent = round(100*other_count/total_players,2)
#-------------------------
# Summary Data Frame 2
summary_df_2 = pd.DataFrame({" ": ["Male", "Female", "Other"],
                           "Total Count": [male_count, female_count, other_count], 
                           "Percentage of Players": [male_percent, female_percent, other_percent]
                          })
summary_df_2

Unnamed: 0,Unnamed: 1,Total Count,Percentage of Players
0,Male,484,84.03
1,Female,81,14.06
2,Other,11,1.91
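
The per-gender filtering above could also be written more compactly by dropping duplicate screen names and calling `value_counts()`. The sketch below uses a made-up `toy` frame and assumes each screen name maps to exactly one gender.

```python
import pandas as pd

# Hypothetical toy data (NOT the real data); player "A" appears twice.
toy = pd.DataFrame({
    "SN": ["A", "B", "A", "C", "D"],
    "Gender": ["Male", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice.
players = toy.drop_duplicates(subset="SN")

counts = players["Gender"].value_counts()
percents = (100 * counts / counts.sum()).round(2)

gender_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percents,
})
print(gender_summary)
```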



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [5]:
# Purchase Counts: Gender
male_purchase_count = round(len(only_males_df))
female_purchase_count = round(len(only_females_df))
other_purchase_count = round(len(only_others_df))
# -------------------------------------
# Average Prices: Gender
male_average_price = round(only_males_df["Price"].mean(),2)
female_average_price = round(only_females_df["Price"].mean(),2)
other_average_price = round(only_others_df["Price"].mean(),2)
# ---------------------------------------------
# Total Values: Gender
male_total_value = only_males_df["Price"].sum()
female_total_value = only_females_df["Price"].sum()
other_total_value = only_others_df["Price"].sum()
# ---------------------------------------------
# Average Total Purchases: Gender
male_avg_total_per_person = round(male_total_value/male_count,2)
female_avg_total_per_person = round(female_total_value/female_count,2)
other_avg_total_per_person = round(other_total_value/other_count,2)
#---------------------------------------------------
# Summary df 3
summary_df_3 = pd.DataFrame({"Gender":["Male","Female","Other"],
                            "Purchase Count": [male_purchase_count, female_purchase_count, other_purchase_count],
                            "Average Purchase Price": [male_average_price, female_average_price, other_average_price],
                            "Total Purchase Value": [male_total_value, female_total_value, other_total_value],
                            "Avg Total Purchase per Person": [male_avg_total_per_person, female_avg_total_per_person, other_avg_total_per_person]})
summary_df_3

Unnamed: 0,Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
0,Male,652,3.02,1967.64,4.07
1,Female,113,3.2,361.94,4.47
2,Other,15,3.35,50.19,4.56
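
As an alternative to one variable per gender, the same metrics could be computed with a single `groupby` and named aggregation. This is a sketch on made-up data; the snake_case column names are chosen for the example and are not the ones used in the table above.

```python
import pandas as pd

# Hypothetical toy purchases (NOT the real data); player "A" buys twice.
toy = pd.DataFrame({
    "SN": ["A", "B", "A", "C"],
    "Gender": ["Male", "Female", "Male", "Male"],
    "Price": [1.00, 2.00, 3.00, 4.00],
})

by_gender = toy.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    average_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Average total per person divides by unique players, not by purchases.
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender.round(2))
```

Dividing by unique players is what separates "Avg Total Purchase per Person" from "Average Purchase Price", which divides by the number of purchases.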


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [6]:
# Age Demographics
age_bins = [0,18,24,34,45]
age_labels = ["Teen", "Young Adult", "Adult", "Advanced User" ]
# Append list to original df
df["Age Demographic"] = pd.cut(df["Age"], age_bins, labels = age_labels)
# df.head()
#-------------
# Groupby Age
# age_df = df.groupby("Age Demographic")
# age_df.count()
#---------------
#Data Frames: Age
t_df = df.loc[df["Age Demographic"]=="Teen",:]
ya_df = df.loc[df["Age Demographic"]=="Young Adult",:]
a_df = df.loc[df["Age Demographic"]=="Adult",:]
au_df = df.loc[df["Age Demographic"]=="Advanced User",:]
# ------------------------------
# Unique Users: Age
unique_t = t_df["SN"].unique()
unique_ya = ya_df["SN"].unique()
unique_a = a_df["SN"].unique()
unique_au = au_df["SN"].unique()
#------------------------
# Age Counts
t_count = len(unique_t)
ya_count = len(unique_ya)
a_count = len(unique_a)
au_count = len(unique_au)
# ---------------
# Age Percents
t_percent = round(100*t_count/total_players, 2)
ya_percent = round(100*ya_count/total_players,2)
a_percent = round(100*a_count/total_players,2)
au_percent = round(100*au_count/total_players,2)
#-------------------------
# Summary Data Frame 4
summary_df_4 = pd.DataFrame({" ": ["Teen", "Young Adult", "Adult", "Advanced User"],
                           "Total Count": [t_count, ya_count, a_count, au_count], 
                           "Percentage of Players": [t_percent, ya_percent, a_percent, au_percent]
                          })
summary_df_4

Unnamed: 0,Unnamed: 1,Total Count,Percentage of Players
0,Teen,129,22.4
1,Young Adult,275,47.74
2,Adult,129,22.4
3,Advanced User,43,7.47
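
It may help to check how `pd.cut` treats the bin edges, since by default each bin includes its right edge and excludes its left edge. The sketch below reuses `age_bins` and `age_labels` from the cell above; the `boundary_ages` values are chosen only to probe the edges.

```python
import pandas as pd

age_bins = [0, 18, 24, 34, 45]
age_labels = ["Teen", "Young Adult", "Adult", "Advanced User"]

# Ages picked to sit on or just past each bin edge (illustrative only).
boundary_ages = pd.Series([18, 19, 24, 25, 34, 35, 45, 46])

groups = pd.cut(boundary_ages, age_bins, labels=age_labels)
for age, group in zip(boundary_ages, groups):
    print(age, group)
```

With these bins, ages 18 and 24 fall in the lower group ("Teen" and "Young Adult"), while an age above 45 gets no label at all and becomes `NaN`, so it would silently drop out of the counts.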


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [7]:
# Bin by age (see above: Age Demographics)
# -----------------------------------
# Purchase Counts: Age
t_purchase_count = len(t_df)
ya_purchase_count = len(ya_df)
a_purchase_count = len(a_df)
au_purchase_count = len(au_df)
#------------------------------
# Average Purchase Prices: Age
t_average_purchase = round(t_df["Price"].mean(),2)
ya_average_purchase = round(ya_df["Price"].mean(),2)
a_average_purchase = round(a_df["Price"].mean(),2)
au_average_purchase = round(au_df["Price"].mean(),2)
# --------------------------------------
# Total Values: Age
t_total_value = t_df["Price"].sum()
ya_total_value = ya_df["Price"].sum()
a_total_value = a_df["Price"].sum()
au_total_value = au_df["Price"].sum()
# ---------------------------------------------
# Average Total Purchases: Age
t_avg_total_per_person = round(t_total_value/t_count,2)
ya_avg_total_per_person = round(ya_total_value/ya_count,2)
a_avg_total_per_person = round(a_total_value/a_count,2)
au_avg_total_per_person = round(au_total_value/au_count,2)
#---------------------------------------------------
# Summary df 5
summary_df_5 = pd.DataFrame({"Age Demographic":age_labels,
                            "Purchase Count": [t_purchase_count, ya_purchase_count, a_purchase_count, au_purchase_count],
                            "Average Purchase Price": [t_average_purchase, ya_average_purchase, a_average_purchase, au_average_purchase],
                            "Total Purchase Value": [t_total_value, ya_total_value, a_total_value, au_total_value],
                            "Avg Total Purchase per Person": [t_avg_total_per_person, ya_avg_total_per_person, a_avg_total_per_person, au_avg_total_per_person]})
summary_df_5

Unnamed: 0,Age Demographic,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
0,Teen,164,3.07,502.82,3.9
1,Young Adult,388,3.05,1184.04,4.31
2,Adult,174,2.91,507.0,3.93
3,Advanced User,54,3.44,185.91,4.32


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [30]:
# Top Spenders
#-------------------------
# Identify the top 5 spenders in the game by total purchase value, then list (in a table):
# SN
# Purchase Count
# Average Purchase Price
# Total Purchase Value
# ---------------------------------------------------
# Grouped df by SN
grouped_sn = df.groupby(["SN"])
# ------------------------------------------------------
# Purchase Count by SN
sn_purchase_count = grouped_sn["Price"].count()
# ------------------------------------------------------
# Total Purchase Value by SN
total_purchase_value_per_person = grouped_sn["Price"].sum()
#total_purchase_value_per_person_df_desc = total_purchase_value_per_person_df.sort_values("Price", ascending=False)
#total_purchase_value_per_person_df_desc
# -----------------------------------------
# Average Purchase Price
average_purchase_price_per_person=grouped_sn["Price"].mean()
# ---------------------------------------
# Summary df 6
summary_df_6 = pd.DataFrame({"Purchase Count": sn_purchase_count,
                            "Average Purchase Price": round(average_purchase_price_per_person,2),
                            "Total Purchase Value": total_purchase_value_per_person,
                            })
sorted_summary_df_6 = summary_df_6.sort_values(by=["Total Purchase Value"], ascending = False)
sorted_summary_df_6.head(5)

SN,Purchase Count,Average Purchase Price,Total Purchase Value
Lisosia93,5,3.79,18.96
Idastidru52,4,3.86,15.45
Chamjask73,3,4.61,13.83
Iral74,4,3.4,13.62
Iskadarya95,3,4.37,13.1
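
Sorting and then taking `head(5)` can also be expressed with `DataFrame.nlargest`. The sketch below rebuilds a small frame from the five rows shown above, deliberately out of order, and picks the top three; the name `spenders` is introduced only for this example.

```python
import pandas as pd

# Rows copied from the table above, deliberately shuffled.
spenders = pd.DataFrame(
    {
        "Purchase Count": [4, 3, 5, 3, 4],
        "Total Purchase Value": [13.62, 13.10, 18.96, 13.83, 15.45],
    },
    index=pd.Index(
        ["Iral74", "Iskadarya95", "Lisosia93", "Chamjask73", "Idastidru52"],
        name="SN",
    ),
)

# nlargest returns the n rows with the highest values, already sorted.
top_three = spenders.nlargest(3, "Total Purchase Value")
print(top_three)
```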


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [29]:
# Most Popular Items
item_df = df[["Item Name", "Item ID", "Price"]]
grouped_item_df = item_df.groupby(["Item ID", "Item Name"])
item_purchase_count = grouped_item_df["Item ID"].count()
item_price = grouped_item_df["Price"].mean()
item_total_purchase_value = grouped_item_df["Price"].sum()
#--------------------------------
summary_df_7 = pd.DataFrame({"Purchase Count": item_purchase_count,
                            "Item Price": item_price,
                            "Total Purchase Value": item_total_purchase_value})
sorted_summary_df_7 = summary_df_7.sort_values(by=["Purchase Count"], ascending = False)
sorted_summary_df_7.head(5)


Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
178,"Oathbreaker, Last Hope of the Breaking Storm",12,4.23,50.76
145,Fiery Glass Crusader,9,4.58,41.22
108,"Extraction, Quickblade Of Trembling Hands",9,3.53,31.77
82,Nirvana,9,4.9,44.1
19,"Pursuit, Cudgel of Necromancy",8,1.02,8.16
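
Three items in this preview share a purchase count of 9, and sorting on one column does not define their relative order. One way to make the ranking deterministic, sketched here as a suggestion, is to add a second sort key. The frame is rebuilt from the five rows above, with item names omitted for brevity.

```python
import pandas as pd

# Rows copied from the table above (item names omitted for brevity).
items = pd.DataFrame({
    "Item ID": [178, 145, 108, 82, 19],
    "Purchase Count": [12, 9, 9, 9, 8],
    "Item Price": [4.23, 4.58, 3.53, 4.90, 1.02],
    "Total Purchase Value": [50.76, 41.22, 31.77, 44.10, 8.16],
}).set_index("Item ID")

# Break ties in purchase count by total purchase value, both descending.
ranked = items.sort_values(
    by=["Purchase Count", "Total Purchase Value"],
    ascending=[False, False],
)
print(ranked)
```

With the tie-breaker, the three items that each sold 9 times are ordered by the revenue they generated rather than by whatever order the sort happens to produce.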


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [26]:
# Most Profitable Items
# Sort into a new variable so the popularity ranking in sorted_summary_df_7 is not overwritten
profitable_items = summary_df_7.sort_values(by=["Total Purchase Value"], ascending=False)
profitable_items.head(5)

Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
178,"Oathbreaker, Last Hope of the Breaking Storm",12,4.23,50.76
82,Nirvana,9,4.9,44.1
145,Fiery Glass Crusader,9,4.58,41.22
92,Final Critic,8,4.88,39.04
103,Singed Scalpel,8,4.35,34.8


In [31]:
#3 Observations
#----------------------------
print("Observation 1: Price shows little relationship with popularity: the five most-purchased items range from $1.02 to $4.90. The five most profitable items, however, are all priced at $4.23 or more.")
print("Observation 2: Teens and Adults are similar in player count (129 each) and in average spending per person ($3.90 and $3.93). Young Adults are the largest demographic and account for the most total spending ($1,184.04).")
print("Observation 3: The top five spenders paid an average of $3.40 to $4.61 per item, compared with an overall average price of $3.05. These players are not only buying more items but also choosing more expensive ones.")

Observation 1: Price shows little relationship with popularity: the five most-purchased items range from $1.02 to $4.90. The five most profitable items, however, are all priced at $4.23 or more.
Observation 2: Teens and Adults are similar in player count (129 each) and in average spending per person ($3.90 and $3.93). Young Adults are the largest demographic and account for the most total spending ($1,184.04).
Observation 3: The top five spenders paid an average of $3.40 to $4.61 per item, compared with an overall average price of $3.05. These players are not only buying more items but also choosing more expensive ones.
