# Heroes of Pymoli Purchase Data

After analyzing purchasing data for the new fantasy game, *Heroes of Pymoli*, the following insights were uncovered:

- More males play than females. While average purchase prices are similar, males contribute more to total revenue through sheer purchase volume.

- This game is most popular among players aged 20-24. Consequently, this age group accounts for the most total spending. However, players under ten spend more per purchase on average.

- *Oathbreaker, Last Hope of the Breaking Storm* was the most popular AND profitable item. *Nirvana* and *Fiery Glass Crusader* closely followed, but *Nirvana* was more profitable.  

These insights reveal the game's primary and secondary target consumers. While men aged 20-24 drive most sales, there is also an opportunity to attract more women, who spend more per purchase on average. Similarly, parents of children under ten can be targeted, since that age group also spends more per purchase.

As for profitable items, items with a higher price tend to be purchased more frequently, indicating that players are not especially price-conscious. Items like *The Decapitator* can be removed or promoted less in the game, as they are neither popular nor profitable.
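
One way to test the price-versus-popularity claim is to compute each item's price and purchase count and then correlate the two. The sketch below is illustrative only: the `sample` frame and the `price_vs_popularity` helper are hypothetical and not part of the original notebook. On the real data, `purchase_data` would be passed in instead.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration (not taken from purchase_data.csv)
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Price": [4.50, 4.50, 4.50, 3.00, 3.00, 1.25],
})


def price_vs_popularity(df):
    """Return a per-item table (price, purchase count) and their Pearson correlation."""
    per_item = (
        df.groupby("Item ID")["Price"]
        .agg(["mean", "count"])
        .rename(columns={"mean": "Item Price", "count": "Purchase Count"})
    )
    corr = per_item["Item Price"].corr(per_item["Purchase Count"])
    return per_item, corr


per_item, corr = price_vs_popularity(sample)
print(per_item)
print(f"Correlation between item price and purchase count: {corr:.2f}")
```

A correlation near zero on the real data would suggest that price and popularity are unrelated, which would weaken the claim.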

Further analysis can include an inspection of the characteristics for best/worst selling items, to identify what is attractive to gamers and increase sales. Additionally, looking at popular and profitable items by gender may yield interesting insights (do specific items appeal to one gender more than the other?).
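
The by-gender follow-up could be sketched with a pivot table that counts purchases and sums revenue per item and gender. The rows below are hypothetical: the item names and prices appear in the data, but the gender assignments are made up for illustration, and `items_by_gender` is an assumed name.

```python
import pandas as pd

# Hypothetical rows for illustration; the gender assignments are made up
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Item Name": ["Nirvana", "Fury", "Nirvana", "Nirvana", "Fury"],
    "Price": [4.90, 1.44, 4.90, 4.90, 1.44],
})

# Rows are items, columns are (aggregate, gender) pairs
items_by_gender = sample.pivot_table(
    index="Item Name",
    columns="Gender",
    values="Price",
    aggfunc=["count", "sum"],
    fill_value=0,
)
print(items_by_gender)
```

Sorting this table by a single gender's `count` or `sum` column would surface the items that appeal most to that group.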

# Data Exploration

In [88]:
# Dependencies and Setup
import pandas as pd

# File to load (update the path if the data lives elsewhere)
file_to_load = "Resources/purchase_data.csv"

# Read the purchasing file into a pandas DataFrame
purchase_data = pd.read_csv(file_to_load)

purchase_data.head()

|   | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | \$3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | \$1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | \$4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | \$3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | \$1.44 |


### Player Count

* Display the total number of players


In [89]:
total_number_of_players = purchase_data["SN"].value_counts()
print("Total Number of Players:", len(total_number_of_players))

Total Number of Players: 576


### Purchasing Analysis Overview 

 - Number of Unique Items: 183
 - Average Price: \$3.05 
 - Number of Purchases: 780
 - Total Revenue: \$2379.77
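
As a quick consistency check, the average price should equal total revenue divided by the number of purchases. A minimal sketch using the figures listed above (the variable names are illustrative):

```python
# Figures reported in the overview
total_revenue = 2379.77
number_of_purchases = 780

# Mean price is the sum of all prices divided by the number of purchases
average_price = total_revenue / number_of_purchases
print(f"${average_price:.2f}")  # $3.05
```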

In [90]:
Unique_Items = len(purchase_data["Item ID"].value_counts())
Average_Price = purchase_data["Price"].mean()
Number_of_Purchases = purchase_data["Price"].count()
Total_Revenue = purchase_data["Price"].sum()


# Summary table for the purchasing overview
Purchase_Analysis_df = pd.DataFrame({"Number of Unique_Items": [Unique_Items],
                           "Average Price": [Average_Price],
                           "Number of Purchases": [Number_of_Purchases],
                           "Total Revenue": [Total_Revenue]
                           }
)                    


pd.options.display.float_format = '${:.2f}'.format

display(Purchase_Analysis_df)

|   | Number of Unique_Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 183 | \$3.05 | 780 | \$2379.77 |


# Demographic Analysis: Gender

* More men play than women


### Gender Breakdown

In [91]:
gender_data = purchase_data.loc[:,["Gender","Age","SN"]]
gender_data = gender_data.drop_duplicates()


# Count unique players by column name rather than by position
num_of_players = gender_data["Gender"].count()
total_count = gender_data["Gender"].value_counts()

gender_percent = (gender_data["Gender"].value_counts()/num_of_players) 
gender_percent

#New DataFrame to hold Gender Demographics
gender_demo = pd.DataFrame({"Total Count": total_count, "Percentage of Players": gender_percent})

gender_demo['Percentage of Players'] = gender_demo['Percentage of Players'].map("{:,.0%}".format)
gender_demo

|   | Total Count | Percentage of Players |
|---|---|---|
| Male | 484 | 84% |
| Female | 81 | 14% |
| Other / Non-Disclosed | 11 | 2% |



### Purchasing by Gender

 - Female players spend about \$0.40 more per person than male players (\$4.47 vs. \$4.07); per purchase, the gap is about \$0.18 (\$3.20 vs. \$3.02)
 - Since there are far more male players, total purchase value is higher among males
 - Marketing should strive to capture more female players, since they spend more on average
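
The per-person figures follow from dividing each gender's total purchase value by its number of unique players. A small worked check, using the counts and totals reported in this notebook's gender tables (the dictionary names are illustrative):

```python
# Unique players and total purchase value per gender, as reported in the notebook
players = {"Female": 81, "Male": 484, "Other / Non-Disclosed": 11}
total_value = {"Female": 361.94, "Male": 1967.64, "Other / Non-Disclosed": 50.19}

# Total purchase value per unique player
per_person = {gender: total_value[gender] / players[gender] for gender in players}

for gender, value in per_person.items():
    print(f"{gender}: ${value:.2f}")

gap = per_person["Female"] - per_person["Male"]
print(f"Female vs. male gap per person: ${gap:.2f}")  # $0.40
```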
 

In [92]:
#Purchase Count by Gender
purchase_count = purchase_data["Gender"].value_counts()
purchase_count


gender_group = purchase_data.groupby(['Gender'])
gender_group.head()

#Average purchase price by gender
avg_pp = gender_group["Price"].mean()
avg_pp

#Total Purchase Value
avg_s = gender_group["Price"].sum()
avg_s


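# Total purchase value per player; total_count holds the unique player counts
# per gender computed in the Gender Breakdown cell
avg_tpp = avg_s/total_count
avg_tpp


# New DataFrame to hold Purchase Analysis by Gender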

purchase_demo = pd.DataFrame({"Purchase Count": purchase_count, "Average Purchase Price": avg_pp, 
                             "Total Purchase Value": avg_s, "Total Purchase per Person": avg_tpp
                             }
                            )

pd.options.display.float_format = '${:.2f}'.format

display(purchase_demo)

|   | Purchase Count | Average Purchase Price | Total Purchase Value | Total Purchase per Person |
|---|---|---|---|---|
| Female | 113 | \$3.20 | \$361.94 | \$4.47 |
| Male | 652 | \$3.02 | \$1967.64 | \$4.07 |
| Other / Non-Disclosed | 15 | \$3.35 | \$50.19 | \$4.56 |


# Demographic Analysis: Age

 - The target consumer is players 20-24 years old (45%), followed by players 15-19 years old (19%).

### Age Breakdown

In [175]:
age_data = purchase_data.loc[:, ["Gender", "SN", "Age"]]
age_data = age_data.drop_duplicates()
age_data.head(2)

|   | Gender | SN | Age |
|---|---|---|---|
| 0 | Male | Lisim78 | 20 |
| 1 | Male | Lisovynya38 | 40 |


In [94]:
bins = [0,9,14,19,24,29,34,39,100] 
bin_names = ["<10", "10-14", "15-19", "20-24", "25-29","30-34","35-39","40+"] 

bin_pd = pd.cut(age_data['Age'],bins, labels=bin_names)
display(bin_pd)

0      20-24
1        40+
2      20-24
3      20-24
4      20-24
       ...  
773    20-24
774    10-14
775    20-24
777    20-24
778      <10
Name: Age, Length: 576, dtype: category
Categories (8, object): [<10 < 10-14 < 15-19 < 20-24 < 25-29 < 30-34 < 35-39 < 40+]
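
Because `pd.cut` treats each bin as right-inclusive by default, an age equal to a bin edge falls into the lower group (for example, 24 lands in `20-24`, not `25-29`). A short sketch with a few boundary ages (the `edge_ages` series is made up for illustration):

```python
import pandas as pd

bins = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bin_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical ages chosen to sit on or next to the bin edges
edge_ages = pd.Series([9, 10, 24, 25, 39, 40])

# Default right=True means the intervals are (0, 9], (9, 14], ..., (39, 100]
labels = pd.cut(edge_ages, bins, labels=bin_names)
print(labels.tolist())
# ['<10', '10-14', '20-24', '25-29', '35-39', '40+']
```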

In [95]:
# Calculate the Numbers and Percentages by Age Group
age_count = bin_pd.value_counts()
app_a = age_count / bin_pd.count()
age_df = pd.DataFrame({"Total Count": age_count, "Percentage of Players": app_a})

# Minor Data Munging
age_df['Percentage of Players'] = age_df['Percentage of Players'].map("{:,.2%}".format)

# Display Age Demographics Table
age_df = age_df.sort_index()
age_df

|   | Total Count | Percentage of Players |
|---|---|---|
| <10 | 17 | 2.95% |
| 10-14 | 22 | 3.82% |
| 15-19 | 107 | 18.58% |
| 20-24 | 258 | 44.79% |
| 25-29 | 77 | 13.37% |
| 30-34 | 52 | 9.03% |
| 35-39 | 31 | 5.38% |
| 40+ | 12 | 2.08% |


### Purchases By Age Range

 - The largest share of purchases comes from players aged 20-24, who account for a little under half of all spending.
 - However, players under ten spend about \$0.30 more per purchase (\$3.35 vs. \$3.05) and have the highest per-person spend among the age groups shown (\$4.54).
 - The creators of Pymoli should consider targeting parents of children under ten, as parents hold the purchasing power.
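
These claims can be checked by hand from the totals reported in this notebook. The sketch below uses the 20-24 and under-ten figures from the age tables and the overall revenue from the overview (the variable names are illustrative):

```python
# Figures reported in the notebook
total_revenue = 2379.77
value_20_24, purchases_20_24 = 1114.06, 365
value_under_10, purchases_under_10, players_under_10 = 77.13, 23, 17

# Share of all spending that comes from the 20-24 group
share_20_24 = value_20_24 / total_revenue
print(f"20-24 share of spending: {share_20_24:.1%}")  # 46.8%

# Average purchase price gap between under-ten and 20-24 players
price_gap = value_under_10 / purchases_under_10 - value_20_24 / purchases_20_24
print(f"Under-ten price gap per purchase: ${price_gap:.2f}")  # $0.30

# Total spent per unique under-ten player
per_person_under_10 = value_under_10 / players_under_10
print(f"Under-ten spend per person: ${per_person_under_10:.2f}")  # $4.54
```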

In [126]:
#Select relevant columns
age_pa = purchase_data.loc[:, ["Gender", "SN", "Age","Price"]]
#Add age ranges
age_pa["Age Ranges"] = pd.cut(age_pa["Age"], bins, labels=bin_names)
age_pa.head(2)

|   | Gender | SN | Age | Price | Age Ranges |
|---|---|---|---|---|---|
| 0 | Male | Lisim78 | 20 | \$3.53 | 20-24 |
| 1 | Male | Lisovynya38 | 40 | \$1.56 | 40+ |


In [140]:
# Group purchases by age range once and select "Price" before aggregating,
# so the text columns (Gender, SN) are never passed to mean()
price_by_age = age_pa.groupby("Age Ranges")["Price"]

# Total Purchase Count (named so it does not overwrite the per-gender total_count)
age_purchase_count = price_by_age.count()

# Total Purchase Value
total_purchases = price_by_age.sum()

# Average Purchase Price
average_purchase = price_by_age.mean()

# Average Spend Per Person (age_df["Total Count"] holds unique players per age range)
average_purchase_pp = total_purchases / age_df["Total Count"]

# New DataFrame to hold the purchase analysis by age range
purchase_analysis_df = pd.DataFrame({"Total Purchase Value": total_purchases,
                                     "Total Spent Per Person": average_purchase_pp,
                                     "Average Purchase Price": average_purchase,
                                     "Purchase Frequency": age_purchase_count,
                                     "Age Range Count": age_df["Total Count"],
                                     "Age Range Percent": age_df["Percentage of Players"]})
purchase_analysis_df.head()

|   | Total Purchase Value | Total Spent Per Person | Average Purchase Price | Purchase Frequency | Age Range Count | Age Range Percent |
|---|---|---|---|---|---|---|
| <10 | \$77.13 | \$4.54 | \$3.35 | 23 | 17 | 2.95% |
| 10-14 | \$82.78 | \$3.76 | \$2.96 | 28 | 22 | 3.82% |
| 15-19 | \$412.89 | \$3.86 | \$3.04 | 136 | 107 | 18.58% |
| 20-24 | \$1114.06 | \$4.32 | \$3.05 | 365 | 258 | 44.79% |
| 25-29 | \$293.00 | \$3.81 | \$2.90 | 101 | 77 | 13.37% |


# Purchasing Analysis

### Top Spenders

 - The top spenders tend to purchase 3-5 items and spend between \$13 and \$19 in total. *Lisosia93* is the top spender.

In [159]:
spenders = purchase_data.loc[:,["SN","Item Name","Price"]]
spenders.head(2)

|   | SN | Item Name | Price |
|---|---|---|---|
| 0 | Lisim78 | Extraction, Quickblade Of Trembling Hands | \$3.53 |
| 1 | Lisovynya38 | Frenzied Scimitar | \$1.56 |


In [168]:
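# Group once by screen name and select "Price" before aggregating,
# so the text column (Item Name) is never passed to mean()
price_by_sn = spenders.groupby("SN")["Price"]

# Purchase count, average purchase price, and total purchase value per player
sn_purchase_cnt = price_by_sn.count()
sn_avg_price = price_by_sn.mean()
sn_tpv = price_by_sn.sum()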

Top_Spenders = pd.DataFrame({"Purchase Count": sn_purchase_cnt, "Average Purchase Price": sn_avg_price, 
                             "Total Purchase Value": sn_tpv
                             }
                            )

display(Top_Spenders.sort_values(by="Total Purchase Value",ascending=False)[:5])


| SN | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | \$3.79 | \$18.96 |
| Idastidru52 | 4 | \$3.86 | \$15.45 |
| Chamjask73 | 3 | \$4.61 | \$13.83 |
| Iral74 | 4 | \$3.40 | \$13.62 |
| Iskadarya95 | 3 | \$4.37 | \$13.10 |


### Popular Items

 - The most popular (most frequently purchased) item is *Oathbreaker, Last Hope of the Breaking Storm*.


In [172]:
popular = purchase_data[["Item ID","Item Name","Price"]]                        

#Purchase Count
pop_count = popular.groupby(['Item ID','Item Name']).count()["Price"]

pop_price = popular.groupby(['Item ID','Item Name']).mean()["Price"]

pop_total = popular.groupby(['Item ID','Item Name']).sum()["Price"]


popular_items_df = pd.DataFrame({"Purchase Count": pop_count, "Item Price": pop_price,
                            "Total Purchase Value": pop_total
                          }                            
                        )

display(popular_items_df.sort_values(by="Purchase Count",ascending=False)[:5])

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | \$4.23 | \$50.76 |
| 145 | Fiery Glass Crusader | 9 | \$4.58 | \$41.22 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | \$3.53 | \$31.77 |
| 82 | Nirvana | 9 | \$4.90 | \$44.10 |
| 19 | Pursuit, Cudgel of Necromancy | 8 | \$1.02 | \$8.16 |


### Most Profitable Items

 - *Oathbreaker, Last Hope of the Breaking Storm* is the most profitable item (\$50.76).

 - *Nirvana* is the second most profitable item (\$44.10).


In [173]:
display(popular_items_df.sort_values(by="Total Purchase Value",ascending=False)[:5])

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | \$4.23 | \$50.76 |
| 82 | Nirvana | 9 | \$4.90 | \$44.10 |
| 145 | Fiery Glass Crusader | 9 | \$4.58 | \$41.22 |
| 92 | Final Critic | 8 | \$4.88 | \$39.04 |
| 103 | Singed Scalpel | 8 | \$4.35 | \$34.80 |


In [174]:
display(popular_items_df.sort_values(by="Total Purchase Value",ascending=True)[:5])

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 42 | The Decapitator | 1 | \$1.75 | \$1.75 |
| 104 | Gladiator's Glaive | 1 | \$1.93 | \$1.93 |
| 23 | Crucifer | 1 | \$1.99 | \$1.99 |
| 126 | Exiled Mithril Longsword | 1 | \$2.00 | \$2.00 |
| 125 | Whistling Mithril Warblade | 2 | \$1.00 | \$2.00 |
