# Heroes Of Pymoli Data Analysis

* Considering that total amount of players is 573 and total amount of purchases is 780, we can see that game produces a good amount of repetitive purchases.


* A demographic range of 20-24 years brings the most profit overall by the highest amount of players. But an average purchase of this age category is under 3 dollars, where age categories 30-34 and 40+ are willing to spend above 3 dollars on average.


* Gender group "Other / Non-Disclosed" had the highest purchase amount on average, even thought this group is only 1% out of all players.


* Top 4 out of 5 most popular items that were purchased cost around 2 dollars. But item "Retribution Axe" is well above 4 dollars, and it brought the most revenue of 37.26 dollars. This tells about an importance of this item's value which should be considered while improving and upgrading the game. 


In [40]:
# Import dependencies
import pandas as pd

# Read json file
json_path = 'purchase_data.json'
game_data = pd.read_json(json_path, orient="records")
game_data.head()

Unnamed: 0,Age,Gender,Item ID,Item Name,Price,SN
0,38,Male,165,Bone Crushing Silver Skewer,3.37,Aelalis34
1,21,Male,119,"Stormbringer, Dark Blade of Ending Misery",2.32,Eolo46
2,34,Male,174,Primitive Blade,2.46,Assastnya25
3,21,Male,92,Final Critic,1.36,Pheusrical25
4,23,Male,63,Stormfury Mace,1.27,Aela59


## Player Count

In [41]:
# Separate unique players 
players_breakdown = game_data[["Gender", "SN", "Age"]]
players_breakdown = players_breakdown.drop_duplicates()

# Count a total of players
total_players = players_breakdown["SN"].count()

# Display a total of players
pd.DataFrame({"Total Players": [total_players]})

Unnamed: 0,Total Players
0,573


## Purchasing Analysis (Total)

In [42]:
# Count total revenue, purchse, and average price of unique items
unique_items = len(game_data["Item ID"].unique()) 
avg_price = game_data["Price"].mean()
total_purchase = game_data["SN"].count()
total_revenue = game_data["Price"].sum()

# Set a dataframe
df = pd.DataFrame({"Number of Unique Items": [unique_items],
                    "Average Price": [avg_price],
                    "Number of Purchases": [total_purchase],
                    "Total Revenue": [total_revenue]})

purchase_df = df[['Number of Unique Items','Average Price','Number of Purchases','Total Revenue']]

# Format results 
purchase_df["Average Price"] = purchase_df["Average Price"].map("${:.2f}".format)
purchase_df["Total Revenue"] = purchase_df["Total Revenue"].map("${:,.2f}".format)

purchase_df


Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,183,$2.93,780,"$2,286.33"


## Gender Demographics

In [43]:
# Count a total amount players' gender and percentage
total_gender = pd.DataFrame(players_breakdown['Gender'].value_counts())
percent_gender =(total_gender / game_data['Gender'].count())*100
gender_df = pd.DataFrame(percent_gender)
gender_df["Total"] = total_gender

# Set a dataframe
gender_demogr = gender_df.rename(columns={'Gender':'Percentage of Players',
                                         'Total': 'Total Count'})
# Format results
gender_demogr['Percentage of Players'] = gender_demogr['Percentage of Players'].map("{:.2f}".format)
gender_demogr

Unnamed: 0,Percentage of Players,Total Count
Male,59.62,465
Female,12.82,100
Other / Non-Disclosed,1.03,8



## Purchasing Analysis (Gender)

In [44]:
# Count an amount of purchases, average purchase, and total value by gender
gender_sales = pd.DataFrame({"Purchase Count": game_data.groupby(["Gender"]).count()["Price"],
                            "Average Purchase Price": game_data.groupby(["Gender"]).mean()["Price"],
                            "Total Purchase Value": game_data.groupby(["Gender"]).sum()["Price"],
                            "Normalized Totals": game_data.groupby(["Gender"]).sum()["Price"] / gender_demogr["Total Count"]})
# Format results
gender_sales["Total Purchase Value"] = gender_sales["Total Purchase Value"].map("${:,.2f}".format)
gender_sales["Average Purchase Price"] = gender_sales["Average Purchase Price"].map("${:,.2f}".format)
gender_sales["Purchase Count"] = gender_sales["Purchase Count"].map("{:,}".format)
gender_sales["Normalized Totals"] = gender_sales["Normalized Totals"].map("${:,.2f}".format)

# Change order of the dataframe
gender_sales = gender_sales[["Purchase Count", "Average Purchase Price", "Total Purchase Value", "Normalized Totals"]]

# Round the results
gender_sales= gender_sales.round(2)
gender_sales


Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value,Normalized Totals
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Female,136,$2.82,$382.91,$3.83
Male,633,$2.95,"$1,867.68",$4.02
Other / Non-Disclosed,11,$3.25,$35.74,$4.47


## Age Demographics

In [45]:
# Creat categories of age
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
name_groups = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Connect age categories and labels
age_demogr = players_breakdown.loc[:,["Age"]]
age_demogr["Age Ranges"] = pd.cut(age_demogr["Age"], bins=age_bins, labels=name_groups)

# Count percentage of players by age categories
age_demogr_total = age_demogr["Age Ranges"].value_counts()                               
age_demogr_percent = (age_demogr_total / total_players) * 100

# Display a dataframe
age_ranges = pd.DataFrame({"Total Count": age_demogr_total.round(2), 
                           "Percentage of Players": age_demogr_percent.round(2)})
age_ranges.sort_index()



Unnamed: 0,Percentage of Players,Total Count
<10,3.32,19
10-14,4.01,23
15-19,17.45,100
20-24,45.2,259
25-29,15.18,87
30-34,8.2,47
35-39,4.71,27
40+,1.92,11


## Purchasing Analysis (Age)

In [46]:
# Add column of age range to the original datafrane
game_data["Age Range"] = pd.cut(game_data["Age"], bins=age_bins, labels=name_groups)

# Count an amount of purchases, average purchase, and total value by age
age_data = pd.DataFrame({ "Total Purchase Value": game_data.groupby(["Age Range"]).sum()["Price"],
                    "Purchase Count": game_data.groupby(["Age Range"]).count()["Price"],
                    "Average Purchase Price": game_data.groupby(["Age Range"]).mean()["Price"],
                    "Normalized Totals": game_data.groupby(["Age Range"]).sum()["Price"] / age_ranges["Total Count"]})
# Rearrange columns
age_data = age_data[["Purchase Count", "Average Purchase Price", "Total Purchase Value", "Normalized Totals"]]

# Format results
age_data["Average Purchase Price"] = age_data["Average Purchase Price"].map("${:,.2f}".format)
age_data["Total Purchase Value"] = age_data["Total Purchase Value"].map("${:,.2f}".format)
age_data ["Purchase Count"] = age_data["Purchase Count"].map("{:,}".format)
age_data["Normalized Totals"] = age_data["Normalized Totals"].map("${:,.2f}".format)

age_data
 

Unnamed: 0,Purchase Count,Average Purchase Price,Total Purchase Value,Normalized Totals
10-14,35,$2.77,$96.95,$4.22
15-19,133,$2.91,$386.42,$3.86
20-24,336,$2.91,$978.77,$3.78
25-29,125,$2.96,$370.33,$4.26
30-34,64,$3.08,$197.25,$4.20
35-39,42,$2.84,$119.40,$4.42
40+,17,$3.16,$53.75,$4.89
<10,28,$2.98,$83.46,$4.39


## Top Spenders

In [47]:
# Count total and average purchases by indiviual players
buyer_data = pd.DataFrame({"Total Purchase Value": game_data.groupby(["SN"]).sum()["Price"],
                            "Purchase Count": game_data.groupby(["SN"]).count()["Price"],
                            "Average Purchase Price": game_data.groupby(["SN"]).mean()["Price"]})
# Rearrange dataframe
buyer_data = buyer_data[["Purchase Count", "Average Purchase Price", "Total Purchase Value"]]

# Format results
buyer_data["Average Purchase Price"] = buyer_data["Average Purchase Price"].map("${:,.2f}".format)
buyer_data["Total Purchase Value"] = buyer_data["Total Purchase Value"].map("${:,.2f}".format)

# Sort results by total purchase
buyer_data.sort_values("Total Purchase Value", ascending=False).head(5)

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value
SN,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Qarwen67,4,$2.49,$9.97
Sondim43,3,$3.13,$9.38
Tillyrin30,3,$3.06,$9.19
Lisistaya47,3,$3.06,$9.19
Tyisriphos58,2,$4.59,$9.18


## Most Popular Items

In [48]:
# Create set of data based on information about game items
item_purchase = game_data[["Item ID", "Item Name", "Price"]]

# Count an amount of purchases, average purchase, and total value of purchases by items
item_df = pd.DataFrame({"Total Purchase Value": game_data.groupby(["Item ID", "Item Name"]).sum()["Price"], 
                        "Item Price": game_data.groupby(["Item ID", "Item Name"]).mean()["Price"], 
                        "Purchase Count": game_data.groupby(["Item ID", "Item Name"]).count()["Price"]})
# Rearrange dataframe
item_df = item_df[["Purchase Count", "Item Price", "Total Purchase Value"]]

# Format results
item_df["Item Price"] = item_df["Item Price"].map("${:,.2f}".format)
item_df["Purchase Count"] = item_df["Purchase Count"].map("{:,}".format)
item_df["Total Purchase Value"] = item_df["Total Purchase Value"].map("${:,.2f}".format)

# Sort by count of purchses
item_df.sort_values("Purchase Count", ascending=False).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Purchase Count,Item Price,Total Purchase Value
Item ID,Item Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
13,Serenity,9,$1.49,$13.41
34,Retribution Axe,9,$4.14,$37.26
175,Woeful Adamantite Claymore,9,$1.24,$11.16
31,Trickster,9,$2.07,$18.63
106,Crying Steel Sickle,8,$2.29,$18.32


## Most Profitable Items

In [49]:
# Sort by total purchase value
item_df.sort_values("Total Purchase Value", ascending=False).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Purchase Count,Item Price,Total Purchase Value
Item ID,Item Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
170,Shadowsteel,5,$1.98,$9.90
21,Souleater,3,$3.27,$9.81
37,"Shadow Strike, Glory of Ending Hope",5,$1.93,$9.65
127,"Heartseeker, Reaver of Souls",3,$3.21,$9.63
120,Agatha,5,$1.91,$9.55
