# Heroes Of Pymoli Data Analysis
* Of the 1163 active players, the vast majority are male (82%). There also exists, a smaller, but notable proportion of female players (16%).

* Our peak age demographic falls between 20-24 (42%) with secondary groups falling between 15-19 (17.80%) and 25-29 (15.48%).

* Our players are putting in significant cash during the lifetime of their gameplay. Across all major age and gender demographics, the average purchase for a user is roughly $491.   
-----------------------------------------------------------------

* Considering that a total amount of player is 573 and total amount of purchases is 780, we can see that game produces a good amount of repetitive purchases.

* A demographic range of 20-24 years brings the most profit overall by the highest amount of players. But an average purchase of this age category is under 3 dollars, where age categories 30-34 and 40+ are willing to spend above 3 dollars on average.

* Top 4 out of 5 most popular items that were purchased cost around 2 dollars and under. But 1 item is well above 4 dollars. This tells about an importance of the item's value which should be considered while improving and upgrading the game. 


In [3]:
import pandas as pd
# Reading json file
json_path = 'purchase_data.json'
game_data = pd.read_json(json_path, orient="records")
game_data.head()

Unnamed: 0,Age,Gender,Item ID,Item Name,Price,SN
0,38,Male,165,Bone Crushing Silver Skewer,3.37,Aelalis34
1,21,Male,119,"Stormbringer, Dark Blade of Ending Misery",2.32,Eolo46
2,34,Male,174,Primitive Blade,2.46,Assastnya25
3,21,Male,92,Final Critic,1.36,Pheusrical25
4,23,Male,63,Stormfury Mace,1.27,Aela59


## Player Count

In [4]:
# Count total amount of players
total_players = len(game_data["SN"].unique())

game = pd.DataFrame({"Total Players": [total_players]})
game

Unnamed: 0,Total Players
0,573


## Purchasing Analysis (Total)

In [5]:
# Count total revenue and average price of unique items
unique_items = len(game_data["Item ID"].unique()) 
mean_price = game_data["Price"].mean()
total_purchase = game_data["SN"].count()
total_revenue = game_data["Price"].sum()

# Set a dataframe
df = pd.DataFrame({"Number of Unique Items": [unique_items],
                    "Average Price": [mean_price],
                    "Number of Purchases": [total_purchase],
                    "Total Revenue": [total_revenue]})
purchase_df = df[['Number of Unique Items','Average Price','Number of Purchases','Total Revenue']]

# Format results 
purchase_df["Average Price"] = purchase_df["Average Price"].map("${:.2f}".format)
purchase_df["Total Revenue"] = purchase_df["Total Revenue"].map("${:,.2f}".format)

purchase_df


Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,183,$2.93,780,"$2,286.33"


## Gender Demographics

In [38]:
# Count total amount players' gender
total_gender = pd.DataFrame(game_data['Gender'].value_counts())
percent_gender =(total_gender / game_data['Gender'].count())*100
gender_df = pd.DataFrame(percent_gender)
gender_df["Total"] = total_gender

# Set a dataframe
gender_demo = gender_df.rename(columns={'Gender':'Percentage of Players',
                                         'Total': 'Total Count'})
# Format results
gender_demo['Percentage of Players'] = gender_demo['Percentage of Players'].map("{:.2f}".format)
gender_demo

Unnamed: 0,Percentage of Players,Total Count
Male,81.15,633
Female,17.44,136
Other / Non-Disclosed,1.41,11



## Purchasing Analysis (Gender)

In [40]:
# Count total purchases by gender

gender_total_purchase = game_data.groupby(["Gender"]).sum()["Price"]
gender_purchase_count = game_data.groupby(["Gender"]).count()["Price"]
gender_average_price = game_data.groupby(["Gender"]).mean()["Price"]
gender_normalized_totals = game_data.groupby(["Gender"]).sum()["Price"] / gender_demo["Total Count"]



gender_sales = pd.DataFrame({"Total Purchase Value": [gender_total_purchase],
                    "Purchase Count": [gender_purchase_count],
                    "Average Purchase Price": [gender_average_price],
                    "Normalized Totals": [gender_normalized_totals]})

gender_sales = gender_sales[["Purchase Count", "Average Purchase Price", "Total Purchase Value", "Normalized Totals"]]


# gender_sales["Total Purchase Value"] = gender_sales["Total Purchase Value"].map("${:,.2f}".format)
# gender_sales["Average Purchase Price"] = gender_sales["Average Purchase Price"].map("${:,.2f}".format)
# gender_sales["Purchase Count"] = gender_sales["Purchase Count"].map("{:,}".format)
# gender_sales["Normalized Totals"] = gender_sales["Normalized Totals"].map("${:,.2f}".format)


gender_sales= gender_sales.round(2)
gender_sales


Unnamed: 0,Purchase Count,Average Purchase Price,Total Purchase Value,Normalized Totals
0,Gender Female 136 Male ...,Gender Female 2.815515 Male ...,Gender Female 382.91 Male ...,Female 2.815515 Male ...


## Age Demographics

In [7]:
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
name_groups = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

age_demo = game_data.loc[:,["Age"]]
age_demo["Age Ranges"] = pd.cut(age_demo["Age"], bins=age_bins, labels=name_groups)
                            
age_demo_total = age_demo["Age Ranges"].value_counts()                               
age_demo_percent = (age_demo_total / total_players) * 100

age_ranges = pd.DataFrame({"Total Count": age_demo_total.round(2), 
                           "Percentage of Players": age_demo_percent.round(2)})
age_ranges.sort_index()



Unnamed: 0,Percentage of Players,Total Count
<10,4.89,28
10-14,6.11,35
15-19,23.21,133
20-24,58.64,336
25-29,21.82,125
30-34,11.17,64
35-39,7.33,42
40+,2.97,17


## Purchasing Analysis (Age)

In [2]:
game_data["Age Range"] = pd.cut(game_data["Age"], bins=age_bins, labels=name_groups)

age_data = pd.DataFrame({"Purchase Count": game_data.groupby(["Age Range"]).count()["Price"],
                "Average Purchase Price": game_data.groupby(["Age Range"]).mean()["Price"],
                "Total Purchase Value": game_data.groupby(["Age Range"]).sum()["Price"],
                "Normalized Totals": game_data.groupby(["Age Range"]).count()["Price"] / age_ranges["Total Count"]})

age_data = age_data[["Purchase Count", "Average Purchase Price", "Total Purchase Value", "Normalized Totals"]]

age_data["Average Purchase Price"] = age_data["Average Purchase Price"].map("${:,.2f}".format)
age_data["Total Purchase Value"] = age_data["Total Purchase Value"].map("${:,.2f}".format)
age_data ["Purchase Count"] = age_data["Purchase Count"].map("{:,}".format)
age_data["Normalized Totals"] = age_data["Normalized Totals"].map("${:,.2f}".format)

age_data



NameError: name 'pd' is not defined

## Top Spenders

In [9]:
buyer_data = pd.DataFrame({"Total Purchase Value": game_data.groupby(["SN"]).sum()["Price"],
                            "Purchase Count": game_data.groupby(["SN"]).count()["Price"],
                            "Average Purchase Price": game_data.groupby(["SN"]).mean()["Price"]})

buyer_data = buyer_data[["Purchase Count", "Average Purchase Price", "Total Purchase Value"]]

buyer_data["Average Purchase Price"] = buyer_data["Average Purchase Price"].map("${:,.2f}".format)
buyer_data["Total Purchase Value"] = buyer_data["Total Purchase Value"].map("${:,.2f}".format)

buyer_data.sort_values("Total Purchase Value", ascending=False).head(5)

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value
SN,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Qarwen67,4,$2.49,$9.97
Sondim43,3,$3.13,$9.38
Tillyrin30,3,$3.06,$9.19
Lisistaya47,3,$3.06,$9.19
Tyisriphos58,2,$4.59,$9.18


## Most Popular Items

In [10]:
item_purchase = game_data[["Item ID", "Item Name", "Price"]]

item_df = pd.DataFrame({"Total Purchase Value": game_data.groupby(["Item ID", "Item Name"]).sum()["Price"], 
                        "Item Price": game_data.groupby(["Item ID", "Item Name"]).mean()["Price"], 
                        "Purchase Count": game_data.groupby(["Item ID", "Item Name"]).count()["Price"]})

item_df = item_df[["Purchase Count", "Item Price", "Total Purchase Value"]]

item_df["Item Price"] = item_df["Item Price"].map("${:,.2f}".format)
item_df["Purchase Count"] = item_df["Purchase Count"].map("{:,}".format)
item_df["Total Purchase Value"] = item_df["Total Purchase Value"].map("${:,.2f}".format)

item_df.sort_values("Purchase Count", ascending=False).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Purchase Count,Item Price,Total Purchase Value
Item ID,Item Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
13,Serenity,9,$1.49,$13.41
34,Retribution Axe,9,$4.14,$37.26
175,Woeful Adamantite Claymore,9,$1.24,$11.16
31,Trickster,9,$2.07,$18.63
106,Crying Steel Sickle,8,$2.29,$18.32


## Most Profitable Items

In [12]:
item_df.sort_values("Total Purchase Value", ascending=False).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Purchase Count,Item Price,Total Purchase Value
Item ID,Item Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
170,Shadowsteel,5,$1.98,$9.90
21,Souleater,3,$3.27,$9.81
37,"Shadow Strike, Glory of Ending Hope",5,$1.93,$9.65
127,"Heartseeker, Reaver of Souls",3,$3.21,$9.63
120,Agatha,5,$1.91,$9.55
