### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* The peak age group is 20-24 (44.8%), followed by 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [135]:

# DEPENDENCIES AND SETUP

import pandas as pd
import numpy as np

# Specifying the csv path
file_to_load = "Resources/purchase_data.csv"


# Read the purchasing file (purchase_data.csv) into a pandas DataFrame
purchase_data = pd.read_csv(file_to_load)


# head() shows the first five rows by default; pass a number to show more rows
# Remove the display limit so that long tables are printed in full
pd.set_option('display.max_rows', None)
purchase_data.head()




| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


## Player Count

* Display the total number of players


In [123]:
# PLAYER COUNT


# Calculating "Total Players" 
# By extracting the unique player list from column "SN" by using unique()
# And counting them by using len()
# unique removes duplicates
Total_Number_Of_Players = len(purchase_data["SN"].unique())




# Create Data Frame with Total Players and index = 0
Player_Count = pd.DataFrame({"Total Players":Total_Number_Of_Players},index = [0])
Player_Count



| | Total Players |
|---|---|
| 0 | 576 |
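
As a minimal sketch on hypothetical toy data (the three rows below are illustrative and are not taken from `purchase_data.csv`), the same player count can be obtained with `Series.nunique()`, which avoids building the intermediate array that `unique()` returns:

```python
import pandas as pd

# Hypothetical toy data: three purchases made by two distinct players.
toy = pd.DataFrame({
    "SN": ["Lisim78", "Lisim78", "Iskosia90"],
    "Price": [3.53, 1.56, 1.44],
})

# nunique() counts the distinct screen names directly.
total_players = toy["SN"].nunique()

# Wrap the scalar in a list so pandas can build a one-row DataFrame.
player_count = pd.DataFrame({"Total Players": [total_players]})
print(player_count)
```

One difference to keep in mind: `nunique()` ignores missing values by default, whereas `len(unique())` counts `NaN` as one more value. The two agree whenever the `SN` column has no gaps.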


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [124]:
# PURCHASING ANALYSIS (TOTAL) 


# Calculating the "Number Of Unique Items" 
# By extracting the unique item list from the column "Item ID" by using unique() 
# And counting them by using len()
Number_Of_Unique_Items = len(purchase_data["Item ID"].unique())


# Calculating the "Average Price" of the Purchase by taking mean of the column "Price"
Average_Price = purchase_data["Price"].mean()


# Calculating the "Number Of Purchases" by counting the column "Purchase ID" 
Number_Purchases = purchase_data["Purchase ID"].count()


# Calculating the "Total Revenue" by taking sum of the column "Price"
Total_Revenue = purchase_data["Price"].sum()



In [125]:



# Create "Purchase_Analysis_Total" DataFrame to hold the above results 
Purchasing_Analysis_Total = pd.DataFrame({"Number of Unique Items":Number_Of_Unique_Items,
                                          "Average Price":Average_Price,
                                          "Number of Purchases":Number_Purchases,
                                          "Total Revenue":Total_Revenue}, index=[0])




# Data Munging
# Format "Average Price" and "Total Revenue" as currency:
# "${:.2f}".format rounds to two decimal places and prefixes the "$" symbol
# Note: after mapping, both columns hold strings rather than numbers

Purchasing_Analysis_Total["Average Price"] = Purchasing_Analysis_Total["Average Price"].map("${:.2f}".format)
Purchasing_Analysis_Total["Total Revenue"] = Purchasing_Analysis_Total["Total Revenue"].map("${:.2f}".format)
Purchasing_Analysis_Total



| | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 183 | \$3.05 | 780 | \$2379.77 |
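
Mapping a format string over a column converts it to text, so later arithmetic or sorting on that column no longer behaves numerically. A hedged alternative, sketched here on hypothetical toy data, keeps the summary numeric and applies the currency format only when rendering, using the `formatters` argument of `DataFrame.to_string`:

```python
import pandas as pd

# Hypothetical toy purchases (values chosen for easy arithmetic).
toy = pd.DataFrame({
    "Item ID": [108, 143, 108, 92],
    "Price": [3.00, 1.50, 3.00, 4.50],
})

# Build the summary with numeric columns only.
summary = pd.DataFrame({
    "Number of Unique Items": [toy["Item ID"].nunique()],
    "Average Price": [toy["Price"].mean()],
    "Number of Purchases": [len(toy)],
    "Total Revenue": [toy["Price"].sum()],
})

# Apply the currency format at display time; the data stays numeric.
currency = "${:,.2f}".format
text = summary.to_string(formatters={
    "Average Price": currency,
    "Total Revenue": currency,
})
print(text)
```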


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [126]:
# GENDER DEMOGRAPHICS



# Count the unique players per gender
# Group by "Gender" ("Male", "Female" and "Other / Non-Disclosed"),
# then count the distinct "SN" values in each group with nunique()

Gender_Grouped = purchase_data.groupby("Gender")
Unique_Members = Gender_Grouped["SN"].nunique()

# Calculating the "Percentage of Players"
# We have already calculated the "Total_Number_Of_Players"
Percentage_Of_Players = (Unique_Members/Total_Number_Of_Players) * 100


# Create "Gender_Data" DataFrame to hold the results
Gender_Data = pd.DataFrame({"Total Count": Unique_Members,
                            "Percentage of Players": Percentage_Of_Players},
                           index = ["Male","Female","Other / Non-Disclosed"])


# Data Munging
# Column "Percentage of Players" is formatted to round the values to 2 decimal places  
# And mapped to the Gender_Data["Percentage of Players"]
Gender_Data["Percentage of Players"] = Gender_Data["Percentage of Players"].map("{:.2f}".format)
Gender_Data



| | Total Count | Percentage of Players |
|---|---|---|
| Male | 484 | 84.03 |
| Female | 81 | 14.06 |
| Other / Non-Disclosed | 11 | 1.91 |
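
Another way to reach the same table, shown as a sketch on hypothetical toy data, is to reduce the purchases to one row per player first and then use `value_counts`. This assumes every screen name always reports the same gender; if that does not hold, the grouped `nunique()` approach counts such a player once per gender instead.

```python
import pandas as pd

# Hypothetical toy purchases: player "a" bought twice, the others once.
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d"],
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
})

# Keep one row per player (assumption: one gender per screen name).
players = toy.drop_duplicates(subset="SN")

# Absolute counts and shares of players per gender.
counts = players["Gender"].value_counts()
percentages = players["Gender"].value_counts(normalize=True) * 100

gender_data = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percentages.round(2),
})
print(gender_data)
```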



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [127]:
# PURCHASING ANALYSIS (GENDER)


# Groupby based on "Gender" 
# Calculate the "Purchase Count","Average Purchase Price","Total Purchase Value", and "Average Purchase Total per Person"  
Gender_Grouped = purchase_data.groupby("Gender")



# Calculate the "Purchase Count" (one "Purchase ID" per purchase)
Purchase_Count = Gender_Grouped["Purchase ID"].count()

# Calculate the "Average Purchase Price"
Average_Purchase_Price = Gender_Grouped["Price"].mean()

# Calculate the "Total Purchase Value"
Total_Purchase_Value = Gender_Grouped["Price"].sum()

# Calculate the "Average Purchase Total per Person"
Average_Purchase_Total_Per_Person =(Gender_Grouped["Price"].sum()/Unique_Members)






# Create "Purchasing_Analysis_Gender" DataFrame to hold the above results 
Purchasing_Analysis_Gender = pd.DataFrame({"Purchase Count": Purchase_Count,
                                         "Average Purchase Price": Average_Purchase_Price,
                                         "Total Purchase Value": Total_Purchase_Value,
                                         "Average Purchase Total Per Person": Average_Purchase_Total_Per_Person},
                                        index = ["Male","Female","Other / Non-Disclosed"])


# Data Munging

Purchasing_Analysis_Gender["Average Purchase Price"] = Purchasing_Analysis_Gender["Average Purchase Price"].map("${:.2f}".format)
Purchasing_Analysis_Gender["Total Purchase Value"]  = Purchasing_Analysis_Gender["Total Purchase Value"].map("${:.2f}".format)
Purchasing_Analysis_Gender["Average Purchase Total Per Person"] = Purchasing_Analysis_Gender["Average Purchase Total Per Person"].map("${:.2f}".format)
Purchasing_Analysis_Gender.head()



| | Purchase Count | Average Purchase Price | Total Purchase Value | Average Purchase Total Per Person |
|---|---|---|---|---|
| Male | 652 | \$3.02 | \$1967.64 | \$4.07 |
| Female | 113 | \$3.20 | \$361.94 | \$4.47 |
| Other / Non-Disclosed | 15 | \$3.35 | \$50.19 | \$4.56 |
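
The four per-gender figures can also be computed in a single pass with pandas named aggregation. The following is a sketch on hypothetical toy data, and the snake_case column names are illustrative rather than the ones used in this notebook:

```python
import pandas as pd

# Hypothetical toy purchases.
toy = pd.DataFrame({
    "SN": ["a", "a", "c", "b"],
    "Gender": ["Male", "Male", "Male", "Female"],
    "Price": [2.0, 4.0, 3.0, 5.0],
})

# Each keyword is an output column: (source column, aggregation).
summary = toy.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Average total per person = revenue divided by distinct players.
summary["avg_total_per_person"] = summary["total_value"] / summary["players"]
print(summary)
```

Keeping the player count in the same table makes the per-person division explicit and avoids depending on a variable defined in an earlier cell.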


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [128]:
# AGE DEMOGRAPHICS


# Establish bins for ages
Age_Bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
# Create Labels
Group_Names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]


# Categorize each purchase row into an age group with pd.cut,
# using Age_Bins as the bin edges and Group_Names as the labels
purchase_data["Age Group"] = pd.cut(purchase_data["Age"],Age_Bins,labels=Group_Names)


# Groupby  based on "Age Group" 
Purchase_Data_Age_Grouped = purchase_data.groupby("Age Group")

# Calculate "Total Count" and "Percentage Of Players" by age category
Unique_Members = Purchase_Data_Age_Grouped["SN"].nunique()
Percent_Of_Players = (Unique_Members/Total_Number_Of_Players)*100






# Create "Age_Data" DataFrame to hold the above results 
Age_Data = pd.DataFrame({"Total Count": Unique_Members,"Percentage Of Players": Percent_Of_Players})


# Data Munging
# Remove the index name and format the percentages to two decimal places
Age_Data.index.name = None
Age_Data["Percentage Of Players"] = Age_Data["Percentage Of Players"].map("{:.2f}".format)
Age_Data



| | Total Count | Percentage Of Players |
|---|---|---|
| <10 | 17 | 2.95 |
| 10-14 | 22 | 3.82 |
| 15-19 | 107 | 18.58 |
| 20-24 | 258 | 44.79 |
| 25-29 | 77 | 13.37 |
| 30-34 | 52 | 9.03 |
| 35-39 | 31 | 5.38 |
| 40+ | 12 | 2.08 |
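
The fractional bin edges (9.90, 14.90, and so on) work because ages are whole numbers. A sketch of an equivalent binning with integer edges, on hypothetical toy ages, passes `right=False` to `pd.cut` so that each interval includes its left edge and excludes its right edge:

```python
import numpy as np
import pandas as pd

# Hypothetical toy ages, chosen to sit on the bin boundaries.
ages = pd.Series([7, 10, 14, 15, 24, 25, 39, 40, 45])

# Integer edges; with right=False the bins are [0, 10), [10, 15), ..., [40, inf).
edges = [0, 10, 15, 20, 25, 30, 35, 40, np.inf]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

groups = pd.cut(ages, bins=edges, labels=labels, right=False)
print(groups.astype(str).tolist())
```

With this form the edges read the same as the labels, and an age of exactly 10 or 40 lands in the "10-14" or "40+" group without relying on a decimal offset.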


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [129]:
# PURCHASING ANALYSIS(AGE)


# Calculate Purchase Count, Average Purchase Price, Total Purchase Value , Average Purchase Per Person
Purchase_Count = Purchase_Data_Age_Grouped["Age"].count()
Average_Purchase = Purchase_Data_Age_Grouped["Price"].mean()
Total_Purchase_Value = Purchase_Data_Age_Grouped["Price"].sum()
Average_Purchase_Person =(Purchase_Data_Age_Grouped["Price"].sum()/Unique_Members)






# Create "Purchase_Analysis_Age" DataFrame to hold the above results 
Purchase_Analysis_Age = pd.DataFrame({"Purchase Count":Purchase_Count,
                                      "Average Purchase Price":Average_Purchase,
                                      "Total Purchase Value":Total_Purchase_Value,
                                      "Average Purchase Total Per Person":Average_Purchase_Person})


# Data Munging
Purchase_Analysis_Age["Average Purchase Price"] = Purchase_Analysis_Age["Average Purchase Price"].map("${:.2f}".format)
Purchase_Analysis_Age["Total Purchase Value"]  = Purchase_Analysis_Age["Total Purchase Value"].map("${:.2f}".format)
Purchase_Analysis_Age["Average Purchase Total Per Person"] = Purchase_Analysis_Age["Average Purchase Total Per Person"].map("${:.2f}".format)
Purchase_Analysis_Age



| | Purchase Count | Average Purchase Price | Total Purchase Value | Average Purchase Total Per Person |
|---|---|---|---|---|
| <10 | 23 | \$3.35 | \$77.13 | \$4.54 |
| 10-14 | 28 | \$2.96 | \$82.78 | \$3.76 |
| 15-19 | 136 | \$3.04 | \$412.89 | \$3.86 |
| 20-24 | 365 | \$3.05 | \$1114.06 | \$4.32 |
| 25-29 | 101 | \$2.90 | \$293.00 | \$3.81 |
| 30-34 | 73 | \$2.93 | \$214.00 | \$4.12 |
| 35-39 | 41 | \$3.60 | \$147.67 | \$4.76 |
| 40+ | 13 | \$2.94 | \$38.24 | \$3.19 |
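
Because `pd.cut` produces a categorical column, grouping by it is affected by the `observed` argument of `groupby`, and recent pandas releases emit a warning when that argument is left implicit. The sketch below, on hypothetical toy data with simplified bins, shows the difference between the two settings:

```python
import pandas as pd

# Hypothetical toy purchases; nobody falls into the "10-19" bin.
toy = pd.DataFrame({"Age": [7, 22, 23], "Price": [1.0, 2.0, 3.0]})
toy["Age Group"] = pd.cut(
    toy["Age"], bins=[0, 10, 20, 30],
    labels=["<10", "10-19", "20-29"], right=False,
)

# observed=False keeps every category, including empty ones (sum is 0).
all_bins = toy.groupby("Age Group", observed=False)["Price"].sum()

# observed=True keeps only the categories that occur in the data.
seen_bins = toy.groupby("Age Group", observed=True)["Price"].sum()

print(all_bins)
print(seen_bins)
```

In this notebook every age group has purchases, so both settings would yield the same table; stating the choice explicitly mainly documents the intent.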


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [130]:
# TOP SPENDERS


# Groupby based on "SN" 
Purchase_Data_SN_Grouped = purchase_data.groupby("SN")

# Calculate "Purchase Count", "Average Purchase Price" and "Total Purchase Value" 
Purchase_Count = Purchase_Data_SN_Grouped["Age"].count()
Average_Purchase = Purchase_Data_SN_Grouped["Price"].mean()
Total_Purchase_Value = Purchase_Data_SN_Grouped["Price"].sum()






# Create "Top Spenders" DataFrame to hold the above results
Top_Spenders = pd.DataFrame({"Purchase Count": Purchase_Count,
                             "Average Purchase Price": Average_Purchase,
                             "Total Purchase Value": Total_Purchase_Value})


# Sort column "Total Purchase Value"  in Descending Order 
Top_Spenders = Top_Spenders.sort_values("Total Purchase Value", ascending = False)

# Data Munging  
Top_Spenders["Average Purchase Price"] = Top_Spenders["Average Purchase Price"].map("${:.2f}".format)
Top_Spenders["Total Purchase Value"]  = Top_Spenders["Total Purchase Value"].map("${:.2f}".format)

#To display maximum rows
pd.set_option('display.max_rows', None)
#Top_Spenders.head()
Top_Spenders.head(10)


| SN | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | \$3.79 | \$18.96 |
| Idastidru52 | 4 | \$3.86 | \$15.45 |
| Chamjask73 | 3 | \$4.61 | \$13.83 |
| Iral74 | 4 | \$3.40 | \$13.62 |
| Iskadarya95 | 3 | \$4.37 | \$13.10 |
| Ilarin91 | 3 | \$4.23 | \$12.70 |
| Ialallo29 | 3 | \$3.95 | \$11.84 |
| Tyidaim51 | 3 | \$3.94 | \$11.83 |
| Lassilsala30 | 3 | \$3.84 | \$11.51 |
| Chadolyla44 | 3 | \$3.82 | \$11.46 |
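
When only the top few spenders are needed, `DataFrame.nlargest` can stand in for sorting the whole table and then calling `head`. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy purchases by three players.
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c"],
    "Price": [2.0, 4.0, 5.0, 3.0],
})

# Column names contain spaces, so the named aggregations are passed as a dict.
spend = toy.groupby("SN").agg(**{
    "Purchase Count": ("Price", "count"),
    "Average Purchase Price": ("Price", "mean"),
    "Total Purchase Value": ("Price", "sum"),
})

# Pick the two biggest spenders while the column is still numeric.
top = spend.nlargest(2, "Total Purchase Value")
print(top)
```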


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [132]:
# MOST POPULAR ITEMS


# Retrieve the "Item ID", "Item Name", and "Price" columns (using .loc) from DataFrame "purchase_data"
Popular_Data = purchase_data.loc[:,["Item ID","Item Name","Price"]]

# Group the retrieved columns by "Item ID" and "Item Name"
Popular_Grouped = Popular_Data.groupby(["Item ID","Item Name"])

# Calculate "Purchase Count", "Item Price", and "Total Purchase Value"
Purchase_Count = Popular_Grouped["Price"].count()
Item_Price = Popular_Grouped["Price"].mean()
Total_Purchase_Value = Popular_Grouped["Price"].sum()







# Create "Most_Popular_Items" DataFrame 
Most_Popular_Items = pd.DataFrame({"Purchase Count": Purchase_Count,
                                   "Item Price": Item_Price,
                                   "Total Purchase Value": Total_Purchase_Value})


# Sort columns "Purchase Count" (first) in Descending Order 
#Most_Popular_Items = Most_Popular_Items.sort_values("Purchase Count", ascending = False)
# Sort columns "Purchase Count" (first) and "Total Purchase Value" (second) in Descending Order 
Most_Popular_Items = Most_Popular_Items.sort_values(['Purchase Count','Total Purchase Value'], ascending = [False, False])

# Data Munging 
Most_Popular_Items["Total Purchase Value"]  = Most_Popular_Items["Total Purchase Value"].map("${:.2f}".format)
Most_Popular_Items["Item Price"] = Most_Popular_Items["Item Price"].map("${:.2f}".format)

#To display maximum rows
pd.set_option('display.max_rows', None)
#Most_Popular_Items.head()
Most_Popular_Items.head(10)



| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | \$4.23 | \$50.76 |
| 82 | Nirvana | 9 | \$4.90 | \$44.10 |
| 145 | Fiery Glass Crusader | 9 | \$4.58 | \$41.22 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | \$3.53 | \$31.77 |
| 92 | Final Critic | 8 | \$4.88 | \$39.04 |
| 103 | Singed Scalpel | 8 | \$4.35 | \$34.80 |
| 59 | Lightning, Etcher of the King | 8 | \$4.23 | \$33.84 |
| 72 | Winter's Bite | 8 | \$3.77 | \$30.16 |
| 60 | Wolf | 8 | \$3.54 | \$28.32 |
| 37 | Shadow Strike, Glory of Ending Hope | 8 | \$3.16 | \$25.28 |


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [134]:
# MOST PROFITABLE ITEMS


# Group by "Item ID" and "Item Name" 
Popular_Grouped = purchase_data.groupby(["Item ID","Item Name"])

# Calculate "Purchase Count", "Item Price", and "Total Purchase Value"
Purchase_Count = Popular_Grouped["Age"].count()
Item_Price = Popular_Grouped["Price"].mean()
Total_Purchase_Value = Popular_Grouped["Price"].sum()






# Create "Most_Profitable_Items" DataFrame 
Most_Profitable_Items = pd.DataFrame({"Purchase Count":Purchase_Count,
                                   "Item Price":Item_Price,
                                   "Total Purchase Value":Total_Purchase_Value})


# Sorting column "Total Purchase Value" in Descending Order 
Most_Profitable_Items = Most_Profitable_Items.sort_values("Total Purchase Value", ascending = False)

# Data Munging
Most_Profitable_Items["Total Purchase Value"]  = Most_Profitable_Items["Total Purchase Value"].map("${:.2f}".format)
Most_Profitable_Items["Item Price"] = Most_Profitable_Items["Item Price"].map("${:.2f}".format)

#To display maximum rows
pd.set_option('display.max_rows', None)
#Most_Profitable_Items.head()
Most_Profitable_Items.head(10)



| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | \$4.23 | \$50.76 |
| 82 | Nirvana | 9 | \$4.90 | \$44.10 |
| 145 | Fiery Glass Crusader | 9 | \$4.58 | \$41.22 |
| 92 | Final Critic | 8 | \$4.88 | \$39.04 |
| 103 | Singed Scalpel | 8 | \$4.35 | \$34.80 |
| 59 | Lightning, Etcher of the King | 8 | \$4.23 | \$33.84 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | \$3.53 | \$31.77 |
| 78 | Glimmer, Ender of the Moon | 7 | \$4.40 | \$30.80 |
| 72 | Winter's Bite | 8 | \$3.77 | \$30.16 |
| 60 | Wolf | 8 | \$3.54 | \$28.32 |
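
The popular and profitable tables share the same grouped figures and differ only in sort order. As a sketch on hypothetical toy data, one numeric table can be built once and sorted twice, with the currency formatting left to the display step:

```python
import pandas as pd

# Hypothetical toy purchases: a cheap item bought often, a pricey one bought less.
toy = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2],
    "Item Name": ["Wolf", "Wolf", "Wolf", "Nirvana", "Nirvana"],
    "Price": [2.0, 2.0, 2.0, 5.0, 5.0],
})

# Build the numeric item table once.
items = toy.groupby(["Item ID", "Item Name"]).agg(**{
    "Purchase Count": ("Price", "count"),
    "Item Price": ("Price", "mean"),
    "Total Purchase Value": ("Price", "sum"),
})

# Two views of the same table, differing only in sort keys.
most_popular = items.sort_values(
    ["Purchase Count", "Total Purchase Value"], ascending=False
)
most_profitable = items.sort_values("Total Purchase Value", ascending=False)

print(most_popular.head(10))
print(most_profitable.head(10))
```

Sorting has to happen while the values are numeric: once a column holds strings such as "$9.50" and "$50.76", a sort compares them as text and would rank "$9.50" above "$50.76".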





# Heroes Of Pymoli Data Analysis



Heroes of Pymoli is a fictional mobile game. The provided data is used to identify purchase trends by gender and age group, along with the game's most popular and most profitable items. The observable trends are:

Heroes of Pymoli is most popular among male players (84% of the total), with a smaller but notable share of female players (14%). As expected, male players contribute most of the revenue (about 83%).

The game is mostly played by younger players: the 15-29 age range makes up approximately 77% of all players, and the 20-24 group alone accounts for about 45%.

Notably, every item in the top ten most popular and most profitable lists is priced above the average purchase price of \$3.05.

Although there are far fewer female players, the average purchase total per female player (\$4.47) is about 10% higher than that of male players (\$4.07).

No player has spent more than \$20.00 on in-game purchases.

The three most popular and most profitable items are "Oathbreaker, Last Hope of the Breaking Storm", "Nirvana" and "Fiery Glass Crusader".
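
The percentages quoted in these observations can be rechecked from the summary tables. The short sketch below copies the relevant figures from the gender and age outputs in this notebook; the variable names are illustrative:

```python
# Figures copied from the summary tables in this notebook.
total_revenue = 2379.77
male_revenue = 1967.64
female_avg_per_person = 4.47
male_avg_per_person = 4.07
age_shares = {"15-19": 18.58, "20-24": 44.79, "25-29": 13.37}

# Share of total revenue that comes from male players.
male_share = male_revenue / total_revenue * 100

# How much more the average female player spends than the average male player.
female_premium = (female_avg_per_person / male_avg_per_person - 1) * 100

# Combined share of players aged 15 to 29.
share_15_29 = sum(age_shares.values())

print(f"Male revenue share: {male_share:.1f}%")
print(f"Female per-person premium: {female_premium:.1f}%")
print(f"Players aged 15-29: {share_15_29:.1f}%")
```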


