### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_df = pd.read_csv(file_to_load)
purchase_df

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 775 | 775 | Aethedru70 | 21 | Female | 60 | Wolf | 3.54 |
| 776 | 776 | Iral74 | 21 | Male | 164 | Exiled Doomblade | 1.63 |
| 777 | 777 | Yathecal72 | 20 | Male | 67 | Celeste, Incarnation of the Corrupted | 3.46 |
| 778 | 778 | Sisur91 | 7 | Male | 92 | Final Critic | 4.19 |


## Player Count

* Display the total number of players


In [2]:
#Count the total number of unique players
player_count = purchase_df["SN"].nunique()

#Create a data frame to house the output
player_counts = pd.DataFrame({"Total Player":[player_count]})

#Display the summary
player_counts

| | Total Player |
|---|---|
| 0 | 576 |
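
The count above is of distinct screen names, not rows, because one player can make several purchases. A minimal sketch on made-up data (the `toy_df` frame and its player names are illustrative and not taken from the dataset) shows the difference between the two counts:

```python
import pandas as pd

# Hypothetical toy data: "PlayerA" appears twice, as a repeat buyer would
toy_df = pd.DataFrame({"SN": ["PlayerA", "PlayerB", "PlayerA"],
                       "Price": [3.50, 1.25, 4.75]})

# len() counts purchases (rows); nunique() counts distinct players
purchase_rows = len(toy_df)
unique_players = toy_df["SN"].nunique()

print(purchase_rows, unique_players)
```

This is why the notebook reports 576 players even though the file holds 780 purchases.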


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
#Calculate Unique Items, Average Price, Number of Purchases, and Total Revenue
unique = len(purchase_df["Item Name"].unique())
av_price = purchase_df["Price"].mean()
purchases = purchase_df["Purchase ID"].count()
total_rev = purchase_df["Price"].sum()

#House the outputs in a data frame
summary_df = pd.DataFrame({"Number of Unique Items": [unique],
                           "Average Price": [av_price],
                           "Number of Purchases": [purchases],
                           "Total Revenue": [total_rev]})

#Format the outputs as currency strings
summary_df["Average Price"] = summary_df["Average Price"].map("${:,.2f}".format)
summary_df["Total Revenue"] = summary_df["Total Revenue"].map("${:,.2f}".format)

#Display the data frame
summary_df

| | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 179 | $3.05 | 780 | $2,379.77 |
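
Mapping a format string over a column turns its numbers into text, so any arithmetic has to happen before the formatting step. The sketch below illustrates this on invented values (the `summary` and `formatted` names and the numbers are assumptions for illustration only) by formatting a copy and leaving the numeric frame intact:

```python
import pandas as pd

# Hypothetical summary values, not taken from the purchase data
summary = pd.DataFrame({"Average Price": [3.14159],
                        "Total Revenue": [1234.5]})

# Format a copy so the numeric originals stay available for later calculations
formatted = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    formatted[col] = formatted[col].map("${:,.2f}".format)

print(formatted)
print(summary.dtypes)
```

Keeping the unformatted frame around is one possible design choice; the notebook itself formats in place, which is fine as long as no further calculations use those columns.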


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [4]:
#Grab the necessary data from the SN & Gender columns
genders = purchase_df[["SN", "Gender"]]
gender_df = genders.drop_duplicates()

#Calculate the male, female, and other gender counts
gender_counts = gender_df["Gender"].value_counts()
Totals = gender_counts.sum()

#Calculate the percentage of players in each gender group
Male_per = gender_counts["Male"]/Totals*100
Female_per = gender_counts["Female"]/Totals*100
Other_per = gender_counts["Other / Non-Disclosed"]/Totals*100

#Summarize and format the outputs
summary_demographics = pd.DataFrame({"Total count":[gender_counts["Male"], gender_counts["Female"], gender_counts["Other / Non-Disclosed"]],
     "Percentage of Players":[Male_per, Female_per, Other_per]})
summary_demographics.index = ["Male", "Female", "Others/Non-Disclosed"]
summary_demographics["Percentage of Players"] = summary_demographics["Percentage of Players"].map("{:.2f}%".format)

#Display the summary of the outputs
summary_demographics

| | Total count | Percentage of Players |
|---|---|---|
| Male | 484 | 84.03% |
| Female | 81 | 14.06% |
| Others/Non-Disclosed | 11 | 1.91% |
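
As an alternative sketch (not the method used in the cell above), `value_counts(normalize=True)` returns each group's share directly, which avoids computing every percentage by hand. The `players` frame and its values below are made up for illustration:

```python
import pandas as pd

# Hypothetical purchases: player "A1" bought twice and must be counted once
players = pd.DataFrame({"SN": ["A1", "A1", "B2", "C3", "D4"],
                        "Gender": ["Male", "Male", "Female", "Male", "Male"]})

# One row per player, so repeat buyers do not inflate the counts
unique_players = players.drop_duplicates(subset="SN")

counts = unique_players["Gender"].value_counts()
percentages = unique_players["Gender"].value_counts(normalize=True) * 100

# Both Series share the gender labels as their index, so they align by label
demo = pd.DataFrame({"Total Count": counts,
                     "Percentage of Players": percentages.round(2)})
print(demo)
```

Because the two Series align on their labels, this approach does not depend on the order in which the genders happen to be sorted.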



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [5]:
#Group by the Gender column
grouped_gender = purchase_df.groupby(["Gender"])

#Calculate Purchase Count, Average Purchase Price, Total Purchase Value, & Avg Total Purchase per Person
purchase_count = grouped_gender["SN"].count()
average_pp = grouped_gender["Price"].mean()
purchase_value = grouped_gender["Price"].sum()
average_total_pp = purchase_value/gender_counts

#Create the summary data frame
summary_df = pd.DataFrame({"Purchase Count": purchase_count,
                            "Average Purchase Price": average_pp,
                            "Total Purchase Value": purchase_value,
                            "Avg Total Purchase per Person": average_total_pp})

#Format the data frame
#Format and display the data frame; style.format returns a new Styler and leaves summary_df unchanged
summary_df.style.format({'Average Purchase Price': '${:,.2f}',
                         'Total Purchase Value': '${:,.2f}',
                         'Avg Total Purchase per Person': '${:,.2f}'})

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person |
|---|---|---|---|---|
| Female | 113 | $3.20 | $361.94 | $4.47 |
| Male | 652 | $3.02 | $1,967.64 | $4.07 |
| Other / Non-Disclosed | 15 | $3.35 | $50.19 | $4.56 |


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [6]:
#Establish the bins for ages
age_bin = [0, 9.9, 14.9, 19.9, 24.9, 29.9, 34.9, 39.9, 999.9]
age_range = ['<10','10-14','15-19','20-24','25-29','30-34','35-39','40+']

#Use the bins to categorize the players
purchase_df["Age Group"] = pd.cut(purchase_df["Age"], age_bin, labels=age_range)

#Group by the age demographics and count the unique players in each group
age_dem = purchase_df.groupby("Age Group")
count = age_dem.agg({'SN': "nunique"})

#Calculate the total player count and the percentage of players in each group
player_count = count.sum()
player_percentage = round((count/player_count)*100,2)

#Move the age groups from the index into a column so the two frames can be merged
count = count.reset_index()
player_percentage = player_percentage.reset_index()

#Merge and summarize the data frame
summary2_df = count.merge(player_percentage, on="Age Group")
summary2_df.set_index("Age Group", inplace = True)

#Rename columns
summary2_df = summary2_df.rename(columns={"SN_x": "Total Count",
                                          "SN_y": "Percentage of Players"})
#Display the data frame
summary2_df

| Age Group | Total Count | Percentage of Players |
|---|---|---|
| <10 | 17.0 | 2.95 |
| 10-14 | 22.0 | 3.82 |
| 15-19 | 107.0 | 18.58 |
| 20-24 | 258.0 | 44.79 |
| 25-29 | 77.0 | 13.37 |
| 30-34 | 52.0 | 9.03 |
| 35-39 | 31.0 | 5.38 |
| 40+ | 12.0 | 2.08 |
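
The fractional bin edges above (9.9, 14.9, and so on) work because ages are whole numbers and `pd.cut` closes each interval on the right by default. A possible alternative, sketched here on invented ages, is to use integer edges with `right=False` so that each bin is a half-open interval such as [10, 15). The upper edge of 200 is an arbitrary cap assumed to exceed any real age:

```python
import pandas as pd

# Hypothetical ages chosen to sit on or near the bin boundaries
ages = pd.Series([7, 10, 14, 24, 25, 40])

# With right=False the intervals are [0, 10), [10, 15), ..., [40, 200)
edges = [0, 10, 15, 20, 25, 30, 35, 40, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

age_groups = pd.cut(ages, bins=edges, labels=labels, right=False)
group_list = age_groups.astype(str).tolist()
print(group_list)
```

With this variant an age of exactly 10 lands in "10-14" and an age of exactly 40 lands in "40+", matching the labels without relying on decimal edges.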


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [7]:
#Calculate the Purchase Count, Average Purchase Price, & Total Purchase Value
p_count = age_dem["Purchase ID"].count()
ave_price = round(age_dem["Price"].mean(),2)
total_pval = age_dem["Price"].sum()

#Look up the number of people in each age group
ppl_per_group = summary2_df["Total Count"]

#Calculate and round the Average Purchase Total Per Person
average_total_pp = round((total_pval /ppl_per_group),2)

#Create the summary data frame
summary3_df = pd.DataFrame({"Purchase Count": p_count,
                            "Average Purchase Price": ave_price,
                            "Total Purchase Value": total_pval,
                            "Avg Total Purchase per Person": average_total_pp})
#Format and display the summary; style.format returns a new Styler and leaves summary3_df unchanged
summary3_df.style.format({"Purchase Count": "{:.0f}",
                          "Average Purchase Price": "${:,.2f}",
                          "Total Purchase Value": "${:,.2f}",
                          "Avg Total Purchase per Person": "${:,.2f}"})

| Age Group | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person |
|---|---|---|---|---|
| <10 | 23 | $3.35 | $77.13 | $4.54 |
| 10-14 | 28 | $2.96 | $82.78 | $3.76 |
| 15-19 | 136 | $3.04 | $412.89 | $3.86 |
| 20-24 | 365 | $3.05 | $1,114.06 | $4.32 |
| 25-29 | 101 | $2.90 | $293.00 | $3.81 |
| 30-34 | 73 | $2.93 | $214.00 | $4.12 |
| 35-39 | 41 | $3.60 | $147.67 | $4.76 |
| 40+ | 13 | $2.94 | $38.24 | $3.19 |


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [8]:
#Group data by the "SN" column
spenders = purchase_df.groupby(["SN"])
#Calculate Purchase Count, Average Purchase Price, & Total Purchase Value
purchase_count = spenders['SN'].count()
ave_purchase = round(spenders['Price'].mean(),2)
total_pval = spenders['Price'].sum()

#Summarize and format the outputs
spender_df = pd.DataFrame({'Purchase Count': purchase_count,
                        'Average Purchase Price': ave_purchase,
                        'Total Purchase Value': total_pval})
spender_dfSort = spender_df.sort_values(by = 'Total Purchase Value', ascending = False)

#Display first 5 lines
spender_dfSort.head()


| SN | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | 3.79 | 18.96 |
| Idastidru52 | 4 | 3.86 | 15.45 |
| Chamjask73 | 3 | 4.61 | 13.83 |
| Iral74 | 4 | 3.4 | 13.62 |
| Iskadarya95 | 3 | 4.37 | 13.1 |
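
The three per-player statistics can also be computed in a single pass by handing a list of aggregation names to `agg`. This is a sketch of that alternative on invented data (the `toy` frame, the player names, and the prices are assumptions), not a replacement for the cell above:

```python
import pandas as pd

# Hypothetical purchases by three made-up players
toy = pd.DataFrame({"SN": ["P1", "P2", "P1", "P3", "P2", "P1"],
                    "Price": [1.0, 2.0, 3.0, 4.0, 2.5, 2.0]})

# Compute count, mean, and sum of Price per player in one call
spender_summary = (
    toy.groupby("SN")["Price"]
       .agg(["count", "mean", "sum"])
       .rename(columns={"count": "Purchase Count",
                        "mean": "Average Purchase Price",
                        "sum": "Total Purchase Value"})
       .sort_values("Total Purchase Value", ascending=False)
)
print(spender_summary.head())
```

Sorting on the numeric total before any currency formatting matters here, since formatted strings would sort alphabetically rather than by value.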


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [11]:
#Grab the Item ID, Item Name, and Item Price columns
popular_items = purchase_df[['Item ID', 'Item Name', 'Price']]

#Group by columns Item ID and Item Name
id_groupby = popular_items.groupby(['Item ID', 'Item Name'])

#Calculate Purchase Count, Item Price, and Total Purchase Value
purchase_count = id_groupby['Item Name'].count()
total_purchase = id_groupby['Price'].sum()
item_price = round((total_purchase/purchase_count),2)

#Create the summary data frame and sort the data
summary_id = pd.DataFrame({"Purchase Count": purchase_count, "Item Price": item_price, "Total Purchase Value": total_purchase})
id_sort = summary_id.sort_values("Purchase Count", ascending=False)

#Display the data frame
id_sort.head()

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 92 | Final Critic | 13 | 4.61 | 59.99 |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | 4.23 | 50.76 |
| 145 | Fiery Glass Crusader | 9 | 4.58 | 41.22 |
| 132 | Persuasion | 9 | 3.22 | 28.99 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | 3.53 | 31.77 |
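
Ranking by purchase count and ranking by total purchase value can give different leaders, because a cheap item that sells often may earn less than an expensive item that sells rarely. The sketch below shows this on three invented items (the `items` frame, item names, and prices are hypothetical):

```python
import pandas as pd

# Hypothetical sales: a cheap item sold three times, a pricey one sold twice
items = pd.DataFrame({"Item ID": [1, 1, 1, 2, 2, 3],
                      "Item Name": ["Sword", "Sword", "Sword", "Axe", "Axe", "Bow"],
                      "Price": [1.0, 1.0, 1.0, 4.0, 4.0, 2.5]})

# One summary frame, grouped on both ID and name, reused for both rankings
item_summary = (
    items.groupby(["Item ID", "Item Name"])["Price"]
         .agg(["count", "sum"])
         .rename(columns={"count": "Purchase Count",
                          "sum": "Total Purchase Value"})
)

most_popular = item_summary.sort_values("Purchase Count", ascending=False)
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)

print(most_popular.index[0], most_profitable.index[0])
```

Both rankings start from the same unsorted summary frame, so neither sort depends on the other having run first.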


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [10]:
#Sort the item summary by Total Purchase Value
id_sort = summary_id.sort_values("Total Purchase Value", ascending=False)

#Display first 5 lines
id_sort.head()