### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic falls between 20-24 (44.8%), with secondary groups falling between 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

In [27]:
purchase_data.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


## Player Count

* Display the total number of players


In [3]:
total_players = purchase_data["SN"].value_counts()
total_players_count = total_players.count()
total_player_analysis = pd.DataFrame({"Total Players":[total_players_count]})
total_player_analysis

| | Total Players |
|---|---|
| 0 | 576 |


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
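
The optional formatting step above can be handled by mapping a format string over the currency columns. The following is a minimal sketch under stated assumptions: the `sample` frame and its rows are made up for illustration and stand in for `purchase_data`, and the column names mirror the summary built in this section.

```python
import pandas as pd

# Hypothetical rows standing in for Resources/purchase_data.csv
sample = pd.DataFrame({
    "Item Name": ["Sword", "Shield", "Sword", "Bow", "Shield"],
    "Price": [2.00, 3.00, 2.00, 4.50, 3.00],
})

summary = pd.DataFrame({
    "Number of Unique Items": [sample["Item Name"].nunique()],
    "Average Price": [sample["Price"].mean()],
    "Number of Purchases": [len(sample)],
    "Total Revenue": [sample["Price"].sum()],
})

# Format a copy for display only; the formatted columns become strings,
# so keep the numeric frame for any further calculations
formatted = summary.copy()
for column in ["Average Price", "Total Revenue"]:
    formatted[column] = formatted[column].map("${:,.2f}".format)

print(formatted)
```

Formatting a copy rather than the original frame is a deliberate choice here: once a column holds strings such as `$14.50`, it can no longer be summed or averaged.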


In [4]:
unique_item = purchase_data["Item Name"].value_counts()
unique_item_count = unique_item.count()
average_price = round(purchase_data["Price"].mean(),2)
total_revenue = purchase_data["Price"].sum()
total_purchase = purchase_data["Item Name"].count()

In [5]:
purchasing_analysis = pd.DataFrame({"Number of Unique Items":[unique_item_count],
                                    "Average Price":[average_price], "Number of Purchases":[total_purchase],
                                    "Total Revenue":[total_revenue]})
purchasing_analysis

| | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 179 | 3.05 | 780 | 2379.77 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed
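
The counts and percentages listed above can also be obtained in one pass with `value_counts`. This is a minimal sketch rather than the notebook's own method; the `sample` frame and player names are hypothetical.

```python
import pandas as pd

# Hypothetical rows; PlayerA appears twice because they bought two items
sample = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat purchasers are counted once
players = sample.drop_duplicates(subset="SN")

gender_counts = players["Gender"].value_counts()
gender_percent = (players["Gender"].value_counts(normalize=True) * 100).round(2)

# Pass the Series directly (not wrapped in lists) so pandas aligns
# the two columns on their shared index of gender labels
gender_summary = pd.DataFrame({
    "Total Counts": gender_counts,
    "Percentage of Players": gender_percent,
})

print(gender_summary)
```

Wrapping each Series in a list, as in `{"Total Counts": [gender_counts]}`, would instead create a single row whose cells hold entire Series objects.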




In [6]:
#remove duplicated players
gender_demo_clean = purchase_data.drop_duplicates(subset ="SN")

#get total head counts of the new df
total_count = len(gender_demo_clean["SN"].unique())

#calculate # of male players
gender_male = gender_demo_clean.loc[gender_demo_clean["Gender"]=="Male"]
gender_male_count = len(gender_male["SN"].unique())
percent_male = round((gender_male_count/total_count)*100,2)

#calculate # of female players
gender_female = gender_demo_clean.loc[gender_demo_clean["Gender"] == "Female"]
gender_female_count = len(gender_female["SN"].unique())
percent_female = round((gender_female_count/total_count)*100,2)

#calculate # of other players
gender_other = gender_demo_clean.loc[gender_demo_clean["Gender"] == "Other / Non-Disclosed"]
gender_other_count = len(gender_other["SN"].unique())
percent_other = round((gender_other_count/total_count)*100,2)

#percent_gender = gender_demo_clean["Gender"].value_counts(normalize=True) * 100
#gender_count = gender_demo_clean["Gender"].value_counts()
#gender_analysis = pd.DataFrame({"Total Counts":[gender_count], "Percentage of Players":[percent_gender]})

#create a df to contain all values
percent_gender = pd.DataFrame({"Total Counts": [gender_male_count, gender_female_count, gender_other_count],
                               "Percentage of Players": [percent_male,percent_female,percent_other],
                               "Gender": ["Male","Female","Other"]})
percent_gender = percent_gender.set_index("Gender")
percent_gender

| Gender | Total Counts | Percentage of Players |
|---|---|---|
| Male | 484 | 84.03 |
| Female | 81 | 14.06 |
| Other | 11 | 1.91 |



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
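
The per-gender calculations above map naturally onto a single `groupby`. The sketch below is one possible approach, not the notebook's own code; the `sample` rows are hypothetical, and the key point is that the per-person average divides by unique players rather than by purchases.

```python
import pandas as pd

# Hypothetical purchases; PlayerA bought two items
sample = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
    "Price": [2.00, 3.00, 2.00, 4.50, 3.00],
})

by_gender = sample.groupby("Gender")

gender_purchases = pd.DataFrame({
    "Purchase Count": by_gender["Price"].count(),
    "Average Purchase Price": by_gender["Price"].mean().round(2),
    "Total Purchase Value": by_gender["Price"].sum(),
})

# Divide total spend by the number of unique players in each gender
gender_purchases["Avg Total Purchase Per Person"] = (
    gender_purchases["Total Purchase Value"] / by_gender["SN"].nunique()
).round(2)

print(gender_purchases)
```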

In [7]:
male = purchase_data.loc[purchase_data["Gender"] == "Male"]
female = purchase_data.loc[purchase_data["Gender"] == "Female"]
other = purchase_data.loc[purchase_data["Gender"] == "Other / Non-Disclosed"]

purchase_male = len(male["SN"])
purchase_female = len(female["SN"])
purchase_other = len(other["SN"])

average_male = round(male["Price"].mean(),2)
average_female = round(female["Price"].mean(),2)
average_other = round(other["Price"].mean(),2)

total_male = male["Price"].sum()
total_female = female["Price"].sum()
total_other = other["Price"].sum()

av_total_male = round(total_male/gender_male_count,2)
av_total_female = round(total_female / gender_female_count,2)
av_total_other = round(total_other / gender_other_count,2)

summary_table = pd.DataFrame({"Purchase Count": [purchase_male, purchase_female, purchase_other],
                                "Average Purchase Price": [average_male, average_female, average_other],
                            "Total Purchase Value": [total_male, total_female, total_other],
                            "Avg Total Purchase Per Person":[av_total_male, av_total_female, av_total_other],
                             "Gender": ["Male","Female","Other"]})

summary_table = summary_table.set_index("Gender")

summary_table

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase Per Person |
|---|---|---|---|---|
| Male | 652 | 3.02 | 1967.64 | 4.07 |
| Female | 113 | 3.2 | 361.94 | 4.47 |
| Other | 15 | 3.35 | 50.19 | 4.56 |


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
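
The binning steps above can be sketched compactly with `pd.cut` and `value_counts`. This is an illustrative alternative, assuming a hypothetical `players` frame with one row per player; the bin edges match the ones used in this notebook, and the first label is written as `<10` to denote players under 10.

```python
import pandas as pd

# Hypothetical players, one row each
players = pd.DataFrame({
    "SN": ["PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Age": [22, 17, 31, 8],
})

# Intervals are right-inclusive by default: (0, 9], (9, 14], ..., (39, 60]
bins = [0, 9, 14, 19, 24, 29, 34, 39, 60]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

age_group = pd.cut(players["Age"], bins=bins, labels=labels)

# value_counts on a categorical includes empty bins; sort_index restores bin order
counts = age_group.value_counts().sort_index()
percent = (counts / len(players) * 100).round(2)

age_summary = pd.DataFrame({
    "Total Player Count": counts,
    "Percentage of Players": percent,
})

print(age_summary)
```

Ages outside the outermost edges (here, above 60) would be assigned `NaN` by `pd.cut`, so the top edge should be at least the maximum age in the data.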


In [37]:
bins = [0,9,14,19,24,29,34,39,60]
groups = ["<10", "10-14","15-19","20-24","25-29","30-34","35-39","40+"]

#work on an explicit copy so that adding a column does not raise SettingWithCopyWarning
gender_demo_clean = gender_demo_clean.copy()
gender_demo_clean["Age Summary"] = pd.cut(gender_demo_clean["Age"], bins = bins, labels = groups)

group1 = gender_demo_clean.groupby("Age Summary").get_group("<10")
group1_count = len(group1["SN"])
group1_percent = round(group1_count/total_count *100 ,2)

group2 = gender_demo_clean.groupby("Age Summary").get_group("10-14")
group2_count = len(group2["SN"])
group2_percent = round(group2_count/total_count *100 ,2)

group3 = gender_demo_clean.groupby("Age Summary").get_group("15-19")
group3_count = len(group3["SN"])
group3_percent = round(group3_count/total_count *100,2)

group4 = gender_demo_clean.groupby("Age Summary").get_group("20-24")
group4_count = len(group4["SN"])
group4_percent = round(group4_count/total_count *100,2)

group5 = gender_demo_clean.groupby("Age Summary").get_group("25-29")
group5_count = len(group5["SN"])
group5_percent = round(group5_count/total_count *100,2)

group6 = gender_demo_clean.groupby("Age Summary").get_group("30-34")
group6_count = len(group6["SN"])
group6_percent = round(group6_count/total_count *100,2)

group7 = gender_demo_clean.groupby("Age Summary").get_group("35-39")
group7_count = len(group7["SN"])
group7_percent = round(group7_count/total_count *100,2)

group8 = gender_demo_clean.groupby("Age Summary").get_group("40+")
group8_count = len(group8["SN"])
group8_percent = round(group8_count/total_count *100,2)

age_analysis = pd.DataFrame({"Age Summary":groups,
                             "Total Player Count":[group1_count,group2_count,group3_count,group4_count,group5_count,
                                                  group6_count,group7_count,group8_count],
                             "Percentage of Players": [group1_percent,group2_percent,group3_percent,group4_percent,
                                                    group5_percent,group6_percent,group7_percent,group8_percent]})

age_analysis = age_analysis.set_index("Age Summary")

age_analysis

| Age Summary | Total Player Count | Percentage of Players |
|---|---|---|
| <10 | 17 | 2.95 |
| 10-14 | 22 | 3.82 |
| 15-19 | 107 | 18.58 |
| 20-24 | 258 | 44.79 |
| 25-29 | 77 | 13.37 |
| 30-34 | 52 | 9.03 |
| 35-39 | 31 | 5.38 |
| 40+ | 12 | 2.08 |


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [42]:
purchase_data["Age Summary"] = pd.cut(purchase_data["Age"],bins=bins, labels = groups)

#calculate values for players who are "<10"
purchase_group1 = purchase_data.groupby("Age Summary").get_group("<10")
purchase_group1_count = len(purchase_group1["SN"])
purchase_group1_total_value = purchase_group1["Price"].sum()
purchase_group1_aver_price = round(purchase_group1_total_value/purchase_group1_count,2)
purchase_group1_total_price = round(purchase_group1_total_value/group1_count,2)

#calculate values for players who are "10-14"
purchase_group2 = purchase_data.groupby("Age Summary").get_group("10-14")
purchase_group2_count = len(purchase_group2["SN"])
purchase_group2_total_value = purchase_group2["Price"].sum()
purchase_group2_aver_price = round(purchase_group2_total_value/purchase_group2_count,2)
purchase_group2_total_price = round(purchase_group2_total_value/group2_count,2)

#calculate values for players who are "15-19"
purchase_group3 = purchase_data.groupby("Age Summary").get_group("15-19")
purchase_group3_count = len(purchase_group3["SN"])
purchase_group3_total_value = purchase_group3["Price"].sum()
purchase_group3_aver_price = round(purchase_group3_total_value/purchase_group3_count,2)
purchase_group3_total_price = round(purchase_group3_total_value/group3_count,2)

#calculate values for players who are "20-24"
purchase_group4 = purchase_data.groupby("Age Summary").get_group("20-24")
purchase_group4_count = len(purchase_group4["SN"])
purchase_group4_total_value = purchase_group4["Price"].sum()
purchase_group4_aver_price = round(purchase_group4_total_value/purchase_group4_count,2)
purchase_group4_total_price = round(purchase_group4_total_value/group4_count,2)

#calculate values for players who are "25-29"
purchase_group5 = purchase_data.groupby("Age Summary").get_group("25-29")
purchase_group5_count = len(purchase_group5["SN"])
purchase_group5_total_value = purchase_group5["Price"].sum()
purchase_group5_aver_price = round(purchase_group5_total_value/purchase_group5_count,2)
purchase_group5_total_price = round(purchase_group5_total_value/group5_count,2)

#calculate values for players who are "30-34"
purchase_group6 = purchase_data.groupby("Age Summary").get_group("30-34")
purchase_group6_count = len(purchase_group6["SN"])
purchase_group6_total_value = purchase_group6["Price"].sum()
purchase_group6_aver_price = round(purchase_group6_total_value/purchase_group6_count,2)
purchase_group6_total_price = round(purchase_group6_total_value/group6_count,2)

#calculate values for players who are "35-39"
purchase_group7 = purchase_data.groupby("Age Summary").get_group("35-39")
purchase_group7_count = len(purchase_group7["SN"])
purchase_group7_total_value = purchase_group7["Price"].sum()
purchase_group7_aver_price = round(purchase_group7_total_value/purchase_group7_count,2)
purchase_group7_total_price = round(purchase_group7_total_value/group7_count,2)

#calculate values for players who are "40+"
purchase_group8 = purchase_data.groupby("Age Summary").get_group("40+")
purchase_group8_count = len(purchase_group8["SN"])
purchase_group8_total_value = purchase_group8["Price"].sum()
purchase_group8_aver_price = round(purchase_group8_total_value/purchase_group8_count,2)
purchase_group8_total_price = round(purchase_group8_total_value/group8_count,2)

purchase_analysis = pd.DataFrame({"Age Summary":groups,
                                 "Purchase Count":[purchase_group1_count,purchase_group2_count,purchase_group3_count,
                                                  purchase_group4_count,purchase_group5_count,purchase_group6_count,
                                                  purchase_group7_count,purchase_group8_count],
                                 "Average Purchase Price":[purchase_group1_aver_price,purchase_group2_aver_price,
                                                          purchase_group3_aver_price,purchase_group4_aver_price,
                                                          purchase_group5_aver_price,purchase_group6_aver_price,
                                                          purchase_group7_aver_price,purchase_group8_aver_price],
                                 "Total Purchase Value":[purchase_group1_total_value,purchase_group2_total_value,
                                                        purchase_group3_total_value,purchase_group4_total_value,
                                                        purchase_group5_total_value, purchase_group6_total_value,
                                                        purchase_group7_total_value, purchase_group8_total_value],
                                 "Avg Total Purchase Per Person":[purchase_group1_total_price,purchase_group2_total_price,
                                                                 purchase_group3_total_price,purchase_group4_total_price,
                                                                 purchase_group5_total_price,purchase_group6_total_price,
                                                                 purchase_group7_total_price,purchase_group8_total_price]})

purchase_analysis = purchase_analysis.set_index("Age Summary")

purchase_analysis

| Age Summary | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase Per Person |
|---|---|---|---|---|
| <10 | 23 | 3.35 | 77.13 | 4.54 |
| 10-14 | 28 | 2.96 | 82.78 | 3.76 |
| 15-19 | 136 | 3.04 | 412.89 | 3.86 |
| 20-24 | 365 | 3.05 | 1114.06 | 4.32 |
| 25-29 | 101 | 2.9 | 293.0 | 3.81 |
| 30-34 | 73 | 2.93 | 214.0 | 4.12 |
| 35-39 | 41 | 3.6 | 147.67 | 4.76 |
| 40+ | 13 | 2.94 | 38.24 | 3.19 |


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
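
No code accompanies this section, so the following is only a sketch of how the steps above might be carried out. The `sample` rows are hypothetical, and the column names ("Purchase Count", "Average Purchase Price", "Total Purchase Value") are assumed by analogy with the earlier summary tables.

```python
import pandas as pd

# Hypothetical purchases; PlayerA bought two items
sample = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Price": [2.00, 3.00, 2.00, 4.50, 3.00],
})

# Aggregate each player's purchases, then rank by total spend
top_spenders = (
    sample.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .sort_values("Total Purchase Value", ascending=False)
)

# head() shows the first five rows by default
print(top_spenders.head())
```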



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
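
As a sketch of the steps above (this section has no code of its own), the grouping can be done on both item columns at once. The `sample` rows and item names are hypothetical, and taking the mean as "Item Price" assumes each item is sold at a single price.

```python
import pandas as pd

# Hypothetical purchases of three items
sample = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 2, 1],
    "Item Name": ["Sword", "Shield", "Sword", "Bow", "Shield", "Sword"],
    "Price": [2.00, 3.00, 2.00, 4.50, 3.00, 2.00],
})

items = sample[["Item ID", "Item Name", "Price"]]

# Group on ID and name together so both appear in the result's index
popular_items = (
    items.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",  # assumes one price per item
        "sum": "Total Purchase Value",
    })
    .sort_values("Purchase Count", ascending=False)
)

print(popular_items.head())
```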



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
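
A possible sketch of these steps follows, again with hypothetical rows and assumed column names. It rebuilds a small item table so that the block runs on its own, re-sorts it by total purchase value, and formats a display copy.

```python
import pandas as pd

# Hypothetical purchases: the Sword sells most often, the Bow earns the most
sample = pd.DataFrame({
    "Item ID": [1, 3, 1, 2, 3, 1],
    "Item Name": ["Sword", "Bow", "Sword", "Shield", "Bow", "Sword"],
    "Price": [2.00, 4.50, 2.00, 3.00, 4.50, 2.00],
})

item_table = (
    sample.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

# Sort while the values are still numeric, then format a copy for display
profitable_items = item_table.sort_values("Total Purchase Value", ascending=False)
formatted = profitable_items.copy()
for column in ["Item Price", "Total Purchase Value"]:
    formatted[column] = formatted[column].map("${:,.2f}".format)

print(formatted.head())
```

Sorting before formatting matters: once the values are strings, `sort_values` would order them lexically rather than numerically.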

