### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable share are female (14%).

* The peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%). These figures use five-year age brackets, which are narrower than the bins in the age tables below.
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load
purchase_data_csv = "Resources/purchase_data.csv"

# Read purchasing file and store into data frame
purchase_data_df = pd.read_csv(purchase_data_csv)

purchase_data_df.head()

|   | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


## Player Count

* Display the total number of players


In [2]:
# Count distinct screen names so repeat buyers are only counted once
totalplayercount = purchase_data_df['SN'].nunique()
print("The total number of players: " + str(totalplayercount))

The total number of players: 576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
# Basic calculations 
unique_items = len(purchase_data_df['Item Name'].unique())
avg_purchase = purchase_data_df["Price"].mean()
rev = purchase_data_df["Price"].sum()

# Create new data frame
purchase_analysis = pd.DataFrame({"Number of Unique Items": [unique_items],
                                           "Average Price": [avg_purchase],
                                           "Total Revenue": [rev]})
purchase_analysis

|   | Number of Unique Items | Average Price | Total Revenue |
|---|---|---|---|
| 0 | 179 | 3.050987 | 2379.77 |


In [4]:
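# numeric_only=True skips text columns such as SN and Item Name (required in pandas 2.0+)
purchase_data_df.mean(numeric_only=True)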

Purchase ID    389.500000
Age             22.714103
Item ID         92.114103
Price            3.050987
dtype: float64
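
The optional "cleaner formatting" step for the summary above could be sketched as follows. This is a minimal example on a hypothetical toy frame (`toy_df` and its values are invented for illustration, not taken from `purchase_data.csv`); it formats a copy so the numeric summary stays available for further calculations.

```python
import pandas as pd

# Hypothetical toy purchases standing in for purchase_data.csv
toy_df = pd.DataFrame({
    "Item ID": [1, 2, 2, 3],
    "Price": [1.50, 2.25, 2.25, 4.00],
})

# Numeric summary: keep raw numbers here
summary = pd.DataFrame({
    "Number of Unique Items": [toy_df["Item ID"].nunique()],
    "Average Price": [toy_df["Price"].mean()],
    "Number of Purchases": [len(toy_df)],
    "Total Revenue": [toy_df["Price"].sum()],
})

# Format a copy for display; the formatted columns become strings
display_summary = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    display_summary[col] = display_summary[col].map("${:,.2f}".format)

print(display_summary)
```

Formatting last matters because the formatted columns are strings, so any later arithmetic or sorting on them would no longer behave numerically.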

## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [5]:
# Drop duplicate values
duplicate = purchase_data_df.drop_duplicates(subset='SN', keep="first")

# Count genders
tcount = duplicate["Gender"].count()
mcount = duplicate["Gender"].value_counts()['Male']
fcount = duplicate["Gender"].value_counts()['Female']
ondcount = tcount - mcount - fcount

mpercentage = round(mcount / tcount*100)
print ("The male count is "+ str(mcount)+ " and percentage of male is " +str(mpercentage)+'%.')

The male count is 484 and percentage of male is 84.0%.


In [6]:
fpercentage = round(fcount / tcount*100)
print ("The female count is "+ str(fcount)+ " and percentage of female is " +str(fpercentage)+'%.')

The female count is 81 and percentage of female is 14.0%.


In [7]:
# Count and percentage of Other / Non-Disclosed players
# Use the de-duplicated frame so each player is counted once, not once per purchase
ondcount = (duplicate["Gender"] == "Other / Non-Disclosed").sum()
ondpercentage = round(ondcount / tcount*100)
print ("The other / non-disclosed count is "+ str(ondcount)+ " and percentage of other / non-disclosed is " +str(ondpercentage)+'%.')

The other / non-disclosed count is 11 and percentage of other / non-disclosed is 2.0%.
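
The three gender cells above can also be collapsed into a single summary table. The sketch below assumes a hypothetical toy frame (`toy_df` is invented for illustration) and uses `drop_duplicates` followed by `value_counts` so that each player is counted once.

```python
import pandas as pd

# Hypothetical toy purchases; Ann1 buys twice but is a single player
toy_df = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cal3", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# One row per player, then count players per gender
players = toy_df.drop_duplicates(subset="SN")
gender_counts = players["Gender"].value_counts()

gender_demo = pd.DataFrame({
    "Total Count": gender_counts,
    "Percentage of Players": (gender_counts / len(players) * 100).round(2),
})
print(gender_demo)
```

Because every category comes from the same `value_counts` call, the percentages sum to 100 by construction.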



## Purchasing Analysis (Gender)

In [8]:
#Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender
#Create a summary data frame to hold the results
#Optional: give the displayed data cleaner formatting
#Display the summary data frame

In [9]:
# Group purchases by gender and make calculations
grouped_df = purchase_data_df.groupby('Gender')
p_count = grouped_df["SN"].count()
avg_price = grouped_df["Price"].mean().map("{:.2f}".format)
total_value = grouped_df["Price"].sum()

# Drop duplicate screen names to count unique players per gender
duplicate = purchase_data_df.drop_duplicates(subset='SN', keep="first")
grouped_dup = duplicate.groupby(["Gender"])
avg_total_pp = (total_value / grouped_dup["SN"].count()).map("{:.2f}".format)

# Create new data frame
purchase_analysis_gender = pd.DataFrame({"Purchase Count": p_count,
                              "Average Purchase Price": avg_price,
                              "Total Purchase Value": total_value.map("{:.2f}".format),
                              "Avg Total Purchase per Person": avg_total_pp})
purchase_analysis_gender

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person |
|---|---|---|---|---|
| Female | 113 | 3.20 | 361.94 | 4.47 |
| Male | 652 | 3.02 | 1967.64 | 4.07 |
| Other / Non-Disclosed | 15 | 3.35 | 50.19 | 4.56 |
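
As an alternative to building each column separately, the per-gender calculations could be done in one pass with pandas named aggregation. This is only a sketch on a hypothetical toy frame (`toy_df` and the snake_case column names are invented for illustration).

```python
import pandas as pd

# Hypothetical toy purchases; Ann1 is one player with two purchases
toy_df = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cal3"],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Price": [2.00, 4.00, 1.00, 3.00],
})

# One groupby, several aggregations; "nunique" counts distinct players
by_gender = toy_df.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Average total spend per player, not per purchase
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```

Using `nunique` on `SN` avoids a separate de-duplicated frame when only the player count per group is needed.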


## Age Demographics

In [10]:
# Create bins and labels
# pd.cut uses right-closed intervals here: (0, 13], (13, 19], (19, 28], (28, 100]
age_bins = [0, 13, 19, 28, 100]
age_labels = ["Young", "Teen", "Young Adult", "Adult"]

# Grouping
bin_df = purchase_data_df.copy()
bin_df["Age Groups"] = pd.cut(bin_df["Age"], age_bins, labels=age_labels)
group_bin = bin_df.groupby(["Age Groups"])

# Count rows per bin; each row is a purchase, so repeat buyers are counted once per purchase
bin_count = group_bin["SN"].count()
total_count = purchase_data_df["SN"].count()
percentage = ((bin_count / total_count) * 100).map("{:.2f}%".format)

# Create new data frame
purchase_analysis_age = pd.DataFrame({"Total Count": bin_count,
                         "Percentage of Purchases": percentage})
purchase_analysis_age

| Age Groups | Total Count | Percentage of Purchases |
|---|---|---|
| Young | 49 | 6.28% |
| Teen | 138 | 17.69% |
| Young Adult | 453 | 58.08% |
| Adult | 140 | 17.95% |


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [11]:
# Additional calculations by bin grouping
bin_count = group_bin["Age"].count()
bin_price_avg = group_bin["Price"].mean().map("{:.2f}".format)
bin_total = group_bin["Price"].sum().map("{:.2f}".format)

# Create new data frame
purchasing_analysis_age = pd.DataFrame({"Purchase Count": bin_count,
                         "Average Purchase Price": bin_price_avg,
                         "Total Purchase Value": bin_total})
purchasing_analysis_age

| Age Groups | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Young | 49 | 3.12 | 153.00 |
| Teen | 138 | 3.04 | 419.80 |
| Young Adult | 453 | 3.03 | 1371.83 |
| Adult | 140 | 3.11 | 435.14 |
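
The age cells above group purchase rows, so a player with several purchases is counted several times. A per-player view, including the "avg. purchase total per person" figure mentioned in the instructions, could be sketched as below. The toy frame and its values are hypothetical; only the bin edges and labels are taken from the notebook.

```python
import pandas as pd

# Hypothetical toy purchases; Ann1 is one player with two purchases
toy_df = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cal3", "Dee4"],
    "Age": [12, 12, 17, 22, 35],
    "Price": [1.00, 2.00, 3.00, 4.00, 5.00],
})

# Same bins and labels as the notebook; intervals are right-closed
age_bins = [0, 13, 19, 28, 100]
age_labels = ["Young", "Teen", "Young Adult", "Adult"]
toy_df["Age Groups"] = pd.cut(toy_df["Age"], age_bins, labels=age_labels)

# One row per player, then count players in each bin
players = toy_df.drop_duplicates(subset="SN")
player_count = players.groupby("Age Groups", observed=False)["SN"].count()
player_pct = (player_count / len(players) * 100).round(2)

# Total spend per bin divided by the number of players in that bin
total_value = toy_df.groupby("Age Groups", observed=False)["Price"].sum()
avg_total_per_person = total_value / player_count

print(pd.DataFrame({
    "Player Count": player_count,
    "Percentage of Players": player_pct,
    "Avg Total Purchase per Person": avg_total_per_person,
}))
```

`observed=False` keeps empty age bins in the result, which is the behavior the notebook's tables implicitly rely on; it is passed explicitly here because the default for categorical groupers is changing in recent pandas releases.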


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [12]:
# Create group and perform basic calculations
group_sn = purchase_data_df.groupby(["SN"])
sn_count = group_sn["Item ID"].count()
sn_total = group_sn["Price"].sum()
sn_avg = (sn_total / sn_count).map("{:.2f}".format)

# Create new data frame
sn_df = pd.DataFrame({"Purchase Count": sn_count,
                         "Total Purchase Value": sn_total, 
                        "Average Purchase Price": sn_avg})

# Sort data
sn_df = sn_df.sort_values("Total Purchase Value", ascending=False)
sn_df.head()

| SN | Purchase Count | Total Purchase Value | Average Purchase Price |
|---|---|---|---|
| Lisosia93 | 5 | 18.96 | 3.79 |
| Idastidru52 | 4 | 15.45 | 3.86 |
| Chamjask73 | 3 | 13.83 | 4.61 |
| Iral74 | 4 | 13.62 | 3.40 |
| Iskadarya95 | 3 | 13.10 | 4.37 |
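
The top-spender ranking depends on sorting a numeric column. A compact variant using `agg` and `nlargest` is sketched below on a hypothetical toy frame (`toy_df` is invented for illustration); it keeps all columns numeric until the ranking is done.

```python
import pandas as pd

# Hypothetical toy purchases
toy_df = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cal3", "Cal3", "Cal3"],
    "Price": [2.00, 4.00, 5.00, 1.00, 1.00, 1.50],
})

# Count, mean and sum of Price per player in one call
spenders = toy_df.groupby("SN")["Price"].agg(["count", "mean", "sum"])
spenders.columns = ["Purchase Count", "Average Purchase Price", "Total Purchase Value"]

# nlargest sorts numerically and keeps only the top rows
top_spenders = spenders.nlargest(2, "Total Purchase Value")
print(top_spenders)
```

Sorting has to happen before any string formatting of the sort column: formatted values compare as text, so "9.50" would rank above "18.96".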


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [13]:
# Group data and perform calculations
pop_item_group = purchase_data_df.groupby(["Item ID", "Item Name"])
pop_item_count = pop_item_group["SN"].count()
pop_item_total = pop_item_group["Price"].sum()
# Item price recovered as total / count (assumes each item sells at a single price)
pop_price = pop_item_total / pop_item_count

# Create new data frame 
pop_df = pd.DataFrame({"Purchase Count": pop_item_count,
                          "Item Price": pop_price,
                          "Total Purchase Value": pop_item_total})

# Sort
pop_df = pop_df.sort_values("Purchase Count", ascending=False)
pop_df.head()

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | 4.23 | 50.76 |
| 145 | Fiery Glass Crusader | 9 | 4.58 | 41.22 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | 3.53 | 31.77 |
| 82 | Nirvana | 9 | 4.90 | 44.10 |
| 19 | Pursuit, Cudgel of Necromancy | 8 | 1.02 | 8.16 |


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [14]:
# Group data and perform calculations
prof_item_group = purchase_data_df.groupby(["Item ID", "Item Name"])
# Count purchases per item; any fully populated column gives the same count
prof_count = prof_item_group["Price"].count()
prof_item_total = prof_item_group["Price"].sum()
prof_price = (prof_item_total / prof_count)

# Create new data frame
prof_df = pd.DataFrame({"Purchase Count": prof_count,
                          "Item Price": prof_price,
                          "Total Purchase Value": prof_item_total})

# Sort
prof_df = prof_df.sort_values("Total Purchase Value", ascending=False)
prof_df.head()

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | 4.23 | 50.76 |
| 82 | Nirvana | 9 | 4.90 | 44.10 |
| 145 | Fiery Glass Crusader | 9 | 4.58 | 41.22 |
| 92 | Final Critic | 8 | 4.88 | 39.04 |
| 103 | Singed Scalpel | 8 | 4.35 | 34.80 |
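
Since the "most popular" and "most profitable" tables differ only in sort order, both can be derived from a single item summary. The sketch below uses a hypothetical toy frame (`toy_df` and its item names are invented for illustration) and assumes each item sells at one price, so `"first"` is enough to recover it.

```python
import pandas as pd

# Hypothetical toy purchases
toy_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Axe", "Axe", "Bow"],
    "Price": [1.00, 1.00, 1.00, 4.00, 4.00, 2.50],
})

# Build the item summary once
items = toy_df.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "first"),  # assumes a single price per item
    total_value=("Price", "sum"),
)

# Two views of the same table, differing only in the sort key
most_popular = items.sort_values("purchase_count", ascending=False)
most_profitable = items.sort_values("total_value", ascending=False)

print(most_popular.head())
print(most_profitable.head())
```

In this toy data the cheap, frequently bought item tops the popularity table while the expensive one tops the profitability table, which is the same kind of divergence visible between the two notebook outputs.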
