### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%), and about 2% of the players did not disclose their gender.

* Our peak age demographic is 20-24 (44.79% of players), with secondary groups at 15-19 (18.58%) and 25-29 (13.37%).

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [2]:
# Count each screen name (SN) once, however many purchases it made
total_players = purchase_data['SN'].nunique()
table_players_total = pd.DataFrame({"Total Players": [total_players] })
table_players_total

Unnamed: 0,Total Players
0,576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
# Count distinct items by their ID
items_unique = purchase_data['Item ID'].nunique()
table = pd.DataFrame({"Number of Unique Items": [items_unique]})
table

Unnamed: 0,Number of Unique Items
0,179


In [4]:
revenue_total = purchase_data['Price'].sum()
revenue_total

2379.77

In [5]:
price_average = purchase_data['Price'].mean()
price_average

3.0509871794871795

In [6]:
table['Average Price'] = price_average
table

Unnamed: 0,Number of Unique Items,Average Price
0,179,3.050987


In [7]:
# Note: the purchase IDs are not always increasing by 1. 
# Note: when viewing the CSV file in Excel, only about 500 rows appear; it is unclear why, since pandas finds 780 distinct purchase IDs.
# 779 is the maximum purchase ID in the given file (IDs count from 0), which is consistent with 780 purchases.
purchases_total = purchase_data['Purchase ID'].nunique()
table['Number of Purchases'] = purchases_total
table

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases
0,179,3.050987,780


In [8]:
revenue_total = purchase_data['Price'].sum()
table['Total Revenue'] = revenue_total
table

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,179,3.050987,780,2379.77


In [9]:
# Add formatting: store the currency columns as strings such as "$3.05"
table['Average Price'] = pd.Series(["${:,.2f}".format(price_average)])
table['Total Revenue'] = pd.Series(["${:,.2f}".format(revenue_total)])
table

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,179,$3.05,780,"$2,379.77"


In [11]:
# Example output below

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,179,$3.05,780,"$2,379.77"


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [12]:
# Keep one row per player so that repeat purchasers are counted once.
# (Counting rows of purchase_data directly would count purchases, not players.)
players = purchase_data.drop_duplicates(subset='SN')
players_by_gender = players['Gender'].value_counts()
table_gender = pd.DataFrame({'Total Count': players_by_gender})
table_gender

Unnamed: 0,Total Count
Male,484
Female,81
Other / Non-Disclosed,11


In [14]:
# Share of each gender group among all unique players
players_perc = players_by_gender / players_by_gender.sum()
table_gender['Percentage of Players'] = players_perc
table_gender


Unnamed: 0,Total Count,Percentage of Players
Male,484,0.840278
Female,81,0.140625
Other / Non-Disclosed,11,0.019097


In [15]:
# Format the percentage column
table_gender['Percentage of Players'] = pd.Series(["{0:.2f}%".format(val * 100) for val in table_gender['Percentage of Players']], index = table_gender.index)
table_gender

Unnamed: 0,Total Count,Percentage of Players
Male,484,84.03%
Female,81,14.06%
Other / Non-Disclosed,11,1.91%


In [4]:
# example b

Unnamed: 0,Total Count,Percentage of Players
Male,484,84.03%
Female,81,14.06%
Other / Non-Disclosed,11,1.91%



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [29]:
purchase_data_grouped_by_gender = purchase_data.groupby('Gender').count()
purchase_data_grouped_by_gender

Unnamed: 0_level_0,Purchase ID,SN,Age,Item ID,Item Name,Price
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Female,113,113,113,113,113,113
Male,652,652,652,652,652,652
Other / Non-Disclosed,15,15,15,15,15,15


In [31]:
# Note: count() returns the number of purchase rows (780), not the number of unique players (576).
players_total = purchase_data['SN'].count()
players_total

780

In [39]:
grouped_price_mean = purchase_data.groupby('Gender')['Price'].mean()
grouped_price_mean

Gender
Female                   3.203009
Male                     3.017853
Other / Non-Disclosed    3.346000
Name: Price, dtype: float64

In [40]:
test1_df = pd.DataFrame({'Average Purchase Price': grouped_price_mean})
test1_df

Unnamed: 0_level_0,Average Purchase Price
Gender,Unnamed: 1_level_1
Female,3.203009
Male,3.017853
Other / Non-Disclosed,3.346


In [45]:
grouped_purchase_count = purchase_data.groupby('Gender')['Purchase ID'].count()
grouped_purchase_count

Gender
Female                   113
Male                     652
Other / Non-Disclosed     15
Name: Purchase ID, dtype: int64

In [46]:
test1_df = pd.DataFrame({'Purchase Count': grouped_purchase_count,
                         'Average Purchase Price': grouped_price_mean,
                         })
test1_df

Unnamed: 0_level_0,Purchase Count,Average Purchase Price
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,113,3.203009
Male,652,3.017853
Other / Non-Disclosed,15,3.346


In [47]:
grouped_purchase_value_total = purchase_data.groupby('Gender')['Price'].sum()
grouped_purchase_value_total

Gender
Female                    361.94
Male                     1967.64
Other / Non-Disclosed      50.19
Name: Price, dtype: float64

In [54]:
test1_df = pd.DataFrame({'Purchase Count': grouped_purchase_count,
                         'Average Purchase Price': grouped_price_mean,
                         'Total Purchase Value': grouped_purchase_value_total
                         })
test1_df

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,113,3.203009,361.94
Male,652,3.017853,1967.64
Other / Non-Disclosed,15,3.346,50.19


In [55]:
# Unique players per gender; a player with several purchases is counted once
players_per_gender = purchase_data.groupby('Gender')['SN'].nunique()
grouped_purchase_average_per_person = grouped_purchase_value_total / players_per_gender
grouped_purchase_average_per_person

Gender
Female                   4.468395
Male                     4.065372
Other / Non-Disclosed    4.562727
dtype: float64

* "Avg Total Purchase per Person" is the total purchase value of a gender group divided by the number of unique players in that group, i.e. what a typical player in the group spent overall. Dividing by the purchase count instead only reproduces the average purchase price.

In [66]:
test1_df = pd.DataFrame({'Purchase Count': grouped_purchase_count,
                         'Average Purchase Price': grouped_price_mean,
                         'Total Purchase Value': grouped_purchase_value_total,
                         'Avg Total Purchase per Person': grouped_purchase_average_per_person
                         })
test1_df

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Female,113,3.203009,361.94,4.468395
Male,652,3.017853,1967.64,4.065372
Other / Non-Disclosed,15,3.346,50.19,4.562727


In [67]:
# Format the three currency columns as strings such as "$1,967.64"
for column in ['Average Purchase Price', 'Total Purchase Value', 'Avg Total Purchase per Person']:
    test1_df[column] = test1_df[column].map("${:,.2f}".format)
test1_df

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,"$1,967.64",$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56


Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,"$1,967.64",$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal places


* Display Age Demographics Table


Unnamed: 0,Total Count,Percentage of Players
<10,17,2.95%
10-14,22,3.82%
15-19,107,18.58%
20-24,258,44.79%
25-29,77,13.37%
30-34,52,9.03%
35-39,31,5.38%
40+,12,2.08%
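
One way the age demographics steps above could be sketched is shown below. This is an illustration under stated assumptions, not the assignment's reference solution: the bin edges in `AGE_BINS`, the helper name `age_demographics`, and the tiny `sample` frame are invented here for demonstration, and only the age-range labels are taken from the table above. Players are de-duplicated on `SN` first so that repeat purchasers are counted once.

```python
import pandas as pd

# Hypothetical rows standing in for Resources/purchase_data.csv
sample = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d", "e"],
    "Age": [9, 9, 22, 24, 17, 41],
    "Price": [1.0, 2.0, 3.0, 4.0, 2.5, 3.5],
})

# Assumed bin edges; pd.cut uses right-closed intervals, so (0, 9] is "<10"
AGE_BINS = [0, 9, 14, 19, 24, 29, 34, 39, 200]
AGE_LABELS = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]


def age_demographics(purchase_data):
    """Count unique players per age range and their share of all players."""
    # One row per player, so that several purchases do not inflate the counts
    players = purchase_data.drop_duplicates(subset="SN").copy()
    players["Age Ranges"] = pd.cut(players["Age"], bins=AGE_BINS, labels=AGE_LABELS)
    # observed=False keeps empty age ranges in the result with a count of 0
    counts = players.groupby("Age Ranges", observed=False).size()
    return pd.DataFrame({
        "Total Count": counts,
        "Percentage of Players": (counts / len(players) * 100).round(2),
    })


age_table = age_demographics(sample)
print(age_table)
```

With the real data the call would be `age_demographics(purchase_data)`. The percentage column is left numeric here; it can be turned into strings such as `44.79%` with `.map("{:.2f}%".format)` once no further calculations are needed.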


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Age Ranges,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
<10,23,$3.35,$77.13,$4.54
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,"$1,114.06",$4.32
25-29,101,$2.90,$293.00,$3.81
30-34,73,$2.93,$214.00,$4.12
35-39,41,$3.60,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19
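
The table above appears to divide each group's total purchase value by the number of unique players in that group (for 20-24, $1,114.06 over 258 players is about $4.32). A minimal sketch of that calculation follows. The bin edges, the helper name `purchasing_by_age`, and the `sample` rows are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for Resources/purchase_data.csv
sample = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d"],
    "Age": [22, 22, 23, 8, 40],
    "Price": [1.00, 3.00, 2.00, 4.50, 2.50],
})

# Assumed bin edges; the labels match the age ranges shown in the table
AGE_BINS = [0, 9, 14, 19, 24, 29, 34, 39, 200]
AGE_LABELS = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]


def purchasing_by_age(purchase_data):
    """Summarise purchases per age range, keeping all values numeric."""
    binned = purchase_data.copy()
    binned["Age Ranges"] = pd.cut(binned["Age"], bins=AGE_BINS, labels=AGE_LABELS)
    # observed=True drops age ranges that have no purchases at all
    grouped = binned.groupby("Age Ranges", observed=True)
    summary = pd.DataFrame({
        "Purchase Count": grouped["Price"].count(),
        "Average Purchase Price": grouped["Price"].mean(),
        "Total Purchase Value": grouped["Price"].sum(),
    })
    # Per person means per unique player (SN), not per purchase row
    summary["Avg Total Purchase per Person"] = (
        summary["Total Purchase Value"] / grouped["SN"].nunique()
    )
    return summary


age_summary = purchasing_by_age(sample)
print(age_summary)
```

In the sample, players `a` and `b` fall in 20-24 with three purchases totalling 6.00, so the per-person figure is 3.00 while the average purchase price is 2.00.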


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value
SN,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Lisosia93,5,$3.79,$18.96
Idastidru52,4,$3.86,$15.45
Chamjask73,3,$4.61,$13.83
Iral74,4,$3.40,$13.62
Iskadarya95,3,$4.37,$13.10
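
The top-spenders steps could be sketched as a single group-by on `SN`. The helper name `top_spenders` and the `sample` rows below are hypothetical; the column names follow the table above.

```python
import pandas as pd

# Hypothetical rows standing in for Resources/purchase_data.csv
sample = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "c", "c"],
    "Price": [4.00, 3.00, 5.00, 1.00, 1.50, 2.00],
})


def top_spenders(purchase_data, n=5):
    """Return the n players with the highest total purchase value."""
    summary = purchase_data.groupby("SN")["Price"].agg(["count", "mean", "sum"])
    summary = summary.rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    # Sort while the values are still numeric, then take the preview
    return summary.sort_values("Total Purchase Value", ascending=False).head(n)


spenders = top_spenders(sample, n=2)
print(spenders)
```

Player `c` makes the most purchases in the sample but spends the least in total, so it does not appear in the two-row preview.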


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



Unnamed: 0_level_0,Unnamed: 1_level_0,Purchase Count,Item Price,Total Purchase Value
Item ID,Item Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
92,Final Critic,13,$4.61,$59.99
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
145,Fiery Glass Crusader,9,$4.58,$41.22
132,Persuasion,9,$3.22,$28.99
108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,$31.77
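
A sketch of the most-popular-items steps is given below. In the example rows above, the Item Price looks like an average rather than a fixed price (for Final Critic, $59.99 over 13 purchases is about $4.61), so the sketch uses the mean price per item; that reading is an assumption, as are the helper name `item_summary` and the `sample` rows.

```python
import pandas as pd

# Hypothetical rows standing in for Resources/purchase_data.csv
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Axe", "Axe", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 1.00],
})


def item_summary(purchase_data):
    """Purchase count, mean price and total value per (Item ID, Item Name)."""
    items = purchase_data[["Item ID", "Item Name", "Price"]]
    grouped = items.groupby(["Item ID", "Item Name"])["Price"]
    return pd.DataFrame({
        "Purchase Count": grouped.count(),
        "Item Price": grouped.mean(),
        "Total Purchase Value": grouped.sum(),
    })


# Most popular first: sort on the numeric purchase count
popular = item_summary(sample).sort_values("Purchase Count", ascending=False)
print(popular.head())
```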


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



Unnamed: 0_level_0,Unnamed: 1_level_0,Purchase Count,Item Price,Total Purchase Value
Item ID,Item Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
92,Final Critic,13,$4.61,$59.99
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
82,Nirvana,9,$4.90,$44.10
145,Fiery Glass Crusader,9,$4.58,$41.22
103,Singed Scalpel,8,$4.35,$34.80
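
For the most-profitable view, one detail worth illustrating is the order of operations: sort on the numeric total first and apply currency formatting afterwards, because formatted strings sort as text (for instance, `"$9.00"` would rank above `"$10.00"`). The sketch below uses hypothetical `sample` rows and rebuilds the item summary so that it runs on its own.

```python
import pandas as pd

# Hypothetical rows standing in for Resources/purchase_data.csv
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Axe", "Axe", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 1.00],
})

summary = sample.groupby(["Item ID", "Item Name"])["Price"].agg(["count", "mean", "sum"])
summary.columns = ["Purchase Count", "Item Price", "Total Purchase Value"]

# Step 1: sort while Total Purchase Value is still a float
profitable = summary.sort_values("Total Purchase Value", ascending=False)

# Step 2: format a copy for display, leaving the numeric frame intact
formatted = profitable.copy()
for column in ["Item Price", "Total Purchase Value"]:
    formatted[column] = formatted[column].map("${:,.2f}".format)

print(formatted.head())
```

In the sample, Sword is the most popular item (three purchases) but Axe is the most profitable (9.00 in total), which mirrors how the two tables above differ from the third row onward.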
