### Heroes Of Pymoli Data Analysis

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [86]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [68]:
#how am I defining players?  SN column
#how do I get a total number of values in that column?
number_of_players = purchase_data['SN'].nunique()
players = [{'Total Number of Players': number_of_players}]
total_players = pd.DataFrame(players)
total_players

Unnamed: 0,Total Number of Players
0,576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [69]:
#number of unique items = Item Name column (count unique)
#average price = Price column (mean)
#number of purchases = Purchase ID (total count)
#total revenue = Price column (sum)

unique_items = purchase_data['Item Name'].count()
unique_items

average_price = purchase_data['Price'].mean()
average_price

num_purchases = purchase_data['Purchase ID'].count()
num_purchases

total_revenue = purchase_data['Price'].sum()
total_revenue

my_values = [{"Number of Unique Items": unique_items, "Average Price": average_price, "Number of Purchases": num_purchases, "Total Revenue": total_revenue}]

summary_table = pd.DataFrame(my_values)
summary_table

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,780,3.050987,780,2379.77


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [70]:
#Count of Females = SN column (nunique!)
female_players = purchase_data[purchase_data['Gender'] == 'Female']
female_players['SN'].nunique()

81

In [71]:
#Count of Males = Purchase ID column (nunique!), grouped by gender
male_players = purchase_data[purchase_data['Gender'] == 'Male']
male_players['SN'].nunique()

484

In [72]:
#Count of Other / Non-Disclosed  = Purchase ID column (nunique!), grouped by gender
non_binary_players = purchase_data[purchase_data['Gender'] == 'Other / Non-Disclosed']
non_binary_players['SN'].nunique()

11

In [73]:
#percentage_by_gender = number of females divided by number of players from earlier question
percent_females =  (female_players['SN'].nunique() / total_players) * 100
percent_females

Unnamed: 0,Total Number of Players
0,14.0625


In [74]:
#percentage_by_gender = number of males divided by number of players from earlier question
percent_males =  (male_players['SN'].nunique() / total_players) * 100
percent_males

Unnamed: 0,Total Number of Players
0,84.027778


In [75]:
#percentage_by_gender = number of non-binary players divided by number of players from earlier question
percent_non_binary =  (non_binary_players['SN'].nunique() / total_players) * 100
percent_non_binary

Unnamed: 0,Total Number of Players
0,1.909722


In [76]:
gender_values = [{"Gender": "Male", "Total Count": 484, "Percentage of Players": 84.03},
                 {"Gender": "Female", "Total Count": 81, "Percentage of Players": 14.01},
                 {"Gender": "Other / Non-Disclosed", "Total Count": 11, "Percentage of Players": 1.91}]
                  
gender_summary = pd.DataFrame(gender_values)
gender_summary

Unnamed: 0,Gender,Total Count,Percentage of Players
0,Male,484,84.03
1,Female,81,14.01
2,Other / Non-Disclosed,11,1.91



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [77]:
#total_purchases_by_gender = purchase_data.grouby(['Gender'])
#total_purchases_by_gender['Purchase ID'].count()

purchases_by_group = purchase_data.groupby(['Gender'])
purchases_by_group['Price'].count()

Gender
Female                   113
Male                     652
Other / Non-Disclosed     15
Name: Price, dtype: int64

In [78]:
#average purchase price = mean of price (by gender)
gender_group = purchase_data.groupby(['Gender'])
gender_group['Price'].mean()

Gender
Female                   3.203009
Male                     3.017853
Other / Non-Disclosed    3.346000
Name: Price, dtype: float64

In [79]:
#total purchase value by gender
gender_group = purchase_data.groupby(['Gender'])
gender_group['Price'].sum()

Gender
Female                    361.94
Male                     1967.64
Other / Non-Disclosed      50.19
Name: Price, dtype: float64

In [80]:
#avg. purchase total per person

#purchase total of female players / number of females
avg_total_females = (female_players['Price'].sum() / female_players['SN'].nunique())
print('Avg Purchase Females' + ' ' + str(avg_total_females))

#purchase total of male players / number of males
avg_total_males = (male_players['Price'].sum() / male_players['SN'].nunique())
print('Avg Purchase Males' + ' ' + str(avg_total_males))

#purchase total of non-binary players / number of non-binary players
avg_total_non_binary = (non_binary_players['Price'].sum() / non_binary_players['SN'].nunique())
print('Avg Purchase Non-Binary Players' + ' ' + str(avg_total_non_binary))

Avg Purchase Females 4.468395061728395
Avg Purchase Males 4.065371900826446
Avg Purchase Non-Binary Players 4.5627272727272725


In [81]:
gender_analysis = [{"Gender": "Male", "Purchase Count": 652, "Average Purchase Price": '$3.02', "Total Purchase Value": '$1,967.64', "Avg Total Purchase per Person": '$4.07'},
                 {"Gender": "Female", "Purchase Count": 113, "Average Purchase Price": '$3.20', "Total Purchase Value": '$361.94', "Avg Total Purchase per Person": '$4.47'},
                 {"Gender": "Other / Non-Disclosed", "Purchase Count": 15, "Average Purchase Price": '$3.35', "Total Purchase Value": '$50.19', "Avg Total Purchase per Person": '$4.56'}]
                  
gender_totals = pd.DataFrame(gender_analysis)
gender_totals

Unnamed: 0,Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
0,Male,652,$3.02,"$1,967.64",$4.07
1,Female,113,$3.20,$361.94,$4.47
2,Other / Non-Disclosed,15,$3.35,$50.19,$4.56


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [82]:
#create bins for ages - unexpectedly, this pd.cut replaced the ages with the age bins
age_bins = [0, 10, 15, 20, 25, 30, 35, 40, 45]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-45"]
purchase_data["Age"] = pd.cut(purchase_data["Age"], age_bins, labels=age_labels)
purchase_data

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,15-19,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,35-39,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,20-24,Male,92,Final Critic,4.88
3,3,Chamassasya86,20-24,Male,100,Blindscythe,3.27
4,4,Iskosia90,20-24,Male,131,Fury,1.44
...,...,...,...,...,...,...,...
775,775,Aethedru70,20-24,Female,60,Wolf,3.54
776,776,Iral74,20-24,Male,164,Exiled Doomblade,1.63
777,777,Yathecal72,15-19,Male,67,"Celeste, Incarnation of the Corrupted",3.46
778,778,Sisur91,<10,Male,101,Final Critic,4.19


In [83]:
#Calculate the number of players by age group
age_grouped_data = purchase_data.groupby("Age")
players_by_age = age_grouped_data[["SN"]].nunique()
players_by_age

Unnamed: 0_level_0,SN
Age,Unnamed: 1_level_1
<10,24
10-14,41
15-19,150
20-24,232
25-29,59
30-34,37
35-39,26
40-45,7


In [98]:
#Calculate the percentages of players by age group
#percent = number of each age group divided by total number of players

percent_by_age = (age_grouped_data[["SN"]].nunique() / 576) * 100
percent_by_age

Unnamed: 0_level_0,SN
Age,Unnamed: 1_level_1
<10,4.166667
10-14,7.118056
15-19,26.041667
20-24,40.277778
25-29,10.243056
30-34,6.423611
35-39,4.513889
40-45,1.215278


In [99]:
age_demographics = pd.concat([players_by_age, percent_by_age], axis=1)
age_demographics.columns = ["Total Count", "Percentage of Players"]
print(age_demographics)

       Total Count  Percentage of Players
Age                                      
<10             24               4.166667
10-14           41               7.118056
15-19          150              26.041667
20-24          232              40.277778
25-29           59              10.243056
30-34           37               6.423611
35-39           26               4.513889
40-45            7               1.215278


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame

