### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [35]:
purchase_data.columns

Index(['Purchase ID', 'SN', 'Age', 'Gender', 'Item ID', 'Item Name', 'Price'], dtype='object')

In [36]:
# Identify incomplete rows (all item counts are the same, no missing data)
purchase_data.count()

Purchase ID    780
SN             780
Age            780
Gender         780
Item ID        780
Item Name      780
Price          780
dtype: int64

In [37]:
# Check data types
purchase_data.dtypes

Purchase ID      int64
SN              object
Age              int64
Gender          object
Item ID          int64
Item Name       object
Price          float64
dtype: object

In [38]:
# Count the unique values in "SN" (screen name); each one is a distinct player
player_count = purchase_data["SN"].nunique()
player_count

576

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [6]:
# Count of unique items
item_count = purchase_data["Item ID"].nunique()
item_count

In [7]:
# Average price per purchase
avg_price = purchase_data["Price"].mean()

# Round to two decimal places (cents); this keeps the value numeric
avg_price = round(avg_price,2)
avg_price

In [8]:
# Number of purchases (each "Purchase ID" is one purchase)
purchase_count = purchase_data["Purchase ID"].nunique()
purchase_count

780

In [9]:
# Total Revenue
total_revenue = purchase_data["Price"].sum()
total_revenue

2379.77

In [10]:
# Data Frame for summary of above calculations
summary_df = pd.DataFrame({
    "Number of Unique Items":[item_count],
    "Average Price":[avg_price],  # rounded above; a "${:,.2f}" format string would render it as $0.00
    "Number of Purchases":[purchase_count],
    "Total Revenue":[total_revenue]
})
summary_df

   Number of Unique Items  Average Price  Number of Purchases  Total Revenue
0                     179           3.05                  780        2379.77
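One way to show the price columns as dollar amounts is to map a format string over them. The sketch below is a minimal illustration, not the notebook's own cell: `summary_demo` and `formatted_demo` are hypothetical names, and the values are copied from the summary output above. Note that formatting turns the numbers into strings, so it is best done on a copy and only for display.

```python
import pandas as pd

# Hypothetical stand-in for summary_df, using the values shown above.
summary_demo = pd.DataFrame({
    "Number of Unique Items": [179],
    "Average Price": [3.05],
    "Number of Purchases": [780],
    "Total Revenue": [2379.77],
})

# Work on a copy so the numeric columns stay available for later math.
formatted_demo = summary_demo.copy()
for column in ["Average Price", "Total Revenue"]:
    # str.format renders each float as a dollar string with two decimals.
    formatted_demo[column] = formatted_demo[column].map("${:,.2f}".format)

print(formatted_demo)
```

The same `"${:,.2f}"` pattern can also be passed to `DataFrame.style.format` when only the rendered notebook output needs to change.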


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [11]:
# Group by "Gender"
gender_group = purchase_data.groupby("Gender")
gender_group

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001E9E466AB50>

In [12]:
# Count unique players ("SN") within each "Gender" group
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.nunique.html
# https://stackoverflow.com/questions/38309729/count-unique-values-with-pandas-per-groups
gender_count = gender_group["SN"].nunique()
gender_count

Gender
Female                    81
Male                     484
Other / Non-Disclosed     11
Name: SN, dtype: int64

In [13]:
# Create a Data Frame of gender count and percentage of players
gender_df = pd.DataFrame({
    "Total Count": gender_count,
    "Percentage of Players": gender_count/gender_count.sum()*100
})
gender_df

                       Total Count  Percentage of Players
Gender
Female                          81              14.062500
Male                           484              84.027778
Other / Non-Disclosed           11               1.909722


In [14]:
# Sort by Total Count, in descending order
gender_df = gender_df.sort_values("Total Count", ascending=False)
gender_df

                       Total Count  Percentage of Players
Gender
Male                           484              84.027778
Female                          81              14.062500
Other / Non-Disclosed           11               1.909722
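An alternative route to the same kind of table is to drop repeat purchases first and then count genders directly. This is only a sketch on made-up rows: `demo`, `players`, and `gender_demo` are hypothetical names, and the screen names are invented for illustration.

```python
import pandas as pd

# Made-up rows with the same columns the notebook uses.
demo = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dan4"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are counted once.
players = demo.drop_duplicates(subset="SN")

# value_counts sorts by count in descending order by default.
counts = players["Gender"].value_counts()
percentages = (counts / counts.sum() * 100).round(2)

gender_demo = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percentages,
})
print(gender_demo)
```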



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [29]:
# Reuse the name purchase_count for the number of purchases in each gender group
# (this overwrites the overall total computed earlier)
purchase_count = gender_group["Purchase ID"].count()
purchase_count

Gender
Female                   113
Male                     652
Other / Non-Disclosed     15
Name: Purchase ID, dtype: int64

In [31]:
# Reuse the name avg_price for the average purchase price in each gender group
avg_price = gender_group["Price"].mean()
avg_price

Gender
Female                   3.203009
Male                     3.017853
Other / Non-Disclosed    3.346000
Name: Price, dtype: float64

In [32]:
# Reuse the name total_revenue for the total purchase value in each gender group
total_revenue = gender_group["Price"].sum()
total_revenue

Gender
Female                    361.94
Male                     1967.64
Other / Non-Disclosed      50.19
Name: Price, dtype: float64

In [48]:
# Create variable for avg_player_purchase using total_revenue and gender_count (distinct players)
avg_player_purchase = total_revenue/gender_count
avg_player_purchase

Gender
Female                   4.468395
Male                     4.065372
Other / Non-Disclosed    4.562727
dtype: float64

In [55]:
# Create a DataFrame to summarize the purchase analysis by gender
purchase_df = pd.DataFrame({
    "Purchase (count)": purchase_count,
    "Purchase (avg)": avg_price,
    "Purchase (total)": total_revenue,
    "Purchase (avg per player)": avg_player_purchase
})
purchase_df

                       Purchase (count)  Purchase (avg)  Purchase (total)  Purchase (avg per player)
Gender
Female                              113        3.203009            361.94                   4.468395
Male                                652        3.017853           1967.64                   4.065372
Other / Non-Disclosed                15        3.346000             50.19                   4.562727


In [56]:
# Display the currency columns as dollars with two decimals
purchase_df.style.format({
    "Purchase (avg)": "${:,.2f}",
    "Purchase (total)": "${:,.2f}",
    "Purchase (avg per player)": "${:,.2f}",
})

                       Purchase (count)  Purchase (avg)  Purchase (total)  Purchase (avg per player)
Gender
Female                              113           $3.20           $361.94                      $4.47
Male                                652           $3.02         $1,967.64                      $4.07
Other / Non-Disclosed                15           $3.35            $50.19                      $4.56
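The four per-gender series could also be computed in one pass with named aggregation. The following is a sketch on made-up rows, assuming the same column names as `purchase_data`; `demo`, `by_gender`, and the output column names are hypothetical.

```python
import pandas as pd

# Made-up purchases; Ann1 buys twice, the others once.
demo = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3"],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Price": [2.00, 4.00, 1.00, 3.00],
})

# Named aggregation: new_column=(source_column, aggregation).
by_gender = demo.groupby("Gender").agg(
    purchase_count=("Purchase ID", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Average total spent per distinct player, not per purchase.
by_gender["avg_per_player"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```

Dividing by the number of distinct players rather than the number of purchases is what separates the per-player average from the per-purchase average.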


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [66]:
# Reference the "Age" column within the purchase_data DataFrame
purchase_data["Age"].head()

0    20
1    40
2    24
3    24
4    23
Name: Age, dtype: int64

In [71]:
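# Set up bins. pd.cut treats each bin as right-closed by default, so these edges
# give (0, 10], (10, 14], ..., (39, 40]: age 10 lands in the first bin and ages
# above 40 fall outside every bin (NaN).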
bins = [0,10,14,19,24,29,34,39,40]
bins

[0, 10, 14, 19, 24, 29, 34, 39, 40]

In [72]:
# Create labels for bins
bin_labels = [" 0 < 10","10 - 14","15 - 19","20 - 24","25 - 29","30 - 34","35 - 39","40 +"]
bin_labels

[' 0 < 10',
 '10 - 14',
 '15 - 19',
 '20 - 24',
 '25 - 29',
 '30 - 34',
 '35 - 39',
 '40 +']

In [79]:
# Slice the data (pd.cut) and place it into bins
pd.cut(purchase_data["Age"], bins, labels=bin_labels).head()

0    20 - 24
1       40 +
2    20 - 24
3    20 - 24
4    20 - 24
Name: Age, dtype: category
Categories (8, object): [0 < 10 < 10 - 14 < 15 - 19 < 20 - 24 < 25 - 29 < 30 - 34 < 35 - 39 < 40 +]

In [80]:
# Place series into a new column inside purchase_data (DataFrame)
purchase_data["Age Group"] = pd.cut(purchase_data["Age"], bins, labels=bin_labels)
purchase_data.head()

   Purchase ID             SN  Age  Gender  Item ID                                  Item Name  Price  Age Group
0            0        Lisim78   20    Male      108  Extraction, Quickblade Of Trembling Hands   3.53    20 - 24
1            1    Lisovynya38   40    Male      143                          Frenzied Scimitar   1.56       40 +
2            2     Ithergue48   24    Male       92                               Final Critic   4.88    20 - 24
3            3  Chamassasya86   24    Male      100                                Blindscythe   3.27    20 - 24
4            4      Iskosia90   23    Male      131                                       Fury   1.44    20 - 24


In [81]:
# Create a GroupBy object for "Age Group"
age_group = purchase_data.groupby("Age Group")
age_group

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001E9E574B790>

In [82]:
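# Count the rows (purchases) that fall into each bin; rows whose age is outside
# the bins are NaN and are not counted (the counts below sum to 773 of 780)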
age_group["Age Group"].count()

Age Group
 0 < 10     32
10 - 14     19
15 - 19    136
20 - 24    365
25 - 29    101
30 - 34     73
35 - 39     41
40 +         6
Name: Age Group, dtype: int64
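Because `pd.cut` uses right-closed intervals by default, bin edges that should match labels such as "10 - 14" need to sit one below each label's lower bound, and the last edge needs to exceed the oldest age. The sketch below shows one possible set of edges on made-up ages; `demo_bins`, `demo_labels`, and the upper edge of 200 are assumptions, not values from the notebook.

```python
import pandas as pd

# Made-up ages chosen to probe the bin boundaries.
ages = pd.Series([7, 10, 14, 15, 24, 39, 40, 45], name="Age")

# Right-closed edges: (0, 9], (9, 14], (14, 19], ..., (39, 200].
# The upper edge 200 is an arbitrary assumption, chosen to exceed any real age.
demo_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
demo_labels = ["< 10", "10 - 14", "15 - 19", "20 - 24",
               "25 - 29", "30 - 34", "35 - 39", "40 +"]

age_groups = pd.cut(ages, bins=demo_bins, labels=demo_labels)
print(age_groups)
```

With these edges, age 10 falls in "10 - 14" and every age of 40 or more falls in "40 +", so no row is left unbinned.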

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [21]:
# Starter
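The steps listed for this section could be sketched as follows. This is an illustration on made-up rows, not the notebook's solution: `demo`, `by_age`, the three coarse bins, and the output column names are all hypothetical.

```python
import pandas as pd

# Made-up purchases; Ann1 buys twice.
demo = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dan4"],
    "Age": [8, 8, 22, 23, 41],
    "Price": [1.00, 3.00, 2.00, 4.00, 5.00],
})

# Coarse, hypothetical bins just for this example (right-closed intervals).
demo["Age Group"] = pd.cut(
    demo["Age"], bins=[0, 9, 24, 200], labels=["< 10", "10 - 24", "25 +"]
)

# observed=True keeps only the age groups that actually contain rows.
by_age = demo.groupby("Age Group", observed=True).agg(
    purchase_count=("Purchase ID", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)
by_age["avg_per_player"] = by_age["total_value"] / by_age["players"]
print(by_age)
```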

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [22]:
# Starter
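A possible shape for the top-spenders table is a group-by on screen name followed by a descending sort on total purchase value. The sketch below uses made-up rows; `demo`, `spenders`, and `top_spenders` are hypothetical names.

```python
import pandas as pd

# Made-up purchases for three players.
demo = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3", "Bob2"],
    "Price": [1.50, 4.00, 2.50, 3.00, 1.00],
})

spenders = demo.groupby("SN").agg(
    purchase_count=("Purchase ID", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
)

# Highest total spend first, then preview the top rows.
top_spenders = spenders.sort_values("total_value", ascending=False).head(2)
print(top_spenders)
```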

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, average item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [23]:
# Starter
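The item-level steps above could be sketched as a group-by on both "Item ID" and "Item Name", sorted by purchase count. The rows and item names below are invented, and `demo`, `items`, and `popular` are hypothetical names; the price column is assumed to be called "Price", as in `purchase_data`.

```python
import pandas as pd

# Made-up purchases of three invented items.
demo = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 1, 2],
    "Item Name": ["Axe", "Axe", "Bow", "Cap", "Axe", "Bow"],
    "Price": [2.00, 2.00, 5.00, 1.00, 2.00, 5.00],
})

# Grouping by both columns keeps the name next to the ID in the result.
items = demo.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_value=("Price", "sum"),
)

# Most frequently purchased items first.
popular = items.sort_values("purchase_count", ascending=False)
print(popular.head())
```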

## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [24]:
# Starter
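Ranking by profitability only changes the sort key of the item table: total purchase value instead of purchase count. The sketch below rebuilds a small made-up item table so it runs on its own; `demo`, `items`, and `profitable` are hypothetical names.

```python
import pandas as pd

# Made-up purchases of three invented items.
demo = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 1, 2],
    "Item Name": ["Axe", "Axe", "Bow", "Cap", "Axe", "Bow"],
    "Price": [2.00, 2.00, 5.00, 1.00, 2.00, 5.00],
})

items = demo.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_value=("Price", "sum"),
)

# Highest total purchase value first.
profitable = items.sort_values("total_value", ascending=False)
print(profitable.head())
```

In this made-up data the most purchased item (three sales at 2.00) is not the most profitable one (two sales at 5.00), which is why the two rankings are worth showing separately.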