### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [2]:
purchase_data.columns

Index(['Purchase ID', 'SN', 'Age', 'Gender', 'Item ID', 'Item Name', 'Price'], dtype='object')

In [3]:
# Identify incomplete rows (all item counts are the same, no missing data)
purchase_data.count()

Purchase ID    780
SN             780
Age            780
Gender         780
Item ID        780
Item Name      780
Price          780
dtype: int64

In [4]:
# Check data types
purchase_data.dtypes

Purchase ID      int64
SN              object
Age              int64
Gender          object
Item ID          int64
Item Name       object
Price          float64
dtype: object

In [5]:
# Count of unique values in "SN", i.e. "Screen Name"
player_count = purchase_data["SN"].nunique()
player_count

576

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [6]:
# Count of unique items (by "Item ID")
item_count = purchase_data["Item ID"].nunique()
item_count

179

In [7]:
# Average price per purchase
avg_price = purchase_data["Price"].mean()
avg_price = round(avg_price, 2)  # Round to two decimal places
avg_price

3.05

In [25]:
# Number of purchases (count of unique "Purchase ID" values)
purchase_count = purchase_data["Purchase ID"].nunique()
purchase_count

780

In [26]:
# Total Revenue
total_revenue = purchase_data["Price"].sum()
total_revenue

2379.77

In [27]:
# Data Frame for summary of above calculations 
summary_df = pd.DataFrame({
    "Number of Unique Items":[item_count],
    "Average Price":[avg_price], # Kept numeric; apply "${:.2f}".format only when displaying
    "Number of Purchases":[purchase_count],
    "Total Revenue":[total_revenue]
})
summary_df

   Number of Unique Items  Average Price  Number of Purchases  Total Revenue
0                     179           3.05                  780        2379.77
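
One way to get the `$0.00` display asked about in the cell above is to format a copy of the summary frame, so the numeric columns stay usable for later calculations. This is a minimal sketch that rebuilds `summary_df` from the values printed above; `formatted_df` is a name introduced here for illustration.

```python
import pandas as pd

# Values copied from the outputs of the cells above
summary_df = pd.DataFrame({
    "Number of Unique Items": [179],
    "Average Price": [3.05],
    "Number of Purchases": [780],
    "Total Revenue": [2379.77],
})

# Format a copy so the original numeric columns remain available for math
formatted_df = summary_df.copy()
for col in ["Average Price", "Total Revenue"]:
    # "${:,.2f}" adds a dollar sign, a thousands separator, and two decimals
    formatted_df[col] = formatted_df[col].map("${:,.2f}".format)

formatted_df
```

After formatting, the two currency columns hold strings, so any further arithmetic should use `summary_df` rather than `formatted_df`.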


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [42]:
# Group by "Gender"
gender_group = purchase_data.groupby("Gender")
gender_group

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000021313048A90>

In [43]:
# Player count (unique "SN" values) by "Gender"
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.nunique.html
# https://stackoverflow.com/questions/38309729/count-unique-values-with-pandas-per-groups
gender_count = gender_group["SN"].nunique()
gender_count

Gender
Female                    81
Male                     484
Other / Non-Disclosed     11
Name: SN, dtype: int64

In [44]:
# Create a Data Frame of gender count and percentage of players
gender_df = pd.DataFrame({
    "Total Count": gender_count,
    "Percentage of Players": round(gender_count/gender_count.sum()*100,1)
})
gender_df

                       Total Count  Percentage of Players
Gender
Female                          81                   14.1
Male                           484                   84.0
Other / Non-Disclosed           11                    1.9


In [45]:
# Sort by Total Count, in descending order
gender_df = gender_df.sort_values("Total Count", ascending=False)
gender_df

                       Total Count  Percentage of Players
Gender
Male                           484                   84.0
Female                          81                   14.1
Other / Non-Disclosed           11                    1.9



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [68]:
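# Purchase count by gender
gender_purchase_count = gender_group["SN"].count()
gender_purchase_count

Gender
Female                   113
Male                     652
Other / Non-Disclosed     15
Name: SN, dtype: int64

In [69]:
# Average purchase price by gender, rounded to two decimal places
gender_avg_price = gender_group["Price"].mean().round(2)
gender_avg_price

Gender
Female                   3.20
Male                     3.02
Other / Non-Disclosed    3.35
Name: Price, dtype: float64

In [70]:
# Total purchase value by gender
gender_total_revenue = gender_group["Price"].sum()
gender_total_revenue

Gender
Female                    361.94
Male                     1967.64
Other / Non-Disclosed      50.19
Name: Price, dtype: float64

In [71]:
# Summary data frame of purchases by gender
purchase_df = pd.DataFrame({
    "Purchase Count": gender_purchase_count,
    "Avg Purchase Price": gender_avg_price,
    "Total Purchase" : gender_total_revenue
})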
purchase_df

                       Purchase Count  Avg Purchase Price  Total Purchase
Gender
Female                            113                3.20          361.94
Male                              652                3.02         1967.64
Other / Non-Disclosed              15                3.35           50.19
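
The instructions for this section also mention an average purchase total per person, which the cells above do not compute. A possible sketch is to divide each gender's total spend by its number of unique players. The rows below are hypothetical stand-ins for `purchase_data.csv`, and the names `toy`, `by_gender`, and `avg_total_per_person` are introduced only for this example.

```python
import pandas as pd

# Hypothetical rows standing in for purchase_data.csv
toy = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerC", "PlayerD"],
    "Gender": ["Male", "Male", "Female", "Male", "Male", "Female"],
    "Price": [1.00, 2.00, 3.00, 4.00, 5.00, 6.00],
})

by_gender = toy.groupby("Gender")

# Total spend per gender divided by the number of distinct players per gender
total_by_gender = by_gender["Price"].sum()
players_by_gender = by_gender["SN"].nunique()
avg_total_per_person = total_by_gender / players_by_gender

avg_total_per_person
```

Applied to the totals and player counts shown in the outputs above (361.94 / 81, 1967.64 / 484, 50.19 / 11), the same division gives roughly 4.47 for Female, 4.07 for Male, and 4.56 for Other / Non-Disclosed.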


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal places


* Display Age Demographics Table


In [None]:
# Starter
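
The steps listed for this section could be sketched as follows. The bin edges and labels are assumptions (five-year groups with open-ended ends), not values given in the notebook, and the rows are hypothetical stand-ins for `purchase_data.csv`. Players are de-duplicated by `SN` first so that repeat buyers are counted once.

```python
import pandas as pd

# Hypothetical rows standing in for purchase_data.csv
toy = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Age": [8, 8, 12, 22, 41],
})

# Assumed bins: pd.cut uses right-closed intervals, so (0, 9] is "<10", (9, 14] is "10-14", ...
bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# One row per player, so each person is counted once
players = toy.drop_duplicates("SN").copy()
players["Age Group"] = pd.cut(players["Age"], bins=bins, labels=labels)

# sort=False keeps the age groups in bin order instead of sorting by frequency
age_count = players["Age Group"].value_counts(sort=False)
age_df = pd.DataFrame({
    "Total Count": age_count,
    "Percentage of Players": (age_count / age_count.sum() * 100).round(2),
})

age_df
```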

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [None]:
# Starter
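
A possible sketch of this analysis bins every purchase row by age and then aggregates `Price` per bin. As before, the bin edges, labels, and sample rows are assumptions made for illustration; `observed=True` simply drops age groups that have no purchases in the sample.

```python
import pandas as pd

# Hypothetical rows standing in for purchase_data.csv
toy = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    "Age": [8, 8, 12, 22, 22],
    "Price": [1.00, 2.00, 3.00, 4.00, 6.00],
})

# Assumed bins and labels (right-closed intervals)
bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
toy["Age Group"] = pd.cut(toy["Age"], bins=bins, labels=labels)

grouped = toy.groupby("Age Group", observed=True)

age_purchase_df = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": grouped["Price"].sum(),
    # Total spend divided by the number of distinct players in the group
    "Avg Total Purchase per Person": grouped["Price"].sum() / grouped["SN"].nunique(),
})

age_purchase_df
```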

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [None]:
 # Starter
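
One way to approach the top-spenders table is to group by `SN`, aggregate `Price`, and sort by the total. The rows and player names below are hypothetical, and the column labels are chosen for this sketch rather than taken from the notebook.

```python
import pandas as pd

# Hypothetical rows standing in for purchase_data.csv
toy = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerC", "PlayerC"],
    "Price": [1.00, 2.00, 5.00, 2.00, 2.00, 2.00],
})

spender_df = (
    toy.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    # Highest total spend first
    .sort_values("Total Purchase Value", ascending=False)
)

# Preview of the top rows
spender_df.head()
```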

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, average item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [None]:
# Starter
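
The steps above might look like the following sketch: select the three item columns, group by `Item ID` and `Item Name`, aggregate `Price`, and sort by purchase count. The item names and prices are invented for the example.

```python
import pandas as pd

# Hypothetical rows standing in for purchase_data.csv
toy = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 5.00, 5.00, 1.00],
})

item_df = (
    toy[["Item ID", "Item Name", "Price"]]
    .groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
    # Most frequently purchased items first
    .sort_values("Purchase Count", ascending=False)
)

# Preview of the top rows
item_df.head()
```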

## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [None]:
# Starter
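
Given a per-item summary table, the remaining work here is a re-sort plus optional currency formatting. The sketch below builds a small hypothetical table directly (same shape as a grouped item summary) so that it runs on its own; `profitable_df` and `display_df` are names introduced for illustration.

```python
import pandas as pd

# Hypothetical per-item summary, indexed by (Item ID, Item Name)
item_df = pd.DataFrame(
    {
        "Purchase Count": [3, 2, 1],
        "Item Price": [2.00, 5.00, 1.00],
        "Total Purchase Value": [6.00, 10.00, 1.00],
    },
    index=pd.MultiIndex.from_tuples(
        [(1, "Sword"), (2, "Shield"), (3, "Bow")],
        names=["Item ID", "Item Name"],
    ),
)

# Highest total purchase value first
profitable_df = item_df.sort_values("Total Purchase Value", ascending=False)

# Format a copy for display; the sorted numeric frame stays untouched
display_df = profitable_df.copy()
for col in ["Item Price", "Total Purchase Value"]:
    display_df[col] = display_df[col].map("${:,.2f}".format)

display_df.head()
```

Sorting before formatting matters: once the currency columns are strings, `sort_values` would order them lexicographically rather than numerically.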