### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
from functools import reduce

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data_df = pd.read_csv(file_to_load)

In [2]:
# Preview the first five rows
purchase_data_df.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [3]:
# Get a Series object containing the data type objects of each column of Dataframe.
# Index of series is column name.
dataTypeSeries = purchase_data_df.dtypes

print('Data type of each column of Dataframe :')
dataTypeSeries

Data type of each column of Dataframe :


Purchase ID      int64
SN              object
Age              int64
Gender          object
Item ID          int64
Item Name       object
Price          float64
dtype: object

In [4]:
# Count non-null values per column to spot incomplete rows
purchase_data_df.count()

Purchase ID    780
SN             780
Age            780
Gender         780
Item ID        780
Item Name      780
Price          780
dtype: int64

In [5]:
# Display a statistical overview of the DataFrame
purchase_data_df.describe()

Unnamed: 0,Purchase ID,Age,Item ID,Price
count,780.0,780.0,780.0,780.0
mean,389.5,22.714103,91.755128,3.050987
std,225.310896,6.659444,52.697702,1.169549
min,0.0,7.0,0.0,1.0
25%,194.75,20.0,47.75,1.98
50%,389.5,22.0,92.0,3.15
75%,584.25,25.0,138.0,4.08
max,779.0,45.0,183.0,4.99


## Player Count

* Display the total number of players


In [6]:
# Display the total number of players
players = len(purchase_data_df["SN"].unique())
print(f"Total Players = {players}")


Total Players = 576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [7]:
# Calculate:
#     number of unique items,
#     average price across all purchases,
#     number of purchases, and
#     total revenue

items = len(purchase_data_df["Item ID"].unique())
avg_price = purchase_data_df["Price"].mean()
purchases = len(purchase_data_df["Purchase ID"])
total = purchase_data_df["Price"].sum()

# Test print the items that will go into the summary data frame
print(f"Unique Items = {items}")
print(f"Average Price = ${avg_price:,.2f}")
print(f"Purchases = {purchases}")
print(f"Total Revenue = ${total:,.2f}")


Unique Items = 179
Average Price = $3.05
Purchases = 780
Total Revenue = $2,379.77


In [8]:
# Create a DataFrame of elements analyzed using a list of dictionaries, in this case a single dictionary
analysis_list = [
    {"Unique Items": items, "Average Price": avg_price, "Purchases": purchases, "Total Revenue": total},
]
analysis_df = pd.DataFrame(analysis_list)

# Format currency
analysis_df.style.format({"Average Price": "${:,.2f}", "Total Revenue": "${:,.2f}"})

# analysis_df

Unnamed: 0,Unique Items,Average Price,Purchases,Total Revenue
0,179,$3.05,780,"$2,379.77"


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [9]:
# Create dataframe of SN and Gender
gender_df = purchase_data_df[["SN", "Gender"]].drop_duplicates()
gender_df

Unnamed: 0,SN,Gender
0,Lisim78,Male
1,Lisovynya38,Male
2,Ithergue48,Male
3,Chamassasya86,Male
4,Iskosia90,Male
...,...,...
773,Hala31,Male
774,Jiskjask80,Male
775,Aethedru70,Female
777,Yathecal72,Male


In [10]:
# Number of unique individuals by gender
gender_counts = gender_df["Gender"].value_counts()

# Convert Gender Counts from value_count() to DataFrame
gender_counts_cvt = pd.DataFrame(gender_counts)
gender_counts_cvt_df = gender_counts_cvt.reset_index()
gender_counts_cvt_df.columns = ["Gender", "Total Count"] # change column names
gender_counts_cvt_df

Unnamed: 0,Gender,Total Count
0,Male,484
1,Female,81
2,Other / Non-Disclosed,11


In [11]:
# Percentage of unique individuals by gender
gender_percent = gender_df["Gender"].value_counts(normalize=True).mul(100).round(1).astype(str) + "%"

# Convert Gender Percentage from value_count() to DataFrame
gender_percent_cvt = pd.DataFrame(gender_percent)
gender_percent_cvt_df = gender_percent_cvt.reset_index()
gender_percent_cvt_df.columns = ["Gender", "Percentage"] # change column names
gender_percent_cvt_df

Unnamed: 0,Gender,Percentage
0,Male,84.0%
1,Female,14.1%
2,Other / Non-Disclosed,1.9%


In [12]:
# Merge two dataframes using an inner join
gender_merge_df = pd.merge(gender_counts_cvt_df, gender_percent_cvt_df, on="Gender")
gender_merge_df

Unnamed: 0,Gender,Total Count,Percentage
0,Male,484,84.0%
1,Female,81,14.1%
2,Other / Non-Disclosed,11,1.9%



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [13]:
# Extract the following columns: "Gender", "Price"
purchase_gender_df = purchase_data_df[["Gender", "Price"]]
purchase_gender_df.head()

Unnamed: 0,Gender,Price
0,Male,3.53
1,Male,1.56
2,Male,4.88
3,Male,3.27
4,Male,1.44


In [14]:
# Create a dataframe of Purchase Count by Gender
gender_group = purchase_gender_df.groupby(["Gender"])
gender_purchase_count_df = gender_group.count()

# Convert 2nd column title from Price to Purchase Count
gender_purchase_count_cvt_df = gender_purchase_count_df.reset_index()
gender_purchase_count_cvt_df.columns = ["Gender", "Purchase Count"] # change column names
gender_purchase_count_cvt_df

Unnamed: 0,Gender,Purchase Count
0,Female,113
1,Male,652
2,Other / Non-Disclosed,15


In [15]:
# Create a dataframe of Average Purchase Price by Gender
gender_purchase_avg_price_df = gender_group.mean()

# Convert 2nd column title from Price to Average Purchase Price
gender_purchase_avg_price_cvt_df = gender_purchase_avg_price_df.reset_index()
gender_purchase_avg_price_cvt_df.columns = ["Gender", "Average Purchase Price"] # change column names
# Styler.format returns a new Styler and leaves the DataFrame unchanged,
# so the values are kept numeric here for the merge further down
gender_purchase_avg_price_cvt_df

Unnamed: 0,Gender,Average Purchase Price
0,Female,3.203009
1,Male,3.017853
2,Other / Non-Disclosed,3.346


In [16]:
# Create a dataframe of Total Purchase Value by Gender
gender_purchase_total_df = gender_group.sum()

# Convert 2nd column title from Price to Total Purchase Value
gender_purchase_total_cvt_df = gender_purchase_total_df.reset_index()
gender_purchase_total_cvt_df.columns = ["Gender", "Total Purchase Value"] # change column names
# Styler.format returns a new Styler and leaves the DataFrame unchanged,
# so the values are kept numeric here for the merge further down
gender_purchase_total_cvt_df

Unnamed: 0,Gender,Total Purchase Value
0,Female,361.94
1,Male,1967.64
2,Other / Non-Disclosed,50.19


In [17]:
# Compile the list of DataFrames to merge
data_frames = [gender_counts_cvt_df, 
    gender_purchase_count_cvt_df, 
    gender_purchase_avg_price_cvt_df, 
    gender_purchase_total_cvt_df]

# pd.merge() joins only two DataFrames at a time, so fold the list
# pairwise with reduce(), using an outer join on "Gender"
gender_merge_2_df = reduce(lambda left, right: pd.merge(left, right, on=["Gender"], how="outer"), data_frames)
gender_merge_2_df

Unnamed: 0,Gender,Total Count,Purchase Count,Average Purchase Price,Total Purchase Value
0,Male,484,652,3.017853,1967.64
1,Female,81,113,3.203009,361.94
2,Other / Non-Disclosed,11,15,3.346,50.19


In [18]:
# Calculate the Avg Purchase Value per Person by Gender
gender_avg_purchase_val_df = gender_merge_2_df["Total Purchase Value"]/gender_merge_2_df["Total Count"]
gender_avg_purchase_val_df



0    4.065372
1    4.468395
2    4.562727
dtype: float64

In [19]:
# Use DataFrame.insert() to add the column at position 4, just before "Total Purchase Value"
gender_merge_2_df.insert(4, "Avg Purchase Value per Person", gender_avg_purchase_val_df, allow_duplicates=True)
gender_merge_2_df

Unnamed: 0,Gender,Total Count,Purchase Count,Average Purchase Price,Avg Purchase Value per Person,Total Purchase Value
0,Male,484,652,3.017853,4.065372,1967.64
1,Female,81,113,3.203009,4.468395,361.94
2,Other / Non-Disclosed,11,15,3.346,4.562727,50.19


In [20]:
# Format currency. Styler.format returns a new Styler rather than modifying
# the DataFrame, so display the Styler itself
gender_merge_2_df.style.format({
    "Average Purchase Price": "${:,.2f}", 
    "Avg Purchase Value per Person": "${:,.2f}", 
    "Total Purchase Value": "${:,.2f}"})

Unnamed: 0,Gender,Total Count,Purchase Count,Average Purchase Price,Avg Purchase Value per Person,Total Purchase Value
0,Male,484,652,$3.02,$4.07,"$1,967.64"
1,Female,81,113,$3.20,$4.47,$361.94
2,Other / Non-Disclosed,11,15,$3.35,$4.56,$50.19


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
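
One possible way to carry out the steps above is sketched here. It is not the notebook's own solution: `toy_df` is a small made-up frame standing in for `purchase_data_df`, and the bin edges, labels, and column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data_df (not the real file).
toy_df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4", "E5"],
    "Age": [8, 8, 17, 22, 22, 41],
})

# Assumed bin edges and labels. pd.cut uses right-inclusive intervals by
# default, so (0, 9] maps to "<10", (9, 14] to "10-14", and so on.
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Keep one row per player so repeat buyers are counted once.
players_df = toy_df[["SN", "Age"]].drop_duplicates().copy()
players_df["Age Group"] = pd.cut(players_df["Age"], bins=age_bins, labels=age_labels)

# Count players per bin; sort=False avoids reordering the bins by frequency.
age_counts = players_df["Age Group"].value_counts(sort=False)
age_percent = (age_counts / age_counts.sum() * 100).round(2)

# Summary frame indexed by age group.
age_demo_df = pd.DataFrame({
    "Total Count": age_counts,
    "Percentage of Players": age_percent,
})
print(age_demo_df)
```

Dropping duplicate `SN` rows before binning mirrors the approach used for the gender counts earlier, so the percentages describe players rather than purchases.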


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
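
The following is a minimal sketch of how these steps might look, assuming the same made-up bins as a typical age breakdown. `toy_df`, the bin edges, and the output column names are all hypothetical; the real notebook would bin `purchase_data_df` instead.

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data_df (not the real file).
toy_df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4", "E5"],
    "Age": [8, 8, 17, 22, 22, 41],
    "Price": [1.00, 2.00, 3.00, 4.00, 2.00, 5.00],
})

# Assumed bin edges and labels (right-inclusive intervals).
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
toy_df["Age Group"] = pd.cut(toy_df["Age"], bins=age_bins, labels=age_labels)

# Named aggregation; the ** dict form allows column names containing spaces.
# observed=True drops age groups that have no purchases.
age_purchase_df = toy_df.groupby("Age Group", observed=True).agg(
    **{
        "Purchase Count": ("Price", "count"),
        "Average Purchase Price": ("Price", "mean"),
        "Total Purchase Value": ("Price", "sum"),
        "Unique Players": ("SN", "nunique"),
    }
)

# Average total spent per person = group revenue / unique players in the group.
age_purchase_df["Avg Total Purchase per Person"] = (
    age_purchase_df["Total Purchase Value"] / age_purchase_df["Unique Players"]
)
print(age_purchase_df)
```

Computing all statistics in one `agg` call avoids the chain of merges used in the gender analysis, at the cost of a slightly denser expression.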

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
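
As a rough sketch of the steps above, the per-player totals can be computed with a single `groupby` on `SN`. The data in `toy_df` is invented for illustration, and the column names are assumptions rather than names required by the assignment.

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data_df (not the real file).
toy_df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Price": [1.00, 2.00, 4.50, 1.50, 1.50, 2.00],
})

# Aggregate purchases per player, then rename to readable column titles.
spender_df = (
    toy_df.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    # Highest total spend first.
    .sort_values("Total Purchase Value", ascending=False)
)

# Preview the top rows (head() returns at most five).
top_spenders = spender_df.head()
print(top_spenders)
```

For display, the currency columns could then be passed through `top_spenders.style.format(...)`; since that returns a Styler, it should be the last expression in the cell so that it renders.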



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, average item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
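
The steps above might be sketched as follows. The item names and prices in `toy_df` are made up, and the summary column names are assumptions; only the `Item ID`, `Item Name`, and `Price` columns come from the purchase data.

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data_df (not the real file).
toy_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 1.00],
})

# Retrieve only the columns needed for the item analysis.
items_df = toy_df[["Item ID", "Item Name", "Price"]]

# Group by both ID and name so the name stays attached to each ID.
item_summary_df = (
    items_df.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

# Most frequently purchased items first.
most_popular_df = item_summary_df.sort_values("Purchase Count", ascending=False)
print(most_popular_df.head())
```

Using the mean as the item price assumes each item sells at a consistent price; if prices vary per purchase, the column is better read as an average.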



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
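
A short sketch of the re-sort described above, again on invented data. The summary is rebuilt here only so the example runs on its own; in the notebook, the item summary from the previous section would be sorted directly. All names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data_df (not the real file).
toy_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 1.00],
})

# Per-item purchase count and revenue.
item_summary_df = (
    toy_df.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "sum"])
    .rename(columns={"count": "Purchase Count", "sum": "Total Purchase Value"})
)

# Rank by revenue instead of by purchase count.
most_profitable_df = item_summary_df.sort_values("Total Purchase Value", ascending=False)
print(most_profitable_df.head())
```

In this toy data the most purchased item is not the most profitable one, which is the distinction the two tables are meant to bring out.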

