### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np
import os

# File to Load (Remember to Change These)
file_to_load = os.path.join(".","Resources","purchase_data.csv")

# Read Purchasing File and store into Pandas data frame
purchase_df = pd.read_csv(file_to_load)

In [2]:
#====Initial evaluation of data  
purchase_df.describe()

|  | Purchase ID | Age | Item ID | Price |
|---|---|---|---|---|
| count | 780.0 | 780.0 | 780.0 | 780.0 |
| mean | 389.5 | 22.714103 | 91.755128 | 3.050987 |
| std | 225.310896 | 6.659444 | 52.697702 | 1.169549 |
| min | 0.0 | 7.0 | 0.0 | 1.0 |
| 25% | 194.75 | 20.0 | 47.75 | 1.98 |
| 50% | 389.5 | 22.0 | 92.0 | 3.15 |
| 75% | 584.25 | 25.0 | 138.0 | 4.08 |
| max | 779.0 | 45.0 | 183.0 | 4.99 |


In [3]:
#====Initial evaluation of data part 2
purchase_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 780 entries, 0 to 779
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Purchase ID  780 non-null    int64  
 1   SN           780 non-null    object 
 2   Age          780 non-null    int64  
 3   Gender       780 non-null    object 
 4   Item ID      780 non-null    int64  
 5   Item Name    780 non-null    object 
 6   Price        780 non-null    float64
dtypes: float64(1), int64(3), object(3)
memory usage: 42.8+ KB


In [4]:
#====checking for missing data
purchase_df.count()

Purchase ID    780
SN             780
Age            780
Gender         780
Item ID        780
Item Name      780
Price          780
dtype: int64
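
`count()` reports the non-null values per column, so a gap would only show up as a count below 780. A more direct check is to count the nulls themselves. The sketch below uses a small made-up frame (`sample_df` and its values are hypothetical, not the purchase data) to show the idea:

```python
import pandas as pd
import numpy as np

# Hypothetical sample with one missing value in each column (not the real data).
sample_df = pd.DataFrame({
    "SN": ["Ana", "Bo", None],
    "Price": [1.99, np.nan, 3.50],
})

# Number of missing values in each column.
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# Single yes/no answer: is anything missing anywhere in the frame?
has_missing = bool(sample_df.isna().any().any())
print(f"Any missing values: {has_missing}")
```

Applied to `purchase_df`, the same `isna().sum()` call should return zero for every column, given that `info()` reports 780 non-null values in each.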

## Player Count

* Display the total number of players


In [5]:
#====Number of unique players
total_players = purchase_df["SN"].nunique()
print(f"Total Players: {total_players}")

Total Players: 576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [6]:
#====Unique item count

#purchase_df["Item Name"].value_counts()
item_count = purchase_df["Item Name"].nunique()
print(f"Available Items for purchase: {item_count}")

Available Items for purchase: 179


In [93]:
#====Average Purchase Price

ave_purch = round(purchase_df["Price"].mean(),2)


print(f"Average purchase amount per transaction: {ave_purch}")

Average purchase amount per transaction: 3.05


In [8]:
#====Total Number of purchases

purch_count = purchase_df["Purchase ID"].count()
print(f"The total number of transactions: {purch_count}")


The total number of transactions: 780


In [96]:
#====Total Revenue

tot_rev = round(purchase_df["Price"].sum(),2)
print(f"The total revenue from transactions: {tot_rev}")

The total revenue from transactions: 2379.77


In [97]:
#====Create summary df

#==Method A - Not Used
#summary_df = pd.DataFrame({
#    "Number of Items" : [item_count],
#    "Ave Purchase Price" : [ave_purch],
#    "Total Number of Purchase" : [purch_count],
#    "Total Revenue" : [tot_rev]
#})
#summary_df.transpose()

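#==Method B
# Note: the single "Results" column mixes ints and floats, so pandas stores it as float
# and the two counts display as 179.0 and 780.0.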
results = {"Results" : [item_count, ave_purch, purch_count, tot_rev]}
summary_df = pd.DataFrame(results, index=["Number of Items","Ave Purchase Price","Total Number of Purchase","Total Revenue"])

summary_df

|  | Results |
|---|---|
| Number of Items | 179.0 |
| Ave Purchase Price | 3.05 |
| Total Number of Purchase | 780.0 |
| Total Revenue | 2379.77 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [115]:
#====Unique players (same value as total_players above; used for the gender ratio below)
unique_players = purchase_df["SN"].nunique()
unique_players

576

In [63]:
#====Gender - Unique Male Players

male_df = purchase_df.loc[purchase_df["Gender"] == "Male",:]
male_count = len(male_df["SN"].unique())
male_count


484

In [64]:
#====Gender - Unique female Players


female_df = purchase_df.loc[purchase_df["Gender"] == "Female",:]
female_count = len(female_df["SN"].unique())
female_count

81

In [65]:
#====Gender - Unique other Players


other_df = purchase_df.loc[purchase_df["Gender"] == "Other / Non-Disclosed",:]
other_count = len(other_df["SN"].unique())
other_count

11
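
The cells above give the unique player counts per gender; the percentages asked for in the bullets can be derived in the same pass. A minimal sketch, assuming each player (`SN`) reports a single gender; `sample_df` and its rows are made up for illustration and are not the purchase data:

```python
import pandas as pd

# Hypothetical sample rows (not the real data); "Ana" buys twice.
sample_df = pd.DataFrame({
    "SN": ["Ana", "Ana", "Bo", "Cy", "Di"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are counted once.
players = sample_df.drop_duplicates(subset="SN")

# Count players per gender and express each count as a share of all players.
gender_counts = players["Gender"].value_counts()
gender_demo = pd.DataFrame({
    "Total Count": gender_counts,
    "Percentage of Players": (gender_counts / gender_counts.sum() * 100).round(2),
})
print(gender_demo)
```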


## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [113]:
#====Gender Summary

#gender_ratio = purchase_df["Gender"].value_counts(normalize=True)
#gender_count = purchase_df["Gender"].value_counts()

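# Note: gender_df is a DataFrameGroupBy object, not a DataFrame; the cells below reuse it.
gender_df = purchase_df.groupby(["Gender"])
# value_counts() on the grouping column itself repeats Gender in both index levels.
gender_df["Gender"].value_counts()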



Gender                 Gender               
Female                 Female                   113
Male                   Male                     652
Other / Non-Disclosed  Other / Non-Disclosed     15
Name: Gender, dtype: int64

In [33]:
#====Gender - Purch Counts

gpurch_count = gender_df["Purchase ID"].count()
gindex=gpurch_count.index
#type(gindex)
print(gpurch_count)


Gender
Female                   113
Male                     652
Other / Non-Disclosed     15
Name: Purchase ID, dtype: int64


In [27]:
#====Gender - Total Purch 

gpurch_rev = gender_df["Price"].sum()
print(gpurch_rev)

Gender
Female                    361.94
Male                     1967.64
Other / Non-Disclosed      50.19
Name: Price, dtype: float64


In [91]:
#====Gender - Ave Purch 

gpurch_ave = round(gender_df["Price"].mean(),2)
print(gpurch_ave)


Gender
Female                   3.20
Male                     3.02
Other / Non-Disclosed    3.35
Name: Price, dtype: float64


In [66]:
#==== Create list for unique players by gender
# The order must match the alphabetical groupby index: Female, Male, Other / Non-Disclosed
gender_player = [female_count, male_count, other_count]


In [138]:
#====Gender - Combined summary

# Combine the per-gender values; row order follows gindex (Female, Male, Other / Non-Disclosed)
gender_combine = list(zip(gender_player, gpurch_count, gpurch_ave, gpurch_rev))
gender_summary = pd.DataFrame(gender_combine, columns=['Players by Gender','Transactions by Gender','Ave Purch by Gender','Total Rev by Gender'])
gender_summary.index = gindex
gender_summary.rename_axis("Gender", inplace=True)

# Derived columns: revenue per unique player and share of all unique players
gender_summary["Ave Purch by Player"] = round(gender_summary["Total Rev by Gender"] / gender_summary["Players by Gender"], 2)
gender_summary["Gender Ratio"] = gender_summary["Players by Gender"] / unique_players

# Take an explicit copy before formatting so the assignments below do not raise SettingWithCopyWarning
gender_summary1 = gender_summary[["Players by Gender","Gender Ratio","Transactions by Gender","Total Rev by Gender","Ave Purch by Gender","Ave Purch by Player"]].copy()
gender_summary1["Gender Ratio"] = gender_summary["Gender Ratio"].astype(float).map("{:.1%}".format)
gender_summary1["Total Rev by Gender"] = gender_summary["Total Rev by Gender"].astype(float).map("${:.2f}".format)
gender_summary1["Ave Purch by Gender"] = gender_summary["Ave Purch by Gender"].astype(float).map("${:.2f}".format)
gender_summary1["Ave Purch by Player"] = gender_summary["Ave Purch by Player"].astype(float).map("${:.2f}".format)
gender_summary1

| Gender | Players by Gender | Gender Ratio | Transactions by Gender | Total Rev by Gender | Ave Purch by Gender | Ave Purch by Player |
|---|---|---|---|---|---|---|
| Female | 81 | 14.1% | 113 | $361.94 | $3.20 | $4.47 |
| Male | 484 | 84.0% | 652 | $1967.64 | $3.02 | $4.07 |
| Other / Non-Disclosed | 11 | 1.9% | 15 | $50.19 | $3.35 | $4.56 |


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
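
No cell for this section exists yet. The steps above could be sketched as follows; the bin edges, labels, and the `sample_df` rows are assumptions chosen for illustration (the real ages run from 7 to 45), not values taken from the assignment:

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_df (not the real data).
sample_df = pd.DataFrame({
    "SN": ["Ana", "Ana", "Bo", "Cy", "Di", "Ed"],
    "Age": [8, 8, 12, 23, 23, 41],
    "Price": [1.00, 2.00, 3.00, 4.00, 2.50, 1.50],
})

# Assumed bin edges and labels. pd.cut uses right-closed intervals by default,
# so (0, 9] maps to "<10", (9, 14] to "10-14", and so on.
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Keep one row per player so repeat buyers are not counted twice.
players = sample_df.drop_duplicates(subset="SN").copy()
players["Age Group"] = pd.cut(players["Age"], bins=age_bins, labels=age_labels)

# Count players per age group and convert to a percentage of all players.
age_counts = players["Age Group"].value_counts(sort=False)
age_summary = pd.DataFrame({
    "Total Count": age_counts,
    "Percentage of Players": (age_counts / len(players) * 100).round(2),
})
print(age_summary)
```

Dropping duplicate `SN` rows first keeps this a table about players rather than transactions, which matches how the gender counts were built earlier.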


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
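
One possible shape for this analysis, sketched on made-up rows: bin every transaction by age, then aggregate per bin. The bins, labels, column titles, and `sample_df` are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_df (not the real data).
sample_df = pd.DataFrame({
    "SN": ["Ana", "Ana", "Bo", "Cy", "Di", "Ed"],
    "Age": [8, 8, 12, 23, 23, 41],
    "Price": [1.00, 2.00, 3.00, 4.00, 2.50, 1.50],
})

# Assumed bin edges and labels (right-closed intervals).
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Bin every transaction (no de-duplication here: each purchase counts).
binned = sample_df.copy()
binned["Age Group"] = pd.cut(binned["Age"], bins=age_bins, labels=age_labels)

# observed=True keeps only the age groups that actually occur in the data.
grouped = binned.groupby("Age Group", observed=True)
age_purchases = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean().round(2),
    "Total Purchase Value": grouped["Price"].sum(),
    "Unique Players": grouped["SN"].nunique(),
})

# Average total spend per person = group revenue / unique players in the group.
age_purchases["Avg Total per Person"] = (
    age_purchases["Total Purchase Value"] / age_purchases["Unique Players"]
).round(2)
print(age_purchases)
```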

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
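
A minimal sketch of the top-spenders table, grouping by `SN` and sorting by total spend. The player names and prices in `sample_df` are invented for illustration, and the column titles are assumptions:

```python
import pandas as pd

# Hypothetical sample rows (not the real data).
sample_df = pd.DataFrame({
    "SN": ["Ana", "Ana", "Bo", "Cy", "Cy", "Cy"],
    "Price": [4.00, 3.50, 2.00, 1.00, 1.50, 1.25],
})

# Aggregate each player's purchases.
by_player = sample_df.groupby("SN")["Price"]
top_spenders = pd.DataFrame({
    "Purchase Count": by_player.count(),
    "Average Purchase Price": by_player.mean().round(2),
    "Total Purchase Value": by_player.sum(),
}).sort_values("Total Purchase Value", ascending=False)

# Preview the biggest spenders.
print(top_spenders.head())
```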



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, average item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
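The grouping described above could look like the sketch below. The item names and prices in `sample_df` are hypothetical placeholders, and averaging `Price` within a group assumes an item may have been sold at more than one price:

```python
import pandas as pd

# Hypothetical sample rows (not the real data).
sample_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 3.00],
})

# Retrieve only the columns needed, then group by item.
items = sample_df[["Item ID", "Item Name", "Price"]]
by_item = items.groupby(["Item ID", "Item Name"])["Price"]

item_summary = pd.DataFrame({
    "Purchase Count": by_item.count(),
    "Item Price": by_item.mean(),
    "Total Purchase Value": by_item.sum(),
})

# Most popular = most purchases.
most_popular = item_summary.sort_values("Purchase Count", ascending=False)
print(most_popular.head())
```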



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
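Re-sorting the item table by revenue is a one-line change. The sketch below rebuilds a small item summary so it runs on its own (`sample_df` and the item names are hypothetical), sorts on the numeric column first, and applies currency formatting to a copy so the sort is not affected by string values:

```python
import pandas as pd

# Hypothetical sample rows (not the real data).
sample_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 3.00],
})

by_item = sample_df.groupby(["Item ID", "Item Name"])["Price"]
item_summary = pd.DataFrame({
    "Purchase Count": by_item.count(),
    "Item Price": by_item.mean(),
    "Total Purchase Value": by_item.sum(),
})

# Most profitable = highest total purchase value; sort while values are numeric.
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)

# Format a copy for display so the numeric table stays usable.
display_df = most_profitable.copy()
display_df["Item Price"] = display_df["Item Price"].map("${:,.2f}".format)
display_df["Total Purchase Value"] = display_df["Total Purchase Value"].map("${:,.2f}".format)
print(display_df.head())
```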

