### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np
import os

# File to Load (Remember to Change These)
file_to_load = os.path.join(".","Resources","purchase_data.csv")

# Read Purchasing File and store into Pandas data frame
purchase_df = pd.read_csv(file_to_load)

In [2]:
#====Initial evaluation of data  
purchase_df.describe()

,Purchase ID,Age,Item ID,Price
count,780.0,780.0,780.0,780.0
mean,389.5,22.714103,91.755128,3.050987
std,225.310896,6.659444,52.697702,1.169549
min,0.0,7.0,0.0,1.0
25%,194.75,20.0,47.75,1.98
50%,389.5,22.0,92.0,3.15
75%,584.25,25.0,138.0,4.08
max,779.0,45.0,183.0,4.99


In [3]:
#====Initial evaluation of data part 2
purchase_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 780 entries, 0 to 779
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Purchase ID  780 non-null    int64  
 1   SN           780 non-null    object 
 2   Age          780 non-null    int64  
 3   Gender       780 non-null    object 
 4   Item ID      780 non-null    int64  
 5   Item Name    780 non-null    object 
 6   Price        780 non-null    float64
dtypes: float64(1), int64(3), object(3)
memory usage: 42.8+ KB


In [4]:
#====checking for missing data
purchase_df.count()

Purchase ID    780
SN             780
Age            780
Gender         780
Item ID        780
Item Name      780
Price          780
dtype: int64

## Player Count

* Display the total number of players


In [5]:
#====Number of unique players
total_players = purchase_df["SN"].nunique()
print(f"Total Players: {total_players}")

Total Players: 576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [6]:
#====Unique item count

#purchase_df["Item Name"].value_counts()
item_count = purchase_df["Item Name"].nunique()
print(f"Available Items for purchase: {item_count}")

Available Items for purchase: 179


In [7]:
#====Average Purchase Price

#purchase_df["Price"].round({"Price":2})  #<--Formatting, strike 1
#purchase_df["Price"].round("Price",2)    #<--Formatting, strike 2

ave_purch = purchase_df["Price"].mean()

#ave_purch = round("ave_purch",2)         #<--Formatting, strike 3

print(f"Average purchase amount per transaction: {ave_purch}")

Average purchase amount per transaction: 3.0509871794871795
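
The three commented-out attempts fail for different reasons: `Series.round` expects an integer number of decimals (a dict keyed by column name is what `DataFrame.round` accepts), and `round("ave_purch", 2)` passes the variable's name as a string instead of the variable. A minimal sketch of two approaches that do work, using the mean printed above as a literal so the snippet runs on its own; the names `ave_rounded` and `ave_label` are made up for illustration:

```python
# Value copied from the cell output above; in the notebook this is purchase_df["Price"].mean()
ave_purch = 3.0509871794871795

# Option 1: round the number itself (pass the variable, not its name in quotes)
ave_rounded = round(ave_purch, 2)

# Option 2: leave the stored value untouched and format only what gets displayed
ave_label = f"${ave_purch:,.2f}"

print(f"Average purchase amount per transaction: {ave_label}")
```

Option 2 is usually the safer choice for a report, since the full-precision value stays available for later calculations.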


In [8]:
#====Total Number of purchases

purch_count = purchase_df["Purchase ID"].count()
print(f"The total number of transactions: {purch_count}")


The total number of transactions: 780


In [9]:
#====Total Revenue

tot_rev = purchase_df["Price"].sum()
print(f"The total revenue from transactions: {tot_rev}")

The total revenue from transactions: 2379.77


In [10]:
#====Create summary df

#==Method A - Not Used
#summary_df = pd.DataFrame({
#    "Number of Items" : [item_count],
#    "Ave Purchase Price" : [ave_purch],
#    "Total Number of Purchase" : [purch_count],
#    "Total Revenue" : [tot_rev]
#})
#summary_df.transpose()

#==Method B
results = {"Results" : [item_count, ave_purch, purch_count, tot_rev]}
summary_df = pd.DataFrame(results, index=["Number of Items","Ave Purchase Price","Total Number of Purchase","Total Revenue"])

summary_df

,Results
Number of Items,179.0
Ave Purchase Price,3.050987
Total Number of Purchase,780.0
Total Revenue,2379.77
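
Method B puts counts and dollar amounts in a single column, so pandas stores everything as float and the counts display as `179.0` and `780.0`. One possible alternative, sketched here with the values copied from the outputs above, is a one-row frame (as in Method A) with per-column display formatting. The name `display_df` is an assumption for illustration:

```python
import pandas as pd

# Values copied from the cell outputs above
item_count, ave_purch, purch_count, tot_rev = 179, 3.0509871794871795, 780, 2379.77

# One row, one column per metric: each column keeps its own dtype
summary_df = pd.DataFrame({
    "Number of Items": [item_count],
    "Ave Purchase Price": [ave_purch],
    "Total Number of Purchase": [purch_count],
    "Total Revenue": [tot_rev],
})

# Format a copy so the numeric frame stays usable for further math
display_df = summary_df.copy()
for col in ["Ave Purchase Price", "Total Revenue"]:
    display_df[col] = display_df[col].map("${:,.2f}".format)

print(display_df)
```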


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed
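
A minimal sketch of the count and percentage calculation, assuming each player should be counted once regardless of how many purchases they made. The sample rows and player names are hypothetical; in the notebook, `purchase_df` would take the place of `sample`:

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_df
sample = pd.DataFrame({
    "SN": ["Ana", "Ana", "Bo", "Cy", "Di", "Di"],
    "Gender": ["Female", "Female", "Male", "Male",
               "Other / Non-Disclosed", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice
players = sample.drop_duplicates(subset="SN")

gender_count = players["Gender"].value_counts()
gender_pct = players["Gender"].value_counts(normalize=True) * 100

# Both Series share the Gender index, so they line up row by row
gender_demo = pd.DataFrame({
    "Total Count": gender_count,
    "Percentage of Players": gender_pct.round(2),
})
print(gender_demo)
```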




In [68]:
#====Gender Summary

#gender_ratio = purchase_df["Gender"].value_counts(normalize=True)
#gender_count = purchase_df["Gender"].value_counts()

gender_df = purchase_df.groupby(["Gender"])
gender_df.describe()



,Purchase ID,Purchase ID,Purchase ID,Purchase ID,Purchase ID,Purchase ID,Purchase ID,Purchase ID,Age,Age,...,Item ID,Item ID,Price,Price,Price,Price,Price,Price,Price,Price
Gender,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
Female,113.0,379.380531,211.605484,15.0,199.0,392.0,558.0,775.0,113.0,21.345133,...,129.0,183.0,113.0,3.203009,1.158194,1.0,2.28,3.45,4.23,4.9
Male,652.0,392.516871,227.516414,0.0,193.75,390.5,592.25,779.0,652.0,22.917178,...,139.0,183.0,652.0,3.017853,1.175625,1.0,1.9625,3.09,4.08,4.99
Other / Non-Disclosed,15.0,334.6,234.524991,9.0,169.5,291.0,516.5,747.0,15.0,24.2,...,141.0,163.0,15.0,3.346,0.883813,1.33,3.1,3.45,3.875,4.75


In [65]:
#====Gender - Purch Counts

gpurch_count = gender_df["Purchase ID"].count()
#print(gpurch_count)
gpurch_count.index

Index(['Female', 'Male', 'Other / Non-Disclosed'], dtype='object', name='Gender')

In [66]:
#====Gender - Ave Purch 

gpurch_ave = gender_df["Price"].mean()
print(gpurch_ave)


Gender
Female                   3.203009
Male                     3.017853
Other / Non-Disclosed    3.346000
Name: Price, dtype: float64


In [43]:
#====Gender - Total Purch 

gpurch_rev = gender_df["Price"].sum()
print(gpurch_rev)

Gender
Female                    361.94
Male                     1967.64
Other / Non-Disclosed      50.19
Name: Price, dtype: float64


In [56]:
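#====Gender - Unique Purchase IDs

# Arrays of the distinct Purchase IDs seen for each gender
gpurch_unique = gender_df["Purchase ID"].unique()
# Number of distinct Purchase IDs per gender (regrouping the Series above would only count 1 per row)
gpurch_unique_group = gender_df["Purchase ID"].nunique()
display(gpurch_unique)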

Gender
Female                   [15, 18, 38, 41, 55, 66, 71, 72, 76, 81, 84, 8...
Male                     [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14...
Other / Non-Disclosed    [9, 22, 82, 111, 228, 237, 242, 291, 350, 401,...
Name: Purchase ID, dtype: object

In [73]:
# Build the frame from the Series directly so the Gender index is kept;
# zipping the values into a list discards it and leaves only 0, 1, 2
gender_summary = pd.DataFrame({
    "Trans by Gender": gpurch_count,
    "Ave Purch by Gender": gpurch_ave,
    "Total Rev by Gender": gpurch_rev,
})
gender_summary

Gender,Trans by Gender,Ave Purch by Gender,Total Rev by Gender
Female,113,3.203009,361.94
Male,652,3.017853,1967.64
Other / Non-Disclosed,15,3.346,50.19



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
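
One way to get all of these figures in a single pass is named aggregation on a `groupby`. This is only a sketch on hypothetical sample rows; it assumes "avg. purchase total per person" means total spend divided by the number of unique players (`SN`) in each group, and the output column names are made up for illustration:

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_df
sample = pd.DataFrame({
    "SN": ["Ana", "Ana", "Bo", "Cy", "Di"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
    "Price": [2.00, 4.00, 1.50, 3.50, 3.00],
})

gender_purchases = sample.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_purchase_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
    unique_players=("SN", "nunique"),
)

# Assumed definition: total spend divided by unique players in the group
gender_purchases["avg_total_per_person"] = (
    gender_purchases["total_purchase_value"] / gender_purchases["unique_players"]
)
print(gender_purchases)
```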

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
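
The steps above could be sketched as follows. The bin edges and labels are assumptions chosen to cover the 7 to 45 age range seen in `describe()`, and the sample rows are hypothetical. Note that `pd.cut` treats each bin as closed on the right by default, so the edge `9` puts age 9 in `<10` and age 10 in `10-14`:

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_df
sample = pd.DataFrame({
    "SN": ["Ana", "Ana", "Bo", "Cy", "Di"],
    "Age": [8, 8, 22, 24, 41],
})

# Assumed bin edges and labels; adjust to whatever grouping the report needs
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Count each player once, then assign an age group
players = sample.drop_duplicates(subset="SN").copy()
players["Age Group"] = pd.cut(players["Age"], bins=age_bins, labels=age_labels)

# sort_index orders the rows by bin rather than by count
age_count = players["Age Group"].value_counts().sort_index()

age_demo = pd.DataFrame({
    "Total Count": age_count,
    "Percentage of Players": (age_count / len(players) * 100).round(2),
})
print(age_demo)
```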


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
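
A possible sketch of the age-based purchasing table, again on hypothetical rows with assumed bin edges. Here every purchase row is binned (not just one row per player), and `observed=True` limits the result to age groups that actually appear in the data:

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_df
sample = pd.DataFrame({
    "SN": ["Ana", "Ana", "Bo", "Cy", "Di"],
    "Age": [8, 8, 22, 24, 41],
    "Price": [2.00, 4.00, 1.50, 3.50, 3.00],
})

# Assumed bin edges and labels
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
sample["Age Group"] = pd.cut(sample["Age"], bins=age_bins, labels=age_labels)

age_purchases = sample.groupby("Age Group", observed=True).agg(
    purchase_count=("Price", "count"),
    avg_purchase_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
    unique_players=("SN", "nunique"),
)

# Assumed definition: total spend divided by unique players in the group
age_purchases["avg_total_per_person"] = (
    age_purchases["total_purchase_value"] / age_purchases["unique_players"]
)
print(age_purchases)
```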

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
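
As a rough sketch, the top spenders table can be built by grouping on `SN`, aggregating `Price`, and sorting by the total. The sample rows and the output column names are hypothetical:

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_df
sample = pd.DataFrame({
    "SN": ["Ana", "Ana", "Bo", "Cy", "Cy"],
    "Price": [2.00, 4.00, 1.50, 3.50, 4.50],
})

spenders = sample.groupby("SN").agg(
    purchase_count=("Price", "count"),
    avg_purchase_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)

# Largest total first, then keep only a short preview
top_spenders = spenders.sort_values("total_purchase_value", ascending=False).head(5)
print(top_spenders)
```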



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, average item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
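
A minimal sketch of the item summary, grouping on both `Item ID` and `Item Name` so the name stays attached to its ID. The item names and prices below are hypothetical, as are the output column names:

```python
import pandas as pd

# Hypothetical sample rows standing in for purchase_df
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Potion"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 1.00],
})

# Retrieve only the columns needed, then group by the item identifiers
items = sample[["Item ID", "Item Name", "Price"]]
item_summary = items.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)

# Most frequently purchased items first
most_popular = item_summary.sort_values("purchase_count", ascending=False)
print(most_popular.head())
```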



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
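
Since this table is the item summary with a different sort key, a short sketch is enough. The frame below is a hypothetical, already-aggregated item summary typed in by hand so the snippet runs on its own; in the notebook it would be the result of the Most Popular Items step:

```python
import pandas as pd

# Hypothetical pre-aggregated item summary (one row per item)
item_summary = pd.DataFrame({
    "Item ID": [1, 2, 3],
    "Item Name": ["Sword", "Shield", "Potion"],
    "purchase_count": [3, 2, 1],
    "item_price": [2.00, 4.50, 1.00],
    "total_purchase_value": [6.00, 9.00, 1.00],
}).set_index(["Item ID", "Item Name"])

# Same table, now ordered by revenue instead of purchase count
most_profitable = item_summary.sort_values("total_purchase_value", ascending=False)
print(most_profitable.head())
```

In this made-up data the most purchased item and the most profitable item differ, which is the reason to look at both orderings.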

