# Heroes of Pymoli - Homework 4 - Pandas - Tom Callegari

### 3 Noticeable Trends

* Within this sample, females spend more per person on average than males (\\$4.47 for females versus \\$4.07 for males)
* About 77% of all purchasers (76.74%) are between the ages of 15 and 29
* The average total purchase per person in this group is lower, at \\$4.00, than the overall average of \\$4.05 (both figures match the unweighted mean of the per-bin averages in the Purchasing Analysis (Age) table)
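
The figures in the second and third bullets can be checked against the age tables further down. The sketch below copies the per-bin values from those tables and assumes the two averages are simple unweighted means of the bin averages; the variable names are made up for illustration.

```python
# Values copied from the Age Demographics and Purchasing Analysis (Age) tables
pct_players = {'15-19': 18.58, '20-24': 44.79, '25-29': 13.37}
avg_total_per_person = {
    '< 10': 4.54, '10-14': 3.76, '15-19': 3.86, '20-24': 4.32,
    '25-29': 3.81, '30-34': 4.12, '35-39': 4.76, '40+': 3.19,
}
core_bins = ['15-19', '20-24', '25-29']

# Share of players aged 15 to 29
share_15_29 = sum(pct_players[b] for b in core_bins)

# Unweighted mean of the per-bin averages (assumption: bins are not weighted by player count)
core_mean = sum(avg_total_per_person[b] for b in core_bins) / len(core_bins)
overall_mean = sum(avg_total_per_person.values()) / len(avg_total_per_person)

print(f'Share aged 15-29: {share_15_29:.2f}%')
print(f'Mean of 15-29 bin averages: {core_mean:.4f}')
print(f'Mean of all bin averages: {overall_mean:.4f}')
```

Weighting each bin by its player count would give somewhat different averages, so the comparison is best read as a rough indication.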


In [1]:
# Import the packages needed for analysis
import numpy as np
import pandas as pd

In [2]:
# Import the PyMoli purchase_data.csv and name it 'data'
data = pd.read_csv('purchase_data.csv', encoding='utf-8')

### Player Count

* Display the total number of players

In [3]:
# Isolate total number of unique players
unique_players = data['SN'].nunique()

# Assemble the needed data into a dictionary
unique_dict = {
    'Total Players': int(unique_players)
}

# Insert the dictionary into a pandas dataframe and add an index label
total_players = pd.DataFrame(unique_dict, index=['Count'])
total_players

,Total Players
Count,576


### Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.
* Create a summary data frame to hold the results
* Optional: Give the displayed data cleaner formatting
* Display the summary data frame

In [4]:
# Isolate the variables needed to populate the summary table
unique_items = int(data['Item ID'].nunique())
mean_price = round(data['Price'].mean(), 2)
purchases = int(data['Purchase ID'].nunique())
revenue = data['Price'].sum()

# Assemble the needed data into a dictionary ($ signs are added by the Styler formatting below)
purchase_dict = {
    'Number of Unique Items': unique_items,
    'Average Price': mean_price,
    'Number of Purchases': purchases,
    'Total Revenue': revenue
}

# Insert the dictionary into a pandas dataframe and add an index label
purchasing_analysis = pd.DataFrame(purchase_dict, index = ['Count'])

# Format the appropriate cells to show $ signs
purchasing_analysis = purchasing_analysis.style.format({
    'Average Price': '${:.2f}'.format, 
    'Total Revenue': '${:.2f}'.format
})

# Display the summary table
purchasing_analysis

,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
Count,183,$3.05,780,$2379.77


### Gender Demographics

* Percentage and Count of Male Players
* Percentage and Count of Female Players
* Percentage and Count of Other / Non-Disclosed

In [5]:
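# Isolate and compute the variables needed to populate the summary table
# Count each player once per gender; the result is sorted alphabetically by Gender,
# so row 0 is Female, row 1 is Male and row 2 is Other / Non-Disclosed
demographics = data.groupby(['SN', 'Gender']).agg({'SN': 'count'}).groupby('Gender').count()

# Use unique_players from In [3] instead of hard-coding the player total
females = demographics.iloc[0, 0]
females_perc = round(females / unique_players, 4) * 100

males = demographics.iloc[1, 0]
males_perc = round(males / unique_players, 4) * 100

others = demographics.iloc[2, 0]
others_perc = round(others / unique_players, 4) * 100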

# Assemble the needed data into a dictionary (% symbols are added by the Styler formatting below)
gender_dict = {
    'Total Count': [males, females, others],
    'Percentage of Players': [males_perc, females_perc, others_perc]  
}

# Insert the dictionary into a pandas dataframe and add an index label
gender_data = pd.DataFrame(gender_dict, index=['Male', 'Female', 'Other / Non-Disclosed'])

# Format appropriate cells in the table to show the % symbol
gender_data = gender_data.style.format({
    'Percentage of Players': '{:.2f}%'.format
})

# Display the summary table
gender_data

,Total Count,Percentage of Players
Male,484,84.03%
Female,81,14.06%
Other / Non-Disclosed,11,1.91%
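
As an alternative to positional `iloc` lookups, the same counts can be obtained by keeping one row per player and calling `value_counts()`. This is a minimal sketch on a made-up purchase log (the `sample` frame and its screen names are hypothetical, not taken from `purchase_data.csv`).

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase, so players can repeat
sample = pd.DataFrame({
    'SN': ['Ana1', 'Ana1', 'Bob2', 'Cyd3', 'Dee4', 'Dee4'],
    'Gender': ['Female', 'Female', 'Male', 'Male',
               'Other / Non-Disclosed', 'Other / Non-Disclosed'],
})

# Keep one row per player so repeat purchasers are counted once
players = sample.drop_duplicates(subset='SN')

# Count players per gender and convert the counts to percentages
gender_counts = players['Gender'].value_counts()
gender_summary = pd.DataFrame({
    'Total Count': gender_counts,
    'Percentage of Players': (gender_counts / len(players) * 100).round(2),
})
print(gender_summary)
```

Because rows are looked up by gender label rather than by position, the result does not depend on the alphabetical order of the categories.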


### Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, average purchase price, average purchase total per person etc. by gender
* Create a summary data frame to hold the results
* Optional: Give the displayed data cleaner formatting
* Display the summary data frame

In [6]:
# Groupby Gender then count the number of purchases and find the mean price of purchases
# then save as a new dataframe named purchase_analysis
purchase_analysis = data.groupby('Gender').agg({'Purchase ID': 'count', 'Price': 'mean'})

# Rename Price and Purchase ID to match gitlab example
purchase_analysis = purchase_analysis.rename(columns={'Price': 'Average Purchase Price', 'Purchase ID': 'Purchase Count'})

# Groupby SN (screen-name?) and Gender to total each player's spending, then average those per-player totals within each Gender group
purchase_analysis['Avg Total Purchase per Person'] = data.groupby(['SN', 'Gender']).agg({'Price': 'sum'}).groupby('Gender').agg({'Price': 'mean'})

# Groupby Gender and get the sum of purchases spent per Gender group
purchase_analysis['Total Purchase Value'] = data.groupby('Gender').agg({'Price': 'sum'})

# Round Average Purchase Price and Avg Total Purchase per Person to 2 decimal places
purchase_analysis[['Average Purchase Price', 'Avg Total Purchase per Person']] = purchase_analysis[['Average Purchase Price', 'Avg Total Purchase per Person']].round(2)

# Reorder the columns to match gitlab example
purchase_analysis = purchase_analysis[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Avg Total Purchase per Person']]

# Round each value to two decimal places
purchase_analysis = purchase_analysis.round(2)

# Format the data frame to include $
purchase_analysis = purchase_analysis.style.format({
    'Average Purchase Price': '${:.2f}'.format, 
    'Total Purchase Value': '${:,.2f}'.format, 
    'Avg Total Purchase per Person': '${:.2f}'.format
})

# Display the summary table
purchase_analysis


Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,"$1,967.64",$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56
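
The four columns above can also be built with pandas named aggregation, which avoids the separate rename and reorder steps. The sketch below runs on a small made-up frame; the `sample` data and the snake_case column names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical purchase log, one row per purchase
sample = pd.DataFrame({
    'Purchase ID': [0, 1, 2, 3, 4],
    'SN': ['Ana1', 'Ana1', 'Bob2', 'Cyd3', 'Cyd3'],
    'Gender': ['Female', 'Female', 'Male', 'Male', 'Male'],
    'Price': [1.00, 3.00, 2.00, 4.00, 6.00],
})

# Per-purchase statistics for each gender, named and ordered in one call
by_gender = sample.groupby('Gender').agg(
    purchase_count=('Purchase ID', 'count'),
    average_purchase_price=('Price', 'mean'),
    total_purchase_value=('Price', 'sum'),
)

# Sum each player's spending first, then average those totals within each gender
per_player = sample.groupby(['Gender', 'SN'])['Price'].sum()
by_gender['avg_total_per_person'] = per_player.groupby(level='Gender').mean()

print(by_gender)
```

In this toy data the male average purchase price is 4.00 while the average total per person is 6.00, which shows why the two columns measure different things.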


### Age Demographics

* Establish bins for ages
* Categorize the existing players using the age bins. Hint: use pd.cut()
* Calculate the numbers and percentages by age group
* Create a summary data frame to hold the results
* Optional: Round the percentage column to two decimal points
* Display the Age Demographics Table

In [7]:
# Cut the Age variable into labeled ranges that represent the pre-specified bins
data['Age Bin'] = pd.cut(data['Age'],
                         [0, 9, 14, 19, 24, 29, 34, 39, 45],
                        labels = ['< 10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+'])

# Groupby the SN and Age Bin variables and aggregate a count number then groupby the Age Bin and get the final count
age_demographics = data.groupby(['SN', 'Age Bin']).agg({'SN': 'count'}).groupby('Age Bin').count()

# Rename the SN column to Total Count
age_demographics = age_demographics.rename(columns={'SN': 'Total Count'})

# Pull the age_total_count numbers out, divide by the total number of unique players (unique_players from In [3]) and multiply by 100 to get a percentage
age_total_count = age_demographics['Total Count']
age_demographics['Percentage of Players'] = (age_total_count / unique_players) * 100

# Round the data frame to two decimal places
age_demographics = age_demographics.round(2)

# Style the data frame to include % symbols where appropriate
styled_age_demographics = age_demographics.style.format({
    'Percentage of Players': '{:.2f}%'.format
})

# Display the summary table
styled_age_demographics

Age Bin,Total Count,Percentage of Players
< 10,17,2.95%
10-14,22,3.82%
15-19,107,18.58%
20-24,258,44.79%
25-29,77,13.37%
30-34,52,9.03%
35-39,31,5.38%
40+,12,2.08%
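
The bins passed to `pd.cut()` above stop at 45, so any age above 45 would fall outside every bin and come back as `NaN`. A minimal sketch of an open-ended top bin, using `np.inf` as the last edge and a made-up `ages` series:

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including one above the fixed upper edge of 45
ages = pd.Series([7, 9, 10, 14, 22, 39, 40, 45, 52])

bin_labels = ['< 10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

# Intervals are right-inclusive by default: (0, 9], (9, 14], ..., (39, inf]
open_top = pd.cut(ages, bins=[0, 9, 14, 19, 24, 29, 34, 39, np.inf], labels=bin_labels)

# Same labels with the fixed upper edge used in the cell above
fixed_top = pd.cut(ages, bins=[0, 9, 14, 19, 24, 29, 34, 39, 45], labels=bin_labels)

print(pd.DataFrame({'age': ages, 'open_top': open_top, 'fixed_top': fixed_top}))
```

With the fixed edge the age of 52 is left unbinned, while the open-ended version assigns it to `40+`.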


### Purchasing Analysis (Age)

* Bin the purchase_data data frame by age
* Run basic calculations to obtain purchase count, average purchase price, average total per person etc. in the table below
* Create a summary data frame to hold the results
* Optional: Give the displayed data cleaner formatting
* Display the summary data frame

In [8]:
# Groupby the Age Bin variable then aggregate the Purchase ID count and mean of purchase prices
purchasing_age = data.groupby('Age Bin').agg({'Purchase ID': 'count', 'Price': 'mean'})

# Rename the Price column to Average Purchase Price and the Purchase ID column to Purchase Count
purchasing_age = purchasing_age.rename(columns={'Price': 'Average Purchase Price', 'Purchase ID': 'Purchase Count'})

# Create the Avg Total Purchases per Person column by grouping by SN and Age Bin, getting the aggregate sum and grouping by Age Bin and getting the mean of price
purchasing_age['Avg Total Purchase per Person'] = data.groupby(['SN', 'Age Bin']).agg({'Price': 'sum'}).groupby('Age Bin').agg({'Price': 'mean'})

# Groupby Age Bin and get the aggregated sum of price to create the Total Purchase Value column
purchasing_age['Total Purchase Value'] = data.groupby('Age Bin').agg({'Price': 'sum'})

# Reorder the columns to match the given example
purchasing_age = purchasing_age[['Purchase Count', 'Average Purchase Price', 'Total Purchase Value', 'Avg Total Purchase per Person']]

# Round all the values in the data frame to two decimal places
purchasing_age = purchasing_age.round(2)

# Format the data frame to show $ signs where appropriate
styled_purchasing_age = purchasing_age.style.format({
    'Average Purchase Price': '${:.2f}'.format,
    'Total Purchase Value': '${:,.2f}'.format, 
    'Avg Total Purchase per Person': '${:.2f}'.format
})

# Display the summary table
styled_purchasing_age

Age Bin,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
< 10,23,$3.35,$77.13,$4.54
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,"$1,114.06",$4.32
25-29,101,$2.90,$293.00,$3.81
30-34,73,$2.93,$214.00,$4.12
35-39,41,$3.60,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19


### Top Spenders

* Run basic calculations to obtain the results in the example table
* Create a summary data frame to hold the results
* Sort the total purchase value column in descending order
* Optional: Give the displayed data cleaner formatting
* Display a preview of the summary data frame

In [11]:
# Groupby SN variable and summarise Purchase ID counts and Price means
top_spenders = data.groupby('SN').agg({'Purchase ID': 'count', 'Price': 'mean'})

# Groupby SN variable and get the aggregated price sums then save as a new column called Total Purchase Value
top_spenders['Total Purchase Value'] = data.groupby('SN').agg({'Price': 'sum'})

# Rename the column labels to match the example
top_spenders = top_spenders.rename(columns={'Purchase ID': 'Purchase Count', 'Price': 'Average Purchase Price'})

# Keep players whose Total Purchase Value exceeds 13, then sort the dataframe by Total Purchase Value in descending order
top_spenders = top_spenders[top_spenders['Total Purchase Value'] > 13].sort_values('Total Purchase Value', ascending=False)

# Round the data frame to two decimal places
top_spenders = top_spenders.round(2)

# Format the Average Purchase Price and Total Purchase Value columns to include $ signs
styled_spenders = top_spenders.style.format({
    'Average Purchase Price': '${:.2f}'.format,
    'Total Purchase Value': '${:.2f}'.format
})

# Display the summary table
styled_spenders

SN,Purchase Count,Average Purchase Price,Total Purchase Value
Lisosia93,5,$3.79,$18.96
Idastidru52,4,$3.86,$15.45
Chamjask73,3,$4.61,$13.83
Iral74,4,$3.40,$13.62
Iskadarya95,3,$4.37,$13.10
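
The cutoff of 13 is tuned to this particular data set. As a sketch of a threshold-free alternative, `DataFrame.nlargest()` returns the top rows directly; the `sample` frame and its screen names below are made up for illustration.

```python
import pandas as pd

# Hypothetical purchase log, one row per purchase
sample = pd.DataFrame({
    'Purchase ID': list(range(7)),
    'SN': ['Ana1', 'Ana1', 'Bob2', 'Cyd3', 'Cyd3', 'Cyd3', 'Dee4'],
    'Price': [4.00, 3.50, 1.25, 2.00, 2.00, 2.00, 4.90],
})

# Per-player purchase count, average price and total spend
spenders = sample.groupby('SN').agg(
    purchase_count=('Purchase ID', 'count'),
    average_purchase_price=('Price', 'mean'),
    total_purchase_value=('Price', 'sum'),
)

# Take the two biggest spenders, already sorted in descending order
top_n = spenders.nlargest(2, 'total_purchase_value')
print(top_n)
```

Changing the first argument of `nlargest()` changes how many players are shown without having to find a new cutoff value.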


### Most Popular Items

* Retrieve the Item ID, Item Name and Item Price columns
* Group by Item ID and Item Name then perform calculations to obtain Purchase Count, Item Price and Total Purchase Value
* Create a summary data frame to hold the results
* Sort the purchase count column in descending order
* Optional: Give the displayed data cleaner formatting
* Display a preview of the summary data frame

In [11]:
# Groupby Item ID and Item Name then summarise the Purchase ID and Price columns to represent counts and averages
popular_items = data.groupby(['Item ID', 'Item Name']).agg({'Purchase ID': 'count', 'Price': 'mean'})

# Groupby Item ID and Item Name and summarise to get the sum of Price values then save in a new column named Total Purchase Value
popular_items['Total Purchase Value'] = data.groupby(['Item ID', 'Item Name']).agg({'Price': 'sum'})

# Rename the column labels to match the example given
popular_items = popular_items.rename(columns={'Purchase ID': 'Purchase Count', 'Price': 'Average Purchase Price'})

# Filter the Purchase Count column to greater than or equal to 8 and sort the data frame descending based on Purchase Count
popular_items = popular_items[popular_items['Purchase Count'] >= 8].sort_values('Purchase Count', ascending=False)

# Round the data frame values to two decimal places
popular_items = popular_items.round(2)

# Format the Average Purchase Price and Total Purchase Value columns to include $ signs
styled_popular_items = popular_items.style.format({
    'Average Purchase Price': '${:.2f}'.format, 
    'Total Purchase Value': '${:.2f}'.format
})

# Display the summary table
styled_popular_items

Item ID,Item Name,Purchase Count,Average Purchase Price,Total Purchase Value
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
82,Nirvana,9,$4.90,$44.10
108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,$31.77
145,Fiery Glass Crusader,9,$4.58,$41.22
19,"Pursuit, Cudgel of Necromancy",8,$1.02,$8.16
34,Retribution Axe,8,$2.22,$17.76
37,"Shadow Strike, Glory of Ending Hope",8,$3.16,$25.28
59,"Lightning, Etcher of the King",8,$4.23,$33.84
60,Wolf,8,$3.54,$28.32
72,Winter's Bite,8,$3.77,$30.16


### Most Profitable Items

* Sort the above table by total purchase value in descending order
* Optional: Give the displayed data cleaner formatting
* Display a preview of the data frame

In [12]:
# Starting from the popular_items table built in the previous cell, keep items whose Total Purchase Value is at least
# 34.80 and sort the data frame in descending order of Total Purchase Value
popular_items_sorted = popular_items[popular_items['Total Purchase Value'] >= 34.80].sort_values('Total Purchase Value', ascending=False)

# Round the data frame to two decimal places
popular_items_sorted = popular_items_sorted.round(2)

# Format the Average Purchase Price and Total Purchase Value columns to include $ signs
styled_popular_items_sorted = popular_items_sorted.style.format({
    'Average Purchase Price': '${:.2f}'.format, 
    'Total Purchase Value': '${:,.2f}'.format
})

# Display the summary table
styled_popular_items_sorted

Item ID,Item Name,Purchase Count,Average Purchase Price,Total Purchase Value
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
82,Nirvana,9,$4.90,$44.10
145,Fiery Glass Crusader,9,$4.58,$41.22
92,Final Critic,8,$4.88,$39.04
103,Singed Scalpel,8,$4.35,$34.80
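
Popularity and profitability are two different sort orders over the same per-item summary. The sketch below keeps one unfiltered summary and derives both previews from it with `sort_values()` and `head()`; the item names, prices and snake_case column names are invented for illustration.

```python
import pandas as pd

# Hypothetical purchase log, one row per purchase
sample = pd.DataFrame({
    'Purchase ID': [0, 1, 2, 3, 4, 5],
    'Item ID': [1, 1, 1, 2, 3, 3],
    'Item Name': ['Rusty Dagger'] * 3 + ['Gilded Axe'] + ['Oak Staff'] * 2,
    'Price': [1.00, 1.00, 1.00, 4.50, 2.00, 2.00],
})

# One unfiltered summary row per item
items = sample.groupby(['Item ID', 'Item Name']).agg(
    purchase_count=('Purchase ID', 'count'),
    item_price=('Price', 'mean'),
    total_purchase_value=('Price', 'sum'),
)

# Two previews from the same summary, each with its own sort key
most_popular = items.sort_values('purchase_count', ascending=False).head(2)
most_profitable = items.sort_values('total_purchase_value', ascending=False).head(2)

print(most_popular)
print(most_profitable)
```

In this toy data the item bought only once still tops the profitability preview, which a table pre-filtered on purchase count would miss.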
