## Analysis of Player Purchasing in "Pymoli"
by:  Andrew Guenthner
Date:  20 Mar 2019

Notes:  To run the analyses, the input file "purchase_data.csv" needs to be in a folder
called "Resources" in the same directory as this notebook.  

An accompanying "README" file should also be available in the same directory as this
notebook.  

In [2]:
# Import dependencies
import pandas as pd

# Set up file input
purchase_data_source = 'Resources/purchase_data.csv'
purchase_df = pd.read_csv(purchase_data_source)

In [22]:
# Check file header to make sure the DataFrame has loaded correctly
purchase_df.head()

,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


This data set is expected to be clean and ready to use.  To verify, a simple check
for missing values will suffice.

In [4]:
# Quick check for any missing values ...
purchase_df.isnull().sum()

Purchase ID    0
SN             0
Age            0
Gender         0
Item ID        0
Item Name      0
Price          0
dtype: int64
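
Beyond missing values, a few other quick checks can help confirm that the data is clean. The sketch below is only an illustration: it uses a small made-up DataFrame (`toy_df`, not the real purchase data) with the same column names, and the particular checks chosen (duplicate purchase IDs, non-positive prices) are assumptions about what "clean" should mean here.

```python
import pandas as pd

# Toy stand-in for purchase_df; all values are invented for illustration only
toy_df = pd.DataFrame({
    'Purchase ID': [0, 1, 2, 3],
    'SN': ['PlayerA', 'PlayerB', 'PlayerA', 'PlayerC'],
    'Age': [20, 40, 20, 24],
    'Price': [3.53, 1.56, 4.88, 3.27],
})

# Each check returns a count; zero means no problem was found
checks = {
    'missing values': int(toy_df.isnull().sum().sum()),
    'duplicate purchase IDs': int(toy_df['Purchase ID'].duplicated().sum()),
    'non-positive prices': int((toy_df['Price'] <= 0).sum()),
}
print(checks)
```

The same dictionary of checks could be built from `purchase_df` once the file has been loaded.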

In [8]:
# Note the data types of each column for reference.
purchase_df.dtypes

Purchase ID      int64
SN              object
Age              int64
Gender          object
Item ID          int64
Item Name       object
Price          float64
dtype: object

### Initial exploratory analysis

Some basic characteristics of the data: 

1) Total number of players in the dataset (assuming one screen name per player):

In [10]:
# Compute total number of unique screen names
purchase_df.SN.nunique()

576

This group is large enough to draw inferences from the full set, but small enough that
we may need to be careful when analyzing subsets, such as particular age groups.
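
One way to judge whether a subset is too small is to count unique players per subgroup before analyzing it. The sketch below does this for age groups on a small made-up DataFrame; the bin edges, the labels, and the names `toy_df`, `age_bins`, and `age_labels` are all hypothetical choices for illustration, not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for purchase_df; all values are invented for illustration only
toy_df = pd.DataFrame({
    'SN': ['PlayerA', 'PlayerB', 'PlayerA', 'PlayerC', 'PlayerD'],
    'Age': [8, 22, 8, 23, 41],
})

# Hypothetical bin edges and labels; adjust to the age groups of interest
age_bins = [0, 9, 19, 29, 39, 120]
age_labels = ['<10', '10-19', '20-29', '30-39', '40+']

# Keep one row per player so repeat buyers are not double-counted
players = toy_df.drop_duplicates(subset='SN').copy()
players['Age Group'] = pd.cut(players['Age'], bins=age_bins, labels=age_labels)

# Number of unique players in each age group
group_sizes = players['Age Group'].value_counts().sort_index()
print(group_sizes)
```

Groups with only a handful of players would be candidates for merging with a neighboring group or for exclusion from any per-group conclusions.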

2) Total number of unique items purchased, the average transaction price, total number of transactions, and total revenue represented by these transactions.  For clarity, a "transaction" refers to the event of purchasing a single item recorded by the system, whereas a "purchase" refers to the player behavior of buying one or more items.  

In [11]:
# Compute total number of unique item names
purchase_df['Item Name'].nunique()

179

In [19]:
# Compute average transaction price
print('${:.2f}'.format(purchase_df.Price.mean()))

$3.05


In [20]:
#Total number of transactions
purchase_df['Purchase ID'].count()

780

In [27]:
#Total revenue generated
print(f'${purchase_df.Price.sum():,.2f}')

$2,379.77


The transactions dataset is large enough to support some meaningful conclusions, as long as deep sub-divisions are avoided.
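
With 780 transactions across 576 screen names, some players must have made more than one transaction. The sketch below shows one way the per-player view could be built, using a small made-up DataFrame (`toy_df`) in place of the real data; the output column names are assumptions chosen for illustration.

```python
import pandas as pd

# Toy stand-in for purchase_df; all values are invented for illustration only
toy_df = pd.DataFrame({
    'Purchase ID': [0, 1, 2, 3, 4],
    'SN': ['PlayerA', 'PlayerB', 'PlayerA', 'PlayerC', 'PlayerA'],
    'Price': [3.50, 1.50, 4.00, 3.00, 2.00],
})

# One row per player: number of transactions and total amount spent
per_player = toy_df.groupby('SN').agg(
    Transactions=('Purchase ID', 'count'),
    Total_Spent=('Price', 'sum'),
)

# Players whose purchases consist of more than one transaction
repeat_buyers = int((per_player['Transactions'] > 1).sum())

print(per_player)
print(f'Players with more than one transaction: {repeat_buyers}')
```

Named aggregation of this form requires pandas 0.25 or later.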

### Gender-Based Analysis

The breakdown of the data by gender is as follows:

1) Players by gender (based on unique screen name count)

In [78]:
# Make a general-purpose group-by object
purch_by_gender = purchase_df.groupby('Gender')
# Count unique screen names by group
gender_count = pd.DataFrame(purch_by_gender['SN'].nunique().reset_index())
# Generate % data and format the dataframe 
gender_count['% by Gender'] = 100 * gender_count['SN'] / gender_count['SN'].sum()
gender_count['% by Gender'] = gender_count['% by Gender'].map('{:.0f}'.format)
# Hide the index for display (pandas >= 1.4; older versions use .style.hide_index())
gender_count.style.hide(axis='index')

Gender,SN,% by Gender
Female,81,14
Male,484,84
Other / Non-Disclosed,11,2


Players identify as male by a large margin (84% of unique screen names).
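
As a cross-check on the percentages, the same breakdown can be computed by reducing the data to one row per player and normalizing the counts. This is only a sketch on a small made-up DataFrame (`toy_df`); it assumes each screen name is associated with a single gender, since `drop_duplicates` keeps only the first row seen for each player.

```python
import pandas as pd

# Toy stand-in for purchase_df; all values are invented for illustration only
toy_df = pd.DataFrame({
    'SN': ['PlayerA', 'PlayerB', 'PlayerA', 'PlayerC', 'PlayerD'],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Male'],
})

# One row per player, then the share of players in each gender as a percentage
players = toy_df.drop_duplicates(subset='SN')
gender_share = players['Gender'].value_counts(normalize=True) * 100
print(gender_share.round(0))
```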

2) Purchase characteristics by gender, including number of transactions, average transaction amount, total spent by each gender, and average spent per player by gender

In [91]:
# Count transactions by gender and start a new DataFrame
purch_count_gender = pd.DataFrame(purch_by_gender['Purchase ID'].count().reset_index())
# Average transaction amount by gender
price_avg_gender = pd.DataFrame(purch_by_gender['Price'].mean().reset_index())
# Total spend by gender
price_total_gender = pd.DataFrame(purch_by_gender['Price'].sum().reset_index())
# Merge these into a new dataframe for further analysis, and rename columns
merge_by_gender1 = purch_count_gender.merge(price_avg_gender,how='outer',on='Gender')
merge_by_gender2 = merge_by_gender1.merge(price_total_gender,how='outer',on='Gender')\
.rename(columns={'Purchase ID':'Transactions','Price_x':'Average Transaction','Price_y':'Total Spent'})
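# Compute the average spent per player
# Note: this division aligns rows by index, so it relies on gender_count (from the
# previous cell) listing the genders in the same order as merge_by_gender2
merge_by_gender2['Average Spent per Player'] = merge_by_gender2['Total Spent'] / gender_count['SN']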
# Format for display
merge_by_gender2['Average Transaction'] = merge_by_gender2['Average Transaction'].map('${:,.2f}'.format)
merge_by_gender2['Total Spent'] = merge_by_gender2['Total Spent'].map('${:,.2f}'.format)
merge_by_gender2['Average Spent per Player'] = merge_by_gender2['Average Spent per Player'].map('${:,.2f}'.format)
# Hide the index for display (pandas >= 1.4; older versions use .style.hide_index())
merge_by_gender2.style.hide(axis='index')

Gender,Transactions,Average Transaction,Total Spent,Average Spent per Player
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,"$1,967.64",$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56


Players identifying as male spent slightly less than those identifying as other genders, both per transaction ($3.02 versus $3.20 and $3.35) and in total per player ($4.07 versus $4.47 and $4.56).  However, because they far outnumber the other groups, they are by far the largest source of revenue ($1,967.64 of $2,379.77).
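
The per-gender table could also be built in a single step with named aggregation, which avoids the intermediate merges and any reliance on two DataFrames sharing the same row order. The sketch below is an alternative under stated assumptions, not the method used above: it runs on a small made-up DataFrame (`toy_df`), the `Players` column is a hypothetical helper, and it requires pandas 0.25 or later.

```python
import pandas as pd

# Toy stand-in for purchase_df; all values are invented for illustration only
toy_df = pd.DataFrame({
    'Purchase ID': [0, 1, 2, 3, 4],
    'SN': ['PlayerA', 'PlayerB', 'PlayerA', 'PlayerC', 'PlayerB'],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
    'Price': [3.00, 2.00, 5.00, 4.00, 4.00],
})

# Named aggregation; the dict is unpacked because the column names contain spaces
summary = toy_df.groupby('Gender').agg(
    **{
        'Transactions': ('Purchase ID', 'count'),
        'Average Transaction': ('Price', 'mean'),
        'Total Spent': ('Price', 'sum'),
        'Players': ('SN', 'nunique'),
    }
)

# Both columns come from the same grouped result, so rows are aligned by gender
summary['Average Spent per Player'] = summary['Total Spent'] / summary['Players']
print(summary)
```

Currency formatting can then be applied to the numeric columns for display, as in the cell above.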

In summary, game players overwhelmingly identify as male, and the table above compares purchase behavior by gender.

In [45]:
# Group the transactions by gender and inspect the resulting group-by object
gen_group = purchase_df.groupby('Gender')
gen_group

<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x0000023475851908>