**Heroes of Pymoli Observations**

Heroes of Pymoli is a freemium-style game that generates revenue through in-game microtransactions.  Analysis of the player base and purchase transactions has yielded the following observations:

1. There are 576 unique players who have made purchases; 84.03% of them are male.

2. Looking at the dollars generated by gender, out of a total $2,379.77 in revenue, males account for 82.68% of those dollars.

3. Interestingly, the average purchase price for female players is higher than for male players, at \\$3.20 versus \\$3.02 per purchase.

4. Age demographics indicate that 44.79% of the players are aged 20-24, and 76.74% of players fall between the ages of 15 and 29.

5. Looking at sales by age demographics, players aged 20-24 provide 46.81% of revenue, and players aged 15-29 generate 76.48% of total revenue.

6. Our analysis of the most profitable items shows that each of the top 5 items has a per-item price that exceeds the average purchase price of $3.05 by at least 38.69%.
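
The percentages in observations 2, 5, and 6 can be re-derived from the rounded figures in the summary tables of this notebook. The following is a minimal sketch of that arithmetic; the variable names are illustrative and are not part of the original analysis.

```python
# Figures copied from the summary tables in this notebook (already rounded to cents).
total_revenue = 2379.77          # Total Revenue, all purchases
male_revenue = 1967.64           # Total Purchase Value, Male
revenue_20_to_24 = 1114.06       # Total Purchase Value, ages 20-24
avg_purchase_price = 3.05        # Average Price, all purchases
lowest_top5_item_price = 4.23    # lowest Item Price among the top 5 items

# Share of revenue contributed by a group, in percent.
male_share = male_revenue / total_revenue * 100
age_20_to_24_share = revenue_20_to_24 / total_revenue * 100

# How far the cheapest top-5 item sits above the average purchase price, in percent.
premium_over_average = (lowest_top5_item_price / avg_purchase_price - 1) * 100

print(f"Male share of revenue: {male_share:.2f}%")
print(f"20-24 share of revenue: {age_20_to_24_share:.2f}%")
print(f"Minimum top-5 price premium: {premium_over_average:.2f}%")
```

Because the inputs are rounded display values, results recomputed this way could differ in the last digit from values computed on the raw data.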

**Conclusion**

The analysis of sales activity in Heroes of Pymoli suggests several possible strategies for increasing revenue.  One is to increase the female share of players making purchases; another is to offer more premium items that players want.  It may also be worth exploring ways to broaden the age range of the player base, depending on player preferences.  Diversifying the experience and the players who participate should ultimately lead to a longer game life and a larger revenue base from which to generate profit.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
pymoli_csv = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(pymoli_csv)

In [2]:
# Create dataframe
df = purchase_data
df.head()


,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [3]:
## Player Count Section ##

# Count unique players based on "SN" column 
unique_players = len(df['SN'].unique())

#Create data frame based on unique players to display output
unique_players_df = pd.DataFrame({"Total Players": [unique_players]})

In [4]:
## Purchase Analysis variables defined ##

# Variable to count number of unique items
unique_items = len(df['Item ID'].unique())

# Variable to calculate average price and format in dollars/cents
avg_price = df['Price'].mean()
avg_price = '${:.2f}'.format(avg_price)

# Variable to calculate the number of purchases using Item ID column
numb_purchases = len(df['Item ID'])

# Variable to calculate total revenue based on sum of Price column and format in dollars/cents
tot_rev = df['Price'].sum()
tot_rev = '${:,.2f}'.format(tot_rev)

In [5]:
# Create Data Frame for Purchasing Analysis and display purchasing analysis by column
purchase_analysis_df = pd.DataFrame(
    {'Number of Unique Items':[unique_items], 
     'Average Price':[avg_price], 
     'Number of Purchases':[numb_purchases], 
     'Total Revenue':[tot_rev]})
purchase_analysis_df

,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,179,$3.05,780,"$2,379.77"


In [6]:
## Gender Demographics Section ##

# Create data frame that includes unique players
gender_df = df.groupby('SN').first()

# Create variable to count unique players by gender
unique_gender_count = gender_df['Gender'].value_counts()

# Create variable to calculate gender by percent of total players and format in % points
player_percent = (unique_gender_count / unique_players) *100
player_percent = player_percent.map("{:.2f}%".format)

In [7]:
# Create Data Frame to hold and display Gender Demographics output
gender_demo_df = pd.DataFrame({
     'Total Count':unique_gender_count, 
     'Percentage of Players':player_percent, 
     })
gender_demo_df

,Total Count,Percentage of Players
Male,484,84.03%
Female,81,14.06%
Other / Non-Disclosed,11,1.91%
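
As an alternative sketch of the same demographics step, `value_counts(normalize=True)` returns each group's share directly, so the division by the player total is not needed. The sample rows below are hypothetical stand-ins for `purchase_data.csv`, and `drop_duplicates` is used here in place of the notebook's `groupby('SN').first()`.

```python
import pandas as pd

# Hypothetical sample rows; the notebook itself reads Resources/purchase_data.csv.
sample = pd.DataFrame({
    'SN': ['A1', 'A1', 'B2', 'C3', 'D4', 'D4'],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Male', 'Male'],
})

# Keep one row per player so repeat buyers are counted once.
players = sample.drop_duplicates(subset='SN')

# Absolute counts and percentage shares per gender.
counts = players['Gender'].value_counts()
shares = players['Gender'].value_counts(normalize=True) * 100

gender_summary = pd.DataFrame({
    'Total Count': counts,
    'Percentage of Players': shares.map('{:.2f}%'.format),
})
print(gender_summary)
```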


In [8]:
## Purchase Analysis by Gender Section ##

In [9]:
# Create variable using grouping to calculate purchase count by each gender group
count_group = df.groupby(['Gender']).count()
count_group_by_gender = count_group['Price']

In [10]:
# Create variable using grouping to calculate average price spent by gender group
# Select 'Price' before aggregating: in pandas 2.0+, mean() over text columns such as 'SN' raises a TypeError
avg_price_by_gender = df.groupby('Gender')['Price'].mean()

In [11]:
# Create variable using grouping to calculate total price spent by gender group
# Select 'Price' first so text columns are not summed (string concatenation) along the way
tot_price_by_gender = df.groupby('Gender')['Price'].sum()


In [12]:
# Create the data frame to store and output Purchase Analysis by Gender
Purchase_by_Gender_df = pd.DataFrame({
    'Purchase Count': count_group_by_gender,
    'Average Purchase Price': avg_price_by_gender,
    'Total Purchase Value': tot_price_by_gender,
    'Avg Total Purchase per Person': tot_price_by_gender / unique_gender_count,
})

# Rename index column to 'Gender'
Purchase_by_Gender_df.index.name = 'Gender'

# Format purchase values to dollar formatting
Purchase_by_Gender_df['Average Purchase Price'] = Purchase_by_Gender_df['Average Purchase Price'].map('${:,.2f}'.format)
Purchase_by_Gender_df['Total Purchase Value'] = Purchase_by_Gender_df['Total Purchase Value'].map('${:,.2f}'.format)
Purchase_by_Gender_df['Avg Total Purchase per Person'] = Purchase_by_Gender_df['Avg Total Purchase per Person'].map('${:,.2f}'.format)
Purchase_by_Gender_df

Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Female,113,$3.20,$361.94,$4.47
Male,652,$3.02,"$1,967.64",$4.07
Other / Non-Disclosed,15,$3.35,$50.19,$4.56
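
The count, mean, and sum per gender can also be computed in a single pass with pandas named aggregation. This is a sketch under assumptions, not the notebook's method: the sample rows and the snake_case column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample purchases; 'A1' buys twice to show the per-person calculation.
sample = pd.DataFrame({
    'SN': ['A1', 'A1', 'B2', 'C3', 'D4'],
    'Gender': ['Male', 'Male', 'Female', 'Male', 'Female'],
    'Price': [1.00, 3.00, 2.50, 4.00, 3.50],
})

# One groupby call; each keyword is output_column=(input_column, aggregation).
by_gender = sample.groupby('Gender').agg(
    purchase_count=('Price', 'count'),
    avg_purchase_price=('Price', 'mean'),
    total_purchase_value=('Price', 'sum'),
    unique_players=('SN', 'nunique'),
)

# Total spend divided by the number of distinct players in each group.
by_gender['avg_total_per_person'] = (
    by_gender['total_purchase_value'] / by_gender['unique_players']
)
print(by_gender)
```

Keeping the numeric columns unformatted until display time means they can still be sorted or used in further calculations.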


In [13]:
## Age Demographics Section ##

In [14]:
# Create the bins and grouping categories for age groups
bins = [0,9,14,19,24,29,34,39,100]
bin_group = ['<10','10-14','15-19','20-24','25-29','30-34','35-39','40+']

In [15]:
# Categorize existing players into bins using pd.cut
df['Age Ranges'] = pd.cut(df['Age'],bins,labels=bin_group, include_lowest = True)
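
To illustrate how `pd.cut` treats the bin edges used above, here is a small sketch with hypothetical boundary ages. By default each bin is closed on the right, so an age of 9 lands in `<10` and an age of 10 lands in `10-14`; ages above the last edge (100) would come back as missing.

```python
import pandas as pd

# Same edges and labels as the notebook cell above.
bins = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bin_group = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

# Hypothetical ages chosen to sit on or next to the bin edges.
ages = pd.Series([0, 9, 10, 14, 15, 24, 25, 39, 40])

# include_lowest=True makes the first bin include its left edge (age 0).
labels = pd.cut(ages, bins, labels=bin_group, include_lowest=True)
print(labels.tolist())

# An age beyond the last edge falls outside every bin and becomes NaN.
out_of_range = pd.cut(pd.Series([101]), bins, labels=bin_group, include_lowest=True)
print(out_of_range.isna().tolist())
```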

In [16]:
# Create variable reference original dataframe and remove duplicate player names
age_demos = df.drop_duplicates(subset=['SN'])

#Create variable to group remaining rows by Age Groups
age_demos = age_demos.groupby(['Age Ranges']).count()
age_count = age_demos['Age']


In [17]:
# Create data frame to list player counts and percentages by age group
age_demo_df = pd.DataFrame({
    'Total Count': age_count,
    'Percentage of Players': age_count / unique_players * 100,
})

# Format Percentage of Players column to percent
age_demo_df['Percentage of Players'] = age_demo_df['Percentage of Players'].map('{:.2f}%'.format)
age_demo_df

Age Ranges,Total Count,Percentage of Players
<10,17,2.95%
10-14,22,3.82%
15-19,107,18.58%
20-24,258,44.79%
25-29,77,13.37%
30-34,52,9.03%
35-39,31,5.38%
40+,12,2.08%


In [18]:
## Purchase Analysis (Age) Section ##

In [19]:
# Create variables to return the total purchase count by age group
pur_by_age = df.groupby(['Age Ranges']).count()
pur_count_by_age = pur_by_age['Purchase ID']

In [20]:
# Create variables to calculate total purchase value by age group
# Select 'Price' first so text columns are not summed (string concatenation) along the way
tot_price_by_age = df.groupby('Age Ranges')['Price'].sum()

In [21]:
# Create variables to calculate average purchase price by age group
# Select 'Price' before aggregating: in pandas 2.0+, mean() over text columns such as 'SN' raises a TypeError
avg_price_by_age = df.groupby('Age Ranges')['Price'].mean()

In [22]:
# Create data frame to list Purchase Analysis by Age and apply dollar formatting to last three columns
purchase_by_age_df = pd.DataFrame({
    'Purchase Count': pur_count_by_age,
    'Average Purchase Price': avg_price_by_age,
    'Total Purchase Value': tot_price_by_age,
    # Divide by the per-group player counts; both Series align on the 'Age Ranges' index
    'Avg Total Purchase per Person': tot_price_by_age / age_count
})

purchase_by_age_df['Average Purchase Price'] = purchase_by_age_df['Average Purchase Price'].map('${:,.2f}'.format)
purchase_by_age_df['Total Purchase Value'] = purchase_by_age_df['Total Purchase Value'].map('${:,.2f}'.format)
purchase_by_age_df['Avg Total Purchase per Person'] = purchase_by_age_df['Avg Total Purchase per Person'].map('${:,.2f}'.format)
purchase_by_age_df

Age Ranges,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
<10,23,$3.35,$77.13,$4.54
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,"$1,114.06",$4.32
25-29,101,$2.90,$293.00,$3.81
30-34,73,$2.93,$214.00,$4.12
35-39,41,$3.60,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19
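
The per-person column divides a per-group total by a per-group player count. The sketch below uses hypothetical toy values to show that pandas aligns two Series on their index labels during division. It also shows why a deduplicated array such as the result of `Series.unique()` is not a safe divisor: it loses both the labels and any repeated counts.

```python
import pandas as pd

# Hypothetical totals and player counts per age group.
totals = pd.Series({'<10': 30.0, '10-14': 40.0, '15-19': 90.0})

# Deliberately in a different order, with two groups sharing the same count.
players = pd.Series({'15-19': 30, '<10': 10, '10-14': 10})

# Division matches values by index label, not by position.
per_person = totals / players
print(per_person)

# unique() drops the repeated count, so only two values remain for three groups.
print(len(players), len(players.unique()))
```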


In [23]:
## Top Spenders Section ##

In [24]:
# Create data frame and calculate purchase count, avg purchase price, and total value for the top 5 spenders
top_spender_df = df.groupby('SN')['Price'].agg(['sum','count']).nlargest(5,'sum')
top_spender_df['Average Purchase Price'] = top_spender_df['sum'] / top_spender_df['count']

# Rename Columns
top_spender_df.columns = ['Total Purchase Value', 'Purchase Count', 'Average Purchase Price']

# Reorder Columns
top_spender_df = top_spender_df[['Purchase Count','Average Purchase Price','Total Purchase Value']]

# Format last two columns with dollar formatting
top_spender_df['Average Purchase Price'] = top_spender_df['Average Purchase Price'].map('${:,.2f}'.format)
top_spender_df['Total Purchase Value'] = top_spender_df['Total Purchase Value'].map('${:,.2f}'.format)
top_spender_df

SN,Purchase Count,Average Purchase Price,Total Purchase Value
Lisosia93,5,$3.79,$18.96
Idastidru52,4,$3.86,$15.45
Chamjask73,3,$4.61,$13.83
Iral74,4,$3.40,$13.62
Iskadarya95,3,$4.37,$13.10


In [25]:
## Most Popular Items Section ##

In [26]:
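# Create data frame to hold most popular items using groupby
# NOTE: nlargest(5,'sum') ranks by total purchase value, so this table matches the Most Profitable Items section;
# ranking by 'count' instead would order items by number of purchases
most_pop_df = df.groupby('Item Name')['Price'].agg(['sum', 'count']).nlargest(5,'sum')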
most_pop_df['Item Price'] = most_pop_df['sum'] / most_pop_df['count']

# Rename Columns
most_pop_df.columns = ['Total Purchase Value','Purchase Count','Item Price']

# Reorder Columns
most_pop_df = most_pop_df[['Purchase Count','Item Price','Total Purchase Value']]

# Format last two columns with dollar formatting
most_pop_df['Total Purchase Value'] = most_pop_df['Total Purchase Value'].map('${:,.2f}'.format)
most_pop_df['Item Price'] = most_pop_df['Item Price'].map('${:,.2f}'.format)
most_pop_df


Item Name,Purchase Count,Item Price,Total Purchase Value
Final Critic,13,$4.61,$59.99
"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
Nirvana,9,$4.90,$44.10
Fiery Glass Crusader,9,$4.58,$41.22
Singed Scalpel,8,$4.35,$34.80


In [27]:
## Most Profitable Items Section ##

In [28]:
# Create data frame to hold most profitable items using groupby
most_profit_df = df.groupby('Item Name')['Price'].agg(['count', 'sum']).nlargest(5,'sum')
most_profit_df['Item Price'] = most_profit_df['sum'] / most_profit_df['count']

# Rename Columns
most_profit_df.columns = ['Purchase Count','Total Purchase Value','Item Price']

# Reorder Columns
most_profit_df = most_profit_df[['Purchase Count','Item Price','Total Purchase Value']]

# Format last two columns with dollar formatting
most_profit_df['Total Purchase Value'] = most_profit_df['Total Purchase Value'].map('${:,.2f}'.format)
most_profit_df['Item Price'] = most_profit_df['Item Price'].map('${:,.2f}'.format)
most_profit_df

Item Name,Purchase Count,Item Price,Total Purchase Value
Final Critic,13,$4.61,$59.99
"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
Nirvana,9,$4.90,$44.10
Fiery Glass Crusader,9,$4.58,$41.22
Singed Scalpel,8,$4.35,$34.80
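
Popularity and profitability are separate rankings: the first orders items by number of purchases, the second by total purchase value. The sketch below uses hypothetical items and prices to show that the two rankings can differ, depending on which column is passed to `nlargest`.

```python
import pandas as pd

# Hypothetical sales: a cheap item bought often and an expensive item bought rarely.
sales = pd.DataFrame({
    'Item Name': ['Cheap Dagger'] * 4 + ['Rare Blade'] * 2 + ['Mid Axe'] * 3,
    'Price': [1.00] * 4 + [4.50] * 2 + [2.00] * 3,
})

# Purchase count and total purchase value per item.
summary = sales.groupby('Item Name')['Price'].agg(['count', 'sum'])

# Rank by number of purchases (popularity) and by revenue (profitability).
most_popular = summary.nlargest(2, 'count')
most_profitable = summary.nlargest(2, 'sum')

print(most_popular)
print(most_profitable)
```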
