### Heroes Of Pymoli Data Analysis
* Of the 1163 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [65]:
# Import modules:
import os

import pandas as pd
# openpyxl is not called directly; pandas uses it as the engine for .xlsx output,
# so it must be installed. Importing it here fails early if it is missing.
import openpyxl

# Set the path to the input file and load the purchase data:
path = os.path.join("Resources", "purchase_data.csv")
purchase_df = pd.read_csv(path)

## Player Count

* Display the total number of players


In [66]:
# Count the distinct screen names ('SN'). A player with several purchases
# appears in several rows, so nunique() counts each player once:
total_players = purchase_df["SN"].nunique()
purchase_total_count_df = pd.DataFrame({"Total Players": [total_players]})

# Save to an Excel file:
purchase_total_count_df.to_excel("1_Purchase_Players_Total.xlsx", index=False)
purchase_total_count_df
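The difference between counting rows and counting players matters throughout this analysis, because each row of the purchase data is one purchase. A minimal sketch on a small hypothetical sample (the screen names below are made up, not taken from `purchase_data.csv`):

```python
import pandas as pd

# Hypothetical sample: one row per purchase, so a screen name can repeat.
sample = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4, 5],
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3", "Bob2", "Ann1"],
})

purchase_rows = sample["SN"].count()                 # non-null entries: one per purchase
unique_players = sample["SN"].nunique()              # distinct screen names
via_value_counts = len(sample["SN"].value_counts())  # same result as nunique()

print(purchase_rows, unique_players, via_value_counts)  # 6 3 3
```

Both `nunique()` and `value_counts()` skip missing values by default, so the two player counts agree; `count()` is the right call only when the number of purchases is wanted.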

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [67]:
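# Open an Excel writer; each summary table below goes into its own sheet.
# Opening the writer replaces any existing file with this name, so the
# player count is written again here as the first sheet:
writer = pd.ExcelWriter("1_Purchase_Players_Total.xlsx", engine="openpyxl")
purchase_total_count_df.to_excel(writer, index=False, sheet_name="Players")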

In [68]:
# Count the distinct values in the Item ID column:
items_unique = purchase_df["Item ID"].nunique()
# Create a new DataFrame to summarize the results:
purchase_summary = pd.DataFrame({"Number of Unique Items": [items_unique]})
purchase_summary

| | Number of Unique Items |
|---|---|
| 0 | 179 |


In [69]:
# Use sum() method to calculate the total revenue from the purchase data:
revenue_total = purchase_df["Price"].sum()
revenue_total

2379.77

In [70]:
# Use the mean() method to compute the average of the 'Price' column:
price_average = purchase_df["Price"].mean()
price_average

3.0509871794871795

In [71]:
# Add new column to the Purchase Summary table to display the average price:
purchase_summary["Average Price"] = price_average
purchase_summary

| | Number of Unique Items | Average Price |
|---|---|---|
| 0 | 179 | 3.050987 |


In [72]:
# Count the distinct Purchase IDs to get the number of purchases:
purchase_summary["Number of Purchases"] = purchase_df["Purchase ID"].nunique()
purchase_summary

| | Number of Unique Items | Average Price | Number of Purchases |
|---|---|---|---|
| 0 | 179 | 3.050987 | 780 |


In [73]:
# Use sum() method to obtain the total revenue from 'Price' data:
revenue_total = purchase_df["Price"].sum()
purchase_summary["Total Revenue"] = revenue_total
purchase_summary

| | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 179 | 3.050987 | 780 | 2379.77 |


In [74]:
# Format the currency columns as dollar strings for display:
purchase_summary["Average Price"] = purchase_summary["Average Price"].map("${:,.2f}".format)
purchase_summary["Total Revenue"] = purchase_summary["Total Revenue"].map("${:,.2f}".format)
purchase_summary

| | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 179 | $3.05 | 780 | $2,379.77 |
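The steps above can also be collected into a single summary table. A sketch on a hypothetical six-purchase sample (item IDs and prices are invented for illustration); the currency columns are formatted on a copy so the numeric values stay usable for further calculations:

```python
import pandas as pd

# Hypothetical sample in the same shape as purchase_data.csv (subset of columns).
sample = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4, 5],
    "Item ID": [10, 11, 10, 12, 13, 11],
    "Price": [1.50, 4.00, 1.50, 3.25, 2.75, 4.00],
})

summary = pd.DataFrame({
    "Number of Unique Items": [sample["Item ID"].nunique()],
    "Average Price": [sample["Price"].mean()],
    "Number of Purchases": [sample["Purchase ID"].nunique()],
    "Total Revenue": [sample["Price"].sum()],
})

# Format a copy for display; formatting turns the numbers into strings.
display_summary = summary.copy()
for col in ["Average Price", "Total Revenue"]:
    display_summary[col] = display_summary[col].map("${:,.2f}".format)

print(display_summary)
```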


In [75]:
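# Write the purchase summary to the same Excel file in a new sheet.
# (ExcelWriter.save() was removed in pandas 2.0; the file is written when
# writer.close() is called at the end of the notebook.)
purchase_summary.to_excel(writer, index=False, sheet_name="Purchase Analysis - summary")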

## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [76]:
# Use value_counts() on the 'Gender' column to count rows per gender.
# Each row is one purchase, so these are purchase counts: a player with
# several purchases is counted once per purchase.
players_by_gender = purchase_df["Gender"].value_counts()

# Group the purchase data by gender and preview the first rows of each group:
players_groupby_gender_df = purchase_df.groupby("Gender", as_index=False)
players_groupby_gender_df.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |
| 9 | 9 | Chanosian48 | 35 | Other / Non-Disclosed | 136 | Ghastly Adamantite Protector | 3.58 |
| 15 | 15 | Lisassa64 | 21 | Female | 98 | Deadline, Voice Of Subtlety | 2.89 |
| 18 | 18 | Reunasu60 | 22 | Female | 82 | Nirvana | 4.9 |
| 22 | 22 | Siarithria38 | 38 | Other / Non-Disclosed | 24 | Warped Fetish | 3.81 |
| 38 | 38 | Reulae52 | 10 | Female | 116 | Renewed Skeletal Katana | 4.18 |


In [77]:
# Build the gender table from the per-gender counts computed above:
table_gender = pd.DataFrame({"Total Count": players_by_gender})
table_gender

| | Total Count |
|---|---|
| Male | 652 |
| Female | 113 |
| Other / Non-Disclosed | 15 |


In [78]:
# Fraction of rows per gender; normalize=True divides each count by the
# number of non-missing values, like value_counts() / count():
players_perc = purchase_df["Gender"].value_counts(normalize=True)
table_gender["Percentage of Players"] = players_perc
table_gender

| | Total Count | Percentage of Players |
|---|---|---|
| Male | 652 | 0.835897 |
| Female | 113 | 0.144872 |
| Other / Non-Disclosed | 15 | 0.019231 |


In [79]:
# Format the fractions as percentage strings with two decimal places
# (the "%" format spec multiplies by 100 and appends a percent sign):
table_gender["Percentage of Players"] = table_gender["Percentage of Players"].map("{:.2%}".format)
table_gender

| | Total Count | Percentage of Players |
|---|---|---|
| Male | 652 | 83.59% |
| Female | 113 | 14.49% |
| Other / Non-Disclosed | 15 | 1.92% |
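The counts above are taken over purchase rows, so repeat buyers are counted once per purchase. To count *players* per gender, one option is to keep a single row per screen name first. A sketch on a hypothetical sample, assuming each `SN` always has the same `Gender`:

```python
import pandas as pd

# Hypothetical sample: Ann1 bought three times, Bob2 twice, Cat3 once.
sample = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3", "Bob2", "Ann1"],
    "Gender": ["Female", "Male", "Female", "Male", "Male", "Female"],
})

# Keep one row per player (assumes a player's gender does not vary by row).
players = sample.drop_duplicates(subset="SN")
gender_counts = players["Gender"].value_counts()

gender_table = pd.DataFrame({
    "Total Count": gender_counts,
    "Percentage of Players": (gender_counts / len(players)).map("{:.2%}".format),
})
print(gender_table)
```

On the raw rows this sample gives 3 Female and 3 Male; after de-duplication it gives 1 Female and 2 Male players (33.33% and 66.67%).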



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [84]:
purchase_data_grouped_by_gender = purchase_df.groupby("Gender").count()
purchase_data_grouped_by_gender

| Gender | Purchase ID | SN | Age | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|
| Female | 113 | 113 | 113 | 113 | 113 | 113 |
| Male | 652 | 652 | 652 | 652 | 652 | 652 |
| Other / Non-Disclosed | 15 | 15 | 15 | 15 | 15 | 15 |


In [85]:
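# Count the non-null 'SN' entries. This is the number of purchase rows (780),
# not the number of distinct players, which would be purchase_df["SN"].nunique():
players_total = purchase_df["SN"].count()
players_total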

780

In [86]:
grouped_price_mean = purchase_df.groupby("Gender")["Price"].mean()
grouped_price_mean

Gender
Female                   3.203009
Male                     3.017853
Other / Non-Disclosed    3.346000
Name: Price, dtype: float64

In [87]:
test1_df = pd.DataFrame({"Average Purchase Price": grouped_price_mean})
test1_df

| Gender | Average Purchase Price |
|---|---|
| Female | 3.203009 |
| Male | 3.017853 |
| Other / Non-Disclosed | 3.346 |


In [89]:
grouped_purchase_count = purchase_df.groupby("Gender")["Purchase ID"].count()
grouped_purchase_count

Gender
Female                   113
Male                     652
Other / Non-Disclosed     15
Name: Purchase ID, dtype: int64

In [90]:
test1_df = pd.DataFrame({"Purchase Count": grouped_purchase_count,
                         "Average Purchase Price": grouped_price_mean,
                         })
test1_df

| Gender | Purchase Count | Average Purchase Price |
|---|---|---|
| Female | 113 | 3.203009 |
| Male | 652 | 3.017853 |
| Other / Non-Disclosed | 15 | 3.346 |


In [91]:
grouped_purchase_value_total = purchase_df.groupby("Gender")["Price"].sum()
grouped_purchase_value_total

Gender
Female                    361.94
Male                     1967.64
Other / Non-Disclosed      50.19
Name: Price, dtype: float64

In [92]:
test1_df = pd.DataFrame({"Purchase Count": grouped_purchase_count,
                         "Average Purchase Price": grouped_price_mean,
                         "Total Purchase Value": grouped_purchase_value_total
                         })
test1_df

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Female | 113 | 3.203009 | 361.94 |
| Male | 652 | 3.017853 | 1967.64 |
| Other / Non-Disclosed | 15 | 3.346 | 50.19 |


In [93]:
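# Note: dividing by the purchase count reproduces the average purchase price
# (compare with grouped_price_mean). A per-person figure would divide by the
# number of distinct players per gender: purchase_df.groupby("Gender")["SN"].nunique()
grouped_purchase_average_per_person = grouped_purchase_value_total / grouped_purchase_count
grouped_purchase_average_per_person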

Gender
Female                   3.203009
Male                     3.017853
Other / Non-Disclosed    3.346000
dtype: float64

In [94]:
test1_df = pd.DataFrame({"Purchase Count": grouped_purchase_count,
                         "Average Purchase Price": grouped_price_mean,
                         "Total Purchase Value": grouped_purchase_value_total,
                         "Avg Total Purchase per Person": grouped_purchase_average_per_person
                         })
test1_df

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person |
|---|---|---|---|---|
| Female | 113 | 3.203009 | 361.94 | 3.203009 |
| Male | 652 | 3.017853 | 1967.64 | 3.017853 |
| Other / Non-Disclosed | 15 | 3.346 | 50.19 | 3.346 |


In [95]:
# Format the currency columns as dollar strings for display:
for col in ["Average Purchase Price", "Total Purchase Value", "Avg Total Purchase per Person"]:
    test1_df[col] = test1_df[col].map("${:,.2f}".format)
test1_df

| Gender | Purchase Count | Average Purchase Price | Total Purchase Value | Avg Total Purchase per Person |
|---|---|---|---|---|
| Female | 113 | $3.20 | $361.94 | $3.20 |
| Male | 652 | $3.02 | $1,967.64 | $3.02 |
| Other / Non-Disclosed | 15 | $3.35 | $50.19 | $3.35 |
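Because `Avg Total Purchase per Person` above divides by the number of purchases, it matches `Average Purchase Price`. A per-person average divides each gender's total by its number of distinct players instead. A sketch on a hypothetical sample:

```python
import pandas as pd

# Hypothetical sample: Ann1 (Female) bought three items; Bob2 and Cat3 (Male) bought three between them.
sample = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3", "Bob2", "Ann1"],
    "Gender": ["Female", "Male", "Female", "Male", "Male", "Female"],
    "Price": [1.50, 4.00, 1.50, 3.25, 2.75, 4.00],
})

grouped = sample.groupby("Gender")
summary = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": grouped["Price"].sum(),
})
# Divide by distinct players per gender rather than by purchases:
summary["Avg Total Purchase per Person"] = (
    summary["Total Purchase Value"] / grouped["SN"].nunique()
)
print(summary)
```

Here the Female average purchase price is about $2.33, but the one Female player spent $7.00 in total, which is what the per-person column reports.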


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [98]:
# Create a new DataFrame with only the Age, SN, and Price columns:
demographics_df = purchase_df[["Age", "SN", "Price"]]
demographics_df

| | Age | SN | Price |
|---|---|---|---|
| 0 | 20 | Lisim78 | 3.53 |
| 1 | 40 | Lisovynya38 | 1.56 |
| 2 | 24 | Ithergue48 | 4.88 |
| 3 | 24 | Chamassasya86 | 3.27 |
| 4 | 23 | Iskosia90 | 1.44 |
| ... | ... | ... | ... |
| 775 | 21 | Aethedru70 | 3.54 |
| 776 | 21 | Iral74 | 1.63 |
| 777 | 20 | Yathecal72 | 3.46 |
| 778 | 7 | Sisur91 | 4.19 |


In [100]:
# Select the columns with .loc and take an explicit copy, so that adding the
# 'Age Category' column later does not trigger a SettingWithCopyWarning.
# drop_duplicates() removes only rows identical in all three columns.
demographics_df = purchase_df.loc[:, ["Age", "SN", "Price"]].drop_duplicates().copy()
demographics_df

| | Age | SN | Price |
|---|---|---|---|
| 0 | 20 | Lisim78 | 3.53 |
| 1 | 40 | Lisovynya38 | 1.56 |
| 2 | 24 | Ithergue48 | 4.88 |
| 3 | 24 | Chamassasya86 | 3.27 |
| 4 | 23 | Iskosia90 | 1.44 |
| ... | ... | ... | ... |
| 775 | 21 | Aethedru70 | 3.54 |
| 776 | 21 | Iral74 | 1.63 |
| 777 | 20 | Yathecal72 | 3.46 |
| 778 | 7 | Sisur91 | 4.19 |


In [101]:
# Check the counts in each of the columns to see if they are the same:
demos_count = demographics_df.count()
demos_count

Age      779
SN       779
Price    779
dtype: int64

In [103]:
# Note: all three columns have the same count (779), so none of them has missing values.

In [104]:
# Use pd.cut() method to split age data into bins:
bins_age = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bins_age_labels = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']
pd.cut(demographics_df['Age'], bins=bins_age, labels=bins_age_labels).head()

0    20-24
1      40+
2    20-24
3    20-24
4    20-24
Name: Age, dtype: category
Categories (8, object): [<10 < 10-14 < 15-19 < 20-24 < 25-29 < 30-34 < 35-39 < 40+]
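`pd.cut` uses right-closed intervals by default (`right=True`), so with these edges the bin `(9, 14]` holds ages 10 through 14, and an age exactly equal to the lowest edge (0) falls outside every bin. A quick check with a few made-up ages:

```python
import pandas as pd

bins_age = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bins_age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Made-up ages chosen to sit on and next to the bin edges.
ages = pd.Series([7, 9, 10, 19, 20, 24, 25, 40, 45])
age_groups = pd.cut(ages, bins=bins_age, labels=bins_age_labels)
print(pd.DataFrame({"Age": ages, "Age Category": age_groups}))

# 0 is not inside (0, 9], so it becomes NaN unless include_lowest=True is passed.
age_zero = pd.cut(pd.Series([0]), bins=bins_age, labels=bins_age_labels)
age_zero_included = pd.cut(pd.Series([0]), bins=bins_age, labels=bins_age_labels,
                           include_lowest=True)
print(age_zero.isna().all(), age_zero_included[0])  # True <10
```

Likewise, the upper edge of 100 means any age above 100 would not be binned.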

In [105]:
# Add an 'Age Category' column that holds each row's age bin:
demographics_df["Age Category"] = pd.cut(demographics_df["Age"], bins=bins_age, labels=bins_age_labels)
demographics_df.head()

| | Age | SN | Price | Age Category |
|---|---|---|---|---|
| 0 | 20 | Lisim78 | 3.53 | 20-24 |
| 1 | 40 | Lisovynya38 | 1.56 | 40+ |
| 2 | 24 | Ithergue48 | 4.88 | 20-24 |
| 3 | 24 | Chamassasya86 | 3.27 | 20-24 |
| 4 | 23 | Iskosia90 | 1.44 | 20-24 |


In [107]:
# Re-index the binned data by 'Age Category', renaming 'SN' to 'Name' for display:
test2_df_indexed = (demographics_df
                    .rename(columns={"SN": "Name"})
                    .set_index("Age Category"))
test2_df_indexed

| Age Category | Age | Name | Price |
|---|---|---|---|
| 20-24 | 20 | Lisim78 | 3.53 |
| 40+ | 40 | Lisovynya38 | 1.56 |
| 20-24 | 24 | Ithergue48 | 4.88 |
| 20-24 | 24 | Chamassasya86 | 3.27 |
| 20-24 | 23 | Iskosia90 | 1.44 |
| ... | ... | ... | ... |
| 20-24 | 21 | Aethedru70 | 3.54 |
| 20-24 | 21 | Iral74 | 1.63 |
| 20-24 | 20 | Yathecal72 | 3.46 |
| <10 | 7 | Sisur91 | 4.19 |


In [108]:
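# Group the binned data by 'Age Category' (this gives a GroupBy object, not a
# DataFrame); head() previews the first rows of each group. observed=False
# keeps every bin as a group, even bins with no rows.
age_bins_count_df = demographics_df.groupby("Age Category", observed=False)
age_bins_count_df.head()

| | Age | SN | Price | Age Category |
|---|---|---|---|---|
| 0 | 20 | Lisim78 | 3.53 | 20-24 |
| 1 | 40 | Lisovynya38 | 1.56 | 40+ |
| 2 | 24 | Ithergue48 | 4.88 | 20-24 |
| 3 | 24 | Chamassasya86 | 3.27 | 20-24 |
| 4 | 23 | Iskosia90 | 1.44 | 20-24 |
| 5 | 22 | Yalae81 | 3.61 | 20-24 |
| 6 | 36 | Itheria73 | 2.18 | 35-39 |
| 9 | 35 | Chanosian48 | 3.58 | 35-39 |
| 14 | 35 | Saesrideu94 | 4.86 | 35-39 |
| 19 | 30 | Chamalo71 | 4.64 | 30-34 |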


In [110]:
# List the distinct ages that fell into each bin (a quick check of the bin edges):
bin_ages = age_bins_count_df["Age"].unique()
bin_ages

Age Category
<10                     [7, 8, 9]
10-14        [11, 10, 12, 13, 14]
15-19        [19, 18, 17, 15, 16]
20-24        [20, 24, 23, 22, 21]
25-29        [29, 27, 25, 26, 28]
30-34        [30, 33, 32, 34, 31]
35-39        [36, 35, 38, 37, 39]
40+      [40, 44, 41, 42, 43, 45]
Name: Age, dtype: object

In [111]:
# Remember the total number of distinct players (screen names):
players_unique = purchase_df["SN"].unique()
len(players_unique)

In [113]:
# Count distinct players (screen names) in each age bin and express each bin
# as a percentage of all distinct players (assumes each player has one Age value):
bin_counts = demographics_df.groupby("Age Category", observed=False)["SN"].nunique()
players_total = demographics_df["SN"].nunique()
players_bin_percentage = bin_counts / players_total * 100
print(players_total)
players_bin_percentage

In [None]:
# Create a summary table to display the Age Demographics, rounding the
# percentages to two decimal places:
test3_df = pd.DataFrame({"Total Players": bin_counts,
                         "Percentage of Players": players_bin_percentage.round(2)})
test3_df

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
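A possible sketch for this section, reusing the age bins defined earlier on a hypothetical sample. `observed=True` keeps only the bins that actually contain purchases; `observed=False` would list every bin:

```python
import pandas as pd

bins_age = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bins_age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical sample: three players of different ages.
sample = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3", "Bob2", "Ann1"],
    "Age": [20, 17, 20, 33, 17, 20],
    "Price": [1.50, 4.00, 1.50, 3.25, 2.75, 4.00],
})
sample["Age Category"] = pd.cut(sample["Age"], bins=bins_age, labels=bins_age_labels)

grouped = sample.groupby("Age Category", observed=True)
age_purchasing = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": grouped["Price"].sum(),
    # Per-person total: divide by distinct players in the bin, not by purchases.
    "Avg Total Purchase per Person": grouped["Price"].sum() / grouped["SN"].nunique(),
})
print(age_purchasing)
```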

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
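One way to build this table, sketched on a hypothetical sample: group by screen name, sort while the values are still numeric, and only then format a copy for display.

```python
import pandas as pd

# Hypothetical sample purchases.
sample = pd.DataFrame({
    "SN": ["Ann1", "Bob2", "Ann1", "Cat3", "Bob2", "Ann1"],
    "Price": [1.50, 4.00, 1.50, 3.25, 2.75, 4.00],
})

grouped = sample.groupby("SN")["Price"]
top_spenders = pd.DataFrame({
    "Purchase Count": grouped.count(),
    "Average Purchase Price": grouped.mean(),
    "Total Purchase Value": grouped.sum(),
}).sort_values("Total Purchase Value", ascending=False)

# Sort while the values are numeric; format a copy for display afterwards.
display_top = top_spenders.copy()
for col in ["Average Purchase Price", "Total Purchase Value"]:
    display_top[col] = display_top[col].map("${:,.2f}".format)
print(display_top.head())
```

Formatting first would sort the dollar strings as text, which for example ranks "$9.00" above "$10.00" in descending order.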



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
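A sketch of these steps on a hypothetical sample. Taking the first price in each group as the item price assumes every purchase of an item was made at the same price:

```python
import pandas as pd

# Hypothetical sample; each Item ID always has the same name and price.
sample = pd.DataFrame({
    "Item ID": [10, 11, 10, 12, 13, 11],
    "Item Name": ["Fury", "Nirvana", "Fury", "Final Critic", "Blindscythe", "Nirvana"],
    "Price": [1.50, 4.00, 1.50, 3.25, 2.75, 4.00],
})

items = sample[["Item ID", "Item Name", "Price"]]
grouped = items.groupby(["Item ID", "Item Name"])["Price"]
popular_items = pd.DataFrame({
    "Purchase Count": grouped.count(),
    "Item Price": grouped.first(),   # assumes a single price per item
    "Total Purchase Value": grouped.sum(),
}).sort_values("Purchase Count", ascending=False)
print(popular_items.head())
```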



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
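With the same item-level grouping, the most profitable items only need a different sort key. A sketch on a hypothetical sample, again assuming one price per item:

```python
import pandas as pd

# Hypothetical sample standing in for the purchase data.
sample = pd.DataFrame({
    "Item ID": [10, 11, 10, 12, 13, 11],
    "Item Name": ["Fury", "Nirvana", "Fury", "Final Critic", "Blindscythe", "Nirvana"],
    "Price": [1.50, 4.00, 1.50, 3.25, 2.75, 4.00],
})

grouped = sample.groupby(["Item ID", "Item Name"])["Price"]
items_summary = pd.DataFrame({
    "Purchase Count": grouped.count(),
    "Item Price": grouped.first(),   # assumes a single price per item
    "Total Purchase Value": grouped.sum(),
})

# Sort by revenue, then format a copy of the currency columns for display.
profitable_items = items_summary.sort_values("Total Purchase Value", ascending=False)
display_profitable = profitable_items.copy()
for col in ["Item Price", "Total Purchase Value"]:
    display_profitable[col] = display_profitable[col].map("${:,.2f}".format)
print(display_profitable.head())
```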



In [None]:
# Close the writer, which writes all sheets to the .xlsx file:
writer.close()