### Heroes Of Pymoli Data Analysis
* Of the 1163 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [3]:
# Import modules:
import pandas as pd
import csv
import os
import openpyxl

# Set parameters for input and output files:
path = os.path.join("Resources", "purchase_data.csv")
purchase_data = pd.read_csv(path)


## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [4]:
purchase_data_grouped_by_gender = purchase_data.groupby("Gender").count()
purchase_data_grouped_by_gender

Gender,Purchase ID,SN,Age,Item ID,Item Name,Price
Female,113,113,113,113,113,113
Male,652,652,652,652,652,652
Other / Non-Disclosed,15,15,15,15,15,15


In [5]:
# Count the rows in the SN column. This is the number of purchases, not of unique players:
players_total = purchase_data["SN"].count()
players_total

780

In [6]:
grouped_price_mean = purchase_data.groupby("Gender")["Price"].mean()
grouped_price_mean

Gender
Female                   3.203009
Male                     3.017853
Other / Non-Disclosed    3.346000
Name: Price, dtype: float64

In [16]:
purchasing_analysis_by_gender = pd.DataFrame({"Average Purchase Price": grouped_price_mean})
purchasing_analysis_by_gender

Gender,Average Purchase Price
Female,3.203009
Male,3.017853
Other / Non-Disclosed,3.346


In [17]:
grouped_purchase_count = purchase_data.groupby("Gender")["Purchase ID"].count()
grouped_purchase_count

Gender
Female                   113
Male                     652
Other / Non-Disclosed     15
Name: Purchase ID, dtype: int64

In [18]:
purchasing_analysis_by_gender = pd.DataFrame({"Purchase Count": grouped_purchase_count,
                         "Average Purchase Price": grouped_price_mean,
                         })
purchasing_analysis_by_gender

Gender,Purchase Count,Average Purchase Price
Female,113,3.203009
Male,652,3.017853
Other / Non-Disclosed,15,3.346


In [19]:
grouped_purchase_value_total = purchase_data.groupby("Gender")["Price"].sum()
grouped_purchase_value_total

Gender
Female                    361.94
Male                     1967.64
Other / Non-Disclosed      50.19
Name: Price, dtype: float64

In [20]:
purchasing_analysis_by_gender = pd.DataFrame({"Purchase Count": grouped_purchase_count,
                         "Average Purchase Price": grouped_price_mean,
                         "Total Purchase Value": grouped_purchase_value_total
                         })
purchasing_analysis_by_gender

Gender,Purchase Count,Average Purchase Price,Total Purchase Value
Female,113,3.203009,361.94
Male,652,3.017853,1967.64
Other / Non-Disclosed,15,3.346,50.19


In [21]:
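# Note: dividing by the purchase count gives the average per purchase, which is
# identical to the mean price above. A true per-person average would divide by
# the number of unique SN values in each gender group.
grouped_purchase_average_per_person = grouped_purchase_value_total / grouped_purchase_count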
grouped_purchase_average_per_person

Gender
Female                   3.203009
Male                     3.017853
Other / Non-Disclosed    3.346000
dtype: float64
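
The division above uses the purchase count, so it reproduces the average purchase price. A per-person figure would instead divide each gender's total by its number of distinct players. The following is a minimal sketch of that idea on a small made-up frame; the rows and the `sample` name are illustrative assumptions, and only the column names mirror `purchase_data.csv`.

```python
import pandas as pd

# Illustrative rows only (not taken from purchase_data.csv).
sample = pd.DataFrame({
    "SN": ["A", "A", "B", "C", "C", "D"],
    "Gender": ["Male", "Male", "Female", "Male", "Male", "Female"],
    "Price": [1.00, 3.00, 2.00, 4.00, 2.00, 5.00],
})

by_gender = sample.groupby("Gender")
total_value = by_gender["Price"].sum()
unique_players = by_gender["SN"].nunique()

# Total spent divided by the number of distinct players, not by purchases.
avg_total_per_person = total_value / unique_players
print(avg_total_per_person)
```

In this sample the two male players made four purchases between them, so the per-person average differs from the per-purchase mean.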

In [22]:
purchasing_analysis_by_gender = pd.DataFrame({"Purchase Count": grouped_purchase_count,
                         "Average Purchase Price": grouped_price_mean,
                         "Total Purchase Value": grouped_purchase_value_total,
                         "Avg Total Purchase per Person": grouped_purchase_average_per_person
                         })
purchasing_analysis_by_gender

Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Female,113,3.203009,361.94,3.203009
Male,652,3.017853,1967.64,3.017853
Other / Non-Disclosed,15,3.346,50.19,3.346


In [23]:
# Format the values for display:
purchasing_analysis_by_gender["Average Purchase Price"] = purchasing_analysis_by_gender["Average Purchase Price"].map("${:,.2f}".format)
purchasing_analysis_by_gender["Total Purchase Value"] = purchasing_analysis_by_gender["Total Purchase Value"].map("${:,.2f}".format)
purchasing_analysis_by_gender["Avg Total Purchase per Person"] = purchasing_analysis_by_gender["Avg Total Purchase per Person"].map("${:,.2f}".format)
purchasing_analysis_by_gender

Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Female,113,$3.20,$361.94,$3.20
Male,652,$3.02,"$1,967.64",$3.02
Other / Non-Disclosed,15,$3.35,$50.19,$3.35


In [24]:
# Save to an Excel file (requires openpyxl):
writer = pd.ExcelWriter("3_Purchasing_Analysis_by_Gender.xlsx")
# Write the purchase summary to its own sheet:
purchasing_analysis_by_gender.to_excel(writer, sheet_name='Purchasing_Analysis_by_Gender')
# The workbook is written to disk by writer.close() in the final cell;
# writer.save() was removed in pandas 2.0.

In [None]:
# Example of the table below

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
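
The steps above can be sketched end to end as follows. This is only one possible approach, shown on made-up rows: the `sample` frame is an assumption, and the sketch deduplicates by `SN` so that each player is counted once, which the instructions imply but do not spell out.

```python
import pandas as pd

# Illustrative rows only; player "A" appears twice but should count once.
sample = pd.DataFrame({
    "SN": ["A", "A", "B", "C", "D"],
    "Age": [8, 8, 22, 23, 41],
})

# One row per player.
players = sample.drop_duplicates(subset="SN")

bins_age = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bins_age_labels = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']
age_groups = pd.cut(players["Age"], bins=bins_age, labels=bins_age_labels)

# Count players per bin and convert to percentages of all players.
counts = age_groups.value_counts(sort=False)
age_demographics = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": (counts / len(players) * 100).round(2),
})
print(age_demographics)
```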


In [None]:
# Create a new dataframe with only the Age, SN and Price columns:
demographics_df = purchase_data[["Age", "SN", "Price"]]
demographics_df

In [None]:
# Select the columns with .loc instead; plain column indexing caused problems later when adding the binned column.
demographics_df = purchase_data.loc[:, ["Age", "SN", "Price"]]
demographics_df = demographics_df.drop_duplicates()
demographics_df

In [None]:
# Check the counts in each of the columns to see if they are the same:
demos_count = demographics_df.count()
demos_count

In [None]:
# Note: the columns are aligned (above, all counts are 780).

In [None]:
# Use pd.cut() method to split age data into bins:
bins_age = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bins_age_labels = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']
pd.cut(demographics_df['Age'], bins=bins_age, labels=bins_age_labels).head()

In [None]:
# Create a dataframe based on the cut demographics:
demographics_df['Age Category'] = pd.cut(demographics_df["Age"], bins=bins_age, labels=bins_age_labels)
demographics_df.head()

In [None]:
# Create a data frame indexed by age category:
test2_df = pd.DataFrame({'Age': demographics_df['Age'],
                           'Name': demographics_df['SN'],
                           'Age Category': demographics_df['Age Category'],
                           'Price': demographics_df['Price']})
test2_df_indexed = test2_df.set_index('Age Category')
test2_df_indexed

In [None]:
# Group data frame by 'Age Category'
age_bins_count_df = demographics_df.groupby("Age Category")
age_bins_count_df.head()

In [None]:
# Looks like need to group first to create iterable object:


In [None]:
# Count the unique players in each age category:
bin_counts = age_bins_count_df['SN'].nunique()
bin_counts

In [None]:
# Count the unique players (by screen name):
players_unique = purchase_data["SN"].nunique()
players_unique

In [None]:
# Calculate the percentage of unique players in each age group/bin.
# nunique() returns a plain integer, so no len() call is needed.
players_total = purchase_data['SN'].nunique()
print(players_total)
players_per_bin = age_bins_count_df['SN'].nunique()
players_bin_percentage = (players_per_bin * 100 / players_total).round(2)
players_bin_percentage

In [None]:
# Create a new summary table to display the Age Demographics:
test3_df = pd.DataFrame({'Total players': bin_counts,
                         'Percentage of Players': players_bin_percentage
                        })
test3_df

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
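
No code was written for this section, so the following is a hedged sketch of how the steps might look. It reuses the age bins defined in the Age Demographics section; the `sample` rows are made up for illustration, and `observed=True` is used so that empty age bins are left out of the summary.

```python
import pandas as pd

# Illustrative rows only; the column names mirror purchase_data.csv.
sample = pd.DataFrame({
    "SN": ["A", "A", "B", "C", "D"],
    "Age": [8, 8, 22, 23, 41],
    "Price": [1.00, 3.00, 2.00, 4.00, 5.00],
})

bins_age = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bins_age_labels = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']
sample["Age Category"] = pd.cut(sample["Age"], bins=bins_age, labels=bins_age_labels)

# Group purchases by age bin, keeping only bins that actually occur.
grouped = sample.groupby("Age Category", observed=True)
age_summary = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": grouped["Price"].sum(),
})
# Per-person average: total value divided by distinct players in the bin.
age_summary["Avg Total Purchase per Person"] = (
    age_summary["Total Purchase Value"] / grouped["SN"].nunique()
)
print(age_summary)
```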

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
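
One way to approach these steps is a single `groupby` on `SN` followed by a sort. The sketch below runs on made-up rows (the `sample` frame is an assumption) and uses the same column titles as the gender summary.

```python
import pandas as pd

# Illustrative rows only; the column names mirror purchase_data.csv.
sample = pd.DataFrame({
    "SN": ["A", "B", "A", "C", "B", "A"],
    "Price": [1.00, 4.00, 2.00, 2.50, 1.00, 3.00],
})

top_spenders = (
    sample.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    # Highest total spend first.
    .sort_values("Total Purchase Value", ascending=False)
)
print(top_spenders.head())
```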



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
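
The steps above might be sketched as follows. The item names and prices are hypothetical, and the sketch assumes each item has a single price, so the group mean is used as the item price.

```python
import pandas as pd

# Hypothetical items; only the column names mirror purchase_data.csv.
sample = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 1, 2],
    "Item Name": ["Sword", "Shield", "Sword", "Bow", "Sword", "Shield"],
    "Price": [2.00, 5.00, 2.00, 4.00, 2.00, 5.00],
})

items = sample[["Item ID", "Item Name", "Price"]]
grouped_items = items.groupby(["Item ID", "Item Name"])["Price"]

item_summary = pd.DataFrame({
    "Purchase Count": grouped_items.count(),
    "Item Price": grouped_items.mean(),  # assumes one price per item
    "Total Purchase Value": grouped_items.sum(),
})
most_popular = item_summary.sort_values("Purchase Count", ascending=False)
print(most_popular.head())
```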



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
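
A minimal sketch of this last step, assuming an item summary shaped like the one described in Most Popular Items. The `item_summary` values here are hypothetical. Formatting is applied to a copy so that the numeric columns remain sortable.

```python
import pandas as pd

# Hypothetical item summary indexed by (Item ID, Item Name).
item_summary = pd.DataFrame(
    {
        "Purchase Count": [3, 2, 1],
        "Item Price": [2.0, 5.0, 4.0],
        "Total Purchase Value": [6.0, 10.0, 4.0],
    },
    index=pd.MultiIndex.from_tuples(
        [(1, "Sword"), (2, "Shield"), (3, "Bow")],
        names=["Item ID", "Item Name"],
    ),
)

# Sort on the numeric column first, then format a copy for display.
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)
display_table = most_profitable.copy()
for column in ["Item Price", "Total Purchase Value"]:
    display_table[column] = display_table[column].map("${:,.2f}".format)
print(display_table.head())
```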



In [None]:
writer.close()