### Heroes Of Pymoli Data Analysis
* Of the 780 purchases recorded (made by 576 unique players), the vast majority come from male players (84%). A smaller but notable share comes from female players (14%).

* The peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [210]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load 
file_to_load = "Resources/purchase_data.csv"

# Read the purchasing file into a pandas DataFrame
purchase_data_df = pd.read_csv(file_to_load, low_memory=False)
purchase_data_df.head()  # Show the first 5 rows of data


Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [211]:
# Print the full DataFrame and confirm its type
print(purchase_data_df)
print(type(purchase_data_df))  # Note the data type and columns

     Purchase ID               SN  Age                 Gender  Item ID  \
0              0          Lisim78   20                   Male      108   
1              1      Lisovynya38   40                   Male      143   
2              2       Ithergue48   24                   Male       92   
3              3    Chamassasya86   24                   Male      100   
4              4        Iskosia90   23                   Male      131   
5              5          Yalae81   22                   Male       81   
6              6        Itheria73   36                   Male      169   
7              7      Iskjaskst81   20                   Male      162   
8              8        Undjask33   22                   Male       21   
9              9      Chanosian48   35  Other / Non-Disclosed      136   
10            10        Inguron55   23                   Male       95   
11            11     Haisrisuir60   23                   Male      162   
12            12     Saelaephos52   21

## Player Count

* Display the total number of players


In [212]:
# Get player data
playercount = len(purchase_data_df["SN"].unique())  # Number of unique screen names (SN)

# Show results
playercount_disp = pd.DataFrame({"Total SN":[playercount]}) 
playercount_disp

Unnamed: 0,Total SN
0,576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [213]:
# Calculate summary numbers from the purchase data
Item_ID_count = len(purchase_data_df['Item ID'].unique())  # Number of unique items
Price_Avg = purchase_data_df['Price'].mean()  # Average price across all purchases
Number_Purchase = len(purchase_data_df)  # Number of purchases
Total_Revenue = purchase_data_df['Price'].sum()  # Total revenue

# Show result in table
summary_disp = pd.DataFrame({
    'Total Item ID': [Item_ID_count],
    'Average_Price': [Price_Avg],
    'Total Purchase': [Number_Purchase],
    'Total Revenue': [Total_Revenue],
}).style.format({'Total Revenue': "{:.2f}"})
summary_disp

Unnamed: 0,Total Item ID,Average_Price,Total Purchase,Total Revenue
0,183,3.05099,780,2379.77


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [214]:
# Gender Demographics
# Note: purchase_data_df has one row per purchase, so value_counts() here
# counts purchases per gender, not unique players
gender_data = purchase_data_df[['Purchase ID', 'Gender']]
gender_count = gender_data['Gender'].value_counts()
gender_tot = gender_count.sum()
percent = gender_count / gender_tot * 100

# Show results
gender_result = pd.DataFrame({'count': gender_count, 'percent':percent})
gender_result.index.name = None
gender_result.sort_values(['count'], ascending = False).style.format({'percent':"{:.2f}"})

Unnamed: 0,count,percent
Male,652,83.59
Female,113,14.49
Other / Non-Disclosed,15,1.92
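
The table above counts purchases, so a player who buys several items is counted several times. A minimal sketch of counting each player once is shown here, assuming every `SN` maps to a single gender. The `sample_df` frame and its values are hypothetical stand-ins for `purchase_data_df`, not real results.

```python
import pandas as pd

# Hypothetical sample standing in for purchase_data_df (illustrative values only)
sample_df = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerC", "PlayerD"],
    "Gender": ["Male", "Male", "Female", "Male", "Male", "Other / Non-Disclosed"],
    "Price": [3.53, 1.56, 4.88, 3.27, 1.44, 2.10],
})

# Keep one row per player so repeat buyers are counted once
players = sample_df.drop_duplicates(subset="SN")

# Count and percentage of unique players per gender
player_counts = players["Gender"].value_counts()
player_percent = player_counts / player_counts.sum() * 100

gender_players = pd.DataFrame({"count": player_counts, "percent": player_percent.round(2)})
print(gender_players)
```

Applied to the real data, the counts would sum to the number of unique players rather than to the number of purchases.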



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender

* Create a summary data frame to hold the results

* Optional: give the displayed data cleaner formatting

* Display the summary data frame

In [215]:
# Calculate series
gender_df = purchase_data_df[['Purchase ID','Gender','Price','SN']]
grouped_gender_df = gender_df.groupby(['Gender'])
count = purchase_data_df['Gender'].value_counts()
avg_price = grouped_gender_df['Price'].mean()
price_total = grouped_gender_df['Price'].sum()
# See result table at end

In [216]:
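# Create DataFrames
SN_df = purchase_data_df[['SN','Price']]
Gen_df = purchase_data_df[['Gender','SN']]

# Merge the two DataFrames on SN using a right join
# Note: both frames have one row per purchase, so a player with n purchases
# produces n*n rows (see Lisim78 in the output), which would distort any average
merge_table = pd.merge(SN_df, Gen_df, how='right', on='SN')
merge_table 

# Stuck: could not group into an average per Gender
# SN_Gen = merge_table.groupby('Gender')
# SN_price = SN_Gen['Price'].mean()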

Unnamed: 0,SN,Price,Gender
0,Lisim78,3.53,Male
1,Lisim78,4.74,Male
2,Lisim78,1.75,Male
3,Lisim78,3.53,Male
4,Lisim78,4.74,Male
5,Lisim78,1.75,Male
6,Lisim78,3.53,Male
7,Lisim78,4.74,Male
8,Lisim78,1.75,Male
9,Lisovynya38,1.56,Male


In [218]:
# Show result ('Price per Person' is a placeholder of 0; it has not been computed)
gender_result1 = pd.DataFrame({'Purchases': count, 'Average Price': avg_price, 'Total Price': price_total, 'Price per Person':0})
# gender_result1.sort_values(['Purchases'], ascending = False).style.format({'Average Price':"{:.2f}"})
gender_result1

Unnamed: 0,Purchases,Average Price,Total Price,Price per Person
Female,113,3.203009,361.94,0
Male,652,3.017853,1967.64,0
Other / Non-Disclosed,15,3.346,50.19,0
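
One possible way to fill in the average purchase total per person, without merging, is to sum each player's spending first and then average those totals within each gender. This is only a sketch under the assumption that each `SN` has a single gender. `sample_df` and its values are hypothetical, not taken from the purchase file.

```python
import pandas as pd

# Hypothetical sample standing in for purchase_data_df (illustrative values only)
sample_df = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerC", "PlayerD"],
    "Gender": ["Male", "Male", "Female", "Male", "Male", "Other / Non-Disclosed"],
    "Price": [3.53, 1.56, 4.88, 3.27, 1.44, 2.10],
})

# Step 1: total spent by each player, keeping gender in the index
per_player_total = sample_df.groupby(["Gender", "SN"])["Price"].sum()

# Step 2: average of those per-player totals within each gender
avg_total_per_person = per_player_total.groupby(level="Gender").mean()

print(avg_total_per_person.round(2))
```

The resulting series is indexed by gender, so it could replace the `0` placeholder in a summary frame that uses the same index.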


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [None]:
# Select the columns needed for the age breakdown (the Age column is required)
age_df = purchase_data_df[['SN', 'Age']]

# Bin edges and matching labels; pd.cut intervals are right-inclusive by default
group_names = ['<10', '10-13', '14-17', '18-21', '22-25', '26-29', '30-33', '34-37', '38-40', '>40']
bins = [0, 9, 13, 17, 21, 25, 29, 33, 37, 40, 80]
age_demo_grp = age_df.groupby(pd.cut(age_df["Age"], bins, labels=group_names))
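
The remaining steps (counts and percentages by age group) could be sketched as follows, reusing the same bins and labels. The sketch assumes each player should be counted once and that ages fall in the range 1 to 80. `sample_df` is a hypothetical stand-in for the purchase data.

```python
import pandas as pd

# Hypothetical sample standing in for purchase_data_df (illustrative values only)
sample_df = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC", "PlayerD", "PlayerE"],
    "Age": [20, 20, 24, 9, 40, 41],
})

bins = [0, 9, 13, 17, 21, 25, 29, 33, 37, 40, 80]
group_names = ['<10', '10-13', '14-17', '18-21', '22-25', '26-29', '30-33', '34-37', '38-40', '>40']

# One row per player, then assign each player to an age group
players = sample_df.drop_duplicates(subset="SN").copy()
players["Age Group"] = pd.cut(players["Age"], bins=bins, labels=group_names)

# Count players per group (sort=False keeps the bin order) and convert to percentages
age_counts = players["Age Group"].value_counts(sort=False)
age_percent = (age_counts / age_counts.sum() * 100).round(2)

age_demo = pd.DataFrame({"count": age_counts, "percent": age_percent})
print(age_demo)
```

Because `pd.cut` uses right-inclusive intervals by default, an age of 9 lands in `<10` and an age of 40 lands in `38-40`, which matches the labels.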


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
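
The steps listed above might look like this in outline. It is a sketch rather than a finished solution: `sample_df` and its values are hypothetical, and the bins are simply carried over from the Age Demographics section.

```python
import pandas as pd

# Hypothetical sample standing in for purchase_data_df (illustrative values only)
sample_df = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC"],
    "Age": [20, 20, 24, 20],
    "Price": [3.00, 1.00, 4.00, 2.00],
})

bins = [0, 9, 13, 17, 21, 25, 29, 33, 37, 40, 80]
group_names = ['<10', '10-13', '14-17', '18-21', '22-25', '26-29', '30-33', '34-37', '38-40', '>40']

# Bin every purchase by the buyer's age
sample_df["Age Group"] = pd.cut(sample_df["Age"], bins=bins, labels=group_names)

# observed=True drops age groups that have no purchases
grouped = sample_df.groupby("Age Group", observed=True)

age_summary = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": grouped["Price"].sum(),
})

# Total value divided by the number of unique players in each group
age_summary["Avg Total per Person"] = age_summary["Total Purchase Value"] / grouped["SN"].nunique()

print(age_summary.round(2))
```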

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
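
A minimal sketch of the top-spenders table, grouping by `SN` and sorting by total purchase value. The data in `sample_df` is hypothetical and only illustrates the shape of the result.

```python
import pandas as pd

# Hypothetical sample standing in for purchase_data_df (illustrative values only)
sample_df = pd.DataFrame({
    "SN": ["PlayerA", "PlayerA", "PlayerB", "PlayerC"],
    "Price": [3.00, 2.00, 4.00, 1.50],
})

# Group purchases by player
grouped = sample_df.groupby("SN")["Price"]

top_spenders = pd.DataFrame({
    "Purchase Count": grouped.count(),
    "Average Purchase Price": grouped.mean(),
    "Total Purchase Value": grouped.sum(),
}).sort_values("Total Purchase Value", ascending=False)

# Preview the highest spenders
print(top_spenders.head())
```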



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
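
The grouping described above could be sketched as follows. Taking the mean of `Price` as the item price assumes each item always sells at the same price. The item names and values in `sample_df` are hypothetical.

```python
import pandas as pd

# Hypothetical sample standing in for purchase_data_df (illustrative values only)
sample_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 1.25],
})

# Group by both ID and name so the name appears in the result index
grouped = sample_df.groupby(["Item ID", "Item Name"])["Price"]

item_summary = pd.DataFrame({
    "Purchase Count": grouped.count(),
    "Item Price": grouped.mean(),  # assumes a constant price per item
    "Total Purchase Value": grouped.sum(),
})

# Most popular items first
most_popular = item_summary.sort_values("Purchase Count", ascending=False)
print(most_popular.head())
```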



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
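
Sorting the same kind of item summary by total purchase value gives the most profitable items. The sketch below rebuilds a small summary so it can run on its own. The data is hypothetical, chosen so that the most popular item is not the most profitable one.

```python
import pandas as pd

# Hypothetical sample standing in for purchase_data_df (illustrative values only)
sample_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 1.25],
})

# Build the item summary with count, mean and sum in one aggregation
item_summary = (
    sample_df.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

# Most profitable items first
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)
print(most_profitable.head())
```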

