### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic falls between 20-24 (44.8%), with secondary groups falling between 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [35]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


## Player Count

* Display the total number of players


In [2]:
#Total players in purchase data: keep one row per screen name (SN)
#drop_duplicates defaults: keep='first', inplace=False
df_unique_players = purchase_data.drop_duplicates(subset="SN", keep='first', inplace=False)
total_unique_players = len(df_unique_players.SN)
total_unique_players


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
#Number of unique items
item_array = purchase_data['Item ID'].unique()
item_i = item_array.size
item_i

183

In [4]:
#Average price across unique items (first listed price per Item ID)
df = purchase_data[['Item ID', 'Price']]
df_unique = df.drop_duplicates(subset='Item ID')
price_sum = df_unique['Price'].sum()
Avg_unique_item_price = price_sum/item_i
Avg_unique_item_price


3.0433879781420767

In [5]:
#total purchases
total_purchases_i = purchase_data['Purchase ID'].count()
total_purchases_i

780

In [6]:
#Total revenue
total_rev = purchase_data['Price'].sum()
total_rev


2379.77

In [57]:
#Purchase Summary DF ---> DROP INDEX COL!
Purchase_Summary_Raw = {
    "Total_Players": [total_unique_players],
    "Items": [item_i],
    "Avg_Item_Cost": [Avg_unique_item_price],
    "Total_Purchases": [total_purchases_i],
    "Total_Revenue": [total_rev]  
}
Purchase_Summary_df = pd.DataFrame(Purchase_Summary_Raw, columns=["Total_Players", "Items", "Avg_Item_Cost", "Total_Purchases", "Total_Revenue"])
Purchase_Summary_dfr = Purchase_Summary_df.round(decimals=2)
Purchase_Summary_dfr

Unnamed: 0,Total_Players,Items,Avg_Item_Cost,Total_Purchases,Total_Revenue
0,576,183,3.04,780,2379.77


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [8]:
gender_series = df_unique_players['Gender'].value_counts()
gender_series

Male                     484
Female                    81
Other / Non-Disclosed     11
Name: Gender, dtype: int64

In [9]:
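#look up counts by label rather than position, so the result does not depend on sort order
male_count = gender_series.loc['Male']
female_count = gender_series.loc['Female']
other_count = gender_series.loc['Other / Non-Disclosed']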

display(
    male_count,
    female_count,
    other_count,
    total_unique_players
)

484

81

11

576

In [10]:
male_percent = 100*(male_count / total_unique_players)
female_percent = 100*(female_count / total_unique_players)
other_percent = 100*(other_count / total_unique_players)

display(
    male_percent,
    female_percent,
    other_percent
)

84.02777777777779

14.0625

1.9097222222222223

In [54]:
Gender_Summary_Raw = {
    "Gender": ["Male", "Female", "Other"],
    "Number_of_Players": [male_count, female_count, other_count],
    "Population_Percentage": [male_percent, female_percent, other_percent]
}
Gender_Summary_df = pd.DataFrame(Gender_Summary_Raw, columns=["Gender", "Number_of_Players", "Population_Percentage"])
Gender_Summary_dfr = Gender_Summary_df.round(decimals=2)
Gender_Summary_dfr

Unnamed: 0,Gender,Number_of_Players,Population_Percentage
0,Male,484,84.03
1,Female,81,14.06
2,Other,11,1.91
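
The counts and percentages above can also be produced in one step with `value_counts(normalize=True)`. The following is a minimal sketch, not the notebook's own method; the `players` frame and its rows are made up for illustration and stand in for `df_unique_players`.

```python
import pandas as pd

# Hypothetical stand-in for df_unique_players (one row per player, made-up data)
players = pd.DataFrame({
    "SN": ["A1", "B2", "C3", "D4", "E5"],
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
})

# Absolute counts per gender
counts = players["Gender"].value_counts()

# normalize=True returns fractions of the total, so multiply by 100 for percent
percent = (100 * players["Gender"].value_counts(normalize=True)).round(2)

gender_summary = pd.DataFrame({
    "Number_of_Players": counts,
    "Population_Percentage": percent,
})
print(gender_summary)
```

Because both columns are indexed by gender label, pandas aligns them automatically and no positional lookups are needed.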



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [36]:
#gendered dataframes
male_purchase = purchase_data[purchase_data['Gender'] == 'Male']
female_purchase = purchase_data[purchase_data['Gender'] == 'Female']
other_purchase = purchase_data[purchase_data['Gender'] == 'Other / Non-Disclosed']

In [37]:
#gendered purchase counts
male_purchase_count = len(male_purchase['Purchase ID'])
female_purchase_count = len(female_purchase['Purchase ID'])
other_purchase_count = len(other_purchase['Purchase ID'])

display(
    male_purchase_count,
    female_purchase_count,
    other_purchase_count
)

652

113

15

In [39]:
#gendered revenue, then gendered avg purchase price (revenue / purchase count)
male_revenue = male_purchase['Price'].sum()
female_revenue = female_purchase['Price'].sum()
other_revenue = other_purchase['Price'].sum()

male_avg_pur_val = male_revenue / male_purchase_count
female_avg_pur_val = female_revenue / female_purchase_count
other_avg_pur_val = other_revenue / other_purchase_count

display(
    male_avg_pur_val,
    female_avg_pur_val,
    other_avg_pur_val   
)


3.0178527607361967

3.203008849557522

3.3459999999999996

In [45]:
#gendered avg purchase number (purchase count / players)
male_avg_pur_count = male_purchase_count / male_count
female_avg_pur_count = female_purchase_count / female_count
other_avg_pur_count = other_purchase_count / other_count

display(
    male_avg_pur_count,
    female_avg_pur_count,
    other_avg_pur_count
)

1.3471074380165289

1.3950617283950617

1.3636363636363635

In [55]:
#Gender Dataframe
Gender_Purchases_Raw = {
    "Gender": ["Male", "Female", "Other"],
    "Purchases_Made": [male_purchase_count, female_purchase_count, other_purchase_count],
    "Average_Number_Of_Purchases": [male_avg_pur_count, female_avg_pur_count, other_avg_pur_count],
    "Average_Cost_Per_Purchase": [male_avg_pur_val, female_avg_pur_val, other_avg_pur_val],
    "Total_Revenue": [male_revenue, female_revenue, other_revenue]
}
Gender_Purchases_df = pd.DataFrame(Gender_Purchases_Raw, columns=["Gender", "Purchases_Made", "Average_Number_Of_Purchases", "Average_Cost_Per_Purchase", "Total_Revenue"])
Gender_Purchases_dfr = Gender_Purchases_df.round(decimals=2)
Gender_Purchases_dfr

Unnamed: 0,Gender,Purchases_Made,Average_Number_Of_Purchases,Average_Cost_Per_Purchase,Total_Revenue
0,Male,652,1.35,3.02,1967.64
1,Female,113,1.4,3.2,361.94
2,Other,15,1.36,3.35,50.19
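
The same per-gender table can be built without three separate filtered frames by grouping once. This is only a sketch of an alternative under stated assumptions: the `sample` frame is made-up data standing in for `purchase_data`, and the helper column `Players` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for purchase_data (made-up rows)
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
    "Price": [1.00, 3.00, 2.50, 4.00, 3.50],
})

# Named aggregation: one output column per (source column, function) pair
by_gender = sample.groupby("Gender").agg(
    Purchases_Made=("Price", "count"),
    Average_Cost_Per_Purchase=("Price", "mean"),
    Total_Revenue=("Price", "sum"),
    Players=("SN", "nunique"),  # unique players per gender (illustrative helper column)
)

# Average number of purchases per player within each gender
by_gender["Average_Number_Of_Purchases"] = (
    by_gender["Purchases_Made"] / by_gender["Players"]
)
by_gender = by_gender.round(2)
print(by_gender)
```

Counting unique `SN` values inside the group keeps the player count and the purchase count tied to the same rows, which avoids mismatches between separately computed variables.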


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
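
The steps above could be sketched as follows. This is an illustrative outline rather than a finished answer: the `sample` rows are made up, and the bin edges and labels are assumptions chosen to give five-year groups.

```python
import pandas as pd

# Hypothetical stand-in for purchase_data (made-up rows)
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4", "E5"],
    "Age": [9, 9, 17, 22, 23, 41],
})

# One row per player so repeat buyers are not double-counted
players = sample.drop_duplicates(subset="SN").copy()

# Assumed bin edges and labels; right=False makes each bin [low, high)
bins = [0, 10, 15, 20, 25, 30, 35, 40, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
players["Age_Group"] = pd.cut(players["Age"], bins=bins, labels=labels, right=False)

# observed=False keeps empty age groups in the result with a count of 0
counts = players.groupby("Age_Group", observed=False).size()

age_summary = pd.DataFrame({
    "Number_of_Players": counts,
    "Population_Percentage": (100 * counts / len(players)).round(2),
})
print(age_summary)
```

Dropping duplicate screen names before binning matters here: percentages are meant to describe players, not purchases.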


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
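
One possible way to carry out these steps is shown below. It is a sketch under assumptions: the `sample` rows are invented, the bins and labels are the same assumed five-year groups, and the column names in the summary are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for purchase_data (made-up rows)
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4", "E5"],
    "Age": [22, 22, 17, 23, 41, 18],
    "Price": [1.00, 3.00, 2.50, 4.00, 3.50, 1.50],
})

# Assumed bin edges and labels; right=False makes each bin [low, high)
bins = [0, 10, 15, 20, 25, 30, 35, 40, 200]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
sample["Age_Group"] = pd.cut(sample["Age"], bins=bins, labels=labels, right=False)

# observed=True drops age groups with no purchases (avoids NaN averages)
age_purchases = sample.groupby("Age_Group", observed=True).agg(
    Purchase_Count=("Price", "count"),
    Average_Purchase_Price=("Price", "mean"),
    Total_Purchase_Value=("Price", "sum"),
    Unique_Players=("SN", "nunique"),
)

# Average total spent per person = group revenue / unique players in the group
age_purchases["Avg_Total_Per_Person"] = (
    age_purchases["Total_Purchase_Value"] / age_purchases["Unique_Players"]
)
age_purchases = age_purchases.round(2)
print(age_purchases)
```

Note that this table is binned on every purchase row, whereas the demographics table is binned on unique players.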

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
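
A minimal sketch of the top-spenders table, assuming made-up purchase rows in a `sample` frame that stands in for `purchase_data`; the output column names are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for purchase_data (made-up rows)
sample = pd.DataFrame({
    "SN": ["A1", "B2", "A1", "C3", "B2", "A1"],
    "Price": [1.00, 4.50, 3.00, 2.00, 4.00, 2.00],
})

top_spenders = (
    sample.groupby("SN")["Price"]
    # Named aggregation on a single column: output name = function
    .agg(Purchase_Count="count", Average_Purchase_Price="mean", Total_Purchase_Value="sum")
    # Highest total spend first
    .sort_values("Total_Purchase_Value", ascending=False)
    .round(2)
)
print(top_spenders.head())
```

Sorting before rounding or formatting keeps the order based on the numeric totals rather than on display strings.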



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
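
The grouping described above might look like the sketch below. The item names and prices are invented for illustration, and taking the mean as the item price assumes each item is sold at a single price.

```python
import pandas as pd

# Hypothetical stand-in for purchase_data (made-up items and prices)
sample = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 1, 2],
    "Item Name": ["Sword", "Axe", "Sword", "Bow", "Sword", "Axe"],
    "Price": [2.00, 5.00, 2.00, 4.00, 2.00, 5.00],
})

# Keep only the columns needed for the item analysis
items = sample[["Item ID", "Item Name", "Price"]]

item_summary = (
    items.groupby(["Item ID", "Item Name"])["Price"]
    # mean is used as the item price, assuming one price per item
    .agg(Purchase_Count="count", Item_Price="mean", Total_Purchase_Value="sum")
)

# Most purchased items first
most_popular = item_summary.sort_values("Purchase_Count", ascending=False)
print(most_popular.head())
```

Grouping by both ID and name keeps the name visible in the index without a separate merge.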



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
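
As a rough sketch, the item table can be re-sorted by total purchase value and then formatted for display. The data below is made up and the summary is rebuilt so the example runs on its own; currency formatting is applied to a copy so the numeric table stays sortable.

```python
import pandas as pd

# Hypothetical stand-in for purchase_data (made-up items and prices)
sample = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 1, 2],
    "Item Name": ["Sword", "Axe", "Sword", "Bow", "Sword", "Axe"],
    "Price": [2.00, 5.00, 2.00, 4.00, 2.00, 5.00],
})

item_summary = (
    sample.groupby(["Item ID", "Item Name"])["Price"]
    .agg(Purchase_Count="count", Item_Price="mean", Total_Purchase_Value="sum")
)

# Highest revenue items first
most_profitable = item_summary.sort_values("Total_Purchase_Value", ascending=False)

# Format a display copy; the formatted columns become strings
display_table = most_profitable.copy()
for col in ["Item_Price", "Total_Purchase_Value"]:
    display_table[col] = display_table[col].map("${:,.2f}".format)
print(display_table.head())
```

Formatting last is a deliberate choice: once values are strings such as `$10.00`, sorting them would compare text rather than numbers.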

