### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.79%), with secondary groups at 15-19 (18.58%) and 25-29 (13.37%).  
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

In [2]:
purchase_data.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


## Player Count

* Display the total number of players


In [3]:
player_count = len(purchase_data["SN"].unique())


In [4]:
sum_table = pd.DataFrame({"Player Count": [player_count]})

In [5]:
sum_table 

Unnamed: 0,Player Count
0,576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [6]:
unique_items = len(purchase_data["Item ID"].unique())

unique_items

183

In [7]:
# Average price = total revenue / number of purchases (mean of the Price column)
average_price = purchase_data["Price"].mean()

average_price_c = format(average_price,",.2f")
average_price_currency= "$" + average_price_c
average_price_currency

'$3.05'

In [None]:
purchase_number = purchase_data["Purchase ID"].count()
purchase_number

In [None]:
total_revenue = purchase_data["Price"].sum()

total_revenue_c = format(total_revenue,",.2f")

total_revenue_currency= "$" + total_revenue_c
total_revenue_currency

In [None]:
purchasing_analysis_table = pd.DataFrame({"Number of Unique Items": [unique_items],
                                          "Average Price": [average_price_currency ],
                                          "Number of Purchases": [purchase_number],
                                          "Total Revenue": [total_revenue_currency ]})
purchasing_analysis_table 

## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [None]:
unique_player_list= purchase_data.drop_duplicates(subset='SN', keep= 'first')
print(unique_player_list)






In [None]:
male_count=unique_player_list[unique_player_list['Gender'] == 'Male'].count().iloc[0]
male_count

In [None]:
female_count=unique_player_list[unique_player_list['Gender'] == 'Female'].count().iloc[0]
female_count

In [None]:
non_disclosed_count= unique_player_list[unique_player_list['Gender'] == 'Other / Non-Disclosed'].count().iloc[0]
non_disclosed_count

In [None]:
# Percentages are stored as plain numbers so each table cell holds a value, not a list
male_percentage = round((male_count/player_count)*100, 2)
male_percentage

In [None]:
female_percentage = round((female_count/player_count)*100, 2)
female_percentage

In [None]:
non_disclosed_percentage = round((non_disclosed_count/player_count)*100, 2)
non_disclosed_percentage

In [None]:
gender_count= [male_count,female_count, non_disclosed_count]
gender_percent = [male_percentage,female_percentage, non_disclosed_percentage]
table_column_labels = ['Total Count', 'Percentage of Players']
table_row_labels =['Male','Female','Other / Non-Disclosed']


In [None]:
gender_demographics_table= pd.DataFrame(dict(zip(table_column_labels,[gender_count, gender_percent])),index=table_row_labels,columns= table_column_labels)
gender_demographics_table

## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [None]:
gender_row_labels = ['Male', 'Female', 'Other / Non-Disclosed']


In [None]:
# Per-purchase statistics by gender (select 'Price' before aggregating so text columns are ignored)
purchase_count = purchase_data.groupby('Gender')['Price'].count()
avg_purchase_price = purchase_data.groupby('Gender')['Price'].mean()
total_purchase_value = purchase_data.groupby('Gender')['Price'].sum()
# Total spent by each player, indexed by (Gender, SN)
total_per_player = purchase_data.groupby(['Gender', 'SN'])['Price'].sum()

info_by_gender = {}

for gender in gender_row_labels:
    info_by_gender[gender] = {
        'Purchase Count': purchase_count[gender],
        'Average Purchase Price': round(avg_purchase_price[gender], 2),
        'Total Purchase Value': round(total_purchase_value[gender], 2),
        # Mean of the per-player totals within this gender
        'Avg Total Purchase per Person': round(total_per_player[gender].mean(), 2),
    }

# Transpose so that each gender becomes a row
t_table = pd.DataFrame(info_by_gender).transpose()
t_table.index.name = 'Gender'
t_table


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [25]:
# Labels for the age bins
bin_ages = ['<10','10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

# Assign each purchase to an age bin; edges are right-inclusive: (0, 9], (9, 14], ..., (39, 999]
purchase_data['Sorted groups'] = pd.cut(purchase_data['Age'], bins=[0,9,14,19,24,29,34,39,999], labels=bin_ages)





In [26]:
# Count each player once per age group (unique SN values in each bin)
players_per_group = purchase_data.groupby('Sorted groups', observed=False)['SN'].nunique()

age_group_sorted = pd.Series([players_per_group[label] for label in bin_ages], index=bin_ages)

# Percentage of all unique players (player_count) that fall in each age group
age_group_percentages = round((age_group_sorted/player_count)*100,2)

# Column names and the data that goes under each
column_names = ['Total Count', 'Percentage of Players']
column_populated_data = [age_group_sorted, age_group_percentages]

In [27]:
# Build the summary data frame by pairing each column name with its data
age_demographics = pd.DataFrame(dict(zip(column_names, column_populated_data)), index=bin_ages)

age_demographics

Unnamed: 0,Total Count,Percentage of Players
<10,17,2.95
10-14,22,3.82
15-19,107,18.58
20-24,258,44.79
25-29,77,13.37
30-34,52,9.03
35-39,31,5.38
40+,12,2.08


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
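
The steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's own solution: the `sample` rows are made up to stand in for `purchase_data`, and the column names `Age Group` and `Unique Players` are assumptions chosen for this example. The bin edges and labels are the ones used in the Age Demographics section.

```python
import pandas as pd

# Hypothetical rows standing in for purchase_data (not taken from the real CSV)
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4"],
    "Age": [22, 22, 17, 31, 45],
    "Price": [3.00, 1.00, 2.50, 4.00, 1.50],
})

# Same bin edges and labels as the Age Demographics section
bin_edges = [0, 9, 14, 19, 24, 29, 34, 39, 999]
bin_ages = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']
sample["Age Group"] = pd.cut(sample["Age"], bins=bin_edges, labels=bin_ages)

# observed=True keeps only the age groups that actually appear in the data
by_age = sample.groupby("Age Group", observed=True)

age_purchases = pd.DataFrame({
    "Purchase Count": by_age["Price"].count(),
    "Average Purchase Price": by_age["Price"].mean().round(2),
    "Total Purchase Value": by_age["Price"].sum(),
    "Unique Players": by_age["SN"].nunique(),
})

# Average total per person = group revenue divided by unique players in the group
age_purchases["Avg Total Purchase per Person"] = (
    age_purchases["Total Purchase Value"] / age_purchases["Unique Players"]
).round(2)

age_purchases
```

Dividing by unique players rather than by purchase count is what separates the per-person average from the per-purchase average.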

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
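
One possible way to approach these steps is sketched below. The `sample` rows are invented for illustration and only mimic the `SN` and `Price` columns of `purchase_data`. The output column names are assumptions modelled on the earlier summary tables.

```python
import pandas as pd

# Hypothetical rows standing in for purchase_data (not taken from the real CSV)
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Price": [3.00, 1.00, 2.50, 4.00, 1.50, 2.00],
})

# One row per player: number of purchases, mean price, and total spent
top_spenders = (
    sample.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    # Sort while the values are still numeric, before any currency formatting
    .sort_values("Total Purchase Value", ascending=False)
)

# Preview the biggest spenders
top_spenders.head()
```

Sorting before formatting matters: once prices are converted to strings such as `'$7.50'`, they sort alphabetically instead of numerically.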



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
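
A rough sketch of these steps follows. The `sample` rows, item IDs, and item names are invented stand-ins for `purchase_data`. Taking the `first` price in each group assumes that every item sells at a single price, which may not hold in the real data.

```python
import pandas as pd

# Hypothetical rows standing in for purchase_data (not taken from the real CSV)
sample = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3],
    "Item Name": ["Item A", "Item A", "Item B", "Item C", "Item C", "Item C"],
    "Price": [3.50, 3.50, 1.50, 4.00, 4.00, 4.00],
})

# Keep only the columns needed for the item analysis
items = sample[["Item ID", "Item Name", "Price"]]

# One row per item: purchase count, price, and total revenue
item_summary = (
    items.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "first", "sum"])  # "first" assumes one price per item
    .rename(columns={
        "count": "Purchase Count",
        "first": "Item Price",
        "sum": "Total Purchase Value",
    })
)

# Most frequently purchased items first
most_popular = item_summary.sort_values("Purchase Count", ascending=False)
most_popular.head()
```

If an item can sell at more than one price, replacing `"first"` with `"mean"` would report its average selling price instead.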



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
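
As a hedged illustration of these steps, the sketch below sorts a small item summary by total purchase value and then formats a copy for display. The `item_summary` values are made up, and its layout (an `Item ID` / `Item Name` index with three numeric columns) is an assumption about what the Most Popular Items table might look like.

```python
import pandas as pd

# Hypothetical item summary, shaped like a Most Popular Items table
item_summary = pd.DataFrame(
    {
        "Purchase Count": [2, 1, 3],
        "Item Price": [3.50, 9.00, 2.00],
        "Total Purchase Value": [7.00, 9.00, 6.00],
    },
    index=pd.MultiIndex.from_tuples(
        [(1, "Item A"), (2, "Item B"), (3, "Item C")],
        names=["Item ID", "Item Name"],
    ),
)

# Highest-revenue items first; sort on the numeric values
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)

# Format a copy for display so the numeric table stays usable for further analysis
display_table = most_profitable.copy()
for col in ["Item Price", "Total Purchase Value"]:
    display_table[col] = display_table[col].map("${:,.2f}".format)

display_table.head()
```

Note that the most profitable item in this sample was bought only once, so the ordering differs from a sort by purchase count.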

