### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), followed by 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


## Player Count

* Display the total number of players


In [2]:
player_count = len(purchase_data['SN'].unique())
#player_count
print(f'The total number of unique players is {player_count}')

The total number of unique players is 576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
unique_items = len(purchase_data['Item Name'].unique())
#unique_items
avg_price = round(purchase_data['Price'].mean(),2)
#avg_price
# Count rows directly rather than relying on the last Purchase ID being n - 1
num_purchases = purchase_data['Purchase ID'].count()
#num_purchases
revenue = purchase_data['Price'].sum()
#revenue

output_purchasing = pd.DataFrame([{'Number of Unique Items': unique_items, 'Average Price' : avg_price, 'Number of Purchases' : num_purchases, 'Total Revenue' : revenue}])
output_purchasing['Average Price'] = output_purchasing['Average Price'].map('${:.2f}'.format)
output_purchasing['Total Revenue'] = output_purchasing['Total Revenue'].map('${:.2f}'.format)
output_purchasing

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,179,$3.05,780,$2379.77


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [4]:
gender_counts = purchase_data['SN'].unique()
unique_users = purchase_data.drop_duplicates('SN')
gender_counts_male = unique_users['Gender'].value_counts()['Male']
gender_counts_female = unique_users['Gender'].value_counts()['Female']
gender_counts_other = unique_users['Gender'].value_counts()['Other / Non-Disclosed']

total_gender = len(unique_users['Gender'])

#print(gender_counts_male)
#print(gender_counts_female)
#print(gender_counts_other)
#print(total_gender)

per_male = (gender_counts_male / total_gender) * 100
per_female = (gender_counts_female / total_gender) * 100
per_other = (gender_counts_other / total_gender) * 100

output_gender_demo = pd.DataFrame({'Gender':['Male', 'Female', 'Other / Non-Disclosed'], 
                                  'Total Count': [gender_counts_male, gender_counts_female, gender_counts_other],
                                  'Percentage of Players': [per_male, per_female, per_other]})

output_gender_demo['Percentage of Players'] = output_gender_demo['Percentage of Players'].map('{:.2f}%'.format)

output_gender_demo

Unnamed: 0,Gender,Total Count,Percentage of Players
0,Male,484,84.03%
1,Female,81,14.06%
2,Other / Non-Disclosed,11,1.91%
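
The counts and percentages above can also be obtained from a single `value_counts()` call per column. The following is a minimal sketch on a hypothetical toy frame (`toy` and its values are invented for illustration, not taken from `purchase_data.csv`); it assumes that each player should be counted once, as in the cell above.

```python
import pandas as pd

# Hypothetical toy data: one row per purchase, so players can repeat.
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are counted once.
players = toy.drop_duplicates("SN")

# value_counts() gives the counts; normalize=True gives fractions of the total.
gender_demo = pd.DataFrame({
    "Total Count": players["Gender"].value_counts(),
    "Percentage of Players": players["Gender"].value_counts(normalize=True) * 100,
})
print(gender_demo)
```

Because both columns share the same index (the gender labels), pandas aligns them automatically, and no per-gender variables are needed.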



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [5]:
purchase_counts = purchase_data.groupby('Gender')['Purchase ID'].count()
avg_purchase = purchase_data.groupby('Gender')['Price'].mean()
total_value = purchase_data.groupby('Gender')['Price'].sum()

merged_df1 = pd.merge(purchase_counts, avg_purchase, how='inner', on='Gender')
#merged_df1
merged_df2 = pd.merge(merged_df1, total_value, how='inner', on='Gender')
#merged_df2

renamed_df = merged_df2.rename(columns={'Purchase ID': 'Purchase Count', 'Price_x': 'Average Purchase Price', 'Price_y': 'Total Purchase Value'})

renamed_df['Average Purchase Price'] = renamed_df['Average Purchase Price'].map('${:.2f}'.format)
renamed_df['Total Purchase Value'] = renamed_df['Total Purchase Value'].map('${:.2f}'.format)

#print(purchase_counts)
#print(avg_purchase)
#print(total_value)
#print(unique_avg)

renamed_df

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,113,$3.20,$361.94
Male,652,$3.02,$1967.64
Other / Non-Disclosed,15,$3.35,$50.19
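
The instructions also mention an average purchase total per person, which the table above does not include. One possible way to compute it is to divide each group's total by its number of unique players. The sketch below uses a hypothetical toy frame (`toy`, the column names `purchase_count`, `total_value`, `players` and `avg_total_per_person` are illustrative assumptions), not the real data set.

```python
import pandas as pd

# Hypothetical toy data: player "a" bought twice.
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Price": [1.00, 3.00, 2.50, 4.00],
})

# Named aggregation: one pass over the groups, one output column per entry.
by_gender = toy.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Total spent by the group divided by the number of distinct players in it.
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```

Using `nunique` on `SN` keeps repeat buyers from inflating the denominator, which is the difference between "per purchase" and "per person".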


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [6]:
bins = [0,9,14,19,24,29,34,39,100]
group_names = ['<10', '10-14','15-19','20-24','25-29','30-34','35-39','40+']

purchase_data['age bins'] = pd.cut(purchase_data['Age'],bins, labels=group_names, include_lowest=False)
#purchase_data

non_repeating = purchase_data.drop_duplicates(subset='SN', keep='last')
total_bins = non_repeating.groupby('age bins').count()
total_players = len(gender_counts)
total_bins_renamed = total_bins.rename(columns={'Purchase ID': 'Total Count'})
total_bins_renamed['Percentage of Players'] = (total_bins_renamed['Total Count'] / total_players) * 100
total_bins_renamed['Percentage of Players'] = total_bins_renamed['Percentage of Players'].map('{:.2f}%'.format)
total_bins_final = total_bins_renamed.drop(['SN','Age','Gender','Item ID','Item Name','Price'], axis=1)
total_bins_final

Unnamed: 0_level_0,Total Count,Percentage of Players
age bins,Unnamed: 1_level_1,Unnamed: 2_level_1
<10,17,2.95%
10-14,22,3.82%
15-19,107,18.58%
20-24,258,44.79%
25-29,77,13.37%
30-34,52,9.03%
35-39,31,5.38%
40+,12,2.08%
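
To see how the bin edges map ages to labels, here is a small sketch that reuses the `bins` and `group_names` from the cell above on a handful of made-up ages (the `ages` series is hypothetical). By default `pd.cut` uses right-inclusive intervals, so the edge `9` belongs to `<10` and `24` belongs to `20-24`.

```python
import pandas as pd

bins = [0, 9, 14, 19, 24, 29, 34, 39, 100]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical ages chosen to sit on or next to the bin edges.
ages = pd.Series([7, 9, 10, 19, 20, 24, 25, 40])

# Intervals are (0, 9], (9, 14], ..., (39, 100] because right=True is the default.
labels = pd.cut(ages, bins, labels=group_names)
print(list(labels))
```

Note that with `include_lowest=False` an age of exactly 0 would fall outside the first interval and become `NaN`.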


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [63]:
# Bin every purchase (not just unique players) by age
purchase_data['Age Ranges'] = pd.cut(purchase_data['Age'], bins, labels=group_names, include_lowest=False)

grouped_data = purchase_data.groupby('Age Ranges')

# Build every column from all purchases; averaging over de-duplicated players
# would not match Total Purchase Value / Purchase Count
purchase_analysis = pd.DataFrame({
    'Purchase Count': grouped_data['Purchase ID'].count(),
    'Average Purchase Price': grouped_data['Price'].mean(),
    'Total Purchase Value': grouped_data['Price'].sum()
})
# Average total per person is not computed here; it would divide each total
# by the number of unique players in that age range

purchase_analysis['Average Purchase Price'] = purchase_analysis['Average Purchase Price'].map('${:.2f}'.format)
purchase_analysis['Total Purchase Value'] = purchase_analysis['Total Purchase Value'].map('${:.2f}'.format)

purchase_analysis

Unnamed: 0_level_0,Purchase Count,Average Purchase Price,Total Purchase Value
Age Ranges,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
<10,23,$3.35,$77.13
10-14,28,$2.96,$82.78
15-19,136,$3.04,$412.89
20-24,365,$3.05,$1114.06
25-29,101,$2.90,$293.00
30-34,73,$2.93,$214.00
35-39,41,$3.60,$147.67
40+,13,$2.94,$38.24


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [42]:
count_sn = purchase_data.groupby('SN')

#sorted_sn = count_sn.sort_values('Purchase Count', ascending=False)

# Select 'Price' before aggregating so non-numeric columns are never averaged or summed
avg_sn_price = count_sn['Price'].mean()
tot_sn_pur = count_sn['Price'].sum()
sn_counts = count_sn.count()

print(avg_sn_price)
print(tot_sn_pur)
print(sn_counts)





SN
Adairialis76     2.280000
Adastirin33      4.480000
Aeda94           4.910000
Aela59           4.320000
Aelaria33        1.790000
                   ...   
Yathecal82       2.073333
Yathedeu43       3.010000
Yoishirrala98    4.580000
Zhisrisu83       3.945000
Zontibe81        2.676667
Name: Price, Length: 576, dtype: float64
SN
Adairialis76     2.28
Adastirin33      4.48
Aeda94           4.91
Aela59           4.32
Aelaria33        1.79
                 ... 
Yathecal82       6.22
Yathedeu43       6.02
Yoishirrala98    4.58
Zhisrisu83       7.89
Zontibe81        8.03
Name: Price, Length: 576, dtype: float64
               Purchase ID  Age  Gender  Item ID  Item Name  Price  age bins  \
SN                                                                             
Adairialis76             1    1       1        1          1      1         1   
Adastirin33              1    1       1        1          1      1         1   
Aeda94                   1    1       1        1          1     
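
The cell above prints the three per-player aggregates but stops before building the sorted summary that the instructions ask for. A minimal sketch of that last step follows, using a hypothetical toy frame (`toy` and its values are invented; only the column names `SN` and `Price` come from the data set).

```python
import pandas as pd

# Hypothetical toy purchases for three players.
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "c", "c"],
    "Price": [1.00, 3.00, 5.00, 2.00, 2.00, 2.50],
})

# Aggregate once per player, rename for display, then sort by total spent.
top_spenders = (
    toy.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .sort_values("Total Purchase Value", ascending=False)
)
print(top_spenders.head())
```

Sorting is done while the values are still numeric; applying a `'${:.2f}'` format first would turn them into strings and sort them alphabetically.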

## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [47]:
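new_df = purchase_data[['Item ID', 'Item Name', 'Price']]

# Group on both keys by passing them as a list. Calling
# new_df.groupby('Item ID', 'Item Name') passes 'Item Name' as the axis
# argument, which raises:
# ValueError: No axis named Item Name for object type <class 'pandas.core.frame.DataFrame'>
grouped_id = new_df.groupby(['Item ID', 'Item Name'])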
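
The remaining steps for this section (purchase count, item price, total purchase value, sorted by purchase count) could look roughly like the sketch below. It runs on a hypothetical toy frame (`toy`, the item names and prices are invented) and assumes each item has a single price, so `first` is used to pick it.

```python
import pandas as pd

# Hypothetical toy purchases of three items.
toy = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3],
    "Item Name": ["Sword", "Sword", "Axe", "Bow", "Bow", "Bow"],
    "Price": [4.00, 4.00, 2.50, 1.00, 1.00, 1.00],
})

# Group on both keys (as a list), aggregate Price, and rename for display.
item_summary = (
    toy.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "first", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "first": "Item Price",
        "sum": "Total Purchase Value",
    })
)

most_popular = item_summary.sort_values("Purchase Count", ascending=False)
print(most_popular.head())
```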

## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
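
No code was written for this section. As a rough sketch of the steps listed, the example below sorts a small hand-built summary table by total purchase value and formats a copy for display. The table contents are hypothetical and only mirror the column names used in the previous section.

```python
import pandas as pd

# Hypothetical item summary, indexed by (Item ID, Item Name).
item_summary = pd.DataFrame(
    {
        "Purchase Count": [2, 1, 3],
        "Item Price": [4.00, 2.50, 1.00],
        "Total Purchase Value": [8.00, 2.50, 3.00],
    },
    index=pd.MultiIndex.from_tuples(
        [(1, "Sword"), (2, "Axe"), (3, "Bow")],
        names=["Item ID", "Item Name"],
    ),
)

# Sort on the numeric column first.
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)

# Format a copy so the numeric table stays available for further calculations.
display_table = most_profitable.copy()
for col in ["Item Price", "Total Purchase Value"]:
    display_table[col] = display_table[col].map("${:.2f}".format)
print(display_table.head())
```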



## Observable Trends
Based on the data, most purchases are made by players aged 20-24 (365 of 780), and that age range also spends the most in total ($1114.06). The highest average purchase price comes from players aged 35-39 (about $3.60). Males account for the majority of purchases (652 of 780), but females spend slightly more per purchase ($3.20 versus $3.02).