### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
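
The steps listed above can be sketched end to end on a small made-up sample. The `players` frame, its screen names, and the column names `Total Count` and `Percentage of Players` are assumptions for illustration only; the bin edges and labels are the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical sample: one row per player (screen names and ages are made up).
players = pd.DataFrame({
    "SN": ["A1", "B2", "C3", "D4", "E5", "F6"],
    "Age": [8, 16, 21, 22, 24, 41],
})

# Establish bins; pd.cut() intervals are right-inclusive by default: (0, 9], (9, 14], ...
bins_age = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bins_age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Categorize each player by age bin.
players["Age Category"] = pd.cut(players["Age"], bins=bins_age, labels=bins_age_labels)

# Count players per bin; sort=False keeps the bins in their defined order.
total_count = players["Age Category"].value_counts(sort=False)

# Percentage of all players in each bin, rounded to two decimals.
percentage = (total_count / len(players) * 100).round(2)

# Summary data frame holding both results.
age_summary = pd.DataFrame({
    "Total Count": total_count,
    "Percentage of Players": percentage,
})
print(age_summary)
```

On the real data, the same steps would presumably be applied to a frame with one row per unique `SN`, so that repeat purchasers are not counted more than once.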


In [5]:
# Dependencies and setup
import os

import pandas as pd

# to_excel() further down needs an Excel engine such as openpyxl installed;
# it does not have to be imported here.

# Path to the input file:
path = os.path.join("Resources", "purchase_data.csv")
purchase_data = pd.read_csv(path)

In [6]:
# Keep only the columns needed for the age analysis:
df = purchase_data.loc[:, ["Age", "SN", "Price", "Purchase ID"]]
df = df.drop_duplicates()
# Age bins for pd.cut(); intervals are right-inclusive, e.g. (0, 9], (9, 14], ...
bins_age = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bins_age_labels = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']
# Add an Age Category column to df:
df['Age Category'] = pd.cut(df["Age"], bins=bins_age, labels=bins_age_labels)
# Group by age category; from here on df is a DataFrameGroupBy object.
df = df.groupby('Age Category')
# Show the first 20 rows of each group:
df.head(20)

,Age,SN,Price,Purchase ID,Age Category
0,20,Lisim78,3.53,0,20-24
1,40,Lisovynya38,1.56,1,40+
2,24,Ithergue48,4.88,2,20-24
3,24,Chamassasya86,3.27,3,20-24
4,23,Iskosia90,1.44,4,20-24
...,...,...,...,...,...
674,43,Aeral68,4.00,674,40+
686,8,Chadjask77,4.93,686,<10
692,9,Quaecjask96,4.40,692,<10
728,44,Chanosiaya39,1.97,728,40+
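
The output above shows age 9 landing in `<10` and age 40 in `40+`, which follows from `pd.cut()` treating each bin as right-inclusive. A minimal sketch of how ages on the bin edges are assigned, using the same bins and labels (the `ages` values are chosen for illustration, not taken from the data):

```python
import pandas as pd

bins_age = [0, 9, 14, 19, 24, 29, 34, 39, 100]
bins_age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Ages chosen to sit on or right next to bin edges.
ages = pd.Series([9, 10, 24, 25, 39, 40])
categories = pd.cut(ages, bins=bins_age, labels=bins_age_labels)
print(pd.DataFrame({"Age": ages, "Age Category": categories}))

# The first bin is (0, 9], so an age of exactly 0 falls outside every bin
# and becomes NaN unless include_lowest=True is passed to pd.cut().
zero_is_missing = pd.cut(pd.Series([0]), bins=bins_age, labels=bins_age_labels).isna().iloc[0]
print(zero_is_missing)
```

The same applies at the top end: any age above 100 would also come back as NaN with these bins.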


In [9]:
# df is grouped by Age Category, so each aggregate below is a Series
# with one value per age group.
purchase_count = df['Purchase ID'].count()
average_purchase_price = df['Price'].mean()
total_purchase_value = df['Price'].sum()
# Number of distinct players (screen names) in each age group:
unique_SN = df['SN'].nunique()
# Divide each group's total by that group's own player count,
# not by the overall number of players:
average_total_purchase_per_person = total_purchase_value / unique_SN
print(purchase_count)
print(average_purchase_price)
print(total_purchase_value)
print(f" ======= Average total price per person {average_total_purchase_per_person}")
print(f"Unique purchasers = {unique_SN.sum()}")

Age Category
<10       23
10-14     28
15-19    136
20-24    365
25-29    101
30-34     73
35-39     41
40+       13
Name: Purchase ID, dtype: int64
Age Category
<10      3.353478
10-14    2.956429
15-19    3.035956
20-24    3.052219
25-29    2.900990
30-34    2.931507
35-39    3.601707
40+      2.941538
Name: Price, dtype: float64
Age Category
<10        77.13
10-14      82.78
15-19     412.89
20-24    1114.06
25-29     293.00
30-34     214.00
35-39     147.67
40+        38.24
Name: Price, dtype: float64
 ======= Average total price per person Age Category
<10      4.537059
10-14    3.762727
15-19    3.858785
20-24    4.318062
25-29    3.805195
30-34    4.115385
35-39    4.763548
40+      3.186667
dtype: float64
Unique purchasers = 576


In [10]:
age_demographics = pd.DataFrame({'Purchase Count': (purchase_count),
                       'Average Purchase Price': (average_purchase_price),
                       'Total Purchase Value': (total_purchase_value),
                       'Average Total Purchase per Person': (average_total_purchase_per_person)})
age_demographics

Age Category,Purchase Count,Average Purchase Price,Total Purchase Value,Average Total Purchase per Person
<10,23,3.353478,77.13,4.537059
10-14,28,2.956429,82.78,3.762727
15-19,136,3.035956,412.89,3.858785
20-24,365,3.052219,1114.06,4.318062
25-29,101,2.90099,293.0,3.805195
30-34,73,2.931507,214.0,4.115385
35-39,41,3.601707,147.67,4.763548
40+,13,2.941538,38.24,3.186667


In [11]:
# Format the values for display:
age_demographics["Average Purchase Price"] = age_demographics["Average Purchase Price"].map("${:,.2f}".format)
age_demographics["Total Purchase Value"] = age_demographics["Total Purchase Value"].map("${:,.2f}".format)
age_demographics["Average Total Purchase per Person"] = age_demographics["Average Total Purchase per Person"].map("${:,.2f}".format)
age_demographics

Age Category,Purchase Count,Average Purchase Price,Total Purchase Value,Average Total Purchase per Person
<10,23,$3.35,$77.13,$4.54
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,"$1,114.06",$4.32
25-29,101,$2.90,$293.00,$3.81
30-34,73,$2.93,$214.00,$4.12
35-39,41,$3.60,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19


In [13]:
# Save the age demographics table to its own sheet in an Excel file.
# The context manager writes and closes the file on exit.
with pd.ExcelWriter("4_Age_Demographics.xlsx") as writer:
    age_demographics.to_excel(writer, sheet_name='Age Demographics')

In [7]:
# example below (Do not run)

Age Ranges,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
<10,23,$3.35,$77.13,$4.54
10-14,28,$2.96,$82.78,$3.76
15-19,136,$3.04,$412.89,$3.86
20-24,365,$3.05,"$1,114.06",$4.32
25-29,101,$2.90,$293.00,$3.81
30-34,73,$2.93,$214.00,$4.12
35-39,41,$3.60,$147.67,$4.76
40+,13,$2.94,$38.24,$3.19
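
The `Avg Total Purchase per Person` column in the example table appears consistent with dividing each age group's total purchase value by the number of distinct players in that group. A minimal sketch of that calculation on made-up data (the `purchases` frame, its screen names, and its prices are hypothetical, not taken from `purchase_data.csv`):

```python
import pandas as pd

# Hypothetical purchases: some players buy more than once,
# so the number of purchases differs from the number of players.
purchases = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Age Category": ["20-24", "20-24", "20-24", "40+", "40+", "40+"],
    "Price": [1.00, 3.00, 2.00, 1.50, 2.50, 2.00],
})

grouped = purchases.groupby("Age Category")

# Total spent per age group.
total_purchase_value = grouped["Price"].sum()

# Distinct players per age group (nunique ignores repeat purchases).
unique_players = grouped["SN"].nunique()

# Both Series share the Age Category index, so the division aligns group by group.
avg_total_per_person = total_purchase_value / unique_players
print(avg_total_per_person)
```

Here both groups spend 6.00 in total, but `20-24` has two players and `40+` has one, so the per-person averages differ even though the totals match.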
