### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* Our peak age demographic is 20-24 (44.8%), followed by 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# Raw data file
file_to_load = "Resources/purchase_data.csv"

# Read purchasing file and store into pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [2]:
purchase_data.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


In [3]:
unique_players = purchase_data["SN"].nunique()
total_players = pd.DataFrame({"Total Players": [unique_players]})
total_players

| | Total Players |
|---|---|
| 0 | 576 |


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [4]:
# Basic totals across all purchases
unique_items = purchase_data["Item ID"].nunique()
no_purchases = purchase_data["Purchase ID"].count()
total_revenue = purchase_data["Price"].sum()
avg_price = total_revenue / no_purchases

# One-row summary with the money columns formatted as currency
purchasing_analysis = pd.DataFrame({
    "No. of Unique Items": [unique_items],
    "Avg. Price": ["${:,.2f}".format(avg_price)],
    "No. of Purchases": [no_purchases],
    "Total Revenue": ["${:,.2f}".format(total_revenue)],
})
purchasing_analysis

| | Avg. Price | No. of Purchases | No. of Unique Items | Total Revenue |
|---|---|---|---|---|
| 0 | $3.05 | 780 | 183 | $2,379.77 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [5]:
# Count unique players (by screen name) per gender and their share of all players
female = purchase_data.loc[purchase_data["Gender"] == "Female", "SN"].nunique()
female_perc = female / unique_players

male = purchase_data.loc[purchase_data["Gender"] == "Male", "SN"].nunique()
male_perc = male / unique_players

other = purchase_data.loc[purchase_data["Gender"] == "Other / Non-Disclosed", "SN"].nunique()
other_perc = other / unique_players

gender_analysis = pd.DataFrame({
    "Gender": ["Female", "Male", "Other / Non-Disclosed"],
    "Percentage of Players": [female_perc, male_perc, other_perc],
    "Total Count": [female, male, other],
})
# Index by gender and display the share as a percentage
gender_analysis.set_index("Gender").style.format({"Percentage of Players": "{:.2%}"})

| Gender | Percentage of Players | Total Count |
|---|---|---|
| Female | 14.06% | 81 |
| Male | 84.03% | 484 |
| Other / Non-Disclosed | 1.91% | 11 |



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [6]:
female_purch = purchase_data.loc[purchase_data["Gender"] == "Female", "SN" ].count()
female_revenue = purchase_data.loc[purchase_data["Gender"] == "Female", "Price" ].sum()
female_avg = female_revenue / female_purch
female_per = female_revenue / female

male_purch = purchase_data.loc[purchase_data["Gender"] == "Male", "SN" ].count()
male_revenue = purchase_data.loc[purchase_data["Gender"] == "Male", "Price" ].sum()
male_avg = male_revenue / male_purch
male_per = male_revenue / male

other_purch = purchase_data.loc[purchase_data["Gender"] == "Other / Non-Disclosed", "SN" ].count()
other_revenue = purchase_data.loc[purchase_data["Gender"] == "Other / Non-Disclosed", "Price" ].sum()
other_avg = other_revenue / other_purch
other_per = other_revenue / other

purchasing_analysis = pd.DataFrame({
    "Gender":["Female", "Male", "Other / Non-Disclosed"],
    "Purchase Count": [female_purch, male_purch, other_purch],
    "Average Purchase Price": [female_avg, male_avg, other_avg],
    "Total Purchase Value": [female_revenue, male_revenue, other_revenue],
    "Average Total Purchase per Person": [female_per, male_per, other_per],
})
# Index by gender and display the money columns as currency
purchase_test = purchasing_analysis.set_index("Gender").style.format({
    "Average Purchase Price": "${:.2f}",
    "Total Purchase Value": "${:.2f}",
    "Average Total Purchase per Person": "${:.2f}",
})
purchase_test

| Gender | Average Purchase Price | Average Total Purchase per Person | Purchase Count | Total Purchase Value |
|---|---|---|---|---|
| Female | $3.20 | $4.47 | 113 | $361.94 |
| Male | $3.02 | $4.07 | 652 | $1967.64 |
| Other / Non-Disclosed | $3.35 | $4.56 | 15 | $50.19 |
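
The per-gender figures above can also be computed with a single `groupby` rather than one block of filters per gender. The following is a minimal sketch, not the notebook's own method: the `sample` frame and the `purchasing_by_gender` helper are hypothetical stand-ins that only assume the `SN`, `Gender`, and `Price` columns of `purchase_data`.

```python
import pandas as pd

# Hypothetical sample rows with the same column names as purchase_data
sample = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bob2", "Cy3", "Dee4"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
    "Price": [2.00, 4.00, 3.00, 5.00, 1.50],
})


def purchasing_by_gender(df):
    """Summarize purchases per gender (hypothetical helper)."""
    grouped = df.groupby("Gender")
    summary = pd.DataFrame({
        "Purchase Count": grouped["Price"].count(),
        "Average Purchase Price": grouped["Price"].mean(),
        "Total Purchase Value": grouped["Price"].sum(),
    })
    # Divide revenue by unique players, not by number of purchases
    summary["Average Total Purchase per Person"] = (
        summary["Total Purchase Value"] / grouped["SN"].nunique()
    )
    return summary


gender_summary = purchasing_by_gender(sample)
print(gender_summary)
```

Calling `purchasing_by_gender(purchase_data)` should produce the same figures as the table above, and the approach picks up any additional gender labels without extra code.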


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [7]:
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

In [20]:
# Label each purchase with the age range of its buyer
purchase_data["Age Range"] = pd.cut(purchase_data["Age"], age_bins, labels=group_names)

# Group purchases by age range
age_demo = purchase_data.groupby("Age Range")

# Unique players per age range
print(age_demo["SN"].nunique())

# Distinct ages and distinct players per age range
age_perc = age_demo[["Age", "SN"]].nunique()
age_perc


Age Range
<10       17
10-14     22
15-19    107
20-24    258
25-29     77
30-34     52
35-39     31
40+       12
Name: SN, dtype: int64


| Age Range | Age | SN |
|---|---|---|
| <10 | 3 | 17 |
| 10-14 | 5 | 22 |
| 15-19 | 5 | 107 |
| 20-24 | 5 | 258 |
| 25-29 | 5 | 77 |
| 30-34 | 5 | 52 |
| 35-39 | 5 | 31 |
| 40+ | 6 | 12 |
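
The instructions for this section also ask for percentages by age group, rounded to two decimals. One possible way to get both the count and the percentage is sketched below. It reuses the bins and labels defined above; the `sample` frame and the `age_demographics` helper are hypothetical and only assume the `SN` and `Age` columns.

```python
import pandas as pd

age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical sample rows; Ana1 appears twice to mimic a repeat buyer
sample = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bob2", "Cy3", "Dee4"],
    "Age": [22, 22, 17, 23, 41],
})


def age_demographics(df):
    """Count unique players per age range and their share (hypothetical helper)."""
    # Keep one row per player so repeat buyers are counted once
    players = df.drop_duplicates(subset="SN")
    age_range = pd.cut(players["Age"], age_bins, labels=group_names)
    # sort=False keeps the bins in their defined order instead of sorting by count
    counts = age_range.value_counts(sort=False)
    return pd.DataFrame({
        "Total Count": counts,
        "Percentage of Players": (counts / counts.sum() * 100).round(2),
    })


age_summary = age_demographics(sample)
print(age_summary)
```

Applied to `purchase_data`, the `Total Count` column should match the unique-player counts printed above.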


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
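
No code is given for this section, so here is a minimal sketch of one way to follow the steps above. It assumes the same age bins as the Age Demographics section; the `sample` frame and the `purchasing_by_age` helper are hypothetical and only rely on the `SN`, `Age`, and `Price` columns.

```python
import pandas as pd

age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Hypothetical sample rows with the same column names as purchase_data
sample = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bob2", "Cy3", "Dee4"],
    "Age": [22, 22, 17, 23, 41],
    "Price": [2.00, 4.00, 3.00, 5.00, 1.50],
})


def purchasing_by_age(df):
    """Summarize purchases per age range (hypothetical helper)."""
    # Bin on a copy so the caller's frame is left untouched
    binned = df.assign(**{"Age Range": pd.cut(df["Age"], age_bins, labels=group_names)})
    # observed=True drops age ranges that contain no purchases
    grouped = binned.groupby("Age Range", observed=True)
    summary = grouped["Price"].agg(["count", "mean", "sum"]).rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    # Revenue per unique player in each age range
    summary["Average Total Purchase per Person"] = (
        summary["Total Purchase Value"] / grouped["SN"].nunique()
    )
    return summary


age_purchases = purchasing_by_age(sample)
print(age_purchases.round(2))
```

For display, the money columns can be formatted with `.style.format`, in the same way as the gender table.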

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
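
This section has no code either. A possible sketch of the steps above groups purchases by screen name and sorts by total spend. The `sample` frame and the `top_spenders` helper are hypothetical and only assume the `SN` and `Price` columns.

```python
import pandas as pd

# Hypothetical sample rows with the same column names as purchase_data
sample = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bob2", "Cy3", "Dee4"],
    "Price": [2.00, 4.00, 3.00, 5.00, 1.50],
})


def top_spenders(df, n=5):
    """Return the n players with the highest total spend (hypothetical helper)."""
    summary = (
        df.groupby("SN")["Price"]
        .agg(["count", "mean", "sum"])
        .rename(columns={
            "count": "Purchase Count",
            "mean": "Average Purchase Price",
            "sum": "Total Purchase Value",
        })
        # Highest total spend first
        .sort_values("Total Purchase Value", ascending=False)
    )
    return summary.head(n)


spenders = top_spenders(sample)
print(spenders)
```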



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
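
The steps above could be sketched as follows. The item names and prices in `sample` are made up for illustration, and `popular_items` is a hypothetical helper that only assumes the `Item ID`, `Item Name`, and `Price` columns. It reports the mean price per item, which equals the item price only under the assumption that each item always sells at one price.

```python
import pandas as pd

# Hypothetical sample rows; names and prices are invented for illustration
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.00, 4.00, 1.50],
})


def popular_items(df, n=5):
    """Return the n most frequently purchased items (hypothetical helper)."""
    summary = (
        df[["Item ID", "Item Name", "Price"]]
        .groupby(["Item ID", "Item Name"])["Price"]
        .agg(["count", "mean", "sum"])
        .rename(columns={
            "count": "Purchase Count",
            "mean": "Item Price",  # assumes one price per item
            "sum": "Total Purchase Value",
        })
        # Most purchased first
        .sort_values("Purchase Count", ascending=False)
    )
    return summary.head(n)


popular = popular_items(sample)
print(popular)
```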



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
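
A minimal sketch of this last step rebuilds the per-item summary and sorts it by revenue instead of purchase count. The `sample` data and the `profitable_items` helper are hypothetical. In this made-up sample the best-selling item is not the most profitable one, which is the distinction this section is meant to surface.

```python
import pandas as pd

# Hypothetical sample rows; names and prices are invented for illustration
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.00, 4.00, 1.50],
})


def profitable_items(df, n=5):
    """Return the n items with the highest total revenue (hypothetical helper)."""
    summary = (
        df.groupby(["Item ID", "Item Name"])["Price"]
        .agg(["count", "sum"])
        .rename(columns={"count": "Purchase Count", "sum": "Total Purchase Value"})
        # Highest revenue first
        .sort_values("Total Purchase Value", ascending=False)
    )
    return summary.head(n)


profitable = profitable_items(sample)
print(profitable)
```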

