### Heroes Of Pymoli Data Analysis
* Of the 780 purchases recorded, the vast majority (84%) were made by male players. A smaller but notable share (14%) were made by female players.

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [2]:
purchase_data.dtypes

Purchase ID      int64
SN              object
Age              int64
Gender          object
Item ID          int64
Item Name       object
Price          float64
dtype: object

In [3]:
# purchase_data["Price"] = purchase_data["Price"].map("${:,.2f}".format)

## Player Count

* Display the total number of players


In [4]:
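# count() tallies rows, so this is the number of purchase records;
# purchase_data['SN'].nunique() would count distinct players instead.
player_count = purchase_data['SN'].count()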
player_count

780
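
The difference between counting rows and counting distinct players can be seen on a tiny example. This is only a sketch: the `toy` frame is made up for illustration and is not taken from `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical toy data: three purchases made by two distinct players.
toy = pd.DataFrame({
    "SN": ["Lisim78", "Lisim78", "Iskosia90"],
    "Price": [3.53, 1.56, 1.44],
})

row_count = toy["SN"].count()         # counts every non-null row
unique_players = toy["SN"].nunique()  # counts distinct screen names

print(row_count, unique_players)
```

If a player can buy more than once, the two numbers diverge, so which one to report depends on whether "players" or "purchases" is meant.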

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [5]:
unique_items = purchase_data['Item ID'].nunique()
unique_items

183

In [6]:
average_price = purchase_data['Price'].mean()
average_price

3.050987179487176

In [7]:
number_purchases = len(purchase_data['Purchase ID'].unique())
number_purchases

780

In [8]:
total_revenue = purchase_data["Price"].sum()
total_revenue

2379.77

In [9]:
#Create Summary Data Frame
analysis_df = pd.DataFrame({
    "Number of Unique Items": [unique_items], 
    "Average Price": [average_price],
    "Number of Purchases": [number_purchases],
    "Total Revenue": [total_revenue]})

analysis_df



Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,183,3.050987,780,2379.77


In [10]:
# purchase_data["average_price"] = purchase_data["average_price"].map(
#     "${:,.2f}".format)
# purchase_data ["total_revenue"] = purchase_data["total_revenue"].map(
#     "${:,.2f}".format)

# analysis_df = pd.DataFrame({
#     "Number of Unique Items": [unique_items], 
#     "Average Price": [average_price],
#     "Number of Purchases": [number_purchases],
#     "Total Revenue": [total_revenue]})

# analysis_df
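
The commented-out cell above aims at currency formatting. One possible way to do it, sketched here under the assumption that the formatting should be applied to the summary frame rather than to `purchase_data`, is to format a copy so the numeric values stay usable. The name `formatted` is introduced only for this illustration.

```python
import pandas as pd

# Values copied from the summary computed above.
summary = pd.DataFrame({
    "Number of Unique Items": [183],
    "Average Price": [3.050987179487176],
    "Number of Purchases": [780],
    "Total Revenue": [2379.77],
})

# Format a copy so the original numeric columns remain available for math.
formatted = summary.copy()
for column in ["Average Price", "Total Revenue"]:
    formatted[column] = formatted[column].map("${:,.2f}".format)

print(formatted)
```

After `map`, the formatted columns hold strings, which is why the sketch leaves `summary` untouched.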


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [11]:
total_gender = purchase_data["Gender"].count()
total_gender

780

In [12]:
#Male
total_gender = purchase_data["Gender"].count()
male = purchase_data["Gender"].value_counts()['Male']
male_percent = male/total_gender * 100
male_percent

83.58974358974359

In [13]:
# Female
female = purchase_data["Gender"].value_counts()['Female']
female_percent = female/total_gender * 100
female_percent

14.487179487179489

In [14]:
# Other / Non-Disclosed
other = purchase_data["Gender"].value_counts()['Other / Non-Disclosed']
other_percent = other/total_gender * 100
other_percent

1.9230769230769231

In [15]:
genderanalysis_df = pd.DataFrame({
    #"Gender": ["Male", "Female", "Other / Non-Disclosed"]
    "Total Count": [male, female, other], 
    "Percentage of Players": [male_percent, female_percent, other_percent]})
genderanalysis_df


Unnamed: 0,Total Count,Percentage of Players
0,652,83.589744
1,113,14.487179
2,15,1.923077
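
The three per-gender cells can also be collapsed into a single pass with `value_counts`. A minimal sketch, assuming a small made-up Series in place of `purchase_data["Gender"]`:

```python
import pandas as pd

# Hypothetical toy data standing in for purchase_data["Gender"].
gender = pd.Series(["Male", "Male", "Male", "Female", "Other / Non-Disclosed"])

counts = gender.value_counts()                         # rows per category
percents = gender.value_counts(normalize=True) * 100   # share of rows, in percent

# Both Series are indexed by category, so they align row by row.
gender_summary = pd.DataFrame({
    "Total Count": counts,
    "Percentage of Players": percents.round(2),
})
print(gender_summary)
```

This keeps the gender labels as the row index instead of 0, 1, 2.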



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [16]:
#Purchase Count
gender_groups = purchase_data.groupby(['Gender'])
purchase_count = gender_groups.count()["Price"]
purchase_count

Gender
Female                   113
Male                     652
Other / Non-Disclosed     15
Name: Price, dtype: int64

In [17]:
#Average Purchase price
average_purchase_price = gender_groups["Price"].mean()
average_purchase_price

Gender
Female                   3.203009
Male                     3.017853
Other / Non-Disclosed    3.346000
Name: Price, dtype: float64

In [18]:
#purchase total per person
total_purchase = gender_groups["Price"].sum()
total_purchase

Gender
Female                    361.94
Male                     1967.64
Other / Non-Disclosed      50.19
Name: Price, dtype: float64

In [19]:
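#average purchase total
# Caution: count * mean equals the sum, so this ratio is 1.0 for every gender.
# A per-person average would instead divide total_purchase by the number of
# unique SN values in each gender group.
total_average_purchase_price = (purchase_count*average_purchase_price)/total_purchase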

In [20]:
#summary table
# Pass each Series directly (not wrapped in a list) so Gender becomes the row index
purchase_analysis_df = pd.DataFrame({
    "Purchase Count": purchase_count,
    "Average Purchase Price": average_purchase_price,
    "Total Purchase Value": total_purchase,
    "Avg Total Purchase per Person": total_average_purchase_price})

purchase_analysis_df


Gender,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
Female,113,3.203009,361.94,1.0
Male,652,3.017853,1967.64,1.0
Other / Non-Disclosed,15,3.346,50.19,1.0
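
The "Avg Total Purchase per Person" column comes out as 1.0 because purchase count times average price is just the total again. A rough sketch of one alternative, assuming "per person" means total spend divided by the number of distinct `SN` values in each group (that definition is an assumption, and the `toy` data is made up):

```python
import pandas as pd

# Hypothetical toy data: player "A" buys twice, "B" and "C" once each.
toy = pd.DataFrame({
    "SN": ["A", "A", "B", "C"],
    "Gender": ["Male", "Male", "Male", "Female"],
    "Price": [2.0, 4.0, 3.0, 5.0],
})

grouped = toy.groupby("Gender")

summary = pd.DataFrame({
    "Purchase Count": grouped["Price"].count(),
    "Average Purchase Price": grouped["Price"].mean(),
    "Total Purchase Value": grouped["Price"].sum(),
})
# Assumed definition: total spend divided by the number of distinct players.
summary["Avg Total Purchase per Person"] = (
    summary["Total Purchase Value"] / grouped["SN"].nunique()
)
print(summary)
```

Here the two male players spend 9.0 in total, so the per-person figure (4.5) differs from the per-purchase average (3.0).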


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [21]:
#bins
# Create the names for the bins
group_names = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']
age_df = pd.DataFrame(purchase_data)
age_df.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [22]:
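#Categorize
# Caution: passing the integer 8 makes pd.cut build 8 equal-width bins over the
# observed age range, so the labels only approximate the real intervals
# (e.g. age 20 is labelled 15-19 in the output below).
age_df["Age Range"] = pd.cut(age_df["Age"], 8, labels=group_names)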
age_df.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price,Age Range
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53,15-19
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56,35-39
2,2,Ithergue48,24,Male,92,Final Critic,4.88,20-24
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27,20-24
4,4,Iskosia90,23,Male,131,Fury,1.44,20-24
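
In the output above a 20-year-old lands in "15-19" and a 40-year-old in "35-39", which suggests the labels and the bin boundaries do not line up. One way to make them match is to give `pd.cut` explicit edges. This is a sketch only: the edges in `bin_edges` are an assumption chosen to fit the labels, and the `ages` Series is made up.

```python
import pandas as pd

ages = pd.Series([7, 12, 20, 24, 35, 40])

# Assumed bin edges: with right=True (the default) each bin is (low, high],
# so (19, 24] captures ages 20-24. The upper edge 200 is an arbitrary cap.
bin_edges = [0, 9, 14, 19, 24, 29, 34, 39, 200]
group_names = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

age_range = pd.cut(ages, bins=bin_edges, labels=group_names)
print(pd.DataFrame({"Age": ages, "Age Range": age_range}))
```

With explicit edges there must be exactly one more edge than there are labels, which is a quick sanity check when changing the groups.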


In [23]:
# Group the frame that actually holds the Age Range column
age_groups = age_df.groupby(['Age Range'])
age_groups.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price,Age Range
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53,15-19
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56,35-39
2,2,Ithergue48,24,Male,92,Final Critic,4.88,20-24
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27,20-24
4,4,Iskosia90,23,Male,131,Fury,1.44,20-24
5,5,Yalae81,22,Male,81,Dreamkiss,3.61,20-24
6,6,Itheria73,36,Male,169,"Interrogator, Blood Blade of the Queen",2.18,35-39
7,7,Iskjaskst81,20,Male,162,Abyssal Shard,2.67,15-19
8,8,Undjask33,22,Male,21,Souleater,1.1,20-24
9,9,Chanosian48,35,Other / Non-Disclosed,136,Ghastly Adamantite Protector,3.58,30-34


In [24]:
#Total Count
player_recount = age_groups['SN'].count()
player_recount

Age Range
<10       39
10-14     77
15-19    232
20-24    277
25-29     63
30-34     52
35-39     33
40+        7
Name: SN, dtype: int64

In [25]:
#percentage of players
total_recount = player_recount.sum()
player_percent = player_recount/total_recount * 100
player_percent

Age Range
<10       5.000000
10-14     9.871795
15-19    29.743590
20-24    35.512821
25-29     8.076923
30-34     6.666667
35-39     4.230769
40+       0.897436
Name: SN, dtype: float64

In [26]:
#summary table
# Pass each Series directly (not wrapped in a list) so Age Range becomes the row index
player_df = pd.DataFrame({
    "Total Count": player_recount,
    "Percentage of Players": player_percent})
player_df.head()

Age Range,Total Count,Percentage of Players
<10,39,5.0
10-14,77,9.871795
15-19,232,29.74359
20-24,277,35.512821
25-29,63,8.076923


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [27]:
# bins - see above

In [28]:
#purchase count 
purchase_recount = age_groups["Item ID"].count()
purchase_recount

Age Range
<10       39
10-14     77
15-19    232
20-24    277
25-29     63
30-34     52
35-39     33
40+        7
Name: Item ID, dtype: int64

In [29]:
#average purchase price
average_purchase_price_recount = age_groups["Price"].mean()
average_purchase_price_recount

Age Range
<10      3.275641
10-14    2.965844
15-19    3.067845
20-24    3.036426
25-29    2.876667
30-34    2.994423
35-39    3.404545
40+      3.075714
Name: Price, dtype: float64

In [30]:
#total purchase total per person
total_purchase_recount = age_groups["Price"].sum()
total_purchase_recount

Age Range
<10      127.75
10-14    228.37
15-19    711.74
20-24    841.09
25-29    181.23
30-34    155.71
35-39    112.35
40+       21.53
Name: Price, dtype: float64

In [31]:
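# Caution: count * mean equals the sum, so this ratio is 1.0 for every age group.
# A per-person average would instead divide total_purchase_recount by the number
# of unique SN values in each age group.
total_average_purchase_reprice = (purchase_recount*average_purchase_price_recount)/total_purchase_recount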

In [32]:
#summary table
# Pass each Series directly (not wrapped in a list) so Age Range becomes the row index
purchase_analysis_recount_df = pd.DataFrame({
    "Purchase Count": purchase_recount,
    "Average Purchase Price": average_purchase_price_recount,
    "Total Purchase Value": total_purchase_recount,
    "Avg Total Purchase per Person": total_average_purchase_reprice})

purchase_analysis_recount_df

Age Range,Purchase Count,Average Purchase Price,Total Purchase Value,Avg Total Purchase per Person
<10,39,3.275641,127.75,1.0
10-14,77,2.965844,228.37,1.0
15-19,232,3.067845,711.74,1.0
20-24,277,3.036426,841.09,1.0
25-29,63,2.876667,181.23,1.0
30-34,52,2.994423,155.71,1.0
35-39,33,3.404545,112.35,1.0
40+,7,3.075714,21.53,1.0


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [33]:
#SN
SN_groups = age_df.groupby(['SN'])

#Purchase Count
purchase_count = SN_groups["Price"].count()

#Average Purchase Price
average_purchase_count = SN_groups["Price"].mean()

#Total Purchase Value
total_purchase_value = SN_groups["Price"].sum()

topspenders_analysis = pd.DataFrame({
    "Purchase Count": purchase_count,
    "Average Purchase Price": average_purchase_count,
    "Total Purchase Value": total_purchase_value})
topspenders_analysis.head()


SN,Purchase Count,Average Purchase Price,Total Purchase Value
Adairialis76,1,2.28,2.28
Adastirin33,1,4.48,4.48
Aeda94,1,4.91,4.91
Aela59,1,4.32,4.32
Aelaria33,1,1.79,1.79


In [34]:
topspenders_analysis_sorted = topspenders_analysis.sort_values('Total Purchase Value', ascending=False).head()
topspenders_analysis_sorted

SN,Purchase Count,Average Purchase Price,Total Purchase Value
Lisosia93,5,3.792,18.96
Idastidru52,4,3.8625,15.45
Chamjask73,3,4.61,13.83
Iral74,4,3.405,13.62
Iskadarya95,3,4.366667,13.1
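
The same top-spenders table can be built in one chained expression. A minimal sketch on made-up data (the `toy` frame and the name `top_spenders` are hypothetical), using `agg` with a list of function names and then renaming the columns:

```python
import pandas as pd

# Hypothetical toy data: player A buys three times, B twice, C once.
toy = pd.DataFrame({
    "SN": ["A", "B", "A", "C", "B", "A"],
    "Price": [1.0, 4.0, 2.0, 3.5, 1.0, 3.0],
})

top_spenders = (
    toy.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
    .sort_values("Total Purchase Value", ascending=False)
)
print(top_spenders.head())
```

Selecting `"Price"` before aggregating keeps non-numeric columns out of the calculation.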


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [35]:
mostprofitable_analysis = purchase_data[["Item ID", "Item Name", "Price"]]
mostprofitable_analysis.head()


Unnamed: 0,Item ID,Item Name,Price
0,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,143,Frenzied Scimitar,1.56
2,92,Final Critic,4.88
3,100,Blindscythe,3.27
4,131,Fury,1.44


In [36]:
# mostprofitable_analysis2 = purchase_data[["Item ID", "Item Name", [unique_items],"Price", [total_revenue]]]
# mostprofitable_analysis2.head()


In [37]:
#Number of unique items (despite the variable name, this is not a purchase count)
number_purchases_end = len(mostprofitable_analysis['Item ID'].unique())
number_purchases_end

183

In [38]:
#Items
Item_groups = mostprofitable_analysis.groupby(['Item ID'])

#Average Purchase Price
average_purchase_end = Item_groups["Price"].mean()

#Total Purchase Value
total_purchase_value_end = Item_groups["Price"].sum()

# Caution: number_purchases_end is the number of unique items (183), so the same
# scalar fills every row; Item_groups["Price"].count() would give per-item counts.
mostprofitable_analysis_end = pd.DataFrame({
    "Purchase Count": number_purchases_end,
    "Average Purchase Price": average_purchase_end,
    "Total Purchase Value": total_purchase_value_end})
mostprofitable_analysis_end.head()



Item ID,Purchase Count,Average Purchase Price,Total Purchase Value
0,183,1.28,5.12
1,183,3.26,9.78
2,183,2.48,14.88
3,183,2.49,14.94
4,183,1.7,8.5


In [39]:
#groupby
# reorganized_df = purchase_data.groupby["Item ID"]
# reorganized_df.head()

In [40]:
#sort the purchase count column in descending order
mostprofitable_analysis_sorted = mostprofitable_analysis_end.sort_values('Purchase Count', ascending=False).head()
mostprofitable_analysis_sorted

Item ID,Purchase Count,Average Purchase Price,Total Purchase Value
0,183,1.28,5.12
116,183,4.18,16.72
118,183,2.17,2.17
119,183,4.32,12.96
120,183,3.08,18.48
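
Every row above shows a purchase count of 183, so sorting by that column cannot rank the items. A sketch of how per-item counts might be obtained instead, grouping by both Item ID and Item Name as the instructions suggest. The item names, prices, and the `toy` frame are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: item 1 sells three times, item 2 twice, item 3 once.
toy = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [1.5, 1.5, 1.5, 4.0, 4.0, 2.0],
})

item_groups = toy.groupby(["Item ID", "Item Name"])["Price"]

items = pd.DataFrame({
    "Purchase Count": item_groups.count(),   # purchases per item
    "Item Price": item_groups.mean(),
    "Total Purchase Value": item_groups.sum(),
})

most_popular = items.sort_values("Purchase Count", ascending=False)
most_profitable = items.sort_values("Total Purchase Value", ascending=False)
print(most_popular.head())
print(most_profitable.head())
```

In this toy data the most popular item (Sword) is not the most profitable one (Shield), which is why the two sorts are worth reporting separately.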


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [41]:
mostprofitable_analysis_sorted = mostprofitable_analysis_end.sort_values('Total Purchase Value', ascending=False).head()
mostprofitable_analysis_sorted

Item ID,Purchase Count,Average Purchase Price,Total Purchase Value
178,183,4.23,50.76
82,183,4.9,44.1
145,183,4.58,41.22
92,183,4.88,39.04
103,183,4.35,34.8


In [None]:
#the end...whew