### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [20]:
import pandas as pd
import numpy as np

upload_data = "Resources/purchase_data.csv"

pymoli = pd.read_csv(upload_data)

## Player Count

* Display the total number of players


In [30]:
total_players = len(pymoli["SN"].unique())

total_players_df = pd.DataFrame([{"Total Players":total_players}])

total_players_df

| | Total Players |
|---|---|
| 0 | 576 |


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [9]:
# Number of distinct item names sold
unique_items = pymoli["Item Name"].nunique()

# Each row is one purchase, so counting prices counts purchases
total_purchase = pymoli["Price"].count()

avg_price = round(pymoli["Price"].mean(), 2)

total_revenue = round(pymoli["Price"].sum(), 2)

# One-row summary data frame
summary = pd.DataFrame([{"Total Purchases": total_purchase,
                         "Unique Items": unique_items,
                         "Average Price": avg_price,
                         "Total Revenue": total_revenue}])

summary

| | Total Purchases | Unique Items | Average Price | Total Revenue |
|---|---|---|---|---|
| 0 | 780 | 179 | 3.05 | 2379.77 |
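
The optional formatting step could look like the sketch below. It rebuilds a one-row summary from the figures reported above and formats the two currency columns as strings. The `formatted` name is an assumption for illustration. Because formatted columns become text, the sketch formats a copy and leaves the numeric frame untouched.

```python
import pandas as pd

# One-row summary using the figures reported above
summary = pd.DataFrame([{"Total Purchases": 780,
                         "Unique Items": 179,
                         "Average Price": 3.05,
                         "Total Revenue": 2379.77}])

# Format a copy so the numeric values stay available for later calculations
formatted = summary.copy()
formatted["Average Price"] = formatted["Average Price"].map("${:,.2f}".format)
formatted["Total Revenue"] = formatted["Total Revenue"].map("${:,.2f}".format)

print(formatted)
```

Formatting only at display time avoids accidentally summing or averaging strings in a later cell.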


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [47]:
unique = pymoli.drop_duplicates(["SN"])

In [50]:
gender_count = pd.DataFrame(unique["Gender"].value_counts())

gender_count.columns = ["Total Count"]

total_users = gender_count["Total Count"].sum()

In [51]:
gender_count["Percentage"] = ['{:.2%}'.format((x/total_users)) for x in gender_count["Total Count"]]

gender_count

| | Total Count | Percentage |
|---|---|---|
| Male | 484 | 84.03% |
| Female | 81 | 14.06% |
| Other / Non-Disclosed | 11 | 1.91% |
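
The counts and percentages above can also be obtained directly from `value_counts`, which accepts `normalize=True` to return shares instead of counts. The sketch below is a minimal illustration on a small hypothetical purchase log (the screen names are invented), not the notebook's own data.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase, so players can repeat
purchases = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dan4", "Dan4"],
    "Gender": ["Female", "Female", "Male",
               "Other / Non-Disclosed", "Male", "Male"],
})

# Keep one row per player before counting; otherwise repeat buyers are over-counted
players = purchases.drop_duplicates("SN")

counts = players["Gender"].value_counts()
shares = players["Gender"].value_counts(normalize=True)

gender_demo = pd.DataFrame({
    "Total Count": counts,
    "Percentage": shares.map("{:.2%}".format),
})

print(gender_demo)
```

Dropping duplicates first matters here: counting genders on the raw purchase log would measure purchases per gender rather than players per gender.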



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [13]:
# Select "Price" as a Series so sum(), mean() and count() each return a scalar
male_purchases = pymoli.loc[pymoli["Gender"] == "Male", "Price"]
total_m_purchase = float(male_purchases.sum())
avg_m_purchase = float(male_purchases.mean())
num_m_purchase = int(male_purchases.count())

female_purchases = pymoli.loc[pymoli["Gender"] == "Female", "Price"]
total_f_purchase = float(female_purchases.sum())
avg_f_purchase = float(female_purchases.mean())
num_f_purchase = int(female_purchases.count())

other_purchases = pymoli.loc[pymoli["Gender"] == "Other / Non-Disclosed", "Price"]
total_o_purchase = float(other_purchases.sum())
avg_o_purchase = float(other_purchases.mean())
num_o_purchase = int(other_purchases.count())

In [14]:
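# Work on a copy so the columns added below do not alter gender_count
gender_sum = gender_count.copy()

# Values follow the row order of gender_count: Male, Female, Other / Non-Disclosed
gender_sum["Purchase Count"] = [num_m_purchase,num_f_purchase,num_o_purchase]
gender_sum["Average Purchase"] = [avg_m_purchase,avg_f_purchase,avg_o_purchase]
gender_sum["Total Value"] = [total_m_purchase,total_f_purchase,total_o_purchase]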

total_value = gender_sum["Total Value"].sum()
gender_sum["Percentage Total"] = ['{:.1%}'.format((x/total_value)) for x in gender_sum["Total Value"]]

In [15]:
gender_sum = gender_sum[["Purchase Count","Average Purchase","Total Value","Percentage Total"]]
gender_sum

| | Purchase Count | Average Purchase | Total Value | Percentage Total |
|---|---|---|---|---|
| Male | 652 | 3.017853 | 1967.64 | 82.7% |
| Female | 113 | 3.203009 | 361.94 | 15.2% |
| Other / Non-Disclosed | 15 | 3.346 | 50.19 | 2.1% |
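
The per-gender figures above are built by filtering each gender separately and then assigning lists in a fixed row order. As an alternative, a grouped aggregation keeps every statistic attached to its own gender label. The sketch below is not the notebook's method: it uses a small hypothetical purchase log (player names and prices are invented) and pandas named aggregation, with column names chosen for illustration.

```python
import pandas as pd

# Hypothetical purchase log with the same column names as the data set
purchases = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dan4", "Dan4"],
    "Gender": ["Female", "Female", "Male",
               "Other / Non-Disclosed", "Male", "Male"],
    "Price": [1.00, 3.00, 2.50, 4.00, 1.50, 2.00],
})

# Named aggregation computes every statistic per gender in one pass
by_gender = purchases.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    average_purchase=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Average total spent per distinct player in each group
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]

print(by_gender)
```

Because the result is indexed by gender, adding a new category to the data does not require any change to the code.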


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [36]:
age_count = pymoli.drop_duplicates("SN")

bins = pd.cut(age_count['Age'],[0,9,14,19,24,29,34,39,100],
              labels = ['<10','10 to 14','15 to 19','20 to 24','25 to 29','30 to 34','35 to 39','over 40'])

age_count.groupby(bins)["Age"].agg(["count"])

| Age | count |
|---|---|
| <10 | 17 |
| 10 to 14 | 22 |
| 15 to 19 | 107 |
| 20 to 24 | 258 |
| 25 to 29 | 77 |
| 30 to 34 | 52 |
| 35 to 39 | 31 |
| over 40 | 12 |


In [38]:
age_count.groupby(bins)["Age"].count()

Age
<10          17
10 to 14     22
15 to 19    107
20 to 24    258
25 to 29     77
30 to 34     52
35 to 39     31
over 40      12
Name: Age, dtype: int64
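
The cells above report counts per age group but not the percentages that the instructions ask for. The sketch below shows one way to add them, assuming the same bin edges and labels as above. The `players` frame is hypothetical data invented for illustration, and `age_demo` is an assumed name.

```python
import pandas as pd

# Hypothetical players, one row each (already de-duplicated by screen name)
players = pd.DataFrame({
    "SN": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "Age": [8, 12, 17, 21, 22, 24, 27, 40],
})

edges = [0, 9, 14, 19, 24, 29, 34, 39, 100]
labels = ["<10", "10 to 14", "15 to 19", "20 to 24",
          "25 to 29", "30 to 34", "35 to 39", "over 40"]

# pd.cut uses right-closed intervals by default: (19, 24] holds ages 20-24,
# and the last bin (39, 100] includes age 40 itself
players["Age Group"] = pd.cut(players["Age"], bins=edges, labels=labels)

# sort_index() orders the rows by bin rather than by frequency
counts = players["Age Group"].value_counts().sort_index()

age_demo = pd.DataFrame({
    "Total Count": counts,
    "Percentage": (counts / counts.sum() * 100).round(2),
})

print(age_demo)
```

Empty bins still appear with a count of zero, which keeps the table the same shape across data sets.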

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [26]:
y = pymoli

bins = pd.cut(y['Age'],[0,9,14,19,24,29,34,39,100], 
                labels = ['<10','10 to 14','15 to 19','20 to 24','25 to 29','30 to 34','35 to 39','over 40'])

def analysis(x):
    
    titles = {
        'Purchase Count': x['Age'].count(),
        'Avg Purchase Price':  x['Price'].sum()/x['Price'].count(),
        'Total Purchase': x['Price'].sum(),
        'Avg Purchase per Person': x['Price'].sum()/x['SN'].nunique(),
      }

    return pd.Series(titles, index = ['Purchase Count', 'Avg Purchase Price','Total Purchase','Avg Purchase per Person'])


y.groupby(bins).apply(analysis)

| Age | Purchase Count | Avg Purchase Price | Total Purchase | Avg Purchase per Person |
|---|---|---|---|---|
| <10 | 23.0 | 3.353478 | 77.13 | 4.537059 |
| 10 to 14 | 28.0 | 2.956429 | 82.78 | 3.762727 |
| 15 to 19 | 136.0 | 3.035956 | 412.89 | 3.858785 |
| 20 to 24 | 365.0 | 3.052219 | 1114.06 | 4.318062 |
| 25 to 29 | 101.0 | 2.90099 | 293.0 | 3.805195 |
| 30 to 34 | 73.0 | 2.931507 | 214.0 | 4.115385 |
| 35 to 39 | 41.0 | 3.601707 | 147.67 | 4.763548 |
| over 40 | 13.0 | 2.941538 | 38.24 | 3.186667 |


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [40]:
y = pymoli

def analysis(x):

    titles = {
        'Purchase Count': x['Age'].count(),
        'Avg Purchase Price': x['Price'].mean(),
        'Total Purchase': x['Price'].sum(),
        # Each group holds a single player, so this equals 'Total Purchase'
        'Avg Purchase per Person': x['Price'].sum()/x['SN'].nunique(),
      }

    return pd.Series(titles, index = ['Purchase Count', 'Avg Purchase Price','Total Purchase','Avg Purchase per Person'])

# One group per player (screen name)
z = y.groupby(['SN']).apply(analysis)

# Ten players with the highest total spend, in descending order
z.nlargest(10,'Total Purchase')

| SN | Purchase Count | Avg Purchase Price | Total Purchase | Avg Purchase per Person |
|---|---|---|---|---|
| Lisosia93 | 5.0 | 3.792 | 18.96 | 18.96 |
| Idastidru52 | 4.0 | 3.8625 | 15.45 | 15.45 |
| Chamjask73 | 3.0 | 4.61 | 13.83 | 13.83 |
| Iral74 | 4.0 | 3.405 | 13.62 | 13.62 |
| Iskadarya95 | 3.0 | 4.366667 | 13.1 | 13.1 |
| Ilarin91 | 3.0 | 4.233333 | 12.7 | 12.7 |
| Ialallo29 | 3.0 | 3.946667 | 11.84 | 11.84 |
| Tyidaim51 | 3.0 | 3.943333 | 11.83 | 11.83 |
| Lassilsala30 | 3.0 | 3.836667 | 11.51 | 11.51 |
| Chadolyla44 | 3.0 | 3.82 | 11.46 | 11.46 |
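
The same ranking can be sketched with a grouped aggregation followed by an explicit descending sort, which mirrors the "sort the total purchase value column in descending order" step. The data below is hypothetical (invented screen names and prices), and the column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase
purchases = pd.DataFrame({
    "SN": ["Ann1", "Ann1", "Bob2", "Cat3", "Dan4", "Dan4", "Dan4"],
    "Price": [1.00, 3.00, 2.50, 4.50, 1.50, 2.00, 1.50],
})

# One row per player with count, mean and sum of their purchases
spenders = purchases.groupby("SN").agg(
    purchase_count=("Price", "count"),
    average_purchase_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)

# Highest total spend first; head(3) previews the top three players
top_spenders = spenders.sort_values("total_purchase_value", ascending=False).head(3)

print(top_spenders)
```

Note that the top spender here (Dan4) is not the player with the most expensive single purchase (Cat3), which is why the ranking uses the per-player total.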


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [32]:
y = pymoli

def analysis_2(x):
    titles = {
        # Rows in the group, i.e. number of purchases of this item
        'Purchase Count': x['Purchase ID'].count(),
        # Mean price paid; an item's price can differ between purchases
        'Item Price': x['Price'].mean(),
        # Revenue from this item
        'Total Purchase': x['Price'].sum()
    }

    return pd.Series(titles, index = ['Purchase Count','Item Price','Total Purchase'])

# One group per item
z = y.groupby(['Item ID','Item Name']).apply(analysis_2)

# Five most frequently purchased items
z.nlargest(5,'Purchase Count')

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase |
|---|---|---|---|---|
| 92 | Final Critic | 13.0 | 4.614615 | 59.99 |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12.0 | 4.23 | 50.76 |
| 82 | Nirvana | 9.0 | 4.9 | 44.1 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9.0 | 3.53 | 31.77 |
| 132 | Persuasion | 9.0 | 3.221111 | 28.99 |


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame


In [39]:
y = pymoli

def analysis_2(x):
    titles = {
        'Purchase Count': x['Purchase ID'].count(),
        'Item Price': x['Price'].mean(),
        'Total Purchase': x['Price'].sum()
    }

    return pd.Series(titles, index = ['Purchase Count','Item Price','Total Purchase'])

# Note: grouping on 'Purchase ID' puts every purchase in its own group, so this
# ranks single purchases by price. Grouping on ['Item ID','Item Name'] instead
# would rank items by total revenue.
z = y.groupby(['Purchase ID','Item Name']).apply(analysis_2)

z.nlargest(10,'Total Purchase')

| Purchase ID | Item Name | Purchase Count | Item Price | Total Purchase |
|---|---|---|---|---|
| 189 | Stormfury Mace | 1.0 | 4.99 | 4.99 |
| 554 | Stormfury Mace | 1.0 | 4.99 | 4.99 |
| 110 | Mercy, Katana of Dismay | 1.0 | 4.94 | 4.94 |
| 116 | Mercy, Katana of Dismay | 1.0 | 4.94 | 4.94 |
| 231 | Mercy, Katana of Dismay | 1.0 | 4.94 | 4.94 |
| 246 | Mercy, Katana of Dismay | 1.0 | 4.94 | 4.94 |
| 493 | Mercy, Katana of Dismay | 1.0 | 4.94 | 4.94 |
| 275 | Stormfury Longsword | 1.0 | 4.93 | 4.93 |
| 290 | Hellreaver, Heirloom of Inception | 1.0 | 4.93 | 4.93 |
| 312 | Hellreaver, Heirloom of Inception | 1.0 | 4.93 | 4.93 |
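
The table above has one row per Purchase ID, so it ranks individual purchases by price rather than items by revenue. A minimal sketch of an item-level ranking follows. It groups on `Item ID` and `Item Name` and sorts by total purchase value, using a small hypothetical frame (the item names and prices are invented for illustration, and the output column names are assumptions).

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase
purchases = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4, 5],
    "Item ID": [10, 10, 10, 20, 20, 30],
    "Item Name": ["Sword", "Sword", "Sword", "Shield", "Shield", "Bow"],
    "Price": [2.00, 2.00, 2.00, 4.50, 4.50, 4.75],
})

# Group by item so that repeat purchases of the same item are added together
items = purchases.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),
    total_purchase_value=("Price", "sum"),
)

# Rank items by the revenue they generated, highest first
most_profitable = items.sort_values("total_purchase_value", ascending=False)

print(most_profitable)
```

In this toy data the Bow has the highest single price but the lowest total, so a per-purchase ranking and a per-item ranking give different answers.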


* Three Observable Trends 

In [41]:
# Players aged 20 to 24 make up the largest age group (258 of 576 players)

# The 20 to 24 age group also accounts for the largest total purchase value (1114.06)

# "Final Critic" is the most purchased item, with 13 purchases