# Heroes of Pymoli

## Data analysis

1. Of the 576 players, the large majority are male: 484 players, or 84% of the total. This group also generates the most revenue, with total purchases of $1,967.64.

2. According to the age demographics analysis, the age group that spent the most is 20 to 24 years. Players in this group bought 365 items, for a total revenue of $1,114.06.

3. Finally, the most popular and most profitable item in the dataset is "Oathbreaker, Last Hope of the Breaking Storm". Of the 780 items purchased, it was bought 12 times, for a total revenue of $50.76.
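
The shares behind these findings can be checked with a few lines of arithmetic. The sketch below only reuses figures reported in the notebook outputs (576 players, 484 male players, $2379.77 total revenue, $1967.64 male revenue, $1,114.06 revenue for ages 20 to 24); the variable names are illustrative and not part of the original notebook.

```python
# Figures copied from the notebook outputs
total_players = 576
male_players = 484
total_revenue = 2379.77
male_revenue = 1967.64
age_20_24_revenue = 1114.06

# Share of players and share of revenue, as percentages
male_player_share = male_players / total_players * 100
male_revenue_share = male_revenue / total_revenue * 100
age_20_24_revenue_share = age_20_24_revenue / total_revenue * 100

print(f"Male share of players: {male_player_share:.2f}%")
print(f"Male share of revenue: {male_revenue_share:.2f}%")
print(f"Ages 20 to 24 share of revenue: {age_20_24_revenue_share:.2f}%")
```

Expressing revenue as a share of the total makes it easier to compare a group's spending with its share of the player base.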

In [1]:
import pandas as pd
import numpy as np

In [2]:
fileName = "purchase_data.csv"

readCsv = pd.read_csv(fileName, encoding = "utf-8")
readCsv.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [3]:
###### Player Count ######

tot_players = len(readCsv["SN"].unique())
"The total number of players is " + str(tot_players)

'The total number of players is 576'

In [4]:
###### Purchasing Analysis (Total) ######

unique_prod = readCsv["Item ID"].unique()
unique_items = len(unique_prod)
"There are a total of " + str(unique_items) + " unique items"

'There are a total of 183 unique items'

In [5]:
ave_price = readCsv["Price"].mean()
f"The average purchase price is ${ave_price:.2f}"

'The average purchase price is $3.05'

In [6]:
tot_purchase = readCsv['Item ID'].count()
f"The total number of purchases is {tot_purchase}"

'The total number of purchases is 780'

In [7]:
tot_price = readCsv["Price"].sum()
f"The total revenue is ${tot_price:.2f}"

'The total revenue is $2379.77'

In [8]:
###### Gender Demographics ######
# unique instead of groupby
gender_demo = readCsv.copy()
gender_group = gender_demo.groupby('SN').first()

gender_dic = gender_group['Gender'].value_counts()
total_genders = gender_dic.sum()
male = gender_dic['Male']
female = gender_dic['Female']
other = gender_dic['Other / Non-Disclosed']
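
As a possible alternative to taking the first row of each `SN` group, the per-player gender counts and percentages could be computed with `drop_duplicates` and `value_counts`. This is a minimal sketch on made-up rows; the `sample` frame, its player names, and the `gender_summary` column labels are hypothetical, while the notebook itself reads `purchase_data.csv`.

```python
import pandas as pd

# Hypothetical sample rows; the real notebook reads purchase_data.csv
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are counted once
players = sample.drop_duplicates(subset="SN")

# Absolute counts and percentage of players per gender
counts = players["Gender"].value_counts()
percentages = players["Gender"].value_counts(normalize=True) * 100

gender_summary = pd.DataFrame({
    "Total count": counts,
    "Percentage of players": percentages.round(2),
})
print(gender_summary)
```

Building one summary table avoids reassigning `male`, `female`, and `other` in separate cells, which would change their values if a cell were run twice.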

In [9]:
male = male / total_genders * 100
f"There are a total of {gender_dic['Male']} Male players, representing {male:.2f}% of the total population"

'There are a total of 484 Male players, representing 84.03% of the total population'

In [10]:
female = female / total_genders * 100
f"There are a total of {gender_dic['Female']} Female players, representing {female:.2f}% of the total population"

'There are a total of 81 Female players, representing 14.06% of the total population'

In [11]:
none = other / total_genders * 100
f"There are a total of {gender_dic['Other / Non-Disclosed']} 'Other / Non-Disclosed' players, representing {none:.2f}% of the total population"

"There are a total of 11 'Other / Non-Disclosed' players, representing 1.91% of the total population"

In [12]:
###### Purchasing Analysis (Gender -> Male) ######

male_gender = gender_demo.loc[gender_demo['Gender'] == "Male", :].copy()
male_gender.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [13]:
male_purchase_count = male_gender['Purchase ID'].count()
male_ave_price = male_gender["Price"].mean()
male_tot_purchase = male_gender["Price"].sum()
male_tot_person = len(male_gender['SN'].unique())
# Total revenue divided by the number of unique players
ave_tot_male = male_tot_purchase / male_tot_person
print(f"Male players bought a total of {male_purchase_count} items")
print(f"The average purchase price was ${male_ave_price:.2f}")
print(f"Representing a total revenue of ${male_tot_purchase:.2f}")
print(f"This means that on average one person spent ${ave_tot_male:.2f} in total")

Male players bought a total of 652 items
The average purchase price was $3.02
Representing a total revenue of $1967.64
This means that on average one person spent $4.07 in total


In [14]:
###### Purchasing Analysis (Gender -> Female) ######

female_gender = gender_demo.loc[gender_demo['Gender'] == "Female", :].copy()
female_gender.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
15,15,Lisassa64,21,Female,98,"Deadline, Voice Of Subtlety",2.89
18,18,Reunasu60,22,Female,82,Nirvana,4.9
38,38,Reulae52,10,Female,116,Renewed Skeletal Katana,4.18
41,41,Assosia88,20,Female,7,"Thorn, Satchel of Dark Souls",1.33
55,55,Phaelap26,25,Female,84,Arcane Gem,3.79


In [15]:
female_purchase_count = female_gender['Purchase ID'].count()
female_ave_price = female_gender["Price"].mean()
female_tot_purchase = female_gender["Price"].sum()
female_tot_person = len(female_gender['SN'].unique())
# Total revenue divided by the number of unique players
ave_tot_female = female_tot_purchase / female_tot_person
print(f"Female players bought a total of {female_purchase_count} items")
print(f"The average purchase price was ${female_ave_price:.2f}")
print(f"Representing a total revenue of ${female_tot_purchase:.2f}")
print(f"This means that on average one person spent ${ave_tot_female:.2f} in total")

Female players bought a total of 113 items
The average purchase price was $3.20
Representing a total revenue of $361.94
This means that on average one person spent $4.47 in total


In [16]:
###### Purchasing Analysis (Gender -> Other / Non-Disclosed) ######

other_gender = gender_demo.loc[gender_demo['Gender'] == "Other / Non-Disclosed", :].copy()
other_gender.head()

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
9,9,Chanosian48,35,Other / Non-Disclosed,136,Ghastly Adamantite Protector,3.58
22,22,Siarithria38,38,Other / Non-Disclosed,24,Warped Fetish,3.81
82,82,Haerithp41,16,Other / Non-Disclosed,160,Azurewrath,4.4
111,111,Sundim98,21,Other / Non-Disclosed,41,Orbit,4.75
228,228,Jiskirran77,20,Other / Non-Disclosed,80,Dreamsong,3.39


In [17]:
other_purchase_count = other_gender['Purchase ID'].count()
other_ave_price = other_gender["Price"].mean()
other_tot_purchase = other_gender["Price"].sum()
other_tot_person = len(other_gender['SN'].unique())
# Total revenue divided by the number of unique players
ave_tot_other = other_tot_purchase / other_tot_person
print(f"Other / Non-Disclosed players bought a total of {other_purchase_count} items")
print(f"The average purchase price was ${other_ave_price:.2f}")
print(f"Representing a total revenue of ${other_tot_purchase:.2f}")
print(f"This means that on average one person spent ${ave_tot_other:.2f} in total")

Other / Non-Disclosed players bought a total of 15 items
The average purchase price was $3.35
Representing a total revenue of $50.19
This means that on average one person spent $4.56 in total
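
The three per-gender cells above repeat the same four calculations. One way to consolidate them, shown here as a sketch rather than the notebook's own method, is a single `groupby("Gender")`. The `sample` rows and the column labels of `gender_purchases` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real notebook reads purchase_data.csv
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
    "Price": [1.00, 3.00, 2.50, 4.00, 3.50],
})

by_gender = sample.groupby("Gender")

# Same metrics as the per-gender cells, computed for all genders at once
gender_purchases = pd.DataFrame({
    "Purchase count": by_gender["Price"].count(),
    "Average price": by_gender["Price"].mean(),
    "Total revenue": by_gender["Price"].sum(),
})

# Total revenue divided by the number of unique players in each gender
gender_purchases["Average total per person"] = (
    gender_purchases["Total revenue"] / by_gender["SN"].nunique()
)
print(gender_purchases.round(2))
```

Because every gender goes through the same code path, a copy-and-paste slip in one group's label or formula cannot occur.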


In [18]:
###### Age Demographics ######

age_demo = readCsv.copy()
age_demo['Age'].unique()

array([20, 40, 24, 23, 22, 36, 35, 21, 30, 38, 29, 11,  7, 19, 37, 10,  8,
       18, 27, 33, 32, 25, 12, 34, 17, 15, 13, 26, 16, 28, 31, 39, 44, 41,
        9, 14, 42, 43, 45])

In [19]:
# Bin edges 0, 9, 14, ..., 44; pd.cut intervals are right-inclusive, e.g. (19, 24]
# Note: ages above 44 (45 appears in the data) fall outside the last edge and get NaN
bins = [0] + list(range(9, 49, 5))
bin_labels = ["< 10 years", "from 10 to 14 years", "from 15 to 19 years", "from 20 to 24 years", "from 25 to 29 years",
         "from 30 to 34 years", "from 35 to 39 years", "from 40 to 44 years"]
age_demo['Age range'] = pd.cut(age_demo['Age'], bins, labels = bin_labels)
age_demo.head()


Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price,Age range
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53,from 20 to 24 years
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56,from 40 to 44 years
2,2,Ithergue48,24,Male,92,Final Critic,4.88,from 20 to 24 years
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27,from 20 to 24 years
4,4,Iskosia90,23,Male,131,Fury,1.44,from 20 to 24 years


In [34]:
age_grouped = age_demo.groupby("Age range")
age_purchase_count = age_grouped['Price'].count()
age_ave_price = age_grouped['Price'].mean().map("${:,.2f}".format)
age_tot_purchase = age_grouped['Price'].sum()
ave_tot_age = age_tot_purchase / age_grouped['SN'].nunique()

age_tot_purchase = age_tot_purchase.map("${:,.2f}".format)
ave_tot_age = ave_tot_age.map("${:,.2f}".format)

age_purchase_count = age_purchase_count.rename("Purchase count")
age_ave_price = age_ave_price.rename("Average price")
age_tot_purchase = age_tot_purchase.rename("Total purchases")
ave_tot_age = ave_tot_age.rename("Average purchase total per person")

age_final = pd.concat([age_purchase_count, age_ave_price, age_tot_purchase, ave_tot_age], axis=1)
age_final


Age range,Purchase count,Average price,Total purchases,Average purchase total per person
< 10 years,23,$3.35,$77.13,$4.54
from 10 to 14 years,28,$2.96,$82.78,$3.76
from 15 to 19 years,136,$3.04,$412.89,$3.86
from 20 to 24 years,365,$3.05,"$1,114.06",$4.32
from 25 to 29 years,101,$2.90,$293.00,$3.81
from 30 to 34 years,73,$2.93,$214.00,$4.12
from 35 to 39 years,41,$3.60,$147.67,$4.76
from 40 to 44 years,12,$3.04,$36.54,$3.32
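
The purchase counts in the table above add up to 779, one short of the 780 purchases reported earlier. This is consistent with the age 45 entry falling outside the last bin edge (44), since `pd.cut` assigns `NaN` to values outside the edges. The sketch below reproduces the same edges on a few made-up ages and shows one possible way to catch the overflow; the `ages` series, the open-ended edge, and the "45 years and over" label are assumptions, not part of the original analysis.

```python
import pandas as pd

# Hypothetical ages chosen to sit on and around the bin edges
ages = pd.Series([7, 9, 10, 24, 25, 44, 45], name="Age")

# Same edges as the notebook: 0, 9, 14, ..., 44 (right-inclusive intervals)
edges = [0] + list(range(9, 49, 5))
labels = ["< 10 years", "from 10 to 14 years", "from 15 to 19 years",
          "from 20 to 24 years", "from 25 to 29 years", "from 30 to 34 years",
          "from 35 to 39 years", "from 40 to 44 years"]

binned = pd.cut(ages, edges, labels=labels)
print(pd.DataFrame({"Age": ages, "Age range": binned}))

# Ages above the last edge are not assigned to any bin
unbinned = ages[binned.isna()].tolist()
print("Unbinned ages:", unbinned)

# Possible fix (an assumption): add an open-ended last bin with its own label
edges_open = edges + [float("inf")]
labels_open = labels + ["45 years and over"]
binned_open = pd.cut(ages, edges_open, labels=labels_open)
print(binned_open.value_counts(sort=False))
```

Checking `binned.isna()` after `pd.cut` is a cheap way to confirm that every row landed in a bin.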


In [21]:
###### Top Spenders ######

top_spenders = readCsv.copy()
top_group = top_spenders.groupby("SN")
spent = top_group['Price'].sum()
# create a new DataFrame with the 5 most important values

top_5 = spent.nlargest(5)
# top = [top_5.index[i] for i in range(len(top_5))]
# top_df = top_spenders.loc[top_spenders['SN'].isin(top)]
# top_df
sn = [top_5.index[i] for i in range(5)]
dic = {'SN' : [], "Purchase Count" : [], "Average Purchase Price" : [], "Total Purchase" : []}
for name, group in top_group:
    if name in top_5:
        dic['SN'].append(name)
        dic['Purchase Count'].append(group['Price'].count())
        dic['Average Purchase Price'].append(group['Price'].mean())
        dic['Total Purchase'].append(group['Price'].sum())
        
top_df = pd.DataFrame(dic)
top_df = top_df.set_index("SN")
top_df = top_df.reindex(sn)

top_df['Average Purchase Price'] = top_df['Average Purchase Price'].map("${:,.2f}".format)
top_df['Total Purchase'] = top_df['Total Purchase'].map("${:,.2f}".format)

top_df

SN,Purchase Count,Average Purchase Price,Total Purchase
Lisosia93,5,$3.79,$18.96
Idastidru52,4,$3.86,$15.45
Chamjask73,3,$4.61,$13.83
Iral74,4,$3.40,$13.62
Iskadarya95,3,$4.37,$13.10
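
The top spenders table can also be built without an explicit loop over the groups. The following is a minimal sketch, assuming the same column names as the table above; the `sample` rows and player names are made up, and it keeps the top 2 instead of the top 5 only because the sample is small.

```python
import pandas as pd

# Hypothetical sample rows; the real notebook reads purchase_data.csv
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Price": [4.00, 3.00, 2.50, 1.00, 1.50, 2.00],
})

# Aggregate per player, then keep the players with the largest totals
top_spenders = (
    sample.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={"count": "Purchase Count",
                     "mean": "Average Purchase Price",
                     "sum": "Total Purchase"})
    .nlargest(2, "Total Purchase")
)
print(top_spenders)
```

`nlargest` returns the rows already ordered by total, so no separate `reindex` step is needed.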


In [22]:
###### Most Popular Items ######

popular_items = readCsv.copy()
top_5 = popular_items['Item ID'].value_counts().nlargest(5)
popular_group = popular_items.groupby('Item ID')
dic = {'Item ID' : [], 'Item Name' : [], 'Purchase Count' : [], 'Item Price' : [], 'Total Purchase Value' : []}

for i in range(len(top_5)):
    for name, group in popular_group:
        if name == top_5.index[i]:
            dic['Item ID'].append(name)
            dic['Item Name'].append(group['Item Name'].unique()[0])
            dic['Purchase Count'].append(group['Item Name'].count())
            dic['Item Price'].append(group['Price'].unique()[0])
            dic['Total Purchase Value'].append(group['Price'].sum())
    
popular_df = pd.DataFrame(dic)

popular_df['Item Price'] = popular_df['Item Price'].map("${:,.2f}".format)
popular_df['Total Purchase Value'] = popular_df['Total Purchase Value'].map("${:,.2f}".format)

popular_df

Unnamed: 0,Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
0,178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
1,82,Nirvana,9,$4.90,$44.10
2,108,"Extraction, Quickblade Of Trembling Hands",9,$3.53,$31.77
3,145,Fiery Glass Crusader,9,$4.58,$41.22
4,92,Final Critic,8,$4.88,$39.04


In [23]:
###### Most Profitable Items ######

profitable_items = readCsv.copy()
profitable_group = profitable_items.groupby('Item ID')
top_5 = profitable_group['Price'].sum().nlargest(5)
sn = [top_5.index[i] for i in range(5)]
dic = {'Item ID' : [], 'Item Name' : [], 'Purchase Count' : [], 'Item Price' : [], 'Total Purchase Value' : []}

for name, group in profitable_group:
    if name in top_5:
        dic['Item ID'].append(name)
        dic['Item Name'].append(group['Item Name'].unique()[0])
        dic['Purchase Count'].append(group['Item Name'].count())
        dic['Item Price'].append(group['Price'].unique()[0])
        dic['Total Purchase Value'].append(group['Price'].sum())
        
profitable_df = pd.DataFrame(dic)
profitable_df = profitable_df.set_index("Item ID")
profitable_df = profitable_df.reindex(sn)

profitable_df['Item Price'] = profitable_df['Item Price'].map("${:,.2f}".format)
profitable_df['Total Purchase Value'] = profitable_df['Total Purchase Value'].map("${:,.2f}".format)

profitable_df

Item ID,Item Name,Purchase Count,Item Price,Total Purchase Value
178,"Oathbreaker, Last Hope of the Breaking Storm",12,$4.23,$50.76
82,Nirvana,9,$4.90,$44.10
145,Fiery Glass Crusader,9,$4.58,$41.22
92,Final Critic,8,$4.88,$39.04
103,Singed Scalpel,8,$4.35,$34.80
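
The popularity and profitability rankings share the same per-item aggregation and differ only in the column used for ranking. A sketch of that idea, on made-up items, is shown below; the `sample` rows, the item names, and the snake_case column names are hypothetical, and taking the first price per item assumes each item has a single price.

```python
import pandas as pd

# Hypothetical sample rows; the real notebook reads purchase_data.csv
sample = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Dagger", "Dagger", "Dagger", "Axe", "Axe", "Orb"],
    "Price": [1.00, 1.00, 1.00, 4.00, 4.00, 2.00],
})

# One aggregation per item: number of purchases, unit price, total value
items = sample.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "first"),  # assumes one price per item
    total_value=("Price", "sum"),
)

# Rank the same table two ways
most_popular = items.nlargest(2, "purchase_count")
most_profitable = items.nlargest(2, "total_value")

print(most_popular)
print(most_profitable)
```

In this sample the cheap item sells most often while the expensive one earns more, which illustrates why the two rankings can disagree below the first place, as they do in the tables above.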
