### Heroes Of Pymoli Data Analysis
1) Of the 576 total players, 84% are male, 14% are female, and 2% are Other/Non-Disclosed.

2) Each gender's share of purchases closely matches its share of the player population: male players accounted for 84% of purchases, female players 14%, and Other/Non-Disclosed 2%. Gender therefore does not appear to be a factor in purchasing interest.

3) A bit under half of all players (44.8%) are between the ages of 20 and 24, and this group also accounts for the largest share of purchases (365 of 780). Players aged 15-19 come second on both counts.
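
Observation 2 can be checked directly by comparing each gender's share of unique players with its share of purchase rows. The following is a minimal sketch on a small made-up frame: the `toy` data and the `comparison` name are illustrative assumptions, and only the column names `SN`, `Gender`, and `Price` come from the dataset.

```python
import pandas as pd

# Hypothetical toy data using the same column names as purchase_data.csv
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "d", "d"],
    "Gender": ["Male", "Male", "Male", "Female", "Male", "Male"],
    "Price": [1.0, 2.0, 3.0, 4.0, 1.0, 1.0],
})

# Share of unique players per gender (one row per screen name)
player_share = toy.drop_duplicates("SN")["Gender"].value_counts(normalize=True)

# Share of purchase rows per gender (every purchase counts)
purchase_share = toy["Gender"].value_counts(normalize=True)

# Side-by-side comparison; a difference near zero means purchases track population
comparison = pd.DataFrame({"Player Share": player_share,
                           "Purchase Share": purchase_share})
comparison["Difference"] = comparison["Purchase Share"] - comparison["Player Share"]
print(comparison)
```

A difference close to zero for every gender would support the claim; a large gap would suggest one group buys disproportionately often.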

-----

In [1]:
#IMPORT DEPENDENCIES
import pandas as pd

#PATH TO THE CSV FILE
csv_path = "Resources/purchase_data.csv"

#READ DATA
data = pd.read_csv(csv_path)
data.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


## Player Count

* Display the total number of players


In [2]:
#TOTAL NUMBER OF PLAYERS
total_players = data["SN"].nunique()

#DF TOTAL NUMBER OF PLAYERS
total_players_df = pd.DataFrame({"Total Players" : [total_players]})
total_players_df

| | Total Players |
|---|---|
| 0 | 576 |


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
#CALCULATIONS
number_unique_items = data["Item ID"].nunique()
average_price = round((data["Price"]).mean() , 2)
number_of_purchases = data["Purchase ID"].count()
total_revenue = data["Price"].sum()
total_revenue

print(f"Total revenue: {total_revenue}")

Total revenue: 2379.77


In [4]:
#DF OF CALCULATIONS - using a list of dictionaries
purchasing_analysis_total_df = pd.DataFrame([{"Number of Unique Items" : number_unique_items ,
                                              "Average Price" : average_price , 
                                              "Number of Purchases" : number_of_purchases , 
                                              "Total Revenue" : total_revenue}])

#FORMAT CURRENCY
purchasing_analysis_total_df = purchasing_analysis_total_df.style.format({'Average Price':"${:,.2f}" ,
                                                                          'Total Revenue': '${:,.2f}'})
#SHOW/PRINT
purchasing_analysis_total_df

| | Number of Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 179 | $3.05 | 780 | $2,379.77 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [5]:
#CALCULATIONS

#DROP DUPLICATE USERS BY COLUMN "SN" (screen name)
data_2 = data.drop_duplicates(subset="SN")

#TOTALS BY GENDER - since duplicates were dropped in the previous step, I can just count all the genders listed.
total_male = data_2["Gender"].value_counts()['Male']
total_female = data_2["Gender"].value_counts()['Female']
total_other = data_2["Gender"].value_counts()['Other / Non-Disclosed']

print(
    f"Total Males: {total_male}\n"
    f"Total Females: {total_female}\n"
    f"Total Other: {total_other}")

Total Males: 484
Total Females: 81
Total Other: 11


In [6]:
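#PERCENTAGE OF TOTAL
#NOTE: each ratio is rounded to 2 decimals before formatting, so the output shows whole percentages (e.g. 0.8403 -> 84.00%)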
percentage_of_males = round(total_male/total_players , 2)
percentage_of_males = format(percentage_of_males , ".2%")
    
percentage_of_females = round(total_female/total_players , 2)
percentage_of_females = format(percentage_of_females , ".2%")
    
percentage_of_other = round(total_other/total_players , 2)
percentage_of_other = format(percentage_of_other , ".2%")
    
print(
    f"Percent of Males: {percentage_of_males}\n"
    f"Percent of Females: {percentage_of_females}\n"
    f"Percent of Other: {percentage_of_other}")

Percent of Males: 84.00%
Percent of Females: 14.00%
Percent of Other: 2.00%
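
Because the ratios are rounded before they are formatted, the printed percentages lose their decimals. As a hedged alternative, the sketch below divides first and formats last; it reuses the three counts printed earlier, and the names `gender_counts`, `gender_share`, and `formatted` are illustrative rather than part of the notebook.

```python
import pandas as pd

# Player counts copied from the "Total Males / Females / Other" output above
gender_counts = pd.Series({"Male": 484,
                           "Female": 81,
                           "Other / Non-Disclosed": 11})

# Keep full precision in the numeric result
gender_share = gender_counts / gender_counts.sum()

# Format only for display, as the very last step
formatted = gender_share.map("{:.2%}".format)
print(formatted)
```

With this ordering the male share prints as 84.03% and the Other / Non-Disclosed share as 1.91%, rather than 84.00% and 2.00%.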


In [35]:
#DF OF CALCULATIONS - using dictionary of lists.
gender_demographics_df = pd.DataFrame({
    "Gender": ["Male" , "Female" , "Other / Non-Disclosed",],
    "Total": [total_male, total_female ,total_other,],
    "Percent of Players": [percentage_of_males , percentage_of_females, percentage_of_other,]
})

gender_demographics_df

| | Gender | Total | Percent of Players |
|---|---|---|---|
| 0 | Male | 484 | 84.00% |
| 1 | Female | 81 | 14.00% |
| 2 | Other / Non-Disclosed | 11 | 2.00% |



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [34]:
#CALCULATIONS
#Select "Price" before aggregating so non-numeric columns are never averaged or summed
gender_purchases_count = data["Gender"].value_counts()
gender_purchases_mean = round(data.groupby("Gender")["Price"].mean() , 2)
gender_purchases_sum = data.groupby("Gender")["Price"].sum()
avg_total_purch_per_person = round(gender_purchases_sum/gender_demographics_df.set_index('Gender')['Total'] , 2)
#EXTRA CALCULATION- PERCENT OF TOTAL PURCHASES BY GENDER
percent_of_purchases = round(gender_purchases_count/gender_purchases_count.sum(), 2) * 100

#DF OF CALCULATIONS - using a dictionary of Series (aligned on the Gender index).
purch_analysis_gender_df = pd.DataFrame({"Purchase Count" : gender_purchases_count,
                                         "Percent of Purchases" : percent_of_purchases,#EXTRA COLUMN
                                         "Average Purchase Mean" : gender_purchases_mean,
                                         "Total Purchase Price" : gender_purchases_sum,
                                         "Avg Total Purchase per Person" : avg_total_purch_per_person})

purch_analysis_gender_df.style.format({"Average Purchase Mean":"${:,.2f}",
                                       "Total Purchase Price":"${:,.2f}",
                                       "Avg Total Purchase per Person":"${:,.2f}",
                                       "Percent of Purchases":"{:,.2f}%"})

| Gender | Purchase Count | Percent of Purchases | Average Purchase Mean | Total Purchase Price | Avg Total Purchase per Person |
|---|---|---|---|---|---|
| Female | 113 | 14.00% | $3.20 | $361.94 | $4.47 |
| Male | 652 | 84.00% | $3.02 | $1,967.64 | $4.07 |
| Other / Non-Disclosed | 15 | 2.00% | $3.35 | $50.19 | $4.56 |
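
The per-gender figures can also be produced in a single pass with named aggregation, which avoids relying on a separately built demographics frame. This is only a sketch on made-up rows: `toy`, `summary`, and the output column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical purchases: player "a" bought twice, "b" and "c" once each
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c"],
    "Gender": ["Male", "Male", "Male", "Female"],
    "Price": [1.0, 3.0, 2.0, 4.0],
})

# One groupby computes every statistic, including unique players per gender
summary = toy.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    unique_players=("SN", "nunique"),
)

# Average total spend per person = total value / number of distinct players
summary["avg_total_per_person"] = summary["total_value"] / summary["unique_players"]
print(summary)
```

Counting unique players inside the same `groupby` keeps the numerator and denominator aligned on the same index, so no manual `set_index` step is needed.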


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [10]:
#ESTABLISH AGE BINS AND GROUPS(index)
age_bins = [0, 9.99, 14.99, 19.99, 24.99, 29.99, 34.99, 39.99, 149.99]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

#DISTRIBUTE THE AGES TO THEIR BINS AND ADD "Age Group" AS A COLUMN TO MY ORIGINAL DF "data"
data["Age Group"] = pd.cut(data["Age"],age_bins, labels=group_names)

#GROUP THE DATA BY AGE GROUP
age_group = data.groupby("Age Group")

#GET UNIQUE PLAYERS BY AGE GROUP
total_count_age = age_group["SN"].nunique()

#CALCULATE TOTALS AND PERCENTAGES BY AGE GROUP
percentage_by_age = (total_count_age/total_players) * 100

#CREATE DF WITH DATA
age_demographics_df = pd.DataFrame({"Players In Age Group" : total_count_age,
                                    "Percentage of Total" : percentage_by_age})

#FORMATTING
age_demographics_df.index.name = "Age Group"
age_demographics_df.style.format({"Percentage of Total":"{:,.2f}%"})

| Age Group | Players In Age Group | Percentage of Total |
|---|---|---|
| <10 | 17 | 2.95% |
| 10-14 | 22 | 3.82% |
| 15-19 | 107 | 18.58% |
| 20-24 | 258 | 44.79% |
| 25-29 | 77 | 13.37% |
| 30-34 | 52 | 9.03% |
| 35-39 | 31 | 5.38% |
| 40+ | 12 | 2.08% |
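
The `.99` bin edges work for whole-number ages, but the same grouping can be expressed with integer edges by making the bins left-closed. A small sketch, assuming ages are non-negative integers; the `ages`, `edges`, and `groups` names and the sample values are illustrative only.

```python
import pandas as pd

# Hypothetical ages chosen to sit on the bin boundaries
ages = pd.Series([7, 9, 10, 14, 15, 24, 25, 39, 40, 45])

# Integer edges; right=False makes each bin [lower, upper)
edges = [0, 10, 15, 20, 25, 30, 35, 40, 150]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

groups = pd.cut(ages, bins=edges, labels=labels, right=False)
print(pd.DataFrame({"Age": ages, "Age Group": groups}))
```

With `right=False`, an age of exactly 10 falls into `10-14` and an age of 40 into `40+`, which matches the labels without needing fractional edges.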


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [11]:
#CALCULATIONS

#COUNT PURCHASES BY AGE GROUP
purchase_count_age = age_group["Purchase ID"].count()

#AVERAGE PURCHASES BY AGE GROUP
avg_purchase_price_age = age_group["Price"].mean()

#TOTAL PURCHASES BY AGE GROUP
total_purchase_value = age_group["Price"].sum()

#AVERAGE PURCHASES PER PERSON WITHIN AGE GROUP
avg_purchase_per_person_age = total_purchase_value/total_count_age

#CREATE DF WITH DATA
age_demographics = pd.DataFrame({"Purchase Count": purchase_count_age,
                                 "Avg Purchase Price": avg_purchase_price_age,
                                 "Total Purchase Value":total_purchase_value,
                                 "Avg Purchase Total per Person": avg_purchase_per_person_age})

#FORMATTING
age_demographics.index.name = "Age Group"
age_demographics.style.format({"Avg Purchase Price":"${:,.2f}",
                               "Total Purchase Value":"${:,.2f}",
                               "Avg Purchase Total per Person":"${:,.2f}"})

| Age Group | Purchase Count | Avg Purchase Price | Total Purchase Value | Avg Purchase Total per Person |
|---|---|---|---|---|
| <10 | 23 | $3.35 | $77.13 | $4.54 |
| 10-14 | 28 | $2.96 | $82.78 | $3.76 |
| 15-19 | 136 | $3.04 | $412.89 | $3.86 |
| 20-24 | 365 | $3.05 | $1,114.06 | $4.32 |
| 25-29 | 101 | $2.90 | $293.00 | $3.81 |
| 30-34 | 73 | $2.93 | $214.00 | $4.12 |
| 35-39 | 41 | $3.60 | $147.67 | $4.76 |
| 40+ | 13 | $2.94 | $38.24 | $3.19 |
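
Since `Age Group` is a categorical column produced by `pd.cut`, grouping on it can include bins that contain no rows, and newer pandas versions may warn when the `observed` argument is left implicit. The sketch below passes it explicitly; the data, bin edges, and the `by_age` name are made up for illustration.

```python
import pandas as pd

# Hypothetical purchases with nobody in the middle age bin
toy = pd.DataFrame({"Age": [8, 22, 23], "Price": [1.0, 2.0, 4.0]})
toy["Age Group"] = pd.cut(toy["Age"], bins=[0, 10, 20, 30],
                          labels=["<10", "10-19", "20-29"], right=False)

# observed=False keeps empty categories in the result instead of dropping them
by_age = toy.groupby("Age Group", observed=False)["Price"].agg(["count", "sum", "mean"])
print(by_age)
```

Keeping empty bins makes the summary table show every age group, with a count of zero and a missing mean where no purchases exist; `observed=True` would drop those rows instead.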


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [12]:
data.head(0)

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price | Age Group |
|---|---|---|---|---|---|---|---|---|


In [13]:
#CALCULATIONS

#GROUP PURCHASES BY PLAYER
players_grouped = data.groupby("SN")

#COUNT PURCHASES BY PLAYER
purchase_count_player = players_grouped["Purchase ID"].count()

#AVERAGE PURCHASE PRICE BY PLAYER
purchase_average_player = players_grouped["Price"].mean()

#TOTAL PURCHASE VALUE BY PLAYER
purchase_total_player = players_grouped["Price"].sum()

#CREATE DF WITH DATA (the average column must use the per-player mean, not the total)
top_spenders_df = pd.DataFrame({"Purchase Count" : purchase_count_player,
                                "Average Purchase Price" : purchase_average_player,
                                "Total Purchase Value" : purchase_total_player})

#SORT "TOTAL PURCHASE VALUE" COLUMN IN DESCENDING ORDER TO GET TOP 5 SPENDERS
top_spenders_df = top_spenders_df.sort_values(["Total Purchase Value"] , ascending=False).head()

#FORMATTING
top_spenders_df.style.format({"Average Purchase Price":"${:,.2f}",
                              "Total Purchase Value":"${:,.2f}"})

| SN | Purchase Count | Average Purchase Price | Total Purchase Value |
|---|---|---|---|
| Lisosia93 | 5 | $3.79 | $18.96 |
| Idastidru52 | 4 | $3.86 | $15.45 |
| Chamjask73 | 3 | $4.61 | $13.83 |
| Iral74 | 4 | $3.40 | $13.62 |
| Iskadarya95 | 3 | $4.37 | $13.10 |
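
A more compact way to build a top-spenders table is to aggregate count, mean, and sum in one call and then take the largest totals. This is a sketch under stated assumptions: the `toy` rows and the `spenders` and `top` names are hypothetical, and only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical purchases by three players
toy = pd.DataFrame({
    "SN": ["a", "a", "b", "c", "c", "c"],
    "Price": [2.0, 4.0, 5.0, 1.0, 1.0, 1.0],
})

# count, mean, and sum come from the same grouped column, so they cannot be mixed up
spenders = toy.groupby("SN")["Price"].agg(["count", "mean", "sum"])
spenders.columns = ["Purchase Count", "Average Purchase Price", "Total Purchase Value"]

# nlargest sorts descending and truncates in one step
top = spenders.nlargest(2, "Total Purchase Value")
print(top)
```

A quick sanity check on any such table is that the average price times the purchase count equals the total purchase value for every row.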


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [14]:
data.head(0)

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price | Age Group |
|---|---|---|---|---|---|---|---|---|


In [36]:
#GATHER Item ID, Item Name, and Item Price columns
items = data[["Item ID", "Item Name", "Price"]]

#GROUP BY "Item ID" and "Item Name"
items_grouped = items.groupby(["Item ID" , "Item Name"])

#CALCULATIONS
#HOW MANY TIMES EACH ITEM HAS BEEN PURCHASED
item_purch_count = items_grouped["Item ID"].count()

#TOTAL VALUE OF PURCHASES BY ITEM
item_purchase_value = items_grouped["Price"].sum()

#PRICE PER ITEM (average price paid = total value / purchase count)
item_price = item_purchase_value/item_purch_count

#CREATE DF WITH DATA
most_popular_items_df = pd.DataFrame({"Purchase Count" : item_purch_count,
                                      "Item Price" : item_price,
                                      "Total Purchase Value" : item_purchase_value})

#SORT "PURCHASE COUNT" COLUMN IN DESCENDING ORDER TO GET THE TOP 5 MOST POPULAR ITEMS
most_popular_items_df = most_popular_items_df.sort_values(["Purchase Count"] , ascending=False).head()

#FORMATTING
most_popular_items_df.style.format({"Item Price":"${:,.2f}",
                                    "Total Purchase Value":"${:,.2f}"})

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 92 | Final Critic | 13 | $4.61 | $59.99 |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 132 | Persuasion | 9 | $3.22 | $28.99 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | $3.53 | $31.77 |


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [37]:
#SORT THE "MOST POPULAR ITEMS" TABLE BY TOTAL PURCHASE VALUE
#NOTE: most_popular_items_df was already cut to 5 rows with .head(), so this only re-ranks those five items
most_profitable_items = most_popular_items_df.sort_values(["Total Purchase Value"] , ascending=False).head()

#FORMATTING
most_profitable_items.style.format({"Item Price":"${:,.2f}",
                                    "Total Purchase Value":"${:,.2f}"})

| Item ID | Item Name | Purchase Count | Item Price | Total Purchase Value |
|---|---|---|---|---|
| 92 | Final Critic | 13 | $4.61 | $59.99 |
| 178 | Oathbreaker, Last Hope of the Breaking Storm | 12 | $4.23 | $50.76 |
| 145 | Fiery Glass Crusader | 9 | $4.58 | $41.22 |
| 108 | Extraction, Quickblade Of Trembling Hands | 9 | $3.53 | $31.77 |
| 132 | Persuasion | 9 | $3.22 | $28.99 |
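
Because the popularity table was truncated with `.head()` before this sort, the ranking above can only contain items that were already among the five most purchased. An item bought rarely but at a high price would be missed. The sketch below keeps the full per-item summary and derives both rankings from it; the `toy` items and the names `items_summary`, `most_popular`, and `most_profitable` are hypothetical.

```python
import pandas as pd

# Hypothetical purchases: "Crown" sells once but at a high price
toy = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 3, 3],
    "Item Name": ["Dagger", "Dagger", "Dagger", "Crown", "Staff", "Staff"],
    "Price": [1.0, 1.0, 1.0, 9.0, 2.0, 2.0],
})

# Full, untruncated summary per item
items_summary = toy.groupby(["Item ID", "Item Name"])["Price"].agg(
    purchase_count="count",
    total_value="sum",
)

# Each ranking is taken from the full summary, not from the other ranking
most_popular = items_summary.nlargest(2, "purchase_count")
most_profitable = items_summary.nlargest(2, "total_value")
print(most_popular)
print(most_profitable)
```

In this toy data "Crown" tops the profitability ranking even though it is absent from the popularity ranking, which is exactly the case a truncated table would hide.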
