### Heroes Of Pymoli Data Analysis
* Of the 1163 active players, the vast majority are male (84%). There also exists, a smaller, but notable proportion of female players (14%).

* Our peak age demographic falls between 20-24 (44.8%) with secondary groups falling between 15-19 (18.60%) and 25-29 (13.4%).  
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data.head()


Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


In [2]:
#create dataset of unique players
purchase_data_unique=purchase_data.drop_duplicates(subset="SN")

## Player Count

* Display the total number of players


In [3]:
players=purchase_data_unique["SN"].value_counts()
total_players=players.count()
print(f"Heroes Of Pymoli has {total_players} players")



Heroes Of Pymoli has 576 players


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [4]:
#find number of unique items and average price as variables
item_data = purchase_data[["Item ID","Item Name" ,"Price"]]
#drop duplicate items
unique_item_data=item_data.drop_duplicates(subset="Item ID")


unique_items = unique_item_data["Item ID"].value_counts().count()

average_price= np.round(unique_item_data["Price"].mean(), 2)
print(f"{unique_items} # of unique items with an average price of ${average_price}")

summary_df= pd.DataFrame( {"Name":["item summary"], "Total items sold": [unique_items], "Average Price per Item":[average_price]})
summary_df.head()





183 # of unique items with an average price of $3.04


Unnamed: 0,Name,Total items sold,Average Price per Item
0,item summary,183,3.04


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [5]:
#Counts for each gender
genders = purchase_data_unique["Gender"].value_counts()
genders=genders.index


In [6]:
#determine unique number of males
unique_male=purchase_data_unique.loc[purchase_data_unique["Gender"] == "Male", :]
male=unique_male["SN"].value_counts()
total_male=male.count()
percent_male=np.round((total_male/total_players)*100,1)


#determine uniuqe number of females
unique_female=purchase_data_unique.loc[purchase_data_unique["Gender"] == "Female", :]
female=unique_female["SN"].value_counts()
total_female=female.count()
percent_female=np.round((total_female/total_players)*100,1)

#Determine unique number of Other/ Non-Disclosed
unique_other=purchase_data_unique.loc[purchase_data_unique["Gender"] == "Other / Non-Disclosed", :]
other=unique_other["SN"].value_counts()
total_other=other.count() 
percent_other=np.round((total_other/total_players)*100,1) 


gender_stats=pd.DataFrame({"Gender":genders, "Count":[total_male, total_female, total_other], 
                           "Percentage":[percent_male, percent_female, percent_other]})

gender_stats


Unnamed: 0,Gender,Count,Percentage
0,Male,484,84.0
1,Female,81,14.1
2,Other / Non-Disclosed,11,1.9



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [7]:


male_purchase=purchase_data.loc[purchase_data["Gender"]== "Male",:]
male_purchase_count=male_purchase["SN"].count()
male_avg_price=male_purchase["Price"].mean()
print(male_purchase_count)
female_purchase=purchase_data.loc[purchase_data["Gender"]== "Female",:]
female_purchase_count=female_purchase["SN"].count()
female_avg_price=female_purchase["Price"].mean()

other_purchase=purchase_data.loc[purchase_data["Gender"]== "Other / Non-Disclosed",:]
other_purchase_count=other_purchase["SN"].count()
other_avg_price=male_purchase["Price"].mean()

gender_purchase=pd.DataFrame({"Gender":genders,"Count":[total_male, total_female, total_other],"Percentage":[percent_male, percent_female, percent_other], "Purchase Count":[male_purchase_count, female_purchase_count, other_purchase_count], 
                    "Averge Price":[male_avg_price, female_avg_price, other_avg_price]})
gender_purchase

652


Unnamed: 0,Gender,Count,Percentage,Purchase Count,Averge Price
0,Male,484,84.0,652,3.017853
1,Female,81,14.1,113,3.203009
2,Other / Non-Disclosed,11,1.9,15,3.017853


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [8]:
age_bins_lables =["Child 0-12", "Teen 13-18", "Young Adult 19-30", "Middle Age 30-54", "Senior 55+"]
age_bins=[0,12,19,30,55,110]
purchase_data_unique["Age Demographic"]=pd.cut(purchase_data_unique["Age"], bins = age_bins, labels = age_bins_lables)
#counts for each age group
AgeDemo=purchase_data_unique["Age Demographic"].value_counts()
#percentage for each age group (rounding 2 decimal include here)
AgePercent=np.round(purchase_data_unique["Age Demographic"].value_counts(normalize=True), 2)
#merge into dataframe
age_df=pd.merge(AgeDemo, AgePercent, left_index=True, right_index=True)
age_df.rename(columns = {"Age Demographic_x":"Count", "Age Demographic_y":"Percentage"})

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,Count,Percentage
Young Adult 19-30,360,0.62
Teen 13-18,112,0.19
Middle Age 30-54,70,0.12
Child 0-12,34,0.06
Senior 55+,0,0.0


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [9]:
purchase_data["Age Demographic"]=pd.cut(purchase_data["Age"], bins = age_bins, labels = age_bins_lables)

#function to split purchase data into seperate dataframes for each age demographic

def purchase_age(df, label, string):
    
        df=df.loc[df[f"{string}"]== f"{label}",:]
        return df
    
purchase_data_by_bin = [
    purchase_age(purchase_data, f, "Age Demographic")
    for f in age_bins_lables
]

Child=purchase_data_by_bin[0]
Teen=purchase_data_by_bin[1]
YoungAdult=purchase_data_by_bin[2]
MiddleAge=purchase_data_by_bin[3]
Senior=purchase_data_by_bin[4]

In [10]:

#total purchase by age group

total_pChild=(Child["Item ID"].count())
total_pTeen=(Teen["Item ID"].count())
total_pYoungAdult=(YoungAdult["Item ID"].count())
total_pMiddleAge=(MiddleAge["Item ID"].count())
total_pSenior=(Senior["Item ID"].count())
total_p=[total_pChild, total_pTeen, total_pYoungAdult, total_pMiddleAge, total_pSenior]

# average purchase
total_aChild=(Child["Price"].mean())
total_aTeen=(Teen["Price"].mean())
total_aYoungAdult=(YoungAdult["Price"].mean())
total_aMiddleAge=(MiddleAge["Price"].mean())
total_aSenior=(Senior["Price"].mean())
total_a=np.round([total_aChild, total_aTeen, total_aYoungAdult, total_aMiddleAge, total_aSenior], 2)

#purchase per person
pChild=(Child["Price"].sum())/(AgeDemo.loc["Child 0-12"])
pTeen=(Teen["Price"].sum())/(AgeDemo.loc["Teen 13-18"])
pYoungAdult=(YoungAdult["Price"].sum())/(AgeDemo.loc["Young Adult 19-30"])
pMiddleAge=(MiddleAge["Price"].sum())/(AgeDemo.loc["Middle Age 30-54"])
pSenior=(Senior["Price"].sum())/(AgeDemo.loc["Senior 55+"])
total_per_person=np.round([pChild, pTeen, pYoungAdult, pMiddleAge, pSenior], 2)





In [11]:
#build dataframe
age_purchase=pd.DataFrame({"Age Demographic":age_bins_lables, "Total # of Purchase":total_p,
              "Average $ per Item":total_a,
              "Average $ per User":total_per_person})
age_purchase.head()


Unnamed: 0,Age Demographic,Total # of Purchase,Average $ per Item,Average $ per User
0,Child 0-12,45,3.19,4.22
1,Teen 13-18,142,3.02,3.83
2,Young Adult 19-30,501,3.03,4.21
3,Middle Age 30-54,92,3.15,4.14
4,Senior 55+,0,,


## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [12]:
#top spenders
top_spenders= purchase_data[["SN","Item ID","Price"]]
top_spenders=top_spenders.groupby(['SN'])
top_spenders_count=top_spenders["Item ID"].value_counts()
top_spenders_list=top_spenders["Price"].sum()



In [13]:
topspenderstable = pd.DataFrame({"Total Spent": top_spenders_list})
topspenderstable=topspenderstable.sort_values("Total Spent", ascending=False)
topspenderstable.head()

Unnamed: 0_level_0,Total Spent
SN,Unnamed: 1_level_1
Lisosia93,18.96
Idastidru52,15.45
Chamjask73,13.83
Iral74,13.62
Iskadarya95,13.1


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [14]:
top_item=purchase_data[["Item Name","Item ID","Price"]]
top_item=top_item.groupby(['Item Name', 'Item ID'])
top_item_list=purchase_data[["Item Name","Item ID"]]
s1=top_item["Price"].sum()
s2=top_item["Item ID"].value_counts()

s1=pd.DataFrame({"Total $ Sold": s1})
s2=pd.DataFrame({"Total # Sold": s2})
top_item_df=pd.merge(s1, s2,left_on="Item Name", right_on="Item Name", left_index=False, right_index=False)

top_item_df=top_item_df.sort_values("Total # Sold", ascending=False)
top_item_df.head()

Unnamed: 0_level_0,Total $ Sold,Total # Sold
Item Name,Unnamed: 1_level_1,Unnamed: 2_level_1
"Oathbreaker, Last Hope of the Breaking Storm",50.76,12
"Extraction, Quickblade Of Trembling Hands",31.77,9
Nirvana,44.1,9
Fiery Glass Crusader,41.22,9
Final Critic,39.04,8


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [15]:
top_profit_item=top_item_df.sort_values("Total $ Sold", ascending=False)
top_profit_item.head()

Unnamed: 0_level_0,Total $ Sold,Total # Sold
Item Name,Unnamed: 1_level_1,Unnamed: 2_level_1
"Oathbreaker, Last Hope of the Breaking Storm",50.76,12
Nirvana,44.1,9
Fiery Glass Crusader,41.22,9
Final Critic,39.04,8
Final Critic,39.04,5
