# Heroes Of Pymoli Data Analysis

## Takeaways: observable trends in the data

- There are 780 purchases in the dataset from 576 players. Among the players, 84% were male and only 14% were female. Male players also made most of the purchases (652 of 780). However, female players tended to spend more: \$4.47 per person on average, about \$0.40 more than male players.<br><br>
- Players aged 20-24 made up the largest group (44.79%). However, players aged 35-39 spent the most on average, at \$4.76 per person. Players under 10 ranked second at \$4.54 per person, and players aged 20-24 ranked third at \$4.32.<br><br>

- The most popular item in the dataset was Oathbreaker, Last Hope of the Breaking Storm, which was purchased 12 times. It was also the most profitable item.
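
The first two takeaways compare spend per person, which is not the same as the average price per purchase. A minimal sketch of the difference follows. The `toy` DataFrame and all of its values are hypothetical and are not taken from the purchase data.

```python
import pandas as pd

# Hypothetical toy purchases (made-up values, for illustration only)
toy = pd.DataFrame({
    "SN": ["ann", "ann", "bob", "cat", "dan"],
    "Gender": ["Female", "Female", "Male", "Female", "Male"],
    "Price": [2.0, 3.0, 4.0, 1.0, 2.0],
})

# Total revenue per gender, counted over every purchase
revenue_by_gender = toy.groupby("Gender")["Price"].sum()

# Number of distinct players per gender (repeat buyers counted once)
players_by_gender = toy.groupby("Gender")["SN"].nunique()

# Average total spend per player: revenue divided by distinct players
spend_per_player = revenue_by_gender / players_by_gender

# Average price per purchase: plain mean over purchase rows
price_per_purchase = toy.groupby("Gender")["Price"].mean()

print(spend_per_player)
print(price_per_purchase)
```

In this toy table a repeat buyer raises the per-player figure above the per-purchase figure, which is why the two columns can rank groups differently.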

## Analysis

In [None]:
# Dependencies and Setup
import pandas as pd

In [None]:
#Load the data (stored in the same directory as the notebook).
#pd.read_csv already returns a DataFrame, so no extra conversion is needed.
df = pd.read_csv('04-Pandas_homework_HeroesOfPymoli_Resources_purchase_data.csv')

#Display first 5 rows of the data
df.head(5)

### Data cleaning and descriptives

In [None]:
#Check if any value is missing
df.isnull().values.any()

In [None]:
#Descriptives
df.describe()

### Player Count

In [None]:
#Count the number of unique players
count_player = df['SN'].nunique()
player_count_df = pd.DataFrame({'Total Players': [count_player]})
player_count_df

### Purchasing Analysis (Total)

In [None]:
#Count number of unique items
count_items = df['Item ID'].nunique()
count_items

In [None]:
#Average Purchase Price
average_price = df['Price'].mean()
average_price

In [None]:
#Total number of purchases
count_purchase = df['Purchase ID'].count()
count_purchase

In [None]:
#Total revenue
revenue_sum = df['Price'].sum()
revenue_sum

In [None]:
#display result in a dataframe
purchasing_analysis_df = pd.DataFrame({'Number of Unique Items':[count_items],
                                       'Average Price':[average_price],
                                       'Number of Purchases':[count_purchase],
                                       'Total Revenue':[revenue_sum]})

purchasing_analysis_df.style.format({'Average Price': '${:.2f}',
                                    'Total Revenue': '${:.2f}'})

### Gender Demographics

In [None]:
#Drop repeat purchases so each player (SN) appears only once
dedupe_players_df = df.drop_duplicates('SN')
#Check that the row count now matches the number of unique players
dedupe_players_df.count()
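
The deduplication check can also be written as explicit assertions instead of reading the counts by eye. This is a minimal sketch on a hypothetical `toy` table with made-up values; only the `SN` column name is taken from the dataset.

```python
import pandas as pd

# Hypothetical purchases in which two players buy more than once
toy = pd.DataFrame({
    "SN": ["ann", "bob", "ann", "cat", "bob"],
    "Gender": ["Female", "Male", "Female", "Female", "Male"],
    "Price": [2.0, 4.0, 3.0, 1.0, 2.0],
})

# Keep the first purchase row of each player
deduped = toy.drop_duplicates("SN")

# One row per unique player, and no screen name left duplicated
assert len(deduped) == toy["SN"].nunique()
assert not deduped["SN"].duplicated().any()
```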

In [None]:
#Gender distribution in players (count)
demographics_count = dedupe_players_df.groupby(['Gender'])['Gender'].count()
demographics_count

In [None]:
#Gender distribution in players (%)
demographics_percents = demographics_count/dedupe_players_df['Gender'].count()
demographics_percents

In [None]:
#Display result in a dataframe, sorted high to low by Total Count
demo_df=pd.DataFrame({'Total Count':demographics_count, 
                      'Percents':demographics_percents})
demo_df_sorted=demo_df.sort_values(['Total Count'],ascending = False)
demo_df_sorted.style.format({'Percents': '{:.2%}'})

### Purchasing Analysis (Gender)

In [None]:
#purchase count by gender
purchase_count_gender = df.groupby(['Gender'])['Purchase ID'].count()
purchase_count_gender

In [None]:
#average purchase price by gender
purchase_avg_price_gender = df.groupby(['Gender'])['Price'].mean()
purchase_avg_price_gender

In [None]:
#total purchase value by gender
total_purchase_value_gender = df.groupby(['Gender'])['Price'].sum()
total_purchase_value_gender

In [None]:
#Avg total purchase per person by gender
avg_total_purchase_gender = total_purchase_value_gender / demographics_count
avg_total_purchase_gender

In [None]:
#display all the data in one dataframe
purchasing_analysis = pd.DataFrame({'Purchase Count':purchase_count_gender,
                                    'Average Purchase Price':purchase_avg_price_gender,
                                    'Total Purchase Value':total_purchase_value_gender,
                                    'Avg Total Purchase per Person':avg_total_purchase_gender})

purchasing_analysis_sorted = purchasing_analysis.sort_values('Purchase Count',ascending=False)
purchasing_analysis_sorted.style.format({'Average Purchase Price':'${:.2f}',
                                    'Total Purchase Value':'${:.2f}',
                                    'Avg Total Purchase per Person':'${:.2f}'})

### Age Demographics

In [None]:
#Create bins (right-inclusive edges: 1-9, 10-14, ..., 40-100)
age_bins = [0,9,14,19,24,29,34,39,100]

#Create labels for bins; the last bin includes age 40, so it is labelled '40+'
age_name = ['<10','10~14','15~19','20~24','25~29','30~34','35~39','40+']

#Add the age bracket to the dataset
df['Age Bracket']=pd.cut(df['Age'],age_bins,labels=age_name)

#drop duplicated players
dedupe_players_df = df.drop_duplicates('SN')

#display the dataframe
dedupe_players_df.head()
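
How `pd.cut` assigns boundary ages is easy to misread, so here is a small sketch using the same bin edges. The `ages` values are hypothetical, and the last label is written as `40+` in this sketch because that bin includes age 40.

```python
import pandas as pd

# Hypothetical ages chosen to sit on or next to the bin edges
ages = pd.Series([7, 9, 10, 24, 25, 39, 40, 45])

age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 100]
age_name = ['<10', '10~14', '15~19', '20~24', '25~29', '30~34', '35~39', '40+']

# By default pd.cut uses right-inclusive intervals: (0, 9], (9, 14], ...
brackets = pd.cut(ages, age_bins, labels=age_name)
print(pd.DataFrame({'Age': ages, 'Age Bracket': brackets}))

# An age of exactly 0 falls outside (0, 9] and is left unassigned (NaN)
zero_age = pd.cut(pd.Series([0]), age_bins, labels=age_name)
print(zero_age.isna().all())
```

With right-inclusive edges, age 9 lands in `<10` and age 10 in `10~14`, which matches the labels. Passing `include_lowest=True` would be one way to capture an age of 0 if the data contained one.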

In [None]:
#Count players by age bracket (using deduplicated player data)
purchase_count_by_age_demo = dedupe_players_df.groupby(['Age Bracket'])['Purchase ID'].count()
purchase_count_by_age_demo

In [None]:
#calculate the percentage of players by age
purchase_percents_by_age_demo = purchase_count_by_age_demo / dedupe_players_df['Age'].count()
purchase_percents_by_age_demo

In [None]:
#Display the final result in a single dataframe (all age brackets)
age_demographics_df = pd.DataFrame({'Total Count': purchase_count_by_age_demo, 
                                   'Percentage of Players': purchase_percents_by_age_demo})
age_demographics_df.style.format({'Percentage of Players':'{:.2%}'})

### Purchasing Analysis (Age)

In [None]:
#Purchase count by age (using all purchases, not deduplicated players)
purchase_count_by_age = df.groupby(['Age Bracket'])['Purchase ID'].count()
purchase_count_by_age

In [None]:
#Calculate average purchase price by age
purchase_price_mean_by_age=df.groupby(['Age Bracket'])['Price'].mean()
purchase_price_mean_by_age

In [None]:
#Calculate total purchase value
purchase_value_by_age = df.groupby(['Age Bracket'])['Price'].sum()
purchase_value_by_age

In [None]:
#Average total purchase per person
purchase_per_person_by_age = purchase_value_by_age/purchase_count_by_age_demo
purchase_per_person_by_age

In [None]:
#Put result into a single dataframe
age_purchase_df = pd.DataFrame({'Purchase Count': purchase_count_by_age,
                                'Average Purchase Price': purchase_price_mean_by_age,
                                'Total Purchase Value': purchase_value_by_age,
                                'Avg Total Purchase per Person': purchase_per_person_by_age})

age_purchase_df.style.format({'Average Purchase Price': '${:.2f}',
                              'Total Purchase Value': '${:.2f}',
                              'Avg Total Purchase per Person':'${:.2f}'})

### Top Spenders

In [None]:
#Display player name by number of items purchased
top_spender_count = df.groupby(['SN'])['Purchase ID'].count()
top_spender_count.sort_values(ascending=False).head()

In [None]:
#Calculate average purchase price per player
average_purchase_price_per_player = df.groupby(['SN'])['Price'].mean()
average_purchase_price_per_player.head()

In [None]:
#Total purchase value per player
total_purchase_per_player = df.groupby(['SN'])['Price'].sum()
total_purchase_per_player.head()

In [None]:
#display result in one dataframe and sort by total purchase value from high to low
top_spenders = pd.DataFrame({'Purchase Count': top_spender_count,
                             'Average Purchase Price': average_purchase_price_per_player,
                             'Total Purchase Value': total_purchase_per_player})
sorted_top_spenders_df = top_spenders.sort_values('Total Purchase Value',ascending = False)
sorted_top_spenders_df.head().style.format({'Total Purchase Value':'${:.2f}',
                                           'Average Purchase Price':'${:.2f}'})

### Most Popular Items

In [None]:
#Retrieve the columns needed for the most popular items analysis
popular_item = df[['Item ID','Item Name','Price']]
popular_item.head()

In [None]:
#purchase count per item
purchase_count = popular_item.groupby(['Item ID','Item Name'])['Item ID'].count()
purchase_count.head()

In [None]:
#average price per item
purchase_price = popular_item.groupby(['Item ID','Item Name'])['Price'].mean()
purchase_price.head()

In [None]:
#total purchase value
purchase_value = popular_item.groupby(['Item ID','Item Name'])['Price'].sum()
purchase_value.head()

In [None]:
#Put result into one dataframe
most_popular_items_df = pd.DataFrame({
                                      'Purchase Count': purchase_count,
                                      'Item Price': purchase_price,
                                      'Total Purchase Value': purchase_value})

most_popular_items_df_sorted = most_popular_items_df.sort_values('Purchase Count',ascending=False)
most_popular_items_df_sorted.head().style.format({'Item Price': '${:.2f}',
                                                  'Total Purchase Value': '${:.2f}'})

### Most Profitable Items 

In [None]:
#Re-sort the previous table by total purchase value to get the most profitable items
most_profitable_items_df_sorted = most_popular_items_df.sort_values('Total Purchase Value',ascending=False)
most_profitable_items_df_sorted.head().style.format({'Item Price': '${:.2f}',
                                                     'Total Purchase Value': '${:.2f}'})