### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). A smaller but notable proportion are female (14%).

* The peak age demographic is 20-24 (44.8%), followed by 15-19 (18.6%) and 25-29 (13.4%).

-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# File to Load (Remember to Change These)
file_to_load = "purchase_data.csv"

# Read the purchasing file into two pandas DataFrames:
# df is reduced to one row per player later on; df1 keeps every purchase
df = pd.read_csv(file_to_load)
df1 = df.copy()
df.head(5)

Unnamed: 0,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44


## Player Count

* Display the total number of players


In [2]:
# Keep one row per screen name (SN) so each player is counted once.
# This modifies df in place; df1 still holds every purchase.
df.drop_duplicates(subset="SN", keep="first", inplace=True)

no_duplicates_players = df['SN'].count()

player_count = {"Total Players": no_duplicates_players}
pd.DataFrame(player_count, index=[0])

Unnamed: 0,Total Players
0,576
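
Counting distinct screen names does not strictly require dropping rows. The following is a minimal sketch of that alternative, assuming a tiny made-up frame (`sample`, with invented screen names and prices) in place of `purchase_data.csv`:

```python
import pandas as pd

# Hypothetical sample standing in for purchase_data.csv (values are invented)
sample = pd.DataFrame({
    "SN": ["Ana1", "Bo2", "Ana1", "Cy3"],
    "Price": [1.00, 2.50, 3.00, 4.25],
})

# nunique() counts distinct players and leaves the frame untouched
total_players = sample["SN"].nunique()
player_count = pd.DataFrame({"Total Players": [total_players]})
print(player_count)
```

Because nothing is dropped, the same frame can still be used for per-purchase calculations afterwards.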


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [4]:
# Note: df was reduced to one row per player in the Player Count cell,
# so the unique-item count and average price below cover only each
# player's first purchase. df1 covers all purchases.
u_items = df['Item Name'].unique()
count = len(u_items)
u_avg = round(df['Price'].mean(), 2)
n_purchases = df1['Item Name'].count()
total_revenue = df1['Price'].sum()

summary_df = {'Number of Unique Items': count, 'Average Price': u_avg,
              'Number of Purchases': n_purchases, "Total Revenue": "$" + str(total_revenue)}
purchasing_analysis = pd.DataFrame(summary_df, index=[0])
purchasing_analysis

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,173,3.07,780,$2379.77
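
For comparison, the four summary figures can all be computed from a single purchase table. This is only a sketch on invented data (`purchases` and its values are hypothetical), not the notebook's own result:

```python
import pandas as pd

# Hypothetical purchase table; every row is one purchase
purchases = pd.DataFrame({
    "Item ID": [1, 2, 1, 3],
    "Price": [2.00, 3.50, 2.00, 4.50],
})

summary = pd.DataFrame({
    "Number of Unique Items": [purchases["Item ID"].nunique()],
    "Average Price": [round(purchases["Price"].mean(), 2)],
    "Number of Purchases": [len(purchases)],
    # Format as currency only at display time
    "Total Revenue": [f"${purchases['Price'].sum():.2f}"],
})
print(summary)
```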


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [41]:
# df holds one row per player after the drop_duplicates call in the
# Player Count cell, so these are player counts, not purchase counts.
count_male = (df['Gender'] == 'Male').sum()
count_female = (df['Gender'] == 'Female').sum()
count_other = (df['Gender'] == "Other / Non-Disclosed").sum()
total = df['Gender'].count()

percentage_male = round((count_male / total) * 100, 2)
percentage_female = round((count_female / total) * 100, 2)
percentage_other = round((count_other / total) * 100, 2)

gender_demographics = pd.DataFrame({"Total Count": [count_male, count_female, count_other],
                          "Percentage Players": [percentage_male, percentage_female, percentage_other]},
                          index=['Male', 'Female', 'Other / Non-Disclosed'])
gender_demographics
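
The counts and percentages per gender can also be obtained without one variable per category. A minimal sketch, assuming a small invented `players` frame with one row per player:

```python
import pandas as pd

# Hypothetical one-row-per-player frame (values are invented)
players = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
})

# value_counts() gives the count per category;
# normalize=True gives the share of the total instead
counts = players["Gender"].value_counts()
percentages = (players["Gender"].value_counts(normalize=True) * 100).round(2)

gender_demographics = pd.DataFrame({
    "Total Count": counts,
    "Percentage Players": percentages,
})
print(gender_demographics)
```

Both Series are indexed by gender, so pandas aligns them row by row when the frame is built.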


## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [None]:
# Group every purchase (df1) by gender. df is not used here because it
# was reduced to one row per player in the Player Count cell.
g_gender = df1.groupby('Gender')

purchase_count = g_gender['Price'].count()
avg_purchase_price = g_gender['Price'].mean()
total_purchase_value = g_gender['Price'].sum()

# Unique players per gender, for the per-person average
players_per_gender = g_gender['SN'].nunique()
avg_total_per_person = total_purchase_value / players_per_gender

# Format the currency columns only when building the display table
purchasing_analysis = pd.DataFrame({
    'Purchase Count': purchase_count,
    'Average Purchase Price': avg_purchase_price.map('${:.2f}'.format),
    'Total Purchase Value': total_purchase_value.map('${:.2f}'.format),
    'Avg Total Purchase per Person': avg_total_per_person.map('${:.2f}'.format)})
purchasing_analysis
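
The same per-gender figures can be expressed with pandas named aggregation, which keeps each output column next to the calculation that produces it. This is a sketch on invented data; the frame `purchases` and the output column names are hypothetical:

```python
import pandas as pd

# Hypothetical purchases: Ana1 bought twice, Bo2 and Cy3 once each
purchases = pd.DataFrame({
    "SN": ["Ana1", "Ana1", "Bo2", "Cy3"],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Price": [1.00, 3.00, 2.00, 4.00],
})

# Named aggregation: output_column=(input_column, function)
by_gender = purchases.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    avg_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Per-person average divides by unique players, not by purchases
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```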


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [None]:
bins = [0, 10, 15, 20, 25, 30, 35, 40, 100]
group_names = ['<10','10-14','15-19','20-24','25-29','30-34','35-39','40+']

# right=False makes each bin include its lower edge, so age 10 falls in
# '10-14' rather than '<10'. df holds one row per player.
df["Age Group"] = pd.cut(df.Age, bins, labels=group_names, right=False)
age_counts = df["Age Group"].value_counts().sort_index()

# Per-group counts, kept under their original names because the
# Purchasing Analysis (Age) cell reuses them
ten, forteen, nineteen, twentyfor, twentynine, thirtyfor, thirtynine, forty = age_counts

age_demographics = pd.DataFrame({'Total Count': age_counts,
                                'Percentage of Players': round(age_counts / no_duplicates_players * 100, 2)})
age_demographics
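
The boundary behaviour of `pd.cut()` matters for labels such as `10-14`. By default the bins are closed on the right, so an age of exactly 10 falls into the first bin. The sketch below uses a handful of invented ages to compare the default with `right=False`:

```python
import pandas as pd

# Invented ages chosen to sit on and around the bin edges
ages = pd.Series([7, 10, 14, 15, 24, 40, 45])
bins = [0, 10, 15, 20, 25, 30, 35, 40, 100]
group_names = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

# Default right=True: bins are (0, 10], (10, 15], ...
right_closed = pd.cut(ages, bins, labels=group_names)

# right=False: bins are [0, 10), [10, 15), ... which matches the labels
left_closed = pd.cut(ages, bins, labels=group_names, right=False)

print(pd.DataFrame({"age": ages, "right=True": right_closed, "right=False": left_closed}))
```

With `right=False`, ages 10, 15 and 40 land in `10-14`, `15-19` and `40+`, which is what the labels promise.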

## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [None]:
# Same age groups as the Age Demographics table. right=False makes each
# bin include its lower edge, so age 10 lands in '10-14' rather than '<10'.
bins = [0, 10, 15, 20, 25, 30, 35, 40, 100]
group_names = ['<10','10-14','15-19','20-24','25-29','30-34','35-39','40+']

# Bin every purchase (df1) by the buyer's age
df1["Age Group"] = pd.cut(df1['Age'], bins, labels=group_names, right=False)
g_age = df1.groupby("Age Group", observed=False)

purchase_count = g_age['Price'].count()
total_purchase_value = g_age['Price'].sum()
avg_purchase_price = g_age['Price'].mean()

# Unique buyers per age group, for the per-person average
players_per_group = g_age['SN'].nunique()
avg_total_purchase_per_person = total_purchase_value / players_per_group

# Format the currency columns only when building the display table
purchasing_analysis = pd.DataFrame({
    'Purchase Count': purchase_count,
    'Average Purchase Price': avg_purchase_price.map('${:.2f}'.format),
    'Total Purchase Value': total_purchase_value.map('${:.2f}'.format),
    'Avg Total Purchase per Person': avg_total_purchase_per_person.map('${:.2f}'.format)})
purchasing_analysis

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [None]:
df1['SN'].value_counts().head(5)

df1.loc[df1['SN'] == 'Lisosia93']['Price'].sum()

In [8]:
df2 = pd.read_csv('purchase_data.csv')

grouped = df2.groupby('SN')['Price']

# Build every column from the same groupby so the rows stay aligned on SN.
# Sorting separate lists by different keys pairs names with the wrong totals.
top_spenders = pd.DataFrame({'Purchase Count': grouped.count(),
                             'Average Purchase Price': grouped.mean().round(2),
                             'Total Purchase Value': grouped.sum()})

# Sort the finished table by total purchase value, highest first
top_spenders = top_spenders.sort_values('Total Purchase Value', ascending=False)
top_spenders.head(5)


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
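
The steps above can be sketched as follows. The data is invented, the output column names are hypothetical, and the sketch assumes each item has a single price, so the mean of `Price` within a group is that price:

```python
import pandas as pd

# Hypothetical purchases (item names and prices are invented)
purchases = pd.DataFrame({
    "Item ID": [1, 2, 1, 3, 1, 2],
    "Item Name": ["Sword", "Bow", "Sword", "Axe", "Sword", "Bow"],
    "Price": [2.00, 5.00, 2.00, 4.00, 2.00, 5.00],
})

# Group by Item ID and Item Name, then compute the three figures
items = purchases.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "mean"),  # assumes one price per item
    total_value=("Price", "sum"),
)

# Most purchased items first
most_popular = items.sort_values("purchase_count", ascending=False)
print(most_popular.head())
```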



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
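
A sketch of this re-sort, using an invented per-item summary table (`items` and its values are hypothetical). Sorting happens on the numeric column first; currency formatting is applied afterwards, because formatted strings sort alphabetically rather than numerically:

```python
import pandas as pd

# Hypothetical per-item summary, indexed by Item ID and Item Name
items = pd.DataFrame(
    {
        "purchase_count": [3, 2, 1],
        "item_price": [2.0, 5.0, 4.0],
        "total_value": [6.0, 10.0, 4.0],
    },
    index=pd.MultiIndex.from_tuples(
        [(1, "Sword"), (2, "Bow"), (3, "Axe")],
        names=["Item ID", "Item Name"],
    ),
)

# Sort while the values are still numbers
most_profitable = items.sort_values("total_value", ascending=False)

# Apply currency formatting to a display copy only
formatted = most_profitable.assign(
    item_price=most_profitable["item_price"].map("${:.2f}".format),
    total_value=most_profitable["total_value"].map("${:.2f}".format),
)
print(formatted.head())
```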

