### Heroes Of Pymoli Data Analysis
* Of the 576 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), followed by the 15-19 (18.6%) and 25-29 (13.4%) groups.  
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
from matplotlib import pyplot as plt
from scipy import stats
import numpy as np


# File to Load (Remember to Change These)
hop = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_df = pd.read_csv(hop)






## Player Count

* Display the total number of players


In [2]:
# Work on a copy so later column assignments do not modify purchase_df
players_df = purchase_df.copy()

# One row per player: a player who purchased more than once is counted once
unique_player_df = players_df.loc[:, ["SN", "Gender", "Age"]].drop_duplicates()

total_unique = len(unique_player_df)
print(total_unique)

unique_player_df.head()


576


,SN,Gender,Age
0,Lisim78,Male,20
1,Lisovynya38,Male,40
2,Ithergue48,Male,24
3,Chamassasya86,Male,24
4,Iskosia90,Male,23


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [3]:
# Number of distinct items sold
unique_items = purchase_df["Item ID"].nunique()

# Average price across all purchases
average_price = players_df["Price"].mean()

# Number of purchases (one row per purchase)
total_purchases = players_df["Purchase ID"].count()

# Total revenue
total_revenue = players_df["Price"].sum()

summary_df = pd.DataFrame({"Number of Unique Items": [unique_items],
                           "Average Price": [average_price],
                           "Number of Purchases": [total_purchases],
                           "Total Revenue": [total_revenue]})

summary_df

,Number of Unique Items,Average Price,Number of Purchases,Total Revenue
0,179,3.050987,780,2379.77
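
One way to handle the optional formatting step is to format a copy of the summary, so the numeric frame stays available for later calculations. The sketch below rebuilds the one-row summary from the values shown above. `formatted_df` is a name introduced here for illustration, and the dollar format is an assumption about the desired display.

```python
import pandas as pd

# Values copied from the summary output above
summary_df = pd.DataFrame({
    "Number of Unique Items": [179],
    "Average Price": [3.050987],
    "Number of Purchases": [780],
    "Total Revenue": [2379.77],
})

# Format a copy: the formatted columns become strings and can no longer be summed
formatted_df = summary_df.copy()
for column in ["Average Price", "Total Revenue"]:
    formatted_df[column] = formatted_df[column].map("${:,.2f}".format)

print(formatted_df)
```

Because the formatted columns hold strings, any sorting or arithmetic should be done on `summary_df` before this step.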


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [4]:

# Count of unique players per gender
unique_player_counts = unique_player_df["Gender"].value_counts()

# Share of all unique players, in percent
unique_player_percents = unique_player_counts / total_unique * 100

gender_demographics_df = pd.DataFrame({"total count": unique_player_counts, "Percent": unique_player_percents})
gender_demographics_df

Gender,total count,Percent
Male,484,84.027778
Female,81,14.0625
Other / Non-Disclosed,11,1.909722



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [5]:
# Select Price before aggregating so that non-numeric columns such as SN
# are never passed to mean() or sum()
price_by_gender = players_df.groupby("Gender")["Price"]
price_average = price_by_gender.mean()
price_total = price_by_gender.sum()
price_count = price_by_gender.count()

# Total spent by each gender divided by its number of unique players
average_total_bygender = price_total / gender_demographics_df["total count"]

purchase_summary = pd.DataFrame({"Count": price_count, "total": price_total, "Average": price_average, "Total per person": average_total_bygender})

purchase_summary

Gender,Count,total,Average,Total per person
Female,113,361.94,3.203009,4.468395
Male,652,1967.64,3.017853,4.065372
Other / Non-Disclosed,15,50.19,3.346,4.562727


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [6]:
# Bin edges are right-inclusive: 0-9, 10-14, 15-19, ..., 35-39, 40 and over
bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Copy the unique-player table so the new column does not alter it
age_df = unique_player_df.copy()

age_df["Age Group"] = pd.cut(age_df["Age"], bins, labels=bin_labels, include_lowest=True)
age_group_total = age_df["Age Group"].value_counts()

age_groups = pd.DataFrame({"Total Count": age_group_total, "Total Percent of players": age_group_total / total_unique * 100})
age_groups


Age Group,Total Count,Total Percent of players
20-24,258,44.791667
15-19,107,18.576389
25-29,77,13.368056
30-34,52,9.027778
35-39,31,5.381944
10-14,22,3.819444
<10,17,2.951389
40+,12,2.083333
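
To make the bin boundaries and the optional rounding step concrete, here is a small sketch on made-up ages. The `sample_ages` values and the column name `Percentage of Players` are assumptions for illustration and are not taken from the purchase data. It uses the same `bins` and `bin_labels` as the cell above.

```python
import pandas as pd

# Hypothetical ages chosen to sit on the bin edges; not taken from the purchase data
sample_ages = pd.Series([7, 9, 10, 14, 15, 22, 24, 40], name="Age")

bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Intervals are closed on the right, so age 9 lands in "<10" and age 10 in "10-14"
age_group = pd.cut(sample_ages, bins, labels=bin_labels, include_lowest=True)

# sort=False avoids reordering the groups by count; empty bins are kept with a count of 0
counts = age_group.value_counts(sort=False)

# Round the percentages to two decimal places for display
percents = (counts / len(sample_ages) * 100).round(2)

age_table = pd.DataFrame({"Total Count": counts, "Percentage of Players": percents})
print(age_table)
```

Rounding with `.round(2)` keeps the column numeric, which string formatting would not.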


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [38]:
bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
bin_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# Keep one row per purchase with only the columns this summary needs
age_grouped = purchase_df.loc[:, ["SN", "Purchase ID", "Age", "Price"]].drop_duplicates()

age_grouped["Age Group"] = pd.cut(age_grouped["Age"], bins, labels=bin_labels, include_lowest=True)
age_groups = age_grouped.groupby("Age Group", observed=False)

# Purchase count per age group. count() returns one value per group, whereas
# value_counts() returns a MultiIndex Series that cannot be aligned with the other columns
purchase_count = age_groups["Purchase ID"].count()

# Total and average purchase price per age group
total_purchase_value = age_groups["Price"].sum()
avg_purch_price = age_groups["Price"].mean()

# Average total per person: group total divided by the unique players in that group
avg_purch_perperson = total_purchase_value / age_groups["SN"].nunique()

summary_purchase_by_age = pd.DataFrame({"Purchase Count": purchase_count,
                                        "Average Purchase Price": avg_purch_price,
                                        "Average Purchase Total Per Person": avg_purch_perperson})

summary_purchase_by_age

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
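
The steps above could be sketched as follows. This is a minimal illustration on hypothetical rows rather than the real purchase file. The `SN`, `Purchase ID`, and `Price` columns match the ones used earlier in this notebook, while the output column names and the `top_spenders` and `preview` variables are assumptions.

```python
import pandas as pd

# Hypothetical sample rows standing in for Resources/purchase_data.csv
sample_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["Lisim78", "Lisovynya38", "Lisim78", "Ithergue48", "Lisim78"],
    "Price": [3.53, 1.56, 4.88, 3.27, 1.44],
})

# One row per player: purchase count, average price, and total spent
top_spenders = (
    sample_df.groupby("SN")["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Average Purchase Price",
        "sum": "Total Purchase Value",
    })
)

# Sort on the numeric totals first, then format a copy for display
top_spenders = top_spenders.sort_values("Total Purchase Value", ascending=False)
preview = top_spenders.head().copy()
for column in ["Average Purchase Price", "Total Purchase Value"]:
    preview[column] = preview[column].map("${:,.2f}".format)

print(preview)
```

Sorting before formatting matters here: once the totals are strings, they would be ordered alphabetically instead of numerically.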



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
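
A possible shape for this analysis, shown on hypothetical rows. The `Item Name` column is assumed from the instructions above (only `Item ID` and `Price` appear in earlier cells), and the item names and prices are made up. The sketch also assumes each item sells at a single price, so the mean of `Price` within a group is used as the item price.

```python
import pandas as pd

# Hypothetical purchases; "Item Name" is an assumed column name
sample_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword A", "Sword A", "Sword A", "Shield B", "Shield B", "Bow C"],
    "Price": [2.50, 2.50, 2.50, 4.00, 4.00, 3.25],
})

# Retrieve only the columns needed for the item summary
items = sample_df[["Item ID", "Item Name", "Price"]]

# Group by item, then compute purchase count, item price, and total purchase value
item_summary = (
    items.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

# Most popular items first
most_popular = item_summary.sort_values("Purchase Count", ascending=False)
print(most_popular.head())
```

Keeping `item_summary` numeric makes it easy to re-sort the same table by a different column later.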



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
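
As a rough sketch of this step, the item summary can be re-sorted by total purchase value. The rows below are hypothetical and are chosen so that the most purchased item is not the most profitable one. `Item Name`, the output column names, and the `most_profitable` and `display_df` variables are assumptions for illustration.

```python
import pandas as pd

# Hypothetical purchases: a cheap item bought often and a pricier item bought less often
sample_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2],
    "Item Name": ["Sword A", "Sword A", "Sword A", "Shield B", "Shield B"],
    "Price": [1.00, 1.00, 1.00, 4.50, 4.50],
})

# Same per-item summary as in the Most Popular Items step
item_summary = (
    sample_df.groupby(["Item ID", "Item Name"])["Price"]
    .agg(["count", "mean", "sum"])
    .rename(columns={
        "count": "Purchase Count",
        "mean": "Item Price",
        "sum": "Total Purchase Value",
    })
)

# Sort by revenue instead of by purchase count
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)

# Format a copy for display so the sorted numeric table stays intact
display_df = most_profitable.head().copy()
for column in ["Item Price", "Total Purchase Value"]:
    display_df[column] = display_df[column].map("${:,.2f}".format)

print(display_df)
```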

