### Heroes Of Pymoli Data Analysis
* Of the 1163 active players, the vast majority are male (84%). There is also a smaller but notable proportion of female players (14%).

* Our peak age demographic is 20-24 (44.8%), with secondary groups at 15-19 (18.6%) and 25-29 (13.4%).
-----

### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd
import numpy as np

# File to Load (Remember to Change These)
file_to_load = "purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

FileNotFoundError: [Errno 2] File b'purchase_data.csv' does not exist: b'purchase_data.csv'
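
The error above means the relative path did not resolve from the notebook's working directory. A minimal sketch of one way to fail earlier with a clearer message, assuming a hypothetical helper named `load_purchases` (it is not part of the original notebook):

```python
from pathlib import Path

import pandas as pd


def load_purchases(path):
    """Read the purchase CSV, reporting the fully resolved path if it is missing."""
    csv_path = Path(path).expanduser().resolve()
    if not csv_path.is_file():
        # Show the absolute location that was checked, not just the file name.
        raise FileNotFoundError(f"Expected purchase data at {csv_path}")
    return pd.read_csv(csv_path)
```

Calling `load_purchases(file_to_load)` in place of `pd.read_csv(file_to_load)` would then show exactly which directory was searched.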

## Player Count

* Display the total number of players

In [None]:
player_count = purchase_data["SN"].nunique()
player_count

summary_table = pd.DataFrame({"Total Number of Players": [player_count]})
summary_table

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [None]:
numberofitems = len(purchase_data["Item ID"].unique())
average_price = purchase_data["Price"].mean()
total_purchases = purchase_data["Purchase ID"].count()
total_rev = purchase_data["Price"].sum()

summary_table2 = pd.DataFrame({"Number of Unique Items": [numberofitems],
                              "Average Price":[average_price],
                              "Number of Purchases":[total_purchases],
                              "Total Revenue":[total_rev]})

summary_table2["Total Revenue"] = summary_table2["Total Revenue"].map("${:,.2f}".format)
summary_table2["Average Price"] = summary_table2["Average Price"].map("${:,.2f}".format)

summary_table2

## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [None]:
gender_analysis_df = purchase_data.loc[:, ["SN", "Gender"]]

#this new df drops duplicate screen names so each player is counted once; use purchase_data instead when every purchase per SN is needed
newset_df = gender_analysis_df.drop_duplicates(subset="SN", keep='first', inplace=False)
newset_df.head()
newset_df["SN"].count()

In [None]:
#counts screen names after duplicates are removed
totalplayerct = newset_df["SN"].count()

#counts number of each values in Gender field to see how many of each
gendercounts = newset_df["Gender"].value_counts()

#put that into new data frame, naming the column explicitly so the code does not depend on what value_counts() calls it
gencounts_df = gendercounts.rename("Total Count").to_frame()

#add column with percentages
gencounts_df["Percent of Total"] = gencounts_df["Total Count"] / totalplayerct * 100

#format percentages for display
gencounts_df["Percent of Total"] = gencounts_df["Percent of Total"].map("{:.2f}%".format)

gencounts_df
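
As a cross-check on the counts and percentages, `value_counts(normalize=True)` returns the shares directly. The sketch below uses a made-up five-row purchase log, not the real `purchase_data`, and the names `toy`, `players`, and `gender_summary` are illustrative only:

```python
import pandas as pd

# Toy purchase log (made-up rows); "SN" repeats because one player can buy twice.
toy = pd.DataFrame({
    "SN": ["Ann", "Ann", "Bob", "Cy", "Dee"],
    "Gender": ["Female", "Female", "Male", "Male", "Other / Non-Disclosed"],
})

# Keep one row per player so repeat buyers are not counted twice.
players = toy.drop_duplicates(subset="SN")

gender_summary = pd.DataFrame({
    "Total Count": players["Gender"].value_counts(),
    # normalize=True returns fractions that sum to 1, so scale to percent.
    "Percent of Total": players["Gender"].value_counts(normalize=True) * 100,
})
gender_summary["Percent of Total"] = gender_summary["Percent of Total"].round(2)
print(gender_summary)
```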


## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [None]:
#grouping by SN and Gender to find the total spent per person
grouped = purchase_data.groupby(["SN","Gender"])
perSNsum = grouped["Price"].sum()

#one row per player, indexed by Gender, holding that player's total spend
sumtable = pd.DataFrame({"Total Spent": perSNsum})
final = sumtable.reset_index(level="SN")

#average of the per-player totals within each gender
final2 = final.groupby("Gender")
perSNsum2 = final2["Total Spent"].mean()

In [None]:
#purchase_data is my original dataframe with all values
groupedgender_df = purchase_data.groupby(["Gender"])

#purchase count
genavgct = groupedgender_df["Price"].count()                

#average purchase price
genavgprice = groupedgender_df["Price"].mean()  

#total purchase value by gender (the average total per person is perSNsum2 from the previous cell)
genavgtotal = groupedgender_df["Price"].sum()
  
#merge = pd.merge(genavgct_df,genavgprice_df,on=None,how="outer")
#merge.head()

gentable = pd.DataFrame({"Purchase Count": genavgct,
                         "Average Price": genavgprice,
                         "Total Purchase Value": genavgtotal,
                         "Average Total Per Person": perSNsum2})
gentable.head()
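
The same four columns can also be produced in a single pass with named aggregation, which may help confirm the per-person figure. This is a sketch on made-up rows; the frame `toy` and the snake_case column names are assumptions for illustration, not the notebook's own:

```python
import pandas as pd

# Made-up purchases: Ann buys twice, Bob and Cy once each.
toy = pd.DataFrame({
    "SN": ["Ann", "Ann", "Bob", "Cy"],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Price": [1.00, 3.00, 2.00, 4.00],
})

by_gender = toy.groupby("Gender").agg(
    purchase_count=("Price", "count"),
    average_price=("Price", "mean"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)

# Average total per person = total spent by the group / unique players in it.
by_gender["avg_total_per_person"] = by_gender["total_value"] / by_gender["players"]
print(by_gender)
```

Dividing the group total by the number of unique players gives the same value as averaging each player's total, because every purchase belongs to exactly one player.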


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [None]:
#last bin is open-ended so ages above 50 still fall into "40+"
bins = [0, 9, 14, 19, 24, 29, 34, 39, float("inf")]
agegroups = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

purchase_data["Age Range"] = pd.cut(purchase_data["Age"], bins, labels=agegroups)

groupedage_df = purchase_data.groupby(["Age Range"])

numofplayers = groupedage_df["SN"].nunique()
#percentage of all unique players, rounded to two decimal places
perofplayers = (numofplayers / player_count * 100).round(2)

agetable = pd.DataFrame({"Number of Players": numofplayers,
                         "Percentage of Players": perofplayers})
agetable
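
To see how `pd.cut` treats the bin edges, the sketch below bins a handful of made-up ages. By default each bin includes its right edge, so an age of 14 lands in "10-14" and 15 starts "15-19". The open-ended last bin (`float("inf")`) is an assumption of this sketch:

```python
import pandas as pd

# Made-up ages chosen to sit on and around the bin edges.
ages = pd.Series([7, 9, 10, 14, 15, 24, 25, 39, 40, 62])

bins = [0, 9, 14, 19, 24, 29, 34, 39, float("inf")]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

# right=True (the default) makes each bin include its upper edge: (9, 14] -> "10-14".
age_range = pd.cut(ages, bins=bins, labels=labels)

# sort=False keeps the groups in bin order instead of ordering by frequency.
counts = age_range.value_counts(sort=False)
print(counts)
```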


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [None]:

totalitems = groupedage_df["Price"].count()
totalspent = groupedage_df["Price"].sum()
avgspent = groupedage_df["Price"].mean()

agetable2 = pd.DataFrame({"Purchase Count": totalitems,
                          "Average Purchase Price": avgspent,
                          "Total Purchase Value": totalspent})
agetable2
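
The table above does not yet include the average purchase total per person that the instructions mention. One possible way to add it, sketched on made-up rows (the frame `toy` and its values are illustrative; `observed=True` is used here so that empty age bins are dropped from the result):

```python
import pandas as pd

# Made-up purchases: Ann (16) buys twice, Bob (22) and Cy (23) once each.
toy = pd.DataFrame({
    "SN": ["Ann", "Ann", "Bob", "Cy"],
    "Age": [16, 16, 22, 23],
    "Price": [2.00, 4.00, 1.00, 5.00],
})

bins = [0, 9, 14, 19, 24, 29, 34, 39, float("inf")]
labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]
toy["Age Range"] = pd.cut(toy["Age"], bins=bins, labels=labels)

# observed=True keeps only the age groups that actually appear in the data.
by_age = toy.groupby("Age Range", observed=True).agg(
    purchase_count=("Price", "count"),
    total_value=("Price", "sum"),
    players=("SN", "nunique"),
)
by_age["avg_total_per_person"] = by_age["total_value"] / by_age["players"]
print(by_age)
```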

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [None]:
SNgrouped_df = purchase_data.groupby(["SN"])
SNgrouped_df.head()

#purchase count
SNpurct = SNgrouped_df["Price"].count()                

#average purchase price
SNavgpurprice = SNgrouped_df["Price"].mean()  

#Total Purchase Value
totpurchval = SNgrouped_df["Price"].sum() 


topspenders = pd.DataFrame({"Purchase Count": SNpurct,
                            "Average Purchase Price": SNavgpurprice,
                            "Total Purchase Value": totpurchval})

sortedtopspenders = topspenders.sort_values("Total Purchase Value", ascending=False)
sortedtopspenders
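
If the optional currency formatting is applied, it is safer to sort first, because formatted values are strings and sort as text ("$9.00" would rank above "$10.00"). A sketch on made-up rows, with `toy`, `spenders`, `top`, and `display_table` as illustrative names:

```python
import pandas as pd

# Made-up purchases for three players.
toy = pd.DataFrame({
    "SN": ["Ann", "Ann", "Bob", "Cy", "Cy", "Cy"],
    "Price": [4.50, 5.50, 9.00, 1.00, 1.00, 1.00],
})

spenders = toy.groupby("SN")["Price"].agg(["count", "mean", "sum"])
spenders.columns = ["Purchase Count", "Average Purchase Price", "Total Purchase Value"]

# Sort while the values are still numeric, then keep the top rows.
top = spenders.sort_values("Total Purchase Value", ascending=False).head(2)

# Format a copy for display so the numeric table stays usable for later math.
display_table = top.copy()
for col in ["Average Purchase Price", "Total Purchase Value"]:
    display_table[col] = display_table[col].map("${:,.2f}".format)
print(display_table)
```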


## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame



In [None]:
itemanalysis_df = purchase_data.loc[:, ["Item ID", "Item Name", "Price"]]

itemgrouped_df = itemanalysis_df.groupby(["Item ID","Item Name"])

#Purchase Count
itemct = itemgrouped_df["Item Name"].count()

#Item Price (assumes each item sells at a single price, so the first value per group is used)
itprice = itemgrouped_df["Price"].first()

#Total Purchase Value
itsum = itemgrouped_df["Price"].sum()

popitems = pd.DataFrame({"Purchase Count": itemct,
                         "Item Price": itprice,
                         "Total Purchase Value": itsum})

sortedpopitems = popitems.sort_values("Purchase Count", ascending=False)
sortedpopitems.head()

finale = sortedpopitems.iloc[0:5, :]
finale


## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame



In [None]:
sortedpopitems2 = popitems.sort_values("Total Purchase Value", ascending=False)
sortedpopitems2.head()
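
The popular and profitable views are the same grouped table sorted by different columns, and the two rankings can disagree. A sketch on made-up items (the data and snake_case column names are illustrative, and taking `"first"` as the item price assumes one price per item):

```python
import pandas as pd

# Made-up purchases: a cheap item bought often, a pricier item bought less.
toy = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Dagger", "Dagger", "Dagger", "Axe", "Axe", "Staff"],
    "Price": [1.00, 1.00, 1.00, 4.00, 4.00, 5.00],
})

items = toy.groupby(["Item ID", "Item Name"]).agg(
    purchase_count=("Price", "count"),
    item_price=("Price", "first"),  # assumes a single price per item
    total_value=("Price", "sum"),
)

# Same table, two rankings.
most_popular = items.sort_values("purchase_count", ascending=False)
most_profitable = items.sort_values("total_value", ascending=False)
print(most_popular.head())
print(most_profitable.head())
```

Here the dagger is bought most often, but the axe brings in more revenue.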