### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)
purchase_data

,Purchase ID,SN,Age,Gender,Item ID,Item Name,Price
0,0,Lisim78,20,Male,108,"Extraction, Quickblade Of Trembling Hands",3.53
1,1,Lisovynya38,40,Male,143,Frenzied Scimitar,1.56
2,2,Ithergue48,24,Male,92,Final Critic,4.88
3,3,Chamassasya86,24,Male,100,Blindscythe,3.27
4,4,Iskosia90,23,Male,131,Fury,1.44
...,...,...,...,...,...,...,...
775,775,Aethedru70,21,Female,60,Wolf,3.54
776,776,Iral74,21,Male,164,Exiled Doomblade,1.63
777,777,Yathecal72,20,Male,67,"Celeste, Incarnation of the Corrupted",3.46
778,778,Sisur91,7,Male,92,Final Critic,4.19


## Player Count

* Display the total number of players


In [3]:
#Obtain unique screen names of players 
purchase_data_df = purchase_data
unique = purchase_data_df["SN"].unique()
num_uni_players = len(unique)
num_uni_players

576

## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [12]:
#Run basic calculations 
summary = purchase_data_df.describe()
summary


,Purchase ID,Age,Item ID,Price
count,780.0,780.0,780.0,780.0
mean,389.5,22.714103,91.755128,3.050987
std,225.310896,6.659444,52.697702,1.169549
min,0.0,7.0,0.0,1.0
25%,194.75,20.0,47.75,1.98
50%,389.5,22.0,92.0,3.15
75%,584.25,25.0,138.0,4.08
max,779.0,45.0,183.0,4.99


In [19]:
#Run basic calculations 
summary = purchase_data_df.describe()

#Locate average price
mean_price = summary.loc["mean", "Price"]
mean_price

3.050987179487176

In [16]:
#Obtain number of unique items
unique = purchase_data_df["Item ID"].unique()
num_uni_items = len(unique)
num_uni_items


179

In [30]:
#Run basic calculations 
summary = purchase_data_df.describe()

#Locate average price
mean_price = summary.loc["mean", "Price"]


#Obtain number of unique items
unique = purchase_data_df["Item ID"].unique()
num_uni_items = len(unique)

#Create SummaryDF to hold results
summary_info = [{"Average Price":mean_price, "Number of Items":num_uni_items}]
summary_df = pd.DataFrame(summary_info)
summary_df

,Average Price,Number of Items
0,3.050987,179
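
The summary above covers the average price and the number of unique items. A minimal sketch of how the remaining totals and the optional formatting might be added follows. The sample frame and the names `sample_df`, `totals_df`, and `display_df` are hypothetical, and the column titles "Number of Purchases" and "Total Revenue" are assumptions rather than required labels.

```python
import pandas as pd

# Hypothetical sample with the same column names as purchase_data.csv (values are made up)
sample_df = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["A1", "B2", "A1", "C3"],
    "Item ID": [10, 11, 10, 12],
    "Item Name": ["Sword", "Axe", "Sword", "Bow"],
    "Price": [2.00, 3.00, 2.00, 5.00],
})

# One-row summary frame holding the basic calculations
totals_df = pd.DataFrame([{
    "Number of Unique Items": sample_df["Item ID"].nunique(),
    "Average Price": sample_df["Price"].mean(),
    "Number of Purchases": len(sample_df),
    "Total Revenue": sample_df["Price"].sum(),
}])

# Format currency columns on a copy so totals_df stays numeric for later calculations
display_df = totals_df.copy()
for col in ["Average Price", "Total Revenue"]:
    display_df[col] = display_df[col].map("${:,.2f}".format)

print(display_df)
```

Formatting a copy is a deliberate choice here: once a column holds strings such as "$12.00", it can no longer be summed or averaged.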


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [146]:
# Calculate the unique number of users
total_users = len(purchase_data_df["SN"].unique())

# Extract the username and gender column
dataframe_uni = purchase_data_df[["SN", "Gender"]]

# Drop duplicate screen names so each player is counted once (keep="first" retains each player's first row)
dataframe_uniqueness = dataframe_uni.drop_duplicates(subset=["SN"], keep="first")

# Give a count per gender
count_gender = dataframe_uniqueness.groupby(["Gender"]).count()

# Create a summary dataframe with a single column containing the counts from the previous calculation
gender_summary = pd.DataFrame({"Gender Count": count_gender["SN"]})

# Add each gender's share of all unique players as a percentage, rounded to two decimals
gender_summary["Gender Percentage"] = (gender_summary["Gender Count"] / total_users * 100).round(2)

# Print summary dataframe
gender_summary


Gender,Gender Count,Gender Percentage
Female,81,14.06
Male,484,84.03
Other / Non-Disclosed,11,1.91



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [110]:
# 1. Total dollar value of purchases by each gender
total_per_gender = purchase_data_df.groupby("Gender")["Price"].sum()

# 2. Average price of the items bought by each gender
avg_per_gender = purchase_data_df.groupby("Gender")["Price"].mean()

# 3. Number of purchases made by each gender
purch_per_gender = purchase_data_df.groupby("Gender")["Item ID"].count()

# Combine the three results into a summary data frame
summary_info = {"Total Amount by Gender":total_per_gender, "Average Amount by Gender":avg_per_gender, "Number of Purchases by Gender":purch_per_gender}
summary_data = pd.DataFrame(summary_info)
summary_data


Gender,Total Amount by Gender,Average Amount by Gender,Number of Purchases by Gender
Female,361.94,3.203009,113
Male,1967.64,3.017853,652
Other / Non-Disclosed,50.19,3.346,15
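
The instructions for this section also ask for the average purchase total per person, which the table above does not include. One possible way to compute it is to divide each gender's total spend by its number of unique players. This is a sketch on a hypothetical sample frame; `sample_df`, `by_gender`, `gender_purchases`, and the column titles are assumptions.

```python
import pandas as pd

# Hypothetical sample: some players buy more than once (values are made up)
sample_df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Male", "Female"],
    "Price": [1.0, 2.0, 4.0, 3.0, 2.0, 2.0],
})

by_gender = sample_df.groupby("Gender")

gender_purchases = pd.DataFrame({
    "Purchase Count": by_gender["Price"].count(),
    "Average Purchase Price": by_gender["Price"].mean(),
    "Total Purchase Value": by_gender["Price"].sum(),
})

# Divide by unique players, not by purchases, so repeat buyers are counted once
gender_purchases["Avg Total Purchase per Person"] = (
    gender_purchases["Total Purchase Value"] / by_gender["SN"].nunique()
)

print(gender_purchases)
```

The per-person figure differs from the average purchase price whenever a player buys more than once, which is why the divisor is `nunique()` of `SN` rather than the purchase count.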


## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table


In [144]:
# Create bins for ages; pd.cut uses right-inclusive intervals: (0, 9], (9, 19], ...
bins = [0, 9, 19, 29, 39, 49, 59, 69, 100]

# Create labels for these bins
age_grp = ["0 to 9", "10 to 19", "20 to 29", "30 to 39", "40 to 49", "50 to 59", "60 to 69", "70 to 100"]

# Calculate the number of unique users
total_users = purchase_data_df["SN"].nunique()

# Extract the username and age columns
dataframe_uni_df = purchase_data_df[["SN", "Age"]]

# Drop duplicate screen names so each player is counted once.
# .copy() makes an independent frame, which avoids the SettingWithCopyWarning on the assignment below
dataframe_uniqueness_df = dataframe_uni_df.drop_duplicates(subset=["SN"], keep="first").copy()

# Place each player's age group into a new column of the data frame
dataframe_uniqueness_df["Age Group"] = pd.cut(dataframe_uniqueness_df["Age"], bins, labels=age_grp)

# Count the players that fall into each bin (observed=False keeps the empty bins)
count_age_grp = dataframe_uniqueness_df.groupby("Age Group", observed=False)["SN"].count()

# Create a summary dataframe with a single column containing the counts from the previous calculation
age_grp_summary = pd.DataFrame({"Age Group Count": count_age_grp})

# Add each age group's share of all unique players as a percentage, rounded to two decimals
age_grp_summary["Age Percentage"] = (age_grp_summary["Age Group Count"] / total_users * 100).round(2)

# Print summary dataframe
age_grp_summary


Age Group,Age Group Count,Age Percentage
0 to 9,17,2.95
10 to 19,129,22.4
20 to 29,335,58.16
30 to 39,83,14.41
40 to 49,12,2.08
50 to 59,0,0.0
60 to 69,0,0.0
70 to 100,0,0.0


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [None]:
#Create bins for ages 
bins = [0,9,19,29,39,49,59,69,100]
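
The cell above stops after defining the bins. A minimal sketch of how the remaining steps could look follows, using the same bins on a hypothetical sample frame; `sample_df`, `binned_df`, `age_purchases`, and the column titles are assumptions. Unlike the Age Demographics table, this bins every purchase row rather than one row per player.

```python
import pandas as pd

# Hypothetical sample of purchases (values are made up)
sample_df = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4"],
    "Age": [8, 8, 15, 22, 24],
    "Price": [1.0, 3.0, 2.0, 4.0, 2.0],
})

bins = [0, 9, 19, 29, 39, 49, 59, 69, 100]
labels = ["0 to 9", "10 to 19", "20 to 29", "30 to 39",
          "40 to 49", "50 to 59", "60 to 69", "70 to 100"]

# Work on a copy so the original frame is left unchanged
binned_df = sample_df.copy()
binned_df["Age Group"] = pd.cut(binned_df["Age"], bins, labels=labels)

# observed=True drops age groups that contain no purchases
by_age = binned_df.groupby("Age Group", observed=True)

age_purchases = pd.DataFrame({
    "Purchase Count": by_age["Price"].count(),
    "Average Purchase Price": by_age["Price"].mean(),
    "Total Purchase Value": by_age["Price"].sum(),
})

# Per-person average uses unique screen names within each age group
age_purchases["Avg Total Purchase per Person"] = (
    age_purchases["Total Purchase Value"] / by_age["SN"].nunique()
)

print(age_purchases)
```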

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
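
The steps above might be sketched as follows: group by screen name, aggregate, and sort by total spend. The sample frame and the names `sample_df` and `spenders` are hypothetical, and the column titles are assumptions. The dictionary unpacking is only needed because the output column names contain spaces.

```python
import pandas as pd

# Hypothetical sample of purchases (values are made up)
sample_df = pd.DataFrame({
    "SN": ["A1", "B2", "A1", "C3", "C3"],
    "Price": [1.0, 2.0, 3.0, 4.5, 1.0],
})

# Named aggregation: each key becomes an output column
spenders = (
    sample_df.groupby("SN")["Price"]
    .agg(**{
        "Purchase Count": "count",
        "Average Purchase Price": "mean",
        "Total Purchase Value": "sum",
    })
    .sort_values("Total Purchase Value", ascending=False)
)

# Preview the top of the table
print(spenders.head())
```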



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
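
A sketch of these steps on a hypothetical sample frame follows; `sample_df`, `item_summary`, and `most_popular` are assumed names. The data preview at the top of the notebook shows the same item (Final Critic) sold at two different prices, so this sketch reports the mean price per item instead of assuming a single fixed price.

```python
import pandas as pd

# Hypothetical sample of purchases (values are made up)
sample_df = pd.DataFrame({
    "Item ID": [1, 1, 1, 2, 2, 3],
    "Item Name": ["Sword", "Sword", "Sword", "Axe", "Axe", "Bow"],
    "Price": [2.0, 2.0, 2.0, 4.5, 4.5, 1.0],
})

# Retrieve only the columns needed for the item analysis
items = sample_df[["Item ID", "Item Name", "Price"]]

# Group by item and compute count, mean price, and total value
item_summary = items.groupby(["Item ID", "Item Name"])["Price"].agg(**{
    "Purchase Count": "count",
    "Item Price": "mean",
    "Total Purchase Value": "sum",
})

# Most purchased items first
most_popular = item_summary.sort_values("Purchase Count", ascending=False)
print(most_popular.head())
```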



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
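
A sketch of the re-sort with display formatting follows. It starts from a small hypothetical item summary standing in for the table built in the previous section; `item_summary`, `most_profitable`, and `display_df` are assumed names and the values are made up.

```python
import pandas as pd

# Hypothetical item summary, indexed by Item ID and Item Name
item_summary = pd.DataFrame({
    "Item ID": [1, 2, 3],
    "Item Name": ["Sword", "Axe", "Bow"],
    "Purchase Count": [3, 2, 1],
    "Item Price": [2.0, 4.5, 1.0],
    "Total Purchase Value": [6.0, 9.0, 1.0],
}).set_index(["Item ID", "Item Name"])

# Sort numerically first; formatting to strings beforehand would sort as text
most_profitable = item_summary.sort_values("Total Purchase Value", ascending=False)

# Apply currency formatting to a display copy only
display_df = most_profitable.copy()
for col in ["Item Price", "Total Purchase Value"]:
    display_df[col] = display_df[col].map("${:,.2f}".format)

print(display_df.head())
```

Sorting before formatting matters: as strings, "$10.00" would sort below "$9.00".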

