### Note
* Instructions have been included for each segment. You do not have to follow them exactly, but they are included to help you think through the steps.

In [1]:
# Dependencies and Setup
import pandas as pd

# File to Load (Remember to Change These)
file_to_load = "Resources/purchase_data.csv"

# Read Purchasing File and store into Pandas data frame
purchase_data = pd.read_csv(file_to_load)

## Player Count

* Display the total number of players


In [2]:
purchase_data.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


In [3]:
# Quality check: does any column contain null values?

purchase_data.isnull().any()

Purchase ID    False
SN             False
Age            False
Gender         False
Item ID        False
Item Name      False
Price          False
dtype: bool

In [4]:
# Counting the total number of players (unique SN values)
print(f'The total number of Players is {purchase_data["SN"].nunique()}')

The total number of Players is 576


## Purchasing Analysis (Total)

* Run basic calculations to obtain number of unique items, average price, etc.


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame


In [5]:
# Calculating the number of unique items
UniItem = purchase_data["Item ID"].nunique()

# Calculating the average price across all purchases
AvePrice = purchase_data["Price"].mean()

# Calculating the number of purchases
NumPur = purchase_data["Purchase ID"].count()

# Calculating total revenue
Tot = purchase_data["Price"].sum()

In [6]:
#Creating Summary Dataframe
summary = pd.DataFrame({
    "Number Unique Items":[UniItem],
    "Average Price":[AvePrice],
    "Number of Purchases":[NumPur],
    "Total Revenue":[Tot]
})
summary.head()

| | Number Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 183 | 3.050987 | 780 | 2379.77 |


In [7]:
# Formatting Summary Table with a dollar sign and 2 decimal places
summary["Average Price"] = summary["Average Price"].map("${:.2f}".format)
summary["Total Revenue"] = summary["Total Revenue"].map("${:.2f}".format)

summary.head()

| | Number Unique Items | Average Price | Number of Purchases | Total Revenue |
|---|---|---|---|---|
| 0 | 183 | $3.05 | 780 | $2379.77 |


## Gender Demographics

* Percentage and Count of Male Players


* Percentage and Count of Female Players


* Percentage and Count of Other / Non-Disclosed




In [8]:
# Counting the non-null entries in every column of the DataFrame
purchase_data.count()

Purchase ID    780
SN             780
Age            780
Gender         780
Item ID        780
Item Name      780
Price          780
dtype: int64

In [9]:
# Filtering the DataFrame to one row per unique SN (each player's first purchase)
Filter = purchase_data.drop_duplicates(subset="SN", keep="first")
Filter.count()

Purchase ID    576
SN             576
Age            576
Gender         576
Item ID        576
Item Name      576
Price          576
dtype: int64

In [10]:
# Counting unique players per gender (Filter holds one row per SN)
GroupGender = Filter.groupby(["Gender"])
GenderCounts = GroupGender["SN"].count()

# Building the summary from the grouped counts instead of adding columns to Filter,
# which avoids pandas' SettingWithCopyWarning on the de-duplicated frame
SummaryGender = pd.DataFrame({
    "Total Counts": GenderCounts,
    "Percentage of Players": GenderCounts / GenderCounts.sum()
})
SummaryGender


| Gender | Total Counts | Percentage of Players |
|---|---|---|
| Female | 81 | 0.140625 |
| Male | 484 | 0.840278 |
| Other / Non-Disclosed | 11 | 0.019097 |


In [11]:
# Formatting Table
SummaryGender["Percentage of Players"] = SummaryGender["Percentage of Players"].map("{:.2%}".format)
SummaryGender

| Gender | Total Counts | Percentage of Players |
|---|---|---|
| Female | 81 | 14.06% |
| Male | 484 | 84.03% |
| Other / Non-Disclosed | 11 | 1.91% |



## Purchasing Analysis (Gender)

* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. by gender




* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame

In [12]:
# Previewing purchase_data again before the per-gender calculations
purchase_data.head()

| | Purchase ID | SN | Age | Gender | Item ID | Item Name | Price |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Lisim78 | 20 | Male | 108 | Extraction, Quickblade Of Trembling Hands | 3.53 |
| 1 | 1 | Lisovynya38 | 40 | Male | 143 | Frenzied Scimitar | 1.56 |
| 2 | 2 | Ithergue48 | 24 | Male | 92 | Final Critic | 4.88 |
| 3 | 3 | Chamassasya86 | 24 | Male | 100 | Blindscythe | 3.27 |
| 4 | 4 | Iskosia90 | 23 | Male | 131 | Fury | 1.44 |


In [24]:
# Calculating male metrics: purchase count, average purchase price, total purchase value, and average total per person
# Selecting male purchases in purchase_data (one row per purchase)
GenderMale = purchase_data.loc[purchase_data["Gender"] == "Male", :]
# Selecting male players in Filter (one row per unique SN)
FilterMale = Filter.loc[Filter["Gender"] == "Male", :]
# Counting unique male players and male purchases
FiltMaleCount = FilterMale["Gender"].count()
PurchMale = GenderMale["Gender"].count()
# Average price and total value of male purchases
AvePurch = GenderMale["Price"].mean()
TotPurch = GenderMale["Price"].sum()
# Average total purchase per male player
PurchPerson = TotPurch / FiltMaleCount
PurchPerson


# purchase_data["Purchase Count"]=purchase_data["Purchase ID"].value_counts()

# purchase_data["Average Purchase Price"]=purchase_data["Price"].mean()

# Group_df.count()

4.065371900826446
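
The male-only calculation above can probably be extended to every gender at once with a single `groupby`. The following is a minimal sketch rather than the notebook's own solution: the `sample` rows and the `gender_purchase_summary` helper are hypothetical, and only the column names come from `purchase_data`.

```python
import pandas as pd

# Hypothetical sample rows using purchase_data's column names (not the real file)
sample = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3, 4],
    "SN": ["A1", "A1", "B2", "C3", "D4"],
    "Gender": ["Male", "Male", "Female", "Male", "Other / Non-Disclosed"],
    "Price": [1.00, 3.00, 2.50, 4.00, 3.50],
})

def gender_purchase_summary(df):
    """Per-gender purchase count, average price, total value, and average total per player."""
    grouped = df.groupby("Gender")
    summary = pd.DataFrame({
        "Purchase Count": grouped["Purchase ID"].count(),
        "Average Purchase Price": grouped["Price"].mean(),
        "Total Purchase Value": grouped["Price"].sum(),
    })
    # Divide by unique players (SN), not by purchases, to get the per-person figure
    summary["Avg Total Purchase per Person"] = (
        summary["Total Purchase Value"] / grouped["SN"].nunique()
    )
    return summary

gender_summary = gender_purchase_summary(sample)
print(gender_summary)
```

Dividing by `nunique()` of `SN` plays the same role as dividing `TotPurch` by `FiltMaleCount` in the cell above: players with several purchases are counted once.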

## Age Demographics

* Establish bins for ages


* Categorize the existing players using the age bins. Hint: use pd.cut()


* Calculate the numbers and percentages by age group


* Create a summary data frame to hold the results


* Optional: round the percentage column to two decimal points


* Display Age Demographics Table
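
One way these steps might look is sketched below. The bin edges, the labels, the `sample` rows, and the `age_demographics` helper are all assumptions made for illustration; the notebook does not specify them. Players are de-duplicated on `SN` first so that repeat buyers are counted once.

```python
import pandas as pd

# Hypothetical sample rows using purchase_data's column names (not the real file)
sample = pd.DataFrame({
    "SN": ["A1", "A1", "B2", "C3", "D4", "E5"],
    "Age": [20, 20, 24, 7, 40, 16],
})

# Assumed bin edges and labels; pd.cut uses right-inclusive intervals by default,
# so the bin (19, 24] holds ages 20 through 24
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

def age_demographics(df):
    """Count unique players per age group and express each count as a percentage."""
    players = df.drop_duplicates(subset="SN").copy()
    players["Age Group"] = pd.cut(players["Age"], bins=age_bins, labels=age_labels)
    # observed=False keeps empty age groups in the table with a count of 0
    counts = players.groupby("Age Group", observed=False)["SN"].count()
    return pd.DataFrame({
        "Total Count": counts,
        "Percentage of Players": (counts / len(players) * 100).round(2),
    })

age_summary = age_demographics(sample)
print(age_summary)
```

Calling `.copy()` after `drop_duplicates` before adding the `Age Group` column is a defensive choice that avoids pandas' `SettingWithCopyWarning`.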


## Purchasing Analysis (Age)

* Bin the purchase_data data frame by age


* Run basic calculations to obtain purchase count, avg. purchase price, avg. purchase total per person etc. in the table below


* Create a summary data frame to hold the results


* Optional: give the displayed data cleaner formatting


* Display the summary data frame
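
A possible shape for this analysis is sketched here, binning every purchase row by age and then aggregating per bin. The bin edges, labels, `sample` rows, and the `purchases_by_age` helper are hypothetical choices for illustration, not values taken from the notebook.

```python
import pandas as pd

# Hypothetical sample rows using purchase_data's column names (not the real file)
sample = pd.DataFrame({
    "Purchase ID": [0, 1, 2, 3],
    "SN": ["A1", "A1", "B2", "C3"],
    "Age": [22, 22, 23, 41],
    "Price": [1.0, 3.0, 2.0, 5.0],
})

# Assumed bin edges and labels (right-inclusive intervals)
age_bins = [0, 9, 14, 19, 24, 29, 34, 39, 200]
age_labels = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

def purchases_by_age(df):
    """Per-age-group purchase count, average price, total value, and average total per player."""
    binned = df.copy()
    binned["Age Group"] = pd.cut(binned["Age"], bins=age_bins, labels=age_labels)
    # observed=True drops age groups with no purchases, which would otherwise show NaN averages
    grouped = binned.groupby("Age Group", observed=True)
    summary = pd.DataFrame({
        "Purchase Count": grouped["Purchase ID"].count(),
        "Average Purchase Price": grouped["Price"].mean(),
        "Total Purchase Value": grouped["Price"].sum(),
    })
    summary["Avg Total Purchase per Person"] = (
        summary["Total Purchase Value"] / grouped["SN"].nunique()
    )
    return summary

age_purchases = purchases_by_age(sample)
print(age_purchases)
```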

## Top Spenders

* Run basic calculations to obtain the results in the table below


* Create a summary data frame to hold the results


* Sort the total purchase value column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
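
The steps above could be sketched as follows, grouping purchases by `SN` and sorting on the total. The `sample` rows and the `top_spenders` helper are hypothetical; only the column names are taken from `purchase_data`.

```python
import pandas as pd

# Hypothetical sample rows using purchase_data's column names (not the real file)
sample = pd.DataFrame({
    "SN": ["A1", "B2", "A1", "C3", "B2", "A1"],
    "Price": [1.0, 4.0, 2.0, 3.5, 4.0, 1.5],
})

def top_spenders(df, n=5):
    """Summarise purchases per player and return the n players with the highest totals."""
    grouped = df.groupby("SN")["Price"]
    summary = pd.DataFrame({
        "Purchase Count": grouped.count(),
        "Average Purchase Price": grouped.mean(),
        "Total Purchase Value": grouped.sum(),
    })
    return summary.sort_values("Total Purchase Value", ascending=False).head(n)

spenders = top_spenders(sample)

# Format a copy for display so the numeric table stays sortable
spenders_display = spenders.copy()
for column in ["Average Purchase Price", "Total Purchase Value"]:
    spenders_display[column] = spenders_display[column].map("${:,.2f}".format)
print(spenders_display)
```

Sorting before formatting matters here: once the totals are strings such as `$8.00`, they sort alphabetically rather than numerically.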



## Most Popular Items

* Retrieve the Item ID, Item Name, and Item Price columns


* Group by Item ID and Item Name. Perform calculations to obtain purchase count, item price, and total purchase value


* Create a summary data frame to hold the results


* Sort the purchase count column in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the summary data frame
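
A minimal sketch of these steps is shown below. The `sample` rows, the item names, and the `item_summary` helper are hypothetical. Taking the mean of `Price` as the item price assumes each item always sells at the same price, which should be checked against the real data.

```python
import pandas as pd

# Hypothetical sample rows using purchase_data's column names (not the real file)
sample = pd.DataFrame({
    "Item ID": [108, 143, 108, 92, 108, 143],
    "Item Name": ["Item A", "Item B", "Item A", "Item C", "Item A", "Item B"],
    "Price": [3.5, 1.5, 3.5, 5.0, 3.5, 1.5],
})

def item_summary(df):
    """Per-item purchase count, item price, and total purchase value."""
    items = df[["Item ID", "Item Name", "Price"]]
    grouped = items.groupby(["Item ID", "Item Name"])["Price"]
    return pd.DataFrame({
        "Purchase Count": grouped.count(),
        # Assumption: an item's price does not vary, so the mean equals the price
        "Item Price": grouped.mean(),
        "Total Purchase Value": grouped.sum(),
    })

# Most popular items first
popular_items = item_summary(sample).sort_values("Purchase Count", ascending=False)
print(popular_items.head())
```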



## Most Profitable Items

* Sort the above table by total purchase value in descending order


* Optional: give the displayed data cleaner formatting


* Display a preview of the data frame
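
Re-sorting an item summary by total purchase value might look like the sketch below. The `items` table is a hypothetical stand-in for the summary from the previous section, with made-up rows, and the currency formatting is applied to a copy only after sorting.

```python
import pandas as pd

# Hypothetical item summary table, indexed by Item ID and Item Name (made-up rows)
items = pd.DataFrame(
    {
        "Purchase Count": [3, 2, 1],
        "Item Price": [3.5, 1.5, 5.0],
        "Total Purchase Value": [10.5, 3.0, 5.0],
    },
    index=pd.MultiIndex.from_tuples(
        [(108, "Item A"), (143, "Item B"), (92, "Item C")],
        names=["Item ID", "Item Name"],
    ),
)

# Sort numerically by total purchase value, highest first
profitable_items = items.sort_values("Total Purchase Value", ascending=False)

# Apply currency formatting to a copy, after sorting
profitable_display = profitable_items.copy()
for column in ["Item Price", "Total Purchase Value"]:
    profitable_display[column] = profitable_display[column].map("${:,.2f}".format)
print(profitable_display.head())
```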

