# Analyze Consumer Buying Behavior

You have been tasked with analyzing shopping data for a website. The data is in JSON format and is provided alongside this notebook.
On the website, each user logs in with a personal account and can purchase products while browsing the catalog. Each product has a sale price. Each user's age and gender have been collected and are included in the JSON file.
Your job is to deliver an analysis of consumers' buying behavior. This is a common task for data scientists, and the results can be used, for example, to feed a machine learning model that predicts future behavior.
For this assignment, you will analyze consumers' buying behavior using Python and the pandas package. Your final report should include each of the following items:

    •	Buyer Count
    •	Total number of buyers
    •	Demographic Information By Gender
    •	Purchasing Analysis by Gender
    •	Identify the top 5 buyers by total purchase value
    •	Identify the 5 most popular items by counting purchases
    •	Identify the 5 most profitable items by the total purchase value


As final considerations:
Your script should work for the data set provided.
You must use the Pandas Library and Jupyter Notebook.

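Note: the data file is in Portuguese (Brazil), so the column names are kept as they appear in the file: Idade (age), Sexo (gender), Nome do Item (item name), and Valor (value).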

In [24]:
# Python Language Version
from platform import python_version
print('Python Language Version Used In This Jupyter Notebook:', python_version())

Python Language Version Used In This Jupyter Notebook: 3.8.3


In [35]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [26]:
pd.__version__

'1.0.5'

In [27]:
np.__version__

'1.18.5'

In [28]:
# Load the file
load_file = "dados_compras.json"
purchase_file = pd.read_json(load_file, orient = "records")
purchase_file.head()

Unnamed: 0,Login,Idade,Sexo,Item ID,Nome do Item,Valor
0,Aelalis34,38,Masculino,164,Bone Crushing Silver Skewer,3.37
1,Eolo46,21,Masculino,119,"Stormbringer, Dark Blade of Ending Misery",2.32
2,Assastnya25,34,Masculino,174,Primitive Blade,2.46
3,Pheusrical25,21,Masculino,92,Final Critic,1.36
4,Aela59,23,Masculino,63,Stormfury Mace,1.27
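
As a minimal sketch of the layout that `orient="records"` expects (a JSON list with one object per purchase), the snippet below parses a small inline string instead of the real file. The two records are copied from the `head()` output above; the `StringIO` wrapper and the `sample` name are assumptions made only so the example runs without `dados_compras.json`.

```python
from io import StringIO

import pandas as pd

# Two records copied from the head() output above; the real file has many more.
sample_json = """[
  {"Login": "Aelalis34", "Idade": 38, "Sexo": "Masculino", "Item ID": 164,
   "Nome do Item": "Bone Crushing Silver Skewer", "Valor": 3.37},
  {"Login": "Eolo46", "Idade": 21, "Sexo": "Masculino", "Item ID": 119,
   "Nome do Item": "Stormbringer, Dark Blade of Ending Misery", "Valor": 2.32}
]"""

# Each JSON object becomes one row; each key becomes a column.
sample = pd.read_json(StringIO(sample_json), orient="records")
print(sample.dtypes)
```

Checking `dtypes` right after loading is a cheap way to confirm that `Idade` and `Valor` were parsed as numbers before any aggregation.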


## Buyer Information

In [29]:
player_demographics = purchase_file.loc[:, ["Sexo", "Login", "Idade"]]
player_demographics.head()

Unnamed: 0,Sexo,Login,Idade
0,Masculino,Aelalis34,38
1,Masculino,Eolo46,21
2,Masculino,Assastnya25,34
3,Masculino,Pheusrical25,21
4,Masculino,Aela59,23


In [30]:
# Data cleaning and duplicate removal
player_demographics = player_demographics.drop_duplicates()
player_count = len(player_demographics)
player_count

573
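
The count above relies on `drop_duplicates()` collapsing repeat purchases by the same account into a single row. A minimal sketch of that idea on hypothetical toy data (the logins and values below are invented for illustration) shows the same count obtained two ways:

```python
import pandas as pd

# Hypothetical toy data: "A1" buys twice, so there are 3 purchases but 2 buyers.
toy = pd.DataFrame({
    "Login": ["A1", "A1", "B2"],
    "Sexo": ["Feminino", "Feminino", "Masculino"],
    "Idade": [22, 22, 30],
    "Valor": [1.50, 2.00, 3.25],
})

# Option 1: drop repeated (Sexo, Login, Idade) rows, then count what is left.
buyers = toy[["Sexo", "Login", "Idade"]].drop_duplicates()
buyer_count = len(buyers)

# Option 2: count distinct logins directly.
unique_logins = toy["Login"].nunique()

print(buyer_count, unique_logins)
```

The two numbers agree only when each login always appears with the same gender and age, so comparing them can serve as a quick consistency check on the data.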

In [45]:
# Convert output to DF for later use in analysis
pd.DataFrame({"Total players" : [player_count]})

Unnamed: 0,Total players
0,573


## Purchase Analysis

In [46]:
# Basic calculations
average_item_price = purchase_file["Valor"].mean()
total_item_price = purchase_file["Valor"].sum()
total_item_count = purchase_file["Valor"].count()
item_id = len(purchase_file["Item ID"].unique())

# Dataframe for results
summary_calculations = pd.DataFrame({"Number of Unique Items" : item_id,
                                     "Number of Purchases" : total_item_count, 
                                     "Sales amount" : total_item_price, 
                                     "Average Price" : [average_item_price]})

# Data Munging
summary_calculations = summary_calculations.round(2)
summary_calculations["Average Price"] = summary_calculations["Average Price"].map("${:,.2f}".format)
summary_calculations["Sales amount"] = summary_calculations["Sales amount"].map("${:,.2f}".format)
summary_calculations = summary_calculations.loc[:, ["Number of Unique Items", "Average Price", "Number of Purchases", "Sales amount"]]

summary_calculations

Unnamed: 0,Number of Unique Items,Average Price,Number of Purchases,Sales amount
0,183,$2.93,780,"$2,286.33"
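
Formatting with `"${:,.2f}".format` turns numeric columns into text, which is fine for display but prevents further arithmetic on them. One way to keep both, sketched below with hypothetical values and names, is to format a copy and leave the numeric frame untouched:

```python
import pandas as pd

# Hypothetical numeric summary; the values are for illustration only.
summary = pd.DataFrame({"Average Price": [2.9312], "Sales amount": [2286.33]})

# Format a copy for display so the original stays numeric.
formatted = summary.copy()
for column in ["Average Price", "Sales amount"]:
    formatted[column] = summary[column].map("${:,.2f}".format)

print(formatted)
print(summary["Sales amount"].sum())  # still a number, not a string
```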


In [47]:
purchase_file["Item ID"].unique()

array([164, 119, 174,  92,  63,  10, 153, 169, 118,  99,  57,  47,  81,
        77,  44,  96, 123,  59,  91, 177,  78,   3,  11, 183,  65, 132,
       106,  49,  45, 155,  37,  48,  90,  13, 171,  25,   7, 124,  68,
        85, 120,  17, 141,  73, 151,  32, 165,  51, 101, 140,  31,  34,
         2,  86,  39,  28, 160, 134,  83,  38, 158, 110, 122,  54, 105,
        87,  23, 144, 128, 175,  46, 150, 152, 108, 172, 167, 181,  20,
       130, 111, 103,  30, 139, 173,  55, 115,  35,  42,   9,  84, 180,
       102,  53,  18,  74, 126,  50,  62, 125, 121, 129, 149,  12,  71,
        14,  58,  27,  52,  66, 100, 112,  24,  94, 107,   0, 182,  97,
        70,  89,   1, 170,  93, 179,  36,  75, 143, 137, 176, 148, 127,
       147, 161, 154, 157, 116,  61, 131,  41, 145,  60, 162, 135,   8,
        40,  15,  29,  72, 114, 117,  79,  88, 104,  95,  64,  98,  33,
        76, 146, 166,  56,  22,  21,  16,  67, 133,  69, 159,  82, 113,
         6, 163,   5,  19, 168, 136,  80,  26, 142, 178, 156, 10

## Demographic Information

In [49]:
# Basic calculations
gender_count = player_demographics["Sexo"].value_counts()
gender_percent = (gender_count / player_count) * 100

# Dataframe for results
gender_demographics = pd.DataFrame({"Sex" : gender_count, 
                                    "%" : gender_percent})

# Data Munging
gender_demographics = gender_demographics.round(2)
gender_demographics ["%"] = gender_demographics["%"].map("{:,.1f}%".format)

In [50]:
# Output Test
gender_count

Masculino                465
Feminino                 100
Outro / Não Divulgado      8
Name: Sexo, dtype: int64

In [51]:
# Output Test
gender_percent

Masculino                81.151832
Feminino                 17.452007
Outro / Não Divulgado     1.396161
Name: Sexo, dtype: float64

In [52]:
# Output Test
gender_demographics

Unnamed: 0,Sex,%
Masculino,465,81.2%
Feminino,100,17.4%
Outro / Não Divulgado,8,1.4%
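
The share per gender is each count divided by the total number of buyers. As a hedged alternative to dividing by a separately computed total, `value_counts(normalize=True)` returns the fractions directly. The sketch below uses hypothetical toy data and an illustrative `gender_table` name:

```python
import pandas as pd

# Hypothetical de-duplicated buyers: three "Masculino", one "Feminino".
buyers = pd.DataFrame({"Sexo": ["Masculino", "Masculino", "Masculino", "Feminino"]})

counts = buyers["Sexo"].value_counts()
# normalize=True returns fractions that sum to 1; multiply by 100 for percent.
shares = buyers["Sexo"].value_counts(normalize=True) * 100

gender_table = pd.DataFrame({"Count": counts, "%": shares.round(1)})
print(gender_table)
```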


## Purchase Analysis by Gender

In [53]:
# Groupings
gender_total_item_price = purchase_file.groupby("Sexo")["Valor"].sum().rename("Sales amount")
gender_average_item_price = purchase_file.groupby("Sexo")["Valor"].mean().rename("Average Price")
purchase_count = purchase_file.groupby("Sexo")["Valor"].count().rename("Number of Purchases")
# Total sales divided by the number of unique buyers of each gender
normalized_total = gender_total_item_price / gender_demographics["Sex"]

# Storing the result in a Dataframe
gender_purchasing_analysis = pd.DataFrame({"Number of Purchases" : purchase_count, 
                                           "Average Value Per Item" : gender_average_item_price, 
                                           "Sales amount" : gender_total_item_price, 
                                           "Normalized Total" : normalized_total})

# Data Munging
gender_purchasing_analysis = gender_purchasing_analysis.round(2)
gender_purchasing_analysis ["Average Value Per Item"] = gender_purchasing_analysis["Average Value Per Item"].map("${:,.2f}".format)
gender_purchasing_analysis ["Sales amount"] = gender_purchasing_analysis["Sales amount"].map("${:,.2f}".format)
gender_purchasing_analysis ["Normalized Total"] = gender_purchasing_analysis["Normalized Total"].map("${:,.2f}".format)

In [54]:
# Result
gender_total_item_price

Sexo
Feminino                  382.91
Masculino                1867.68
Outro / Não Divulgado      35.74
Name: Sales amount, dtype: float64

In [55]:
# Result
gender_average_item_price

Sexo
Feminino                 2.815515
Masculino                2.950521
Outro / Não Divulgado    3.249091
Name: Average Price, dtype: float64

In [56]:
# Result
gender_purchasing_analysis

Sexo,Number of Purchases,Average Value Per Item,Sales amount,Normalized Total
Feminino,136,$2.82,$382.91,$3.83
Masculino,633,$2.95,"$1,867.68",$4.02
Outro / Não Divulgado,11,$3.25,$35.74,$4.47


In [57]:
# Result
normalized_total

Feminino                 3.829100
Masculino                4.016516
Outro / Não Divulgado    4.467500
dtype: float64
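
The normalized total is the sales amount for a gender divided by the number of unique buyers of that gender. A minimal sketch of the same calculation in a single `groupby` call, using pandas named aggregation on hypothetical toy data (the output column names are illustrative):

```python
import pandas as pd

# Hypothetical purchases: "A1" buys twice, "B2" and "C3" once each.
toy = pd.DataFrame({
    "Login": ["A1", "A1", "B2", "C3"],
    "Sexo": ["Feminino", "Feminino", "Masculino", "Masculino"],
    "Valor": [1.0, 3.0, 2.0, 4.0],
})

# Named aggregation: output_column=(input_column, aggregation).
by_gender = toy.groupby("Sexo").agg(
    purchases=("Valor", "count"),
    average_price=("Valor", "mean"),
    sales=("Valor", "sum"),
    buyers=("Login", "nunique"),
)
# Sales per unique buyer, the "normalized total" of this section.
by_gender["sales_per_buyer"] = by_gender["sales"] / by_gender["buyers"]
print(by_gender)
```

Counting buyers inside the same `groupby` avoids relying on index alignment with a table built in an earlier cell.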

## Demographic Analysis


In [58]:
player_demographics

Unnamed: 0,Sexo,Login,Idade
0,Masculino,Aelalis34,38
1,Masculino,Eolo46,21
2,Masculino,Assastnya25,34
3,Masculino,Pheusrical25,21
4,Masculino,Aela59,23
...,...,...,...
771,Masculino,Lassista97,24
772,Masculino,Sidap51,15
773,Masculino,Chamadarsda63,21
778,Masculino,Quelaton80,20


In [60]:
# Age bins and their labels
age_bins = [0, 9.99, 14.99, 19.99, 24.99, 29.99, 34.99, 39.99, 999]
age_bracket = ["Less than 10", "10 to 14", "15 to 19", "20 to 24", "25 to 29", "30 to 34", "35 to 39", "40 or more"]

purchase_file["Range of Ages"] = pd.cut(purchase_file["Idade"], age_bins, labels=age_bracket)

# Basic calculations
# Note: the counts are purchases per age bracket, while player_count is the number of
# unique buyers, so the "%" column is purchases per 100 buyers and does not sum to 100%.
age_demographics_count = purchase_file["Range of Ages"].value_counts()
age_demographics_average_item_price = purchase_file.groupby("Range of Ages")["Valor"].mean()
age_demographics_total_item_price = purchase_file.groupby("Range of Ages")["Valor"].sum()
age_demographics_percent = (age_demographics_count / player_count) * 100

# Dataframe for results
age_demographics = pd.DataFrame({"Count": age_demographics_count, "%": age_demographics_percent, "Unitary value": age_demographics_average_item_price, "Total Purchase Value": age_demographics_total_item_price})

# Data Munging
age_demographics["Unitary value"] = age_demographics["Unitary value"].map("${:,.2f}".format)
age_demographics["Total Purchase Value"] = age_demographics["Total Purchase Value"].map("${:,.2f}".format)
age_demographics["%"] = age_demographics["%"].map("{:,.2f}%".format)

In [61]:
# Result
player_demographics.head()

Unnamed: 0,Sexo,Login,Idade
0,Masculino,Aelalis34,38
1,Masculino,Eolo46,21
2,Masculino,Assastnya25,34
3,Masculino,Pheusrical25,21
4,Masculino,Aela59,23


In [62]:
# Result
age_demographics = age_demographics.sort_index()
age_demographics

Unnamed: 0,Count,%,Unitary value,Total Purchase Value
Less than 10,28,4.89%,$2.98,$83.46
10 to 14,35,6.11%,$2.77,$96.95
15 to 19,133,23.21%,$2.91,$386.42
20 to 24,336,58.64%,$2.91,$978.77
25 to 29,125,21.82%,$2.96,$370.33
30 to 34,64,11.17%,$3.08,$197.25
35 to 39,42,7.33%,$2.84,$119.40
40 or more,17,2.97%,$3.16,$53.75
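
The fractional bin edges (9.99, 14.99, ...) exist only to keep ages such as 10 or 15 out of the lower bracket. A hedged alternative, sketched below, is to use whole-number edges with `right=False`, which makes each bin include its left edge and exclude its right one. The ages, the upper edge of 200, and the variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical ages chosen to sit on the bracket boundaries.
ages = pd.Series([9, 10, 14, 15, 39, 40, 45], name="Idade")

edges = [0, 10, 15, 20, 25, 30, 35, 40, 200]  # 200 is an arbitrary upper bound
labels = ["Less than 10", "10 to 14", "15 to 19", "20 to 24",
          "25 to 29", "30 to 34", "35 to 39", "40 or more"]

# right=False gives half-open bins [0, 10), [10, 15), ..., [40, 200).
brackets = pd.cut(ages, bins=edges, labels=labels, right=False)

# Share of rows per bracket, kept in bracket order; sums to 100%.
shares = brackets.value_counts(normalize=True, sort=False) * 100
print(pd.DataFrame({"Idade": ages, "Bracket": brackets}))
print(shares.round(2))
```

Because `normalize=True` divides by the number of rows being counted, the shares always add up to 100%, whichever unit (purchases or buyers) the rows represent.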


## Top Spenders

In [65]:
# Basic calculations
user_total = purchase_file.groupby("Login")["Valor"].sum().rename("Total Purchase Value")
user_average = purchase_file.groupby("Login")["Valor"].mean().rename("Average Purchase Value")
user_count = purchase_file.groupby("Login")["Valor"].count().rename("Number of Purchases")

# Dataframe for Results
user_data = pd.DataFrame({"Total Purchase Value": user_total, "Average Purchase Value": user_average, "Number of Purchases": user_count})

# Data Munging
user_data["Total Purchase Value"] = user_data["Total Purchase Value"].map("${:,.2f}".format)
user_data["Average Purchase Value"] = user_data["Average Purchase Value"].map("${:,.2f}".format)
# This ranking is by number of purchases, not by total purchase value
user_data.sort_values("Number of Purchases", ascending=False).head(5)

Login,Total Purchase Value,Average Purchase Value,Number of Purchases
Undirrala66,$17.06,$3.41,5
Mindimnya67,$12.74,$3.18,4
Qarwen67,$9.97,$2.49,4
Saedue76,$13.56,$3.39,4
Sondastan54,$10.24,$2.56,4
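
The report item asks for the top buyers by total purchase value, which can differ from the ranking by number of purchases shown above. A minimal sketch of ranking by total on hypothetical toy data follows; it ranks while the totals are still numeric and would format them only afterwards.

```python
import pandas as pd

# Hypothetical purchases: "A1" buys most often, "B2" spends the most.
toy = pd.DataFrame({
    "Login": ["A1", "A1", "A1", "B2", "C3", "C3"],
    "Valor": [1.00, 1.00, 1.00, 9.50, 4.00, 4.25],
})

per_buyer = toy.groupby("Login")["Valor"].agg(["sum", "mean", "count"])
per_buyer.columns = ["Total Purchase Value", "Average Purchase Value",
                     "Number of Purchases"]

# Rank on the numeric totals; apply any "$" formatting after this step.
top_spenders = per_buyer.nlargest(2, "Total Purchase Value")
print(top_spenders)
```

In this toy data the most frequent buyer ("A1") does not appear among the top two spenders, which is why the two rankings are worth keeping apart.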


In [64]:
# Result
user_data

Login,Total Purchase Value,Average Purchase Value,Number of Purchases
Adairialis76,$2.46,$2.46,1
Aduephos78,$6.70,$2.23,3
Aeduera68,$5.80,$1.93,3
Aela49,$2.46,$2.46,1
Aela59,$1.27,$1.27,1
...,...,...,...
Yasurra52,$3.14,$3.14,1
Yathecal72,$7.77,$3.88,2
Yathecal82,$2.41,$2.41,1
Zhisrisu83,$2.46,$1.23,2


## Most Popular Items

In [66]:
# Basic calculations
user_total = purchase_file.groupby("Nome do Item")["Valor"].sum().rename("Total Purchase Value")
user_average = purchase_file.groupby("Nome do Item")["Valor"].mean().rename("Average Purchase Value")
user_count = purchase_file.groupby("Nome do Item")["Valor"].count().rename("Number of Purchases")

# Dataframe for results
user_data = pd.DataFrame({"Total Purchase Value": user_total, "Average Purchase Value": user_average, "Number of Purchases": user_count})

# Data Munging
user_data ["Total Purchase Value"] = user_data["Total Purchase Value"].map("${:,.2f}".format)
user_data ["Average Purchase Value"] = user_data["Average Purchase Value"].map("${:,.2f}".format)
user_data.sort_values("Number of Purchases", ascending=False).head(5)

Nome do Item,Total Purchase Value,Average Purchase Value,Number of Purchases
Final Critic,$38.60,$2.76,14
Arcane Gem,$24.53,$2.23,11
"Betrayal, Whisper of Grieving Widows",$25.85,$2.35,11
Stormcaller,$34.65,$3.46,10
Woeful Adamantite Claymore,$11.16,$1.24,9
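
Grouping by `Nome do Item` alone merges any items that happen to share a name under different IDs. Whether that occurs in this data set is not shown here, so the sketch below uses hypothetical toy data to illustrate grouping by both `Item ID` and `Nome do Item` as a safeguard:

```python
import pandas as pd

# Hypothetical purchases: IDs 1 and 2 are different items that share a name.
toy = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3],
    "Nome do Item": ["Sword", "Sword", "Sword", "Mace", "Mace", "Mace"],
    "Valor": [2.0, 2.0, 5.0, 1.0, 1.0, 1.0],
})

# Counting by name alone merges the two "Sword" items.
by_name = toy.groupby("Nome do Item").size()

# Counting by (ID, name) keeps them apart.
by_id_and_name = toy.groupby(["Item ID", "Nome do Item"]).size()
most_popular = by_id_and_name.nlargest(2)
print(most_popular)
```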


## Most Profitable Items

In [68]:
# Basic calculations
user_total = purchase_file.groupby("Nome do Item")["Valor"].sum().rename("Total Purchase Value")
user_average = purchase_file.groupby("Nome do Item")["Valor"].mean().rename("Average Purchase Value")
user_count = purchase_file.groupby("Nome do Item")["Valor"].count().rename("Number of Purchases")

# Dataframe for results
user_data = pd.DataFrame({"Total Purchase Value": user_total, "Average Purchase Value": user_average,
                          "Number of Purchases": user_count})

# Data Munging
user_data["Total Purchase Value"] = user_data["Total Purchase Value"].map("${:,.2f}".format)
user_data["Average Purchase Value"] = user_data["Average Purchase Value"].map("${:,.2f}".format)

# Caution: "Total Purchase Value" is already a formatted string here, so this sort is
# lexicographic ("$9.90" ranks above "$38.60") rather than numeric.
display(user_data.sort_values("Total Purchase Value", ascending=False).head(5)[
    ['Total Purchase Value','Average Purchase Value','Number of Purchases']])

Nome do Item,Total Purchase Value,Average Purchase Value,Number of Purchases
Shadowsteel,$9.90,$1.98,5
Souleater,$9.81,$3.27,3
"Shadow Strike, Glory of Ending Hope",$9.65,$1.93,5
"Heartseeker, Reaver of Souls",$9.63,$3.21,3
Agatha,$9.55,$1.91,5
