# Candy Store Analysis
This notebook briefly analyzes a mix of [The Ultimate Halloween Candy Power Ranking dataset](https://www.kaggle.com/datasets/fivethirtyeight/the-ultimate-halloween-candy-power-ranking) and the [Retail Sales dataset](https://www.kaggle.com/datasets/mohammadtalib786/retail-sales-dataset) from Kaggle.

## Data Gathering
### Data set download using Kaggle Python API
The first part of this notebook downloads both data sets using the Kaggle Python API.

The files are downloaded only if they are not already in the local folder or if the local copies are no longer up to date.
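
The implementation of `kaggle_helper` is not shown in this notebook. As a rough illustration of the "download only when needed" idea, the check could look like the sketch below. The function name `needs_download` is hypothetical, and using the file size as the freshness criterion is an assumption, suggested only by the fact that the log output reports file sizes.

```python
from pathlib import Path


def needs_download(local_file: Path, remote_size: int) -> bool:
    """Return True when the local copy is missing or its size differs.

    `remote_size` is assumed to be the size in bytes reported by the API.
    Comparing sizes is an assumption; the real helper may use another rule.
    """
    if not local_file.exists():
        return True
    return local_file.stat().st_size != remote_size


# Example call with the size reported in the log output of this notebook
print(needs_download(Path("dataset/candy-data.csv"), 5205))
```

A size comparison is cheap but can miss edits that leave the size unchanged; a checksum or a modification date would be stricter alternatives.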

In [11]:
import os
import sys
from typing import List

# Handle relative import of modules
src_path = os.path.abspath(os.path.join("..", "..", "src"))
if src_path not in sys.path:
    sys.path.append(src_path)

In [12]:
from helpers import kaggle_helper

dataset_folder = "dataset"
kaggle_helper.download_dataset_files(
    dataset_author="mohammadtalib786",
    dataset_name="retail-sales-dataset",
    dataset_folder=dataset_folder,
)
kaggle_helper.download_dataset_files(
    dataset_author="fivethirtyeight",
    dataset_name="the-ultimate-halloween-candy-power-ranking",
    dataset_folder=dataset_folder,
)

Listing local csv files in ./dataset.
File retail_sales_dataset.csv with size 51673 found in ./dataset
File candy-data.csv with size 5205 found in ./dataset
File candy-type-data.csv with size 445 found in ./dataset
Listing files associated with Kaggle dataset mohammadtalib786/retail-sales-dataset.
File retail_sales_dataset.csv with size 51673 retrieved from Kaggle API.
Listing local csv files in ./dataset.
File retail_sales_dataset.csv with size 51673 found in ./dataset
File candy-data.csv with size 5205 found in ./dataset
File candy-type-data.csv with size 445 found in ./dataset
Listing files associated with Kaggle dataset fivethirtyeight/the-ultimate-halloween-candy-power-ranking.
File candy-data.csv with size 5205 retrieved from Kaggle API.


### Pandas DataFrame creation
Once the CSV files of both Kaggle data sets are downloaded, each one is read into a pandas `DataFrame`.

In [13]:
import pandas as pd

df_candies = pd.read_csv(f"{dataset_folder}/candy-data.csv")
print(
    f"The candies data set has {len(df_candies)} candies with {df_candies.shape[1]} variables."
)
df_candies.head()

The candies data set has 85 candies with 13 variables.


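| | competitorname | chocolate | fruity | caramel | peanutyalmondy | nougat | crispedricewafer | hard | bar | pluribus | sugarpercent | pricepercent | winpercent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 Grand | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0.732 | 0.86 | 66.971725 |
| 1 | 3 Musketeers | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0.604 | 0.511 | 67.602936 |
| 2 | One dime | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.011 | 0.116 | 32.261086 |
| 3 | One quarter | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.011 | 0.511 | 46.116505 |
| 4 | Air Heads | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.906 | 0.511 | 52.341465 |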


In [14]:
df_candies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85 entries, 0 to 84
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   competitorname    85 non-null     object 
 1   chocolate         85 non-null     int64  
 2   fruity            85 non-null     int64  
 3   caramel           85 non-null     int64  
 4   peanutyalmondy    85 non-null     int64  
 5   nougat            85 non-null     int64  
 6   crispedricewafer  85 non-null     int64  
 7   hard              85 non-null     int64  
 8   bar               85 non-null     int64  
 9   pluribus          85 non-null     int64  
 10  sugarpercent      85 non-null     float64
 11  pricepercent      85 non-null     float64
 12  winpercent        85 non-null     float64
dtypes: float64(3), int64(9), object(1)
memory usage: 8.8+ KB


In [15]:
df_retail_sales = pd.read_csv(f"{dataset_folder}/retail_sales_dataset.csv")
print(
    f"The retail sales data set has {len(df_retail_sales)} entries with {df_retail_sales.shape[1]} variables."
)
df_retail_sales.head()

The retail sales data set has 1000 entries with 9 variables.


| | Transaction ID | Date | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit | Total Amount |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2023-11-24 | CUST001 | Male | 34 | Beauty | 3 | 50 | 150 |
| 1 | 2 | 2023-02-27 | CUST002 | Female | 26 | Clothing | 2 | 500 | 1000 |
| 2 | 3 | 2023-01-13 | CUST003 | Male | 50 | Electronics | 1 | 30 | 30 |
| 3 | 4 | 2023-05-21 | CUST004 | Male | 37 | Clothing | 1 | 500 | 500 |
| 4 | 5 | 2023-05-06 | CUST005 | Male | 30 | Beauty | 2 | 50 | 100 |


In [16]:
df_retail_sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Transaction ID    1000 non-null   int64 
 1   Date              1000 non-null   object
 2   Customer ID       1000 non-null   object
 3   Gender            1000 non-null   object
 4   Age               1000 non-null   int64 
 5   Product Category  1000 non-null   object
 6   Quantity          1000 non-null   int64 
 7   Price per Unit    1000 non-null   int64 
 8   Total Amount      1000 non-null   int64 
dtypes: int64(5), object(4)
memory usage: 70.4+ KB
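
The `info()` output shows that `Date` is loaded as `object`, which means the values are plain strings. If date-based grouping is needed later, the column could be converted first. The following is a minimal sketch on a hand-built sample using the first three rows shown above; the names `sample` and `monthly_total` are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Small sample copied from the first rows of the retail sales data
sample = pd.DataFrame(
    {
        "Transaction ID": [1, 2, 3],
        "Date": ["2023-11-24", "2023-02-27", "2023-01-13"],
        "Total Amount": [150, 1000, 30],
    }
)

# Convert the string column to datetime64 so that the .dt accessor is available
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m-%d")

# Example of a date-based aggregation: total amount per calendar month
monthly_total = sample.groupby(sample["Date"].dt.month)["Total Amount"].sum()

print(sample.dtypes)
print(monthly_total)
```

Alternatively, passing `parse_dates=["Date"]` to `pd.read_csv` performs the conversion while loading.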


## Data transformation
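To simplify the visualization of the price and rating of each candy category in Tableau, this section builds a new data frame containing the average price (`pricepercent`) and average rating (`winpercent`) for each category. A candy can belong to several categories, so it contributes to the average of each category it is flagged with.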

In [17]:
# List of candy types
candy_types: List[str] = [
    "chocolate",
    "fruity",
    "caramel",
    "peanutyalmondy",
    "nougat",
    "crispedricewafer",
    "hard",
    "bar",
    "pluribus",
]
measures_of_interest: List[str] = ["pricepercent", "winpercent"]
values: List[tuple[str, float, float]] = []
for candy_type in candy_types:
    # Mean price and win percentage, grouped by the 0/1 flag of this candy type
    df_temp = df_candies.groupby(candy_type)[measures_of_interest].mean()
    # Select the group labelled 1, i.e. the candies that belong to this type
    type_price, type_win = df_temp.loc[1]
    # Rescale the win percentage (rating) from 0-100 to 0-1
    values.append((candy_type.capitalize(), type_price, type_win / 100))
df_categories = pd.DataFrame(
    values, columns=["Category", "avgpricepercent", "avgwinpercent"]
)
df_categories.head()

| | Category | avgpricepercent | avgwinpercent |
|---|---|---|---|
| 0 | Chocolate | 0.632162 | 0.609215 |
| 1 | Fruity | 0.332737 | 0.441197 |
| 2 | Caramel | 0.631571 | 0.573469 |
| 3 | Peanutyalmondy | 0.666643 | 0.636971 |
| 4 | Nougat | 0.614143 | 0.600519 |
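
As a cross-check of the loop above, the same per-category averages can be computed by filtering the rows flagged with 1 instead of grouping. The sketch below runs on a hand-built sample of the five candies shown earlier, so its numbers differ from the full data set. The names `sample`, `category_means`, `avg_price` and `avg_win` are illustrative only.

```python
import pandas as pd

# Five candies copied from the head of the candy data set (subset of columns)
sample = pd.DataFrame(
    {
        "competitorname": [
            "100 Grand", "3 Musketeers", "One dime", "One quarter", "Air Heads",
        ],
        "chocolate": [1, 1, 0, 0, 0],
        "fruity": [0, 0, 0, 0, 1],
        "pricepercent": [0.86, 0.511, 0.116, 0.511, 0.511],
        "winpercent": [66.971725, 67.602936, 32.261086, 46.116505, 52.341465],
    }
)

rows = []
for candy_type in ["chocolate", "fruity"]:
    # Keep only the candies that belong to this category
    members = sample[sample[candy_type] == 1]
    rows.append(
        {
            "Category": candy_type.capitalize(),
            "avg_price": members["pricepercent"].mean(),
            # Rescale the win percentage from 0-100 to 0-1
            "avg_win": members["winpercent"].mean() / 100,
        }
    )

category_means = pd.DataFrame(rows)
print(category_means)
```

Filtering does not depend on both flag values being present in a column, whereas selecting a group by position assumes that each category has candies with and without the flag.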


In [18]:
# Write the new category data frame to a CSV file in the data set folder
df_categories.to_csv(f"{dataset_folder}/candy-type-data.csv", index=False)

## Data Visualization
The data is visualized in Tableau. The dashboard is published on Tableau Public at this [link](https://public.tableau.com/views/candy-store/CandyStore?:language=es-ES&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link).

In [20]:
%%HTML
<div class='tableauPlaceholder' id='viz1726403430709' style='position: relative'><noscript><a href='#'><img alt='Candy Store ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;ca&#47;candy-store&#47;CandyStore&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='candy-store&#47;CandyStore' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;ca&#47;candy-store&#47;CandyStore&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='es-ES' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1726403430709');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.minWidth='420px';vizElement.style.maxWidth='3840px';vizElement.style.width='100%';vizElement.style.minHeight='587px';vizElement.style.maxHeight='2187px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.minWidth='420px';vizElement.style.maxWidth='3840px';vizElement.style.width='100%';vizElement.style.minHeight='587px';vizElement.style.maxHeight='2187px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else { vizElement.style.width='100%';vizElement.style.height='1827px';}                     var scriptElement = document.createElement('script');           
         scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>