# Products_catalog table

| Name | Type | Granularity | Primary Key | Volumetry | Business Definition |
|------|------|-------------|-------------|-----------|-------------------|
| products_catalog | Dimension | Individual product (SKU) | sku | 21 rows, 9 columns | List of products sold in North America by our champagne brand. |

## Imports

In [None]:
# Data manipulation libraries
import pandas as pd
import numpy as np

# Path handling, so that the local packages can be imported
import sys
from pathlib import Path

# Find the closest parent folder containing 'packages' and add it to the import path
root = next(p for p in Path.cwd().resolve().parents if (p / "packages").exists())
sys.path.insert(0, str(root))

# Functions for data preparation
from packages.utils.data_exploration import (
    find_primary_key,
    find_volumetry,
    find_number_of_unique_values_per_column,
    share_of_null,
    share_of_rows_with_null,
    share_of_columns_with_null,
    share_of_null_within_columns,
)

In [3]:
# Import the products_catalog dataset
products_catalog = pd.read_csv('../../../data/dimensions/product_catalog.csv')

In [4]:
# Print the first rows
products_catalog.head()

|   | sku | brand | cuvée | bottle_size_ml | category | upc | list_price_usd | list_price_cad | launch_date |
|---|-----|-------|-------|----------------|----------|-----|----------------|----------------|-------------|
| 0 | CH-01-375 | Maison Étoile | Brut Tradition | 375 | Champagne | 800000000000 | 36.48 | 47.88 | 2020-05-08 |
| 1 | CH-01-750 | Maison Étoile | Brut Tradition | 750 | Champagne | 800000000001 | 61.41 | 83.8 | 2024-01-18 |
| 2 | CH-01-1500 | Maison Étoile | Brut Tradition | 1500 | Champagne | 800000000002 | 130.83 | 174.71 | 2015-09-02 |
| 3 | CH-02-375 | Maison Étoile | Brut Rosé | 375 | Champagne | 800000000100 | 33.54 | 44.6 | 2015-09-02 |
| 4 | CH-02-750 | Maison Étoile | Brut Rosé | 750 | Champagne | 800000000101 | 58.58 | 78.41 | 2024-07-16 |


## Analyze the table

### Find the primary key

In [5]:
# Use find_primary_key from the data_exploration package to list the candidate primary keys
find_primary_key(products_catalog)

['sku', 'upc', 'list_price_usd', 'list_price_cad']

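The four columns returned each hold 21 distinct values for 21 rows, so each one is a candidate key. `sku` is kept as the primary key, so the granularity of the table is the Stock Keeping Unit (SKU), i.e. the individual product.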
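
As an illustration of how such a candidate-key check could work, here is a minimal sketch. It assumes that a key column must be fully populated and hold one distinct value per row. `candidate_keys` is a hypothetical name, not the actual `find_primary_key` from `packages.utils.data_exploration`, and the sample rows reuse a few values from the preview above.

```python
import pandas as pd


def candidate_keys(df: pd.DataFrame) -> list:
    """Return the columns that have no nulls and no duplicated values.

    Hypothetical helper for illustration only.
    """
    return [
        col
        for col in df.columns
        if df[col].notna().all() and df[col].is_unique
    ]


# Small sample built from values shown in the table preview
sample = pd.DataFrame(
    {
        "sku": ["CH-01-375", "CH-01-750", "CH-02-375"],
        "brand": ["Maison Étoile"] * 3,
        "bottle_size_ml": [375, 750, 375],
        "upc": [800000000000, 800000000001, 800000000100],
    }
)

keys = candidate_keys(sample)
print(keys)  # ['sku', 'upc']
```

A check like this only reports uniqueness in the current data. Price columns can pass it by coincidence, which is why a business identifier such as `sku` is the safer choice of key.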

### Volumetry

In [6]:
# Use find_volumetry from the data_exploration package to obtain the number of rows
print(f"There are {find_volumetry(products_catalog)} rows in the dataset.")

There are 21 rows in the dataset.


### Number of unique values per column

In [7]:
print(find_number_of_unique_values_per_column(products_catalog))

   sku  brand  cuvée  bottle_size_ml  category  upc  list_price_usd  \
0   21      1      7               3         1   21              21   

   list_price_cad  launch_date  
0              21           16  
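
For readers without access to the package, the same kind of one-row summary can probably be reproduced with plain pandas. The sketch below is an assumption about the behaviour, not the package's implementation; `unique_values_per_column` is a hypothetical name.

```python
import pandas as pd


def unique_values_per_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a one-row DataFrame with the distinct-value count of each column.

    Hypothetical helper for illustration only. Nulls are not counted,
    which is the default behaviour of DataFrame.nunique.
    """
    return df.nunique().to_frame().T


# Small sample built from values shown in the table preview
sample = pd.DataFrame(
    {
        "sku": ["CH-01-375", "CH-01-750", "CH-02-375"],
        "brand": ["Maison Étoile"] * 3,
        "bottle_size_ml": [375, 750, 375],
    }
)

summary = unique_values_per_column(sample)
print(summary)
```

In the output above, `brand` and `category` have a single distinct value, so they do not help distinguish one product from another in this table.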


### Share of null values per column

In [9]:
print(share_of_null_within_columns(products_catalog))

   sku  brand  cuvée  bottle_size_ml  category  upc  list_price_usd  \
0    0      0      0               0         0    0               0   

   list_price_cad  launch_date  
0               0            0  
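
A per-column null share can be sketched with plain pandas as follows. This is an assumption about what the helper computes, not its actual code: `null_share_per_column` is a hypothetical name, and this version returns percentages as floats, whereas the output above prints integers.

```python
import pandas as pd


def null_share_per_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a one-row DataFrame with the percentage of nulls in each column.

    Hypothetical helper for illustration only.
    """
    return df.isna().mean().mul(100).round(2).to_frame().T


# Illustrative sample with two missing launch dates (not real catalog data)
sample = pd.DataFrame(
    {
        "sku": ["A", "B", "C", "D"],
        "launch_date": ["2020-05-08", None, "2015-09-02", None],
    }
)

per_column = null_share_per_column(sample)
print(per_column)
```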


### Percentage of null cells

In [12]:
# Use share_of_null from the data_exploration package to obtain the % of null cells
print(f"The share of null cells within the dataframe is {share_of_null(products_catalog)}%.")

The share of null cells within the dataframe is 0.0%.
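
The imports also bring in `share_of_rows_with_null` and `share_of_columns_with_null`, which are not called in this notebook. The sketch below shows how the three table-level ratios could be computed with plain pandas. It is an assumption about their meaning, with hypothetical function names, and does not reproduce the package's code.

```python
import pandas as pd


def null_cell_share(df: pd.DataFrame) -> float:
    """Percentage of cells that are null (hypothetical helper)."""
    return float(df.isna().to_numpy().mean() * 100)


def null_row_share(df: pd.DataFrame) -> float:
    """Percentage of rows containing at least one null (hypothetical helper)."""
    return float(df.isna().any(axis=1).mean() * 100)


def null_column_share(df: pd.DataFrame) -> float:
    """Percentage of columns containing at least one null (hypothetical helper)."""
    return float(df.isna().any(axis=0).mean() * 100)


# Illustrative sample with two missing launch dates (not real catalog data)
sample = pd.DataFrame(
    {
        "sku": ["A", "B", "C", "D"],
        "launch_date": ["2020-05-08", None, "2015-09-02", None],
    }
)

print(null_cell_share(sample))    # 25.0
print(null_row_share(sample))     # 50.0
print(null_column_share(sample))  # 50.0
```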
