# Retailers table

| Name | Type | Granularity | Primary Key | Volumetry | Business Definition |
|------|------|-------------|-------------|-----------|-------------------|
| retailers | Dimension | Individual retailer | retailer_id | 18 rows - 4 columns | List of retailers and restaurants in North America (USA and Canada) selling the champagne brand's products |

## Imports

In [1]:
# Data handling
import pandas as pd
import numpy as np

# Path utilities, used to make the local 'packages' folder importable
import sys
from pathlib import Path

# Find the closest parent folder that contains 'packages' and add it to the import path
root = next(p for p in Path.cwd().resolve().parents if (p / "packages").exists())
sys.path.insert(0, str(root))

# Helper functions for data exploration
from packages.utils.data_exploration import (
    find_primary_key,
    find_volumetry,
    find_number_of_unique_values_per_column,
    share_of_null,
    share_of_rows_with_null,
    share_of_columns_with_null,
    share_of_null_within_columns,
)

In [2]:
# Import the retailers dataset
retailers = pd.read_csv('../../../data/dimensions/retailers.csv')

In [3]:
# Print the first rows
retailers.head()

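|   | retailer_id | retailer_name | country | type |
|---|-------------|---------------|---------|------|
| 0 | R0001 | WholeBev | USA | Retailer |
| 1 | R0002 | CityCellars | USA | Retailer |
| 2 | R0003 | GoldenBottle | USA | Retailer |
| 3 | R0004 | LiquorKing | USA | Retailer |
| 4 | R0005 | FineWineNYC | USA | Retailer |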


## Analyze the table

### Find the primary key

In [4]:
# Use find_primary_key from the data_exploration package to list the columns that can serve as a primary key
find_primary_key(retailers)

['retailer_id', 'retailer_name']

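Both `retailer_id` and `retailer_name` uniquely identify a row; `retailer_id` is the natural choice for the primary key. The granularity of the table is the individual retailer.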
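
The internals of `find_primary_key` are not shown in this notebook. As a rough sketch of the idea, assuming the helper returns the columns whose values are all non-null and distinct, candidate keys can be detected with plain pandas. The function name `candidate_key_columns` is hypothetical, and the sample rows are the five rows from the preview above, not the full table.

```python
import pandas as pd


def candidate_key_columns(df: pd.DataFrame) -> list[str]:
    """Return the columns whose values are all non-null and unique.

    Hypothetical helper: an assumption about what find_primary_key does.
    """
    return [
        col
        for col in df.columns
        if df[col].notna().all() and df[col].is_unique
    ]


# The five rows shown in the preview of the retailers table
sample = pd.DataFrame(
    {
        "retailer_id": ["R0001", "R0002", "R0003", "R0004", "R0005"],
        "retailer_name": [
            "WholeBev",
            "CityCellars",
            "GoldenBottle",
            "LiquorKing",
            "FineWineNYC",
        ],
        "country": ["USA"] * 5,
        "type": ["Retailer"] * 5,
    }
)

keys = candidate_key_columns(sample)
print(keys)
```

Uniqueness in the current extract does not guarantee uniqueness in future loads, so a check like this is worth rerunning whenever the file is refreshed.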

### Volumetry

In [5]:
# Use find_volumetry from the data_exploration package to get the number of rows
print(f"There are {find_volumetry(retailers)} rows in the dataset.")

There are 18 rows in the dataset.


### Number of unique values per column

In [6]:
# Count the number of unique values for each column in the dataset 'retailers'
unique_counts = find_number_of_unique_values_per_column(retailers)

# Print results (each entry of unique_counts is a one-element Series, so .iloc[0] extracts its value)
for col in retailers.columns:
    print(f"The number of unique values in the column {col} is {unique_counts[col].iloc[0]}.")

The number of unique values in the column retailer_id is 18.
The number of unique values in the column retailer_name is 18.
The number of unique values in the column country is 2.
The number of unique values in the column type is 2.
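
For comparison, pandas can produce per-column counts directly with `DataFrame.nunique()`, which returns one scalar per column. The sketch below assumes this matches what `find_number_of_unique_values_per_column` computes (its implementation is not shown here). It uses only the five rows from the preview, so the counts differ from those of the full table.

```python
import pandas as pd

# The five rows shown in the preview of the retailers table
sample = pd.DataFrame(
    {
        "retailer_id": ["R0001", "R0002", "R0003", "R0004", "R0005"],
        "retailer_name": [
            "WholeBev",
            "CityCellars",
            "GoldenBottle",
            "LiquorKing",
            "FineWineNYC",
        ],
        "country": ["USA"] * 5,
        "type": ["Retailer"] * 5,
    }
)

# nunique() returns a Series indexed by column name, one count per column
unique_counts = sample.nunique()

for col, n_unique in unique_counts.items():
    print(f"The number of unique values in the column {col} is {n_unique}.")
```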


### Check that only the expected countries are included

In [7]:
# We should get only USA and Canada
print(retailers['country'].unique())

['USA' 'Canada']
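
Reading the printed array works for a small table, but the same check can be made explicit so that it fails loudly on unexpected data. The sketch below is one possible approach; the helper name `unexpected_values` and the sample values (including the deliberately wrong `Mexico`) are illustrative and do not come from the dataset.

```python
import pandas as pd


def unexpected_values(series: pd.Series, expected: set) -> set:
    """Return the distinct non-null values of `series` that are not in `expected`."""
    return set(series.dropna().unique()) - set(expected)


expected_countries = {"USA", "Canada"}

# Illustrative values only: 'Mexico' is added on purpose to show a failing case
countries = pd.Series(["USA", "Canada", "USA", "Mexico"], name="country")

bad = unexpected_values(countries, expected_countries)
if bad:
    print(f"Unexpected countries found: {sorted(bad)}")
else:
    print("All countries are within the expected set.")
```

In the notebook itself, the call would be `unexpected_values(retailers['country'], {'USA', 'Canada'})`, which should return an empty set given the output above.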


### Check the different retailer types

In [8]:
# List the distinct retailer types present in the table
print(retailers['type'].unique())

['Retailer' 'Restaurant']
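
To see how the types are distributed across countries rather than only which types exist, a cross-tabulation is a common next step. This is a sketch using `pd.crosstab` on made-up rows; the actual counts for the 18 retailers are not shown in this notebook.

```python
import pandas as pd

# Illustrative rows only, not taken from the real retailers table
sample = pd.DataFrame(
    {
        "country": ["USA", "USA", "Canada", "Canada", "USA"],
        "type": ["Retailer", "Restaurant", "Retailer", "Restaurant", "Retailer"],
    }
)

# Rows are countries, columns are retailer types, cells are row counts
counts = pd.crosstab(sample["country"], sample["type"])
print(counts)
```

On the real data, the equivalent call would be `pd.crosstab(retailers['country'], retailers['type'])`.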
