# Distributors Table

| Name | Type | Granularity | Primary Key | Volumetry | Business Definition |
|------|------|-------------|-------------|-----------|-------------------|
| distributors | Dimension | Individual distributor | distributor_id | 4 rows - 3 columns | List of distributors that manage product imports into North America and distribution to retailers |

## Imports

In [None]:
# Data manipulation libraries
import pandas as pd
import numpy as np

# Path utilities, used to make the local 'packages' folder importable
import sys
from pathlib import Path

# Walk up from the working directory to the first parent folder that contains 'packages'
root = next(p for p in Path.cwd().resolve().parents if (p / "packages").exists())
sys.path.insert(0, str(root))

# Helper functions for data exploration
from packages.utils.data_exploration import (
    find_primary_key,
    find_volumetry,
    find_number_of_unique_values_per_column,
    share_of_null,
    share_of_rows_with_null,
    share_of_columns_with_null,
    share_of_null_within_columns,
)

ModuleNotFoundError: No module named 'exploration'

In [4]:
# Import the distributors dataset
distributors = pd.read_csv('../../../data/dimensions/distributors.csv')

In [5]:
# Print the first rows
distributors.head()

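| Unnamed: 0 | distributor_id | distributor_name | country |
|------------|----------------|------------------|---------|
| 0 | D001 | AmeriBev Dist. | USA |
| 1 | D002 | NorthSpirits Co. | USA |
| 2 | D003 | MapleWine Dist. | Canada |
| 3 | D004 | Pacific Pour Inc. | USA |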


## Analyze the table

### Find the primary key

In [6]:
# Use find_primary_key from the data_exploration module to list the candidate primary key columns
find_primary_key(distributors)

['distributor_id', 'distributor_name']

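Both `distributor_id` and `distributor_name` are unique across the 4 rows, so either could identify a row; `distributor_id` is retained as the primary key. The granularity of the table is the individual distributor.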
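
The implementation of `find_primary_key` is not shown in this notebook. As a rough illustration of the kind of check such a helper might perform, the sketch below flags every column that has no nulls and one distinct value per row. The function name `candidate_primary_keys` is hypothetical, and the DataFrame is rebuilt by hand from the four rows displayed above.

```python
import pandas as pd

# Rebuild the four rows displayed by distributors.head()
distributors = pd.DataFrame({
    "distributor_id": ["D001", "D002", "D003", "D004"],
    "distributor_name": ["AmeriBev Dist.", "NorthSpirits Co.",
                         "MapleWine Dist.", "Pacific Pour Inc."],
    "country": ["USA", "USA", "Canada", "USA"],
})


def candidate_primary_keys(df: pd.DataFrame) -> list[str]:
    """Hypothetical helper: columns with no nulls and a distinct value on every row."""
    return [
        col for col in df.columns
        if df[col].notna().all() and df[col].nunique() == len(df)
    ]


candidates = candidate_primary_keys(distributors)
print(candidates)
```

On this data the sketch flags `distributor_id` and `distributor_name`, while `country` is excluded because it repeats across rows.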

### Volumetry

In [7]:
# Use find_volumetry from the data_exploration module to get the number of rows
print(f"There are {find_volumetry(distributors)} rows in the dataset.")

There are 4 rows in the dataset.
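
The same figures can be cross-checked with plain pandas. The sketch below is not the `find_volumetry` helper itself; it only assumes that volumetry means the row and column counts of the DataFrame (index excluded), and it rebuilds the table from the rows displayed earlier.

```python
import pandas as pd

# Rebuild the four rows displayed by distributors.head()
distributors = pd.DataFrame({
    "distributor_id": ["D001", "D002", "D003", "D004"],
    "distributor_name": ["AmeriBev Dist.", "NorthSpirits Co.",
                         "MapleWine Dist.", "Pacific Pour Inc."],
    "country": ["USA", "USA", "Canada", "USA"],
})

# DataFrame.shape returns (number of rows, number of columns)
n_rows, n_cols = distributors.shape
print(f"{n_rows} rows - {n_cols} columns")
```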


### Number of unique values per column

In [10]:
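# ProfileReport is assumed to come from the ydata-profiling package (formerly pandas-profiling)
from ydata_profiling import ProfileReport

# Generate an automated profiling report (unique values, nulls, types per column)
profile_distrib = ProfileReport(distributors, title="Profiling Report", explorative=True)
profile_distrib.to_notebook_iframe()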

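
For a table this small, the number of unique values per column can also be read directly with pandas, without rendering the full profiling report. The sketch below uses `DataFrame.nunique` on a hand-built copy of the four rows displayed earlier; it is an alternative check, not the notebook's `find_number_of_unique_values_per_column` helper, whose implementation is not shown here.

```python
import pandas as pd

# Rebuild the four rows displayed by distributors.head()
distributors = pd.DataFrame({
    "distributor_id": ["D001", "D002", "D003", "D004"],
    "distributor_name": ["AmeriBev Dist.", "NorthSpirits Co.",
                         "MapleWine Dist.", "Pacific Pour Inc."],
    "country": ["USA", "USA", "Canada", "USA"],
})

# Count distinct non-null values in each column
unique_counts = distributors.nunique()
print(unique_counts.to_dict())
```

Here `distributor_id` and `distributor_name` each have 4 distinct values and `country` has 2 (USA and Canada).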

## Transformations to be done

None.