# Promotions Table

| Name | Type | Granularity | Primary Key | Volumetry | Business Definition |
|------|------|-------------|-------------|-----------|-------------------|
| promotions | Dimension | Individual promotion | promo_id | 60 rows - 8 columns | Promotions on champagne products offered by the brand and run through retailers |

## Imports

In [None]:
# Data manipulation
import pandas as pd
import numpy as np

# Path handling, to make the local packages importable
import sys
from pathlib import Path

# Find the closest parent folder containing 'packages' and add it to the import path
root = next(p for p in Path.cwd().resolve().parents if (p / "packages").exists())
sys.path.insert(0, str(root))

# Functions for data exploration
from packages.utils.data_exploration import (
    find_primary_key,
    find_volumetry,
    find_number_of_unique_values_per_column,
    share_of_null,
    share_of_rows_with_null,
    share_of_columns_with_null,
    share_of_null_within_columns,
)

In [2]:
# Import the promotions dataset
promotions = pd.read_csv('../../../data/dimensions/promotions.csv')

In [3]:
# Print the first rows
promotions.head()

| | promo_id | distributor_id | start_date | end_date | discount_type | discount_value | products | description |
|---|----------|----------------|------------|----------|---------------|----------------|----------|-------------|
| 0 | P1000 | D003 | 07/30/2023 | 2023-08-10 | fixed | 3.94 | CH-06-1500,CH-05-375,CH-07-750 | Local event tie-in |
| 1 | P1001 | D004 | 03/17/2024 | 2024-03-29 | fixed | 3.1 | CH-03-750 | Flash sale |
| 2 | P1002 | D004 | 2024-08-11 | 2024-08-23 | percent | 23.71 | CH-04-375,CH-04-750 | Holiday bundle |
| 3 | P1003 | D002 | 2024-06-02 | 2024-06-09 | fixed | 2.75 | CH-03-375,CH-04-375,CH-07-1500 | Restaurant pairing |
| 4 | P1004 | D004 | 2024-02-25 | 2024-03-11 | percent | 16.06 | CH-03-1500,CH-07-375,CH-02-375 | Local event tie-in |
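In the first rows, start_date mixes two formats (07/30/2023 and 2024-08-11), so the column cannot be parsed with a single format string. The sketch below shows one possible way to normalize it, assuming the slash-separated values are month/day/year (consistent with 07/30 and 03/17, where the second field cannot be a month). The `raw` series is illustrative sample data, not the full column.

```python
import pandas as pd

# Illustrative sample of the two formats seen in start_date
raw = pd.Series(["07/30/2023", "03/17/2024", "2024-08-11", "2024-06-02"])

# Parse with each known format; values that do not match become NaT
us_style = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")
iso_style = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Keep the US-style result where it parsed, otherwise fall back to ISO
parsed = us_style.fillna(iso_style)
print(parsed)
```

Any value left as NaT after both passes would point to a third, unexpected format worth inspecting by hand.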


## Analyze the table

### Find the primary key

In [4]:
# Use find_primary_key from the data_exploration module to list the candidate primary keys
find_primary_key(promotions)

['promo_id', 'start_date']

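Both promo_id and start_date hold 60 distinct values across the 60 rows, which is presumably why the function returns both. promo_id is retained as the primary key because it is the identifier of the promotion, whereas the uniqueness of start_date is likely incidental. The granularity of the table is therefore the individual promotion, identified by its id.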
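The source of `find_primary_key` is not shown here. As a rough illustration of why a date column can appear next to the identifier, the hypothetical helper `candidate_keys` below flags every column that has no nulls and no duplicates. It is a sketch under that assumption, not the package's actual implementation, and the `toy` data is made up.

```python
import pandas as pd

def candidate_keys(df: pd.DataFrame) -> list:
    """Hypothetical helper: columns with no nulls and no duplicate values."""
    return [
        col for col in df.columns
        if df[col].notna().all() and df[col].is_unique
    ]

# Toy data: promo_id and start_date both happen to be unique
toy = pd.DataFrame({
    "promo_id": ["P1000", "P1001", "P1002"],
    "start_date": ["07/30/2023", "03/17/2024", "2024-08-11"],
    "distributor_id": ["D003", "D004", "D004"],
})

keys = candidate_keys(toy)
print(keys)
```

A uniqueness check like this only finds candidates. Choosing among them still takes business knowledge, since two promotions could start on the same day in a larger dataset.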

### Volumetry

In [5]:
# Use find_volumetry from the data_exploration module to obtain the number of rows
print(f"There are {find_volumetry(promotions)} rows in the dataset.")

There are 60 rows in the dataset.


### Number of unique values per column

In [6]:
print(find_number_of_unique_values_per_column(promotions))

   promo_id  distributor_id  start_date  end_date  discount_type  \
0        60               4          60        50              2   

   discount_value  products  description  
0              58        54            5  


### Number of null values per column

In [7]:
print(share_of_null_within_columns(promotions))

   promo_id  distributor_id  start_date  end_date  discount_type  \
0         0               0           0         9              0   

   discount_value  products  description  
0               0         0            0  
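Only end_date has missing values: 9 of the 60 promotions, or 15%, have no end date. For readers without access to the helper package, the same kind of per-column count can be approximated with plain pandas. The snippet below is a sketch on made-up data and does not claim to reproduce `share_of_null_within_columns` exactly.

```python
import pandas as pd

# Made-up sample with missing end dates
toy = pd.DataFrame({
    "promo_id": ["P1", "P2", "P3", "P4"],
    "end_date": ["2024-03-29", None, "2024-06-09", None],
})

# Number of null values per column
null_counts = toy.isna().sum()

# Same information expressed as a percentage of rows
null_share = toy.isna().mean().mul(100).round(1)

print(null_counts)
print(null_share)
```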


### Percentage of null cells

In [8]:
# Use share_of_null from the data_exploration module to obtain the share of null cells
print(f"The share of null cells within the dataframe is {share_of_null(promotions)}%.")

The share of null cells within the dataframe is 0.02%.
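The printed figure is worth a sanity check. With 9 null cells (all in end_date) out of 60 rows × 8 columns = 480 cells, the share is about 1.9%. The value 0.02 therefore looks like a proportion rounded to two decimals rather than a percentage, which would make the trailing "%" in the message misleading. This assumes the per-column figures above are counts, as the section heading states. The arithmetic can be reproduced as follows:

```python
# Figures taken from the outputs above
n_rows, n_cols = 60, 8
null_cells = 9  # all in end_date

share = null_cells / (n_rows * n_cols)

print(round(share, 2))   # proportion rounded to two decimals
print(f"{share:.1%}")    # same value formatted as a percentage
```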
