The create_primary_keys function builds the primary keys for the weekly transactional data: one row for every combination of outlet_code, item_department, and week derived from the trans_weekly DataFrame.

It first groups the data by outlet and department to find the minimum and maximum week of each combination, then uses the expand_by_week function to expand each range into one row per week between those bounds. The expanded dates are sorted and passed through the previous_day function so that each one maps to the Monday of its week. Finally, the function takes the distinct outlets, departments, and Mondays and forms every possible combination of them (a Cartesian product), so a store-department pair also receives keys for weeks in which it recorded no sales.

The result is a DataFrame in which each row uniquely identifies one store, department, and week. It can be used for tracking and analyzing sales data across different time periods and store-department groupings.
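
The helpers expand_by_week and previous_day are loaded from run_script.ipynb and their source is not shown here. The sketch below is only an assumption about how they might behave, inferred from how they are called: previous_day is taken to return the latest given weekday on or before a date, and expand_by_week is taken to emit one row per 7-day step between two date columns. The sample row is illustrative, not real data.

```python
import pandas as pd

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def previous_day(date, weekday):
    """Assumed behavior: latest `weekday` on or before `date`."""
    date = pd.Timestamp(date)
    offset = (date.weekday() - WEEKDAYS.index(weekday.lower())) % 7
    return date - pd.Timedelta(days=offset)


def expand_by_week(df, start_col, end_col, out_col):
    """Assumed behavior: one row per 7-day step from start_col to end_col, inclusive."""
    rows = []
    for _, row in df.iterrows():
        for day in pd.date_range(row[start_col], row[end_col], freq="7D"):
            # Repeat the original row once for each week in its range
            rows.append({**row.to_dict(), out_col: day})
    return pd.DataFrame(rows)


# Illustrative range for a single outlet-department pair
ranges = pd.DataFrame({
    "outlet_code": ["A"],
    "item_department": ["Beverages"],
    "min_week": pd.to_datetime(["2022-01-17"]),
    "max_week": pd.to_datetime(["2022-01-31"]),
})

weeks = expand_by_week(ranges, "min_week", "max_week", "DATE")
weeks["week"] = weeks["DATE"].apply(lambda d: previous_day(d, "monday"))
print(weeks[["outlet_code", "item_department", "week"]])
```

If the real previous_day returns the strictly preceding Monday instead, dates that already fall on a Monday would shift back by one week, so the assumption is worth checking against the actual helper.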

In [1]:
%run ./run_script.ipynb

conf = get_conf()

# Load the data sources once and pick out the tables used below
datasources = get_datasources(conf)
trans = datasources["trans_info"]
item = datasources["item_info"]
stores = datasources["outlets_info"]

# Pre-process each table
trans = pre_process_transaction_info(trans)
item = pre_process_item_info(item)
store = pre_process_stores_info(stores)

# Weekly sales, the input to create_primary_keys
trans_weekly = get_weekly_sales(item, trans)

In [2]:
import pandas as pd  # harmless if run_script.ipynb has already imported it


def create_primary_keys(trans_weekly):
    """
    Generate primary keys for the weekly transactions.

    Args:
        trans_weekly: Pandas DataFrame
            Weekly transactions with outlet_code, item_department and week columns

    Returns:
        primary_keys: Pandas DataFrame
            One row per (outlet_code, item_department, week) combination
    """
    # Distinct outlets and departments present in the data
    distinct_stores = trans_weekly["outlet_code"].unique()
    distinct_department = trans_weekly["item_department"].unique()

    # First and last week observed for each outlet-department pair
    week_ranges = trans_weekly.groupby(["outlet_code", "item_department"]).agg(
        max_week=("week", "max"),
        min_week=("week", "min"),
    ).reset_index()

    # One row per week between min_week and max_week, mapped to the Monday of that week
    all_weeks = expand_by_week(week_ranges, "min_week", "max_week", "DATE")
    all_weeks = all_weeks.sort_values(by=["outlet_code", "item_department", "DATE"])
    all_weeks["week"] = all_weeks["DATE"].apply(lambda x: previous_day(x, "monday"))
    distinct_all_weeks = all_weeks["week"].sort_values().unique()

    # Every combination of outlet, department and week
    primary_keys = pd.DataFrame(
        [(outlet, department, week) for outlet in distinct_stores
         for department in distinct_department
         for week in distinct_all_weeks],
        columns=["outlet_code", "item_department", "week"],
    ).sort_values("week")

    return primary_keys
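
The nested comprehension at the end of create_primary_keys is a Cartesian product of three lists. As a sketch only, the same grid could be built with pd.MultiIndex.from_product. The helper name build_key_grid and the sample values are hypothetical and are not part of the notebook.

```python
import pandas as pd


def build_key_grid(stores, departments, weeks):
    """Hypothetical helper: every (outlet_code, item_department, week) combination."""
    index = pd.MultiIndex.from_product(
        [stores, departments, weeks],
        names=["outlet_code", "item_department", "week"],
    )
    # to_frame(index=False) turns the index levels into ordinary columns
    return index.to_frame(index=False).sort_values("week", kind="stable")


# Illustrative inputs, not taken from the real data
grid = build_key_grid(
    ["A", "B"],
    ["Beverages", "Chilled"],
    pd.to_datetime(["2022-01-17", "2022-01-24"]),
)
print(len(grid))  # 2 * 2 * 2 = 8 rows
```

The stable sort keeps outlets and departments in their input order within each week. The sort in create_primary_keys uses the pandas default algorithm, which does not guarantee that order.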

In [3]:
primary_keys = create_primary_keys(trans_weekly)
primary_keys 

index,outlet_code,item_department,week
0,A,Beverages,2022-01-17
160,B,Chilled,2022-01-17
120,B,Beverages,2022-01-17
480,E,Beverages,2022-01-17
400,D,Chilled,2022-01-17
...,...,...,...
79,A,Chilled,2022-10-17
519,E,Beverages,2022-10-17
39,A,Beverages,2022-10-17
399,D,Beverages,2022-10-17
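
Since the keys are meant to identify rows uniquely, a quick check along the following lines can be run on the result. This is a minimal sketch: the helper name check_primary_keys is hypothetical, and the small sample frame is illustrative rather than real output.

```python
import pandas as pd

KEY_COLUMNS = ["outlet_code", "item_department", "week"]


def check_primary_keys(primary_keys):
    """Hypothetical check: keys are unique and form a complete grid."""
    is_unique = not primary_keys.duplicated(subset=KEY_COLUMNS).any()
    # A complete grid has (#outlets * #departments * #weeks) rows
    expected_rows = 1
    for column in KEY_COLUMNS:
        expected_rows *= primary_keys[column].nunique()
    return is_unique and len(primary_keys) == expected_rows


# Illustrative frame: 2 outlets x 1 department x 2 weeks
sample = pd.DataFrame({
    "outlet_code": ["A", "A", "B", "B"],
    "item_department": ["Beverages"] * 4,
    "week": pd.to_datetime(["2022-01-17", "2022-01-24"] * 2),
})
print(check_primary_keys(sample))
```

On the real data the call would be check_primary_keys(primary_keys). A False result would point to duplicated keys or to a combination missing from the grid.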
