# üéØ Happy Hour Dynamic Pricing Engine (V3)

This notebook implements an automated pricing optimization system for the Happy Hour promotion program. It identifies products with declining sales performance and calculates optimal discount prices to boost sales.

## Workflow Overview

1. **Environment Setup** - Install dependencies and configure connections
2. **Data Collection** - Fetch product, pricing, and sales data from Snowflake
3. **Product Selection** - Identify underperforming products eligible for discounts
4. **Price Optimization** - Calculate optimal discount prices using multiple data sources
5. **Retailer Targeting** - Select retailers most likely to respond to discounts
6. **Output Generation** - Create discount sheets for upload to the pricing system

---


## 1. Environment Setup

### 1.1 Install Required Packages
Install all necessary Python packages for database connectivity, data manipulation, and analysis.


In [1]:
%%capture

# Upgrade pip
!pip install --upgrade pip
# Connectivity
!pip install psycopg2-binary  # PostgreSQL adapter
# !pip install snowflake-connector-python  # Snowflake connector
!pip install snowflake-connector-python==3.15.0 # Snowflake connector Older Version
!pip install snowflake-sqlalchemy  # Snowflake SQLAlchemy connector
!pip install warnings # Warnings management
# !pip install pyarrow # Serialization
!pip install keyring==23.11.0 # Key management
!pip install sqlalchemy==1.4.46 # SQLAlchemy
!pip install requests # HTTP requests
!pip install boto3 # AWS SDK
# !pip install slackclient # Slack API
!pip install oauth2client # Google Sheets API
!pip install gspread==5.9.0 # Google Sheets API
!pip install gspread_dataframe # Google Sheets API
!pip install google.cloud # Google Cloud
# Data manipulation and analysis
!pip install polars
!pip install pandas==2.2.1
!pip install numpy
# !pip install fastparquet
!pip install openpyxl # Excel file handling
!pip install xlsxwriter # Excel file handling
# Linear programming
!pip install pulp
# Date and time handling
!pip install --upgrade datetime
!pip install python-time
!pip install --upgrade pytz
# Progress bar
!pip install tqdm
# Database data types
!pip install db-dtypes
# Geospatial data handling
# !pip install geopandas
# !pip install shapely
# !pip install fiona
# !pip install haversine
# Plotting

# Modeling
!pip install statsmodels
!pip install scikit-learn

!pip install import-ipynb

### 1.2 Import Libraries and Initialize Environment
Import required libraries and set up connections to Snowflake, Google Sheets, and AWS.


In [2]:
import pandas as pd
import numpy as np
from tqdm import tqdm
from datetime import datetime
import calendar
import json
from datetime import date, timedelta
from oauth2client.service_account import ServiceAccountCredentials
import setup_environment_2
import importlib
import import_ipynb
import warnings
import boto3
import requests
warnings.filterwarnings("ignore")
importlib.reload(setup_environment_2)
setup_environment_2.initialize_env()
import os
import time
import pytz
import gspread
import snowflake.connector

/home/ec2-user/.Renviron
/home/ec2-user/service_account_key.json


### 1.3 Database Helper Functions
Define reusable functions for querying Snowflake database.


In [3]:
def snowflake_query(country, query, warehouse=None, columns=[], conn=None):
    """
    Execute a query against Snowflake and return results as DataFrame.
    
    Args:
        country: Country identifier (e.g., "Egypt")
        query: SQL query string to execute
        warehouse: Snowflake warehouse (optional)
        columns: Custom column names (optional)
        conn: Existing connection (optional)
        
    Returns:
        pandas DataFrame with query results
    """
    con = snowflake.connector.connect(
        user     = os.environ["SNOWFLAKE_USERNAME"],
        account  = os.environ["SNOWFLAKE_ACCOUNT"],
        password = os.environ["SNOWFLAKE_PASSWORD"],
        database = os.environ["SNOWFLAKE_DATABASE"]
    )

    try:
        cur = con.cursor()
        cur.execute("USE WAREHOUSE COMPUTE_WH")
        cur.execute(query)
        
        column_names = [col[0] for col in cur.description]
        results = cur.fetchall()
        
        if not results:
            out = pd.DataFrame(columns=[name.lower() for name in column_names])
        else:
            if len(columns) == 0:
                out = pd.DataFrame(np.array(results), columns=column_names)
                out.columns = out.columns.str.lower()
            else:
                out = pd.DataFrame(np.array(results), columns=columns)
                out.columns = out.columns.str.lower()
        
        return out
        
    except Exception as e:
        print(f"‚ùå Query error: {e}")
        raise
        
    finally:
        cur.close()
        con.close()

In [4]:
def get_warehouse_mapping():
    """Define warehouse to region/cohort mapping."""
    whs_data = [
        ('Cairo', 'Mostorod', 1, 700),
        ('Giza', 'Barageel', 236, 701),
        ('Delta West', 'El-Mahala', 337, 703),
        ('Delta West', 'Tanta', 8, 703),
        ('Delta East', 'Mansoura FC', 339, 704),
        ('Delta East', 'Sharqya', 170, 704),
        ('Upper Egypt', 'Assiut FC', 501, 1124),
        ('Upper Egypt', 'Bani sweif', 401, 1126),
        ('Upper Egypt', 'Menya Samalot', 703, 1123),
        ('Upper Egypt', 'Sohag', 632, 1125),
        ('Alexandria', 'Khorshed Alex', 797, 702),
        ('Giza', 'Sakkarah', 962, 701)
    ]
    
    df_whs = pd.DataFrame(whs_data, columns=['region', 'warehouse', 'warehouse_id', 'cohort_id'])
    return df_whs

# Get warehouse mapping
df_whs = get_warehouse_mapping()
print("Warehouse Mapping:")

Warehouse Mapping:


---

## 2. Data Collection

This section fetches all required data from Snowflake and Google Sheets for the pricing analysis.

### 2.1 Configuration & Reference Data
Get timezone settings and load brand inclusion lists.


In [5]:
# Get Snowflake timezone for consistent date/time handling
query = "SHOW PARAMETERS LIKE 'TIMEZONE'"
timezone_result = snowflake_query("Egypt", query)
zone_to_use = timezone_result['value'].values[0]
print(f"‚úì Using timezone: {zone_to_use}")

‚úì Using timezone: America/Los_Angeles


In [6]:
scope = ["https://spreadsheets.google.com/feeds",
         'https://www.googleapis.com/auth/spreadsheets',
         "https://www.googleapis.com/auth/drive.file",
         "https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_dict(json.loads(setup_environment_2.get_secret("prod/maxab-sheets")), scope)
client = gspread.authorize(creds)
included_brand = client.open('QD_brands').worksheet('Happy Hour push brands')
included_brand_df = pd.DataFrame(included_brand.get_all_records())
for col in included_brand_df.columns:
    included_brand_df[col] = pd.to_numeric(included_brand_df[col], errors='ignore')
try:    
    b_list = [brand  for brand in included_brand_df['brand']]
except:
    b_list= [] 

In [7]:
sku_to_add = client.open('QD_brands').worksheet('HH SKU PUSH')
try:
    sku_to_add_df = pd.DataFrame(sku_to_add.get_all_records())
    sku_to_add_df = sku_to_add_df.merge(df_whs[['warehouse','warehouse_id']],on=['warehouse'])
except: 
    sku_to_add_df = pd.DataFrame(columns=['product_id', 'sku', 'warehouse', 'warehouse_id'])
sku_to_add_df

Unnamed: 0,product_id,sku,warehouse,warehouse_id


### 2.2 Product Performance Data

Fetch comprehensive product sales and stock data to identify underperforming products:
- **Sales metrics**: All-day NMV, until-the-hour NMV, last hour NMV
- **Stock metrics**: Available stock, days on hand (DOH), running rates
- **Growth metrics**: Compare current vs historical performance using weighted distance scoring

In [8]:
command_string = f'''
with last_update as (
    select DATE_PART('hour', max_date) * 60 + DATE_PART('minute', max_date) AS total_minutes
    from (
        select max(created_at) as max_date from sales_orders
    )
),

predicted_rr as (
    select product_id, warehouse_id, rr, date
    from Finance.PREDICTED_RUNNING_RATES
    where date >= CURRENT_DATE
    qualify date = max(date) over(partition by product_id, warehouse_id)
),

days_stocks as (
    select timestamp::date as date, product_id, warehouse_id,
        avg(in_stock) as in_stock_perc,
        avg(case when date_part('hour', timestamp) = date_part('hour', current_timestamp) - 1 then in_stock end) as last_hour_stocks
    from (
        select timestamp, product_id, warehouse_id, case when AVAILABLE_STOCK > 0 then 1 else 0 end as in_stock
        from materialized_views.STOCK_SNAP_SHOTS_RECENT sss
        where sss.timestamp::date >= date_trunc('month', current_date - 90)
            and date_part('hour', sss.timestamp) < date_part('hour', CONVERT_TIMEZONE('{zone_to_use}', 'Africa/Cairo', CURRENT_TIMEstamp()))
            and warehouse_id in (1, 8, 170, 236, 337, 339, 401, 501, 632, 703, 797, 962)
    )
    group by all
),

base as (
    select *, row_number() over(partition by retailer_id order by priority) as rnk
    from (
        select x.*, TAGGABLE_ID as retailer_id
        from (
            select id as cohort_id, name as cohort_name, priority, dynamic_tag_id
            from cohorts
            where is_active = 'true'
                and id in (700, 701, 702, 703, 704, 1123, 1124, 1125, 1126)
        ) x
        join DYNAMIC_TAGgables dt on x.dynamic_tag_id = dt.dynamic_tag_id and dt.dynamic_tag_id <> 3038
    )
    qualify rnk = 1
),

sales_data as (
    SELECT DISTINCT
        so.created_at::date as date,
        pso.warehouse_id as warehouse_id,
        districts.id as district_id,
        districts.name_ar as district_name,
        pso.product_id,
        CONCAT(products.name_ar, ' ', products.size, ' ', product_units.name_ar) as sku,
        brands.name_ar as brand,
        categories.name_ar as cat,
        sum(pso.total_price) as all_day_nmv,
        sum(case when (date_part('hour', so.created_at) * 60 + DATE_PART('minute', so.created_at)) < (select * from last_update) then pso.total_price end) as uth_nmv,
        sum(case when (date_part('hour', so.created_at) * 60 + DATE_PART('minute', so.created_at))
            between (select * from last_update) - 60
            and (select * from last_update)
            then pso.total_price end) as last_hour_nmv

    FROM product_sales_order pso
    JOIN sales_orders so ON so.id = pso.sales_order_id
    JOIN products on products.id = pso.product_id
    JOIN brands on products.brand_id = brands.id
    JOIN categories ON products.category_id = categories.id
    JOIN finance.all_cogs f ON f.product_id = pso.product_id
        AND f.from_date::date <= so.created_at::date
        AND f.to_date::date > so.created_at::date
    JOIN product_units ON product_units.id = products.unit_id
    JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id = so.retailer_id
    JOIN districts on districts.id = materialized_views.retailer_polygon.district_id
    JOIN cities on cities.id = districts.city_id
    join states on states.id = cities.state_id
    join regions on regions.id = states.region_id

    WHERE True
        AND so.created_at::date >= date_trunc('month', current_date - 90)
        AND so.sales_order_status_id not in (7, 12)
        AND so.channel IN ('telesales', 'retailer')
        AND pso.purchased_item_count <> 0
        and products.id <> 7630

    GROUP BY ALL
    order by date desc
),

data as (
    select *, 1 / nullif((0.3 * week_distance + 0.1 * month_distance + 0.6 * day_distance), 0) as distance
    from (
        select *,
            floor((DATE_PART('day', date) - 1) / 7 + 1) AS week_of_month,
            DATE_PART('month', date) as month,
            DATE_PART('DOW', date) AS day_number,
            abs(floor((DATE_PART('day', current_date) - 1) / 7 + 1) - week_of_month) as week_distance,
            abs(DATE_PART('month', current_date) - month) as month_distance,
            abs(DATE_PART('DOW', current_date) - day_number) as day_distance
        from (
            select *,
                max(case when date = CURRENT_DATE then last_hour_stocks end) over(partition by product_id, warehouse_id) as current_stocks
            from (
                select ds.date, ds.product_id, ds.warehouse_id, ds.in_stock_perc, ds.last_hour_stocks,
                    sd.district_id, sd.district_name,
                    sd.all_day_nmv, sd.uth_nmv, sd.last_hour_nmv
                from days_stocks ds
                left join sales_data sd on ds.product_id = sd.product_id
                    and ds.warehouse_id = sd.warehouse_id
                    and ds.date = sd.date
            )
        )
        where current_stocks <> 0
            and (in_stock_perc = 1 or date = CURRENT_DATE)
    )
),

current_state as (
    select product_id, warehouse_id, AVAILABLE_STOCK, activation
    from PRODUCT_WAREHOUSE
    where IS_BASIC_UNIT = 1
        and case when product_id = 1309 then packing_unit_id <> 23 else true end
)

select x.*,
    cs.AVAILABLE_STOCK,
    cs.activation,
    coalesce(prr.rr, 0) as rr,
    case when coalesce(prr.rr, 0) <> 0 then cs.AVAILABLE_STOCK / coalesce(prr.rr, 0) else cs.AVAILABLE_STOCK end as doh,
    cs.AVAILABLE_STOCK * f.wac1 as stock_value
from (
    select product_id, warehouse_id, district_id, district_name,
        coalesce(max(case when state = 'prev' then all_day_nmv end), 0) as prev_all_day,
        coalesce(max(case when state = 'prev' then uth_nmv end), 0) as prev_uth,
        coalesce(max(case when state = 'prev' then last_hour_nmv end), 0) as prev_last_hour,

        coalesce(max(case when state = 'current' then all_day_nmv end), 0) as current_all_day,
        coalesce(max(case when state = 'current' then uth_nmv end), 0) as current_uth,
        coalesce(max(case when state = 'current' then last_hour_nmv end), 0) as current_last_hour

    from (
        select 'current' as state, product_id, warehouse_id, district_id, district_name, all_day_nmv, uth_nmv, last_hour_nmv
        from data
        where date = CURRENT_DATE
        
        union all
        
        (
            select state, product_id, warehouse_id, district_id, district_name,
                sum(all_day_nmv * distance) / sum(distance) as all_day_nmv,
                sum(uth_nmv * distance) / sum(distance) as uth_nmv,
                sum(last_hour_nmv * distance) / sum(distance) as last_hour_nmv
            from (
                select 'prev' as state, product_id, warehouse_id, district_id, district_name, all_day_nmv, uth_nmv, last_hour_nmv, distance
                from data
                where date <> CURRENT_DATE
            )
            group by all
        )
    )
    group by all
) x
join current_state cs on x.product_id = cs.product_id and x.warehouse_id = cs.warehouse_id
left join predicted_rr prr on x.product_id = prr.product_id and x.warehouse_id = prr.warehouse_id
join products p on p.id = x.product_id
join finance.all_cogs f on f.product_id = x.product_id and CONVERT_TIMEZONE('{zone_to_use}', 'Africa/Cairo', CURRENT_TIMEstamp()) between f.from_date and f.to_date
where doh > 1
    and p.activation = 'true'
    and cs.activation = 'true'
    and cs.AVAILABLE_STOCK * f.wac1 >= 1000
    and prev_uth > 0
'''

product_data = snowflake_query("Egypt", command_string)
for col in product_data.columns:
    product_data[col] = pd.to_numeric(product_data[col], errors='ignore')

### 2.3 Price Reference Data

Fetch pricing data from multiple sources for comparison:
- **UTH Contribution**: Historical until-the-hour sales contribution by district
- **Current Prices**: Live and local cohort prices per warehouse
- **Marketplace Prices**: External market price benchmarks
- **Competitor Prices**: Ben Soliman and scraped competitor prices
- **Historical Stats**: Optimal margin boundaries and targets


In [9]:
query = f'''
with last_update as (
    select DATE_PART('hour', max_date) * 60 + DATE_PART('minute', max_date) AS total_minutes
    from (
        select max(created_at) as max_date from sales_orders
    )
),

base as (
    select *, row_number() over(partition by retailer_id order by priority) as rnk
    from (
        select x.*, TAGGABLE_ID as retailer_id
        from (
            select id as cohort_id, name as cohort_name, priority, dynamic_tag_id
            from cohorts
            where is_active = 'true'
                and id in (700, 701, 702, 703, 704, 1123, 1124, 1125, 1126)
        ) x
        join DYNAMIC_TAGgables dt on x.dynamic_tag_id = dt.dynamic_tag_id and dt.dynamic_tag_id <> 3038
    )
    qualify rnk = 1
),

sales as (
    SELECT
        so.created_at::date as date,
        pso.warehouse_id as warehouse_id,
        districts.id as district_id,
        districts.name_ar as district_name,
        sum(pso.total_price) as all_day_nmv,
        sum(case when (date_part('hour', so.created_at) * 60 + DATE_PART('minute', so.created_at)) < (select * from last_update) then pso.total_price end) as uth_nmv,
        sum(case when (date_part('hour', so.created_at) * 60 + DATE_PART('minute', so.created_at))
            between (select * from last_update) - 60
            and (select * from last_update)
            then pso.total_price end) as last_hour_nmv
    FROM product_sales_order pso
    JOIN sales_orders so ON so.id = pso.sales_order_id
    JOIN products on products.id = pso.product_id
    JOIN brands on products.brand_id = brands.id
    JOIN categories ON products.category_id = categories.id
    JOIN finance.all_cogs f ON f.product_id = pso.product_id
        AND f.from_date::date <= so.created_at::date
        AND f.to_date::date > so.created_at::date
    JOIN product_units ON product_units.id = products.unit_id
    JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id = so.retailer_id
    JOIN districts on districts.id = materialized_views.retailer_polygon.district_id
    JOIN cities on cities.id = districts.city_id
    join states on states.id = cities.state_id
    join regions on regions.id = states.region_id
    WHERE True
        AND so.created_at::date between date_trunc('month', current_date - 60) and current_date - 1
        AND so.sales_order_status_id not in (7, 12)
        AND so.channel IN ('telesales', 'retailer')
        AND pso.purchased_item_count <> 0
    GROUP BY ALL
    order by date desc
)

select warehouse_id, district_id, district_name, sum(uth_cntrb * distance) / sum(distance) as uth_cntrb
from (
    select *, 1 / nullif((0.3 * week_distance + 0.1 * month_distance + 0.6 * day_distance), 0) as distance
    from (
        select *,
            uth_nmv / all_day_nmv as uth_cntrb,
            floor((DATE_PART('day', date) - 1) / 7 + 1) AS week_of_month,
            DATE_PART('month', date) as month,
            DATE_PART('DOW', date) AS day_number,
            abs(floor((DATE_PART('day', current_date) - 1) / 7 + 1) - week_of_month) as week_distance,
            abs(DATE_PART('month', current_date) - month) as month_distance,
            abs(DATE_PART('DOW', current_date) - day_number) as day_distance
        from sales
    )
)
group by all
'''

uth_cntrb = snowflake_query("Egypt", query)
for col in uth_cntrb.columns:
    uth_cntrb[col] = pd.to_numeric(uth_cntrb[col], errors='ignore')
uth_cntrb['uth_cntrb'] = uth_cntrb.groupby('warehouse_id')['uth_cntrb'].transform(
    lambda x: x.fillna(x.mean())
)    

In [10]:
query = f'''
WITH whs as (SELECT *
             FROM   (values
                            ('Cairo', 'El-Marg', 38,700),
                            ('Cairo', 'Mostorod', 1,700),
                            ('Giza', 'Barageel', 236,701),
                            ('Delta West', 'El-Mahala', 337,703),
                            ('Delta West', 'Tanta', 8,703),
                            ('Delta East', 'Mansoura FC', 339,704),
                            ('Delta East', 'Sharqya', 170,704),
                            ('Upper Egypt', 'Assiut FC', 501,1124),
                            ('Upper Egypt', 'Bani sweif', 401,1126),
                            ('Upper Egypt', 'Menya Samalot', 703,1123),
                            ('Upper Egypt', 'Sohag', 632,1125),
                            ('Alexandria', 'Khorshed Alex', 797,702),
							('Giza', 'Sakkarah', 962,701)
							
							)
                    x(region, wh, warehouse_id,cohort_id)),


local_prices as (
SELECT  case when cpu.cohort_id in (700,695) then 'Cairo'
             when cpu.cohort_id in (701) then 'Giza'
             when cpu.cohort_id in (704,698) then 'Delta East'
             when cpu.cohort_id in (703,697) then 'Delta West'
             when cpu.cohort_id in (696,1123,1124,1125,1126) then 'Upper Egypt'
             when cpu.cohort_id in (702,699) then 'Alexandria'
        end as region,
		cohort_id,
        pu.product_id,
		pu.packing_unit_id as packing_unit_id,
		pu.basic_unit_count,
        avg(cpu.price) as price
FROM    cohort_product_packing_units cpu
join    PACKING_UNIT_PRODUCTS pu on pu.id = cpu.product_packing_unit_id
WHERE   cpu.cohort_id in (700,701,702,703,704,696,695,698,697,699,1123,1124,1125,1126)
    and cpu.created_at::date<>'2023-07-31'
    and cpu.is_customized = true
	group by all 
),
live_prices as (
select region,cohort_id,product_id,pu_id as packing_unit_id,buc as basic_unit_count,NEW_PRICE as price
from materialized_views.DBDP_PRICES
where created_at = current_date
and DATE_PART('hour',CONVERT_TIMEZONE('{zone_to_use}', 'Africa/Cairo', CURRENT_TIMEstamp())) BETWEEN SPLIT_PART(time_slot, '-', 1)::int AND (SPLIT_PART(time_slot, '-', 1)::int)+1
and cohort_id in (700,701,702,703,704,696,695,698,697,699,1123,1124,1125,1126)
),
prices as (
select *
from (
    SELECT *, 1 AS priority FROM live_prices
    UNION ALL
    SELECT *, 2 AS priority FROM local_prices
)
QUALIFY ROW_NUMBER() OVER (PARTITION BY region,cohort_id,product_id,packing_unit_id ORDER BY priority) = 1
)
select warehouse_id,product_id,price 
from prices 
join whs on prices.cohort_id = whs.cohort_id
and basic_unit_count = 1 
and case when product_id = 1309 then packing_unit_id <> 23 else true end

'''
product_warehouse_price = snowflake_query("Egypt", query)
for col in product_warehouse_price.columns:
    product_warehouse_price[col] = pd.to_numeric(product_warehouse_price[col], errors='ignore')    

In [11]:
query = f'''
WITH whs as (SELECT *
             FROM   (values
                            ('Cairo', 'El-Marg', 38,700),
                            ('Cairo', 'Mostorod', 1,700),
                            ('Giza', 'Barageel', 236,701),
                            ('Delta West', 'El-Mahala', 337,703),
                            ('Delta West', 'Tanta', 8,703),
                            ('Delta East', 'Mansoura FC', 339,704),
                            ('Delta East', 'Sharqya', 170,704),
                            ('Upper Egypt', 'Assiut FC', 501,1124),
                            ('Upper Egypt', 'Bani sweif', 401,1126),
                            ('Upper Egypt', 'Menya Samalot', 703,1123),
                            ('Upper Egypt', 'Sohag', 632,1125),
                            ('Alexandria', 'Khorshed Alex', 797,702),
							('Giza', 'Sakkarah', 962,701)
							
							)
                    x(region, wh, warehouse_id,cohort_id)),
full_data as (
select products.id as product_id, region,warehouse_id
from products , whs 
where activation = 'true'
),				

MP as (
select region,product_id,
min(min_price) as min_price,
min(max_price) as max_price,
min(mod_price) as mod_price,
min(true_min) as true_min,
min(true_max) as true_max

from (
select mp.region,mp.product_id,mp.pu_id,
min_price/BASIC_UNIT_COUNT as min_price,
max_price/BASIC_UNIT_COUNT as max_price,
mod_price/BASIC_UNIT_COUNT as mod_price,
TRUE_MIN_PRICE/BASIC_UNIT_COUNT as true_min,
TRUE_MAX_PRICE/BASIC_UNIT_COUNT as true_max
from materialized_views.marketplace_prices mp 
join packing_unit_products pup on pup.product_id = mp.product_id and pup.packing_unit_id = mp.pu_id
join finance.all_cogs f on f.product_id = mp.product_id and CURRENT_TIMESTAMP between f.from_date and f.to_date
where  least(min_price,mod_price) between wac_p*0.9 and wac_p*1.3 
)
group by all 
),
region_mapping AS (
    SELECT * 
	FROM 
	(	VALUES
        ('Delta East', 'Delta West'),
        ('Delta West', 'Delta East'),
        ('Alexandria', 'Cairo'),
        ('Alexandria', 'Giza'),
        ('Upper Egypt', 'Cairo'),
        ('Upper Egypt', 'Giza'),
		('Cairo','Giza'),
		('Giza','Cairo'),
		('Delta West', 'Cairo'),
		('Delta East', 'Cairo'),
		('Delta West', 'Giza'),
		('Delta East', 'Giza')
		)
    AS region_mapping(region, fallback_region)
)


select region,warehouse_id,product_id,
min(final_min_price) as final_min_price,
min(final_max_price) as final_max_price,
min(final_mod_price) as final_mod_price,
min(final_true_min) as final_true_min,
min(final_true_max) as final_true_max

from (
SELECT
distinct 
	w.region,
    w.warehouse_id,
	w.product_id,
    COALESCE(m1.min_price, m2.min_price) AS final_min_price,
    COALESCE(m1.max_price, m2.max_price) AS final_max_price,
    COALESCE(m1.mod_price, m2.mod_price) AS final_mod_price,
	COALESCE(m1.true_min, m2.true_min) AS final_true_min,
	COALESCE(m1.true_max, m2.true_max) AS final_true_max,
FROM full_data w
LEFT JOIN MP m1
    ON w.region = m1.region and w.product_id = m1.product_id
JOIN region_mapping rm
    ON w.region = rm.region
LEFT JOIN MP m2
    ON rm.fallback_region = m2.region
   AND w.product_id = m2.product_id
)
where final_min_price is not null 
group by all 
'''
marketplace = snowflake_query("Egypt", query)
for col in marketplace.columns:
    marketplace[col] = pd.to_numeric(marketplace[col], errors='ignore')    

In [12]:
query = f'''
with lower as (
select distinct product_id,sku,new_d*bs_price as ben_soliman_price,INJECTION_DATE
from (
select maxab_product_id as product_id,maxab_sku as sku,INJECTION_DATE,wac1,wac_p,(bs_price/bs_unit_count) as bs_price,diff,cu_price,case when p1 > 1 then child_quantity else 0 end as scheck,round(p1/2)*2 as p1,p2,case when (ROUND(p1 / scheck) * scheck) = 0 then p1 else (ROUND(p1 / scheck) * scheck) end as new_d
from (
select sm.*,wac1, wac_p, abs((bs_price/bs_unit_count)-(wac_p*maxab_basic_unit_count))/(wac_p*maxab_basic_unit_count) as diff,cpc.price as cu_price,pup.child_quantity , round((cu_price/(bs_price/bs_unit_count))) as p1, round(((bs_price/bs_unit_count)/cu_price)) as p2
from materialized_views.savvy_mapping sm 
join finance.all_cogs f on f.product_id = sm.maxab_product_id and CONVERT_TIMEZONE('{zone_to_use}', 'Africa/Cairo', CURRENT_TIMESTAMP()) between f.from_Date and f.to_date
join   PACKING_UNIT_PRODUCTS pu on pu.product_id = sm.maxab_product_id and pu.IS_BASIC_UNIT = 1 
join cohort_product_packing_units cpc on cpc.PRODUCT_PACKING_UNIT_ID = pu.id and cohort_id = 700 
join packing_unit_products pup on pup.product_id = sm.maxab_product_id and pup.is_basic_unit = 1  
where bs_price is not null and INJECTION_DATE::date >= CONVERT_TIMEZONE('{zone_to_use}', 'Africa/Cairo', CURRENT_TIMESTAMP())::date - 5 
and diff > 0.3
and p1 > 1
)
)
qualify max(INJECTION_DATE)over(partition by product_id)  = INJECTION_DATE
),
m_bs as (
select z.* from (
	select maxab_product_id as product_id, maxab_sku as sku, avg(bs_final_price) as ben_soliman_price,INJECTION_DATE
	from (
		select *, row_number() over(partition by maxab_product_id order by diff) as rnk_2 from (
			select *, (bs_final_price-wac_p)/wac_p as diff_2 from (
				select *, bs_price/maxab_basic_unit_count as bs_final_price from (
					select *, row_number() over(partition by maxab_product_id, maxab_pu order by diff) as rnk from (
						select * ,max(INJECTION_DATE::date) over(partition by maxab_product_id, maxab_pu) as max_date,
						from (
							select sm.*,wac1, wac_p, abs(bs_price-(wac_p*maxab_basic_unit_count))/(wac_p*maxab_basic_unit_count) as diff 
					from materialized_views.savvy_mapping sm 
					join finance.all_cogs f on f.product_id = sm.maxab_product_id and CONVERT_TIMEZONE('{zone_to_use}', 'Africa/Cairo', CURRENT_TIMESTAMP()) between f.from_Date and f.to_date
					where bs_price is not null and INJECTION_DATE::date >= CONVERT_TIMEZONE('{zone_to_use}', 'Africa/Cairo', CURRENT_TIMESTAMP())::date - 5 
					and diff < 0.3
					)
					qualify max_date = INJECTION_DATE
					) qualify rnk = 1 
				)
			) where diff_2 between -0.5 and 0.5 
		) qualify rnk_2 = 1 
	) group by all
) z 
join finance.all_cogs f on f.product_id = z.product_id and CONVERT_TIMEZONE('{zone_to_use}', 'Africa/Cairo', CURRENT_TIMESTAMP()) between f.from_Date and f.to_date
where ben_soliman_price between f.wac_p*0.7 and f.wac_p*1.3
)
select product_id,sku,avg(ben_soliman_price) as ben_soliman_price
from (
select *
from (
select * 
from m_bs 

union all

 select *
 from lower
 )
 qualify max(INJECTION_DATE) over(partition by product_id) = INJECTION_DATE
 )
 group by all
'''


print("Fetching Ben Soliman (competitor) prices...")
bensoliman = snowflake_query("Egypt", query)
bensoliman.columns = bensoliman.columns.str.lower()

for col in bensoliman.columns:
    bensoliman[col] = pd.to_numeric(bensoliman[col], errors='ignore')

print(f"‚úì Retrieved competitor prices for {len(bensoliman)} products")

Fetching Ben Soliman (competitor) prices...
‚úì Retrieved competitor prices for 1570 products


In [13]:
query = f'''
WITH whs as (SELECT *
             FROM   (values
                            ('Cairo', 'El-Marg', 38,700),
                            ('Cairo', 'Mostorod', 1,700),
                            ('Giza', 'Barageel', 236,701),
                            ('Delta West', 'El-Mahala', 337,703),
                            ('Delta West', 'Tanta', 8,703),
                            ('Delta East', 'Mansoura FC', 339,704),
                            ('Delta East', 'Sharqya', 170,704),
                            ('Upper Egypt', 'Assiut FC', 501,1124),
                            ('Upper Egypt', 'Bani sweif', 401,1126),
                            ('Upper Egypt', 'Menya Samalot', 703,1123),
                            ('Upper Egypt', 'Sohag', 632,1125),
                            ('Alexandria', 'Khorshed Alex', 797,702),
							('Giza', 'Sakkarah', 962,701)
							
							)
                    x(region, wh, warehouse_id,cohort_id))
select product_id,x.region,warehouse_id,min(MARKET_PRICE) as min_scrapped,max(MARKET_PRICE) as max_scrapped,median(MARKET_PRICE) as median_scrapped
from (
select MATERIALIZED_VIEWS.CLEANED_MARKET_PRICES.*,max(date)over(partition by region,MATERIALIZED_VIEWS.CLEANED_MARKET_PRICES.product_id,competitor) as max_date
from MATERIALIZED_VIEWS.CLEANED_MARKET_PRICES
join finance.all_cogs f on f.product_id = MATERIALIZED_VIEWS.CLEANED_MARKET_PRICES.product_id and CURRENT_TIMESTAMP between f.from_date and f.to_date 
where date>= current_date -5
and MARKET_PRICE between f.wac_p * 0.9 and wac_p*1.3
qualify date = max_date 
) x 
left join whs on whs.region = x.region
group by all 
'''
try:
    scrapped_prices = snowflake_query("Egypt", query)
    scrapped_prices.columns = scrapped_prices.columns.str.lower() 
    for col in scrapped_prices.columns:
        scrapped_prices[col] = pd.to_numeric(scrapped_prices[col], errors='ignore')
except: 
    scrapped_prices = pd.DataFrame(columns = ['product_id','region','warehouse_id','min_scrapped','max_scrapped','median_scrapped'])    

In [14]:
query = f'''
select region,product_id,optimal_bm,MIN_BOUNDARY,MAX_BOUNDARY,MEDIAN_BM
from (
select region,product_id,target_bm,optimal_bm,MIN_BOUNDARY,MAX_BOUNDARY,MEDIAN_BM,max(created_at) over(partition by product_id,region) as max_date,created_at
from materialized_views.PRODUCT_STATISTICS
where created_at::date >= date_trunc('month',current_date - 60)
qualify max_date = created_at
)

'''
 
stats = snowflake_query("Egypt", query)
stats.columns = stats.columns.str.lower() 
for col in stats.columns:
    stats[col] = pd.to_numeric(stats[col], errors='ignore')

In [15]:
query = f'''
select warehouse_id,warehouse_name, region
from (
    select *, row_number() over(partition by warehouse_id order by nmv desc) as rnk
    from (
        SELECT case when regions.id = 2 then cities.name_en else regions.name_en end as region,
            pso.warehouse_id,
            w.name as warehouse_name,
            sum(pso.total_price) as nmv
        FROM product_sales_order pso
        JOIN sales_orders so ON so.id = pso.sales_order_id
        JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id = so.retailer_id
        JOIN districts on districts.id = materialized_views.retailer_polygon.district_id
        JOIN cities on cities.id = districts.city_id
        join states on states.id = cities.state_id
        join regions on regions.id = states.region_id
        join warehouses w on w.id = pso.warehouse_id
        WHERE True
            AND so.created_at::date between current_date - 31 and CURRENT_DATE - 1
            AND so.sales_order_status_id not in (7, 12)
            AND so.channel IN ('telesales', 'retailer')
            AND pso.purchased_item_count <> 0
        GROUP BY ALL
    )
    qualify rnk = 1
)
'''

warehouse_region = snowflake_query("Egypt", query)
warehouse_region.columns = warehouse_region.columns.str.lower()
for col in warehouse_region.columns:
    warehouse_region[col] = pd.to_numeric(warehouse_region[col], errors='ignore')

In [16]:
query = f'''
SELECT DISTINCT cat, brand, margin as target_bm
FROM    performance.commercial_targets cplan
QUALIFY CASE WHEN DATE_TRUNC('month', MAX(DATE)OVER()) = DATE_TRUNC('month', CURRENT_DATE) THEN DATE_TRUNC('month', CURRENT_DATE)
ELSE DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month') END = DATE_TRUNC('month', date)
'''
brand_cat_target = snowflake_query("Egypt", query)
for col in brand_cat_target.columns:
    brand_cat_target[col] = pd.to_numeric(brand_cat_target[col], errors='ignore')

query = f'''
select cat,sum(target_bm *(target_nmv/cat_total)) as cat_target_margin
from (
select *,sum(target_nmv)over(partition by cat) as cat_total
from (
select cat,brand,avg(target_bm) as target_bm , sum(target_nmv) as target_nmv
from (
SELECT DISTINCT date,city as region,cat, brand, margin as target_bm,nmv as target_nmv
FROM    performance.commercial_targets cplan
QUALIFY CASE WHEN DATE_TRUNC('month', MAX(DATE)OVER()) = DATE_TRUNC('month', CURRENT_DATE) THEN DATE_TRUNC('month', CURRENT_DATE)
ELSE DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month') END = DATE_TRUNC('month', date)
)
group by all
)
)
group by all 
'''
cat_target = snowflake_query("Egypt", query)
for col in cat_target.columns:
    cat_target[col] = pd.to_numeric(cat_target[col], errors='ignore')

query = f'''
SELECT  DIStinct  
		products.id as product_id,
		CONCAT(products.name_ar,' ',products.size,' ',product_units.name_ar) as sku,
		brands.name_ar as brand, 
		categories.name_ar as cat,
		f.wac_p
from products 
JOIN brands on products.brand_id = brands.id 
JOIN categories ON products.category_id = categories.id
JOIN finance.all_cogs f  ON f.product_id = products.id and CONVERT_TIMEZONE('{zone_to_use}', 'Africa/Cairo', CURRENT_TIMEstamp()) between f.from_date and f.to_date 
JOIN product_units ON product_units.id = products.unit_id 
'''
sku_info = snowflake_query("Egypt", query)
for col in sku_info.columns:
    sku_info[col] = pd.to_numeric(sku_info[col], errors='ignore')

In [17]:
query = '''
select warehouse_id, district_id, district_name, product_id, nmv as last_d_nmv,
    coalesce(sku_dis_nmv, 0) / nmv as sku_disc_cntrb,
    coalesce(quantity_nmv, 0) / nmv as quant_disc_cntrb,
    sku_disc_price,
    quantity_price
from (
    SELECT DISTINCT
        pso.warehouse_id,
        districts.id as district_id,
        districts.name_ar as district_name,
        pso.product_id,
        sum(pso.total_price) as nmv,
        avg(item_price / basic_unit_count) as item_price,
        sum(case when ITEM_DISCOUNT_value > 0 then pso.total_price end) as sku_dis_nmv,
        sum(case when ITEM_quantity_DISCOUNT_value > 0 then pso.total_price end) as quantity_nmv,
        avg(case when ITEM_DISCOUNT_value > 0 then (item_price / BASIC_UNIT_COUNT) - (ITEM_DISCOUNT_value / BASIC_UNIT_COUNT) end) as sku_disc_price,
        avg(case when ITEM_quantity_DISCOUNT_value > 0 then (item_price / BASIC_UNIT_COUNT) - (ITEM_quantity_DISCOUNT_value / BASIC_UNIT_COUNT) end) as quantity_price
    FROM product_sales_order pso
    JOIN sales_orders so ON so.id = pso.sales_order_id 
        and so.retailer_id not in (select taggable_id from dynamic_taggables where dynamic_tag_id = 3038)
    JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id = so.retailer_id
    JOIN districts on districts.id = materialized_views.retailer_polygon.district_id
    WHERE so.created_at::date = current_date - 1
        AND so.sales_order_status_id not in (7, 12)
        AND so.channel IN ('telesales', 'retailer')
        AND pso.purchased_item_count <> 0
    GROUP BY ALL
)
order by nmv desc
'''

last_day = snowflake_query("Egypt", query)
last_day.columns = last_day.columns.str.lower()
for col in last_day.columns:
    last_day[col] = pd.to_numeric(last_day[col], errors='ignore')

In [18]:
query = '''

with main_data  as (
SELECT  DISTINCT
		pso.warehouse_id,
		pso.product_id,
		CONCAT(products.name_ar,' ',products.size,' ',product_units.name_ar) as sku,
		brands.name_ar as brand, 
		categories.name_ar as cat,
		
        sum(pso.total_price) as nmv,
       sum(COALESCE(f.wac_p,0) * pso.purchased_item_count * pso.basic_unit_count) as cogs_p,
	   ((nmv-cogs_p)/nmv) as bm_p,


FROM product_sales_order pso
JOIN sales_orders so ON so.id = pso.sales_order_id
--join COHORT_PRICING_CHANGES cpc on cpc.id = pso.COHORT_PRICING_CHANGE_id
JOIN products on products.id=pso.product_id
JOIN brands on products.brand_id = brands.id 
JOIN categories ON products.category_id = categories.id
JOIN finance.all_cogs f  ON f.product_id = pso.product_id
                        AND f.from_date::date <= so.created_at ::date
                        AND f.to_date::date > so.created_at ::date
JOIN product_units ON product_units.id = products.unit_id 
JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id=so.retailer_id
JOIN districts on districts.id=materialized_views.retailer_polygon.district_id
JOIN cities on cities.id=districts.city_id
join states on states.id=cities.state_id
join regions on regions.id=states.region_id             

WHERE   True
    AND so.created_at ::date between  current_date - 5 and current_date -1 
    AND so.sales_order_status_id not in (7,12)
    AND so.channel IN ('telesales','retailer')
    AND pso.purchased_item_count <> 0

GROUP BY ALL
),
cp as (
select cat,brand,sum(nmv) as target_nmv ,avg(margin) as target_margin
from performance.commercial_targets 
where date  between '2025-10-01' and current_date - 1
group by all 
),
stocks as (					
select warehouse_id,warehouse,product_id,sum(stocks) as stocks
from (
		SELECT DISTINCT product_warehouse.warehouse_id,w.name as warehouse,
                product_warehouse.product_id,
                (product_warehouse.available_stock)::integer as stocks

        from  product_warehouse 
        JOIN products on product_warehouse.product_id = products.id
        JOIN product_units ON products.unit_id = product_units.id
		join warehouses w on w.id = product_warehouse.warehouse_id

        where   product_warehouse.warehouse_id not in (6,9,10)
            AND product_warehouse.is_basic_unit = 1
			and product_warehouse.available_stock > 0 

)
group by all
),
prs AS (
SELECT DISTINCT product_purchased_receipts.purchased_receipt_id,
                purchased_receipts.purchased_order_id,
                DATE_PART('Day', purchased_receipts.date::date) AS DAY,
                DATE_PART('month', purchased_receipts.date::date) AS MONTH,
                DATE_Part('year', purchased_receipts.date::date) AS YEAR,
                products.id AS product_id,
                CONCAT(products.name_ar, ' ', products.size, ' ', product_units.name_ar) AS sku,
                brands.name_ar AS Brand,
                categories.name_ar as category,
                products.description,
                purchased_receipts.warehouse_id AS warehouse_id,
                warehouses.name as warehouse,
                packing_units.name_ar AS packing_unit,
                purchased_receipts.discount AS Total_discount,
                purchased_receipts.return_orders_discount,
                purchased_receipts.discount_type_id,
                suppliers.id AS supplier_id,
                suppliers.name AS supplier_name,
                purchased_receipt_statuses.name_ar AS PR_status,
                product_purchased_receipts.basic_unit_count,
                product_purchased_receipts.purchased_item_count AS purchase_count,
                product_purchased_receipts.purchased_item_count*product_purchased_receipts.basic_unit_count AS purchase_min_count,
                product_purchased_receipts.item_price,
                product_purchased_receipts.final_price/product_purchased_receipts.purchased_item_count AS final_item_price,
                product_purchased_receipts.total_price AS purchase_price,
                CASE WHEN product_purchased_receipts.vat = 'true' THEN product_purchased_receipts.total_price * 0.14
                     ELSE CASE WHEN product_purchased_receipts.vat = 'false' THEN product_purchased_receipts.total_price * 0
                               END
                END AS vat,
                CASE WHEN purchased_receipts.discount_type_id = 2 THEN (product_purchased_receipts.discount/100) * product_purchased_receipts.total_price
                     ELSE product_purchased_receipts.discount
                END AS SKU_discount,
                purchased_receipts.total_price AS pr_value,
                CASE
                    WHEN product_purchased_receipts.t_tax_id = 1 THEN product_purchased_receipts.total_price * 0.05
                    ELSE CASE
                             WHEN product_purchased_receipts.t_tax_id = 2 THEN product_purchased_receipts.total_price * 0.08
                             ELSE CASE
                                      WHEN product_purchased_receipts.t_tax_id = 3 THEN product_purchased_receipts.total_price * 0.1
                                      ELSE 0
                                  END
                         END
                END AS table_tax,
                product_purchased_receipts.final_price AS Final_Price,
                product_purchased_receipts.product_type_id,
                purchased_receipts.debt_note_value as credit_note,
                purchased_receipts.tips,
                purchased_receipts.delivery_fees,
                case when purchased_receipts.is_actual = 'true' then 'Real' 
                     else 'Virtual' 
                     end as is_actual
                     
FROM product_purchased_receipts
LEFT JOIN products ON products.id = product_purchased_receipts.product_id
LEFT JOIN packing_unit_products ON packing_unit_products.product_id = products.id
LEFT JOIN purchased_receipts ON purchased_receipts.id = product_purchased_receipts.purchased_receipt_id
LEFT JOIN purchased_receipt_statuses ON purchased_receipt_statuses.id = purchased_receipts.purchased_receipt_status_id
LEFT JOIN packing_units ON packing_units.id = product_purchased_receipts.packing_unit_id
LEFT JOIN product_units ON products.unit_id = product_units.id
LEFT JOIN suppliers ON suppliers.id = purchased_receipts.supplier_id
LEFT JOIN brands ON brands.id = products.brand_id
left join categories on categories.id = products.category_id
left join warehouses on warehouses.id = purchased_receipts.warehouse_id
WHERE product_purchased_receipts.purchased_item_count <> 0
      AND purchased_receipts.purchased_receipt_status_id IN (4,5,7)
      AND purchased_receipts.date::date >= current_date - 4
    AND purchased_receipts.is_actual = 'true'
     
     
    ),
prs_data as (
select warehouse_id , product_id,sum(final_price) as total_prs 
from prs 
group by all
)

select warehouse_id,product_id,1 as zero_rr
from (
select s.*,
CONCAT(products.name_ar,' ',products.size,' ',product_units.name_ar) as sku,
brands.name_ar as brand, 
categories.name_ar as cat,
coalesce(md.nmv,0) as sales,wac1,
wac1*stocks as stock_value,
coalesce(total_prs,0) as prs_data
from stocks s
left join main_data md on md.product_id =s.product_id and md.warehouse_id = s.warehouse_id
JOIN finance.all_cogs f  ON f.product_id = s.product_id
                        AND f.from_date::date <= current_date 
                        AND f.to_date::date > current_date
JOIN products on products.id=s.product_id
JOIN brands on products.brand_id = brands.id 
JOIN categories ON products.category_id = categories.id
JOIN product_units ON product_units.id = products.unit_id
left join prs_data on prs_data.product_id =s.product_id and prs_data.warehouse_id = s.warehouse_id 
where stocks > 0 and sales = 0 
and prs_data < 0.7*stock_value
order by wac1* stocks desc 
)
'''

zerorr = snowflake_query("Egypt", query)
zerorr.columns = zerorr.columns.str.lower()
for col in zerorr.columns:
    zerorr[col] = pd.to_numeric(zerorr[col], errors='ignore')

---

## 3. Product Selection & Analysis

This section identifies products that are underperforming and eligible for Happy Hour discounts.

### 3.1 Calculate Growth Metrics
Compute product-level and warehouse-level growth by comparing current vs historical performance.


In [19]:
product_data = product_data.merge(product_warehouse_price,on=['product_id','warehouse_id'])
product_data = product_data.merge(uth_cntrb[['warehouse_id','district_id','district_name','uth_cntrb']],on=['warehouse_id','district_id','district_name'])
product_data['product_UTH_growth'] =(product_data['current_uth'] -product_data['prev_uth'])/product_data['prev_uth']
product_data['product_LH_growth'] =(product_data['current_last_hour'] -product_data['prev_last_hour'])/product_data['prev_last_hour']
product_data[['product_UTH_growth','product_LH_growth']] =product_data[['product_UTH_growth','product_LH_growth']].fillna(0) 
product_data = product_data.replace([np.inf, -np.inf], 1)
product_data['product_closing_growth'] = (product_data['product_UTH_growth']*product_data['uth_cntrb'])+(product_data['product_LH_growth']*(1-product_data['uth_cntrb']))

In [20]:
warehouse_data = product_data.groupby(['warehouse_id', 'district_id', 'district_name'])[['prev_all_day', 'prev_uth', 'prev_last_hour', 'current_all_day', 'current_uth', 'current_last_hour']].sum().reset_index()
warehouse_data['UTH_growth'] = (warehouse_data['current_uth'] - warehouse_data['prev_uth']) / warehouse_data['prev_uth']
warehouse_data['LH_growth'] = (warehouse_data['current_last_hour'] - warehouse_data['prev_last_hour']) / warehouse_data['prev_last_hour']
warehouse_data = warehouse_data.merge(uth_cntrb, on=['warehouse_id', 'district_id','district_name'])
warehouse_data['Closing_growth'] = (warehouse_data['UTH_growth'] * warehouse_data['uth_cntrb']) + (warehouse_data['LH_growth'] * (1 - warehouse_data['uth_cntrb']))
dropping_whs = warehouse_data[warehouse_data['Closing_growth'] < 0]

In [21]:
growing_products = product_data.merge(warehouse_data[['warehouse_id', 'district_id', 'district_name', 'UTH_growth', 'LH_growth', 'Closing_growth']], on=['warehouse_id', 'district_id'])
# needs edit
growing_products = growing_products[growing_products['product_closing_growth'] >= np.maximum(growing_products['Closing_growth'], 0.1)]
growing_products['max_closing'] = growing_products.groupby('product_id')['product_closing_growth'].transform('sum')
growing_products = growing_products[growing_products['max_closing'] == growing_products['product_closing_growth']]
growing_products = growing_products.groupby(['product_id'])['price'].mean().reset_index()
growing_products.columns = ['product_id', 'maxab_good_price']

In [22]:
selected_products = product_data.merge(sku_info,on=['product_id'])
selected_products = selected_products[selected_products['brand'].isin(b_list)]
selected_products = selected_products.merge(warehouse_data[['warehouse_id','district_id', 'district_name','UTH_growth','LH_growth','Closing_growth']],on=['warehouse_id','district_id', 'district_name'])
selected_products=selected_products.drop(columns=['cat','brand','sku','wac_p'])
selected_products = selected_products[selected_products['product_closing_growth'] <selected_products['Closing_growth']]

In [23]:
selected_products_wh = product_data.merge(sku_info,on=['product_id']) 
selected_products_wh = selected_products_wh.merge(sku_to_add_df[['product_id','warehouse_id']],on=['product_id','warehouse_id'])
selected_products_wh = selected_products_wh.merge(warehouse_data[['warehouse_id','district_id', 'district_name','UTH_growth','LH_growth','Closing_growth']],on=['warehouse_id','district_id', 'district_name'])
selected_products_wh=selected_products_wh.drop(columns=['cat','brand','sku','wac_p'])

### 3.2 Identify Underperforming Products
Select products with negative growth or belonging to push brands that are underperforming.


In [24]:
dropping_products = product_data.merge(dropping_whs[['warehouse_id','district_id', 'district_name','UTH_growth','LH_growth','Closing_growth']],on=['warehouse_id','district_id', 'district_name'])
dropping_products = dropping_products[dropping_products['product_closing_growth'] < 0]
dropping_products = pd.concat([dropping_products,selected_products])
dropping_products = pd.concat([dropping_products,selected_products_wh])
dropping_products=dropping_products.drop_duplicates()

In [25]:
dropping_products = dropping_products.merge(sku_info,on=['product_id'])

In [26]:
delta_wh = [8,170,337,339]
rem_brands = ['ÿ™ÿßŸàÿ™ÿßŸà','ŸÑÿßÿ±ÿ¥','ÿßŸÑŸÉÿ®Ÿàÿ≥','ŸÉŸÖÿßÿ±ÿß']
dropping_products = dropping_products[~((dropping_products['brand'].isin(rem_brands)) & (dropping_products['warehouse_id'].isin(delta_wh)))]

In [27]:
dropping_products = dropping_products.sort_values(by='prev_all_day',ascending = False)
dropping_products = dropping_products.merge(growing_products,on='product_id',how='left')
dropping_products = dropping_products.merge(marketplace,on=['product_id','warehouse_id'],how='left')
dropping_products = dropping_products.merge(bensoliman[['product_id','ben_soliman_price']],on=['product_id'],how='left')
dropping_products = dropping_products.drop(columns = 'region')
dropping_products = dropping_products.merge(scrapped_prices,on=['product_id','warehouse_id'],how='left')
dropping_products = dropping_products.drop(columns = 'region')
dropping_products = dropping_products.merge(zerorr,on=['product_id','warehouse_id'],how='left')

In [28]:
dropping_products = dropping_products.merge(warehouse_region,on=['warehouse_id'])
dropping_products = dropping_products.merge(stats,on=['product_id','region'],how='left')
dropping_products = dropping_products.merge(brand_cat_target,on=['brand','cat'],how='left')
dropping_products = dropping_products.merge(cat_target,on=['cat'],how='left')
dropping_products['Target_margin'] = dropping_products['target_bm'].fillna(dropping_products['cat_target_margin'])
dropping_products = dropping_products[[ 'warehouse_id','district_id','district_name','product_id','sku','brand','cat', 'prev_all_day', 'prev_uth',
       'prev_last_hour', 'current_all_day', 'current_uth', 'current_last_hour','product_UTH_growth', 'product_LH_growth',
       'product_closing_growth','doh','wac_p','price','maxab_good_price', 'final_min_price', 'final_max_price',
       'final_mod_price', 'final_true_min', 'final_true_max',
       'ben_soliman_price','optimal_bm', 'min_boundary',
       'max_boundary', 'median_bm','Target_margin','min_scrapped','max_scrapped','median_scrapped','zero_rr']]

In [29]:
dropping_products = dropping_products.merge(last_day,on=['product_id','warehouse_id','district_id','district_name'],how='left')
dropping_products[['last_d_nmv','sku_disc_cntrb','quant_disc_cntrb','sku_disc_price','quantity_price']] = dropping_products[['last_d_nmv','sku_disc_cntrb','quant_disc_cntrb','sku_disc_price','quantity_price']].fillna(0)
dropping_products=dropping_products.drop_duplicates()

---

## 4. Price Optimization Engine

This section calculates optimal discount prices using multiple pricing signals and business rules.

### 4.1 Price Selection Algorithm

The `select_price_optimized` function evaluates prices from multiple sources:
- **Marketplace prices** (min, max, mod, true_min, true_max)
- **Competitor prices** (Ben Soliman, scraped data)
- **Internal benchmarks** (Maxab good prices from growing products)
- **Margin targets** (optimal, min_boundary, max_boundary, median)

**Decision Logic:**
1. For zero running rate or overstock (DOH > 45): Aggressive pricing to clear stock
2. For normal products: Select from "Listed" prices meeting margin criteria
3. Fallback: Calculate weighted average from acceptable prices ("induced" pricing)


In [30]:
def select_price_optimized(remaining_prices, price, wac, Target_margin, min_boundary, zero_rr, doh):
    """
    Optimized price selection function using numpy for faster computation.
    Returns (target_price, source)
    """
    target_price = 0.0
    source = ''
    current_margin = (price - wac) / price if price != 0 else 0
    
    # Convert to numpy array for vectorized operations
    stocks_pricing_list = np.array(remaining_prices + [wac / (1 - (Target_margin * 0.65))])
    stocks_pricing_list = np.sort(stocks_pricing_list)
    
    is_zero_rr = not np.isnan(zero_rr)
    is_overstock = doh > 45
    
    if is_zero_rr or is_overstock:
        source = 'Zero_rr' if is_zero_rr else 'OS'
        
        # Vectorized: find first price where new_price >= wac and diff <= -0.05
        diffs = (stocks_pricing_list - price) / price
        valid_mask = (stocks_pricing_list >= wac*0.9) & (diffs >= -0.05)
        valid_prices = stocks_pricing_list[valid_mask]
        
        if len(valid_prices) > 0:
            target_price = valid_prices[0]
        elif current_margin > Target_margin and current_margin - Target_margin > 0.0025:
            target_price = wac / (1 - Target_margin)
        elif current_margin > min_boundary and current_margin - min_boundary > 0.0025:
            target_price = wac / (1 - min_boundary)
        elif current_margin > Target_margin / 2 and current_margin - Target_margin / 2 > 0.0025:
            target_price = wac / (1 - (Target_margin / 2))
    else:
        remaining_arr = np.array(remaining_prices)
        if len(remaining_arr) > 0:
            # Vectorized margin calculations
            new_margins = np.where(remaining_arr != 0, (remaining_arr - wac) / remaining_arr, 0)
            diffs = (remaining_arr - price) / price if price != 0 else np.zeros_like(remaining_arr)
            
            # Find valid prices (reverse order - largest first that meets criteria)
            valid_mask = (remaining_arr >= wac*0.9) & (diffs >= -0.05)&(diffs <= -0.0025)
            
            valid_indices = np.where(valid_mask)[0]
            if len(valid_indices) > 0:
                # Get the last valid index (was iterating in reverse)
                target_price = remaining_arr[valid_indices[-1]]
                source = 'Listed'
            else:
                # Find acceptable prices (positive margin)
                acceptable_mask = remaining_arr >= wac*0.9
                acceptable = remaining_arr[acceptable_mask]
                
                if len(acceptable) > 1:
                    # Vectorized distance-weighted average
                    price_diffs = np.abs(price - acceptable)
                    # Avoid division by zero
                    price_diffs = np.where(price_diffs == 0, 1e-10, price_diffs)
                    distances = 1 / price_diffs
                    weights = distances / np.sum(distances)
                    final_value = np.sum(weights * acceptable)
                    target_price = max(final_value, wac / (1 - (0.3 * Target_margin)))
                    source = 'induced_1'
                elif len(acceptable) == 1:
                    final_value = (0.3 * acceptable[0]) + (0.7 * price)
                    target_price = max(final_value, wac / (1 - (0.3 * Target_margin)))
                    source = 'induced_2'
    
    return target_price, source


def process_row(row):
    """
    Process a single row to determine selected price and source.
    Designed for use with DataFrame.apply()
    """
    wac = row['wac_p']
    price = row['price']
    doh = row['doh']
    Target_margin = row['Target_margin']
    min_boundary = row['min_boundary'] if not pd.isna(row['min_boundary']) else 0
    zero_rr = row['zero_rr']
    
    # Safely compute prices, handling edge cases
    def safe_price(margin):
        if pd.isna(margin) or margin == 1:
            return np.nan
        return wac / (1 - margin)
    
    # Build prices list
    prices_list = [
        row['maxab_good_price'], row['final_min_price'], row['final_max_price'],
        row['final_mod_price'], row['final_true_min'], row['final_true_max'],
        row['ben_soliman_price'],
        safe_price(row['optimal_bm']),
        safe_price(row['min_boundary']),
        safe_price(row['max_boundary']),
        safe_price(row['median_bm']),
        safe_price(Target_margin),
        row['min_scrapped'], row['max_scrapped'], row['median_scrapped']
    ]
    
    # Clean prices - remove 0, nan, and duplicates
    cleaned_prices = list({x for x in prices_list if x != 0 and not pd.isna(x) and np.isfinite(x)})
    
    qd_cntrb = row['quant_disc_cntrb']
    sd_cntrb = row['sku_disc_cntrb']
    qd_price = row['quantity_price']
    sd_price = row['sku_disc_price']
    ld_nmv = row['last_d_nmv']
    prev_nmv = row['prev_all_day']
    
    qd_discount = ((qd_price - price) / price) * -1 if price != 0 else 0
    sku_discount = ((sd_price - price) / price) * -1 if price != 0 else 0
    
    # Check previous discount conditions
    if ld_nmv > (prev_nmv * 1.15) and (
        ((qd_cntrb > 0) and (qd_cntrb > sd_cntrb) and (qd_discount < Target_margin * 0.25)) or 
        ((sd_cntrb > 0) and (qd_cntrb < sd_cntrb) and (sku_discount < Target_margin * 0.25))
    ):
        if qd_cntrb > sd_cntrb and qd_cntrb > 0 and qd_price > 0 and qd_price > wac:
            return pd.Series({'selected_price': qd_price, 'source': 'Prev_disc'})
        elif qd_cntrb < sd_cntrb and sd_cntrb > 0 and sd_price > 0 and sd_price > wac:
            return pd.Series({'selected_price': sd_price, 'source': 'Prev_disc'})
        elif sd_price > wac or qd_price > wac:
            return pd.Series({'selected_price': max(qd_price, sd_price), 'source': 'Prev_disc'})
    
    # Determine remaining prices based on discount conditions
    if qd_cntrb > sd_cntrb and qd_price > 0 and qd_discount <= Target_margin * 0.35:
        remaining_prices = [x for x in cleaned_prices if x < qd_price and x < price]
    elif qd_cntrb < sd_cntrb and sd_price > 0 and sku_discount <= Target_margin * 0.35:
        remaining_prices = [x for x in cleaned_prices if x < sd_price and x < price]
    else:
        remaining_prices = [x for x in cleaned_prices if x < price]
    
    remaining_prices.sort()
    
    selected_price, source = select_price_optimized(
        remaining_prices, price, wac, Target_margin, min_boundary, zero_rr, doh
    )
    
    return pd.Series({'selected_price': selected_price, 'source': source})    
            

### 4.2 Execute Price Optimization
Apply the pricing algorithm to all selected products.


In [31]:
# OPTIMIZED: Using apply() instead of iterrows() + concat
# This is ~10-50x faster than the previous implementation

print(f"Processing {len(dropping_products):,} products...")

# Enable progress bar for apply
tqdm.pandas(desc="Processing prices")

# Apply the optimized row processing function
result_cols = dropping_products.progress_apply(process_row, axis=1)

# Combine original data with results
product_final_df = dropping_products.copy()
product_final_df['selected_price'] = result_cols['selected_price']
product_final_df['source'] = result_cols['source']

print(f"‚úì Processed {len(product_final_df):,} products")



Processing 166,807 products...


Processing prices: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 166807/166807 [01:02<00:00, 2686.39it/s]

‚úì Processed 166,807 products





In [32]:
product_final_df.district_id.nunique()

471

### 4.3 Calculate Discounts & Filter Results
Convert selected prices to discount percentages and apply final filters.


In [33]:
product_final_df['discount'] = abs((product_final_df['selected_price']-product_final_df['price'])/product_final_df['price'])
product_final_df = product_final_df[(product_final_df['discount'] > 0.0025)&(product_final_df['selected_price']>0)]
product_final_df['discount'] = product_final_df['discount']*100000
product_final_df['discount'] = ((product_final_df['discount']//10)+1)/10000
product_final_df['discount'] = np.minimum(product_final_df['discount'],0.05)
product_final_df['discount']=product_final_df['discount']*100
product_final_df['discount'] = product_final_df['discount'].apply(lambda x: f"{x:.2f}")

In [34]:
product_final_df = product_final_df[~product_final_df['cat'].isin(['ŸÉÿ±Ÿàÿ™ ÿ¥ÿ≠ŸÜ'])]
product_final_df = product_final_df[~product_final_df['brand'].isin(['ŸÅŸäŸàÿ±Ÿä','ÿßŸÑÿπÿ±Ÿàÿ≥ÿ©'])]

In [35]:
# =============================================================================
# DISCOUNT & MARGIN ANALYSIS
# =============================================================================

# Create analysis dataframe
analysis_df = product_final_df.copy()

# Convert discount from string to numeric (it's stored as "2.50" format)
analysis_df['discount_pct'] = pd.to_numeric(analysis_df['discount'], errors='coerce')

# Calculate margins
analysis_df['current_margin'] = (analysis_df['price'] - analysis_df['wac_p']) / analysis_df['price']
analysis_df['new_margin'] = (analysis_df['selected_price'] - analysis_df['wac_p']) / analysis_df['selected_price']
analysis_df['margin_change'] = analysis_df['new_margin'] - analysis_df['current_margin']

print("=" * 80)
print("üìä HAPPY HOUR DISCOUNT ANALYSIS REPORT")
print("=" * 80)

# =============================================================================
# 1. OVERVIEW STATISTICS
# =============================================================================
print("\n" + "‚îÄ" * 80)
print("1Ô∏è‚É£  OVERVIEW STATISTICS")
print("‚îÄ" * 80)

total_skus = len(analysis_df)
unique_products = analysis_df['product_id'].nunique()
unique_warehouses = analysis_df['warehouse_id'].nunique()
unique_districts = analysis_df['district_id'].nunique()

print(f"\nüì¶ Total SKU-Warehouse-District combinations: {total_skus:,}")
print(f"üì¶ Unique Products: {unique_products:,}")
print(f"üè≠ Unique Warehouses: {unique_warehouses:,}")
print(f"üìç Unique Districts: {unique_districts:,}")

# =============================================================================
# 2. DISCOUNT ANALYSIS
# =============================================================================
print("\n" + "‚îÄ" * 80)
print("2Ô∏è‚É£  DISCOUNT ANALYSIS")
print("‚îÄ" * 80)

avg_discount = analysis_df['discount_pct'].mean()
median_discount = analysis_df['discount_pct'].median()
min_discount = analysis_df['discount_pct'].min()
max_discount = analysis_df['discount_pct'].max()
std_discount = analysis_df['discount_pct'].std()

print(f"\nüìâ Average Discount: {avg_discount:.2f}%")
print(f"üìâ Median Discount: {median_discount:.2f}%")
print(f"üìâ Min Discount: {min_discount:.2f}%")
print(f"üìâ Max Discount: {max_discount:.2f}%")
print(f"üìâ Std Deviation: {std_discount:.2f}%")

# Discount distribution buckets
print("\nüìä Discount Distribution:")
discount_bins = [0, 1, 2, 3, 4, 5, 100]
discount_labels = ['0-1%', '1-2%', '2-3%', '3-4%', '4-5%', '>5%']
analysis_df['discount_bucket'] = pd.cut(analysis_df['discount_pct'], bins=discount_bins, labels=discount_labels, right=True)
discount_dist = analysis_df['discount_bucket'].value_counts().sort_index()
for bucket, count in discount_dist.items():
    pct = (count / total_skus) * 100
    bar = "‚ñà" * int(pct / 2)
    print(f"   {bucket:>6}: {count:>6,} ({pct:>5.1f}%) {bar}")

# =============================================================================
# 3. MARGIN ANALYSIS
# =============================================================================
print("\n" + "‚îÄ" * 80)
print("3Ô∏è‚É£  MARGIN ANALYSIS")
print("‚îÄ" * 80)

avg_current_margin = analysis_df['current_margin'].mean() * 100
avg_new_margin = analysis_df['new_margin'].mean() * 100
avg_margin_change = analysis_df['margin_change'].mean() * 100

print(f"\nüìà Average Current Margin: {avg_current_margin:.2f}%")
print(f"üìà Average Margin After Discount: {avg_new_margin:.2f}%")
print(f"üìâ Average Margin Change: {avg_margin_change:.2f}%")

# Margin distribution after discount (1% increments)
print("\nüìä Margin After Discount Distribution:")
margin_bins = [-100, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 100]
margin_labels = ['<-5%', '-5 to -4%', '-4 to -3%', '-3 to -2%', '-2 to -1%', '-1 to 0%', 
                 '0-1%', '1-2%', '2-3%', '3-4%', '4-5%', '5-6%', '6-7%', '7-8%', '8-9%', 
                 '9-10%', '10-11%', '11-12%', '12-13%', '13-14%', '14-15%', '>15%']
analysis_df['margin_bucket'] = pd.cut(analysis_df['new_margin'] * 100, bins=margin_bins, labels=margin_labels, right=True)
margin_dist = analysis_df['margin_bucket'].value_counts().sort_index()
for bucket, count in margin_dist.items():
    if count > 0:  # Only show buckets with data
        pct = (count / total_skus) * 100
        bar = "‚ñà" * int(pct / 2)
        print(f"   {bucket:>12}: {count:>6,} ({pct:>5.1f}%) {bar}")

# =============================================================================
# 4. NEGATIVE MARGIN ANALYSIS (CRITICAL)
# =============================================================================
print("\n" + "‚îÄ" * 80)
print("4Ô∏è‚É£  ‚ö†Ô∏è  NEGATIVE MARGIN ANALYSIS (CRITICAL)")
print("‚îÄ" * 80)

negative_margin_df = analysis_df[analysis_df['new_margin'] < 0]
negative_margin_count = len(negative_margin_df)
negative_margin_pct = (negative_margin_count / total_skus) * 100

print(f"\nüî¥ SKUs with Negative Margin After Discount: {negative_margin_count:,} ({negative_margin_pct:.2f}%)")

if negative_margin_count > 0:
    avg_negative_margin = negative_margin_df['new_margin'].mean() * 100
    min_negative_margin = negative_margin_df['new_margin'].min() * 100
    print(f"üî¥ Average Negative Margin: {avg_negative_margin:.2f}%")
    print(f"üî¥ Worst Negative Margin: {min_negative_margin:.2f}%")
    
    # Top categories with negative margins
    print("\nüìä Negative Margin by Category:")
    neg_by_cat = negative_margin_df.groupby('cat').agg({
        'product_id': 'count',
        'new_margin': 'mean'
    }).rename(columns={'product_id': 'count', 'new_margin': 'avg_margin'})
    neg_by_cat['avg_margin'] = neg_by_cat['avg_margin'] * 100
    neg_by_cat = neg_by_cat.sort_values('count', ascending=False).head(10)
    for cat, row in neg_by_cat.iterrows():
        print(f"   {cat[:30]:>30}: {row['count']:>5} SKUs (avg margin: {row['avg_margin']:.2f}%)")
    
    # Top brands with negative margins
    print("\nüìä Negative Margin by Brand:")
    neg_by_brand = negative_margin_df.groupby('brand').agg({
        'product_id': 'count',
        'new_margin': 'mean'
    }).rename(columns={'product_id': 'count', 'new_margin': 'avg_margin'})
    neg_by_brand['avg_margin'] = neg_by_brand['avg_margin'] * 100
    neg_by_brand = neg_by_brand.sort_values('count', ascending=False).head(10)
    for brand, row in neg_by_brand.iterrows():
        print(f"   {brand[:30]:>30}: {row['count']:>5} SKUs (avg margin: {row['avg_margin']:.2f}%)")
else:
    print("‚úÖ No SKUs with negative margin after discount!")

# =============================================================================
# 5. ANALYSIS BY CATEGORY
# =============================================================================
print("\n" + "‚îÄ" * 80)
print("5Ô∏è‚É£  ANALYSIS BY CATEGORY")
print("‚îÄ" * 80)

cat_analysis = analysis_df.groupby('cat').agg({
    'product_id': 'count',
    'discount_pct': 'mean',
    'current_margin': 'mean',
    'new_margin': 'mean'
}).rename(columns={'product_id': 'sku_count'})
cat_analysis['current_margin'] = cat_analysis['current_margin'] * 100
cat_analysis['new_margin'] = cat_analysis['new_margin'] * 100
cat_analysis = cat_analysis.sort_values('sku_count', ascending=False)

print(f"\n{'Category':<35} {'SKUs':>8} {'Avg Disc':>10} {'Curr Mrgn':>12} {'New Mrgn':>12}")
print("-" * 80)
for cat, row in cat_analysis.head(15).iterrows():
    print(f"{cat[:34]:<35} {row['sku_count']:>8,} {row['discount_pct']:>9.2f}% {row['current_margin']:>11.2f}% {row['new_margin']:>11.2f}%")

# =============================================================================
# 6. ANALYSIS BY WAREHOUSE
# =============================================================================
print("\n" + "‚îÄ" * 80)
print("6Ô∏è‚É£  ANALYSIS BY WAREHOUSE")
print("‚îÄ" * 80)

wh_analysis = analysis_df.groupby('warehouse_id').agg({
    'product_id': 'count',
    'discount_pct': 'mean',
    'current_margin': 'mean',
    'new_margin': 'mean'
}).rename(columns={'product_id': 'sku_count'})
wh_analysis['current_margin'] = wh_analysis['current_margin'] * 100
wh_analysis['new_margin'] = wh_analysis['new_margin'] * 100
wh_analysis = wh_analysis.sort_values('sku_count', ascending=False)

print(f"\n{'Warehouse ID':>12} {'SKUs':>10} {'Avg Discount':>14} {'Curr Margin':>14} {'New Margin':>14}")
print("-" * 70)
for wh_id, row in wh_analysis.iterrows():
    print(f"{wh_id:>12} {row['sku_count']:>10,} {row['discount_pct']:>13.2f}% {row['current_margin']:>13.2f}% {row['new_margin']:>13.2f}%")

# =============================================================================
# 7. PRICING SOURCE ANALYSIS
# =============================================================================
print("\n" + "‚îÄ" * 80)
print("7Ô∏è‚É£  PRICING SOURCE ANALYSIS")
print("‚îÄ" * 80)

source_analysis = analysis_df.groupby('source').agg({
    'product_id': 'count',
    'discount_pct': 'mean',
    'new_margin': 'mean'
}).rename(columns={'product_id': 'sku_count'})
source_analysis['new_margin'] = source_analysis['new_margin'] * 100
source_analysis = source_analysis.sort_values('sku_count', ascending=False)

print(f"\n{'Source':<20} {'SKUs':>10} {'Avg Discount':>14} {'New Margin':>14}")
print("-" * 60)
for source, row in source_analysis.iterrows():
    print(f"{source:<20} {row['sku_count']:>10,} {row['discount_pct']:>13.2f}% {row['new_margin']:>13.2f}%")

# =============================================================================
# 8. SUMMARY TABLE
# =============================================================================
print("\n" + "=" * 80)
print("üìã EXECUTIVE SUMMARY")
print("=" * 80)

summary_data = {
    'Metric': [
        'Total SKU Combinations',
        'Unique Products',
        'Unique Warehouses',
        'Unique Districts',
        'Average Discount (%)',
        'Median Discount (%)',
        'Average Current Margin (%)',
        'Average New Margin (%)',
        'Margin Impact (%)',
        'SKUs with Negative Margin',
        'Negative Margin Rate (%)'
    ],
    'Value': [
        f"{total_skus:,}",
        f"{unique_products:,}",
        f"{unique_warehouses:,}",
        f"{unique_districts:,}",
        f"{avg_discount:.2f}",
        f"{median_discount:.2f}",
        f"{avg_current_margin:.2f}",
        f"{avg_new_margin:.2f}",
        f"{avg_margin_change:.2f}",
        f"{negative_margin_count:,}",
        f"{negative_margin_pct:.2f}"
    ]
}

summary_df = pd.DataFrame(summary_data)
print("\n")
print(summary_df.to_string(index=False))

# Save analysis to Excel
print("\n" + "‚îÄ" * 80)
print("üíæ Saving analysis to Excel...")
with pd.ExcelWriter('Main_V3_HH.xlsx', engine='openpyxl') as writer:
    # Main data
    product_final_df.to_excel(writer, sheet_name='Discount_Data', index=False)
    
    # Summary
    summary_df.to_excel(writer, sheet_name='Summary', index=False)
    
    # Category analysis
    cat_analysis.reset_index().to_excel(writer, sheet_name='By_Category', index=False)
    
    # Warehouse analysis
    wh_analysis.reset_index().to_excel(writer, sheet_name='By_Warehouse', index=False)
    
    # Source analysis
    source_analysis.reset_index().to_excel(writer, sheet_name='By_Source', index=False)
    
    # Negative margin SKUs
    if negative_margin_count > 0:
        negative_margin_df[['product_id', 'sku', 'brand', 'cat', 'warehouse_id', 'district_id', 
                           'price', 'selected_price', 'wac_p', 'discount_pct', 'current_margin', 
                           'new_margin']].to_excel(writer, sheet_name='Negative_Margin_SKUs', index=False)

print("‚úÖ Analysis saved to 'Main_V3_HH.xlsx' with multiple sheets!")
print("=" * 80)


üìä HAPPY HOUR DISCOUNT ANALYSIS REPORT

‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
1Ô∏è‚É£  OVERVIEW STATISTICS
‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ

üì¶ Total SKU-Warehouse-District combinations: 151,190
üì¶ Unique Products: 2,457
üè≠ Unique Warehouses: 12
üìç Unique Districts: 471

‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
2Ô∏è‚É£  DISCOUNT ANALYSIS
‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ

In [36]:
# Include district_id in the tuple for more precise retailer targeting
product_final_df['tuple'] = product_final_df[["product_id", 'warehouse_id', 'district_id']].apply(tuple, axis=1)
selected_skus_tuple = str(list(product_final_df['tuple']))[1:-1]
product_final_df = product_final_df.drop(columns='tuple')
print(f"‚úì Created {len(product_final_df):,} product-warehouse-district combinations")

‚úì Created 151,190 product-warehouse-district combinations


---

## 5. Retailer Targeting

This section identifies the most relevant retailers for each discounted product based on their purchase history and behavior.

### 5.1 Retailer Selection Criteria

Retailers are selected based on four behavioral signals:

| Signal | Description | Query |
|--------|-------------|-------|
| **Churned/Dropped** | Previously bought product but stopped (>60% drop) | `churned_dropped` |
| **Category Buyer** | Buys category but not this specific product | `cat_not_product` |
| **Out of Cycle** | Past purchase cycle exceeded expected timing | `out_of_cycle` |
| **Viewed, No Order** | Browsed brand/category but didn't purchase | `view_no_orders` |

### 5.2 Churned/Dropped Retailers
Find retailers who used to buy the product but have significantly reduced purchases.

In [37]:
query = f'''
with selected_prods as (
select * 
from(
VALUES
{selected_skus_tuple}
)x(product_id, warehouse_id, district_id)
),
sales_before as (
select retailer_id, product_id, warehouse_id, district_id, avg(nmv) as avg_nmv_before
from (
SELECT DISTINCT
    so.id as order_id,
    sp.district_id,
    sp.warehouse_id as warehouse_id,
    pso.product_id as product_id,
    so.retailer_id as retailer_id,
    sum(pso.total_price) as nmv 

FROM product_sales_order pso
JOIN sales_orders so ON so.id = pso.sales_order_id
JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id = so.retailer_id
JOIN districts on districts.id = materialized_views.retailer_polygon.district_id
JOIN selected_prods sp on sp.product_id = pso.product_id 
    AND sp.warehouse_id = pso.warehouse_id 
    AND sp.district_id = districts.id

WHERE True
    AND so.created_at::date between current_date - 120 and current_date - 31
    AND so.sales_order_status_id not in (7, 12)
    AND so.channel IN ('telesales', 'retailer')
    AND pso.purchased_item_count <> 0

GROUP BY ALL
)
group by all 
),
sales_after as (
select retailer_id, product_id, warehouse_id, district_id, avg(nmv) as avg_nmv_after, max(order_date) as last_order
from (
SELECT DISTINCT
    so.id as order_id,
    so.created_at::date as order_date,
    sales_order_status_id, 
    sp.district_id,
    sp.warehouse_id as warehouse_id,
    pso.product_id as product_id,
    so.retailer_id as retailer_id,
    sum(pso.total_price) as nmv 

FROM product_sales_order pso
JOIN sales_orders so ON so.id = pso.sales_order_id
JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id = so.retailer_id
JOIN districts on districts.id = materialized_views.retailer_polygon.district_id
JOIN selected_prods sp on sp.product_id = pso.product_id 
    AND sp.warehouse_id = pso.warehouse_id 
    AND sp.district_id = districts.id

WHERE True
    AND so.created_at::date > current_date - 31
    AND so.sales_order_status_id not in (7, 12)
    AND so.channel IN ('telesales', 'retailer')
    AND pso.purchased_item_count <> 0

GROUP BY ALL
)
group by all 
),
made_order as (
select distinct so.retailer_id

FROM sales_orders so 
JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id = so.retailer_id
JOIN districts on districts.id = materialized_views.retailer_polygon.district_id
JOIN selected_prods sp on sp.district_id = districts.id

WHERE True
    AND so.created_at::date >= current_date - 60
    AND so.sales_order_status_id not in (7, 12)
    AND so.channel IN ('telesales', 'retailer')

GROUP BY ALL
)

select distinct retailer_id, product_id, warehouse_id, district_id
from (
select sb.*, coalesce(avg_nmv_after, 0) as nmv_after, (nmv_after - avg_nmv_before) / avg_nmv_before as growth
from sales_before sb 
left join sales_after sa on sb.retailer_id = sa.retailer_id and sb.product_id = sa.product_id and sb.district_id = sa.district_id
left join made_order mo on mo.retailer_id = sa.retailer_id 
where growth < -0.3
and (current_date - last_order >= 5 or last_order is null)
and mo.retailer_id is not null 
)
'''
churned_dropped = snowflake_query("Egypt", query)
churned_dropped.columns = churned_dropped.columns.str.lower()
for col in churned_dropped.columns:
    churned_dropped[col] = pd.to_numeric(churned_dropped[col], errors='ignore')  
print(f"‚úì Churned/dropped retailers: {churned_dropped.retailer_id.nunique():,}")    

‚úì Churned/dropped retailers: 13,589


### 5.3 Category Buyers (Not This Product)
Find retailers who buy from the same category but haven't purchased this specific product.


In [38]:
query = f'''
with selected_prods as (
select * 
from(
VALUES
{selected_skus_tuple}
)x(product_id, warehouse_id, district_id)
),
selected_prods_with_cat as (
select distinct sp.warehouse_id, sp.product_id, sp.district_id, c.name_ar as cat, b.name_ar as brand
from selected_prods sp
join products p on p.id = sp.product_id
join brands b on b.id = p.brand_id 
join categories c on c.id = p.category_id 
),
selected_dis_cat_brand as (
select distinct warehouse_id, district_id, cat
from selected_prods_with_cat
),

buy_cat as (
SELECT DISTINCT
    sd.district_id,
    sd.warehouse_id as warehouse_id,
    so.retailer_id as retailer_id,
    c.name_ar as cat,
    b.name_ar as brand,
    pso.product_id

FROM product_sales_order pso
JOIN sales_orders so ON so.id = pso.sales_order_id
JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id = so.retailer_id
JOIN districts on districts.id = materialized_views.retailer_polygon.district_id
JOIN products p on p.id = pso.product_id
JOIN brands b on b.id = p.brand_id 
JOIN categories c on c.id = p.category_id 
JOIN selected_dis_cat_brand sd on sd.cat = c.name_ar and sd.district_id = districts.id

WHERE True
    AND so.created_at::date >= current_date - 60
    AND so.sales_order_status_id not in (7, 12)
    AND so.channel IN ('telesales', 'retailer')
    AND pso.purchased_item_count <> 0
),
chosen_products as (
select sp.*, c.name_ar as cat, b.name_ar as brand
from selected_prods sp 
join products p on p.id = sp.product_id
join brands b on b.id = p.brand_id 
join categories c on c.id = p.category_id 
)
select distinct retailer_id, selected_product_id as product_id, warehouse_id, selected_district_id as district_id
from (
select warehouse_id, district_id, retailer_id, cat, brand, selected_product_id, selected_district_id, max(flag) as flag
from (
select bc.*, cp.product_id as selected_product_id, cp.district_id as selected_district_id,
    case when cp.product_id = bc.product_id then 1 else 0 end as flag 
from buy_cat bc 
left join chosen_products cp on cp.warehouse_id = bc.warehouse_id and cp.cat = bc.cat and cp.district_id = bc.district_id
)
group by all 
)
where flag = 0 
'''
cat_not_product = snowflake_query("Egypt", query)
cat_not_product.columns = cat_not_product.columns.str.lower()
for col in cat_not_product.columns:
    cat_not_product[col] = pd.to_numeric(cat_not_product[col], errors='ignore') 
print(f"‚úì Category buyers (not product): {cat_not_product.retailer_id.nunique():,}")   

‚úì Category buyers (not product): 54,430


### 5.4 Out of Cycle Retailers
Find retailers whose regular purchase cycle for this product has expired.


In [39]:
query = f'''
with selected_prods as (
select * 
from(
VALUES
{selected_skus_tuple}
)x(product_id, warehouse_id, district_id)
)
select retailer_id, product_id, warehouse_id, district_id
from (
select *, last_o_date + floor(avg_cycle + (2.5 * std))::int as next_order
from(
select retailer_id, product_id, warehouse_id, district_id, max(last_o_date) as last_o_date, 
    sum(order_days * (w / all_w)) as avg_cycle, stddev(order_days) as std
from (
select *,
    max(order_num) over(partition by retailer_id, product_id, district_id) as max_orders,
    lag(o_date) over(partition by product_id, retailer_id, district_id order by o_date) as prev_order,
    o_date - prev_order as order_days,
    case when current_date - o_date = 0 then 1 else 1 / (CURRENT_DATE - o_date) end as w,
    sum(w) over(partition by product_id, retailer_id, district_id) as all_w
from (
SELECT DISTINCT
    so.id as order_id,
    so.created_at::date as o_date,
    sp.district_id,
    sp.warehouse_id as warehouse_id,
    pso.product_id as product_id,
    so.retailer_id as retailer_id,
    sum(pso.total_price) as nmv,
    row_number() over(partition by so.retailer_id, pso.product_id, sp.district_id order by o_date desc) as order_num,
    max(o_date) over(partition by so.retailer_id, pso.product_id, sp.district_id) as last_o_date

FROM product_sales_order pso
JOIN sales_orders so ON so.id = pso.sales_order_id
JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id = so.retailer_id
JOIN districts on districts.id = materialized_views.retailer_polygon.district_id
JOIN selected_prods sp on sp.product_id = pso.product_id 
    AND sp.warehouse_id = pso.warehouse_id 
    AND sp.district_id = districts.id

WHERE so.created_at::date >= date_trunc('month', current_date - interval '1 year')
    AND so.sales_order_status_id not in (7, 12)
    AND so.channel IN ('telesales', 'retailer')
    AND pso.purchased_item_count <> 0
GROUP BY 1, 2, 3, 4, 5, 6
)
where last_o_date >= current_date - 60
qualify max_orders >= 4
)
where prev_order is not null 
group by all
)
where CURRENT_DATE >= next_order
)
'''
out_of_cycle = snowflake_query("Egypt", query)
out_of_cycle.columns = out_of_cycle.columns.str.lower()
for col in out_of_cycle.columns:
    out_of_cycle[col] = pd.to_numeric(out_of_cycle[col], errors='ignore')  
print(f"‚úì Out of cycle retailers: {out_of_cycle.retailer_id.nunique():,}")   

‚úì Out of cycle retailers: 5,949


### 5.5 Viewed But Didn't Order
Find retailers who viewed the brand/category in the app but didn't complete a purchase.


In [40]:
query = f'''
with selected_prods as (
select * 
from(
VALUES
{selected_skus_tuple}
)x(product_id, warehouse_id, district_id)
),
selected_prods_with_brand_cat as (
select distinct sp.warehouse_id, sp.district_id, c.id as cat_id, b.id as brand_id, sp.product_id
from selected_prods sp
join products p on p.id = sp.product_id 
join brands b on b.id = p.brand_id 
join categories c on c.id = p.category_id
),
brand_open as (
select        
    event_date,
    event_timestamp,
    vb.retailer_id,
    vb.brand_id,
    vb.brand_name,
    vb.category_id,
    c.name_ar as cat_name

FROM maxab_events.view_brand vb
join categories c on c.id = vb.category_id
WHERE event_timestamp::date between CURRENT_DATE - 10 and CURRENT_DATE - 2
    AND country LIKE '%Egypt%'
    AND user_id LIKE '%EG_retailers_%'
    and brand_id <> 'null'
),
add_to_cart as (
SELECT 
    event_date,
    event_timestamp,
    uc.retailer_id,
    productsid AS product_id,
    b.id as brand_id
FROM maxab_events.update_cart uc
join products p on p.id = uc.productsid 
join brands b on b.id = p.brand_id 
WHERE event_timestamp::date between CURRENT_DATE - 10 and CURRENT_DATE - 2
    AND country LIKE '%Egypt%'
    AND update_type = 'add'
    AND user_id LIKE '%EG_retailers_%'
    AND productsid REGEXP '^[0-9]+$'
),
in_stock_retailers as(
select distinct retailer_id 
from sales_orders 
where sales_order_status_id = 6 
and channel in ('retailer', 'telesales')
and created_at::date >= date_trunc('month', current_date - interval '6 months')
),
sales_data as (
select so.retailer_id, b.name_ar as brand, c.name_ar as cat, max(so.created_at::date) as o_date
from sales_orders so
join PRODUCT_SALES_ORDER pso on pso.sales_order_id = so.id 
join products p on p.id = pso.product_id
join brands b on b.id = p.brand_id 
join categories c on c.id = p.category_id
where so.created_at::date >= CURRENT_DATE - 10  
and sales_order_status_id not in (7, 12)
group by all
),
cat_brand as (
select distinct c.id as cat, b.id as brand 
from sales_orders so
join PRODUCT_SALES_ORDER pso on pso.sales_order_id = so.id 
join products p on p.id = pso.product_id
join brands b on b.id = p.brand_id 
join categories c on c.id = p.category_id
where so.created_at::date >= CURRENT_DATE - 120 
and sales_order_status_id not in (7, 12)
),
main_cte as (
select * 
from (
select x.*, case when sd.retailer_id is not null then 1 else 0 end as ordered 
from (
select *, max(event_date) over(partition by retailer_id, brand_id, category_id) as last_event
from (
select event_date, retailer_id, brand_id, brand_name, category_id,
    cat_name, sum(count_n) as total_count
from (
select bo.*, count(distinct atc.product_id) as count_n
from brand_open bo 
join cat_brand cb on bo.category_id = cb.cat and bo.brand_id = cb.brand
join in_stock_retailers isr on isr.retailer_id = bo.retailer_id 
left join add_to_cart atc on bo.retailer_id = atc.retailer_id and bo.brand_id = atc.brand_id and atc.event_timestamp >= bo.event_timestamp
group by all 
)
group by all 
)
qualify event_date = last_event
)x 
left join sales_data sd on sd.retailer_id = x.retailer_id and x.cat_name = sd.cat and x.brand_name = sd.brand and x.event_date <= sd.o_date
)
where ordered = 0 and total_count = 0 
)
select distinct m.retailer_id, sp.product_id, sp.warehouse_id, sp.district_id
from main_cte m 
JOIN materialized_views.retailer_polygon on materialized_views.retailer_polygon.retailer_id = m.retailer_id
JOIN districts on districts.id = materialized_views.retailer_polygon.district_id
JOIN selected_prods_with_brand_cat sp on sp.district_id = districts.id and sp.brand_id = m.brand_id and sp.cat_id = m.category_id
'''
view_no_orders = snowflake_query("Egypt", query)
view_no_orders.columns = view_no_orders.columns.str.lower()
for col in view_no_orders.columns:
    view_no_orders[col] = pd.to_numeric(view_no_orders[col], errors='ignore')  
print(f"‚úì View but no orders retailers: {view_no_orders.retailer_id.nunique():,}")   

‚úì View but no orders retailers: 31,990


### 5.6 Retailer Exclusions
Exclude retailers who are inactive, have recent failed orders, or are wholesale accounts.


In [41]:
query = f'''
select retailer_id
from (
SELECT  DISTINCT
retailer_id,
sales_order_status_id,
created_at::date as o_date ,
max(o_date)over(partition by retailer_id) as last_order
from sales_orders so 
WHERE  so.created_at ::date >= current_date - 120
AND so.sales_order_status_id not in (7,12)
AND so.channel IN ('telesales','retailer')
qualify o_date = last_order
)
where sales_order_status_id not in (6,9,12)
union all 
select id as retailer_id 
from retailers 
where activation = 'false'
union all 
select distinct dta.TAGGABLE_ID as retailer_id
from DYNAMIC_TAGS dt 
join dynamic_taggables dta on dt.id = dta.dynamic_tag_id 
where name like '%whole_sale%'
and dt.id > 3000
union all 
select distinct f.value::int as retailer_id 
from SKU_DISCOUNTS sd,
LATERAL FLATTEN(
    input => SPLIT(
        REPLACE(REPLACE(REPLACE(sd.retailer_ids, '{{', ''), '}}', ''), '"', ''),
        ','
    )
) f
where active = 'true'
and CONVERT_TIMEZONE('{zone_to_use}', 'Africa/Cairo', CURRENT_TIMEstamp()) between start_at and end_at

'''
exec_rets = snowflake_query("Egypt", query)
exec_rets.columns = exec_rets.columns.str.lower()
for col in exec_rets.columns:
    exec_rets[col] = pd.to_numeric(exec_rets[col], errors='ignore') 
exec_rets = exec_rets.retailer_id.unique() 
print(f"‚úì Excluded retailers: {len(exec_rets):,}")

‚úì Excluded retailers: 125,126


### 5.7 Active Quantity Discounts
Check for existing quantity discounts to avoid conflicts with SKU discounts.


In [42]:
try:
    query = f'''
    SELECT DISTINCT
        qdv.product_id,
        qd.dynamic_tag_id AS tag_id
    FROM quantity_discounts qd
    JOIN quantity_discount_values qdv 
        ON qd.id = qdv.quantity_discount_id
    WHERE ((CURRENT_TIMESTAMP AT TIME ZONE 'Africa/Cairo'
          BETWEEN qd.start_at AND qd.end_at) or ((qd.start_at::date = current_date) and (CURRENT_TIMESTAMP AT TIME ZONE 'Africa/Cairo' < qd.start_at)))
    AND qd.active = TRUE
    '''
    quantity_data =  setup_environment_2.dwh_pg_query(query, columns = ['product_id','tag_id'])
    quantity_data.columns = quantity_data.columns.str.lower()
    for col in quantity_data.columns:
        quantity_data[col] = pd.to_numeric(quantity_data[col], errors='ignore')     

    qd_data = quantity_data.copy()[['tag_id']].drop_duplicates()
    qd_data['tuple'] = "("+qd_data['tag_id'].astype(str)+")"
    qd_data = qd_data['tuple'].unique()
    qd_list = ''
    for c in qd_data:
        qd_list = qd_list+c+","
    qd_list = qd_list[:-1]

    query = f'''
    with tags as (
    select *
    from(
    values
    {qd_list}
    )x(dynamic_tag_id)

    )

    select tags.dynamic_tag_id as tag_id,taggable_id as retailer_id
    from dynamic_taggables dt  
    join tags on tags.dynamic_tag_id = dt.dynamic_tag_id
    '''
    qd_rets = snowflake_query("Egypt", query)
    for col in qd_rets.columns:
        qd_rets[col] = pd.to_numeric(qd_rets[col], errors='ignore')  

    quantity_data = quantity_data.merge(qd_rets, on='tag_id')
    quantity_data['have_quantity'] = 1
    print(f"‚úì Found {len(quantity_data):,} active quantity discounts")
except:
    quantity_data = pd.DataFrame(columns=['product_id', 'tag_id', 'retailer_id', 'have_quantity'])
    print("‚ö† No active quantity discounts found")

‚úì Found 7,469,627 active quantity discounts


In [43]:
# Fetch packing unit mappings for discount output formatting
query = '''
SELECT DISTINCT product_id, packing_unit_id 
FROM packing_unit_products
WHERE product_id <> 1309 OR (product_id = 1309 AND packing_unit_id <> 23)
'''
pus = snowflake_query("Egypt", query)
for col in pus.columns:
    pus[col] = pd.to_numeric(pus[col], errors='ignore')
print(f"‚úì Loaded {len(pus):,} packing unit mappings")     

‚úì Loaded 34,810 packing unit mappings


In [44]:
query ='''
select retailer_id,warehouse_id,1 as last_wh 
from (
SELECT  DISTINCT
		so.retailer_id,
		pso.warehouse_id,
		so.created_at::date as o_date,
		max(so.created_at::date) over(partition by so.retailer_id) as max_date

FROM product_sales_order pso
JOIN sales_orders so ON so.id = pso.sales_order_id
JOIN products on products.id=pso.product_id
JOIN brands on products.brand_id = brands.id 
JOIN categories ON products.category_id = categories.id
JOIN finance.all_cogs f  ON f.product_id = pso.product_id
                        AND f.from_date::date <= so.created_at ::date
                        AND f.to_date::date > so.created_at ::date
JOIN product_units ON product_units.id = products.unit_id  


WHERE  so.created_at::date >= current_date - 365
    AND so.sales_order_status_id not in (7,12)
    AND so.channel IN ('telesales','retailer')
    AND pso.purchased_item_count <> 0
	and pso.warehouse_id in (1,8,170,236,337,339,401,501,632,703,797,962)

GROUP BY 1,2,3
qualify o_date = max_date
)
'''
ret_wh = snowflake_query("Egypt", query)
for col in ret_wh.columns:
    ret_wh[col] = pd.to_numeric(ret_wh[col], errors='ignore')
print(f"‚úì Loaded warehouse history for {ret_wh.retailer_id.nunique():,} retailers")       

‚úì Loaded warehouse history for 114,708 retailers


### 5.8 Combine & Filter Retailers
Combine all retailer segments and apply final filters.


In [45]:
# Combine all retailer sources - now including district_id
all_retailers = pd.concat([cat_not_product, churned_dropped]).drop_duplicates().reset_index(drop=True)
all_retailers = pd.concat([all_retailers, out_of_cycle]).drop_duplicates().reset_index(drop=True)
all_retailers = pd.concat([all_retailers, view_no_orders]).drop_duplicates().reset_index(drop=True)

# Merge with last warehouse info
all_retailers = all_retailers.merge(ret_wh, on=['retailer_id', 'warehouse_id'], how='left')
all_retailers = all_retailers.fillna(0)

# Rank and filter
all_retailers['rank'] = all_retailers.groupby(['retailer_id'])['last_wh'].rank(method='dense', ascending=False).astype(int)
all_retailers = all_retailers[all_retailers['rank'] == 1]
all_retailers = all_retailers[~(all_retailers['retailer_id'].isin(exec_rets))]

print(f"‚úì Total unique retailers: {all_retailers.retailer_id.nunique():,}")

‚úì Total unique retailers: 53,873


In [46]:
# Select required columns including district_id
product_final_df = product_final_df[['product_id', 'warehouse_id', 'district_id', 'discount']]

# Merge with retailers - now matching on district_id as well for precision
final_df = product_final_df.merge(
    all_retailers[['warehouse_id', 'product_id', 'district_id', 'retailer_id']], 
    on=['warehouse_id', 'product_id', 'district_id']
)

# Filter out retailers with active quantity discounts
final_df = final_df.merge(quantity_data, on=['retailer_id', 'product_id'], how='left')
final_df = final_df[final_df['have_quantity'].isna()]

print(f"‚úì Final retailer-product combinations: {len(final_df):,}")

‚úì Final retailer-product combinations: 6,507,552


In [47]:
final_df.retailer_id.nunique()

53723

In [None]:
#final_df = final_df.groupby(['product_id', 'warehouse_id', 'district_id', 'retailer_id'])['discount'].min().reset_index()

In [48]:
final_df = final_df.sort_values('discount').drop_duplicates(
    subset=['product_id', 'warehouse_id', 'district_id', 'retailer_id']
    
)
final_df

Unnamed: 0,product_id,warehouse_id,district_id,discount,retailer_id,tag_id,have_quantity
2879090,11643,797,669,0.26,244804,,
5852594,10939,1,587,0.26,509221,,
5852595,10939,1,587,0.26,286374,,
5852596,10939,1,587,0.26,769990,,
5852597,10939,1,587,0.26,99843,,
...,...,...,...,...,...,...,...
3833358,10466,962,580,5.00,165276,,
3833359,10466,962,580,5.00,698574,,
3833360,10466,962,580,5.00,587615,,
3833353,10466,962,580,5.00,104371,,


---

## 6. Output Generation

This section prepares the discount data for upload to the pricing system.

### 6.1 Prepare Discount Data
Format the discount information for each retailer-product combination.


In [49]:
final_df = final_df.merge(pus,on='product_id')
final_df= final_df.drop_duplicates()
final_df['HH_data'] = '['+(final_df['product_id']).astype(str)+','+(final_df['packing_unit_id']).astype(str)+','+(final_df['discount']).astype(str)+']'

In [50]:
slots = ['0-12','13-17','18-23']
local_tz = pytz.timezone('Africa/Cairo')
current_hour = datetime.now(local_tz).hour
chosen_slot = [np.nan,np.nan]

for slot in slots:
    parts = slot.split("-")
    if(current_hour >= int(parts[0]) and current_hour < int(parts[1])):
        chosen_slot[0] = int(parts[0]) 
        chosen_slot[1] = int(parts[1]) 
        break
    else:
        chosen_slot[0] = 0
        chosen_slot[1] = 0 
        
today = datetime.now(local_tz)
start_hour = np.maximum(current_hour,chosen_slot[0])
if(start_hour==current_hour):
    if ((datetime.now(local_tz).minute) +10) <60:
        start_mins =  ((datetime.now(local_tz).minute) +10)
    else:
        start_mins =  ((datetime.now(local_tz).minute) +10)-60
else:
    start_mins = 30 
if ((datetime.now(local_tz).minute) +10) > 60:
    start_hour =start_hour+1
    
start_date = (today.replace(hour=start_hour, minute=start_mins, second=0, microsecond=0)+ timedelta(minutes=0)).strftime('%d/%m/%Y %H:%M')
end_date = ((today+ timedelta(days=1)).replace(hour=12, minute=59, second=0, microsecond=0)).strftime('%d/%m/%Y %H:%M')
print(start_date,end_date)

21/01/2026 14:32 22/01/2026 12:59


In [51]:
output_df  = final_df.groupby('retailer_id')['HH_data'].apply(list).reset_index()
output_df['Discounts']= output_df['HH_data'].astype(str).str.replace("'",'').str.replace(' ','')
output_df = output_df.groupby('Discounts')['retailer_id'].agg(list).reset_index()
output_df['Arabic Offer Name']= 'ÿÆÿµŸàŸÖÿßÿ™ ÿ≠ÿµÿ±Ÿäÿ©'
output_df['Start Date/Time'] = start_date
output_df['End Date/Time'] = end_date
output_df = output_df[['retailer_id','Start Date/Time','End Date/Time','Discounts','Arabic Offer Name']]
output_df['French Offer Name']=np.nan
output_df['English Offer Name']=np.nan

In [52]:
data = []
for i,row in output_df.iterrows():
    
    start_date = row['Start Date/Time']
    end_date = row['End Date/Time']
    retailers = row['retailer_id']
    discount = row['Discounts']
    name = row['Arabic Offer Name'] 
    name_f = row['French Offer Name'] 
    name_e = row['English Offer Name'] 
    
    length = len(retailers)
    if(length>100):
        iters = length//100
        remaining = length%100
        for j in range(0,iters+1):
            if(j<=iters):
                start = (j*100)
                end = (j+1)*100
                rets = retailers[start:end]
                data.append({'Discounts':discount,'retailer_id':rets,'Start Date/Time':start_date,'End Date/Time':end_date
                            ,'Arabic Offer Name':name,'French Offer Name':name_f,'English Offer Name':name_e})
            else:
                print("else new")
            
    else:
        data.append({'Discounts':discount,'retailer_id':retailers,'Start Date/Time':start_date,'End Date/Time':end_date
                            ,'Arabic Offer Name':name,'French Offer Name':name_f,'English Offer Name':name_e})
        
dfx = pd.DataFrame(data)

In [53]:
dfx['English Offer Name'] = 'Special Discounts'
dfx['Swahili Offer Name'] = ''
dfx['Rwandan Offer Name'] = ''

In [54]:
df_added = dfx.iloc[0, :].to_frame().T
df_added['retailer_id'] = "[111780,114210]"
dfx = pd.concat([dfx,df_added])

### 6.2 Generate Excel Files
Split the output into multiple Excel files for batch upload.


In [55]:
import shutil
import os
from pathlib import Path

def move_all_files(source_dir, dest_dir):
    """Copy files to destination and delete from source"""
    # Create destination directory if it doesn't exist
    os.makedirs(dest_dir, exist_ok=True)
    
    source = Path(source_dir)
    destination = Path(dest_dir)
    
    moved_count = 0
    error_count = 0
    
    for file in source.iterdir():
        if file.is_file():
            try:
                # Copy file to destination
                dest_file = destination / file.name
                #shutil.copy2(file, dest_file)  # copy2 preserves metadata
                
                # Delete from source
                file.unlink()
                
                print(f"‚úì Moved: {file.name}")
                moved_count += 1
            except Exception as e:
                print(f"‚úó Error moving {file.name}: {e}")
                error_count += 1
    
    print(f"\nSummary: {moved_count} files moved, {error_count} errors")

# Usage
move_all_files('HH_Sheets', 'HH_temp_files')

‚úì Moved: o_happy_hour_2026-01-19_NO._23.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._47.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._7.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._28.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._46.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._26.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._0.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._14.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._12.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._6.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._30.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._22.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._19.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._41.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._39.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._4.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._5.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._38.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._44.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._27.xlsx
‚úì Moved: o_happy_hour_2026-01-19_NO._29.xlsx
‚úì Moved: o_happy

In [56]:
dfx.rename(columns={'retailer_id': 'Retailers List'}, inplace=True)
dfx = dfx[['Retailers List','Start Date/Time','End Date/Time','Discounts','Arabic Offer Name','French Offer Name','English Offer Name','Swahili Offer Name','Rwandan Offer Name']]
# 500 row per sheet 
final=dfx.reset_index().drop(columns='index')
mino=final.index.min()
maxo=final.index.max()
ran = [i for i in range(mino,maxo,1000)]
for i in tqdm(range(len(ran))):
    if i+1 == len(ran):
        val1 = ran[i]
        val2 = maxo
    else:
        val1 = ran[i]
        val2 = ran[i+1] - 1
    x=final.loc[val1:val2,:]
    x.to_excel(f'HH_Sheets/o_happy_hour_{str((datetime.now()).date())}_NO._{i}.xlsx'.format(i),index=False)

100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 51/51 [00:13<00:00,  3.68it/s]


---

## 7. API Upload

This section handles the automated upload of discount files to the MaxAB pricing system.

### 7.1 API Authentication & Helper Functions
Define functions for authenticating with the MaxAB API and uploading discount files.


In [57]:
def get_secret(secret_name):
    region_name = "us-east-1"

    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    # In this sample we only handle the specific exceptions for the 'GetSecretValue' API.
    # See https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html
    # We rethrow the exception by default.

    try:
        get_secret_value_response = client.get_secret_value(SecretId=secret_name)
    except ClientError as e:
        if e.response['Error']['Code'] == 'DecryptionFailureException':
            # Secrets Manager can't decrypt the protected secret text using the provided KMS key.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'InternalServiceErrorException':
            # An error occurred on the server side.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'InvalidParameterException':
            # You provided an invalid value for a parameter.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'InvalidRequestException':
            # You provided a parameter value that is not valid for the current state of the resource.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'ResourceNotFoundException':
            # We can't find the resource that you asked for.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
    else:
        # Decrypts secret using the associated KMS CMK.
        # Depending on whether the secret is a string or binary, one of these fields will be populated.
        if 'SecretString' in get_secret_value_response:
            return get_secret_value_response['SecretString']
        else:
            return base64.b64decode(get_secret_value_response['SecretBinary'])

In [58]:
pricing_api_secret = json.loads(get_secret("prod/pricing/api/"))
username = pricing_api_secret["egypt_username"]
password = pricing_api_secret["egypt_password"]
secret = pricing_api_secret["egypt_secret"]

In [59]:
def get_access_token(url, client_id, client_secret):
    """
    get_access_token function takes three parameters and returns a session token
    to connect to MaxAB APIs

    :param url: production MaxAB token URL
    :param client_id: client ID
    :param client_secret: client sercret
    :return: session token
    """
    response = requests.post(
        url,
        data={"grant_type": "password",
              "username": username,
              "password": password},
        auth=(client_id, client_secret),
    )
    return response.json()["access_token"]

In [60]:
def preassigned_url():
    token = get_access_token('https://sso.maxab.info/auth/realms/maxab/protocol/openid-connect/token',
                             'main-system-externals',
                             secret)
    url = "https://api.maxab.info/commerce/api/admins/v1/bulk-upload/presigned-url?type=SKU_DISCOUNTS"
    payload={}
    headers = {
      'Authorization': 'bearer {}'.format(token)}

    response = requests.request("GET", url, headers=headers, data=payload)
    return response

In [61]:
def upload_sku_discount(file_name,new_url):
    url = new_url
    headers = {'Content-Type':'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'}
    with open(file_name, 'rb') as f:
        response = requests.put(new_url, data=f, headers=headers)
    return response

In [62]:
def validate_skus_discount(key):
    token = get_access_token('https://sso.maxab.info/auth/realms/maxab/protocol/openid-connect/token',
                             'main-system-externals',
                             secret)
    url = 'https://api.maxab.info/commerce/api/admins/v1/bulk-upload/sheets/validate'
    headers = {
        'Authorization': 'bearer {}'.format(token),
        'content-type':'application/json'

       }
    payload={"fileName":key,"sheetType":"SKU_DISCOUNTS"}
    response = requests.request("POST", url, headers=headers, json=payload)
    return response

In [63]:
def proceed(key):
    token = get_access_token('https://sso.maxab.info/auth/realms/maxab/protocol/openid-connect/token',
                             'main-system-externals',
                             secret)
    url = f'https://api.maxab.info/commerce/api/admins/v1/bulk-upload/sheets/proceed/{key}?uploadType=SKU_DISCOUNTS'
    headers = {
        'Authorization': 'bearer {}'.format(token),
        'content-type':'application/json'
       }
    response = requests.request("POST", url, headers=headers)
    return response

In [64]:
def listing():
    token = get_access_token('https://sso.maxab.info/auth/realms/maxab/protocol/openid-connect/token',
                             'main-system-externals',
                             secret)
    url = 'https://api.maxab.info/commerce/api/admins/v1/bulk-upload/sheets?filter=sheetType=in=(SKU_DISCOUNTS,EDIT_SKU_DISCOUNTS);status!=DELETED&limit=1'
    headers = {
        'Authorization': 'bearer {}'.format(token),
        'content-type':'application/json'
       }
    response = requests.request("GET", url, headers=headers)
    return response

In [65]:
# def sku_discount_upload_func(file_name):
#     pre_data = preassigned_url().json()
#     key = pre_data['key']
#     new_url = pre_data['preSignedUrl']
#     upload_sku_discount(file_name,new_url)
#     validation_data = validate_skus_discount(key)
#     #print('validate: ',validation_data)
#     proceed_data = proceed(key)
#     # print('proceed:',proceed_data)
#     try:
#         if proceed_data.ok and validation_data.ok:
#             print('Passed')
#         else:
#             print('Failed')
#     except:
#         print("error")
# files = [f for f in os.listdir('HH_Sheets') if os.path.isfile(os.path.join('HH_Sheets', f))]
# for file in files:
#     print(file)
#     sku_discount_upload_func('HH_Sheets/'+file)    

In [66]:
# def update_delivery_fees(token, delivery_fees_data):

#     url = 'https://api.maxab.info/commerce/api/admins/v1/delivery-fees'
    
#     headers = {
#         'Authorization': f'Bearer {token}',
#         'Content-Type': 'application/json'
#     }
    
#     response = requests.post(url, headers=headers, json=delivery_fees_data)
    
#     return response


# # Usage
# token = get_access_token('https://sso.maxab.info/auth/realms/maxab/protocol/openid-connect/token',
#                              'main-system-externals',
#                              secret)

# data = [
#      {
#     "dynamic_tag_id":3154 ,
#     "delivery_fees": 349,
#     "ticket_size": 1000000
        
#     }
# ]

# response = update_delivery_fees(token, data)
# print(f"Status Code: {response.status_code}")
# print(f"Response: {response.text}")

### 7.2 Execute Batch Upload
Upload all generated discount files to the pricing system.


In [67]:
# def delete_delivery_fees(token,data):

#     url = 'https://api.maxab.info/commerce/api/admins/v1/delivery-fees'
    
#     headers = {
#         'Authorization': f'Bearer {token}',
#         'Content-Type': 'application/json'
#     }
    
#     response = requests.delete(url, headers=headers, json=data)
    
#     return response
# data = {
#   "deliveryFeesIds": [268,269,273,272,270,271]
# }
# token = get_access_token('https://sso.maxab.info/auth/realms/maxab/protocol/openid-connect/token',
#                              'main-system-externals',
#                              secret)
# response = delete_delivery_fees(token,data)
# print(f"Status Code: {response.status_code}")
# print(f"Response: {response.text}")


In [68]:
import os
from datetime import datetime

def sku_discount_upload_func(file_name):
    """
    Upload SKU discount file and process through validation pipeline
    
    Args:
        file_name: Path to the Excel file to upload
    
    Returns:
        dict: Summary of upload and validation status
    """
    print(f"\n{'='*70}")
    print(f"üìÅ Processing file: {os.path.basename(file_name)}")
    print(f"{'='*70}")
    
    results = {
        'file': file_name,
        'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        'steps': {}
    }
    
    try:
        # Step 1: Get pre-signed URL
        print("\n[1/4] üîó Getting pre-signed upload URL...")
        pre_data = preassigned_url().json()
        key = pre_data['key']
        new_url = pre_data['preSignedUrl']
        print(f"      ‚úì Key: {key}")
        results['steps']['presigned_url'] = 'Success'
        
    except Exception as e:
        print(f"      ‚úó Failed to get pre-signed URL: {e}")
        results['steps']['presigned_url'] = f'Failed: {e}'
        return results
    
    try:
        # Step 2: Upload file
        print("\n[2/4] üì§ Uploading file to S3...")
        upload_response = upload_sku_discount(file_name, new_url)
        
        if upload_response.status_code in [200, 201, 204]:
            print(f"      ‚úì Upload successful (Status: {upload_response.status_code})")
            results['steps']['upload'] = 'Success'
        else:
            print(f"      ‚úó Upload failed (Status: {upload_response.status_code})")
            print(f"      Error: {upload_response.text}")
            results['steps']['upload'] = f'Failed: {upload_response.status_code}'
            return results
            
    except Exception as e:
        print(f"      ‚úó Upload error: {e}")
        results['steps']['upload'] = f'Failed: {e}'
        return results
    
    try:
        # Step 3: Validate file
        print("\n[3/4] ‚úÖ Validating file data...")
        validation_data = validate_skus_discount(key)
        
        if validation_data.ok:
            print(f"      ‚úì Validation passed (Status: {validation_data.status_code})")
            results['steps']['validation'] = 'Success'
            
            # Try to parse validation response
            try:
                validation_response = validation_data.json()
                if validation_response:
                    print(f"      Response: {validation_response}")
            except:
                pass
        else:
            print(f"      ‚úó Validation failed (Status: {validation_data.status_code})")
            print(f"      Error: {validation_data.text[:200]}")
            results['steps']['validation'] = f'Failed: {validation_data.status_code}'
            return results
            
    except Exception as e:
        print(f"      ‚úó Validation error: {e}")
        results['steps']['validation'] = f'Failed: {e}'
        return results
    
    try:
        # Step 4: Proceed with processing
        print("\n[4/4] ‚öôÔ∏è  Processing discounts...")
        proceed_data = proceed(key)
        
        if proceed_data.ok:
            print(f"      ‚úì Processing completed (Status: {proceed_data.status_code})")
            results['steps']['proceed'] = 'Success'
            results['status'] = 'SUCCESS'
            
            # Try to parse proceed response
            try:
                proceed_response = proceed_data.json()
                if proceed_response:
                    print(f"      Response: {proceed_response}")
            except:
                pass
        else:
            print(f"      ‚úó Processing failed (Status: {proceed_data.status_code})")
            print(f"      Error: {proceed_data.text[:200]}")
            results['steps']['proceed'] = f'Failed: {proceed_data.status_code}'
            results['status'] = 'FAILED'
            return results
            
    except Exception as e:
        print(f"      ‚úó Processing error: {e}")
        results['steps']['proceed'] = f'Failed: {e}'
        results['status'] = 'FAILED'
        return results
    
    # Final status
    print(f"\n{'='*70}")
    if results.get('status') == 'SUCCESS':
        print(f"üéâ SUCCESS: {os.path.basename(file_name)} processed successfully!")
    else:
        print(f"‚ùå FAILED: {os.path.basename(file_name)} processing failed")
    print(f"{'='*70}")
    
    return results


# Main execution with summary
def process_all_discount_files(directory='HH_Sheets'):
    """
    Process all discount files in the specified directory
    
    Args:
        directory: Directory containing Excel files to process
    """
    print(f"\n{'#'*70}")
    print(f"SKU DISCOUNT BATCH UPLOAD")
    print(f"{'#'*70}")
    print(f"üìÇ Directory: {directory}")
    print(f"‚è∞ Started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    
    # Get all files
    if not os.path.exists(directory):
        print(f"\n‚ùå ERROR: Directory '{directory}' does not exist")
        return
    
    files = [f for f in os.listdir(directory) if os.path.isfile(os.path.join(directory, f))]
    excel_files = [f for f in files if f.endswith(('.xlsx', '.xls'))]
    
    if not excel_files:
        print(f"\n‚ö†Ô∏è  WARNING: No Excel files found in '{directory}'")
        return
    
    print(f"üìä Found {len(excel_files)} file(s) to process\n")
    
    # Process each file
    all_results = []
    success_count = 0
    failed_count = 0
    
    for idx, file in enumerate(excel_files, 1):
        file_path = os.path.join(directory, file)
        print(f"\n{'‚îÄ'*70}")
        print(f"Processing {idx}/{len(excel_files)}: {file}")
        print(f"{'‚îÄ'*70}")
        
        result = sku_discount_upload_func(file_path)
        all_results.append(result)
        
        if result.get('status') == 'SUCCESS':
            success_count += 1
        else:
            failed_count += 1
    
    # Print summary
    print(f"\n\n{'#'*70}")
    print(f"BATCH PROCESSING SUMMARY")
    print(f"{'#'*70}")
    print(f"üìä Total files: {len(excel_files)}")
    print(f"‚úÖ Successful: {success_count}")
    print(f"‚ùå Failed: {failed_count}")
    print(f"‚è∞ Completed: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"{'#'*70}\n")
    
    # Detailed results table
    if failed_count > 0:
        print("\nüìã FAILED FILES DETAILS:")
        print(f"{'‚îÄ'*70}")
        for result in all_results:
            if result.get('status') != 'SUCCESS':
                print(f"\n‚ùå File: {os.path.basename(result['file'])}")
                for step, status in result['steps'].items():
                    if 'Failed' in str(status):
                        print(f"   ‚îî‚îÄ {step}: {status}")
    
    print("\n‚úÖ All files processed!")
    
    return all_results


# Usage
results = process_all_discount_files('HH_Sheets')


######################################################################
SKU DISCOUNT BATCH UPLOAD
######################################################################
üìÇ Directory: HH_Sheets
‚è∞ Started: 2026-01-21 12:22:54
üìä Found 51 file(s) to process


‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
Processing 1/51: o_happy_hour_2026-01-21_NO._42.xlsx
‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ

üìÅ Processing file: o_happy_hour_2026-01-21_NO._42.xlsx

[1/4] üîó Getting pre-signed upload URL...
      ‚úì Key: 2026-01-21-14-22-54-user-2642.xlsx

[2/4] üì§ Uploading file to S3...
      ‚úì Upload successful (Status: 200)

[3/4] ‚úÖ Validating file dat