### Price Optimization Machine Learning Model

##### Objective
We want to use a machine learning model to help us set optimal prices. The aim is to increase revenue and/or margin while taking market conditions and customer trust into account.
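For reference, this is the standard textbook way to frame that objective. It is a sketch under a simplifying assumption, not a chosen model. For a product with unit cost $c$ and demand $q(p)$ at price $p$, revenue and profit are:

$$R(p) = p\,q(p), \qquad \pi(p) = (p - c)\,q(p)$$

If demand has a constant price elasticity $\varepsilon < -1$, so that $q(p) = A\,p^{\varepsilon}$, setting the derivative of profit to zero gives:

$$\frac{d\pi}{dp} = A\,p^{\varepsilon - 1}\big[\,p + \varepsilon\,(p - c)\,\big] = 0 \quad\Longrightarrow\quad p^{*} = c\,\frac{\varepsilon}{1 + \varepsilon}$$

For example, with $\varepsilon = -2$ and $c = 10$, $p^{*} = 20$. Under the same assumption revenue alone has no interior optimum: $R(p) = A\,p^{1+\varepsilon}$ keeps rising as the price falls. That is one reason to optimize margin with explicit guardrails rather than revenue in isolation.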

##### Why Now?
I believe we are now well-positioned to design a price optimization model. We have complete access to all our current and historical Brightpearl data, including the pricing information we need.

##### Expected Benefits
- Revenue Uplift: Optimized pricing should improve revenue per product.
- Margin Protection: An optimized model could help us avoid underpricing.
- Insights: A clearer understanding of demand elasticity by product and segment.
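Elasticity is often first estimated with a log-log regression. If you regress log units on log price, the slope is the elasticity. Below is a minimal numpy sketch. `estimate_elasticity` is a hypothetical helper, and the demand curve is synthetic. On real data this naive fit is biased when prices respond to demand, and when promotions, seasonality and stockouts move both price and units. Treat it as a baseline, not a causal estimate.

```python
import numpy as np

def estimate_elasticity(prices, quantities):
    """Estimate a constant price elasticity as the slope of log(quantity) on log(price)."""
    prices = np.asarray(prices, dtype=float)
    quantities = np.asarray(quantities, dtype=float)
    # Logs need strictly positive values, so zero-price and zero-quantity rows are dropped
    mask = (prices > 0) & (quantities > 0)
    slope, _intercept = np.polyfit(np.log(prices[mask]), np.log(quantities[mask]), deg=1)
    return slope

# Hypothetical demand curve q = 1000 * p**-1.5, not real Brightpearl data
prices = np.array([50.0, 60.0, 70.0, 80.0, 90.0])
quantities = 1000 * prices ** -1.5
elasticity = estimate_elasticity(prices, quantities)
print(round(elasticity, 3))  # -1.5
```

An elasticity below -1 means demand is elastic: a price cut raises revenue. Between -1 and 0 means demand is inelastic.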

##### Scope
TBC

##### Data Needed

- Historical prices & sales (SKU × date/time × channel)
- Product costs
- Inventory & stockouts
- Promotions & discounts
- Competitor prices (open question: is this achievable for us?)
- External demand drivers (seasonality, events)
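Seasonality can be captured at first with simple calendar features derived from the order date. Below is a sketch. `add_calendar_features` is hypothetical, and it assumes the `Tax Date` column produced by the SQL query later in this notebook. Event effects such as the sales are handled separately.

```python
import pandas as pd

def add_calendar_features(df, date_col="Tax Date"):
    """Add simple seasonality features derived from an order date column."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["month"] = dates.dt.month
    out["day_of_week"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    out["week_of_year"] = dates.dt.isocalendar().week.astype(int)
    out["is_weekend"] = (dates.dt.dayofweek >= 5).astype(int)
    return out

# Hypothetical dates for illustration
example = pd.DataFrame({"Tax Date": ["2025-06-25", "2025-06-28", "2025-12-01"]})
print(add_calendar_features(example))
```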

##### Resources

- Tools: Data warehouse (Perceptium), Python ML stack, Tableau BI dashboard.

### Breakdown of Data from Tables

##### Historical prices & sales (SKU × date/time × channel) / Product costs
Order Table:
- ord_id - Order ID
- ord_invoicetaxDate - Tax Date
- ord_channelId - Channel ID
- ord_orderTypeCode - Type code (used for filtering; e.g. PC or SC may indicate a refund, to be confirmed)

Orderline Table:
- orl_ord_id - Order ID (Number for overall order)
- orl_id - OrderLine ID (Number for orderline, used to show individual lines inside of an order)
- orl_productSku - product SKU
- orl_productId - Product ID
- orl_nominalCode - For filtering (Not needed as a column)
- orl_itemCostValue - Cost (cost price for single unit of product)
- orl_quantity - Quantity (number of items purchased)
- orl_productPriceValue - Price (price of the product at the time the order is placed)
- orl_discountPercentage - Discount percentage on the row. DO NOT USE (not dependable)
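The target grain is SKU × date × channel, but these tables hold one row per order line. Below is a sketch that collapses lines to that grain, with quantity-weighted average price and unit cost. `aggregate_daily` is hypothetical. It uses the column aliases from the SQL query below and assumes `Price of Product` and `Cost of Product` are per-unit values, as the notes above describe. Lines with a null SKU are dropped by `groupby`. A group whose units sum to zero would give an infinite or NaN average.

```python
import pandas as pd

def aggregate_daily(lines):
    """Collapse order lines to one row per SKU x day x channel."""
    df = lines.copy()
    df["Date"] = pd.to_datetime(df["Tax Date"]).dt.normalize()
    # Line-level revenue and cost, so the averages below are weighted by quantity
    df["Line Revenue"] = df["Price of Product"] * df["Quantity"]
    df["Line Cost"] = df["Cost of Product"] * df["Quantity"]
    daily = (
        df.groupby(["Product SKU", "Date", "Channel Id"], as_index=False)  # null SKUs are dropped
          .agg(Units=("Quantity", "sum"),
               Revenue=("Line Revenue", "sum"),
               Cost=("Line Cost", "sum"))
    )
    daily["Avg Price"] = daily["Revenue"] / daily["Units"]
    daily["Avg Unit Cost"] = daily["Cost"] / daily["Units"]
    return daily

# Hypothetical order lines using the query's column aliases
lines = pd.DataFrame({
    "Tax Date": ["2025-06-25", "2025-06-25", "2025-06-26"],
    "Channel Id": [2, 2, 2],
    "Product SKU": ["SKU-A", "SKU-A", "SKU-A"],
    "Quantity": [1.0, 3.0, 2.0],
    "Price of Product": [200.0, 180.0, 189.99],
    "Cost of Product": [120.0, 120.0, 120.0],
})
print(aggregate_daily(lines))
```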

In [11]:
# Import libraries (remove any that end up unused)

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import warnings
import pyodbc
warnings.filterwarnings('ignore')


from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose  

from sklearn.model_selection import train_test_split
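The imports include `train_test_split`, which shuffles rows by default (`shuffle=True`). With time-ordered sales data, a random split lets the model train on days after the ones it is tested on. A chronological cutoff is usually safer. Below is a minimal sketch. `time_based_split` and the cutoff date are hypothetical. For cross-validation, scikit-learn's `TimeSeriesSplit` is an alternative.

```python
import pandas as pd

def time_based_split(df, date_col, cutoff):
    """Split rows into train (before cutoff) and test (on/after cutoff) without shuffling."""
    dates = pd.to_datetime(df[date_col])
    cutoff = pd.Timestamp(cutoff)
    train = df[dates < cutoff]
    test = df[dates >= cutoff]
    return train, test

# Hypothetical monthly data, January to October 2025
demo = pd.DataFrame({"Tax Date": pd.date_range("2025-01-01", periods=10, freq="MS"),
                     "Units": range(10)})
train, test = time_based_split(demo, "Tax Date", "2025-08-01")
print(len(train), len(test))  # 7 3
```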

##### Loading Dataset

In [12]:
# Load dataset. The original CSV load is kept for reference; the SQL query below loads the full dataset.
#orders = pd.read_csv('Order.csv')

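# --- Step 1: Read the credentials from the text file (one key=value pair per line) ---
credentials = {}
try:
    with open('credentials.txt', 'r') as file:
        for line in file:
            line = line.strip()
            # Skip blank lines, comments and malformed lines, which would otherwise break the split below
            if not line or line.startswith('#') or '=' not in line:
                continue
            # Split at the first '=' only, so values may themselves contain '='
            key, value = line.split('=', 1)
            credentials[key.strip()] = value.strip()
except FileNotFoundError:
    # Raise rather than call exit(), which would shut down the notebook kernel
    raise FileNotFoundError("The 'credentials.txt' file was not found.") from None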

# Assign credentials to variables
server_name = credentials.get('server')
database_name = credentials.get('database')
username = credentials.get('username')
password = credentials.get('password')
driver = '{ODBC Driver 17 for SQL Server}'

# Check for missing credentials
if not all([server_name, database_name, username, password]):
    raise ValueError("One or more credentials are missing from the file.")

# --- Step 2: Establish the connection ---
try:
    conn_string = (
        f'DRIVER={driver};'
        f'SERVER={server_name};'
        f'DATABASE={database_name};'
        f'UID={username};'
        f'PWD={password};'
    )
    conn = pyodbc.connect(conn_string)
    print("Connection to Azure SQL Database successful!")

except pyodbc.Error as ex:
    print(f"Error connecting to the database: {ex.args[0]}")
    conn = None

# --- Step 3: Fetch merged data and load into a single DataFrame ---
if conn:
    try:
        # SQL query joining orders, order lines and the bundle parent/child views. Uncomment TOP(10000) while prototyping; loading the full dataset is much slower.
        merged_query = """
        SELECT DISTINCT --TOP(10000)
    o.ord_id AS [Order ID],
    o.ord_invoicetaxDate AS [Tax Date],
    o.ord_net AS [Net],
    o.ord_total AS [Total],
    o.ord_channelId AS [Channel Id],
    o.ord_orderTypeCode AS [Type Code],
    ol.orl_id AS [Orderline ID],
    ol.orl_productId AS [Product Id],
    ol.orl_productSku AS [Product SKU],
    ol.orl_productName AS [Product Name],
    ol.orl_quantity AS [Quantity],    
    CASE 
        WHEN ol.orl_compositionBundleParent = 1 THEN op.bpar_orl_calcRowNetValue
        WHEN ol.orl_compositionBundleChild = 1 THEN oc.bchd_orl_calcRowNetValue
        ELSE ol.orl_rowNetValue
    END AS [Product Value],
    CASE 
        WHEN ol.orl_compositionBundleParent = 1 THEN op.bpar_orl_calcRowTaxValue
        WHEN ol.orl_compositionBundleChild = 1 THEN oc.bchd_orl_calcRowTaxValue
        ELSE ol.orl_rowTaxValue
    END AS [Product Tax Value],
    ol.orl_productPriceValue AS [Price of Product],
    CASE 
        WHEN ol.orl_compositionBundleParent = 1 THEN op.bpar_orl_itemCostValue
        WHEN ol.orl_compositionBundleChild = 1 THEN oc.bchd_orl_itemCostValue
        ELSE ol.orl_itemCostValue
    END AS [Cost of Product],
    ol.orl_nominalCode AS [Nominal Code]
FROM dbo.tblOrder AS o
LEFT JOIN dbo.tblOrderLine AS ol ON o.ord_id = ol.orl_ord_id
LEFT JOIN Perceptium.tblOrderLineParentView AS op ON ol.orl_id = op.bpar_orl_id
LEFT JOIN Perceptium.tblOrderLineChildView AS oc ON ol.orl_id = oc.bchd_orl_id
WHERE o.ord_invoicetaxDate >= '2020-01-01' 
        """
        
        # Load the joined data directly into a single DataFrame
        orders = pd.read_sql(merged_query, conn)
        print(f"Successfully loaded {len(orders)} rows from the merged query.")
        #print("\nMerged DataFrame Head:")
        #print(orders.head())

    except Exception as e:
        print(f"An error occurred while fetching data: {e}")

    finally:
        conn.close()
        print("Database connection closed.")
else:
    print("Cannot proceed with data fetching. Database connection failed.")

Connection to Azure SQL Database successful!
Successfully loaded 5456672 rows from the merged query.
Database connection closed.
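The full query returns 5,456,672 rows. `pd.read_sql` also accepts a `chunksize` argument, which returns an iterator of DataFrames instead of a single frame. Concatenating every chunk still holds the whole result in memory. The real gain comes from filtering or downcasting each chunk before keeping it. Below is a minimal sketch. An in-memory SQLite table stands in for the Azure SQL source so the example runs anywhere, and `read_in_chunks` is hypothetical. In recent pandas versions, a raw `pyodbc` connection works but raises a UserWarning. The notebook's warnings filter hides that warning. pandas officially supports only SQLAlchemy connectables and `sqlite3` connections.

```python
import sqlite3
import pandas as pd

def read_in_chunks(query, conn, chunksize=100_000):
    """Read a query result chunk by chunk and concatenate the (optionally reduced) chunks."""
    chunks = []
    for chunk in pd.read_sql(query, conn, chunksize=chunksize):
        # Per-chunk filtering or dtype downcasting would go here to reduce memory
        chunks.append(chunk)
    return pd.concat(chunks, ignore_index=True)

# In-memory SQLite table standing in for the real database
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE lines (id INTEGER, qty REAL)")
demo_conn.executemany("INSERT INTO lines VALUES (?, ?)", [(i, float(i % 3)) for i in range(250)])
demo_df = read_in_chunks("SELECT id, qty FROM lines", demo_conn, chunksize=100)
demo_conn.close()
print(len(demo_df))  # 250
```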


In [13]:
#Info
orders.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5456672 entries, 0 to 5456671
Data columns (total 16 columns):
 #   Column             Dtype         
---  ------             -----         
 0   Order ID           int64         
 1   Tax Date           datetime64[ns]
 2   Net                float64       
 3   Total              float64       
 4   Channel Id         int64         
 5   Type Code          object        
 6   Orderline ID       float64       
 7   Product Id         float64       
 8   Product SKU        object        
 9   Product Name       object        
 10  Quantity           float64       
 11  Product Value      float64       
 12  Product Tax Value  float64       
 13  Price of Product   float64       
 14  Cost of Product    float64       
 15  Nominal Code       float64       
dtypes: datetime64[ns](1), float64(10), int64(2), object(3)
memory usage: 666.1+ MB
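`Orderline ID`, `Product Id` and `Nominal Code` are stored as `float64` because the LEFT JOIN leaves some of them null, and NumPy integer columns cannot hold NaN. pandas' nullable `Int64` dtype keeps them as whole numbers, with `<NA>` for missing values. `category` reduces memory for repeated strings such as `Type Code`. Below is a sketch. `tidy_dtypes` is hypothetical and assumes these columns only ever contain whole numbers; the cast raises otherwise.

```python
import numpy as np
import pandas as pd

def tidy_dtypes(df):
    """Restore integer IDs (float64 only because of NaNs) and shrink repeated strings."""
    out = df.copy()
    for col in ["Orderline ID", "Product Id", "Nominal Code"]:
        if col in out:
            out[col] = out[col].astype("Int64")  # nullable integer keeps <NA> for missing values
    if "Type Code" in out:
        out["Type Code"] = out["Type Code"].astype("category")
    return out

# Hypothetical rows for illustration
demo = pd.DataFrame({"Orderline ID": [18816.0, np.nan], "Type Code": ["PO", "SO"]})
tidy = tidy_dtypes(demo)
print(tidy.dtypes)
```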


In [14]:
# Null values per column
orders.isnull().sum()

Order ID                   0
Tax Date                   0
Net                        1
Total                      1
Channel Id                 0
Type Code                  0
Orderline ID            3501
Product Id              3501
Product SKU          1006519
Product Name            3501
Quantity                3501
Product Value           4190
Product Tax Value       4190
Price of Product        3501
Cost of Product         4190
Nominal Code            3501
dtype: int64
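The null counts point to two separate issues. First, `Orderline ID`, `Product Id`, `Product Name`, `Quantity`, `Price of Product` and `Nominal Code` all have exactly 3,501 nulls. That is consistent with orders that have no order lines, since the LEFT JOIN from `tblOrder` still returns the order. Second, `Product SKU` is null far more often (1,006,519 rows), for example on manual lines like the last row of the order preview at the end of this notebook. That preview also shows `PO` type codes, which look like purchase orders rather than sales.

Below is a hedged cleaning sketch. `clean_order_lines` is hypothetical. `SALES_TYPE_CODES = {"SO"}` is an assumption to confirm against Brightpearl's order type codes (see the open question on `ord_orderTypeCode`). Credits may carry negative quantities, so the positive-quantity filter should be revisited if refunds are modelled.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: "SO" marks sales orders; confirm the Brightpearl type codes before relying on this
SALES_TYPE_CODES = {"SO"}

def clean_order_lines(df, type_codes=SALES_TYPE_CODES):
    """Keep sales order lines that have an order line, a SKU and a positive quantity."""
    mask = (
        df["Orderline ID"].notna()
        & df["Product SKU"].notna()
        & df["Type Code"].isin(type_codes)
        & (df["Quantity"] > 0)
    )
    return df.loc[mask].copy()

# Hypothetical rows: only the first one should survive
demo_lines = pd.DataFrame({
    "Orderline ID": [1.0, 2.0, np.nan, 4.0],
    "Product SKU": ["SKU-A", None, None, "SKU-B"],
    "Type Code": ["SO", "SO", "SO", "PO"],
    "Quantity": [1.0, 2.0, np.nan, 1.0],
})
print(clean_order_lines(demo_lines))
```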

### Promotion Data

We have a couple of CSVs with promotional data that would be beneficial to apply in this notebook.

In [15]:
# Summer Sale 2025 - 25th of June to 27th of August
summerSale25 = pd.read_csv("C:/Users/Devin Ferko/Desktop/Codes/Machine Learning Projects/Price Optimization/Summer Sale 2025 - All.csv")
summerSale25.head()

|   | Sale Type | SKU | Name | Summer - Sale Price DR | Summer - Sale Price TW | Summer - Sale Price OR |
|---|---|---|---|---|---|---|
| 0 |  | CO713DC | Crosswater Cucina Cook Industrial Style Single... |  | 189.99 |  |
| 1 |  | CO713DF | Crosswater Cucina Cook Industrial Style Single... |  | 234.99 |  |
| 2 |  | CO713DM | Crosswater Cucina Cook Industrial Style Single... |  | 234.99 |  |
| 3 |  | CO721DC | Crosswater Cucina Cook Pull Out Single Lever K... |  | 184.99 |  |
| 4 |  | CO721DM | Crosswater Cucina Cook Pull Out Single Lever K... |  | 209.99 |  |


In [16]:
# Spring Sale 2025 - March 5th to April 7th
springSale25 = pd.read_csv("C:/Users/Devin Ferko/Desktop/Codes/Machine Learning Projects/Price Optimization/Spring Sale 2025 - All.csv")
springSale25.head()

|   | Sale type | SKU | Name | Spring - Sale Price DR | Spring - Sale Price TW | Spring - Sale Price OR |
|---|---|---|---|---|---|---|
| 0 | Overstock | NOT-109FS/A-220-C/P | Vado Notion Wall Mounted Single Lever Basin Mi... | 314.99 | 314.99 |  |
| 1 | Overstock | R1SV-CHR | - | 399.99 | 399.99 |  |
| 2 |  | dr-1700p-reinforced-bath-pack-1 | (DC) Drench P Shaped Reinforced Shower Bath & ... | 479.99 | 479.99 |  |
| 3 |  | dr-1700p-rh-reinforced-bath-pk-1 | - | 479.99 | 479.99 |  |
| 4 | Overstock | TPM1CM/ | Rangemaster Parma Kitchen Mixer Tap - Chrome |  | 189.99 |  |
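Both sale CSVs store one price column per channel. Reshaping them to one row per SKU and channel makes them easy to join onto order lines. Below is a sketch using `DataFrame.melt`. `sale_prices_long` is hypothetical. The channel mapping (DR = 7, TW = 2, OR = 8) is taken from the commented-out summer sale conditions further down and should be confirmed. A blank price is treated as "not on sale in that channel".

```python
import numpy as np
import pandas as pd

# Channel mapping taken from the commented-out summer sale conditions; please confirm
CHANNEL_IDS = {"DR": 7, "TW": 2, "OR": 8}

def sale_prices_long(sale_df, prefix):
    """Reshape per-channel sale price columns into one row per SKU x channel."""
    price_cols = {f"{prefix} - Sale Price {code}": code for code in CHANNEL_IDS}
    long = sale_df.melt(id_vars="SKU", value_vars=list(price_cols),
                        var_name="Column", value_name="Sale Price")
    long["Channel Id"] = long["Column"].map(price_cols).map(CHANNEL_IDS)
    # A blank price means the SKU was not on sale in that channel
    return long.dropna(subset=["SKU", "Sale Price"]).drop(columns="Column")

# Hypothetical rows in the same layout as the summer CSV
summer_demo = pd.DataFrame({
    "SKU": ["SKU-A", "SKU-B"],
    "Summer - Sale Price DR": [np.nan, 399.99],
    "Summer - Sale Price TW": [189.99, 399.99],
    "Summer - Sale Price OR": [np.nan, np.nan],
})
summer_long = sale_prices_long(summer_demo, "Summer")
print(summer_long)
```

On the real data this would be called as `sale_prices_long(summerSale25, "Summer")` and `sale_prices_long(springSale25, "Spring")`.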


### Merge with Promotional Data, or Just Flag Whether a Sale Was Present?
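One way to settle this question with data is to look at order lines inside a sale window. Check how often the recorded `Price of Product` already equals the listed sale price. If it nearly always does, the order lines already carry the sale price and a flag is enough. Below is a sketch. `share_at_sale_price`, `price_multiplier` and `tol` are hypothetical, and the sale prices are assumed to be in long form (columns `SKU`, `Channel Id`, `Sale Price`). In the order preview, `Product Tax Value` is 20% of `Price of Product`. That suggests the column is net of tax, while the sale CSV prices (e.g. 189.99) look VAT-inclusive. If so, pass `price_multiplier=1.2` before comparing.

```python
import pandas as pd

def share_at_sale_price(lines, sale_prices, start, end, price_multiplier=1.0, tol=0.01):
    """Fraction of in-window order lines whose recorded price matches the listed sale price."""
    dates = pd.to_datetime(lines["Tax Date"]).dt.normalize()
    in_window = lines[dates.between(pd.Timestamp(start), pd.Timestamp(end))]
    # Match on SKU and channel so each line is compared with its own channel's sale price
    listed = sale_prices.rename(columns={"SKU": "Product SKU"})
    matched = in_window.merge(listed, on=["Product SKU", "Channel Id"], how="inner")
    if matched.empty:
        return float("nan")
    # price_multiplier converts the recorded price (e.g. net to gross) before comparing
    diff = (matched["Price of Product"] * price_multiplier - matched["Sale Price"]).abs()
    return float((diff <= tol).mean())

# Hypothetical order lines and sale price list
demo_lines = pd.DataFrame({
    "Tax Date": ["2025-07-01", "2025-07-02", "2025-05-01"],
    "Channel Id": [2, 2, 2],
    "Product SKU": ["SKU-A", "SKU-A", "SKU-A"],
    "Price of Product": [189.99, 219.99, 219.99],
})
demo_sale = pd.DataFrame({"SKU": ["SKU-A"], "Channel Id": [2], "Sale Price": [189.99]})
print(share_at_sale_price(demo_lines, demo_sale, "2025-06-25", "2025-08-27"))  # 0.5
```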

In [17]:
# Is this needed? Can we assume the orderline data already records the price actually charged,
# so that we only need to recognise whether the product was in a sale?

'''
df_merge1 = orders.merge(summerSale25, left_on='Product SKU', right_on='SKU', how='left')
df_merge2 = df_merge1.merge(springSale25, left_on='Product SKU', right_on='SKU', how='left')
'''

In [18]:
#print(df_merge2.columns)

In [19]:
# Define the summer sale conditions per channel

'''
start_date = pd.to_datetime('2025-06-25')
end_date = pd.to_datetime('2025-08-27')
in_window = df_merge2['Tax Date'].between(start_date, end_date)

# Channel IDs: 7 = DR, 2 = TW, 8 = OR
sumConditionDR = in_window & (df_merge2['Channel Id'] == 7)
sumConditionTW = in_window & (df_merge2['Channel Id'] == 2)
sumConditionOR = in_window & (df_merge2['Channel Id'] == 8)

# Take the channel's summer sale price where one is listed; otherwise keep the recorded price
sale_price = np.select(
    [sumConditionDR, sumConditionTW, sumConditionOR],
    [df_merge2['Summer - Sale Price DR'], df_merge2['Summer - Sale Price TW'], df_merge2['Summer - Sale Price OR']],
    default=np.nan,
)
df_merge2['Price of Product'] = np.where(pd.notna(sale_price), sale_price, df_merge2['Price of Product'])
'''

In [20]:
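# Drop the helper columns added by the two merges to produce the final dataset

'''
# The second merge suffixes the duplicated 'SKU' and 'Name' columns with _x/_y (pandas default)
df_main_updated = df_merge2.drop(columns=['SKU_x', 'SKU_y', 'Name_x', 'Name_y',
                                          'Summer - Sale Price DR', 'Summer - Sale Price TW', 'Summer - Sale Price OR',
                                          'Spring - Sale Price DR', 'Spring - Sale Price TW', 'Spring - Sale Price OR'])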
print(df_main_updated)
df_main_updated.head()
'''

In [22]:
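# Add binary sale-flag columns
# Define sale windows (both ends inclusive)

sum_start_date = pd.to_datetime('2025-06-25')
sum_end_date = pd.to_datetime('2025-08-27')

spr_start_date = pd.to_datetime('2025-03-05')
spr_end_date = pd.to_datetime('2025-04-07')

# Normalise to midnight so orders on the end date are included even if a time is recorded
tax_day = orders["Tax Date"].dt.normalize()

# Summer sale 2025 flag: 1 if the SKU is on the summer sale list and the order falls in the window.
# dropna() stops blank SKU cells in the CSV from possibly matching order lines that have no SKU.
orders["Summer_Sale"] = (
    tax_day.between(sum_start_date, sum_end_date) &
    orders["Product SKU"].isin(summerSale25["SKU"].dropna())
).astype(int)

# Spring sale 2025 flag, same logic
orders["Spring_Sale"] = (
    tax_day.between(spr_start_date, spr_end_date) &
    orders["Product SKU"].isin(springSale25["SKU"].dropna())
).astype(int)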

orders.head()

|   | Order ID | Tax Date | Net | Total | Channel Id | Type Code | Orderline ID | Product Id | Product SKU | Product Name | Quantity | Product Value | Product Tax Value | Price of Product | Cost of Product | Nominal Code | Summer_Sale | Spring_Sale |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10821 | 2021-08-01 | 0.0 | 0.0 | 2 | PO | 18816.0 | 7326.0 | 1067A | Tre Mercati Imperial Pair Of Stands Only - Chrome | 1.0 | 27.75 | 5.55 | 27.75 | 27.75 | 5000.0 | 0 | 0 |
| 1 | 17122 | 2021-08-01 | 0.0 | 0.0 | 2 | PO | 31787.0 | 11164.0 | clrwtr-cls | Clearwater Utility White Ceramic Small Laundry... | 1.0 | 102.15 | 20.43 | 102.15 | 102.15 | 5000.0 | 0 | 0 |
| 2 | 17892 | 2021-08-01 | 0.0 | 0.0 | 2 | PO | 33325.0 | 7003.0 | AC-313-C | Sagittarius Lockable Medicine Cabinet | 1.0 | 13.13 | 2.63 | 13.13 | 13.13 | 5000.0 | 0 | 0 |
| 3 | 21405 | 2021-08-01 | 0.0 | 0.0 | 1 | PO | 40781.0 | 11154.0 | clrwtr-so1b | Clearwater Sonnet White Ceramic Single Bowl Si... | 1.0 | 125.55 | 25.11 | 125.55 | 125.55 | 5000.0 | 0 | 0 |
| 4 | 17122 | 2021-08-01 | 0.0 | 0.0 | 2 | PO | 48814.0 | 1000.0 |  | THEY SAID THEY NEVER INVOICED US FOR IT, SO NO... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5000.0 | 0 | 0 |
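The flags above mark any in-window order for a listed SKU, whatever the channel. The sale CSVs, however, list prices per channel (DR, TW, OR), and the commented-out cell ties those to channel IDs 7, 2 and 8. Below is a sketch of a channel-aware flag. `flag_channel_sale` is hypothetical and expects the sale list in long form (columns `SKU`, `Channel Id`). The membership test is a plain Python loop, which is simpler but slower on 5.4M rows than a fully vectorized approach.

```python
import pandas as pd

def flag_channel_sale(lines, sale_prices, start, end):
    """1 where the order falls in the sale window AND its SKU was on sale in that channel, else 0."""
    on_sale_pairs = set(zip(sale_prices["SKU"], sale_prices["Channel Id"]))
    pair_on_sale = pd.Series(
        [(sku, ch) in on_sale_pairs for sku, ch in zip(lines["Product SKU"], lines["Channel Id"])],
        index=lines.index,
    )
    # Normalise to midnight so orders on the end date are included even if times are recorded
    dates = pd.to_datetime(lines["Tax Date"]).dt.normalize()
    in_window = dates.between(pd.Timestamp(start), pd.Timestamp(end))
    return (in_window & pair_on_sale).astype(int)

# Hypothetical rows: on sale in channel 2 only
demo_lines = pd.DataFrame({
    "Tax Date": ["2025-07-01", "2025-07-01", "2025-09-01"],
    "Channel Id": [2, 7, 2],
    "Product SKU": ["SKU-A", "SKU-A", "SKU-A"],
})
demo_sale = pd.DataFrame({"SKU": ["SKU-A"], "Channel Id": [2], "Sale Price": [189.99]})
print(flag_channel_sale(demo_lines, demo_sale, "2025-06-25", "2025-08-27").tolist())  # [1, 0, 0]
```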
