# **KM Master Discrepancy Detection System**

![Python Code](https://img.shields.io/badge/Python-Code-blue?logo=python&logoColor=white)
![Sheets API](https://img.shields.io/badge/Google_Sheets-API-34A853?logo=googlesheets&logoColor=white)

A systematic framework to identify and prioritize discrepancies in KM Master (transportation distance) data. 

This system provides **KM Master recommendations** for each store, enabling operations teams to efficiently **prioritize** which KM Master data should be validated.

> **Disclaimer:** Due to confidentiality, actual data is not included in this repository. Some values are replaced with the '*' symbol.

# Setup and Libraries

In [1]:
# setup source path
import sys
import os

src_path = os.path.join(os.getcwd(), "..", "src")

if src_path not in sys.path:
    sys.path.append(src_path)
    print(f"Added {src_path[34:]} to sys.path")
else:
    print(f"{src_path[34:]} already in sys.path")

# initialize configuration and environment variables
import config
from dotenv import load_dotenv
load_dotenv(config.ENV_PATH)

# set hide_values parameter
# config.HIDE_VALUES = False  # uncomment to show numeric values

# import source modules
from google_sheets_io import sheets_loader, sheets_updater
from data_preprocessing import convert_to_op_code, correct_scientific_notation
from utils import setup_logging, mask_numeric_value, DataTracker, result_summary

# import common libraries
import pandas as pd
import numpy as np
import logging
import re

# setup logging
setup_logging()
logger = logging.getLogger(__name__)

Added km-master-discrepancy-detection-system\notebooks\..\src to sys.path


# Data

The data is derived from two sources: **Operational Data** and **Master Zona Data**. Both datasets must be cleaned and combined before analysis can be performed.

## A. Operational Data

In [2]:
# read path for operational data
op_data_fns = ["1. Operational Data - Januari 2025.csv",
               "2. Operational Data - Februari 2025.csv",
               "3. Operational Data - Maret 2025.csv",
               "4. Operational Data - April 2025.csv",
               "5. Operational Data - Mei 2025.csv",
               "6. Operational Data - Juni 2025.csv",
               "7. Operational Data - Juli 2025.csv",
               "8. Operational Data - Agustus 2025.csv"]

op_data_paths = [os.path.join(config.DATA_PATH, filename) for filename in op_data_fns]

# read operational data
df_op_data = pd.concat([pd.read_csv(path, sep=';', low_memory=False, dtype=str) for path in op_data_paths], ignore_index=True)
logger.info(f"Loaded {mask_numeric_value(f'{len(df_op_data):,}')} rows of operational data")

# show column names
print("\nOperational Data Column Names:")
display(df_op_data.head(0))

2025-12-10 06:53:39 - __main__ - INFO - Loaded *,***,*** rows of operational data



Operational Data Column Names:


| Tgl SLA | OP | Nomor SPJ | Toko | KM Tempuh | KM Max | KM Master |
|---|---|---|---|---|---|---|


In [3]:
# track preprocessing operational data
tracker_op_data = DataTracker("Operational Data")
tracker_op_data.track(df_op_data, step_name="Initial Rows")

2025-12-10 06:53:39 - utils - INFO - Initialized DataTracker for: [Operational Data]
2025-12-10 06:53:39 - utils - INFO - [Operational Data] Step: Initial Rows | Counts: *,***,*** | Change: +0 (+0.00%) | Retention: 100.00% | Step Time: 0.00s | Cumulative Time: 0.00s
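
`DataTracker` lives in `utils` and its source is not shown in this notebook. The sketch below is an assumption about how such a tracker could produce the counts, change, and retention figures seen in the logs. `SimpleTracker` and its fields are hypothetical names. The logged numbers suggest that `Change (%)` is relative to the previous step and `Retained (%)` is relative to the initial row count, so the sketch follows that reading.

```python
import pandas as pd


class SimpleTracker:
    """Hypothetical stand-in for utils.DataTracker (assumed behaviour only)."""

    def __init__(self, name):
        self.name = name
        self.steps = []  # one dict per tracked step

    def track(self, df, step_name):
        count = len(df)
        initial = self.steps[0]["Counts"] if self.steps else count
        previous = self.steps[-1]["Counts"] if self.steps else count
        change = count - previous
        self.steps.append({
            "Step": step_name,
            "Counts": count,
            "Change": change,
            # change relative to the previous step (inferred from the logs)
            "Change (%)": round(change / previous * 100, 2) if previous else 0.0,
            # retention relative to the very first step
            "Retained (%)": round(count / initial * 100, 2) if initial else 0.0,
        })

    def summary(self):
        return pd.DataFrame(self.steps)


tracker = SimpleTracker("Demo")
df_demo = pd.DataFrame({"x": range(10)})
tracker.track(df_demo, "Initial Rows")
tracker.track(df_demo[df_demo["x"] >= 2], "Drop x < 2")
demo_summary = tracker.summary()
print(demo_summary)
```

Timing columns are left out to keep the sketch short.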


In [4]:
# drop rows with all NaN value
df_op_data = df_op_data.dropna(how='all')
tracker_op_data.track(df_op_data, step_name="Drop Rows with All NaN Values")

# drop rows with duplicate value
df_op_data = df_op_data.drop_duplicates()
tracker_op_data.track(df_op_data, step_name="Drop Rows with Duplicate Values")

2025-12-10 06:53:41 - utils - INFO - [Operational Data] Step: Drop Rows with All NaN Values | Counts: *,***,*** | Change: -***,*** (-8.29%) | Retention: 91.71% | Step Time: 2.48s | Cumulative Time: 2.48s
2025-12-10 06:53:47 - utils - INFO - [Operational Data] Step: Drop Rows with Duplicate Values | Counts: *,***,*** | Change: +0 (+0.00%) | Retention: 91.71% | Step Time: 5.41s | Cumulative Time: 7.88s


In [5]:
# convert Operating Point (OP) name to OP code
df_op_data = convert_to_op_code(df_op_data, sheets_url=os.getenv('SHEETS_URL'))
tracker_op_data.track(df_op_data, step_name="Convert OP Name to OP Code")

2025-12-10 06:53:47 - data_preprocessing - INFO - Converting OP names to OP codes using method: complete
2025-12-10 06:53:47 - google_sheets_io - INFO - Loading Google Sheets from URL: https://docs.google.com/spread [Redacted]...
2025-12-10 06:53:47 - google_sheets_io - INFO - Credentials loaded successfully
2025-12-10 06:53:47 - google_sheets_io - INFO - Google Sheets API authorized
2025-12-10 06:53:49 - google_sheets_io - INFO - Sheet opened successfully
2025-12-10 06:53:49 - data_preprocessing - INFO - Loaded *** OP codes from sheets Master Kode OP
2025-12-10 06:53:59 - data_preprocessing - INFO - All OP names successfully converted to OP codes
2025-12-10 06:53:59 - utils - INFO - [Operational Data] Step: Convert OP Name to OP Code | Counts: *,***,*** | Change: +0 (+0.00%) | Retention: 91.71% | Step Time: 12.26s | Cumulative Time: 20.14s


In [6]:
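# exclude ignored OPs (EXCLUDE_OPS is a comma-separated list; default to "" so a missing variable does not raise AttributeError)
exclude_ops = os.getenv('EXCLUDE_OPS', '').split(',')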
df_op_data = df_op_data[~df_op_data['OP'].isin(exclude_ops)]
tracker_op_data.track(df_op_data, step_name="Exclude Rows with Ignored OP")

# exclude ignored stores
exclude_stores = re.compile(rf"{os.getenv('EXCLUDE_STORES')}")
df_op_data = df_op_data[~df_op_data['Toko'].str.contains(exclude_stores, na=False)]
tracker_op_data.track(df_op_data, step_name="Exclude Rows with Ignored Store")

2025-12-10 06:54:00 - utils - INFO - [Operational Data] Step: Exclude Rows with Ignored OP | Counts: *,***,*** | Change: -**,*** (-0.30%) | Retention: 91.44% | Step Time: 0.68s | Cumulative Time: 20.83s
2025-12-10 06:54:02 - utils - INFO - [Operational Data] Step: Exclude Rows with Ignored Store | Counts: *,***,*** | Change: -*,*** (-0.03%) | Retention: 91.41% | Step Time: 2.69s | Cumulative Time: 23.52s
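
The exclusion step reads a comma-separated list from `EXCLUDE_OPS` and a regular expression from `EXCLUDE_STORES`. The real values are confidential, so the small sketch below uses made-up values to show how the two filters behave, including how `na=False` keeps rows whose store name is missing.

```python
import os

import pandas as pd

# Hypothetical values; the real ones are stored in the project's .env file.
os.environ["EXCLUDE_OPS"] = "OP9, OP8"
os.environ["EXCLUDE_STORES"] = r"^TEST|DUMMY"

df_demo = pd.DataFrame({
    "OP": ["OP1", "OP9", "OP2", "OP3"],
    "Toko": ["A001", "A002", "TEST01", None],
})

# Strip whitespace and skip empty entries so "OP9, OP8" and "OP9,OP8" behave the same.
exclude_ops = [op.strip() for op in os.getenv("EXCLUDE_OPS", "").split(",") if op.strip()]
df_demo = df_demo[~df_demo["OP"].isin(exclude_ops)]

# os.environ[...] raises KeyError if the pattern is missing, which is safer than
# falling back to an empty pattern (an empty regex matches every store).
store_pattern = os.environ["EXCLUDE_STORES"]
df_demo = df_demo[~df_demo["Toko"].str.contains(store_pattern, na=False, regex=True)]

print(df_demo)
```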


In [7]:
# correct scientific notation
df_op_data = correct_scientific_notation(df_op_data, sheets_url=os.getenv('SHEETS_URL'))
tracker_op_data.track(df_op_data, step_name="Correct Scientific Notation")

2025-12-10 06:54:03 - data_preprocessing - INFO - Starting scientific notation correction for store codes
2025-12-10 06:54:03 - google_sheets_io - INFO - Loading Google Sheets from URL: https://docs.google.com/spread [Redacted]...
2025-12-10 06:54:03 - google_sheets_io - INFO - Credentials loaded successfully
2025-12-10 06:54:03 - google_sheets_io - INFO - Google Sheets API authorized
2025-12-10 06:54:04 - google_sheets_io - INFO - Sheet opened successfully
2025-12-10 06:54:05 - data_preprocessing - INFO - Loaded *** store mappings from sheets Master Saintifik Toko
2025-12-10 06:54:10 - data_preprocessing - INFO - Found *,*** stores with comma delimiter, * with period delimiter
2025-12-10 06:55:00 - data_preprocessing - INFO - All store codes successfully corrected.
2025-12-10 06:55:00 - utils - INFO - [Operational Data] Step: Correct Scientific Notation | Counts: *,***,*** | Change: +0 (+0.00%) | Retention: 91.41% | Step Time: 57.30s | Cumulative Time: 80.82s
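
The source of `correct_scientific_notation` is not shown here. Judging from the log lines, it looks up store codes that a spreadsheet has rewritten in scientific notation (with either a comma or a period as the decimal delimiter) and restores them from a mapping sheet. The sketch below illustrates that idea under those assumptions. The pattern, the mapping, and the store codes are all hypothetical.

```python
import re

import pandas as pd

# Matches strings such as "1,20E+04" (comma decimal) or "5.00E+06" (period decimal).
# The sign after "E" is required so that a genuine code like "5E06" is not flagged.
SCI_PATTERN = re.compile(r"\d+(?:[.,]\d+)?E[+-]\d+", re.IGNORECASE)

# Hypothetical lookup; in the notebook the mapping is loaded from Google Sheets.
store_mapping = {"1,20E+04": "12E3", "5.00E+06": "5E06"}

stores = pd.Series(["T001", "1,20E+04", "5.00E+06", "AB12"])

# Flag values that look like scientific notation.
is_sci = stores.map(lambda s: bool(SCI_PATTERN.fullmatch(s)))

# Keep normal codes as they are; replace flagged ones with the mapped code.
corrected = stores.where(~is_sci, stores.map(store_mapping))

# Any flagged value without a mapping would show up as NaN here.
n_unmapped = int(corrected.isna().sum())
print(corrected.tolist(), n_unmapped)
```

A lookup is used rather than a numeric conversion because the original code cannot be reliably recovered from the rounded scientific form.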


In [8]:
# change data type
for col in ['KM Tempuh', 'KM Master', 'KM Max']:
    # remove '.' and ',' symbols
    df_op_data[col] = df_op_data[col].str.replace(r'[.,]', '', regex=True)
    # change to numeric
    df_op_data[col] = pd.to_numeric(df_op_data[col], errors='coerce')
    # raise error if there is NaN data
    if df_op_data[col].isna().any():
        logger.error(f"Column {col} contains NaN values")
        raise ValueError(f"Column {col} contains NaN values")
tracker_op_data.track(df_op_data, step_name="Change Data Type")

2025-12-10 06:55:16 - utils - INFO - [Operational Data] Step: Change Data Type | Counts: *,***,*** | Change: +0 (+0.00%) | Retention: 91.41% | Step Time: 16.55s | Cumulative Time: 97.38s
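
The type conversion removes every `.` and `,` before parsing, which treats both characters as thousands separators. The toy example below (with invented values) shows the effect: `1.234` becomes `1234`, a decimal such as `12,5` would become `125`, and anything non-numeric becomes `NaN`, which is what triggers the `ValueError` guard.

```python
import pandas as pd

raw = pd.Series(["1.234", "56", "12,5", "n/a"])

# Same two steps as in the notebook cell.
cleaned = raw.str.replace(r"[.,]", "", regex=True)
numeric = pd.to_numeric(cleaned, errors="coerce")

print(numeric.tolist())
```

This is safe as long as the KM columns only ever hold whole numbers. If decimal values can occur in the source files, the separators would need to be handled separately.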


In [9]:
# remove rows with KM Tempuh < 1
df_op_data = df_op_data[df_op_data['KM Tempuh'] >= 1]
tracker_op_data.track(df_op_data, step_name="Remove Rows with KM Tempuh < 1")

# add column dev_km
df_op_data['dev_km'] = (df_op_data['KM Tempuh'] - df_op_data['KM Master']) / df_op_data['KM Master'] * 100

# save tuple of unique OP and store for preprocessing master zona data
unique_pairs = df_op_data[['OP', 'Toko']].drop_duplicates()
unique_op_store_op_data = tuple(map(tuple, unique_pairs.values))

tracker_op_data.track(df_op_data, step_name="Add dev_km Column and Save Unique OP + Store")

2025-12-10 06:55:17 - utils - INFO - [Operational Data] Step: Remove Rows with KM Tempuh < 1 | Counts: *,***,*** | Change: -*** (-0.02%) | Retention: 91.39% | Step Time: 0.56s | Cumulative Time: 97.94s
2025-12-10 06:55:18 - utils - INFO - [Operational Data] Step: Add dev_km Column and Save Unique OP + Store | Counts: *,***,*** | Change: +0 (+0.00%) | Retention: 91.39% | Step Time: 1.40s | Cumulative Time: 99.34s
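
`dev_km` is the percentage deviation of the travelled distance from the master distance: `(KM Tempuh - KM Master) / KM Master * 100`. A short worked example with invented numbers, including what happens when `KM Master` is 0:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "KM Tempuh": [150, 50, 50],
    "KM Master": [100, 100, 0],
})

df_demo["dev_km"] = (df_demo["KM Tempuh"] - df_demo["KM Master"]) / df_demo["KM Master"] * 100

# Row 0: 50% above master, row 1: 50% below master,
# row 2: division by zero yields inf rather than raising an error.
print(df_demo)
```

Rows with `KM Master` equal to 0 therefore carry an infinite deviation and may need separate handling downstream.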


In [10]:
# show summary
tracker_op_data.summary()

2025-12-10 06:55:18 - utils - INFO - [Operational Data] DataTracker summary generated for 10 steps.


| | Step | Counts | Change | Change (%) | Retained (%) | Duration (s) | Cumulative Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | Initial Rows | `*,***,***` | +0 | 0.0 | 100.0 | 0.0 | 0.0 |
| 1 | Drop Rows with All NaN Values | `*,***,***` | `-***,***` | -8.29 | 91.71 | 2.48 | 2.48 |
| 2 | Drop Rows with Duplicate Values | `*,***,***` | +0 | 0.0 | 91.71 | 5.41 | 7.88 |
| 3 | Convert OP Name to OP Code | `*,***,***` | +0 | 0.0 | 91.71 | 12.26 | 20.14 |
| 4 | Exclude Rows with Ignored OP | `*,***,***` | `-**,***` | -0.3 | 91.44 | 0.68 | 20.83 |
| 5 | Exclude Rows with Ignored Store | `*,***,***` | `-*,***` | -0.03 | 91.41 | 2.69 | 23.52 |
| 6 | Correct Scientific Notation | `*,***,***` | +0 | 0.0 | 91.41 | 57.3 | 80.82 |
| 7 | Change Data Type | `*,***,***` | +0 | 0.0 | 91.41 | 16.55 | 97.38 |
| 8 | Remove Rows with KM Tempuh < 1 | `*,***,***` | `-***` | -0.02 | 91.39 | 0.56 | 97.94 |
| 9 | Add dev_km Column and Save Unique OP + Store | `*,***,***` | +0 | 0.0 | 91.39 | 1.4 | 99.34 |


## B. Master Zona Data

In [11]:
# master zona data
mz_data_fn = "Master Zona Data.csv"
mz_data_path = os.path.join(config.DATA_PATH, mz_data_fn)

# read master zona data
df_mz_data = pd.read_csv(mz_data_path, sep=';', low_memory=False, dtype=str, encoding='latin-1')

logger.info(f"Loaded {mask_numeric_value(f'{len(df_mz_data):,}')} rows of master zona data")

# show column names
print("\nMaster Zona Data Column Names:")
display(df_mz_data.head(0))

2025-12-10 06:55:19 - __main__ - INFO - Loaded **,*** rows of master zona data



Master Zona Data Column Names:


| OP | Kode Customer | Customer | Toko | KM Master | Kode Zona | Kecamatan | Kota | Provinsi | Status Toko |
|---|---|---|---|---|---|---|---|---|---|


In [12]:
# track preprocessing master zona data
tracker_mz_data = DataTracker("Master Zona Data")
tracker_mz_data.track(df_mz_data, step_name="Initial Rows")

2025-12-10 06:55:19 - utils - INFO - Initialized DataTracker for: [Master Zona Data]
2025-12-10 06:55:19 - utils - INFO - [Master Zona Data] Step: Initial Rows | Counts: **,*** | Change: +0 (+0.00%) | Retention: 100.00% | Step Time: 0.00s | Cumulative Time: 0.00s


In [13]:
# drop rows with all NaN value
df_mz_data = df_mz_data.dropna(how='all')
tracker_mz_data.track(df_mz_data, step_name="Drop Rows with All NaN Values")

# drop rows with NaN customer (a store without a customer has no transactions)
df_mz_data = df_mz_data.dropna(subset=['Customer'])
tracker_mz_data.track(df_mz_data, step_name="Drop Rows with NaN Customer")

# drop rows with duplicate value
df_mz_data = df_mz_data.drop_duplicates()
tracker_mz_data.track(df_mz_data, step_name="Drop Rows with Duplicate Values")

2025-12-10 06:55:19 - utils - INFO - [Master Zona Data] Step: Drop Rows with All NaN Values | Counts: **,*** | Change: +0 (+0.00%) | Retention: 100.00% | Step Time: 0.12s | Cumulative Time: 0.12s
2025-12-10 06:55:19 - utils - INFO - [Master Zona Data] Step: Drop Rows with NaN Customer | Counts: **,*** | Change: -* (-0.01%) | Retention: 99.99% | Step Time: 0.04s | Cumulative Time: 0.16s
2025-12-10 06:55:19 - utils - INFO - [Master Zona Data] Step: Drop Rows with Duplicate Values | Counts: **,*** | Change: +0 (+0.00%) | Retention: 99.99% | Step Time: 0.10s | Cumulative Time: 0.26s


In [14]:
# convert OP name to OP code with partial method
df_mz_data = convert_to_op_code(df_mz_data, sheets_url=os.getenv("SHEETS_URL"), method='partial')
tracker_mz_data.track(df_mz_data, step_name="Convert OP Name to OP Code")

2025-12-10 06:55:19 - data_preprocessing - INFO - Converting OP names to OP codes using method: partial
2025-12-10 06:55:19 - google_sheets_io - INFO - Loading Google Sheets from URL: https://docs.google.com/spread [Redacted]...
2025-12-10 06:55:19 - google_sheets_io - INFO - Credentials loaded successfully
2025-12-10 06:55:19 - google_sheets_io - INFO - Google Sheets API authorized
2025-12-10 06:55:21 - google_sheets_io - INFO - Sheet opened successfully
2025-12-10 06:55:22 - data_preprocessing - INFO - Loaded *** OP codes from sheets Master Kode OP
2025-12-10 06:55:22 - data_preprocessing - INFO - Dropped *,*** rows with unmapped OP codes (10.31%%)
2025-12-10 06:55:22 - utils - INFO - [Master Zona Data] Step: Convert OP Name to OP Code | Counts: **,*** | Change: -*,*** (-10.31%) | Retention: 89.68% | Step Time: 3.31s | Cumulative Time: 3.57s
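
The two conversion methods behave differently in the logs: `complete` reports that every OP name was converted, while `partial` drops rows whose OP name has no code. The source of `convert_to_op_code` is not shown, so the sketch below is only one plausible reading of that behaviour. `map_op_codes`, the names, and the codes are hypothetical.

```python
import pandas as pd


def map_op_codes(df, mapping, method="complete"):
    """Hypothetical sketch of an OP-name-to-code conversion (not the project's code)."""
    out = df.copy()
    out["OP"] = out["OP"].map(mapping)  # names missing from the mapping become NaN
    unmapped = out["OP"].isna()
    if method == "complete" and unmapped.any():
        # assumed: 'complete' requires every name to have a code
        raise ValueError(f"{int(unmapped.sum())} rows have unmapped OP names")
    if method == "partial":
        # assumed: 'partial' silently drops rows without a code
        out = out[~unmapped]
    return out


op_mapping = {"Jakarta": "JKT", "Bandung": "BDG"}  # made-up names and codes
df_demo = pd.DataFrame({
    "OP": ["Jakarta", "Bandung", "Unknown"],
    "Toko": ["A001", "B001", "C001"],
})

partial = map_op_codes(df_demo, op_mapping, method="partial")
print(partial)
```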


In [15]:
# correct scientific notation
df_mz_data = correct_scientific_notation(df_mz_data, sheets_url=os.getenv("SHEETS_URL"))
tracker_mz_data.track(df_mz_data, step_name="Correct Scientific Notation")

2025-12-10 06:55:22 - data_preprocessing - INFO - Starting scientific notation correction for store codes
2025-12-10 06:55:22 - google_sheets_io - INFO - Loading Google Sheets from URL: https://docs.google.com/spread [Redacted]...
2025-12-10 06:55:22 - google_sheets_io - INFO - Credentials loaded successfully
2025-12-10 06:55:22 - google_sheets_io - INFO - Google Sheets API authorized
2025-12-10 06:55:24 - google_sheets_io - INFO - Sheet opened successfully
2025-12-10 06:55:25 - data_preprocessing - INFO - Loaded *** store mappings from sheets Master Saintifik Toko
2025-12-10 06:55:25 - data_preprocessing - INFO - Found ** stores with comma delimiter, ** with period delimiter
2025-12-10 06:55:26 - data_preprocessing - INFO - All store codes successfully corrected.
2025-12-10 06:55:26 - utils - INFO - [Master Zona Data] Step: Correct Scientific Notation | Counts: **,*** | Change: +0 (+0.00%) | Retention: 89.68% | Step Time: 3.70s | Cumulative Time: 7.26s


In [16]:
# filter based on unique_op_store_op_data
df_mz_data = df_mz_data[df_mz_data[['OP', 'Toko']].apply(tuple, axis=1).isin(unique_op_store_op_data)]
tracker_mz_data.track(df_mz_data, step_name="Filter Rows Based on Unique OP + Store")

2025-12-10 06:55:27 - utils - INFO - [Master Zona Data] Step: Filter Rows Based on Unique OP + Store | Counts: **,*** | Change: -**,*** (-36.55%) | Retention: 56.90% | Step Time: 0.70s | Cumulative Time: 7.96s
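
The filter keeps only the (OP, Toko) pairs that also occur in the operational data. Building a tuple per row with `apply(..., axis=1)` works but runs row by row in Python. As a possible alternative, the sketch below shows the same membership test through a `MultiIndex`, using toy data. Both masks are computed so they can be compared.

```python
import pandas as pd

df_master = pd.DataFrame({
    "OP": ["JKT", "JKT", "BDG"],
    "Toko": ["A001", "A002", "B001"],
    "KM Master": [10, 20, 30],
})
pairs = (("JKT", "A001"), ("BDG", "B001"))  # stand-in for unique_op_store_op_data

# Row-wise approach, as used in the notebook.
mask_apply = df_master[["OP", "Toko"]].apply(tuple, axis=1).isin(pairs)

# Alternative: build a MultiIndex from the key columns and test membership on it.
mask_index = pd.MultiIndex.from_frame(df_master[["OP", "Toko"]]).isin(list(pairs))

filtered = df_master[mask_index]
print(filtered)
```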


In [17]:
# change data type
for col in ['KM Master']:
    # remove '.' and ',' symbols
    df_mz_data[col] = df_mz_data[col].str.replace(r'[.,]', '', regex=True)
    # change to numeric
    df_mz_data[col] = pd.to_numeric(df_mz_data[col], errors='coerce')
    # raise error if there is NaN data
    if df_mz_data[col].isna().any():
        logger.error(f"Column {col} contains NaN values")
        raise ValueError(f"Column {col} contains NaN values")
tracker_mz_data.track(df_mz_data, step_name="Change Data Type")

2025-12-10 06:55:27 - utils - INFO - [Master Zona Data] Step: Change Data Type | Counts: **,*** | Change: +0 (+0.00%) | Retention: 56.90% | Step Time: 0.14s | Cumulative Time: 8.10s


In [18]:
# show summary
tracker_mz_data.summary()

2025-12-10 06:55:27 - utils - INFO - [Master Zona Data] DataTracker summary generated for 8 steps.


| | Step | Counts | Change | Change (%) | Retained (%) | Duration (s) | Cumulative Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | Initial Rows | `**,***` | +0 | 0.0 | 100.0 | 0.0 | 0.0 |
| 1 | Drop Rows with All NaN Values | `**,***` | +0 | 0.0 | 100.0 | 0.12 | 0.12 |
| 2 | Drop Rows with NaN Customer | `**,***` | `-*` | -0.01 | 99.99 | 0.04 | 0.16 |
| 3 | Drop Rows with Duplicate Values | `**,***` | +0 | 0.0 | 99.99 | 0.1 | 0.26 |
| 4 | Convert OP Name to OP Code | `**,***` | `-*,***` | -10.31 | 89.68 | 3.31 | 3.57 |
| 5 | Correct Scientific Notation | `**,***` | +0 | 0.0 | 89.68 | 3.7 | 7.26 |
| 6 | Filter Rows Based on Unique OP + Store | `**,***` | `-**,***` | -36.55 | 56.9 | 0.7 | 7.96 |
| 7 | Change Data Type | `**,***` | +0 | 0.0 | 56.9 | 0.14 | 8.1 |


## C. Merged Data

In [19]:
# merged data
df_all_data = pd.merge(
                df_op_data[[col for col in df_op_data.columns if col != "KM Master"]], 
                df_mz_data[
                    ['OP','Toko','KM Master','Kode Zona',
                     'Kecamatan','Kota','Provinsi','Status Toko']],
                how='inner', on=['OP', 'Toko']
                )

logger.info(f"Combined {mask_numeric_value(f'{len(df_all_data):,}')} rows of operational data and master zona data")

# show column names
print("\nCombined Data Column Names:")
display(df_all_data.head(0))

2025-12-10 06:55:32 - __main__ - INFO - Combined *,***,*** rows of operational data and master zona data
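
The merge attaches one master record to each operational row on (OP, Toko). `drop_duplicates()` on the master data only removes fully identical rows, so two master rows with the same key but different `KM Master` values would silently multiply operational rows. One way to guard against that, shown here on toy data, is the `validate` argument of `pd.merge`. Whether the real master data can contain such duplicates is not known from this notebook.

```python
import pandas as pd

df_ops = pd.DataFrame({
    "OP": ["JKT", "JKT", "BDG"],
    "Toko": ["A001", "A001", "B001"],
    "KM Tempuh": [12, 11, 30],
})
df_master = pd.DataFrame({
    "OP": ["JKT", "BDG", "SBY"],
    "Toko": ["A001", "B001", "C001"],
    "KM Master": [10, 28, 15],
})

# many_to_one: many operational rows per key, at most one master row per key.
merged = pd.merge(df_ops, df_master, how="inner", on=["OP", "Toko"], validate="many_to_one")
print(merged)

# Add a second master row for an existing key to show the guard firing.
df_master_dup = pd.concat(
    [df_master, df_master.iloc[[0]].assign(**{"KM Master": 99})],
    ignore_index=True,
)
try:
    pd.merge(df_ops, df_master_dup, how="inner", on=["OP", "Toko"], validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(f"Duplicate master keys detected: {duplicate_detected}")
```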



Combined Data Column Names:


| Tgl SLA | Nomor SPJ | Toko | KM Tempuh | KM Max | OP | dev_km | KM Master | Kode Zona | Kecamatan | Kota | Provinsi | Status Toko |
|---|---|---|---|---|---|---|---|---|---|---|---|---|


In [20]:
# track preprocessing merged data
tracker_all_data = DataTracker("Merged Data")
tracker_all_data.track(df_all_data, step_name="Initial Rows")

2025-12-10 06:55:32 - utils - INFO - Initialized DataTracker for: [Merged Data]
2025-12-10 06:55:32 - utils - INFO - [Merged Data] Step: Initial Rows | Counts: *,***,*** | Change: +0 (+0.00%) | Retention: 100.00% | Step Time: 0.00s | Cumulative Time: 0.00s


In [21]:
# drop rows with all NaN value
df_all_data = df_all_data.dropna(how='all')
tracker_all_data.track(df_all_data, step_name="Drop Rows with All NaN Values")

# drop rows with duplicate value
df_all_data = df_all_data.drop_duplicates()
tracker_all_data.track(df_all_data, step_name="Drop Rows with Duplicate Values")

2025-12-10 06:55:36 - utils - INFO - [Merged Data] Step: Drop Rows with All NaN Values | Counts: *,***,*** | Change: +0 (+0.00%) | Retention: 100.00% | Step Time: 4.31s | Cumulative Time: 4.31s
2025-12-10 06:55:43 - utils - INFO - [Merged Data] Step: Drop Rows with Duplicate Values | Counts: *,***,*** | Change: +0 (+0.00%) | Retention: 100.00% | Step Time: 7.21s | Cumulative Time: 11.53s


In [22]:
# filter out status toko = 0
df_all_data = df_all_data[df_all_data['Status Toko'] != '0']
tracker_all_data.track(df_all_data, step_name="Drop Rows with Status Toko = 0")

2025-12-10 06:55:45 - utils - INFO - [Merged Data] Step: Drop Rows with Status Toko = 0 | Counts: *,***,*** | Change: -*,*** (-0.13%) | Retention: 99.87% | Step Time: 1.31s | Cumulative Time: 12.83s


In [23]:
# show summary
tracker_all_data.summary()

2025-12-10 06:55:45 - utils - INFO - [Merged Data] DataTracker summary generated for 4 steps.


| | Step | Counts | Change | Change (%) | Retained (%) | Duration (s) | Cumulative Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | Initial Rows | `*,***,***` | +0 | 0.0 | 100.0 | 0.0 | 0.0 |
| 1 | Drop Rows with All NaN Values | `*,***,***` | +0 | 0.0 | 100.0 | 4.31 | 4.31 |
| 2 | Drop Rows with Duplicate Values | `*,***,***` | +0 | 0.0 | 100.0 | 7.21 | 11.53 |
| 3 | Drop Rows with Status Toko = 0 | `*,***,***` | `-*,***` | -0.13 | 99.87 | 1.31 | 12.83 |


In [24]:
# show unique OP + store
# (counts are formatted first: reusing the same quote type inside an f-string only works on Python 3.12+)
n_rows = mask_numeric_value(f"{len(df_all_data):,}")
n_pairs = mask_numeric_value(f"{len(df_all_data[['OP', 'Toko']].drop_duplicates()):,}")
print(f"{n_rows} rows in All Data correspond to {n_pairs} unique OP and Store combinations")

*,***,*** rows in All Data correspond to **,*** unique OP and Store combinations


# KM Master Method

In [25]:
# track KM Master method
tracker_km_master = DataTracker("KM Master Method")
tracker_km_master.track(df_all_data, step_name="Initial Unique OP and Store", rows_unique=['OP', 'Toko'])

2025-12-10 06:55:46 - utils - INFO - Initialized DataTracker for: [KM Master Method]
2025-12-10 06:55:47 - utils - INFO - [KM Master Method] Step: Initial Unique OP and Store | Counts: **,*** | Change: +0 (+0.00%) | Retention: 100.00% | Step Time: 0.00s | Cumulative Time: 0.00s


## Main Method

In [26]:
# get data with KM Master strictly outside the threshold range [min_km_master, max_km_master]; these need a manual recommendation
min_km_master = 1
max_km_master = 1800
df_km_master = df_all_data[(df_all_data['KM Master'] < min_km_master) | (df_all_data['KM Master'] > max_km_master)]

# drop duplicate rows and sort based on OP and Toko
df_km_master = df_km_master.drop_duplicates(subset=['OP', 'Toko']).sort_values(by=['OP', 'Toko']).reset_index(drop=True)

# only keep relevant columns
df_km_master = df_km_master[['OP', 'Kode Zona', 'Toko', 'KM Master']]

# add recommendation column
df_km_master['KM Master Rekomendasi'] = np.nan

tracker_km_master.track(df_km_master, step_name="Filter with KM Master Outside Treshold", rows_unique=['OP', 'Toko'])

2025-12-10 06:55:47 - utils - INFO - [KM Master Method] Step: Filter with KM Master Outside Treshold | Counts: * | Change: -**,*** (-99.99%) | Retention: 0.01% | Step Time: 1.08s | Cumulative Time: 1.08s
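
The threshold filter flags stores whose `KM Master` lies strictly below `min_km_master` or strictly above `max_km_master`. Values equal to either bound are not flagged. A small example with invented stores makes the boundary behaviour explicit and shows an equivalent form using `Series.between`:

```python
import pandas as pd

min_km_master = 1
max_km_master = 1800

df_demo = pd.DataFrame({
    "Toko": ["A", "B", "C", "D", "E"],
    "KM Master": [0, 1, 500, 1800, 1801],
})

# Same condition as in the notebook cell.
outside = df_demo[(df_demo["KM Master"] < min_km_master) | (df_demo["KM Master"] > max_km_master)]

# Equivalent: negate an inclusive range check.
outside_between = df_demo[~df_demo["KM Master"].between(min_km_master, max_km_master)]

print(outside)
```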


In [27]:
# get data from google sheets for recommendation
worksheet = sheets_loader(sheets_url=os.getenv('SHEETS_URL')).worksheet(config.UPDATED_WORKSHEET)
df_recommendation = pd.DataFrame(worksheet.get_all_records())
df_recommendation = df_recommendation.replace("", np.nan)
logger.info(f"Loaded {mask_numeric_value(f'{len(df_recommendation):,}')} rows of google sheets recommendation data")

# check if google sheet data is empty
if df_recommendation.empty:
    df_recommendation = pd.DataFrame(columns=['OP', 'Kode Zona', 'Toko', 'KM Master Rekomendasi'])

# check the data
check_content = df_km_master[['OP', 'Kode Zona', 'Toko']].isin(df_recommendation[['OP', 'Kode Zona', 'Toko']]).all().all()
check_recommendation = df_recommendation['KM Master Rekomendasi'].isna().sum()

# an update is needed when a flagged OP + store is missing from the sheet
# or when any row in the sheet still has no recommendation
needs_update = (not check_content) or check_recommendation > 0

if needs_update:
    # update google sheet data with no KM Master recommendation
    df_km_master = df_km_master.where(df_km_master.notna(), "")
    sheets_updater(sheets_url=os.getenv('SHEETS_URL'), df=df_km_master)
    logger.warning(f"Update {config.UPDATED_WORKSHEET} with no KM Master recommendation")
    # hide the full URL if configured
    sheets_url = os.getenv('SHEETS_URL')
    shown_url = f"{sheets_url[:30]} [Redacted]" if config.HIDE_VALUES else sheets_url
    raise ValueError(f"Give recommendation to {config.UPDATED_WORKSHEET} in URL: {shown_url}")
else:
    logger.info(f"Data in {config.UPDATED_WORKSHEET} is completed")

tracker_km_master.track(df_km_master, step_name="Final Unique OP and Store", rows_unique=['OP', 'Toko'])            

2025-12-10 06:55:47 - google_sheets_io - INFO - Loading Google Sheets from URL: https://docs.google.com/spread [Redacted]...
2025-12-10 06:55:47 - google_sheets_io - INFO - Credentials loaded successfully
2025-12-10 06:55:47 - google_sheets_io - INFO - Google Sheets API authorized
2025-12-10 06:55:49 - google_sheets_io - INFO - Sheet opened successfully
2025-12-10 06:55:50 - __main__ - INFO - Loaded * rows of google sheets recommendation data
2025-12-10 06:55:50 - __main__ - INFO - Data in Rekomendasi Manual KM Master is completed
2025-12-10 06:55:50 - utils - INFO - [KM Master Method] Step: Final Unique OP and Store | Counts: * | Change: +0 (+0.00%) | Retention: 0.01% | Step Time: 3.59s | Cumulative Time: 4.67s
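
One detail of the content check is worth illustrating. When `DataFrame.isin` receives another DataFrame, it compares cells position by position: both the index label and the column label must match. The check therefore passes only if the sheet lists the same rows in the same order. If order-independent membership is what is wanted, a merge with `indicator=True` is one option. The sketch below contrasts the two on toy data. It is a possible alternative, not the notebook's method.

```python
import pandas as pd

keys = ["OP", "Kode Zona", "Toko"]

df_flagged = pd.DataFrame({
    "OP": ["JKT", "BDG"],
    "Kode Zona": ["Z1", "Z2"],
    "Toko": ["A001", "B001"],
})
# Same rows as df_flagged, but in a different order.
df_sheet = pd.DataFrame({
    "OP": ["BDG", "JKT"],
    "Kode Zona": ["Z2", "Z1"],
    "Toko": ["B001", "A001"],
    "KM Master Rekomendasi": [25, 12],
})

# Index-aligned comparison: fails here because row 0 differs from row 0.
aligned_check = bool(df_flagged[keys].isin(df_sheet[keys]).all().all())

# Order-independent membership: every flagged key must exist somewhere in the sheet.
merged = df_flagged[keys].merge(
    df_sheet[keys].drop_duplicates(), on=keys, how="left", indicator=True
)
membership_check = bool((merged["_merge"] == "both").all())

print(aligned_check, membership_check)
```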


## KM Master Result

In [29]:
# show summary
tracker_km_master.summary()

2025-12-10 07:16:45 - utils - INFO - [KM Master Method] DataTracker summary generated for 3 steps.


| | Step | Counts | Change | Change (%) | Retained (%) | Duration (s) | Cumulative Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | Initial Unique OP and Store | `**,***` | +0 | 0.0 | 100.0 | 0.0 | 0.0 |
| 1 | Filter with KM Master Outside Treshold | `*` | `-**,***` | -99.99 | 0.01 | 1.08 | 1.08 |
| 2 | Final Unique OP and Store | `*` | +0 | 0.0 | 0.01 | 3.59 | 4.67 |


# KM Tempuh Method

## KM Max == 1 Category

## KM Max <> 1 Category

## Outlier Category

## KM Tempuh Result

# Master Zona Method

## Main Method

## Master Zona Result

# Results Summary