# DH PNA metrics

## Executive summary

In this notebook, we define two DH PNA-related metrics:

- Not found rate:

    - Formula: (A + B + C) / (D + E)

    - A: Difference between originally placed units and delivered units caused by partial (and total) removals in delivered orders (e.g. you ask for 3 bottles of product X and only 2 are available --> 1)
    - B: Unsuccessful replacements in delivered orders --> we do not have this metric
    - C: All units cancelled due to PNA
    - D: All originally placed items in Delivered orders
    - E: All originally placed items in PNA cancellations


- Items not delivered rate

    - Formula: (A + B + C) / (D + E)

    - A: Difference between originally placed units and delivered units caused by partial removals in delivered orders (e.g. you ask for 3 bottles of product X and only 2 are available --> 1)
    - B: Difference in units for replaced items in delivered orders (e.g. you ask for 3 bottles of product X and receive 1 of product Y --> 2)
    - C: All units cancelled due to PNA
    - D: All originally placed items in Delivered orders
    - E: All originally placed items in PNA cancellations

For DH, "items" means units, so in all cases we count units.
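
As a quick sanity check, the two formulas can be written as small Python functions and evaluated on the January 2024 totals from the monthly outputs further down in this notebook. This is only a minimal sketch: the function names are hypothetical, and the mapping of A and B to the query columns (`diff_units_removals`, `originally_placed_units_delivered_orders_replacements`, `diff_units_removals_and_replacements`) is assumed from the aggregation cells rather than stated in the definitions.

```python
def not_found_rate(a, b, c, d, e):
    """(A + B + C) / (D + E), with every input expressed in units."""
    return (a + b + c) / (d + e)


def items_not_delivered_rate(a_plus_b, c, d, e):
    """(A + B + C) / (D + E); assumes the query returns A + B as one column."""
    return (a_plus_b + c) / (d + e)


# January 2024 totals copied by hand from the monthly aggregation outputs.
d = 31_105_647  # originally placed units in delivered orders
e = 169_799     # originally placed units in PNA cancellations (used as C too)

nf = not_found_rate(a=1_564_483, b=203_967, c=e, d=d, e=e)
nd = items_not_delivered_rate(a_plus_b=1_562_278, c=e, d=d, e=e)

print(f"not found rate:      {nf:.1%}")
print(f"items not delivered: {nd:.1%}")
```

With these inputs the two rates come out at roughly 6.2% and 5.5%, in line with the figures quoted in the business questions.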

**Business questions answered:**

- **What is the not found rate (%) for JAN QCPartners?** --> 6.2%
- **What is the items not delivered rate (%) for JAN QCPartners?** --> 5.5%
- **What rate is bigger?** --> The not found rate will always be bigger. The denominator is the same for both ratios, but for a replaced item the not found numerator counts all the originally placed units, whereas the items not delivered numerator counts only the difference between the units of the replaced item and the units of its replacement.
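
To make the difference concrete, take the replacement example from the definitions: 3 units of product X are ordered and 1 unit of product Y is delivered instead. The sketch below uses hypothetical function names and a single order line, and assumes the replacement never has more units than the original item.

```python
def not_found_units(ordered_units: int, replacement_units: int) -> int:
    """Not found rate: a replaced item counts with all its originally placed units."""
    # replacement_units is deliberately ignored for this metric.
    return ordered_units


def not_delivered_units(ordered_units: int, replacement_units: int) -> int:
    """Items not delivered: only the shortfall versus the replacement counts."""
    return ordered_units - replacement_units


ordered, replaced_with = 3, 1
print(not_found_units(ordered, replaced_with))      # 3
print(not_delivered_units(ordered, replaced_with))  # 2
```

Because every replaced line contributes at least as many units to the not found numerator, and the denominators match, the not found rate cannot fall below the items not delivered rate under this assumption.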

## Plan

Business questions to answer
- [X] What is the ratio of Not found rate for JAN QCPartners?
- [X] What is the ratio of Items not delivered for JAN QCPartners?
- [X] What rate is bigger?

Tasks
- [X] Develop formula for ratio of items not found
- [X] Develop formula for ratio of not delivered items

Conclusions
- [X] Anything to add to knowledge
- [X] Any query to add to repo

## Config

In [25]:
# import matplotlib.pyplot as  plt
# import numpy as np
import pandas as pd
# import seaborn as sns
import sys

sys.path.append('c:\\Users\\Jordi Cremades\\Documents\\Repository')

# from utils import dataset_meta_stats
# from utils import dataset_stats
from utils import query_engines

# dms = dataset_meta_stats.DatasetMetaStats() 
# ds = dataset_stats.DatasetStats()

## Task 1: Develop formula for ratio of items not found

In [62]:
#query object

q = query_engines.QueryEngines(
    query='ratio_not_found.sql', 
    params=None,
    load_from_output_file='ratio_not_found', #with no .csv
    output_file='ratio_not_found', #with no .csv
    printq=None
)

df = q.query_run_starbust()
df

| | order_country_code | month | diff_units_removals | originally_placed_units_delivered_orders_replacements | originally_placed_units_delivered_orders | originally_placed_units_pna_cancelled_orders | ratio_not_found |
|---|---|---|---|---|---|---|---|
| 0 | ME | 2023-08-01 | 7191 | 211 | 124079 | 1157 | 0.0683 |
| 1 | KG | 2023-08-01 | 76 | 73 | 76588 | 2599 | 0.0347 |
| 2 | HR | 2023-12-01 | 5858 | 660 | 213762 | 1665 | 0.0380 |
| 3 | GE | 2023-12-01 | 84274 | 6576 | 1464013 | 22171 | 0.0760 |
| 4 | TN | 2023-06-01 | 5517 | 474 | 35603 | 5817 | 0.2851 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 320 | AD | 2023-03-01 | 11 | 5 | 792 | 0 | 0.0202 |
| 321 | GH | 2024-01-01 | 3980 | 1828 | 66376 | 2459 | 0.1201 |
| 322 | SI | 2023-12-01 | 403 | 134 | 14857 | 8 | 0.0367 |
| 323 | KE | 2024-01-01 | 12634 | 5297 | 571025 | 9995 | 0.0481 |


In [63]:
# Sum units across countries per month (the last month is the first row after sorting)
grouped = df.groupby('month')[['diff_units_removals',
                     'originally_placed_units_delivered_orders_replacements',
                     'originally_placed_units_delivered_orders',
                     'originally_placed_units_pna_cancelled_orders']].sum()

# Not found rate: (A + B + C) / (D + E)
grouped['perc'] = (
    grouped['diff_units_removals']
    + grouped['originally_placed_units_delivered_orders_replacements']
    + grouped['originally_placed_units_pna_cancelled_orders']
) / (
    grouped['originally_placed_units_delivered_orders']
    + grouped['originally_placed_units_pna_cancelled_orders']
)

grouped.sort_index(ascending=False)

| month | diff_units_removals | originally_placed_units_delivered_orders_replacements | originally_placed_units_delivered_orders | originally_placed_units_pna_cancelled_orders | perc |
|---|---|---|---|---|---|
| 2024-01-01 | 1564483 | 203967 | 31105647 | 169799 | 0.061974 |
| 2023-12-01 | 1411214 | 174083 | 29464952 | 180590 | 0.059567 |
| 2023-11-01 | 1257035 | 140246 | 27675383 | 162071 | 0.056016 |
| 2023-10-01 | 1161960 | 117161 | 26489175 | 168034 | 0.054288 |
| 2023-09-01 | 1028437 | 106734 | 23673666 | 155581 | 0.054167 |
| 2023-08-01 | 929676 | 93432 | 22720676 | 152912 | 0.051414 |
| 2023-07-01 | 918956 | 83393 | 22291937 | 146576 | 0.051203 |
| 2023-06-01 | 843426 | 72844 | 20808792 | 135315 | 0.050209 |
| 2023-05-01 | 854881 | 61059 | 21324809 | 126746 | 0.048607 |
| 2023-04-01 | 900841 | 64941 | 20161476 | 127318 | 0.053877 |
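
Note that the monthly figure is a pooled ratio: units are summed across countries first and the ratio is then recomputed, rather than averaging the per-country `ratio_not_found` values. A minimal sketch of the difference, using only the two 2023-08 rows visible in the query output (ME and KG) as a toy input; the helper lists `num_cols`, `den_cols` and `unit_cols` are illustrative names, not part of the notebook:

```python
import pandas as pd

rows = pd.DataFrame({
    "order_country_code": ["ME", "KG"],
    "month": ["2023-08-01", "2023-08-01"],
    "diff_units_removals": [7191, 76],
    "originally_placed_units_delivered_orders_replacements": [211, 73],
    "originally_placed_units_delivered_orders": [124079, 76588],
    "originally_placed_units_pna_cancelled_orders": [1157, 2599],
})

num_cols = [
    "diff_units_removals",
    "originally_placed_units_delivered_orders_replacements",
    "originally_placed_units_pna_cancelled_orders",
]
den_cols = [
    "originally_placed_units_delivered_orders",
    "originally_placed_units_pna_cancelled_orders",
]
unit_cols = num_cols[:2] + den_cols  # the four distinct unit columns

# Per-country ratio, recomputed the same way as ratio_not_found.
rows["ratio_not_found"] = rows[num_cols].sum(axis=1) / rows[den_cols].sum(axis=1)

# Pooled ratio: sum units per month first, then divide.
totals = rows.groupby("month")[unit_cols].sum()
pooled = totals[num_cols].sum(axis=1) / totals[den_cols].sum(axis=1)

# Unweighted mean of the country ratios, shown only for contrast.
mean_of_ratios = rows.groupby("month")["ratio_not_found"].mean()

print(pooled.round(4))
print(mean_of_ratios.round(4))
```

In this toy case the pooled ratio (about 0.0553) sits closer to ME's value than the plain mean (about 0.0515), because ME has more units in the denominator. The pooled version weights each country by its volume, which is why it is the one used for the monthly table.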


## Task 2: Develop formula for ratio of not delivered items

In [64]:
#query object

q = query_engines.QueryEngines(
    query='ratio_not_delivered.sql', 
    params=None,
    load_from_output_file='ratio_not_delivered', #with no .csv
    output_file='ratio_not_delivered', #with no .csv
    printq=None
)

df = q.query_run_starbust()
df

| | order_country_code | month | diff_units_removals_and_replacements | originally_placed_units_delivered_orders | originally_placed_units_pna_cancelled_orders | ratio_not_delivered |
|---|---|---|---|---|---|---|
| 0 | PL | 2023-03-01 | 299235 | 4170365 | 6462 | 0.0732 |
| 1 | SI | 2023-04-01 | 222 | 7481 | 16 | 0.0317 |
| 2 | KZ | 2023-10-01 | 7345 | 265756 | 6078 | 0.0494 |
| 3 | KE | 2023-05-01 | 8993 | 425302 | 12761 | 0.0497 |
| 4 | GE | 2023-08-01 | 66401 | 1131064 | 20429 | 0.0754 |
| ... | ... | ... | ... | ... | ... | ... |
| 320 | KE | 2023-08-01 | 11050 | 432653 | 11105 | 0.0499 |
| 321 | PT | 2023-08-01 | 98192 | 1241973 | 6523 | 0.0839 |
| 322 | UA | 2023-05-01 | 32764 | 2009402 | 9396 | 0.0209 |
| 323 | ES | 2023-10-01 | 298722 | 5615140 | 16676 | 0.0560 |


In [65]:
# Sum units across countries per month (the last month is the first row after sorting)
grouped = df.groupby('month')[['diff_units_removals_and_replacements',
                     'originally_placed_units_delivered_orders',
                     'originally_placed_units_pna_cancelled_orders']].sum()

# Items not delivered rate: (A + B + C) / (D + E), with A + B in a single column
grouped['perc'] = (
    grouped['diff_units_removals_and_replacements']
    + grouped['originally_placed_units_pna_cancelled_orders']
) / (
    grouped['originally_placed_units_delivered_orders']
    + grouped['originally_placed_units_pna_cancelled_orders']
)

grouped.sort_index(ascending=False)

| month | diff_units_removals_and_replacements | originally_placed_units_delivered_orders | originally_placed_units_pna_cancelled_orders | perc |
|---|---|---|---|---|
| 2024-01-01 | 1562278 | 31105647 | 169799 | 0.055381 |
| 2023-12-01 | 1410473 | 29464952 | 180590 | 0.05367 |
| 2023-11-01 | 1258570 | 27675383 | 162071 | 0.051033 |
| 2023-10-01 | 1163062 | 26489175 | 168034 | 0.049934 |
| 2023-09-01 | 1030535 | 23673666 | 155581 | 0.049776 |
| 2023-08-01 | 931338 | 22720676 | 152912 | 0.047402 |
| 2023-07-01 | 920434 | 22291937 | 146576 | 0.047553 |
| 2023-06-01 | 845788 | 20808792 | 135315 | 0.046844 |
| 2023-05-01 | 856164 | 21324809 | 126746 | 0.04582 |
| 2023-04-01 | 901886 | 20161476 | 127318 | 0.050728 |
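
Since both ratios share the same denominator, the gap between them comes entirely from the numerators. The small sketch below puts the two metrics side by side for the three most recent months. The `perc` values are copied by hand from the two monthly outputs in this notebook, and the column names `not_found`, `not_delivered` and `gap_pp` are illustrative only.

```python
import pandas as pd

comparison = pd.DataFrame(
    {
        "not_found": [0.061974, 0.059567, 0.056016],
        "not_delivered": [0.055381, 0.053670, 0.051033],
    },
    index=pd.DatetimeIndex(
        ["2024-01-01", "2023-12-01", "2023-11-01"], name="month"
    ),
)

# Gap between the two rates, expressed in percentage points.
comparison["gap_pp"] = (comparison["not_found"] - comparison["not_delivered"]) * 100

print(comparison.round(4))
```

For these months the not found rate stays above the items not delivered rate, with a gap of roughly half a percentage point or a little more.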
