### THE SCENARIO
You have just started a diagnostic for one of the biggest food retailers in the UK. Over the past decade they have been incredibly successful and grown enormously, but they are facing difficulties and finding it hard to turn a profit.

Every year they reduce or throw out over £500 million of products that have come to the end of their life. This represents close to 10% of their annual turnover. Alongside this, up to 10% of the time the product a customer is after is not available in store.

Reducing waste and improving product availability would greatly improve profitability. However, no one knows what the biggest causes of these two issues are or where to begin solving them. From the time a product leaves a supplier, goes through the retailer's supply chain and ends up in store, it will have been processed through a number of different systems, owned by a number of different teams and available in a number of different formats.

Your role is to bring all this data together to paint a complete picture of the journey a product makes through the supply chain and then apply logic to identify the biggest causes of waste and unavailability. The data is large and messy and often ambiguous.

### THE PROCESS
All products start with a forecast. This is the amount of product expected to sell within each store on a daily basis. This forecast is compared with the amount of stock currently held within stores (the stockfile) and an order is generated to purchase the difference. Quite often the stockfile is incorrect, so an accurate picture of what is in store is not available.

Stock must be ordered at a minimum level. This is called a tray size and is set for each product. For example, BLT sandwiches come on a tray size of 18.
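The ordering rule described above can be sketched as follows. This is a minimal illustration, assuming the shortfall (forecast minus stockfile) is rounded up to whole trays and never goes negative; the function name `order_quantity` and the toy figures are hypothetical and do not come from the case study data.

```python
import numpy as np
import pandas as pd

def order_quantity(forecast, stock_on_hand, units_per_tray):
    """Round the shortfall (forecast minus stock) up to whole trays.

    Assumption: orders are placed in whole trays and a surplus orders nothing.
    """
    shortfall = np.maximum(forecast - stock_on_hand, 0)
    trays = np.ceil(shortfall / units_per_tray)
    return (trays * units_per_tray).astype(int)

# Toy example with a tray size of 18, as for the BLT sandwiches
toy_orders = pd.DataFrame({
    "forecast": [25.0, 10.0, 40.0],
    "stock_on_hand": [0.0, 12.0, 4.0],
    "units_per_tray": [18, 18, 18],
})
toy_orders["order_qty"] = order_quantity(
    toy_orders["forecast"], toy_orders["stock_on_hand"], toy_orders["units_per_tray"]
)
print(toy_orders)
```

Rounding up to a tray is one reason a store can end up holding more than its forecast, so tray size is worth keeping alongside any waste flag.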

The order is placed with suppliers. Some suppliers cannot deliver every day, so we need to make sure we order enough to cover the entire delivery period. Sometimes suppliers will over- or under-deliver.

These orders are delivered into various distribution centres across the country. From here the product is allocated to each store based on its need (the difference between the forecast and the amount the store already has) and the volume delivered by the supplier. This allocation is used to pick the amount for each store, which is then loaded onto lorries and dispatched. Quite often the amount picked will be incorrect.
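The case study does not say how need and delivered volume are combined, so the following is only a sketch of one possible rule: a pro-rata split, in which every store gets its full need when the depot has enough and an equal share of its need otherwise. The function `allocate` and the store labels are hypothetical.

```python
import pandas as pd

def allocate(need, delivered_qty):
    """Pro-rata allocation of a depot delivery across stores (assumed rule).

    need: Series of store need (forecast minus stock already held).
    delivered_qty: total units the supplier delivered to the depot.
    """
    need = need.clip(lower=0)          # a store with surplus stock needs nothing
    total_need = need.sum()
    if total_need == 0:
        return need * 0.0
    share = min(1.0, delivered_qty / total_need)  # never allocate more than the need
    return need * share

store_need = pd.Series({"store_a": 30.0, "store_b": 10.0, "store_c": -5.0})
short_allocation = allocate(store_need, delivered_qty=20.0)   # supplier under-delivered
full_allocation = allocate(store_need, delivered_qty=100.0)   # plenty of stock
print(short_allocation)
print(full_allocation)
```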

When a lorry arrives in store, checks are done to make sure the amount delivered is correct; often it is not. The product is unloaded and either put straight onto shelves or stored in a backroom. In store, processes exist to check and confirm that the stockfile matches the actual product held.

On the day a product is due to expire, it is considered wasted. The price will be marked down in stages to try and recover some of the lost value, but if it doesn’t sell by the end of the day it will be thrown in the bin.

In [212]:
#### Packages
import pandas as pd

### Read in the data

In [231]:
cinv = pd.read_csv('Downloads/Case Study - Closing Inventory.txt', sep='\t', encoding='utf-16',
                   parse_dates=['calendar_date'])

In [205]:
cinv[((cinv['calendar_id'] == 20180707) & (cinv['store_id'] == 2061))]

Unnamed: 0,upc,calendar_date,calendar_id,store_id,geography_id,shelf_life,units_per_tray,closing_inventory_min_neg_over_shelf_life_minus_2_days,closing_inventory_neg_count_over_1_day,closing_inventory_neg_count_over_shelf_life_minus_2_days,closing_inventory_on_day
0,464345,2018-07-07,20180707,2061,6729,6,10,,0,0,6.0
1049,313643,2018-07-07,20180707,2061,6729,8,22,,0,0,12.0
2004,423281,2018-07-07,20180707,2061,6729,2,3,,0,0,0.0
2088,851053,2018-07-07,20180707,2061,6729,6,9,,0,0,17.0
3478,206310,2018-07-07,20180707,2061,6729,7,24,,0,0,40.0
...,...,...,...,...,...,...,...,...,...,...,...
85709,434515,2018-07-07,20180707,2061,6729,7,12,,0,0,9.0
85929,141802,2018-07-07,20180707,2061,6729,7,8,,0,0,33.0
85940,395380,2018-07-07,20180707,2061,6729,4,16,,0,0,19.0
86125,951531,2018-07-07,20180707,2061,6729,5,30,,0,0,83.0


In [210]:
depot[((depot['calendar_id'] == 20180707) & (depot['store_id'] == 2061))]

Unnamed: 0,upc,calendar_date,calendar_id,store_id,geography_id,shelf_life,units_per_tray,depot_delivered_qty_on_day,depot_delivered_qty_over_minus_2_day,depot_delivered_qty_over_shelf_life_plus_1,depot_lvl_required_qty_over_supplier_lead_time,depot_lvl_target_inventory_on_day,depot_ordered_qty_on_day,depot_ordered_qty_over_minus_2_day,depot_ordered_qty_over_shelf_life_plus_1,depot_ordered_qty_over_supplier_lead_time,depot_store_id,over_deliver,under_deliver
0,464345,2018-07-07,20180707,2061,6729,6,10,600.0,0.0,0.0,0.0,585.53,600.0,0.0,0.0,,5173.0,False,False
1049,313643,2018-07-07,20180707,2061,6729,8,22,1364.0,0.0,0.0,0.0,3805.99,1364.0,0.0,0.0,,5173.0,False,False
2004,423281,2018-07-07,20180707,2061,6729,2,3,1668.0,0.0,0.0,0.0,428.30,1668.0,0.0,0.0,,5173.0,False,False
2088,851053,2018-07-07,20180707,2061,6729,6,9,648.0,0.0,0.0,0.0,2100.94,648.0,0.0,0.0,,5173.0,False,False
3478,206310,2018-07-07,20180707,2061,6729,7,24,1248.0,0.0,0.0,0.0,2572.34,1248.0,0.0,0.0,,5173.0,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85709,434515,2018-07-07,20180707,2061,6729,7,12,1608.0,0.0,0.0,0.0,1853.06,1608.0,0.0,0.0,,5173.0,False,False
85929,141802,2018-07-07,20180707,2061,6729,7,8,256.0,0.0,0.0,0.0,1066.21,256.0,0.0,0.0,,5173.0,False,False
85940,395380,2018-07-07,20180707,2061,6729,4,16,1248.0,0.0,0.0,0.0,984.84,1248.0,0.0,0.0,,5173.0,False,False
86125,951531,2018-07-07,20180707,2061,6729,5,30,7620.0,0.0,0.0,0.0,5664.56,7620.0,0.0,0.0,,5173.0,False,False


In [209]:
fore[((fore['calendar_id'] == 20180707) & (fore['store_id'] == 2061))]

Unnamed: 0,upc,calendar_date,calendar_id,store_id,actual_store_need_on_day,actual_store_need_over_lead_time,actual_store_need_qty_over_shelf_life,additive_allocated_qty_over_lead_time,additive_allocated_qty_over_shelf_life,additive_qty_allocated_on_day,...,replacement_qty_over_lead_time,replacement_qty_over_shelf_life,shelf_life,store_allocated_qty_over_lead_time,store_delivery_qty_on_day,store_required_qty_on_day,store_required_qty_over_lead_time,store_required_qty_over_shelf_life,target_inventory_on_day,units_per_tray
0,464345,2018-07-07,20180707,2061,6.0,6.0,6.0,0.0,0.0,0.0,...,,,6,10.0,20.0,18.20,18.20,18.20,6.12,10
1049,313643,2018-07-07,20180707,2061,2.0,2.0,2.0,0.0,0.0,0.0,...,,,8,0.0,0.0,3.27,3.27,3.27,25.64,22
2004,423281,2018-07-07,20180707,2061,5.0,5.0,5.0,0.0,0.0,0.0,...,,,2,3.0,3.0,6.76,6.76,6.76,4.47,3
2088,851053,2018-07-07,20180707,2061,-13.0,-13.0,-13.0,0.0,0.0,0.0,...,,,6,18.0,18.0,17.12,17.12,17.12,17.47,9
3478,206310,2018-07-07,20180707,2061,-35.0,-35.0,-35.0,0.0,0.0,0.0,...,,,7,24.0,24.0,12.43,12.43,12.43,28.48,24
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85709,434515,2018-07-07,20180707,2061,6.0,6.0,6.0,0.0,0.0,0.0,...,,,7,12.0,12.0,17.11,17.11,17.11,18.94,12
85929,141802,2018-07-07,20180707,2061,-31.0,-31.0,-31.0,0.0,0.0,0.0,...,,,7,0.0,0.0,-8.74,-8.74,-8.74,20.96,8
85940,395380,2018-07-07,20180707,2061,-11.0,-11.0,-11.0,0.0,0.0,0.0,...,,,4,16.0,16.0,0.32,0.32,0.32,7.64,16
86125,951531,2018-07-07,20180707,2061,-56.0,-56.0,-56.0,0.0,0.0,0.0,...,,,5,60.0,60.0,56.09,56.09,56.09,76.59,30


In [211]:
instore[((instore['calendar_id'] == 20180707) & (instore['store_id'] == 2061))]

Unnamed: 0,upc,calendar_date,calendar_id,store_id,geography_id,shelf_life,units_per_tray,cal_gross_sales_qty_on_day,closing_inventory_min_neg_over_shelf_life_minus_2_days,closing_inventory_neg_count_over_1_day,...,stockfile_adjust_qty_at_minus_1_day,stockfile_adjust_qty_at_plus_1_day,stockfile_adjust_qty_on_day,waste_value_on_day,out_of_range,stock_wasted,out_of_stock,stockfile_adj,more_stock,less_stock
0,464345,2018-07-07,20180707,2061,6729,6,10,12.0,,0,...,,,,0.00,False,False,False,False,False,False
1049,313643,2018-07-07,20180707,2061,6729,8,22,14.0,,0,...,,,,0.00,False,False,False,False,False,False
2004,423281,2018-07-07,20180707,2061,6729,2,3,4.0,,0,...,,,,3.60,False,False,False,False,False,False
2088,851053,2018-07-07,20180707,2061,6729,6,9,4.0,,0,...,,,,0.00,False,False,False,False,False,False
3478,206310,2018-07-07,20180707,2061,6729,7,24,1.0,,0,...,,,,11.00,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85709,434515,2018-07-07,20180707,2061,6729,7,12,15.0,,0,...,,,,0.00,False,False,False,False,False,False
85929,141802,2018-07-07,20180707,2061,6729,7,8,1.0,,0,...,,,,4.25,False,False,False,False,False,False
85940,395380,2018-07-07,20180707,2061,6729,4,16,8.0,,0,...,,,,0.00,False,False,False,False,False,False
86125,951531,2018-07-07,20180707,2061,6729,5,30,27.0,,0,...,,,,0.00,False,False,False,False,False,False


In [216]:
cinv['closing_inventory_on_day']

0           6.0
1           NaN
2           NaN
3           NaN
4           7.0
           ... 
2648534    66.0
2648535     NaN
2648536     3.0
2648537    14.0
2648538     9.0
Name: closing_inventory_on_day, Length: 2648539, dtype: float64

#### Data Recorded in the Store

In store, processes exist to check and confirm that the stockfile matches the actual product held.

#### Inventory Data
* 'upc' - Universal Product Code (product identifier)
* 'calendar_date' - date
* 'calendar_id' - date
* 'store_id' - store ID
* 'geography_id' - geography ID: do we have anything these group to? County, region, etc.
* 'shelf_life' - Shelf Life (days?) (whole integer)
* 'units_per_tray' - Units per Tray (whole integer)
* 'closing_inventory_min_neg_over_shelf_life_minus_2_days' - -1*closing_inventory / Shelf Life for two days before - probably used as forecast feature. (neg float)
* 'closing_inventory_neg_count_over_1_day' (0, 1 or 2) not negative? 
* 'closing_inventory_neg_count_over_shelf_life_minus_2_days' (int 0-12)
* 'closing_inventory_on_day' (float) - stockfile for next day?
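Since 'calendar_date' and 'calendar_id' both appear to hold the date, a quick consistency check is possible. The sketch below assumes 'calendar_id' is an integer in YYYYMMDD form, as values such as 20180707 suggest; it runs on a toy frame with one deliberate mismatch rather than on the real files.

```python
import pandas as pd

toy_dates = pd.DataFrame({
    "calendar_date": pd.to_datetime(["2018-07-07", "2018-07-08"]),
    "calendar_id": [20180707, 20180709],   # second row is deliberately inconsistent
})

# Parse the integer id as a date and compare it with the parsed date column
parsed_id = pd.to_datetime(toy_dates["calendar_id"].astype(str), format="%Y%m%d")
toy_dates["id_matches_date"] = parsed_id == toy_dates["calendar_date"]
print(toy_dates)
```

If every row matches on the real data, either column can serve as the date key.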

In [232]:
fore = pd.read_csv('Downloads/Case Study - Forecast Data.txt', sep='\t', encoding='utf-16',
                  parse_dates=['calendar_date'])

In [233]:
depot = pd.read_csv('Downloads/Case Study - Depot Data.txt', sep='\t', encoding='utf-16',
                   parse_dates=['calendar_date'])

In [234]:
instore = pd.read_csv('Downloads/Case Study - In Store Data.txt', sep='\t', encoding='utf-16',
                     parse_dates=['calendar_date'])

#### Apply Logic

1. Product is no longer in range
2. Depot to Store delivered inaccurately
3. Sales Forecast Error
4. Waste Prediction
5. Stock File Adjustment
6. Negative Stock File

#### 1. Product is no longer in range

Waste qty > 0 on days when ranging indicator is 0.

In [236]:
instore['stock_wasted'] = (instore['waste_value_on_day'] > 0) & (instore['ranging_indicator_on_day'] == 0)

Stockout indicator 1 on a day when ranging indicator is 0

In [76]:
instore['out_of_stock'] = (instore['stock_out_ind_on_day'] == 1) & (instore['ranging_indicator_on_day'] == 0)

#### 2. Depot to Store delivered inaccurately

Received_units (store) > Allocation (store)

Shelf life before and including waste event

In [81]:
depot['over_deliver'] = (depot['depot_delivered_qty_on_day'] > depot['depot_ordered_qty_on_day'])
# this does not take into account - Shelf life before and including waste event. Come back to this.

Store receives less from the depot than the amount allocated to them

In [82]:
depot['under_deliver'] = (depot['depot_delivered_qty_on_day'] < depot['depot_ordered_qty_on_day'])
# this does not take into account - Shelf life before and including waste event. Come back to this.

#### 3. Sales Forecast error

In [113]:
inst_f = instore[['upc', 'calendar_date', 'store_id', 'sales_qty_over_shelf_life', 'sales_value_on_day', 'rtm_quant_on_day']]
fore_f = fore[['upc', 'calendar_date', 'store_id', 'forecast_demand_qty_over_shelf_life', 'forecast_demand_on_day']]

In [114]:
sales_fe = inst_f.merge(fore_f, on=['upc', 'calendar_date', 'store_id'])
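The default inner merge silently drops rows that exist in only one table and silently duplicates rows if the key is not unique. A hedged sketch of how that could be checked, using the `validate` and `indicator` options of `DataFrame.merge` on toy frames (the column names `sales_qty` and `forecast_qty` are hypothetical):

```python
import pandas as pd

keys = ["upc", "calendar_date", "store_id"]
day = pd.Timestamp("2018-07-07")

toy_sales = pd.DataFrame({
    "upc": [464345, 313643, 423281],
    "calendar_date": [day] * 3,
    "store_id": [2061] * 3,
    "sales_qty": [12.0, 14.0, 4.0],
})
toy_forecast = pd.DataFrame({
    "upc": [464345, 313643, 851053],
    "calendar_date": [day] * 3,
    "store_id": [2061] * 3,
    "forecast_qty": [18.2, 3.27, 17.12],
})

# validate raises MergeError if the key is duplicated on either side;
# indicator adds a _merge column saying where each row came from.
checked = toy_sales.merge(toy_forecast, on=keys, how="outer",
                          validate="one_to_one", indicator=True)
match_counts = checked["_merge"].astype(str).value_counts().to_dict()
print(match_counts)
```

Rows marked `left_only` or `right_only` are the ones an inner merge would have dropped.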

In [185]:
sales_fe['over_forecast'] = ((((sales_fe['forecast_demand_on_day'] - sales_fe['sales_value_on_day']) /
  sales_fe['forecast_demand_on_day']) > -1.25) &
((sales_fe['forecast_demand_qty_over_shelf_life'] - sales_fe['sales_qty_over_shelf_life']).abs() > 1))

In [186]:
sales_fe['over_forecast'].value_counts()

False    1746704
True      901835
Name: over_forecast, dtype: int64

In [191]:
# Parenthesised so the difference is divided by the forecast and each comparison is evaluated before `&`
sales_fe['under_forecast'] = ((((sales_fe['forecast_demand_on_day'] - sales_fe['sales_value_on_day']) /
  sales_fe['forecast_demand_on_day']) < 0.75) &
((sales_fe['forecast_demand_qty_over_shelf_life'] - sales_fe['sales_qty_over_shelf_life']).abs() > 1))
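A relative forecast error divides by the forecast, which is undefined when the forecast is zero. The sketch below shows one way to guard against that, on toy data. The ±25% thresholds, the sign convention (positive means the forecast exceeded sales) and the column name `sales_qty` are assumptions for illustration, not the notebook's definitions.

```python
import numpy as np
import pandas as pd

toy_fe = pd.DataFrame({
    "forecast_demand_on_day": [10.0, 0.0, 8.0],
    "sales_qty": [5.0, 3.0, 12.0],   # hypothetical quantity column
})

# Replace a zero forecast with NaN so the division yields NaN instead of +/-inf
safe_forecast = toy_fe["forecast_demand_on_day"].replace(0, np.nan)
toy_fe["rel_error"] = (toy_fe["forecast_demand_on_day"] - toy_fe["sales_qty"]) / safe_forecast

# Comparisons against NaN are False, so zero-forecast rows get neither flag
toy_fe["over_forecast"] = toy_fe["rel_error"] > 0.25
toy_fe["under_forecast"] = toy_fe["rel_error"] < -0.25
print(toy_fe)
```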

#### 4. Waste Prediction

In [131]:
fore_w = fore[['upc', 'calendar_date', 'store_id', 'predicted_waste_on_day']]
instore_w = instore[['upc', 'calendar_date', 'store_id', 'waste_value_on_day']]

In [132]:
waste_p = fore_w.merge(instore_w, on=['upc', 'calendar_date', 'store_id'])

In [137]:
waste_p['waste_over_prediction'] = (waste_p['predicted_waste_on_day'] * 1.25 > waste_p['waste_value_on_day'])

In [138]:
waste_p['waste_under_prediction'] = (waste_p['predicted_waste_on_day'] * 0.75 < waste_p['waste_value_on_day'])

#### 5. Stock file adjustment

In [153]:
instore['more_stock'] = ((instore['stockfile_adjust_qty_on_day'] > 0) | 
                         (instore['stockfile_adjust_qty_at_plus_1_day'] > 0))

In [154]:
instore['less_stock'] = ((instore['stockfile_adjust_qty_on_day'] < 0) |
               (instore['stockfile_adjust_qty_at_minus_1_day'] < 0))

#### 6. Negative Stockfile

In [155]:
# Negative stockfile on the day of, or the day before, the waste event - check data.
# The neg_count column is a count, so it can never be < 0; flag rows where it is > 0 instead.
cinv['negative_stockfile'] = ((cinv['closing_inventory_on_day'] < 0) | 
                              (cinv['closing_inventory_neg_count_over_1_day'] > 0))
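Once the boolean flags from sections 1 to 6 exist, a first comparison of how often each one fires is the mean of each column, since `True` counts as 1. A minimal sketch on a toy frame (the values are made up; only the column names follow the flags above):

```python
import pandas as pd

toy_flags = pd.DataFrame({
    "stock_wasted": [True, False, False, False],
    "over_deliver": [True, True, False, False],
    "negative_stockfile": [False, False, False, False],
})

# Share of rows on which each flag is raised, most frequent first
flag_rates = toy_flags.mean().sort_values(ascending=False)
print(flag_rates)
```

Frequency alone does not show which cause costs the most; weighting by a value column such as 'waste_value_on_day' would be a natural next step.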

### Simple run at the problem