# Inventory Optimization Calculation

This notebook is based on `EDA_InventoryOptimization.ipynb` and `CheckItems_bySubCategory.ipynb`. It includes all the calculations from those two notebooks but produces no plots; it only generates new CSV files.

#### Input Dataset Description
1. `Distribution Report - Auckland Departures - July19.csv` <br/>or `../data/RawData/Distribution Report- Auckland Departures - Jan2020.csv` <br/>or `../data/RawData/Distribution Report-AKL Arrivals-Jan 2020.csv`
2. `sales_newzealand.csv`: monthly sale data
3. `SubCateogoryInfo.csv` 
4. `Vendor_info.csv`


## Contents
1. Identify items which have potential issues (output CSV files in the **`Output/IssuedItem` folder**)

2. **Capacity adjustment**
   - `ProposedDepth` = 3 if `Depth` >= 4; otherwise empty
   - `VarianceDepth` = `ProposedDepth` - `Depth` (should be negative)
3. Identify items which satisfy the rule in at least one month (output CSV files in the **`AtleastOneMonth` folder**)
   
4. Identify items which satisfy the rule in every month (output CSV files in the **`full_month` folder**)
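
The capacity adjustment rule above can be sketched in plain Python. This is only an illustration, not the code in the helper module: the function name `propose_depth` is hypothetical, and it assumes that `Depth` is `Capacity / Facings`, as implied by the "Depth < 2" rule in the Explanation section.

```python
from typing import Optional, Tuple

TARGET_DEPTH = 3      # proposed depth from the rule above
DEPTH_THRESHOLD = 4   # only items with Depth >= 4 are adjusted


def propose_depth(capacity: float, facings: float) -> Tuple[float, Optional[int], Optional[float]]:
    """Return (Depth, ProposedDepth, VarianceDepth) for one item.

    Assumption: Depth = Capacity / Facings.
    ProposedDepth and VarianceDepth are None when no adjustment applies.
    """
    depth = capacity / facings
    if depth >= DEPTH_THRESHOLD:
        proposed = TARGET_DEPTH
        return depth, proposed, proposed - depth  # variance is negative
    return depth, None, None


print(propose_depth(capacity=12, facings=2))  # Depth 6.0: proposed 3, variance -3.0
print(propose_depth(capacity=6, facings=3))   # Depth 2.0: no adjustment
```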

***
### Explanation
#### Identify items which have potential issues
1. `New_items`, `Removed_items`, and `Check_items` are removed from the inventory analysis: 
    1. New_items
       - The items are not in the `distribution report`, but have sales history from July-Sep or Sep-Nov
       - save them to `new_SKU.csv`
    2. Removed_items
       - The items are in `distribution report`, but have no sales from July-Sep or Sep-Nov
       - save them to `removed_SKU.csv`
    3. Check_items 
       - The items are in `distribution report`, but have no sales from Apr-Sep or Sep-Nov
       - save them to `check_SKU.csv`
       - There are two possible reasons for this collection:
          1. the file has not been updated
          2. the items are actually in the store but are not selling
    4. Incorrect_record_items
       - The items have an extremely high capacity-to-facings ratio (> 6).
       - Either the facing or the capacity is incorrect; this needs to be reported to the store manager.
       - Save them in `Incorrect_record_items.csv`
    5. Items with Depth < 2 (equivalent to Capacity < Facings * 2)
       - e.g. Capacity = Facings = 1: incorrect
       - e.g. Facings = 3, Capacity = 5: incorrect
       - Capacity should be at least twice Facings.
       - Save them in `Depth2_items.csv`
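
The checks in the list above can be sketched in plain Python. This is a minimal illustration, not the helper module's implementation: the function names and the SKU identifiers are hypothetical, and the real notebook works on Spark DataFrames rather than sets. `Check_items` would follow the same set-difference pattern over its own sales window.

```python
def classify_by_membership(report_skus, sold_skus):
    """Split SKUs by distribution-report membership and sales history.

    report_skus: SKUs listed in the distribution report.
    sold_skus:   SKUs with at least one sale in the period under test.
    """
    report, sold = set(report_skus), set(sold_skus)
    new_items = sold - report       # sold but not listed in the report
    removed_items = report - sold   # listed in the report but never sold
    return new_items, removed_items


def record_flags(capacity, facings, max_ratio=6, min_depth=2):
    """Flag suspicious Capacity/Facings records for one item."""
    depth = capacity / facings
    return {
        "incorrect_record": depth > max_ratio,  # Capacity / Facings > 6
        "depth_below_2": depth < min_depth,     # Capacity < Facings * 2
    }


# Hypothetical SKU identifiers, for illustration only.
new_items, removed_items = classify_by_membership(
    report_skus=["A1", "B2", "C3"], sold_skus=["B2", "C3", "D4"]
)
print(sorted(new_items), sorted(removed_items))  # ['D4'] ['A1']
print(record_flags(capacity=5, facings=3))   # depth_below_2 is True
print(record_flags(capacity=30, facings=2))  # incorrect_record is True
```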
       
#### At least one month SKU
Items which satisfy the rule in at least one month from July to Sep.
- adjusted SKU (atLeastOneMonth): items with depth >= 4, which receive a new proposed depth. Saved in `adjusted_SKU.csv`
- unadjusted SKU (atLeastOneMonth): items with depth < 4, which are not affected by the new rules. Saved in `unadjusted_SKU.csv`

#### Full month (3 months) SKU
Items that satisfy the rule in every month (e.g. July, Aug, Sep).
Saved in `qty_less_than_capacity_all_month_SKU.csv`
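
The difference between "at least one month" and "full month" can be sketched as an any/all test per SKU. The rule itself is not spelled out in this notebook; the sketch below assumes it is "monthly quantity sold < capacity", which is only a guess suggested by the output file name. The function names and sample rows are hypothetical.

```python
from collections import defaultdict


def months_satisfying_rule(rows):
    """Map each SKU to a list of booleans, one per monthly row.

    rows: iterable of (sku, month, qty_sold, capacity) tuples,
          with one row per SKU per month.
    Assumed rule: qty_sold < capacity for that month.
    """
    result = defaultdict(list)
    for sku, _month, qty_sold, capacity in rows:
        result[sku].append(qty_sold < capacity)
    return result


def split_skus(rows, n_months=3):
    """Return (at_least_one_month, full_month) SKU sets."""
    flags = months_satisfying_rule(rows)
    at_least_one = {sku for sku, f in flags.items() if any(f)}
    # "Full month" requires a row for every month, all satisfying the rule.
    full = {sku for sku, f in flags.items() if len(f) == n_months and all(f)}
    return at_least_one, full


# Hypothetical sample data, for illustration only.
rows = [
    ("A1", "Jul", 2, 6), ("A1", "Aug", 3, 6), ("A1", "Sep", 1, 6),
    ("B2", "Jul", 8, 6), ("B2", "Aug", 4, 6), ("B2", "Sep", 9, 6),
    ("C3", "Jul", 7, 6),
]
at_least_one, full = split_skus(rows)
print(sorted(at_least_one), sorted(full))  # ['A1', 'B2'] ['A1']
```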

# Resolving problems:
### Resolving problem 1: SKU sold only one month
For SKUs sold in only one month (SKUs in the `atLeastOneMonth` folder whose `stdQty_std_by_SKU`, the standard deviation for each SKU, is empty), what is the reason: a removed SKU or an out-of-stock (OOS) issue?
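
An empty per-SKU standard deviation is what one would expect when a SKU has a single month of sales, because a sample standard deviation needs at least two observations. The sketch below illustrates this in plain Python; it assumes the helper module computes a sample standard deviation, and the function name and data are hypothetical.

```python
import statistics


def std_qty_by_sku(monthly_qty):
    """Sample standard deviation of monthly quantity per SKU.

    monthly_qty: dict mapping SKU -> list of monthly quantities sold.
    Returns None (an empty value) when a SKU has fewer than two months of
    sales, because the sample standard deviation needs at least two points.
    """
    return {
        sku: statistics.stdev(qty) if len(qty) >= 2 else None
        for sku, qty in monthly_qty.items()
    }


# "B2" was sold in one month only, so its standard deviation is empty.
result = std_qty_by_sku({"A1": [4, 6, 8], "B2": [5]})
print(result)  # {'A1': 2.0, 'B2': None}
```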

### Resolving problem 2: Replace Strategy
For SKUs with a high `Capacity_to_avg_qty` and a low `StdQty_to_AvgQty` (risk).

There are three possible strategies we can consider:
1. `proposed depth`: given the facings, for SKUs with depth >= 4, adjust the depth to 3. (easy)
2. `proposed #POGS`: for SKUs with depth <= 3 and `#POGS` > 1, adjust `#POGS` to 1.
3. `proposed facings`: for SKUs with depth <= 3, `#POGS` = 1, and relatively large facings. (hard!)
#### Notes:
For 2 and 3 above, we need to find **underspaced SKUs** at the same time. (Reverse logic + Matrix design)
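
The order in which the three strategies apply can be sketched as a small decision function. This is a hedged illustration only: `choose_strategy` and the `large_facings` cut-off are hypothetical, since the notebook does not define what counts as relatively large facings. The depth threshold follows the capacity adjustment rule (`ProposedDepth` = 3 if `Depth` >= 4).

```python
def choose_strategy(depth, n_pogs, facings, large_facings=4):
    """Pick one of the three replace strategies for an over-spaced SKU.

    large_facings is a hypothetical cut-off for "relatively large" facings.
    Returns None when none of the three strategies applies.
    """
    if depth >= 4:
        return "proposed depth"     # reduce depth to 3, facings unchanged
    if depth > 3:
        return None                 # between 3 and 4: not covered by the rules
    if n_pogs > 1:
        return "proposed #POGS"     # reduce #POGS to 1
    if facings >= large_facings:
        return "proposed facings"   # reduce facings (the hard case)
    return None


print(choose_strategy(depth=6, n_pogs=1, facings=2))  # proposed depth
print(choose_strategy(depth=3, n_pogs=2, facings=2))  # proposed #POGS
print(choose_strategy(depth=2, n_pogs=1, facings=5))  # proposed facings
```

Strategies 2 and 3 free up space, so in practice they would be paired with a search for underspaced SKUs, which this sketch does not attempt.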

#### Matrix design
To make a decision in the Replace Strategy (especially `proposed facings`), we need to consider some key factors: 
1. `Capacity_to_avg_qty`: measures sales performance over a certain period
2. `StdQty_to_AvgQty`: **measures the risk**
3. `Classification`: historical popularity 
4. `Linear (cm)`: match space
5. `Total Margin ($)`
6. `SubCategory`: similar or same.

The real decision should be based on these factors.
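
The first two factors are ratios, and one plausible way to compute them for a single SKU is sketched below. The definitions are assumptions inferred from the column names (capacity divided by average monthly quantity, and sample standard deviation divided by average monthly quantity); the helper module may define them differently. The function name `sku_ratios` is hypothetical, and the average quantity is assumed to be greater than zero.

```python
import statistics


def sku_ratios(capacity, monthly_qty):
    """Compute the two ratio factors for one SKU.

    Assumptions: Capacity_to_avg_qty = Capacity / mean(monthly qty) and
    StdQty_to_AvgQty = sample std(monthly qty) / mean(monthly qty).
    StdQty_to_AvgQty is None when fewer than two months are available.
    """
    avg_qty = statistics.mean(monthly_qty)
    std_qty = statistics.stdev(monthly_qty) if len(monthly_qty) >= 2 else None
    return {
        "Capacity_to_avg_qty": capacity / avg_qty,
        "StdQty_to_AvgQty": None if std_qty is None else std_qty / avg_qty,
    }


# Hypothetical SKU: capacity 24, monthly sales of 4, 6 and 8 units.
ratios = sku_ratios(capacity=24, monthly_qty=[4, 6, 8])
print(ratios)
```

A high `Capacity_to_avg_qty` combined with a low `StdQty_to_AvgQty` would mark a SKU as holding more shelf space than its steady sales need.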


In [1]:
import pandas as pd
from pandas import ExcelWriter
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import pyplot
import plotly.express as px
import plotly.graph_objects as go
import warnings
warnings.filterwarnings('ignore')
from Inventory_opti_helperFunction import *
import findspark
from pyspark import SparkContext, SparkConf
from pyspark.sql.types import *
# Import SparkSession
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql import Row
from pyspark.sql.functions import UserDefinedFunction
import re
import errno, sys

findspark.init()

"""
Build the SparkSession
"""
# getOrCreate(): get the current Spark session or to create one if there is none running
spark = SparkSession.builder \
   .master("local") \
   .appName("Inventory Opti Model") \
   .config("spark.executor.memory", "1gb") \
   .getOrCreate()
   
sc = spark.sparkContext  # get the SparkContext from the SparkSession

# Verify SparkContext
print(sc)

# Print Spark version
print(sc.version)

<SparkContext master=local appName=Inventory Opti Model>
2.4.4


In [3]:
### User Input from interface
sales_data_path, dist_data_path, start_date, end_date, split_month, store_name = user_put()

#################################
## read, merge and clean datasets
#################################
month_merge, month_merge_early, month_merge_late, dist_df = read_merge_and_clean_data(sales_data_path, 
                                                                             dist_data_path, split_month, 
                                                                             start_date, end_date, store_name, spark)

##############################################
## Identify items with potential issues
##############################################

df = Identify_and_output_issused_items(month_merge, month_merge_early, month_merge_late, dist_df)
df.toPandas().to_csv('../data/CleanedData/month_merge_total.csv', 
                             index=False, encoding='utf-8')


#############################
### Analysis
#############################
names = ["totalMonthlyNetSale", "totalMonthlyQtySold", "Price", 
         "SellMargin", "Facings", "Capacity"]
df = convertColumn(df, names, FloatType())

"""
Add more columns
"""
# add mean, std and geometric mean of qty sold, grouped by subcategory and month, to the dataset
df = calculate_mean_std_and_geometric_mean(df)
# add capacity/sale ratio to dataset
df = calculate_Capacity_to_sales(df)
# add depth, proposed depth
df = calculate_Depths(df)

# find and analyze atLeastOneMonth SKUs
# df_full is the combination of unchanged_SKU and changed_SKU
df_atLeastOneMonth, unchanged_SKU, changed_SKU, df_full = find_and_analysis_atLeastOneMonth_SKU(df)
Group_and_save_atLeastOneMonth_SKU(unchanged_SKU, changed_SKU)

# find and analyze fullMonth SKUs
full_month_SKU_info = find_and_analysis_fullMonth_SKU(df_atLeastOneMonth, split_month, spark)
save_fullMonth_SKU(full_month_SKU_info)

print("\n\nFinish!")


📌Notes:
1. The sales_newzealand.csv must have the sales data from the store you want to test: (NZDF.AKL108 or NZDF.AKL109 or both)

2. The decision of start date is based on the "generating date of Distribution Report".

3. If the start date is 2019-04-01, the end date will be 2019-10-01 and the first month tested is July. 
   If the start date is 2019-06-01, the end date will be 2019-12-01 and the first month tested is Sep. 
   
4. The file name of Distribution Report must contain 'Departures' or 'Arrivals':
   Departures: NZDF.AKL108
   Arrivals: NZDF.AKL109
   
5. The output folder is always named ‘Output’. Remember to rename it to distinguish different input data: 
   for example: Rename to Output_Sep_108, Output_July_108, Output_Sep_109
   
    
Enter the path of 6 month sales data, 
    e.g. ../data/RawData/sales_newzealand.csv:../data/RawData/sales_newzealand.csv
Enter the path of disrtibuted report data, 
    e.g. ../data/RawData/Distribution Report - Auckland Departures - July19.csv or ../dat