# ML Project: eCommerce Inventory Prescription

## Sources:

1. CASS index: https://fred.stlouisfed.org/series/FRGSHPUSM649NCIS
2. Insight into potential predictive features: https://dspace.mit.edu/bitstream/handle/1721.1/126484/scm2020-huang-a-predictive-model-for-transpacific-eastbound-ocean-freight-pricing-capstone.pdf?sequence=1&isAllowed=y
3. Crude oil prices: https://www.indexmundi.com/commodities/?commodity=crude-oil&months=360
4. OLD: exchange rate: https://fred.stlouisfed.org/series/EXCHUS
5. US CPI: https://fred.stlouisfed.org/series/CPIAUCSL
6. US PMI: https://data.nasdaq.com/data/ISM/MAN_PMI-pmi-composite-index
7. OLD: China PMI:


## Question for Semi: Should we change the data from 2019 to 2020? Gives us more train data - I'm doing it lol

## Data / Methodology

We need to predict the average US freight cost over a three-month horizon. For the purposes of our problem, all items travel the same distance from the manufacturer to the warehouse. Since we assume less-than-truckload (LTL) shipments, cost is determined solely by the weight of the shipped items. We use the CASS index to determine historical freight prices in the US. (insert explanation of CASS index). We can then estimate transportation costs as follows: find the average shipping cost per pound for the last month of our training set ($\text{\$/lb}_{N}$), which is associated with $\text{CASS Index}_{N}$, and back-compute the shipping cost per pound for all prior months using the relationship between $\text{CASS Index}_{N}$ and the CASS index of each prior month.
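The back-computation above can be sketched as follows. This is only a minimal sketch: it assumes that cost per pound scales proportionally with the CASS index (our reading of "the relationship" between the index values), and the index values, `cost_per_lb_N`, and all variable names are made-up placeholders rather than project data.

```julia
# Back-compute $/lb for every training month from the CASS index.
# All numbers are made-up placeholders, not real CASS values.
cass = [1.10, 1.15, 1.21, 1.18, 1.25]   # hypothetical monthly index, last entry is month N
cost_per_lb_N = 0.50                    # hypothetical observed $/lb in month N

# Assumption: cost per pound is proportional to the index,
# so $/lb_k = $/lb_N * CASS_k / CASS_N.
cost_per_lb = cost_per_lb_N .* cass ./ cass[end]
```

By construction the last entry of `cost_per_lb` equals `cost_per_lb_N`, and earlier months are scaled by the ratio of their index to the month-N index.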

To "predict" shipping rate, we will use the following features:
- oil price
- exchange rate (CH/US)   (might not need this one)
- US CPI
- US PMI
- Year
- Month

We got the above data from (sources). We found our other data (demand and individual item costs) from... (Semi)

A simple linear regression on the non-aggregated data shows that these features have explanatory power (in-sample R^2 of 0.5).

Since we have a year of data and our horizon is 1 quarter, we can run the optimization 4 times. Thus, we need 4 test periods, with each period corresponding to 13 weeks of data.

We'll break down the demand into 4 chunks of 13 weeks each (13 x 4 = 52). We can accordingly group the CASS data (response and features) into 3 month periods (quarters, thus 4 for each year), with the features being:
- Year
- Quarter
- Following features for the beginning of the quarter (taken from first month of quarter)
    - oil price
    - exchange rate (CH/US)   (might not need this one)
    - US CPI
    - US PMI 
    
This will let us find the neighbors of the 4 test quarters, since at the beginning of these quarters we will know the values of all the features. 
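As a rough illustration of the neighbor search, the sketch below finds the `k` training quarters closest to one test quarter. It assumes Euclidean distance on standardized features, which is a choice made here for illustration and not something fixed by these notes; `nearest_quarters` is a hypothetical helper and the feature values are made up.

```julia
using Statistics

# Rows = training quarters, columns = features known at the start of the quarter
# (e.g. oil price, US CPI, US PMI). Values are made-up placeholders.
X_train = [60.0 255.0 52.0;
           55.0 256.0 50.0;
           30.0 257.0 43.0;
           40.0 259.0 54.0;
           45.0 261.0 58.0]
x_test = [42.0, 260.0, 55.0]   # features of one test quarter

# Hypothetical helper: indices of the k training quarters nearest to x.
function nearest_quarters(X, x, k)
    # Standardize each feature so that no single scale dominates the distance.
    mu = vec(mean(X, dims=1))
    sigma = vec(std(X, dims=1))
    Z = (X .- mu') ./ sigma'
    z = (x .- mu) ./ sigma
    # Euclidean distance from the test quarter to every training quarter.
    dists = [sqrt(sum((Z[q, :] .- z) .^ 2)) for q in 1:size(X, 1)]
    return sortperm(dists)[1:k]
end

neighbors = nearest_quarters(X_train, x_test, 2)
```

The returned indices point at rows of `X_train`; the CASS values of those quarters could then serve as scenarios for the test quarter.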

Simple linear regression of aggregated (quarterized) data leads to train R-squared of 0.5511. 
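For reference, a train R-squared like the one above can be computed with an ordinary least squares fit. The sketch below uses made-up data and placeholder names; it is not the regression that produced the reported number.

```julia
# Made-up placeholder data: 6 observations, 2 features.
X = [1.0 2.0; 2.0 1.0; 3.0 4.0; 4.0 3.0; 5.0 6.0; 6.0 5.0]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]

A = hcat(ones(size(X, 1)), X)   # add an intercept column
beta = A \ y                    # ordinary least squares coefficients
y_hat = A * beta                # in-sample predictions

ss_res = sum((y .- y_hat) .^ 2)               # residual sum of squares
ss_tot = sum((y .- sum(y) / length(y)) .^ 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot                      # in-sample R^2
```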

Essentially our analysis is by quarter, so we'll have projected profit (using prescription), baseline profit (simply using the predicted avg CASS for the quarter), and ideal profit (oracle approach) for all four quarters.


Thus, work to do right now is:
- break down predicted demand into 4 matrices (19 x 13) each (we'll have to throw a week out, or include it in the last quarter or something)
- rewrite model to incorporate shipping costs per pound estimate and to account for change in horizon (change lead time to two weeks) 
- find simple prediction and oracle profits
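The first item could look roughly like the sketch below. It assumes the demand matrix has 19 items and 53 weekly columns and that the extra week is simply dropped, which is one of the two options mentioned above; `d_demo` is a random stand-in for the real `d`.

```julia
# Placeholder demand matrix: 19 items x 53 weeks (the real `d` comes from salesByWeek.csv).
d_demo = rand(19, 53)

weeks_per_quarter = 13

# Keep the first 52 weeks and drop the remainder, giving four 19 x 13 matrices.
quarters = [d_demo[:, (weeks_per_quarter * (q - 1) + 1):(weeks_per_quarter * q)] for q in 1:4]
```

If the extra week should go into the last quarter instead, the last slice would run to `size(d_demo, 2)` rather than to week 52.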

## OLD
We can use the CASS Index to determine how much the rates have changed Y/Y. A paper (citation 2) studied shipping costs from China to the US and found that the following features play a role:
- oil price
- exchange rate (CH/US)
- US CPI
- US PMI
- China PMI

## END OLD

In [2]:
using JuMP
using Gurobi
using CSV
using DataFrames
using Random;

In [55]:
# load data
demand = CSV.read("data/salesByWeek.csv", DataFrame)
prices = CSV.read("data/prices.csv", DataFrame)
costs = CSV.read("data/costEstimates.csv", DataFrame) # placeholders
d = Matrix(demand[:,3:end])
r = Vector(prices[:,3]) 
c = Vector(costs[:,2]) # placeholders
tr = Vector(costs[:,3]) # placeholders
fba = 19
leadtime = 8

items_tot = size(d)[1]
time_tot = size(d)[2];

# uncertain shipping cost by weight - creating random set for now
S = rand(time_tot) # placeholder: one uniform random value in [0, 1) per week

starting_inv = sum(d[:,1:8], dims=2) .+ 10; # starting inv at week 1

#### decision vars ####
x[i,t] --> inventory order from Supplier of product i at time t

j[i,t] --> how much we are selling of product i in time t


#### technically decision vars but not really decisions ####
s[i,t] --> inventory available for sale (at AMZ warehouse) for product i at time t

m[t] --> capital (money) available to purchase inventory at time t

#### params ####
d[i,t] --> demand for product i at time t

r[i] --> sales price for product i

c[i] --> manufacturing cost for product i

tr[i] --> transportation cost for product i


We'll incorporate volume later. For the initial model, let's assume fixed unit costs.

v[i] --> volume (size) of product i

fba --> AMZ storage fee: $19/cbm per month



revenue = 0.70 * sum( j[i,t]*r[i] for i=1:items_tot, t=1:time_tot ) --> Amazon takes 30% cut

cost = sum( x[i,t]*(c[i]+tr[i]) for i=1:items_tot, t=1:time_tot ) + sum( fba*s[i,t] for i=1:items_tot, t=1:4:time_tot )

    --> manu + transport cost of orders + monthly inventory fee for whatever inventory we have


Transportation cost is the quantity we want to predict:

w[t] --> total weight of items being shipped in week t

$\hat{S}$[m] --> uncertain shipping cost in month m, found using the CASS index

Thus, the total transportation cost for week t is

w[t] * $\hat{S}$[m(t)]

where m(t) is the month that contains week t.
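A small sketch of this weekly cost calculation is given below. The weights and rates are made-up placeholders, and the week-to-month mapping assumes every month spans exactly 4 weeks, which is a simplification chosen here; `month_of_week` is a hypothetical helper.

```julia
# Placeholder values; none of these come from the project data.
w = [120.0, 80.0, 0.0, 200.0, 150.0, 90.0, 60.0, 110.0]   # total shipped weight per week
S_hat = [0.50, 0.55]                                       # predicted shipping cost per unit weight, per month

# Assumption: each month spans exactly 4 weeks, so week t falls in month cld(t, 4).
month_of_week(t) = cld(t, 4)

# Total transportation cost for each week t: w[t] * S_hat[m(t)].
trans_cost = [w[t] * S_hat[month_of_week(t)] for t in eachindex(w)]
```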

## OLD MODEL

In [2]:
model = Model(Gurobi.Optimizer) # with_optimizer is deprecated in current JuMP
set_optimizer_attribute(model, "OutputFlag", 0)

@variable(model, x[i=1:items_tot, t=1:time_tot] >= 0) # keeping it at not Int for now
@variable(model, j[i=1:items_tot, t=1:time_tot] >= 0) # keeping it at not Int for now

@variable(model, s[i=1:items_tot, t=1:time_tot] >= 0) # keeping it at not Int for now
@variable(model, m[t=1:time_tot] >= 0)

# how much we sell is bounded by inventory and demand
@constraint(model, [i=1:items_tot, t=1:time_tot], j[i,t] <= s[i,t])
@constraint(model, [i=1:items_tot, t=1:time_tot], j[i,t] <= d[i,t])
# capital is available 2 weeks after a product's sale
@constraint(model, [t=1:(time_tot-2)], m[t+2] == m[t+1] + 0.7*sum(j[i,t]*r[i] for i=1:items_tot))
# the cost of inventory orders at time t must be less than our available capital
@constraint(model, [t=1:time_tot], sum(x[i,t]*(c[i]+tr[i]) for i=1:items_tot) <= m[t])
# sales at time t reduce inventory at warehouse, 
# inventory orders at time t are available for sale (arrive at warehouse) 8 weeks later
@constraint(model, [i=1:items_tot, t=leadtime:(time_tot-1)], s[i,t+1] == s[i,t] - j[i,t] + x[i,t-(leadtime-1)])

# address the first 1:leadtime weeks of supply
@constraint(model, [i=1:items_tot, t=1:leadtime-1], s[i,t+1] == s[i,t] - j[i,t])
@constraint(model, [i=1:items_tot], s[i,1] == starting_inv[i] - j[i,1])

# Amazon's 30% cut applies to revenue only, not to purchase or storage costs
@objective(model, Max, sum(0.7*j[i,t]*r[i] - x[i,t]*(c[i]+tr[i]) - 0.25*fba*s[i,t] for i=1:items_tot, t=1:time_tot))

optimize!(model)

objective_value(model)

LoadError: UndefVarError: Gurobi not defined

## NEW MODEL

In [None]:
model = Model(Gurobi.Optimizer) # with_optimizer is deprecated in current JuMP
set_optimizer_attribute(model, "OutputFlag", 0)

@variable(model, x[i=1:items_tot, t=1:time_tot] >= 0) # keeping it at not Int for now
@variable(model, j[i=1:items_tot, t=1:time_tot] >= 0) # keeping it at not Int for now

@variable(model, s[i=1:items_tot, t=1:time_tot] >= 0) # keeping it at not Int for now
@variable(model, m[t=1:time_tot] >= 0)

# how much we sell is bounded by inventory and demand
@constraint(model, [i=1:items_tot, t=1:time_tot], j[i,t] <= s[i,t])
@constraint(model, [i=1:items_tot, t=1:time_tot], j[i,t] <= d[i,t])
# capital is available 2 weeks after a product's sale
@constraint(model, [t=1:(time_tot-2)], m[t+2] == 0.7*sum(j[i,t]*r[i] for i=1:items_tot))
# the cost of inventory orders at time t must be less than our available capital
@constraint(model, [t=1:time_tot], sum(x[i,t]*(c[i]+tr[i]) for i=1:items_tot) <= m[t])
# sales at time t reduce inventory at warehouse, 
# inventory orders at time t are available for sale (arrive at warehouse) 8 weeks later
@constraint(model, [i=1:items_tot, t=leadtime:(time_tot-1)], s[i,t+1] == s[i,t] - j[i,t] + x[i,t-(leadtime-1)])

# address the first 1:leadtime weeks of supply
@constraint(model, [i=1:items_tot, t=1:leadtime-1], s[i,t+1] == s[i,t] - j[i,t])
# week-1 sales are already subtracted in the s[i,2] update above, so s[i,1] is just the starting inventory
@constraint(model, [i=1:items_tot], s[i,1] == starting_inv[i])

# Amazon's 30% cut applies to revenue only, not to purchase or storage costs
@objective(model, Max, sum(0.7*j[i,t]*r[i] - x[i,t]*(c[i]+tr[i]) - 0.25*fba*s[i,t] for i=1:items_tot, t=1:time_tot))

optimize!(model)

objective_value(model)