# ML Project: eCommerce Inventory Prescription

## Sources:

1. Getting cass index: https://fred.stlouisfed.org/series/FRGSHPUSM649NCIS
2. Insight into potential predictive features: https://dspace.mit.edu/bitstream/handle/1721.1/126484/scm2020-huang-a-predictive-model-for-transpacific-eastbound-ocean-freight-pricing-capstone.pdf?sequence=1&isAllowed=y
3. crude oil prices: https://www.indexmundi.com/commodities/?commodity=crude-oil&months=360
4. OLD: exchange rate: https://fred.stlouisfed.org/series/EXCHUS
5. US CPI: https://fred.stlouisfed.org/series/CPIAUCSL
6. US PMI: https://data.nasdaq.com/data/ISM/MAN_PMI-pmi-composite-index
7. OLD: China PMI:


## Data / Methodolgy

We need to predict the average US freight cost for a three month horizon. For the purposes of our problem, all items travel the same distance from the manufacturer to the warehouse. Since we are assuming that we are shipping less-than-truckload (LTL) shipments, cost is determined solely by weight of shipped items. We are using the CASS index to determine historical freight prices in the US. (insert explanation of CASS index). We can then estimate transportation costs accordingly: find average shipping cost per pound for the last month of our training set ($\text{\$/lb}_{N}$), associated with $\text{CASS Index}_{N}$, and back-compute all prior shipping costs per pound using the relationship between $\text{CASS Index}_{N}$ and the CASS index of the prior months.

To "predict" shipping rate, we will use the following features:
- oil price
- exchange rate (CH/US)   (might not need this one)
- US CPI
- US PMI
- Year
- Month

We got the above data from (sources). We found our other data (demand and individual item costs) from... (Semi)

Simple linear regression of the non-aggregated data shows that there is explanability in these features (in-sample R^2 of 0.5). 

Since we have a year of data, we can run the optimization 4 times, since our horizon is 1 quarter. Thus, we need 4 test periods, with each period correponding to 13 weeks of data. 

We'll break down the demand into 4 chunks of 13 weeks each (13 x 4 = 52). We can accordingly group the CASS data (response and features) into 3 month periods (quarters, thus 4 for each year), with the features being:
- Year
- Quarter
- Following features for the beginning of the quarter (taken from first month of quarter)
    - oil price
    - exchange rate (CH/US)   (might not need this one)
    - US CPI
    - US PMI 
    
This will let us find the neighbors of the 4 test quarters, since at the beginning of these quarters we will know the values of all the features. 

Simple linear regression of aggregated (quarterized) data leads to train R-squared of 0.5511. 

Essentially our analysis is by quarter, so we'll have projected profit (using prescription), baseline profit (simply using predicted avg CASS for quater), and ideal profit (oracle approach) for all four quarters.


Thus, work to do right now is:
- break down predicted demand into 4 matrices (19 x 13) each (we'll have to throw a week out, or include it in the last quarter or something)
- rewrite model to incorporate shipping costs per pound estimate and to account for change in horizon (change lead time to two weeks) 
- find simple prediction and oracle profits

In [1]:
using JuMP
using Gurobi
using CSV
using DataFrames
using Random, Statistics
using NearestNeighbors, Dates
ENV["COLUMNS"]=120;
gurobi_env = Gurobi.Env();

Academic license - for non-commercial use only - expires 2022-08-19


#### decision vars ####
x[i,t] --> inventory order from Supplier of product i at time t

j[i,t] --> how much we are selling of product i in time t


#### technically decision vars but not really decisions ####
s[i,t] --> inventory available for sale (at AMZ warehouse) for product i at time t

m[t] --> capital (money) available to purchase inventory at time t

#### params ####
d[i,t] --> demand for product i at time t

r[i] --> sales price for product i

c[i] --> manufacturing cost for product i

tr[i] --> transportation cost for product i


we'll incorporate volume later - for initial, lets assume fixed unit costs

v[i] --> volume (size) of product i

fba --> AMZ storage fee: $19/cbm per month



revenue = 0.70 * sum( d[i,t]*r[i] for i=1:items_tot, t=1:time_tot ) --> amazon takes 30% cut

cost = sum( x[i,t]*(c[i]+tr[i]) for i=1:items_tot, t=1:time_tot ) + sum( fba*s[i,t] for i=1:items_tot, t=1:time_tot:4 )

    --> manu + transport cost of orders + monthly inventory fee for whatever inventory we have


Transportation cost is something that we want to predict -- 

w[t] = total weight of items being shipped in week t
    
$\hat{S}$[m] = uncertain shipping cost in month m - found using CASS index
    
 
 Thus, total trans cost for week t = 
 
 w[t] * S[m(t)]

## RUNNING MODEL

In [65]:
#### load data ####

# train/tests for finding KNN
trainQ1 = CSV.read("data/data_final_train_Q1.csv", DataFrame)[:,2:end]
testQ1 = CSV.read("data/data_final_test_Q1.csv", DataFrame)
trainQ2 = CSV.read("data/data_final_train_Q2.csv", DataFrame)[:,2:end]
testQ2 = CSV.read("data/data_final_test_Q2.csv", DataFrame)
trainQ3 = CSV.read("data/data_final_train_Q3.csv", DataFrame)[:,2:end]
testQ3 = CSV.read("data/data_final_test_Q3.csv", DataFrame)
trainQ4 = CSV.read("data/data_final_train_Q4.csv", DataFrame)[:,2:end]
testQ4 = CSV.read("data/data_final_test_Q4.csv", DataFrame)


# demands
demand_Q1 = CSV.read("data/salesByWeek_Q1.csv", DataFrame)
demand_Q2 = CSV.read("data/salesByWeek_Q2.csv", DataFrame)
demand_Q3 = CSV.read("data/salesByWeek_Q3.csv", DataFrame)
demand_Q4 = CSV.read("data/salesByWeek_Q4.csv", DataFrame)
d1 = Matrix(demand_Q1)[:,3:end]
d2 = Matrix(demand_Q2)[:,3:end]
d3 = Matrix(demand_Q3)[:,3:end]
d4 = Matrix(demand_Q4)[:,3:end]
Q1_startingInv = sum(d1[:,1:2], dims=2);
Q2_startingInv = sum(d2[:,1:2], dims=2);
Q3_startingInv = sum(d3[:,1:2], dims=2);
Q4_startingInv = sum(d4[:,1:2], dims=2);


# fixed item prices and fees
prices = CSV.read("data/prices.csv", DataFrame)
itemInfo = CSV.read("data/costEstimates.csv", DataFrame)

# set lead time
leadtime=2

# get constant vectors from data
r = itemInfo[:,2]
c = itemInfo[:,3]
f = itemInfo[:,5]
FBA = itemInfo[:,7]
w = itemInfo[:,10];

In [94]:
## WITHOUT PRESCRIPTION

# model function
function runModel(startingCap, startingInv, nextQstartingInv, D, U)

    P = size(D)[1]
    T = size(D)[2]

    modelNoPres = Model(with_optimizer(Gurobi.Optimizer,TimeLimit=60, gurobi_env));
    set_optimizer_attribute(modelNoPres, "OutputFlag", 0)

    @variable(modelNoPres, x[i=1:P, t=1:T] >= 0) # keeping it at not Int for now
    @variable(modelNoPres, j[i=1:P, t=1:T] >= 0) # keeping it at not Int for now
    @variable(modelNoPres, s[i=1:P, t=1:T] >= 0) # keeping it at not Int for now
    @variable(modelNoPres, m[t=1:T] >= 0)
    @variable(modelNoPres, lambda[t=1:T] >= 0)

    # profit (capital) in every week = last week's + 2 weeks ago's sales - this weeks cost
    @constraint(modelNoPres, [t=1:T-2], m[t+2] == m[t+1] + sum(j[i,t]*(r[i] - f[i]) for i=1:P) - lambda[t+1])
    @constraint(modelNoPres, m[1] == startingCap)
    @constraint(modelNoPres, m[2] == m[1] - lambda[1])
    # cost in week is equal to purchasing cost + trans cost + storage cost
    @constraint(modelNoPres, [t=1:T], lambda[t] == sum(x[i,t]*c[i] for i=1:P) + U*(sum(x[i,t]*w[i] for i=1:P)) + sum(s[i,t]*FBA[i] for i=1:P))
    # sales capped by storage and demand
    @constraint(modelNoPres, [i=1:P,t=1:T], j[i,t] <= s[i,t])
    @constraint(modelNoPres, [i=1:P,t=1:T], j[i,t] <= D[i,t])
    # cannot spend more than we have in capital at beginning of week
    @constraint(modelNoPres, [t=1:T], lambda[t] <= m[t])
    # inventory in week is equal to prev week inv - what we sold last week + what we ordered two weeks ago
    @constraint(modelNoPres, [i=1:P,t=leadtime:T-1], s[i,t+1] == s[i,t] - j[i,t] + x[i,t-1])
    @constraint(modelNoPres, [i=1:P], s[i,2] == s[i,1] - j[i,1])
    @constraint(modelNoPres, [i=1:P], s[i,1] == startingInv[i])
    # ending inventory must be enough to start next quarter
    @constraint(modelNoPres, [i=1:P], s[i,T] - j[i,T] >= nextQstartingInv[i])
    
    @objective(modelNoPres, Max, m[T] + sum(j[i,t]*(r[i] - f[i]) for i=1:P,t=T-1) - lambda[T,k])

    optimize!(modelNoPres)

    #termination_status(model)
    return objective_value(modelNoPres), value.(x)
    
end

runModel (generic function with 1 method)

In [141]:
## WITH PRESCRIPTION

# model function
function runModelPres(startingCap, startingInv, nextQstartingInv, D, U)

    model = Model(with_optimizer(Gurobi.Optimizer,TimeLimit=60, gurobi_env));
    set_optimizer_attribute(model, "OutputFlag", 0)
    
    
    if isa(U, Vector) == true
        K = size(U)[1]
    else
        K = 1
    end
    
    neighborWeight = zeros(K)
    neighborWeight .= 1/K
    #neighborWeight = [0.45,0.25,0.15,0.1,0.05]
    
    P = size(D)[1]
    T = size(D)[2]

    @variable(model, x[i=1:P, t=1:T] >= 0) # Main decision
    @variable(model, j[i=1:P, t=1:T, k=1:K] >= 0) 
    @variable(model, s[i=1:P, t=1:T, k=1:K] >= 0) 
    @variable(model, m[t=1:T, k=1:K] >= 0)
    @variable(model, lambda[t=1:T, k=1:K] >= 0)

    # profit (capital) in every week = last week's + 2 weeks ago's sales - this weeks cost
    @constraint(model, [t=1:T-2, k=1:K], m[t+2,k] == m[t+1,k] + sum(j[i,t,k]*(r[i] - f[i]) for i=1:P) - lambda[t+1,k])
    @constraint(model, [k=1:K], m[1,k] == startingCap)
    @constraint(model, [k=1:K], m[2,k] == m[1,k] - lambda[1,k])
    # cost in week is equal to purchasing cost + trans cost + storage cost
    @constraint(model, [t=1:T,k=1:K], lambda[t,k] == sum(x[i,t]*c[i] for i=1:P) + U[k]*sum(x[i,t]*w[i] for i=1:P) + sum(s[i,t,k]*FBA[i] for i=1:P))
    # sales capped by storage and demand
    @constraint(model, [i=1:P,t=1:T,k=1:K], j[i,t,k] <= s[i,t,k])
    @constraint(model, [i=1:P,t=1:T,k=1:K], j[i,t,k] <= D[i,t])
    # cannot spend more than we have in capital at beginning of week
    @constraint(model, [t=1:T,k=1:K], lambda[t,k] <= m[t,k])
    # inventory in week is equal to prev week inv - what we sold last week + what we ordered two weeks ago
    @constraint(model, [i=1:P,t=leadtime:T-1,k=1:K], s[i,t+1,k] == s[i,t,k] - j[i,t,k] + x[i,t-1])
    @constraint(model, [i=1:P,k=1:K], s[i,2,k] == s[i,1,k] - j[i,1,k])
    @constraint(model, [i=1:P,k=1:K], s[i,1,k] == startingInv[i])
    # ending inventory must be enough to start next quarter
    @constraint(model, [i=1:P,k=1:K], s[i,T,k] - j[i,T,k] >= nextQstartingInv[i])
    
    @objective(model, Max, sum((1/K)*(m[T,k] + sum(j[i,t,k]*(r[i] - f[i]) for i=1:P,t=T-1) - lambda[T,k]) for k=1:K))

    optimize!(model)

    #termination_status(model)
    return objective_value(model), value.(x), value.(m)
    
end

runModelPres (generic function with 1 method)

In [142]:
# HOW TO CHECK PRESCRIPTION AGAINST REALITY
# should theoretically be able to set x = x_planned in the opt model
# would be true U

# model function
function checkPlan(startingCap, startingInv, nextQstartingInv, D, U, planned_x)

    P = size(D)[1]
    T = size(D)[2]

    modelNoPres = Model(with_optimizer(Gurobi.Optimizer,TimeLimit=60, gurobi_env));
    set_optimizer_attribute(modelNoPres, "OutputFlag", 0)

    @variable(modelNoPres, x[i=1:P, t=1:T] >= 0) # keeping it at not Int for now
    @variable(modelNoPres, j[i=1:P, t=1:T] >= 0) # keeping it at not Int for now
    @variable(modelNoPres, s[i=1:P, t=1:T] >= 0) # keeping it at not Int for now
    @variable(modelNoPres, m[t=1:T] >= 0)
    @variable(modelNoPres, lambda[t=1:T] >= 0)

    # set x = planned x
    @constraint(modelNoPres, [i=1:P,t=1:T], x[i,t] == planned_x[i,t])
    

    # profit (capital) in every week = last week's + 2 weeks ago's sales - this weeks cost
    @constraint(modelNoPres, [t=1:T-2], m[t+2] == m[t+1] + sum(j[i,t]*(r[i] - f[i]) for i=1:P) - lambda[t+1])
    @constraint(modelNoPres, m[1] == startingCap)
    @constraint(modelNoPres, m[2] == m[1] - lambda[1])
    # cost in week is equal to purchasing cost + trans cost + storage cost
    @constraint(modelNoPres, [t=1:T], lambda[t] == sum(x[i,t]*c[i] for i=1:P) + U*sum(x[i,t]*w[i] for i=1:P) + sum(s[i,t]*FBA[i] for i=1:P))
    # sales capped by storage and demand
    @constraint(modelNoPres, [i=1:P,t=1:T], j[i,t] <= s[i,t])
    @constraint(modelNoPres, [i=1:P,t=1:T], j[i,t] <= D[i,t])
    # cannot spend more than we have in capital at beginning of week
    @constraint(modelNoPres, [t=1:T], lambda[t] <= m[t])
    # inventory in week is equal to prev week inv - what we sold last week + what we ordered two weeks ago
    @constraint(modelNoPres, [i=1:P,t=leadtime:T-1], s[i,t+1] == s[i,t] - j[i,t] + x[i,t-1])
    @constraint(modelNoPres, [i=1:P], s[i,2] == s[i,1] - j[i,1])
    @constraint(modelNoPres, [i=1:P], s[i,1] == startingInv[i])
    # ending inventory must be enough to start next quarter
    @constraint(modelNoPres, [i=1:P], s[i,T] - j[i,T] >= nextQstartingInv[i])
    
    @objective(modelNoPres, Max, m[T] + sum(j[i,t]*(r[i] - f[i]) for i=1:P,t=T-1) - lambda[T])

    optimize!(modelNoPres)

    #termination_status(model)
    return objective_value(modelNoPres)
    
end


checkPlan (generic function with 1 method)

In [163]:
function findKNN(cass_data_in, test_point_in, k)
   
    cass_data = deepcopy(cass_data_in)
    test_point = deepcopy(test_point_in)
    
    cass_mean = mean(cass_data[!, :avgCASS]) 
    cass_std = std(cass_data[!, :avgCASS])
     
    # normalize columns (except for year and quarter)
    for col in names(cass_data)
        if col == "Year" || col == "Quarter"
            continue
        end
        cass_data[!, col] = (cass_data[!, col] .- mean(cass_data[!, col])) ./ std(cass_data[!, col])
        #test_point[!, col] = (test_point[!, col] .- mean(test_point[!, col])) ./ std(test_point[!, col])
    end

    X = Array(select(cass_data[1:(size(cass_data)[1]-1), :], Not(:avgCASS)))'
    kd_tree = KDTree(X)
    test_point = Array(select(test_point, Not(:avgCASS)))'
    
    id, dist = knn(kd_tree, test_point, k)
    knn_predictions = cass_data[id[1],:].avgCASS
    knn_predictions = knn_predictions .* cass_std .+ cass_mean
    return knn_predictions
end

findKNN (generic function with 1 method)

## Q1

In [179]:
U1_pres_cheat = [1.04, 1.05, 1.06, 1.07, 1.08]
U1_true_cheat = 1.06

1.06

In [186]:
# arbitrary starting capital
startingCap1 = 300000


# PRESCRIPTION
U1_pres = findKNN(trainQ1, testQ1, 5)
Q1Prof_pres_est, Q1_pres_x, Q1_pres_m = runModelPres(startingCap1, Q1_startingInv, Q2_startingInv, d1, U1_pres_cheat);

# ORACLE
U1_true = testQ1[1,7]
Q1Prof_oracle_est, Q1_oracle_x, Q1_oracle_m = runModelPres(startingCap1, Q1_startingInv, Q2_startingInv, d1, U1_true_cheat);

# LAST Q
U1_lastQ = trainQ1[end-1,7]
Q1Prof_lastQ_est, Q1_lastQ_x = runModelPres(startingCap1, Q1_startingInv, Q2_startingInv, d1, U1_lastQ);


println("Estimated Profits: ")
println(" - Prescription: \$", Q1Prof_pres_est)
println(" - Oracle:       \$", Q1Prof_oracle_est)
println(" - Last Quarter: \$", Q1Prof_lastQ_est)

Estimated Profits: 
 - Prescription: $455903.1575885013
 - Oracle:       $456257.7013082555
 - Last Quarter: $447901.53322473395


### Q1. Check how plans perform in reality

In [187]:
Q1Prof_pres_real = checkPlan(startingCap1, Q1_startingInv, Q2_startingInv, d1, U1_true, Q1_pres_x)
Q1Prof_oracle_real = checkPlan(startingCap1, Q1_startingInv, Q2_startingInv, d1, U1_true, Q1_oracle_x)
Q1Prof_lastQ_real = checkPlan(startingCap1, Q1_startingInv, Q2_startingInv, d1, U1_true, Q1_lastQ_x)

println("Realized Profits: ")
println(" - Prescription: \$", Q1Prof_pres_real)
println(" - Oracle:       \$", Q1Prof_oracle_real)
println(" - Last Quarter: \$", Q1Prof_lastQ_real)

LoadError: Result index of attribute MathOptInterface.ObjectiveValue(1) out of bounds. There are currently 0 solution(s) in the model.

## Q2

In [9]:
# PRESCRIPTION
startingCap2_pres = Q1Prof_pres_real
U2_pres = findKNN(trainQ2, testQ2, 5)
Q2Prof_pres_est, Q2_pres_x = runModelPres(startingCap2_pres, Q2_startingInv, Q3_startingInv, d2, U2_pres);

# ORACLE
startingCap2_oracle = Q1Prof_oracle_real
U2_true = testQ2[1,7]
Q2Prof_oracle_est, Q2_oracle_x = runModel(startingCap2_oracle, Q2_startingInv, Q3_startingInv, d2, U2_true);

# LAST Q
startingCap2_lastQ = Q1Prof_lastQ_real
U2_lastQ = trainQ2[end,7]
Q2Prof_lastQ_est, Q2_lastQ_x = runModel(startingCap2_lastQ, Q2_startingInv, Q3_startingInv, d2, U2_lastQ);


println("Estimated Profits: ")
println(" - Prescription: \$", Q2Prof_pres_est)
println(" - Oracle:       \$", Q2Prof_oracle_est)
println(" - Last Quarter: \$", Q2Prof_lastQ_est)

Estimated Profits: 
 - Prescription: $5.384191656107818e6
 - Oracle:       $5.430483524186049e6
 - Last Quarter: $5.408877475546335e6


### Q2. Check how plans perform in reality

In [10]:
Q2Prof_pres_real = checkPlan(startingCap2_pres, Q2_startingInv, Q3_startingInv, d2, U2_true, Q2_pres_x)
Q2Prof_oracle_real = checkPlan(startingCap2_oracle, Q2_startingInv, Q3_startingInv, d2, U2_true, Q2_oracle_x)
Q2Prof_lastQ_real = checkPlan(startingCap2_lastQ, Q2_startingInv, Q3_startingInv, d2, U2_true, Q2_lastQ_x)

println("Realized Profits: ")
println(" - Prescription: \$", Q2Prof_pres_real)
println(" - Oracle:       \$", Q2Prof_oracle_real)
println(" - Last Quarter: \$", Q2Prof_lastQ_real)

Realized Profits: 
 - Prescription: $5.419515856030204e6
 - Oracle:       $5.43048352418605e6
 - Last Quarter: $5.427998424199156e6


## Q3

In [11]:
# PRESCRIPTION
startingCap3_pres = Q2Prof_pres_real
U3_pres = findKNN(trainQ3, testQ3, 5)
Q3Prof_pres_est, Q3_pres_x = runModelPres(startingCap3_pres, Q3_startingInv, Q4_startingInv, d3, U3_pres);

# ORACLE
startingCap3_oracle = Q2Prof_oracle_real
U3_true = testQ3[1,7]
Q3Prof_oracle_est, Q3_oracle_x = runModel(startingCap3_oracle, Q3_startingInv, Q4_startingInv, d3, U3_true);

# LAST Q
startingCap3_lastQ = Q2Prof_lastQ_real
U3_lastQ = trainQ3[end,7]
Q3Prof_lastQ_est, Q3_lastQ_x = runModel(startingCap3_lastQ, Q3_startingInv, Q4_startingInv, d3, U3_lastQ);


println("Estimated Profits: ")
println(" - Prescription: \$", Q3Prof_pres_est)
println(" - Oracle:       \$", Q3Prof_oracle_est)
println(" - Last Quarter: \$", Q3Prof_lastQ_est)

Estimated Profits: 
 - Prescription: $8.0190412060928475e6
 - Oracle:       $8.00972153324605e6
 - Last Quarter: $8.027429456879156e6


### Q3. Check how plans perform in reality

In [23]:
sum(Q3_startingInv.*FBA.*w)

119272.5576

In [20]:
Q3Prof_pres_real = checkPlan(Q2Prof_pres_real, Q3_startingInv, Q4_startingInv, d3, U3_true, Q3_pres_x)
Q3Prof_oracle_real = checkPlan(startingCap3_oracle, Q3_startingInv, Q4_startingInv, d3, U3_true, Q3_oracle_x)
Q3Prof_lastQ_real = checkPlan(Q2Prof_lastQ_real, Q3_startingInv, Q4_startingInv, d3, U3_true, Q3_lastQ_x)

println("Realized Profits: ")
println(" - Prescription: \$", Q3Prof_pres_real)
println(" - Oracle:       \$", Q3Prof_oracle_real)
println(" - Last Quarter: \$", Q3Prof_lastQ_real)

LoadError: Result index of attribute MathOptInterface.ObjectiveValue(1) out of bounds. There are currently 0 solution(s) in the model.

In [92]:
# get quarter specific data
startingCap3 = endQ2profit_real

U3 = findKNN(trainQ3, testQ3, 5)

# run model for q1
endQ3profit, Q3_x = runModelPres(startingCap3, Q3_startingInv, Q4_startingInv, d3, U3);

Academic license - for non-commercial use only - expires 2022-08-19


In [93]:
endQ3profit_est

8.0190412060928475e6

In [94]:
# test against true U
startingCap3_oracle = endQ2profit_oracle_real
U3_true = testQ3[1,7]

endQ3profit_oracle, Q3_x_oracle = runModel(startingCap3_oracle, Q3_startingInv, Q4_startingInv, d3, U3_true);

Academic license - for non-commercial use only - expires 2022-08-19


In [95]:
endQ3profit_oracle

8.00972153324605e6

In [83]:
# test against prediction of U - just last quarter's U
startingCap3_lastQ = endQ2profit_lastQ_real

U3_lastQ = trainQ3[end,7]

endQ3profit_lastQ, Q3_x_lastQ = runModel(startingCap3_lastQ, Q2_startingInv, Q3_startingInv, d2, U2_lastQ);

Academic license - for non-commercial use only - expires 2022-08-19


In [84]:
endQ2profit_lastQ

5.408877475546335e6

### Q3. Check how plans perform in reality

In [85]:
endQ2profit_real = checkPlan(startingCap2, Q2_startingInv, Q3_startingInv, d2, U2_true, Q2_x)

Academic license - for non-commercial use only - expires 2022-08-19


5.419515856030204e6

In [86]:
endQ2profit_oracle_real = checkPlan(startingCap2_oracle, Q2_startingInv, Q3_startingInv, d2, U2_true, Q2_x_oracle)

Academic license - for non-commercial use only - expires 2022-08-19


5.43048352418605e6

In [87]:
endQ2profit_lastQ_real = checkPlan(startingCap2_lastQ, Q2_startingInv, Q3_startingInv, d2, U2_true, Q2_x_lastQ)

Academic license - for non-commercial use only - expires 2022-08-19


5.427998424199156e6

In [171]:
cass_data = deepcopy(trainQ1)
test_point = deepcopy(testQ1)
k=5


cass_mean = mean(cass_data[!, :avgCASS]) 
cass_std = std(cass_data[!, :avgCASS])

# normalize columns (except for year and quarter)
for col in names(cass_data)
    if col == "Year" || col == "Quarter"
        continue
    end
    cass_data[!, col] = (cass_data[!, col] .- mean(cass_data[!, col])) ./ std(cass_data[!, col])
    test_point[!, col] = (test_point[!, col] .- mean(cass_data[!, col])) ./ std(cass_data[!, col])
end




X = Array(select(cass_data[1:(size(cass_data)[1]), :], Not(:avgCASS)))'[3:end,:]
kd_tree = KDTree(X)
test_point = Array(select(test_point, Not(:avgCASS)))'[3:end,:]

id, dist = knn(kd_tree, test_point, k)
knn_predictions = cass_data[id[1],:].avgCASS
knn_predictions = knn_predictions .* cass_std .+ cass_mean

5-element Vector{Float64}:
 1.201
 1.179333333
 1.237666667
 1.262666667
 1.176

In [173]:
kd_tree

KDTree{StaticArrays.SVector{5, Float64}, Euclidean, Float64}
  Number of points: 113
  Dimensions: 5
  Metric: Euclidean(0.0)
  Reordered: true