# Homework 1

This notebook will walk you through defining a simple transport flow model, then ask you to interact with the solutions and modify the model to add additional constraints...

## Setting up the model

### Load packages

In [1]:
using JuMP, Clp, DataFrames, CSV;

### Define sets

We will define two sets, both as arrays of strings

***Production plants, $P$***

In [2]:
P=["trenton", "newark"] # production plants

2-element Array{String,1}:
 "trenton"
 "newark"

***Markets for products, $M$***

In [3]:
M=["newyork", "princeton", "philadelphia"] # markets for products

3-element Array{String,1}:
 "newyork"
 "princeton"
 "philadelphia"

Note that sets can also be defined over intervals (as in `i=1:10`) or numerical vectors (as in `x=[2, 4, 5, 11]`) 
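
As a minimal sketch of the note above, the cell below defines one set of each kind and iterates over them in the same way. The names `idx`, `K`, `S` and the derived values are illustrative only and are not used elsewhere in this notebook.

```julia
# Three kinds of index set; all of them are ordinary iterable collections.
idx = 1:10                   # an interval (UnitRange) of integers
K = [2, 4, 5, 11]            # an arbitrary numerical vector
S = ["trenton", "newark"]    # an array of strings, like P above

# Each set can be counted, summed over, or looped over with a comprehension.
n_idx = length(idx)
sum_K = sum(k for k in K)
upper_S = [uppercase(s) for s in S]
```

Any of these collections could serve as an index set in a JuMP macro such as `@variable`, in the same way `P` and `M` are used later on.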

### Define parameters

We'll make use of the defined sets as indexes for our parameters...

***Plant production capacities***

In [4]:
plants = DataFrame(plant=P, capacity=[350,650])

| Row | plant   | capacity |
|-----|---------|----------|
|     | String  | Int64    |
| 1   | trenton | 350      |
| 2   | newark  | 650      |


***Demand for products***

Stored in a [DataFrame](https://juliadata.github.io/DataFrames.jl/stable/)

In [5]:
markets = DataFrame(
    market=M, 
    demand=[325, 300, 275]
)

| Row | market       | demand |
|-----|--------------|--------|
|     | String       | Int64  |
| 1   | newyork      | 325    |
| 2   | princeton    | 300    |
| 3   | philadelphia | 275    |


A few different ways to index into our DataFrames to access parameters (all of the below are equivalent)

In [6]:
plants[plants.plant.=="newark",:capacity] # option 1

1-element Array{Int64,1}:
 650

In [7]:
plants[plants.plant.=="newark",:].capacity # option 2

1-element Array{Int64,1}:
 650

In [8]:
plants.capacity[plants.plant.=="newark"] # option 3

1-element Array{Int64,1}:
 650

In [9]:
plants[:,:capacity][plants.plant.=="newark"] # option 4

1-element Array{Int64,1}:
 650

Note that DataFrame indexing returns an Array by default, in this case, a 1-element Array of type Int64 (64-bit integer), as indicated by `Array{Int64,1}` above. 

To access the single Int64 value, append `[1]` to any of the above to reference the first (and only) element in this array. 

In [10]:
plants.capacity[plants.plant.=="newark"][1]

650

In [11]:
typeof(plants.capacity[plants.plant.=="newark"][1])

Int64

In [12]:
typeof(plants.capacity[plants.plant.=="newark"])

Array{Int64,1}
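
As an alternative to appending `[1]`, the sketch below shows two other ways to get a scalar. It uses plain vectors rather than a DataFrame so that it runs without any packages. The names `plant_names`, `capacities` and `capacity_by_plant` are illustrative, and `only` assumes Julia 1.4 or later.

```julia
plant_names = ["trenton", "newark"]
capacities = [350, 650]

# `only` returns the single element of a collection and throws an error
# if the mask matches zero rows or more than one row.
newark_capacity = only(capacities[plant_names .== "newark"])

# A Dict gives direct scalar lookup by name, with no boolean mask needed.
capacity_by_plant = Dict(zip(plant_names, capacities))
trenton_capacity = capacity_by_plant["trenton"]
```

Compared with `[1]`, `only` has the advantage of failing loudly if a plant name is misspelled or duplicated, rather than silently returning the first match.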

***Distance from plants to markets***

Stored in a JuMP [DenseAxisArray](https://jump.dev/JuMP.jl/v0.19/containers/) with data array and symbolic references across each of our sets (plants and markets), converted to Symbols for referencing

In [13]:
# two dimensional symbolic DenseAxisArray, with from/to distance pairs
distances = JuMP.Containers.DenseAxisArray(
    [2.5 0.5 1.5;
     0.5 1.5 3.5],
    Symbol.(P),
    Symbol.(M),
)

2-dimensional DenseAxisArray{Float64,2,...} with index sets:
    Dimension 1, [:trenton, :newark]
    Dimension 2, [:newyork, :princeton, :philadelphia]
And data, a 2×3 Array{Float64,2}:
 2.5  0.5  1.5
 0.5  1.5  3.5

A couple of example references to our DenseAxisArray to access parameters...

In [14]:
distances[:trenton, :newyork] #example of distance references

2.5

In [15]:
distances[:newark, :newyork] #example of distance references

0.5

In [16]:
distances[Symbol(P[2]),Symbol(M[1])] # another way to find distance from newark to newyork

0.5

In [17]:
distances[Symbol("newark"), Symbol("newyork")] # and a third...

0.5

***Costs of transport***

In [18]:
freight_cost = 90 # Cost of freight shipment per unit of distance

90
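
To make the role of `freight_cost` concrete, the sketch below computes the cost of shipping one unit along each plant-to-market route. It uses a plain matrix (`dist`, an illustrative name) with the same values as `distances`, so it runs without JuMP.

```julia
freight_cost = 90            # cost per unit shipped per unit of distance

# Rows are plants (trenton, newark); columns are markets
# (newyork, princeton, philadelphia), matching `distances`.
dist = [2.5 0.5 1.5;
        0.5 1.5 3.5]

# Cost of shipping a single unit along each route.
unit_cost = freight_cost .* dist
```

The entries of `unit_cost` (225, 45, 135 for Trenton and 45, 135, 315 for Newark) are the coefficients that appear on each `X[p,m]` in the cost expression.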

### Create model
(and specify the Clp solver)

In [19]:
transport = Model(Clp.Optimizer)

A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Clp

### Define variables

***Quantities of product to transport from plant $p \in P$ to market $m \in M$***

In [20]:
# Defines 6 new variables - one for each combination of plant - market
@variable(transport, X[P,M] >= 0)

2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
    Dimension 1, ["trenton", "newark"]
    Dimension 2, ["newyork", "princeton", "philadelphia"]
And data, a 2×3 Array{VariableRef,2}:
 X[trenton,newyork]  X[trenton,princeton]  X[trenton,philadelphia]
 X[newark,newyork]   X[newark,princeton]   X[newark,philadelphia]

Example reference to single quantity decision variable, the quantity shipped from Newark to Philadelphia:

In [21]:
X["newark","philadelphia"]

X[newark,philadelphia]

### Define constraints

***Supply capacity constraint***
- Need to make sure that the sum of supply across the three markets (for a given plant) is not greater than its capacity

In [22]:
@constraint(transport, cSupply[p in P], 
    sum(X[p,m] for m in M) 
    <= plants.capacity[plants.plant.==p][1])

1-dimensional DenseAxisArray{ConstraintRef{Model,MathOptInterface.ConstraintIndex{MathOptInterface.ScalarAffineFunction{Float64},MathOptInterface.LessThan{Float64}},ScalarShape},1,...} with index sets:
    Dimension 1, ["trenton", "newark"]
And data, a 2-element Array{ConstraintRef{Model,MathOptInterface.ConstraintIndex{MathOptInterface.ScalarAffineFunction{Float64},MathOptInterface.LessThan{Float64}},ScalarShape},1}:
 cSupply[trenton] : X[trenton,newyork] + X[trenton,princeton] + X[trenton,philadelphia] ≤ 350.0
 cSupply[newark] : X[newark,newyork] + X[newark,princeton] + X[newark,philadelphia] ≤ 650.0

***Demand balance constraint***

Ensure all demand is satisfied at each market
- This means ensuring that the sum of supply in a given market is at least its demand (since shipping costs money, the minimum-cost solution will meet this with equality)


In [23]:
@constraint(transport, cDemand[m in M], 
    sum(X[p,m] for p in P) 
    >= markets.demand[markets.market.==m][1])

1-dimensional DenseAxisArray{ConstraintRef{Model,MathOptInterface.ConstraintIndex{MathOptInterface.ScalarAffineFunction{Float64},MathOptInterface.GreaterThan{Float64}},ScalarShape},1,...} with index sets:
    Dimension 1, ["newyork", "princeton", "philadelphia"]
And data, a 3-element Array{ConstraintRef{Model,MathOptInterface.ConstraintIndex{MathOptInterface.ScalarAffineFunction{Float64},MathOptInterface.GreaterThan{Float64}},ScalarShape},1}:
 cDemand[newyork] : X[trenton,newyork] + X[newark,newyork] ≥ 325.0
 cDemand[princeton] : X[trenton,princeton] + X[newark,princeton] ≥ 300.0
 cDemand[philadelphia] : X[trenton,philadelphia] + X[newark,philadelphia] ≥ 275.0
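
Before solving, it can help to confirm that the two sets of constraints can be satisfied together. The sketch below is a quick check using the capacity and demand figures defined earlier. The variable names are illustrative.

```julia
capacity = [350, 650]        # trenton, newark
demand = [325, 300, 275]     # newyork, princeton, philadelphia

total_capacity = sum(capacity)
total_demand = sum(demand)

# The problem is feasible only if total capacity covers total demand.
is_feasible = total_capacity >= total_demand
spare_capacity = total_capacity - total_demand
```

Total capacity (1000) exceeds total demand (900), so a feasible shipping plan exists and 100 units of capacity will go unused.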

### Define objective function

Minimize total cost of transport to satisfy all demand.

First we'll define an expression for total cost of shipments...

In [24]:
@expression(
    transport, # Model name 
    eCost,     # Expression name 
    sum(freight_cost*distances[Symbol(p),Symbol(m)]*X[p,m] 
        for p in P, m in M) # Expression formula 
    )

225 X[trenton,newyork] + 45 X[trenton,princeton] + 135 X[trenton,philadelphia] + 45 X[newark,newyork] + 135 X[newark,princeton] + 315 X[newark,philadelphia]

Now we'll minimize this total cost

In [25]:
@objective(transport, Min, eCost)

225 X[trenton,newyork] + 45 X[trenton,princeton] + 135 X[trenton,philadelphia] + 45 X[newark,newyork] + 135 X[newark,princeton] + 315 X[newark,philadelphia]
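
Putting the pieces together, the model built above can be written as the following linear program. The symbols $f$ (freight cost per unit of distance), $d_{p,m}$ (distance from plant $p$ to market $m$), $c_p$ (capacity of plant $p$) and $D_m$ (demand in market $m$) are notation introduced here for illustration. They correspond to `freight_cost`, `distances`, `plants.capacity` and `markets.demand` respectively.

$$
\begin{aligned}
\min_{X} \quad & \sum_{p \in P} \sum_{m \in M} f \, d_{p,m} \, X_{p,m} \\
\text{s.t.} \quad & \sum_{m \in M} X_{p,m} \le c_p && \forall p \in P \\
& \sum_{p \in P} X_{p,m} \ge D_m && \forall m \in M \\
& X_{p,m} \ge 0 && \forall p \in P,\ m \in M
\end{aligned}
$$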

## Interact with the model

**(a)** Now let's solve the model. In the blank cell below, enter the command for JuMP to solve a model and run the cell

In [26]:
optimize!(transport)

Coin0506I Presolve 5 (0) rows, 6 (0) columns and 12 (0) elements
Clp0006I 0  Obj 0 Primal inf 900 (3)
Clp0006I 4  Obj 85500
Clp0000I Optimal - objective value 85500
Clp0032I Optimal objective 85500 - 4 iterations time 0.002


**(b)** You've got a solution. Now query the objective function in the empty cell below and save it to a variable (name of your choice)

In [27]:
vCost = value(eCost) # eCost is a single expression, so no broadcasting is needed

85500.0

**(c)** Now query and save the optimal solution for X (the decisions about shipment quantities from plant to market) to an Array or DataFrame

In [28]:
results = DataFrame(
    value.(X).data
)
rename!(results, Symbol.(M))
insert!(results, 1, P, :Plant)

| Row | Plant   | newyork | princeton | philadelphia |
|-----|---------|---------|-----------|--------------|
|     | String  | Float64 | Float64   | Float64      |
| 1   | trenton | 0.0     | 75.0      | 275.0        |
| 2   | newark  | 325.0   | 225.0     | 0.0          |


#### Sense check of the result. 
- The optimum value of cost, $vCost$, should equal the sum of the distances * supply * freight_cost

In [29]:
sum([freight_cost * (distances.data[i,j] * value.(X).data[i,j]) 
        for i in 1:2, j in 1:3]
    )     == vCost

true
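
Exact `==` comparison of floating-point values happens to work here, but it can fail when a solver returns values with small rounding errors. The sketch below repeats the check with `isapprox`, using the shipment quantities reported above typed in as a plain matrix (`shipments` and `dist` are illustrative names). The tolerance is an arbitrary choice.

```julia
freight_cost = 90

# Rows: trenton, newark. Columns: newyork, princeton, philadelphia.
dist = [2.5 0.5 1.5;
        0.5 1.5 3.5]
shipments = [0.0   75.0  275.0;
             325.0 225.0 0.0]

# Recompute total cost route by route and compare within a tolerance.
recomputed_cost = sum(freight_cost .* dist .* shipments)
matches = isapprox(recomputed_cost, 85500.0; rtol = 1e-8)
```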

**(d)** Please interpret your results by writing an explanation in the markdown cell below. 

Which facility or facilities supplies the most demand in New York? Does this result make sense? Why?

Which facility or facilities supplies the most demand in Philadelphia? Does this result make sense? Why?

Which facility or facilities supplies the demand in Princeton? Does this result make sense? Why?

### General point
- The intuition here is that, where possible, a city will take all of its product from the closest plant. 
- However, if a plant lacks the capacity to fully supply every city that is closest to it, we will want to split the supply only for the city that loses the least by switching plants (the one most "in between" the two plants). 

### New York
- In New York, all of the supply is provided by the Newark plant. 
    - This makes sense, because the Newark plant is only 0.5 units away from New York, whilst the Trenton plant is 2.5 units away.
    
### Philadelphia
- In Philadelphia, all of the supply is provided by the Trenton plant. 
    - This makes sense, because the Trenton plant is only 1.5 units away from Philadelphia, whilst the Newark plant is 3.5 units away.

### Princeton
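- In Princeton, the product supply is split. Princeton receives 75 units from Trenton, and the remaining 225 units of its demand from Newark.
    - The intuition for this is:
        - Princeton is closer to Trenton (0.5 units) than it is to Newark (1.5 units). 
        - Therefore, it obtains as much supply as possible from the Trenton plant. 
        - However, Philadelphia also draws on Trenton and would pay a larger distance penalty for switching to Newark (2.0 units, versus 1.0 for Princeton), so it takes 275 of Trenton's 350 units of capacity first. 
        - Once the Trenton plant is at capacity (350), Princeton obtains the rest of its supply from the Newark plant.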
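
The reasoning above can be sketched numerically. For each market, compute the extra cost per unit of being served from Newark rather than Trenton, then fill Trenton's capacity starting with the market that would lose the most by switching. The helper `allocate_by_penalty` is hypothetical and written for illustration only. This greedy rule reproduces the solver's answer here because there are only two plants and Newark can absorb everything else, but it is not a general method for transport problems.

```julia
freight_cost = 90
market_names = ["newyork", "princeton", "philadelphia"]
demand = [325, 300, 275]
d_trenton = [2.5, 0.5, 1.5]
d_newark = [0.5, 1.5, 3.5]

# Extra cost per unit if a market is served from Newark instead of Trenton.
# A negative value means Newark is the cheaper plant for that market.
penalty = freight_cost .* (d_newark .- d_trenton)

# Hypothetical helper: fill one plant's capacity in order of decreasing penalty.
function allocate_by_penalty(names, demand, penalty, capacity)
    allocation = Dict{String,Int}()
    remaining = capacity
    for i in sortperm(penalty; rev = true)
        penalty[i] > 0 || continue   # skip markets that are closer to the other plant
        quantity = min(demand[i], remaining)
        allocation[names[i]] = quantity
        remaining -= quantity
    end
    return allocation
end

from_trenton = allocate_by_penalty(market_names, demand, penalty, 350)
```

Philadelphia has the largest penalty (180 per unit), so it receives its full 275 units from Trenton. Princeton (90 per unit) gets the remaining 75, and New York is left to Newark.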
 

**(d)** A new market in New Brunswick appears, with a demand for 50 units. It is located 1.0 units away from both plants. Add this market to the model and solve again.

In [30]:
# Initialise new model. Note, production plants and freight costs 
# are not re-defined, as they haven't changed.

transport_enlarged = Model(Clp.Optimizer)

# Add to list of markets
M_enlarged = ["newyork", "princeton", "philadelphia", "newbrunswick"] 

# Put into a dataframe, incorporate demand information
markets_enlarged = DataFrame(
    market=M_enlarged, 
    demand=[325, 300, 275, 50]
)

# Record distances in a JuMP container
distances_enlarged = JuMP.Containers.DenseAxisArray(
    [2.5 0.5 1.5 1;
     0.5 1.5 3.5 1],
    Symbol.(P),
    Symbol.(M_enlarged), 
)

# Set up variables
@variable(transport_enlarged, X_e[P,M_enlarged] >= 0)

# Define constraints. 

# Firstly, the capacity constraint
@constraint(transport_enlarged, cSupply_E[p in P], 
    sum(X_e[p,m] for m in M_enlarged) 
    <= plants.capacity[plants.plant.==p][1])

# Secondly, the market clearing constraint 
@constraint(transport_enlarged, cDemand_e[m in M_enlarged], 
    sum(X_e[p,m] for p in P) 
    >= markets_enlarged.demand[markets_enlarged.market.==m][1])

# Set up the new expression to optimize, that includes newbrunswick
@expression(
    transport_enlarged, # Model name 
    eCost_Enlarged,     # Expression name 
    sum(freight_cost*distances_enlarged[Symbol(p),Symbol(m)]*X_e[p,m] 
        for p in P, m in M_enlarged) # Expression formula 
    )
@objective(transport_enlarged, Min, eCost_Enlarged);

In [31]:
# Print the summary of the model, to check it's what we expect... 
print(transport_enlarged)

Min 225 X_e[trenton,newyork] + 45 X_e[trenton,princeton] + 135 X_e[trenton,philadelphia] + 90 X_e[trenton,newbrunswick] + 45 X_e[newark,newyork] + 135 X_e[newark,princeton] + 315 X_e[newark,philadelphia] + 90 X_e[newark,newbrunswick]
Subject to
 cDemand_e[newyork] : X_e[trenton,newyork] + X_e[newark,newyork] ≥ 325.0
 cDemand_e[princeton] : X_e[trenton,princeton] + X_e[newark,princeton] ≥ 300.0
 cDemand_e[philadelphia] : X_e[trenton,philadelphia] + X_e[newark,philadelphia] ≥ 275.0
 cDemand_e[newbrunswick] : X_e[trenton,newbrunswick] + X_e[newark,newbrunswick] ≥ 50.0
 cSupply_E[trenton] : X_e[trenton,newyork] + X_e[trenton,princeton] + X_e[trenton,philadelphia] + X_e[trenton,newbrunswick] ≤ 350.0
 cSupply_E[newark] : X_e[newark,newyork] + X_e[newark,princeton] + X_e[newark,philadelphia] + X_e[newark,newbrunswick] ≤ 650.0
 X_e[trenton,newyork] ≥ 0.0
 X_e[newark,newyork] ≥ 0.0
 X_e[trenton,princeton] ≥ 0.0
 X_e[newark,princeton] ≥ 0.0
 X_e[trenton,philadelphia] ≥ 0.0
 X_e[newark,philadelph

In [32]:
optimize!(transport_enlarged)

Coin0506I Presolve 6 (0) rows, 8 (0) columns and 16 (0) elements
Clp0006I 0  Obj 0 Primal inf 950 (4)
Clp0006I 5  Obj 90000
Clp0000I Optimal - objective value 90000
Clp0032I Optimal objective 90000 - 5 iterations time 0.002


**(e)** What is the new optimal solution? 

In [33]:
vCost_Enlarged = value(eCost_Enlarged) # a single expression, so no broadcasting is needed

90000.0

In [34]:
# Print a dataframe of where each market gets its supply, after we add in New Brunswick
results_enlarged = DataFrame(
    value.(X_e).data
)
rename!(results_enlarged, Symbol.(M_enlarged))
insert!(results_enlarged, 1, P, :Plant)

| Row | Plant   | newyork | princeton | philadelphia | newbrunswick |
|-----|---------|---------|-----------|--------------|--------------|
|     | String  | Float64 | Float64   | Float64      | Float64      |
| 1   | trenton | 0.0     | 75.0      | 275.0        | 0.0          |
| 2   | newark  | 325.0   | 225.0     | 0.0          | 50.0         |


**(f)** Interpret this result in the markdown cell below. Which facility or facilities supplies the demand in New Brunswick? Does this result make sense? Why?

- Introducing New Brunswick (NB) to our market increases the total cost faced by the system by 4500. This is just NB's demand (50) multiplied by the distance to supply it (1.0 from either plant) multiplied by the freight cost per unit of distance (90). 
    - This can be confirmed by checking $vCost\_Enlarged - vCost = 4500$ (see cell below)
- All of the demand added by New Brunswick is supplied by the Newark plant. 
    - The reason for this is:
        - New Brunswick is an equal distance from both plants, so the direct cost of shipping to it is the same from either one.
            - The choice of plant is therefore determined by where spare capacity remains once the rest of the system is supplied as cheaply as possible. 
        - The Trenton plant is already at full capacity (as both Philadelphia and Princeton are significantly closer to Trenton than they are to Newark), while Newark has 100 spare units (650 - 550), so NB's demand is supplied completely by Newark.
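
The arithmetic behind this interpretation can be sketched as follows, using the shipment quantities and costs reported above. The comparison with a Trenton-supplied New Brunswick is a hypothetical scenario worked out by hand, not a result from the solver. It assumes the cheapest way to free up Trenton capacity is to move 50 units of Princeton's supply to Newark.

```julia
freight_cost = 90
base_cost = 85500            # optimal cost before New Brunswick is added

# Direct cost of serving New Brunswick: demand * distance * freight cost.
added_cost = 50 * 1.0 * freight_cost

# Spare capacity at each plant in the original solution.
trenton_spare = 350 - (75 + 275)
newark_spare = 650 - (325 + 225)

# If Trenton supplied New Brunswick instead, 50 units of Princeton's supply
# would have to move from Trenton (0.5 away) to Newark (1.5 away).
displacement_cost = 50 * freight_cost * (1.5 - 0.5)

cost_via_newark = base_cost + added_cost
cost_via_trenton = base_cost + added_cost + displacement_cost
```

Supplying New Brunswick from Newark gives the reported total of 90000, whereas routing it through Trenton would cost 94500 under these assumptions, which is why the solver uses Newark's spare capacity.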

In [35]:
vCost_Enlarged - vCost == 4500

true