# Translation of specification

These are notes on how we translate a specification of an optimisation problem for MOLA into an abstract Pyomo model that one can subsequently make concrete.



# Specification

This is version 4 of the LP specification.

## Indices and sets

The index is a label that identifies an element in a set. For simplicity, we shall use the index to refer to the element that it indexes.

* $af \in AF$ is an index for an openLCA product flow imported from an openLCA database. This includes new flows that are defined by the user for the optimisation problem.
* $f\in F$ is an index for a user-defined flow.
* $f_m \in F_m \subset F$ is an index for a user-defined material flow (e.g. energy, material) to be considered in the optimisation problem.
* $f_s \in F_s \subset F$ is an index for a user-defined service flow (e.g. energy storage, transport) to be considered in the optimisation problem.
* $f_{t} \in F_{t} \subset F$ is an index for a transport service flow, i.e. a transport mode such as road, train freight or air.
* $l\in L$ is an index for a location defined by latitude and longitude.
* $e \in E$ is an elementary flow index for elementary flows imported from an openLCA database. A system process in the openLCA database is by definition broken down into a set of these elementary flows.
* $ap \in AP$ is an index for a process in the set of all processes contained in an openLCA database. This includes user-defined processes specifically designed for the optimisation tool.
* $p\in P\subset AP$ is a process index for processes that make up the optimisation problem.
* $t \in T$ is an index for a time interval in the time discretisation $\{t_1, t_2, t_3, t_4 \ldots t_n\}$.
* $k \in K$ is a task index in the set of all task indices $K$.
* $d \in D$ is an index for a demand in the set of demand indices $D$.
* $akpi \in AKPI$ is an index for a key performance indicator in the set $AKPI$ of all key performance indicators in an openLCA database. This includes key performance indicators defined by the user of the optimisation tool, which must be added to the openLCA database.
* $kpi \in KPI\subset AKPI$ is an index that identifies those performance indicators that the user wishes to use in the optimisation problem.

## Parameters

### User-defined

* $C_{f_m, k, d, t}$ Conversion factor relating material flow $f_m$ to a unit of demand product/service $d$ at task $k$ and time $t$. If not defined, the default value is 0.

* $D_{d,k,t}$ Demand for final product/service $d$ for task $k$ at time $t$; If not defined, default value is 0.

* $D_{d,k}^{total}$ Total demand for final product/service over the whole optimisation time period over multiple tasks.

* $L_{f_m, f_s}$ Binary factor to link service flows to material flows, e.g. energy storage, material storage, transport. If not defined, the default value is 0.

* $X_{k, t}$ Longitude of the location to which the material flow $f_m$ is transported in task $k$ at time $t$.

* $Y_{k, t}$ Latitude of the location to which the material flow $f_m$ is transported in task $k$ at time $t$.

* $d_{p,f_m, k, t}$ Total travel distance between process $p$ (where material flow $f_m$ is produced) and task $k$ (where material flow $f_m$ is transported to) at time $t$:

$$
d_{p,f_m,k,t}=M(X_{k,t},Y_{k,t},X_{p, f_m}^I,Y^I_{p, f_m})
$$

where $M$ is a function that measures this distance e.g. Haversine - see http://www.movable-type.co.uk/scripts/latlong.html


* $\phi_{f, p, t}$ Cost coefficient for material, service and transport flows $f$ produced by $p$ at time $t$.
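
As a minimal sketch of one possible choice for the distance function $M$, the haversine formula gives the great-circle distance between two longitude/latitude points. The function name `haversine_km`, the argument order and the mean Earth radius of 6371 km are assumptions made for illustration; they are not part of the specification.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical task location (X_k, Y_k) and process location (X^I, Y^I):
# roughly London and Paris.
d = haversine_km(-0.1278, 51.5074, 2.3522, 48.8566)
```

The argument order (longitude first) mirrors $M(X, Y, X^I, Y^I)$ above, where $X$ is longitude and $Y$ is latitude.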


### Imported from openLCA and model initialisation

* $Ef_{akpi, e}$ Environmental impact characterisation factor for elementary flow $e$ and performance indicator $akpi$.

* $EF_{e, f, p}$ Elementary flow $e$ to link with product flow $f$ through process $p$.

* $EI_{akpi, f, ap}$ Calculated environmental impact for product flow $f$ through process $ap$ and performance indicator $akpi$.

* $X^I_{p,f_m}$ Longitude from the location table given in a process for material flow $f_m$.

* $Y^I_{p,f_m}$ Latitude from the location table given in the process for material flow $f_m$.

## Continuous variables


* $Obj_{kpi}$ Objective functions for the user-defined KPIs.
* $Flow_{f_m, p,k, t}$ is the flow of material $f_m$ produced by process $p$ to task $k$ at time $t$.
* $S_{f_s, k, t}$ is the temporary storage of service input flow $f_s$ at task $k$ and time $t$. **Perhaps call it $Service$ like $Flow$.**
* $SF_{f_s, p, k, t}$ Specific service flow (e.g. storage) through process $p$ in task $k$ at time $t$.
* $f_{f_m, p, f_t, k, t}$ Specific material and transport flow which is total quantity of materials $f_m$ produced in process $p$ transported through transport mode $f_t$ at task $k$ and time interval $t$ (unit: kg).
* $T_{f_t,k,t}$ Transport flow which is quantity times distance of all materials transported through the transport mode $f_t$ at task $k$ and time interval $t$ (unit: kg km).
* $t_{f_t,p,k,t}$ Specific transport flow which is quantity times distance of all materials transported through the transport mode $f_t$ and by process $p$ at task $k$ and time interval $t$ (unit: kg km).

## Objective function

Our objective is to minimise the environmental impact of elementary flows and the economic cost derived from a network of processes.

Consequently, for a fixed impact category $kpi$, the objectives are

$$
\min_D Obj_{kpi}
$$

and

$$
\min_D Obj_{cost}
$$

where the decision variables are defined by the set 

$$
D=\cup_{F,P,K,T}\{SF_{f_s, p, k, t}, f_{f_m,p,f_t,k,t}, t_{f_t, p, k, t}\}
$$

and represent the specific service flows, the specific material flows using a mode of transport, and the specific transport flows.

The environmental impact is the sum of the environmental impacts arising from material, service and transport flows:

$$
Obj_{kpi} = \sum_{f_m, p_m, k, t} Flow_{f_m, p_m, k, t}EI_{kpi, f_m, p_m} + 
\sum_{f_s, p_s, k ,t} SF_{f_s, p_s, k, t}EI_{kpi, f_s, p_s} +
\sum_{f_t, p_t, k, t}t_{f_t, p_t, k, t}EI_{kpi,f_t,p_t}.
$$

The economic impact is the sum of the economic impacts arising from material, service and transport flows:

$$
Obj_{cost} = \sum_{f_m, p_m, k, t} Flow_{f_m, p_m, k, t}\phi_{f_m, p_m, t} +
\sum_{f_s, p_s, k, t} SF_{f_s, p_s, k, t}\phi_{f_s, p_s, t} +
\sum_{f_t, p_t, k, t}t_{f_t, p_t, k, t}\phi_{f_t, p_t, t}
$$

Here the environmental impact of flow $f\in F$ measured by impact factor $kpi$ is 

$$
EI_{kpi, f, p} = \sum_e Ef_{kpi, e}EF_{e, f, p}
$$

where the flow $f$ is the product flow for the process $p\in P$. Here $Ef_{kpi, e}$ denotes the impact characterisation factor indexed by impact category $kpi$ and elementary flow $e$, and $EF_{e, f, p}$ is the quantity of elementary flow $e$ generated by the product flow $f$ through process $p$. If $f\in F_m\cup F_s\cup F_t$ then the breakdown of the flow into elementary flow amounts $EF_{e, f, p}$ must be calculated in openLCA by constructing a *system process*, which is then imported into the optimisation tool. Otherwise the flow is a product flow from an existing system process in openLCA, so a breakdown already exists.
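
The sum defining $EI_{kpi, f, p}$ can be illustrated with a small worked example. All identifiers and numbers here (`kpi1`, `e1`, `e2`, `f1`, `p1` and the factor values) are made up for illustration and do not come from any openLCA database.

```python
# Hypothetical characterisation factors Ef[kpi, e]
Ef = {('kpi1', 'e1'): 1.0, ('kpi1', 'e2'): 28.0}

# Hypothetical elementary flow amounts EF[e, f, p] per unit of product flow f
EF = {('e1', 'f1', 'p1'): 2.0, ('e2', 'f1', 'p1'): 0.5}

E = ['e1', 'e2']  # elementary flows


def environmental_impact(kpi, f, p):
    """EI[kpi, f, p] = sum over e of Ef[kpi, e] * EF[e, f, p]."""
    return sum(Ef[kpi, e] * EF[e, f, p] for e in E)


ei = environmental_impact('kpi1', 'f1', 'p1')  # 1.0 * 2.0 + 28.0 * 0.5
```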


## Constraints

The binary parameter $L_{f_m, f_s}$ determines the linkage between the storage service flow $S_{f_s, k,t}$ and the material storage flow $S_{f_m, k, t}$ at task $k$ and time $t$:

$$
S_{f_s,k,t} = \sum_{f_m} L_{f_m, f_s}S_{f_m, k, t},
$$

where $L_{f_m, f_s}$ is a binary parameter that links the service flow $f_s$ to the material flow $f_m$ in task $k$ at time $t$.

The service flow $S_{f_s, k,t}$ is the sum of the process specific service flows $SF_{f_m, p, k, t}$:

$$
S_{f_s,k,t} = \sum_{f_m, p_m} SF_{f_m, p_m, k, t},\quad \text{needs an $f_s$ here?}
$$

For any material flow $f_m$ produced by $p$, the total quantity $Flow_{f_m,p,k,t}$ is dependent on the quantity of $f_m$ transported through transport mode $f_t$ at task $k$ and time interval $t$, which is denoted by $f_{f_m,p_m,f_t,k,t}$. 

$$
Flow_{f_m,p_m,k,t}=\sum_{f_t} f_{f_m,p_m,f_t,k,t}
$$

The transport flow $T_{f_t,k,t}$ is defined by the quantity of $f_m$ transported via transport mode $f_t$ at task $k$ and time interval $t$ and the transport distance for shipping $f_m$ from the initial production location $(X^I_{p,f_m},Y^I_{p,f_m})$ to the final task location $(X_{k,t}, Y_{k,t})$:

$$
T_{f_t, k, t} = \sum_{f_m, p_m} f_{f_m, p_m, f_t, k, t}d_{p, f_m, k, t}
$$

The transport flow is also the sum of the specific transport flows denoted by $t_{f_t, p_t, k, t}$:

$$
T_{f_t, k, t} = \sum_{p_t} t_{f_t, p_t, k, t}
$$
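
To make the units concrete, here is a small worked example of the first transport relation, $T_{f_t,k,t}=\sum_{f_m,p_m} f_{f_m,p_m,f_t,k,t}\,d_{p,f_m,k,t}$. The flows, processes, distances and quantities are hypothetical values chosen only to illustrate the arithmetic.

```python
# Hypothetical specific material/transport flows f[fm, p, ft, k, t] in kg
f = {
    ('f1', 'p1', 'road', 'k1', 't1'): 100.0,
    ('f1', 'p2', 'road', 'k1', 't1'): 50.0,
}

# Hypothetical travel distances d[p, fm, k, t] in km
d = {
    ('p1', 'f1', 'k1', 't1'): 20.0,
    ('p2', 'f1', 'k1', 't1'): 80.0,
}


def transport_flow(ft, k, t):
    """Sum quantity times distance over all materials and processes (kg km)."""
    return sum(
        quantity * d[p, fm, k, t]
        for (fm, p, mode, task, interval), quantity in f.items()
        if (mode, task, interval) == (ft, k, t)
    )


T_road = transport_flow('road', 'k1', 't1')  # 100 * 20 + 50 * 80 kg km
```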

The sum of material flows over each production process and the conversion of temporary storage to material flow must satisfy demand:

$$
\sum_{f_m, p} (Flow_{f_m, p, k, t} - S_{f_m, k, t} + S_{f_m, k, t-1})C_{f_m, k, d, t} \geq D_{d,k,t}
$$
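
The role of the storage terms can be seen in a small feasibility check. This sketch assumes a single material flow, a single process, a single task and demand, a conversion factor of 1, and an initial storage level at a hypothetical period `t0`; the specification does not say how storage before the first interval is defined.

```python
T = ['t1', 't2']                              # time intervals
previous = {'t1': 't0', 't2': 't1'}           # assumed predecessor of each interval
flow = {'t1': 10.0, 't2': 3.0}                # Flow at each interval
storage = {'t0': 0.0, 't1': 4.0, 't2': 0.0}   # S at each interval (t0 is assumed)
C = 1.0                                       # conversion factor
demand = {'t1': 5.0, 't2': 7.0}               # D at each interval


def supplied(t):
    """(Flow_t - S_t + S_{t-1}) * C: production less what is stored, plus what is released."""
    return (flow[t] - storage[t] + storage[previous[t]]) * C


satisfied = {t: supplied(t) >= demand[t] for t in T}
```

In the second interval production alone (3 units) would not meet demand (7 units); the constraint holds only because 4 units stored in the first interval are released.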

The total material flow minus final storage must satisfy the total demand over the time horizon so

$$
\sum_{f_m,p,t} Flow_{f_m, p, k, t}C_{f_m, k, d, t}  \geq D_{d,k}^{total}.
$$

Finally, we require

$$
SF_{f_s, p, k, t} \geq 0,
$$

$$
f_{f_m, p, f_t, k, t} \geq 0,
$$

$$
t_{f_t, p, k, t} \geq 0.
$$


# Abstract Pyomo Model

The specification is translated into an abstract Pyomo model.

In [1]:
from importlib import reload
import pandas as pd
from pyomo.environ import *
abstract_model = AbstractModel()

## Indices and sets

There is no instantiation of objects in the abstract model, just placeholders for data. Some of the data must be supplied by the user via a GUI or programmatically and some must come from a data source.

### User-defined

The user needs to specify these sets.

In [2]:
abstract_model.F_m = Set(doc='Material flows to optimise')
abstract_model.F_s = Set(doc='Service flows to optimise')
abstract_model.F_t = Set(doc='Transport flows to optimise')
abstract_model.F = abstract_model.F_m | abstract_model.F_t | abstract_model.F_s
abstract_model.P = Set(doc='Processes in the optimisation problem')
abstract_model.T = Set(doc='Time intervals')
abstract_model.K = Set(doc='Tasks')
abstract_model.D = Set(doc='Demands')
abstract_model.KPI = Set(doc='Performance indicators for optimisation problem')

A DataPortal can be used to load the user configuration from a JSON file, which lets the configuration persist between sessions. There is a method in the Specification class that constructs an example user data set.

### OpenLCA data

These sets are populated by reference ids from an openLCA database.

In [3]:
abstract_model.AF = Set(doc='All flows in openLCA database')
abstract_model.E = Set(doc='Elementary Flows in OpenLCA database')
abstract_model.AP = Set(doc='All processes in openLCA database')
abstract_model.AKPI = Set(doc='All key performance indicators in an openLCA database')

To generate a new model instance the user first needs to specify an openLCA database. This will populate these sets so that the user can make choices.

We show how a pyomo DataPortal can be used to populate the sets from an example database.

In [4]:
olca_dp = DataPortal()
db_file = '/mnt/disk1/data/openlca/sqlite/system/CSV_juice_ecoinvent_36_apos_lci_20200206_20201029-102818.sqlite'
olca_dp.load(filename=db_file, using='sqlite3', query="SELECT REF_ID FROM TBL_FLOWS", set=abstract_model.AF)
olca_dp.load(filename=db_file, using='sqlite3', query="SELECT REF_ID FROM TBL_FLOWS WHERE FLOW_TYPE='ELEMENTARY_FLOW'", set=abstract_model.E)
olca_dp.load(filename=db_file, using='sqlite3', query="SELECT REF_ID FROM TBL_PROCESSES", set=abstract_model.AP)
olca_dp.load(filename=db_file, using='sqlite3', query="SELECT REF_ID FROM TBL_IMPACT_CATEGORIES", set=abstract_model.AKPI)
model_instance = abstract_model.create_instance(olca_dp)

We shall also need flow/process names and categories and other data which is not part of the abstract model. This supplementary data is obtained using a query builder. For example, the following function builds a lookup table for mapping reference ids to names. The table for processes is shown and it contains the location of the process.

In [5]:
import mola.dataimport as di
import mola.dataview as dv
dbconn = di.get_sqlite_connection()
lookup = dv.get_lookup_tables(dbconn)
lookup['processes']

SELECT "REF_ID","NAME" FROM "TBL_FLOWS"
SELECT "REF_ID","NAME" FROM "TBL_CATEGORIES"
SELECT "TBL_PROCESSES"."REF_ID" "REF_ID","TBL_PROCESSES"."NAME" "PROCESS_NAME","TBL_LOCATIONS"."NAME" "LOCATION_NAME" FROM "TBL_PROCESSES" LEFT JOIN "TBL_LOCATIONS" ON CAST("TBL_PROCESSES"."F_LOCATION" AS INT)="TBL_LOCATIONS"."ID"
SELECT "REF_ID","NAME" FROM "TBL_FLOWS" WHERE "FLOW_TYPE"='PRODUCT_FLOW'
SELECT "TBL_IMPACT_METHODS"."NAME" "method_NAME","TBL_IMPACT_CATEGORIES"."REF_ID" "category_REF_ID","TBL_IMPACT_CATEGORIES"."NAME" "category_NAME" FROM "TBL_IMPACT_CATEGORIES" LEFT JOIN "TBL_IMPACT_METHODS" ON "TBL_IMPACT_CATEGORIES"."F_IMPACT_METHOD"="TBL_IMPACT_METHODS"."ID"


REF_ID,PROCESS_NAME,LOCATION_NAME
59e8d600-0acc-465d-8e6f-e092f03b1e52,market for waste polyethylene terephthalate | ...,Lithuania
956ebeef-370e-34bb-ac83-11a10bba21d1,"ethylvinylacetate production, foil | ethylviny...",Rest-of-World
90695e4b-05f9-3f41-b03d-93bef747469a,"Mannheim process | sodium sulfate, anhydrite |...",Europe
a1e42a9d-2e86-351b-a803-49a2392eb820,"Mannheim process | hydrochloric acid, without ...",Europe
c9291945-f81f-3790-9a65-888287c32128,aluminium oxide factory construction | alumini...,Europe
...,...,...
f0aa9cd8-5773-39a3-9d19-fa80dc06b9fd,"treatment of blast furnace gas, in power plant...","China, Yunnan (云南)"
0c784bc1-9789-44e9-ba08-3653c3fe1ae1,"tap water production, conventional treatment |...",Peru
0a003052-1eb1-3140-b7a4-2fc080c4d5e5,"electricity production, photovoltaic, 3kWp sla...",SERC Reliability Corporation
48b9f68d-cb46-4fdd-a20b-95b5df78dfdf,Process Z,


## Parameters

### User-defined

These parameters must be specified by the user. Unlike sets, the user needs to supply values or use functionality in the tool to populate these parameters.

In [6]:
abstract_model.C = Param(abstract_model.F_m, abstract_model.K, abstract_model.D, abstract_model.T,
                         doc='Conversion factor for material flows')
abstract_model.Demand = Param(abstract_model.D, abstract_model.K, abstract_model.T)
abstract_model.Total_Demand = Param(abstract_model.D, abstract_model.K)
abstract_model.L = Param(abstract_model.F_m, abstract_model.F_s)  # indexed as L_{f_m, f_s} in the specification
abstract_model.X = Param(abstract_model.K, abstract_model.T)
abstract_model.Y = Param(abstract_model.K, abstract_model.T)
abstract_model.d = Param(abstract_model.P, abstract_model.F_m, abstract_model.K, abstract_model.T)
abstract_model.phi = Param(abstract_model.F, abstract_model.P, abstract_model.T)

### OpenLCA data

We cannot just load these parameters when the database is specified because they depend on user input. They are completed in the model build phase after the user supplies the relevant sets.

In [7]:
abstract_model.Ef = Param(abstract_model.KPI, abstract_model.E)
abstract_model.EF = Param(abstract_model.E, abstract_model.F, abstract_model.P)
def ei_rule(model, kpi, f, p):
    return sum(model.Ef[kpi, e]*model.EF[e, f, p] for e in model.E)
abstract_model.EI = Param(abstract_model.KPI, abstract_model.F, abstract_model.P, rule=ei_rule)
abstract_model.XI = Param(abstract_model.P, abstract_model.F_m)
abstract_model.YI = Param(abstract_model.P, abstract_model.F_m)

## Variables

These are defined in the abstract model to correspond to the specification, but we are likely to use linked sets to decrease the amount of redundancy.

In [8]:
abstract_model.Flow = Var(abstract_model.F_m, abstract_model.P, abstract_model.K, abstract_model.T)
abstract_model.Storage = Var(abstract_model.F_s, abstract_model.K, abstract_model.T)
abstract_model.Service_Flow = Var(abstract_model.F_s, abstract_model.P, abstract_model.K, abstract_model.T)
abstract_model.Specific_Material_Transport_Flow = Var(abstract_model.F_m, abstract_model.P, abstract_model.F_t, 
                                             abstract_model.K, abstract_model.T)
abstract_model.Transport_Flow = Var(abstract_model.F_t, abstract_model.K, abstract_model.T)
abstract_model.Specific_Transport_Flow = Var(abstract_model.F_t, abstract_model.P, abstract_model.K, abstract_model.T)

## Objective

There are two types of objective in the abstract model. They are constructed at build time. The user needs to supply weights for each objective.

In [9]:
def environment_objective_rule(model, kpi):
    return sum(model.Flow[fm, p, k, t]*model.EI[kpi, fm, p]
               for fm in model.F_m for p in model.P for k in model.K for t in model.T) + \
            sum(model.Service_Flow[fs, p, k, t] * model.EI[kpi, fs, p]
                for fs in model.F_s for p in model.P for k in model.K for t in model.T) + \
            sum(model.Specific_Transport_Flow[ft, p, k, t] * model.EI[kpi, ft, p]
                for ft in model.F_t for p in model.P for k in model.K for t in model.T)

def cost_objective_rule(model):
    return sum(model.Flow[fm, p, k, t]*model.phi[fm, p, t]
               for fm in model.F_m for p in model.P for k in model.K for t in model.T) + \
            sum(model.Service_Flow[fs, p, k, t] * model.phi[fs, p, t]
                for fs in model.F_s for p in model.P for k in model.K for t in model.T) + \
            sum(model.Specific_Transport_Flow[ft, p, k, t] * model.phi[ft, p, t]
                for ft in model.F_t for p in model.P for k in model.K for t in model.T)

abstract_model.obj1 = Objective(abstract_model.KPI, rule=environment_objective_rule)
abstract_model.obj2 = Objective(rule=cost_objective_rule)
abstract_model.obj1.pprint()
abstract_model.obj2.pprint()

obj1 : Size=0, Index=KPI, Active=True
    Not constructed
obj2 : Size=0, Index=None, Active=True
    Not constructed


## Constraints

The constraints are determined at build time from information supplied by the user. One of the abstract constraints is shown below.

In [10]:
def flow_demand_rule(model, d, k):
    total_demand = sum(
        model.Flow[fm, pm, k, t] * model.C[fm, k, d, t] for fm in model.F_m for pm in model.P for t in model.T)
    return total_demand >= model.Total_Demand[d, k]
abstract_model.total_demand_constraint = Constraint(
    abstract_model.D, abstract_model.K, rule=flow_demand_rule)
abstract_model.total_demand_constraint.pprint()

total_demand_constraint : Size=0, Index=total_demand_constraint_index, Active=True
    Not constructed


# User interface

The mola specification module contains a class called ScheduleSpecification that contains the above abstract model. Indices to variables and parameters also define sets in pyomo.

In [11]:
import mola.specification4 as ms
spec = ms.ScheduleSpecification()
spec.abstract_model.pprint()

ModuleNotFoundError: No module named 'mola.specification'

The `mola.dataview` module contains a function to generate lookup tables that we can use to build a user interface. For example, we can populate a widget with all material flows and then ask a user to select a subset.

In [None]:
import mola.dataview as dv
import mola.dataimport as di
conn = di.get_sqlite_connection()
lookup = dv.get_lookup_tables(conn)

## Sets 

The `mola.widgets` module contains a function to generate the notebook widget.

In [None]:
import mola.widgets as mw
fw = mw.get_set(lookup['flows'])

If the user selects flows then they are assigned to $F_m$; otherwise defaults are used.

In [None]:
if len(fw.df) > 0:
    F_m = fw.df.index.to_list()
else:
    F_m = ['f1', 'f2']
F_m

We can then ask for the impact category. You can search on method and category.

In [None]:
ic = mw.get_set(lookup['impact_categories'])

Again, the $KPI$ set is defined or a default selected.

In [None]:
if len(ic.df) > 0:
    KPI = ic.df.index.to_list()
else:
    KPI = ['061b7db5-4f56-3368-bf50-9ff0fcc8dd1f']
KPI

## Dummy data

We need some dummy data to do a complete instantiation of the abstract model. The Specification class has a method to return a suitable set of data. The model parameters are dependent on the sets that the user defines so the parameters need to be consistent.

In practice, we want to persist the model so we store user data in a JSON file. The dummy data is shown below.

In [None]:
import json
user_data = spec.get_dummy_data({'F_m': F_m})
json_file = 'user_data.json' 
with open(json_file, 'w') as fp:
    json.dump(user_data, fp)
user_data

## Parameters

Given the dummy set data, we can ask for parameter values from the user. For example, we can request `Total_Demand`, which depends on the sets $D$ and $K$.

In [None]:
import qgrid
param_dfr = spec.get_param_dfr('user_data.json')
unnested_param_dfr = ms.unnest(param_dfr, ['Index', 'Value'])
param_qg = qgrid.show_grid(unnested_param_dfr, grid_options={'maxVisibleRows': 10})
param_qg

In [None]:
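# Keep only the columns needed to rebuild the parameter records edited in the grid
dfr = param_qg.get_changed_df()[['Param', 'Index', 'Value']]

def to_record(row):
    """Convert an [index, value] pair into an index/value record."""
    return {'index': row[0], 'value': row[1]}

# Group the edited rows by parameter name and collect one record per row
updated_parameters_dict = dfr.groupby('Param')[['Index', 'Value']].apply(
    lambda g: list(map(to_record, g.values.tolist()))).to_dict()
updated_parameters_dict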

We need to write the updated parameters back to the JSON file.

In [None]:
with open(json_file) as fp:
    json_data = json.load(fp)
json_data.update(updated_parameters_dict)
with open(json_file, 'w') as fp:
    json.dump(json_data, fp)
json_data


# Build Phase

In this phase we add data to the abstract model to generate a concrete model instance. The `populate` method in the Specification class uses a database and a JSON file and returns the concrete model.

In [None]:
db_file = '/mnt/disk1/data/openlca/sqlite/system/CSV_juice_ecoinvent_36_apos_lci_20200206_20201029-102818.sqlite'
model_instance = spec.populate(db_file, json_file)
model_instance.pprint()

## Sets

We can examine the contents of the populated sets in the model instance.

In [None]:
sets_dfr = pd.DataFrame(
    ([v.name, v.doc, len(v)] for v in model_instance.component_objects(Set, active=True)),
    columns=['Set', 'Description', 'Number of elements']
)
sets_dfr

## Parameters

The built model parameters are shown below. These reflect the dummy data set that we are using as well as any user configuration.

In [None]:
import qgrid
param_dfr = pd.DataFrame(
    ([o.name, o.doc, len(o), [index for index in o], [value(o[index]) for index in o]] for o in model_instance.component_objects(Param, active=True)),
    columns=['Param', 'Description', 'Number of elements', 'Dimension', 'Value']
)
qgrid.show_grid(param_dfr)

## Constraints

We can also see the concrete constraints.

In [None]:
import qgrid
dfr = pd.DataFrame(
    ([v.name, v.expr] for v in model_instance.component_data_objects(Constraint, active=True)),
    columns=['Constraint', 'Expression']
)
qgrid.show_grid(dfr)

## Objectives

We can also see the concrete objectives. These will need to be either activated individually or combined in a weighted sum before solution, because Pyomo only supports a single active objective function.

In [None]:
dfr = pd.DataFrame(
    ([v.name, v.expr] for v in model_instance.component_data_objects(Objective, active=True)),
    columns=['Objective', 'Expression']
)

qgrid.show_grid(dfr)

We activate the first objective.

In [None]:
model_instance.obj2.deactivate()
model_instance.obj1.activate()

# Apply Solver

In [None]:
opt = SolverFactory("glpk")
results = opt.solve(model_instance)
results.write()