## Module 4a: Parameter Estimation Using Flash Unit Model

In this module, we use Pyomo's `parmest` tool in conjunction with IDAES models for parameter estimation. We demonstrate these tools by estimating the parameters of the NRTL property model for a benzene-toluene mixture. The NRTL model has two sets of parameters: the non-randomness parameter (`alpha_ij`) and the binary interaction parameter (`tau_ij`), where `i` and `j` are the pure component species. In this example, we estimate only the binary interaction parameter (`tau_ij`) for a given dataset. When estimating parameters associated with a property package, IDAES lets you run the estimation either with just the state block or with a unit model that uses the specified property package. This module demonstrates parameter estimation using the flash unit model with the NRTL property package.
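
For orientation, the textbook binary form of the NRTL activity-coefficient model is sketched below. The symbols $x_i$ (liquid mole fraction), $\gamma_i$ (activity coefficient), and $G_{ij}$ are introduced here for illustration and are not names from this module. The IDAES implementation may order the indices differently, so treat this as a sketch rather than the package's exact formulation.

$$
\begin{aligned}
G_{12} &= \exp(-\alpha_{12}\,\tau_{12}), \qquad G_{21} = \exp(-\alpha_{21}\,\tau_{21}), \\
\ln\gamma_1 &= x_2^2\left[\tau_{21}\left(\frac{G_{21}}{x_1 + x_2 G_{21}}\right)^2 + \frac{\tau_{12}\,G_{12}}{(x_2 + x_1 G_{12})^2}\right], \\
\ln\gamma_2 &= x_1^2\left[\tau_{12}\left(\frac{G_{12}}{x_2 + x_1 G_{12}}\right)^2 + \frac{\tau_{21}\,G_{21}}{(x_1 + x_2 G_{21})^2}\right].
\end{aligned}
$$

In this form, setting both $\tau_{12}$ and $\tau_{21}$ to zero gives $\gamma_1 = \gamma_2 = 1$, which is the ideal-liquid limit.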

We will complete the following tasks:
* Set up a method to return an initialized model
* Set up the parameter estimation problem using `parmest`
* Analyze the results
* Demonstrate advanced features from `parmest`

## Key links to documentation:
* NRTL Model - https://idaes-pse.readthedocs.io/en/latest/model_libraries/core_library/property_models/activity_coefficient.html
* parmest - https://pyomo.readthedocs.io/en/stable/contributed_packages/parmest/index.html



<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Import `ConcreteModel` from Pyomo, and `FlowsheetBlock` and `Flash` from IDAES.
</div>

In [1]:
# Todo: import ConcreteModel from pyomo.environ
from pyomo.environ import ConcreteModel, value

# Todo: import FlowsheetBlock from idaes.core
from idaes.core import FlowsheetBlock

# Todo: import Flash unit model from idaes.generic_models.unit_models
from idaes.generic_models.unit_models import Flash


In the next cell, we import the parameter block used in this module and the IDAES logger.

In [2]:
from idaes.generic_models.properties.activity_coeff_models.\
    BTX_activity_coeff_VLE import BTXParameterBlock
import idaes.logger as idaeslog

In the next cell, we import `parmest` from Pyomo and the `pandas` package. We need `pandas` because `parmest` uses `pandas.DataFrame` for handling the input data and the results.

In [3]:
import pyomo.contrib.parmest.parmest as parmest
import pandas as pd

## Setting up an initialized model

We need to provide a method that returns an initialized model to the `parmest` tool in Pyomo.

<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Using what you have learned from previous modules, fill in the missing code below to return an initialized IDAES model. 
</div>

In [4]:
def NRTL_model(data):
    
    #Todo: Create a ConcreteModel object
    m = ConcreteModel()
    
    #Todo: Create FlowsheetBlock object
    m.fs = FlowsheetBlock(default={"dynamic": False})
    

    #Todo: Create a properties parameter object with the following options:
    # "valid_phase": ('Liq', 'Vap')
    # "activity_coeff_model": 'NRTL'
    m.fs.properties = BTXParameterBlock(default={"valid_phase":
                                                 ('Liq', 'Vap'),
                                                 "activity_coeff_model":
                                                 'NRTL'})
    m.fs.flash = Flash(default={"property_package": m.fs.properties})

    # Initialize at a certain inlet condition
    m.fs.flash.inlet.flow_mol.fix(1)
    m.fs.flash.inlet.temperature.fix(368)
    m.fs.flash.inlet.pressure.fix(101325)
    m.fs.flash.inlet.mole_frac_comp[0, "benzene"].fix(0.5)
    m.fs.flash.inlet.mole_frac_comp[0, "toluene"].fix(0.5)

    # Set Flash unit specifications
    m.fs.flash.heat_duty.fix(0)
    m.fs.flash.deltaP.fix(0)

    # Fix NRTL specific variables
    # alpha values (off-diagonal entries set at 0.3, diagonal entries at 0)
    m.fs.properties.\
        alpha["benzene", "benzene"].fix(0)
    m.fs.properties.\
        alpha["benzene", "toluene"].fix(0.3)
    m.fs.properties.\
        alpha["toluene", "toluene"].fix(0)
    m.fs.properties.\
        alpha["toluene", "benzene"].fix(0.3)

    # initial tau values
    m.fs.properties.\
        tau["benzene", "benzene"].fix(0)
    m.fs.properties.\
        tau["benzene", "toluene"].fix(0.1690)
    m.fs.properties.\
        tau["toluene", "toluene"].fix(0)
    m.fs.properties.\
        tau["toluene", "benzene"].fix(-0.1559)

    # Initialize the flash unit
    m.fs.flash.initialize(outlvl=idaeslog.INFO_LOW)

    # Fix at actual temperature
    m.fs.flash.inlet.temperature.fix(float(data["temperature"]))

    # Set bounds on variables to be estimated
    m.fs.properties.\
        tau["benzene", "toluene"].setlb(-5)
    m.fs.properties.\
        tau["benzene", "toluene"].setub(5)

    m.fs.properties.\
        tau["toluene", "benzene"].setlb(-5)
    m.fs.properties.\
        tau["toluene", "benzene"].setub(5)

    # Return initialized flash model
    return m


In [5]:
from idaes.core.util.model_statistics import degrees_of_freedom
import pytest

# Testing the initialized model
test_data = {"temperature": 368}

m = NRTL_model(test_data)

# Check that degrees of freedom is 0
assert degrees_of_freedom(m) == 0

# Check for output values
assert value(m.fs.flash.liq_outlet.mole_frac_comp[0, 'benzene']) == pytest.approx(0.4105, abs=1e-3)
assert value(m.fs.flash.vap_outlet.mole_frac_comp[0, 'benzene']) == pytest.approx(0.6326, abs=1e-3)

assert value(m.fs.flash.liq_outlet.mole_frac_comp[0, 'toluene']) == pytest.approx(0.5895, abs=1e-3)
assert value(m.fs.flash.vap_outlet.mole_frac_comp[0, 'toluene']) == pytest.approx(0.3673, abs=1e-3)
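
As an additional sanity check on the values asserted above, the vapor fraction implied by a benzene component balance (the lever rule) can be computed by hand. This is a minimal sketch, assuming a single feed at steady state and using the rounded mole fractions from the assertions rather than values read from the model.

```python
import math

# Values taken from the assertions above (feed at 368 K and 101325 Pa)
z_benzene = 0.5      # feed mole fraction of benzene
x_benzene = 0.4105   # liquid outlet mole fraction of benzene
y_benzene = 0.6326   # vapor outlet mole fraction of benzene

# Benzene balance with F = L + V:
#   F*z = L*x + V*y  ->  V/F = (z - x) / (y - x)
vapor_fraction = (z_benzene - x_benzene) / (y_benzene - x_benzene)
print(f"vapor fraction V/F = {vapor_fraction:.3f}")
```

A value between 0 and 1 confirms that the feed composition lies between the liquid and vapor compositions, as expected for a two-phase flash.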

## Parameter estimation using parmest

In addition to providing a method to return an initialized model, the `parmest` tool needs the following:

* List of variable names to be estimated
* Dataset with multiple scenarios
* Expression to compute the sum of squared errors



In this example, we only estimate the binary interaction parameter (`tau_ij`). Given that this variable is usually indexed as `tau_ij = Var(component_list, component_list)`, there are 2*2=4 degrees of freedom. However, when i=j, the binary interaction parameter is 0. Therefore, in this problem, we estimate the binary interaction parameter for the following variables only:

* fs.properties.tau['benzene', 'toluene']
* fs.properties.tau['toluene', 'benzene']
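
The off-diagonal names listed above can also be generated programmatically, which helps when a mixture has more than two components. This is a minimal sketch using only the standard library; the `components` list is an assumption here and would normally mirror the component list of the property package.

```python
from itertools import permutations

# Assumed component list for this example
components = ["benzene", "toluene"]

# permutations(..., 2) yields every ordered pair (i, j) with i != j,
# which skips the diagonal entries where tau is zero.
tau_names = [
    f"fs.properties.tau['{i}', '{j}']"
    for i, j in permutations(components, 2)
]
print(tau_names)
```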

<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Create a list called `variable_name` with the above-mentioned variables declared as strings.
</div>

In [6]:
# Todo: Create a list of vars to estimate
variable_name = ["fs.properties.tau['benzene', 'toluene']",
                 "fs.properties.tau['toluene', 'benzene']"]


Pyomo's `parmest` tool supports the following data formats:
- pandas dataframe
- list of dictionaries
- list of json file names.

Please see the documentation for more details. 
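
To illustrate the list-of-dictionaries format, the sketch below turns CSV text into one dictionary per row using only the standard library. The column names match those used in this module, but the single row is a placeholder rather than a record from `BT_NRTL_dataset.csv`, and how `parmest` hands each entry to the model callback is assumed rather than shown.

```python
import csv
import io

# Placeholder CSV text: real column names, illustrative values only
csv_text = "temperature,liq_benzene,vap_benzene\n368,0.4105,0.6326\n"

# Each row becomes one dictionary, with every field converted to float
records = []
for row in csv.DictReader(io.StringIO(csv_text)):
    records.append({key: float(val) for key, val in row.items()})
print(records)
```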

For this example, we load data from the csv file `BT_NRTL_dataset.csv`. The dataset consists of fifty data points which provide the mole fraction of benzene in the vapor and liquid phase as a function of temperature. 

In [7]:
# Load all data from csv
data = pd.read_csv('BT_NRTL_dataset.csv')

# Display the dataset
#display(data)

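# Split the data set into two data sets
# (.loc slices by label and includes both endpoints, so each subset has 25 rows)
data_subset_1 = data.loc[0:24]
#display(data_subset_1)

# reset_index() renumbers the rows from 0 and keeps the old labels in an "index" column
data_subset_2 = data.loc[25:49].reset_index()
#display(data_subset_2)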

type(data_subset_2)

pandas.core.frame.DataFrame

In [8]:
from idaes.dmf import DMF
from idaes.dmf.resource import Resource
from idaes.dmf.resource import Triple, PR_DERIVED, create_relation_args
from idaes.dmf import magics
#_dmf = DMF(create=True, path=".")
_dmf = DMF(create=False, path="../workspace")



In [9]:
%dmf init ../workspace



*Success!* Using workspace at "../workspace"

In [10]:
ds = _dmf.find_one(name="BT NRTL dataset")

if not ds:
    ds = _dmf.new(file="BT_NRTL_dataset.csv", name="BT NRTL dataset")
else:
    _dmf.attach(ds)

ds1 = _dmf.find_one(name="BT NRTL split1")
if not ds1:
    data_subset_1.to_csv('BT_NRTL_dataset_split1.csv')
    ds1 = _dmf.new(file="BT_NRTL_dataset_split1.csv", name="BT NRTL split1")
    create_relation_args(ds1, PR_DERIVED, ds)
else:
    _dmf.attach(ds1) # attach to instance, for update()

ds2 = _dmf.find_one(name="BT NRTL split2")
if not ds2:
    data_subset_2.to_csv('BT_NRTL_dataset_split2.csv')
    ds2 = _dmf.new(file="BT_NRTL_dataset_split2.csv", name="BT NRTL split2")
    create_relation_args(ds2, PR_DERIVED, ds)
else:
    _dmf.attach(ds2)  # attach to instance, for update()

# save relations
_dmf.update()
print("done")

done


In [14]:
%dmf list

| ID | Name(s) | Type | Modified | Description | 
| -- | ------- | ---- | -------- | ----------- |
| 2a45ce6c4e5245aea1fff8bbcab4a371 | NRTL dataset | data | 1591557051.431507 | BT_NRTL_dataset.csv |
| b13aa0e477634252866b722ceb84056a | NRTL dataset split1 | data | 1591557051.445081 | BT_NRTL_dataset_split1.csv |
| 105ba24f35c9408096cf43c661da401f | NRTL dataset split2 | data | 1591557051.449112 | BT_NRTL_dataset_split2.csv |
| 9fed0506f18844029b51d05a1bb784aa | NRTL results split1 | other | 1591557383.286974 | Solution for data subset 1 |
| c9399169e43242d9bf78ceac0fb7ddc4 | NRTL results split2 | other | 1591557383.286974 | Solution for data subset 2 |
| 701c9c12ddea4eb8a7f43c42d857e7cd |  | data | 1591743197.994207 | BT_NRTL_dataset.csv |
| 0ce84def70fb4753ae6b982b12f23eb4 |  | data | 1591743670.717909 | BT_NRTL_dataset.csv |
| 166640351e1f47d18734135b1c86813a |  | data | 1591743670.736895 | BT_NRTL_dataset_split1.csv |
| 42e9daac3b1e49d4a7d5768532a2d172 |  | data | 1591744090.691507 | BT_NRTL_dataset.csv |
| 0e6ef05f9c34429fb9807c7e642f91b3 |  | data | 1591744090.703474 | BT_NRTL_dataset_split1.csv |
| 5548c5dcbded4340bab7982c02180f76 |  | data | 1591744090.713477 | BT_NRTL_dataset_split2.csv |
| 548acc344ce747b1b24f9a130623a12e |  | data | 1591744229.814279 | BT_NRTL_dataset.csv |
| d02877175c09484aba50b56f614e74dd |  | data | 1591744229.826991 | BT_NRTL_dataset_split1.csv |
| 664bb176ec0144a0bc8cf5500a9b2833 |  | data | 1591744229.83601 | BT_NRTL_dataset_split2.csv |
| 48b58ad6f6634635a1fef46e67ab59b5 | BT NRTL dataset | data | 1591744549.163546 | BT_NRTL_dataset.csv |
| 68e477317ab64c9492ec6382017a6fbb | BT NRTL split1 | data | 1591744549.174517 | BT_NRTL_dataset_split1.csv |
| c20cd17af49a43feb1572439698238f9 | BT NRTL split2 | data | 1591744549.185488 | BT_NRTL_dataset_split2.csv |

True

We need to provide a method to return an expression to compute the sum of squared errors that will be used as the objective in solving the parameter estimation problem. For this problem, the error will be computed for the mole fraction of benzene in the vapor and liquid phase between the model prediction and data. 

<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Complete the following cell by adding an expression to compute the sum of squared errors.
</div>

In [15]:
# Create method to return an expression that computes the sum of squared error
def SSE(m, data):
    # Todo: Add expression for computing the sum of squared errors in mole fraction of benzene in the liquid
    # and vapor phase. For example, the squared error for the vapor phase is:
    # (float(data["vap_benzene"]) - m.fs.flash.vap_outlet.mole_frac_comp[0, "benzene"])**2
    expr = ((float(data["vap_benzene"]) -
             m.fs.flash.vap_outlet.mole_frac_comp[0, "benzene"])**2 +
            (float(data["liq_benzene"]) -
             m.fs.flash.liq_outlet.mole_frac_comp[0, "benzene"])**2)
    return expr*1E4

<div class="alert alert-block alert-warning">
<b>Note:</b>
Note that the expression is scaled up by a factor of 10,000: because it is built from differences in mole fractions, the unscaled SSE would be a very small number. A well-scaled objective helps improve solve robustness when using IPOPT.
</div>
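
The effect of the scaling factor can be seen with plain numbers. This is a minimal sketch, independent of Pyomo; the helper `scaled_squared_error` and the measured and predicted values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def scaled_squared_error(measured, predicted, scale=1e4):
    """Sum of squared differences over the keys of `measured`, multiplied by `scale`."""
    return scale * sum((measured[k] - predicted[k]) ** 2 for k in measured)

# Hypothetical mole fractions, not taken from the dataset
measured = {"vap_benzene": 0.63, "liq_benzene": 0.41}
predicted = {"vap_benzene": 0.62, "liq_benzene": 0.43}

unscaled = scaled_squared_error(measured, predicted, scale=1.0)
scaled = scaled_squared_error(measured, predicted)
print(unscaled, scaled)
```

With residuals of 0.01 and 0.02, the unscaled value is on the order of 1e-4, while the scaled value is of order one.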


We are now ready to set up the parameter estimation problem. We create one parameter estimation object per data subset, `pest_data_subset_1` and `pest_data_subset_2`. As shown below, we pass the method that returns an initialized model, the dataset, the list of variable names to estimate, and the SSE expression to the `Estimator` object. Setting `tee=True` prints the solver output when the parameter estimation problem is solved.

In [16]:
# Initialize a parameter estimation object for data subset 1
pest_data_subset_1 = parmest.Estimator(NRTL_model, data_subset_1, variable_name, SSE, tee=True)

# Initialize a parameter estimation object for data subset 2
pest_data_subset_2 = parmest.Estimator(NRTL_model, data_subset_2, variable_name, SSE, tee=True)

# Run parameter estimation using data subset 1
obj_value_1, parameters_1 = pest_data_subset_1.theta_est()

# Run parameter estimation using data subset 2
obj_value_2, parameters_2 = pest_data_subset_2.theta_est()

Ipopt 3.13.2: max_iter=6000


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt

This version of Ipopt was compiled from source code available at
    https://github.com/IDAES/Ipopt as part of the Institute for the Design of
    Advanced Energy Systems Process Systems Engineering Framework (IDAES PSE
    Framework) Copyright (c) 2018-2019. See https://github.com/IDAES/idaes-pse.

This version of Ipopt was compiled using HSL, a collection of Fortran codes
    for large-scale scientific computation.  All technical papers, sales and
    publicity material resulting from use of the HSL codes within IPOPT must
    contain the following acknowledgement:
        HSL, a collection of Fortran codes for large-scale scientific
        computatio

You will notice that the resulting parameter estimation problem, when using the flash unit model, has 2952 variables and 2950 constraints. This is because unit models in IDAES use control volume blocks, which have two state blocks attached: one at the inlet and one at the outlet. Even though there are two state blocks, they share the same parameter block (`m.fs.properties` in our example), which is where the parameters to be estimated live.
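
Conceptually, the SSE expression is evaluated once per data row, and the parameters being estimated are shared across all rows. The sketch below shows that structure with a hypothetical one-parameter linear model standing in for the flash model. It uses a closed-form least-squares solution, whereas `parmest` solves a nonlinear problem with IPOPT, so this illustrates the idea and not the tool's algorithm.

```python
# Hypothetical one-parameter model y = theta * x, standing in for the flash model
def scenario_sse(theta, row):
    """Squared error for a single data row (one scenario)."""
    return (row["y"] - theta * row["x"]) ** 2

def total_sse(theta, rows):
    """Objective over all scenarios; every row shares the same theta."""
    return sum(scenario_sse(theta, row) for row in rows)

def estimate_theta(rows):
    """Closed-form least-squares minimizer of total_sse for this linear toy model."""
    return sum(r["x"] * r["y"] for r in rows) / sum(r["x"] ** 2 for r in rows)

# Made-up rows generated from theta = 2 with no noise
rows = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}, {"x": 3.0, "y": 6.0}]
theta_hat = estimate_theta(rows)
print(theta_hat, total_sse(theta_hat, rows))
```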

Let us display the results by running the next cell. 

In [17]:
print("----Using Data Subset 1----")
print()
print("The SSE at the optimal solution is %0.6f" % obj_value_1)
print()
print("The values for the parameters are as follows:")
for k,v in parameters_1.items():
    print(k, "=", v)

print()
print("----Using Data Subset 2----")
print()
print("The SSE at the optimal solution is %0.6f" % obj_value_2)
print()
print("The values for the parameters are as follows:")
for k,v in parameters_2.items():
    print(k, "=", v)

----Using Data Subset 1----

The SSE at the optimal solution is 0.001525

The values for the parameters are as follows:
fs.properties.tau[('benzene', 'toluene')] = 0.3217100103610996
fs.properties.tau[('toluene', 'benzene')] = -0.28307944072207775

----Using Data Subset 2----

The SSE at the optimal solution is 0.001784

The values for the parameters are as follows:
fs.properties.tau[('benzene', 'toluene')] = 0.07397102640934188
fs.properties.tau[('toluene', 'benzene')] = -0.0645984121830132


In [None]:
# Save to json

import json

# Create a dictionary to save the parameters
parameter_data_subset_1 = {}
parameter_data_subset_2 = {}

parameter_data_subset_1["tau['benzene', 'toluene']"] = \
    parameters_1["fs.properties.tau[('benzene', 'toluene')]"] 
parameter_data_subset_1["tau['toluene', 'benzene']"] = \
    parameters_1["fs.properties.tau[('toluene', 'benzene')]"]
parameter_data_subset_2["tau['benzene', 'toluene']"] = \
    parameters_2["fs.properties.tau[('benzene', 'toluene')]"] 
parameter_data_subset_2["tau['toluene', 'benzene']"] = \
    parameters_2["fs.properties.tau[('toluene', 'benzene')]"]

# Create json
with open("estimated_parameter.json", "w") as outfile:
    json.dump([{"data_subset_1":parameter_data_subset_1}, {"data_subset_2":parameter_data_subset_2}], outfile)
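
Reading the file back is the mirror image of the cell above. This is a minimal sketch that writes the same layout to a temporary directory, so it does not touch `estimated_parameter.json`, and then flattens the list of single-key dictionaries into one lookup table. The values are the printed estimates, rounded.

```python
import json
import os
import tempfile

# Same layout as the cell above, with the printed estimates rounded
saved = [
    {"data_subset_1": {"tau['benzene', 'toluene']": 0.32171,
                       "tau['toluene', 'benzene']": -0.28308}},
    {"data_subset_2": {"tau['benzene', 'toluene']": 0.07397,
                       "tau['toluene', 'benzene']": -0.06460}},
]

# Round-trip through a temporary file
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "estimated_parameter.json")
    with open(path, "w") as outfile:
        json.dump(saved, outfile)
    with open(path) as infile:
        loaded = json.load(infile)

# Flatten the list of single-key dictionaries into one lookup table
by_subset = {name: params for entry in loaded for name, params in entry.items()}
tau_bt_subset_1 = by_subset["data_subset_1"]["tau['benzene', 'toluene']"]
print(tau_bt_subset_1)
```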

In [19]:
# save to DMF
# create resources
ds_s1 = _dmf.new(desc="Solution for data subset 1", data={'SSE': obj_value_1, 'parameters': parameters_1})
ds_s2 = _dmf.new(desc="Solution for data subset 2", data={'SSE': obj_value_2, 'parameters': parameters_2})
# relate resources to inputs
create_relation_args(ds_s1, PR_DERIVED, ds1)
create_relation_args(ds_s2, PR_DERIVED, ds2)
# save
_dmf.update()

True

In [24]:
ds.id

'48b58ad6f6634635a1fef46e67ab59b5'

In [25]:
!dmf related -d in --no-unicode 48b58ad6f6634635a1fef46e67ab59b5

48b5 data BT NRTL dataset
    |
    +--<-[derived]- c20c data BT NRTL split2
    .  |
    .  +--<-[derived]- 5f17 other -
    |
    +--<-[derived]- 68e4 data BT NRTL split1
       |
       +--<-[derived]- ed46 other -
