## Module 4b: Parameter Estimation Using Flash Unit Model

In this module, we will use Pyomo's `parmest` tool in conjunction with IDAES models for parameter estimation. We will demonstrate these tools by estimating the parameters associated with the NRTL property model for a benzene-toluene mixture. The NRTL model has two sets of parameters: the non-randomness parameter (`alpha_ij`) and the binary interaction parameter (`tau_ij`), where `i` and `j` are the pure component species. In this example, we will estimate only the binary interaction parameter (`tau_ij`) for a given dataset. When estimating parameters associated with a property package, IDAES provides the flexibility of doing the parameter estimation using either the state block alone or a unit model with a specified property package. This module demonstrates parameter estimation using the flash unit model with the NRTL property package.
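To make the roles of `alpha_ij` and `tau_ij` concrete, here is a minimal sketch of the textbook binary NRTL activity coefficient expressions in plain Python. It is not the IDAES implementation: the function name `nrtl_binary_gamma` is hypothetical, a single symmetric `alpha` is assumed, and the mapping of component 1 to benzene and component 2 to toluene is an assumption made for illustration. The index convention inside the IDAES property package may differ.

```python
import math


def nrtl_binary_gamma(x1, tau12, tau21, alpha=0.3):
    """Return (gamma1, gamma2) from the textbook binary NRTL equations.

    Hypothetical helper for illustration; assumes alpha12 == alpha21 == alpha.
    """
    x2 = 1.0 - x1
    G12 = math.exp(-alpha * tau12)
    G21 = math.exp(-alpha * tau21)
    ln_gamma1 = x2**2 * (tau21 * (G21 / (x1 + x2 * G21))**2
                         + tau12 * G12 / (x2 + x1 * G12)**2)
    ln_gamma2 = x1**2 * (tau12 * (G12 / (x2 + x1 * G12))**2
                         + tau21 * G21 / (x1 + x2 * G21)**2)
    return math.exp(ln_gamma1), math.exp(ln_gamma2)


# Equimolar mixture with illustrative tau values
# (assumed mapping: component 1 = benzene, component 2 = toluene)
gamma_1, gamma_2 = nrtl_binary_gamma(0.5, 0.1690, -0.1559)
print(gamma_1, gamma_2)
```

When both `tau` values are zero the mixture is ideal and both activity coefficients equal 1, which is why the estimated `tau` values control how far the predicted vapor-liquid equilibrium departs from ideal behavior.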

We will complete the following tasks:
* Set up a method to return an initialized model
* Set up the parameter estimation problem
* Analyze the results
* Demonstrate advanced features from `parmest`

## Key links to documentation:
* NRTL Model - https://idaes-pse.readthedocs.io/en/latest/model_libraries/core_library/property_models/activity_coefficient.html
* parmest - https://pyomo.readthedocs.io/en/stable/contributed_packages/parmest/index.html



<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Import `ConcreteModel` from Pyomo, and `FlowsheetBlock` and `Flash` from IDAES.
</div>

In [1]:
# Todo: import ConcreteModel from pyomo.environ
from pyomo.environ import ConcreteModel, value

# Todo: import FlowsheetBlock from idaes.core
from idaes.core import FlowsheetBlock

# Todo: import Flash unit model from idaes.generic_models.unit_models
from idaes.generic_models.unit_models import Flash


In the next cell, we will import the parameter block used in this module, as well as the IDAES logger.

In [2]:
from idaes.generic_models.properties.activity_coeff_models.\
    BTX_activity_coeff_VLE import BTXParameterBlock
import idaes.logger as idaeslog

In the next cell, we will import `parmest` from pyomo and the `pandas` package. 

In [3]:
import pyomo.contrib.parmest.parmest as parmest
import pandas as pd

## Setting up an initialized model

<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Using what you have learned in the previous modules, fill in the missing code below so that the function returns an initialized IDAES model.
</div>

In [4]:
def NRTL_model(data):
    
    #Todo: Create a ConcreteModel object
    m = ConcreteModel()
    
    #Todo: Create FlowsheetBlock object
    m.fs = FlowsheetBlock(default={"dynamic": False})
    

    #Todo: Create a properties parameter object with the following options:
    # "valid_phase": ('Liq', 'Vap')
    # "activity_coeff_model": 'NRTL'
    m.fs.properties = BTXParameterBlock(default={"valid_phase":
                                                 ('Liq', 'Vap'),
                                                 "activity_coeff_model":
                                                 'NRTL'})
    m.fs.flash = Flash(default={"property_package": m.fs.properties})

    # Initialize at a certain inlet condition
    m.fs.flash.inlet.flow_mol.fix(1)
    m.fs.flash.inlet.temperature.fix(368)
    m.fs.flash.inlet.pressure.fix(101325)
    m.fs.flash.inlet.mole_frac_comp[0, "benzene"].fix(0.5)
    m.fs.flash.inlet.mole_frac_comp[0, "toluene"].fix(0.5)

    # Set Flash unit specifications
    m.fs.flash.heat_duty.fix(0)
    m.fs.flash.deltaP.fix(0)

    # Fix NRTL specific variables
    # alpha values (off-diagonal entries set at 0.3, diagonal entries at 0)
    m.fs.properties.\
        alpha["benzene", "benzene"].fix(0)
    m.fs.properties.\
        alpha["benzene", "toluene"].fix(0.3)
    m.fs.properties.\
        alpha["toluene", "toluene"].fix(0)
    m.fs.properties.\
        alpha["toluene", "benzene"].fix(0.3)

    # initial tau values
    m.fs.properties.\
        tau["benzene", "benzene"].fix(0)
    m.fs.properties.\
        tau["benzene", "toluene"].fix(0.1690)
    m.fs.properties.\
        tau["toluene", "toluene"].fix(0)
    m.fs.properties.\
        tau["toluene", "benzene"].fix(-0.1559)

    # Initialize the flash unit
    m.fs.flash.initialize(outlvl=idaeslog.INFO_LOW)

    # Fix at actual temperature
    m.fs.flash.inlet.temperature.fix(float(data["temperature"]))

    # Set bounds on variables to be estimated
    m.fs.properties.\
        tau["benzene", "toluene"].setlb(-5)
    m.fs.properties.\
        tau["benzene", "toluene"].setub(5)

    m.fs.properties.\
        tau["toluene", "benzene"].setlb(-5)
    m.fs.properties.\
        tau["toluene", "benzene"].setub(5)

    # Return initialized flash model
    return m


In [6]:
from idaes.core.util.model_statistics import degrees_of_freedom
import pytest

# Testing the initialized model
test_data = {"temperature": 368}

m = NRTL_model(test_data)

# Check that degrees of freedom is 0
assert degrees_of_freedom(m) == 0

# Check for output values
assert value(m.fs.flash.liq_outlet.mole_frac_comp[0, 'benzene']) == pytest.approx(0.4105, abs=1e-3)
assert value(m.fs.flash.vap_outlet.mole_frac_comp[0, 'benzene']) == pytest.approx(0.6326, abs=1e-3)

assert value(m.fs.flash.liq_outlet.mole_frac_comp[0, 'toluene']) == pytest.approx(0.5895, abs=1e-3)
assert value(m.fs.flash.vap_outlet.mole_frac_comp[0, 'toluene']) == pytest.approx(0.3673, abs=1e-3)

## Parameter estimation using parmest

The `parmest` tool needs the following:

* a method that returns an initialized model (defined before)
* list of variable names to be estimated
* dataset
* expression to compute the sum of squared errors



In this example, we will only be estimating the binary interaction parameter (`tau_ij`). Given that this variable is usually indexed as `tau_ij = Var(component_list, component_list)`, there are 2*2=4 degrees of freedom. However, when i=j, the binary interaction parameter is 0. Therefore, in this problem, we will be estimating the binary interaction parameter for the following variables only:

* fs.properties.tau['benzene', 'toluene']
* fs.properties.tau['toluene', 'benzene']

<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Create a list called `variable_name` with the above-mentioned variables declared as strings.
</div>

In [7]:
# Todo: Create a list of vars to estimate
variable_name = ["fs.properties.tau['benzene', 'toluene']",
                 "fs.properties.tau['toluene', 'benzene']"]
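For mixtures with more components, writing the off-diagonal names by hand becomes error-prone. The sketch below shows one way the same strings could be generated; the names `components` and `variable_name_generated` are introduced here for illustration only and are not part of the original exercise.

```python
from itertools import permutations

# Component names as used in the property package indices
components = ["benzene", "toluene"]

# permutations(..., 2) yields every ordered pair with i != j,
# which skips the diagonal entries where tau is fixed at 0
variable_name_generated = [
    f"fs.properties.tau['{i}', '{j}']" for i, j in permutations(components, 2)
]
print(variable_name_generated)
```

For the two-component case this reproduces the two entries listed above.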


We will load data from the CSV file `BT_NRTL_dataset.csv`. The dataset consists of fifty data points that provide the mole fraction of benzene in the vapor and liquid phases as a function of temperature. Pyomo's `parmest` tool supports the following data formats: pandas DataFrame, list of dictionaries, and list of JSON file names. Please see the documentation for more details.

In [8]:
# Load data from csv
data = pd.read_csv('BT_NRTL_dataset.csv')
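As an illustration of the list-of-dictionaries format mentioned above, the following sketch builds such a list using only the standard library. The column names match the keys used elsewhere in this module (`temperature`, `liq_benzene`, `vap_benzene`), but the three rows are hypothetical placeholder values, not entries from `BT_NRTL_dataset.csv`.

```python
import csv
import io

# Hypothetical rows for illustration only; not taken from the real dataset
csv_text = """temperature,liq_benzene,vap_benzene
365.0,0.48,0.70
368.0,0.41,0.63
371.0,0.33,0.55
"""

# One dictionary per experiment, with every field converted to float
records = [
    {key: float(val) for key, val in row.items()}
    for row in csv.DictReader(io.StringIO(csv_text))
]
print(records[0])
```

Each dictionary plays the same role as one row of the DataFrame: it is the `data` argument that the model-building function and the objective expression receive for a single experiment.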

We need to create an expression to compute the sum of squared errors, which will be used as the objective when solving the parameter estimation problem. For this problem, the error is the difference between the model prediction and the data for the mole fraction of benzene in the vapor and liquid phases.

<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Complete the following cell by adding an expression to compute the sum of squared errors.
</div>

In [9]:
# Create expression to compute the sum of squared error
def SSE(m, data):
    # Todo: Add expression for computing the sum of squared errors in mole fraction of benzene in the liquid
    # and vapor phase. 
    expr = ((float(data["vap_benzene"]) -
             m.fs.flash.vap_outlet.mole_frac_comp[0, "benzene"])**2 +
            (float(data["liq_benzene"]) -
             m.fs.flash.liq_outlet.mole_frac_comp[0, "benzene"])**2)
    return expr*1E4

<div class="alert alert-block alert-warning">
<b>Note:</b>
Notice that we have scaled the expression up by a factor of 10000. Because the expression is built from differences in mole fractions, the unscaled SSE is an extremely small number. A well-scaled objective improves solve robustness when using IPOPT.
</div>


We are now ready to set up the parameter estimation problem. We will create a parameter estimation object called `pest`. As shown below, we pass the method that returns an initialized model (`NRTL_model`), the data, `variable_name`, and the `SSE` expression to the `Estimator` class. Setting `tee=True` prints the solver output while solving the parameter estimation problem.

In [10]:
# Initialize a parameter estimation object
pest = parmest.Estimator(NRTL_model, data, variable_name, SSE, tee=True)

# Run parameter estimation using all data
obj_value, parameters = pest.theta_est()

Ipopt 3.12.13: max_iter=6000


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt

This version of Ipopt was compiled from source code available at
    https://github.com/IDAES/Ipopt as part of the Institute for the Design of
    Advanced Energy Systems Process Systems Engineering Framework (IDAES PSE
    Framework) Copyright (c) 2018-2019. See https://github.com/IDAES/idaes-pse.

This version of Ipopt was compiled using HSL, a collection of Fortran codes
    for large-scale scientific computation.  All technical papers, sales and
    publicity material resulting from use of the HSL codes within IPOPT must
    contain the following acknowledgement:
        HSL, a collection of Fortran codes for large-scale scientific
        computati

In [11]:
# Check for values of the parameter estimation problem
assert obj_value == pytest.approx(0.004663, 1e-3)
assert parameters["fs.properties.tau[('benzene', 'toluene')]"] == pytest.approx(0.47811, 1e-3) 
assert parameters["fs.properties.tau[('toluene', 'benzene')]"] == pytest.approx(-0.40924, 1e-3)

You will notice that the resulting parameter estimation problem has 1102 variables and 1100 constraints, which leaves two degrees of freedom: the two `tau` parameters being estimated. Let us display the results by running the next cell.

In [12]:
print("The SSE at the optimal solution is %0.6f" % obj_value)
print()
print("The values for the parameters are as follows:")
for k,v in parameters.items():
    print(k, "=", v)

The SSE at the optimal solution is 0.004663

The values for the parameters are as follows:
fs.properties.tau[('benzene', 'toluene')] = 0.47810867841011423
fs.properties.tau[('toluene', 'benzene')] = -0.40924465377594205


Using the data that was provided, we have estimated the binary interaction parameters in the NRTL model for a benzene-toluene mixture. Although the dataset that was provided was temperature dependent, in this example we have estimated a single value that fits best for all temperatures.

### Advanced options for parmest: bootstrapping



Pyomo's `parmest` tool allows for bootstrapping, where the parameter estimation is repeated over `n` samples drawn with replacement from the original dataset. In the following cell, we will run the parameter estimation with 10 bootstrap samples from the given dataset.
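The resampling idea behind bootstrapping can be sketched without any solver. The example below is not how `parmest` is implemented; it only illustrates drawing samples with replacement and repeating an estimate on each resample. The helper `bootstrap_estimates`, the `measurements` list, and the use of the mean as the "estimator" are all hypothetical stand-ins for the full parameter estimation problem.

```python
import random
import statistics


def bootstrap_estimates(samples, estimator, n_boot, seed=0):
    """Apply `estimator` to `n_boot` resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_boot):
        # Each resample has the same size as the original dataset
        resample = rng.choices(samples, k=len(samples))
        estimates.append(estimator(resample))
    return estimates


# Hypothetical measurements standing in for the experimental dataset
measurements = [0.41, 0.43, 0.40, 0.44, 0.42, 0.39, 0.45, 0.41]
boot_means = bootstrap_estimates(measurements, statistics.mean, n_boot=10)
print(min(boot_means), max(boot_means))
```

The spread of the resulting estimates gives a rough picture of how sensitive the estimate is to the particular data points that were collected, which is the information the bootstrap results for `tau` provide.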

In [13]:
# Run parameter estimation using bootstrap resamples of the data (10 samples)
bootstrap_theta = pest.theta_est_bootstrap(10)

Ipopt 3.12.13: max_iter=6000


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt

This version of Ipopt was compiled from source code available at
    https://github.com/IDAES/Ipopt as part of the Institute for the Design of
    Advanced Energy Systems Process Systems Engineering Framework (IDAES PSE
    Framework) Copyright (c) 2018-2019. See https://github.com/IDAES/idaes-pse.

This version of Ipopt was compiled using HSL, a collection of Fortran codes
    for large-scale scientific computation.  All technical papers, sales and
    publicity material resulting from use of the HSL codes within IPOPT must
    contain the following acknowledgement:
        HSL, a collection of Fortran codes for large-scale scientific
        computati

  23  5.3383183e-03 2.35e-05 3.87e-10  -5.7 3.06e-03    -  1.00e+00 1.00e+00h  1
  24  5.3383181e-03 1.89e-05 9.68e-10  -8.6 6.54e-03    -  1.00e+00 1.00e+00h  1
  25  5.3383181e-03 7.45e-09 2.51e-14  -8.6 4.51e-07    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 25

                                   (scaled)                 (unscaled)
Objective...............:   5.3383181023234881e-03    5.3383181023234881e-03
Dual infeasibility......:   2.5140943918865581e-14    2.5140943918865581e-14
Constraint violation....:   1.4104644499482428e-11    7.4505805969238281e-09
Complementarity.........:   2.5059035596800626e-09    2.5059035596800626e-09
Overall NLP error.......:   2.5059035596800626e-09    7.4505805969238281e-09


Number of objective function evaluations             = 69
Number of objective gradient evaluations             = 26
Number of equality constraint evaluations            = 69
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobia

   5  6.4434565e-03 3.21e-02 2.08e-04  -2.5 2.16e+00    -  1.00e+00 1.00e+00h  1
   6  6.2906783e-03 4.88e+01 6.28e-04  -3.8 1.07e+00    -  1.00e+00 1.00e+00h  1
   7  6.2841111e-03 1.40e-01 2.33e-04  -3.8 3.85e-02  -4.0 1.00e+00 1.00e+00h  1
   8  5.1825127e-03 2.90e+03 2.27e-02  -5.7 9.07e+00    -  9.00e-01 1.00e+00h  1
   9  4.9015236e-03 4.70e+02 6.09e-03  -5.7 7.85e+00    -  1.00e+00 1.00e+00h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  4.7012496e-03 3.76e-01 1.74e-04  -5.7 3.41e+00    -  1.00e+00 1.00e+00h  1
  11  4.6985811e-03 5.72e-02 1.69e-06  -5.7 2.80e-01    -  1.00e+00 1.00e+00h  1
  12  4.6985731e-03 1.27e-05 2.58e-10  -5.7 2.96e-03    -  1.00e+00 1.00e+00h  1
  13  4.6985729e-03 3.30e-05 1.32e-09  -8.6 7.84e-03    -  1.00e+00 1.00e+00h  1
  14  4.6985729e-03 7.45e-09 2.51e-14  -8.6 7.46e-07    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 14

                                   (scaled)                 (unscaled)
Objecti

   4  5.9955083e-03 1.97e+02 1.70e-03  -1.0 1.69e+00    -  1.00e+00 1.00e+00h  1
   5  7.1484212e-03 8.60e-02 7.25e-04  -1.7 7.00e+00    -  1.00e+00 1.00e+00h  1
   6  7.1153266e-03 1.38e+00 3.27e-04  -3.8 7.12e-01    -  1.00e+00 1.00e+00h  1
   7  5.6272940e-03 6.61e+03 4.45e-02  -3.8 1.15e+01    -  1.00e+00 1.00e+00h  1
   8  6.1942012e-03 5.07e+03 3.44e-02  -3.8 1.79e+01    -  1.00e+00 2.50e-01h  3
   9  6.6933699e-03 3.91e+03 2.67e-02  -3.8 1.46e+01    -  1.00e+00 2.50e-01h  3
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  6.7996224e-03 3.79e+03 2.59e-02  -3.8 2.30e+01    -  1.00e+00 3.12e-02h  6
  11  6.8359000e-03 3.76e+03 2.57e-02  -3.8 3.06e+01    -  1.00e+00 7.81e-03h  8
  12  6.8461287e-03 3.75e+03 2.57e-02  -3.8 3.42e+01    -  1.00e+00 1.95e-03h 10
  13  1.7203444e-02 2.70e+03 4.31e-02  -3.8 3.53e+01    -  1.00e+00 1.00e+00h  1
  14  1.6623812e-02 1.75e+03 1.96e+00  -3.8 3.44e+00  -4.0 1.00e+00 1.00e+00h  1
  15  6.7222845e-03 4.09e+00

In [None]:
display(bootstrap_theta)

In [None]:
parmest.pairwise_plot(bootstrap_theta, alpha=0.75, distributions=['Rect', 'MVN'])