## Module 4: Parameter Estimation Using the NRTL State Block

In this module, we will use Pyomo's `parmest` tool in conjunction with IDAES models for parameter estimation. We will demonstrate these tools by estimating the parameters of the NRTL property model for a benzene-toluene mixture. The NRTL model has two sets of parameters: the non-randomness parameter (`alpha_ij`) and the binary interaction parameter (`tau_ij`), where `i` and `j` are the pure component species. In this example, we will estimate only the binary interaction parameter (`tau_ij`) for a given dataset. When estimating parameters of a property package, IDAES lets you run the estimation either with just the state block or with a unit model that uses the specified property package. This module demonstrates parameter estimation using only the state block.
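
To make the roles of `alpha_ij` and `tau_ij` concrete, here is a minimal sketch of the textbook binary NRTL activity coefficient expressions in plain Python. It does not use IDAES. The function name `nrtl_binary_gamma` and the mapping of index 1 to benzene and index 2 to toluene are assumptions made for illustration; the IDAES property package may order its indices differently.

```python
import math


def nrtl_binary_gamma(x1, tau12, tau21, alpha=0.3):
    """Textbook binary NRTL activity coefficients (illustrative sketch only).

    x1     : mole fraction of component 1 in the liquid phase
    tau12  : binary interaction parameter for the pair (1, 2)
    tau21  : binary interaction parameter for the pair (2, 1)
    alpha  : non-randomness parameter, assumed symmetric here
    """
    x2 = 1.0 - x1
    # G_ij = exp(-alpha_ij * tau_ij)
    g12 = math.exp(-alpha * tau12)
    g21 = math.exp(-alpha * tau21)

    ln_gamma1 = x2**2 * (
        tau21 * (g21 / (x1 + x2 * g21)) ** 2
        + tau12 * g12 / (x2 + x1 * g12) ** 2
    )
    ln_gamma2 = x1**2 * (
        tau12 * (g12 / (x2 + x1 * g12)) ** 2
        + tau21 * g21 / (x1 + x2 * g21) ** 2
    )
    return math.exp(ln_gamma1), math.exp(ln_gamma2)


# Evaluate at an equimolar composition using the initial tau values
# that this module fixes before estimation (index mapping is assumed).
gamma_1, gamma_2 = nrtl_binary_gamma(0.5, tau12=0.1690, tau21=-0.1559)
print(gamma_1, gamma_2)
```

When both `tau` values are zero, the expressions reduce to an ideal mixture and both activity coefficients equal 1, which makes a quick sanity check.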

We will complete the following tasks:
* Set up a method to return an initialized model
* Set up the parameter estimation problem
* Analyze the results
* Demonstrate advanced features from `parmest`

## Key links to documentation:
* NRTL Model - https://idaes-pse.readthedocs.io/en/latest/model_libraries/core_library/property_models/activity_coefficient.html
* parmest - https://pyomo.readthedocs.io/en/stable/contributed_packages/parmest/index.html



<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Import `ConcreteModel` from Pyomo and `FlowsheetBlock` from IDAES.
</div>

In [None]:
# Todo: import ConcreteModel from pyomo.environ


# Todo: import FlowsheetBlock from idaes.core



In the next cell, we import the parameter block used in this module, along with the IDAES logger.

In [None]:
from idaes.generic_models.properties.activity_coeff_models.\
    BTX_activity_coeff_VLE import BTXParameterBlock
import idaes.logger as idaeslog

In the next cell, we import `parmest` from Pyomo and the `pandas` package.

In [None]:
import pyomo.contrib.parmest.parmest as parmest
import pandas as pd

## Setting up an initialized model

<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Using what you have learned in the previous modules, fill in the missing code below so that the function returns an initialized IDAES model.
</div>

In [None]:
def NRTL_model(data):
    
    #Todo: Create a ConcreteModel object

    
    #Todo: Create FlowsheetBlock object
    

    #Todo: Create a properties parameter object with the following options:
    # "valid_phase": ('Liq', 'Vap')
    # "activity_coeff_model": 'NRTL'
    
    # Add state block to flowsheet (required when not using unit models)
    m.fs.state_block = m.fs.properties.state_block_class(
        default={"parameters": m.fs.properties,
                 "defined_state": True})

    
    # Todo: Fix the state variables on the state block
    # hint: state variables exist on the state block i.e. on m.fs.state_block
    

    # Fix NRTL specific parameters. 
    
    # non-randomness parameter - alpha_ij (set at 0.3, 0 if i=j)
    m.fs.properties.\
        alpha["benzene", "benzene"].fix(0)
    m.fs.properties.\
        alpha["benzene", "toluene"].fix(0.3)
    m.fs.properties.\
        alpha["toluene", "toluene"].fix(0)
    m.fs.properties.\
        alpha["toluene", "benzene"].fix(0.3)

    # binary interaction parameter - tau_ij (0 if i=j, else to be estimated later but fixing to initialize)
    m.fs.properties.\
        tau["benzene", "benzene"].fix(0)
    m.fs.properties.\
        tau["benzene", "toluene"].fix(0.1690)
    m.fs.properties.\
        tau["toluene", "toluene"].fix(0)
    m.fs.properties.\
        tau["toluene", "benzene"].fix(-0.1559)

    # Initialize the state block
    m.fs.state_block.initialize(outlvl=idaeslog.INFO)

    # Fix at actual temperature
    m.fs.state_block.temperature.fix(float(data["temperature"]))

    # Set bounds on variables to be estimated
    m.fs.properties.\
        tau["benzene", "toluene"].setlb(-5)
    m.fs.properties.\
        tau["benzene", "toluene"].setub(5)

    m.fs.properties.\
        tau["toluene", "benzene"].setlb(-5)
    m.fs.properties.\
        tau["toluene", "benzene"].setub(5)

    # Return the initialized state block model
    return m


## Parameter estimation using parmest

The `parmest` tool needs the following:

* a method that returns an initialized model (defined before)
* list of variable names to be estimated
* dataset
* expression to compute the sum of squared errors



In this example, we will estimate only the binary interaction parameter (`tau_ij`). Because this variable is indexed as `tau_ij = Var(component_list, component_list)`, it has 2*2=4 entries for a two-component mixture. However, when i=j, the binary interaction parameter is 0. Therefore, in this problem, we will estimate only the following two variables:

* fs.properties.tau['benzene', 'toluene']
* fs.properties.tau['toluene', 'benzene']

<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Create a list called `variable_name` with the above-mentioned variables declared as strings.
</div>

In [None]:
# Todo: Create a list of vars to estimate
variable_name = []


We will load data from the CSV file `BT_NRTL_dataset.csv`. The dataset consists of fifty data points that give the mole fraction of benzene in the vapor and liquid phases as a function of temperature. Pyomo's `parmest` tool supports the following data formats: a pandas DataFrame, a list of dictionaries, and a list of JSON file names. Please see the documentation for more details.

In [None]:
# Load data from csv
data = pd.read_csv('BT_NRTL_dataset.csv')
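
As a small illustration of two of these data formats, the sketch below builds a tiny DataFrame and converts it to a list of dictionaries. Only the `temperature` column name appears in this module's code; the column names `liq_benzene` and `vap_benzene` and all numeric values are hypothetical placeholders, not entries from `BT_NRTL_dataset.csv`.

```python
import pandas as pd

# Hypothetical placeholder rows (not real measurements).
# "temperature" is the column name used by NRTL_model; the other two
# column names are assumed for illustration only.
example_df = pd.DataFrame(
    {
        "temperature": [365.0, 370.0, 375.0],
        "liq_benzene": [0.48, 0.40, 0.31],
        "vap_benzene": [0.69, 0.61, 0.51],
    }
)

# Same content as a list of dictionaries, one dictionary per data point.
example_records = example_df.to_dict(orient="records")

for record in example_records:
    print(record)
```

Each dictionary holds one experiment, so a model-building function can read a value such as `record["temperature"]` in the same way it would read it from a DataFrame row.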

We need to create an expression for the sum of squared errors, which serves as the objective of the parameter estimation problem. For this problem, the error is the difference between the model prediction and the data for the mole fraction of benzene in the vapor and liquid phases.

<div class="alert alert-block alert-info">
<b>Inline Exercise:</b>
Complete the following cell by adding an expression to compute the sum of squared errors.
</div>

In [None]:
# Create expression to compute the sum of squared error
def SSE(m, data):
    # Todo: Add expression for computing the sum of squared errors in mole fraction of benzene in the liquid
    # and vapor phase. 
    expr = 
    
    return expr*1E4

<div class="alert alert-block alert-warning">
<b>Note:</b>
Notice that we have scaled the expression up by a factor of 10000. Because the expression uses differences in mole fraction, the unscaled SSE is an extremely small number. Scaling gives a better-scaled objective, which improves solve robustness when using IPOPT.
</div>
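
To see why the scaling matters, here is a minimal plain-Python sketch of the arithmetic. It is not the Pyomo expression asked for in the exercise; the helper name `scaled_sse` and the sample mole fractions are hypothetical and chosen only to show the orders of magnitude involved.

```python
def scaled_sse(predicted, measured, scale=1e4):
    """Return the raw and scaled sum of squared errors for two sequences."""
    raw = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return raw, raw * scale


# Hypothetical mole fractions: prediction errors of 0.005 and 0.010.
predicted = [0.500, 0.700]
measured = [0.505, 0.690]

raw_sse, objective = scaled_sse(predicted, measured)
print("raw SSE:   ", raw_sse)
print("scaled SSE:", objective)
```

Errors of about 0.01 in mole fraction give a raw SSE on the order of 1e-4, while the scaled value is on the order of 1, a more comfortable magnitude for the solver's convergence tolerances.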


We are now ready to set up the parameter estimation problem. We will create a parameter estimation object called `pest`. As shown below, we pass the model-building function `NRTL_model`, the data, `variable_name`, and the `SSE` function to the `Estimator` constructor. Setting `tee=True` prints the solver output while the parameter estimation problem is solved.

In [None]:
# Initialize a parameter estimation object
pest = parmest.Estimator(NRTL_model, data, variable_name, SSE, tee=True)

# Run parameter estimation using all data
obj_value, parameters = pest.theta_est()

You will notice that the resulting parameter estimation problem has 1102 variables and 1100 constraints, which leaves two degrees of freedom, consistent with the two `tau` parameters being estimated. Let us display the results by running the next cell.

In [None]:
print("The SSE at the optimal solution is %0.6f" % obj_value)
print()
print("The values for the parameters are as follows:")
for k,v in parameters.items():
    print(k, "=", v)

Using the provided data, we have estimated the binary interaction parameters in the NRTL model for a benzene-toluene mixture. Although the dataset spans a range of temperatures, in this example we have estimated a single value for each parameter that best fits all temperatures.

### Advanced options for parmest: bootstrapping



Pyomo's `parmest` tool supports bootstrapping, where the parameter estimation is repeated over `n` samples drawn with replacement from the original dataset. In the following cell, we run the parameter estimation with 10 bootstrap samples from the given dataset.

In [None]:
# Run parameter estimation on 10 bootstrap resamples of the data;
# the results are displayed and plotted with confidence regions in the cells below
bootstrap_theta = pest.theta_est_bootstrap(10)
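
The resampling idea itself can be sketched without `parmest`. The helper `bootstrap_samples` below is a hypothetical illustration of drawing rows with replacement using pandas and NumPy; it is not how `parmest` implements `theta_est_bootstrap` internally, and the small DataFrame holds placeholder values only.

```python
import numpy as np
import pandas as pd


def bootstrap_samples(data, n_samples, seed=0):
    """Return n_samples DataFrames, each resampled with replacement from data."""
    rng = np.random.default_rng(seed)
    n_rows = len(data)
    samples = []
    for _ in range(n_samples):
        # Draw n_rows row positions with replacement.
        idx = rng.integers(0, n_rows, size=n_rows)
        samples.append(data.iloc[idx].reset_index(drop=True))
    return samples


# Placeholder data standing in for the real dataset.
toy_data = pd.DataFrame({"temperature": [365.0, 366.0, 367.0, 368.0, 369.0]})
toy_samples = bootstrap_samples(toy_data, n_samples=10)
print(toy_samples[0])
```

Each resampled dataset has the same number of rows as the original, but some rows appear more than once and others not at all. Repeating the estimation on each sample yields a distribution of parameter values rather than a single point estimate.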

In [None]:
display(bootstrap_theta)

In [None]:
parmest.pairwise_plot(bootstrap_theta, alpha=0.75, distributions=['Rect', 'MVN'])