# System-wide Regression Optimization

Plant-wide optimization provides prescriptive guidance to plant operators about how a process should be run to optimize some objective, such as maximizing throughput or yield, or minimizing power consumption. Production processes in manufacturing and process industries comprise sequences of complex processes, each of which has a self-contained set of inputs and outputs. The outflow from an upstream process becomes an inflow into a downstream process. Within each process, there is a complex relationship between the various set-points and material inflows on the one hand and the throughput and quality of the desired output on the other. A production site is a complex network of these unit processes. The top part of Figure 1 shows an abstract representation of a simple system with three processes that transform an input material flow into a product. In this simplified case, the remote operator may want to optimize the overall production output by leveraging a variety of sensor data and process controls (set-points).

<img src="figures/Picture1.jpg" alt="Drawing" style="width: 550px;"/>


The "Process and System Regression Optimization" AI and optimization service comprises models and algorithms for optimizing set points for process control to achieve greater efficiency, productivity, and reduced risk.

The APIs offer two specific applications:

1. Single process regression-optimization aims to learn behavior and provide optimal set points for single unit systems with varying characteristics, and

2. System-wide regression-optimization provides optimal set points for systems comprising multiple process units, such as complex plants or manufacturing processes.

System-wide optimization handles more complex scenarios at the system level, as illustrated in the figure below. In this case, we can manage an entire plant that consists of several processes. As in the single process regression-optimization service, individual processes are modeled by their inputs, outputs and the corresponding regression functions.

<img src="figures/Picture3.jpg" alt="Drawing" style="width: 350px;"/>


This notebook demonstrates how to use the system-wide regression-optimization service to optimize the performance of a system of plants. To take into account the interactions between processes and their operational constraints, we model the entire system as a network (a process flow network, or graph) of multiple individual processes, where the outputs of upstream processes can flow as inputs into downstream processes. The service combines machine learning techniques, including piece-wise linear regression models, deep learning and ensemble models, with optimization algorithms based on Mixed-Integer Linear Programming (MILP), nonlinear optimization, and derivative-free optimization for ensemble models.

### Credentials

This notebook requires two credentials. Please obtain your own credentials when customizing this notebook for your own work. Please visit [Regression Optimization @ IBM](https://developer.ibm.com/apis/catalog/ai4industry--regression-optimization-product/Introduction) for trial subscription.

In [1]:
# Credentials required for running notebook

Client_ID = "replace-with-valid-client-ID"
Client_Secret = "replace-with-valid-client-Secret"


### Load Dataset

The dataset below represents a single process (P1) comprising 5 features (P1_1 through P1_5) and 1 target (P1_y). We will use the Regression-Optimization service to first generate regression models, choose the best performing one and then optimize the controllable variables to maximize output.

In [2]:
#read dataset from a local file
import pandas as pd
from io import StringIO

datafile_name = 'data/P1.csv'
data_df = pd.read_csv(datafile_name)
data_df.head()

Unnamed: 0,P1_1,P1_2,P1_3,P1_4,P1_5,P1_y
0,3558.734445,4372.057772,4368.767882,0.496077,7654.29693,6910.577119
1,4443.402776,5204.925143,4666.803883,0.57404,4568.129063,4229.638464
2,4704.597316,5299.015293,3789.424103,0.822494,1429.750307,1502.815985
3,4641.848748,5343.532958,5290.086972,0.630075,8514.035452,7669.505625
4,3779.492033,4582.668164,3887.728128,0.535722,5134.960053,4698.800721


### Regression Service Creation and Job Submission

We build regression models to capture the inflow-outflow (input-output) relationship for each process node. Before building regression models, the individual sub-processes and their controls, observed variables and outcome variables need to be identified at the right granularity. Before using this service, the user also needs to complete data cleaning, feature extraction and feature engineering, so that the trained regression model is as accurate and interpretable as possible. The service supports several ML techniques to represent the historical behavior of a process. For each process, the machine learning model with the best test accuracy is deployed and then used during the optimization phase.

The regression service should be called separately for each process, as shown below. Each regression service call needs 5 inputs: model_id, training_data_path, model_type, target_vars and control_vars. 
1. The choices for model_type include LR_SK (LinearRegression from scikit-learn), RF_SK (RandomForestRegressor from scikit-learn), RegTree_SK (DecisionTreeRegressor from scikit-learn combined with linear regression in the leaf nodes), MARS (multivariate adaptive regression splines from pyearth), NN_SK (MLPRegressor from scikit-learn) and NN_KERAS (deep neural network trained using Keras).
2. Target and control variables may be given any label. Any missing values will need to be removed prior to using this service.
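
The five inputs listed above map onto the form fields used in the request cells that follow. As a minimal sketch, the hypothetical helper `build_regression_fields` (not part of the service API) assembles those fields and rejects an unrecognized model_type before any request is sent. The set of accepted values is taken from the list above plus the "LR" type used later in this notebook for known coefficients; the service itself may accept other values.

```python
# Hypothetical helper (not part of the service API): builds the form fields
# for one regression job and checks model_type locally.
SUPPORTED_MODEL_TYPES = {
    "LR_SK", "RF_SK", "RegTree_SK", "MARS", "NN_SK", "NN_KERAS",
    "LR",  # pre-determined coefficients, used later in this notebook
}


def build_regression_fields(model_id, model_type, target_var, control_vars):
    """Return the form fields for one regression job.

    control_vars may be a list of column names or a comma-separated string.
    """
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    if not isinstance(control_vars, str):
        control_vars = ",".join(control_vars)
    return {
        "model-id": model_id,
        "target-variable": target_var,
        "input-variables": control_vars,
        "model-type": model_type,
    }


example_fields = build_regression_fields(
    model_id="model_P1_1_LR_SK_nb_example_system_wide",
    model_type="LR_SK",
    target_var="P1_y",
    control_vars=["P1_1", "P1_2", "P1_3", "P1_4", "P1_5"],
)
print(example_fields)
```

The returned dictionary has the same keys as the `fields` dictionaries built by hand in the cells below, so it could be passed as `data=` to `requests.post`.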

#### Training for P1

In [3]:
import requests
import time
import pprint

headers = {
    'X-IBM-Client-Id': Client_ID,
    'X-IBM-Client-Secret': Client_Secret,
    'accept': "application/json",
}

reg_job_url = "https://api.ibm.com/ai4industry/run/pred-opt/v1/regression-model" 
opt_system_wide_job_url = "https://api.ibm.com/ai4industry/run/pred-opt/v1/system-wide-optimization" 

model_id = 'model_P1_1_LR_SK_nb_example_system_wide'
model_type = 'LR_SK'
target_vars= 'P1_y'
control_vars='P1_1,P1_2,P1_3,P1_4,P1_5'

In [4]:
# try to kill it first since we are going to recreate it using the POST operation in the following cells
response = requests.delete(reg_job_url +  "/" + model_id, headers=headers)
print(response.status_code)


400


In [5]:
fields = {
        'model-id':  model_id,
        'target-variable': target_vars,
        'input-variables': control_vars,
        'model-type': model_type,
        }

files = {
        'training-data':  ('training-data', open(datafile_name, 'rb'),'text/csv'),
        }
response = requests.post(reg_job_url, data=fields, files=files, headers=headers)

pprint.pprint(response.json())

{'input-variables': 'P1_1,P1_2,P1_3,P1_4,P1_5',
 'model-id': 'model_P1_1_LR_SK_nb_example_system_wide',
 'model-type': 'LR_SK',
 'queued-time': '2022-02-03T22:09:38.874536',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'P1_y'}


In [6]:
# Poll until the job is finished (or the retry limit is reached)
retries = 0
status = "queued"
while retries < 10 and (status=="queued" or status=="running"):
    time.sleep(5)
    get_response = requests.get(reg_job_url +  "/" + model_id, headers=headers)
    pprint.pprint(get_response.json())
    status = get_response.json()['status']
    print(status)
    retries += 1

status = get_response.json()['status']
print(status)

{'input-variables': 'P1_1,P1_2,P1_3,P1_4,P1_5',
 'model-id': 'model_P1_1_LR_SK_nb_example_system_wide',
 'model-type': 'LR_SK',
 'queued-time': '2022-02-03T22:09:38.874536',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'P1_y'}
queued
{'input-variables': 'P1_1,P1_2,P1_3,P1_4,P1_5',
 'model-id': 'model_P1_1_LR_SK_nb_example_system_wide',
 'model-type': 'LR_SK',
 'queued-time': '2022-02-03T22:09:38.874536',
 'scaling-factor': 1.0,
 'start-time': '2022-02-03T22:09:48.565877',
 'status': 'running',
 'target-variable': 'P1_y'}
running
{'duration': 0.017187,
 'end-time': '2022-02-03T22:09:52.442817',
 'finish-time': '2022-02-03T22:09:53.489634',
 'input-variables': 'P1_1,P1_2,P1_3,P1_4,P1_5',
 'model-id': 'model_P1_1_LR_SK_nb_example_system_wide',
 'model-metadata': {'mean-absolute-error': 13.83381422041587,
                    'r2-score': 0.9999491246136101},
 'model-type': 'LR_SK',
 'output-files': ['model_P1_1_LR_SK_nb_example_system_wide',
                  'model_P1_1

#### Training for P2

In [7]:
datafile_name = 'data/P2.csv'
data_df = pd.read_csv(datafile_name)
data_df.head()

Unnamed: 0,P2_1,P2_2,P2_3,P2_4,P2_y
0,5203.701432,4647.398496,0.810821,1574.876258,59772.21094
1,5394.807386,5259.115816,0.884322,4873.364291,170391.284083
2,5123.295995,4187.075203,0.458995,3567.159586,125163.071034
3,4999.754275,4618.900006,0.589459,3881.872501,136081.195087
4,5061.699099,4153.707961,0.915153,541.21959,24895.222414


In [8]:
model_id = 'model_P2_1_LR_SK_nb_example_system_wide'
model_type = 'LR_SK'
target_vars= 'P2_y'
control_vars='P2_1,P2_2,P2_3,P2_4'

In [9]:
# try to kill it first since we are going to recreate it using the POST operation in the following cells
response = requests.delete(reg_job_url +  "/" + model_id, headers=headers)
print(response.status_code)

400


In [10]:
fields = {
        'model-id':  model_id,
        'target-variable': target_vars,
        'input-variables': control_vars,
        'model-type': model_type,
        }

files = {
        'training-data':  ('training-data', open(datafile_name, 'rb'),'text/csv'),
        }
response = requests.post(reg_job_url, data=fields, files=files, headers=headers)

pprint.pprint(response.json())

{'input-variables': 'P2_1,P2_2,P2_3,P2_4',
 'model-id': 'model_P2_1_LR_SK_nb_example_system_wide',
 'model-type': 'LR_SK',
 'queued-time': '2022-02-03T22:10:00.519681',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'P2_y'}


In [11]:
# Poll until the job is finished (or the retry limit is reached)
retries = 0
status = "queued"
while retries < 10 and (status=="queued" or status=="running"):
    time.sleep(5)
    get_response = requests.get(reg_job_url +  "/" + model_id, headers=headers)
    pprint.pprint(get_response.json())
    status = get_response.json()['status']
    print(status)
    retries += 1

status = get_response.json()['status']
print(status)

{'input-variables': 'P2_1,P2_2,P2_3,P2_4',
 'model-id': 'model_P2_1_LR_SK_nb_example_system_wide',
 'model-type': 'LR_SK',
 'queued-time': '2022-02-03T22:10:00.519681',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'P2_y'}
queued
{'input-variables': 'P2_1,P2_2,P2_3,P2_4',
 'model-id': 'model_P2_1_LR_SK_nb_example_system_wide',
 'model-type': 'LR_SK',
 'queued-time': '2022-02-03T22:10:00.519681',
 'scaling-factor': 1.0,
 'start-time': '2022-02-03T22:10:08.974862',
 'status': 'running',
 'target-variable': 'P2_y'}
running
{'duration': 0.017762,
 'end-time': '2022-02-03T22:10:13.221250',
 'finish-time': '2022-02-03T22:10:14.071495',
 'input-variables': 'P2_1,P2_2,P2_3,P2_4',
 'model-id': 'model_P2_1_LR_SK_nb_example_system_wide',
 'model-metadata': {'mean-absolute-error': 0.07769783613481195,
                    'r2-score': 0.999999999998762},
 'model-type': 'LR_SK',
 'output-files': ['model_P2_1_LR_SK_nb_example_system_wide',
                  'model_P2_1_LR_SK_nb_exam

#### Training for P3_1

P3_1, P3_2, P3_3, P4_1 and P5_1 are examples of processes for which the relationship between the input and output variables is known rather than derived from data. For example, the relationship may be dictated by first principles or provided by a subject matter expert. In such cases, model_type should be set to "LR". The data file then contains the pre-determined weights or coefficients associated with the input variables.
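
To illustrate, the sketch below reads a one-row coefficient file laid out like the ones loaded in the following cells and evaluates the linear function it is assumed to encode, y = w1*x1 + w2*x2 + w3*x3 + b. Reading the `b` column as the intercept, the helper names, and the sample input values are assumptions made for illustration, not documented behavior of the service.

```python
import csv
import io

# Same layout as data/P3_2.csv in this notebook: an unnamed index column,
# one weight per input variable, and a column named "b".
COEFF_CSV = """,P3_1,P3_2,P3_3,b
0,0.30485,-0.30485,0.30485,0
"""


def load_coefficients(csv_text, input_names, intercept_name="b"):
    """Read the single coefficient row; return (weights dict, intercept)."""
    row = next(csv.DictReader(io.StringIO(csv_text)))
    weights = {name: float(row[name]) for name in input_names}
    return weights, float(row[intercept_name])


def evaluate_linear(weights, intercept, inputs):
    """Evaluate sum(w_i * x_i) + b for a dict of input values."""
    return sum(weights[name] * inputs[name] for name in weights) + intercept


weights, intercept = load_coefficients(COEFF_CSV, ["P3_1", "P3_2", "P3_3"])

# Arbitrary sample inputs, chosen only to exercise the function.
sample_inputs = {"P3_1": 100.0, "P3_2": 40.0, "P3_3": 20.0}
predicted = evaluate_linear(weights, intercept, sample_inputs)
print(predicted)
```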

In [12]:
datafile_name = 'data/P3_1.csv'
data_df = pd.read_csv(datafile_name)
data_df.head()

Unnamed: 0,P3_1,P3_2,P3_3,b
0,0,1,0,0


In [13]:
model_id = 'model_P3_1_LR_params_nb_example_system_wide'
model_type = 'LR'
target_vars= 'b'
control_vars='P3_1,P3_2,P3_3'

In [14]:
# try to kill it first since we are going to recreate it using the POST operation in the following cells
response = requests.delete(reg_job_url +  "/" + model_id, headers=headers)
print(response.status_code)

400


In [15]:
fields = {
        'model-id':  model_id,
        'target-variable': target_vars,
        'input-variables': control_vars,
        'model-type': model_type,
        }

files = {
        'training-data':  ('training-data', open(datafile_name, 'rb'),'text/csv'),
        }
response = requests.post(reg_job_url, data=fields, files=files, headers=headers)

pprint.pprint(response.json())

{'input-variables': 'P3_1,P3_2,P3_3',
 'model-id': 'model_P3_1_LR_params_nb_example_system_wide',
 'model-type': 'LR',
 'queued-time': '2022-02-03T22:10:21.626773',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'b'}


In [16]:
# Poll until the job is finished (or the retry limit is reached)
retries = 0
status = "queued"
while retries < 10 and (status=="queued" or status=="running"):
    time.sleep(5)
    get_response = requests.get(reg_job_url +  "/" + model_id, headers=headers)
    pprint.pprint(get_response.json())
    status = get_response.json()['status']
    print(status)
    retries += 1

status = get_response.json()['status']
print(status)

{'input-variables': 'P3_1,P3_2,P3_3',
 'model-id': 'model_P3_1_LR_params_nb_example_system_wide',
 'model-type': 'LR',
 'queued-time': '2022-02-03T22:10:21.626773',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'b'}
queued
{'duration': 0.007042,
 'end-time': '2022-02-03T22:10:32.910411',
 'finish-time': '2022-02-03T22:10:33.659678',
 'input-variables': 'P3_1,P3_2,P3_3',
 'model-id': 'model_P3_1_LR_params_nb_example_system_wide',
 'model-metadata': {'mean-absolute-error': 'NA', 'r2-score': 'NA'},
 'model-type': 'LR',
 'output-files': ['model_P3_1_LR_params_nb_example_system_wide'],
 'queued-time': '2022-02-03T22:10:21.626773',
 'scaling-factor': 1.0,
 'start-time': '2022-02-03T22:10:32.903369',
 'status': 'finished',
 'target-variable': 'b'}
finished
finished
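
The polling cells in this notebook repeat the same loop for every model. A minimal sketch of a reusable version follows. `wait_for_job` and its parameters are hypothetical names, and the status fetcher is passed in as a function so that the loop can be exercised without calling the service. In the notebook, the fetcher could wrap `requests.get(reg_job_url + "/" + model_id, headers=headers).json()['status']`.

```python
import time


def wait_for_job(fetch_status, max_retries=10, delay_seconds=5.0):
    """Poll fetch_status() until the job leaves 'queued'/'running'.

    Returns the last status seen. Stops after max_retries polls even if
    the job is still pending, so the loop cannot run forever.
    """
    status = "queued"
    retries = 0
    while retries < max_retries and status in ("queued", "running"):
        time.sleep(delay_seconds)
        status = fetch_status()
        retries += 1
    return status


# Stand-in for the service: returns a scripted sequence of statuses.
_scripted = iter(["queued", "running", "finished"])
final_status = wait_for_job(lambda: next(_scripted), delay_seconds=0.0)
print(final_status)
```

Counting every poll against `max_retries` bounds the total wait at roughly `max_retries * delay_seconds`, regardless of what the service reports.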


#### Training for P3_2

In [17]:
datafile_name = 'data/P3_2.csv'
data_df = pd.read_csv(datafile_name)
data_df.head()

Unnamed: 0,P3_1,P3_2,P3_3,b
0,0.30485,-0.30485,0.30485,0


In [18]:
model_id = 'model_P3_2_LR_params_nb_example_system_wide'
model_type = 'LR'
target_vars= 'b'
control_vars='P3_1,P3_2,P3_3'

In [19]:
# try to kill it first since we are going to recreate it using the POST operation in the following cells
response = requests.delete(reg_job_url +  "/" + model_id, headers=headers)
print(response.status_code)

400


In [20]:
fields = {
        'model-id':  model_id,
        'target-variable': target_vars,
        'input-variables': control_vars,
        'model-type': model_type,
        }

files = {
        'training-data':  ('training-data', open(datafile_name, 'rb'),'text/csv'),
        }
response = requests.post(reg_job_url, data=fields, files=files, headers=headers)

pprint.pprint(response.json())

{'input-variables': 'P3_1,P3_2,P3_3',
 'model-id': 'model_P3_2_LR_params_nb_example_system_wide',
 'model-type': 'LR',
 'queued-time': '2022-02-03T22:10:36.838671',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'b'}


In [21]:
# Poll until the job is finished (or the retry limit is reached)
retries = 0
status = "queued"
while retries < 10 and (status=="queued" or status=="running"):
    time.sleep(5)
    get_response = requests.get(reg_job_url +  "/" + model_id, headers=headers)
    pprint.pprint(get_response.json())
    status = get_response.json()['status']
    print(status)
    retries += 1

status = get_response.json()['status']
print(status)

{'input-variables': 'P3_1,P3_2,P3_3',
 'model-id': 'model_P3_2_LR_params_nb_example_system_wide',
 'model-type': 'LR',
 'queued-time': '2022-02-03T22:10:36.838671',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'b'}
queued
{'duration': 0.007209,
 'end-time': '2022-02-03T22:10:47.880663',
 'finish-time': '2022-02-03T22:10:48.601002',
 'input-variables': 'P3_1,P3_2,P3_3',
 'model-id': 'model_P3_2_LR_params_nb_example_system_wide',
 'model-metadata': {'mean-absolute-error': 'NA', 'r2-score': 'NA'},
 'model-type': 'LR',
 'output-files': ['model_P3_2_LR_params_nb_example_system_wide'],
 'queued-time': '2022-02-03T22:10:36.838671',
 'scaling-factor': 1.0,
 'start-time': '2022-02-03T22:10:47.873454',
 'status': 'finished',
 'target-variable': 'b'}
finished
finished


#### Training for P3_3

In [22]:
datafile_name = 'data/P3_3.csv'
data_df = pd.read_csv(datafile_name)
data_df.head()

Unnamed: 0,P3_1,P3_2,P3_3,b
0,0.50954,-0.50954,0.50954,0


In [23]:
model_id = 'model_P3_3_LR_params_nb_example_system_wide'
model_type = 'LR'
target_vars= 'b'
control_vars='P3_1,P3_2,P3_3'

In [24]:
# try to kill it first since we are going to recreate it using the POST operation in the following cells
response = requests.delete(reg_job_url +  "/" + model_id, headers=headers)
print(response.status_code)

400


In [25]:
fields = {
        'model-id':  model_id,
        'target-variable': target_vars,
        'input-variables': control_vars,
        'model-type': model_type,
        }

files = {
        'training-data':  ('training-data', open(datafile_name, 'rb'),'text/csv'),
        }
response = requests.post(reg_job_url, data=fields, files=files, headers=headers)

pprint.pprint(response.json())

{'input-variables': 'P3_1,P3_2,P3_3',
 'model-id': 'model_P3_3_LR_params_nb_example_system_wide',
 'model-type': 'LR',
 'queued-time': '2022-02-03T22:10:51.603944',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'b'}


In [26]:
# Poll until the job is finished (or the retry limit is reached)
retries = 0
status = "queued"
while retries < 10 and (status=="queued" or status=="running"):
    time.sleep(5)
    get_response = requests.get(reg_job_url +  "/" + model_id, headers=headers)
    pprint.pprint(get_response.json())
    status = get_response.json()['status']
    print(status)
    retries += 1

status = get_response.json()['status']
print(status)

{'input-variables': 'P3_1,P3_2,P3_3',
 'model-id': 'model_P3_3_LR_params_nb_example_system_wide',
 'model-type': 'LR',
 'queued-time': '2022-02-03T22:10:51.603944',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'b'}
queued
{'duration': 0.006626,
 'end-time': '2022-02-03T22:11:02.569829',
 'finish-time': '2022-02-03T22:11:03.327867',
 'input-variables': 'P3_1,P3_2,P3_3',
 'model-id': 'model_P3_3_LR_params_nb_example_system_wide',
 'model-metadata': {'mean-absolute-error': 'NA', 'r2-score': 'NA'},
 'model-type': 'LR',
 'output-files': ['model_P3_3_LR_params_nb_example_system_wide'],
 'queued-time': '2022-02-03T22:10:51.603944',
 'scaling-factor': 1.0,
 'start-time': '2022-02-03T22:11:02.563203',
 'status': 'finished',
 'target-variable': 'b'}
finished
finished


#### Training for P4_1

In [27]:
datafile_name = 'data/P4_1.csv'
data_df = pd.read_csv(datafile_name)
data_df.head()

Unnamed: 0,P4_1,P4_2,P4_3,b
0,0.66824,0.66824,0.66824,0


In [28]:
model_id = 'model_P4_1_LR_params_nb_example_system_wide'
model_type = 'LR'
target_vars= 'b'
control_vars='P4_1,P4_2,P4_3'

In [29]:
# try to kill it first since we are going to recreate it using the POST operation in the following cells
response = requests.delete(reg_job_url +  "/" + model_id, headers=headers)
print(response.status_code)

400


In [30]:
fields = {
        'model-id':  model_id,
        'target-variable': target_vars,
        'input-variables': control_vars,
        'model-type': model_type,
        }

files = {
        'training-data':  ('training-data', open(datafile_name, 'rb'),'text/csv'),
        }
response = requests.post(reg_job_url, data=fields, files=files, headers=headers)

pprint.pprint(response.json())

{'input-variables': 'P4_1,P4_2,P4_3',
 'model-id': 'model_P4_1_LR_params_nb_example_system_wide',
 'model-type': 'LR',
 'queued-time': '2022-02-03T22:11:06.645295',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'b'}


In [31]:
# Poll until the job is finished (or the retry limit is reached)
retries = 0
status = "queued"
while retries < 10 and (status=="queued" or status=="running"):
    time.sleep(5)
    get_response = requests.get(reg_job_url +  "/" + model_id, headers=headers)
    pprint.pprint(get_response.json())
    status = get_response.json()['status']
    print(status)
    retries += 1

status = get_response.json()['status']
print(status)

{'input-variables': 'P4_1,P4_2,P4_3',
 'model-id': 'model_P4_1_LR_params_nb_example_system_wide',
 'model-type': 'LR',
 'queued-time': '2022-02-03T22:11:06.645295',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'b'}
queued
{'duration': 0.006138,
 'end-time': '2022-02-03T22:11:17.460343',
 'finish-time': '2022-02-03T22:11:18.085191',
 'input-variables': 'P4_1,P4_2,P4_3',
 'model-id': 'model_P4_1_LR_params_nb_example_system_wide',
 'model-metadata': {'mean-absolute-error': 'NA', 'r2-score': 'NA'},
 'model-type': 'LR',
 'output-files': ['model_P4_1_LR_params_nb_example_system_wide'],
 'queued-time': '2022-02-03T22:11:06.645295',
 'scaling-factor': 1.0,
 'start-time': '2022-02-03T22:11:17.454205',
 'status': 'finished',
 'target-variable': 'b'}
finished
finished


#### Training for P5_1

In [32]:
datafile_name = 'data/P5_1.csv'
data_df = pd.read_csv(datafile_name)
data_df.head()

Unnamed: 0,P5_1,P5_2,b
0,1,0,0


In [33]:
model_id = 'model_P5_1_LR_params_nb_example_system_wide'
model_type = 'LR'
target_vars= 'b'
control_vars='P5_1,P5_2'

In [34]:
# try to kill it first since we are going to recreate it using the POST operation in the following cells
response = requests.delete(reg_job_url +  "/" + model_id, headers=headers)
print(response.status_code)

400


In [35]:
fields = {
        'model-id':  model_id,
        'target-variable': target_vars,
        'input-variables': control_vars,
        'model-type': model_type,
        }

files = {
        'training-data':  ('training-data', open(datafile_name, 'rb'),'text/csv'),
        }
response = requests.post(reg_job_url, data=fields, files=files, headers=headers)

pprint.pprint(response.json())

{'input-variables': 'P5_1,P5_2',
 'model-id': 'model_P5_1_LR_params_nb_example_system_wide',
 'model-type': 'LR',
 'queued-time': '2022-02-03T22:11:21.651368',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'b'}


In [36]:
# Poll until the job is finished (or the retry limit is reached)
retries = 0
status = "queued"
while retries < 10 and (status=="queued" or status=="running"):
    time.sleep(5)
    get_response = requests.get(reg_job_url +  "/" + model_id, headers=headers)
    pprint.pprint(get_response.json())
    status = get_response.json()['status']
    print(status)
    retries += 1

status = get_response.json()['status']
print(status)

{'input-variables': 'P5_1,P5_2',
 'model-id': 'model_P5_1_LR_params_nb_example_system_wide',
 'model-type': 'LR',
 'queued-time': '2022-02-03T22:11:21.651368',
 'scaling-factor': 1.0,
 'status': 'queued',
 'target-variable': 'b'}
queued
{'duration': 0.006164,
 'end-time': '2022-02-03T22:11:33.158858',
 'finish-time': '2022-02-03T22:11:33.848747',
 'input-variables': 'P5_1,P5_2',
 'model-id': 'model_P5_1_LR_params_nb_example_system_wide',
 'model-metadata': {'mean-absolute-error': 'NA', 'r2-score': 'NA'},
 'model-type': 'LR',
 'output-files': ['model_P5_1_LR_params_nb_example_system_wide'],
 'queued-time': '2022-02-03T22:11:21.651368',
 'scaling-factor': 1.0,
 'start-time': '2022-02-03T22:11:33.152694',
 'status': 'finished',
 'target-variable': 'b'}
finished
finished


### Optimization Service Creation and Job Submission

Our AI-based framework allows process behavior information, derived from data-driven regression models, to be embedded within run-time, system-wide optimization models. The regression model type has implications for the complexity of the resulting optimization model. The novelty of our approach is the ability to efficiently solve the optimization problem for various types of regressors.

We have developed several customized algorithms for the generalized network optimization model. For regression models such as feed-forward neural networks with rectified linear unit (ReLU) activation functions (model_type = "NN_SK") or tree-based ensemble models (model_type = "RegTree_SK" or "RF_SK"), we have shown that the optimization model reduces to a mixed-integer linear program (MILP), which can be solved using existing mature MILP solvers. For nonlinear optimization models resulting from complex deep neural networks or general black-box ensemble methods (model_type = "NN_KERAS"), we developed a novel two-level augmented Lagrangian method.

The figure below shows the optimization technique ("opt_type") to be used based on the regression model type ("model_type") chosen above in the regression service call.

<img src="figures/Picture2.jpg" alt="Drawing" style="width: 550px;"/>

Inputs to the optimization service include:
1. opt_id: optimization ID
2. reg_models_for_opt: string of regression models trained by the regression service above for each process. Ensure that the order is the same as the order in which process nodes are listed in output_regression_cfg.csv.
3. opt_type: type of optimization model to be used (refer to the figure above)
4. input_reg: see explanation below
5. output_reg: see explanation below
6. period: number of time periods for which set point recommendations are needed
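
Because the order of reg_models_for_opt must match the order of the process nodes in output_regression_cfg.csv, one option is to derive the string from that file instead of typing it by hand. The sketch below does this with the standard library. `models_in_config_order` is a hypothetical helper, and the inline CSV reproduces only the first rows of the file as displayed later in this notebook.

```python
import csv
import io

# First rows of output_regression_cfg.csv as displayed in this notebook
# (the leading unnamed column is the saved pandas index).
OUTPUT_CFG_HEAD = """,plant,label,product,lower,upper,model_type,model_name,model_stats
0,P1,bitumen,1,50,15000,LR_SK,model_P1_1_LR_SK_nb_example_system_wide,model_P1_1_LR_SK_nb_example_system_wide_stats
1,P2,bitumen,1,100,20000,LR_SK,model_P2_1_LR_SK_nb_example_system_wide,model_P2_1_LR_SK_nb_example_system_wide_stats
2,P3,bitumen,1,100,600,LR,model_P3_1_LR_params_nb_example_system_wide,
3,,hot process water,2,100,11000,LR,model_P3_2_LR_params_nb_example_system_wide,
4,,diluent,3,100,26000,LR,model_P3_3_LR_params_nb_example_system_wide,
"""


def models_in_config_order(csv_text):
    """Join the model_name column, in file order, into one comma-separated string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ",".join(row["model_name"] for row in reader if row["model_name"])


models_string = models_in_config_order(OUTPUT_CFG_HEAD)
print(models_string)
```

In the notebook, the same function could be applied to the full file contents, for example `models_in_config_order(open(output_reg).read())`.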

In [37]:
# Setup System-wide Optimization Variables
opt_id = "opt_P1_1_LR_SK_nb_example_system_wide"
reg_models_for_opt = "model_P1_1_LR_SK_nb_example_system_wide,model_P2_1_LR_SK_nb_example_system_wide,model_P3_1_LR_params_nb_example_system_wide,model_P3_2_LR_params_nb_example_system_wide,model_P3_3_LR_params_nb_example_system_wide,model_P4_1_LR_params_nb_example_system_wide,model_P5_1_LR_params_nb_example_system_wide"
opt_type = 'MILP'
input_reg = 'configs/input_regression_cfg.csv'
output_reg = 'configs/output_regression_cfg.csv'
capacity_file = 'configs/capacity_cfg.csv'
graph_file = 'configs/graph_connection_cfg.csv'
period="12"

#### input_regression_cfg.csv
This file describes the process ("plant"), control and observed variables ("labels"), their lower and upper bounds, their initial values, maximum change allowed within 1 time period, and fixed observed values for the observed variables. Please note that the plant ID has to begin with the uppercase "P" followed by a numeral. E.g., "P1". The user may choose any label for the control and observed variables.

In [38]:
input_reg_df = pd.read_csv(input_reg)
input_reg_df.head()

Unnamed: 0,plant,label,lower,upper,init,rate_change,observed_value
0,P1,P1_1,3500.0,5500.0,4000.0,0.5,
1,,P1_2,3500.0,5500.0,4500.0,0.5,
2,,P1_3,3500.0,5500.0,4000.0,0.5,
3,,P1_4,,,,,0.7
4,,P1_5,,,,,85.0
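
In the rows shown above, the `plant` cell is filled only on the first row of each process. A minimal sketch of how such a file might be checked before submission follows: it carries the last non-empty plant ID forward and verifies the "P followed by a numeral" rule. Treating a blank `plant` cell as "same plant as the row above" is an assumption based on the displayed data, and the helper name is hypothetical.

```python
import csv
import io
import re

# Plant IDs must be an uppercase "P" followed by a numeral, e.g. "P1".
PLANT_ID = re.compile(r"^P[0-9]+$")

# First rows of input_regression_cfg.csv as displayed in this notebook.
INPUT_CFG_HEAD = """,plant,label,lower,upper,init,rate_change,observed_value
0,P1,P1_1,3500.0,5500.0,4000.0,0.5,
1,,P1_2,3500.0,5500.0,4500.0,0.5,
2,,P1_3,3500.0,5500.0,4000.0,0.5,
3,,P1_4,,,,,0.7
4,,P1_5,,,,,85.0
"""


def labels_by_plant(csv_text):
    """Group variable labels by plant, filling blank plant cells forward."""
    grouped = {}
    current_plant = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["plant"]:
            if not PLANT_ID.match(row["plant"]):
                raise ValueError(f"Invalid plant ID: {row['plant']!r}")
            current_plant = row["plant"]
        if current_plant is None:
            raise ValueError("The first row must name a plant")
        grouped.setdefault(current_plant, []).append(row["label"])
    return grouped


plant_labels = labels_by_plant(INPUT_CFG_HEAD)
print(plant_labels)
```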


#### output_regression_cfg.csv
This file describes the output (or outflow) variable of a process ("plant"), its label, product ID (1, 2, 3 etc. if there are multiple outputs from a process), lower and upper bounds for each product, "model_type" and "model_name" used in the regression service above, and "model_stats" generated by the regression service. Please note that plant ID has to begin with the uppercase "P" followed by a numeral. E.g., "P1".

In [39]:
output_reg_df = pd.read_csv(output_reg)
output_reg_df.head()

Unnamed: 0,plant,label,product,lower,upper,model_type,model_name,model_stats
0,P1,bitumen,1,50,15000,LR_SK,model_P1_1_LR_SK_nb_example_system_wide,model_P1_1_LR_SK_nb_example_system_wide_stats
1,P2,bitumen,1,100,20000,LR_SK,model_P2_1_LR_SK_nb_example_system_wide,model_P2_1_LR_SK_nb_example_system_wide_stats
2,P3,bitumen,1,100,600,LR,model_P3_1_LR_params_nb_example_system_wide,
3,,hot process water,2,100,11000,LR,model_P3_2_LR_params_nb_example_system_wide,
4,,diluent,3,100,26000,LR,model_P3_3_LR_params_nb_example_system_wide,


#### graph_connection_cfg.csv
This file contains the graph connections among plants and tanks. The first column holds the labels of the ‘start’ nodes and the first row holds the labels of the ‘end’ nodes. Recall that a process (or plant) node label starts with ‘P’ and a storage (or tank) node label starts with ‘T’. A value of 0 means there is no connection from the ‘start’ node to the ‘end’ node. A non-zero value means there is a connection from the ‘start’ node to the ‘end’ node. If the ‘start’ node is a process node, the value N = 1, 2, 3, ... is the number of the product that flows from the ‘start’ node to the ‘end’ node. For example, in the file below, P1 to T1 has the value 1, which means that there is a connection from P1 to T1. P2 to T1 has the value 0, which means that there is no connection from P2 to T1. P3 to T6 has the value 3, which means that product number 3 flows from P3 to T6.

In [40]:
graph_df = pd.read_csv(graph_file)
graph_df.head(20)

Unnamed: 0.1,Unnamed: 0,P1,P2,T1,T2,T3,P3,T4,T5,T6,P4,P5
0,P1,0,0,1,0,0,0,0,0,0,0,0
1,P2,0,0,0,1,0,0,0,0,0,0,0
2,T1,0,0,0,0,1,0,0,0,0,0,0
3,T2,0,0,0,0,1,0,0,0,0,0,0
4,T3,0,0,0,0,0,1,0,0,0,0,0
5,P3,0,0,0,0,0,0,1,2,3,0,0
6,T4,0,0,0,0,0,0,0,0,0,1,0
7,T5,0,0,0,0,0,0,0,0,0,1,0
8,T6,0,0,0,0,0,0,0,0,0,0,1
9,P4,0,0,0,0,0,0,0,0,0,0,0
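
The matrix can be turned into an explicit edge list, which is often easier to inspect. The sketch below parses the rows displayed above with the standard library. `edges_from_matrix` is a hypothetical helper, the pandas index column is dropped from the inline copy for brevity, and for edges leaving a process node the stored value is read as the product number, as described in the text.

```python
import csv
import io

# Rows of the connection matrix as displayed above: first column is the
# 'start' node, header row lists the 'end' nodes.
GRAPH_CSV = """,P1,P2,T1,T2,T3,P3,T4,T5,T6,P4,P5
P1,0,0,1,0,0,0,0,0,0,0,0
P2,0,0,0,1,0,0,0,0,0,0,0
T1,0,0,0,0,1,0,0,0,0,0,0
T2,0,0,0,0,1,0,0,0,0,0,0
T3,0,0,0,0,0,1,0,0,0,0,0
P3,0,0,0,0,0,0,1,2,3,0,0
T4,0,0,0,0,0,0,0,0,0,1,0
T5,0,0,0,0,0,0,0,0,0,1,0
T6,0,0,0,0,0,0,0,0,0,0,1
P4,0,0,0,0,0,0,0,0,0,0,0
"""


def edges_from_matrix(csv_text):
    """Return (start, end, value) for every non-zero cell of the matrix."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    end_nodes = rows[0][1:]
    edges = []
    for row in rows[1:]:
        start = row[0]
        for end, cell in zip(end_nodes, row[1:]):
            value = int(cell)
            if value != 0:
                edges.append((start, end, value))
    return edges


edges = edges_from_matrix(GRAPH_CSV)
for start, end, value in edges:
    print(f"{start} -> {end} (value {value})")
```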


#### capacity_cfg.csv
The columns of this file describe the storage (tank) nodes as follows.
1. tank: tank (or storage) name, e.g. T1, T2, ...
2. lower: lower bound for the tank, i.e., the minimum amount that can be stored.
3. upper: upper bound for the tank, i.e., the maximum amount that can be stored.
4. init: initial amount in the storage, i.e., at time period 0.
5. rate_change: the maximum rate at which the stored amount can change from one period to the next.

In [41]:
capacity_df = pd.read_csv(capacity_file)
capacity_df.head()

Unnamed: 0,tank,lower,upper,init,rate_change
0,T1,13500,90000,13500,0.5
1,T2,3000,15000,6000,0.5
2,T3,70000,135000,125000,0.5
3,T4,0,0,0,0.5
4,T5,0,0,0,0.5


In [42]:
# try to kill it first since we are going to recreate it using the POST operation in the following cells
response = requests.delete(opt_system_wide_job_url +  "/" + opt_id, headers=headers)
print(response.status_code)


400


In [43]:
# Submit System-wide Optimization Job
fields = {
            'optimization-id': opt_id,
            'regression-models': reg_models_for_opt,
            'optimization-type': opt_type,
            'total-period': period
        }

files = {
        'input-regression-config':  ('input-regression-config', open(input_reg, 'rb'),'text/csv'),
        'output-regression-config':  ('output-regression-config', open(output_reg, 'rb'),'text/csv'),
        'capacity-config':  ('capacity-config', open(capacity_file, 'rb'),'text/csv'),
        'graph-connections-config':  ('graph-connections-config', open(graph_file, 'rb'),'text/csv')
        }

response = requests.post(opt_system_wide_job_url, data=fields, files=files, headers=headers)
pprint.pprint(response.json())

{'capacity-config-fileName': '514212953/opt_P1_1_LR_SK_nb_example_system_wide/capacity.csv',
 'capacity-config-fileNameSuffix': 'capacity.csv',
 'graph-connections-config-fileName': '514212953/opt_P1_1_LR_SK_nb_example_system_wide/graph-connection.csv',
 'graph-connections-config-fileNameSuffix': 'graph-connection.csv',
 'input-regression-config-fileName': '514212953/opt_P1_1_LR_SK_nb_example_system_wide/input_regression.csv',
 'input-regression-config-fileNameSuffix': 'input_regression.csv',
 'job': 'system_wide_optimization',
 'optimization-id': 'opt_P1_1_LR_SK_nb_example_system_wide',
 'optimization-type': 'MILP',
 'output-regression-config-fileName': '514212953/opt_P1_1_LR_SK_nb_example_system_wide/output_regression.csv',
 'output-regression-config-fileNameSuffix': 'output_regression.csv',
 'queued-time': '2022-02-03T22:11:36.716328',
 'regression-models': 'model_P1_1_LR_SK_nb_example_system_wide,model_P2_1_LR_SK_nb_example_system_wide,model_P3_1_LR_params_nb_example_system_wide,mo

In [44]:
# Wait for the optimization job to finish (or the retry limit to be reached)
retries = 0
status = "queued"
while retries < 50 and (status=="queued" or status=="running"):
    time.sleep(5)
    get_response = requests.get(opt_system_wide_job_url +  "/"  + opt_id, headers=headers)
    pprint.pprint(get_response.json())
    status = get_response.json()['status']
    print(status)
    retries += 1
print(status)


{'capacity-config-fileName': '514212953/opt_P1_1_LR_SK_nb_example_system_wide/capacity.csv',
 'capacity-config-fileNameSuffix': 'capacity.csv',
 'graph-connections-config-fileName': '514212953/opt_P1_1_LR_SK_nb_example_system_wide/graph-connection.csv',
 'graph-connections-config-fileNameSuffix': 'graph-connection.csv',
 'input-regression-config-fileName': '514212953/opt_P1_1_LR_SK_nb_example_system_wide/input_regression.csv',
 'input-regression-config-fileNameSuffix': 'input_regression.csv',
 'job': 'system_wide_optimization',
 'optimization-id': 'opt_P1_1_LR_SK_nb_example_system_wide',
 'optimization-type': 'MILP',
 'output-regression-config-fileName': '514212953/opt_P1_1_LR_SK_nb_example_system_wide/output_regression.csv',
 'output-regression-config-fileNameSuffix': 'output_regression.csv',
 'queued-time': '2022-02-03T22:11:36.716328',
 'regression-models': 'model_P1_1_LR_SK_nb_example_system_wide,model_P2_1_LR_SK_nb_example_system_wide,model_P3_1_LR_params_nb_example_system_wide,mo
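
The polling loop above can also be wrapped in a small reusable helper. The following is a minimal sketch, not part of the service's client code: the name `wait_for_job` and its parameters are hypothetical, and it assumes only what the loop above shows, namely that a job is unfinished while its status is `queued` or `running`. The status lookup is passed in as a callable so the helper can be exercised without a network connection.

```python
import time


def wait_for_job(fetch_status, max_retries=50, delay_s=5.0, sleep=time.sleep):
    """Poll fetch_status() until the status is neither 'queued' nor 'running'.

    Returns (last_status, number_of_polls). Gives up after max_retries polls.
    """
    status = "queued"
    for attempt in range(max_retries):
        status = fetch_status()
        if status not in ("queued", "running"):
            return status, attempt + 1
        sleep(delay_s)  # wait before asking again
    return status, max_retries


# Hypothetical usage with the variables defined in this notebook:
# status, polls = wait_for_job(
#     lambda: requests.get(opt_system_wide_job_url + "/" + opt_id,
#                          headers=headers).json()["status"]
# )
```

Bounding the number of polls means a job that never leaves the `running` state cannot block the notebook indefinitely.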

### Optimization Outputs

1. input_regression_solution.csv: Columns ‘Plant’, ‘Label’, ‘Index’, and ‘Type’ describe the inputs of each process node. Columns ‘Period 1’, ..., ‘Period T’ contain the solution values of those inputs (decision variables) for every period. ‘Index’ is the feature index within the regression function of the process named in ‘Plant’, and ‘Type’ specifies whether the input feature is primary or secondary.
2. output_regression_solution.csv: Columns ‘Plant’, ‘Label’, and ‘Product’ describe the outputs of each process node. Columns ‘Period 1’, ..., ‘Period T’ contain the solution values of those outputs (decision variables) for every period.
3. tank_level_solution: Column ‘Tank’ contains the name of each storage (or tank) node. Columns ‘Period 1’, ..., ‘Period T’ contain the solution storage levels (decision variables) of each storage node for every period.
4. flowsolution.csv: Column ‘From’ contains the name of the start node of each connection and column ‘To’ contains the name of its end node. Columns ‘Period 1’, ..., ‘Period T’ contain the solution flows (decision variables) along that connection for every period.
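
Since all four outputs are CSV tables, they can be loaded with one helper. The sketch below is illustrative only: `load_solutions` and `SOLUTION_ENDPOINTS` are hypothetical names, the endpoint suffixes are the ones used by the requests in this section, and `index_col=0` assumes the first CSV column is an unnamed row index. The download step is injected as a callable so the parsing can be tried on plain strings.

```python
from io import StringIO

import pandas as pd

# Endpoint suffixes used by the solution requests in this section
SOLUTION_ENDPOINTS = ("input-regression", "output-regression", "flows", "tank-level")


def load_solutions(fetch_text, endpoints=SOLUTION_ENDPOINTS):
    """Return {endpoint: DataFrame}; fetch_text(endpoint) must return CSV text."""
    return {
        name: pd.read_csv(StringIO(fetch_text(name)), index_col=0)
        for name in endpoints
    }


# Hypothetical usage with the variables defined in this notebook:
# solutions = load_solutions(
#     lambda name: requests.get(
#         opt_system_wide_job_url + "/" + opt_id + "/solution/" + name,
#         headers=headers,
#     ).text
# )
```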

#### Input Regression Solution

In [49]:
response = requests.get(opt_system_wide_job_url +  "/" + opt_id + "/solution/input-regression", headers=headers)
in_reg = StringIO(response.text)
in_df = pd.read_csv(in_reg, sep=",")
in_df.head()

Unnamed: 0,Plant,Label,Index,Type,Period 1,Period 2,Period 3,Period 4,Period 5,Period 6,Period 7,Period 8,Period 9,Period 10,Period 11,Period 12
0,P1,P1_1,1,primary,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0
1,P1,P1_2,2,primary,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0
2,P1,P1_3,3,primary,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0,5500.0
3,P2,P2_1,1,primary,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0
4,P2,P2_2,2,primary,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0,9000.0
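
The wide layout above, with one column per period, is convenient to read but awkward to filter or plot. As a sketch, the table can be reshaped to one row per input and period with `DataFrame.melt`. The sample below copies two rows and two periods from the output above; the names `long_df` and `Value` are illustrative choices, not part of the service.

```python
from io import StringIO

import pandas as pd

# Two rows and two periods copied from the table above
sample = StringIO(
    ",Plant,Label,Index,Type,Period 1,Period 2\n"
    "0,P1,P1_1,1,primary,5500.0,5500.0\n"
    "3,P2,P2_1,1,primary,9000.0,9000.0\n"
)
in_df_sample = pd.read_csv(sample, index_col=0)

# Keep the descriptive columns and stack the period columns into rows
id_cols = ["Plant", "Label", "Index", "Type"]
long_df = in_df_sample.melt(id_vars=id_cols, var_name="Period", value_name="Value")

# Turn "Period 3" style labels into integer period numbers
long_df["Period"] = long_df["Period"].str.replace("Period ", "", regex=False).astype(int)
```

In the long form, a condition such as `long_df[long_df["Plant"] == "P2"]` selects every period of a process at once.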


#### Output Regression Solution

In [50]:
response = requests.get(opt_system_wide_job_url +  "/" + opt_id + "/solution/output-regression", headers=headers)
out_reg = StringIO(response.text)
out_df = pd.read_csv(out_reg, sep=",")
out_df.head()

Unnamed: 0,Plant,Label,Product,Period 1,Period 2,Period 3,Period 4,Period 5,Period 6,Period 7,Period 8,Period 9,Period 10,Period 11,Period 12
0,P1,bitumen,1,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29
1,P2,bitumen,1,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17
2,P3,bitumen,1,600.0,600.0,600.0,600.0,600.0,600.0,600.0,600.0,600.0,600.0,600.0,600.0
3,P3,hot process water,2,11000.0,11000.0,11000.0,11000.0,11000.0,11000.0,11000.0,11000.0,11000.0,11000.0,11000.0,11000.0
4,P3,diluent,3,18385.89,18385.89,18385.89,18385.89,18385.89,18385.89,18385.89,18385.89,18385.89,18385.89,18385.89,18385.89


#### Flows Solution

In [51]:
response = requests.get(opt_system_wide_job_url +  "/" + opt_id + "/solution/flows", headers=headers)
flows_csv = StringIO(response.text)
flows_df = pd.read_csv(flows_csv, sep=",")
flows_df.head()

Unnamed: 0,From,To,Period 1,Period 2,Period 3,Period 4,Period 5,Period 6,Period 7,Period 8,Period 9,Period 10,Period 11,Period 12
0,P1,T1,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29
1,P1,T2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,P2,T1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,P2,T2,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17,17726.17
4,T1,T3,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29,382.29
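
One quick sanity check on a solution is to compare what a process produces with what leaves it. The sketch below uses only Period 1 values copied from the output-regression and flows tables shown in this section, and it assumes that the rows displayed are the only outgoing connections of P1 and P2 (`head()` shows just the first five rows, so this is not guaranteed). The helper `outflow` is a hypothetical name.

```python
import math

# Period 1 outputs of P1 and P2, copied from the output-regression table
p1_outputs = {"P1": 382.29, "P2": 17726.17}

# Period 1 flows leaving P1 and P2, copied from the flows table
p1_flows = {
    ("P1", "T1"): 382.29,
    ("P1", "T2"): 0.0,
    ("P2", "T1"): 0.0,
    ("P2", "T2"): 17726.17,
}


def outflow(node, flows):
    """Sum the flows on all connections that start at the given node."""
    return sum(value for (src, _dst), value in flows.items() if src == node)


# True where the output of a process equals the total flow leaving it
balanced = {
    node: math.isclose(outflow(node, p1_flows), produced, abs_tol=0.01)
    for node, produced in p1_outputs.items()
}
```

For the values shown, all of P1's output goes to T1 and all of P2's output goes to T2.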


#### Tank Level Solution

In [52]:
response = requests.get(opt_system_wide_job_url +  "/" + opt_id + "/solution/tank-level", headers=headers)
tank_csv = StringIO(response.text)
tank_df = pd.read_csv(tank_csv, sep=",")
tank_df.head()

Unnamed: 0,Tank,Period 1,Period 2,Period 3,Period 4,Period 5,Period 6,Period 7,Period 8,Period 9,Period 10,Period 11,Period 12
0,T1,13500.0,13500.0,13500.0,13500.0,13500.0,13500.0,13500.0,13500.0,13500.0,13500.0,13500.0,13500.0
1,T2,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0,3000.0
2,T3,123425.15,118850.29,114275.44,109700.59,105125.74,100550.88,95976.03,91401.18,86826.32,82251.47,77676.62,73101.76
3,T4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,T5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
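
In the table above, T1, T2, T4, and T5 stay constant, while the level of T3 changes from period to period. As a small worked example, the per-period change can be computed from the T3 row; the values are copied from the output above and the variable names are illustrative.

```python
# T3 levels for Period 1 to Period 12, copied from the table above
t3_levels = [
    123425.15, 118850.29, 114275.44, 109700.59, 105125.74, 100550.88,
    95976.03, 91401.18, 86826.32, 82251.47, 77676.62, 73101.76,
]

# Change between consecutive periods (negative means the level drops)
deltas = [round(later - earlier, 2) for earlier, later in zip(t3_levels, t3_levels[1:])]

# Total change over the displayed horizon
total_change = round(t3_levels[-1] - t3_levels[0], 2)
```

The level of T3 drops by about 4574.85 in every period, which adds up to roughly 50323.39 between Period 1 and Period 12.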
