# Problem 1 (50 points) 

Vapor-liquid equilibria data are correlated using two adjustable parameters $A_{12}$ and $A_{21}$ per binary
mixture. For low pressures, the equilibrium relation can be formulated as:

$$
\begin{aligned}
p = & x_1\exp\left(A_{12}\left(\frac{A_{21}x_2}{A_{12}x_1+A_{21}x_2}\right)^2\right)p_{water}^{sat}\\
& + x_2\exp\left(A_{21}\left(\frac{A_{12}x_1}{A_{12}x_1+A_{21}x_2}\right)^2\right)p_{1,4 dioxane}^{sat}.
\end{aligned}
$$

Here the saturation pressures are given by the Antoine equation

$$
\log_{10}(p^{sat}) = a_1 - \frac{a_2}{T + a_3},
$$

where $T = 20\,^{\circ}{\rm C}$ and the coefficients $a_1$, $a_2$, $a_3$ for the water and 1,4 dioxane
system are given below.

|             | $a_1$     | $a_2$      | $a_3$     |
|:------------|:--------|:---------|:--------|
| Water       | 8.07131 | 1730.63  | 233.426 |
| 1,4 dioxane | 7.43155 | 1554.679 | 240.337 |


The following table lists the measured data. Recall that in a binary system $x_1 + x_2 = 1$.

|$x_1$ | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|:-----|:--------|:---------|:--------|:-----|:-----|:-----|:-----|:-----|:-----|:-----|:-----|
|$p$| 28.1 | 34.4 | 36.7 | 36.9 | 36.8 | 36.7 | 36.5 | 35.4 | 32.9 | 27.7 | 17.5 |

Estimate $A_{12}$ and $A_{21}$ using data from the above table: 

## Part 1.
**Formulate the least-squares problem.**
$$\min_{A_{12}, A_{21}} \sum_{i=1}^{n} \left(p(x^{(i)}, A_{12},A_{21})-p^{(i)}\right)^2, \quad n = 11,$$

where  $$p(x^{(i)}, A_{12},A_{21})=x^{(i)}_1\exp\left(A_{12}\left(\frac{A_{21}x^{(i)}_2}{A_{12}x^{(i)}_1+A_{21}x^{(i)}_2}\right)^2\right)p_{water}^{sat} + x^{(i)}_2\exp\left(A_{21}\left(\frac{A_{12}x^{(i)}_1}{A_{12}x^{(i)}_1+A_{21}x^{(i)}_2}\right)^2\right)p_{1,4 dioxane}^{sat} $$
and $$ x^{(i)}_2 = 1-x^{(i)}_1. $$
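
One consequence of this formulation is worth writing out: at the two endpoints of the composition range the model does not depend on the parameters at all. Assuming $A_{21} \neq 0$ for the first case and $A_{12} \neq 0$ for the second (so that the denominators do not vanish), substituting $x_1 = 0$ and $x_1 = 1$ gives

$$
\begin{aligned}
x_1 = 0,\ x_2 = 1: \quad p &= 0 + 1\cdot\exp\left(A_{21}\cdot 0^2\right)p_{1,4 dioxane}^{sat} = p_{1,4 dioxane}^{sat},\\
x_1 = 1,\ x_2 = 0: \quad p &= 1\cdot\exp\left(A_{12}\cdot 0^2\right)p_{water}^{sat} + 0 = p_{water}^{sat}.
\end{aligned}
$$

The residuals at the first and last data points are therefore fixed by the Antoine equation, and only the nine interior points constrain $A_{12}$ and $A_{21}$.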

## Part 2. 
**Since the model is nonlinear, the problem does not have an analytical solution. Therefore, solve it using the gradient descent or Newton's method implemented in HW1.**

First compute the saturation pressures from the Antoine equation, $p^{sat}=10^{a_1-\frac{a_2}{T+a_3}}$, at $T = 20\,^{\circ}{\rm C}$.

In [3]:
# HOUSEKEEPIN'
import torch as t
from torch.autograd import Variable
import numpy as np

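# Note: a_1 for water is entered as 8.071 here, while the table lists 8.07131.
# The outputs below correspond to 8.071.
p_satw=10**(8.071 - 1730.63/(20+233.426)) # 17.460784103526855 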
p_sat14=10**(7.43155 - 1554.679/(20+240.337)) # 28.824099527405245   
x=[ 0.0 , 0.1 , 0.2 , 0.3 , 0.4 , 0.5 , 0.6 , 0.7 , 0.8 , 0.9 , 1.0 ]
p=[ 28.1 , 34.4 , 36.7 , 36.9 , 36.8 , 36.7 , 36.5 , 35.4 , 32.9 , 27.7 , 17.5 ]

print(p_satw)
print(p_sat14)

17.460784103526855
28.824099527405245
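
The two saturation pressures can also be computed through a small helper, which makes it easy to compare the coefficient as typed in the cell (8.071) with the tabulated one (8.07131). This is a minimal sketch; the function name `antoine_psat` and the variable names are illustrative and not part of the original notebook.

```python
def antoine_psat(a1, a2, a3, T):
    """Saturation pressure from the Antoine equation: log10(p) = a1 - a2 / (T + a3)."""
    return 10.0 ** (a1 - a2 / (T + a3))

T = 20.0  # temperature in deg C, as in the problem statement

# Coefficients exactly as listed in the table
p_water_table = antoine_psat(8.07131, 1730.63, 233.426, T)
p_dioxane = antoine_psat(7.43155, 1554.679, 240.337, T)

# Water coefficient as typed in the housekeeping cell (a1 = 8.071)
p_water_cell = antoine_psat(8.071, 1730.63, 233.426, T)

print(p_water_table, p_water_cell, p_dioxane)
```

The tabulated coefficient gives a slightly higher water saturation pressure than the value printed above, so the choice affects the fitted parameters only marginally, but it is worth keeping consistent.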


In [6]:
# The FUNCTIONS.

def press(a,xi): # Calculating pressure!!
    x1=xi
    x2=1-x1
    A12= a[0]
    A21= a[1]
    return x1 * t.exp(A12* ( (A21*x2)/(A12*x1 + A21*x2) )**2 ) * p_satw + x2 * t.exp(A21* ( (A12*x1)/(A12*x1 + A21*x2) )**2 ) * p_sat14

def objf(a): # Total sum objective function...
    total=0
    for i in range(len(x)):
        xi=x[i]
        pres=press(a,xi)
        #print(press)
        total = total + (pres-p[i])**2  # Sum the squared difference.
    return total

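def lines(a):  # because you just gotta have a line search!
    step=.1 # initial step size
    # Backtracking: halve the step until the objective no longer increases.
    # The sufficient-decrease coefficient is 0, so the gradient term drops out
    # and any step with objf(a-step*a.grad) <= objf(a) is accepted.
    while objf(a-step*a.grad) > objf(a)-step*(0)*np.matmul(a.grad,a.grad):
        step=.5*step
    return step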

# VARIABLE SETUP.

a = Variable(t.tensor([1.0, 6]), requires_grad=True)  # initial guess for [A12, A21]
e = 100 # gradient norm, used as the error term; initialized above the tolerance

# RUN THIS THING!
while e > 0.1:  # stop once the gradient norm is 0.1 or below
    objective=objf(a)
    objective.backward()
    step=lines(a)
    #print('Objective func is now ' +str(objective.data.numpy()))
    e = t.linalg.norm(a.grad) # update the error value.
    with t.no_grad():
        #print('a is ' + str(a))
        #print('grad is ' + str(a.grad))
        a -= step * a.grad
        #print('a is now ' + str(a) + ' and e is '+str(e))
        #print('step')
        # need to clear the gradient at every step, or otherwise it will accumulate...
        a.grad.zero_()
        
print('Final a is now ' + str(a.data.numpy()))
print('The objective function is now ' + str(objective.data.numpy()))

Final a is now [1.9594922 1.6899538]
The objective function is now 0.6712008
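
The `lines` function above multiplies the gradient term by 0, so it only requires that the objective not increase. A common variant is the Armijo sufficient-decrease condition, $f(a - \alpha g) \le f(a) - c\,\alpha\, g^\top g$ with a small $c > 0$. The following is a minimal NumPy sketch of that variant on a hypothetical quadratic test function; `backtracking_step`, its default values, and the test function are assumptions for illustration, not the code that produced the results above.

```python
import numpy as np

def backtracking_step(f, x, grad, step=0.1, shrink=0.5, c=1e-4):
    """Shrink `step` until the Armijo sufficient-decrease condition holds."""
    fx = f(x)
    g2 = float(np.dot(grad, grad))  # squared gradient norm
    while f(x - step * grad) > fx - c * step * g2:
        step *= shrink
    return step

# Hypothetical test function: f(x) = (x0 - 1)^2 + (x1 - 2)^2
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

x0 = np.array([1.0, 0.0])
g0 = np.array([0.0, -4.0])  # gradient of f at x0

# Start from a deliberately large step so that the loop has to shrink it
alpha = backtracking_step(f, x0, g0, step=2.0)
print(alpha, f(x0 - alpha * g0))
```

Setting `c=0` recovers the acceptance rule used in `lines`, up to how ties are handled.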


## Part 3. 
**Compare your optimized model with the data. Does your model fit well with the data?**

In [15]:
# Well, time to calculate these new p values.

a_solve=a.data

print('Data Pressures   Model Pressures')
for i in range(len(x)):
    print(str(p[i]) + '             ' + str(press(a_solve,x[i]).item()))
    

Data Pressures   Model Pressures
28.1             28.824098587036133
34.4             34.64545822143555
36.7             36.45291519165039
36.9             36.86630630493164
36.8             36.87278366088867
36.7             36.749027252197266
36.5             36.39037322998047
35.4             35.3852653503418
32.9             32.9476203918457
27.7             27.72649383544922
17.5             17.460784912109375


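The model fits the data well. Every model pressure is within 0.25 of the measured value except at $x_1 = 0$, where the model necessarily returns the saturation pressure of pure 1,4 dioxane (28.82) against a measured 28.1. That single point accounts for about 0.52 of the final sum of squared errors of 0.67.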
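
The size of the deviations can be checked directly from the two printed columns. This is a minimal sketch that copies the data and model pressures from the output above; the variable names are illustrative.

```python
import numpy as np

x1 = np.linspace(0.0, 1.0, 11)
p_data = np.array([28.1, 34.4, 36.7, 36.9, 36.8, 36.7, 36.5, 35.4, 32.9, 27.7, 17.5])

# Model pressures copied from the printed comparison table
p_model = np.array([
    28.824098587036133, 34.64545822143555, 36.45291519165039,
    36.86630630493164, 36.87278366088867, 36.749027252197266,
    36.39037322998047, 35.3852653503418, 32.9476203918457,
    27.72649383544922, 17.460784912109375,
])

resid = p_model - p_data               # signed deviation at each composition
sse = float(np.sum(resid ** 2))        # sum of squared errors
worst = int(np.argmax(np.abs(resid)))  # index of the largest deviation

print("SSE:", sse)
print("largest deviation:", resid[worst], "at x1 =", x1[worst])
```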

---

# Problem 2 (50 points) 

Solve the following problem using Bayesian Optimization:
$$
    \min_{x_1, x_2} \quad \left(4-2.1x_1^2 + \frac{x_1^4}{3}\right)x_1^2 + x_1x_2 + \left(-4 + 4x_2^2\right)x_2^2,
$$
for $x_1 \in [-3,3]$ and $x_2 \in [-2,2]$. A tutorial on Bayesian Optimization can be found [here](https://thuijskens.github.io/2016/12/29/bayesian-optimisation/).

In [1]:
## Sample code from tutorial link.

import numpy as np
import sklearn.gaussian_process as gp

def bayesian_optimization(n_iters, sample_loss, xp, yp):
  """
  Run a Bayesian optimization loop and return all evaluated points.

  Arguments:
  ----------
    n_iters: int.
      Number of iterations to run the algorithm for.
    sample_loss: function.
      Loss function that takes an array of parameters.
    xp: array-like, shape = [n_samples, n_params].
      Array of previously evaluated hyperparameters.
    yp: array-like, shape = [n_samples, 1].
      Array of values of `sample_loss` for the hyperparameters
      in `xp`.
  """

  # Define the GP
  kernel = gp.kernels.Matern()
  model = gp.GaussianProcessRegressor(kernel=kernel,
                                      alpha=1e-4,
                                      n_restarts_optimizer=10,
                                      normalize_y=True)
  for i in range(n_iters):
    # Update our belief of the loss function
    model.fit(xp, yp)

    # sample_next_hyperparameter is a helper from the tutorial that computes
    # the arg max of the acquisition function; it is not defined in this cell
    next_sample = sample_next_hyperparameter(model, yp)

    # Evaluate the loss for the new hyperparameters
    next_loss = sample_loss(next_sample)

    # Update xp and yp (this step is elided in the excerpt; appending the new
    # point is an assumed completion for a scalar loss)
    xp = np.vstack([xp, next_sample])
    yp = np.vstack([yp, [[next_loss]]])

  return xp, yp
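
The tutorial loop above relies on a helper, `sample_next_hyperparameter`, that is not defined in this notebook. The following is a self-contained sketch of the same loop applied to the objective of Problem 2. It assumes an expected-improvement acquisition function maximized over random candidate points, a Matern kernel with `nu=2.5`, and 10 random initial samples followed by 20 iterations; all of these choices and the function names are assumptions for illustration, not the tutorial's exact implementation. The objective is the six-hump camel function, whose known global minimum is about $-1.0316$, which gives a reference value for judging the result.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

BOUNDS = np.array([[-3.0, 3.0], [-2.0, 2.0]])  # ranges of x1 and x2

def objective(x):
    """Objective from the problem statement, for x = [x1, x2]."""
    x1, x2 = x
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2 + (-4 + 4 * x2**2) * x2**2)

def expected_improvement(cand, model, y_best):
    """Expected improvement for minimization at the candidate points."""
    mu, sigma = model.predict(cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(n_init=10, n_iters=20, n_cand=2000, seed=0):
    rng = np.random.default_rng(seed)

    def sample(n):
        # Uniform random points inside the box constraints
        return rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(n, 2))

    xp = sample(n_init)
    yp = np.array([objective(x) for x in xp])
    model = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4,
                                     normalize_y=True, random_state=seed)
    for _ in range(n_iters):
        model.fit(xp, yp)                   # update the surrogate
        cand = sample(n_cand)               # random candidate points
        ei = expected_improvement(cand, model, yp.min())
        x_next = cand[np.argmax(ei)]        # most promising candidate
        xp = np.vstack([xp, x_next])
        yp = np.append(yp, objective(x_next))
    best = int(np.argmin(yp))
    return xp[best], yp[best], xp, yp

x_best, y_best, xp_all, yp_all = bayes_opt()
print("best x:", x_best, "best value:", y_best)
```

Maximizing the acquisition function over random candidates keeps the sketch short; the tutorial instead uses a gradient-based optimizer with restarts, which usually locates the maximum more precisely.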

In [41]:
# A simple example of using PyTorch for gradient descent

import torch as t
from torch.autograd import Variable

# Define a variable, make sure requires_grad=True so that PyTorch can take gradient with respect to this variable
x = Variable(t.tensor([1.0, 0.0]), requires_grad=True)

# Define a loss
loss = (x[0] - 1)**2 + (x[1] - 2)**2

# Take gradient
loss.backward()

# Check the gradient. numpy() turns the variable from a PyTorch tensor to a numpy array.
x.grad.numpy()

array([ 0., -4.], dtype=float32)

In [42]:
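# Let's examine the gradient at a different x.
# Note: x.grad is not zeroed first, so backward() accumulates onto the
# gradient left over from the previous cell.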
x.data = t.tensor([2.0, 1.0])
loss = (x[0] - 1)**2 + (x[1] - 2)**2
loss.backward()
x.grad.numpy()

array([ 2., -6.], dtype=float32)
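
Both outputs can be checked against the analytic gradient. Writing $x = (x_0, x_1)$ to match `x[0]` and `x[1]` in the code:

$$
\nabla\,\mathrm{loss}(x) = \begin{bmatrix} 2(x_0 - 1) \\ 2(x_1 - 2) \end{bmatrix},
\qquad
\nabla\,\mathrm{loss}(1, 0) = \begin{bmatrix} 0 \\ -4 \end{bmatrix},
\qquad
\nabla\,\mathrm{loss}(2, 1) = \begin{bmatrix} 2 \\ -2 \end{bmatrix}.
$$

The first output matches $\nabla\,\mathrm{loss}(1, 0)$. The second output, $[2, -6]$, is the sum $[0, -4] + [2, -2]$ rather than the gradient at $(2, 1)$, because `x.grad` was not cleared between the two `backward()` calls.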

In [43]:
# Here is a code for gradient descent without line search

import torch as t
from torch.autograd import Variable

x = Variable(t.tensor([1.0, 0.0]), requires_grad=True)

# Fix the step size
a = 0.01

# Start gradient descent
for i in range(1000):  # TODO: change the termination criterion
    loss = (x[0] - 1)**2 + (x[1] - 2)**2
    loss.backward()
    
    # no_grad() specifies that the operations within this context are not part of
    # the computational graph, i.e., the gradient descent update itself does not
    # need to be differentiable with respect to x
    with t.no_grad():
        x -= a * x.grad
        
        # need to clear the gradient at every step, or otherwise it will accumulate...
        x.grad.zero_()
        
print(x.data.numpy())
print(loss.data.numpy())

[1.        1.9999971]
8.185452e-12
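
The loop above runs for a fixed 1000 iterations and leaves the termination criterion as a TODO. One common choice is to stop when the gradient norm falls below a tolerance. The sketch below shows this in NumPy with a hand-coded gradient and float64 arithmetic rather than PyTorch autograd; the function names, the tolerance `1e-8`, and the iteration cap are assumptions for illustration.

```python
import numpy as np

def grad_loss(x):
    """Gradient of loss = (x0 - 1)^2 + (x1 - 2)^2."""
    return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0)])

def gradient_descent(x0, step=0.01, tol=1e-8, max_iter=10_000):
    """Fixed-step gradient descent that stops once ||grad|| <= tol."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_loss(x)
        if np.linalg.norm(g) <= tol:
            return x, k          # converged after k updates
        x = x - step * g
    return x, max_iter           # iteration cap reached

x_final, n_steps = gradient_descent([1.0, 0.0])
print(x_final, n_steps)
```

With this criterion the number of iterations follows from the problem and the tolerance instead of being fixed in advance, and the cap guards against a step size that fails to converge.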
