In [1]:
from IPython.display import display, HTML, display_html
display(HTML("<style>.container { width:85% !important; }</style>"))

## PINN - Feature Engineered, Line Search approach 
This notebook describes the theory and principles behind the `FE_LS` approach, which combines customized, scalable feature engineering (`FE`) with a Line Search Method (`LS`) that adaptively finds the optimal learning rate during training.

### 1.) Feature Engineering

The relationship between the correction factor $c_f$ and the other terms in the **Pressure Loss Equation** is given by:

$$
\Delta P = \zeta \cdot \frac{c_f}{2} \cdot \rho \cdot \left(\frac{\dot{V}}{A_{Duct}}\right)^{2}  + 
\zeta^* \cdot \frac{c_f^*}{2} \cdot \rho \cdot \left(\frac{\dot{V}}{A_{Duct}}\right)^{2}$$<span style="float:right">(1)</span> where, 
* $c_f$ is the correction factor for MTR,
* $c_f^*$ is the correction factor for a neighbouring duct part,
* $\Delta P$ is Pressure Loss,
* ${\zeta}$ is the Loss Coefficient, 
* ${\rho}$ is the Density,
* $\dot{V}$ is the Volume flow in _liters/sec_. <br>
* $A_{Duct} = \frac{\pi}{4} \cdot {D}^{2}$, where $A_{Duct}$ is the Duct area and $D$ is the Duct diameter

Solving equation (1) for $c_f$ gives the correction factor for the MTR:

$$ c_f = \frac{\Delta P - \zeta^* \cdot \frac{c_f^*}{2} \cdot \rho \cdot \left(\frac{\dot{V}}{A_{Duct}}\right)^{2}  }{\zeta \cdot \frac{\rho}{2}\cdot \left(\frac{\dot{V}}{A_{Duct}}\right)^{2}} $$

The common factor $\frac{\rho}{2} \cdot \left(\frac{\dot{V}}{A_{Duct}}\right)^{2}$ cancels in the second term, so the expression simplifies to:
<span style="float:right">(2)</span>
$$ c_f = \frac{2 \cdot \Delta P \cdot A_{Duct}^2 }{ \zeta \cdot \rho \cdot \dot{V}^2  } - \frac{ \zeta^* \cdot c_f^*}{\zeta} $$
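The rearrangement can be checked numerically. The sketch below evaluates equation (1) for made-up inputs and then recovers $c_f$ with equation (2). The function names `pressure_loss` and `correction_factor` and all numeric values are hypothetical, and the inputs are assumed to be in mutually consistent units.

```python
import math

def pressure_loss(c_f, zeta, c_f_star, zeta_star, rho, v_dot, diameter):
    """Equation (1): pressure loss of the MTR plus one neighbouring duct part."""
    area = math.pi / 4.0 * diameter ** 2              # A_Duct
    dynamic = rho / 2.0 * (v_dot / area) ** 2         # shared factor rho/2 * (V/A)^2
    return (zeta * c_f + zeta_star * c_f_star) * dynamic

def correction_factor(delta_p, zeta, c_f_star, zeta_star, rho, v_dot, diameter):
    """Equation (2): c_f recovered from a given pressure loss."""
    area = math.pi / 4.0 * diameter ** 2
    first_term = 2.0 * delta_p * area ** 2 / (zeta * rho * v_dot ** 2)
    second_term = zeta_star * c_f_star / zeta
    return first_term - second_term

# Hypothetical inputs, chosen only to exercise the formulas.
params = dict(zeta=0.5, c_f_star=0.9, zeta_star=0.3, rho=1.2, v_dot=0.05, diameter=0.1)
delta_p = pressure_loss(c_f=1.2, **params)
c_f_recovered = correction_factor(delta_p, **params)
print(c_f_recovered)
```

If the two equations agree, `c_f_recovered` matches the value of `c_f` that was fed into equation (1), up to floating-point rounding.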

In equation (2), we take the first term as feature 1 ($x_{f_1}$) and the second term as feature 2 ($x_{f_2}$). We make the assumption that $\zeta^* \cdot c_f^* = 1$, so that $c_f = x_{f_1} - x_{f_2}$ with

$$x_{f_1} =  \frac{2 \cdot \Delta P \cdot A_{Duct}^2}{ \zeta \cdot \rho \cdot \dot{V}^2  }$$ <br>
$$x_{f_2} = \frac{1}{\zeta}$$
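As a minimal sketch, the two features could be computed for a batch of samples as follows. The function name `engineered_features`, the argument names, and the sample values are hypothetical. Consistent units are assumed, and $x_{f_2}$ relies on the assumption $\zeta^* \cdot c_f^* = 1$ stated above.

```python
import numpy as np

def engineered_features(delta_p, zeta, rho, v_dot, diameter):
    """Return an (n, 2) array whose columns are x_f1 and x_f2."""
    # Broadcast scalars (e.g. a single density) against per-sample arrays.
    delta_p, zeta, rho, v_dot, diameter = np.broadcast_arrays(
        *(np.asarray(a, dtype=float) for a in (delta_p, zeta, rho, v_dot, diameter))
    )
    area = np.pi / 4.0 * diameter ** 2                            # A_Duct
    x_f1 = 2.0 * delta_p * area ** 2 / (zeta * rho * v_dot ** 2)  # first term of Eqn (2)
    x_f2 = 1.0 / zeta                                             # second term, with zeta* * c_f* = 1
    return np.column_stack([x_f1, x_f2])

# Hypothetical measurements for three samples.
features = engineered_features(
    delta_p=[10.0, 20.0, 30.0],
    zeta=[0.5, 0.5, 0.8],
    rho=1.2,
    v_dot=[0.05, 0.07, 0.08],
    diameter=0.1,
)
print(features.shape)
```

Each row of `features` is one input sample for the two-node input layer.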

These two features constitute the 2 nodes (or neurons) of the input layer. With the hidden layer dimension set to only 1 neuron, the weight matrix $w_{ij}$ from the input to the hidden layer is a [2x1] matrix, and the hidden neuron computes $w_{11} \cdot x_{f_1} + w_{21} \cdot x_{f_2}$.
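The weighted sum of the single hidden neuron can be illustrated with a small NumPy sketch. The function name `hidden_neuron` and the numeric values are hypothetical. The weights are set by hand to $w_{11} = 1$ and $w_{21} = -1$, the values that reproduce equation (2) under the assumption $\zeta^* \cdot c_f^* = 1$. In the actual approach they would be learned.

```python
import numpy as np

def hidden_neuron(x, w):
    """Weighted sum of one hidden neuron with identity activation.

    x: (n, 2) array of features [x_f1, x_f2]
    w: (2, 1) weight matrix [[w_11], [w_21]]
    """
    return x @ w

# Hand-picked weights that correspond to c_f = x_f1 - x_f2 (illustration only).
w = np.array([[1.0], [-1.0]])
# Two hypothetical feature rows.
x = np.array([[3.0, 2.0],
              [1.5, 0.5]])
h = hidden_neuron(x, w)
print(h.shape)
```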

**Advantages of this Feature Engineering**:<br>
1.) This feature engineering approach reduces the feature space compared with the general approach. It is also very scalable, because new features ($x_i$) can be chained for the correction factors of further duct parts and other restrictions in various temperature zones. 
For example, the correction factor $c_f^*$ for the duct part that interacts with the MTR can be calculated as:
$$ c_f^* = \frac{2 \cdot \Delta P \cdot A_{Duct}^2 }{ \zeta^* \cdot \rho \cdot \dot{V}^2  } - \frac{ \zeta \cdot c_f}{\zeta^*} $$

2.) The second advantage of this feature engineered approach is that the non-linearity is removed [in Eqn (2)]: $c_f$ becomes a linear function of the engineered features. The neural network therefore needs no non-linear activation function, and only the linear (identity) activation function is used.
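The linearity claim can be illustrated on synthetic data. The sketch below swaps gradient training for an ordinary least-squares fit (`np.linalg.lstsq`), purely as a check. The data are randomly generated rather than measured, and the targets are built from $c_f = x_{f_1} - x_{f_2}$, which assumes $\zeta^* \cdot c_f^* = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (not measured) features: x_f1 drawn at random, x_f2 = 1 / zeta.
n_samples = 200
x_f1 = rng.uniform(1.0, 5.0, size=n_samples)
zeta = rng.uniform(0.2, 2.0, size=n_samples)
x_f2 = 1.0 / zeta
features = np.column_stack([x_f1, x_f2])

# Targets from Eqn (2) under the assumption zeta* * c_f* = 1.
c_f = x_f1 - x_f2

# A purely linear model with no bias and no activation fits the targets exactly.
weights, *_ = np.linalg.lstsq(features, c_f, rcond=None)
print(weights)
```

The fitted weights come out at approximately $w_{11} = 1$ and $w_{21} = -1$, which is what a linear two-input neuron would need to reproduce equation (2).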

### 2.) Adaptive Learning Rate using Line Search
In conjunction with this feature engineering approach, a line search method (based on the [Secant algorithm](https://nickcdryan.com/2017/09/13/root-finding-algorithms-in-python-line-search-bisection-secant-newton-raphson-boydens-inverse-quadratic-interpolation-brents)) was also developed to find the learning rate adaptively during the training process and improve the convergence speed. Our error function **E** is a function of the weights, $E = E(w)$, and our goal is to find the global minimum of the error function quickly, $E(w) \to \min$. For a step along the negative gradient, the best learning rate $\alpha$ satisfies:
$$ \nabla_w{E}(w_0 - \alpha \nabla_w{E_0}) \cdot  \nabla_w{E_0} = 0$$where
$w_0$ is the initial weights and $\nabla_w{E_0}$ is the gradient of $E$ at $w_0$.  For simplicity, let's denote the left-hand side of the above equation as $f(\alpha)$. Then the basic steps of this line search method to find the best learning rate ($\alpha$) are:

 1. Calculate $f_0 = f(\alpha_0 = 0)$  
 2. Calculate $f_1 = f(\alpha_1 = 1)$  
 3. Assume that $f$ can be approximated by a linear function $\hat{f}$  $$\hat{f}(\alpha) = \frac{f_1 - f_0}{\alpha_1 - \alpha_0}\cdot \alpha + f_0$$ The new learning rate $\alpha_2$ is the value for which $\hat{f} = 0$ (using $\alpha_0 = 0$): $$\alpha_2 = - \frac{f_0  \cdot(\alpha_1 - \alpha_0)}{(f_1 - f_0)}$$ Good limits for $\alpha_2$ are $0.01 \leqslant \alpha_2 \leqslant 10$.
 4. Calculate $f_2 = f(\alpha_2)$ 
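The four steps listed so far can be sketched in Python as below. This is a minimal illustration, not the notebook's implementation. It assumes a mean-squared-error function for a linear model, the names `grad_error` and `secant_learning_rate` are hypothetical, and the data are synthetic.

```python
import numpy as np

def grad_error(w, x, y):
    """Gradient of the assumed error E(w) = mean((x @ w - y)**2) / 2."""
    return x.T @ (x @ w - y) / len(y)

def secant_learning_rate(w0, x, y, alpha_min=0.01, alpha_max=10.0):
    """One secant step on f(alpha) = grad E(w0 - alpha * g0) . g0."""
    g0 = grad_error(w0, x, y)

    def f(alpha):
        return grad_error(w0 - alpha * g0, x, y) @ g0

    alpha_0, alpha_1 = 0.0, 1.0
    f_0 = f(alpha_0)                       # step 1
    f_1 = f(alpha_1)                       # step 2
    if f_1 == f_0:                         # flat secant, keep alpha_1 as a fallback
        return alpha_1, f_1
    # Step 3: root of the linear approximation, clipped to the suggested limits.
    alpha_2 = -f_0 * (alpha_1 - alpha_0) / (f_1 - f_0)
    alpha_2 = float(np.clip(alpha_2, alpha_min, alpha_max))
    return alpha_2, f(alpha_2)             # step 4

# Synthetic data for a two-feature linear model.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = x @ np.array([1.0, -1.0])
w0 = np.zeros(2)

alpha_2, f_2 = secant_learning_rate(w0, x, y)
w1 = w0 - alpha_2 * grad_error(w0, x, y)   # gradient step with the found learning rate
print(alpha_2, f_2)
```

For a quadratic error such as this one, $f(\alpha)$ is exactly linear in $\alpha$, so a single secant step already drives $f_2$ to approximately zero as long as $\alpha_2$ is not clipped by the limits.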