# Section 5.3.2 of Final Year Project (FYP) Thesis

On normalisation of the $\theta$ matrix prior to Lasso and Linear Regression

First, let us load the $\theta$ matrix and the $\frac{\partial U}{\partial t}$ vector.

The variable theta holds the $\theta$ matrix as a numpy array.

The variable dt holds the $\frac{\partial U}{\partial t}$ vector as a numpy array.

Change parent_directory to point to the folder that contains your own theta.npy and time_deriv.npy files.

The target diffusion equation used for this notebook is: $\frac{\partial U}{\partial t} = 10\frac{\partial^2 U}{\partial x^2}$

Also, note that the PDE actually recovered by DeepMod is recorded in Table 8.1 in section 5.1.2 of the thesis.

The PDE recovered by DeepMod was: $\frac{\partial U}{\partial t} = 0.061U\frac{\partial U}{\partial x} - 0.015U^{2}\frac{\partial U}{\partial x}$

In [2]:
import numpy as np
import os
from sklearn.linear_model import LinearRegression
from sklearn import linear_model

parent_directory = '/mnt/mbi/home/e0031794/Documents/FYP/FYP_results_11_9_2019/data_slicing_val_diff_10_1/1_trial/500_subset_clean/Out_subset_500_Original DeepMod/20200220_110440/'

theta_path = os.path.join(parent_directory, 'theta.npy')
theta = np.load(theta_path)

dt_path = os.path.join(parent_directory, 'time_deriv.npy')
dt = np.load(dt_path)[0] 

## Normalisation of $\theta$ matrix prior to Linear Regression

Next, we shall see what happens when we normalise $\theta$ matrix, prior to Linear Regression.

We will also perform a thresholding procedure similar to DeepMod's and print the resulting bit mask/sparsity pattern. A coefficient is kept only if it lies more than one standard deviation away from the median of all coefficients.

Each element in the coefficient vector corresponds, in order, to a term of the following PDE library:

[1, u_x, u_xx, u_xxx, u, u(u_x), u(u_xx), u(u_xxx), u<sup>2</sup>, u<sup>2</sup>(u_x), u<sup>2</sup>(u_xx), u<sup>2</sup>(u_xxx)]

A True value in the bit mask means the corresponding PDE term is picked up by the thresholding procedure.

In [7]:
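# Note: the normalize argument was deprecated in scikit-learn 1.0 and removed in 1.2, so the cells in this notebook need an older version.
norm_lr = LinearRegression(normalize=True).fit(theta, dt) # fits the LR model on the normalised theta matrix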
print('Coefficients: ', str(norm_lr.coef_))

norm_coeff = norm_lr.coef_

upper_lim, lower_lim = np.median(norm_coeff) + np.std(norm_coeff), np.median(norm_coeff) - np.std(norm_coeff)
sparsity_mask_lr = (norm_coeff <= upper_lim) & (norm_coeff >= lower_lim)

print('bit mask: ', str(~sparsity_mask_lr) + '\n')

Coefficients:  [[ 0.         -0.83731145  8.745496   14.3497     -0.07737297 -1.6921284
   0.9870692   5.443123    0.02038583  0.8381653  -0.07162736  1.5373095 ]]
bit mask:  [[False False  True  True False False False  True False False False False]]
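The thresholding rule used in the cell above can be wrapped in a small helper so that the same rule is applied to every regression in this notebook. This is only a sketch: the function name threshold_mask is hypothetical, and the coefficients are copied by hand from the printed output above rather than read from the fitted model.

```python
import numpy as np

def threshold_mask(coeffs):
    """Return True for coefficients lying outside median +/- one standard deviation."""
    coeffs = np.asarray(coeffs, dtype=float)
    centre, spread = np.median(coeffs), np.std(coeffs)
    # Equivalent to ~((coeffs <= centre + spread) & (coeffs >= centre - spread))
    return (coeffs > centre + spread) | (coeffs < centre - spread)

# Coefficients copied from the printed output of the normalised Linear Regression.
norm_coeff = np.array([0.0, -0.83731145, 8.745496, 14.3497, -0.07737297, -1.6921284,
                       0.9870692, 5.443123, 0.02038583, 0.8381653, -0.07162736, 1.5373095])

mask = threshold_mask(norm_coeff)
print('indices of selected terms: ', np.flatnonzero(mask))
```

Because the rule depends on the median and standard deviation of the whole coefficient vector, one very large coefficient widens the band and can hide smaller but still relevant terms.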



## No normalisation of $\theta$ matrix prior to Linear Regression

Similarly, let us see what happens if we do not normalise $\theta$ matrix prior to Linear Regression.

In [8]:
lr = LinearRegression(normalize=False).fit(theta, dt) #creates LR model
print('Coefficients: ', str(lr.coef_))

LR_coeff = lr.coef_

upper_lim, lower_lim = np.median(LR_coeff) + np.std(LR_coeff), np.median(LR_coeff) - np.std(LR_coeff)
sparsity_mask_lr = (LR_coeff <= upper_lim) & (LR_coeff >= lower_lim)

print('bit mask: ', str(~sparsity_mask_lr) + '\n')

Coefficients:  [[ 0.         -0.8373047   8.745494   14.349767   -0.07737475 -1.6921326
   0.98708075  5.4431124   0.02038648  0.83816576 -0.07162741  1.5373102 ]]
bit mask:  [[False False  True  True False False False  True False False False False]]



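Notice that the 3rd coefficient (u_xx), which corresponds to the diffusion coefficient, is almost identical with and without normalisation (8.745496 vs 8.745494).

Ordinary Least Squares Regression is not affected by normalisation of the $\theta$ matrix: rescaling a column only rescales its fitted coefficient by the inverse factor, and sklearn reports the coefficients on the original scale.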
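The scale invariance of ordinary least squares can be checked numerically on synthetic data. The sketch below is an illustration under assumptions, not the notebook's data: the matrix X, the weights w_true and the noise level are made up, no intercept is fitted, and each column is divided by its l2-norm to mimic normalisation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic design matrix whose columns have very different scales (assumed data).
X = rng.normal(size=(200, 4)) * np.array([1.0, 10.0, 0.1, 100.0])
w_true = np.array([2.0, -0.5, 3.0, 0.01])
y = X @ w_true + 0.01 * rng.normal(size=200)

# Least squares fit on the raw columns.
w_raw, *_ = np.linalg.lstsq(X, y, rcond=None)

# Least squares fit on columns divided by their l2-norm,
# then mapped back to the original scale.
scale = np.linalg.norm(X, axis=0)
w_scaled, *_ = np.linalg.lstsq(X / scale, y, rcond=None)
w_back = w_scaled / scale

print('raw fit:        ', w_raw)
print('normalised fit: ', w_back)
print('same up to rounding: ', np.allclose(w_raw, w_back))
```

The two fits agree up to floating point error, which is consistent with the near-identical coefficients printed by the two Linear Regression cells.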

## Normalisation of $\theta$ matrix prior to Lasso Regression

Let us see what happens if we normalise the $\theta$ matrix prior to Lasso Regression, using the same $\theta$ matrix and $\frac{\partial U}{\partial t}$ vector as above.

In [10]:
norm_L1_lr = linear_model.Lasso(alpha=1e-05, normalize=True, max_iter=50000, tol=1e-06).fit(theta, dt)
print('Coefficients: ', str(norm_L1_lr.sparse_coef_.toarray()))

norm_L1_coeff = norm_L1_lr.sparse_coef_.toarray()

upper_lim, lower_lim = np.median(norm_L1_coeff ) + np.std(norm_L1_coeff ), np.median(norm_L1_coeff ) - np.std(norm_L1_coeff )
sparsity_mask_l1 = (norm_L1_coeff  <= upper_lim) & (norm_L1_coeff  >= lower_lim)

print('bit mask: ', str(~sparsity_mask_l1) + '\n')

Coefficients:  [[ 0.0000000e+00 -8.7900156e-01  8.9777632e+00  1.2496575e+01
  -8.4509566e-02  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  1.5270966e-01 -2.5805438e-04  3.8803408e-01]]
bit mask:  [[False False  True  True False False False False False False False False]]





## No normalisation of $\theta$ matrix prior to Lasso Regression

Let us see what happens if we do not normalise the $\theta$ matrix prior to Lasso Regression, using the same $\theta$ matrix and $\frac{\partial U}{\partial t}$ vector as above.

In [11]:
L1_lr = linear_model.Lasso(alpha=1e-05, normalize=False, max_iter=50000, tol=1e-06).fit(theta, dt)
print('Coefficients: ', str(L1_lr.sparse_coef_.toarray()))

L1_coeff = L1_lr.sparse_coef_.toarray()

upper_lim, lower_lim = np.median(L1_coeff) + np.std(L1_coeff), np.median(L1_coeff) - np.std(L1_coeff)
sparsity_mask_l1 = (L1_coeff <= upper_lim) & (L1_coeff >= lower_lim)

print('bit mask: ', str(~sparsity_mask_l1) + '\n')

Coefficients:  [[ 0.         -0.7821964   8.715628   14.063971   -0.08637488 -1.7208784
   0.99282473  5.4442506   0.02165441  0.84042513 -0.07128842  1.5397874 ]]
bit mask:  [[False False  True  True False False False  True False False False False]]





Notice that the value of the 3rd coefficient (u_xx), which corresponds to the diffusion coefficient, now depends on whether the $\theta$ matrix is normalised: 8.978 with normalisation against 8.716 without.

Also notice that with normalisation, the bit mask recovers u_xx and u_xxx (redundant).

Without normalisation, the bit mask recovers u_xx, u_xxx (redundant) and u(u_xxx) (redundant).

In this example, normalising the $\theta$ matrix before Lasso Regression brings the recovered terms closer to the correct PDE.

Normalisation yields 1 redundant term. No normalisation yields 2 redundant terms.
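One way to see why Lasso, unlike ordinary least squares, reacts to column scaling is the one-feature case, where the solution has a closed form. The sketch below assumes the objective $\frac{1}{2n}\|y - xw\|^2 + \alpha|w|$ (the form sklearn documents for Lasso), no intercept, and synthetic data. The helper names soft_threshold and lasso_1d and the value of alpha are hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(z, alpha):
    """Shrink z towards zero by alpha, returning 0 when |z| <= alpha."""
    return np.sign(z) * max(abs(z) - alpha, 0.0)

def lasso_1d(x, y, alpha):
    """Closed-form minimiser of (1/(2n))*||y - x*w||^2 + alpha*|w| for one feature."""
    n = x.size
    return soft_threshold(x @ y / n, alpha) / (x @ x / n)

rng = np.random.default_rng(1)
x = rng.normal(size=500)                   # assumed synthetic feature
y = 0.5 * x + 0.1 * rng.normal(size=500)   # assumed synthetic target
alpha = 0.2

c = np.linalg.norm(x)                      # l2-norm of the column

w_raw = lasso_1d(x, y, alpha)              # fit on the raw column
w_norm = lasso_1d(x / c, y, alpha) / c     # fit on the normalised column, mapped back

# Dividing the column by c acts like multiplying the penalty by c on the original scale.
w_equiv = lasso_1d(x, y, alpha * c)

print('raw column:        ', w_raw)
print('normalised column: ', w_norm)
print('raw column, alpha*c:', w_equiv)
```

In this toy setting the same alpha keeps the feature on the raw column but removes it after normalisation, because the l2-norm of the column is much larger than 1. Whether normalisation strengthens or weakens the penalty for a given column of the $\theta$ matrix depends on that column's norm.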

## DeepMod's bit mask vs sklearn's bit mask

Let us load DeepMod's bit mask and compare it with the bit mask obtained from sklearn's Lasso Regression with normalisation of the $\theta$ matrix.

In [15]:
parent_dir = '/mnt/mbi/home/e0031794/Documents/FYP/FYP_results_11_9_2019/data_slicing_val_diff_10_1/1_trial/500_subset_clean/google_drive_storage_Original DeepMod/'

sparse_pattern_from_deepmod = np.load(os.path.join(parent_dir, 'sparse_pattern_deepmod.npy'))

print('bit mask from DeepMod: ', sparse_pattern_from_deepmod[0])

bit mask from DeepMod:  [[False]
 [False]
 [False]
 [False]
 [False]
 [ True]
 [False]
 [False]
 [False]
 [ True]
 [False]
 [False]]


The bit mask from DeepMod selects u(u_x) and u<sup>2</sup>(u_x) (the elements with "True" values), whereas sklearn's Lasso Regression with normalisation selected u_xx and u_xxx.

Note that each element of the bit mask corresponds, in order, to:

[1, u_x, u_xx, u_xxx, u, u(u_x), u(u_xx), u(u_xxx), u<sup>2</sup>, u<sup>2</sup>(u_x), u<sup>2</sup>(u_xx), u<sup>2</sup>(u_xxx)]
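Reading term names off a bit mask by eye is error-prone, so a small lookup can help. The sketch below is a hypothetical helper (mask_to_terms and LIBRARY are names introduced here), and the two masks are rebuilt by hand from the printed outputs in this notebook rather than loaded from file.

```python
import numpy as np

# PDE library in the same order as the columns of the theta matrix.
LIBRARY = ['1', 'u_x', 'u_xx', 'u_xxx', 'u', 'u(u_x)', 'u(u_xx)', 'u(u_xxx)',
           'u^2', 'u^2(u_x)', 'u^2(u_xx)', 'u^2(u_xxx)']

def mask_to_terms(mask, library=LIBRARY):
    """Return the library terms whose entry in the bit mask is True."""
    flat = np.asarray(mask, dtype=bool).ravel()  # accepts (12,), (1, 12) or (12, 1)
    if flat.size != len(library):
        raise ValueError('bit mask and library must have the same length')
    return [term for term, keep in zip(library, flat) if keep]

# Masks rebuilt by hand from the outputs printed above.
deepmod_mask = np.zeros((12, 1), dtype=bool)
deepmod_mask[[5, 9]] = True                  # column-vector layout, as printed by DeepMod

sklearn_mask = np.zeros((1, 12), dtype=bool)
sklearn_mask[0, [2, 3]] = True               # row-vector layout, as printed by sklearn

print('DeepMod terms: ', mask_to_terms(deepmod_mask))
print('sklearn terms: ', mask_to_terms(sklearn_mask))
```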

## Simulating refitting of DeepMod

Recall from section 2.4 of the FYP thesis that DeepMod performs two fittings:

The first fitting contains deep learning + Lasso Regression.

The second fitting contains deep learning + Linear Regression.

We shall simulate the second fitting using the bit mask obtained from sklearn's Lasso Regression with normalisation of the $\theta$ matrix.

In [14]:
# Refit with Linear Regression: select the 3rd and 4th columns of the theta matrix (u_xx, u_xxx),
# following the sparsity pattern obtained from the normalised Lasso Regression.
reduced_theta = theta[:, [2,3]]

lr = LinearRegression().fit(reduced_theta, dt) #creates LR model
print('Linear Regression coefficients', lr.coef_)

Linear Regression coefficients [[9.679331  4.0417533]]


From the above, the Linear Regression has two PDE basis terms, u_xx and u_xxx, with coefficients 9.679 and 4.042.

The recovered PDE is: $\frac{\partial U}{\partial t} = 9.679\frac{\partial^2 U}{\partial x^2} + 4.042\frac{\partial^3 U}{\partial x^3}$

Compare this PDE with the one recovered by DeepMod:

$\frac{\partial U}{\partial t} = 0.061U\frac{\partial U}{\partial x} - 0.015U^{2}\frac{\partial U}{\partial x}$
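The refitting step used earlier, which hard-codes columns 2 and 3, can be generalised to any bit mask. The sketch below is a minimal version under assumptions: refit_on_mask is a hypothetical helper, it uses numpy's least squares with an intercept column in place of sklearn's LinearRegression, and theta_demo and dt_demo are synthetic stand-ins for the real $\theta$ matrix and $\frac{\partial U}{\partial t}$ vector.

```python
import numpy as np

def refit_on_mask(theta, dt, mask):
    """Least squares refit on the columns of theta selected by a bit mask.

    Returns a full-length coefficient vector with zeros for unselected terms.
    """
    cols = np.flatnonzero(np.asarray(mask, dtype=bool).ravel())
    # Prepend a column of ones so that an intercept is fitted as well.
    design = np.column_stack([np.ones(theta.shape[0]), theta[:, cols]])
    solution, *_ = np.linalg.lstsq(design, np.asarray(dt).ravel(), rcond=None)
    coeffs = np.zeros(theta.shape[1])
    coeffs[cols] = solution[1:]              # drop the intercept
    return coeffs

# Synthetic stand-in data: only the u_xx column (index 2) drives the target.
rng = np.random.default_rng(0)
theta_demo = rng.normal(size=(300, 12))
dt_demo = 10.0 * theta_demo[:, 2]

mask_demo = np.zeros(12, dtype=bool)
mask_demo[[2, 3]] = True                     # same pattern as the normalised Lasso bit mask

coeffs = refit_on_mask(theta_demo, dt_demo, mask_demo)
print('refitted coefficients: ', coeffs)
```

On this noise-free synthetic target the refit returns a coefficient close to 10 for u_xx and close to 0 for u_xxx. On the real data the redundant u_xxx term keeps a non-negligible coefficient, as shown by the Linear Regression output in the refitting cell.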