# 3D CNN explained!

Our first proposed model, 2D CNN, takes advantage of the spatial and spectral domains of the input through its 2D convolutional layers. However, it cannot use the time domain of a datapoint $j$. It analyzes a 3D tensor $\mathbf{SX_t}^j$ that arises from stacking the static features of $j$, $\mathbf{S}^j$, with only one element of the $j^{th}$ time series, $\mathbf{X}^j_t$.

Our second model, 3D CNN, on the other hand takes full advantage of the time domain and is thus able to analyze the spectral, spatial and time domains of the $j^{th}$ datapoint simultaneously. This is achieved via the 3D convolutional layers in its architecture, which consists of two branches that merge to return the final prediction.

More precisely, for the $j^{th}$ datapoint the model takes as input its static tensor $\mathbf{S}^j \in \mathbb{R}^{2 \times (2r+1) \times(2r+1)}$ and its full time series of non-static tensors $\{\textbf{X}^j_{t-2},\textbf{X}^j_{t-1},\textbf{X}^j_{t}\}$, each $\in \mathbb{R}^{k \times (2r+1) \times(2r+1)}$, where $k = 8$. It then propagates these two parts of the input to two different branches. The static tensor $\mathbf{S}^j$ is passed to a 2D Convolutional Branch and the time series of tensors $\{\mathbf{X}^j_i\}^t_{i=t-2}$ to a 3D Convolutional Branch (in the form of a 4D tensor of shape (c,t,h,w)). Each branch extracts high-level features, which are then propagated together through the rest of the network. Finally, the model returns an output $\hat Y^j_{t+1}$ indicating the model's confidence that deforestation will be observed at the target location in the following year $t+1$, where $j \in \mathbf{J}_{t}$ (see Chapter 4 for more details about this notation).

Below we explain in more detail what each branch does.

$\mathbf{S}^j$ is passed to a 2D Convolutional Branch that convolves the input along the spatial domain in two sequential 2D convolutional layers (conv. layers) and returns a 3D tensor of high-level features, $\mathbf{Z}^j_s$. These two conv. layers have the same type of filters. Each filter slides at stride = 1, no padding is performed, and the filters' spatial size is set by the first element of the model parameter kernel_size. Since we want to stack the output of this branch with the output of the branch that processes the time series along the channel axis, both branch outputs must have the same spatial size. Therefore we set the convolutional filters in both branches, conv_2D and conv_3D, to have the same spatial size. Both 3D tensors of high-level features produced by these two branches then have the same spatial size, $(h \times w)$. We summarize this branch as follows:

**conv_2D = torch.nn.Sequential** : $ \mathbf{S}^j \in \mathbb{R}^{2 \times (2r+1) \times(2r+1)} \rightarrow 
2 \times (2DConv \rightarrow ReLU \rightarrow 2DBN) \rightarrow \mathbf{Z}^j_{s} \in \mathbb{R}^{c_1,h,w} $
 
         self.conv_2D = torch.nn.Sequential(        
            torch.nn.Conv2d(input_dim[0],hidden_dim[0],kernel_size = kernel_size[0]), 
            torch.nn.ReLU(),
            torch.nn.BatchNorm2d(hidden_dim[0]),
            
            torch.nn.Conv2d(hidden_dim[0],hidden_dim[0],kernel_size = kernel_size[0]), 
            torch.nn.ReLU(),
            torch.nn.BatchNorm2d(hidden_dim[0]))
            
hidden_dim[0] defines the number of filters in each layer of this branch. Hence $c_1$ = hidden_dim[0].
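As a worked example of the shape arithmetic in this branch: each unpadded, stride-1 conv. layer with a $k_h \times k_w$ kernel shrinks the height by $k_h - 1$ and the width by $k_w - 1$, so after the two layers

$$h = (2r+1) - 2(k_h - 1), \qquad w = (2r+1) - 2(k_w - 1).$$

Assuming $r = 22$ (a $45 \times 45$ input, the size used in the example later in this notebook) and kernel_size[0] = (5,5), this gives $h = w = 45 - 8 = 37$.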

<br>$\{\mathbf{X}^j_i\}^t_{i=t-2}$ is passed to a 3D Convolutional Branch in the form of a 4D tensor obtained by stacking the sequence along the time domain. Thus the input tensor $\mathbf{X}^j \in \mathbb{R}^{8 \times 3 \times (2r+1) \times(2r+1)}$ has its first dimension defined by the channels, the second by time, and the last two by space. The 3D Convolutional Branch convolves $\mathbf{X}^j$ across its last three dimensions (time, height and width) in two sequential 3D convolutional layers. Because our time domain is limited to size 3, the 4D filters have shape $\in \mathbb{R}^{c,2,k_h,k_w}$, where the first dimension spans the input channels and the last three define the shape of the 3D kernels. While the spatial sizes can be set to different values, the size along the time domain can only be 2, so that two unpadded layers reduce the 3 time steps to exactly 1. After the input is propagated through the two 3D conv. layers, its time domain is reduced to 1: (3 - k_size[t] + 2*padding)/stride[t] + 1 = 2 and (2 - k_size[t] + 2*padding)/stride[t] + 1 = 1, where k_size[t] = 2, stride[t] = 1, padding = 0. The model slides its filters' 3D kernels along the time domain at stride 1, and no padding is applied to the input 4D tensor. Therefore, after the two 3D convolutional layers, the output of the 3D Convolutional Branch is a 3D tensor of high-level features with no time domain, $\mathbf{Z}^j_x$. We summarize this propagation as follows:

**conv_3D = torch.nn.Sequential** : $\mathbf{X}^j \in \mathbb{R}^{8 \times 3 \times (2r+1) \times(2r+1)} \rightarrow 2\times (3DConv \rightarrow ReLU \rightarrow 3DBN) \rightarrow \mathbf{Z}^j_{x} \in \mathbb{R}^{c_2,h,w}$

            self.conv_3D = torch.nn.Sequential(
                        torch.nn.Conv3d(in_channels = input_dim[1],
                                        out_channels = hidden_dim[1],
                                        kernel_size = kernel_size[1]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm3d(hidden_dim[1]),
            
                        torch.nn.Conv3d(in_channels = hidden_dim[1],
                                        out_channels = hidden_dim[1],
                                        kernel_size = kernel_size[1]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm3d(hidden_dim[1]))    
                        
As mentioned above, the spatial sizes of the filters in this branch are the same as in the conv_2D branch:
kernel_size[0] = kernel_size[1][1:]. E.g. kernel_size=((3,3),(2,3,3),(5,5)) or kernel_size=((5,5),(2,5,5),(5,5)). Here the 2 in (2,5,5) is k_size[t], the kernel size along the time domain.

hidden_dim[1] defines the number of filters in each layer of this branch. Hence $c_2$ = hidden_dim[1].
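The shape bookkeeping for the two branches can be checked without building the model. The following is a minimal sketch using the standard convolution output-size formula quoted above; the helper names `conv_out_size` and `branch_out_shape` are hypothetical (they are not part of the model code), and the 45 × 45 input is the size used in the example later in this notebook.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution (standard formula)."""
    return (size - kernel + 2 * padding) // stride + 1


def branch_out_shape(in_shape, kernel, n_layers=2):
    """Per-axis shape after n_layers unpadded, stride-1 conv. layers."""
    shape = tuple(in_shape)
    for _ in range(n_layers):
        shape = tuple(conv_out_size(s, k) for s, k in zip(shape, kernel))
    return shape


# kernel_size as in the text: (conv_2D, conv_3D, final)
kernel_size = ((5, 5), (2, 5, 5), (5, 5))

# The spatial part of the 3D kernel must equal the 2D kernel,
# otherwise the two branch outputs cannot be stacked.
assert kernel_size[1][1:] == kernel_size[0]

out_2d = branch_out_shape((45, 45), kernel_size[0])     # (h, w)
out_3d = branch_out_shape((3, 45, 45), kernel_size[1])  # (t, h, w)

print(out_2d)  # (37, 37)
print(out_3d)  # (1, 37, 37)
```

The time axis goes 3 → 2 → 1, while both branches end with the same 37 × 37 spatial size, which is what allows their outputs to be stacked.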

These two 3D tensors of high-level features, $\mathbf{Z}^j_x$ and $\mathbf{Z}^j_s$, returned from each branch, are then stacked along the channel axis to form the 3D tensor $\mathbf{Z}^j$.
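The stacking step can be illustrated in isolation with random tensors standing in for the two branch outputs. This is only a sketch: the batch size and the 37 × 37 spatial size are illustrative assumptions, and the channel counts follow the default hidden_dim = (16,32,32).

```python
import torch

# Illustrative sizes (assumptions): batch of 4, c_1 = 16, c_2 = 32, 37 x 37 maps
b, c1, c2, h, w = 4, 16, 32, 37, 37

z_s = torch.rand(b, c1, h, w)      # stand-in for the conv_2D output
z_x = torch.rand(b, c2, 1, h, w)   # stand-in for the conv_3D output (time size 1)

z_x = z_x.squeeze(dim=2)           # drop the singleton time axis -> (b, c2, h, w)
z = torch.cat((z_x, z_s), dim=1)   # stack along the channel axis

print(z.shape)  # torch.Size([4, 48, 37, 37])
```

Note that `squeeze(dim=2)` leaves the tensor unchanged when the time axis is not of size 1; in that case the concatenation fails because the two tensors no longer have the same number of dimensions. This is why the time kernel size must reduce the time domain to exactly 1.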


    def forward(self, data , sigmoid = True ):
        
        s , x = data

        s = self.conv_2D.forward(s)
        x = self.conv_3D.forward(x)        
        
As mentioned above, s after conv_2D has shape (b,c_1,h,w) and x after conv_3D has shape (b,c_2,1,h,w).
We drop the singleton time dimension of x and stack the two along the channel axis:

        x = x.squeeze(dim = 2 )
        x = torch.cat((x,s),dim = 1)
        
The result is then propagated through the rest of the network. The final part of the network has another six 2D convolutional layers. 
In these six 2D convolutional layers all filters have shape kernel_size[2], stride = 1 and no padding. The number of filters in each layer is set via hidden_dim[2]. This propagation is

        x = self.final.forward(x) 
        
and after the last convolutional operation, the output is propagated to a spatial pyramid pooling (SPP) layer

        x = spp_layer(x, self.levels)
        
and then to two fully connected (FC) layers with dropout (DO) in between, followed by a sigmoid squashing function, as in our 2D CNN model:

        x= self.ln(x)
        if sigmoid: 
            x = self.sig(x)  

Where:

        self.final = torch.nn.Sequential(
                        torch.nn.Conv2d(hidden_dim[0]+hidden_dim[1], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]),
            
                        torch.nn.Conv2d(hidden_dim[2], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]),
        
                        torch.nn.Conv2d(hidden_dim[2], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]),
            
                        torch.nn.Conv2d(hidden_dim[2], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]),
            
                        torch.nn.Conv2d(hidden_dim[2], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]),
                    
                        torch.nn.Conv2d(hidden_dim[2], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]))  
And

        self.ln = torch.nn.Sequential( 
            torch.nn.Linear(ln_in,100),
            torch.nn.ReLU(),
            torch.nn.BatchNorm1d(100),
            torch.nn.Dropout(dropout),           
            torch.nn.Linear(100, 1))
        
        self.sig = torch.nn.Sigmoid()

We summarize the final part of the network as follows:
$\mathbf{Z}^j \in \mathbb{R}^{(c_1+c_2),h,w} \rightarrow 6 \times (2DConv \rightarrow ReLU \rightarrow 2DBN) \rightarrow SPP(n,\mathbf{k}) \rightarrow FC(spp,100) \rightarrow ReLU \rightarrow 1DBN \rightarrow DO(p) \rightarrow FC(100,1) \rightarrow \sigma \rightarrow \hat Y^j_{t+1}$

The dropout probability $p$ is set by the dropout parameter of the model.
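The input width of the first FC layer (written spp in the summary above) follows from the SPP levels. The sketch below mirrors the ln_in computation in the class constructor; it assumes that spp_layer pools every channel to an i × i grid for each level i, which the constructor implies but this notebook does not show. The function name `spp_output_features` and the levels (1, 2, 4) are hypothetical, used for illustration only.

```python
def spp_output_features(n_channels, levels):
    """Length of the flattened SPP output: one i x i grid per level and channel."""
    return sum(n_channels * i * i for i in levels)


# hidden_dim[2] = 32 channels and a single 13 x 13 level
print(spp_output_features(32, (13,)))      # 5408

# a multi-level pyramid (hypothetical levels, for comparison)
print(spp_output_features(32, (1, 2, 4)))  # 672
```

A single large level keeps most of the spatial detail but makes the first FC layer wide, whereas several small levels give a much more compact representation.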

In this model we again apply 2DBN and 3DBN after the ReLU activation of each 2D and 3D conv. layer, respectively. The number of filters in the conv. layers of the conv_2D, conv_3D and final branches is a free parameter, with default **hidden_dim** = (16,32,32) for the conv_2D, conv_3D and final branches respectively. The spatial size of the filters is also a free parameter, with default **kernel_size** = ((5,5),(2,5,5),(5,5)) for the conv_2D, conv_3D and final branches respectively. The parameters of the SPP and DO layers can be varied too, through **levels** and **dropout**; the class below defaults to **levels** = (10,) and **dropout** = 0.2, while the example in this notebook uses **levels** = (13,). The model can analyze tensors of any spatial size, provided the input is large enough to pass through the unpadded conv. layers. This flexibility allowed us to experiment with its architecture.

<img src="images/3DConvModel.png">

In [7]:
%cd "/rdsgpfs/general/project/aandedemand/live/satellite/junin/deforestation_forecasting/python_code"
import torch
from spp_layer import *

/rdsgpfs/general/project/aandedemand/live/satellite/junin/deforestation_forecasting/python_code


In [8]:
class Conv_3D(torch.nn.Module):
    def __init__(self, input_dim=(2,8),
                 hidden_dim=(16,32,32),
                 kernel_size=((5,5),(2,5,5),(5,5)),
                 levels=(10,),
                 dropout = 0.2):
        super(Conv_3D, self).__init__()
        
        self.levels = levels
        self.hidden_dim = hidden_dim
        
        self.conv_2D = torch.nn.Sequential(        
            torch.nn.Conv2d(input_dim[0],hidden_dim[0],kernel_size = kernel_size[0]), 
            torch.nn.ReLU(),
            torch.nn.BatchNorm2d(hidden_dim[0]),
            
            torch.nn.Conv2d(hidden_dim[0],hidden_dim[0],kernel_size = kernel_size[0]), 
            torch.nn.ReLU(),
            torch.nn.BatchNorm2d(hidden_dim[0]))   
        
        self.conv_3D = torch.nn.Sequential(
                        torch.nn.Conv3d(in_channels = input_dim[1],
                                        out_channels = hidden_dim[1],
                                        kernel_size = kernel_size[1]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm3d(hidden_dim[1]),
            
                        torch.nn.Conv3d(in_channels = hidden_dim[1],
                                        out_channels = hidden_dim[1],
                                        kernel_size = kernel_size[1]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm3d(hidden_dim[1]))    
        
        self.final = torch.nn.Sequential(
                        torch.nn.Conv2d(hidden_dim[0]+hidden_dim[1], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]),
            
                        torch.nn.Conv2d(hidden_dim[2], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]),
        
                        torch.nn.Conv2d(hidden_dim[2], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]),
            
                        torch.nn.Conv2d(hidden_dim[2], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]),
            
                        torch.nn.Conv2d(hidden_dim[2], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]),
                    
                        torch.nn.Conv2d(hidden_dim[2], hidden_dim[2], kernel_size[2]),
                        torch.nn.ReLU(),
                        torch.nn.BatchNorm2d(hidden_dim[2]))  
        
        ln_in = 0
        for i in levels:
            ln_in += hidden_dim[2]*i*i
        
        self.ln = torch.nn.Sequential( 
            torch.nn.Linear(ln_in,100),
            torch.nn.ReLU(),
            torch.nn.BatchNorm1d(100),
            torch.nn.Dropout(dropout),           
            torch.nn.Linear(100, 1))
        
        self.sig = torch.nn.Sigmoid()
        
        
    def forward(self, data , sigmoid = True ):
        
        s , x = data

        s = self.conv_2D.forward(s)
        x = self.conv_3D.forward(x)        
        x = x.squeeze(dim = 2 )
        x = torch.cat((x,s),dim = 1)
        x = self.final.forward(x) 
        x = spp_layer(x, self.levels)
        x= self.ln(x)
        if sigmoid: 
            x = self.sig(x)  
            
        return x.flatten()

# Initial parameters:

Here again one must be careful when setting the levels and kernel_size of the model. Below we illustrate how the spatial size of the input decreases before it is passed to spp_layer when the default parameters are used.

We chose these parameters because they are very similar to those of the 2D CNN model, which showed good results.
levels is set so that when the input image has size 45 no pooling is applied (the final feature map is already 13 × 13), while any input larger than 45 is downsampled.

A possible modification of this model would be a deeper conv_2D branch, the only requirement being that its output spatial size matches that of the conv_3D branch.

In [1]:
#set image parameters
size = 45
#set model parameters for 3D_CNN
input_dim= (2,8)
hidden_dim=(16,32,32)
kernel_size=((5,5),(2,5,5),(5,5))
levels=(13,)
dropout = 0.3

In [7]:
model = Conv_3D(
    input_dim = input_dim,
    hidden_dim = hidden_dim,
    kernel_size= kernel_size,
    levels=levels,
    dropout = dropout)

In [8]:
# batch size b = 2, matching the two predictions shown below
b, c1, c2, t, h, w = 2, 2, 8, 3, size, size
s = torch.rand(b,c1,h,w)
x = torch.rand(b,c2,t,h,w)
data = (s,x)
model(data)

tensor([0.5335, 0.4771], grad_fn=<AsStridedBackward>)

In [13]:
import numpy as np

def check_input_size(size, kernel_size):
    # Trace the spatial size through the unpadded, stride-1 conv. layers
    print("Input image spatial size:",size)
    print("Changes of the spatial size in the two branches (conv_2D and conv_3D)")
    for i in range(2):
        print("\tLayer",i+1)
        print("\tkernel_size: ", kernel_size[0])
        size = np.array(size) - np.array(kernel_size[0]) + 1
        print("\tSize after layer %d is applied:"%(i+1), size)
        print()
    print("Changes of the spatial size in the final branch:")
    for i in range(6):
        print("\tLayer",i+1)
        print("\tkernel_size: ", kernel_size[2])
        size = np.array(size) - np.array(kernel_size[2])  + 1
        print("\tSize after layer %d is applied:"%(i+1), size)
        print()

check_input_size((size,size), kernel_size)

Input image spatial size: (45, 45)
Changes of the spatial size in the two branches (conv_2D and conv_3D)
	Layer 1
	kernel_size:  (5, 5)
	Size after layer 1 is applied: [41 41]

	Layer 2
	kernel_size:  (5, 5)
	Size after layer 2 is applied: [37 37]

Changes of the spatial size in the final branch:
	Layer 1
	kernel_size:  (5, 5)
	Size after layer 1 is applied: [33 33]

	Layer 2
	kernel_size:  (5, 5)
	Size after layer 2 is applied: [29 29]

	Layer 3
	kernel_size:  (5, 5)
	Size after layer 3 is applied: [25 25]

	Layer 4
	kernel_size:  (5, 5)
	Size after layer 4 is applied: [21 21]

	Layer 5
	kernel_size:  (5, 5)
	Size after layer 5 is applied: [17 17]

	Layer 6
	kernel_size:  (5, 5)
	Size after layer 6 is applied: [13 13]
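
The trace above can also be condensed into a closed form, which is convenient when choosing levels for a new input size. This is a minimal sketch assuming square inputs and square kernels; the function name `final_spatial_size` is hypothetical and not part of the model code.

```python
def final_spatial_size(size, kernel_size, n_branch_layers=2, n_final_layers=6):
    """Side length of the feature map that reaches spp_layer.

    Assumes square inputs and kernels, stride 1 and no padding, so each
    layer removes (kernel - 1) pixels from the side length.
    """
    k_branch = kernel_size[0][0]  # spatial kernel of conv_2D / conv_3D
    k_final = kernel_size[2][0]   # spatial kernel of the final branch
    return (size
            - n_branch_layers * (k_branch - 1)
            - n_final_layers * (k_final - 1))


kernel_size = ((5, 5), (2, 5, 5), (5, 5))
for size in (33, 45, 61):
    print(size, "->", final_spatial_size(size, kernel_size))
# 33 -> 1
# 45 -> 13
# 61 -> 29
```

With these kernels every input loses 32 pixels per side, so a 45 × 45 input arrives at spp_layer as exactly 13 × 13, larger inputs arrive larger and are pooled down, and inputs smaller than 33 × 33 cannot pass through the unpadded conv. layers at all.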

