# **Tutorial 7**

### **A Deep Learning Sample**

In this example, we use the experiment [A Machine Learning Benchmark for Facies Classification](https://github.com/yalaudah/facies_classification_benchmark) to demonstrate how to integrate a simple PyTorch experiment into our framework.

This benchmark uses labeled data based on the *F3 Netherlands* block. The dataset is already available in our modules, but we first need to adapt it to the format expected by the CNN model proposed in the experiment. To do that, we need to implement `__len__()` and `__getitem__()`, along with a few helper functions and attributes.

Before starting, we need a simple function that generates the metadata for the inline and crossline patches. It produces index names that the model iterator can access. For example, for the slice `inline=100` with 50x50 patches and a stride of 30, the metadata contains names such as `i_0_100_0`, `i_30_100_0`, `i_60_100_0`, etc. For a crossline, the name starts with `x` instead. That is why we need a `slice_type`.

In [1]:
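def generate_patches(slice_type, x, y, z, stride, patch_size):
    """Return one list of patch names per slice, each name formatted as
    '<slice_type>_<horizontal offset>_<slice index>_<vertical offset>'."""
    patch_list = []
    # offsets of every patch corner inside an x-by-y slice
    horz_locations = range(0, x - patch_size, stride)
    vert_locations = range(0, y - patch_size, stride)
    for j in range(z):
        # one name per (horizontal, vertical) offset pair of slice j
        locations = [[i, k] for i in horz_locations for k in vert_locations]
        patches_list = [slice_type + '_' + str(i) + '_' + str(j) + '_' + str(k)
                        for i, k in locations]
        patch_list.append(patches_list)

    return patch_list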
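
To make the naming scheme concrete, the short sketch below calls a compact copy of `generate_patches` on a small hypothetical volume and prints a few of the names it produces. The sizes are chosen only for illustration and are not those of the F3 block.

```python
def generate_patches(slice_type, x, y, z, stride, patch_size):
    # Compact copy of the function above so this sketch runs on its own.
    horz = range(0, x - patch_size, stride)
    vert = range(0, y - patch_size, stride)
    return [[f"{slice_type}_{i}_{j}_{k}" for i in horz for k in vert]
            for j in range(z)]


# Hypothetical sizes, for illustration only: 2 slices of 200 x 150 samples.
patches = generate_patches("i", x=200, y=150, z=2, stride=30, patch_size=50)

print(len(patches))      # 2: one list per slice
print(len(patches[0]))   # 20: 5 horizontal offsets x 4 vertical offsets
print(patches[0][:3])    # ['i_0_0_0', 'i_0_0_30', 'i_0_0_60']
print(patches[1][:3])    # ['i_0_1_0', 'i_0_1_30', 'i_0_1_60']
```

The result is nested (one list per slice), which is why the dataset class flattens it before splitting.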

Once we have our patch generator, we can define our dataset class. Notice that it inherits from `F3Labeled`, which contains both the raw data and the labels. As mentioned previously, all we need to do now is implement `__len__()` and `__getitem__()` on top of our metadata. The `_generate_patches()` method builds that metadata from the shape of the original data and also splits the patches into train and validation sets. Remember that we have both raw data and labels, so both must be indexed by the same patches.

We also have the `_load*()` functions to fit the framework API, as seen in previous tutorials. As an extra, there is a `transform()` function to transform the data.
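
As a rough illustration of what `transform()` does to the array shapes, here is a NumPy-only sketch. The helper name `to_channel_first` and the patch dimensions are assumptions made for this example, and the conversion to PyTorch tensors is left out.

```python
import numpy as np


def to_channel_first(img, lbl, mean=0.000941):
    """Hypothetical stand-in for transform(), without the torch conversion."""
    img = img - mean                  # out of place: the input is untouched
    img, lbl = img.T, lbl.T           # swap the two patch axes
    img = np.expand_dims(img, 0)      # add a leading channel axis
    lbl = np.expand_dims(lbl, 0)
    return img.astype(np.float32), lbl.astype(np.int64)


# Illustrative, non-square patch so the transpose is visible in the shapes.
patch = np.zeros((99, 60), dtype=np.float64)
labels = np.ones((99, 60), dtype=np.uint8)

img, lbl = to_channel_first(patch, labels)
print(img.shape, lbl.shape)   # (1, 60, 99) (1, 60, 99)
print(img.dtype, lbl.dtype)   # float32 int64
```

The subtraction is written out of place on purpose: a patch cut from a NumPy array is a view, so an in-place update would also change the underlying volume.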

In [2]:
import torch
import itertools
import numpy as np

from sklearn.model_selection import train_test_split

from dasf.utils.types import is_gpu_array
from dasf_seismic.datasets import F3Labeled

class PatchedF3(F3Labeled):
    def __init__(self, download=False, root=None, chunks="auto", datatype="train"):
        super().__init__(download=download, root=root, chunks=chunks)
        
        self._name = ("%s (%s)" % (self._name, datatype))
        
        self.patch_size = 99
        self.per_val = 0.2
        self.datatype = datatype
        self.is_transform = True
        
    def _generate_patches(self, shape):
        stride = 50
        iline, xline, depth = shape
        
        i_list = generate_patches("i", xline, depth, iline, stride, self.patch_size)
        x_list = generate_patches("x", iline, depth, xline, stride, self.patch_size)
        
        i_list = list(itertools.chain(*i_list))
        x_list = list(itertools.chain(*x_list))

        list_train_val = i_list + x_list

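        # create train and validation splits; a fixed random_state (any seed
        # works) makes the "train" and "val" instances shuffle identically,
        # so the two sets do not overlap
        if self.datatype == "train":
            self.patches, _ = train_test_split(
                list_train_val, test_size=self.per_val, shuffle=True,
                random_state=42)
        elif self.datatype == "val":
            _, self.patches = train_test_split(
                list_train_val, test_size=self.per_val, shuffle=True,
                random_state=42)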

    def _lazy_load(self, xp, **kwargs):
        local_data, local_labels = super()._lazy_load(xp, **kwargs)
        
        self._generate_patches(local_data.shape)
        
        return local_data, local_labels

    def _load(self, xp, **kwargs):
        local_data, local_labels = super()._load(xp, **kwargs)

        self._generate_patches(local_data.shape)
        
        return local_data, local_labels

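    def transform(self, img, lbl):
        # subtract the average of the training data (out of place, so the
        # slice of self._data passed in is not modified)
        img = img - 0.000941

        # transpose and add a channel axis; batching later yields the
        # BxCxHxW layout that PyTorch uses
        img, lbl = img.T, lbl.T

        img = np.expand_dims(img, 0)
        lbl = np.expand_dims(lbl, 0)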
        
        # We need this because there is no from_cupy()
        if is_gpu_array(img):
            img = torch.as_tensor(img)
        else:
            img = torch.from_numpy(img)
            
        if is_gpu_array(lbl):
            lbl = torch.as_tensor(lbl)
        else:
            lbl = torch.from_numpy(lbl)

        img = img.float()
        lbl = lbl.long()
                
        return img, lbl
    
    def __len__(self):
        return len(self.patches)

    def __getitem__(self, index):
        patch_name = self.patches[index]
        # name layout: <direction>_<horizontal offset>_<slice index>_<vertical offset>
        direction, x, idx, y = patch_name.split(sep='_')

        x, idx, y = int(x), int(idx), int(y)

        if direction == 'i':
            im = self._data[idx, x:x+self.patch_size, y:y+self.patch_size]
            lbl = self._labels[idx, x:x+self.patch_size, y:y+self.patch_size]
        elif direction == 'x':    
            im = self._data[x:x+self.patch_size, idx, y:y+self.patch_size]
            lbl = self._labels[x:x+self.patch_size, idx, y:y+self.patch_size]
            
        if self.is_transform:
            im, lbl = self.transform(im, lbl)
        return im, lbl
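
The lookup performed in `__getitem__()` can be sketched in isolation. The helper below is hypothetical (it is not part of the framework) and only shows how a patch name maps to an index into a 3-D volume laid out as inline, crossline, depth.

```python
def patch_index(patch_name, patch_size):
    """Hypothetical helper: turn a patch name into a 3-D array index."""
    # Name layout: <direction>_<horizontal offset>_<slice index>_<vertical offset>
    direction, x, idx, y = patch_name.split("_")
    x, idx, y = int(x), int(idx), int(y)
    window_x = slice(x, x + patch_size)
    window_y = slice(y, y + patch_size)
    if direction == "i":
        # inline patch: fixed first axis, window over crossline and depth
        return (idx, window_x, window_y)
    if direction == "x":
        # crossline patch: fixed second axis, window over inline and depth
        return (window_x, idx, window_y)
    raise ValueError(f"unknown slice type: {direction!r}")


print(patch_index("i_50_100_0", 99))
# (100, slice(50, 149, None), slice(0, 99, None))
print(patch_index("x_50_100_0", 99))
# (slice(50, 149, None), 100, slice(0, 99, None))
```

The returned tuple can be used directly as `volume[patch_index(name, size)]` on any array that supports NumPy-style indexing.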

Now we have our customized dataset based on `F3Labeled`, so it is time to create both the train and validation instances.

In [3]:
train = PatchedF3(download=True, datatype="train")
val = PatchedF3(download=True, datatype="val")
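
Because the train and validation instances each run their own split, the two sets are guaranteed to be complementary only when both calls shuffle identically. A minimal sketch of that idea follows, using a fixed `random_state`; the seed value, the helper `split_names`, and the patch names are illustrative assumptions.

```python
from sklearn.model_selection import train_test_split

# Hypothetical patch names standing in for the real metadata.
names = [f"i_0_{j}_0" for j in range(100)]


def split_names(names, datatype, seed=42):
    """Return the train or validation part of a reproducible 80/20 split."""
    train, val = train_test_split(
        names, test_size=0.2, shuffle=True, random_state=seed)
    return train if datatype == "train" else val


# Two independent calls, as happens when two dataset instances are created.
train_names = split_names(names, "train")
val_names = split_names(names, "val")

print(len(train_names), len(val_names))          # 80 20
print(set(train_names).isdisjoint(val_names))    # True
```

Without a shared seed, each call would shuffle differently and some patches could end up in both sets.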

Next, we define our CNN model. Here, we use a predefined model from the original [benchmark repository](https://github.com/yalaudah/facies_classification_benchmark/blob/main/core/models/patch_deconvnet.py#L3). The only difference is that we ported the model to a PyTorch Lightning structure. If you are not familiar with it, check the official [documentation of PyTorch Lightning and its videos](https://pytorch-lightning.readthedocs.io/en/stable/starter/introduction.html).

In [4]:
from dasf.ml.dl.models import TorchPatchDeConvNet

# class weights initialization
class_weights = {'upper_ns': 0.7151,
                 'middle_ns': 0.8811,
                 'lower_ns': 0.5156,
                 'rijnland_chalk': 0.9346,
                 'scruff': 0.9683,
                 'zechstein': 0.9852}

model = TorchPatchDeConvNet(n_classes=len(class_weights), class_weights=class_weights)

We now have all the pieces needed for the classification step, so we can create our neural network classifier. To keep the same naming pattern, we use `max_iter` instead of `epoch`; in practice, the two have the same effect.

In [5]:
from dasf.ml.dl import NeuralNetClassifier

classifier = NeuralNetClassifier(model=model, max_iter=10)

Then, let's train our model!

In [6]:
classifier.fit(X=train, y=val)

  rank_zero_deprecation(
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
  rank_zero_warn("You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.")
Missing logger folder: /seismic/docs/tutorials/lightning_logs

   | Name           | Type        | Params
------------------------------------------------
0  | unpool         | MaxUnpool2d | 0     
1  | conv_block1    | Sequential  | 37.8 K
2  | conv_block2    | Sequential  | 221 K 
3  | conv_block3    | Sequential  | 1.5 M 
4  | conv_block4    | Sequential  | 5.9 M 
5  | conv_block5    | Sequential  | 7.1 M 
6  | conv_block6    | Sequential  | 18.9 M
7  | conv_block7    | Sequential  | 16.8 M
8  | deconv_block8  | Sequential  | 18.9 M
9  | unpool_block9  | Sequential  | 0     
10 | deconv_block10 | Sequential  | 7.1 M 
11 | unpool_block11 | Sequential  | 0     
12 | deconv_block12 | Sequential  | 5.9 M 
13 | unpool

Training: 0it [00:00, ?it/s]

NameError: name 'ArrayGPU' is not defined