# Dataset creation

This notebook creates the datasets used to train, validate and test the deep-learning model.

Author of the notebook:
Antonio Magherini (Antonio.Magherini@deltares.nl).

In [1]:
# move to root directory

%cd .. 

c:\Users\magherin\Desktop\jamuna_morpho


In [2]:
# reload modules to avoid restarting the notebook every time these are updated

%load_ext autoreload
%autoreload 2

In [3]:
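# import modules

# os and np are used in later cells; import them explicitly
# instead of relying on the wildcard import below
import os

import numpy as np
import torch

from preprocessing.dataset_generation import *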

In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

Using device: cuda


Directories of the original images, the preprocessed images and the generated dataset.

In [5]:
dir_orig = r'data\satellite\original'
dir_proc = r'data\satellite\preprocessed'
dir_dataset = r'data\satellite\dataset'
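
The raw strings above use Windows path separators. As a minimal sketch, the same directories could be written with `pathlib` so that the notebook also runs on other operating systems. This assumes the working directory is the repository root and that the functions in `preprocessing.dataset_generation` accept path-like objects or strings, which is not confirmed here.

```python
from pathlib import Path

# Assumption: the working directory is the repository root (see `%cd ..` above).
base = Path('data') / 'satellite'

dir_orig = base / 'original'
dir_proc = base / 'preprocessed'
dir_dataset = base / 'dataset'

# Convert with str() if a function expects a plain string.
print(str(dir_dataset))
```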

Available image collections (Landsat 5, JRC Global Surface Water and Sentinel-1).

In [6]:
L5 = r'LANDSAT_LT05_C02_T1_L2'
JRC = r'JRC_GSW1_4_MonthlyHistory'
S1 = r'COPERNICUS_S1_GRD'

String identifiers for the training, validation and testing splits.

In [7]:
train = 'training'
val = 'validation'
test = 'testing'

train_val_test_list = [train, val, test]

The next cells only demonstrate how the individual functions work.

1. Create the input and target datasets: all images are loaded regardless of their quality.

In [8]:
input_dataset, target_dataset = create_datasets(train, 1, 5)



In [9]:
print(f'Input shape: {np.shape(input_dataset)}\nTarget shape: {np.shape(target_dataset)}')

Input shape: (30, 4, 1000, 500)
Target shape: (30, 1, 1000, 500)
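
The shapes printed above can be read as (samples, images per sample, height, width). One way such input-target pairs can be built is a sliding window over a time-ordered stack of images. The sketch below illustrates this with a hypothetical helper `make_pairs` and small synthetic arrays; it is an assumption, not a confirmed description, that `create_datasets` works this way or that its argument `5` sets the position of the target image.

```python
import numpy as np

def make_pairs(images, n_target=5):
    """Hypothetical helper: slide a window of `n_target` consecutive images
    over `images` (time, height, width). The first n_target - 1 images of
    each window form the input, the last one is the target."""
    inputs, targets = [], []
    for start in range(images.shape[0] - n_target + 1):
        window = images[start:start + n_target]
        inputs.append(window[:-1])    # (n_target - 1, H, W)
        targets.append(window[-1:])   # (1, H, W), keeps the channel axis
    return np.stack(inputs), np.stack(targets)

# Synthetic stand-in for a time series of 8 small binary images.
rng = np.random.default_rng(0)
images = rng.integers(0, 2, size=(8, 10, 5))

x, y = make_pairs(images, n_target=5)
print(x.shape, y.shape)  # (4, 4, 10, 5) (4, 1, 10, 5)
```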


2. Combine the input and target datasets, filtering out bad images (based on <code>no-data</code> and <code>water</code> thresholds).

In [10]:
input_train1, target_train1 = combine_datasets(train, 1)

In [11]:
print(f"Input list: {np.shape(input_train1)}.\nTarget list: {np.shape(target_train1)}") 

Input list: (22, 4, 1000, 500).
Target list: (22, 1000, 500)
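
Filtering of this kind can be sketched as follows. The pixel encoding (-1 for no-data, 1 for water), the threshold values and the helpers `keep_sample` and `filter_pairs` are assumptions made for illustration; they are not the settings or functions used by `combine_datasets`.

```python
import numpy as np

NODATA, WATER = -1, 1   # assumed pixel encoding

def keep_sample(images, max_nodata=0.25, min_water=0.10):
    """Hypothetical check: keep a sample only if every image has at most
    `max_nodata` fraction of no-data pixels and at least `min_water`
    fraction of water pixels."""
    for img in images:
        if np.mean(img == NODATA) > max_nodata:
            return False
        if np.mean(img == WATER) < min_water:
            return False
    return True

def filter_pairs(inputs, targets, **thresholds):
    """Drop input-target pairs in which any image fails `keep_sample`."""
    mask = np.array([
        keep_sample(np.concatenate([x, t]), **thresholds)
        for x, t in zip(inputs, targets)
    ])
    # Remove the channel axis of the targets: (N, 1, H, W) -> (N, H, W)
    return inputs[mask], targets[mask][:, 0]

# Synthetic example: the second sample consists of no-data pixels only.
good = np.ones((4, 6, 3), dtype=int)
bad = np.full((4, 6, 3), NODATA, dtype=int)
inputs = np.stack([good, bad, good])       # (3, 4, 6, 3)
targets = np.ones((3, 1, 6, 3), dtype=int)

x, y = filter_pairs(inputs, targets)
print(x.shape, y.shape)  # (2, 4, 6, 3) (2, 6, 3)
```

As in the output above, the filtered targets lose their singleton channel axis while the inputs keep all four images per sample.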


### 1. Training dataset

In [12]:
# training
dataset_train = create_full_dataset(train, device='cpu')
print(f"Training samples: {len(dataset_train)}")

Training samples: 577


In [13]:
print(f"Input dataset sample shape: {dataset_train[0][0].shape}\nTarget dataset sample shape: {dataset_train[0][1].shape}")

Input dataset sample shape: torch.Size([4, 1000, 500])
Target dataset sample shape: torch.Size([1000, 500])


### 2. Validation dataset

In [14]:
# validation
dataset_val = create_full_dataset(val, device='cpu')
print(f"Validation samples: {len(dataset_val)}")

Validation samples: 22


### 3. Testing dataset

In [15]:
# testing
dataset_test = create_full_dataset(test, device='cpu')
print(f"Testing samples: {len(dataset_test)}")

Testing samples: 22


In [16]:
# number of folders in the given directory

print(len(next(os.walk(dir_dataset))[1]))

30
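
The expression `next(os.walk(path))[1]` returns the names of the immediate subdirectories of `path`. The sketch below wraps it in a hypothetical helper, `count_subfolders`, and checks it on a temporary directory. Note that `next` raises `StopIteration` if `path` does not exist, because `os.walk` then yields nothing.

```python
import os
import tempfile

def count_subfolders(path):
    """Count the immediate subdirectories of `path` (files are ignored)."""
    # os.walk yields (dirpath, dirnames, filenames); the first item
    # describes `path` itself.
    _, dirnames, _ = next(os.walk(path))
    return len(dirnames)

with tempfile.TemporaryDirectory() as tmp:
    for name in ('a', 'b', 'c'):
        os.mkdir(os.path.join(tmp, name))
    # A file at the top level must not be counted.
    open(os.path.join(tmp, 'notes.txt'), 'w').close()
    n_folders = count_subfolders(tmp)

print(n_folders)  # 3
```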
