Old Documentation:

- [`import`](https://docs.python.org/3/reference/simple_stmts.html#the-import-statement)
- [`len`](https://docs.python.org/3/library/functions.html#len)
- [`numpy`](https://numpy.org/doc/1.19/user/whatisnumpy.html)
- [`numpy.array`](https://numpy.org/doc/stable/reference/generated/numpy.array.html)
- [numpy indexing](https://numpy.org/doc/stable/reference/arrays.indexing.html)
- [`torch`](https://pytorch.org/docs/stable/index.html)
- [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor)
- [`torch.utils.data`](https://pytorch.org/docs/stable/data.html#torch.utils.data)
- [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)

Import the NumPy and PyTorch (`torch`) modules.

In [1]:
import numpy as np
import torch

Load the sample dataset to be used in this notebook.

In [2]:
spectrograms = np.load("./samples/spectrograms/linear/all_spectrograms.npy", allow_pickle=True)

**Step 1:** Code to be executed once at the beginning for initialization

In [3]:
def init_(spectrograms, context=1, offset=1):
    
    # Create a global variable with the spectrograms to be accessed in other functions.
    # The shallow copy keeps the padding below from modifying the caller's data.
    global global_spectrograms
    global_spectrograms = list(spectrograms)
    
    # Create a global variable to store the list of all (recording, timestep) index pairs.
    global global_index_map
    global_index_map = []
    
    # Use two for loops to create a list of all (recording, timestep) index pairs.
    for i, spectrogram in enumerate(spectrograms):
        for j, frame in enumerate(spectrogram):
            index_pair = (i, j)
            global_index_map.append(index_pair)
    
    # Create a global variable of the number of timesteps to be accessed in other functions
    global global_length
    global_length = len(global_index_map)

    # Create a global variable to store the context and offset constants
    global global_context, global_offset
    global_context = context
    global_offset = offset

    # Pad the rows of the spectrogram with zeros up to the offset count
    for i, spectrogram in enumerate(global_spectrograms):
        global_spectrograms[i] = np.pad(spectrogram, ((offset, offset), (0, 0)), 'constant', constant_values=0)
        
    return None
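
To make the index map concrete, here is a minimal sketch on toy data. The array shapes (2 and 3 timesteps, 4 frequency bins) and the helper name `build_index_map` are made up for illustration and are not part of the notebook.

```python
import numpy as np

def build_index_map(spectrograms):
    """Return one (recording, timestep) pair per frame (hypothetical helper)."""
    index_map = []
    for i, spectrogram in enumerate(spectrograms):
        for j in range(len(spectrogram)):
            index_map.append((i, j))
    return index_map

# Two toy "recordings" with 2 and 3 timesteps, 4 frequency bins each (made-up shapes)
toy_spectrograms = [np.ones((2, 4)), np.ones((3, 4))]
toy_index_map = build_index_map(toy_spectrograms)

print(toy_index_map)       # [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)]
print(len(toy_index_map))  # 5
```

A single flat index into this list therefore selects both a recording and a timestep within it, which is what lets the dataset be sampled frame by frame.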

**Step 2:** Code to return the number of items as length

In [4]:
def len_():
    
    # Return the global variable of the number of timesteps in the spectrogram
    return global_length

**Step 3:** Code to return the item at recording i, timestep j

In [5]:
def getitem_(index):
    
    # Get the recording index i and timestep index j that corresponds to the pair index
    i, j = global_index_map[index]
    
    # Define the starting timestep j, the context before the offset index
    start_j = j + global_offset - global_context
    
    # Define the ending timestep j, the context after the offset index
    end_j = j + global_offset + global_context + 1

    # Index the global spectrogram variable using the recording index i
    # and starting to ending timestep index j
    frame = global_spectrograms[i][start_j:end_j,:]
    
    return frame
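
The index arithmetic above can be checked on a small array. The sketch below assumes a single toy recording with 3 timesteps and 2 frequency bins (made-up values) and the default `context=1`, `offset=1`; the function name `get_window` is hypothetical. Note that the slice only stays inside the padded array when `context <= offset`, because only `offset` rows of zeros are added on each side.

```python
import numpy as np

context, offset = 1, 1

# Toy recording: 3 timesteps x 2 frequency bins (made-up values)
recording = np.array([[1.0, 1.0],
                      [2.0, 2.0],
                      [3.0, 3.0]])

# Same padding as in init_: `offset` rows of zeros before and after
padded = np.pad(recording, ((offset, offset), (0, 0)), 'constant', constant_values=0)

def get_window(j):
    """Return the frames from j - context to j + context (hypothetical helper)."""
    start_j = j + offset - context
    end_j = j + offset + context + 1
    return padded[start_j:end_j, :]

first = get_window(0)
last = get_window(2)
print(first)  # rows: zeros, timestep 0, timestep 1
print(last)   # rows: timestep 1, timestep 2, zeros
```

The first and last timesteps borrow a row of zeros from the padding, so every window has the same shape of `2 * context + 1` rows.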

**Step 4:** Code to return the collated list of items

In [6]:
def collate_fn_(batch):
    
    # Convert the list of frames into a single tensor
    batch = torch.as_tensor(batch)
    
    return batch
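
For illustration, the sketch below collates four toy frames, each of shape `(3, 5)` (made-up sizes), into one tensor of shape `(4, 3, 5)`. It stacks the frames with `np.stack` before calling `torch.as_tensor`, a variant that assumes all frames in the batch share one shape; it is not the notebook's exact code.

```python
import numpy as np
import torch

# Four toy frames of 3 timesteps x 5 frequency bins each (made-up sizes)
frames = [np.full((3, 5), float(k)) for k in range(4)]

# Stack into one (4, 3, 5) array, then convert it to a tensor
collated = torch.as_tensor(np.stack(frames))

print(collated.shape)  # torch.Size([4, 3, 5])
print(collated.dtype)  # torch.float64
```

The leading dimension of the result is the batch size, followed by the window length and the number of frequency bins.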

Example of how to use init, len, getitem, and collate as functions. Here, we create a batch of size 4.

In [7]:
# Initialize the spectrogram dataset
init_(spectrograms)

# Get the number of timesteps in the spectrogram dataset
dataset_size = len_()

# Create a random sample of 4 timesteps from the spectrogram dataset
sample_size = 4
sample_indices = np.random.choice(dataset_size, sample_size, replace=False)

# Create an empty list to store batch frame items from the spectrogram
batch = []

# Use a for loop to get all batch frame items
for i in sample_indices:
    batch.append(getitem_(i))
    
# Collate the list of batch items into a usable form
batch = collate_fn_(batch)
print(batch)

tensor([[[3.5993e+02, 6.4546e+04, 1.4306e+04,  ..., 5.9140e-06,
          1.3601e-06, 1.9567e-07],
         [6.7531e+02, 5.6624e+03, 1.5188e+02,  ..., 5.8123e-06,
          5.0022e-06, 5.3828e-06],
         [9.3238e+02, 4.8040e+04, 1.4674e+04,  ..., 7.5178e-06,
          2.5556e-06, 8.5881e-07]],

        [[0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 0.0000e+00,
          0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 0.0000e+00,
          0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 0.0000e+00,
          0.0000e+00, 0.0000e+00]],

        [[0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 0.0000e+00,
          0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 0.0000e+00,
          0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 0.0000e+00,
          0.0000e+00, 0.0000e+00]],

        [[1.0819e+03, 1.9851e+04, 6.4934e+02,  ..., 1.1824e-06,
          5.5867e-07, 8.1394e-06],
    

Example of how to create a Dataset class using init, len, getitem, and collate.

In [8]:
class ExampleDataset(torch.utils.data.Dataset):
    
    def __init__(self, spectrograms, context=1, offset=1):
        
        ### Code to be executed once at the beginning for initialization
        # Shallow copy so that the padding below does not modify the caller's data
        self.spectrograms = list(spectrograms)
        
        self.index_map = []
        for i, spectrogram in enumerate(spectrograms):
            for j, frame in enumerate(spectrogram):
                index_pair = (i, j)
                self.index_map.append(index_pair)
        
        self.length = len(self.index_map)
        
        self.context = context
        self.offset = offset

        for i, spectrogram in enumerate(self.spectrograms):
            self.spectrograms[i] = np.pad(spectrogram, ((offset, offset), (0, 0)), 'constant', constant_values=0)
        
    def __len__(self):
        
        ### Return the number of items as length
        return self.length
    
    def __getitem__(self, index):
        
        ### Return one item at recording i, timestep j
        i, j = self.index_map[index]
        start_j = j + self.offset - self.context
        end_j = j + self.offset + self.context + 1
        
        frame = self.spectrograms[i][start_j:end_j,:]

        return frame
    
    @staticmethod
    def collate_fn(batch):
        
        ### Specify how to collate list of items and what to return
        batch = torch.as_tensor(batch)

        return batch

Example of how to use the dataset class and create a batch of size 4.

In [9]:
# Instantiate the dataset class object
dataset = ExampleDataset(spectrograms)

# Create a random sample of 4 indices from the dataset
sample_size = 4
sample_indices = np.random.choice(len(dataset), sample_size, replace=False)

# Create an empty list to store batch items from the dataset
batch = []

# Use a for loop to get all batch items
for i in sample_indices:
    batch.append(dataset[i])
    
# Collate the batch items into a usable form
batch = ExampleDataset.collate_fn(batch)
print(batch)

tensor([[[0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 0.0000e+00,
          0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 0.0000e+00,
          0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00,  ..., 0.0000e+00,
          0.0000e+00, 0.0000e+00]],

        [[2.3621e+03, 6.3872e+04, 9.7604e+03,  ..., 5.5466e-07,
          2.1206e-06, 3.6909e-08],
         [4.4937e+02, 3.8564e+04, 3.9723e+04,  ..., 6.6401e-06,
          7.8843e-06, 1.3130e-06],
         [1.6113e+03, 5.8914e+04, 1.1601e+04,  ..., 2.4747e-06,
          3.4297e-06, 2.5050e-07]],

        [[3.1472e+02, 1.3795e+04, 5.6144e+03,  ..., 1.6428e-05,
          1.0715e-07, 3.4364e-06],
         [1.5534e+02, 4.0900e+02, 8.3159e+02,  ..., 2.4203e-06,
          1.6974e-06, 3.1600e-06],
         [2.7039e+02, 1.2810e+04, 5.2712e+03,  ..., 1.2240e-06,
          6.5336e-07, 1.7630e-06]],

        [[1.6469e+03, 4.6789e+03, 2.9223e+03,  ..., 7.0634e-06,
          3.5327e-06, 1.5430e-06],
    

Note: The example above loads its items one at a time in the main process. When a dataset class is combined with a data loader class, batches can be loaded in parallel by worker processes (multiprocessing rather than multithreading), which can substantially improve data loading performance. The data loader class is covered in a future tutorial of this series.
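
As a preview only, a minimal sketch of that combination might look as follows. The `ToyDataset` class and its random data are stand-ins invented for this sketch so that it runs without the sample files. Here `num_workers=0` loads data in the main process; a larger value would start that many worker processes.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Stand-in dataset (hypothetical): 10 frames of shape (3, 5)."""

    def __init__(self):
        rng = np.random.default_rng(0)
        self.frames = rng.random((10, 3, 5))

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, index):
        return self.frames[index]

    @staticmethod
    def collate_fn(batch):
        # Stack the list of frames into a single tensor
        return torch.as_tensor(np.stack(batch))

loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True,
                    collate_fn=ToyDataset.collate_fn, num_workers=0)

batch_shapes = [tuple(batch.shape) for batch in loader]
print(batch_shapes)  # [(4, 3, 5), (4, 3, 5), (2, 3, 5)]
```

The loader takes over the sampling, the item lookup loop, and the call to `collate_fn` that the cells above perform by hand.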