Old Documentation:

- [`import`](https://docs.python.org/3/reference/simple_stmts.html#the-import-statement)
- [`len`](https://docs.python.org/3/library/functions.html#len)
- [`numpy`](https://numpy.org/doc/1.19/user/whatisnumpy.html)
- [`numpy.array`](https://numpy.org/doc/stable/reference/generated/numpy.array.html)
- [numpy indexing](https://numpy.org/doc/stable/reference/arrays.indexing.html)
- [`torch`](https://pytorch.org/docs/stable/index.html)
- [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor)
- [`torch.utils.data`](https://pytorch.org/docs/stable/data.html#torch.utils.data)
- [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)

Import the NumPy and PyTorch (`torch`) modules.

In [1]:
import numpy as np
import torch

Load the sample spectrogram dataset used in this notebook.

In [2]:
spectrograms = np.load("./samples/spectrograms/linear/all_spectrograms.npy", allow_pickle=True)

**Step 1:** Code to be executed once at the beginning for initialization

In [3]:
def init_(spectrograms):
    
    # Create a global variable holding the spectrograms so other functions can access them
    global global_spectrograms
    global_spectrograms = spectrograms
    
    # Create a global variable to store the list of all (recording, timestep) index pairs.
    global global_index_map
    global_index_map = []
    
    # Use two for loops to create a list of all (recording, timestep) index pairs.
    for i, spectrogram in enumerate(spectrograms):
        for j, frame in enumerate(spectrogram):
            index_pair = (i, j)
            global_index_map.append(index_pair)
    
    # Create a global variable holding the total number of (recording, timestep) pairs
    global global_length
    global_length = len(global_index_map)
    
    return None
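
To make the index map concrete, here is a minimal sketch on made-up data. The names `toy_spectrograms` and `toy_index_map` are hypothetical and not part of the notebook; the sketch assumes each recording is a 2-D array of shape (timesteps, frequency bins), as the indexing in this notebook suggests.

```python
import numpy as np

# Hypothetical toy data: two "recordings" with 3 and 2 timesteps, 4 frequency bins each
toy_spectrograms = [np.zeros((3, 4)), np.ones((2, 4))]

# Same nested-loop idea as init_, written as a list comprehension
toy_index_map = [
    (i, j)
    for i, spectrogram in enumerate(toy_spectrograms)
    for j in range(len(spectrogram))
]

print(toy_index_map)       # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
print(len(toy_index_map))  # 5
```

The flat list lets a single integer index address any frame, even when recordings have different numbers of timesteps.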

**Step 2:** Code to return the number of items as length

In [4]:
def len_():
    
    # Return the global variable of the number of timesteps in the spectrogram
    return global_length

**Step 3:** Code to return the frame at recording i, timestep j

In [5]:
def getitem_(index):
    
    # Get the recording index i and timestep index j that corresponds to the pair index
    i, j = global_index_map[index]
    
    # Index the global spectrogram variable using the recording index i and timestep index j
    frame = global_spectrograms[i][j, :]
    
    return frame

**Step 4:** Code to return the collated list of items

In [6]:
def collate_fn_(batch):
    
    # Convert the list of frames in the batch into a single tensor
    batch = torch.as_tensor(batch)
    
    return batch
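
Depending on the PyTorch version, converting a Python list of NumPy arrays directly to a tensor may be slow and may emit a warning. One possible variation, sketched here on made-up data, is to stack the frames into a single array first. The name `toy_batch` is hypothetical, and the sketch assumes every frame has the same length.

```python
import numpy as np
import torch

# Hypothetical batch: four frames with 6 frequency bins each
toy_batch = [np.full(6, k, dtype=np.float32) for k in range(4)]

# Stack the frames into one (4, 6) array, then convert it to a tensor
stacked = torch.as_tensor(np.stack(toy_batch))

print(stacked.shape)  # torch.Size([4, 6])
```

Each row of the resulting tensor is one frame, so the first dimension is the batch size.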

Example of how to use init, len, getitem, and collate as functions. Here, we create a batch of size 4.

In [7]:
# Initialize the spectrogram dataset
init_(spectrograms)

# Get the number of timesteps in the spectrogram dataset
dataset_size = len_()

# Create a random sample of 4 timesteps from the spectrogram dataset
sample_size = 4
sample_indices = np.random.choice(dataset_size, sample_size, replace=False)

# Create an empty list to store batch frame items from the spectrogram
batch = []

# Use a for loop to get all batch frame items
for i in sample_indices:
    batch.append(getitem_(i))
    
# Collate the list of batch items into a usable form
batch = collate_fn_(batch)
print(batch)

tensor([[4.4191e+01, 3.5118e+04, 4.4044e+03, 9.2870e+03, 1.5616e+04, 1.0287e+04,
         1.4174e+03, 5.0607e+01, 1.1144e+02, 3.2805e+00, 2.5750e+01, 4.5318e+01,
         4.6411e+01, 3.2684e+01, 1.2237e+01, 2.3336e+00, 1.6393e+00, 1.2512e-01,
         7.2831e-02, 6.1419e-01, 2.9565e+00, 7.8081e-01, 2.0018e-01, 9.6141e-02,
         8.3721e-02, 3.2468e-01, 5.0710e-02, 2.3230e-02, 1.8330e-02, 7.5365e-03,
         5.0574e-03, 8.1667e-01, 1.4213e+01, 2.4281e+01, 1.3704e+01, 2.6273e+00,
         6.9715e-02, 1.0229e-01, 4.3050e-01, 4.6943e-01, 1.8549e-01, 7.6213e-02,
         5.0961e-03, 2.4797e-03, 3.0911e-03, 4.9237e-03, 1.1782e-01, 2.0737e-01,
         2.6196e-01, 2.0209e-01, 1.9637e-02, 2.2573e-01, 1.2171e-01, 1.3073e-01,
         7.2328e-03, 1.2993e-04, 2.8189e-04, 1.5909e-03, 2.8205e-04, 2.2215e-05,
         3.4697e-06, 1.4747e-05, 1.0861e-05, 5.4165e-06, 2.5665e-05, 1.6192e-05,
         1.3001e-05, 4.1134e-06, 4.7362e-06, 3.1777e-06, 7.4716e-06, 4.0946e-06,
         5.4841e-06, 3.9782e

Example of how to create a `Dataset` class using `__init__`, `__len__`, `__getitem__`, and `collate_fn`.

In [8]:
class ExampleDataset(torch.utils.data.Dataset):
    
    def __init__(self, spectrograms):
        
        ### Code to be executed once at the beginning for initialization
        self.spectrograms = spectrograms
        
        self.index_map = []
        for i, spectrogram in enumerate(spectrograms):
            for j, frame in enumerate(spectrogram):
                index_pair = (i, j)
                self.index_map.append(index_pair)
        
        self.length = len(self.index_map)
        
    def __len__(self):
        
        ### Return the number of items as length
        return self.length
    
    def __getitem__(self, index):
        
        ### Return one item at recording i, timestep j
        i, j = self.index_map[index]
        frame = self.spectrograms[i][j, :]

        return frame
    
    @staticmethod
    def collate_fn(batch):
        
        ### Specify how to collate list of items and what to return
        batch = torch.as_tensor(batch)

        return batch

Example of how to use the dataset class and create a batch of size 4.

In [9]:
# Instantiate the dataset class object
dataset = ExampleDataset(spectrograms)

# Create a random sample of 4 indices from the dataset
sample_size = 4
sample_indices = np.random.choice(len(dataset), sample_size, replace=False)

# Create an empty list to store batch items from the dataset
batch = []

# Use a for loop to get all batch items
for i in sample_indices:
    batch.append(dataset[i])
    
# Collate the batch items into a usable form
batch = ExampleDataset.collate_fn(batch)
print(batch)

tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
         0.0000e+00, 0.0000e

Note: The example above loads items one at a time in the main process. When a dataset class is combined with a data loader class (`torch.utils.data.DataLoader`), batches can be loaded in parallel by worker processes (multiprocessing rather than multithreading), which can substantially speed up data loading. The data loader class is covered in a later tutorial in this series.
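
As a brief preview, the sketch below shows one way a dataset class of this shape could be handed to `torch.utils.data.DataLoader`. `ToyFrameDataset` and `toy_spectrograms` are hypothetical stand-ins built from random in-memory data, not the notebook's actual dataset, and the settings are illustrative rather than recommended.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ToyFrameDataset(Dataset):
    """Hypothetical stand-in for ExampleDataset, using in-memory toy data."""

    def __init__(self, spectrograms):
        self.spectrograms = spectrograms
        # Flat list of (recording, timestep) index pairs
        self.index_map = [
            (i, j)
            for i, spectrogram in enumerate(spectrograms)
            for j in range(len(spectrogram))
        ]

    def __len__(self):
        return len(self.index_map)

    def __getitem__(self, index):
        i, j = self.index_map[index]
        return self.spectrograms[i][j, :]

    @staticmethod
    def collate_fn(batch):
        # Stack the list of frames into one array, then convert to a tensor
        return torch.as_tensor(np.stack(batch))


# Two toy recordings with 5 and 3 timesteps, 8 frequency bins each
toy_spectrograms = [
    np.random.rand(5, 8).astype(np.float32),
    np.random.rand(3, 8).astype(np.float32),
]
toy_dataset = ToyFrameDataset(toy_spectrograms)

loader = DataLoader(
    toy_dataset,
    batch_size=4,
    shuffle=True,
    collate_fn=ToyFrameDataset.collate_fn,
    num_workers=0,  # 0 loads in the main process; values > 0 start worker processes
)

batch_shapes = [tuple(batch.shape) for batch in loader]
print(batch_shapes)  # [(4, 8), (4, 8)]
```

With 8 frames in total and a batch size of 4, the loader yields two batches. The data loader replaces the manual sampling loop and calls `collate_fn` on each list of items.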