## Homework 0 Part 2: Data Loaders

In this assignment, you will be provided with data and an expected result. Your task is to fill out the starter code to obtain the expected result. Do not modify the data (X or Y), and do not modify the instantiation of the dataset or dataloader.

All three versions -- easy difficulty, medium difficulty, and hard difficulty -- have the same solution code and the same examples. We recommend starting with the easy difficulty. Once you get the expected results with the easy difficulty, try again with the medium difficulty. If you want to challenge yourself, try again with the hard difficulty.

Most of this assignment does not require CUDA, but the final command (Exercise 7) needs a CUDA-capable GPU. If your local machine does not have one, use AWS to access CUDA-enabled resources by following the AWS Recitation 0.

Have fun!

<hr style="border:2px solid gray"> </hr>

In [13]:
import numpy as np
import torch

<hr style="border:2px solid gray"> </hr>

### Exercise 1

In [14]:
X = np.array([2,  3,  4,  5,  6,  7,  8,  9])

In [15]:
class ExampleDataset1(torch.utils.data.Dataset):
    
    def __init__(self, X):
        
        ### Assign data to self (1 line)
        self.X = X
        
        ### Assign length to self (1 line)
        self.length = len(X)
        
    def __len__(self):
        
        ### Return length (1 line)
        return self.length
    
    def __getitem__(self, i):
        
        ### Return data at index i (1 line)
        return self.X[i]
    
    def collate_fn(batch):
        
        ### Convert batch to tensor (1 line)
        batch_x = torch.as_tensor(batch)
        
        ### Return batched data (1 line)
        return batch_x

In [16]:
dataset1 = ExampleDataset1(X)

dataloader1 = torch.utils.data.DataLoader(dataset1, 
                                          batch_size=2, 
                                          shuffle=False, 
                                          collate_fn=ExampleDataset1.collate_fn)

for i, batch in enumerate(dataloader1):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 tensor([2, 3]) 

Batch 1 :
 tensor([4, 5]) 

Batch 2 :
 tensor([6, 7]) 

Batch 3 :
 tensor([8, 9]) 



---
#### Expected Output:
```
Batch 0 :
 tensor([2, 3]) 

Batch 1 :
 tensor([4, 5]) 

Batch 2 :
 tensor([6, 7]) 

Batch 3 :
 tensor([8, 9]) 
```
---
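
To make the mechanics of Exercise 1 concrete, here is a framework-free sketch of roughly what happens when the loader runs with `shuffle=False`: indices are visited in order, grouped into chunks of `batch_size`, each index goes through `__getitem__`, and the resulting list of samples is handed to `collate_fn`. `TinyDataset` and `iterate_batches` are hypothetical names used only for illustration, and this is not PyTorch's actual implementation (which also handles samplers, workers, and more).

```python
class TinyDataset:
    """Hypothetical stand-in for ExampleDataset1 that needs no torch."""

    def __init__(self, X):
        self.X = X
        self.length = len(X)

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        return self.X[i]


def iterate_batches(dataset, batch_size, collate_fn):
    """Yield collated batches in index order (similar in spirit to shuffle=False)."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        # One __getitem__ call per index in this chunk
        samples = [dataset[i] for i in range(start, stop)]
        # collate_fn turns the list of samples into one batch object
        yield collate_fn(samples)


# Here collate_fn is simply `list`; in the exercise it builds a tensor instead.
batches = list(iterate_batches(TinyDataset([2, 3, 4, 5, 6, 7, 8, 9]), 2, list))
for i, batch in enumerate(batches):
    print("Batch", i, ":", batch)
```

The four batches hold the same values as the expected output above, just as plain lists rather than tensors.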

### Exercise 2

In [17]:
X = np.array([2,  3,  4,  5,  6,  7,  8,  9])
Y = np.array([4,  9, 16, 25, 36, 49, 64, 81])

In [18]:
class ExampleDataset2(torch.utils.data.Dataset):
    
    def __init__(self, X, Y):
        
        ### Assign data and labels to self (1-2 lines)
        self.X = X
        self.Y = Y
        
        ### Assert data and labels have the same length (1 line)
        assert len(X) == len(Y)
        
        ### Assign length to self (1 line)
        self.length = len(X)
        
    def __len__(self):
        
        ### Return length (1 line)
        return self.length
    
    def __getitem__(self, i):
        
        ### Return data and label at index (1 line)
        return self.X[i], self.Y[i]
    
    def collate_fn(batch):
        
        ### Select all data from batch (1 line)
        batch_x = [x for x,y in batch]
        
        ### Select all labels from batch (1 line)
        batch_y = [y for x,y in batch]
        
        ### Convert batched data and labels to tensors (2 lines)
        batch_x = torch.as_tensor(batch_x)
        batch_y = torch.as_tensor(batch_y)
        
        ### Return batched data and labels (1 line)
        return batch_x, batch_y

In [19]:
dataset2 = ExampleDataset2(X, Y)

dataloader2 = torch.utils.data.DataLoader(dataset2,
                                          batch_size=2, 
                                          shuffle=False,
                                          collate_fn=ExampleDataset2.collate_fn)

for i, batch in enumerate(dataloader2):
    print("Batch", i, ":\n", batch, "\n")

Batch 0 :
 (tensor([2, 3]), tensor([4, 9])) 

Batch 1 :
 (tensor([4, 5]), tensor([16, 25])) 

Batch 2 :
 (tensor([6, 7]), tensor([36, 49])) 

Batch 3 :
 (tensor([8, 9]), tensor([64, 81])) 



---
#### Expected Output:

```
Batch 0 :
 (tensor([2, 3]), tensor([4, 9])) 

Batch 1 :
 (tensor([4, 5]), tensor([16, 25])) 

Batch 2 :
 (tensor([6, 7]), tensor([36, 49])) 

Batch 3 :
 (tensor([8, 9]), tensor([64, 81])) 

```
---
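
The `collate_fn` in Exercise 2 receives a list of `(x, y)` pairs and has to split it into one collection of data and one of labels. The sketch below shows that step in isolation with plain Python values (the sample pairs are illustrative), first with the two list comprehensions used above and then with the equivalent `zip(*batch)` idiom.

```python
# What the loader would hand to collate_fn for indices 0 and 1:
# one (x, y) tuple per sample.
batch = [(2, 4), (3, 9)]

# The two comprehensions from ExampleDataset2.collate_fn
batch_x = [x for x, y in batch]
batch_y = [y for x, y in batch]

# An equivalent "unzip": zip(*batch) regroups by position and yields tuples
xs, ys = zip(*batch)

print(batch_x, batch_y)  # data and labels as lists
print(xs, ys)            # the same values as tuples
```

Either form can then be passed to `torch.as_tensor`; the comprehension version is closer to the starter code's hints.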

### MFCC & Transcript Data

In [20]:
# DO NOT MAKE ANY CHANGES TO THIS CELL

# MFCCS: Shape [5 x 8]
X1 = np.asarray([[15, 2, 3,  0, -7, -15,  1, 10], 
                 [14, 3, 4, -1, -7, -12,  0,  8], 
                 [15, 4, 4, -4, -7,  -7,  0,  1], 
                 [16, 4, 4, -4, -8,  -4,  4,  0], 
                 [15, 2, 6, -1, -5,  -1, 10,  2]])

X2 = np.asarray([[14, 0, 7,  2, -9,  -6,  0,  2], 
                 [15, 1, 4,  3, -6,  -8,  1,  2], 
                 [15, 2, 2,  1, -6, -10,  2,  0], 
                 [16, 4, 3,  2, -8,  -9,  8,  2], 
                 [16, 2, 5,  0, -9,  -7,  9,  2]])

X3 = np.asarray([[16, 0, 4, -1, -6, 0, 4, -5], [16, 3, 4, 0, -6, 0, 7, -4], [16, 5, 4, 0, -5, -5, 0, 0], [17, 6, 6, -1, -5, -9, 1, 2], [16, 5, 6, -1, -5, -10, 0, 3]])
X4 = np.asarray([[15, 6, 10, 9, 2, -12, 3, 8], [14, 4, 11, 9, 6, -13, -1, 10], [13, 0, 13, 8, 6, -9, -3, 9], [14, -6, 15, 10, 7, -3, -6, 12], [13, -10, 16, 7, 0, -5, -9, 13]])
X5 = np.asarray([[14, 0, 8, -1, -5, -3, 6, 4], [15, 0, 9, 0, -4, -6, 0, 0], [15, 1, 12, 2, -3, -9, -2, 0], [17, 2, 7, 1, 0, -6, -2, -3], [17, 3, 5, 0, -2, -3, 4, -3]])
X6 = np.asarray([[15, -1, 2, 2, 0, -4, -2, 2], [16, 0, 5, 0, -5, -4, -1, 6], [16, 1, 3, 2, -3, -5, 1, 3], [16, 2, 6, 0, -8, -5, 2, 3], [16, 2, 6, 0, -6, -5, 2, -1]])

# TRANSCRIPTS
Y1 = np.asarray([9, 2, 19, 10, 27])
Y2 = np.asarray([15, 11, 21, 2, 9])
Y3 = np.asarray([9, 1, 30, 15, 11])
Y4 = np.asarray([29, 17, 6, 27, 3])
Y5 = np.asarray([2, 22, 8, 16, 30])
Y6 = np.asarray([13, 3, 27, 30, 10])


---

### Exercise 3

In [21]:
mfccs = {
    "mfcc_001": X1,
    "mfcc_002": X2,
    "mfcc_003": X3,
    "mfcc_004": X4,
    "mfcc_005": X5,
    "mfcc_006": X6
}

raw_data = {"mfccs": mfccs}

In [22]:
class ExampleDataset3(torch.utils.data.Dataset):
    
    def __init__(self, data):
        
        # Assign mfccs to self from data dict(1 line)
        self.mfccs_dict = data['mfccs']

        # Get file paths
        self.mfcc_files = sorted(self.mfccs_dict.keys())

        # Load files
        self.mfccs = []

        for i in range(len(self.mfcc_files)):
            # Load a single mfcc from mfccs_dict
            mfcc = self.mfccs_dict[self.mfcc_files[i]]
            self.mfccs.append(mfcc)
            
        # Assign Length of mfccs
        self.length = len(self.mfccs)

        
    def __len__(self):
        
        ### Return length (1 line)
        return self.length
    
    def __getitem__(self, index):
        
        ### Get mfcc at index (1 line)
        xx = self.mfccs[index]

        return xx
    
    def collate_fn(batch):
        
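        ### Convert batch to tensor (1 line)
        # Stack the list of arrays into one ndarray first; PyTorch warns that
        # creating a tensor directly from a list of ndarrays is slow.
        batch_x = torch.as_tensor(np.array(batch))
        
        ### Return batched data (1 line)
        return batch_x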

In [23]:
dataset3 = ExampleDataset3(raw_data)

dataloader3 = torch.utils.data.DataLoader(dataset3, 
                                          batch_size=2, 
                                          shuffle=False, 
                                          collate_fn=ExampleDataset3.collate_fn)

for i, batch in enumerate(dataloader3):
    print("Batch", i, ":\n", batch[0], "\n")

Batch 0 :
 tensor([[ 15,   2,   3,   0,  -7, -15,   1,  10],
        [ 14,   3,   4,  -1,  -7, -12,   0,   8],
        [ 15,   4,   4,  -4,  -7,  -7,   0,   1],
        [ 16,   4,   4,  -4,  -8,  -4,   4,   0],
        [ 15,   2,   6,  -1,  -5,  -1,  10,   2]]) 

Batch 1 :
 tensor([[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
        [ 16,   3,   4,   0,  -6,   0,   7,  -4],
        [ 16,   5,   4,   0,  -5,  -5,   0,   0],
        [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
        [ 16,   5,   6,  -1,  -5, -10,   0,   3]]) 

Batch 2 :
 tensor([[14,  0,  8, -1, -5, -3,  6,  4],
        [15,  0,  9,  0, -4, -6,  0,  0],
        [15,  1, 12,  2, -3, -9, -2,  0],
        [17,  2,  7,  1,  0, -6, -2, -3],
        [17,  3,  5,  0, -2, -3,  4, -3]]) 





---
#### Expected Output:

```
Batch 0 :
 tensor([[ 15,   2,   3,   0,  -7, -15,   1,  10],
        [ 14,   3,   4,  -1,  -7, -12,   0,   8],
        [ 15,   4,   4,  -4,  -7,  -7,   0,   1],
        [ 16,   4,   4,  -4,  -8,  -4,   4,   0],
        [ 15,   2,   6,  -1,  -5,  -1,  10,   2]]) 

Batch 1 :
 tensor([[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
        [ 16,   3,   4,   0,  -6,   0,   7,  -4],
        [ 16,   5,   4,   0,  -5,  -5,   0,   0],
        [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
        [ 16,   5,   6,  -1,  -5, -10,   0,   3]]) 

Batch 2 :
 tensor([[14,  0,  8, -1, -5, -3,  6,  4],
        [15,  0,  9,  0, -4, -6,  0,  0],
        [15,  1, 12,  2, -3, -9, -2,  0],
        [17,  2,  7,  1,  0, -6, -2, -3],
        [17,  3,  5,  0, -2, -3,  4, -3]]) 
```
---
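
Only three of the six MFCC arrays appear in the Exercise 3 output because the print statement shows `batch[0]`, the first sample of each batch, rather than the whole batch. A minimal sketch with placeholder names (strings standing in for the arrays `X1` to `X6`) shows which samples that selects when `batch_size=2`:

```python
# Strings stand in for the arrays X1..X6 (illustrative placeholders)
names = ["X1", "X2", "X3", "X4", "X5", "X6"]
batch_size = 2

# Consecutive chunks of batch_size, as produced with shuffle=False
batches = [names[i:i + batch_size] for i in range(0, len(names), batch_size)]

# batch[0] picks the first sample of every batch
shown = [b[0] for b in batches]

print(batches)
print(shown)
```

Printing `batch` instead of `batch[0]` would show a tensor of shape `[2, 5, 8]` for each batch.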

<hr style="border:2px solid gray"> </hr>

### Exercise 4

In [24]:
mfccs = {
    "mfcc_001": X1,
    "mfcc_002": X2,
    "mfcc_003": X3,
    "mfcc_004": X4,
    "mfcc_005": X5,
    "mfcc_006": X6
}

transcripts = {
    "transcript_001": Y1,
    "transcript_002": Y2,
    "transcript_003": Y3,
    "transcript_004": Y4,
    "transcript_005": Y5,
    "transcript_006": Y6
}

raw_data = {"mfccs": mfccs, "transcripts": transcripts}

In [25]:
class ExampleDataset4(torch.utils.data.Dataset):
    
    def __init__(self, data):
        
        # Assign mfccs to self from data dict(1 line)
        self.mfccs_dict = data['mfccs']

        # Assign transcripts to self from data dict(1 line)
        self.transcripts_dict = data['transcripts']

        # Get file paths
        self.mfcc_files = sorted(self.mfccs_dict.keys())
        self.transcript_files = sorted(self.transcripts_dict.keys())

        # Load files
        self.mfccs = []
        self.transcripts = []

        for i in range(len(self.mfcc_files)):
            # Load a single mfcc from mfccs_dict
            mfcc = self.mfccs_dict[self.mfcc_files[i]]
            self.mfccs.append(mfcc)
            
            # Load a single transcript from transcript files
            transcript = self.transcripts_dict[self.transcript_files[i]]
            self.transcripts.append(transcript)

        # Assign Length
        self.length = len(self.mfccs)

        # Sanity check for mfcc, transcript pairs
        assert len(self.mfccs) == len(self.transcripts)

        
    def __len__(self):
        
        ### Return length (1 line)
        return self.length
    
    def __getitem__(self, index):
        
        ## Get mfcc at index pair (1 line)
        xx = self.mfccs[index]

        ### Get transcript at index pair (1 line)
        yy = self.transcripts[index]
 
        ### Return data (1 line)
        return xx, yy
    
    def collate_fn(batch):
        
        ### Select all mfccs from batch (1 line)
        batch_x = [x for x,y in batch]
        
        ### Select all transcripts from batch (1 line)
        batch_y = [y for x,y in batch]
        
        ### Convert batched data and labels to tensors (2 lines)
        # np.array stacks each list of arrays into a single ndarray before conversion
        batch_x = torch.as_tensor(np.array(batch_x))
        batch_y = torch.as_tensor(np.array(batch_y))
        
        ### Return batched data and labels (1 line)
        return batch_x, batch_y

In [26]:
dataset4 = ExampleDataset4(raw_data)

dataloader4 = torch.utils.data.DataLoader(dataset4, 
                                          batch_size=2, 
                                          shuffle=False, 
                                          collate_fn=ExampleDataset4.collate_fn)

for i, batch in enumerate(dataloader4):
    print("Batch", i, ":\n", batch[0], "\n", batch[1], "\n")

Batch 0 :
 tensor([[[ 15,   2,   3,   0,  -7, -15,   1,  10],
         [ 14,   3,   4,  -1,  -7, -12,   0,   8],
         [ 15,   4,   4,  -4,  -7,  -7,   0,   1],
         [ 16,   4,   4,  -4,  -8,  -4,   4,   0],
         [ 15,   2,   6,  -1,  -5,  -1,  10,   2]],

        [[ 14,   0,   7,   2,  -9,  -6,   0,   2],
         [ 15,   1,   4,   3,  -6,  -8,   1,   2],
         [ 15,   2,   2,   1,  -6, -10,   2,   0],
         [ 16,   4,   3,   2,  -8,  -9,   8,   2],
         [ 16,   2,   5,   0,  -9,  -7,   9,   2]]]) 
 tensor([[ 9,  2, 19, 10, 27],
        [15, 11, 21,  2,  9]]) 

Batch 1 :
 tensor([[[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
         [ 16,   3,   4,   0,  -6,   0,   7,  -4],
         [ 16,   5,   4,   0,  -5,  -5,   0,   0],
         [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
         [ 16,   5,   6,  -1,  -5, -10,   0,   3]],

        [[ 15,   6,  10,   9,   2, -12,   3,   8],
         [ 14,   4,  11,   9,   6, -13,  -1,  10],
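         [ 13,   0,  13,   8,   6,  -9,  -3,   9],
         [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
         [ 13, -10,  16,   7,   0,  -5,  -9,  13]]]) 
 tensor([[ 9,  1, 30, 15, 11],
        [29, 17,  6, 27,  3]]) 

Batch 2 :
 tensor([[[14,  0,  8, -1, -5, -3,  6,  4],
         [15,  0,  9,  0, -4, -6,  0,  0],
         [15,  1, 12,  2, -3, -9, -2,  0],
         [17,  2,  7,  1,  0, -6, -2, -3],
         [17,  3,  5,  0, -2, -3,  4, -3]],

        [[15, -1,  2,  2,  0, -4, -2,  2],
         [16,  0,  5,  0, -5, -4, -1,  6],
         [16,  1,  3,  2, -3, -5,  1,  3],
         [16,  2,  6,  0, -8, -5,  2,  3],
         [16,  2,  6,  0, -6, -5,  2, -1]]]) 
 tensor([[ 2, 22,  8, 16, 30],
        [13,  3, 27, 30, 10]]) 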

---
#### Expected Output:

```
Batch 0 :
 tensor([[[ 15,   2,   3,   0,  -7, -15,   1,  10],
         [ 14,   3,   4,  -1,  -7, -12,   0,   8],
         [ 15,   4,   4,  -4,  -7,  -7,   0,   1],
         [ 16,   4,   4,  -4,  -8,  -4,   4,   0],
         [ 15,   2,   6,  -1,  -5,  -1,  10,   2]],

        [[ 14,   0,   7,   2,  -9,  -6,   0,   2],
         [ 15,   1,   4,   3,  -6,  -8,   1,   2],
         [ 15,   2,   2,   1,  -6, -10,   2,   0],
         [ 16,   4,   3,   2,  -8,  -9,   8,   2],
         [ 16,   2,   5,   0,  -9,  -7,   9,   2]]]) 
 tensor([[ 9,  2, 19, 10, 27],
        [15, 11, 21,  2,  9]]) 

Batch 1 :
 tensor([[[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
         [ 16,   3,   4,   0,  -6,   0,   7,  -4],
         [ 16,   5,   4,   0,  -5,  -5,   0,   0],
         [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
         [ 16,   5,   6,  -1,  -5, -10,   0,   3]],

        [[ 15,   6,  10,   9,   2, -12,   3,   8],
         [ 14,   4,  11,   9,   6, -13,  -1,  10],
         [ 13,   0,  13,   8,   6,  -9,  -3,   9],
         [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
         [ 13, -10,  16,   7,   0,  -5,  -9,  13]]]) 
 tensor([[ 9,  1, 30, 15, 11],
        [29, 17,  6, 27,  3]]) 

Batch 2 :
 tensor([[[14,  0,  8, -1, -5, -3,  6,  4],
         [15,  0,  9,  0, -4, -6,  0,  0],
         [15,  1, 12,  2, -3, -9, -2,  0],
         [17,  2,  7,  1,  0, -6, -2, -3],
         [17,  3,  5,  0, -2, -3,  4, -3]],

        [[15, -1,  2,  2,  0, -4, -2,  2],
         [16,  0,  5,  0, -5, -4, -1,  6],
         [16,  1,  3,  2, -3, -5,  1,  3],
         [16,  2,  6,  0, -8, -5,  2,  3],
         [16,  2,  6,  0, -6, -5,  2, -1]]]) 
 tensor([[ 2, 22,  8, 16, 30],
        [13,  3, 27, 30, 10]]) 
```
---
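
`ExampleDataset4` pairs MFCCs with transcripts purely by position in two independently sorted key lists. That works here because both sets of keys end in the same zero-padded IDs. The sketch below makes that assumption explicit; `paired_keys` is a hypothetical helper (not part of the starter code), and the `None` values are placeholders for the arrays.

```python
def paired_keys(mfccs, transcripts):
    """Pair keys by sorted position and check that their trailing IDs agree."""
    mfcc_files = sorted(mfccs.keys())
    transcript_files = sorted(transcripts.keys())
    assert len(mfcc_files) == len(transcript_files)

    pairs = []
    for m, t in zip(mfcc_files, transcript_files):
        # "mfcc_001" and "transcript_001" must share the suffix "001"
        assert m.split("_")[-1] == t.split("_")[-1], (m, t)
        pairs.append((m, t))
    return pairs


# Placeholder values; only the keys matter for this check
mfccs = {"mfcc_002": None, "mfcc_001": None, "mfcc_003": None}
transcripts = {"transcript_003": None, "transcript_001": None, "transcript_002": None}

pairs = paired_keys(mfccs, transcripts)
print(pairs)

# Why zero padding matters: string sorting is lexicographic, not numeric
print(sorted(["utt_2", "utt_10"]))
```

Without zero padding, `"utt_10"` sorts before `"utt_2"`, so a naming scheme that differs between the two dictionaries could silently misalign data and labels.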

<hr style="border:2px solid gray"> </hr>

### Exercise 5

In [27]:
mfccs = {
    "mfcc_001": X1,
    "mfcc_002": X2,
    "mfcc_003": X3,
    "mfcc_004": X4,
    "mfcc_005": X5,
    "mfcc_006": X6
}

raw_data = {"mfccs": mfccs}

In [28]:
class ExampleDataset5(torch.utils.data.Dataset):
    
    def __init__(self, data, offset, context):
        
        # Store each argument under its own name (the two were swapped)
        self.offset = offset
        self.context = context
        
        # Assign mfccs to self from data dict(1 line)
        self.mfccs_dict = data['mfccs']

        # Get file paths
        self.mfcc_files = sorted(self.mfccs_dict.keys())

        # Load files
        self.mfccs = []

        for i in range(len(self.mfcc_files)):
            # Load a single mfcc
            mfcc = self.mfccs_dict[self.mfcc_files[i]]
            self.mfccs.append(mfcc)
            
        # Assign Length
        self.length = len(self.mfccs)

        ### Zero pad data as-needed for context size = 1 (1-2 lines)
        self.mfccs = np.pad(self.mfccs, ((1, 1), (0, 0), (0, 0)), 'constant', constant_values=0)

        
    def __len__(self):
        
        ### Return length (1 line)
        return self.length
    
    def __getitem__(self, index):
        
        ### Calculate starting timestep using offset and context (1 line)
        start_i = index + self.offset - self.context
        
        ## Calculate ending timestep using offset and context (1 line)
        end_i = index + self.offset + self.context + 1

        ### Get data at index pair with context (1 line)
        xx = self.mfccs[start_i:end_i]

        ### Return data at index with context (1 line)
        return xx

    def collate_fn(batch):
        
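        ### Convert batch to tensor (1 line)
        # Stack the list of arrays into one ndarray first; PyTorch warns that
        # creating a tensor directly from a list of ndarrays is slow.
        batch_x = torch.as_tensor(np.array(batch))
        
        ### Return batched data (1 line)
        return batch_x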

In [29]:
dataset5 = ExampleDataset5(raw_data, offset=1, context=1)

dataloader5 = torch.utils.data.DataLoader(dataset5, 
                                          batch_size=2, 
                                          shuffle=False, 
                                          collate_fn=ExampleDataset5.collate_fn)

for i, batch in enumerate(dataloader5):
    print(f"Batch {i}:\nmfcc_shape: {batch[0].shape}\n\n{batch[0]}\n\n")

Batch 0:
mfcc_shape: torch.Size([3, 5, 8])

tensor([[[  0,   0,   0,   0,   0,   0,   0,   0],
         [  0,   0,   0,   0,   0,   0,   0,   0],
         [  0,   0,   0,   0,   0,   0,   0,   0],
         [  0,   0,   0,   0,   0,   0,   0,   0],
         [  0,   0,   0,   0,   0,   0,   0,   0]],

        [[ 15,   2,   3,   0,  -7, -15,   1,  10],
         [ 14,   3,   4,  -1,  -7, -12,   0,   8],
         [ 15,   4,   4,  -4,  -7,  -7,   0,   1],
         [ 16,   4,   4,  -4,  -8,  -4,   4,   0],
         [ 15,   2,   6,  -1,  -5,  -1,  10,   2]],

        [[ 14,   0,   7,   2,  -9,  -6,   0,   2],
         [ 15,   1,   4,   3,  -6,  -8,   1,   2],
         [ 15,   2,   2,   1,  -6, -10,   2,   0],
         [ 16,   4,   3,   2,  -8,  -9,   8,   2],
         [ 16,   2,   5,   0,  -9,  -7,   9,   2]]])


Batch 1:
mfcc_shape: torch.Size([3, 5, 8])

tensor([[[ 14,   0,   7,   2,  -9,  -6,   0,   2],
         [ 15,   1,   4,   3,  -6,  -8,   1,   2],
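         [ 15,   2,   2,   1,  -6, -10,   2,   0],
         [ 16,   4,   3,   2,  -8,  -9,   8,   2],
         [ 16,   2,   5,   0,  -9,  -7,   9,   2]],

        [[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
         [ 16,   3,   4,   0,  -6,   0,   7,  -4],
         [ 16,   5,   4,   0,  -5,  -5,   0,   0],
         [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
         [ 16,   5,   6,  -1,  -5, -10,   0,   3]],

        [[ 15,   6,  10,   9,   2, -12,   3,   8],
         [ 14,   4,  11,   9,   6, -13,  -1,  10],
         [ 13,   0,  13,   8,   6,  -9,  -3,   9],
         [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
         [ 13, -10,  16,   7,   0,  -5,  -9,  13]]])


Batch 2:
mfcc_shape: torch.Size([3, 5, 8])

tensor([[[ 15,   6,  10,   9,   2, -12,   3,   8],
         [ 14,   4,  11,   9,   6, -13,  -1,  10],
         [ 13,   0,  13,   8,   6,  -9,  -3,   9],
         [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
         [ 13, -10,  16,   7,   0,  -5,  -9,  13]],

        [[ 14,   0,   8,  -1,  -5,  -3,   6,   4],
         [ 15,   0,   9,   0,  -4,  -6,   0,   0],
         [ 15,   1,  12,   2,  -3,  -9,  -2,   0],
         [ 17,   2,   7,   1,   0,  -6,  -2,  -3],
         [ 17,   3,   5,   0,  -2,  -3,   4,  -3]],

        [[ 15,  -1,   2,   2,   0,  -4,  -2,   2],
         [ 16,   0,   5,   0,  -5,  -4,  -1,   6],
         [ 16,   1,   3,   2,  -3,  -5,   1,   3],
         [ 16,   2,   6,   0,  -8,  -5,   2,   3],
         [ 16,   2,   6,   0,  -6,  -5,   2,  -1]]])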

---
#### Expected Output:

```
Batch 0:
mfcc_shape: torch.Size([3, 5, 8])

tensor([[[  0,   0,   0,   0,   0,   0,   0,   0],
         [  0,   0,   0,   0,   0,   0,   0,   0],
         [  0,   0,   0,   0,   0,   0,   0,   0],
         [  0,   0,   0,   0,   0,   0,   0,   0],
         [  0,   0,   0,   0,   0,   0,   0,   0]],

        [[ 15,   2,   3,   0,  -7, -15,   1,  10],
         [ 14,   3,   4,  -1,  -7, -12,   0,   8],
         [ 15,   4,   4,  -4,  -7,  -7,   0,   1],
         [ 16,   4,   4,  -4,  -8,  -4,   4,   0],
         [ 15,   2,   6,  -1,  -5,  -1,  10,   2]],

        [[ 14,   0,   7,   2,  -9,  -6,   0,   2],
         [ 15,   1,   4,   3,  -6,  -8,   1,   2],
         [ 15,   2,   2,   1,  -6, -10,   2,   0],
         [ 16,   4,   3,   2,  -8,  -9,   8,   2],
         [ 16,   2,   5,   0,  -9,  -7,   9,   2]]])


Batch 1:
mfcc_shape: torch.Size([3, 5, 8])

tensor([[[ 14,   0,   7,   2,  -9,  -6,   0,   2],
         [ 15,   1,   4,   3,  -6,  -8,   1,   2],
         [ 15,   2,   2,   1,  -6, -10,   2,   0],
         [ 16,   4,   3,   2,  -8,  -9,   8,   2],
         [ 16,   2,   5,   0,  -9,  -7,   9,   2]],

        [[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
         [ 16,   3,   4,   0,  -6,   0,   7,  -4],
         [ 16,   5,   4,   0,  -5,  -5,   0,   0],
         [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
         [ 16,   5,   6,  -1,  -5, -10,   0,   3]],

        [[ 15,   6,  10,   9,   2, -12,   3,   8],
         [ 14,   4,  11,   9,   6, -13,  -1,  10],
         [ 13,   0,  13,   8,   6,  -9,  -3,   9],
         [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
         [ 13, -10,  16,   7,   0,  -5,  -9,  13]]])


Batch 2:
mfcc_shape: torch.Size([3, 5, 8])

tensor([[[ 15,   6,  10,   9,   2, -12,   3,   8],
         [ 14,   4,  11,   9,   6, -13,  -1,  10],
         [ 13,   0,  13,   8,   6,  -9,  -3,   9],
         [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
         [ 13, -10,  16,   7,   0,  -5,  -9,  13]],

        [[ 14,   0,   8,  -1,  -5,  -3,   6,   4],
         [ 15,   0,   9,   0,  -4,  -6,   0,   0],
         [ 15,   1,  12,   2,  -3,  -9,  -2,   0],
         [ 17,   2,   7,   1,   0,  -6,  -2,  -3],
         [ 17,   3,   5,   0,  -2,  -3,   4,  -3]],

        [[ 15,  -1,   2,   2,   0,  -4,  -2,   2],
         [ 16,   0,   5,   0,  -5,  -4,  -1,   6],
         [ 16,   1,   3,   2,  -3,  -5,   1,   3],
         [ 16,   2,   6,   0,  -8,  -5,   2,   3],
         [ 16,   2,   6,   0,  -6,  -5,   2,  -1]]])

```
---
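
The index arithmetic in `__getitem__` is easier to follow on a small one-dimensional example. The sketch below uses strings as stand-ins for the six MFCC arrays and `"0"` for a zero-padded entry; `window` is a hypothetical helper that mirrors the `start_i` and `end_i` formulas above, under the same setting as the exercise (`offset=1`, `context=1`, one padded entry on each side).

```python
def window(padded, index, offset, context):
    """Return the slice centred on `index` with `context` neighbours per side."""
    start_i = index + offset - context
    end_i = index + offset + context + 1
    return padded[start_i:end_i]


items = ["X1", "X2", "X3", "X4", "X5", "X6"]  # placeholders for the arrays
context = 1
offset = 1  # shifts original indices past the left padding

# Pad `context` entries on each side, like np.pad with ((1, 1), ...) above
padded = ["0"] * context + items + ["0"] * context

windows = [window(padded, i, offset, context) for i in range(len(items))]
for i, w in enumerate(windows):
    print(i, w)
```

Index 0 yields the left pad followed by `X1` and `X2`, and index 5 ends with the right pad, which is why the first and last samples contain a block of zeros.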

### Exercise 6

In [30]:
mfccs = {
    "mfcc_001": X1,
    "mfcc_002": X2,
    "mfcc_003": X3,
    "mfcc_004": X4,
    "mfcc_005": X5,
    "mfcc_006": X6
}

transcripts = {
    "transcript_001": Y1,
    "transcript_002": Y2,
    "transcript_003": Y3,
    "transcript_004": Y4,
    "transcript_005": Y5,
    "transcript_006": Y6
}

raw_data = {"mfccs": mfccs, "transcripts": transcripts}

In [36]:
class ExampleDataset6(torch.utils.data.Dataset):
    
    def __init__(self, data, offset, context):
        
        # Add context and offset to self (1-2 line)
        self.offset = offset
        self.context = context

        # Assign data to self (1 line)
        self.mfccs_dict = data['mfccs']
        self.transcripts_dict = data['transcripts']

        # Get file paths
        self.mfcc_files = sorted(self.mfccs_dict.keys())
        self.transcript_files = sorted(self.transcripts_dict.keys())

        # Load files
        self.mfccs = []
        self.transcripts = []

        for i in range(len(self.mfcc_files)):
            # Load a single mfcc
            mfcc = self.mfccs_dict[self.mfcc_files[i]]
            self.mfccs.append(mfcc)
            
            # Load a single transcript
            transcript = self.transcripts_dict[self.transcript_files[i]]
            self.transcripts.append(transcript)

        # Assign Length
        self.length = len(self.mfccs)

        ### Zero pad data as-needed for context size = 1 (1-2 lines)
        self.mfccs = np.pad(self.mfccs, ((1, 1), (0, 0), (0, 0)), 'constant', constant_values = 0)
        
        # Sanity check for mfcc, transcript pairs
        assert len(self.mfccs) == len(self.transcripts) + self.offset + self.context

        
    def __len__(self):
        
        ### Return length (1 line)
        return self.length
    
    def __getitem__(self, index):
        
        ### Calculate starting timestep using offset and context (1 line)
        start_i = index + self.offset - self.context
        
        ## Calculate ending timestep using offset and context (1 line)
        end_i = index + self.offset + self.context + 1

        ### Get mfcc at index pair with context (1 line)
        xx = self.mfccs[start_i:end_i]

        ### Get transcript at index pair (1 line)
        yy = self.transcripts[index]
        
        ### Return mfcc at index pair with context and transcript at index pair (1 line)
        return xx, yy
        
    def collate_fn(batch):
        
        ### Select all data from batch (1 line)
        batch_x = [x for x,y in batch]
        
        ### Select all labels from batch (1 line)
        batch_y = [y for x,y in batch]
        
        ### Convert batched data and labels to tensors (2 lines)
        # np.array stacks each list of arrays into a single ndarray before conversion
        batch_x = torch.as_tensor(np.array(batch_x))
        batch_y = torch.as_tensor(np.array(batch_y))
        
        ### Return batched data and labels (1 line)
        return batch_x, batch_y

In [37]:
dataset6 = ExampleDataset6(raw_data, offset=1, context=1)

dataloader6 = torch.utils.data.DataLoader(dataset6, 
                                          batch_size=2, 
                                          shuffle=False, 
                                          collate_fn=ExampleDataset6.collate_fn)

for i, batch in enumerate(dataloader6):
    print(f"Batch {i}:\nmfcc_shape: {batch[0].shape}, transcript_shape: {batch[1].shape}\n\n{batch[0]}\n{batch[1]}\n\n")

Batch 0:
mfcc_shape: torch.Size([2, 3, 5, 8]), transcript_shape: torch.Size([2, 5])

tensor([[[[  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0]],

         [[ 15,   2,   3,   0,  -7, -15,   1,  10],
          [ 14,   3,   4,  -1,  -7, -12,   0,   8],
          [ 15,   4,   4,  -4,  -7,  -7,   0,   1],
          [ 16,   4,   4,  -4,  -8,  -4,   4,   0],
          [ 15,   2,   6,  -1,  -5,  -1,  10,   2]],

         [[ 14,   0,   7,   2,  -9,  -6,   0,   2],
          [ 15,   1,   4,   3,  -6,  -8,   1,   2],
          [ 15,   2,   2,   1,  -6, -10,   2,   0],
          [ 16,   4,   3,   2,  -8,  -9,   8,   2],
          [ 16,   2,   5,   0,  -9,  -7,   9,   2]]],


        [[[ 15,   2,   3,   0,  -7, -15,   1,  10],
          [ 14,   3,   4,  -1,  -7, -12,   0,   8],
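          [ 15,   4,   4,  -4,  -7,  -7,   0,   1],
          [ 16,   4,   4,  -4,  -8,  -4,   4,   0],
          [ 15,   2,   6,  -1,  -5,  -1,  10,   2]],

         [[ 14,   0,   7,   2,  -9,  -6,   0,   2],
          [ 15,   1,   4,   3,  -6,  -8,   1,   2],
          [ 15,   2,   2,   1,  -6, -10,   2,   0],
          [ 16,   4,   3,   2,  -8,  -9,   8,   2],
          [ 16,   2,   5,   0,  -9,  -7,   9,   2]],

         [[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
          [ 16,   3,   4,   0,  -6,   0,   7,  -4],
          [ 16,   5,   4,   0,  -5,  -5,   0,   0],
          [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
          [ 16,   5,   6,  -1,  -5, -10,   0,   3]]]])
tensor([[ 9,  2, 19, 10, 27],
        [15, 11, 21,  2,  9]])


Batch 1:
mfcc_shape: torch.Size([2, 3, 5, 8]), transcript_shape: torch.Size([2, 5])

tensor([[[[ 14,   0,   7,   2,  -9,  -6,   0,   2],
          [ 15,   1,   4,   3,  -6,  -8,   1,   2],
          [ 15,   2,   2,   1,  -6, -10,   2,   0],
          [ 16,   4,   3,   2,  -8,  -9,   8,   2],
          [ 16,   2,   5,   0,  -9,  -7,   9,   2]],

         [[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
          [ 16,   3,   4,   0,  -6,   0,   7,  -4],
          [ 16,   5,   4,   0,  -5,  -5,   0,   0],
          [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
          [ 16,   5,   6,  -1,  -5, -10,   0,   3]],

         [[ 15,   6,  10,   9,   2, -12,   3,   8],
          [ 14,   4,  11,   9,   6, -13,  -1,  10],
          [ 13,   0,  13,   8,   6,  -9,  -3,   9],
          [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
          [ 13, -10,  16,   7,   0,  -5,  -9,  13]]],


        [[[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
          [ 16,   3,   4,   0,  -6,   0,   7,  -4],
          [ 16,   5,   4,   0,  -5,  -5,   0,   0],
          [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
          [ 16,   5,   6,  -1,  -5, -10,   0,   3]],

         [[ 15,   6,  10,   9,   2, -12,   3,   8],
          [ 14,   4,  11,   9,   6, -13,  -1,  10],
          [ 13,   0,  13,   8,   6,  -9,  -3,   9],
          [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
          [ 13, -10,  16,   7,   0,  -5,  -9,  13]],

         [[ 14,   0,   8,  -1,  -5,  -3,   6,   4],
          [ 15,   0,   9,   0,  -4,  -6,   0,   0],
          [ 15,   1,  12,   2,  -3,  -9,  -2,   0],
          [ 17,   2,   7,   1,   0,  -6,  -2,  -3],
          [ 17,   3,   5,   0,  -2,  -3,   4,  -3]]]])
tensor([[ 9,  1, 30, 15, 11],
        [29, 17,  6, 27,  3]])


Batch 2:
mfcc_shape: torch.Size([2, 3, 5, 8]), transcript_shape: torch.Size([2, 5])

tensor([[[[ 15,   6,  10,   9,   2, -12,   3,   8],
          [ 14,   4,  11,   9,   6, -13,  -1,  10],
          [ 13,   0,  13,   8,   6,  -9,  -3,   9],
          [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
          [ 13, -10,  16,   7,   0,  -5,  -9,  13]],

         [[ 14,   0,   8,  -1,  -5,  -3,   6,   4],
          [ 15,   0,   9,   0,  -4,  -6,   0,   0],
          [ 15,   1,  12,   2,  -3,  -9,  -2,   0],
          [ 17,   2,   7,   1,   0,  -6,  -2,  -3],
          [ 17,   3,   5,   0,  -2,  -3,   4,  -3]],

         [[ 15,  -1,   2,   2,   0,  -4,  -2,   2],
          [ 16,   0,   5,   0,  -5,  -4,  -1,   6],
          [ 16,   1,   3,   2,  -3,  -5,   1,   3],
          [ 16,   2,   6,   0,  -8,  -5,   2,   3],
          [ 16,   2,   6,   0,  -6,  -5,   2,  -1]]],


        [[[ 14,   0,   8,  -1,  -5,  -3,   6,   4],
          [ 15,   0,   9,   0,  -4,  -6,   0,   0],
          [ 15,   1,  12,   2,  -3,  -9,  -2,   0],
          [ 17,   2,   7,   1,   0,  -6,  -2,  -3],
          [ 17,   3,   5,   0,  -2,  -3,   4,  -3]],

         [[ 15,  -1,   2,   2,   0,  -4,  -2,   2],
          [ 16,   0,   5,   0,  -5,  -4,  -1,   6],
          [ 16,   1,   3,   2,  -3,  -5,   1,   3],
          [ 16,   2,   6,   0,  -8,  -5,   2,   3],
          [ 16,   2,   6,   0,  -6,  -5,   2,  -1]],

         [[  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0]]]])
tensor([[ 2, 22,  8, 16, 30],
        [13,  3, 27, 30, 10]])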

---
#### Expected Output:
```
Batch 0:
mfcc_shape: torch.Size([2, 3, 5, 8]), transcript_shape: torch.Size([2, 5])

tensor([[[[  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0]],

         [[ 15,   2,   3,   0,  -7, -15,   1,  10],
          [ 14,   3,   4,  -1,  -7, -12,   0,   8],
          [ 15,   4,   4,  -4,  -7,  -7,   0,   1],
          [ 16,   4,   4,  -4,  -8,  -4,   4,   0],
          [ 15,   2,   6,  -1,  -5,  -1,  10,   2]],

         [[ 14,   0,   7,   2,  -9,  -6,   0,   2],
          [ 15,   1,   4,   3,  -6,  -8,   1,   2],
          [ 15,   2,   2,   1,  -6, -10,   2,   0],
          [ 16,   4,   3,   2,  -8,  -9,   8,   2],
          [ 16,   2,   5,   0,  -9,  -7,   9,   2]]],


        [[[ 15,   2,   3,   0,  -7, -15,   1,  10],
          [ 14,   3,   4,  -1,  -7, -12,   0,   8],
          [ 15,   4,   4,  -4,  -7,  -7,   0,   1],
          [ 16,   4,   4,  -4,  -8,  -4,   4,   0],
          [ 15,   2,   6,  -1,  -5,  -1,  10,   2]],

         [[ 14,   0,   7,   2,  -9,  -6,   0,   2],
          [ 15,   1,   4,   3,  -6,  -8,   1,   2],
          [ 15,   2,   2,   1,  -6, -10,   2,   0],
          [ 16,   4,   3,   2,  -8,  -9,   8,   2],
          [ 16,   2,   5,   0,  -9,  -7,   9,   2]],

         [[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
          [ 16,   3,   4,   0,  -6,   0,   7,  -4],
          [ 16,   5,   4,   0,  -5,  -5,   0,   0],
          [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
          [ 16,   5,   6,  -1,  -5, -10,   0,   3]]]])
tensor([[ 9,  2, 19, 10, 27],
        [15, 11, 21,  2,  9]])


Batch 1:
mfcc_shape: torch.Size([2, 3, 5, 8]), transcript_shape: torch.Size([2, 5])

tensor([[[[ 14,   0,   7,   2,  -9,  -6,   0,   2],
          [ 15,   1,   4,   3,  -6,  -8,   1,   2],
          [ 15,   2,   2,   1,  -6, -10,   2,   0],
          [ 16,   4,   3,   2,  -8,  -9,   8,   2],
          [ 16,   2,   5,   0,  -9,  -7,   9,   2]],

         [[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
          [ 16,   3,   4,   0,  -6,   0,   7,  -4],
          [ 16,   5,   4,   0,  -5,  -5,   0,   0],
          [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
          [ 16,   5,   6,  -1,  -5, -10,   0,   3]],

         [[ 15,   6,  10,   9,   2, -12,   3,   8],
          [ 14,   4,  11,   9,   6, -13,  -1,  10],
          [ 13,   0,  13,   8,   6,  -9,  -3,   9],
          [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
          [ 13, -10,  16,   7,   0,  -5,  -9,  13]]],


        [[[ 16,   0,   4,  -1,  -6,   0,   4,  -5],
          [ 16,   3,   4,   0,  -6,   0,   7,  -4],
          [ 16,   5,   4,   0,  -5,  -5,   0,   0],
          [ 17,   6,   6,  -1,  -5,  -9,   1,   2],
          [ 16,   5,   6,  -1,  -5, -10,   0,   3]],

         [[ 15,   6,  10,   9,   2, -12,   3,   8],
          [ 14,   4,  11,   9,   6, -13,  -1,  10],
          [ 13,   0,  13,   8,   6,  -9,  -3,   9],
          [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
          [ 13, -10,  16,   7,   0,  -5,  -9,  13]],

         [[ 14,   0,   8,  -1,  -5,  -3,   6,   4],
          [ 15,   0,   9,   0,  -4,  -6,   0,   0],
          [ 15,   1,  12,   2,  -3,  -9,  -2,   0],
          [ 17,   2,   7,   1,   0,  -6,  -2,  -3],
          [ 17,   3,   5,   0,  -2,  -3,   4,  -3]]]])
tensor([[ 9,  1, 30, 15, 11],
        [29, 17,  6, 27,  3]])


Batch 2:
mfcc_shape: torch.Size([2, 3, 5, 8]), transcript_shape: torch.Size([2, 5])

tensor([[[[ 15,   6,  10,   9,   2, -12,   3,   8],
          [ 14,   4,  11,   9,   6, -13,  -1,  10],
          [ 13,   0,  13,   8,   6,  -9,  -3,   9],
          [ 14,  -6,  15,  10,   7,  -3,  -6,  12],
          [ 13, -10,  16,   7,   0,  -5,  -9,  13]],

         [[ 14,   0,   8,  -1,  -5,  -3,   6,   4],
          [ 15,   0,   9,   0,  -4,  -6,   0,   0],
          [ 15,   1,  12,   2,  -3,  -9,  -2,   0],
          [ 17,   2,   7,   1,   0,  -6,  -2,  -3],
          [ 17,   3,   5,   0,  -2,  -3,   4,  -3]],

         [[ 15,  -1,   2,   2,   0,  -4,  -2,   2],
          [ 16,   0,   5,   0,  -5,  -4,  -1,   6],
          [ 16,   1,   3,   2,  -3,  -5,   1,   3],
          [ 16,   2,   6,   0,  -8,  -5,   2,   3],
          [ 16,   2,   6,   0,  -6,  -5,   2,  -1]]],


        [[[ 14,   0,   8,  -1,  -5,  -3,   6,   4],
          [ 15,   0,   9,   0,  -4,  -6,   0,   0],
          [ 15,   1,  12,   2,  -3,  -9,  -2,   0],
          [ 17,   2,   7,   1,   0,  -6,  -2,  -3],
          [ 17,   3,   5,   0,  -2,  -3,   4,  -3]],

         [[ 15,  -1,   2,   2,   0,  -4,  -2,   2],
          [ 16,   0,   5,   0,  -5,  -4,  -1,   6],
          [ 16,   1,   3,   2,  -3,  -5,   1,   3],
          [ 16,   2,   6,   0,  -8,  -5,   2,   3],
          [ 16,   2,   6,   0,  -6,  -5,   2,  -1]],

         [[  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0],
          [  0,   0,   0,   0,   0,   0,   0,   0]]]])
tensor([[ 2, 22,  8, 16, 30],
        [13,  3, 27, 30, 10]])

```
---
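
The shapes reported in Exercise 6 follow directly from the loader settings: each sample is a stack of `2 * context + 1` MFCC arrays of shape `[5, 8]`, and `batch_size` such samples are stacked per batch. The helper names below are hypothetical; this is a small arithmetic check, not part of the assignment code.

```python
import math


def expected_mfcc_batch_shape(batch_size, context, frames, coeffs):
    """Shape of one collated MFCC batch: (batch, window, frames, coefficients)."""
    return (batch_size, 2 * context + 1, frames, coeffs)


def num_batches(num_samples, batch_size):
    """Number of batches when the last, possibly smaller, batch is kept."""
    return math.ceil(num_samples / batch_size)


# Six utterances of shape [5, 8], batch_size=2, context=1 as in the exercise
print(expected_mfcc_batch_shape(2, 1, 5, 8))
print(num_batches(6, 2))
```

With these settings the result is a `(2, 3, 5, 8)` batch and three batches in total, matching the `mfcc_shape` lines in the expected output.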

### Exercise 7

In [38]:
!nvidia-smi

Tue Jan 31 03:42:17 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   52C    P0    25W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

---
#### Expected Output (your result should look similar, but not exactly the same):
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P8     9W /  N/A |      5MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       970      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
```
---
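
Outside a notebook, a similar check can be sketched with Python's standard library: look for the `nvidia-smi` executable on `PATH` and run it only if it exists. `gpu_report` is a hypothetical helper name, and the fallback message is illustrative.

```python
import shutil
import subprocess


def gpu_report():
    """Return nvidia-smi's text output, or None if the tool is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


report = gpu_report()
if report is None:
    print("nvidia-smi not found or failed; no CUDA GPU is visible from here")
else:
    print(report)
```

From within PyTorch, `torch.cuda.is_available()` reports whether a CUDA device can actually be used.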