### Running Your Code in Colab

We recommend developing locally and then running your code in Colab. To do that, upload your code to Google Drive.

For experiments with a batch size greater than one, switch the runtime to the T4 GPU to get a *significant* speedup. Without it, the experiments will not finish in a reasonable amount of time.
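Before starting a long run, it can help to confirm that the runtime actually exposes a GPU. The sketch below assumes the homework code uses PyTorch (the model printouts suggest so); `detect_device` is a hypothetical helper, not part of the handout.

```python
import importlib.util


def detect_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu".

    If PyTorch is not installed at all, say so instead of raising.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check above can run first

    return "cuda" if torch.cuda.is_available() else "cpu"


print("Using device:", detect_device())
```

If this prints `cpu`, the GPU runtime is probably not selected, and the full-data experiments will be much slower.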

Make sure to [download](https://drive.google.com/file/d/1mECKLG3NWH9uwFgAYRIdAKrABbTGkbVq/view?usp=sharing) the full training data.

In [1]:
# Mount your drive to access files in your Google Drive
from google.colab import drive

drive.mount("/content/drive")

Mounted at /content/drive


In [2]:
# cd into the directory where your code is
# e.g. %cd /content/drive/MyDrive/cmu/10-301/hw7/handout
%cd /content/drive/MyDrive/stuff/10601-HW7

/content/drive/MyDrive/stuff/10601-HW7


In [3]:
# Set your data paths. Change this if the data is in a different location
tiny_train_stories = "data/tiny_train_stories.json"
tiny_valid_stories = "data/tiny_valid_stories.json"

full_train_stories = "data/HW7_large_stories/train_stories.json"
full_valid_stories = "data/HW7_large_stories/valid_stories.json"
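A mistyped path only shows up once `rnn.py` tries to load the data, so a quick existence check can save a failed run. This is a minimal sketch; `missing_paths` is a hypothetical helper, and the paths are the ones set in the cell above.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the subset of `paths` that do not point at an existing file."""
    return [p for p in paths if not Path(p).is_file()]


data_paths = [
    "data/tiny_train_stories.json",
    "data/tiny_valid_stories.json",
    "data/HW7_large_stories/train_stories.json",
    "data/HW7_large_stories/valid_stories.json",
]

missing = missing_paths(data_paths)
if missing:
    print("Missing data files:", missing)
else:
    print("All data files found.")
```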

#### Testing

In [10]:
!python test_rnn.py

Testing SelfAttention Test Case 1...Passed
.Testing SelfAttention Test Case 2...Passed
.Testing RNN Test Case 1...Passed
.Testing RNN Test Case 2...Passed
.Testing RNNCell Test Case 1...Passed
.Testing RNNCell Test Case 2...Passed
.Using device: cuda
Traceback (most recent call last):
  File "/content/drive/MyDrive/stuff/10601-HW7/test_rnn.py", line 221, in <module>
    unittest.main()
  File "/usr/lib/python3.10/unittest/main.py", line 101, in __init__
    self.runTests()
  File "/usr/lib/python3.10/unittest/main.py", line 271, in runTests
    self.result = testRunner.run(self.test)
  File "/usr/lib/python3.10/unittest/runner.py", line 184, in run
    test(result)
  File "/usr/lib/python3.10/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python3.10/unittest/suite.py", line 122, in run
    test(result)
  File "/usr/lib/python3.10/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python3.10/unittest

In [5]:
!python rnn.py --train_data {tiny_train_stories} --val_data {tiny_valid_stories} --embed_dim 64 --hidden_dim 128 --train_losses_out train_loss.txt --val_losses_out valid_loss.txt --metrics_out metrics.txt --dk 32 --dv 32 --num_sequences 128 --batch_size 8

Using device: cpu
RNNLanguageModel(
  (embeddings): Embedding(1024, 64)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=64, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=32, bias=True)
    (key_transform): Linear(in_features=128, out_features=32, bias=True)
    (value_transform): Linear(in_features=128, out_features=32, bias=True)
    (output_transform): Linear(in_features=32, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  255584
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.9e-05
Batch: 1 | Sequence Length: 128 | Elapsed time (minutes): 0.012847
Batch: 2 | Sequence Length: 128 | Elapsed time (minutes): 
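The `Number of Parameters` line can be checked by hand from the printed layer shapes: each `Linear` layer with `bias=True` contributes `in_features * out_features + out_features` parameters, and the embedding contributes `vocab * embed_dim`. The sketch below reproduces that arithmetic; the function names are illustrative and not taken from `rnn.py`.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one Linear layer with bias=True."""
    return in_features * out_features + out_features


def count_parameters(vocab: int, embed_dim: int, hidden_dim: int, dk: int, dv: int) -> int:
    """Parameter count for the layer layout shown in the model printout."""
    embeddings = vocab * embed_dim
    rnn = (
        linear_params(embed_dim, hidden_dim)     # i2h
        + linear_params(hidden_dim, hidden_dim)  # h2h
        + linear_params(hidden_dim, hidden_dim)  # out
    )
    attention = (
        linear_params(hidden_dim, dk)    # query_transform
        + linear_params(hidden_dim, dk)  # key_transform
        + linear_params(hidden_dim, dv)  # value_transform
        + linear_params(dv, hidden_dim)  # output_transform
    )
    lm_head = linear_params(hidden_dim, vocab)
    return embeddings + rnn + attention + lm_head


# Tiny-data run: embed_dim=64, hidden_dim=128, dk=dv=32
print(count_parameters(1024, 64, 128, 32, 32))  # 255584
```

The same function gives 177792 for `embed_dim = hidden_dim = 64` with `dk = dv = 128`, matching the first 5.1 run.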

#### 5.1

Uncomment the `embed_hidden_dims` value for the run you want and leave the others commented out. Each cell below runs one setting.
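As an alternative to editing comments by hand, the four commands can be generated in a loop so that the output filenames always match the dimension. This is only a sketch: `build_command` is a hypothetical helper, and the flags are copied from the cells in this section.

```python
import shlex


def build_command(dim: int, train_data: str, val_data: str) -> str:
    """Build the rnn.py command line for one embed/hidden dimension."""
    args = [
        "python", "rnn.py",
        "--train_data", train_data,
        "--val_data", val_data,
        "--embed_dim", str(dim),
        "--hidden_dim", str(dim),
        "--train_losses_out", f"train_losses_51_{dim}.txt",
        "--val_losses_out", f"val_losses_51_{dim}.txt",
        "--metrics_out", "metrics.txt",
        "--dk", "128",
        "--dv", "128",
        "--num_sequences", "50000",
        "--batch_size", "128",
    ]
    return shlex.join(args)


for dim in (64, 128, 256, 512):
    print(build_command(
        dim,
        "data/HW7_large_stories/train_stories.json",
        "data/HW7_large_stories/valid_stories.json",
    ))
```

Each printed line can be pasted after a `!` in a Colab cell.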

In [4]:
embed_hidden_dims = 64
# embed_hidden_dims = 128
# embed_hidden_dims = 256
# embed_hidden_dims = 512
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim {embed_hidden_dims} --hidden_dim {embed_hidden_dims} --train_losses_out train_losses_51_64.txt --val_losses_out val_losses_51_64.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences 50000 --batch_size 128

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 64)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=64, out_features=64, bias=True)
      (h2h): Linear(in_features=64, out_features=64, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=64, out_features=64, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=64, out_features=128, bias=True)
    (key_transform): Linear(in_features=64, out_features=128, bias=True)
    (value_transform): Linear(in_features=64, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=64, bias=True)
  )
  (lm_head): Linear(in_features=64, out_features=1024, bias=True)
)
Number of Parameters:  177792
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 0.000243
Batch: 39 | Sequence Length: 128 | Elapsed time (minutes): 0.23683
Batch: 78 | Sequence Length: 128 | Elapsed time (minutes): 0.4

In [5]:
# embed_hidden_dims = 64
embed_hidden_dims = 128
# embed_hidden_dims = 256
# embed_hidden_dims = 512
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim {embed_hidden_dims} --hidden_dim {embed_hidden_dims} --train_losses_out train_losses_51_128.txt --val_losses_out val_losses_51_128.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences 50000 --batch_size 128

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.6e-05
Batch: 39 | Sequence Length: 128 | Elapsed time (minutes): 0.230774
Batch: 78 | Sequence Length: 128 | Elapsed time (m

In [6]:
# embed_hidden_dims = 64
# embed_hidden_dims = 128
embed_hidden_dims = 256
# embed_hidden_dims = 512
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim {embed_hidden_dims} --hidden_dim {embed_hidden_dims} --train_losses_out train_losses_51_256.txt --val_losses_out val_losses_51_256.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences 50000 --batch_size 128

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 256)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=256, out_features=256, bias=True)
      (h2h): Linear(in_features=256, out_features=256, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=256, out_features=256, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=256, out_features=128, bias=True)
    (key_transform): Linear(in_features=256, out_features=128, bias=True)
    (value_transform): Linear(in_features=256, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=256, bias=True)
  )
  (lm_head): Linear(in_features=256, out_features=1024, bias=True)
)
Number of Parameters:  854400
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.4e-05
Batch: 39 | Sequence Length: 128 | Elapsed time (minutes): 0.339556
Batch: 78 | Sequence Length: 128 | Elapsed time (m

In [7]:
# embed_hidden_dims = 64
# embed_hidden_dims = 128
# embed_hidden_dims = 256
embed_hidden_dims = 512
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim {embed_hidden_dims} --hidden_dim {embed_hidden_dims} --train_losses_out train_losses_51_512.txt --val_losses_out val_losses_51_512.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences 50000 --batch_size 128

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 512)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=512, out_features=512, bias=True)
      (h2h): Linear(in_features=512, out_features=512, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=512, out_features=512, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=512, out_features=128, bias=True)
    (key_transform): Linear(in_features=512, out_features=128, bias=True)
    (value_transform): Linear(in_features=512, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=512, bias=True)
  )
  (lm_head): Linear(in_features=512, out_features=1024, bias=True)
)
Number of Parameters:  2100608
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.5e-05
Batch: 39 | Sequence Length: 128 | Elapsed time (minutes): 0.570004
Batch: 78 | Sequence Length: 128 | Elapsed time (

#### 5.2

In [8]:
batch_size = 32
# batch_size = 64
# batch_size = 128
# batch_size = 256
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 128 --hidden_dim 128 --train_losses_out 52_32_train_losses.txt --val_losses_out 52_32_val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences 50000 --batch_size {batch_size}

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.4e-05
Batch: 156 | Sequence Length: 128 | Elapsed time (minutes): 0.804181
Batch: 312 | Sequence Length: 128 | Elapsed time 

In [9]:
# batch_size = 32
batch_size = 64
# batch_size = 128
# batch_size = 256
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 128 --hidden_dim 128 --train_losses_out 52_64_train_losses.txt --val_losses_out 52_64_val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences 50000 --batch_size {batch_size}

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.7e-05
Batch: 78 | Sequence Length: 128 | Elapsed time (minutes): 0.419103
Batch: 156 | Sequence Length: 128 | Elapsed time (

In [10]:
# batch_size = 32
# batch_size = 64
batch_size = 128
# batch_size = 256
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 128 --hidden_dim 128 --train_losses_out 52_128_train_losses.txt --val_losses_out 52_128_val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences 50000 --batch_size {batch_size}

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.5e-05
Batch: 39 | Sequence Length: 128 | Elapsed time (minutes): 0.235569
Batch: 78 | Sequence Length: 128 | Elapsed time (m

In [11]:
# batch_size = 32
# batch_size = 64
# batch_size = 128
batch_size = 256
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 128 --hidden_dim 128 --train_losses_out 52_256_train_losses.txt --val_losses_out 52_256_val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences 50000 --batch_size {batch_size}

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.8e-05
Batch: 19 | Sequence Length: 128 | Elapsed time (minutes): 0.183312
Batch: 38 | Sequence Length: 128 | Elapsed time (m
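The `Batch:` lines in these logs appear at intervals of 39, 156, 78, and 19 batches, which is consistent with printing roughly ten times per run. The sketch below reproduces those intervals under the assumption that the script uses floor division twice; this is inferred from the logs and is not confirmed by `rnn.py`, and `log_interval` is a hypothetical name.

```python
def log_interval(num_sequences: int, batch_size: int) -> int:
    """Guess at the logging interval: about one tenth of the batches per run."""
    num_batches = num_sequences // batch_size
    return max(1, num_batches // 10)


for batch_size in (32, 64, 128, 256):
    print(batch_size, log_interval(50000, batch_size))
```

Multiplying the elapsed time at the first logged interval by ten gives a rough estimate of the total run time.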

#### 5.3

In [12]:
num_sequences = 10000
# num_sequences = 20000
# num_sequences = 50000
# num_sequences = 100000
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 128 --hidden_dim 128 --train_losses_out 53_10000_train_losses.txt --val_losses_out 53_10000_val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences {num_sequences} --batch_size 128

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.5e-05
Batch: 7 | Sequence Length: 128 | Elapsed time (minutes): 0.068427
Batch: 14 | Sequence Length: 128 | Elapsed time (mi

In [13]:
# num_sequences = 10000
num_sequences = 20000
# num_sequences = 50000
# num_sequences = 100000
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 128 --hidden_dim 128 --train_losses_out 53_20000_train_losses.txt --val_losses_out 53_20000_val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences {num_sequences} --batch_size 128

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.4e-05
Batch: 15 | Sequence Length: 128 | Elapsed time (minutes): 0.099915
Batch: 30 | Sequence Length: 128 | Elapsed time (m

In [14]:
# num_sequences = 10000
# num_sequences = 20000
num_sequences = 50000
# num_sequences = 100000
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 128 --hidden_dim 128 --train_losses_out 53_50000_train_losses.txt --val_losses_out 53_50000_val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences {num_sequences} --batch_size 128

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 2e-05
Batch: 39 | Sequence Length: 128 | Elapsed time (minutes): 0.229975
Batch: 78 | Sequence Length: 128 | Elapsed time (min

In [15]:
# num_sequences = 10000
# num_sequences = 20000
# num_sequences = 50000
num_sequences = 100000
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 128 --hidden_dim 128 --train_losses_out 53_100000_train_losses.txt --val_losses_out 53_100000_val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences {num_sequences} --batch_size 128

Using device: cuda
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 1.4e-05
Batch: 78 | Sequence Length: 128 | Elapsed time (minutes): 0.451161
Batch: 156 | Sequence Length: 128 | Elapsed time (

#### 5.4



In [None]:
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 512 --hidden_dim 512 --train_losses_out train_losses.txt --val_losses_out val_losses.txt --metrics_out metrics.txt --dk 256 --dv 256 --num_sequences 250000 --batch_size 128

In [1]:
files = {
    64: 'train_losses_51_64.txt',
    128: 'train_losses_51_128.txt',
    256: 'train_losses_51_256.txt',
    512: 'train_losses_51_512.txt'
}

SyntaxError: unterminated string literal (detected at line 2) (<ipython-input-1-d998cc55eb01>, line 2)
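Once the 5.1 runs have finished, the loss files can be read back for comparison. The sketch below assumes each file holds whitespace- or newline-separated floats, which is a guess about what `rnn.py` writes; `parse_losses` and `load_available` are hypothetical helpers. Files that do not exist yet are skipped.

```python
from pathlib import Path


def parse_losses(text: str) -> list:
    """Parse whitespace-separated loss values into a list of floats."""
    return [float(token) for token in text.split()]


def load_available(files: dict) -> dict:
    """Load every loss file that exists, keyed by the same keys as `files`."""
    curves = {}
    for key, name in files.items():
        path = Path(name)
        if path.is_file():
            curves[key] = parse_losses(path.read_text())
    return curves


files = {
    64: "train_losses_51_64.txt",
    128: "train_losses_51_128.txt",
    256: "train_losses_51_256.txt",
    512: "train_losses_51_512.txt",
}

for dim, losses in load_available(files).items():
    print(dim, "final training loss:", losses[-1] if losses else "n/a")
```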