### Running Your Code in Colab

We recommend developing locally and then running your code in Colab. To do so, upload your code to Google Drive.

For experiments with a batch size greater than one, select the T4 GPU runtime to get a *significant* speedup. This is crucial for finishing the experiments in a reasonable amount of time.
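
Before launching a long run, it can help to confirm that the runtime actually has a GPU attached. The sketch below is one way to check. It assumes the `nvidia-smi` tool is on the PATH whenever an NVIDIA GPU such as the T4 is attached; `describe_gpu` is a hypothetical helper and not part of the handout.

```python
import shutil
import subprocess


def describe_gpu() -> str:
    """Return the GPU name reported by nvidia-smi, or a hint if none is visible."""
    hint = "No GPU visible. In Colab, try Runtime > Change runtime type > T4 GPU."
    # Assumption: nvidia-smi is only available when an NVIDIA GPU is attached.
    if shutil.which("nvidia-smi") is None:
        return hint
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    name = result.stdout.strip()
    if result.returncode != 0 or not name:
        return hint
    return name


print(describe_gpu())
```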

Make sure to [download](https://drive.google.com/file/d/1mECKLG3NWH9uwFgAYRIdAKrABbTGkbVq/view?usp=sharing) the full training data.

In [None]:
# Mount your drive to access files in your Google Drive
from google.colab import drive

drive.mount("/content/drive")

ModuleNotFoundError: No module named 'google'

In [None]:
# cd into the directory where your code is
# e.g. %cd /content/drive/MyDrive/cmu/10-301/hw7/handout
%cd /content/drive/MyDrive/<path to your code>

In [1]:
# Set your data paths. Change this if the data is in a different location
tiny_train_stories = "data/tiny_train_stories.json"
tiny_valid_stories = "data/tiny_valid_stories.json"

full_train_stories = "data/HW7_large_stories/train_stories.json"
full_valid_stories = "data/HW7_large_stories/valid_stories.json"
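
A wrong data path only surfaces after the model has been built, so a quick existence check can save a wasted run. This is a minimal sketch that repeats the paths from the cell above; `missing_files` is a hypothetical helper, and the paths are relative to the directory you changed into with `%cd`.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Same paths as in the cell above; adjust if your data lives elsewhere.
data_paths = [
    "data/tiny_train_stories.json",
    "data/tiny_valid_stories.json",
    "data/HW7_large_stories/train_stories.json",
    "data/HW7_large_stories/valid_stories.json",
]

missing = missing_files(data_paths)
if missing:
    print("Missing data files:")
    for path in missing:
        print("  ", path)
else:
    print("All data files found.")
```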

#### Testing

In [2]:
!python test_rnn.py

Testing SelfAttention Test Case 1...Passed
.Testing SelfAttention Test Case 2...Passed
.Testing RNN Test Case 1...Passed
.Testing RNN Test Case 2...Passed
.Testing RNNCell Test Case 1...Passed
.Testing RNNCell Test Case 2...Passed
.
----------------------------------------------------------------------
Ran 6 tests in 0.009s

OK


In [2]:
!python rnn.py --train_data {tiny_train_stories} --val_data {tiny_valid_stories} --embed_dim 64 --hidden_dim 128 --train_losses_out train_loss.txt --val_losses_out valid_loss.txt --metrics_out metrics.txt --dk 32 --dv 32 --num_sequences 128 --batch_size 8

Using device: mps
RNNLanguageModel(
  (embeddings): Embedding(1024, 64)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=64, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=32, bias=True)
    (key_transform): Linear(in_features=128, out_features=32, bias=True)
    (value_transform): Linear(in_features=128, out_features=32, bias=True)
    (output_transform): Linear(in_features=32, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  255584
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 7e-06
Batch: 1 | Sequence Length: 128 | Elapsed time (minutes): 0.128612
Batch: 2 | Sequence Length: 128 | Elapsed time (minutes): 0.
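
The `Number of Parameters` line can be checked by hand from the printed model summary, which is a useful sanity check when you change `--embed_dim`, `--hidden_dim`, `--dk`, or `--dv`. The sketch below assumes the layer layout shown in the summary above (every `Linear` has a bias, and the query and key projections use `dk` while the value projection uses `dv`); `linear_params` and `count_parameters` are hypothetical helpers, not functions from `rnn.py`.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a Linear layer with bias=True."""
    return in_features * out_features + out_features


def count_parameters(vocab_size: int, embed_dim: int, hidden_dim: int, dk: int, dv: int) -> int:
    """Count parameters for the layer layout printed in the model summary."""
    embeddings = vocab_size * embed_dim
    rnn = (
        linear_params(embed_dim, hidden_dim)     # i2h
        + linear_params(hidden_dim, hidden_dim)  # h2h
        + linear_params(hidden_dim, hidden_dim)  # out
    )
    attention = (
        linear_params(hidden_dim, dk)    # query_transform
        + linear_params(hidden_dim, dk)  # key_transform
        + linear_params(hidden_dim, dv)  # value_transform
        + linear_params(dv, hidden_dim)  # output_transform
    )
    lm_head = linear_params(hidden_dim, vocab_size)
    return embeddings + rnn + attention + lm_head


# Configuration of the tiny run above: embed 64, hidden 128, dk = dv = 32.
print(count_parameters(1024, 64, 128, 32, 32))  # 255584
```

The result matches the 255584 reported in the output above.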

In [None]:
from rnn import *

lm = torch.load("model.pt")
test_str = ["Once upon a time there was a"]


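def complete(prefix, num_tokens=64, temperature=0.0):
    """Generate a completion for `prefix` with the loaded language model."""
    lm.eval()  # switch the model to inference mode

    # Tokenize the prefix and move it to the same device as the model
    input_ids = tokenizer.encode(prefix, add_special_tokens=False, return_tensors="pt")
    input_ids = input_ids.to(device)
    output = lm.generate(input_ids, max_tokens=num_tokens, temperature=temperature)

    return tokenizer.decode(output)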


for ts in test_str:
    completion = complete(ts, num_tokens=64, temperature=0.3)
    print("  Test prefix:", ts)
    print("  Test output:", completion)

NameError: name 'lm' is not defined
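
The `temperature` argument controls how sharply sampling favors the most likely token. The sketch below illustrates the common convention of dividing the logits by the temperature before the softmax; it is a standalone illustration in plain Python, and how `lm.generate` uses the value depends on your implementation in `rnn.py`. The function name and the example logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature (common convention, assumed here)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.0]
for temperature in (1.0, 0.3):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, so a setting such as 0.3 tends to produce more repeatable completions than 1.0.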

#### 5.1

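Uncomment the `embed_hidden_dims` value for the experiment you are running and comment out the others. The same value is passed to both `--embed_dim` and `--hidden_dim`.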

In [13]:
embed_hidden_dims = 64
# embed_hidden_dims = 128
# embed_hidden_dims = 256
# embed_hidden_dims = 512
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim {embed_hidden_dims} --hidden_dim {embed_hidden_dims} --train_losses_out train_losses.txt --val_losses_out val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences 50000 --batch_size 128

Using device: mps
RNNLanguageModel(
  (embeddings): Embedding(1024, 64)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=64, out_features=64, bias=True)
      (h2h): Linear(in_features=64, out_features=64, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=64, out_features=64, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=64, out_features=128, bias=True)
    (key_transform): Linear(in_features=64, out_features=128, bias=True)
    (value_transform): Linear(in_features=64, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=64, bias=True)
  )
  (lm_head): Linear(in_features=64, out_features=1024, bias=True)
)
Number of Parameters:  177792
Loading data
Finished Loading Dataset
Batch: 0 | Sequence Length: 128 | Elapsed time (minutes): 8.7e-05
Batch: 39 | Sequence Length: 128 | Elapsed time (minutes): 0.476304
Batch: 78 | Sequence Length: 128 | Elapsed time (minutes): 0.78
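
Every run in this section writes to the same `train_losses.txt`, `val_losses.txt`, and `metrics.txt`, so each new setting overwrites the previous results. One way to keep them apart is to give each run its own output files. The sketch below only builds and prints the commands; the flags are the ones used in the cell above, while `build_command` and the `_dim<N>` file-name suffix are hypothetical choices, so check them against whatever file names your write-up needs.

```python
import shlex


def build_command(embed_hidden_dims: int) -> list:
    """Build the rnn.py command for one setting with per-run output files."""
    tag = f"dim{embed_hidden_dims}"  # hypothetical naming scheme
    return [
        "python", "rnn.py",
        "--train_data", "data/HW7_large_stories/train_stories.json",
        "--val_data", "data/HW7_large_stories/valid_stories.json",
        "--embed_dim", str(embed_hidden_dims),
        "--hidden_dim", str(embed_hidden_dims),
        "--train_losses_out", f"train_losses_{tag}.txt",
        "--val_losses_out", f"val_losses_{tag}.txt",
        "--metrics_out", f"metrics_{tag}.txt",
        "--dk", "128",
        "--dv", "128",
        "--num_sequences", "50000",
        "--batch_size", "128",
    ]


for dims in (64, 128, 256, 512):
    # Print each command; paste it after "!" in a cell to run it.
    print(shlex.join(build_command(dims)))
```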

#### 5.2

In [8]:
batch_size = 32
# batch_size = 64
# batch_size = 128
# batch_size = 256
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 128 --hidden_dim 128 --train_losses_out train_losses.txt --val_losses_out val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences 50000 --batch_size {batch_size}

Using device: mps
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Traceback (most recent call last):
  File "/Users/jpthek9/repos/10601-HW7/rnn.py", line 724, in <module>
    main(args)
  File "/Users/jpthek9/repos/10601-HW7/rnn.py", line 608, in main
    train_data = Sentenc
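
Changing the batch size also changes how many parameter updates happen in one pass over the same 50000 sequences, which is worth keeping in mind when comparing loss curves across these runs. The arithmetic below is a small sketch that assumes the final, partially filled batch is kept rather than dropped; `batches_per_pass` is a hypothetical helper.

```python
import math


def batches_per_pass(num_sequences: int, batch_size: int) -> int:
    """Number of batches in one pass, assuming the last partial batch is kept."""
    return math.ceil(num_sequences / batch_size)


num_sequences = 50000
for batch_size in (32, 64, 128, 256):
    print(batch_size, batches_per_pass(num_sequences, batch_size))
```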

#### 5.3

In [9]:
num_sequences = 10000
# num_sequences = 20000
# num_sequences = 50000
# num_sequences = 100000
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 128 --hidden_dim 128 --train_losses_out train_losses.txt --val_losses_out val_losses.txt --metrics_out metrics.txt --dk 128 --dv 128 --num_sequences {num_sequences} --batch_size 128

Using device: mps
RNNLanguageModel(
  (embeddings): Embedding(1024, 128)
  (rnn): RNN(
    (cell): RNNCell(
      (i2h): Linear(in_features=128, out_features=128, bias=True)
      (h2h): Linear(in_features=128, out_features=128, bias=True)
      (activation): ReLU()
    )
    (out): Linear(in_features=128, out_features=128, bias=True)
  )
  (attention): SelfAttention(
    (query_transform): Linear(in_features=128, out_features=128, bias=True)
    (key_transform): Linear(in_features=128, out_features=128, bias=True)
    (value_transform): Linear(in_features=128, out_features=128, bias=True)
    (output_transform): Linear(in_features=128, out_features=128, bias=True)
  )
  (lm_head): Linear(in_features=128, out_features=1024, bias=True)
)
Number of Parameters:  378752
Loading data
Traceback (most recent call last):
  File "/Users/jpthek9/repos/10601-HW7/rnn.py", line 724, in <module>
    main(args)
  File "/Users/jpthek9/repos/10601-HW7/rnn.py", line 608, in main
    train_data = Sentenc

#### 5.4



In [None]:
!python rnn.py --train_data {full_train_stories} --val_data {full_valid_stories} --embed_dim 512 --hidden_dim 512 --train_losses_out train_losses.txt --val_losses_out val_losses.txt --metrics_out metrics.txt --dk 256 --dv 256 --num_sequences 250000 --batch_size 128