Initializing Custom Trainer
Downloading (…)/main/tokenizer.json: 0%| | 0.00/1.39M [00:00
gpu-compute-shankar-4x16:350:350 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v6 symbol.
gpu-compute-shankar-4x16:350:350 [2] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin (v4)
gpu-compute-shankar-4x16:350:350 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
gpu-compute-shankar-4x16:350:350 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Plugin Path : /usr/local/nccl-rdma-sharp-plugins/lib/libnccl-net.so
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO P2P plugin IBext
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO NET/IB : No device found.
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 1.
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.4<0>
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Using network Socket
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 1(=200000) and dev 0(=100000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 2(=300000) and dev 0(=100000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 3(=400000) and dev 0(=100000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 0(=100000) and dev 1(=200000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 2(=300000) and dev 1(=200000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 3(=400000) and dev 1(=200000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 0(=100000) and dev 2(=300000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 1(=200000) and dev 2(=300000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 3(=400000) and dev 2(=300000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 0(=100000) and dev 3(=400000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 1(=200000) and dev 3(=400000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Could not enable P2P between dev 2(=300000) and dev 3(=400000)
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Setting affinity for GPU 2 to ffff,00000000
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Channel 00 : 2[300000] -> 3[400000] via SHM/direct/direct
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Channel 01 : 2[300000] -> 3[400000] via SHM/direct/direct
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Connected all rings
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Channel 00 : 2[300000] -> 1[200000] via SHM/direct/direct
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Channel 01 : 2[300000] -> 1[200000] via SHM/direct/direct
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO Connected all trees
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
gpu-compute-shankar-4x16:350:2599 [2] NCCL INFO comm 0x56079b8f8300 rank 2 nranks 4 cudaDev 2 busId 300000 - Init COMPLETE
Missing logger folder: /mnt/azureml/cr/j/30e0e2e93ade4186bd3779f617afd768/exe/wd/lightning_logs
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Using network Socket
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 1(=200000) and dev 0(=100000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 2(=300000) and dev 0(=100000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 3(=400000) and dev 0(=100000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 0(=100000) and dev 1(=200000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 2(=300000) and dev 1(=200000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 3(=400000) and dev 1(=200000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 0(=100000) and dev 2(=300000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 1(=200000) and dev 2(=300000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 3(=400000) and dev 2(=300000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 0(=100000) and dev 3(=400000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 1(=200000) and dev 3(=400000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Could not enable P2P between dev 2(=300000) and dev 3(=400000)
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Setting affinity for GPU 2 to ffff,00000000
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Channel 00 : 2[300000] -> 3[400000] via SHM/direct/direct
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Channel 01 : 2[300000] -> 3[400000] via SHM/direct/direct
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Connected all rings
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Channel 00 : 2[300000] -> 1[200000] via SHM/direct/direct
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Channel 01 : 2[300000] -> 1[200000] via SHM/direct/direct
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO Connected all trees
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
gpu-compute-shankar-4x16:350:2616 [2] NCCL INFO comm 0x56079b910a60 rank 2 nranks 4 cudaDev 2 busId 300000 - Init COMPLETE
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Loading extension module utils...
Time to load utils op: 22.33176565170288 seconds
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.001119852066040039 seconds
Cleaning up all outstanding Run operations, waiting 300.0 seconds
1 items cleaning up...
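The repeated "P2P is disabled" warnings above name their own remedy, and the log confirms `NCCL_IB_DISABLE` was already set in this run. A minimal sketch of the corresponding environment setup, applied before the job launches; `NCCL_DEBUG=WARN` is a standard NCCL knob added here as an assumption (it does not appear in the log):

```shell
# Variables named in the NCCL log above; export before launching the job.
export NCCL_IB_DISABLE=1            # matches the log: "NCCL_IB_DISABLE set by environment to 1"
export NCCL_IGNORE_DISABLED_P2P=1   # silences the repeated "P2P is disabled" messages

# Assumption, not from the log: drop NCCL from INFO to WARN to cut log volume.
export NCCL_DEBUG=WARN
```

Note that these settings only quiet the warnings; with P2P unavailable, NCCL keeps falling back to SHM and socket transports exactly as the channel lines above show.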
Cleanup took 0.10608601570129395 seconds
Traceback (most recent call last):
  File "training_script.py", line 102, in <module>
    train(trainer)
  File "training_script.py", line 91, in train
    trainer.train(train_df=train_df, eval_df=test_df)
  File "/mnt/azureml/cr/j/30e0e2e93ade4186bd3779f617afd768/exe/wd/trainer.py", line 111, in train
    trainer.fit(self.T5Model, self.data_module)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 520, in fit
    call._call_and_handle_interrupt(
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 92, in launch
    return function(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 559, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 935, in _run
    results = self._run_stage()
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 978, in _run_stage
    self.fit_loop.run()
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
    self.advance()
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 354, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 218, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 185, in run
    self._optimizer_step(kwargs.get("batch_idx", 0), closure)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 261, in _optimizer_step
    call._call_lightning_module_hook(
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 142, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 1266, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 158, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 257, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 224, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/deepspeed.py", line 92, in optimizer_step
    closure_result = closure()
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 140, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 126, in closure
    step_output = self._step_fn()
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 308, in _training_step
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 288, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 329, in training_step
    return self.model(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
    loss = self.module(*inputs, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 90, in forward
    output = self._forward_module.training_step(*inputs, **kwargs)
  File "/mnt/azureml/cr/j/30e0e2e93ade4186bd3779f617afd768/exe/wd/model_module.py", line 75, in training_step
    loss, outputs = self(
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/mnt/azureml/cr/j/30e0e2e93ade4186bd3779f617afd768/exe/wd/model_module.py", line 58, in forward
    output = self.model(
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1626, in forward
    encoder_outputs = self.encoder(
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1055, in forward
    layer_outputs = layer_module(
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 687, in forward
    self_attention_outputs = self.layer[0](
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 593, in forward
    attention_output = self.SelfAttention(
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 556, in forward
    attn_weights = nn.functional.dropout(
  File "/azureml-envs/azureml_3263cc21f12e8d16ce50e2ff0b93f3ff/lib/python3.8/site-packages/torch/nn/functional.py", line 1252, in dropout
    return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 890.00 MiB (GPU 2; 14.76 GiB total capacity; 13.28 GiB already allocated; 75.75 MiB free; 13.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
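The OutOfMemoryError above points at `max_split_size_mb`. A minimal sketch of applying that hint, assuming the option is put in the environment before `torch` is imported (the allocator reads `PYTORCH_CUDA_ALLOC_CONF` at startup); the 128 MiB value is illustrative, not taken from this job:

```python
import os

# Must be set before `import torch` so the CUDA caching allocator sees it.
# "max_split_size_mb:128" is an illustrative value, not one used in the run above.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")
```

That said, the numbers in the error (13.28 GiB of 14.76 GiB already allocated, only 75.75 MiB free) suggest the model genuinely does not fit at this batch size, so fragmentation tuning alone is unlikely to be enough; reducing the per-GPU batch size, shortening sequence length, or enabling activation checkpointing are the more direct fixes.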