2024-03-27T20:39:49.8929373Z ##[section]Starting: Testing: distributed 2024-03-27T20:39:49.8935450Z ============================================================================== 2024-03-27T20:39:49.8935870Z Task : Bash 2024-03-27T20:39:49.8936490Z Description : Run a Bash script on macOS, Linux, or Windows 2024-03-27T20:39:49.8936752Z Version : 3.237.1 2024-03-27T20:39:49.8937009Z Author : Microsoft Corporation 2024-03-27T20:39:49.8937267Z Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/bash 2024-03-27T20:39:49.8937549Z ============================================================================== 2024-03-27T20:39:50.1963980Z Generating script. 2024-03-27T20:39:50.1977869Z ========================== Starting Command Output =========================== 2024-03-27T20:39:50.1985161Z [command]/usr/bin/bash /__w/_temp/c97698ba-5dbc-499c-8688-297965b4dc36.sh 2024-03-27T20:39:50.2053492Z source path: thunder/tests/distributed 2024-03-27T20:39:50.2053912Z pytest arg: 2024-03-27T20:39:54.8900014Z collected tests: 2024-03-27T20:39:54.8901434Z ---------------- 2024-03-27T20:39:54.8902068Z thunder/tests/distributed/test_checkpoint.py::test_split_state_dict 2024-03-27T20:39:54.8902558Z thunder/tests/distributed/test_checkpoint.py::DistributedCheckpointTest::test_get_model_state_dict 2024-03-27T20:39:54.8902936Z thunder/tests/distributed/test_checkpoint.py::DistributedCheckpointTest::test_load_model_state_dict 2024-03-27T20:39:54.8903255Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_all_gather_executor_nvfuser 2024-03-27T20:39:54.8903560Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_all_gather_executor_torch 2024-03-27T20:39:54.8903857Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_all_reduce_executor_nvfuser 2024-03-27T20:39:54.8904142Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_all_reduce_executor_torch 2024-03-27T20:39:54.8904432Z 
thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_broadcast_executor_nvfuser 2024-03-27T20:39:54.8904727Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_broadcast_executor_torch 2024-03-27T20:39:54.8905012Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_compile_ddp_module 2024-03-27T20:39:54.8905291Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_compile_module 2024-03-27T20:39:54.8905578Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_bucketing_executor_nvfuser_bucket_size_in_mb_0 2024-03-27T20:39:54.8905906Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_bucketing_executor_nvfuser_bucket_size_in_mb_1000 2024-03-27T20:39:54.8906249Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_bucketing_executor_torch_bucket_size_in_mb_0 2024-03-27T20:39:54.8906570Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_bucketing_executor_torch_bucket_size_in_mb_1000 2024-03-27T20:39:54.8906923Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_parity_with_without_bucketing_executor_nvfuser 2024-03-27T20:39:54.8907270Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_parity_with_without_bucketing_executor_torch 2024-03-27T20:39:54.8907558Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_model_as_argument 2024-03-27T20:39:54.8907892Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_nvfuser_bucket_size_in_mb_0_dataset_size_1 2024-03-27T20:39:54.8908291Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_nvfuser_bucket_size_in_mb_0_dataset_size_2 2024-03-27T20:39:54.8908674Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_nvfuser_bucket_size_in_mb_25_dataset_size_1 2024-03-27T20:39:54.8909259Z 
thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_nvfuser_bucket_size_in_mb_25_dataset_size_2 2024-03-27T20:39:54.8909791Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_torch_bucket_size_in_mb_0_dataset_size_1 2024-03-27T20:39:54.8910734Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_torch_bucket_size_in_mb_0_dataset_size_2 2024-03-27T20:39:54.8911266Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_torch_bucket_size_in_mb_25_dataset_size_1 2024-03-27T20:39:54.8911631Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_torch_bucket_size_in_mb_25_dataset_size_2 2024-03-27T20:39:54.8911945Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_broadcast_from 2024-03-27T20:39:54.8912258Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2 2024-03-27T20:39:54.8912613Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero3 2024-03-27T20:39:54.8912974Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_layer_zero2 2024-03-27T20:39:54.8913315Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_layer_zero3 2024-03-27T20:39:54.8913715Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero2 2024-03-27T20:39:54.8914070Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero3 2024-03-27T20:39:54.8914415Z 
thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_layer_zero2 2024-03-27T20:39:54.8914773Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_layer_zero3 2024-03-27T20:39:54.8915084Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_shard_unshard 2024-03-27T20:39:54.8915387Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_nvfuser_bucketing_block_zero3 2024-03-27T20:39:54.8915726Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_nvfuser_bucketing_layer_zero3 2024-03-27T20:39:54.8916050Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_nvfuser_bucketing_none_zero3 2024-03-27T20:39:54.8916383Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_torch_bucketing_block_zero3 2024-03-27T20:39:54.8916712Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_torch_bucketing_layer_zero3 2024-03-27T20:39:54.8917038Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_torch_bucketing_none_zero3 2024-03-27T20:39:54.8917346Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_materialize_meta_tensors 2024-03-27T20:39:54.8917635Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_reduce_scatter_executor_nvfuser 2024-03-27T20:39:54.8917923Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_reduce_scatter_executor_torch 2024-03-27T20:39:54.8918208Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_rematerialize_all_gather 2024-03-27T20:39:54.8918488Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_sort_waits_executor_nvfuser 2024-03-27T20:39:54.8918774Z 
thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_sort_waits_executor_torch 2024-03-27T20:39:54.8919044Z thunder/tests/distributed/test_ddp.py::test_native_ddp_torch_cuda_float32[0] 2024-03-27T20:39:54.8919298Z thunder/tests/distributed/test_ddp.py::test_native_ddp_torch_cuda_float32[25] 2024-03-27T20:39:54.8919561Z thunder/tests/distributed/test_ddp.py::test_native_ddp_nvfuser_cuda_float32[0] 2024-03-27T20:39:54.8919938Z thunder/tests/distributed/test_ddp.py::test_native_ddp_nvfuser_cuda_float32[25] 2024-03-27T20:39:54.8920219Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_torch_cuda_float32[FSDPBucketingStrategy.NONE] 2024-03-27T20:39:54.8920609Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_torch_cuda_float32[FSDPBucketingStrategy.LAYER] 2024-03-27T20:39:54.8920903Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_torch_cuda_float32[FSDPBucketingStrategy.BLOCK] 2024-03-27T20:39:54.8921200Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_nvfuser_cuda_float32[FSDPBucketingStrategy.NONE] 2024-03-27T20:39:54.8921499Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_nvfuser_cuda_float32[FSDPBucketingStrategy.LAYER] 2024-03-27T20:39:54.8921800Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_nvfuser_cuda_float32[FSDPBucketingStrategy.BLOCK] 2024-03-27T20:39:54.8922039Z ================ 2024-03-27T20:39:58.3064536Z thunder/tests/distributed/test_checkpoint.py::test_split_state_dict status >>> 0 2024-03-27T20:40:04.3542246Z 2024-03-27 20:40:04,352 _dedup_tensors.py:44 INFO p:process 0 t:MainThread: Duplicate keys to remove: {1: [MetadataIndex(fqn='buf', offset=torch.Size([]), index=None)]} 2024-03-27T20:40:04.4064274Z 2024-03-27 20:40:04,405 _dedup_tensors.py:44 INFO p:process 0 t:MainThread: Duplicate keys to remove: {1: [MetadataIndex(fqn='buf', offset=torch.Size([]), index=None)]} 2024-03-27T20:40:06.3584547Z thunder/tests/distributed/test_checkpoint.py::DistributedCheckpointTest::test_get_model_state_dict 
status >>> 0 2024-03-27T20:40:15.2477873Z thunder/tests/distributed/test_checkpoint.py::DistributedCheckpointTest::test_load_model_state_dict status >>> 0 2024-03-27T20:40:23.7619942Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_all_gather_executor_nvfuser status >>> 0 2024-03-27T20:40:32.1091086Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_all_gather_executor_torch status >>> 0 2024-03-27T20:40:41.0066068Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_all_reduce_executor_nvfuser status >>> 0 2024-03-27T20:40:49.7288465Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_all_reduce_executor_torch status >>> 0 2024-03-27T20:40:58.6359403Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_broadcast_executor_nvfuser status >>> 0 2024-03-27T20:41:07.4144036Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_broadcast_executor_torch status >>> 0 2024-03-27T20:41:14.8559128Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_compile_ddp_module status >>> 0 2024-03-27T20:41:24.9379426Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_compile_module status >>> 0 2024-03-27T20:41:34.8842991Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_bucketing_executor_nvfuser_bucket_size_in_mb_0 status >>> 0 2024-03-27T20:41:44.5017301Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_bucketing_executor_nvfuser_bucket_size_in_mb_1000 status >>> 0 2024-03-27T20:41:53.5305054Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_bucketing_executor_torch_bucket_size_in_mb_0 status >>> 0 2024-03-27T20:42:02.6973618Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_bucketing_executor_torch_bucket_size_in_mb_1000 status >>> 0 2024-03-27T20:42:13.7495000Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_parity_with_without_bucketing_executor_nvfuser status >>> 0 2024-03-27T20:42:23.5409995Z 
thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_grad_parity_with_without_bucketing_executor_torch status >>> 0 2024-03-27T20:42:32.0175082Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_model_as_argument status >>> 0 2024-03-27T20:42:37.6996883Z STAGE:2024-03-27 20:42:37 4549:4549 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:37.7070275Z STAGE:2024-03-27 20:42:37 4550:4550 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:40.0238456Z STAGE:2024-03-27 20:42:40 4549:4549 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:40.0240975Z STAGE:2024-03-27 20:42:40 4550:4550 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:40.0242544Z STAGE:2024-03-27 20:42:40 4549:4549 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:40.0246015Z STAGE:2024-03-27 20:42:40 4550:4550 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:40.0554390Z STAGE:2024-03-27 20:42:40 4549:4549 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:40.0555471Z STAGE:2024-03-27 20:42:40 4550:4550 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:41.5292315Z STAGE:2024-03-27 20:42:41 4550:4550 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:41.5293827Z STAGE:2024-03-27 20:42:41 4550:4550 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:41.5511158Z STAGE:2024-03-27 20:42:41 4550:4550 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:41.5552151Z STAGE:2024-03-27 20:42:41 4550:4550 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:41.5553134Z STAGE:2024-03-27 20:42:41 4550:4550 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:41.5625019Z STAGE:2024-03-27 20:42:41 4549:4549 
ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:41.5626123Z STAGE:2024-03-27 20:42:41 4549:4549 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:41.5634519Z STAGE:2024-03-27 20:42:41 4550:4550 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:41.5673494Z STAGE:2024-03-27 20:42:41 4550:4550 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:41.5674676Z STAGE:2024-03-27 20:42:41 4550:4550 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:41.5845410Z STAGE:2024-03-27 20:42:41 4550:4550 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:41.5852366Z STAGE:2024-03-27 20:42:41 4549:4549 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:41.5900127Z STAGE:2024-03-27 20:42:41 4549:4549 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:41.5901498Z STAGE:2024-03-27 20:42:41 4549:4549 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:41.5988348Z STAGE:2024-03-27 20:42:41 4549:4549 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:41.6024921Z STAGE:2024-03-27 20:42:41 4549:4549 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:41.6026163Z STAGE:2024-03-27 20:42:41 4549:4549 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:41.6206326Z STAGE:2024-03-27 20:42:41 4549:4549 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:43.3282360Z STAGE:2024-03-27 20:42:43 4549:4549 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:43.3283874Z STAGE:2024-03-27 20:42:43 4549:4549 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:43.3295007Z STAGE:2024-03-27 20:42:43 4550:4550 ActivityProfilerController.cpp:320] Completed Stage: 
Collection 2024-03-27T20:42:43.3299849Z STAGE:2024-03-27 20:42:43 4550:4550 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:43.3882227Z STAGE:2024-03-27 20:42:43 4549:4549 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:43.3902071Z STAGE:2024-03-27 20:42:43 4550:4550 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:44.0793039Z STAGE:2024-03-27 20:42:44 4549:4549 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:44.0794563Z STAGE:2024-03-27 20:42:44 4549:4549 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:44.0796595Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:44.0797722Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:44.0932595Z STAGE:2024-03-27 20:42:44 4549:4549 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:44.0940346Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:44.0987616Z STAGE:2024-03-27 20:42:44 4549:4549 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:44.0989418Z STAGE:2024-03-27 20:42:44 4549:4549 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:44.0990426Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:44.0992156Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:44.1130087Z STAGE:2024-03-27 20:42:44 4549:4549 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:44.1131377Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:44.1174246Z STAGE:2024-03-27 
20:42:44 4549:4549 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:44.1175349Z STAGE:2024-03-27 20:42:44 4549:4549 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:44.1176212Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:44.1177102Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:44.1312450Z STAGE:2024-03-27 20:42:44 4549:4549 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:44.1313610Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:44.1359498Z STAGE:2024-03-27 20:42:44 4549:4549 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:44.1360610Z STAGE:2024-03-27 20:42:44 4549:4549 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:44.1361512Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:44.1365077Z STAGE:2024-03-27 20:42:44 4550:4550 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:46.3504377Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_nvfuser_bucket_size_in_mb_0_dataset_size_1 status >>> 0 2024-03-27T20:42:52.2557948Z STAGE:2024-03-27 20:42:52 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:52.2833914Z STAGE:2024-03-27 20:42:52 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:54.5526713Z STAGE:2024-03-27 20:42:54 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:54.5529087Z STAGE:2024-03-27 20:42:54 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:54.5529794Z 
STAGE:2024-03-27 20:42:54 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:54.5535100Z STAGE:2024-03-27 20:42:54 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:54.5816863Z STAGE:2024-03-27 20:42:54 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:54.5818251Z STAGE:2024-03-27 20:42:54 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:54.5869779Z STAGE:2024-03-27 20:42:54 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:54.5871068Z STAGE:2024-03-27 20:42:54 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:54.5872866Z STAGE:2024-03-27 20:42:54 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:54.5874115Z STAGE:2024-03-27 20:42:54 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:54.6030324Z STAGE:2024-03-27 20:42:54 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:54.6100286Z STAGE:2024-03-27 20:42:54 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:56.0811428Z STAGE:2024-03-27 20:42:56 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:56.0812543Z STAGE:2024-03-27 20:42:56 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:56.1011058Z STAGE:2024-03-27 20:42:56 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:56.1054700Z STAGE:2024-03-27 20:42:56 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:56.1056904Z STAGE:2024-03-27 20:42:56 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:56.1141822Z STAGE:2024-03-27 20:42:56 4791:4791 
ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:56.1176574Z STAGE:2024-03-27 20:42:56 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:56.1177301Z STAGE:2024-03-27 20:42:56 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:56.1265470Z STAGE:2024-03-27 20:42:56 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:56.1267999Z STAGE:2024-03-27 20:42:56 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:56.1338941Z STAGE:2024-03-27 20:42:56 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:56.1497612Z STAGE:2024-03-27 20:42:56 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:56.1544854Z STAGE:2024-03-27 20:42:56 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:56.1547050Z STAGE:2024-03-27 20:42:56 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:56.1631788Z STAGE:2024-03-27 20:42:56 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:56.1670006Z STAGE:2024-03-27 20:42:56 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:56.1672240Z STAGE:2024-03-27 20:42:56 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:56.1833060Z STAGE:2024-03-27 20:42:56 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:57.8032295Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:57.8033879Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:57.8035545Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: 
Collection 2024-03-27T20:42:57.8040796Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:57.8589814Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:57.8623840Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:57.8630303Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:57.8631841Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:57.8703113Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:57.8705540Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:57.8711701Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:57.8748278Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:57.8750014Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:57.8795300Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:57.8829403Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:57.8837914Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:57.8839473Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:57.8863336Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:57.8864694Z 
STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:57.8922412Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:57.8947098Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:57.8957340Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:57.8958540Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:57.9045721Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:57.9091168Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:57.9092570Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:57.9095032Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:57.9096211Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:57.9258111Z STAGE:2024-03-27 20:42:57 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:57.9268559Z STAGE:2024-03-27 20:42:57 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.6184410Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.6185883Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.6186819Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.6189119Z STAGE:2024-03-27 20:42:58 4792:4792 
ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.6321932Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.6323066Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.6367852Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.6369917Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.6371600Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.6374728Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.6509658Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.6511299Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.6555577Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.6556676Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.6558363Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post ProcessingSTAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.6559151Z 2024-03-27T20:42:58.8006959Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.8111787Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.8158339Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: 
Collection 2024-03-27T20:42:58.8159970Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.8162102Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.8166525Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.8333338Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.8334453Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.8378166Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.8379432Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.8380337Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.8381241Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.8512022Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.8513083Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.8566244Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.8567368Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.8568260Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.8569141Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.8705081Z 
STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.8706179Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.8756341Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.8758329Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.8759741Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.8762908Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.8898135Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.8902094Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:42:58.8952874Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.8954534Z STAGE:2024-03-27 20:42:58 4792:4792 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:42:58.8955594Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:42:58.8956630Z STAGE:2024-03-27 20:42:58 4791:4791 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:00.9364787Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_nvfuser_bucket_size_in_mb_0_dataset_size_2 status >>> 0 2024-03-27T20:43:06.6534646Z STAGE:2024-03-27 20:43:06 5034:5034 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:06.6556606Z STAGE:2024-03-27 20:43:06 5033:5033 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:09.0796523Z 
STAGE:2024-03-27 20:43:09 5033:5033 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:09.0798052Z STAGE:2024-03-27 20:43:09 5033:5033 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:09.0852983Z STAGE:2024-03-27 20:43:09 5034:5034 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:09.0855893Z STAGE:2024-03-27 20:43:09 5034:5034 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:09.1004853Z STAGE:2024-03-27 20:43:09 5033:5033 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:09.1059537Z STAGE:2024-03-27 20:43:09 5034:5034 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:10.5496202Z STAGE:2024-03-27 20:43:10 5033:5033 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:10.5497419Z STAGE:2024-03-27 20:43:10 5033:5033 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:10.5510844Z STAGE:2024-03-27 20:43:10 5034:5034 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:10.5512702Z STAGE:2024-03-27 20:43:10 5034:5034 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:10.5698457Z STAGE:2024-03-27 20:43:10 5033:5033 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:10.5713197Z STAGE:2024-03-27 20:43:10 5034:5034 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:10.5737337Z STAGE:2024-03-27 20:43:10 5033:5033 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:10.5738034Z STAGE:2024-03-27 20:43:10 5033:5033 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:10.5747936Z STAGE:2024-03-27 20:43:10 5034:5034 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:10.5748626Z STAGE:2024-03-27 20:43:10 5034:5034 
ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:10.5834944Z STAGE:2024-03-27 20:43:10 5033:5033 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:10.5838161Z STAGE:2024-03-27 20:43:10 5034:5034 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:10.5874541Z STAGE:2024-03-27 20:43:10 5034:5034 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:10.5875251Z STAGE:2024-03-27 20:43:10 5034:5034 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:10.5878527Z STAGE:2024-03-27 20:43:10 5033:5033 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:10.5879173Z STAGE:2024-03-27 20:43:10 5033:5033 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:10.6031809Z STAGE:2024-03-27 20:43:10 5034:5034 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:10.6043806Z STAGE:2024-03-27 20:43:10 5033:5033 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:12.1991073Z STAGE:2024-03-27 20:43:12 5034:5034 ActivityProfilerController.cpp:320] Completed Stage: CollectionSTAGE:2024-03-27 20:43:12 5033:5033 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:12.1993289Z STAGE:2024-03-27 20:43:12 5033:5033 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:12.2021937Z 2024-03-27T20:43:12.2028951Z STAGE:2024-03-27 20:43:12 5034:5034 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:12.2533232Z STAGE:2024-03-27 20:43:12 5033:5033 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:12.2587854Z STAGE:2024-03-27 20:43:12 5034:5034 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:12.9694795Z STAGE:2024-03-27 20:43:12 5033:5033 ActivityProfilerController.cpp:320] Completed Stage: 
Collection 2024-03-27T20:43:12.9696027Z STAGE:2024-03-27 20:43:12 5033:5033 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:12.9696672Z STAGE:2024-03-27 20:43:12 5034:5034 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:12.9698192Z STAGE:2024-03-27 20:43:12 5034:5034 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:12.9805559Z STAGE:2024-03-27 20:43:12 5034:5034 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:12.9808201Z STAGE:2024-03-27 20:43:12 5033:5033 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:12.9853915Z STAGE:2024-03-27 20:43:12 5034:5034 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:12.9855976Z STAGE:2024-03-27 20:43:12 5034:5034 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:12.9857869Z STAGE:2024-03-27 20:43:12 5033:5033 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:12.9860418Z STAGE:2024-03-27 20:43:12 5033:5033 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:12.9962691Z STAGE:2024-03-27 20:43:12 5034:5034 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:12.9970992Z STAGE:2024-03-27 20:43:12 5033:5033 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:13.0013787Z STAGE:2024-03-27 20:43:13 5034:5034 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:13.0016075Z STAGE:2024-03-27 20:43:13 5034:5034 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:13.0017787Z STAGE:2024-03-27 20:43:13 5033:5033 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:13.0020119Z STAGE:2024-03-27 20:43:13 5033:5033 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:13.0132185Z 
STAGE:2024-03-27 20:43:13 5034:5034 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:13.0145413Z STAGE:2024-03-27 20:43:13 5033:5033 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:13.0178477Z STAGE:2024-03-27 20:43:13 5034:5034 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:13.0180433Z STAGE:2024-03-27 20:43:13 5034:5034 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:13.0182347Z STAGE:2024-03-27 20:43:13 5033:5033 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:13.0185286Z STAGE:2024-03-27 20:43:13 5033:5033 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:15.2132964Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_nvfuser_bucket_size_in_mb_25_dataset_size_1 status >>> 0 2024-03-27T20:43:20.9084026Z STAGE:2024-03-27 20:43:20 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:20.9289096Z STAGE:2024-03-27 20:43:20 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:23.2967603Z STAGE:2024-03-27 20:43:23 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:23.2969819Z STAGE:2024-03-27 20:43:23 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:23.3030501Z STAGE:2024-03-27 20:43:23 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:23.3037601Z STAGE:2024-03-27 20:43:23 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:23.3168362Z STAGE:2024-03-27 20:43:23 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:23.3233598Z STAGE:2024-03-27 20:43:23 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:23.3278154Z 
STAGE:2024-03-27 20:43:23 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:23.3279628Z STAGE:2024-03-27 20:43:23 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:23.3283486Z STAGE:2024-03-27 20:43:23 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:23.3284559Z STAGE:2024-03-27 20:43:23 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:23.3415502Z STAGE:2024-03-27 20:43:23 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:23.3417754Z STAGE:2024-03-27 20:43:23 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:24.8212587Z STAGE:2024-03-27 20:43:24 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:24.8213950Z STAGE:2024-03-27 20:43:24 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:24.8333673Z STAGE:2024-03-27 20:43:24 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:24.8335640Z STAGE:2024-03-27 20:43:24 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:24.8434874Z STAGE:2024-03-27 20:43:24 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:24.8485468Z STAGE:2024-03-27 20:43:24 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:24.8487246Z STAGE:2024-03-27 20:43:24 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:24.8570732Z STAGE:2024-03-27 20:43:24 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:24.8572976Z STAGE:2024-03-27 20:43:24 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:24.8615903Z STAGE:2024-03-27 20:43:24 5275:5275 
ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:24.8617305Z STAGE:2024-03-27 20:43:24 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:24.8622504Z STAGE:2024-03-27 20:43:24 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:24.8625055Z STAGE:2024-03-27 20:43:24 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:24.8711350Z STAGE:2024-03-27 20:43:24 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:24.8751364Z STAGE:2024-03-27 20:43:24 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:24.8752929Z STAGE:2024-03-27 20:43:24 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:24.8781803Z STAGE:2024-03-27 20:43:24 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:24.8930062Z STAGE:2024-03-27 20:43:24 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:26.5636870Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:26.5638045Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:26.5641593Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:26.5645371Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:26.6218636Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:26.6235552Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:26.6264153Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: 
Collection 2024-03-27T20:43:26.6265865Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:26.6290894Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:26.6292304Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:26.6352445Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:26.6388383Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:26.6394173Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:26.6395614Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:26.6438760Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:26.6440456Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:26.6477173Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:26.6521633Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:26.6523016Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:26.6531101Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:26.6586572Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:26.6587919Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:26.6607010Z 
STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:26.6679464Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:26.6728117Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:26.6729371Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:26.6730643Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:26.6732669Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:26.6868753Z STAGE:2024-03-27 20:43:26 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:26.6881485Z STAGE:2024-03-27 20:43:26 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.3968277Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.3971209Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.3979017Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.4000532Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.4098221Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.4118383Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.4181664Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.4182464Z STAGE:2024-03-27 20:43:27 5275:5275 
ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.4184054Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.4184875Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.4302248Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.4331968Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.4399546Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.4403685Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.4406348Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.4457474Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.4530326Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.4576633Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.4615586Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.4652166Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.4652756Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.4715402Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.4780711Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:314] Completed 
Stage: Warm Up 2024-03-27T20:43:27.4807104Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.4839493Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.4849885Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.4850583Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.4873931Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.4993068Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.5004092Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.5041559Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.5042944Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.5047790Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.5049608Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.6201555Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.6557489Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.6606911Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.6609241Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.6612731Z 
STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.6615226Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.6717221Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.6721724Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:27.6762399Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.6763358Z STAGE:2024-03-27 20:43:27 5276:5276 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:27.6770295Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:27.6772607Z STAGE:2024-03-27 20:43:27 5275:5275 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:29.7214052Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_nvfuser_bucket_size_in_mb_25_dataset_size_2 status >>> 0 2024-03-27T20:43:35.3713117Z STAGE:2024-03-27 20:43:35 5517:5517 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:35.3849309Z STAGE:2024-03-27 20:43:35 5518:5518 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:36.5125025Z STAGE:2024-03-27 20:43:36 5517:5517 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:36.5127338Z STAGE:2024-03-27 20:43:36 5518:5518 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:36.5128290Z STAGE:2024-03-27 20:43:36 5517:5517 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:36.5132368Z STAGE:2024-03-27 20:43:36 5518:5518 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 
2024-03-27T20:43:36.5449150Z STAGE:2024-03-27 20:43:36 5517:5517 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:36.5470095Z STAGE:2024-03-27 20:43:36 5518:5518 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:37.3925762Z STAGE:2024-03-27 20:43:37 5517:5517 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:37.3926900Z STAGE:2024-03-27 20:43:37 5517:5517 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:37.4062799Z STAGE:2024-03-27 20:43:37 5517:5517 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:37.4103456Z STAGE:2024-03-27 20:43:37 5517:5517 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:37.4105531Z STAGE:2024-03-27 20:43:37 5517:5517 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:37.4229423Z STAGE:2024-03-27 20:43:37 5517:5517 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:37.4277834Z STAGE:2024-03-27 20:43:37 5517:5517 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:37.4278724Z STAGE:2024-03-27 20:43:37 5518:5518 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:37.4280100Z STAGE:2024-03-27 20:43:37 5517:5517 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:37.4281269Z STAGE:2024-03-27 20:43:37 5518:5518 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:37.4412187Z STAGE:2024-03-27 20:43:37 5518:5518 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:37.4454841Z STAGE:2024-03-27 20:43:37 5518:5518 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:37.4456674Z STAGE:2024-03-27 20:43:37 5518:5518 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:37.4494181Z STAGE:2024-03-27 20:43:37 
5517:5517 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:37.4579401Z STAGE:2024-03-27 20:43:37 5518:5518 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:37.4632416Z STAGE:2024-03-27 20:43:37 5518:5518 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:37.4634558Z STAGE:2024-03-27 20:43:37 5518:5518 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:37.4852926Z STAGE:2024-03-27 20:43:37 5518:5518 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:38.3234770Z STAGE:2024-03-27 20:43:38 5517:5517 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:38.3235634Z STAGE:2024-03-27 20:43:38 5517:5517 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:38.3236070Z STAGE:2024-03-27 20:43:38 5518:5518 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:38.3240023Z STAGE:2024-03-27 20:43:38 5518:5518 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:38.3783873Z STAGE:2024-03-27 20:43:38 5518:5518 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:38.3793328Z STAGE:2024-03-27 20:43:38 5517:5517 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:39.0652310Z STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:39.0653056Z STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:39.0653453Z STAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:39.0656692Z STAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:39.0840855Z STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:314] 
Completed Stage: Warm Up 2024-03-27T20:43:39.0841462Z STAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:39.0890420Z STAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:39.0891547Z STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:39.0891966Z STAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:39.0893485Z STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:39.1074621Z STAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:39.1079833Z STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:39.1126694Z STAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:39.1129471Z STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:320] Completed Stage: CollectionSTAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:39.1129761Z 2024-03-27T20:43:39.1132437Z STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:39.1328015Z STAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:39.1336476Z STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:39.1374158Z STAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:39.1376032Z STAGE:2024-03-27 20:43:39 5517:5517 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:39.1376696Z 
STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:39.1380217Z STAGE:2024-03-27 20:43:39 5518:5518 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:41.1231959Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_torch_bucket_size_in_mb_0_dataset_size_1 status >>> 0 2024-03-27T20:43:46.7807708Z STAGE:2024-03-27 20:43:46 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:46.8094975Z STAGE:2024-03-27 20:43:46 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:47.8901196Z STAGE:2024-03-27 20:43:47 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:47.8902026Z STAGE:2024-03-27 20:43:47 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:47.8902455Z STAGE:2024-03-27 20:43:47 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:47.8906810Z STAGE:2024-03-27 20:43:47 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:47.9193263Z STAGE:2024-03-27 20:43:47 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:47.9202436Z STAGE:2024-03-27 20:43:47 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:47.9245459Z STAGE:2024-03-27 20:43:47 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:47.9247549Z STAGE:2024-03-27 20:43:47 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:47.9248171Z STAGE:2024-03-27 20:43:47 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:47.9250407Z STAGE:2024-03-27 20:43:47 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 
2024-03-27T20:43:47.9453364Z STAGE:2024-03-27 20:43:47 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:47.9455533Z STAGE:2024-03-27 20:43:47 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:48.7757500Z STAGE:2024-03-27 20:43:48 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:48.7758946Z STAGE:2024-03-27 20:43:48 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:48.7890808Z STAGE:2024-03-27 20:43:48 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:48.7921962Z STAGE:2024-03-27 20:43:48 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:48.7923049Z STAGE:2024-03-27 20:43:48 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:48.8024762Z STAGE:2024-03-27 20:43:48 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:48.8026709Z STAGE:2024-03-27 20:43:48 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:48.8053245Z STAGE:2024-03-27 20:43:48 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:48.8098350Z STAGE:2024-03-27 20:43:48 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:48.8099807Z STAGE:2024-03-27 20:43:48 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:48.8163393Z STAGE:2024-03-27 20:43:48 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:48.8195187Z STAGE:2024-03-27 20:43:48 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:48.8196398Z STAGE:2024-03-27 20:43:48 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:48.8314646Z STAGE:2024-03-27 20:43:48 
5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:48.8332253Z STAGE:2024-03-27 20:43:48 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:48.8371138Z STAGE:2024-03-27 20:43:48 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:48.8372252Z STAGE:2024-03-27 20:43:48 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:48.8584311Z STAGE:2024-03-27 20:43:48 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:49.7113956Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:49.7115546Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:49.7116435Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:49.7119540Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:49.7650865Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:49.7652030Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:49.7697404Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:49.7698102Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:49.7698570Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:49.7704731Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:49.7833643Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:314] 
Completed Stage: Warm Up 2024-03-27T20:43:49.7835587Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:49.7873737Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:49.7874843Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:49.7875758Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:49.7876642Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:49.7996804Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:49.7997876Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:49.8036124Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:49.8037735Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:49.8038827Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:49.8039736Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:49.8163547Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:49.8199572Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:49.8251515Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:49.8252619Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:49.8254236Z 
STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:49.8258261Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:49.8468121Z STAGE:2024-03-27 20:43:49 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:49.8482237Z STAGE:2024-03-27 20:43:49 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.5859832Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.5861141Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.5863270Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.5868376Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.6045736Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.6064699Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.6117533Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.6119297Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.6124354Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.6127053Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.6301887Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.6312584Z STAGE:2024-03-27 20:43:50 5744:5744 
ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.6369724Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.6371908Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.6373020Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.6375517Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.7705209Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.7860255Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.7917854Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.7921150Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.7924939Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.7928750Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.8140103Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.8151018Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.8200379Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.8201915Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.8203983Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: 
Collection 2024-03-27T20:43:50.8207333Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.8384217Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.8390687Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.8444411Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.8446170Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.8450117Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.8453449Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.8642083Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.8657101Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.8705653Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.8707105Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.8708969Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.8712770Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.8891755Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.8898185Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:50.8954965Z STAGE:2024-03-27 
20:43:50 5743:5743 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.8956361Z STAGE:2024-03-27 20:43:50 5743:5743 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:50.8959289Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:50.8962991Z STAGE:2024-03-27 20:43:50 5744:5744 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:52.9661331Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_torch_bucket_size_in_mb_0_dataset_size_2 status >>> 0 2024-03-27T20:43:58.6864936Z STAGE:2024-03-27 20:43:58 5970:5970 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:58.7143305Z STAGE:2024-03-27 20:43:58 5969:5969 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:59.7638445Z STAGE:2024-03-27 20:43:59 5969:5969 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:59.7641812Z STAGE:2024-03-27 20:43:59 5969:5969 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:59.7708266Z STAGE:2024-03-27 20:43:59 5970:5970 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:43:59.7715242Z STAGE:2024-03-27 20:43:59 5970:5970 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:43:59.7853386Z STAGE:2024-03-27 20:43:59 5969:5969 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:43:59.7937178Z STAGE:2024-03-27 20:43:59 5970:5970 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:00.5899194Z STAGE:2024-03-27 20:44:00 5969:5969 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:00.5900547Z STAGE:2024-03-27 20:44:00 5969:5969 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:00.6027986Z 
STAGE:2024-03-27 20:44:00 5969:5969 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:00.6066104Z STAGE:2024-03-27 20:44:00 5969:5969 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:00.6067458Z STAGE:2024-03-27 20:44:00 5969:5969 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:00.6079191Z STAGE:2024-03-27 20:44:00 5970:5970 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:00.6082961Z STAGE:2024-03-27 20:44:00 5970:5970 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:00.6193333Z STAGE:2024-03-27 20:44:00 5969:5969 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:00.6212680Z STAGE:2024-03-27 20:44:00 5970:5970 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:00.6234854Z STAGE:2024-03-27 20:44:00 5969:5969 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:00.6238472Z STAGE:2024-03-27 20:44:00 5969:5969 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:00.6247634Z STAGE:2024-03-27 20:44:00 5970:5970 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:00.6248730Z STAGE:2024-03-27 20:44:00 5970:5970 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:00.6377018Z STAGE:2024-03-27 20:44:00 5970:5970 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:00.6417507Z STAGE:2024-03-27 20:44:00 5970:5970 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:00.6420774Z STAGE:2024-03-27 20:44:00 5970:5970 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:00.6431442Z STAGE:2024-03-27 20:44:00 5969:5969 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:00.6656650Z STAGE:2024-03-27 20:44:00 5970:5970 
ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:01.5148109Z STAGE:2024-03-27 20:44:01 5969:5969 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:01.5149500Z STAGE:2024-03-27 20:44:01 5969:5969 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:01.5153333Z STAGE:2024-03-27 20:44:01 5970:5970 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:01.5155714Z STAGE:2024-03-27 20:44:01 5970:5970 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:01.5664732Z STAGE:2024-03-27 20:44:01 5969:5969 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:01.5679384Z STAGE:2024-03-27 20:44:01 5970:5970 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:02.2528934Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:02.2529900Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:02.2530470Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:02.2531679Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:02.2693649Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:02.2694593Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:02.2739764Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:02.2740621Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:02.2742506Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:320] Completed Stage: 
Collection 2024-03-27T20:44:02.2745850Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:02.2896137Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:02.2908113Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:02.2952045Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:02.2953412Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:02.2954874Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:02.2958585Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:02.3112862Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:02.3116628Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:02.3161591Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:02.3162696Z STAGE:2024-03-27 20:44:02 5970:5970 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:02.3164629Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:02.3167957Z STAGE:2024-03-27 20:44:02 5969:5969 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:04.4339308Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_torch_bucket_size_in_mb_25_dataset_size_1 status >>> 0 2024-03-27T20:44:09.9841024Z STAGE:2024-03-27 20:44:09 6195:6195 ActivityProfilerController.cpp:314] Completed 
Stage: Warm Up 2024-03-27T20:44:09.9888045Z STAGE:2024-03-27 20:44:09 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:11.0805814Z STAGE:2024-03-27 20:44:11 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:11.0807798Z STAGE:2024-03-27 20:44:11 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:11.0875150Z STAGE:2024-03-27 20:44:11 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:11.0880589Z STAGE:2024-03-27 20:44:11 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:11.1008849Z STAGE:2024-03-27 20:44:11 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:11.1091879Z STAGE:2024-03-27 20:44:11 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:11.1130836Z STAGE:2024-03-27 20:44:11 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:11.1131804Z STAGE:2024-03-27 20:44:11 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:11.1132579Z STAGE:2024-03-27 20:44:11 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:11.1133349Z STAGE:2024-03-27 20:44:11 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:11.1311418Z STAGE:2024-03-27 20:44:11 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:11.1312370Z STAGE:2024-03-27 20:44:11 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:11.9960729Z STAGE:2024-03-27 20:44:11 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:11.9961764Z STAGE:2024-03-27 20:44:11 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:12.0090952Z 
STAGE:2024-03-27 20:44:12 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:12.0117281Z STAGE:2024-03-27 20:44:12 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:12.0120006Z STAGE:2024-03-27 20:44:12 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:12.0246371Z STAGE:2024-03-27 20:44:12 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:12.0252310Z STAGE:2024-03-27 20:44:12 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:12.0254535Z STAGE:2024-03-27 20:44:12 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:12.0278621Z STAGE:2024-03-27 20:44:12 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:12.0280880Z STAGE:2024-03-27 20:44:12 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:12.0385583Z STAGE:2024-03-27 20:44:12 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:12.0413611Z STAGE:2024-03-27 20:44:12 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:12.0415589Z STAGE:2024-03-27 20:44:12 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:12.0511621Z STAGE:2024-03-27 20:44:12 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:12.0540916Z STAGE:2024-03-27 20:44:12 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:12.0572638Z STAGE:2024-03-27 20:44:12 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:12.0574285Z STAGE:2024-03-27 20:44:12 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:12.0773186Z STAGE:2024-03-27 20:44:12 6196:6196 
ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:12.9629273Z STAGE:2024-03-27 20:44:12 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:12.9630511Z STAGE:2024-03-27 20:44:12 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:12.9631354Z STAGE:2024-03-27 20:44:12 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:12.9635298Z STAGE:2024-03-27 20:44:12 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.0135054Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.0135958Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.0187129Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.0189325Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.0189764Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.0192918Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.0313913Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.0314394Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.0359856Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.0361071Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.0363048Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: 
Collection 2024-03-27T20:44:13.0365479Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.0498431Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.0500911Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.0534576Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.0536017Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.0537060Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.0539227Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.0667420Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.0668497Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.0719969Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.0720275Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.0721369Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.0721809Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.0911598Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.0915281Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.7712178Z STAGE:2024-03-27 
20:44:13 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.7713286Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.7713920Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.7716229Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.7869310Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.7874846Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.7921788Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.7923622Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.7926599Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.7928992Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.8077074Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.8088003Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.8146835Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.8148837Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.8153553Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.8155540Z STAGE:2024-03-27 20:44:13 6195:6195 
ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.8302333Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.8319820Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:13.8382740Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.8385598Z STAGE:2024-03-27 20:44:13 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.8388695Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:13.8390747Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:13.9744058Z STAGE:2024-03-27 20:44:13 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:14.0264640Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:14.0301663Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:14.0305141Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:14.0306092Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:14.0307609Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:14.0462205Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:14.0463103Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:14.0498887Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: 
Collection 2024-03-27T20:44:14.0503506Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:14.0504410Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:14.0505672Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:14.0657959Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:14.0674864Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:14.0695684Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:14.0697450Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:14.0698071Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:14.0700753Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:14.0858643Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:14.0883760Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:314] Completed Stage: Warm Up 2024-03-27T20:44:14.0940687Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:14.0942424Z STAGE:2024-03-27 20:44:14 6195:6195 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:14.0943363Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:320] Completed Stage: Collection 2024-03-27T20:44:14.0944268Z STAGE:2024-03-27 20:44:14 6196:6196 ActivityProfilerController.cpp:324] Completed Stage: Post Processing 2024-03-27T20:44:16.3003943Z 
thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_ddp_with_no_sync_grad_accumulation_executor_torch_bucket_size_in_mb_25_dataset_size_2 status >>> 0 2024-03-27T20:44:24.8390220Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_broadcast_from status >>> 0 2024-03-27T20:44:33.5385278Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] Caught exception: 2024-03-27T20:44:33.5386865Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last): 2024-03-27T20:44:33.5388759Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:44:33.5389865Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] getattr(self, test_name)() 2024-03-27T20:44:33.5391598Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:44:33.5392601Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] fn() 2024-03-27T20:44:33.5394029Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:44:33.5394937Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] method(*args, **kwargs) 2024-03-27T20:44:33.5396870Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:44:33.5398051Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] 
test(self, **param_kwargs) 2024-03-27T20:44:33.5399063Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:44:33.5400184Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:44:33.5400938Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:44:33.5401567Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] raise self.failureException(msg) 2024-03-27T20:44:33.5402324Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net2_bias], [t_net1_weight], [t_net2_weight]] 2024-03-27T20:44:33.5402950Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:44:33.5403562Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir: 2024-03-27T20:44:33.5404299Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2 2024-03-27T20:44:33.5404892Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:44:33.5405551Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:44:33.5406184Z [rank0]:[2024-03-27 20:44:33,537] torch.testing._internal.common_distributed: [ERROR] exiting process 
0 with exit code: 10 2024-03-27T20:44:33.5436797Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] Caught exception: 2024-03-27T20:44:33.5437497Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last): 2024-03-27T20:44:33.5438378Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:44:33.5438852Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] getattr(self, test_name)() 2024-03-27T20:44:33.5439348Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:44:33.5439967Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] fn() 2024-03-27T20:44:33.5440438Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:44:33.5440965Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] method(*args, **kwargs) 2024-03-27T20:44:33.5441448Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:44:33.5441881Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] test(self, **param_kwargs) 2024-03-27T20:44:33.5442364Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 
2024-03-27T20:44:33.5442879Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:44:33.5443350Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:44:33.5443760Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] raise self.failureException(msg) 2024-03-27T20:44:33.5444271Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net2_bias], [t_net1_weight], [t_net2_weight]] 2024-03-27T20:44:33.5444678Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:44:33.5445078Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir: 2024-03-27T20:44:33.5445573Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2 2024-03-27T20:44:33.5445973Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:44:33.5446393Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:44:33.5446809Z [rank1]:[2024-03-27 20:44:33,543] torch.testing._internal.common_distributed: [ERROR] exiting process 1 with exit code: 10 2024-03-27T20:44:35.1475270Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2 status >>> 1 2024-03-27T20:44:35.1485254Z 
============================= test session starts ============================== 2024-03-27T20:44:35.1485613Z platform linux -- Python 3.10.12, pytest-8.0.2, pluggy-1.4.0 -- /usr/bin/python 2024-03-27T20:44:35.1485869Z cachedir: .pytest_cache 2024-03-27T20:44:35.1486177Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/__w/5/s/.hypothesis/examples')) 2024-03-27T20:44:35.1486543Z Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket= 2024-03-27T20:44:35.1486755Z rootdir: /__w/5/s 2024-03-27T20:44:35.1486918Z configfile: pyproject.toml 2024-03-27T20:44:35.1487254Z plugins: hypothesis-6.99.10, timeout-2.2.0, cov-4.1.0, timestamper-0.0.9, random-order-1.1.1, xdist-3.5.0 2024-03-27T20:44:35.1487461Z timeout: 900.0s 2024-03-27T20:44:35.1487623Z timeout method: signal 2024-03-27T20:44:35.1487784Z timeout func_only: False 2024-03-27T20:44:35.1488013Z collecting ... collected 1 item 2024-03-27T20:44:35.1488599Z 2024-03-27T20:44:35.1489072Z [2024-03-27 20:44:27] thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2 Process 0 terminated with exit code 10, terminating remaining processes. 
2024-03-27T20:44:35.1489539Z FAILED 2024-03-27T20:44:35.1489694Z 2024-03-27T20:44:35.1489925Z =================================== FAILURES =================================== 2024-03-27T20:44:35.1490359Z _ CompileDDPTest.test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2 _ 2024-03-27T20:44:35.1490838Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:533: in wrapper 2024-03-27T20:44:35.1491128Z self._join_processes(fn) 2024-03-27T20:44:35.1491504Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:752: in _join_processes 2024-03-27T20:44:35.1491753Z self._check_return_codes(elapsed_time) 2024-03-27T20:44:35.1491948Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2024-03-27T20:44:35.1492049Z 2024-03-27T20:44:35.1492281Z self = 2024-03-27T20:44:35.1492536Z elapsed_time = 6.816273212432861 2024-03-27T20:44:35.1492625Z 2024-03-27T20:44:35.1492847Z def _check_return_codes(self, elapsed_time) -> None: 2024-03-27T20:44:35.1493017Z """ 2024-03-27T20:44:35.1493194Z Checks that the return codes of all spawned processes match, and skips 2024-03-27T20:44:35.1493406Z tests if they returned a return code indicating a skipping condition. 2024-03-27T20:44:35.1493581Z """ 2024-03-27T20:44:35.1493748Z # If no processes are spawned, there is nothing to check. 2024-03-27T20:44:35.1493933Z if not self.processes: 2024-03-27T20:44:35.1494123Z logger.warning("Note: no subprocesses were spawned, test was likely skipped.") 2024-03-27T20:44:35.1494315Z return 2024-03-27T20:44:35.1494455Z 2024-03-27T20:44:35.1494610Z first_process = self.processes[0] 2024-03-27T20:44:35.1494794Z # first, we check if there are errors in actual processes 2024-03-27T20:44:35.1494997Z # (via TEST_ERROR_EXIT CODE), and raise an exception for those. 
2024-03-27T20:44:35.1495206Z # the reason we do this is to attempt to raise a more helpful error 2024-03-27T20:44:35.1495406Z # message than "Process x terminated/timed out" 2024-03-27T20:44:35.1495606Z # TODO: we should pipe the exception of the failed subprocess here. 2024-03-27T20:44:35.1495807Z # Currently, the actual exception is displayed as a logging output. 2024-03-27T20:44:35.1495994Z errored_processes = [ 2024-03-27T20:44:35.1496149Z (i, p) 2024-03-27T20:44:35.1496310Z for i, p in enumerate(self.processes) 2024-03-27T20:44:35.1496502Z if p.exitcode == MultiProcessTestCase.TEST_ERROR_EXIT_CODE 2024-03-27T20:44:35.1496673Z ] 2024-03-27T20:44:35.1496817Z if errored_processes: 2024-03-27T20:44:35.1496974Z error = "" 2024-03-27T20:44:35.1497130Z for i, process in errored_processes: 2024-03-27T20:44:35.1497300Z # Get error from pipe. 2024-03-27T20:44:35.1497481Z error_message = self.pid_to_pipe[process.pid].recv() 2024-03-27T20:44:35.1497657Z error += ( 2024-03-27T20:44:35.1497830Z "Process {} exited with error code {} and exception:\n{}\n".format( 2024-03-27T20:44:35.1498050Z i, MultiProcessTestCase.TEST_ERROR_EXIT_CODE, error_message 2024-03-27T20:44:35.1498228Z ) 2024-03-27T20:44:35.1498366Z ) 2024-03-27T20:44:35.1498494Z 2024-03-27T20:44:35.1498643Z > raise RuntimeError(error) 2024-03-27T20:44:35.1499075Z E RuntimeError: Process 0 exited with error code 10 and exception: 2024-03-27T20:44:35.1499432Z E Traceback (most recent call last): 2024-03-27T20:44:35.1499790Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:44:35.1500136Z E getattr(self, test_name)() 2024-03-27T20:44:35.1500481Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:44:35.1500748Z E fn() 2024-03-27T20:44:35.1501072Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 
2024-03-27T20:44:35.1501361Z E method(*args, **kwargs) 2024-03-27T20:44:35.1501700Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:44:35.1502002Z E test(self, **param_kwargs) 2024-03-27T20:44:35.1502343Z E File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:44:35.1502711Z E self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:44:35.1503043Z E File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:44:35.1503313Z E raise self.failureException(msg) 2024-03-27T20:44:35.1503676Z E AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net2_bias], [t_net1_weight], [t_net2_weight]] 2024-03-27T20:44:35.1503944Z E  2024-03-27T20:44:35.1504203Z E To execute this test, run the following from the base repo dir: 2024-03-27T20:44:35.1504561Z E python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2 2024-03-27T20:44:35.1504815Z E  2024-03-27T20:44:35.1505085Z E This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:44:35.1505218Z 2024-03-27T20:44:35.1505515Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:802: RuntimeError 2024-03-27T20:44:35.1505966Z - generated xml file: /__w/5/s/thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2-results.xml - 2024-03-27T20:44:35.1506354Z =========================== short test summary info ============================ 2024-03-27T20:44:35.1506847Z FAILED thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2 - RuntimeError: Process 0 exited with error code 10 and exception: 
2024-03-27T20:44:35.1507159Z Traceback (most recent call last): 2024-03-27T20:44:35.1507476Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:44:35.1507700Z getattr(self, test_name)() 2024-03-27T20:44:35.1508002Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:44:35.1508207Z fn() 2024-03-27T20:44:35.1508495Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:44:35.1508711Z method(*args, **kwargs) 2024-03-27T20:44:35.1509016Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:44:35.1509237Z test(self, **param_kwargs) 2024-03-27T20:44:35.1509456Z File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:44:35.1509791Z self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:44:35.1510066Z File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:44:35.1510252Z raise self.failureException(msg) 2024-03-27T20:44:35.1510486Z AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net2_bias], [t_net1_weight], [t_net2_weight]] 2024-03-27T20:44:35.1510642Z 2024-03-27T20:44:35.1510802Z To execute this test, run the following from the base repo dir: 2024-03-27T20:44:35.1511132Z python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2 2024-03-27T20:44:35.1511277Z 2024-03-27T20:44:35.1511448Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:44:35.1511788Z ============================== 1 failed in 9.40s =============================== 2024-03-27T20:44:43.6155042Z [rank1]:[2024-03-27 20:44:43,614] 
torch.testing._internal.common_distributed: [ERROR] Caught exception: 2024-03-27T20:44:43.6157059Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last): 2024-03-27T20:44:43.6158291Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:44:43.6159299Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] getattr(self, test_name)() 2024-03-27T20:44:43.6160421Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:44:43.6161349Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] fn() 2024-03-27T20:44:43.6162434Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:44:43.6163401Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] method(*args, **kwargs) 2024-03-27T20:44:43.6164506Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:44:43.6165485Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] test(self, **param_kwargs) 2024-03-27T20:44:43.6166577Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:44:43.6167736Z [rank1]:[2024-03-27 20:44:43,614] 
torch.testing._internal.common_distributed: [ERROR] self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:44:43.6168814Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:44:43.6169745Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] raise self.failureException(msg) 2024-03-27T20:44:43.6170898Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net1_weight], [t_net2_bias], [t_net2_weight]] 2024-03-27T20:44:43.6171839Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:44:43.6172754Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir: 2024-03-27T20:44:43.6174553Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero3 2024-03-27T20:44:43.6175630Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:44:43.6176579Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:44:43.6177855Z [rank1]:[2024-03-27 20:44:43,614] torch.testing._internal.common_distributed: [ERROR] exiting process 1 with exit code: 10 2024-03-27T20:44:43.6260335Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] Caught exception: 2024-03-27T20:44:43.6262022Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last): 
2024-03-27T20:44:43.6263328Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:44:43.6264483Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] getattr(self, test_name)() 2024-03-27T20:44:43.6265641Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:44:43.6266630Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] fn() 2024-03-27T20:44:43.6267749Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:44:43.6268769Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] method(*args, **kwargs) 2024-03-27T20:44:43.6269924Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:44:43.6270965Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] test(self, **param_kwargs) 2024-03-27T20:44:43.6272093Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:44:43.6273283Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:44:43.6274378Z [rank0]:[2024-03-27 20:44:43,625] 
torch.testing._internal.common_distributed: [ERROR] File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:44:43.6275384Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] raise self.failureException(msg) 2024-03-27T20:44:43.6276599Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net1_weight], [t_net2_bias], [t_net2_weight]] 2024-03-27T20:44:43.6277589Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:44:43.6278559Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir: 2024-03-27T20:44:43.6279840Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero3 2024-03-27T20:44:43.6281081Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:44:43.6282080Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:44:43.6283700Z [rank0]:[2024-03-27 20:44:43,625] torch.testing._internal.common_distributed: [ERROR] exiting process 0 with exit code: 10 2024-03-27T20:44:44.2660211Z SIGSEGV(11), PID: 6887, Thread 6887: 2024-03-27T20:44:44.2661201Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.2661745Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2662242Z frame #2: + 0x3feada (0x7ff2ba9feada in 
/usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn_ops_infer.so.8) 2024-03-27T20:44:44.2662777Z frame #3: + 0x4073c5 (0x7ff2baa073c5 in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn_ops_infer.so.8) 2024-03-27T20:44:44.2663317Z frame #4: + 0x404538 (0x7ff2baa04538 in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn_ops_infer.so.8) 2024-03-27T20:44:44.2663839Z frame #5: + 0x4059af (0x7ff2baa059af in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn_ops_infer.so.8) 2024-03-27T20:44:44.2664354Z frame #6: + 0x3f9ae8 (0x7ff2ba9f9ae8 in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn_ops_infer.so.8) 2024-03-27T20:44:44.2664874Z frame #7: + 0x3faee5 (0x7ff2ba9faee5 in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn_ops_infer.so.8) 2024-03-27T20:44:44.2665295Z frame #8: + 0x45495 (0x7ff43367e495 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2665671Z frame #9: on_exit + 0 (0x7ff43367e610 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2747223Z frame #10: + 0x2755fb (0x55b5a9eb85fb in /usr/bin/python) 2024-03-27T20:44:44.2748583Z frame #11: + 0x262b6f (0x55b5a9ea5b6f in /usr/bin/python) 2024-03-27T20:44:44.2749276Z frame #12: PyErr_PrintEx + 0x1d (0x55b5a9ea591d in /usr/bin/python) 2024-03-27T20:44:44.2749787Z frame #13: PyRun_SimpleStringFlags + 0x72 (0x55b5a9e95992 in /usr/bin/python) 2024-03-27T20:44:44.2750316Z frame #14: Py_RunMain + 0x375 (0x55b5a9e94b15 in /usr/bin/python) 2024-03-27T20:44:44.2784538Z frame #15: Py_BytesMain + 0x2d (0x55b5a9e6b02d in /usr/bin/python) 2024-03-27T20:44:44.2785674Z frame #16: + 0x29d90 (0x7ff433662d90 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2786383Z frame #17: __libc_start_main + 0x80 (0x7ff433662e40 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2786884Z frame #18: _start + 0x25 
(0x55b5a9e6af25 in /usr/bin/python) 2024-03-27T20:44:44.2787169Z 2024-03-27T20:44:44.2787521Z SIGSEGV(11), PID: 6887, Thread 6952: 2024-03-27T20:44:44.2788250Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.2788978Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2789625Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2790366Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2791164Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.2791920Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2793114Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2793469Z 2024-03-27T20:44:44.2793865Z SIGSEGV(11), PID: 6887, Thread 6953: 2024-03-27T20:44:44.2794908Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.2795724Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2796544Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2797283Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2798163Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.2798976Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2799640Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2799947Z 2024-03-27T20:44:44.2800341Z SIGSEGV(11), PID: 6887, Thread 6954: 2024-03-27T20:44:44.2801125Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.2801945Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2802687Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2803427Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2804312Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.2805141Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2805823Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2806116Z 2024-03-27T20:44:44.2806476Z SIGSEGV(11), PID: 6887, Thread 6955: 2024-03-27T20:44:44.2807140Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.2807869Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2808509Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2809213Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2809986Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.2810763Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2811418Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2811745Z 2024-03-27T20:44:44.2899751Z SIGSEGV(11), PID: 6887, Thread 6956: 2024-03-27T20:44:44.2901694Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.2903020Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2903594Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2904144Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2904853Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.2906119Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2906785Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2906999Z 2024-03-27T20:44:44.2981838Z SIGSEGV(11), PID: 6887, Thread 6957: 2024-03-27T20:44:44.2983142Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.2983800Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2984448Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2985164Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2987000Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.2987627Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2988017Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.2988183Z 2024-03-27T20:44:44.3023571Z SIGSEGV(11), PID: 6887, Thread 6958: 2024-03-27T20:44:44.3025385Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3030568Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3031680Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3032416Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3033327Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3034177Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3034879Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3035196Z 2024-03-27T20:44:44.3099210Z SIGSEGV(11), PID: 6887, Thread 6959: 2024-03-27T20:44:44.3103630Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3104785Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3105440Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3106071Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3106857Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3107815Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3108473Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3112878Z 2024-03-27T20:44:44.3113370Z SIGSEGV(11), PID: 6887, Thread 6960: 2024-03-27T20:44:44.3114209Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3114948Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3115586Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3116175Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3117575Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3118484Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3119082Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3119400Z 2024-03-27T20:44:44.3173910Z SIGSEGV(11), PID: 6887, Thread 6961: 2024-03-27T20:44:44.3174466Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3176254Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3176859Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3180738Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3181348Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3181890Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3182334Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3182779Z 2024-03-27T20:44:44.3183049Z SIGSEGV(11), PID: 6887, Thread 6962: 2024-03-27T20:44:44.3183601Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3184136Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3184635Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3187789Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3188435Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3188963Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3189419Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3189636Z 2024-03-27T20:44:44.3189914Z SIGSEGV(11), PID: 6887, Thread 6963: 2024-03-27T20:44:44.3190452Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3190983Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3191464Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3191932Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3192509Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3193037Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3193492Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3193707Z 2024-03-27T20:44:44.3260633Z SIGSEGV(11), PID: 6887, Thread 6964: 2024-03-27T20:44:44.3300105Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3316765Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3317749Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3318389Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3318982Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3319517Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3319984Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3320193Z 2024-03-27T20:44:44.3320455Z SIGSEGV(11), PID: 6887, Thread 6965: 2024-03-27T20:44:44.3320989Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3321537Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3322018Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3322492Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3323074Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3323600Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3324041Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3324255Z 2024-03-27T20:44:44.3324517Z SIGSEGV(11), PID: 6887, Thread 6966: 2024-03-27T20:44:44.3325039Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3325635Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3326120Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3326598Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3327179Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3327703Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3328145Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3328361Z 2024-03-27T20:44:44.3328608Z SIGSEGV(11), PID: 6887, Thread 6967: 2024-03-27T20:44:44.3329145Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3329676Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3380321Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3381502Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3382134Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3382584Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3382956Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3383087Z 2024-03-27T20:44:44.3383266Z SIGSEGV(11), PID: 6887, Thread 6968: 2024-03-27T20:44:44.3383718Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3386516Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3386987Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3387316Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3387748Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3388118Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3388420Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3388545Z 2024-03-27T20:44:44.3388695Z SIGSEGV(11), PID: 6887, Thread 6969: 2024-03-27T20:44:44.3394508Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3394965Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3398821Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3399777Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3403587Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3404262Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3404705Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3404915Z 2024-03-27T20:44:44.3405169Z SIGSEGV(11), PID: 6887, Thread 6970: 2024-03-27T20:44:44.3405645Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3406117Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3406544Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3406962Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3407616Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3408088Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3408496Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3408689Z 2024-03-27T20:44:44.3408925Z SIGSEGV(11), PID: 6887, Thread 6971: 2024-03-27T20:44:44.3409391Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3409867Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3410286Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3410750Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3411271Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3411985Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3412411Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3413091Z 2024-03-27T20:44:44.3413335Z SIGSEGV(11), PID: 6887, Thread 6972: 2024-03-27T20:44:44.3413967Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3414454Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3414875Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3415287Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3415796Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3416320Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3416727Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3416923Z 2024-03-27T20:44:44.3417154Z SIGSEGV(11), PID: 6887, Thread 6973: 2024-03-27T20:44:44.3417616Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3418080Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3418508Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3419309Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3420169Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3471923Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3472559Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3472737Z 2024-03-27T20:44:44.3473002Z SIGSEGV(11), PID: 6887, Thread 6974: 2024-03-27T20:44:44.3473547Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3474134Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3474635Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3475120Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3475731Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3476288Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3476743Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3476911Z 2024-03-27T20:44:44.3477122Z SIGSEGV(11), PID: 6887, Thread 6975: 2024-03-27T20:44:44.3477683Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3478235Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3478688Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3479133Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3479731Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3480450Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3481084Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3481300Z 2024-03-27T20:44:44.3481462Z SIGSEGV(11), PID: 6887, Thread 6976: 2024-03-27T20:44:44.3481855Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3482242Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3482572Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3482891Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3483321Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3483760Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3484202Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3484359Z 2024-03-27T20:44:44.3484564Z SIGSEGV(11), PID: 6887, Thread 6977: 2024-03-27T20:44:44.3485030Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3485425Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3485752Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3486081Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3486509Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3486891Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3487210Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3487323Z 2024-03-27T20:44:44.3487499Z SIGSEGV(11), PID: 6887, Thread 6978: 2024-03-27T20:44:44.3487887Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3488266Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3488590Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3488918Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3489350Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3489728Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3490041Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3490153Z 2024-03-27T20:44:44.3562550Z SIGSEGV(11), PID: 6887, Thread 6979: 2024-03-27T20:44:44.3564344Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3564812Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3565159Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3565496Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3566172Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3566613Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3566925Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3567058Z 2024-03-27T20:44:44.3568702Z SIGSEGV(11), PID: 6887, Thread 6980: 2024-03-27T20:44:44.3569127Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3570203Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3570718Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3571171Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3571698Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3572178Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3572568Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3572767Z 2024-03-27T20:44:44.3573022Z SIGSEGV(11), PID: 6887, Thread 6981: 2024-03-27T20:44:44.3573679Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3574253Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3574686Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3575120Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3575648Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3576125Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3576531Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3576732Z 2024-03-27T20:44:44.3577735Z SIGSEGV(11), PID: 6887, Thread 6982: 2024-03-27T20:44:44.3578215Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3578847Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3579332Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3579775Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3580307Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3580786Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3581183Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3581387Z 2024-03-27T20:44:44.3582108Z SIGSEGV(11), PID: 6887, Thread 6983: 2024-03-27T20:44:44.3582587Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3583078Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3583712Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3584172Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3584784Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3585823Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3586237Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3586444Z 2024-03-27T20:44:44.3586702Z SIGSEGV(11), PID: 6887, Thread 6984: 2024-03-27T20:44:44.3587167Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3587651Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3588088Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3588522Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3589036Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3589488Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3589887Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3590086Z 2024-03-27T20:44:44.3590340Z SIGSEGV(11), PID: 6887, Thread 6985: 2024-03-27T20:44:44.3591409Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3591913Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3592340Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3592932Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3593545Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3594046Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3594467Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3594667Z 2024-03-27T20:44:44.3594888Z SIGSEGV(11), PID: 6887, Thread 6986: 2024-03-27T20:44:44.3595359Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3595847Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3596274Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3596698Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3597764Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3598279Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3598698Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3598903Z 2024-03-27T20:44:44.3599151Z SIGSEGV(11), PID: 6887, Thread 6987: 2024-03-27T20:44:44.3599629Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3600216Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3600702Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3601117Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3601614Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3602130Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3602570Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3602770Z 2024-03-27T20:44:44.3603450Z SIGSEGV(11), PID: 6887, Thread 6988: 2024-03-27T20:44:44.3603943Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3604446Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3604888Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3605310Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3605810Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3606303Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3606701Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3606900Z 2024-03-27T20:44:44.3607158Z SIGSEGV(11), PID: 6887, Thread 6989: 2024-03-27T20:44:44.3607634Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3608102Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3608522Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3608937Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3609432Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3609908Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3610383Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3610587Z 2024-03-27T20:44:44.3611800Z SIGSEGV(11), PID: 6887, Thread 6990: 2024-03-27T20:44:44.3612335Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3612843Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3613263Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3613678Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3614176Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3614634Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3615054Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3615368Z 2024-03-27T20:44:44.3616311Z SIGSEGV(11), PID: 6887, Thread 6991: 2024-03-27T20:44:44.3616854Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3617444Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3617878Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3618291Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3618972Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3619479Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3619931Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3620189Z 2024-03-27T20:44:44.3620417Z SIGSEGV(11), PID: 6887, Thread 6992: 2024-03-27T20:44:44.3620895Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3621411Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3622118Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3622645Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3623219Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3623738Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3624192Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3624404Z 2024-03-27T20:44:44.3687230Z SIGSEGV(11), PID: 6887, Thread 6993: 2024-03-27T20:44:44.3688761Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3689281Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3689703Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3690182Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3690701Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3691200Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3691609Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3691808Z 2024-03-27T20:44:44.3692180Z SIGSEGV(11), PID: 6887, Thread 6994: 2024-03-27T20:44:44.3696613Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3697141Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3697563Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3699219Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3700141Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3701196Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3701772Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3702248Z 2024-03-27T20:44:44.3702532Z SIGSEGV(11), PID: 6887, Thread 6995: 2024-03-27T20:44:44.3703028Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3703501Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3703918Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3704351Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3704972Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3705645Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3706129Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3706320Z 2024-03-27T20:44:44.3706561Z SIGSEGV(11), PID: 6887, Thread 6996: 2024-03-27T20:44:44.3707037Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3707610Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3708061Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3708585Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3709279Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3709773Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3710183Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3710368Z 2024-03-27T20:44:44.3710670Z SIGSEGV(11), PID: 6887, Thread 6997: 2024-03-27T20:44:44.3711151Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3711609Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3712033Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3712446Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3712956Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3713433Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3713834Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3714032Z 2024-03-27T20:44:44.3714299Z SIGSEGV(11), PID: 6887, Thread 6998: 2024-03-27T20:44:44.3714931Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3715603Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3716251Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3716753Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3717389Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3717930Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3718322Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3718520Z 2024-03-27T20:44:44.3718760Z SIGSEGV(11), PID: 6887, Thread 6999: 2024-03-27T20:44:44.3719256Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3719914Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3720420Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3720862Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3721376Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3721846Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3722235Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3722435Z 2024-03-27T20:44:44.3722704Z SIGSEGV(11), PID: 6887, Thread 7000: 2024-03-27T20:44:44.3723203Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3723678Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3724098Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3724520Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3725098Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3725560Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3725957Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3726161Z 2024-03-27T20:44:44.3726421Z SIGSEGV(11), PID: 6887, Thread 7001: 2024-03-27T20:44:44.3726900Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3727366Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3727798Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3728219Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3728738Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3729191Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3729591Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3729803Z 2024-03-27T20:44:44.3730039Z SIGSEGV(11), PID: 6887, Thread 7002: 2024-03-27T20:44:44.3730628Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3731200Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3731739Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3732152Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3732730Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3733246Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3733656Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3733861Z 2024-03-27T20:44:44.3734097Z SIGSEGV(11), PID: 6887, Thread 7003: 2024-03-27T20:44:44.3734643Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3735298Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3735750Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3736162Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3736674Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3737134Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3737546Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3737734Z 2024-03-27T20:44:44.3738039Z SIGSEGV(11), PID: 6887, Thread 7004: 2024-03-27T20:44:44.3738592Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3739241Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3739679Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3740092Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3740620Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3741098Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3741646Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3741908Z 2024-03-27T20:44:44.3742162Z SIGSEGV(11), PID: 6887, Thread 7005: 2024-03-27T20:44:44.3742642Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3743119Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3743553Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3744103Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3744859Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3745410Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3745877Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3746134Z 2024-03-27T20:44:44.3747186Z SIGSEGV(11), PID: 6887, Thread 7006: 2024-03-27T20:44:44.3747869Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3748720Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3749320Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3749795Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3750380Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3750901Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3751351Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3751614Z 2024-03-27T20:44:44.3751882Z SIGSEGV(11), PID: 6887, Thread 7007: 2024-03-27T20:44:44.3752425Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3752943Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3753423Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3753947Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3754528Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3755053Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3755502Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3755704Z 2024-03-27T20:44:44.3755968Z SIGSEGV(11), PID: 6887, Thread 7008: 2024-03-27T20:44:44.3756508Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3757025Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3757498Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3757968Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3758547Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3759069Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3759518Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3759723Z 2024-03-27T20:44:44.3759980Z SIGSEGV(11), PID: 6887, Thread 7009: 2024-03-27T20:44:44.3760549Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3761083Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3761557Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3762024Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3762603Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3763122Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3763561Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3763864Z 2024-03-27T20:44:44.3808117Z SIGSEGV(11), PID: 6887, Thread 7010: 2024-03-27T20:44:44.3809694Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3810902Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3811392Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3811857Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3812469Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3812998Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3813442Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3813609Z 2024-03-27T20:44:44.3813796Z SIGSEGV(11), PID: 6887, Thread 7011: 2024-03-27T20:44:44.3814316Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3814869Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3815318Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3815746Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3816301Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3816792Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3817211Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3817367Z 2024-03-27T20:44:44.3817620Z SIGSEGV(11), PID: 6887, Thread 7012: 2024-03-27T20:44:44.3818135Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3818646Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3819289Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3819720Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3820338Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3820845Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3821286Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3821451Z 2024-03-27T20:44:44.3821657Z SIGSEGV(11), PID: 6887, Thread 7013: 2024-03-27T20:44:44.3822182Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3822683Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3823111Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3823537Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3824093Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3824850Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3825315Z frame #6: clone + 0x44 (0x7ff43375ea04 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3825569Z 2024-03-27T20:44:44.3825811Z SIGSEGV(11), PID: 6887, Thread 7014: 2024-03-27T20:44:44.3826382Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3826949Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3827430Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3882924Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3883956Z frame #4: + 0x3506fb (0x7ff3820036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:44:44.3884530Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3885041Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3885273Z 2024-03-27T20:44:44.3945113Z SIGSEGV(11), PID: 6887, Thread 7015: 2024-03-27T20:44:44.3946026Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3946585Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3946944Z frame #2: __poll + 0x4f (0x7ff433751bcf in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3947283Z frame #3: + 0x2b9cbf (0x7ff38804ecbf in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:44:44.3947733Z frame #4: + 0x37ee2f (0x7ff388113e2f in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:44:44.3948219Z frame #5: + 0x2b4bef (0x7ff388049bef in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:44:44.3948589Z frame #6: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3948907Z frame #7: 
clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3949019Z 2024-03-27T20:44:44.3949203Z SIGSEGV(11), PID: 6887, Thread 7018: 2024-03-27T20:44:44.3949585Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3949960Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3950270Z frame #2: __poll + 0x4f (0x7ff433751bcf in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3950590Z frame #3: + 0x2b9cbf (0x7ff38804ecbf in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:44:44.3950935Z frame #4: + 0x37ee2f (0x7ff388113e2f in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:44:44.3951271Z frame #5: + 0x2b4bef (0x7ff388049bef in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:44:44.3951603Z frame #6: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3951914Z frame #7: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3952026Z 2024-03-27T20:44:44.3952175Z SIGSEGV(11), PID: 6887, Thread 7020: 2024-03-27T20:44:44.3952548Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3952923Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3953284Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3953753Z frame #3: pthread_cond_timedwait + 0x23b (0x7ff4336cce9b in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3954091Z frame #4: + 0x218d9a (0x7ff387fadd9a in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:44:44.3954495Z frame #5: + 0x2b4bef (0x7ff388049bef in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:44:44.3954817Z frame #6: + 0x94ac3 (0x7ff4336cdac3 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3955133Z frame #7: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3955256Z 2024-03-27T20:44:44.3963976Z SIGSEGV(11), PID: 6887, Thread 7024: 2024-03-27T20:44:44.3969456Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3972193Z frame #1: c10::FatalSignalHandler::fatalSignalHandler(int) + 0x152 (0x7ff4329879c2 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3974109Z frame #2: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3975181Z frame #3: c10::cuda::CUDAKernelLaunchRegistry::has_failed() const + 0x19 (0x7ff432df3eb9 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so) 2024-03-27T20:44:44.3976333Z frame #4: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3d (0x7ff432df47cd in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so) 2024-03-27T20:44:44.3977389Z frame #5: c10::cuda::ExchangeDevice(int) + 0x8a (0x7ff432df4d0a in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so) 2024-03-27T20:44:44.3979090Z frame #6: std::_Sp_counted_ptr_inplace >, std::allocator > >, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x66 (0x7ff3e840cc76 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so) 2024-03-27T20:44:44.3980528Z frame #7: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x48 (0x7ff431191cc8 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so) 2024-03-27T20:44:44.3981640Z frame #8: c10d::ProcessGroupNCCL::WorkNCCL::~WorkNCCL() + 0x135 (0x7ff3e83d7315 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so) 2024-03-27T20:44:44.3982667Z frame #9: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x845 (0x7ff3e83da2e5 in 
/usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so) 2024-03-27T20:44:44.3983915Z frame #10: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7ff3e83da839 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so) 2024-03-27T20:44:44.3984857Z frame #11: + 0xdc253 (0x7ff4320b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) 2024-03-27T20:44:44.3985761Z frame #12: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3986612Z frame #13: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3987034Z 2024-03-27T20:44:44.3987482Z SIGSEGV(11), PID: 6887, Thread 7030: 2024-03-27T20:44:44.3988394Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3989349Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3990167Z frame #2: __poll + 0x4f (0x7ff433751bcf in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3991109Z frame #3: + 0x69afc (0x7ff38cc69afc in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2) 2024-03-27T20:44:44.3992011Z frame #4: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3992853Z frame #5: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3993940Z 2024-03-27T20:44:44.3994460Z SIGSEGV(11), PID: 6887, Thread 7032: 2024-03-27T20:44:44.3995403Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7ff432987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:44:44.3996553Z frame #1: + 0x42520 (0x7ff43367b520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3997392Z frame #2: + 0x91117 (0x7ff4336ca117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3998380Z frame #3: pthread_cond_wait + 0x211 (0x7ff4336cca41 in 
/usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.3999326Z frame #4: + 0x6791f (0x7ff38cc6791f in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2) 2024-03-27T20:44:44.4000253Z frame #5: + 0x94ac3 (0x7ff4336cdac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.4001166Z frame #6: clone + 0x44 (0x7ff43375ea04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:44:44.4001592Z 2024-03-27T20:44:49.6161499Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero3 status >>> 1 2024-03-27T20:44:49.6172656Z ============================= test session starts ============================== 2024-03-27T20:44:49.6174101Z platform linux -- Python 3.10.12, pytest-8.0.2, pluggy-1.4.0 -- /usr/bin/python 2024-03-27T20:44:49.6174776Z cachedir: .pytest_cache 2024-03-27T20:44:49.6175771Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/__w/5/s/.hypothesis/examples')) 2024-03-27T20:44:49.6176694Z Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket= 2024-03-27T20:44:49.6177162Z rootdir: /__w/5/s 2024-03-27T20:44:49.6177607Z configfile: pyproject.toml 2024-03-27T20:44:49.6178483Z plugins: hypothesis-6.99.10, timeout-2.2.0, cov-4.1.0, timestamper-0.0.9, random-order-1.1.1, xdist-3.5.0 2024-03-27T20:44:49.6179242Z timeout: 900.0s 2024-03-27T20:44:49.6179702Z timeout method: signal 2024-03-27T20:44:49.6180079Z timeout func_only: False 2024-03-27T20:44:49.6180568Z collecting ... collected 1 item 2024-03-27T20:44:49.6180926Z 2024-03-27T20:44:49.6181978Z [2024-03-27 20:44:37] thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero3 Process 1 terminated with exit code 10, terminating remaining processes. 
2024-03-27T20:44:49.6182738Z FAILED
2024-03-27T20:44:49.6182912Z
2024-03-27T20:44:49.6183316Z =================================== FAILURES ===================================
2024-03-27T20:44:49.6184074Z _ CompileDDPTest.test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero3 _
2024-03-27T20:44:49.6184892Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:533: in wrapper
2024-03-27T20:44:49.6185373Z     self._join_processes(fn)
2024-03-27T20:44:49.6186085Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:752: in _join_processes
2024-03-27T20:44:49.6186616Z     self._check_return_codes(elapsed_time)
2024-03-27T20:44:49.6187034Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2024-03-27T20:44:49.6187261Z
2024-03-27T20:44:49.6187766Z self =
2024-03-27T20:44:49.6188312Z elapsed_time = 6.919006586074829
2024-03-27T20:44:49.6188508Z
2024-03-27T20:44:49.6188993Z     def _check_return_codes(self, elapsed_time) -> None:
2024-03-27T20:44:49.6189364Z         """
2024-03-27T20:44:49.6189738Z         Checks that the return codes of all spawned processes match, and skips
2024-03-27T20:44:49.6190211Z         tests if they returned a return code indicating a skipping condition.
2024-03-27T20:44:49.6191170Z         """
2024-03-27T20:44:49.6191537Z         # If no processes are spawned, there is nothing to check.
2024-03-27T20:44:49.6192073Z         if not self.processes:
2024-03-27T20:44:49.6192504Z             logger.warning("Note: no subprocesses were spawned, test was likely skipped.")
2024-03-27T20:44:49.6192914Z             return
2024-03-27T20:44:49.6193210Z
2024-03-27T20:44:49.6193528Z         first_process = self.processes[0]
2024-03-27T20:44:49.6193930Z         # first, we check if there are errors in actual processes
2024-03-27T20:44:49.6194371Z         # (via TEST_ERROR_EXIT CODE), and raise an exception for those.
2024-03-27T20:44:49.6194824Z         # the reason we do this is to attempt to raise a more helpful error
2024-03-27T20:44:49.6195254Z         # message than "Process x terminated/timed out"
2024-03-27T20:44:49.6195680Z         # TODO: we should pipe the exception of the failed subprocess here.
2024-03-27T20:44:49.6196147Z         # Currently, the actual exception is displayed as a logging output.
2024-03-27T20:44:49.6196542Z         errored_processes = [
2024-03-27T20:44:49.6196868Z             (i, p)
2024-03-27T20:44:49.6197222Z             for i, p in enumerate(self.processes)
2024-03-27T20:44:49.6197648Z             if p.exitcode == MultiProcessTestCase.TEST_ERROR_EXIT_CODE
2024-03-27T20:44:49.6198016Z         ]
2024-03-27T20:44:49.6198331Z         if errored_processes:
2024-03-27T20:44:49.6198646Z             error = ""
2024-03-27T20:44:49.6198999Z             for i, process in errored_processes:
2024-03-27T20:44:49.6199364Z                 # Get error from pipe.
2024-03-27T20:44:49.6199759Z                 error_message = self.pid_to_pipe[process.pid].recv()
2024-03-27T20:44:49.6200124Z                 error += (
2024-03-27T20:44:49.6200517Z                     "Process {} exited with error code {} and exception:\n{}\n".format(
2024-03-27T20:44:49.6200981Z                         i, MultiProcessTestCase.TEST_ERROR_EXIT_CODE, error_message
2024-03-27T20:44:49.6201372Z                     )
2024-03-27T20:44:49.6201670Z                 )
2024-03-27T20:44:49.6201949Z
2024-03-27T20:44:49.6202266Z >           raise RuntimeError(error)
2024-03-27T20:44:49.6202897Z E           RuntimeError: Process 1 exited with error code 10 and exception:
2024-03-27T20:44:49.6203491Z E           Traceback (most recent call last):
2024-03-27T20:44:49.6204255Z E             File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test
2024-03-27T20:44:49.6204903Z E               getattr(self, test_name)()
2024-03-27T20:44:49.6205658Z E             File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper
2024-03-27T20:44:49.6206226Z E               fn()
2024-03-27T20:44:49.6206940Z E             File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper
2024-03-27T20:44:49.6207567Z E               method(*args, **kwargs)
2024-03-27T20:44:49.6208339Z E             File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test
2024-03-27T20:44:49.6208988Z E               test(self, **param_kwargs)
2024-03-27T20:44:49.6209741Z E             File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing
2024-03-27T20:44:49.6210543Z E               self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}")
2024-03-27T20:44:49.6211264Z E             File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue
2024-03-27T20:44:49.6211846Z E               raise self.failureException(msg)
2024-03-27T20:44:49.6212657Z E           AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net1_weight], [t_net2_bias], [t_net2_weight]]
2024-03-27T20:44:49.6213385Z E
2024-03-27T20:44:49.6214066Z E           To execute this test, run the following from the base repo dir:
2024-03-27T20:44:49.6214854Z E               python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero3
2024-03-27T20:44:49.6215401Z E
2024-03-27T20:44:49.6216006Z E           This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2024-03-27T20:44:49.6216290Z
2024-03-27T20:44:49.6216943Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:802: RuntimeError
2024-03-27T20:44:49.6217949Z - generated xml file: /__w/5/s/thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero3-results.xml -
2024-03-27T20:44:49.6218948Z =========================== short test summary info ============================
2024-03-27T20:44:49.6220054Z FAILED thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero3 - RuntimeError: Process 1 exited with error code 10 and exception:
2024-03-27T20:44:49.6220740Z Traceback (most recent call last): 2024-03-27T20:44:49.6221437Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:44:49.6221908Z getattr(self, test_name)() 2024-03-27T20:44:49.6222590Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:44:49.6223038Z fn() 2024-03-27T20:44:49.6223671Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:44:49.6224121Z method(*args, **kwargs) 2024-03-27T20:44:49.6224807Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:44:49.6225287Z test(self, **param_kwargs) 2024-03-27T20:44:49.6225769Z File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:44:49.6226326Z self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:44:49.6226812Z File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:44:49.6227232Z raise self.failureException(msg) 2024-03-27T20:44:49.6227735Z AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net1_weight], [t_net2_bias], [t_net2_weight]] 2024-03-27T20:44:49.6228052Z 2024-03-27T20:44:49.6228412Z To execute this test, run the following from the base repo dir: 2024-03-27T20:44:49.6229115Z python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero3 2024-03-27T20:44:49.6229422Z 2024-03-27T20:44:49.6229803Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:44:49.6230545Z ============================== 1 failed in 9.48s =============================== 2024-03-27T20:45:00.5122833Z 
thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_layer_zero2 status >>> 0 2024-03-27T20:45:10.8983030Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_layer_zero3 status >>> 0 2024-03-27T20:45:18.3898965Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] Caught exception: 2024-03-27T20:45:18.3899834Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last): 2024-03-27T20:45:18.3904845Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:45:18.3907537Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] getattr(self, test_name)() 2024-03-27T20:45:18.3908302Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:45:18.3908945Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] fn() 2024-03-27T20:45:18.3909659Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:45:18.3910313Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] method(*args, **kwargs) 2024-03-27T20:45:18.3911130Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:45:18.3911761Z [rank0]:[2024-03-27 20:45:18,388] 
torch.testing._internal.common_distributed: [ERROR] test(self, **param_kwargs) 2024-03-27T20:45:18.3912456Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:45:18.3913184Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:45:18.3913881Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:45:18.3914482Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] raise self.failureException(msg) 2024-03-27T20:45:18.3915227Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net2_bias], [t_net1_weight], [t_net2_weight]] 2024-03-27T20:45:18.3915797Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:45:18.3916380Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir: 2024-03-27T20:45:18.3917135Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero2 2024-03-27T20:45:18.3917719Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:45:18.3918369Z [rank0]:[2024-03-27 20:45:18,388] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:45:18.3918945Z [rank0]:[2024-03-27 20:45:18,388] 
torch.testing._internal.common_distributed: [ERROR] exiting process 0 with exit code: 10 2024-03-27T20:45:18.3940622Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] Caught exception: 2024-03-27T20:45:18.3941211Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last): 2024-03-27T20:45:18.3942007Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:45:18.3942786Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] getattr(self, test_name)() 2024-03-27T20:45:18.3945256Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:45:18.3946010Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] fn() 2024-03-27T20:45:18.3946655Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:45:18.3947248Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] method(*args, **kwargs) 2024-03-27T20:45:18.3947974Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:45:18.3948583Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] test(self, **param_kwargs) 2024-03-27T20:45:18.3949258Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] File 
"/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:45:18.3949961Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:45:18.3950609Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:45:18.3951188Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] raise self.failureException(msg) 2024-03-27T20:45:18.3951890Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net2_bias], [t_net1_weight], [t_net2_weight]] 2024-03-27T20:45:18.3952493Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:45:18.3953078Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir: 2024-03-27T20:45:18.3953778Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero2 2024-03-27T20:45:18.3954337Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:45:18.3954929Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:45:18.3955524Z [rank1]:[2024-03-27 20:45:18,393] torch.testing._internal.common_distributed: [ERROR] exiting process 1 with exit code: 10 2024-03-27T20:45:18.9994209Z SIGSEGV(11), PID: 7571, Thread 7571: 2024-03-27T20:45:18.9995787Z frame #0: 
c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:18.9997236Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:18.9998157Z frame #2: c10::Dispatcher::deregisterFallback_(c10::DispatchKey) + 0x44f (0x7f611337484f in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so) 2024-03-27T20:45:18.9999087Z frame #3: + 0x1a21ccd (0x7f6113374ccd in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so) 2024-03-27T20:45:18.9999982Z frame #4: + 0x1790a2d (0x7f61130e3a2d in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so) 2024-03-27T20:45:19.0000813Z frame #5: + 0x45495 (0x7f612b5f8495 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0002701Z frame #6: on_exit + 0 (0x7f612b5f8610 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0003223Z frame #7: + 0x2755fb (0x5562846c35fb in /usr/bin/python) 2024-03-27T20:45:19.0003948Z frame #8: + 0x262b6f (0x5562846b0b6f in /usr/bin/python) 2024-03-27T20:45:19.0004287Z frame #9: PyErr_PrintEx + 0x1d (0x5562846b091d in /usr/bin/python) 2024-03-27T20:45:19.0004647Z frame #10: PyRun_SimpleStringFlags + 0x72 (0x5562846a0992 in /usr/bin/python) 2024-03-27T20:45:19.0005280Z frame #11: Py_RunMain + 0x375 (0x55628469fb15 in /usr/bin/python) 2024-03-27T20:45:19.0005692Z frame #12: Py_BytesMain + 0x2d (0x55628467602d in /usr/bin/python) 2024-03-27T20:45:19.0006272Z frame #13: + 0x29d90 (0x7f612b5dcd90 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0006859Z frame #14: __libc_start_main + 0x80 (0x7f612b5dce40 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0007288Z frame #15: _start + 0x25 (0x556284675f25 in /usr/bin/python) 2024-03-27T20:45:19.0007488Z 2024-03-27T20:45:19.0007731Z SIGSEGV(11), PID: 7571, Thread 7572: 2024-03-27T20:45:19.0008421Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0009102Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0009703Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0010265Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0011000Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0011712Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0012273Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0012481Z 2024-03-27T20:45:19.0012758Z SIGSEGV(11), PID: 7571, Thread 7573: 2024-03-27T20:45:19.0013646Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0014493Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0015100Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0015669Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0018110Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0018925Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0019518Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0019907Z 2024-03-27T20:45:19.0020408Z SIGSEGV(11), PID: 7571, Thread 7574: 2024-03-27T20:45:19.0021229Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0021928Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0022516Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0023076Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0023812Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0024698Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0025529Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0027677Z 2024-03-27T20:45:19.0027956Z SIGSEGV(11), PID: 7571, Thread 7575: 2024-03-27T20:45:19.0028610Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0029278Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0029850Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0030438Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0031176Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0031858Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0032643Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0033003Z 2024-03-27T20:45:19.0033353Z SIGSEGV(11), PID: 7571, Thread 7576: 2024-03-27T20:45:19.0034026Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0034704Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0035273Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0035860Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0036602Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0037488Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0038080Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0038291Z 2024-03-27T20:45:19.0038585Z SIGSEGV(11), PID: 7571, Thread 7577: 2024-03-27T20:45:19.0039246Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0039910Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0040469Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0041054Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0041991Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0042689Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0043258Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0043476Z 2024-03-27T20:45:19.0043781Z SIGSEGV(11), PID: 7571, Thread 7578: 2024-03-27T20:45:19.0044482Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0045155Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0045715Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0046299Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0047185Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0048013Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0048804Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0049164Z 2024-03-27T20:45:19.0049514Z SIGSEGV(11), PID: 7571, Thread 7579: 2024-03-27T20:45:19.0050628Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0051345Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0051909Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0052488Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0053241Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0053919Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0054471Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0054681Z 2024-03-27T20:45:19.0059137Z SIGSEGV(11), PID: 7571, Thread 7580: 2024-03-27T20:45:19.0060859Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0061613Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0062211Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0062828Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0063565Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0064268Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0064835Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0065053Z 2024-03-27T20:45:19.0066423Z SIGSEGV(11), PID: 7571, Thread 7581: 2024-03-27T20:45:19.0067170Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0067896Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0068492Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0069094Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0069838Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0070514Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0071087Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0071333Z 2024-03-27T20:45:19.0071892Z SIGSEGV(11), PID: 7571, Thread 7582: 2024-03-27T20:45:19.0072685Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0073486Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0074614Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0075210Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0076165Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0076848Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0077419Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0077654Z 2024-03-27T20:45:19.0078387Z SIGSEGV(11), PID: 7571, Thread 7583: 2024-03-27T20:45:19.0079092Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0079786Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0080392Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0080984Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0081719Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0082383Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0082946Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0083157Z 2024-03-27T20:45:19.0085758Z SIGSEGV(11), PID: 7571, Thread 7584: 2024-03-27T20:45:19.0086575Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0087291Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0087888Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0088480Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0089202Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0090136Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0090712Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0090947Z 2024-03-27T20:45:19.0095765Z SIGSEGV(11), PID: 7571, Thread 7585: 2024-03-27T20:45:19.0096594Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0097221Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0097747Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0098270Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0099042Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0099637Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0100131Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0100347Z 2024-03-27T20:45:19.0106547Z SIGSEGV(11), PID: 7571, Thread 7586: 2024-03-27T20:45:19.0108062Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0109321Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0110232Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0110997Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0111958Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0112830Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0113580Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0113873Z 2024-03-27T20:45:19.0116578Z SIGSEGV(11), PID: 7571, Thread 7587: 2024-03-27T20:45:19.0117742Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0118651Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0119424Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0120207Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0121169Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0122025Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0122983Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0123311Z 2024-03-27T20:45:19.0123692Z SIGSEGV(11), PID: 7571, Thread 7588: 2024-03-27T20:45:19.0124577Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0125684Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0126450Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0127205Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0128149Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0129012Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0129726Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0130014Z 2024-03-27T20:45:19.0130855Z SIGSEGV(11), PID: 7571, Thread 7589: 2024-03-27T20:45:19.0131884Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0132802Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0133571Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0134325Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0135279Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0136145Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0136863Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0137434Z 2024-03-27T20:45:19.0138263Z SIGSEGV(11), PID: 7571, Thread 7590: 2024-03-27T20:45:19.0139382Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0140552Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0141327Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0142080Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0143034Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0144164Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0144907Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0145205Z 2024-03-27T20:45:19.0147306Z SIGSEGV(11), PID: 7571, Thread 7591: 2024-03-27T20:45:19.0148467Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0149456Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0150311Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0151282Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0152251Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0153115Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0153835Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0154119Z 2024-03-27T20:45:19.0154488Z SIGSEGV(11), PID: 7571, Thread 7592: 2024-03-27T20:45:19.0155364Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0156432Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0157259Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0158015Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0158960Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0159813Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0160536Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0160819Z 2024-03-27T20:45:19.0161248Z SIGSEGV(11), PID: 7571, Thread 7593: 2024-03-27T20:45:19.0162332Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0163216Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0163969Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0164718Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0165649Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0166788Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0167930Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0168388Z 2024-03-27T20:45:19.0168793Z SIGSEGV(11), PID: 7571, Thread 7594: 2024-03-27T20:45:19.0169857Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0170732Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0171495Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0172244Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0173182Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0174039Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0174753Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0175042Z 2024-03-27T20:45:19.0175371Z SIGSEGV(11), PID: 7571, Thread 7595: 2024-03-27T20:45:19.0176526Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0177477Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0178234Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0179173Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0180159Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0181018Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0181736Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0181999Z 2024-03-27T20:45:19.0182388Z SIGSEGV(11), PID: 7571, Thread 7596: 2024-03-27T20:45:19.0183245Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0184285Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0185193Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0185941Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0186894Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0187754Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0188476Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0188739Z 2024-03-27T20:45:19.0189128Z SIGSEGV(11), PID: 7571, Thread 7597: 2024-03-27T20:45:19.0189988Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0190837Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0191589Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0192498Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0193855Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0194832Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0195548Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0195811Z 2024-03-27T20:45:19.0196185Z SIGSEGV(11), PID: 7571, Thread 7598: 2024-03-27T20:45:19.0197046Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0198175Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0199070Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0199879Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0201012Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0201996Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0203028Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0203365Z 2024-03-27T20:45:19.0203791Z SIGSEGV(11), PID: 7571, Thread 7599: 2024-03-27T20:45:19.0204688Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0205545Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0206300Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0207207Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0208374Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0209253Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0209967Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0210230Z 2024-03-27T20:45:19.0210575Z SIGSEGV(11), PID: 7571, Thread 7600: 2024-03-27T20:45:19.0211433Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0212368Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0213757Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0214550Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0215506Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0216368Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0217081Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0217348Z 2024-03-27T20:45:19.0217753Z SIGSEGV(11), PID: 7571, Thread 7601: 2024-03-27T20:45:19.0218610Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0219717Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0220685Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0221864Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0223065Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0223889Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0224402Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0224589Z 2024-03-27T20:45:19.0224824Z SIGSEGV(11), PID: 7571, Thread 7602: 2024-03-27T20:45:19.0225409Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0226112Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0226660Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0227194Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0227849Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0228425Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0228913Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0229093Z 2024-03-27T20:45:19.0229374Z SIGSEGV(11), PID: 7571, Thread 7603: 2024-03-27T20:45:19.0229957Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0230585Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0231249Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0232125Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0232853Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0233480Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0233960Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0234141Z 2024-03-27T20:45:19.0234378Z SIGSEGV(11), PID: 7571, Thread 7604: 2024-03-27T20:45:19.0234957Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0235523Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0236032Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0236619Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0237344Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0237922Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0238387Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0238589Z 2024-03-27T20:45:19.0238870Z SIGSEGV(11), PID: 7571, Thread 7605: 2024-03-27T20:45:19.0239440Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0240194Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0240770Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0241319Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0241991Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0242565Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0243030Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0243227Z 2024-03-27T20:45:19.0243465Z SIGSEGV(11), PID: 7571, Thread 7606: 2024-03-27T20:45:19.0244157Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0244771Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0245295Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0245808Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0246448Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0247021Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0247484Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0247680Z 2024-03-27T20:45:19.0247943Z SIGSEGV(11), PID: 7571, Thread 7607: 2024-03-27T20:45:19.0248575Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0249217Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0250005Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0250525Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0251169Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0251741Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0252205Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0252403Z 2024-03-27T20:45:19.0252668Z SIGSEGV(11), PID: 7571, Thread 7608: 2024-03-27T20:45:19.0253238Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0253820Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0254366Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0254983Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0255628Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0256186Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0256666Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0257046Z 2024-03-27T20:45:19.0257300Z SIGSEGV(11), PID: 7571, Thread 7609: 2024-03-27T20:45:19.0257887Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0258537Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0259413Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0260029Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0260688Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0261249Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0261730Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0261932Z 2024-03-27T20:45:19.0262184Z SIGSEGV(11), PID: 7571, Thread 7610: 2024-03-27T20:45:19.0262765Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0263345Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0263855Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0264360Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0265004Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0265563Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0266044Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0266241Z 2024-03-27T20:45:19.0268129Z SIGSEGV(11), PID: 7571, Thread 7611: 2024-03-27T20:45:19.0268831Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0269418Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0269928Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0270432Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0271059Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0271632Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0272200Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0272444Z 2024-03-27T20:45:19.0272736Z SIGSEGV(11), PID: 7571, Thread 7612: 2024-03-27T20:45:19.0273330Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0273895Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0274411Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0274902Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0275514Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0276077Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0276837Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0277112Z 2024-03-27T20:45:19.0278311Z SIGSEGV(11), PID: 7571, Thread 7613: 2024-03-27T20:45:19.0279014Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0279594Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0280110Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0280616Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0281245Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0281825Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0282316Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0282519Z 2024-03-27T20:45:19.0284190Z SIGSEGV(11), PID: 7571, Thread 7614: 2024-03-27T20:45:19.0284901Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0285930Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0286466Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0286981Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0287611Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0288193Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0288680Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0288876Z 2024-03-27T20:45:19.0290938Z SIGSEGV(11), PID: 7571, Thread 7615: 2024-03-27T20:45:19.0291585Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0292188Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0292701Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0293211Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0293839Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0294425Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0294917Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0295111Z 2024-03-27T20:45:19.0296791Z SIGSEGV(11), PID: 7571, Thread 7616: 2024-03-27T20:45:19.0297522Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0298143Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0298821Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0299365Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0300189Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0300848Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0301346Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0301539Z 2024-03-27T20:45:19.0304175Z SIGSEGV(11), PID: 7571, Thread 7617: 2024-03-27T20:45:19.0304827Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0305432Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0305945Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0306453Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0307106Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0307674Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0308158Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0308354Z 2024-03-27T20:45:19.0312529Z SIGSEGV(11), PID: 7571, Thread 7618: 2024-03-27T20:45:19.0313329Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0314055Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0314657Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0315249Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0315993Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0316689Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0317254Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0317492Z 2024-03-27T20:45:19.0318036Z SIGSEGV(11), PID: 7571, Thread 7619: 2024-03-27T20:45:19.0318768Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0319456Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0320084Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0320692Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0321444Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0322106Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0322670Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0322913Z 2024-03-27T20:45:19.0326952Z SIGSEGV(11), PID: 7571, Thread 7620: 2024-03-27T20:45:19.0327733Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0328463Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0329065Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0329935Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0330775Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0331442Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0332000Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0332233Z 2024-03-27T20:45:19.0333943Z SIGSEGV(11), PID: 7571, Thread 7621: 2024-03-27T20:45:19.0334676Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0335394Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0336000Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0336590Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0337341Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0337993Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0338560Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0338948Z 2024-03-27T20:45:19.0340314Z SIGSEGV(11), PID: 7571, Thread 7622: 2024-03-27T20:45:19.0341094Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0341860Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0342469Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0343063Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0343817Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0344468Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0345028Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0345262Z 2024-03-27T20:45:19.0348695Z SIGSEGV(11), PID: 7571, Thread 7623: 2024-03-27T20:45:19.0349436Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0350174Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0350766Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0351361Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0352103Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0352750Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0353311Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0353544Z 2024-03-27T20:45:19.0355494Z SIGSEGV(11), PID: 7571, Thread 7624: 2024-03-27T20:45:19.0356240Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0357194Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0358087Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0358890Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0359619Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0360304Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0360860Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0361098Z 2024-03-27T20:45:19.0363219Z SIGSEGV(11), PID: 7571, Thread 7625: 2024-03-27T20:45:19.0363995Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0364734Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0365341Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0365927Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0366655Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0367333Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0367900Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0368139Z 2024-03-27T20:45:19.0373131Z SIGSEGV(11), PID: 7571, Thread 7626: 2024-03-27T20:45:19.0373917Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0374623Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0375215Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0375816Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0376832Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0377494Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0378063Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0378297Z 2024-03-27T20:45:19.0378828Z SIGSEGV(11), PID: 7571, Thread 7627: 2024-03-27T20:45:19.0379578Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0380283Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0380871Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0381455Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0382203Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0382887Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0383428Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0383657Z 2024-03-27T20:45:19.0384203Z SIGSEGV(11), PID: 7571, Thread 7628: 2024-03-27T20:45:19.0384894Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0385693Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0386285Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0386881Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0387599Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0388280Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0388847Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0389089Z 2024-03-27T20:45:19.0401016Z SIGSEGV(11), PID: 7571, Thread 7629: 2024-03-27T20:45:19.0402524Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0403212Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0403700Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0404187Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0404958Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0405645Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0406105Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0406307Z 2024-03-27T20:45:19.0406522Z SIGSEGV(11), PID: 7571, Thread 7630: 2024-03-27T20:45:19.0407060Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0407597Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0408074Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0408530Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0409208Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0409857Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0410349Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0410523Z 2024-03-27T20:45:19.0410750Z SIGSEGV(11), PID: 7571, Thread 7631: 2024-03-27T20:45:19.0411475Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0412122Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0412600Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0413057Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0413649Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0414175Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0426502Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0426900Z 2024-03-27T20:45:19.0427203Z SIGSEGV(11), PID: 7571, Thread 7632: 2024-03-27T20:45:19.0427771Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0428309Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0428786Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0429322Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0430128Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0430686Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0431135Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0431313Z 2024-03-27T20:45:19.0431541Z SIGSEGV(11), PID: 7571, Thread 7633: 2024-03-27T20:45:19.0432085Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0432620Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0433088Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0433548Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0434145Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0434693Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0435144Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0435313Z 2024-03-27T20:45:19.0435526Z SIGSEGV(11), PID: 7571, Thread 7634: 2024-03-27T20:45:19.0436063Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 
(0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0436598Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0437057Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0437514Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0438110Z frame #4: + 0x3506fb (0x7f607a0036fb in /usr/local/lib/python3.10/dist-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so) 2024-03-27T20:45:19.0438651Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0439119Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0439288Z 2024-03-27T20:45:19.0439513Z SIGSEGV(11), PID: 7571, Thread 7698: 2024-03-27T20:45:19.0440053Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0440577Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0441025Z frame #2: __poll + 0x4f (0x7f612b6cbbcf in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0441495Z frame #3: + 0x2b9cbf (0x7f608004ecbf in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0442805Z frame #4: + 0x37ee2f (0x7f6080113e2f in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0443502Z frame #5: + 0x2b4bef (0x7f6080049bef in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0444046Z frame #6: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0444482Z frame #7: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0444663Z 2024-03-27T20:45:19.0444875Z SIGSEGV(11), PID: 7571, Thread 7700: 2024-03-27T20:45:19.0445415Z frame #0: 
c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0445954Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0446399Z frame #2: __poll + 0x4f (0x7f612b6cbbcf in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0446867Z frame #3: + 0x2b9cbf (0x7f608004ecbf in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0447346Z frame #4: + 0x37ee2f (0x7f6080113e2f in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0447838Z frame #5: + 0x2b4bef (0x7f6080049bef in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0448314Z frame #6: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0448760Z frame #7: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0448932Z 2024-03-27T20:45:19.0449179Z SIGSEGV(11), PID: 7571, Thread 7702: 2024-03-27T20:45:19.0449727Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0450264Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0450737Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0451209Z frame #3: pthread_cond_timedwait + 0x23b (0x7f612b646e9b in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0451703Z frame #4: + 0x218d9a (0x7f607ffadd9a in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0452183Z frame #5: + 0x2b4bef (0x7f6080049bef in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0452657Z frame #6: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0453149Z frame #7: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0453320Z 2024-03-27T20:45:19.0454427Z 
SIGSEGV(11), PID: 7571, Thread 7706: 2024-03-27T20:45:19.0456002Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0457251Z frame #1: c10::FatalSignalHandler::fatalSignalHandler(int) + 0x152 (0x7f612a9879c2 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0457926Z frame #2: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0458899Z frame #3: c10::cuda::CUDAKernelLaunchRegistry::has_failed() const + 0x19 (0x7f612ad6deb9 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so) 2024-03-27T20:45:19.0460010Z frame #4: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3d (0x7f612ad6e7cd in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so) 2024-03-27T20:45:19.0460845Z frame #5: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x6c (0x7f60e03d23ac in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so) 2024-03-27T20:45:19.0461631Z frame #6: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f60e03d64c8 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so) 2024-03-27T20:45:19.0462912Z frame #7: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x15a (0x7f60e03d9bfa in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so) 2024-03-27T20:45:19.0463634Z frame #8: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f60e03da839 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so) 2024-03-27T20:45:19.0464449Z frame #9: + 0xdc253 (0x7f612a0b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) 2024-03-27T20:45:19.0465000Z frame #10: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0465528Z frame #11: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0465740Z 
2024-03-27T20:45:19.0466072Z SIGSEGV(11), PID: 7571, Thread 7708: 2024-03-27T20:45:19.0466713Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0467349Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0467881Z frame #2: __poll + 0x4f (0x7f612b6cbbcf in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0468446Z frame #3: + 0x2b9cbf (0x7f608004ecbf in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0469301Z frame #4: + 0x37ee2f (0x7f6080113e2f in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0469889Z frame #5: + 0x2b4bef (0x7f6080049bef in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0470489Z frame #6: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0471121Z frame #7: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0471422Z 2024-03-27T20:45:19.0471696Z SIGSEGV(11), PID: 7571, Thread 7711: 2024-03-27T20:45:19.0472369Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0473044Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0473575Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0474068Z frame #3: pthread_cond_timedwait + 0x23b (0x7f612b646e9b in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0474592Z frame #4: + 0x218d9a (0x7f607ffadd9a in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0475113Z frame #5: + 0x2b4bef (0x7f6080049bef in /usr/lib/x86_64-linux-gnu/libcuda.so.1) 2024-03-27T20:45:19.0475736Z frame #6: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0476286Z frame #7: clone + 0x44 
(0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0476478Z 2024-03-27T20:45:19.0477040Z SIGSEGV(11), PID: 7571, Thread 7714: 2024-03-27T20:45:19.0477683Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0478319Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0478884Z frame #2: __poll + 0x4f (0x7f612b6cbbcf in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0479515Z frame #3: + 0x69afc (0x7f6084c69afc in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2) 2024-03-27T20:45:19.0480104Z frame #4: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0480668Z frame #5: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0480860Z 2024-03-27T20:45:19.0481151Z SIGSEGV(11), PID: 7571, Thread 7715: 2024-03-27T20:45:19.0481771Z frame #0: c10::FatalSignalHandler::stacktraceSignalHandler(bool) + 0x72 (0x7f612a987522 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so) 2024-03-27T20:45:19.0482602Z frame #1: + 0x42520 (0x7f612b5f5520 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0483245Z frame #2: + 0x91117 (0x7f612b644117 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0483805Z frame #3: pthread_cond_wait + 0x211 (0x7f612b646a41 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0484464Z frame #4: + 0x6791f (0x7f6084c6791f in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2) 2024-03-27T20:45:19.0485057Z frame #5: + 0x94ac3 (0x7f612b647ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0485574Z frame #6: clone + 0x44 (0x7f612b6d8a04 in /usr/lib/x86_64-linux-gnu/libc.so.6) 2024-03-27T20:45:19.0485763Z 2024-03-27T20:45:24.2408948Z 
thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero2 status >>> 1 2024-03-27T20:45:24.2417063Z ============================= test session starts ============================== 2024-03-27T20:45:24.2417732Z platform linux -- Python 3.10.12, pytest-8.0.2, pluggy-1.4.0 -- /usr/bin/python 2024-03-27T20:45:24.2418315Z cachedir: .pytest_cache 2024-03-27T20:45:24.2419126Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/__w/5/s/.hypothesis/examples')) 2024-03-27T20:45:24.2419851Z Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket= 2024-03-27T20:45:24.2420206Z rootdir: /__w/5/s 2024-03-27T20:45:24.2420507Z configfile: pyproject.toml 2024-03-27T20:45:24.2421092Z plugins: hypothesis-6.99.10, timeout-2.2.0, cov-4.1.0, timestamper-0.0.9, random-order-1.1.1, xdist-3.5.0 2024-03-27T20:45:24.2421642Z timeout: 900.0s 2024-03-27T20:45:24.2421907Z timeout method: signal 2024-03-27T20:45:24.2422215Z timeout func_only: False 2024-03-27T20:45:24.2422617Z collecting ... collected 1 item 2024-03-27T20:45:24.2422783Z 2024-03-27T20:45:24.2423664Z [2024-03-27 20:45:13] thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero2 Process 0 terminated with exit code 10, terminating remaining processes. 
2024-03-27T20:45:24.2424229Z FAILED 2024-03-27T20:45:24.2424384Z 2024-03-27T20:45:24.2424635Z =================================== FAILURES =================================== 2024-03-27T20:45:24.2425110Z _ CompileDDPTest.test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero2 _ 2024-03-27T20:45:24.2425621Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:533: in wrapper 2024-03-27T20:45:24.2425916Z self._join_processes(fn) 2024-03-27T20:45:24.2426361Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:752: in _join_processes 2024-03-27T20:45:24.2426694Z self._check_return_codes(elapsed_time) 2024-03-27T20:45:24.2426962Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2024-03-27T20:45:24.2427098Z 2024-03-27T20:45:24.2427420Z self = 2024-03-27T20:45:24.2427765Z elapsed_time = 5.710801839828491 2024-03-27T20:45:24.2427885Z 2024-03-27T20:45:24.2428203Z def _check_return_codes(self, elapsed_time) -> None: 2024-03-27T20:45:24.2428438Z """ 2024-03-27T20:45:24.2428673Z Checks that the return codes of all spawned processes match, and skips 2024-03-27T20:45:24.2428971Z tests if they returned a return code indicating a skipping condition. 2024-03-27T20:45:24.2429211Z """ 2024-03-27T20:45:24.2429442Z # If no processes are spawned, there is nothing to check. 2024-03-27T20:45:24.2429681Z if not self.processes: 2024-03-27T20:45:24.2429962Z logger.warning("Note: no subprocesses were spawned, test was likely skipped.") 2024-03-27T20:45:24.2430734Z return 2024-03-27T20:45:24.2430922Z 2024-03-27T20:45:24.2431289Z first_process = self.processes[0] 2024-03-27T20:45:24.2431550Z # first, we check if there are errors in actual processes 2024-03-27T20:45:24.2431830Z # (via TEST_ERROR_EXIT CODE), and raise an exception for those. 
2024-03-27T20:45:24.2432116Z # the reason we do this is to attempt to raise a more helpful error 2024-03-27T20:45:24.2432387Z # message than "Process x terminated/timed out" 2024-03-27T20:45:24.2432675Z # TODO: we should pipe the exception of the failed subprocess here. 2024-03-27T20:45:24.2432994Z # Currently, the actual exception is displayed as a logging output. 2024-03-27T20:45:24.2433242Z errored_processes = [ 2024-03-27T20:45:24.2433449Z (i, p) 2024-03-27T20:45:24.2433660Z for i, p in enumerate(self.processes) 2024-03-27T20:45:24.2433939Z if p.exitcode == MultiProcessTestCase.TEST_ERROR_EXIT_CODE 2024-03-27T20:45:24.2434173Z ] 2024-03-27T20:45:24.2434372Z if errored_processes: 2024-03-27T20:45:24.2434573Z error = "" 2024-03-27T20:45:24.2434799Z for i, process in errored_processes: 2024-03-27T20:45:24.2435030Z # Get error from pipe. 2024-03-27T20:45:24.2435278Z error_message = self.pid_to_pipe[process.pid].recv() 2024-03-27T20:45:24.2435519Z error += ( 2024-03-27T20:45:24.2435753Z "Process {} exited with error code {} and exception:\n{}\n".format( 2024-03-27T20:45:24.2436047Z i, MultiProcessTestCase.TEST_ERROR_EXIT_CODE, error_message 2024-03-27T20:45:24.2436292Z ) 2024-03-27T20:45:24.2436480Z ) 2024-03-27T20:45:24.2436654Z 2024-03-27T20:45:24.2436855Z > raise RuntimeError(error) 2024-03-27T20:45:24.2437320Z E RuntimeError: Process 0 exited with error code 10 and exception: 2024-03-27T20:45:24.2437698Z E Traceback (most recent call last): 2024-03-27T20:45:24.2438185Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:45:24.2438583Z E getattr(self, test_name)() 2024-03-27T20:45:24.2439057Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:45:24.2439418Z E fn() 2024-03-27T20:45:24.2439863Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 
2024-03-27T20:45:24.2440252Z E method(*args, **kwargs) 2024-03-27T20:45:24.2440715Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:45:24.2441121Z E test(self, **param_kwargs) 2024-03-27T20:45:24.2441590Z E File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:45:24.2442101Z E self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:45:24.2442549Z E File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:45:24.2442918Z E raise self.failureException(msg) 2024-03-27T20:45:24.2443416Z E AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net2_bias], [t_net1_weight], [t_net2_weight]] 2024-03-27T20:45:24.2443774Z E  2024-03-27T20:45:24.2444130Z E To execute this test, run the following from the base repo dir: 2024-03-27T20:45:24.2444714Z E python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero2 2024-03-27T20:45:24.2445123Z E  2024-03-27T20:45:24.2445491Z E This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:45:24.2445676Z 2024-03-27T20:45:24.2446086Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:802: RuntimeError 2024-03-27T20:45:24.2446702Z - generated xml file: /__w/5/s/thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero2-results.xml - 2024-03-27T20:45:24.2447232Z =========================== short test summary info ============================ 2024-03-27T20:45:24.2447901Z FAILED thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero2 - RuntimeError: Process 0 exited with error code 10 and exception: 
2024-03-27T20:45:24.2448321Z Traceback (most recent call last): 2024-03-27T20:45:24.2448753Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:45:24.2449058Z getattr(self, test_name)() 2024-03-27T20:45:24.2449470Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:45:24.2449750Z fn() 2024-03-27T20:45:24.2450148Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:45:24.2450438Z method(*args, **kwargs) 2024-03-27T20:45:24.2450854Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:45:24.2451158Z test(self, **param_kwargs) 2024-03-27T20:45:24.2451457Z File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:45:24.2451810Z self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:45:24.2452127Z File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:45:24.2452380Z raise self.failureException(msg) 2024-03-27T20:45:24.2452731Z AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net2_bias], [t_net1_weight], [t_net2_weight]] 2024-03-27T20:45:24.2452964Z 2024-03-27T20:45:24.2453202Z To execute this test, run the following from the base repo dir: 2024-03-27T20:45:24.2453646Z python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero2 2024-03-27T20:45:24.2453847Z 2024-03-27T20:45:24.2454076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:45:24.2454544Z ============================== 1 failed in 8.24s =============================== 2024-03-27T20:45:31.6270452Z [rank1]:[2024-03-27 20:45:31,626] 
torch.testing._internal.common_distributed: [ERROR] Caught exception: 2024-03-27T20:45:31.6272416Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last): 2024-03-27T20:45:31.6273676Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:45:31.6274665Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] getattr(self, test_name)() 2024-03-27T20:45:31.6275750Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:45:31.6276653Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] fn() 2024-03-27T20:45:31.6278306Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:45:31.6279450Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] method(*args, **kwargs) 2024-03-27T20:45:31.6280558Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:45:31.6281532Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] test(self, **param_kwargs) 2024-03-27T20:45:31.6282608Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:45:31.6283829Z [rank1]:[2024-03-27 20:45:31,626] 
torch.testing._internal.common_distributed: [ERROR] self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:45:31.6284867Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:45:31.6285781Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] raise self.failureException(msg) 2024-03-27T20:45:31.6286909Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net1_weight], [t_net2_bias], [t_net2_weight]] 2024-03-27T20:45:31.6287819Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:45:31.6288711Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir: 2024-03-27T20:45:31.6289801Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero3 2024-03-27T20:45:31.6290677Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:45:31.6291597Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:45:31.6292511Z [rank1]:[2024-03-27 20:45:31,626] torch.testing._internal.common_distributed: [ERROR] exiting process 1 with exit code: 10 2024-03-27T20:45:31.6357948Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] Caught exception: 2024-03-27T20:45:31.6359606Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last): 
2024-03-27T20:45:31.6361000Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:45:31.6361998Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] getattr(self, test_name)() 2024-03-27T20:45:31.6363074Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:45:31.6363966Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] fn() 2024-03-27T20:45:31.6364984Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:45:31.6365913Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] method(*args, **kwargs) 2024-03-27T20:45:31.6367218Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:45:31.6368296Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] test(self, **param_kwargs) 2024-03-27T20:45:31.6369338Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:45:31.6370453Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:45:31.6371474Z [rank0]:[2024-03-27 20:45:31,634] 
torch.testing._internal.common_distributed: [ERROR] File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:45:31.6372376Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] raise self.failureException(msg) 2024-03-27T20:45:31.6373487Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net1_weight], [t_net2_bias], [t_net2_weight]] 2024-03-27T20:45:31.6374377Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:45:31.6375237Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir: 2024-03-27T20:45:31.6376306Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero3 2024-03-27T20:45:31.6377176Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:45:31.6378079Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:45:31.6379390Z [rank0]:[2024-03-27 20:45:31,634] torch.testing._internal.common_distributed: [ERROR] exiting process 0 with exit code: 10 2024-03-27T20:45:33.2480680Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero3 status >>> 1 2024-03-27T20:45:33.2496316Z ============================= test session starts ============================== 2024-03-27T20:45:33.2496909Z platform linux -- Python 3.10.12, pytest-8.0.2, pluggy-1.4.0 -- /usr/bin/python 2024-03-27T20:45:33.2497357Z cachedir: .pytest_cache 2024-03-27T20:45:33.2497867Z hypothesis profile 
'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/__w/5/s/.hypothesis/examples')) 2024-03-27T20:45:33.2498476Z Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket= 2024-03-27T20:45:33.2498988Z rootdir: /__w/5/s 2024-03-27T20:45:33.2499240Z configfile: pyproject.toml 2024-03-27T20:45:33.2499759Z plugins: hypothesis-6.99.10, timeout-2.2.0, cov-4.1.0, timestamper-0.0.9, random-order-1.1.1, xdist-3.5.0 2024-03-27T20:45:33.2500304Z timeout: 900.0s 2024-03-27T20:45:33.2500557Z timeout method: signal 2024-03-27T20:45:33.2500814Z timeout func_only: False 2024-03-27T20:45:33.2501169Z collecting ... collected 1 item 2024-03-27T20:45:33.2501314Z 2024-03-27T20:45:33.2502065Z [2024-03-27 20:45:26] thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero3 Process 1 terminated with exit code 10, terminating remaining processes. 2024-03-27T20:45:33.2502603Z FAILED 2024-03-27T20:45:33.2502720Z 2024-03-27T20:45:33.2503000Z =================================== FAILURES =================================== 2024-03-27T20:45:33.2504200Z _ CompileDDPTest.test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero3 _ 2024-03-27T20:45:33.2504950Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:533: in wrapper 2024-03-27T20:45:33.2505295Z self._join_processes(fn) 2024-03-27T20:45:33.2505818Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:752: in _join_processes 2024-03-27T20:45:33.2506199Z self._check_return_codes(elapsed_time) 2024-03-27T20:45:33.2506494Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2024-03-27T20:45:33.2506648Z 2024-03-27T20:45:33.2507016Z self = 2024-03-27T20:45:33.2507395Z elapsed_time = 5.412540435791016 2024-03-27T20:45:33.2507534Z 2024-03-27T20:45:33.2507885Z def _check_return_codes(self, elapsed_time) -> None: 
2024-03-27T20:45:33.2508146Z """ 2024-03-27T20:45:33.2508409Z Checks that the return codes of all spawned processes match, and skips 2024-03-27T20:45:33.2508938Z tests if they returned a return code indicating a skipping condition. 2024-03-27T20:45:33.2509200Z """ 2024-03-27T20:45:33.2509459Z # If no processes are spawned, there is nothing to check. 2024-03-27T20:45:33.2509743Z if not self.processes: 2024-03-27T20:45:33.2510060Z logger.warning("Note: no subprocesses were spawned, test was likely skipped.") 2024-03-27T20:45:33.2510331Z return 2024-03-27T20:45:33.2510533Z 2024-03-27T20:45:33.2510763Z first_process = self.processes[0] 2024-03-27T20:45:33.2511051Z # first, we check if there are errors in actual processes 2024-03-27T20:45:33.2511350Z # (via TEST_ERROR_EXIT CODE), and raise an exception for those. 2024-03-27T20:45:33.2511671Z # the reason we do this is to attempt to raise a more helpful error 2024-03-27T20:45:33.2511973Z # message than "Process x terminated/timed out" 2024-03-27T20:45:33.2512281Z # TODO: we should pipe the exception of the failed subprocess here. 2024-03-27T20:45:33.2512604Z # Currently, the actual exception is displayed as a logging output. 2024-03-27T20:45:33.2512860Z errored_processes = [ 2024-03-27T20:45:33.2513081Z (i, p) 2024-03-27T20:45:33.2513330Z for i, p in enumerate(self.processes) 2024-03-27T20:45:33.2513629Z if p.exitcode == MultiProcessTestCase.TEST_ERROR_EXIT_CODE 2024-03-27T20:45:33.2513878Z ] 2024-03-27T20:45:33.2514100Z if errored_processes: 2024-03-27T20:45:33.2514339Z error = "" 2024-03-27T20:45:33.2514589Z for i, process in errored_processes: 2024-03-27T20:45:33.2514834Z # Get error from pipe. 
2024-03-27T20:45:33.2515123Z error_message = self.pid_to_pipe[process.pid].recv() 2024-03-27T20:45:33.2515398Z error += ( 2024-03-27T20:45:33.2515664Z "Process {} exited with error code {} and exception:\n{}\n".format( 2024-03-27T20:45:33.2515972Z i, MultiProcessTestCase.TEST_ERROR_EXIT_CODE, error_message 2024-03-27T20:45:33.2516239Z ) 2024-03-27T20:45:33.2516457Z ) 2024-03-27T20:45:33.2516664Z 2024-03-27T20:45:33.2516870Z > raise RuntimeError(error) 2024-03-27T20:45:33.2517316Z E RuntimeError: Process 1 exited with error code 10 and exception: 2024-03-27T20:45:33.2517720Z E Traceback (most recent call last): 2024-03-27T20:45:33.2518252Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:45:33.2518724Z E getattr(self, test_name)() 2024-03-27T20:45:33.2519380Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:45:33.2519883Z E fn() 2024-03-27T20:45:33.2520383Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:45:33.2520830Z E method(*args, **kwargs) 2024-03-27T20:45:33.2521363Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:45:33.2521796Z E test(self, **param_kwargs) 2024-03-27T20:45:33.2522339Z E File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:45:33.2522932Z E self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:45:33.2523446Z E File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:45:33.2523871Z E raise self.failureException(msg) 2024-03-27T20:45:33.2524447Z E AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net1_weight], [t_net2_bias], 
[t_net2_weight]] 2024-03-27T20:45:33.2524843Z E  2024-03-27T20:45:33.2525257Z E To execute this test, run the following from the base repo dir: 2024-03-27T20:45:33.2525835Z E python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero3 2024-03-27T20:45:33.2528069Z E  2024-03-27T20:45:33.2528490Z E This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:45:33.2528675Z 2024-03-27T20:45:33.2529130Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:802: RuntimeError 2024-03-27T20:45:33.2529880Z - generated xml file: /__w/5/s/thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero3-results.xml - 2024-03-27T20:45:33.2530507Z =========================== short test summary info ============================ 2024-03-27T20:45:33.2531288Z FAILED thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero3 - RuntimeError: Process 1 exited with error code 10 and exception: 2024-03-27T20:45:33.2531754Z Traceback (most recent call last): 2024-03-27T20:45:33.2532242Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:45:33.2532594Z getattr(self, test_name)() 2024-03-27T20:45:33.2533107Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:45:33.2533427Z fn() 2024-03-27T20:45:33.2533903Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:45:33.2534231Z method(*args, **kwargs) 2024-03-27T20:45:33.2534725Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 440, in instantiated_test 2024-03-27T20:45:33.2535067Z test(self, **param_kwargs) 
2024-03-27T20:45:33.2535383Z File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 712, in test_fsdp_grad_parity_with_without_bucketing 2024-03-27T20:45:33.2535774Z self.assertTrue(has_pack_multiple_tensors, msg=f"{[bsym.args[0] for bsym in pack_bsyms]=}") 2024-03-27T20:45:33.2536138Z File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue 2024-03-27T20:45:33.2536443Z raise self.failureException(msg) 2024-03-27T20:45:33.2536794Z AssertionError: False is not true : [bsym.args[0] for bsym in pack_bsyms]=[[t_net1_bias], [t_net1_weight], [t_net2_bias], [t_net2_weight]] 2024-03-27T20:45:33.2537155Z 2024-03-27T20:45:33.2537418Z To execute this test, run the following from the base repo dir: 2024-03-27T20:45:33.2538030Z python test_ddp.py -k test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_block_zero3 2024-03-27T20:45:33.2538255Z 2024-03-27T20:45:33.2538517Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:45:33.2539152Z ============================== 1 failed in 7.99s =============================== 2024-03-27T20:45:43.1263884Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_layer_zero2 status >>> 0 2024-03-27T20:45:52.6528012Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_grad_parity_with_without_bucketing_executor_torch_bucketing_layer_zero3 status >>> 0 2024-03-27T20:46:01.1177501Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_fsdp_shard_unshard status >>> 0 2024-03-27T20:46:20.0064654Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_nvfuser_bucketing_block_zero3 status >>> 0 2024-03-27T20:46:39.4915608Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_nvfuser_bucketing_layer_zero3 status >>> 0 2024-03-27T20:46:58.4941762Z 
thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_nvfuser_bucketing_none_zero3 status >>> 0 2024-03-27T20:47:09.3793691Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_torch_bucketing_block_zero3 status >>> 0 2024-03-27T20:47:20.5165259Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_torch_bucketing_layer_zero3 status >>> 0 2024-03-27T20:47:31.3751571Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_limit_in_flight_allgathers_executor_torch_bucketing_none_zero3 status >>> 0 2024-03-27T20:47:38.9300831Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_materialize_meta_tensors status >>> 0 2024-03-27T20:47:47.6481555Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_reduce_scatter_executor_nvfuser status >>> 0 2024-03-27T20:47:55.7963326Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_reduce_scatter_executor_torch status >>> 0 2024-03-27T20:48:03.3413059Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] Caught exception: 2024-03-27T20:48:03.3414243Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last): 2024-03-27T20:48:03.3419756Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:48:03.3421784Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] getattr(self, test_name)() 2024-03-27T20:48:03.3423517Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:48:03.3424752Z [rank0]:[2024-03-27 20:48:03,340] 
torch.testing._internal.common_distributed: [ERROR] fn() 2024-03-27T20:48:03.3425883Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:48:03.3427255Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] method(*args, **kwargs) 2024-03-27T20:48:03.3428569Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 483, in test_rematerialize_all_gather 2024-03-27T20:48:03.3430668Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] (fwd_trc,) = ( 2024-03-27T20:48:03.3431866Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] ValueError: too many values to unpack (expected 1) 2024-03-27T20:48:03.3432686Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:48:03.3433600Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir: 2024-03-27T20:48:03.3434564Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] python test_ddp.py -k test_rematerialize_all_gather 2024-03-27T20:48:03.3435377Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:48:03.3436311Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:48:03.3437256Z [rank0]:[2024-03-27 20:48:03,340] torch.testing._internal.common_distributed: [ERROR] exiting process 0 with exit code: 10 2024-03-27T20:48:03.3514752Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] Caught exception: 
2024-03-27T20:48:03.3515668Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last): 2024-03-27T20:48:03.3516798Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:48:03.3517770Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] getattr(self, test_name)() 2024-03-27T20:48:03.3518837Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:48:03.3519754Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] fn() 2024-03-27T20:48:03.3521128Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:48:03.3522337Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] method(*args, **kwargs) 2024-03-27T20:48:03.3523350Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 483, in test_rematerialize_all_gather 2024-03-27T20:48:03.3524252Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] (fwd_trc,) = ( 2024-03-27T20:48:03.3525134Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] ValueError: too many values to unpack (expected 1) 2024-03-27T20:48:03.3525976Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:48:03.3526881Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] To execute this test, 
run the following from the base repo dir: 2024-03-27T20:48:03.3527835Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] python test_ddp.py -k test_rematerialize_all_gather 2024-03-27T20:48:03.3528637Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] 2024-03-27T20:48:03.3529544Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:48:03.3530472Z [rank1]:[2024-03-27 20:48:03,350] torch.testing._internal.common_distributed: [ERROR] exiting process 1 with exit code: 10 2024-03-27T20:48:04.7013182Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_rematerialize_all_gather status >>> 1 2024-03-27T20:48:04.7025969Z ============================= test session starts ============================== 2024-03-27T20:48:04.7027217Z platform linux -- Python 3.10.12, pytest-8.0.2, pluggy-1.4.0 -- /usr/bin/python 2024-03-27T20:48:04.7027591Z cachedir: .pytest_cache 2024-03-27T20:48:04.7028129Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/__w/5/s/.hypothesis/examples')) 2024-03-27T20:48:04.7028671Z Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket= 2024-03-27T20:48:04.7029013Z rootdir: /__w/5/s 2024-03-27T20:48:04.7029282Z configfile: pyproject.toml 2024-03-27T20:48:04.7029767Z plugins: hypothesis-6.99.10, timeout-2.2.0, cov-4.1.0, timestamper-0.0.9, random-order-1.1.1, xdist-3.5.0 2024-03-27T20:48:04.7030170Z timeout: 900.0s 2024-03-27T20:48:04.7030450Z timeout method: signal 2024-03-27T20:48:04.7031057Z timeout func_only: False 2024-03-27T20:48:04.7031471Z collecting ... 
collected 1 item 2024-03-27T20:48:04.7031613Z 2024-03-27T20:48:04.7032428Z [2024-03-27 20:47:58] thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_rematerialize_all_gather Process 1 terminated with exit code 10, terminating remaining processes. 2024-03-27T20:48:04.7033172Z FAILED 2024-03-27T20:48:04.7033366Z 2024-03-27T20:48:04.7037282Z =================================== FAILURES =================================== 2024-03-27T20:48:04.7038564Z _________________ CompileDDPTest.test_rematerialize_all_gather _________________ 2024-03-27T20:48:04.7039208Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:533: in wrapper 2024-03-27T20:48:04.7039670Z self._join_processes(fn) 2024-03-27T20:48:04.7040613Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:752: in _join_processes 2024-03-27T20:48:04.7041120Z self._check_return_codes(elapsed_time) 2024-03-27T20:48:04.7041566Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2024-03-27T20:48:04.7041860Z 2024-03-27T20:48:04.7042206Z self = 2024-03-27T20:48:04.7042501Z elapsed_time = 5.509700298309326 2024-03-27T20:48:04.7042600Z 2024-03-27T20:48:04.7042846Z def _check_return_codes(self, elapsed_time) -> None: 2024-03-27T20:48:04.7043028Z """ 2024-03-27T20:48:04.7043220Z Checks that the return codes of all spawned processes match, and skips 2024-03-27T20:48:04.7043475Z tests if they returned a return code indicating a skipping condition. 2024-03-27T20:48:04.7043659Z """ 2024-03-27T20:48:04.7043863Z # If no processes are spawned, there is nothing to check. 
2024-03-27T20:48:04.7044119Z if not self.processes: 2024-03-27T20:48:04.7044402Z logger.warning("Note: no subprocesses were spawned, test was likely skipped.") 2024-03-27T20:48:04.7044789Z return 2024-03-27T20:48:04.7045102Z 2024-03-27T20:48:04.7045436Z first_process = self.processes[0] 2024-03-27T20:48:04.7045821Z # first, we check if there are errors in actual processes 2024-03-27T20:48:04.7046062Z # (via TEST_ERROR_EXIT CODE), and raise an exception for those. 2024-03-27T20:48:04.7046332Z # the reason we do this is to attempt to raise a more helpful error 2024-03-27T20:48:04.7046734Z # message than "Process x terminated/timed out" 2024-03-27T20:48:04.7047148Z # TODO: we should pipe the exception of the failed subprocess here. 2024-03-27T20:48:04.7047593Z # Currently, the actual exception is displayed as a logging output. 2024-03-27T20:48:04.7047961Z errored_processes = [ 2024-03-27T20:48:04.7048281Z (i, p) 2024-03-27T20:48:04.7048525Z for i, p in enumerate(self.processes) 2024-03-27T20:48:04.7049364Z if p.exitcode == MultiProcessTestCase.TEST_ERROR_EXIT_CODE 2024-03-27T20:48:04.7049590Z ] 2024-03-27T20:48:04.7049795Z if errored_processes: 2024-03-27T20:48:04.7050142Z error = "" 2024-03-27T20:48:04.7050309Z for i, process in errored_processes: 2024-03-27T20:48:04.7050473Z # Get error from pipe. 
2024-03-27T20:48:04.7050660Z error_message = self.pid_to_pipe[process.pid].recv() 2024-03-27T20:48:04.7050844Z error += ( 2024-03-27T20:48:04.7051036Z "Process {} exited with error code {} and exception:\n{}\n".format( 2024-03-27T20:48:04.7051251Z i, MultiProcessTestCase.TEST_ERROR_EXIT_CODE, error_message 2024-03-27T20:48:04.7051419Z ) 2024-03-27T20:48:04.7051562Z ) 2024-03-27T20:48:04.7051696Z 2024-03-27T20:48:04.7051844Z > raise RuntimeError(error) 2024-03-27T20:48:04.7052199Z E RuntimeError: Process 1 exited with error code 10 and exception: 2024-03-27T20:48:04.7052475Z E Traceback (most recent call last): 2024-03-27T20:48:04.7052834Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:48:04.7053133Z E getattr(self, test_name)() 2024-03-27T20:48:04.7053477Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:48:04.7053745Z E fn() 2024-03-27T20:48:04.7054068Z E File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:48:04.7054356Z E method(*args, **kwargs) 2024-03-27T20:48:04.7054678Z E File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 483, in test_rematerialize_all_gather 2024-03-27T20:48:04.7054950Z E (fwd_trc,) = ( 2024-03-27T20:48:04.7055212Z E ValueError: too many values to unpack (expected 1) 2024-03-27T20:48:04.7055427Z E  2024-03-27T20:48:04.7055691Z E To execute this test, run the following from the base repo dir: 2024-03-27T20:48:04.7055983Z E python test_ddp.py -k test_rematerialize_all_gather 2024-03-27T20:48:04.7056203Z E  2024-03-27T20:48:04.7056474Z E This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:48:04.7056605Z 2024-03-27T20:48:04.7056921Z /usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:802: RuntimeError 2024-03-27T20:48:04.7057315Z - 
generated xml file: /__w/5/s/thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_rematerialize_all_gather-results.xml - 2024-03-27T20:48:04.7057667Z =========================== short test summary info ============================ 2024-03-27T20:48:04.7058101Z FAILED thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_rematerialize_all_gather - RuntimeError: Process 1 exited with error code 10 and exception: 2024-03-27T20:48:04.7058375Z Traceback (most recent call last): 2024-03-27T20:48:04.7058850Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 649, in run_test 2024-03-27T20:48:04.7059089Z getattr(self, test_name)() 2024-03-27T20:48:04.7059402Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py", line 535, in wrapper 2024-03-27T20:48:04.7059611Z fn() 2024-03-27T20:48:04.7059906Z File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2652, in wrapper 2024-03-27T20:48:04.7060126Z method(*args, **kwargs) 2024-03-27T20:48:04.7060324Z File "/__w/5/s/thunder/tests/distributed/test_ddp.py", line 483, in test_rematerialize_all_gather 2024-03-27T20:48:04.7060618Z (fwd_trc,) = ( 2024-03-27T20:48:04.7060791Z ValueError: too many values to unpack (expected 1) 2024-03-27T20:48:04.7060942Z 2024-03-27T20:48:04.7061111Z To execute this test, run the following from the base repo dir: 2024-03-27T20:48:04.7061378Z python test_ddp.py -k test_rematerialize_all_gather 2024-03-27T20:48:04.7061482Z 2024-03-27T20:48:04.7061658Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2024-03-27T20:48:04.7062003Z ============================== 1 failed in 8.14s =============================== 2024-03-27T20:48:13.1919276Z thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_sort_waits_executor_nvfuser status >>> 0 2024-03-27T20:48:21.1979608Z 
thunder/tests/distributed/test_ddp.py::CompileDDPTest::test_sort_waits_executor_torch status >>> 0 2024-03-27T20:48:29.9497286Z thunder/tests/distributed/test_ddp.py::test_native_ddp_torch_cuda_float32[0] status >>> 0 2024-03-27T20:48:38.6628078Z thunder/tests/distributed/test_ddp.py::test_native_ddp_torch_cuda_float32[25] status >>> 0 2024-03-27T20:48:48.1541554Z thunder/tests/distributed/test_ddp.py::test_native_ddp_nvfuser_cuda_float32[0] status >>> 0 2024-03-27T20:48:57.5914111Z thunder/tests/distributed/test_ddp.py::test_native_ddp_nvfuser_cuda_float32[25] status >>> 0 2024-03-27T20:49:06.4267837Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_torch_cuda_float32[FSDPBucketingStrategy.NONE] status >>> 0 2024-03-27T20:49:15.9201461Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_torch_cuda_float32[FSDPBucketingStrategy.LAYER] status >>> 0 2024-03-27T20:49:24.5972819Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_torch_cuda_float32[FSDPBucketingStrategy.BLOCK] status >>> 0 2024-03-27T20:49:34.5125258Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_nvfuser_cuda_float32[FSDPBucketingStrategy.NONE] status >>> 0 2024-03-27T20:49:44.5331407Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_nvfuser_cuda_float32[FSDPBucketingStrategy.LAYER] status >>> 0 2024-03-27T20:49:54.2595973Z thunder/tests/distributed/test_ddp.py::test_native_fsdp_nvfuser_cuda_float32[FSDPBucketingStrategy.BLOCK] status >>> 0 2024-03-27T20:49:54.3420984Z No data to report. 2024-03-27T20:49:54.4274698Z No data to report. 
2024-03-27T20:49:54.6296595Z [2024-03-27T20:49:54.628Z] ['info'] 2024-03-27T20:49:54.6297452Z _____ _ 2024-03-27T20:49:54.6297788Z / ____| | | 2024-03-27T20:49:54.6303280Z | | ___ __| | ___ ___ _____ __ 2024-03-27T20:49:54.6303942Z | | / _ \ / _` |/ _ \/ __/ _ \ \ / / 2024-03-27T20:49:54.6304256Z | |___| (_) | (_| | __/ (_| (_) \ V / 2024-03-27T20:49:54.6304575Z \_____\___/ \__,_|\___|\___\___/ \_/ 2024-03-27T20:49:54.6304741Z 2024-03-27T20:49:54.6305042Z Codecov report uploader 0.7.2 2024-03-27T20:49:54.6375444Z [2024-03-27T20:49:54.636Z] ['info'] => Project root located at: /__w/5/s 2024-03-27T20:49:54.6401565Z [2024-03-27T20:49:54.639Z] ['info'] -> Token found by arguments 2024-03-27T20:49:54.7445672Z [2024-03-27T20:49:54.743Z] ['info'] Searching for coverage files... 2024-03-27T20:49:54.7697080Z [2024-03-27T20:49:54.769Z] ['info'] Warning: Some files located via search were excluded from upload. 2024-03-27T20:49:54.7698149Z [2024-03-27T20:49:54.769Z] ['info'] If Codecov did not locate your files, please review https://docs.codecov.com/docs/supported-report-formats 2024-03-27T20:49:54.7703757Z [2024-03-27T20:49:54.769Z] ['error'] There was an error running the uploader: No coverage files located, please try use `-f`, or change the project root with `-R` 2024-03-27T20:49:54.7704911Z [2024-03-27T20:49:54.769Z] ['info'] Codecov will exit with status code 0. If you are expecting a non-zero exit code, please pass in the `-Z` flag 2024-03-27T20:49:54.7791848Z 2024-03-27T20:49:54.8027413Z ##[section]Finishing: Testing: distributed