{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":82935122,"defaultBranch":"master","name":"pytorch","ownerLogin":"bottler","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2017-02-23T14:28:09.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/669761?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1688413327.0","currentOid":""},"activityList":{"items":[{"before":"7e3f1e8bff81107f39163eb12fbc472c13e2d556","after":null,"ref":"refs/heads/export-D46683554","pushedAt":"2023-07-03T19:42:07.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"bottler","name":"Jeremy Reizenstein","path":"/bottler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/669761?s=80&v=4"}},{"before":null,"after":"7e3f1e8bff81107f39163eb12fbc472c13e2d556","ref":"refs/heads/export-D46683554","pushedAt":"2023-06-13T12:21:21.245Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"bottler","name":"Jeremy Reizenstein","path":"/bottler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/669761?s=80&v=4"},"commit":{"message":"NCCL process group: avoid workEnqueue when capturing cuda graph\n\nSummary:\nIn torch.distributed, we make ProcessGroupNCCL not call workEnqueue when the cuda stream is capturing. I.e., when capturing a CUDA graph, we do not enqueue anything for the watchdog thread to consider. This allows capturing NCCL operations in a CUDA Graph.\n\nThis is followup to an internal discussion [1] where the watchdog thread was observed to crash when using cuda graphs containing an all_reduce. 
The watchdog thread wants to query events pertaining to enqueued work items, but this can't be done for \"events\" created during cuda graph capture.\n\n[1] https://fb.workplace.com/groups/1405155842844877/posts/6975201909173548/\n\nThis is another attempt at https://github.com/pytorch/pytorch/pull/102542 / D46274814, fixing the test failures.\n\nTest Plan: The repro mentioned in https://fb.workplace.com/groups/1405155842844877/posts/7003002339726838/ runs successfully after this change.\n\nDifferential Revision: D46683554\n\nfbshipit-source-id: ec1aa293b01ec1efe7da6d01bd7275b248fd5ea9","shortMessageHtmlLink":"NCCL process group: avoid workEnqueue when capturing cuda graph"}},{"before":"e3f3fc2c9ec7ba93c95549bf5e50a9f8d01cb131","after":"1bacd4c58bf06765498cfc497f8b9ab76f57c4ad","ref":"refs/heads/export-D46274814","pushedAt":"2023-06-08T16:02:05.039Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"bottler","name":"Jeremy Reizenstein","path":"/bottler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/669761?s=80&v=4"},"commit":{"message":"nccl process group: no workEnqueue when capturing (#102542)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/pytorch/pull/102542\n\nWhen capturing a cuda graph, do not enqueue anything for the watchdog thread.\n\nSee discussion in https://fb.workplace.com/groups/1405155842844877/posts/6975201909173548/\n\nTest Plan: The repro mentioned in https://fb.workplace.com/groups/1405155842844877/posts/7003002339726838/ runs successfully after this change.\n\nDifferential Revision: D46274814\n\nfbshipit-source-id: 62a9d34343e2536cc566d1c636ba88a6c7c45b1b","shortMessageHtmlLink":"nccl process group: no workEnqueue when capturing (pytorch#102542)"}},{"before":"11838b413373657792d2b5ee04bed72080b7d4fd","after":"e3f3fc2c9ec7ba93c95549bf5e50a9f8d01cb131","ref":"refs/heads/export-D46274814","pushedAt":"2023-06-05T15:45:29.801Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"bottler","name":"Jeremy 
Reizenstein","path":"/bottler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/669761?s=80&v=4"},"commit":{"message":"nccl process group: no workEnqueue when capturing (#102542)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/pytorch/pull/102542\n\nWhen capturing a cuda graph, do not enqueue anything for the watchdog thread.\n\nSee discussion in https://fb.workplace.com/groups/1405155842844877/posts/6975201909173548/\n\nTest Plan: The repro mentioned in https://fb.workplace.com/groups/1405155842844877/posts/7003002339726838/ runs successfully after this change.\n\nDifferential Revision: D46274814\n\nfbshipit-source-id: 61eaa91d261b384cf3835ac1480fc1d28484a97b","shortMessageHtmlLink":"nccl process group: no workEnqueue when capturing (pytorch#102542)"}},{"before":"bca6f2a94e3ca288f46204edfd90ae20177ee1c5","after":"11838b413373657792d2b5ee04bed72080b7d4fd","ref":"refs/heads/export-D46274814","pushedAt":"2023-06-05T15:35:02.722Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"bottler","name":"Jeremy Reizenstein","path":"/bottler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/669761?s=80&v=4"},"commit":{"message":"nccl process group: no workEnqueue when capturing (#102542)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/pytorch/pull/102542\n\nWhen capturing a cuda graph, do not enqueue anything for the watchdog thread.\n\nSee discussion in https://fb.workplace.com/groups/1405155842844877/posts/6975201909173548/\n\nTest Plan: The repro mentioned in https://fb.workplace.com/groups/1405155842844877/posts/7003002339726838/ runs successfully after this change.\n\nDifferential Revision: D46274814\n\nfbshipit-source-id: 90310403b5f473d14d2dd3e454aa81aa7c6bbfd5","shortMessageHtmlLink":"nccl process group: no workEnqueue when capturing 
(pytorch#102542)"}},{"before":"7d6c6144cd3001a2b0d8dbae0de908d1a7c4842d","after":"bca6f2a94e3ca288f46204edfd90ae20177ee1c5","ref":"refs/heads/export-D46274814","pushedAt":"2023-06-05T11:45:23.106Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"bottler","name":"Jeremy Reizenstein","path":"/bottler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/669761?s=80&v=4"},"commit":{"message":"nccl process group: no workEnqueue when capturing (#102542)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/pytorch/pull/102542\n\nWhen capturing a cuda graph, do not enqueue anything for the watchdog thread.\n\nSee discussion in https://fb.workplace.com/groups/1405155842844877/posts/6975201909173548/\n\nTest Plan: The repro mentioned in https://fb.workplace.com/groups/1405155842844877/posts/7003002339726838/ runs successfully after this change.\n\nDifferential Revision: D46274814\n\nfbshipit-source-id: ea58dc6f3078e3bd9916c12a2b2ac80ccebfbaf6","shortMessageHtmlLink":"nccl process group: no workEnqueue when capturing (pytorch#102542)"}},{"before":"f76d8b02130d7aaaefa976f96f0edc99d7de414a","after":"7d6c6144cd3001a2b0d8dbae0de908d1a7c4842d","ref":"refs/heads/export-D46274814","pushedAt":"2023-06-05T11:32:26.583Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"bottler","name":"Jeremy Reizenstein","path":"/bottler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/669761?s=80&v=4"},"commit":{"message":"nccl process group: no workEnqueue when capturing (#102542)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/pytorch/pull/102542\n\nWhen capturing a cuda graph, do not enqueue anything for the watchdog thread.\n\nSee discussion in https://fb.workplace.com/groups/1405155842844877/posts/6975201909173548/\n\nTest Plan: The repro mentioned in https://fb.workplace.com/groups/1405155842844877/posts/7003002339726838/ runs successfully after this change.\n\nDifferential Revision: D46274814\n\nfbshipit-source-id: 
f8cf415c6e7cdf2f1b6cb7d369087ca4164cfd23","shortMessageHtmlLink":"nccl process group: no workEnqueue when capturing (pytorch#102542)"}},{"before":null,"after":"f76d8b02130d7aaaefa976f96f0edc99d7de414a","ref":"refs/heads/export-D46274814","pushedAt":"2023-05-30T16:42:51.482Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"bottler","name":"Jeremy Reizenstein","path":"/bottler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/669761?s=80&v=4"},"commit":{"message":"nccl process group: no workEnqueue when capturing\n\nSummary:\nWhen capturing a cuda graph, do not enqueue anything for the watchdog thread.\n\nSee discussion in https://fb.workplace.com/groups/1405155842844877/posts/6975201909173548/\n\nTest Plan: The repro mentioned in https://fb.workplace.com/groups/1405155842844877/posts/7003002339726838/ runs successfully after this change.\n\nDifferential Revision: D46274814\n\nfbshipit-source-id: 2c6838bd9b00f28212a5f27b8e5654dcf2008175","shortMessageHtmlLink":"nccl process group: no workEnqueue when capturing"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADTfSaVgA","startCursor":null,"endCursor":null}},"title":"Activity · bottler/pytorch"}