# Activity · Xia-Weiwen/pytorch

Fork of `pytorch/pytorch` (default branch: `main`), created 2021-04-08. Recent activity by Xia Weiwen:

- **2024-06-03 14:31 UTC** — push to `fix_qlinear_mutation` (1 commit): "fix format"
- **2024-06-03 06:48 UTC** — push to `fix_qlinear_mutation` (1 commit): "Fix format issue"
- **2024-06-03 06:45 UTC** — push to `fix_qlinear_mutation` (1 commit): "Refine code"
- **2024-06-03 06:14 UTC** — push to `fix_qlinear_mutation` (1 commit): "Refine code"
- **2024-06-03 05:42 UTC** — push to `fix_qlinear_mutation` (103 commits): "Merge remote-tracking branch 'origin/main' into fix_qlinear_mutation"
- **2024-06-03 05:27 UTC** — push to `main` (101 commits), head commit: "[Inductor][Flex-attention] Support different sequence lengths for Query and Key/Value" (pytorch#127678)
- **2024-05-31 02:37 UTC** — deleted branch `fix_mutation_ir`
- **2024-05-31 02:32 UTC** — created branch `fix_qlinear_mutation`, head commit: "[Quant][Inductor] Add get_mutation_names in QLinearPointwiseBinaryPT2E IR"
- **2024-05-31 01:57 UTC** — push to `main` (69 commits), head commit: "[pipelining] handle param aliasing" (pytorch#127471)
- **2024-05-30 08:45 UTC** — created branch `fix_mutation_ir`, head commit: "[Inductor] Fix bug in MutationOutput IR"
- **2024-05-30 01:53 UTC** — push to `main` (277 commits), head commit: "[optim] Move test_grad_scaling_autocast_fused_optimizers to test_cuda.py" (pytorch#126418)
- **2024-05-23 06:54 UTC** — push to `main` (66 commits), head commit: "Fix silu test for flexattention" (pytorch#126641)
- **2024-05-22 00:30 UTC** — push to `main` (43 commits), head commit: "[FSDP2] Fixed 2D clip grad norm test" (pytorch#126497)
- **2024-05-21 07:38 UTC** — created branch `fix_onednn_dw_qconv`, head commit: "[Quant][onednn] fix performance regression of quantized depth-wise convolution"
- **2024-05-21 03:08 UTC** — push to `main` (35 commits), head commit: "[dynamo] Allow asserts to fail" (pytorch#126661)
- **2024-05-21 02:49 UTC** — deleted branch `upgrade_onednn_v3.4`
- **2024-05-20 03:12 UTC** — push to `main` (119 commits), head commit: "[compiled autograd] Better cache miss logging" (pytorch#126602)
- **2024-05-16 05:09 UTC** — push to `main` (163 commits), head commit: "[Add sliding window attention bias]" (pytorch#126061)
- **2024-05-15 01:50 UTC** — push to `upgrade_onednn_v3.4_new` (1 commit): "CI: aarch64 linux: upgrade ACL version to 24.04" (required for oneDNN 3.4.x)
- **2024-05-14 01:45 UTC** — created branch `upgrade_onednn_v3.4_new`, head commit: "Upgrade submodule oneDNN to v3.4.2"
- **2024-05-14 01:37 UTC** — push to `main` (23 commits), head commit: "Add master cache disable switch for inductor" (pytorch#126084)
- **2024-05-13 02:17 UTC** — push to `upgrade_onednn_v3.4` (485 commits): "Merge branch 'main' into upgrade_onednn_v3.4"
- **2024-05-13 02:15 UTC** — push to `main` (483 commits), head commit: "[optim] add fused_adagrad support for CPU device" (pytorch#124905)
- **2024-04-30 01:14 UTC** — push to `upgrade_onednn_v3.4` (1219 commits): "Merge branch 'main' into upgrade_onednn_v3.4"
- **2024-04-30 01:11 UTC** — push to `main` (127 commits), head commit: "Revert \"[Distributed] [7/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d\"" (pytorch#124987)
- **2024-04-26 01:08 UTC** — push to `main` (65 commits), head commit: "Made FlexAttention rewrite getitem calls to use aten.index in score_mod" (pytorch#124799)
- **2024-04-25 03:53 UTC** — push to `main` (79 commits), head commit: "[cudagraphs] add cudagraph_skips counter" (pytorch#124804)
- **2024-04-24 00:48 UTC** — push to `main` (309 commits), head commit: "[Profiler] Update third_party/kineto submodule hash" (pytorch#124737)
- **2024-04-17 03:42 UTC** — push to `main` (167 commits), head commit: "Don't clamp slices generated from cat kernel" (pytorch#124139)
- **2024-04-12 09:45 UTC** — push to `main` (176 commits), head commit: "Add a mode to avoid clone() in DDPSink" (pytorch#122927)