{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":323614969,"defaultBranch":"master","name":"pytorch","ownerLogin":"SamuelMarks","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2020-12-22T12:07:41.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/807580?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1697489638.0","currentOid":""},"activityList":{"items":[{"before":"6f06832219e1059ea794c9e1a57a81ed9448a7a1","after":"3eb5cae3af1207ac58f77c5ac78669e276824cb9","ref":"refs/heads/main","pushedAt":"2023-10-18T15:52:48.000Z","pushType":"push","commitsCount":54,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"Revert \"[Compiled Autograd] Turn accumulate_grad into an op (#111271)\"\n\nThis reverts commit 04b04c068659127a53d659c44b0dd75fa9fd5887.\n\nReverted https://github.com/pytorch/pytorch/pull/111271 on behalf of https://github.com/jeanschmidt due to Breaking internal CI ([comment](https://github.com/pytorch/pytorch/pull/111271#issuecomment-1768527932))","shortMessageHtmlLink":"Revert \"[Compiled Autograd] Turn accumulate_grad into an op (pytorch#…"}},{"before":"2407d6a671c6173a5ff6e813f948f9a6dedf00bb","after":"e2ad35223d36d67abf44f1f8c8a63861b5409c13","ref":"refs/heads/vector-reserves-and-types","pushedAt":"2023-10-16T22:43:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"Utilise `std::unordered_set::reserve` to optimise allocations throughout codebase ; prefer `constexpr unsigned short` over magic numbers","shortMessageHtmlLink":"Utilise std::unordered_set<Key,Hash,KeyEqual,Allocator>::reserve to…"}},{"before":null,"after":"2407d6a671c6173a5ff6e813f948f9a6dedf00bb","ref":"refs/heads/vector-reserves-and-types","pushedAt":"2023-10-16T20:53:58.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"Utilise `std::vector::reserve` to optimise allocations throughout codebase ; prefer `unsigned short` over `size_t` for small numbers ; prefer `constexpr` over `const` for consistency and as microoptimization","shortMessageHtmlLink":"Utilise std::vector<T,Allocator>::reserve to optimise allocations t…"}},{"before":"97a513ed077323550b808e690a0b5a0452f87334","after":"6f06832219e1059ea794c9e1a57a81ed9448a7a1","ref":"refs/heads/main","pushedAt":"2023-10-16T20:52:57.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"Fixed typo in activation.py (#111358)\n\nliner -> linear\nPull Request resolved: https://github.com/pytorch/pytorch/pull/111358\nApproved by: https://github.com/mikaylagawarecki","shortMessageHtmlLink":"Fixed typo in activation.py (pytorch#111358)"}},{"before":"7f3da072903d540a04291b8043cf260bd10790e9","after":"32db0e5ef47248b25ab177b7e7e8152506d55f25","ref":"refs/heads/reserve-types","pushedAt":"2023-10-16T20:51:17.000Z","pushType":"push","commitsCount":2053,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"Merge branch 'main' into reserve-types","shortMessageHtmlLink":"Merge branch 'main' into reserve-types"}},{"before":"3c91b631da08d5c79c714b11ce1d68cb3f71f32b","after":"7f3da072903d540a04291b8043cf260bd10790e9","ref":"refs/heads/reserve-types","pushedAt":"2023-10-16T20:50:22.000Z","pushType":"push","commitsCount":2053,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"Utilise `std::vector::reserve` to optimise allocations throughout codebase ; prefer `unsigned short` over `size_t` for small numbers ; prefer `constexpr` over `const` for consistency and as microoptimization","shortMessageHtmlLink":"Utilise std::vector<T,Allocator>::reserve to optimise allocations t…"}},{"before":"71632d4d24616ddad6685814aae4ae54c981c0d2","after":"97a513ed077323550b808e690a0b5a0452f87334","ref":"refs/heads/main","pushedAt":"2023-10-16T19:59:22.000Z","pushType":"push","commitsCount":2052,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"Revert \"Add `lazy_clone_storage` to create COW storages (#110192)\"\n\nThis reverts commit 1c308144177d6e1663e41aae32a89e1c49b8b3b4.\n\nReverted https://github.com/pytorch/pytorch/pull/110192 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, @ezyang please support the author providing further details ([comment](https://github.com/pytorch/pytorch/pull/110192#issuecomment-1765157285))","shortMessageHtmlLink":"Revert \"Add lazy_clone_storage to create COW storages (pytorch#110192…"}},{"before":"920b07e13e5d0b9393137d1bea8b249819873d0f","after":"3c91b631da08d5c79c714b11ce1d68cb3f71f32b","ref":"refs/heads/reserve-types","pushedAt":"2023-08-20T15:36:23.000Z","pushType":"push","commitsCount":377,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"Merge branch 'reserve-types' of https://github.com/SamuelMarks/pytorch into reserve-types","shortMessageHtmlLink":"Merge branch 'reserve-types' of https://github.com/SamuelMarks/pytorch …"}},{"before":"aa1b2f16c5f143f2d24098b182565b7bd63e5613","after":"71632d4d24616ddad6685814aae4ae54c981c0d2","ref":"refs/heads/main","pushedAt":"2023-08-20T15:35:04.000Z","pushType":"push","commitsCount":375,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"[cpu] add sdpa choice and UT (#105131)\n\nFeature RFC: https://github.com/pytorch/rfcs/pull/56.\n\nWrite an SDPA selecting function for CPU to automatically choose one SDPA implementation among several ones. There are two CPU implementations which could be chosen: the unfused SDPA and flash attention. In general, flash attention has a higher priority than the unfused SDPA. For cases where flash attention is not applicable, such as manually disabling flash attention or the inputs not 4 dimensional, the unfused SDPA is chosen.\n\n## Performance of the stack\n\n### NanoGPT's SDPA kernel\nUsing benchmark [repo](https://github.com/mingfeima/bench_sdpa/blob/main/README.md), with one socket.\nShape: Batch size 1, Sequence length 1024, Head number 25, Head size 64.\nMachine: SPR.\n\n| Dtype | Causal | Mode | SDPA | Time (ms per iter) | Speedup |\n| -------- | -------- | ------- | ------- | ------- | ------- |\n| float32 | FALSE | Inference | Unfused | 3.081 | |\n| | | | Flash attention | 1.665 | **1.85045** |\n| float32 | TRUE | Inference | Unfused | 3.463 | |\n| | | | Flash attention | 1.662 | **2.083634**|\n| bfloat16 | FALSE | Inference | Unfused | 1.203 | |\n| | | | Flash attention | 1.154 | **1.042461**|\n| bfloat16 | TRUE | Inference | Unfused | 1.543 | |\n| | | | Flash attention | 1.154 | **1.337088**|\n| float32 | FALSE | Training | Unfused | 54.938 | |\n| | | | Flash attention | 23.029 | **2.385601**|\n| float32 | TRUE | Training | Unfused | 58.266 | |\n| | | | Flash attention | 17.835 | **3.266947**|\n| bfloat16 | FALSE | Training | Unfused | 18.924 | |\n| | | | Flash attention | 18.886 | **1.002012**|\n| bfloat16 | TRUE | Training | Unfused | 21.08 | |\n| | | | Flash attention | 14.172 | **1.48744** |\n\n### Stable Diffusion\nFollowing model's [BKM](https://github.com/intel-innersource/frameworks.ai.models.intel-models/blob/develop/quickstart/diffusion/pytorch/stable_diffusion/inference/cpu/README.md).\nMode: Inference; Machine: SPR.\n\n| Dtype | SDPA | Throughput (fps) | Speedup SDPA | Total Time (ms) | Speedup |\n| -------- | -------- | ------- | ------- | ------- | ------- |\n| float32 | Unfused | 1.63 | | 1139 | |\n| | Flash attention | 1.983 | 1.216564 | 547.488 | **2.080411**|\n| bfloat16 | Flash attention in IPEX | 4.784 | | 429.051 | |\n| | Flash attention | 4.857 | 1.015259 | 408.823 | **1.049479**|\n\n### LLM models of Torchbench\n\nDtype: float32; Mode: Inference, single socket; Machine: CPX.\nModel name | SDPA | Inductor_new | Inductor_old | Inductor Ratio(old/new)\n-- | -- | -- | -- | --\nhf_Albert | Unfused -> Flash attention | 0.048629309 | 0.05591545 | **1.14983024**\nhf_Bert | Unfused -> Flash attention | 0.053156243 | 0.060732115 | **1.142520841**\nhf_Bert_large | Unfused -> Flash attention | 0.141089502 | 0.155190077 | **1.099940636**\nllama | Unfused -> Flash attention | 0.033250106 | 0.033720745 | **1.01415451**\n\nDtype: bfloat16; Mode: Inference, single socket; Machine: SPR.\nModel name | SDPA | Inductor_new | Inductor_old | Inductor Ratio(old/new)\n-- | -- | -- | -- | --\nhf_Albert | Unfused -> Flash attention | 0.020681298 | 0.020718282 | **1.001788324**\nhf_Bert | Unfused -> Flash attention | 0.019932816 | 0.019935424 | **1.000130842**\nhf_Bert_large | Unfused -> Flash attention | 0.047949174 | 0.048312502 | **1.007577355**\nllama | Unfused -> Flash attention | 0.018528057 | 0.01861126 | **1.0044907**\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/105131\nApproved by: https://github.com/drisspg\nghstack dependencies: #104583, #104584, #103826, #104693, #104863, #107128","shortMessageHtmlLink":"[cpu] add sdpa choice and UT (pytorch#105131)"}},{"before":"648d8c8c6607e4f24460cb913434b0dd4f027f5d","after":"920b07e13e5d0b9393137d1bea8b249819873d0f","ref":"refs/heads/reserve-types","pushedAt":"2023-08-07T02:14:49.000Z","pushType":"push","commitsCount":1943,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"[c10/cuda/CUDACachingAllocator.cpp] Remove typo","shortMessageHtmlLink":"[c10/cuda/CUDACachingAllocator.cpp] Remove typo"}},{"before":"1833009202ee5b8e4e25affc325dec87cad87d36","after":"aa1b2f16c5f143f2d24098b182565b7bd63e5613","ref":"refs/heads/master","pushedAt":"2023-08-07T02:11:08.000Z","pushType":"push","commitsCount":10000,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"fix `upsample_nearest` decompositions for `uint8` tensors (#106675)\n\nFixes #106674.\n\nThis PR aligns the implementation of `_compute_upsample_nearest_indices` with `UpSampleKernel.cpp`: https://github.com/pytorch/pytorch/blob/68cb854d73458a14684d584c25c22b17eb79dfca/aten/src/ATen/native/cpu/UpSampleKernel.cpp#L1388-L1393\nPull Request resolved: https://github.com/pytorch/pytorch/pull/106675\nApproved by: https://github.com/albanD","shortMessageHtmlLink":"fix upsample_nearest decompositions for uint8 tensors (pytorch#10…"}},{"before":"2296ee08fab40c63cd24a5f4e97639e89eb6ab2d","after":"aa1b2f16c5f143f2d24098b182565b7bd63e5613","ref":"refs/heads/main","pushedAt":"2023-08-07T02:10:37.000Z","pushType":"push","commitsCount":1941,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"fix `upsample_nearest` decompositions for `uint8` tensors (#106675)\n\nFixes #106674.\n\nThis PR aligns the implementation of `_compute_upsample_nearest_indices` with `UpSampleKernel.cpp`: https://github.com/pytorch/pytorch/blob/68cb854d73458a14684d584c25c22b17eb79dfca/aten/src/ATen/native/cpu/UpSampleKernel.cpp#L1388-L1393\nPull Request resolved: https://github.com/pytorch/pytorch/pull/106675\nApproved by: https://github.com/albanD","shortMessageHtmlLink":"fix upsample_nearest decompositions for uint8 tensors (pytorch#10…"}},{"before":"90041bdabf22f10791f1a0ef1a5350b3c31e2335","after":"648d8c8c6607e4f24460cb913434b0dd4f027f5d","ref":"refs/heads/reserve-types","pushedAt":"2023-06-06T19:06:26.143Z","pushType":"push","commitsCount":1,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"[c10/cuda/CUDACachingAllocator.cpp] `std::vector::size_type` -> `auto`","shortMessageHtmlLink":"[c10/cuda/CUDACachingAllocator.cpp] `std::vector<decltype(procs)>::si…"}},{"before":"ecdee257c0fffd5625e9da9389c9fe3b9e978219","after":"90041bdabf22f10791f1a0ef1a5350b3c31e2335","ref":"refs/heads/reserve-types","pushedAt":"2023-06-05T20:37:17.103Z","pushType":"push","commitsCount":1,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"[c10/cuda/CUDACachingAllocator.cpp] Use `std::vector::size_type` for vector size type","shortMessageHtmlLink":"[c10/cuda/CUDACachingAllocator.cpp] Use `std::vector<decltype(procs)>…"}},{"before":"8ce8fc7bb7be99909733cac0a6b574bf591297be","after":"ecdee257c0fffd5625e9da9389c9fe3b9e978219","ref":"refs/heads/reserve-types","pushedAt":"2023-06-02T23:23:30.627Z","pushType":"push","commitsCount":1,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"[c10/cuda/CUDACachingAllocator.cpp] Remove corrected type of `getpid()` until NVIDIA conforms/fixes their API https://github.com/NVIDIA/go-nvml/issues/63","shortMessageHtmlLink":"[c10/cuda/CUDACachingAllocator.cpp] Remove corrected type of `getpid(…"}},{"before":"6f3e0b53ab5be0e0dce3e9a3e5a28f86b0ce0dc7","after":"8ce8fc7bb7be99909733cac0a6b574bf591297be","ref":"refs/heads/reserve-types","pushedAt":"2023-06-02T19:28:27.522Z","pushType":"push","commitsCount":1,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"[c10/cuda/CUDACachingAllocator.cpp] `std::vector::size_type` -> `auto`\n\nCo-authored-by: Aaron Gokaslan ","shortMessageHtmlLink":"[c10/cuda/CUDACachingAllocator.cpp] `std::vector<decltype(nvmlProcess…"}},{"before":null,"after":"2296ee08fab40c63cd24a5f4e97639e89eb6ab2d","ref":"refs/heads/main","pushedAt":"2023-06-02T19:03:38.123Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"[PT2][Quant][BE] Test refactor to be organize them better (#102704)\n\nCollected most of the test modules under TestHelperModules. This allows reuse\nof modules when possible. Probably we can refactor a bit more but left some qat\nrelated helper modules in their respective tests\n\nDifferential Revision: [D46267687](https://our.internmc.facebook.com/intern/diff/D46267687/)\nPull Request resolved: https://github.com/pytorch/pytorch/pull/102704\nApproved by: https://github.com/andrewor14","shortMessageHtmlLink":"[PT2][Quant][BE] Test refactor to be organize them better (pytorch#10…"}},{"before":null,"after":"6f3e0b53ab5be0e0dce3e9a3e5a28f86b0ce0dc7","ref":"refs/heads/reserve-types","pushedAt":"2023-06-02T19:00:08.565Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"SamuelMarks","name":"Samuel Marks","path":"/SamuelMarks","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/807580?s=80&v=4"},"commit":{"message":"[aten/src/ATen/ExpandUtils.h] `c10::SmallVector::reserve` micro-optimization ; [c10/cuda/CUDACachingAllocator.cpp] `std::vector::size_type` use over `unsigned int` ; [ios/TestApp/{TestApp/Benchmark.mm,TestAppTests/TestLiteInterpreter.mm}] `std::vector::reserve` micro-optimizations","shortMessageHtmlLink":"[aten/src/ATen/ExpandUtils.h] c10::SmallVector<T>::reserve micro-op…"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADmiW1GwA","startCursor":null,"endCursor":null}},"title":"Activity · SamuelMarks/pytorch"}