
rocRAND integration #2

Closed · wants to merge 149 commits

Conversation

iotamudelta

This removes dependencies on the legacy hcRNG (hiprng API) and replaces them with the supported rocRAND library (hiprand API). This requires a newer pyHIPIFY as in PR #2 to work.
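
For context, hipRAND exposes a cuRAND-style host API under the `hiprand` prefix. A minimal sketch of the generator-based usage this migration targets (illustrative only; the buffer name, size, and seed are assumptions, not code from this PR, and error checking is omitted):

```cpp
#include <hip/hip_runtime.h>
#include <hiprand/hiprand.h>   // older ROCm installs may ship this as <hiprand.h>
#include <cstdio>
#include <vector>

int main() {
  const size_t n = 1024;                 // number of floats to generate
  float* d_out = nullptr;
  hipMalloc(&d_out, n * sizeof(float));

  // Create and seed a pseudo-random generator (hiprand mirrors the curand API).
  hiprandGenerator_t gen;
  hiprandCreateGenerator(&gen, HIPRAND_RNG_PSEUDO_DEFAULT);
  hiprandSetPseudoRandomGeneratorSeed(gen, 1234ULL);

  // Fill the device buffer with uniform floats.
  hiprandGenerateUniform(gen, d_out, n);

  // Copy back and print one value as a sanity check.
  std::vector<float> h_out(n);
  hipMemcpy(h_out.data(), d_out, n * sizeof(float), hipMemcpyDeviceToHost);
  std::printf("first sample: %f\n", h_out[0]);

  hiprandDestroyGenerator(gen);
  hipFree(d_out);
  return 0;
}
```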

ngimel and others added 30 commits June 5, 2018 13:17
* remove unnecessary cudaGetDevices

* make curDevice argument non-optional, add explicit checks to current_device
When compiling OSX with CUDA, Caffe2's build system uses
find_package(cuda) to get its grubby hands on the CUDA driver
library (for some strange reason, FindCUDA doesn't save this
information as a variable).  Unfortunately, on OSX, sometimes
this picks up the cuda.framework folder, and then our build
system chokes to death because it doesn't try to link against
this as a framework.  (Is the folder even a framework?  I have
no idea).

This commit attempts to fix this in a two-pronged fashion:

1. For some users, reducing the precedence of frameworks
using CMAKE_FIND_FRAMEWORK seems to help.  So we set these
variables.  However, this fix is not perfect; on my laptop
it doesn't actually solve the problem.

2. PyTorch doesn't actually need the CUDA driver API.  So we
only add the dep when building Caffe2.

Fixes pytorch#8022

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
…ch#7823)

* Improve OrderedDict for C++ API

* Give OrderedDict a subject and fix review comments

* Fix OrderedDict use in torch/csrc/jit/script/init.cpp
* Fix __rshift__ bug

* Add small tests for __lshift__ and __rshift__ in test_cuda

* Add a more elaborate check for __lshift__ and __rshift__

* refactor the test to address @zou3519 's comments
…+. (pytorch#8164)

For non-generic function call implementations in Storage used by TensorUtils, we do the following:
1) Move the declaration from generic/C to non-generic/C++; we don't need backwards compatibility on these functions and want to use e.g. at::ScalarType.
2) Move the implementation from generic/C++ to non-generic/C++.
3) Change the generic implementation to call the non-generic implementation.

This will allow us to get rid of the corresponding TensorUtils calls (once we move over the Tensor functions in the same manner).
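
As a rough illustration of steps 1-3 (hypothetical names, not the actual Storage code): the macro-expanded generic C entry points become thin wrappers over a single non-generic C++ implementation keyed on a scalar type.

```cpp
#include <cstddef>
#include <cstdio>

// Stand-in for at::ScalarType; illustrative only.
enum class ScalarType { Float, Double };

// Non-generic C++ implementation: one function covers every dtype.
void storage_resize(size_t& size, size_t new_size, ScalarType dtype) {
  // Real code would reallocate through the allocator; here we just record the size.
  size = new_size;
  std::printf("resized dtype %d storage to %zu elements\n",
              static_cast<int>(dtype), new_size);
}

// Generic wrappers, one per dtype (as the TH macro expansion would produce),
// each simply forwarding to the non-generic implementation.
void THFloatStorage_resize(size_t& size, size_t new_size) {
  storage_resize(size, new_size, ScalarType::Float);
}
void THDoubleStorage_resize(size_t& size, size_t new_size) {
  storage_resize(size, new_size, ScalarType::Double);
}

int main() {
  size_t size = 0;
  THFloatStorage_resize(size, 16);
  THDoubleStorage_resize(size, 32);
  return 0;
}
```
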
* Pinning opencv to 3.1.0 in conda builds

* Also pinning numpy to 1.11

* Trying only specifying <3.4
Use the actual path for the file instead of the current working directory, which depends on where the script is invoked.
…iles with the same name results in a linking error in the HCC compiler used for ROCm/AMD.
lcskrishna pushed a commit to lcskrishna/pytorch that referenced this pull request May 29, 2023
…2156)

Hi!

I've been fuzzing different pytorch modules with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch), and found multiple crashes in the torch::jit::load() function.

All found errors could be reproduced with provided docker: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).
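
For reference, the `/load.cc:42` frames in the traces below come from a fuzzing harness that feeds the raw input bytes to `torch::jit::load()`. A minimal sketch of such a harness (assuming the standard `torch/script.h` API; not the exact harness from the sydr-fuzz repository):

```cpp
#include <torch/script.h>
#include <sstream>
#include <string>
#include <cstdint>
#include <cstddef>

// Fuzzer entry point: treat the input bytes as a serialized TorchScript module
// and try to deserialize it. Crashes or sanitizer reports indicate bugs in the
// deserialization path (unpickler, source-range parsing, etc.).
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::istringstream in(std::string(reinterpret_cast<const char*>(data), size));
  try {
    torch::jit::load(in);
  } catch (const std::exception&) {
    // Cleanly reported errors are fine; only crashes/ASAN findings matter.
  }
  return 0;
}
```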

### Crash in torch/csrc/jit/unpickler.cpp:1075

[crash-1f59083b8396c5b62b4705c7556e68f129e833b1.zip](https://github.com/pytorch/pytorch/files/11552947/crash-1f59083b8396c5b62b4705c7556e68f129e833b1.zip)

```asan
    "#0  0x00007ffff7a5600b in raise () from /lib/x86_64-linux-gnu/libc.so.6",
    "ROCm#1  0x00007ffff7a35859 in abort () from /lib/x86_64-linux-gnu/libc.so.6",
    "ROCm#2  0x00007ffff7ce3911 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#3  0x00007ffff7cef38c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#4  0x00007ffff7cef3f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#5  0x00007ffff7cef6a9 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#6  0x00007ffff7ce6326 in std::__throw_length_error(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#7  0x00007ffff7d87edc in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#8  0x00007ffff7d88880 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#9  0x000000000ea52931 in torch::jit::Unpickler::readBytes[abi:cxx11](unsigned long) (this=this@entry=0x7fffffffac10, length=length@entry=8358680908539635837) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:1075",
    "ROCm#10 0x000000000ea4c3a0 in torch::jit::Unpickler::readInstruction (this=0x7fffffff90d0) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:355",
    "ROCm#11 0x000000000ea49eb8 in torch::jit::Unpickler::run (this=0x7fffffffac10) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:251",
    "ROCm#12 0x000000000ea49b12 in torch::jit::Unpickler::parse_ivalue (this=0x7fffffffac10) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:204",
    "ROCm#13 0x000000000e960a9f in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) (archive_name=..., pickle_prefix=..., tensor_prefix=..., type_resolver=..., obj_loader=..., device=..., stream_reader=..., type_parser=<optimized out>, storage_context=...) at /pytorch/torch/csrc/jit/serialization/import_read.cpp:53",
    "ROCm#14 0x000000000e8ef599 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive (this=0x7fffffffbc60, archive_name=...) at /pytorch/torch/csrc/jit/serialization/import.cpp:184",
    "ROCm#15 0x000000000e8eb886 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize (this=<optimized out>, device=..., extra_files=..., restore_shapes=<optimized out>) at /pytorch/torch/csrc/jit/serialization/import.cpp:287",
    "ROCm#16 0x000000000e8e9cc5 in torch::jit::import_ir_module (cu=..., in=..., device=..., extra_files=..., load_debug_files=<optimized out>, restore_shapes=<optimized out>) at /pytorch/torch/csrc/jit/serialization/import.cpp:386",
    "ROCm#17 0x000000000e8f37bf in torch::jit::import_ir_module (cu=..., in=..., device=..., load_debug_files=<optimized out>) at /pytorch/torch/csrc/jit/serialization/import.cpp:322",
    "ROCm#18 0x000000000e8f615a in torch::jit::load (in=..., device=..., load_debug_files=<optimized out>) at /pytorch/torch/csrc/jit/serialization/import.cpp:482",
    "ROCm#19 0x00000000005c2d61 in LLVMFuzzerTestOneInput (data=<optimized out>, size=1663) at /load.cc:42",
    "ROCm#20 0x00000000005c2a8e in ExecuteFilesOnyByOne (argc=2, argv=0x7fffffffc6b8, callback=callback@entry=0x5c2ae0 <LLVMFuzzerTestOneInput(uint8_t const*, size_t)>) at /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255",
    "ROCm#21 0x00000000005c2899 in LLVMFuzzerRunDriver (argcp=argcp@entry=0x7fffffffc5b4, argvp=argvp@entry=0x7fffffffc5b8, callback=0x5c2ae0 <LLVMFuzzerTestOneInput(uint8_t const*, size_t)>) at /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:364",
    "ROCm#22 0x00000000005c2459 in main (argc=2, argv=0x7fffffffc6b8) at /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300"

```
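
The `length=8358680908539635837` argument above comes straight from attacker-controlled pickle data, so the fix for this class of crash is to validate a declared length against what the stream can actually supply before allocating. A simplified, self-contained illustration of that guard (not the actual `Unpickler::readBytes` code):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Read `length` bytes starting at `pos`, refusing lengths that exceed the
// remaining input instead of handing a huge value to std::string::reserve.
std::string read_bytes_checked(std::string_view input, size_t pos, size_t length) {
  if (pos > input.size() || length > input.size() - pos) {
    throw std::runtime_error("declared byte count exceeds remaining input");
  }
  return std::string(input.substr(pos, length));
}

int main() {
  std::string_view data = "PK\x03\x04 short input";
  try {
    read_bytes_checked(data, 0, 8358680908539635837ULL);  // bogus length from the PoC
  } catch (const std::runtime_error&) {
    // Rejected up front instead of attempting a multi-exabyte allocation.
  }
  return 0;
}
```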

### Crash in torch/csrc/jit/unpickler.cpp:386

[crash-2e9923de375c393e700e8c0441f0ebe8252ca364.zip](https://github.com/pytorch/pytorch/files/11552950/crash-2e9923de375c393e700e8c0441f0ebe8252ca364.zip)

```asan
    "#0  0x00007ffff7a5600b in raise () from /lib/x86_64-linux-gnu/libc.so.6",
    "ROCm#1  0x00007ffff7a35859 in abort () from /lib/x86_64-linux-gnu/libc.so.6",
    "ROCm#2  0x00007ffff7ce3911 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#3  0x00007ffff7cef38c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#4  0x00007ffff7cef3f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#5  0x00007ffff7cef6a9 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#6  0x00007ffff7ce6326 in std::__throw_length_error(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6",
    "ROCm#7  0x0000000000670aff in std::vector<c10::IValue, std::allocator<c10::IValue> >::reserve (this=this@entry=0x7fffffff9750, __n=__n@entry=18446744073709551614) at /usr/include/c++/10/bits/vector.tcc:70",
    "ROCm#8  0x000000000ea4d5cd in torch::jit::Unpickler::readInstruction (this=0x7fffffffac10) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:386",
    "ROCm#9  0x000000000ea49eb8 in torch::jit::Unpickler::run (this=0x7fffffffac10) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:251",
    "ROCm#10 0x000000000ea49b12 in torch::jit::Unpickler::parse_ivalue (this=0x7fffffffac10) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:204",
    "ROCm#11 0x000000000e960a9f in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) (archive_name=..., pickle_prefix=..., tensor_prefix=..., type_resolver=..., obj_loader=..., device=..., stream_reader=..., type_parser=<optimized out>, storage_context=...) at /pytorch/torch/csrc/jit/serialization/import_read.cpp:53",
    "ROCm#12 0x000000000e8ef599 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive (this=0x7fffffffbc60, archive_name=...) at /pytorch/torch/csrc/jit/serialization/import.cpp:184",
    "ROCm#13 0x000000000e8eb886 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize (this=<optimized out>, device=..., extra_files=..., restore_shapes=<optimized out>) at /pytorch/torch/csrc/jit/serialization/import.cpp:287",
    "ROCm#14 0x000000000e8e9cc5 in torch::jit::import_ir_module (cu=..., in=..., device=..., extra_files=..., load_debug_files=<optimized out>, restore_shapes=<optimized out>) at /pytorch/torch/csrc/jit/serialization/import.cpp:386",
    "ROCm#15 0x000000000e8f37bf in torch::jit::import_ir_module (cu=..., in=..., device=..., load_debug_files=<optimized out>) at /pytorch/torch/csrc/jit/serialization/import.cpp:322",
    "ROCm#16 0x000000000e8f615a in torch::jit::load (in=..., device=..., load_debug_files=<optimized out>) at /pytorch/torch/csrc/jit/serialization/import.cpp:482",
    "ROCm#17 0x00000000005c2d61 in LLVMFuzzerTestOneInput (data=<optimized out>, size=5498) at /load.cc:42",
    "ROCm#18 0x00000000005c2a8e in ExecuteFilesOnyByOne (argc=2, argv=0x7fffffffc6b8, callback=callback@entry=0x5c2ae0 <LLVMFuzzerTestOneInput(uint8_t const*, size_t)>) at /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255",
    "ROCm#19 0x00000000005c2899 in LLVMFuzzerRunDriver (argcp=argcp@entry=0x7fffffffc5b4, argvp=argvp@entry=0x7fffffffc5b8, callback=0x5c2ae0 <LLVMFuzzerTestOneInput(uint8_t const*, size_t)>) at /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:364",
    "ROCm#20 0x00000000005c2459 in main (argc=2, argv=0x7fffffffc6b8) at /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300"
```

### Crash in torch/csrc/jit/serialization/source_range_serialization.cpp:211

[crash-5598d386057152f606bfa69d85605499e8852625.zip](https://github.com/pytorch/pytorch/files/11552952/crash-5598d386057152f606bfa69d85605499e8852625.zip)

```asan
    "#0  torch::jit::ConcreteSourceRangeUnpickler::unpickle (this=0x99b8d80) at /pytorch/torch/csrc/jit/serialization/source_range_serialization.cpp:211",
    "ROCm#1  0x0000000004042566 in torch::jit::ConcreteSourceRangeUnpickler::findSourceRangeThatGenerated (this=0x99aa1c0, range=...) at /pytorch/torch/csrc/jit/serialization/source_range_serialization.cpp:229",
    "ROCm#2  0x00000000007b5cc8 in torch::jit::Source::findSourceRangeThatGenerated (this=<optimized out>, range=...) at /pytorch/torch/csrc/jit/frontend/source_range.cpp:144",
    "ROCm#3  torch::jit::SourceRange::findSourceRangeThatGenerated (this=0x7fffffffa650) at /pytorch/torch/csrc/jit/frontend/source_range.h:384",
    "ROCm#4  torch::jit::SourceRange::highlight (this=0x7fffffffa650, out=...) at /pytorch/torch/csrc/jit/frontend/source_range.cpp:149",
    "ROCm#5  0x00000000007a0e74 in torch::jit::Lexer::expected (this=this@entry=0x99979a0, what=..., t=...) at /pytorch/torch/csrc/jit/frontend/lexer.h:461",
    "ROCm#6  0x000000000079fcaa in torch::jit::Lexer::lexRaw (this=this@entry=0x99979a0, whitespace_token=false) at /pytorch/torch/csrc/jit/frontend/lexer.h:552",
    "ROCm#7  0x000000000079fd23 in torch::jit::Lexer::lex (this=this@entry=0x99979a0) at /pytorch/torch/csrc/jit/frontend/lexer.h:487",
    "ROCm#8  0x00000000007a1da1 in torch::jit::Lexer::next (this=this@entry=0x99979a0) at /pytorch/torch/csrc/jit/frontend/lexer.h:436",
    "ROCm#9  0x0000000003bff6a8 in torch::jit::Lexer::nextIf (this=0x99979a0, kind=330) at /pytorch/torch/csrc/jit/frontend/lexer.h:444",
    "ROCm#10 torch::jit::ParserImpl::parseReturnAnnotation (this=this@entry=0x99979a0) at /pytorch/torch/csrc/jit/frontend/parser.cpp:703",
    "ROCm#11 0x0000000003bfd500 in torch::jit::ParserImpl::parseDecl (this=this@entry=0x99979a0) at /pytorch/torch/csrc/jit/frontend/parser.cpp:729",
    "ROCm#12 0x0000000003bfb725 in torch::jit::ParserImpl::parseFunction (this=this@entry=0x99979a0, is_method=true) at /pytorch/torch/csrc/jit/frontend/parser.cpp:755",
    "ROCm#13 0x0000000003bfdc28 in torch::jit::ParserImpl::parseStmt (this=this@entry=0x99979a0, in_class=<optimized out>) at /pytorch/torch/csrc/jit/frontend/parser.cpp:599",
    "ROCm#14 0x0000000003bfd8dd in torch::jit::ParserImpl::parseStatements (this=this@entry=0x99979a0, expect_indent=<optimized out>, in_class=<optimized out>) at /pytorch/torch/csrc/jit/frontend/parser.cpp:697",
    "ROCm#15 0x0000000003bfc4ba in torch::jit::ParserImpl::parseClass (this=0x99979a0) at /pytorch/torch/csrc/jit/frontend/parser.cpp:747",
    "ROCm#16 0x0000000003bfaddc in torch::jit::Parser::parseClass (this=<optimized out>) at /pytorch/torch/csrc/jit/frontend/parser.cpp:812",
    "ROCm#17 0x0000000004008e2d in torch::jit::SourceImporterImpl::parseSourceIfNeeded (this=this@entry=0x95d41f0, qualifier=...) at /pytorch/torch/csrc/jit/serialization/import_source.cpp:182",
    "ROCm#18 0x0000000004008ab7 in torch::jit::SourceImporterImpl::findNamedType (this=this@entry=0x95d41f0, name=...) at /pytorch/torch/csrc/jit/serialization/import_source.cpp:135",
    "ROCm#19 0x000000000400d010 in torch::jit::SourceImporterImpl::resolveType (this=0x95d41f0, name=..., loc=...) at /pytorch/torch/csrc/jit/serialization/import_source.cpp:261",
    "ROCm#20 0x0000000003c20821 in torch::jit::ScriptTypeParser::parseTypeFromExpr (this=this@entry=0x7fffffffb658, expr=...) at /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:238",
    "ROCm#21 0x0000000003c20acc in torch::jit::ScriptTypeParser::parseType (this=0x7fffffffb658, str=...) at /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:312",
    "ROCm#22 0x0000000004019416 in torch::jit::SourceImporter::loadType (this=<optimized out>, name=...) at /pytorch/torch/csrc/jit/serialization/import_source.cpp:786",
    "ROCm#23 0x0000000003ff365e in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0::operator()(c10::QualifiedName const&) const (this=<optimized out>, qn=...) at /pytorch/torch/csrc/jit/serialization/import.cpp:146",
    "ROCm#24 std::__invoke_impl<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>(std::__invoke_other, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&) (__f=..., __args=...) at /usr/include/c++/10/bits/invoke.h:60",
    "ROCm#25 std::__invoke_r<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>(torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&) (__fn=..., __args=...) at /usr/include/c++/10/bits/invoke.h:113",
    "ROCm#26 std::_Function_handler<c10::StrongTypePtr (c10::QualifiedName const&), torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0>::_M_invoke(std::_Any_data const&, c10::QualifiedName const&) (__functor=..., __args=...) at /usr/include/c++/10/bits/std_function.h:291",
    "ROCm#27 0x000000000404e5c4 in std::function<c10::StrongTypePtr (c10::QualifiedName const&)>::operator()(c10::QualifiedName const&) const (this=0x7fffffffbf28, __args=...) at /usr/include/c++/10/bits/std_function.h:622",
    "ROCm#28 torch::jit::Unpickler::readGlobal (this=this@entry=0x7fffffffbd50, module_name=..., class_name=...) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:820",
    "ROCm#29 0x0000000004049ce5 in torch::jit::Unpickler::readInstruction (this=this@entry=0x7fffffffbd50) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:496",
    "ROCm#30 0x00000000040497a8 in torch::jit::Unpickler::run (this=0x7fffffffbd50) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:251",
    "ROCm#31 0x00000000040494f9 in torch::jit::Unpickler::parse_ivalue (this=0x99aa1c0) at /pytorch/torch/csrc/jit/serialization/unpickler.cpp:204",
    "ROCm#32 0x00000000040075f8 in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) (archive_name=..., pickle_prefix=..., tensor_prefix=..., type_resolver=..., obj_loader=..., device=..., stream_reader=..., type_parser=0x0, storage_context=...) at /pytorch/torch/csrc/jit/serialization/import_read.cpp:53",
    "ROCm#33 0x0000000003ff3545 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive (this=this@entry=0x7fffffffc2b8, archive_name=...) at /pytorch/torch/csrc/jit/serialization/import.cpp:184",
    "ROCm#34 0x0000000003fed8bf in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize (this=this@entry=0x7fffffffc2b8, device=device@entry=..., extra_files=..., restore_shapes=220) at /pytorch/torch/csrc/jit/serialization/import.cpp:287",
    "ROCm#35 0x0000000003febb0f in torch::jit::import_ir_module (cu=..., in=..., device=..., device@entry=..., extra_files=..., load_debug_files=true, restore_shapes=<optimized out>) at /pytorch/torch/csrc/jit/serialization/import.cpp:386",
    "ROCm#36 0x0000000003feb7a1 in torch::jit::import_ir_module (cu=..., in=..., device=..., device@entry=..., load_debug_files=false) at /pytorch/torch/csrc/jit/serialization/import.cpp:322",
    "ROCm#37 0x0000000003ff015a in torch::jit::load (in=..., device=device@entry=..., load_debug_files=true) at /pytorch/torch/csrc/jit/serialization/import.cpp:482",
    "ROCm#38 0x00000000004a1655 in LLVMFuzzerTestOneInput (data=0x981a680 \"PK\\003\\004\", size=1609) at /load.cc:42",
    "ROCm#39 0x00000000004a1dbf in main ()"
```

### Segmentation fault in /pytorch/aten/src/ATen/core/ivalue.h:526

[crash-9bd059c1ae85ab9cdb41d786932214d942baa189.zip](https://github.com/pytorch/pytorch/files/11552956/crash-9bd059c1ae85ab9cdb41d786932214d942baa189.zip)

```asan
    "==8528==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x00000e55d97e bp 0x7fffffffb4d0 sp 0x7fffffffb360 T0)",
    "==8528==The signal is caused by a READ memory access.",
    "==8528==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.",
    "    #0 0xe55d97e in c10::IValue::isTuple() const /pytorch/aten/src/ATen/core/ivalue.h:526:26",
    "    ROCm#1 0xe55d97e in torch::distributed::rpc::GloballyUniqueId::fromIValue(c10::IValue const&) /pytorch/torch/csrc/distributed/rpc/types.cpp:60:3",
    "    ROCm#2 0xe4b04fb in torch::distributed::rpc::ScriptRemoteCall::fromIValues(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/distributed/rpc/script_remote_call.cpp:33:20",
    "    ROCm#3 0xe4b1ed5 in torch::distributed::rpc::ScriptRemoteCall::fromMessage(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/script_remote_call.cpp:80:10",
    "    ROCm#4 0xe55f8a0 in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/utils.cpp:108:14",
    "    ROCm#5 0x6120a8 in LLVMFuzzerTestOneInput /message_deserialize.cc:192:27",
    "    ROCm#6 0x535de1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15",
    "    ROCm#7 0x51fcec in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6",
    "    ROCm#8 0x525a3b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9",
    "    ROCm#9 0x54eff2 in main /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10",
    "    ROCm#10 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)",
    "    ROCm#11 0x51a60d in _start (/message_deserialize_fuzz+0x51a60d)",
    "",
    "AddressSanitizer can not provide additional info.",
    "SUMMARY: AddressSanitizer: SEGV /pytorch/aten/src/ATen/core/ivalue.h:526:26 in c10::IValue::isTuple() const",
    "==8528==ABORTING"
```
Pull Request resolved: pytorch#102156
Approved by: https://github.com/ezyang
alugorey pushed a commit to alugorey/pytorch that referenced this pull request May 31, 2023
Pass size argument.
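
"Pass size argument" refers to using the count-limited overload of `Vectorized<T>::loadu` so a tail load never reads past the end of the buffer. A simplified stand-in (plain C++, not ATen's actual vec classes) showing the difference:

```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Toy stand-in for a 16-lane Vectorized<uint8_t>. Real ATen code uses SIMD
// intrinsics; the point here is only the count-limited load.
struct Vec16u8 {
  std::array<unsigned char, 16> lanes{};

  // Full-width load: always reads 16 bytes -- out of bounds if fewer remain.
  static Vec16u8 loadu(const void* ptr) {
    Vec16u8 v;
    std::memcpy(v.lanes.data(), ptr, 16);
    return v;
  }

  // Count-limited load: reads only `count` bytes and leaves the rest zeroed.
  static Vec16u8 loadu(const void* ptr, size_t count) {
    Vec16u8 v;
    std::memcpy(v.lanes.data(), ptr, count < 16 ? count : 16);
    return v;
  }
};

int main() {
  unsigned char buf[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  size_t remaining = sizeof(buf);          // only 10 bytes left in this row
  // Vec16u8 bad = Vec16u8::loadu(buf);    // would read 6 bytes past the buffer
  Vec16u8 ok = Vec16u8::loadu(buf, remaining);  // safe tail load
  std::printf("last loaded lane: %u\n", ok.lanes[9]);
  return 0;
}
```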

<details>
<summary>ASAN report</summary>

```
==1640574==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x609000022160 at pc 0x03ff31a04b42 bp 0x03ff69885dc0 sp 0x03ff69885db0
READ of size 16 at 0x609000022160 thread T1
    #0 0x3ff31a04b41 in at::vec::ZVECTOR::Vectorized<unsigned char, void>::loadu(void const*, int) /home/user/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h:397
    ROCm#1 0x3ff31a04b41 in at::vec::ZVECTOR::Vectorized<c10::quint8, void>::loadu(void const*, int) /home/user/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h:1574
    ROCm#2 0x3ff31a04b41 in operator() /home/user/pytorch/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp:2668
    ROCm#3 0x3ff31cefa5d in void at::internal::invoke_parallel<at::native::(anonymous namespace)::quantized_normalize_kernel(at::Tensor const&, at::Tensor const&, at::Tensor const&, bool, int, int, long, long, double, at::Tensor*)::{lambda()ROCm#1}::operator()() const::{lambda()ROCm#2}::operator()() const::{lambda(long, long)ROCm#1}>(long, long, long, at::native::(anonymous namespace)::quantized_normalize_kernel(at::Tensor const&, at::Tensor const&, at::Tensor const&, bool, int, int, long, long, double, at::Tensor*)::{lambda()ROCm#1}::operator()() const::{lambda()ROCm#2}::operator()() const::{lambda(long, long)ROCm#1} const&) [clone ._omp_fn.0] /home/user/pytorch/aten/src/ATen/ParallelOpenMP.h:42
    ROCm#4 0x3ff6f31f52d in gomp_thread_start /var/tmp/portage/sys-devel/gcc-12.2.1_p20230304/work/gcc-12-20230304/libgomp/team.c:129
    ROCm#5 0x3ff82218381 in start_thread /usr/src/debug/sys-libs/glibc-2.37-r1/glibc-2.37/nptl/pthread_create.c:444
    ROCm#6 0x3ff822943f1  (/lib64/libc.so.6+0x1143f1)

0x609000022160 is located 0 bytes to the right of 32-byte region [0x609000022140,0x609000022160)
allocated by thread T0 here:
    #0 0x3ff82a3663f in __interceptor_posix_memalign /usr/src/debug/sys-devel/gcc-11.3.1_p20230303/gcc-11-20230303/libsanitizer/asan/asan_malloc_linux.cpp:226
    ROCm#1 0x3ff6f53ad95 in c10::alloc_cpu(unsigned long) /home/user/pytorch/c10/core/impl/alloc_cpu.cpp:74

Thread T1 created by T0 here:
    #0 0x3ff829dc263 in __interceptor_pthread_create /usr/src/debug/sys-devel/gcc-11.3.1_p20230303/gcc-11-20230303/libsanitizer/asan/asan_interceptors.cpp:216
    ROCm#1 0x3ff6f31fad5 in gomp_team_start /var/tmp/portage/sys-devel/gcc-12.2.1_p20230304/work/gcc-12-20230304/libgomp/team.c:858

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/user/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h:397 in at::vec::ZVECTOR::Vectorized<unsigned char, void>::loadu(void const*, int)
Shadow bytes around the buggy address:
  0x100c12000043d0: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x100c12000043e0: fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x100c12000043f0: fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x100c1200004400: fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x100c1200004410: fa fa fa fa fa fa fa fa fd fa fa fa fa fa fa fa
=>0x100c1200004420: fa fa fa fa fa fa fa fa 00 00 00 00[fa]fa fa fa
  0x100c1200004430: fa fa fa fa fa fa fa fa fd fd fa fa fa fa fa fa
  0x100c1200004440: fa fa fa fa fa fa fa fa fd fd fa fa fa fa fa fa
  0x100c1200004450: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x100c1200004460: 00 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x100c1200004470: 00 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1640574==ABORTING
```
</details>

Pull Request resolved: pytorch#101970
Approved by: https://github.com/Skylion007, https://github.com/jgong5
lcskrishna pushed a commit to lcskrishna/pytorch that referenced this pull request Jun 17, 2023
Hi! I found a heap-buffer-overflow during PyTorch RPC-module fuzzing.

[crash-9cc26b8da3b688a9c26614481239943b357c5636.zip](https://github.com/pytorch/pytorch/files/11707706/crash-9cc26b8da3b688a9c26614481239943b357c5636.zip)

```
    "==10634==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060001b6a98 at pc 0x000000639a2e bp 0x7fffffff9100 sp 0x7fffffff90f8",
    "READ of size 4 at 0x6060001b6a98 thread T0",
    "    #0 0x639a2d in c10::IValue::isTensor() const /pytorch/aten/src/ATen/core/ivalue.h:432:27",
    "    ROCm#1 0x639a2d in c10::IValue::toTensor() && /pytorch/aten/src/ATen/core/ivalue_inl.h:159:7",
    "    ROCm#2 0xc5eb105 in at::Tensor c10::IValue::to<at::Tensor>() && /pytorch/aten/src/ATen/core/ivalue_inl.h:1690:1",
    "    ROCm#3 0xc5eb105 in void torch::jit::pop<at::Tensor>(std::vector<c10::IValue, std::allocator<c10::IValue> >&, at::Tensor&) /pytorch/aten/src/ATen/core/stack.h:130:55",
    "    ROCm#4 0xc5eaedb in torch::jit::dtype(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/jit/mobile/promoted_prim_ops.cpp:105:3",
    "    ROCm#5 0xcc79600 in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/jit/runtime/interpreter.cpp:682:13",
    "    ROCm#6 0xcc4158b in torch::jit::InterpreterStateImpl::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/jit/runtime/interpreter.cpp:1052:9",
    "    ROCm#7 0x60f378 in runGraph(std::shared_ptr<torch::jit::Graph>, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) /jit_differential.cc:66:38",
    "    ROCm#8 0x610bb9 in LLVMFuzzerTestOneInput /jit_differential.cc:107:25",
    "    ROCm#9 0x535c91 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15",
    "    ROCm#10 0x51fb9c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6",
    "    ROCm#11 0x5258eb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9",
    "    ROCm#12 0x54eea2 in main /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10",
    "    ROCm#13 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)",
    "    ROCm#14 0x51a4bd in _start (/jit_differential_fuzz+0x51a4bd)",
    "",
    "0x6060001b6a98 is located 8 bytes to the left of 64-byte region [0x6060001b6aa0,0x6060001b6ae0)",
    "allocated by thread T0 here:",
    "    #0 0x60c66d in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3",
    "    ROCm#1 0xa5a41b in std::_Vector_base<c10::IValue, std::allocator<c10::IValue> >::_M_allocate(unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:346:20",
    "    ROCm#2 0xa5a41b in void std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_realloc_insert<c10::IValue&>(__gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue, std::allocator<c10::IValue> > >, c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:440:33",
    "    ROCm#3 0xa5a241 in c10::IValue& std::vector<c10::IValue, std::allocator<c10::IValue> >::emplace_back<c10::IValue&>(c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:121:4",
    "    ROCm#4 0xcc8209c in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/jit/runtime/interpreter.cpp:345:19",
    "    ROCm#5 0xcc4158b in torch::jit::InterpreterStateImpl::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/csrc/jit/runtime/interpreter.cpp:1052:9",
    "    ROCm#6 0x60f378 in runGraph(std::shared_ptr<torch::jit::Graph>, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) /jit_differential.cc:66:38",
    "    ROCm#7 0x610bb9 in LLVMFuzzerTestOneInput /jit_differential.cc:107:25",
    "    ROCm#8 0x535c91 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15",
    "    ROCm#9 0x51fb9c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6",
    "    ROCm#10 0x5258eb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9",
    "    ROCm#11 0x54eea2 in main /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10",
    "    ROCm#12 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)",
    "",
    "SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch/aten/src/ATen/core/ivalue.h:432:27 in c10::IValue::isTensor() const",
    "Shadow bytes around the buggy address:",
    "  0x0c0c8002ed00: 00 00 00 00 00 00 00 fa fa fa fa fa fd fd fd fd",
    "  0x0c0c8002ed10: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd",
    "  0x0c0c8002ed20: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa",
    "  0x0c0c8002ed30: fd fd fd fd fd fd fd fd fa fa fa fa 00 00 00 00",
    "  0x0c0c8002ed40: 00 00 00 00 fa fa fa fa fd fd fd fd fd fd fd fd",
    "=>0x0c0c8002ed50: fa fa fa[fa]00 00 00 00 00 00 00 00 fa fa fa fa",
    "  0x0c0c8002ed60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c0c8002ed70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c0c8002ed80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c0c8002ed90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c0c8002eda0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "Shadow byte legend (one shadow byte represents 8 application bytes):",
    "  Addressable:           00",
    "  Partially addressable: 01 02 03 04 05 06 07",
    "  Heap left redzone:       fa",
    "  Freed heap region:       fd",
    "  Stack left redzone:      f1",
    "  Stack mid redzone:       f2",
    "  Stack right redzone:     f3",
    "  Stack after return:      f5",
    "  Stack use after scope:   f8",
    "  Global redzone:          f9",
    "  Global init order:       f6",
    "  Poisoned by user:        f7",
    "  Container overflow:      fc",
    "  Array cookie:            ac",
    "  Intra object redzone:    bb",
    "  ASan internal:           fe",
    "  Left alloca redzone:     ca",
    "  Right alloca redzone:    cb",
    "==10634==ABORTING"
```
Pull Request resolved: pytorch#103327
Approved by: https://github.com/Skylion007
lcskrishna pushed a commit to lcskrishna/pytorch that referenced this pull request Jun 23, 2023
…kler (pytorch#103667)

Hi!

I've been fuzzing different pytorch modules with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch), and found a heap buffer overflow error that occurs due to an incorrect loop condition in torch::jit::unpickler.cpp. This bug was found in several fuzzing targets: it can be triggered by the `torch::jit::load()` method when loading a .pt model and by the `torch::distributed::rpc::deserializeRequest()` method in the RPC module.

All found errors could be reproduced with provided docker: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).
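
As a simplified, self-contained illustration of the bug class described above (a pairwise loop reading one element past the end of the unpickler's value stack while building a dict); this is not the actual `readInstruction` code:

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// Build key/value pairs from stack[start..end) the way a SETITEMS-style
// instruction would. The element count must be even and `start` in range;
// otherwise the pairwise loop dereferences stack[stack.size()], which is
// exactly the kind of heap-buffer-overflow ASAN reports below.
std::map<int, int> collect_pairs(const std::vector<int>& stack, size_t start) {
  if (start > stack.size() || (stack.size() - start) % 2 != 0) {
    throw std::runtime_error("malformed input: odd number of dict items");
  }
  std::map<int, int> dict;
  for (size_t i = start; i < stack.size(); i += 2) {
    dict[stack[i]] = stack[i + 1];  // i + 1 is in range thanks to the check above
  }
  return dict;
}

int main() {
  std::vector<int> stack = {0 /*mark*/, 1, 10, 2, 20};
  auto d = collect_pairs(stack, 1);  // four items after the mark -> two pairs
  return d.size() == 2 ? 0 : 1;
}
```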

### PoC for deserializeRequest():
[crash-0722408578cd2f26593b5a01e26d2a078d3dc5f6.zip](https://github.com/pytorch/pytorch/files/11756694/crash-0722408578cd2f26593b5a01e26d2a078d3dc5f6.zip)

```
=================================================================
==29858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020004ed808 at pc 0x000000680084 bp 0x7ffcbd8220d0 sp 0x7ffcbd8220c8
READ of size 4 at 0x6020004ed808 thread T0
    #0 0x680083 in c10::IValue::IValue(c10::IValue const&) /pytorch/aten/src/ATen/core/ivalue.h:224:33
    ROCm#1 0xdc4beb8 in std::pair<c10::impl::DictIterator<c10::IValue, c10::IValue, ska_ordered::detailv3::sherwood_v3_table<std::pair<c10::IValue, c10::IValue>, c10::IValue, c10::detail::DictKeyHash, ska_ordered::detailv3::KeyOrValueHasher<c10::IValue, std::pair<c10::IValue, c10::IValue>, c10::detail::DictKeyHash>, c10::detail::DictKeyEqualTo, ska_ordered::detailv3::KeyOrValueEquality<c10::IValue, std::pair<c10::IValue, c10::IValue>, c10::detail::DictKeyEqualTo>, std::allocator<std::pair<c10::IValue, c10::IValue> >, std::allocator<ska_ordered::detailv3::sherwood_v3_entry<std::pair<c10::IValue, c10::IValue> > > >::templated_iterator<std::pair<c10::IValue, c10::IValue> > >, bool> c10::Dict<c10::IValue, c10::IValue>::insert_or_assign<c10::IValue&, c10::IValue&>(c10::IValue&, c10::IValue&) const /pytorch/aten/src/ATen/core/Dict_inl.h:136:5
    ROCm#2 0xea680a7 in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:452:14
    ROCm#3 0xea64e07 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:251:27
    ROCm#4 0xea64a61 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:204:3
    ROCm#5 0xe9b13ce in torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch/torch/csrc/jit/serialization/pickle.cpp:126:20
    ROCm#6 0xe9b178c in torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch/torch/csrc/jit/serialization/pickle.cpp:136:10
    ROCm#7 0xfdc8aa1 in torch::distributed::rpc::(anonymous namespace)::toIValues(torch::distributed::rpc::Message const&, torch::distributed::rpc::MessageType) /pytorch/torch/csrc/distributed/rpc/rref_proto.cpp:23:16
    ROCm#8 0xfdca3ca in torch::distributed::rpc::PythonRRefFetchCall::fromMessage(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/rref_proto.cpp:105:17
    ROCm#9 0xfe7f347 in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/utils.cpp:117:14
    ROCm#10 0x5c5d13 in LLVMFuzzerTestOneInput /message_deserialize.cc:192:27
    ROCm#11 0x5c2bfd in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7
    ROCm#12 0x5c2a08 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c
    ROCm#13 0x5c25c8 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10
    ROCm#14 0x7feb90908082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    ROCm#15 0x50237d in _start (/message_deserialize_afl+0x50237d)

0x6020004ed808 is located 8 bytes to the right of 16-byte region [0x6020004ed7f0,0x6020004ed800)
allocated by thread T0 here:
    #0 0x5bfc1d in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
    ROCm#1 0x32ad8d1 in std::_Vector_base<c10::IValue, std::allocator<c10::IValue> >::_M_allocate(unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:346:20
    ROCm#2 0x32ad8d1 in void std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_realloc_insert<double>(__gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue, std::allocator<c10::IValue> > >, double&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:440:33

SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch/aten/src/ATen/core/ivalue.h:224:33 in c10::IValue::IValue(c10::IValue const&)
Shadow bytes around the buggy address:
  0x0c0480095ab0: fa fa fd fd fa fa fd fd fa fa fd fd fa fa 00 00
  0x0c0480095ac0: fa fa 00 00 fa fa 00 00 fa fa 04 fa fa fa 04 fa
  0x0c0480095ad0: fa fa 00 fa fa fa fd fa fa fa 04 fa fa fa 00 fa
  0x0c0480095ae0: fa fa 00 fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x0c0480095af0: fa fa fd fd fa fa 00 00 fa fa 00 fa fa fa 00 00
=>0x0c0480095b00: fa[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0480095b10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0480095b20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0480095b30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0480095b40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0480095b50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==29858==ABORTING
```

### PoC for load():
[crash-2bd32e496811fb06de24a2bb720dc6490218009f.zip](/uploads/53d108cdd434ec4b11a2034bbca3cfd8/crash-2bd32e496811fb06de24a2bb720dc6490218009f.zip)

```
==29865==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60c00031f388 at pc 0x000000669984 bp 0x7ffd6c6de630 sp 0x7ffd6c6de628
READ of size 4 at 0x60c00031f388 thread T0
    #0 0x669983 in c10::IValue::IValue(c10::IValue const&) /pytorch/aten/src/ATen/core/ivalue.h:224:33
    ROCm#1 0xdc3de68 in std::pair<c10::impl::DictIterator<c10::IValue, c10::IValue, ska_ordered::detailv3::sherwood_v3_table<std::pair<c10::IValue, c10::IValue>, c10::IValue, c10::detail::DictKeyHash, ska_ordered::detailv3::KeyOrValueHasher<c10::IValue, std::pair<c10::IValue, c10::IValue>, c10::detail::DictKeyHash>, c10::detail::DictKeyEqualTo, ska_ordered::detailv3::KeyOrValueEquality<c10::IValue, std::pair<c10::IValue, c10::IValue>, c10::detail::DictKeyEqualTo>, std::allocator<std::pair<c10::IValue, c10::IValue> >, std::allocator<ska_ordered::detailv3::sherwood_v3_entry<std::pair<c10::IValue, c10::IValue> > > >::templated_iterator<std::pair<c10::IValue, c10::IValue> > >, bool> c10::Dict<c10::IValue, c10::IValue>::insert_or_assign<c10::IValue&, c10::IValue&>(c10::IValue&, c10::IValue&) const /pytorch/aten/src/ATen/core/Dict_inl.h:136:5
    ROCm#2 0xea5a207 in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:452:14
    ROCm#3 0xea56f67 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:251:27
    ROCm#4 0xea56bc1 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:204:3
    ROCm#5 0xe96db4e in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) /pytorch/torch/csrc/jit/serialization/import_read.cpp:53:20
    ROCm#6 0xe8fc648 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/import.cpp:184:10
    ROCm#7 0xe8f8935 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize(c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:287:19
    ROCm#8 0xe8f6d74 in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:386:25
    ROCm#9 0xe90086e in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:322:10
    ROCm#10 0xe903209 in torch::jit::load(std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:482:10
    ROCm#11 0x5c2d60 in LLVMFuzzerTestOneInput /load.cc:42:14
    ROCm#12 0x5c2a8d in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7
    ROCm#13 0x5c2898 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c
    ROCm#14 0x5c2458 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10
    ROCm#15 0x7f156ae33082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    ROCm#16 0x50220d in _start (/load_afl+0x50220d)

0x60c00031f388 is located 8 bytes to the right of 128-byte region [0x60c00031f300,0x60c00031f380)
allocated by thread T0 here:
    #0 0x5bfaad in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
    ROCm#1 0xa86231 in std::_Vector_base<c10::IValue, std::allocator<c10::IValue> >::_M_allocate(unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:346:20
    ROCm#2 0xa86231 in void std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_realloc_insert<c10::IValue&>(__gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue, std::allocator<c10::IValue> > >, c10::IValue&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:440:33

SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch/aten/src/ATen/core/ivalue.h:224:33 in c10::IValue::IValue(c10::IValue const&)
Shadow bytes around the buggy address:
  0x0c188005be20: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  0x0c188005be30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c188005be40: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c188005be50: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  0x0c188005be60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c188005be70: fa[fa]fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c188005be80: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
  0x0c188005be90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c188005bea0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c188005beb0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c188005bec0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==29865==ABORTING
```
Pull Request resolved: pytorch#103667
Approved by: https://github.com/albanD
pytorchmergebot pushed a commit to jataylo/pytorch that referenced this pull request Jun 26, 2023
…103969)

Hi! We've been fuzzing the torchvision project with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz).
We've found a heap buffer overflow error at `source_range_serialization.cpp:73` in pytorch project.

The error occurs because there is no check in `deserialize_source` that `fnameIndex` is within the bounds of `text_table_`. To prevent the error, the corresponding bounds check must be added.
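
A minimal sketch of the missing bounds check described above (plain C++ with an exception standing in for the error handling; the real `SourceRangeDeserializer::deserialize_source` code differs):

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

using TextTable = std::vector<std::shared_ptr<std::string>>;

// Look up the file name for a deserialized source range. Without the bounds
// check, a corrupted index read from the model file dereferences memory past
// the end of text_table_, which is the heap-buffer-overflow reported below.
const std::string& file_name_for(const TextTable& text_table_, int64_t fnameIndex) {
  if (fnameIndex < 0 ||
      static_cast<size_t>(fnameIndex) >= text_table_.size()) {
    throw std::out_of_range("source range file-name index out of bounds");
  }
  return *text_table_[fnameIndex];
}
```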

torchvision version: 9d0a93eee90bf7c401b74ebf9c8be80346254f15
pytorch version: 0f1621d

OS: Ubuntu 20.04

How to reproduce

1. Build the docker image from [here](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/torchvision) and run the container:

        sudo docker build -t oss-sydr-fuzz-torchvision .
        sudo docker run --privileged --rm -v `pwd`:/fuzz -it oss-sydr-fuzz-torchvision /bin/bash

2. Run the target on this input:  [serialization-crash.txt](https://github.com/pytorch/pytorch/files/11819901/serialization-crash.txt)

        /encode_png_fuzz serialization-crash.txt

3. You will see the following output:

        =================================================================
        ==13==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200055a630 at pc 0x0000010197b7 bp 0x7ffd4cfb15f0 sp 0x7ffd4cfb15e8
        READ of size 8 at 0x60200055a630 thread T0
            #0 0x10197b6 in std::__shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, (__gnu_cxx::_Lock_policy)2>::get() const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:1325:16
            ROCm#1 0x10197b6 in std::__shared_ptr_access<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, (__gnu_cxx::_Lock_policy)2, false, false>::_M_get() const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:1024:66
            ROCm#2 0x10197b6 in std::__shared_ptr_access<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, (__gnu_cxx::_Lock_policy)2, false, false>::operator*() const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:1011:10
            ROCm#3 0xde888c2 in torch::jit::SourceRangeDeserializer::deserialize_source(c10::IValue const&) /pytorch/torch/csrc/jit/serialization/source_range_serialization.cpp:73:16
            ROCm#4 0xde8802b in torch::jit::SourceRangeDeserializer::deserialize(c10::IValue const&) /pytorch/torch/csrc/jit/serialization/source_range_serialization.cpp:51:37
            ROCm#5 0xde8e9c7 in torch::jit::ConcreteSourceRangeUnpickler::unpickle() /pytorch/torch/csrc/jit/serialization/source_range_serialization.cpp:224:39
            ROCm#6 0xde8fb19 in torch::jit::ConcreteSourceRangeUnpickler::findSourceRangeThatGenerated(torch::jit::SourceRange const&) /pytorch/torch/csrc/jit/serialization/source_range_serialization.cpp:231:3
            ROCm#7 0x10798e7 in torch::jit::Source::findSourceRangeThatGenerated(torch::jit::SourceRange const&) /pytorch/torch/csrc/jit/frontend/source_range.cpp:144:23
            ROCm#8 0x1079d9a in torch::jit::SourceRange::findSourceRangeThatGenerated() const /pytorch/torch/csrc/jit/frontend/source_range.h:384:26
            ROCm#9 0x1079acd in torch::jit::SourceRange::highlight(std::ostream&) const /pytorch/torch/csrc/jit/frontend/source_range.cpp:149:32
            ROCm#10 0x1026fe2 in torch::jit::Lexer::expected(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::Token const&) /pytorch/torch/csrc/jit/frontend/lexer.h:461:13
            ROCm#11 0x10417d9 in torch::jit::Lexer::expected(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/frontend/lexer.h:465:5
            ROCm#12 0x102e52c in torch::jit::Lexer::expect(int) /pytorch/torch/csrc/jit/frontend/lexer.h:471:7
            ROCm#13 0xcee774c in torch::jit::ParserImpl::parseIdent() /pytorch/torch/csrc/jit/frontend/parser.cpp:52:16
            ROCm#14 0xcef4ea8 in torch::jit::ParserImpl::parseBaseExp() /pytorch/torch/csrc/jit/frontend/parser.cpp:195:22
            ROCm#15 0xcef2c1b in torch::jit::ParserImpl::parseExp(int) /pytorch/torch/csrc/jit/frontend/parser.cpp:284:16
            ROCm#16 0xcefac6a in torch::jit::ParserImpl::parseExp() /pytorch/torch/csrc/jit/frontend/parser.cpp:262:12
            ROCm#17 0xcefac6a in torch::jit::ParserImpl::parseSubscriptExp() /pytorch/torch/csrc/jit/frontend/parser.cpp:403:15
            ROCm#18 0xceff39f in torch::jit::List<torch::jit::Expr> torch::jit::ParserImpl::parseList<torch::jit::Expr>(int, int, int, torch::jit::Expr (torch::jit::ParserImpl::*)())::'lambda'()::operator()() const /pytorch/torch/csrc/jit/frontend/parser.cpp:354:54
            ROCm#19 0xceff39f in torch::jit::Expr std::__invoke_impl<void, torch::jit::List<torch::jit::Expr> torch::jit::ParserImpl::parseList<torch::jit::Expr>(int, int, int, torch::jit::Expr (torch::jit::ParserImpl::*)())::'lambda'()&>(std::__invoke_other, torch::jit::List<torch::jit::Expr> torch::jit::ParserImpl::parseList<torch::jit::Expr>(int, int, int, torch::jit::Expr (torch::jit::ParserImpl::*)())::'lambda'()&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
            ROCm#20 0xceea935 in torch::jit::ParserImpl::parseSequence(int, int, int, std::function<void ()> const&) /pytorch/torch/csrc/jit/frontend/parser.cpp:339:7
            ROCm#21 0xceefd69 in torch::jit::List<torch::jit::Expr> torch::jit::ParserImpl::parseList<torch::jit::Expr>(int, int, int, torch::jit::Expr (torch::jit::ParserImpl::*)()) /pytorch/torch/csrc/jit/frontend/parser.cpp:353:5
            ROCm#22 0xcef895a in torch::jit::ParserImpl::parseSubscript(c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> > const&) /pytorch/torch/csrc/jit/frontend/parser.cpp:430:9
            ROCm#23 0xcef5e5c in torch::jit::ParserImpl::parseBaseExp() /pytorch/torch/csrc/jit/frontend/parser.cpp:206:18
            ROCm#24 0xcef2c1b in torch::jit::ParserImpl::parseExp(int) /pytorch/torch/csrc/jit/frontend/parser.cpp:284:16
            ROCm#25 0xceeeb9d in torch::jit::ParserImpl::parseExp() /pytorch/torch/csrc/jit/frontend/parser.cpp:262:12
            ROCm#26 0xceeeb9d in torch::jit::ParserImpl::parseExpOrExpTuple() /pytorch/torch/csrc/jit/frontend/parser.cpp:94:19
            ROCm#27 0xcee8a36 in torch::jit::ParserImpl::parseStmt(bool) /pytorch/torch/csrc/jit/frontend/parser.cpp:612:20
            ROCm#28 0xcee7e72 in torch::jit::ParserImpl::parseStatements(bool, bool) /pytorch/torch/csrc/jit/frontend/parser.cpp:697:23
            ROCm#29 0xcee56f5 in torch::jit::ParserImpl::parseClass() /pytorch/torch/csrc/jit/frontend/parser.cpp:747:9
            ROCm#30 0xcee544a in torch::jit::Parser::parseClass() /pytorch/torch/csrc/jit/frontend/parser.cpp:812:17
            ROCm#31 0xdddbea9 in torch::jit::SourceImporterImpl::parseSourceIfNeeded(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:182:42
            ROCm#32 0xdddadbc in torch::jit::SourceImporterImpl::findNamedType(c10::QualifiedName const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:135:3
            ROCm#33 0xdde1d88 in torch::jit::SourceImporterImpl::resolveType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::SourceRange const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:261:10
            ROCm#34 0xcf2ba5f in torch::jit::ScriptTypeParser::parseTypeFromExpr(torch::jit::Expr const&) const /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:238:24
            ROCm#35 0xcf2bec7 in torch::jit::ScriptTypeParser::parseType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:312:10
            ROCm#36 0xddf4284 in torch::jit::SourceImporter::loadType(c10::QualifiedName const&) const /pytorch/torch/csrc/jit/serialization/import_source.cpp:786:27
            ROCm#37 0xdd739f7 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0::operator()(c10::QualifiedName const&) const /pytorch/torch/csrc/jit/serialization/import.cpp:146:33
            ROCm#38 0xdd739f7 in c10::StrongTypePtr std::__invoke_impl<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>(std::__invoke_other, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
            ROCm#39 0xdd73880 in std::enable_if<is_invocable_r_v<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>, c10::StrongTypePtr>::type std::__invoke_r<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>(torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:113:9
            ROCm#40 0xdd736d6 in std::_Function_handler<c10::StrongTypePtr (c10::QualifiedName const&), torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0>::_M_invoke(std::_Any_data const&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:291:9
            ROCm#41 0xdd76349 in std::function<c10::StrongTypePtr (c10::QualifiedName const&)>::operator()(c10::QualifiedName const&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:622:14
            ROCm#42 0xdeb9f48 in torch::jit::Unpickler::readGlobal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/unpickler.cpp:835:9
            ROCm#43 0xdeb012d in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:511:7
            ROCm#44 0xdeae437 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:251:27
            ROCm#45 0xdeae0d2 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:204:3
            ROCm#46 0xddd6de3 in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) /pytorch/torch/csrc/jit/serialization/import_read.cpp:53:20
            ROCm#47 0xdd732dd in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/import.cpp:184:10
            ROCm#48 0xdd69885 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize(c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:287:19
            ROCm#49 0xdd6c855 in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:438:25
            ROCm#50 0xdd6c1c7 in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:421:10
            ROCm#51 0xdd6dce4 in torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:503:10
            ROCm#52 0xf2d3f75 in torch::serialize::InputArchive::load_from(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) /pytorch/torch/csrc/api/src/serialize/input-archive.cpp:97:13
            ROCm#53 0x60509c in void torch::load<at::Tensor, char*&>(at::Tensor&, char*&) /pytorch/torch/include/torch/csrc/api/include/torch/serialize.h:107:11
            ROCm#54 0x6036be in LLVMFuzzerTestOneInput /vision/encode_png.cc:38:5
            ROCm#55 0x66b041 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
            ROCm#56 0x6544cc in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
            ROCm#57 0x65a61b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
            ROCm#58 0x654222 in main /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
            ROCm#59 0x7f3d12cc7082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
            ROCm#60 0x542cdd in _start (/encode_png_fuzz+0x542cdd)

        0x60200055a630 is located 16 bytes to the right of 16-byte region [0x60200055a610,0x60200055a620)
        allocated by thread T0 here:
            #0 0x60057d in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
            ROCm#1 0xde9185d in std::_Vector_base<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::_M_allocate(unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:346:20
            ROCm#2 0xde9185d in void std::vector<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::_M_realloc_insert<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >(__gnu_cxx::__normal_iterator<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >*, std::vector<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >, std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:440:33
            ROCm#3 0xde916a1 in std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >& std::vector<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::emplace_back<std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >(std::shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:121:4
            ROCm#4 0xde8f445 in torch::jit::SourceRangeDeserializer::SourceRangeDeserializer(c10::IValue) /pytorch/torch/csrc/jit/serialization/source_range_serialization.h:42:19
            ROCm#5 0xde8e141 in torch::jit::ConcreteSourceRangeUnpickler::unpickle() /pytorch/torch/csrc/jit/serialization/source_range_serialization.cpp:215:28
            ROCm#6 0xde8fb19 in torch::jit::ConcreteSourceRangeUnpickler::findSourceRangeThatGenerated(torch::jit::SourceRange const&) /pytorch/torch/csrc/jit/serialization/source_range_serialization.cpp:231:3
            ROCm#7 0x10798e7 in torch::jit::Source::findSourceRangeThatGenerated(torch::jit::SourceRange const&) /pytorch/torch/csrc/jit/frontend/source_range.cpp:144:23
            ROCm#8 0x1079d9a in torch::jit::SourceRange::findSourceRangeThatGenerated() const /pytorch/torch/csrc/jit/frontend/source_range.h:384:26
            ROCm#9 0x1079acd in torch::jit::SourceRange::highlight(std::ostream&) const /pytorch/torch/csrc/jit/frontend/source_range.cpp:149:32
            ROCm#10 0x1026fe2 in torch::jit::Lexer::expected(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::Token const&) /pytorch/torch/csrc/jit/frontend/lexer.h:461:13
            ROCm#11 0x10417d9 in torch::jit::Lexer::expected(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/frontend/lexer.h:465:5
            ROCm#12 0xcee774c in torch::jit::ParserImpl::parseIdent() /pytorch/torch/csrc/jit/frontend/parser.cpp:52:16
            ROCm#13 0xcef4ea8 in torch::jit::ParserImpl::parseBaseExp() /pytorch/torch/csrc/jit/frontend/parser.cpp:195:22
            ROCm#14 0xcef2c1b in torch::jit::ParserImpl::parseExp(int) /pytorch/torch/csrc/jit/frontend/parser.cpp:284:16
            ROCm#15 0xcefac6a in torch::jit::ParserImpl::parseExp() /pytorch/torch/csrc/jit/frontend/parser.cpp:262:12
            ROCm#16 0xcefac6a in torch::jit::ParserImpl::parseSubscriptExp() /pytorch/torch/csrc/jit/frontend/parser.cpp:403:15
            ROCm#17 0xceff39f in torch::jit::List<torch::jit::Expr> torch::jit::ParserImpl::parseList<torch::jit::Expr>(int, int, int, torch::jit::Expr (torch::jit::ParserImpl::*)())::'lambda'()::operator()() const /pytorch/torch/csrc/jit/frontend/parser.cpp:354:54
            ROCm#18 0xceff39f in torch::jit::Expr std::__invoke_impl<void, torch::jit::List<torch::jit::Expr> torch::jit::ParserImpl::parseList<torch::jit::Expr>(int, int, int, torch::jit::Expr (torch::jit::ParserImpl::*)())::'lambda'()&>(std::__invoke_other, torch::jit::List<torch::jit::Expr> torch::jit::ParserImpl::parseList<torch::jit::Expr>(int, int, int, torch::jit::Expr (torch::jit::ParserImpl::*)())::'lambda'()&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
            ROCm#19 0xceea935 in torch::jit::ParserImpl::parseSequence(int, int, int, std::function<void ()> const&) /pytorch/torch/csrc/jit/frontend/parser.cpp:339:7
            ROCm#20 0xceefd69 in torch::jit::List<torch::jit::Expr> torch::jit::ParserImpl::parseList<torch::jit::Expr>(int, int, int, torch::jit::Expr (torch::jit::ParserImpl::*)()) /pytorch/torch/csrc/jit/frontend/parser.cpp:353:5
            ROCm#21 0xcef895a in torch::jit::ParserImpl::parseSubscript(c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> > const&) /pytorch/torch/csrc/jit/frontend/parser.cpp:430:9
            ROCm#22 0xcef5e5c in torch::jit::ParserImpl::parseBaseExp() /pytorch/torch/csrc/jit/frontend/parser.cpp:206:18
            ROCm#23 0xcef2c1b in torch::jit::ParserImpl::parseExp(int) /pytorch/torch/csrc/jit/frontend/parser.cpp:284:16
            ROCm#24 0xceeeb9d in torch::jit::ParserImpl::parseExp() /pytorch/torch/csrc/jit/frontend/parser.cpp:262:12
            ROCm#25 0xceeeb9d in torch::jit::ParserImpl::parseExpOrExpTuple() /pytorch/torch/csrc/jit/frontend/parser.cpp:94:19
            ROCm#26 0xcee8a36 in torch::jit::ParserImpl::parseStmt(bool) /pytorch/torch/csrc/jit/frontend/parser.cpp:612:20
            ROCm#27 0xcee7e72 in torch::jit::ParserImpl::parseStatements(bool, bool) /pytorch/torch/csrc/jit/frontend/parser.cpp:697:23
            ROCm#28 0xcee56f5 in torch::jit::ParserImpl::parseClass() /pytorch/torch/csrc/jit/frontend/parser.cpp:747:9
            ROCm#29 0xcee544a in torch::jit::Parser::parseClass() /pytorch/torch/csrc/jit/frontend/parser.cpp:812:17
            ROCm#30 0xdddbea9 in torch::jit::SourceImporterImpl::parseSourceIfNeeded(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:182:42
            ROCm#31 0xdddadbc in torch::jit::SourceImporterImpl::findNamedType(c10::QualifiedName const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:135:3
            ROCm#32 0xdde1d88 in torch::jit::SourceImporterImpl::resolveType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::SourceRange const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:261:10
            ROCm#33 0xcf2ba5f in torch::jit::ScriptTypeParser::parseTypeFromExpr(torch::jit::Expr const&) const /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:238:24

        SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:1325:16 in std::__shared_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, (__gnu_cxx::_Lock_policy)2>::get() const
        Shadow bytes around the buggy address:
          0x0c04800a3470: fa fa 00 00 fa fa 00 00 fa fa fd fa fa fa 00 00
          0x0c04800a3480: fa fa fd fa fa fa fd fd fa fa fd fd fa fa fd fa
          0x0c04800a3490: fa fa fd fd fa fa 00 00 fa fa 00 00 fa fa 00 00
          0x0c04800a34a0: fa fa fd fa fa fa fd fd fa fa fd fa fa fa 00 fa
          0x0c04800a34b0: fa fa fd fd fa fa fd fd fa fa fd fa fa fa fd fd
        =>0x0c04800a34c0: fa fa 00 00 fa fa[fa]fa fa fa fa fa fa fa fa fa
          0x0c04800a34d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c04800a34e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c04800a34f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c04800a3500: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
          0x0c04800a3510: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        Shadow byte legend (one shadow byte represents 8 application bytes):
          Addressable:           00
          Partially addressable: 01 02 03 04 05 06 07
          Heap left redzone:       fa
          Freed heap region:       fd
          Stack left redzone:      f1
          Stack mid redzone:       f2
          Stack right redzone:     f3
          Stack after return:      f5
          Stack use after scope:   f8
          Global redzone:          f9
          Global init order:       f6
          Poisoned by user:        f7
          Container overflow:      fc
          Array cookie:            ac
          Intra object redzone:    bb
          ASan internal:           fe
          Left alloca redzone:     ca
          Right alloca redzone:    cb
        ==13==ABORTING
Pull Request resolved: pytorch#103969
Approved by: https://github.com/davidberard98
alugorey pushed a commit to alugorey/pytorch that referenced this pull request Jun 29, 2023
Fixes ASAN stack-use-after-scope in MKLDNN.
The stack trace is:
```
2023-06-27T16:37:20.9099950Z ==1424==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f0c5dc20980 at pc 0x7f0c61286a73 bp 0x7ffef8e76990 sp 0x7ffef8e76118
2023-06-27T16:37:20.9100054Z READ of size 24 at 0x7f0c5dc20980 thread T0
2023-06-27T16:37:20.9100327Z     #0 0x7f0c61286a72 in memcmp (/usr/lib/llvm-7/lib/clang/7.0.1/lib/linux/libclang_rt.asan-x86_64.so+0x5da72)
2023-06-27T16:37:20.9100701Z     ROCm#1 0x7f0c2f395d0b in c10::ArrayRef<long>::equals(c10::ArrayRef<long>) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcb8bd0b)
2023-06-27T16:37:20.9101196Z     ROCm#2 0x7f0c314a1bb1 in at::native::mkldnn_matmul(at::Tensor const&, at::Tensor const&, at::Tensor const&, float, float) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xec97bb1)
2023-06-27T16:37:20.9101714Z     ROCm#3 0x7f0c301f49c5 in at::native::bmm_out_or_baddbmm_(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9ea9c5)
2023-06-27T16:37:20.9102153Z     ROCm#4 0x7f0c301f85ab in at::native::structured_bmm_out_cpu::impl(at::Tensor const&, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9ee5ab)
2023-06-27T16:37:20.9102601Z     ROCm#5 0x7f0c32cb3cb6 in at::(anonymous namespace)::wrapper_CPU_bmm(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x104a9cb6)
2023-06-27T16:37:20.9103662Z     ROCm#6 0x7f0c32ea1f43 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &(at::(anonymous namespace)::wrapper_CPU_bmm(at::Tensor const&, at::Tensor const&))>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x10697f43)
2023-06-27T16:37:20.9104330Z     ROCm#7 0x7f0c3187252a in at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xf06852a)
2023-06-27T16:37:20.9104756Z     ROCm#8 0x7f0c3257e097 in at::_ops::bmm::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xfd74097)
2023-06-27T16:37:20.9105237Z     ROCm#9 0x7f0c383c31c3 in torch::autograd::VariableType::(anonymous namespace)::bmm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x15bb91c3)
2023-06-27T16:37:20.9106496Z     ROCm#10 0x7f0c383c25b9 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&), &(torch::autograd::VariableType::(anonymous namespace)::bmm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&))>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x15bb85b9)
2023-06-27T16:37:20.9106874Z     ROCm#11 0x7f0c3257da60 in at::_ops::bmm::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xfd73a60)
2023-06-27T16:37:20.9107275Z     ROCm#12 0x7f0c301fc0e2 in at::native::_matmul_impl(at::Tensor&, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9f20e2)
2023-06-27T16:37:20.9107647Z     ROCm#13 0x7f0c301f9c21 in at::native::matmul(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9efc21)
2023-06-27T16:37:20.9108853Z     ROCm#14 0x7f0c33dca7e3 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__matmul(at::Tensor const&, at::Tensor const&))>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x115c07e3)
2023-06-27T16:37:20.9109255Z     ROCm#15 0x7f0c32958ef0 in at::_ops::matmul::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1014eef0)
2023-06-27T16:37:20.9110023Z     ROCm#16 0x7f0c2f596b62 in at::autocast::WrapFunction_<(at::autocast::CastPolicy)0, (c10::DeviceType)0, at::Tensor (at::Tensor const&, at::Tensor const&), &(at::_ops::matmul::call(at::Tensor const&, at::Tensor const&)), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcd8cb62)
2023-06-27T16:37:20.9110723Z     ROCm#17 0x7f0c2f348403 in c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<at::Tensor (*)(at::Tensor const&, at::Tensor const&), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >::operator()(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcb3e403)
2023-06-27T16:37:20.9111596Z     ROCm#18 0x7f0c2f348063 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<at::Tensor (*)(at::Tensor const&, at::Tensor const&), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcb3e063)
2023-06-27T16:37:20.9111976Z     ROCm#19 0x7f0c32958ef0 in at::_ops::matmul::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1014eef0)
2023-06-27T16:37:20.9112383Z     ROCm#20 0x7f0c5803dc3e in torch::autograd::THPVariable_matmul(_object*, _object*, _object*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_python.so+0x2b2cc3e)
2023-06-27T16:37:20.9112561Z warning: parsing line table prologue at 0x00000000 should have ended at 0x0000050b but it ended at 0x0000050a
2023-06-27T16:37:20.9112713Z     ROCm#21 0x5074a6 in cfunction_call (/opt/conda/envs/py_3.9/bin/python3.9+0x5074a6)
2023-06-27T16:37:20.9112857Z     ROCm#22 0x505997 in _PyObject_Call (/opt/conda/envs/py_3.9/bin/python3.9+0x505997)
2023-06-27T16:37:20.9113114Z     ROCm#23 0x505997 in PyObject_Call /croot/python-split_1684193875530/work/build-static/<invalid>:293:12
2023-06-27T16:37:20.9113258Z     ROCm#24 0x4ed302 in do_call_core (/opt/conda/envs/py_3.9/bin/python3.9+0x4ed302)
2023-06-27T16:37:20.9113633Z     ROCm#25 0x4ed302 in _PyEval_EvalFrameDefault /croot/python-split_1684193875530/work/build-static/<invalid>:3582:22
2023-06-27T16:37:20.9113780Z     ROCm#26 0x4e6729 in _PyEval_EvalFrame (/opt/conda/envs/py_3.9/bin/python3.9+0x4e6729)
2023-06-27T16:37:20.9114041Z     ROCm#27 0x4e6729 in _PyEval_EvalCode /croot/python-split_1684193875530/work/build-static/<invalid>:4329:14
2023-06-27T16:37:20.9114202Z     ROCm#28 0x4efd7d in _PyFunction_Vectorcall (/opt/conda/envs/py_3.9/bin/python3.9+0x4efd7d)
```

Pull Request resolved: pytorch#104331
Approved by: https://github.com/soulitzer
jataylo pushed a commit to jataylo/pytorch that referenced this pull request Jul 5, 2023
Hi! We've been fuzzing the torchvision project with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz).
We've found a SEGV at address 0x0 at `vector.h:163` in flatbuffers, a pytorch third-party project.

The error occurs because the `ivalues` field of the flatbuffer module can be null, so a corresponding null check must be inserted.
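
For illustration, a minimal sketch of the kind of guard that is needed (the helper name `checkedSize` and the `FbVector` template parameter are ours, not PyTorch's; the real check belongs in `FlatbufferLoader::parseModule()`, and we assume the usual flatbuffers behaviour that the accessor for a missing optional field returns `nullptr`):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hedged sketch, not the actual upstream patch: flatbuffers accessors
// return nullptr when an optional field is absent from the buffer, so
// every vector read out of the module table has to be null-checked
// before size()/operator[] is used on it.
template <typename FbVector>
uint32_t checkedSize(const FbVector* vec, const char* field_name) {
  if (vec == nullptr) {
    throw std::runtime_error(std::string("malformed module: missing field ") + field_name);
  }
  return vec->size();
}

// Usage inside the loader would look roughly like:
//   const auto* ivalues = module->ivalues();  // may be nullptr for a malformed file
//   const uint32_t n = checkedSize(ivalues, "ivalues");
```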

torchvision version: 9d0a93eee90bf7c401b74ebf9c8be80346254f15

pytorch version: 0f1621d

OS: Ubuntu 20.04

### How to reproduce

1. Build the docker image from [here](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/torchvision) and run the container:

        sudo docker build -t oss-sydr-fuzz-torchvision .
        sudo docker run --privileged --rm -v `pwd`:/fuzz -it oss-sydr-fuzz-torchvision /bin/bash

2. Run the target on this input:
[malformed-module.txt](https://github.com/pytorch/pytorch/files/11879653/malformed-module.txt)

        /encode_png_fuzz malformed-module.txt

3. You will see the following output:

        AddressSanitizer:DEADLYSIGNAL
        =================================================================
        ==1154==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x00000d17cc61 bp 0x7ffcbe8637f0 sp 0x7ffcbe863660 T0)
        ==1154==The signal is caused by a READ memory access.
        ==1154==Hint: address points to the zero page.
            #0 0xd17cc61 in flatbuffers::Vector<flatbuffers::Offset<torch::jit::mobile::serialization::IValue> >::size() const /pytorch/third_party/flatbuffers/include/flatbuffers/vector.h:163:48
            ROCm#1 0xd17cc61 in torch::jit::(anonymous namespace)::FlatbufferLoader::parseModule(torch::jit::mobile::serialization::Module*) /pytorch/torch/csrc/jit/mobile/flatbuffer_loader.cpp:293:32
            ROCm#2 0xd17dd23 in torch::jit::parse_and_initialize_mobile_module_for_jit(void*, unsigned long, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, std::vector<c10::IValue, std::allocator<c10::IValue> >&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*) /pytorch/torch/csrc/jit/mobile/flatbuffer_loader.cpp:809:29
            ROCm#3 0xdd661b4 in torch::jit::parse_and_initialize_jit_module(std::shared_ptr<char>, unsigned long, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, c10::optional<c10::Device>) /pytorch/torch/csrc/jit/serialization/import.cpp:345:28
            ROCm#4 0xdd6b24a in torch::jit::_load_jit_module_from_bytes(std::shared_ptr<char>, unsigned long, std::shared_ptr<torch::jit::CompilationUnit>, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:547:14
            ROCm#5 0xdd6c6df in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:443:10
            ROCm#6 0xdd6c1c7 in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:421:10
            ROCm#7 0xdd6dce4 in torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:503:10
            ROCm#8 0xf2d3f75 in torch::serialize::InputArchive::load_from(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) /pytorch/torch/csrc/api/src/serialize/input-archive.cpp:97:13
            ROCm#9 0x60509c in void torch::load<at::Tensor, char*&>(at::Tensor&, char*&) /pytorch/torch/include/torch/csrc/api/include/torch/serialize.h:107:11
            ROCm#10 0x6036be in LLVMFuzzerTestOneInput /vision/encode_png.cc:38:5
            ROCm#11 0x66b041 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
            ROCm#12 0x6544cc in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
            ROCm#13 0x65a61b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
            ROCm#14 0x654222 in main /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
            ROCm#15 0x7f0c87b9c082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
            ROCm#16 0x542cdd in _start (/encode_png_fuzz+0x542cdd)

        AddressSanitizer can not provide additional info.
        SUMMARY: AddressSanitizer: SEGV /pytorch/third_party/flatbuffers/include/flatbuffers/vector.h:163:48 in flatbuffers::Vector<flatbuffers::Offset<torch::jit::mobile::serialization::IValue> >::size() const
        ==1154==ABORTING

Pull Request resolved: pytorch#104243
Approved by: https://github.com/kit1980
jataylo pushed a commit to jataylo/pytorch that referenced this pull request Jul 21, 2023
Hi! We've been fuzzing the PyTorch project with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).
We've found a couple of heap-buffer-overflows in the `distributed/rpc` module.

PyTorch version: pytorch@0f1621d

OS: Ubuntu 20.04

### How to reproduce

1.  Build the docker image from this [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch) and run the container.
2.  Then run the `message_deserialize-afl++` fuzzing target on the provided crash inputs ([crash-056826339f6da8dbb97c944178e94494369a9e22.zip](https://github.com/pytorch/pytorch/files/12096151/crash-056826339f6da8dbb97c944178e94494369a9e22.zip), [crash-4f85db9f19fe152c0018f6675c3b4c122227058f.zip](https://github.com/pytorch/pytorch/files/12096160/crash-4f85db9f19fe152c0018f6675c3b4c122227058f.zip)):
```
unzip crash-4f85db9f19fe152c0018f6675c3b4c122227058f.zip
/message_deserialize-afl++ crash-4f85db9f19fe152c0018f6675c3b4c122227058f
```

### Heap buffer overflow in torch/csrc/jit/serialization/pickle.cpp:144

[crash-056826339f6da8dbb97c944178e94494369a9e22.zip](https://github.com/pytorch/pytorch/files/12096151/crash-056826339f6da8dbb97c944178e94494369a9e22.zip)

```asan
    "==7614==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60b001b58355 at pc 0x0000005d1147 bp 0x7fffffffa610 sp 0x7fffffff9de0",
    "READ of size 256 at 0x60b001b58355 thread T0",
    "    #0 0x5d1146 in __asan_memcpy /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22:3",
    "    ROCm#1 0xd1cd19f in torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&))::$_3::operator()(char*, unsigned long) const /pytorch/torch/csrc/jit/serialization/pickle.cpp:144:9",
    "    ROCm#2 0xd1cd19f in unsigned long std::__invoke_impl<unsigned long, torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&))::$_3&, char*, unsigned long>(std::__invoke_other, torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&))::$_3&, char*&&, unsigned long&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14",
    "    ROCm#3 0xd27aa48 in std::function<unsigned long (char*, unsigned long)>::operator()(char*, unsigned long) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:622:14",
    "    ROCm#4 0xd27a61c in torch::jit::Unpickler::readSlowWithBuffer(char*, unsigned long) /pytorch/torch/csrc/jit/serialization/unpickler.cpp:1047:23",
    "    ROCm#5 0xd2698b8 in unsigned char torch::jit::Unpickler::read<unsigned char>() /pytorch/torch/csrc/jit/serialization/unpickler.h:111:7",
    "    ROCm#6 0xd268816 in torch::jit::Unpickler::readOpCode() /pytorch/torch/csrc/jit/serialization/unpickler.h:130:38",
    "    ROCm#7 0xd268816 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:238:17",
    "    ROCm#8 0xd268522 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:204:3",
    "    ROCm#9 0xd1c8502 in torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch/torch/csrc/jit/serialization/pickle.cpp:126:20",
    "    ROCm#10 0xd1c8dbd in torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch/torch/csrc/jit/serialization/pickle.cpp:136:10",
    "    ROCm#11 0xe56b16d in torch::distributed::rpc::readWrappedPayload(std::vector<char, std::allocator<char> >&, torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/utils.cpp:515:18",
    "    ROCm#12 0xe3d8f29 in torch::distributed::autograd::RpcWithProfilingReq::fromMessage(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/autograd/rpc_messages/rpc_with_profiling_req.cpp:112:24",
    "    ROCm#13 0xe55f692 in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/utils.cpp:138:14",
    "    ROCm#14 0x6120a8 in LLVMFuzzerTestOneInput /message_deserialize.cc:192:27",
    "    ROCm#15 0x535de1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15",
    "    ROCm#16 0x51fcec in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6",
    "    ROCm#17 0x525a3b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9",
    "    ROCm#18 0x54eff2 in main /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10",
    "    ROCm#19 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)",
    "    ROCm#20 0x51a60d in _start (/message_deserialize_fuzz+0x51a60d)",
    "",
    "0x60b001b58355 is located 0 bytes to the right of 101-byte region [0x60b001b582f0,0x60b001b58355)",
    "allocated by thread T0 here:",
    "    #0 0x60c7bd in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3",
    "    ROCm#1 0x62c7fd in std::_Vector_base<char, std::allocator<char> >::_M_allocate(unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:346:20",
    "    ROCm#2 0x62c7fd in void std::vector<char, std::allocator<char> >::_M_range_initialize<unsigned char const*>(unsigned char const*, unsigned char const*, std::forward_iterator_tag) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:1582:14",
    "    ROCm#3 0x612913 in std::vector<char, std::allocator<char> >::vector<unsigned char const*, void>(unsigned char const*, unsigned char const*, std::allocator<char> const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:657:4",
    "    ROCm#4 0x611c4a in LLVMFuzzerTestOneInput /message_deserialize.cc:181:21",
    "    ROCm#5 0x535de1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15",
    "    ROCm#6 0x51fcec in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6",
    "    ROCm#7 0x525a3b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9",
    "    ROCm#8 0x54eff2 in main /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10",
    "    ROCm#9 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)",
    "",
    "SUMMARY: AddressSanitizer: heap-buffer-overflow /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22:3 in __asan_memcpy",
    "Shadow bytes around the buggy address:",
    "  0x0c1680363010: 00 00 00 fa fa fa fa fa fa fa fa fa 00 00 00 00",
    "  0x0c1680363020: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa",
    "  0x0c1680363030: fa fa 00 00 00 00 00 00 00 00 00 00 00 00 00 fa",
    "  0x0c1680363040: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00",
    "  0x0c1680363050: 00 00 00 00 00 fa fa fa fa fa fa fa fa fa 00 00",
    "=>0x0c1680363060: 00 00 00 00 00 00 00 00 00 00[05]fa fa fa fa fa",
    "  0x0c1680363070: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00",
    "  0x0c1680363080: 05 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c1680363090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c16803630a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c16803630b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "Shadow byte legend (one shadow byte represents 8 application bytes):",
    "  Addressable:           00",
    "  Partially addressable: 01 02 03 04 05 06 07",
    "  Heap left redzone:       fa",
    "  Freed heap region:       fd",
    "  Stack left redzone:      f1",
    "  Stack mid redzone:       f2",
    "  Stack right redzone:     f3",
    "  Stack after return:      f5",
    "  Stack use after scope:   f8",
    "  Global redzone:          f9",
    "  Global init order:       f6",
    "  Poisoned by user:        f7",
    "  Container overflow:      fc",
    "  Array cookie:            ac",
    "  Intra object redzone:    bb",
    "  ASan internal:           fe",
    "  Left alloca redzone:     ca",
    "  Right alloca redzone:    cb",
    "==7614==ABORTING"
```

### Heap-buffer-overflow in aten/src/ATen/core/ivalue.h:432

[crash-4f85db9f19fe152c0018f6675c3b4c122227058f.zip](https://github.com/pytorch/pytorch/files/11553011/crash-4f85db9f19fe152c0018f6675c3b4c122227058f.zip)

```asan
    "==60983==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6150001e4108 at pc 0x000000601877 bp 0x7fffffff9fd0 sp 0x7fffffff9fc8",
    "READ of size 4 at 0x6150001e4108 thread T0",
    "    #0 0x601876 in c10::IValue::isTensor() const /pytorch/aten/src/ATen/core/ivalue.h:432:27",
    "    ROCm#1 0x601876 in c10::IValue::destroy() /pytorch/aten/src/ATen/core/ivalue.h:1148:9",
    "    ROCm#2 0x699f72 in c10::IValue::~IValue() /pytorch/aten/src/ATen/core/ivalue.h:236:5",
    "    ROCm#3 0x699f72 in void std::_Destroy<c10::IValue>(c10::IValue*) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_construct.h:140:19",
    "    ROCm#4 0x699f72 in void std::_Destroy_aux<false>::__destroy<c10::IValue*>(c10::IValue*, c10::IValue*) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_construct.h:152:6",
    "    ROCm#5 0x699f72 in void std::_Destroy<c10::IValue*>(c10::IValue*, c10::IValue*) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_construct.h:184:7",
    "    ROCm#6 0x699f72 in void std::_Destroy<c10::IValue*, c10::IValue>(c10::IValue*, c10::IValue*, std::allocator<c10::IValue>&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/alloc_traits.h:738:7",
    "    ROCm#7 0x699f72 in std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_erase_at_end(c10::IValue*) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:1796:6",
    "    ROCm#8 0x699e4a in std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_erase(__gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue, std::allocator<c10::IValue> > >, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue, std::allocator<c10::IValue> > >) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/vector.tcc:191:4",
    "    ROCm#9 0xea5b11e in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:454:14",
    "    ROCm#10 0xea57d97 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:251:27",
    "    ROCm#11 0xea579f1 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:204:3",
    "    ROCm#12 0xe9a435e in torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch/torch/csrc/jit/serialization/pickle.cpp:126:20",
    "    ROCm#13 0xe9a471c in torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch/torch/csrc/jit/serialization/pickle.cpp:136:10",
    "    ROCm#14 0xfcd034b in torch::distributed::autograd::PropagateGradientsReq::fromMessage(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/autograd/rpc_messages/propagate_gradients_req.cpp:54:18",
    "    ROCm#15 0xfe720ff in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/utils.cpp:132:14",
    "    ROCm#16 0x5c5c93 in LLVMFuzzerTestOneInput /message_deserialize.cc:192:27",
    "    ROCm#17 0x5c2bfd in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7",
    "    ROCm#18 0x5c2a08 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c",
    "    ROCm#19 0x5c25c8 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10",
    "    ROCm#20 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)",
    "    ROCm#21 0x50237d in _start (/message_deserialize_afl+0x50237d)",
    "",
    "0x6150001e4108 is located 8 bytes to the right of 512-byte region [0x6150001e3f00,0x6150001e4100)",
    "allocated by thread T0 here:",
    "    #0 0x5bfbfa in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3",
    "",
    "SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch/aten/src/ATen/core/ivalue.h:432:27 in c10::IValue::isTensor() const",
    "Shadow bytes around the buggy address:",
    "  0x0c2a800347d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c2a800347e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
    "  0x0c2a800347f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
    "  0x0c2a80034800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
    "  0x0c2a80034810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
    "=>0x0c2a80034820: fa[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c2a80034830: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c2a80034840: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c2a80034850: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c2a80034860: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "  0x0c2a80034870: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa",
    "Shadow byte legend (one shadow byte represents 8 application bytes):",
    "  Addressable:           00",
    "  Partially addressable: 01 02 03 04 05 06 07",
    "  Heap left redzone:       fa",
    "  Freed heap region:       fd",
    "  Stack left redzone:      f1",
    "  Stack mid redzone:       f2",
    "  Stack right redzone:     f3",
    "  Stack after return:      f5",
    "  Stack use after scope:   f8",
    "  Global redzone:          f9",
    "  Global init order:       f6",
    "  Poisoned by user:        f7",
    "  Container overflow:      fc",
    "  Array cookie:            ac",
    "  Intra object redzone:    bb",
    "  ASan internal:           fe",
    "  Left alloca redzone:     ca",
    "  Right alloca redzone:    cb",
    "==60983==ABORTING"
```
Pull Request resolved: pytorch#105537
Approved by: https://github.com/albanD
jeffdaily pushed a commit that referenced this pull request Jul 31, 2023
### Description

Hi! We've been fuzzing `pytorch` with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz) and found an out-of-bounds access error in the `torch::jit` module.

pytorch version: 18bcf62

The error occurs in `import_source.cpp:560` when we get the type from `assign.rhs()`. Both `assign.rhs()` and `assign.type()` have `Maybe` type, so either of them may be absent. According to the [grammar](https://github.com/pytorch/pytorch/blob/22f93852a2664b3dc29544ac6a36f1ec52c6caa2/torch/csrc/jit/frontend/tree_views.h), we can have an `Assign` statement whose `lhs` is a `Subscript`, whose `rhs` is empty (a `Maybe` with no subtrees), and whose `type` is present. But in `import_source.cpp:560` we try to get the `rhs` expression from the assignment without checking whether it is present.

This is the testing input from the how-to-reproduce section below:
```
class Module(Module):
  __parameters__ = ["0", ]
  __buffers__ = []
  __annotations__ = []
  __annotations__["0"] : Tensor
```

When we parse the last statement of the class definition, we set the type of `lhs` to `Subscript`, because the lookahead is `[`:
https://github.com/pytorch/pytorch/blob/76fb72e24a5a4a47ad1f50c5c94d5c0b7e703531/torch/csrc/jit/frontend/parser.cpp#L205-L207

Then in `parseAssignment` we get `maybeOp` and `type` depending on the next symbol (if it is `:`, we get only the type)
https://github.com/pytorch/pytorch/blob/76fb72e24a5a4a47ad1f50c5c94d5c0b7e703531/torch/csrc/jit/frontend/parser.cpp#L437-L447

So after that, in `import_source.cpp:560`, while parsing the attributes (one of which is an assignment whose `lhs` is a `Subscript`), we try to get the type from the `rhs` expression, and the out-of-bounds access occurs.

To fix the error, we need to check whether `rhs` or `type` is present and get the type from the corresponding expression.
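
A minimal sketch of that check (a hedged illustration, not the exact upstream patch; it assumes the `Maybe<Expr>::present()`/`get()` accessors from `tree_views.h`, a `ScriptTypeParser` named `type_parser` in scope, and the `ErrorReport` pattern used elsewhere in the frontend):

```cpp
// Sketch of the guard around import_source.cpp:560 inside importClass():
// for an attribute declaration such as `__annotations__["0"] : Tensor`,
// rhs() is an empty Maybe and only type() carries the annotation, so pick
// whichever expression is actually present before asking the type parser.
c10::TypePtr attr_type;
if (assign.type().present()) {
  attr_type = type_parser.parseTypeFromExpr(assign.type().get());
} else if (assign.rhs().present()) {
  attr_type = type_parser.parseTypeFromExpr(assign.rhs().get());
} else {
  throw ErrorReport(assign.range())
      << "Expected either an assigned value or a type annotation";
}
```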

### How to reproduce

Build the docker container from [here](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch):
```bash
$ sudo docker build -t oss-sydr-fuzz-pytorch .
```

Run docker container:
```bash
$ sudo docker run --rm --privileged -v `pwd`:/fuzz -it oss-sydr-fuzz-pytorch /bin/bash
```

Run the `load_fuzz` target on [input.txt](https://github.com/pytorch/pytorch/files/12173962/input.txt):
```bash
/load_fuzz input.txt
```

You will see the following output:
```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==157==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x00000c163764 bp 0x7ffee71d0070 sp 0x7ffee71d0050 T0)
==157==The signal is caused by a READ memory access.
==157==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
    #0 0xc163764 in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::retain_() /pytorch/c10/util/intrusive_ptr.h:265:54
    #1 0xc1697fd in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::intrusive_ptr(c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> > const&) /pytorch/c10/util/intrusive_ptr.h:354:5
    #2 0xc1697fd in torch::jit::Expr::Expr(c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> > const&) /pytorch/torch/csrc/jit/frontend/tree_views.h:270:49
    #3 0xc1f02cb in torch::jit::Maybe<torch::jit::Expr>::get() const /pytorch/torch/csrc/jit/frontend/tree_views.h:212:12
    #4 0xd194369 in torch::jit::SourceImporterImpl::importClass(c10::QualifiedName const&, torch::jit::ClassDef const&, bool) /pytorch/torch/csrc/jit/serialization/import_source.cpp:560:70
    #5 0xd18c701 in torch::jit::SourceImporterImpl::importNamedType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::ClassDef const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:288:5
    #6 0xd18a84c in torch::jit::SourceImporterImpl::findNamedType(c10::QualifiedName const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:140:5
    #7 0xd1913a8 in torch::jit::SourceImporterImpl::resolveType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::SourceRange const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:261:10
    #8 0xc2e422f in torch::jit::ScriptTypeParser::parseTypeFromExpr(torch::jit::Expr const&) const /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:238:24
    #9 0xc2e4697 in torch::jit::ScriptTypeParser::parseType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:312:10
    #10 0xd1a37d4 in torch::jit::SourceImporter::loadType(c10::QualifiedName const&) const /pytorch/torch/csrc/jit/serialization/import_source.cpp:786:27
    #11 0xd121c47 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0::operator()(c10::QualifiedName const&) const /pytorch/torch/csrc/jit/serialization/import.cpp:146:33
    #12 0xd121c47 in c10::StrongTypePtr std::__invoke_impl<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>(std::__invoke_other, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    #13 0xd121ad0 in std::enable_if<is_invocable_r_v<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>, c10::StrongTypePtr>::type std::__invoke_r<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>(torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:113:9
    #14 0xd121926 in std::_Function_handler<c10::StrongTypePtr (c10::QualifiedName const&), torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0>::_M_invoke(std::_Any_data const&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:291:9
    #15 0xd17ec49 in std::function<c10::StrongTypePtr (c10::QualifiedName const&)>::operator()(c10::QualifiedName const&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:622:14
    #16 0xd26b802 in torch::jit::Unpickler::readGlobal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/unpickler.cpp:844:9
    #17 0xd2615fb in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:520:7
    #18 0xd25f917 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:253:27
    #19 0xd25f5b2 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:206:3
    #20 0xd186403 in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) /pytorch/torch/csrc/jit/serialization/import_read.cpp:53:20
    #21 0xd12152d in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/import.cpp:184:10
    #22 0xd117bae in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize(c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:287:19
    #23 0xd114074 in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:389:25
    #24 0xd113a27 in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:325:10
    #25 0xd11bb64 in torch::jit::load(std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:485:10
    #26 0x610c5c in LLVMFuzzerTestOneInput /load.cc:42:14
    #27 0x537701 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #28 0x52160c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #29 0x52735b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #30 0x550912 in main /llvm-project-llvmorg-14.0.6/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #31 0x7f06e8323082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #32 0x51bf2d in _start (/load_fuzz+0x51bf2d)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /pytorch/c10/util/intrusive_ptr.h:265:54 in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::retain_()
==157==ABORTING
```
Pull Request resolved: pytorch#106041
Approved by: https://github.com/davidberard98
jeffdaily pushed a commit that referenced this pull request Jul 31, 2023
…torch#105251)

Currently all information about the dependencies of ghstack PRs (e.g. pytorch#105010) is stripped away:
https://github.com/pytorch/pytorch/blob/c984885809194e0a807b3f5543450fae4dfa841a/.github/scripts/trymerge.py#L1077-L1078

This PR adds this information back in a more compact form. All dependencies (PR numbers) of each PR in ghstack are recorded.

The resulting commit message will look like this (the last line is new):

> Mock title (#123)
>
> Mock body text
> Pull Request resolved: pytorch#123
> Approved by: https://github.com/Approver1, https://github.com/Approver2
> ghstack dependencies: #1, #2

---

### Testing

Unit tests.

---

### Note Re: `# type: ignore[assignment]` in unit tests.

I did my due diligence to find alternatives. Unfortunately mypy [doesn't](python/mypy#6713) support this [way of patching methods](https://docs.python.org/3/library/unittest.mock-examples.html#mock-patching-methods), and the alternatives are either extremely verbose or don't work for this case. I decided it's not worth the effort (since the problem is limited only to the unit test).
Pull Request resolved: pytorch#105251
Approved by: https://github.com/huydhn
lcskrishna pushed a commit to lcskrishna/pytorch that referenced this pull request Aug 3, 2023
Compiler behavior when a non-zero offset is added to a null pointer is undefined, and relying on it is a bad habit.

- When `lapackEig` is called to estimate a workspace size, do not add the matrix size to the W pointer (see the sketch after this list).
- When `unpack_pivots_cpu_kernel` is called with zero `dim_size`, exit early.
- When `topk_impl_loop` is called with `k` equal to zero, exit right away, as the output tensors are empty anyway.
- Skip adding a non-zero storage offset in `TensorImpl::data_ptr_impl_impl`, which can be the case if the tensor is created as `torch.empty(3)[4:]`.
- In `s_addmm_out_sparse_dense_worker`, do not call `axpy` over an empty vector.
- In `_sparse_binary_op_intersection_kernel_impl`, skip computing `ptr_indices_dim` when `sparse_dim` is empty.
- Exit the `grid_sample` forward/backward kernels early if either `input` or `grid` is an empty tensor.
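
A minimal sketch of that guard pattern (illustrative only; the names `workspace_slot` and `lwork_buf` are made up and this is not the literal patch):

```cpp
#include <cstddef>

// Illustrative guard: when only a workspace-size query is being performed and
// the output buffer may be null, skip the pointer arithmetic entirely instead
// of computing `nullptr + offset`, which is undefined behavior even if the
// result is never dereferenced.
float* workspace_slot(float* lwork_buf, std::size_t n, bool size_query_only) {
  if (size_query_only || lwork_buf == nullptr || n == 0) {
    return lwork_buf;  // early exit: no offset applied to a possibly-null pointer
  }
  return lwork_buf + n;  // safe: the buffer is known to be non-null and non-empty
}
```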

Found by ASan/UBSan in a clang-12 build.

Before the change, the UBSan report looked as follows:
```
 ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-12/bin/llvm-symbolizer UBSAN_OPTIONS=print_stacktrace=1 LD_PRELOAD=/usr/lib/llvm-12/lib/clang/12.0.1/lib/linux/libclang_rt.asan-x86_64.so python test_fx_experimental.py -v -k test_normalize_operator_exhaustive_linalg_eig_cpu_float32
Test results will be stored in test-reports/python-unittest/test_fx_experimental

Running tests...
----------------------------------------------------------------------
  test_normalize_operator_exhaustive_linalg_eig_cpu_float32 (__main__.TestNormalizeOperatorsCPU) ... /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:111: UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
  torch.has_cuda,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:112: UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()'
  torch.has_cudnn,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:118: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
  torch.has_mps,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:119: UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()'
  torch.has_mkldnn,
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17: runtime error: applying non-zero offset 20 to null pointer
    #0 0x7f2025794888 in void at::native::lapackEig<float, float>(char, char, int, float*, int, float*, float*, int, float*, int, float*, int, float*, int*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9945888)
    ROCm#1 0x7f20257da256 in void at::native::(anonymous namespace)::apply_linalg_eig<float>(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998b256)
    ROCm#2 0x7f20257d902d in at::native::(anonymous namespace)::linalg_eig_kernel(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor const&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998a02d)
    ROCm#3 0x7f20257b5b3d in at::native::linalg_eig_out_info(at::Tensor const&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9966b3d)
    ROCm#4 0x7f20257b4770 in at::native::linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9965770)
    ROCm#5 0x7f20280710e6 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU_out_linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&))>, std::tuple<at::Tensor&, at::Tensor&>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor&, at::Tensor&> >, std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc2220e6)
    ROCm#6 0x7f202727a045 in at::_ops::linalg_eig_out::call(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42b045)
    ROCm#7 0x7f20257b7e29 in at::native::linalg_eig(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9968e29)
    ROCm#8 0x7f2028070bf0 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU__linalg_eig(at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc221bf0)
    ROCm#9 0x7f2026b1f787 in std::tuple<at::Tensor, at::Tensor> c10::Dispatcher::redispatch<std::tuple<at::Tensor, at::Tensor>, at::Tensor const&>(c10::TypedOperatorHandle<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xacd0787)
    ROCm#10 0x7f20273230a7 in at::_ops::linalg_eig::redispatch(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb4d40a7)
    ROCm#11 0x7f202c3cc32d in torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057d32d)
    ROCm#12 0x7f202c3cba96 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&), &(torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057ca96)
    ROCm#13 0x7f20272798e0 in at::_ops::linalg_eig::call(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42a8e0)
    ROCm#14 0x7f2043d97ae3 in torch::autograd::THPVariable_linalg_eig(_object*, _object*, _object*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_python.so+0x23feae3)
    ROCm#15 0x5072d6 in cfunction_call /usr/local/src/conda/python-3.9.17/Objects/methodobject.c:543:19
    ...

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17 in
```

Pull Request resolved: pytorch#106354
Approved by: https://github.com/huydhn, https://github.com/lezcano
jataylo pushed a commit to jataylo/pytorch that referenced this pull request Sep 5, 2023
…108414)

Hi!

I've been fuzzing different PyTorch modules with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch) and found a heap buffer overflow that occurs during the Python object deserialization routine. The vector of `IValue`s is verified to contain at least 3 elements, which are subsequently removed from the vector. The rest of the vector is passed further, where it is expected to contain at least one more element. The crash occurs on an empty vector.
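
A defensive version of that parsing step would verify the full element count up front before removing anything; a minimal sketch, assuming the payload arrives as a `std::vector<c10::IValue>` (the function name and error message are illustrative, this is not the actual fix):

```cpp
#include <ATen/core/ivalue.h>
#include <c10/util/Exception.h>
#include <vector>

// Illustrative check: 3 trailing metadata values plus at least 1 payload value
// are expected, so anything shorter than 4 should fail with a catchable error
// instead of reading past the end of the vector.
void check_serialized_py_obj(const std::vector<c10::IValue>& values) {
  TORCH_CHECK(
      values.size() >= 4,
      "expected at least 4 values in the serialized payload, got ",
      values.size());
}
```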

Docker to reproduce the error: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).

### PoC:
[crash-6d634f38a76bfeaa1fffc9472e8ea7b88ee8e776.txt](https://github.com/pytorch/pytorch/files/12499089/crash-6d634f38a76bfeaa1fffc9472e8ea7b88ee8e776.txt)

### ASAN report
```
==339647==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604000105388 at pc 0x000000c2b3bc bp 0x7fffffffb8d0 sp 0x7fffffffb8c8
READ of size 4 at 0x604000105388 thread T0
    #0 0xc2b3bb in c10::IValue::isString() const /pytorch/aten/src/ATen/core/ivalue.h:685:27
    ROCm#1 0xc2b3bb in c10::IValue::toStringRef[abi:cxx11]() const /pytorch/aten/src/ATen/core/ivalue_inl.h:2308:3
    ROCm#2 0x101ce65f in torch::distributed::rpc::SerializedPyObj::fromIValues(std::vector<c10::IValue, std::allocator<c10::IValue> >) /pytorch/torch/csrc/distributed/rpc/types.cpp:103:39
    ROCm#3 0x1006a7a0 in torch::distributed::rpc::PythonRemoteCall::fromMessage(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/python_remote_call.cpp:58:26
    ROCm#4 0x101d02e1 in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/utils.cpp:111:14
    ROCm#5 0x8db738 in LLVMFuzzerTestOneInput /message_deserialize.cc:192:27
    ROCm#6 0x8d84cd in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7
    ROCm#7 0x8d82d8 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c
    ROCm#8 0x8d7e98 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10
    ROCm#9 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    ROCm#10 0x817c4d in _start (/message_deserialize_afl+0x817c4d)

0x604000105388 is located 8 bytes to the left of 48-byte region [0x604000105390,0x6040001053c0)
allocated by thread T0 here:
    #0 0x8d54ca in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3

SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch/aten/src/ATen/core/ivalue.h:685:27 in c10::IValue::isString() const
```
Pull Request resolved: pytorch#108414
Approved by: https://github.com/ezyang
jataylo pushed a commit to jataylo/pytorch that referenced this pull request Sep 5, 2023
…h#108417)

Hi!

I've been fuzzing different PyTorch modules with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch) and found a SEGV that occurs during class deserialization in the JIT module.
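
For context, the fuzzing entry point in these reports is a small libFuzzer harness that wraps the raw input bytes in an in-memory stream and hands them to `torch::jit::load()`; roughly like the following reconstruction (based on the `load.cc` frames in the trace, not the exact harness source):

```cpp
#include <torch/script.h>  // torch::jit::load

#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Reconstructed sketch of the load.cc harness visible in the stack traces:
// feed fuzzer-provided bytes to the TorchScript loader and swallow the
// expected parse errors, so the sanitizers only flag real memory bugs.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::istringstream stream(
      std::string(reinterpret_cast<const char*>(data), size));
  try {
    torch::jit::load(stream);
  } catch (const c10::Error&) {
    // Malformed archives are expected and uninteresting.
  }
  return 0;
}
```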

Docker to reproduce the error: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).

### PoC:
[crash-bfbab61bf86755aa712bb978e26057ae76d75fe4.txt](https://github.com/pytorch/pytorch/files/12499228/crash-bfbab61bf86755aa712bb978e26057ae76d75fe4.txt)

### ASAN report
```
==1003115==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x00000db61680 bp 0x7fffffff5e30 sp 0x7fffffff5a60 T0)
==1003115==The signal is caused by a READ memory access.
==1003115==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
    #0 0xdb61680 in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::retain_() /pytorch/c10/util/intrusive_ptr.h:265:54
    ROCm#1 0xdb6721c in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::intrusive_ptr(c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> > const&) /pytorch/c10/util/intrusive_ptr.h:354:5
    ROCm#2 0xdb6721c in torch::jit::Expr::Expr(c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> > const&) /pytorch/torch/csrc/jit/frontend/tree_views.h:270:49
    ROCm#3 0xdbf73b9 in torch::jit::Maybe<torch::jit::Expr>::get() const /pytorch/torch/csrc/jit/frontend/tree_views.h:212:12
    ROCm#4 0xecac171 in torch::jit::SourceImporterImpl::importClass(c10::QualifiedName const&, torch::jit::ClassDef const&, bool) /pytorch/torch/csrc/jit/serialization/import_source.cpp:454:64
    ROCm#5 0xeca0ada in torch::jit::SourceImporterImpl::importNamedType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::ClassDef const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:288:5
    ROCm#6 0xeca7422 in torch::jit::SourceImporterImpl::findNamedType(c10::QualifiedName const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:140:5
    ROCm#7 0xeca295c in torch::jit::SourceImporterImpl::resolveType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::SourceRange const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:261:10
    ROCm#8 0xdd03bc8 in torch::jit::ScriptTypeParser::parseTypeFromExpr(torch::jit::Expr const&) const /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:238:24
    ROCm#9 0xdcfc9b6 in torch::jit::ScriptTypeParser::parseType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:312:10
    ROCm#10 0xecbac43 in torch::jit::SourceImporter::loadType(c10::QualifiedName const&) const /pytorch/torch/csrc/jit/serialization/import_source.cpp:786:27
    ROCm#11 0xec2b5d3 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0::operator()(c10::QualifiedName const&) const /pytorch/torch/csrc/jit/serialization/import.cpp:146:33
    ROCm#12 0xec2b5d3 in c10::StrongTypePtr std::__invoke_impl<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>(std::__invoke_other, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    ROCm#13 0xec2b4a0 in std::enable_if<is_invocable_r_v<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>, c10::StrongTypePtr>::type std::__invoke_r<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>(torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:113:9
    ROCm#14 0xec2b3a0 in std::_Function_handler<c10::StrongTypePtr (c10::QualifiedName const&), torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0>::_M_invoke(std::_Any_data const&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:291:9
    ROCm#15 0xec95f7c in std::function<c10::StrongTypePtr (c10::QualifiedName const&)>::operator()(c10::QualifiedName const&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:622:14
    ROCm#16 0xed78721 in torch::jit::Unpickler::readGlobal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/unpickler.cpp:844:9
    ROCm#17 0xed87821 in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:520:7
    ROCm#18 0xed85b27 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:253:27
    ROCm#19 0xed85781 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:206:3
    ROCm#20 0xec9c7be in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) /pytorch/torch/csrc/jit/serialization/import_read.cpp:53:20
    ROCm#21 0xec2b168 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/import.cpp:184:10
    ROCm#22 0xec27235 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize(c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:287:19
    ROCm#23 0xec25644 in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:389:25
    ROCm#24 0xec2dcbe in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:325:10
    ROCm#25 0xec30659 in torch::jit::load(std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:485:10
    ROCm#26 0x8d8636 in LLVMFuzzerTestOneInput /load.cc:42:14
    ROCm#27 0x8d835d in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7
    ROCm#28 0x8d8168 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c
    ROCm#29 0x8d7d28 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10
    ROCm#30 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    ROCm#31 0x817add in _start (/load_afl+0x817add)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /pytorch/c10/util/intrusive_ptr.h:265:54 in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::retain_()
==1003115==ABORTING

```

Pull Request resolved: pytorch#108417
Approved by: https://github.com/ezyang
jataylo pushed a commit to jataylo/pytorch that referenced this pull request Sep 5, 2023
…h#108413)

Hi!

I've been fuzzing different PyTorch modules with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch) and found a heap buffer overflow caused by an incorrect loop condition in torch::jit::unpickler.cpp. This bug can be triggered by the `torch::distributed::rpc::deserializeRequest()` method in the RPC module.

Docker to reproduce the error: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).

### PoC for deserializeRequest():
[crash-001e49dcd3a3c439e2b1273d580049309e052bdd.txt](https://github.com/pytorch/pytorch/files/12498999/crash-001e49dcd3a3c439e2b1273d580049309e052bdd.txt)

### ASAN report
```
==339982==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619000086a88 at pc 0x000000996fa4 bp 0x7fffffff9c50 sp 0x7fffffff9c48
READ of size 4 at 0x619000086a88 thread T0
    #0 0x996fa3 in c10::IValue::IValue(c10::IValue const&) /pytorch/aten/src/ATen/core/ivalue.h:226:33
    ROCm#1 0xdf99a38 in std::pair<c10::impl::DictIterator<c10::IValue, c10::IValue, ska_ordered::detailv3::sherwood_v3_table<std::pair<c10::IValue, c10::IValue>, c10::IValue, c10::detail::DictKeyHash, ska_ordered::detailv3::KeyOrValueHasher<c10::IValue, std::pair<c10::IValue, c10::IValue>, c10::detail::DictKeyHash>, c10::detail::DictKeyEqualTo, ska_ordered::detailv3::KeyOrValueEquality<c10::IValue, std::pair<c10::IValue, c10::IValue>, c10::detail::DictKeyEqualTo>, std::allocator<std::pair<c10::IValue, c10::IValue> >, std::allocator<ska_ordered::detailv3::sherwood_v3_entry<std::pair<c10::IValue, c10::IValue> > > >::templated_iterator<std::pair<c10::IValue, c10::IValue> > >, bool> c10::Dict<c10::IValue, c10::IValue>::insert_or_assign<c10::IValue&, c10::IValue&>(c10::IValue&, c10::IValue&) const /pytorch/aten/src/ATen/core/Dict_inl.h:136:5
    ROCm#2 0xed966c7 in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:490:14
    ROCm#3 0xed94377 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:253:27
    ROCm#4 0xed93fd1 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:206:3
    ROCm#5 0xece09ee in torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch/torch/csrc/jit/serialization/pickle.cpp:126:20
    ROCm#6 0xece0dac in torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch/torch/csrc/jit/serialization/pickle.cpp:136:10
    ROCm#7 0x1006a4e7 in torch::distributed::rpc::PythonRemoteCall::fromMessage(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/python_remote_call.cpp:40:16
    ROCm#8 0x101d02e1 in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/utils.cpp:111:14
    ROCm#9 0x8db738 in LLVMFuzzerTestOneInput /message_deserialize.cc:192:27
    ROCm#10 0x8d84cd in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7
    ROCm#11 0x8d82d8 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c
    ROCm#12 0x8d7e98 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10
    ROCm#13 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    ROCm#14 0x817c4d in _start (/message_deserialize_afl+0x817c4d)

0x619000086a88 is located 8 bytes to the right of 1024-byte region [0x619000086680,0x619000086a80)
allocated by thread T0 here:
    #0 0x8d54ca in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3

SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch/aten/src/ATen/core/ivalue.h:226:33 in c10::IValue::IValue(c10::IValue const&)
```

Pull Request resolved: pytorch#108413
Approved by: https://github.com/ezyang
pytorchmergebot pushed a commit that referenced this pull request Sep 7, 2023
…zation (pytorch#108418)

Hi!

I've been fuzzing different PyTorch modules with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch) and found a SEGV that occurs during data parsing for quantized conv deserialization. The crash occurs because of an empty `optional` vector.
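
The defensive counterpart here would be to validate the deserialized state before indexing into it, so a malformed archive produces a catchable `c10::Error` rather than a null dereference; a small, hypothetical sketch (the function name and messages are made up, this is not the actual patch):

```cpp
#include <ATen/ATen.h>
#include <c10/util/Exception.h>
#include <c10/util/Optional.h>
#include <vector>

// Hypothetical validation for parse_conv_serialized_state-style code: refuse
// an empty optional-tensor list instead of reading its first element.
void check_conv_state(const std::vector<c10::optional<at::Tensor>>& tensors) {
  TORCH_CHECK(!tensors.empty(),
              "serialized qconv state is missing its tensor list");
  TORCH_CHECK(tensors.front().has_value(),
              "serialized qconv state has an empty first tensor slot");
}
```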

Docker to reproduce the error: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).

### PoC:
[crash-aaa72b1c1431ac556118e34099ba163052dc0f96.txt](https://github.com/pytorch/pytorch/files/12499249/crash-aaa72b1c1431ac556118e34099ba163052dc0f96.txt)

### ASAN report
```
==1003193==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000cbd1b1 bp 0x7fffffff8490 sp 0x7fffffff7a30 T0)
==1003193==The signal is caused by a READ memory access.
==1003193==Hint: address points to the zero page.
    #0 0xcbd1b1 in c10::optional_base<at::Tensor>::optional_base(c10::optional_base<at::Tensor> const&) /pytorch/c10/util/Optional.h:222:17
    #1 0x2b32336 in c10::optional<at::Tensor>::optional(c10::optional<at::Tensor> const&) /pytorch/c10/util/Optional.h:631:3
    #2 0x2b32336 in std::tuple<long, std::vector<long, std::allocator<long> >, std::vector<c10::optional<at::Tensor>, std::allocator<c10::optional<at::Tensor> > > > parse_conv_serialized_state<2u>(c10::IValue) /pytorch/aten/src/ATen/native/quantized/cpu/conv_serialization.h:183:17
    #3 0x2b30276 in int register_conv_params<2>()::'lambda'(c10::IValue)::operator()(c10::IValue) const /pytorch/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp:410:49
    #4 0x2b30014 in std::enable_if<!(std::is_member_pointer<std::decay<int register_conv_params<2>()::'lambda'(c10::IValue) const&>::type>::value), std::invoke_result<int register_conv_params<2>()::'lambda'(c10::IValue) const&, c10::IValue>::type>::type c10::guts::invoke<int register_conv_params<2>()::'lambda'(c10::IValue) const&, c10::IValue>(int register_conv_params<2>()::'lambda'(c10::IValue) const&, c10::IValue&&) /pytorch/c10/util/C++17.h:203:10
    #5 0x2b2f7e7 in torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&)::operator()(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&) const /pytorch/torch/custom_class.h:328:11
    #6 0x2b2f570 in c10::guts::infer_function_traits<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)>::type::return_type torch::detail::call_torchbind_method_from_stack<torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&), false, 0ul, 1ul>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&, std::vector<c10::IValue, std::allocator<c10::IValue> >&, std::integer_sequence<unsigned long, 0ul, 1ul>) /pytorch/torch/custom_class_detail.h:139:10
    #7 0x2b2f408 in c10::guts::infer_function_traits<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)>::type::return_type torch::detail::call_torchbind_method_from_stack<torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&), false>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&, std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/custom_class_detail.h:153:10
    #8 0x2b2f408 in torch::detail::BoxedProxy<void, torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&)>::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >&, torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&)&) /pytorch/torch/custom_class_detail.h:174:5
    #9 0x2b2f38d in torch::jit::Function* torch::class_<ConvPackedParamsBase<2> >::defineMethod<torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&)>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::initializer_list<torch::arg>)::'lambda'(std::vector<c10::IValue, std::allocator<c10::IValue> >&)::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/custom_class.h:407:7
    #10 0x2b2f38d in int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&) std::__invoke_impl<void, torch::jit::Function* torch::class_<ConvPackedParamsBase<2> >::defineMethod<torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&)>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::initializer_list<torch::arg>)::'lambda'(std::vector<c10::IValue, std::allocator<c10::IValue> >&)&, std::vector<c10::IValue, std::allocator<c10::IValue> >&>(std::__invoke_other, int register_conv_params<2>()::'lambda'(c10::IValue)&&, std::vector<c10::IValue, std::allocator<c10::IValue> >&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    #11 0x125654e in torch::jit::Function::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) /pytorch/aten/src/ATen/core/function.h:62:5
    #12 0xec2c1c6 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1::operator()(c10::StrongTypePtr const&, c10::IValue) const /pytorch/torch/csrc/jit/serialization/import.cpp:172:7
    #13 0xec2c1c6 in c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > std::__invoke_impl<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> >, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1&, c10::StrongTypePtr, c10::IValue>(std::__invoke_other, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1&, c10::StrongTypePtr&&, c10::IValue&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    #14 0xec2b9a0 in std::enable_if<is_invocable_r_v<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> >, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1&, c10::StrongTypePtr, c10::IValue>, c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > >::type std::__invoke_r<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> >, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1&, c10::StrongTypePtr, c10::IValue>(torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1&, c10::StrongTypePtr&&, c10::IValue&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:113:9
    #15 0xec2b8ae in std::_Function_handler<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue), torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1>::_M_invoke(std::_Any_data const&, c10::StrongTypePtr&&, c10::IValue&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:291:9
    #16 0xeda0c63 in std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)>::operator()(c10::StrongTypePtr, c10::IValue) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:622:14
    #17 0xed8062d in torch::jit::Unpickler::readGlobal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_9::operator()() const /pytorch/torch/csrc/jit/serialization/unpickler.cpp:863:20
    #18 0xed8062d in void std::__invoke_impl<void, torch::jit::Unpickler::readGlobal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_9&>(std::__invoke_other, torch::jit::Unpickler::readGlobal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_9&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    #19 0xed877c6 in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:545:7
    #20 0xed85b27 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:253:27
    #21 0xed85781 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:206:3
    #22 0xec9c7be in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) /pytorch/torch/csrc/jit/serialization/import_read.cpp:53:20
    #23 0xec2b168 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/import.cpp:184:10
    #24 0xec27235 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize(c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:287:19
    #25 0xec25644 in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:389:25
    #26 0xec2dcbe in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:325:10
    #27 0xec30659 in torch::jit::load(std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:485:10
    #28 0x8d8636 in LLVMFuzzerTestOneInput /load.cc:42:14
    #29 0x8d835d in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7
    #30 0x8d8168 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c
    #31 0x8d7d28 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10
    #32 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #33 0x817add in _start (/load_afl+0x817add)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /pytorch/c10/util/Optional.h:222:17 in c10::optional_base<at::Tensor>::optional_base(c10::optional_base<at::Tensor> const&)
==1003193==ABORTING

```

Pull Request resolved: pytorch#108418
Approved by: https://github.com/Skylion007
BLOrange-AMD pushed a commit that referenced this pull request Oct 13, 2023
)" (pytorch#111072)

This reverts commit 314a502.

Changes since original PR:
Reland 1
 *  rename torch.distributed.hooks to torch.distributed._hooks

Reland 2
 * make _hooks importable even if !distributed.is_available()
 * handle an intermittent failure at CUDA driver exit caused by new CUDA API usage in the callback caller (see prev PR in stack)

(original PR pytorch#108815 desc copied below)

Expose a set of observability hooks into C10D so that our users can detect collective failures both faster and more easily.

The design is similar to NCCL desync debug in that it minimizes overhead by doing most of the work off the main thread.

This PR introduces a new module torch.distributed.hooks that exposes the following set of methods:

    register_collective_start_hook
    register_collective_end_hook
    register_process_group_hook

The process group hook exposes PG creation on the member ranks and is called inline from the PG creation code. This is fine since it happens during initialization and only a limited number of times.

The collective start/end hooks are fired from a single background thread. It reads events from a C++ queue and dispatches them.

Queue notification is oddly done using a pipe; this is needed so Python can abort the thread on shutdown and run it as a background thread, which is not possible with more reasonable choices like a condvar.
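
A stripped-down sketch of that wakeup mechanism (a generic illustration, not the C10D implementation): the producer pushes an event and writes one byte to the pipe; the consumer thread blocks on the read end, and shutdown is signaled by simply closing the write end, which is what makes aborting the thread straightforward compared to a condvar.

```cpp
#include <unistd.h>

#include <deque>
#include <mutex>

// Generic pipe-notified event queue, illustrating the design described above.
struct EventQueue {
  int fds[2];              // fds[0] = read end, fds[1] = write end
  std::mutex mu;
  std::deque<int> events;  // stand-in for collective start/end events

  EventQueue() { (void)pipe(fds); }

  void push(int ev) {
    { std::lock_guard<std::mutex> lock(mu); events.push_back(ev); }
    char b = 1;
    (void)write(fds[1], &b, 1);  // wake the consumer thread
  }

  void consume_loop() {
    char b;
    while (read(fds[0], &b, 1) == 1) {  // returns 0 once the write end closes
      std::lock_guard<std::mutex> lock(mu);
      while (!events.empty()) {
        // the registered start/end hooks would be dispatched here
        events.pop_front();
      }
    }
  }

  void shutdown() { close(fds[1]); }  // unblocks read() and ends the loop
};
```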
Pull Request resolved: pytorch#111072
Approved by: https://github.com/malfet
ghstack dependencies: pytorch#111061
pragupta pushed a commit to pragupta/pytorch that referenced this pull request Oct 17, 2023
pytorchmergebot pushed a commit that referenced this pull request Nov 10, 2023
…ry (pytorch#113207)

This is the cheap and cheerful implementation, which is only enabled on TORCH_SHOW_CPP_STACKTRACES, because it *eagerly* symbolizes immediately at exception throw time, even if the exception will end up getting caught. It would be better to do this lazily and only symbolize when we try to print the exception, but that requires a more involved refactor of c10::Error that I don't feel like doing.

Compare the output before:

```
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x95 (0x7fa21b99d975 in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #1: c10::TensorImpl::throw_cannot_call_with_symbolic(char const*) const + 0x8d (0x7fa21b951269 in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #2: c10::TensorImpl::sizes_custom() const + 0x9f (0x7fa21b9770df in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #3: at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) + 0x31e (0x7fa20a202a8e in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x29f34de (0x7fa20b5f34de in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x2a1fd8e (0x7fa20b61fd8e in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x6b907b (0x7fa2142b907b in /data/users/ezyang/c/pytorch/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x6b6175 (0x7fa2142b6175 in /data/users/ezyang/c/pytorch/torch/lib/libtorch_python.so)
```

and after:

```
#4 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
#5 c10::TensorImpl::throw_cannot_call_with_symbolic(char const*) const from ??:0
#6 c10::TensorImpl::sizes_custom() const [clone .localalias] from TensorImpl.cpp:0
#7 at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) from ??:0
#8 at::(anonymous namespace)::wrapper_Meta_mm_out_out(at::Tensor const&, at::Tensor const&, at::Tensor&) from RegisterMeta.cpp:0
#9 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (at::Tensor const&, at::Tensor const&, at::Tensor&), &at::(anonymous namespace)::wrapper_Meta_mm_out_out>, at::Tensor&, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor&> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterMeta.cpp:0
```
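
The lazy alternative mentioned at the top of this message would capture only the raw frame addresses at throw time (cheap) and defer symbol resolution until the error is actually printed; a rough sketch of that shape using glibc's `backtrace`/`backtrace_symbols`, not the eventual `c10::Error` refactor:

```cpp
#include <execinfo.h>

#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Capture raw return addresses now; resolve them to names only on demand.
class LazyBacktrace {
 public:
  LazyBacktrace() {
    void* buf[64];
    int n = backtrace(buf, 64);  // cheap: just walks the stack
    frames_.assign(buf, buf + n);
  }

  std::string symbolize() const {  // expensive: only runs when printed
    std::ostringstream out;
    char** names =
        backtrace_symbols(frames_.data(), static_cast<int>(frames_.size()));
    if (names == nullptr) {
      return out.str();
    }
    for (std::size_t i = 0; i < frames_.size(); ++i) {
      out << "#" << i << " " << names[i] << "\n";
    }
    std::free(names);
    return out.str();
  }

 private:
  std::vector<void*> frames_;
};
```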

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: pytorch#113207
Approved by: https://github.com/Skylion007
xinyazhang pushed a commit that referenced this pull request Dec 5, 2023
… to hang (pytorch#115124)

Let's see if it helps pytorch#114913

The issues on LLVM are at llvm/llvm-project#55530 and llvm/llvm-project#69369. In my CI test, I saw the following process hang:

```
/pytorch/pytorch/.lintbin/clang-tidy -p=/pytorch/pytorch/build --extra-arg -I/usr/lib/llvm-11/include/openmp --extra-arg -I/opt/conda/envs/py_3.9/include/python3.9 --extra-arg -I/pytorch/pytorch/third_party/pybind11/include --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward --extra-arg -I/usr/lib/llvm-14/lib/clang/14.0.0/include --extra-arg -I/usr/local/include --extra-arg -I/usr/include/x86_64-linux-gnu --extra-arg -I/usr/include /pytorch/pytorch/torch/csrc/autograd/python_nested_functions_manual.cpp
```

and the core dump matches the description found in llvm/llvm-project#69369, showing the process stuck in `clang::tidy::bugprone::UncheckedOptionalAccessCheck::check`:

```
#0  0x00000000030c7420 in clang::dataflow::WatchedLiteralsSolverImpl::updateWatchedLiterals() ()
#1  0x00000000030c6c2a in clang::dataflow::WatchedLiteralsSolverImpl::solve() && ()
#2  0x00000000030c6572 in clang::dataflow::WatchedLiteralsSolver::solve(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#3  0x00000000030b3bd3 in clang::dataflow::DataflowAnalysisContext::querySolver(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#4  0x00000000030b3ca5 in clang::dataflow::DataflowAnalysisContext::flowConditionImplies(clang::dataflow::AtomicBoolValue&, clang::dataflow::BoolValue&) ()
#5  0x00000000030b1213 in clang::dataflow::(anonymous namespace)::diagnoseUnwrapCall(clang::Expr const*, clang::Expr const*, clang::dataflow::Environment const&) ()
#6  0x00000000030b1357 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::(anonymous namespace)::buildDiagnoseMatchSwitch(clang::dataflow::UncheckedOptionalAccessModelOptions const&)::$_7>::_M_invoke(std::_Any_data const&, clang::CallExpr const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#7  0x00000000030b1292 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::CaseOf<clang::CallExpr>(clang::ast_matchers::internal::Matcher<clang::Stmt>, std::function<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)>) &&::{lambda(clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#8  0x00000000030b1995 in clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}::operator()(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) const ()
#9  0x00000000030b170c in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) ()
#10 0x00000000030a7c27 in clang::dataflow::UncheckedOptionalAccessDiagnoser::diagnose(clang::ASTContext&, clang::Stmt const*, clang::dataflow::Environment const&) ()
#11 0x0000000002931286 in std::_Function_handler<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&), clang::tidy::bugprone::analyzeFunction(clang::FunctionDecl const&, clang::ASTContext&)::$_0>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&) ()
#12 0x0000000002930b41 in clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>)::{lambda(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)#1}::operator()(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&) const ()
#13 0x00000000030c18cc in std::_Function_handler<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&), clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>)::$_1>::_M_invoke(std::_Any_data const&, clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&) ()
#14 0x00000000030bf069 in clang::dataflow::transferBlock(clang::dataflow::ControlFlowContext const&, std::vector<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState>, std::allocator<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState> > >&, clang::CFGBlock const&, clang::dataflow::Environment const&, clang::dataflow::TypeErasedDataflowAnalysis&, std::function<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#15 0x00000000030bfaa5 in clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#16 0x00000000029301b3 in llvm::Expected<std::vector<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> >, std::allocator<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> > > > > clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>) ()
#17 0x000000000292fbe8 in clang::tidy::bugprone::UncheckedOptionalAccessCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) ()
#18 0x00000000022e1572 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::MatchVisitor::visitMatch(clang::ast_matchers::BoundNodes const&) ()
#19 0x0000000002797a1c in clang::ast_matchers::internal::BoundNodesTreeBuilder::visitMatches(clang::ast_matchers::internal::BoundNodesTreeBuilder::Visitor*) ()
#20 0x00000000022e0dc6 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::matchWithFilter(clang::DynTypedNode const&) ()
#21 0x00000000022e3b57 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#22 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#23 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#24 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#25 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#26 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#27 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#28 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#29 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#30 0x00000000022e8791 in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#31 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#32 0x00000000022c017a in clang::ast_matchers::MatchFinder::matchAST(clang::ASTContext&) ()
#33 0x000000000370ad3c in clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) ()
#34 0x00000000038ed4bb in clang::ParseAST(clang::Sema&, bool, bool) ()
#35 0x000000000369eda7 in clang::FrontendAction::Execute() ()
#36 0x000000000360d3f6 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#37 0x00000000027c475c in clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#38 0x00000000022ad486 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef)::ActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#39 0x00000000027c44c6 in clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) ()
#40 0x00000000027c360b in clang::tooling::ToolInvocation::run() ()
#41 0x00000000027c5bb1 in clang::tooling::ClangTool::run(clang::tooling::ToolAction*) ()
#42 0x00000000022a90c7 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef) ()
#43 0x0000000001ebc7f2 in clang::tidy::clangTidyMain(int, char const**) ()
#44 0x0000000004c54ba0 in __libc_start_main ()
#45 0x0000000001eb76ae in _start ()
```

Another note: clang-tidy is CPU-bound, so we could consider running the lintrunner job on a 4xlarge runner if needed.
Pull Request resolved: pytorch#115124
Approved by: https://github.com/kit1980, https://github.com/Skylion007, https://github.com/malfet
lcskrishna pushed a commit to lcskrishna/pytorch that referenced this pull request Jan 8, 2024
…6938)

As [`newFunctionWithName:`](https://developer.apple.com/documentation/metal/mtllibrary/1515524-newfunctionwithname) does not accept an error argument, do not attempt to print the error: it is guaranteed to be `nil` at that point, which results in a classic null-pointer dereference when `TORCH_CHECK` attempts to construct a `std::string` from it. See the backtrace below for an example:
```
 thread #1, queue = 'metal gpu stream', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000018a316dc4 libsystem_platform.dylib`_platform_strlen + 4
    frame #1: 0x00000001471011bc libtorch_cpu.dylib`std::__1::__constexpr_strlen[abi:v160006](__str=0x0000000000000000) at cstring:114:10
    frame #2: 0x0000000147100c24 libtorch_cpu.dylib`std::__1::char_traits<char>::length(__s=0x0000000000000000) at char_traits.h:220:12
  * frame #3: 0x0000000147100bf0 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::operator<<[abi:v160006]<std::__1::char_traits<char>>(__os=0x000000016fdfb3a0, __str=0x0000000000000000) at ostream:901:57
    frame #4: 0x0000000147100bb4 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb5d0) at StringUtil.h:55:6
    frame #5: 0x00000001471007ac libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #6: 0x0000000147101444 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #7: 0x0000000147101404 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #8: 0x000000014710137c libtorch_cpu.dylib`c10::detail::_str_wrapper<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, char const*, char const* const&>::call(args=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:75:5
    frame #9: 0x0000000147101310 libtorch_cpu.dylib`decltype(auto) c10::str<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>(args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at StringUtil.h:111:10
    frame #10: 0x0000000147100210 libtorch_cpu.dylib`decltype(auto) c10::detail::torchCheckMsgImpl<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>((null)="Expected indexFunction to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)", args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at Exception.h:453:10
    frame #11: 0x00000001470fffe8 libtorch_cpu.dylib`at::mps::MPSDevice::metalIndexingPSO(this=0x0000600000381670, kernel="index_select_32bit_idx32") at MPSDevice.mm:62:3
```

This was introduced by pytorch#99855, which replaced `newFunctionWithName:constantValues:error:` with `newFunctionWithName:`.
Pull Request resolved: pytorch#116938
Approved by: https://github.com/Skylion007
pytorchmergebot pushed a commit that referenced this pull request Feb 5, 2024
A user may not know which line of code called a collective in a big code base. When debugging, we can print a combined Python/C++ stack trace, for example in case the user calls ``ProcessGroup.reduce`` directly instead of ``torch.distributed.reduce``:

```
LOG(INFO) << "ProcessGroupNCCL::_allgather_base stacktrace: "
                       << get_python_cpp_trace();
```
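
For context, here is a minimal sketch of the two call styles this logging helps disambiguate (illustrative only, not code from this PR; `_get_default_group` is just one way to obtain a `ProcessGroup` handle):

```python
# Illustrative sketch: the functional wrapper goes through distributed_c10d.py /
# c10d_logger.py, while a direct ProcessGroup call bypasses those wrappers,
# which is exactly when the python-cpp trace printed above helps find the caller.
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())
t = torch.ones(4, device="cuda")

# Functional API: Python-side frames (distributed_c10d.py, c10d_logger.py)
# already identify the call site.
dist.all_reduce(t)

# Direct ProcessGroup call: no torch.distributed wrapper frames, so the trace
# recorded on the C++ side is what points back to this line.
pg = dist.distributed_c10d._get_default_group()
pg.allreduce(t).wait()
```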

Output (using `_allgather_base` as an example): one example Python-side frame is ``all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838``
```
ProcessGroupNCCL::_allgather_base stacktrace: #0 torch::unwind::unwind() from ??:0
#1 torch::CapturedTraceback::gather(bool, bool, bool) from ??:0
#2 c10d::get_python_cpp_trace[abi:cxx11]() from :0
#3 c10d::ProcessGroupNCCL::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from ??:0
#4 c10d::ops::(anonymous namespace)::_allgather_base_CUDA(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long) from Ops.cpp:0
#5 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > > (*)(at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long), std::tuple<at::Tensor, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > >, c10::guts::typelist::typelist<at::Tensor&, at::Tensor&, c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&, bool, long> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from :0
#6 torch::autograd::basicAutogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from autograd_not_implemented_fallback.cpp:0
#7 c10d::ProcessGroup::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) from :0
#8 pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> >, c10d::ProcessGroup, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (c10d::ProcessGroup::*)(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&)#1}&&, c10::intrusive_ptr<c10d::Work, c10::detail::intrusive_target_default_null_type<c10d::Work> > (*)(c10d::ProcessGroup*, at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from :0
#9 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
#10 cfunction_call from /usr/local/src/conda/python-3.10.12/Objects/methodobject.c:543
#11 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#13 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#14 all_gather_into_tensor from /data/users/weif/pytorch/torch/distributed/distributed_c10d.py:2838
#15 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#16 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#17 wrapper from /data/users/weif/pytorch/torch/distributed/c10d_logger.py:75
#18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#20 _all_gather_flat_param from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1399
#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#23 unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_flat_param.py:1308
#24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#26 _unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:332
#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#29 _pre_forward_unshard from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:448
#30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#32 _pre_forward from /data/users/weif/pytorch/torch/distributed/fsdp/_runtime_utils.py:413
#33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#35 forward from /data/users/weif/pytorch/torch/distributed/fsdp/fully_sharded_data_parallel.py:839
#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#37 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#38 _call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1520
#39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#40 do_call_core from /usr/local/src/conda/python-3.10.12/Python/ceval.c:5945
#41 _wrapped_call_impl from /data/users/weif/pytorch/torch/nn/modules/module.py:1511
#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#43 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.12/Objects/call.c:431
#44 slot_tp_call from /usr/local/src/conda/python-3.10.12/Objects/typeobject.c:7494
#45 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.12/Objects/call.c:215
#46 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:112
#47 inner from /data/users/weif/pytorch/run_fsdp.py:72
#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#50 run from /data/users/weif/pytorch/run_fsdp.py:76
#51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#53 main from /data/users/weif/pytorch/run_fsdp.py:133
#54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.12/Include/cpython/abstract.h:114
#56 <module> from /data/users/weif/pytorch/run_fsdp.py:137
#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.12/Include/internal/pycore_ceval.h:46
#58 PyEval_EvalCode from /usr/local/src/conda/python-3.10.12/Python/ceval.c:1134
#59 run_eval_code_obj from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1291
#60 run_mod from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1312
#61 pyrun_file from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:1208
#62 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:456
#63 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.12/Python/pythonrun.c:90
#64 pymain_run_file_obj from /usr/local/src/conda/python-3.10.12/Modules/main.c:357
#65 Py_BytesMain from /usr/local/src/conda/python-3.10.12/Modules/main.c:1090
#66 __libc_start_call_main from ??:0
#67 <unwind unsupported> from ??:0
```

Pull Request resolved: pytorch#118924
Approved by: https://github.com/kwen2501
jataylo pushed a commit that referenced this pull request Feb 7, 2024
… [attempt 2] (pytorch#119107)

Attempt #2 for pytorch#117875 to fix pytorch#112090.

Summary of changes:
- ~Changed CacheEntry linked list into a doubly-linked list structure to support deletion.~ (done by C++ refactor)
- Added CacheEntry and ExtraState borrowed references to GuardFn so that GuardFn can tell ExtraState to delete CacheEntry when the GuardFn is invalidated.
- ~Added ExtraState raw reference to CacheEntry so that we can get ExtraState to correctly point to the first CacheEntry if it gets deleted.~ (done by C++ refactor)
- CacheEntry destructor needs to reset GuardFn refs to ExtraState/CacheEntry in order to prevent use-after-free.
- code_context values that are nn.GraphModules need to be weakrefs in order to prevent circular references (see the weakref sketch after this list).
- Added tests that check for memory leaks and cache deletion operations.
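
A minimal sketch of the weakref pattern from the list above (illustrative only; the real Dynamo `code_context` bookkeeping differs):

```python
# Sketch: storing a weakref.ref to the GraphModule instead of the module itself
# breaks the cycle code object -> context dict -> GraphModule -> code object.
import weakref

import torch

gm = torch.fx.symbolic_trace(torch.nn.Linear(2, 2))
code_context = {}
code_context["orig_graphmodule"] = weakref.ref(gm)  # weak, not a strong ref

module = code_context["orig_graphmodule"]()  # dereference; None once gm is collected
assert module is gm
```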

Pull Request resolved: pytorch#119107
Approved by: https://github.com/jansel
jeffdaily pushed a commit that referenced this pull request Feb 8, 2024
jeffdaily pushed a commit that referenced this pull request Feb 8, 2024
jataylo added a commit that referenced this pull request Feb 9, 2024
xinyazhang pushed a commit that referenced this pull request Mar 29, 2024
…h#121813)

* [Release only changes] Release only changes #2

* common+lint
pytorchmergebot pushed a commit to pragupta/pytorch that referenced this pull request May 28, 2024
pytorch#126677)

…destruction of tensors cached by autocast

## Root Cause
An out-of-tree device extension is loaded after torch (it lives in a different .so), so the global variable `cached_casts` may be constructed before the caching allocator and then destructed in reverse order at exit.
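
For background, `cached_casts` backs the autocast weight cache that Python exposes via `torch.clear_autocast_cache()`; a rough illustration of what it caches (the destruction-order failure itself is C++-internal and not reproducible from pure Python):

```python
# Rough illustration of the autocast cast cache: within an autocast region the
# fp32 weight is cast to fp16 once and the casted copy is cached for reuse.
import torch

linear = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")

with torch.autocast("cuda", dtype=torch.float16):
    linear(x)  # weight is cast to fp16 and the copy is cached in cached_casts
    linear(x)  # reuses the cached fp16 copy instead of re-casting

torch.clear_autocast_cache()  # empties the same cache the commit message refers to
```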

## Fix
Lazily initialize `cached_casts` to correct the order.

## How to Reproduce && Test
Modify the test case `TestAutocastGPU.test_cast_cache_is_global` in test/test_autocast.py to run on your out-of-tree device. You will see the following failure at the end of the test.
```bash
----------------------------------------------------------------------
Ran 1 test in 4.812s

OK
free: 0x30080ff44000400
terminate called after throwing an instance of 'c10::Error'
  what():  invalid device pointer: 0x30080ff44000400
Exception raised from free at /projs/framework/betterman/code/pytorch_new/catch/torch_mlu/csrc/framework/core/caching_allocator.cpp:1609 (most recent call first):
frame #0: <unknown function> + 0x118fe1 (0x7ffaef4d3fe1 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #1: <unknown function> + 0x11b1c4 (0x7ffaef4d61c4 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #2: <unknown function> + 0x117677 (0x7ffaef4d2677 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #3: <unknown function> + 0x11a2bf (0x7ffaef4d52bf in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #4: <unknown function> + 0x11a186 (0x7ffaef4d5186 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #5: <unknown function> + 0x119fde (0x7ffaef4d4fde in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #6: <unknown function> + 0x119d2e (0x7ffaef4d4d2e in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #7: <unknown function> + 0x119be0 (0x7ffaef4d4be0 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #8: <unknown function> + 0x119977 (0x7ffaef4d4977 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #9: <unknown function> + 0x119313 (0x7ffaef4d4313 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #10: <unknown function> + 0x118b4c (0x7ffaef4d3b4c in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #11: c10::Error::Error(c10::SourceLocation, std::string) + 0x34 (0x7ffaef4d27c4 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #12: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x7f (0x7ffaef4d04ed in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #13: torch_mlu::MLUCachingAllocator::Native::NativeCachingAllocator::free(void*) + 0xe6 (0x7ff9a8eeb112 in /projs/framework/betterman/code/pytorch_new/catch/torch_mlu/csrc/lib/libtorch_mlu.so)
frame #14: torch_mlu::MLUCachingAllocator::Native::local_raw_delete(void*) + 0x3b (0x7ff9a8ed9480 in /projs/framework/betterman/code/pytorch_new/catch/torch_mlu/csrc/lib/libtorch_mlu.so)
frame #15: std::unique_ptr<void, void (*)(void*)>::~unique_ptr() + 0x50 (0x7ffb0a5ea322 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_python.so)
frame #16: <unknown function> + 0x1269890 (0x7ffb0a5e4890 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_python.so)
frame #17: <unknown function> + 0x1269928 (0x7ffb0a5e4928 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_python.so)
frame #18: <unknown function> + 0x127572c (0x7ffb0a5f072c in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_python.so)
frame #19: <unknown function> + 0x1275758 (0x7ffb0a5f0758 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_python.so)
frame #20: <unknown function> + 0xb9bc7 (0x7ffaef474bc7 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #21: <unknown function> + 0xb97bc (0x7ffaef4747bc in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #22: <unknown function> + 0xdbc50 (0x7ffaef496c50 in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #23: c10::TensorImpl::~TensorImpl() + 0x82 (0x7ffaef49157e in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #24: c10::TensorImpl::~TensorImpl() + 0x1c (0x7ffaef4915aa in /projs/framework/betterman/code/pytorch_new/torch/lib/libc10.so)
frame #25: <unknown function> + 0x2f596d9 (0x7ffaf24fc6d9 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #26: <unknown function> + 0x2f589c2 (0x7ffaf24fb9c2 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #27: <unknown function> + 0x2f57b92 (0x7ffaf24fab92 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #28: <unknown function> + 0x2f5c228 (0x7ffaf24ff228 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #29: <unknown function> + 0x30f3f70 (0x7ffaf2696f70 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #30: <unknown function> + 0x30f3f90 (0x7ffaf2696f90 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #31: <unknown function> + 0x30f5004 (0x7ffaf2698004 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #32: <unknown function> + 0x30f5024 (0x7ffaf2698024 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #33: <unknown function> + 0x31207f0 (0x7ffaf26c37f0 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #34: <unknown function> + 0x3120814 (0x7ffaf26c3814 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #35: <unknown function> + 0x30f51e8 (0x7ffaf26981e8 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #36: <unknown function> + 0x30f5148 (0x7ffaf2698148 in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #37: <unknown function> + 0x316ecea (0x7ffaf2711cea in /projs/framework/betterman/code/pytorch_new/torch/lib/libtorch_cpu.so)
frame #38: <unknown function> + 0x468a7 (0x7ffb0c9ed8a7 in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: on_exit + 0 (0x7ffb0c9eda60 in /lib/x86_64-linux-gnu/libc.so.6)
<omitting python frames>
frame #47: __libc_start_main + 0xf3 (0x7ffb0c9cb083 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)

```

Pull Request resolved: pytorch#126677
Approved by: https://github.com/ezyang
jeffdaily pushed a commit that referenced this pull request Aug 1, 2024
Summary:
There are two kinds of exceptions:
Case #1:
```
static input data pointer changed.
input name: primals_2. data pointer changed from 140315748992000 to 140315748993536. input stack trace:   File "/dev/shm/uid-30083/c0899c70-seed-nspid4026535598_cgpid16622182-ns-4026535192/caffe2/test/inductor/test_cudagraph_trees.py", line 1826, in forward
    return self.static_tensor + x + self.goo(x)
  File "/dev/shm/uid-30083/c0899c70-seed-nspid4026535598_cgpid16622182-ns-4026535192/caffe2/test/inductor/test_cudagraph_trees.py", line 1816, in forward
    return self.linear(x)

input name: primals_3. data pointer changed from 140315748990976 to 140315748993024. input stack trace:   File "/dev/shm/uid-30083/c0899c70-seed-nspid4026535598_cgpid16622182-ns-4026535192/caffe2/test/inductor/test_cudagraph_trees.py", line 1825, in forward
    self.static_tensor.add_(torch.ones((2, 2), device="cuda"))

```
Case #2:
```
static input data pointer changed.
input name: primals_2. data pointer changed from 139852509086720 to 139852509088256. input stack trace: None
input name: primals_3. data pointer changed from 139852509085696 to 139852509087744. input stack trace:   File "/dev/shm/uid-30083/f61ee184-seed-nspid4026560782_cgpid769179-ns-4026560865/caffe2/test/inductor/test_cudagraph_trees.py", line 1825, in forward
    self.static_tensor.add_(torch.ones((2, 2), device="cuda"))

```
The current implementation only covered case #2.
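
For reference, a rough reconstruction of the module shape implied by the quoted stack frames (derived from the frames above; the actual test code may differ):

```python
# Rough reconstruction from the quoted frames; requires a CUDA device.
import torch

class Goo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 2, device="cuda")

    def forward(self, x):
        return self.linear(x)

class Mod(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.static_tensor = torch.zeros((2, 2), device="cuda")
        self.goo = Goo()

    def forward(self, x):
        self.static_tensor.add_(torch.ones((2, 2), device="cuda"))
        return self.static_tensor + x + self.goo(x)
```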

Test Plan: https://www.internalfb.com/intern/testinfra/testrun/15481123762274476

Differential Revision: D60340212

Pull Request resolved: pytorch#132043
Approved by: https://github.com/BoyuanFeng
iupaikov-amd pushed a commit that referenced this pull request Oct 11, 2024
…Try #2 (pytorch#137377)

ExecuTorch's fork of BlasKernel.cpp grew bfdot support, complete with a demonstration that it helps. Port it back to PyTorch. The first attempt was pytorch#136331.

Differential Revision: [D63923166](https://our.internmc.facebook.com/intern/diff/D63923166/)
Pull Request resolved: pytorch#137377
Approved by: https://github.com/malfet
dnikolaev-amd pushed a commit that referenced this pull request Nov 5, 2024
…ytorch#139659)

### Motivation
Today, watchdog only reports that it found a collective timeout:
```
[rank1]:[E1104 14:02:18.767594328 ProcessGroupNCCL.cpp:688] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=200, NumelOut=200, Timeout(ms)=5000) ran for 5096 milliseconds before timing out.
```
While this is nice, it is hard to associate the error with the user's program or library stack.

### This PR
This PR gives the watchdog the ability to report the call-time stack of the collective, so that it is easier to track the error back to the program's behavior.

The call-time stack is recorded by Flight Recorder with minimal overhead (for details, please read this [doc](https://dev-discuss.pytorch.org/t/fast-combined-c-python-torchscript-inductor-tracebacks/1158) written by @zdevito). In `ProcessGroupNCCL`, we only track / report the Python part, which fits most PyTorch users.

### Demo
[stack_demo.py](https://gist.github.com/kwen2501/6758e18d305d67fc6f3f926217825c09).

```
TORCH_NCCL_TRACE_BUFFER_SIZE=100 torchrun --nproc-per-node 2 stack_demo.py
```
`TORCH_NCCL_TRACE_BUFFER_SIZE` is for turning on the Flight Recorder.
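
The linked gist is not reproduced here, but a script of roughly this shape (a hedged sketch, not the actual gist contents) produces the divergence shown in the output below:

```python
# stack_demo.py-style sketch: the two ranks intentionally call mismatched
# collectives, so one of them times out and the watchdog prints its call stack.
import datetime

import torch
import torch.distributed as dist

def bar(t):
    dist.all_reduce(t)                              # rank 0 blocks here

def baz(t):
    out = torch.empty(2 * t.numel(), device=t.device)
    dist.all_gather_into_tensor(out, t)             # rank 1 blocks here

def foo(rank, t):
    if rank == 0:
        bar(t)
    else:
        baz(t)

def main():
    dist.init_process_group("nccl", timeout=datetime.timedelta(seconds=5))
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    foo(rank, torch.ones(100, device="cuda"))

if __name__ == "__main__":
    main()
```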

Output:
```
[rank0]:[E1104 14:19:27.591610653 ProcessGroupNCCL.cpp:695] Stack trace of the timedout collective operation:
#0 all_reduce from /data/users/kw2501/pytorch/torch/distributed/distributed_c10d.py:2696
#1 wrapper from /data/users/kw2501/pytorch/torch/distributed/c10d_logger.py:83
#2 bar from /data/users/kw2501/sync_async/repro.py:15
#3 foo from /data/users/kw2501/sync_async/repro.py:24
#4 main from /data/users/kw2501/sync_async/repro.py:34
#5 <module> from /data/users/kw2501/sync_async/repro.py:40

[rank1]:[E1104 14:19:27.771430164 ProcessGroupNCCL.cpp:695] Stack trace of the timedout collective operation:
#0 all_gather_into_tensor from /data/users/kw2501/pytorch/torch/distributed/distributed_c10d.py:3630
#1 wrapper from /data/users/kw2501/pytorch/torch/distributed/c10d_logger.py:83
#2 baz from /data/users/kw2501/sync_async/repro.py:20
#3 foo from /data/users/kw2501/sync_async/repro.py:26
#4 main from /data/users/kw2501/sync_async/repro.py:34
#5 <module> from /data/users/kw2501/sync_async/repro.py:40
```

From the log above, we can tell that `bar()` and `baz()` are the places where the two ranks diverge.

Pull Request resolved: pytorch#139659
Approved by: https://github.com/wconstab, https://github.com/fduwjj