
torch.argsort crashes when tensor is on Neuron device #868

Open
evellasques opened this issue Apr 11, 2024 · 2 comments

Comments

@evellasques

For some reason, torch.argsort crashes when the tensor is on a Neuron device. For example, the code snippet below works as expected (on CPU):

>>> import torch
>>> import torch_neuronx
>>> x = torch.randint(10, [10,1])
>>> torch.argsort(x, dim=0)
tensor([[1],
        [8],
        [6],
        [4],
        [3],
        [7],
        [0],
        [5],
        [2],
        [9]])

But when I move x to the Neuron device:

>>> import torch_xla.core.xla_model as xm
>>> x = torch.randint(10, [10, 1]).to(xm.xla_device())
>>> torch.argsort(x, dim=0)

it crashes:

2024-04-11 11:25:10.000599:  18868  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-04-11 11:25:10.000601:  18868  ERROR ||NEURON_CC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.13.66.0+6dfecc895/MODULE_1494686516964995913+d41d8cd9/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation: 
 Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/e7a868ef-54fd-4c90-9210-6674cde1d517/model.MODULE_1494686516964995913+d41d8cd9.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/e7a868ef-54fd-4c90-9210-6674cde1d517/model.MODULE_1494686516964995913+d41d8cd9.neff', '--verbose=35']: 2024-04-11T10:42:56Z [TEN404] (_sort.12) Internal tensorizer error - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new
.
2024-04-11 11:25:10.636084: F ./torch_xla/csrc/runtime/debug_macros.h:20] Non-OK-status: status.status() status: INTERNAL: RunNeuronCCImpl: error condition error != 0: <class 'subprocess.CalledProcessError'>: Command '' died with <Signals.SIGHUP: 1>.
*** Begin stack trace ***
        tsl::CurrentStackTrace()
        std::unique_ptr<xla::PjRtLoadedExecutable, std::default_delete<xla::PjRtLoadedExecutable> > ConsumeValue<std::unique_ptr<xla::PjRtLoadedExecutable, std::default_delete<xla::PjRtLoadedExecutable> > >(absl::lts_20230125::StatusOr<std::unique_ptr<xla::PjRtLoadedExecutable, std::default_delete<xla::PjRtLoadedExecutable> > >&&)
        torch_xla::runtime::PjRtComputationClient::Compile(std::vector<torch_xla::runtime::ComputationClient::CompileInstance, std::allocator<torch_xla::runtime::ComputationClient::CompileInstance> >)
        torch_xla::XLAGraphExecutor::Compile(std::vector<c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> >, std::allocator<c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> > > > const&, absl::lts_20230125::Span<std::string const>, torch::lazy::LazyGraphExecutor::SyncTensorCollection const&, torch::lazy::LazyGraphExecutor::PostOrderData*, std::vector<torch::lazy::Value, std::allocator<torch::lazy::Value> > const&)
        torch_xla::XLAGraphExecutor::SyncTensorsGraphInternal(std::vector<c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> >, std::allocator<c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> > > >*, absl::lts_20230125::Span<std::string const>, torch::lazy::LazyGraphExecutor::SyncTensorsConfig const&, bool)
        torch_xla::XLAGraphExecutor::SyncTensorsGraph(std::vector<c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> >, std::allocator<c10::intrusive_ptr<torch_xla::XLATensor, c10::detail::intrusive_target_default_null_type<torch_xla::XLATensor> > > >*, absl::lts_20230125::Span<std::string const>, bool, bool, bool)
        torch_xla::XLATensor::ApplyPendingGraph()
        torch_xla::XLATensor::GetXlaData()
        torch_xla::XLATensor::ToTensor(bool)
        torch_xla::XLANativeFunctions::_to_copy(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>)

        at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>)

        at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>)


        at::_ops::_to_copy::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>)
        at::native::to(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>)

        at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>)



        _PyEval_EvalFrameDefault

        _PyFunction_Vectorcall
        _PyEval_EvalFrameDefault

        _PyFunction_Vectorcall
        _PyEval_EvalFrameDefault

        _PyFunction_Vectorcall

        PyObject_Repr
        PyFile_WriteObject


        _PyEval_EvalFrameDefault

        _PyEval_EvalCodeWithName
        PyEval_EvalCodeEx
        PyEval_EvalCode


        PyRun_InteractiveLoopFlags
        PyRun_AnyFileExFlags
        Py_RunMain
        Py_BytesMain
        __libc_start_main
        _start
*** End stack trace ***

Aborted (core dumped)
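As an aside, the cache message above suggests passing --retry_failed_compilation to force a recompile of the cached failed NEFF. Assuming this setup forwards extra compiler flags through the NEURON_CC_FLAGS environment variable (an assumption on my part), that would look like:

import os

# Assumption: torch-neuronx forwards extra neuronx-cc flags via NEURON_CC_FLAGS.
# This only retries the cached failed compilation; the TEN404 error from the
# sort lowering itself will presumably still occur.
os.environ["NEURON_CC_FLAGS"] = os.environ.get("NEURON_CC_FLAGS", "") + " --retry_failed_compilation"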

Output of pip list for the relevant packages:

libneuronxla==2.0.965
neuronx-cc==2.13.66.0+6dfecc895
numpy==1.25.2
torch==2.1.2
torch-neuronx==2.1.2.2.1.0
torch-xla==2.1.2
torchmetrics==0.10.3
torchvision==0.16.2

EC2 Instance: trn1.32xlarge
Instance ID: i-08d5b389cb04f873d
Neuron Runtime: aws-neuronx-runtime-lib/unknown,now 2.20.22.0-1b3ca6425
OS: Ubuntu 20
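
A minimal sketch of a possible CPU-fallback workaround (the argsort_on_cpu helper below is just an illustration, not part of torch_neuronx): sort on the host and move only the resulting indices back to the device, which should keep the failing sort op out of the compiled graph, at the cost of a device-to-host transfer:

import torch
import torch_xla.core.xla_model as xm

def argsort_on_cpu(x, dim=0):
    # Pull the tensor back to the host, sort there, then move only the
    # resulting indices back to the XLA/Neuron device.
    indices = torch.argsort(x.cpu(), dim=dim)
    return indices.to(x.device)

x = torch.randint(10, [10, 1]).to(xm.xla_device())
print(argsort_on_cpu(x, dim=0).cpu())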

@evellasques evellasques changed the title torch.argsort crashing when tensor is on Neuron device torch.argsort crashes when tensor is on Neuron device Apr 11, 2024
@jluntamazon
Contributor

Hello @evellasques,

Currently, sorting operations are not supported. We have completed work to add support for this and will include it in an upcoming release.

We will update this ticket when the release is live and support has been added.

@JigaoLuo

JigaoLuo commented Jul 25, 2024

Is there documentation available on the operations currently supported by the SDK?
I think I found it, but I'm not entirely sure it is the right one: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/pytorch-neuron-supported-operators.html#pytorch-neuron-supported-operators
