Error when using torch.block_diag method #855

Closed
joelamzn opened this issue Mar 26, 2024 · 2 comments

Comments

@joelamzn

The following code raises the error below on a trn1.32xlarge instance.

>>> import torch, torch_xla
>>> import torch_xla.core.xla_model as xm
>>> device = xm.xla_device()
>>> segments = [torch.ones((1,4), device=device)]
>>> torch.block_diag(*segments[:])

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/torch/functional.py", line 1266, in block_diag
    return torch._C._VariableFunctions.block_diag(tensors)  # type: ignore[attr-defined]
RuntimeError: torch_xla/csrc/aten_xla_type.cpp:3426 : Check failed: !runtime::sys_util::GetEnvBool("XLA_DISABLE_FUNCTIONALIZATION", false)
*** Begin stack trace ***
	tsl::CurrentStackTrace()
	torch_xla::XLANativeFunctions::block_diag(c10::ArrayRef<at::Tensor>)

	at::_ops::block_diag::redispatch(c10::DispatchKeySet, c10::ArrayRef<at::Tensor>)


	at::_ops::block_diag::call(c10::ArrayRef<at::Tensor>)

	PyCFunction_Call
	_PyObject_MakeTpCall
	_PyEval_EvalFrameDefault
	_PyEval_EvalCodeWithName
	_PyFunction_Vectorcall
	PyObject_Call
	_PyEval_EvalFrameDefault
	_PyEval_EvalCodeWithName
	PyEval_EvalCode



	PyRun_InteractiveLoopFlags
	PyRun_AnyFileExFlags

	Py_BytesMain
	__libc_start_main
	_start
*** End stack trace ***

Setup

  • Torch: 2.1.2+cu121
  • Torch XLA: 2.1.1

@joelamzn changed the title from "Compiler error when using torch.block_diag method" to "Error when using torch.block_diag method" on Mar 26, 2024
@jeffhataws
Contributor

Hi @joelamzn ,

Thanks for reporting the issue. Currently we disable functionalization by default for performance reasons. Could you try setting XLA_DISABLE_FUNCTIONALIZATION=0 when running your example? I tried your code with this environment variable set and no longer see the error.
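
For reference, here is a minimal sketch of one way to apply that setting from inside a script. It assumes the variable has to be in place before torch_xla is first imported, which this thread does not confirm, so exporting it in the shell before launching Python is the safer route. The helper name `run_block_diag` is made up for illustration; the body is the reproduction from the original report.

```python
import os

# Assumption: set the variable before torch_xla is imported, so the
# runtime sees functionalization enabled from the start.
os.environ["XLA_DISABLE_FUNCTIONALIZATION"] = "0"


def run_block_diag():
    """Re-run the reproduction from the report (hypothetical helper)."""
    # Imports are deferred so they happen after the variable is set.
    import torch
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()
    segments = [torch.ones((1, 4), device=device)]
    return torch.block_diag(*segments)
```

Calling `run_block_diag()` on the trn1 instance should then go through the same `torch.block_diag` path as the original snippet, with functionalization enabled.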

@aws-taylor
Contributor

Hello @joelamzn,

We haven't heard from you in a while, so I'm going to resolve this issue. Feel free to re-open it if you require further assistance.
