Model Compilation Issue in AWS Neuron Environment #76

ShivamB25 · 2024-05-30T09:51:54Z

Description

After running the template code through the compilation step, the compiled models are not written to disk. The compilation logs indicate that the process completes without errors, but the expected model file model.pt is missing from the directory sd2_compile_dir_768/unet/.

Steps to Reproduce

Activate the pre-built PyTorch-2.1 environment for Inf2, Trn*:
```
source /opt/aws_neuronx_venv_pytorch_2_1/bin/activate
```
Run the provided template code from the repository:
```
python3 test3.py
```
Observe the logs and check for the existence of the model file in the specified directory.

Expected Behavior

The model file model.pt should be present in the directory sd2_compile_dir_768/unet/ after the compilation process completes.

Actual Behavior

The model file model.pt is missing from the directory sd2_compile_dir_768/unet/.

Compilation Logs

```
2024-05-30T09:32:51Z Running birverifier
2024-05-30T09:32:52Z birverifier finished after 1.166 seconds
2024-05-30T09:32:52Z Running codegen
2024-05-30T09:32:57Z isa_gen finished after 4.293 seconds
2024-05-30T09:32:58Z dma_desc_gen finished after 1.495 seconds
2024-05-30T09:33:01Z debug_info_gen finished after 2.790 seconds
2024-05-30T09:33:02Z codegen finished after 9.213 seconds
2024-05-30T09:33:02Z Running neff_packager
2024-05-30T09:33:29Z neff_packager finished after 27.627 seconds
```

Error Message

```
Traceback (most recent call last):
  File "/home/ubuntu/test3.py", line 124, in <module>
    pipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/jit/_serialization.py", line 152, in load
    raise ValueError(f"The provided filename {f} does not exist")  # type: ignore[str-bytes-safe]
ValueError: The provided filename sd2_compile_dir_768/unet/model.pt does not exist
```
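
The traceback only surfaces at load time, well after the compile step that should have written the file. A minimal sketch of a pre-load check is shown below; `require_artifact` is a hypothetical helper, not part of the sample notebook or of `torch_neuronx`, and it only reports what is on disk so that a missing `model.pt` is caught with more context.

```python
from pathlib import Path


def require_artifact(path: str) -> Path:
    """Return the path if the compiled model file exists, else raise with context.

    Hypothetical helper for illustration; not part of aws-neuron-samples.
    """
    p = Path(path)
    if p.is_file():
        return p
    if p.parent.is_dir():
        # Show what the compile step actually left behind, if anything.
        listing = sorted(x.name for x in p.parent.iterdir())
        detail = f"{p.parent} contains: {listing}"
    else:
        detail = f"{p.parent} does not exist"
    raise FileNotFoundError(
        f"Compiled model {p} not found ({detail}). "
        "The compile or save step may not have completed."
    )
```

In the script from the traceback, this could be called on `unet_filename` just before `torch.jit.load(unet_filename)`.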

Environment Details

OS: Ubuntu 22.04
AWS Neuron Environment: PyTorch-2.1
Instance Type: Inf2.xlarge

Additional Information

| Key | Value |
| --- | --- |
| Repository | aws-neuron-samples |
| Template Used | hf_pretrained_sd2_768_inference.ipynb |
| Script | `test.py` (for compilation) |


chafik-c · 2024-06-03T15:56:17Z

> The model file model.pt is missing from the directory sd2_compile_dir_768/unet/.

Hi, I routed the issue to the appropriate team within the org. We will track it and get back to you.

ShivamB25 · 2024-06-03T15:57:47Z

@chafik-c thanks

ShivamB25 · 2024-06-14T08:00:27Z

@chafik-c it may have run out of RAM. I tried this on an 8xlarge instance and it worked fine.
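
For anyone who wants to check the memory theory before switching instance sizes, here is a rough sketch that reads the available host memory on Linux before starting compilation. It assumes `/proc/meminfo` is present (as on the Ubuntu 22.04 setup above); the function names are hypothetical, and how much memory the compile actually needs is not stated in this thread.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_gib(path: str = "/proc/meminfo") -> float:
    """Return MemAvailable in GiB, as reported by the kernel."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemAvailable"] / (1024 ** 2)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    print(f"Available memory before compile: {available_gib():.1f} GiB")
```

Logging this value right before the compile step on both instance sizes would show whether memory is the difference between them.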

ShivamB25 closed this as completed Jun 14, 2024
