Model Compilation Issue in AWS Neuron Environment #76

Closed
ShivamB25 opened this issue May 30, 2024 · 3 comments

@ShivamB25

Description

After running the script through the compilation step, the compiled model is not written to disk. The compilation logs indicate that the process completes without errors, but the expected model file model.pt is missing from the directory sd2_compile_dir_768/unet/.

Steps to Reproduce

  1. Activate the pre-built PyTorch-2.1 environment for Inf2, Trn*:
    source /opt/aws_neuronx_venv_pytorch_2_1/bin/activate
  2. Run the provided template code from the repository:
    python3 test3.py
  3. Observe the logs and check for the existence of the model file in the specified directory.

Expected Behavior

The model file model.pt should be present in the directory sd2_compile_dir_768/unet/ after the compilation process completes.

Actual Behavior

The model file model.pt is missing from the directory sd2_compile_dir_768/unet/.

Compilation Logs

2024-05-30T09:32:51Z Running birverifier
2024-05-30T09:32:52Z birverifier finished after 1.166 seconds
2024-05-30T09:32:52Z Running codegen
2024-05-30T09:32:57Z isa_gen finished after 4.293 seconds
2024-05-30T09:32:58Z dma_desc_gen finished after 1.495 seconds
2024-05-30T09:33:01Z debug_info_gen finished after 2.790 seconds
2024-05-30T09:33:02Z codegen finished after 9.213 seconds
2024-05-30T09:33:02Z Running neff_packager
2024-05-30T09:33:29Z neff_packager finished after 27.627 seconds

Error Message

Traceback (most recent call last):
  File "/home/ubuntu/test3.py", line 124, in <module>
    pipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/jit/_serialization.py", line 152, in load
    raise ValueError(f"The provided filename {f} does not exist")  # type: ignore[str-bytes-safe]
ValueError: The provided filename sd2_compile_dir_768/unet/model.pt does not exist
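
The traceback above only surfaces at load time, well after the compilation step. A minimal sketch of an earlier check, assuming the artifact layout shown in the error message (sd2_compile_dir_768/unet/model.pt), could fail fast with a clearer message. The helper names `missing_artifacts` and `require_artifacts` are hypothetical and not part of torch or torch_neuronx:

```python
from pathlib import Path


def missing_artifacts(compile_dir, relative_paths=("unet/model.pt",)):
    """Return the expected artifact paths that are absent or empty.

    The default relative path comes from the error message in this issue;
    any other entries passed in are the caller's own assumptions.
    """
    root = Path(compile_dir)
    missing = []
    for rel in relative_paths:
        candidate = root / rel
        # Treat a zero-byte file as missing: it cannot be a valid saved model.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(candidate)
    return missing


def require_artifacts(compile_dir, relative_paths=("unet/model.pt",)):
    """Raise FileNotFoundError if any expected artifact is missing."""
    missing = missing_artifacts(compile_dir, relative_paths)
    if missing:
        listing = ", ".join(str(p) for p in missing)
        raise FileNotFoundError(
            f"Compiled artifacts not found: {listing}. "
            "The compilation step may have exited before saving the model."
        )
```

Calling `require_artifacts("sd2_compile_dir_768")` just before the `torch.jit.load(unet_filename)` line would report the missing file without changing the load call itself.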

Environment Details

  • OS: Ubuntu 22.04
  • AWS Neuron Environment: PyTorch-2.1
  • Instance Type: Inf2.xlarge

Additional Information

  • Repository: aws-neuron-samples
  • Template used: hf_pretrained_sd2_768_inference.ipynb
  • Script: test.py (for compilation)

Screenshots

Screenshot 2024-05-30 at 3 17 04 PM

@chafik-c

chafik-c commented Jun 3, 2024

> The model file model.pt is missing from the directory sd2_compile_dir_768/unet/.

Hi, I have routed the issue to the appropriate team within the org. We will track it and get back to you.

@ShivamB25

@chafik-c thanks

@ShivamB25

@chafik-c It may have run out of RAM. I tried this on an 8xlarge instance and it worked fine.
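
If memory is the cause, which is only the reporter's hypothesis and is not confirmed in this thread, checking available RAM before starting the compilation may help. The sketch below reads /proc/meminfo on Linux. The function names and the 32 GiB threshold in the usage note are hypothetical and are not taken from the Neuron documentation:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: KiB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Each line looks like "MemAvailable:    2097152 kB".
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(meminfo_text):
    """Return available memory in GiB, falling back to MemFree if needed."""
    info = parse_meminfo(meminfo_text)
    kib = info.get("MemAvailable", info.get("MemFree", 0))
    return kib / (1024 ** 2)


def read_available_gib(path="/proc/meminfo"):
    """Read the current available memory from the running Linux system."""
    with open(path) as f:
        return available_gib(f.read())
```

For example, a script could call `read_available_gib()` before compiling and print a warning when the result is below a chosen threshold such as 32 GiB. The right threshold for this model would need to be measured.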
