NeMo RL training tutorial errors #784

@cwing-nvidia

Description

  1. The sanity test script used to validate the training environment fails with the following assertion error: `AssertionError: assert {'final_batch...': None, ...}} == {'final_batch...': None, ...}}`.
  2. The NeMo RL training tutorial fails because the NeMo RL container does not include Megatron. As a result, running the NeMo Gym training example on a single A100 GPU fails with a “module not found” error.