
Conversation

@bmind7 (Contributor) commented Sep 16, 2025

Proposed change(s)

On Windows, running with threaded: true produced “tensors on different devices” errors. Threaded trainers create tensors in multiple threads. Implicit CPU allocations (or per-thread default device changes) led to CPU-CUDA mixing and PyTorch mode stack corruption. Making device placement explicit and consistent prevents both classes of errors.
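
For illustration, here is a minimal, hypothetical repro of the failure mode (not ML-Agents code): a module lives on CUDA while a worker thread builds its input without an explicit device, so the input lands on CPU and the forward pass fails.

```python
import threading

import torch

model = torch.nn.Linear(4, 2)
if torch.cuda.is_available():
    model = model.to("cuda")

def worker() -> None:
    x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])  # implicit CPU allocation
    try:
        model(x)  # on a GPU machine: "Expected all tensors to be on the same device"
    except RuntimeError as err:
        print("device mismatch:", err)

t = threading.Thread(target=worker)
t.start()
t.join()
```

The fix applies explicit placement in the following files: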

  • policy/torch_policy.py
    - Create action masks, observations, and RNN memories on default_device() during inference.
  • torch_entities/utils.py
    - ModelUtils.list_to_tensor() and list_to_tensor_list() now allocate on default_device() (see the sketch after this list).
  • torch_entities/networks.py
    - VectorInput.update_normalization() now uses device-correct tensors.
  • optimizer/torch_optimizer.py, poca/optimizer_torch.py
    - Initialize zero RNN memories on default_device().
  • torch_entities/components/reward_providers/gail_reward_provider.py
    - Ensure DONE tensors, epsilons, and accumulators allocate on the correct device.
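
A minimal sketch of the pattern these changes apply, assuming the mlagents.torch_utils import path for default_device() and the existing list_to_tensor() signature; the actual code in torch_entities/utils.py may differ in detail.

```python
from typing import List, Optional

import numpy as np
from mlagents.torch_utils import torch, default_device  # assumed import path

def list_to_tensor(
    ndarray_list: List[np.ndarray], dtype: Optional[torch.dtype] = torch.float32
) -> torch.Tensor:
    # Allocate directly on the configured device instead of relying on the
    # per-thread default, which can differ between trainer threads.
    return torch.as_tensor(
        np.asanyarray(ndarray_list), dtype=dtype, device=default_device()
    )
```

The same idea, passing device=default_device() (or calling .to(default_device())) at each construction site, covers the masks, observations, normalization buffers, and GAIL accumulators listed above.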

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • [x] Bug fix

Updated tensor creation in torch_policy.py, utils.py, the optimizers, the reward providers, and network normalization to explicitly use the configured default_device(), ensuring consistency across devices (CPU/GPU). Removed the redundant set_torch_config call in trainer_controller so it no longer interferes with PyTorch's global device context. These changes improve device management and prevent device mismatch errors in multi-threaded or multi-device training scenarios.
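
For the optimizer changes, the pattern is the same where empty recurrent state is first created; a hedged sketch (the function name and shape here are illustrative, not the actual attributes):

```python
from mlagents.torch_utils import torch, default_device  # assumed import path

def initial_memories(batch_size: int, memory_size: int) -> torch.Tensor:
    # Zero RNN state allocated on the configured device so that it matches the
    # network weights even when created from a trainer worker thread.
    return torch.zeros((1, batch_size, memory_size), device=default_device())
```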
@CLAassistant commented Sep 16, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ maryamziaa
❌ bmind7

@maryamziaa requested a review from Copilot September 16, 2025 12:31
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR fixes CUDA/CPU device mismatch errors that occur during threaded training on Windows by making tensor device placement explicit and consistent across the codebase.

  • Ensures all tensor creation operations use default_device() to maintain device consistency
  • Fixes issues where tensors were implicitly created on different devices in multi-threaded environments
  • Updates utilities, networks, optimizers, and reward providers to use explicit device placement

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Summary per file:

  • torch_entities/utils.py
    - Updates tensor creation utilities to allocate on the default device
  • torch_entities/networks.py
    - Fixes device placement in vector input normalization
  • torch_entities/components/reward_providers/gail_reward_provider.py
    - Ensures GAIL reward provider tensors use the correct device
  • policy/torch_policy.py
    - Makes device placement explicit for masks, observations, and RNN memories
  • poca/optimizer_torch.py
    - Initializes zero RNN memories on the default device
  • optimizer/torch_optimizer.py
    - Fixes RNN memory initialization device placement


@maryamziaa requested a review from slunity September 16, 2025 12:33
@maryamziaa self-requested a review September 16, 2025 15:29
@maryamziaa (Contributor) left a comment

Hi, please fix the black reformatting issue and then you should be good to merge the PR. Thanks!

@maryamziaa merged commit a83b3b8 into Unity-Technologies:develop Sep 16, 2025
7 checks passed