Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PyTorch] Update to 2.4.0, add distributed #1510

Open
wants to merge 59 commits into
base: master
Choose a base branch
from

Conversation

HGuillemet
Copy link
Collaborator

@HGuillemet HGuillemet commented Jun 7, 2024

Included in this PR:

  • Update to PyTorch 2.4.0
  • Add distributed framework (backend Gloo)
  • Add a minimal binding for std::chrono, needed by distributed (TBC)
  • Add an adapter for intrusive_ptr, enabling transparent usage like for shared_ptr
  • Add an adapter for weak_ptr, enabling transparent usage (could be moved to JavaCPP ?)
  • Add an optional Maven dependency to cuda. This allows to use types defined in the cuda presets. Users calling a method of PyTroch presets using one of these types should then have the cuda presets. Which also means the cuda library of the presets will be used, and not the system one (unless the user sets org.bytedeco.javacpp.pathsFirst and their system cuda has same version than the cuda presets).
  • Merge functions packages to main package
  • Fix [PyTorch 2.2.2-1.5.11-SNAPSHOT] Training produces poor MNIST model on Windows #1503
  • Fix [PyTorch] Training is very slow on Linux. #1504
  • Fix support for OpenMP on macos

Remains to be done:

  • Determine if other distributed backends are needed (MPI or UCC. NCCL is not supported on Windows yet).

@HGuillemet HGuillemet marked this pull request as draft June 7, 2024 20:59
@saudet
Copy link
Member

saudet commented Jul 8, 2024

@HGuillemet Your account doesn't already have permission to publish for org.bytedeco??

@HGuillemet
Copy link
Collaborator Author

I don't have the secrets. Anyway PR builds don't publish.

@saudet
Copy link
Member

saudet commented Jul 9, 2024

You should be able to publish to org.bytedeco using your own account. I'm pretty sure I gave you access a long time ago.

@saudet
Copy link
Member

saudet commented Jul 9, 2024

Or maybe not, what the name of your OSSRH account?

@HGuillemet
Copy link
Collaborator Author

HGuillemet. But don't bother.

@saudet
Copy link
Member

saudet commented Aug 1, 2024

@HGuillemet Let's just merge this and fix anything broken with the upgrade to PyTorch 2.4?

@HGuillemet
Copy link
Collaborator Author

I'm about to add a commit for 2.4.0 upgrade

@saudet
Copy link
Member

saudet commented Aug 1, 2024

Ok, we can do that too

@HGuillemet
Copy link
Collaborator Author

Before merging, you must decide what to do with bytedeco/javacpp#766

@HGuillemet
Copy link
Collaborator Author

Also to decide: whether adding the cuda dependency is a good thing or not.

@saudet
Copy link
Member

saudet commented Aug 1, 2024

An optional dependency on the presets for CUDA that only torch_cuda needs, sure, that's fine.

.put(new Info().javaText("import org.bytedeco.cuda.cusolver.*;"))
.put(new Info().javaText("import org.bytedeco.cuda.cudnn.*;"))
// .put(new Info().javaText("import org.bytedeco.cuda.nccl.*;")) // Not on Windows
.put(new Info().javaText("import org.bytedeco.pytorch.chrono.*;"))
Copy link
Member

@saudet saudet Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing that, please add helper.class, chrono.class and cudnn.class to the @Platform(inherit=... list.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for cuda, but what helper.class and chrono.class are you talking about ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, JavaCPP assumes the helper classes to be in the same package as the generated classes, so let's remove that package too?

As for chrono, see bytedeco/javacpp#766 (comment)

@HGuillemet
Copy link
Collaborator Author

An optional dependency on the presets for CUDA that only torch_cuda needs, sure, that's fine.

Consider my comment on top post: the dependency will be mandatory when using CUDA. Presently, most users use their system cuda I guess, they will now have to use cuda presets.

@saudet
Copy link
Member

saudet commented Aug 1, 2024

Sure, that's fine, it's not a huge dependency

@HGuillemet HGuillemet changed the title [PyTorch] Update to 2.3.1, add distributed [PyTorch] Update to 2.4.0, add distributed Aug 1, 2024
@HGuillemet
Copy link
Collaborator Author

@saudet please restart the jobs that failed

@HGuillemet HGuillemet marked this pull request as ready for review August 6, 2024 05:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants