
[RFC] Cross-Platform Refactor: Mac M1 support #1020

Open · 5 tasks
rickardp opened this issue Feb 3, 2024 · 2 comments

Comments

rickardp (Contributor) commented Feb 3, 2024

Motivation

The newer Macs with Apple Silicon (M1 and up) are actually quite powerful, and even the lowest-end M1 MacBook Air is impressive. In addition, the Apple platform is very suitable for ML workloads thanks to its unified memory architecture (all system RAM can be used as GPU memory with no performance penalty).

Apple's GPU acceleration API is MPS (Metal Performance Shaders), which is not at all compatible with CUDA, so supporting it requires porting all the kernels as well as writing the stub code.
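For context, PyTorch already exposes runtime checks for both backends, so backend selection itself does not need new native code. A minimal sketch, assuming nothing about the bitsandbytes internals (the pick_device helper is illustrative only):

```python
import torch

def pick_device() -> torch.device:
    """Illustrative helper: prefer CUDA, fall back to MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # On Apple Silicon builds of PyTorch, this reports whether the
    # Metal (MPS) backend can be used.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())
```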

Additionally, the Mac is a very popular platform for developers. Supporting macOS natively in the popular PyTorch libraries (as a longer-term goal) means we don't have to resort to expensive Nvidia cloud VMs for every single task.

Proposed solution

@Titus-von-Koeller Feel free to edit this issue as you see fit, for example if you want a different structure for it.

niclimcy commented Feb 3, 2024

There is one project I have been following that uses MPS: https://github.com/ggerganov/llama.cpp

That is where I got the idea of using CMake for cross-platform support. I'm not too familiar with MPS, but I'm sharing it for more context.

Titus-von-Koeller (Collaborator) commented:

@rickardp summarized the approach:

My proposal [is]:

  • Support cross-platform builds
  • Build everything in GitHub Actions
  • Improve unit tests
  • Portable bootstrapping (possibly look at whether we can use PyTorch for common chores like finding DLL paths; see the sketch after this list)
  • Then, using a test-driven approach, port function (kernel) by function
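For the bootstrapping point above, a minimal sketch of what portable library lookup could look like, assuming the native binary ships inside the Python package (the find_native_library helper and the libbitsandbytes stem are illustrative, not existing bitsandbytes code):

```python
import platform
from pathlib import Path

# Map the current OS to its shared-library suffix so the same lookup
# code works on Windows, macOS, and Linux.
_SUFFIX = {"Windows": ".dll", "Darwin": ".dylib"}.get(platform.system(), ".so")

def find_native_library(package_dir: Path, stem: str = "libbitsandbytes") -> Path:
    """Hypothetical helper: locate the bundled native library for this platform."""
    candidate = package_dir / f"{stem}{_SUFFIX}"
    if not candidate.exists():
        raise FileNotFoundError(f"no native library found at {candidate}")
    return candidate
```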

I fixed some of the “plumbing” on my branch, specifically:

  • Switch to CMake over pure Makefiles (as CMake does a lot of the needed detection out of the box)
  • Builds CPU on Windows and Mac (arm64 and x64)
  • Builds and runs CPU tests on GitHub Actions (all platforms, including Mac and Windows)
  • Unit tests that depend on CUDA-only code are skipped on non-CUDA platforms, and non-CUDA tests now go green if CUDA is not present (see the pytest sketch after this list)
  • MPS bootstrapping code for Apple Silicon
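As an illustration of the test-skipping item above, this is the generic pytest pattern (a sketch, not necessarily the exact markers used on the branch):

```python
import pytest
import torch

# CUDA-only tests are skipped when CUDA is absent, so the rest of the
# suite can still go green on CPU-only or MPS machines.
requires_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="CUDA not available on this platform"
)

@requires_cuda
def test_cuda_kernel():
    x = torch.ones(4, device="cuda")
    assert x.sum().item() == 4.0

def test_cpu_path():
    x = torch.ones(4)
    assert x.sum().item() == 4.0
```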

Would this be something you would find acceptable? If you approve of this, we could baseline on a portable build/test system, and then the community could work incrementally by adding MPS kernels and possibly also CPU kernels (I actually think it would be useful for this library to be able to run on CPU only).

Or would you rather have one PR where it is usable straight off the bat? (Then the community effort could continue in my fork, or somewhere else.)

(I am myself more of a software engineer than a data scientist, so I can help out with the software engineering parts of the problem; for one, this means I want a simple unit test to tell me whether my kernel is working, rather than a higher-level metric. That said, I do know a fair share of PyTorch and GPU development, so I can help out with the porting where there is a clear spec.)

Also, I think the community could help out with API documentation as a way of pinning down the spec and expected outcomes.
