Getting Petals to run on macOS #147
Comments
Sorry for taking so long to respond, we're a bit overwhelmed right now, will respond within the next 24 hours
No worries, it's the holiday season =) Have a Merry Christmas
Thanks! Should you run on M1? I found a guy with an M1 Max MacBook Pro to run some compute tests. Surprisingly, M1 is competitive for autoregressive inference: it's still about 2.5 times slower than an A6000, but far more energy efficient. For training, the comparison is less favourable, probably because you need more raw TFLOPS, not just fast memory. So, surprisingly, yes, it makes sense.

Can you run on M1? The current status is "you probably can, but it will require tinkering".
Notes:
[opinion] On the contrary, kWh matters a lot, but the actual kWh consumed is significantly lower, because not all GPUs are fully compute-utilized all the time, even under heavy use.
Awesome! It's good to have some validation that the idea actually makes more sense than crazy (my original assumption), even if it's just inference. Any idea how big the gap is for training (e.g. is it 5x slower)?

For more accurate kWh numbers, though, we might need more controlled tests, because the current numbers are napkin math taken from the spec sheets. Under full load, I find that M1 MacBook Pros are in general below spec, since the rated wattage typically accounts for Thunderbolt/USB connectors, so the gap might be bigger than suggested (need to find confirmation). For the M1 Max MBP, with the screen off and no additional peripherals, I believe it is clocked to max out at 65 W, which matches the typical USB-C power from a display + dock.

Next steps for me: gonna give it a try on a Mac Studio and a Mac mini, so we can get data points from both extreme ends!!! Is there any command (after step 6?) I can use to put a machine under the respective load, so I can try to get a more accurate in-system wattage reading? This would only apply for desktop Macs; for laptops it would need a wall meter (which I do not have), because the in-system reading will switch back and forth between battery and wall power.

Notes: And haha, yeah, agreed that kWh matters. I was assuming (wrongly) that the lower-end MacBooks might only be useful in a lesson/training scenario, for students to get some hands-on experience using machines they have at hand in class, rather than for actual production usage (due to the very limited memory size per node). But I realise that is an assumption that needs validation, especially regarding how the lower-end models are tuned for efficiency over performance.
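(A hedged aside on the in-system wattage question above: on a desktop Mac, macOS ships a built-in `powermetrics` tool that can report CPU/GPU power draw while a workload runs; the sampler names below are an assumption for recent macOS versions and may differ.)

```zsh
# Sample CPU/GPU power once per second while the Petals workload runs in another terminal.
# Sampler names are an assumption -- check `man powermetrics` on your macOS version.
sudo powermetrics --samplers cpu_power,gpu_power -i 1000
```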
Unfortunately I'm stuck at the last step, as it seems to still be using CUDA (scroll to the end). Using this space to log the whole macOS setup step by step. Because the macOS version required for M1 Macs defaults to ZSH, the whole process here assumes ZSH is used (and not bash). Date this was done: 3rd Jan 2023.

Setup conda environment with GPU support
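(The actual commands from this log were not preserved in the thread; what follows is a minimal sketch of this step, assuming an arm64 Miniconda/Miniforge install. The environment name and Python version are illustrative choices, not the author's.)

```zsh
# Create and activate a fresh environment; "petals-env" and Python 3.9 are arbitrary examples.
conda create -n petals-env python=3.9
conda activate petals-env
```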
Setup pytorch with GPU support
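(Again a sketch rather than the original commands; at the time, Metal (MPS) acceleration shipped in PyTorch's nightly builds, so an install along these lines was typical.)

```zsh
# Install a PyTorch build with MPS (Apple GPU) support from the nightly channel.
# On current PyTorch releases the stable channel may already be sufficient.
conda install pytorch torchvision torchaudio -c pytorch-nightly
```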
Optional: Setup a folder for all your subsequent files
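(A trivial sketch; the directory name is arbitrary.)

```zsh
# Keep model caches, notebooks, and logs in one place.
mkdir -p ~/petals-macos
cd ~/petals-macos
```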
Optional: Validate the pytorch install using jupyter
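(The original notebook isn't reproduced here; the essential check is that PyTorch was built with, and can see, the Metal backend. A quick command-line equivalent using the standard torch.backends.mps API:)

```zsh
# Both values should print True on an M1 Mac with a working MPS-enabled PyTorch build.
python -c "import torch; print(torch.backends.mps.is_built(), torch.backends.mps.is_available())"
```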
Install petals, and various other dependencies
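(A sketch of the install step, assuming the package is pulled straight from the main repository, as in the maintainers' later instructions further down this thread.)

```zsh
# Installs petals plus its dependencies, including hivemind and the bundled p2pd daemon.
pip install --upgrade git+https://github.com/bigscience-workshop/petals
```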
The full output text
@PicoCreator The error caused by the
Note that the Petals server won't support storing weights in 8-bit (that's what the
I tried to follow the instructions here to get it to run on a non-M1 Mac. The 'best' I managed to get is the following:
Now I have no idea why it tells me
As you can see, this is using
@ineiti The message you see is a part of the p2pd's output. Could you please ensure that you use the latest commit in learning-at-home/go-libp2p-daemon, hivemind, and petals? If it doesn't help, you can check out the full p2pd outputs by running the server like this:
There will be lots of debug outputs, but the daemon should report which arguments it doesn't understand somewhere among this text. If you can't find anything relevant, please send the
@borzunov OK, that works. Well, it doesn't, because I have an old MacBook Pro from 2018 with no CUDA support :( Also, why isn't this done automatically if I run
And where would I have found the information on how to build the correct p2pd? Or where should it be written?
I'm afraid this information can only be found in the readme for that library, here
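(For context, a hedged sketch of what building the daemon typically looks like; the repository layout and the destination path inside the hivemind package are assumptions and may differ between hivemind versions.)

```zsh
# Build the p2p daemon from the fork hivemind expects (requires a recent Go toolchain).
git clone https://github.com/learning-at-home/go-libp2p-daemon
cd go-libp2p-daemon/p2pd
go build .

# Replace the binary bundled with hivemind. The destination below is an assumption --
# the hivemind_cli directory can be located from the installed package path.
cp p2pd "$(python -c 'import hivemind, os; print(os.path.dirname(hivemind.__file__))')/hivemind_cli/p2pd"
```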
In fact I did try
First on my Mac, but for some reason this didn't work. The
@ineiti: don't know if this is still relevant, but I did manage to get it running on an older Intel Mac in the following way:
I'm sure there's a more elegant way to do this, but I'm not a Python guy so ...
Waiting for my new Mac and I'll try again...
I was hoping to host an instance of chat.petals.ml on one of Oracle Cloud's ARM Ampere instances, but I am having no luck getting Petals to run. I asked for advice in the Discord server and was given a custom branch to test (which removes the CPUFeatures module). After making some progress, I was pointed here, and I've spent several hours testing the recommendations with varying degrees of success. Can anybody offer some additional advice?

Here is the Dockerfile I'm working with:
This Dockerfile will build successfully on ARM. However, after running the container, you'll get the following error message:
If you omit the custom p2pd build, which was recommended by @vrosca, you'll get a different error:
I'm going to keep working with this, and I'll post an update if I make progress. Any advice you can give would be greatly appreciated!
Ok so, with the disclaimer that I'm terrible at Python and I only got this to work on a 2012 Intel MacBook Pro, here's what I did that might be helpful. In lib/python3.9/site-packages/hivemind/p2p/p2p_daemon.py, the arguments for p2pd are logged. In my case, on line 221, I changed the log level from debug to info:
You can then run the command from the console and see why it fails. That's how I got the final combination of Go version & go-libp2p-daemon that worked for me. Hope this helps
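(As an aside that is not part of @vrosca's steps above: if you would rather not edit the installed hivemind package, a generic way to capture the same information is to read the daemon's command line from the process table while the server is starting. A hedged sketch:)

```zsh
# While `python -m petals.cli.run_server ...` is starting in another terminal,
# grab the full argument list that hivemind passed to the p2pd daemon.
ps axww -o command | grep '[p]2pd'
```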
Thanks for the advice, @vrosca. Unfortunately, it didn't help me, but during the process of troubleshooting, I learned that the issue was PEBKAC! Put simply, I forgot to switch the Dockerfile's working directory back to the app directory, just before trying to launch the webserver. This is what I needed to add to the above Dockerfile:
Thanks to everyone who spent time documenting their efforts, I now have a working installation on ARM Ampere!
Hi @PicoCreator @ineiti @vrosca @LuciferianInk,

We've shipped native macOS support in #477 - both macOS clients and servers (including ones using the Apple M1/M2 GPU) now work out of the box. You can try the latest version with:

pip install --upgrade git+https://github.com/bigscience-workshop/petals

Please ensure that you use Python 3.10+ (you can use Homebrew to install one). Please let me know if you run into any issues while installing or using it!
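(For completeness, a hedged example of the full install sequence; the Homebrew formula name `python@3.10` and the `python3.10` binary name are assumptions, not part of the maintainers' instructions.)

```zsh
# Install a recent Python via Homebrew, then install/upgrade petals against it.
brew install python@3.10
python3.10 -m pip install --upgrade git+https://github.com/bigscience-workshop/petals
```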
The primary motivation is that, as of the latest beta, PyTorch has included optimisations for the M1 Metal GPU.
This presents an interesting possibility of scaling up more easily and affordably. For example, to hit 352 GB of memory...
(and assuming up to 75% of a Mac's memory can be allocated to the GPU; you could in theory go above 75%, but I suspect we need at least 25% for the OS and filesystem operations)
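(To make the napkin math concrete, a hedged illustration only, since the exact machine mix from the original comparison isn't preserved here: 352 GB is roughly what a 176B-parameter model needs at 2 bytes per parameter. With 64 GB Macs and 75% of memory usable for the GPU, that is about 48 GB per node, so 352 / 48 ≈ 7.3, i.e. roughly 8 machines; with 128 GB Mac Studios, 75% gives 96 GB per node, so 352 / 96 ≈ 3.7, i.e. roughly 4 machines.)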
However, if you were to try to build this using A100s, for example
Also as outlined, an alternative would be 30 student laptops / Mac minis ...
** not that it matters in this case
Making it possibly one of the most accessible ways for students to set up a private swarm and try training on their own hardware in a data lab.