Getting Petals to run on macOS #147

Closed
PicoCreator opened this issue Dec 11, 2022 · 18 comments
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@PicoCreator

PicoCreator commented Dec 11, 2022

The primary motivations are:

  • to get as much high-bandwidth memory as possible, at low cost (thanks to the Mac's unified memory model)
  • to be easily usable for training / inference
  • it's probably going to be slower than 3090s (I have no idea), but I don't think that's the point here
  • it could potentially also be used with a large number of "student laptops"

As of the latest beta, PyTorch has included optimisations for the M1 Metal GPU.

This presents an interesting possibility of scaling up more easily and affordably. For example, to hit 352GB of memory...
(assuming up to 75% of a Mac's memory is allocated to the GPU; in theory you could go above 75%, but I suspect we need at least 25% for the OS and filesystem operations)

  • Number of nodes: 4
  • RAM allocated per node: 96GB (75% of 128GB)
  • Upfront cost of nodes: $23,200.00 ($5,800.00 / node)
  • Max kWh: 0.86 (0.215 kWh / node)

However, if you were to try to build this using A100s, for example:

  • Number of nodes: 5
  • RAM allocated per node: 80GB
  • Upfront cost of nodes: $65,000.00 ($13,000.00 / node)
  • Max kWh: 1.5 (0.300 kWh / node)
  • Price & energy usage exclude overheads for CPU, RAM, motherboard, storage, cooling, and networking

Alternatively, as outlined, there would be 30 student laptops / Mac minis ...

  • Number of nodes: 30
  • RAM allocated per node: 12GB (75% of 16GB)
  • Upfront cost of nodes**: $33,000.00 ($1,100 / node)
  • Max kWh**: 4.5 (0.150 kWh / node)
    ** not that it matters in this case

Making it possibly one of the most accessible ways for students to set up a private swarm and try training on their own hardware in a data lab. (The napkin math above is collected in the sketch below.)
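For reference, a minimal Python sketch that just re-runs the napkin math above (all figures are the estimates quoted in this comment; nothing here is measured):

# Re-run the napkin math from the three configurations above.
configs = {
    "Mac Studio (128GB)":  dict(nodes=4,  ram_gb=96, usd_per_node=5800,  kwh_per_node=0.215),
    "A100 80GB":           dict(nodes=5,  ram_gb=80, usd_per_node=13000, kwh_per_node=0.300),
    "Student laptop/mini": dict(nodes=30, ram_gb=12, usd_per_node=1100,  kwh_per_node=0.150),
}
for name, c in configs.items():
    print(f"{name}: {c['nodes'] * c['ram_gb']} GB total, "
          f"${c['nodes'] * c['usd_per_node']:,}, "
          f"max {c['nodes'] * c['kwh_per_node']:.2f} kWh")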

@PicoCreator PicoCreator changed the title Is it possibel to run on macos? Is it possible to run on macos? Dec 11, 2022
@justheuristic
Collaborator

Sorry for taking so long to respond, we're a bit overwhelmed r/n, will respond within the next 24 hours

@PicoCreator
Author

PicoCreator commented Dec 22, 2022

No worries, it's the holiday season =)

Have a Merry Christmas
( feel free to take a few days, instead of 24 hours )

@justheuristic
Collaborator

justheuristic commented Dec 26, 2022

Thanks!

Should you run on M1?

I found a guy with an M1 Max macbook pro to run some compute tests. Surprisingly, M1 is competitive for autoregressive inference. It's still 2.5 times slower than an A6000, but way more energy efficient. For training, the comparison is less favourable, probably because you need more raw tflops, not just fast memory. So, surprisingly, yes, that makes sense.

Can you run on M1?

The current status is "you probably can, but it will require tinkering"
The shortest path is:

  1. pip install torch with m1 support (e.g. this tutorial)
  2. install go 1.18 or newer https://go.dev/doc/install -- latest is fine
    • check with: go version
  3. build p2pd - check that it builds fine
    • git clone https://github.com/learning-at-home/go-libp2p-daemon
    • cd go-libp2p-daemon/p2pd
    • go build .
    • check for p2pd in your local directory
  4. install petals normally
  5. reinstall hivemind with p2pd:
    • pip install --global-option="--buildgo" https://github.com/learning-at-home/hivemind/archive/master.zip
  6. test that networking code works in a local python / jupyter:
    • import hivemind
    • dht = hivemind.DHT(start=True)
    • if it doesn't crash, you win :)

After that, you should be able to run Petals normally. To be more realistic: you will probably bump into random problems. If you reach steps 3 / 4 / 5, we can help you get through these. On the plus side, if you figure this out, other people with Apple devices will be able to follow in your footsteps :) (A slightly expanded version of the step-6 test is sketched below.)
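A hedged sketch of that step-6 smoke test, slightly expanded (get_visible_maddrs() is part of hivemind's public DHT API and shows which addresses the daemon negotiated):

import hivemind

# Starting a DHT node spawns the p2pd daemon under the hood,
# so this exercises exactly the Go binary built in step 3.
dht = hivemind.DHT(start=True)
print("DHT started, visible multiaddrs:")
for maddr in dht.get_visible_maddrs():
    print(" ", maddr)
dht.shutdown()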

Notes:

** not that it matters in this case

[opinion] On the contrary, kWh matters a lot, but the actual kWh is significantly lower, because not all GPUs are compute-utilized all the time, even under heavy use.

@PicoCreator
Author

PicoCreator commented Dec 28, 2022

Awesome! It's good to have some validation that the idea actually makes more sense than crazy (my original assumption) - even if it's just inference.

Any idea how big the gap is for training? (e.g. is it 5x slower?)

For more accurate kWh figures, though, we might need more controlled tests, given how much of the current numbers are napkin math taken from the spec sheets.

Under full load, I find the M1 MacBook Pros generally draw below spec, as the rated wattage typically accounts for Thunderbolt/USB connectors. So the gap might be bigger than suggested.

(need to find confirmation) For the M1 Max MBP, with the screen off and no additional peripherals, I believe it is clocked to max out at 65W, which matches the typical USB-C power from a display+dock.

Next steps for me

Gonna give it a try on a Mac Studio and a Mac mini, so we can get datapoints from both extreme ends!!!

Is there any command, after step 6, that I can use to put a machine under the respective load? I'd like to get a more accurate in-system wattage reading.

Though this would only apply to desktop Macs. For laptops, it would need a wall meter (which I do not have), because the in-system reading will switch back and forth between battery and wall power.
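On macOS, powermetrics can give an in-system power reading while the machine is under load; a hedged example (sampler names can vary across macOS versions, so check powermetrics -h first):

sudo powermetrics --samplers cpu_power,gpu_power -i 1000 -n 10

This samples CPU and GPU package power once per second for ten seconds; as noted above, it reads system power, not wall power.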

Notes:

And haha, yeah, agreed that kWh matters - I was assuming (wrongly) that the lower-end MacBooks might only be useful in a lesson/training scenario, for students to get some hands-on experience using machines they have at hand in class, rather than for actual production usage (due to the very limited memory size per node).

But I realise that is an assumption that needs validation, especially regarding how the lower-end models are tuned for efficiency over performance.

@justheuristic justheuristic mentioned this issue Dec 31, 2022
@PicoCreator
Author

Unfortunately I'm stuck at the last step, as it seems to still be using CUDA. (Scroll to the end.)


Using this space to log the whole macOS setup step by step, on a somewhat freshly formatted VM (so it should cover any missing steps).

Because the supported OS version required for M1 Macs defaults to zsh, the whole process here assumes zsh is used (and not bash)

Date this was done: 3rd Jan 2023
OS Version: 12.5.1
System: Mac Studio (2022), M1 Max, 32 GB


Set up the conda environment with GPU support

For the most part, this is modified from https://towardsdatascience.com/installing-pytorch-on-apple-m1-chip-with-gpu-acceleration-3351dc44d67c to work on a "clean install"

  1. Ensure the OS is updated to 12.3+
  2. Ensure xcode-select is installed
    • xcode-select --install
  3. Install homebrew
    • /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  4. Install conda
    • brew install anaconda
  5. Validate conda is setup and installed
    • conda --version
    • If you get zsh: command not found: conda, you will need to export the path accordingly with
      • export PATH="/opt/homebrew/anaconda3/bin:$PATH"
    • Alternatively you can add it to your .zshrc more permanently
      • echo '\nexport PATH="/opt/homebrew/anaconda3/bin:$PATH"\n' >> ~/.zshrc
  6. Install the conda environment
    • conda create -n torch-gpu python=3.9
    • conda init zsh
  7. Close and open a new terminal window (this ensures the changes made by conda init are picked up correctly)

Set up pytorch with GPU support

  1. Activate the conda env with torch-gpu
    • conda activate torch-gpu
  2. Install pytorch stable (macOS acceleration has been merged in; you no longer need nightly!)
    • conda install pytorch torchvision torchaudio -c pytorch -y

Optional: Set up a folder for all your subsequent files

  1. Set up a folder to store various files, away from your desktop/documents pile
    • mkdir ./petals-macos; cd ./petals-macos

Optional: Validate the pytorch install using jupyter

  1. Install jupyter notebook
    • conda install -c conda-forge jupyter jupyterlab
  2. Start up the jupyter notebook
    • jupyter notebook
  3. Create a new test notebook
    Screenshot showing how to create a new untitled notebook
  4. Run the following script and check that it prints "True" both for MPS being available and for MPS being built
import torch
# this checks that the current macOS version is at least 12.3+
print(torch.backends.mps.is_available())
# this checks that the current PyTorch installation was built with MPS support
print(torch.backends.mps.is_built())

Example MPS support test
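As an optional follow-up, place a tensor on the MPS device and run a small op, to confirm the GPU path works end to end (a simple sketch, not part of the original tutorial):

import torch

# Allocate directly on the Apple GPU and do a small matmul there.
x = torch.rand(256, 256, device="mps")
y = x @ x
print(y.device)  # expected: mps:0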

Install petals and various other dependencies

  1. install go 1.18 or newer https://go.dev/doc/install -- latest is fine
    • check with: go version
    • You will need to reopen the terminal after the Go install, and rerun conda activate
    • conda activate torch-gpu
  2. install p2pd - check that it builds fine
    • git clone https://github.com/learning-at-home/go-libp2p-daemon
    • cd go-libp2p-daemon/p2pd
    • go build .
    • Validate that the p2pd binary is built: ./p2pd --help
    • Navigate back up: cd ../..
  3. Install petals
    • pip install -U petals
  4. Reinstall hivemind with the locally built p2pd (a build sanity check is sketched after this list)
    • pip install --global-option="--buildgo" https://github.com/learning-at-home/hivemind/archive/master.zip

  5. Run hivemind inside jupyter
    (screenshot: the hivemind import fails with a bitsandbytes CUDA error; full output in the next comment)
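The promised sanity check for step 4, as a hedged sketch (the path assumes hivemind bundles its daemon under hivemind/hivemind_cli/):

import pathlib
import subprocess

import hivemind

# Locate the p2pd binary that hivemind actually launches and inspect it.
p2pd_path = pathlib.Path(hivemind.__file__).parent / "hivemind_cli" / "p2pd"
subprocess.run(["file", str(p2pd_path)])
# A native build should report something like: Mach-O 64-bit executable arm64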

@PicoCreator
Author

The full output text

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[1], line 1
----> 1 import hivemind
      2 dht = hivemind.DHT(start=True)

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/hivemind/__init__.py:1
----> 1 from hivemind.averaging import DecentralizedAverager
      2 from hivemind.compression import *
      3 from hivemind.dht import DHT

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/hivemind/averaging/__init__.py:1
----> 1 from hivemind.averaging.averager import DecentralizedAverager

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/hivemind/averaging/averager.py:20
     17 import numpy as np
     18 import torch
---> 20 from hivemind.averaging.allreduce import AllreduceException, AllReduceRunner, AveragingMode, GroupID
     21 from hivemind.averaging.control import AveragingStage, StepControl
     22 from hivemind.averaging.group_info import GroupInfo

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/hivemind/averaging/allreduce.py:7
      3 from typing import AsyncIterator, Optional, Sequence, Set, Tuple, Type
      5 import torch
----> 7 from hivemind.averaging.partition import AllreduceException, BannedException, TensorPartContainer, TensorPartReducer
      8 from hivemind.compression import deserialize_torch_tensor, serialize_torch_tensor
      9 from hivemind.p2p import P2P, P2PContext, PeerID, ServicerBase, StubBase

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/hivemind/averaging/partition.py:11
      8 import numpy as np
      9 import torch
---> 11 from hivemind.compression import CompressionBase, CompressionInfo, NoCompression
     12 from hivemind.proto import runtime_pb2
     13 from hivemind.utils import amap_in_executor, as_aiter, get_logger

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/hivemind/compression/__init__.py:5
      1 """
      2 Compression strategies that reduce the network communication in .averaging, .optim and .moe
      3 """
----> 5 from hivemind.compression.adaptive import PerTensorCompression, RoleAdaptiveCompression, SizeAdaptiveCompression
      6 from hivemind.compression.base import CompressionBase, CompressionInfo, NoCompression, TensorRole
      7 from hivemind.compression.floating import Float16Compression, ScaledFloat16Compression

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/hivemind/compression/adaptive.py:7
      4 import torch
      6 from hivemind.compression.base import CompressionBase, CompressionInfo, Key, NoCompression, TensorRole
----> 7 from hivemind.compression.serialization import deserialize_torch_tensor
      8 from hivemind.proto import runtime_pb2
     11 class AdaptiveCompressionBase(CompressionBase, ABC):

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/hivemind/compression/serialization.py:9
      7 from hivemind.compression.base import CompressionBase, CompressionInfo, NoCompression
      8 from hivemind.compression.floating import Float16Compression, ScaledFloat16Compression
----> 9 from hivemind.compression.quantization import BlockwiseQuantization, Quantile8BitQuantization, Uniform8BitQuantization
     10 from hivemind.proto import runtime_pb2
     11 from hivemind.utils.streaming import combine_from_streaming

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/hivemind/compression/quantization.py:14
     12 if importlib.util.find_spec("bitsandbytes") is not None:
     13     warnings.filterwarnings("ignore", module="bitsandbytes", category=UserWarning)
---> 14     from bitsandbytes.functional import quantize_blockwise, dequantize_blockwise
     16 from hivemind.compression.base import CompressionBase, CompressionInfo
     17 from hivemind.proto import runtime_pb2

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/bitsandbytes/__init__.py:6
      1 # Copyright (c) Facebook, Inc. and its affiliates.
      2 #
      3 # This source code is licensed under the MIT license found in the
      4 # LICENSE file in the root directory of this source tree.
----> 6 from .autograd._functions import (
      7     MatmulLtState,
      8     bmm_cublas,
      9     matmul,
     10     matmul_cublas,
     11     mm_cublas,
     12 )
     13 from .cextension import COMPILED_WITH_CUDA
     14 from .nn import modules

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:5
      2 import warnings
      4 import torch
----> 5 import bitsandbytes.functional as F
      7 from dataclasses import dataclass
      8 from functools import reduce  # Required in Python 3

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/bitsandbytes/functional.py:13
     10 from typing import Tuple
     11 from torch import Tensor
---> 13 from .cextension import COMPILED_WITH_CUDA, lib
     14 from functools import reduce  # Required in Python 3
     16 # math.prod not compatible with python < 3.8

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/bitsandbytes/cextension.py:41
     37             cls._instance.initialize()
     38         return cls._instance
---> 41 lib = CUDALibrary_Singleton.get_instance().lib
     42 try:
     43     lib.cadam32bit_g32

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/bitsandbytes/cextension.py:37, in CUDALibrary_Singleton.get_instance(cls)
     35 if cls._instance is None:
     36     cls._instance = cls.__new__(cls)
---> 37     cls._instance.initialize()
     38 return cls._instance

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/bitsandbytes/cextension.py:31, in CUDALibrary_Singleton.initialize(self)
     29 else:
     30     print(f"CUDA SETUP: Loading binary {binary_path}...")
---> 31     self.lib = ct.cdll.LoadLibrary(binary_path)

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/ctypes/__init__.py:460, in LibraryLoader.LoadLibrary(self, name)
    459 def LoadLibrary(self, name):
--> 460     return self._dlltype(name)

File /opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/ctypes/__init__.py:382, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    379 self._FuncPtr = _FuncPtr
    381 if handle is None:
--> 382     self._handle = _dlopen(self._name, mode)
    383 else:
    384     self._handle = handle

OSError: dlopen(/opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/opt/homebrew/anaconda3/envs/torch-gpu/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)

@borzunov
Collaborator

borzunov commented Jan 5, 2023

@PicoCreator The error is caused by the bitsandbytes library, which doesn't support macOS at the moment. This library is optional for hivemind, and now for Petals too (once #180 is merged), so you can just:

  1. Install the latest Petals code from the repo:

    git clone https://github.com/bigscience-workshop/petals
    cd petals
    pip install --upgrade .
  2. Uninstall bitsandbytes so that it doesn't cause errors: pip uninstall bitsandbytes

Note that the Petals server won't support storing weights in 8-bit (that's what the bitsandbytes library is for). Instead, it will store them in bfloat16, which takes ~1.9x more memory. If you need the 8-bit weights, you'd need to port bitsandbytes to macOS, which may not be trivial. (Rough numbers below.)
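For a rough sense of scale, napkin math only (assuming BLOOM's ~176B parameters, summed across all servers holding the model):

params = 176e9  # total BLOOM parameters across the swarm
print(f"bfloat16 (2 bytes/param): {params * 2 / 1e9:.0f} GB")  # ~352 GB
print(f"8-bit    (1 byte/param):  {params * 1 / 1e9:.0f} GB")  # ~176 GB
# The raw ratio is 2x; the ~1.9x figure above is presumably because some
# tensors stay in 16-bit even in the 8-bit configuration.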

@PicoCreator PicoCreator changed the title Is it possible to run on macos? Getting petals to run on macos Jan 5, 2023
@ineiti

ineiti commented Jan 13, 2023

I tried to follow the instructions here to get it to run on a non-M1 Mac. The 'best' I managed to get is the following:

$ python -m petals.cli.run_server bigscience/bloom-petals
Jan 13 15:18:58.857 [INFO] Automatic dht prefix: bigscience/bloom-petals
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/site-packages/petals/cli/run_server.py", line 213, in <module>
    main()
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/site-packages/petals/cli/run_server.py", line 196, in main
    server = Server(
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/site-packages/petals/server/server.py", line 121, in __init__
    self.dht = DHT(
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/site-packages/hivemind/dht/dht.py", line 88, in __init__
    self.run_in_background(await_ready=await_ready)
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/site-packages/hivemind/dht/dht.py", line 148, in run_in_background
    self.wait_until_ready(timeout)
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/site-packages/hivemind/dht/dht.py", line 151, in wait_until_ready
    self._ready.result(timeout=timeout)
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/site-packages/hivemind/utils/mpfuture.py", line 258, in result
    return super().result(timeout)
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/Users/xxx/.pyenv/versions/3.10.6/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
hivemind.p2p.p2p_daemon_bindings.utils.P2PDaemonError: Daemon failed to start:     	Enables TLS1.3 channel security protocol (default true)

Now I have no idea why it gives me Enables TLS1.3 channel security protocol (default true) as an error message. I guess the actual error message is longer, but I don't know how to retrieve it in full. Does anyone have an idea?

As you can see, this is using pyenv with python version 3.10.6.

@borzunov
Collaborator

borzunov commented Jan 13, 2023

@ineiti The message you see is part of p2pd's --help output. It is shown when the daemon encounters an unknown argument, so I suppose there is some kind of version mismatch between hivemind and p2pd (it's likely that hivemind passes newer arguments not supported by this version of p2pd).

Could you please ensure that you use the latest commit in learning-at-home/go-libp2p-daemon, hivemind, and petals?

If it doesn't help, you can check out the full p2pd outputs by running the server like this:

HIVEMIND_LOGLEVEL=DEBUG GOLOG_LOG_LEVEL=DEBUG python -m petals.cli.run_server bigscience/bloom-petals 2>&1 | tee log.txt

There will be lots of debug output, but somewhere in it the daemon should report which arguments it doesn't understand. If you can't find anything relevant, please send the log.txt file and I'll take a look.
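If it is a version mismatch, Go's flag package prints a line like "flag provided but not defined: -someflag", so a quick way to narrow it down (assuming the debug run above wrote log.txt):

grep -n "flag provided but not defined" log.txt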

@borzunov borzunov changed the title Getting petals to run on macos Getting Petals to run on macOS Jan 13, 2023
@borzunov borzunov added documentation Improvements or additions to documentation enhancement New feature or request labels Jan 13, 2023
@ineiti

ineiti commented Jan 16, 2023

@borzunov OK, that works. Well, it doesn't, because I have an old MacBook Pro from 2018 with no CUDA support :(

Also, why isn't this done automatically if I run

pip install  --global-option="--buildgo" https://github.com/learning-at-home/hivemind/archive/master.zip

And where would I have found the information on how to build the correct p2pd? Or where should it be written?

@justheuristic
Collaborator

I'm afraid this information can only be found in the readme for that library, here

@ineiti

ineiti commented Jan 17, 2023

In fact I did try

git clone https://github.com/learning-at-home/hivemind.git
cd hivemind
pip install . --global-option="--buildgo"

First on my Mac, but for some reason this didn't work: the p2pd binary was still the original one. Or perhaps pip had cached the package? (A cache-bypassing retry is sketched below.)
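If caching is the suspect, one hedged way to rule it out is to force pip to rebuild from scratch (run from the hivemind checkout, as above):

pip install --no-cache-dir --force-reinstall --global-option="--buildgo" .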

@vrosca

vrosca commented Feb 15, 2023

@ineiti: I don't know if this is still relevant, but I did manage to get it running on an older Intel Mac in the following way:

I'm sure there's a more elegant way to do this, but I'm not a Python guy, so ...

@ineiti

ineiti commented Feb 16, 2023

Waiting for my new mac and I'll try again...

@Vectorrent
Contributor

I was hoping to host an instance of chat.petals.ml on one of Oracle Cloud's ARM Ampere instances, but I'm having no luck getting Petals to run. I asked for advice in the Discord server and was given a custom branch to test (which removes the CPUFeatures module). After making some progress, I was pointed here, and I've spent several hours testing the recommendations with varying degrees of success. Can anybody offer some additional advice?

Here is the Dockerfile I'm working with:

FROM arm64v8/debian:stable

RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    curl \
    git \
    python3-pip \
    gcc \
    g++ \
    python3-dev \
    tar \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /tmp

RUN curl -fSL --output go.tar.gz https://go.dev/dl/go1.18.10.linux-arm64.tar.gz && \
    tar -xzf go.tar.gz -C /usr/local

ENV PATH="/usr/local/go/bin:${PATH}"

WORKDIR /app

## Copy files from https://github.com/borzunov/chat.petals.ml
COPY . ./

## Remove git+https://github.com/bigscience-workshop/petals from requirements.txt, since
## we install a custom branch of it in the next step
RUN pip install -r requirements.txt

RUN pip install --upgrade git+https://github.com/bigscience-workshop/petals@no-cpufeature

RUN git clone https://github.com/learning-at-home/hivemind

RUN cd hivemind && pip install . --global-option="--buildgo"

WORKDIR /p2pd

RUN git clone https://github.com/learning-at-home/go-libp2p-daemon && \
    cd go-libp2p-daemon/p2pd && \
    git checkout v0.3.16 && \
    go build . && \
    mv -f p2pd /usr/local/lib/python3.9/dist-packages/hivemind/hivemind_cli/p2pd && \
    chmod +x /usr/local/lib/python3.9/dist-packages/hivemind/hivemind_cli/p2pd

EXPOSE 5000

CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:5000", "--threads", "100", "--timeout", "1000"]

This Dockerfile will build successfully on ARM. However, after running the container, you'll get the following error message:

app-app-1    | [2023-02-17 10:11:55 +0000] [1] [INFO] Starting gunicorn 20.1.0
app-app-1    | [2023-02-17 10:11:55 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
app-app-1    | [2023-02-17 10:11:55 +0000] [1] [INFO] Using worker: gthread
app-app-1    | [2023-02-17 10:11:55 +0000] [7] [INFO] Booting worker with pid: 7
app-app-1    | [2023-02-17 10:11:55 +0000] [7] [ERROR] Exception in worker process
app-app-1    | Traceback (most recent call last):
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/arbiter.py", line 589, in spawn_worker
app-app-1    |     worker.init_process()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/workers/gthread.py", line 92, in init_process
app-app-1    |     super().init_process()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/workers/base.py", line 134, in init_process
app-app-1    |     self.load_wsgi()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/workers/base.py", line 146, in load_wsgi
app-app-1    |     self.wsgi = self.app.wsgi()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/app/base.py", line 67, in wsgi
app-app-1    |     self.callable = self.load()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/app/wsgiapp.py", line 58, in load
app-app-1    |     return self.load_wsgiapp()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
app-app-1    |     return util.import_app(self.app_uri)
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/util.py", line 359, in import_app
app-app-1    |     mod = importlib.import_module(module)
app-app-1    |   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
app-app-1    |     return _bootstrap._gcd_import(name[level:], package, level)
app-app-1    |   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
app-app-1    |   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
app-app-1    |   File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
app-app-1    | ModuleNotFoundError: No module named 'app'
app-app-1    | [2023-02-17 10:11:55 +0000] [7] [INFO] Worker exiting (pid: 7)
app-app-1    | [2023-02-17 10:11:55 +0000] [1] [INFO] Shutting down: Master
app-app-1    | [2023-02-17 10:11:55 +0000] [1] [INFO] Reason: Worker failed to boot.
app-app-1 exited with code 3

If you omit the custom p2pd build, which was recommended by @vrosca, you'll get a different error:

app-app-1    | Feb 17 09:22:52.413 [INFO] Loading tokenizer for bigscience/bloomz-petals
app-app-1    | Feb 17 09:22:52.977 [INFO] Loading model bigscience/bloomz-petals
app-app-1    | [2023-02-17 09:22:55 +0000] [7] [ERROR] Exception in worker process
app-app-1    | Traceback (most recent call last):
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/arbiter.py", line 589, in spawn_worker
app-app-1    |     worker.init_process()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/workers/gthread.py", line 92, in init_process
app-app-1    |     super().init_process()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/workers/base.py", line 134, in init_process
app-app-1    |     self.load_wsgi()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/workers/base.py", line 146, in load_wsgi
app-app-1    |     self.wsgi = self.app.wsgi()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/app/base.py", line 67, in wsgi
app-app-1    |     self.callable = self.load()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/app/wsgiapp.py", line 58, in load
app-app-1    |     return self.load_wsgiapp()
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
app-app-1    |     return util.import_app(self.app_uri)
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/gunicorn/util.py", line 359, in import_app
app-app-1    |     mod = importlib.import_module(module)
app-app-1    |   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
app-app-1    |     return _bootstrap._gcd_import(name[level:], package, level)
app-app-1    |   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
app-app-1    |   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
app-app-1    |   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
app-app-1    |   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
app-app-1    |   File "<frozen importlib._bootstrap_external>", line 790, in exec_module
app-app-1    |   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
app-app-1    |   File "/app/app.py", line 20, in <module>
app-app-1    |     model = DistributedBloomForCausalLM.from_pretrained(model_name, torch_dtype=config.TORCH_DTYPE, max_retries=3)
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/petals/client/remote_model.py", line 78, in from_pretrained
app-app-1    |     return super().from_pretrained(*args, low_cpu_mem_usage=low_cpu_mem_usage, **kwargs)
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/transformers/modeling_utils.py", line 2276, in from_pretrained
app-app-1    |     model = cls(config, *model_args, **model_kwargs)
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/petals/client/remote_model.py", line 237, in __init__
app-app-1    |     self.transformer = DistributedBloomModel(config)
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/petals/client/remote_model.py", line 107, in __init__
app-app-1    |     else hivemind.DHT(
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/hivemind/dht/dht.py", line 88, in __init__
app-app-1    |     self.run_in_background(await_ready=await_ready)
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/hivemind/dht/dht.py", line 148, in run_in_background
app-app-1    |     self.wait_until_ready(timeout)
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/hivemind/dht/dht.py", line 151, in wait_until_ready
app-app-1    |     self._ready.result(timeout=timeout)
app-app-1    |   File "/usr/local/lib/python3.9/dist-packages/hivemind/utils/mpfuture.py", line 262, in result
app-app-1    |     return super().result(timeout)
app-app-1    |   File "/usr/lib/python3.9/concurrent/futures/_base.py", line 440, in result
app-app-1    |     return self.__get_result()
app-app-1    |   File "/usr/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
app-app-1    |     raise self._exception
app-app-1    | hivemind.p2p.p2p_daemon_bindings.utils.P2PDaemonError: Daemon failed to start: /usr/local/lib/python3.9/dist-packages/hivemind/hivemind_cli/p2pd: 4: Syntax error: Unterminated quoted string
app-app-1    | [2023-02-17 09:22:55 +0000] [7] [INFO] Worker exiting (pid: 7)
app-app-1    | [2023-02-17 09:22:55 +0000] [1] [INFO] Shutting down: Master
app-app-1    | [2023-02-17 09:22:55 +0000] [1] [INFO] Reason: Worker failed to boot.
app-app-1 exited with code 3
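For what it's worth, a "Syntax error: Unterminated quoted string" when launching p2pd is usually a sign that the file isn't a native executable for the host (e.g., an x86-64 p2pd bundled in the wheel, running on an arm64 machine), so the shell falls back to interpreting it as a script. A quick hedged check inside the container:

file /usr/local/lib/python3.9/dist-packages/hivemind/hivemind_cli/p2pd

A native build should report something like "ELF 64-bit LSB executable, ARM aarch64", which is what the custom go build step above produces.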

I'm going to keep working with this, and I'll post an update if I make progress.

Any advice you can give would be greatly appreciated!

@vrosca

vrosca commented Feb 17, 2023

Ok so, with the disclaimer that I'm terrible at Python and I only got this to work on a 2012 Intel MacBook Pro, here's what I did that might be helpful.

In lib/python3.9/site-packages/hivemind/p2p/p2p_daemon.py the arguments for p2pd are logged. In my case, on line 221, I changed the log level from debug to info:

221         logger.info(f"Launching {proc_args}")
222         self._child = await asyncio.subprocess.create_subprocess_exec(
223             *proc_args, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.STDOUT, env=env
224         )
225         self._alive = True

You can then run the command from the console and see why it fails. That's how I got the final combination of Go version & go-libp2p-daemon that works for me.

Hope this helps

@Vectorrent
Contributor

Thanks for the advice, @vrosca. Unfortunately, it didn't help me, but while troubleshooting I learned that the issue was PEBKAC! Put simply, I forgot to switch the Dockerfile's working directory back to the app directory just before launching the webserver. This is what I needed to add to the above Dockerfile:

EXPOSE 5000

WORKDIR /app

CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:5000", "--threads", "100", "--timeout", "1000"]

Thanks to everyone who spent time documenting their efforts, I now have a working installation on ARM Ampere!

@borzunov
Collaborator

borzunov commented Aug 29, 2023

Hi @PicoCreator @ineiti @vrosca @LuciferianInk,

We've shipped native macOS support in #477 - both macOS clients and servers (including ones using the Apple M1/M2 GPU) now work out of the box. You can try the latest version with:

pip install --upgrade git+https://github.com/bigscience-workshop/petals

Please ensure that you use Python 3.10+ (you can use Homebrew to install it: brew install python).

Please let me know if you run into any issues while installing or using it! (A minimal client-side smoke test is sketched below.)
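For a quick client-side smoke test, a hedged sketch: the model id is the one used earlier in this thread, and the class name matches petals/client/remote_model.py from the tracebacks above; check the current README for the up-to-date API:

from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

model_name = "bigscience/bloom-petals"  # model id used earlier in this thread
tokenizer = BloomTokenizerFast.from_pretrained(model_name)
model = DistributedBloomForCausalLM.from_pretrained(model_name)

# Generate a few tokens through the public swarm.
inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))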
