Reduce vocabulary size in test model, fix bug in routing when overlapped #45
Conversation
.github/workflows/run-tests.yaml
Outdated
run: |
  export HF_TAG=$(python -c "import os; print(os.environ.get('GITHUB_HEAD_REF') or os.environ.get('GITHUB_REF_NAME'))")
  python -c "from huggingface_hub import delete_repo; delete_repo(token='$BLOOM_TESTING_WRITE_TOKEN', \
    name='test-bloomd-560m-$HF_TAG', organization='bloom-testing')" || true
Please consider moving that to a small Python script in the repo. Having a Bash command running Python, whose results are passed to another Bash command running Python, is not good.
done, see tests/scripts/remove_old_models.py
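For reference, a minimal sketch of what such a cleanup script could look like. This is hypothetical: the actual tests/scripts/remove_old_models.py may differ, and the `repo_id` form assumes a newer huggingface_hub API than the legacy `name`/`organization` arguments used in the workflow step.

```python
import os


def target_repo_id() -> str:
    """Resolve the per-branch test repo name, mirroring the workflow's HF_TAG logic:
    prefer the PR branch name, fall back to the push ref."""
    hf_tag = os.environ.get("GITHUB_HEAD_REF") or os.environ.get("GITHUB_REF_NAME")
    return f"bloom-testing/test-bloomd-560m-{hf_tag}"


def main() -> None:
    try:
        # huggingface_hub is assumed to be installed in the CI environment
        from huggingface_hub import delete_repo

        delete_repo(repo_id=target_repo_id(), token=os.environ["BLOOM_TESTING_WRITE_TOKEN"])
    except Exception:
        pass  # mirrors the `|| true` in the original workflow step


if __name__ == "__main__":
    main()
```

This keeps the whole pipeline in one interpreter instead of chaining Bash and inline Python, as the reviewer requested.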
sleep 60  # wait for server to download layers
sleep 10  # wait for initial servers to declare blocks, then let server decide which blocks to serve

python -m cli.run_server --converted_model_name_or_path $MODEL_NAME --block_indices 0:6 \
Please consider running a loop instead, maybe using just --num_blocks without an explicit --block_indices (I think using load balancing for all servers is a better test).
If it gets more complicated than just repeating something N times, please move it to a Python script.
moved to #16
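For illustration, the suggested loop could be sketched as below. This is a hypothetical snippet: the server count, log paths, and `wait` at the end are assumptions, and a real CI job would poll server health rather than just waiting.

```shell
# Hypothetical CI step: start several identical servers and let each one pick
# its blocks via load balancing (--num_blocks) instead of fixed --block_indices.
NUM_SERVERS=4
for i in $(seq 0 $((NUM_SERVERS - 1))); do
  # each server serves 6 blocks of its own choosing; output goes to server_$i.log
  python -m cli.run_server --converted_model_name_or_path "$MODEL_NAME" \
    --num_blocks 6 &> "server_$i.log" &
done
wait  # wait for all background servers (in real CI, poll readiness instead)
```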
More than half of the test 560m model's parameters are in its embedding layer (250_880 x 1024). This adds 1-2 GB of RAM depending on the test scenario.

This PR reduces the vocabulary size to save memory during conversion, keeping only the first 50k tokens.

As a result, running CI with 4 servers uncovered a bug (https://github.com/bigscience-workshop/petals/runs/7879399020?check_suite_focus=true), which is now also fixed.
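The truncation described above could be sketched roughly as follows. This is hypothetical code: the actual conversion script and its tensor key names may differ, and torch is assumed.

```python
# Sketch of vocabulary truncation during model conversion: keep only the
# first 50k rows of the embedding matrix. The key name is an assumption.
import torch

VOCAB_KEEP = 50_000  # tokens retained, per the PR description


def truncate_embeddings(state_dict: dict, key: str = "word_embeddings.weight") -> dict:
    """Slice the embedding weight down to the first VOCAB_KEEP tokens."""
    weight = state_dict[key]  # shape: (vocab_size, hidden_size), e.g. (250_880, 1024)
    state_dict[key] = weight[:VOCAB_KEEP].clone()
    return state_dict


# Rough arithmetic behind the description: 250_880 * 1024 ≈ 257M parameters,
# about 1 GB in fp32; extra copies made during conversion can push the peak
# usage to the reported 1-2 GB.
```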
This PR also temporarily pins an older bnb (bitsandbytes) version, pending @TimDettmers's fix for cpuonly.