
Reduce vocabulary size in test model, fix bug in routing when overlapped #45

Merged: 20 commits merged into main on Aug 17, 2022

Conversation

@justheuristic (Collaborator) commented Aug 17, 2022

More than half of the test 560m model's parameters are in its embedding layer (250_880 x 1024).
This adds 1-2 GB of RAM depending on the test scenario.
This PR shrinks the vocabulary to save memory during conversion, keeping only the first 50k tokens.
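As a rough sanity check on those numbers, the savings can be estimated with simple arithmetic (a sketch assuming fp32 and one in-memory copy; the real savings depend on dtype and on how many copies the test scenario holds):

```python
# Rough memory math for the truncation (assumption: fp32, 4 bytes per parameter).
EMBED_DIM = 1024
orig_params = 250_880 * EMBED_DIM   # full BLOOM vocabulary
kept_params = 50_000 * EMBED_DIM    # first 50k tokens kept by this PR
saved_mib = (orig_params - kept_params) * 4 / 2**20
print(f"dropping those rows saves ~{saved_mib:.0f} MiB per fp32 copy")
```

At ~785 MiB per fp32 copy, scenarios holding two copies (e.g. client and server side) land in the 1-2 GB range quoted above.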

As a result,

  • tests that load client-side embeddings need significantly less RAM
  • we can now run CI tests with 4 servers instead of 2, which is needed to exercise routing (see the bug uncovered below)
  • some of the servers now use load balancing
  • CI convert_model now takes 4-5 minutes (was 6-7)

Running CI with 4 servers uncovered a routing bug:
https://github.com/bigscience-workshop/petals/runs/7879399020?check_suite_focus=true
That bug is also fixed in this PR.

This PR also temporarily pins an older bnb version, pending @TimDettmers 's fix for cpuonly

@justheuristic justheuristic marked this pull request as ready for review August 17, 2022 11:58
run: |
export HF_TAG=$(python -c "import os; print(os.environ.get('GITHUB_HEAD_REF') or os.environ.get('GITHUB_REF_NAME'))")
python -c "from huggingface_hub import delete_repo; delete_repo(token='$BLOOM_TESTING_WRITE_TOKEN', \
name='test-bloomd-560m-$HF_TAG', organization='bloom-testing')" || true
Collaborator

Please consider moving that to a small Python script in the repo. Having a Bash command running Python, whose results are passed to another Bash command running Python, is not good.

Collaborator (Author)

Done, see tests/scripts/remove_old_models.py
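A hypothetical sketch of what that script might look like, based on the workflow snippet above (the `delete_repo(token=..., name=..., organization=...)` signature matches the older huggingface_hub API used there; the `"main"` fallback is an assumption):

```python
# Sketch only: mirrors the two chained `python -c` commands from the workflow.
import os


def get_hf_tag() -> str:
    """Mirror the CI logic: prefer the PR head ref, fall back to the branch name."""
    return os.environ.get("GITHUB_HEAD_REF") or os.environ.get("GITHUB_REF_NAME") or "main"


def remove_old_model(token: str) -> None:
    # Imported lazily so the sketch stays importable without huggingface_hub installed.
    from huggingface_hub import delete_repo

    try:
        delete_repo(
            token=token,
            name=f"test-bloomd-560m-{get_hf_tag()}",
            organization="bloom-testing",
        )
    except Exception:
        pass  # the repo may not exist yet; equivalent to `|| true` in the workflow
```

The CI step then shrinks to a single `python tests/scripts/remove_old_models.py` call with the token in the environment.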

sleep 60 # wait for server to download layers
sleep 10 # wait for initial servers to declare blocks, then let server decide which blocks to serve

python -m cli.run_server --converted_model_name_or_path $MODEL_NAME --block_indices 0:6 \
@borzunov (Collaborator) commented Aug 17, 2022

Please consider running a loop instead, maybe using just --num_blocks without explicit --block_indices (I think using load balancing for all servers is a better test).

If it gets more complicated than just repeating something N times, please move it to a Python script.
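The suggested loop could look roughly like this (a dry-run sketch that only prints the commands; a real workflow would drop the `echo`, background each server with `&`, and keep the existing sleeps between launches):

```shell
# Sketch: replace the hand-written --block_indices invocations with a loop.
# --num_blocks lets each server pick its own blocks via load balancing.
for i in 1 2 3 4; do
  echo python -m cli.run_server --converted_model_name_or_path "$MODEL_NAME" --num_blocks 6
done
```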

Collaborator (Author)

Moved to #16.

@justheuristic justheuristic changed the title Reduce vocabulary size in test model Reduce vocabulary size in test model, fix bug in routing when overlapped Aug 17, 2022
@justheuristic justheuristic merged commit a263400 into main Aug 17, 2022