recommended hardware resources #3

Closed · aswanthkrishna opened this issue Apr 3, 2024 · 3 comments

aswanthkrishna commented Apr 3, 2024

What are the minimum hardware resources required to test out this codebase for llama 7B?

tuidan (Member) commented Apr 9, 2024

Hi! We strongly recommend compressing llama 7B on an A100. However, it is still possible to run the compression with only 20GB of GPU memory via the following command:

python SVDLLM.py --step 1 --run_low_resource --ratio COMPRESSION_RATIO --model HUGGINGFACE_MODEL_REPO --whitening_nsamples WHITENING_SAMPLE_NUMBER --dataset WHITENING_DATASET --seed SAMPLING_SEED --model_seq_len MODEL_SEQ_LEN --save_path WHITENING_INFO_SAVING_PATH
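
For example, filled in for LLaMA-7B at a 20% compression ratio with 256 whitening samples from wikitext2 (illustrative values, matching the invocation pvti reports below):

python SVDLLM.py --step 1 --run_low_resource --ratio 0.2 --model jeffwan/llama-7b-hf --whitening_nsamples 256 --dataset wikitext2 --seed 3 --model_seq_len 2048 --save_path .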

tuidan closed this as completed Apr 9, 2024
pvti (Contributor) commented Oct 11, 2024

Hi @tuidan,

I've been running the following command on an A100 80GB GPU with ~90GB of CPU RAM:

python SVDLLM.py --model jeffwan/llama-7b-hf --step 1 --ratio 0.2 --whitening_nsamples 256 --dataset wikitext2 --seed 3 --model_seq_len 2048 --save_path . --run_low_resource

However, the program consistently gets suspended due to CPU RAM shortages. It seems the issue might be related to the loop iterating through layers in this part of the code:

for i in tqdm(range(len(layers))):

Could you share the machine configuration you used to run this? Also, do you have any further recommendations for resolving this issue?

Thanks in advance for your help!

tuidan commented Oct 11, 2024

In fact, our code requires at least 100GB of CPU RAM because it profiles and caches the whitening matrices of all the weight matrices before running the compression. We do this because recomputing a whitening matrix takes about 10-15 minutes; with the matrices cached, the compression itself takes less than 5 minutes.

  • Considering that your CPU RAM is about 90GB, I recommend converting the whitening matrix to fp16 before caching it, i.e., modify the code in this line to layer_profile[name] = scaling_diag_matrix.half().cpu() (see the first sketch below).

  • If your CPU memory is less than 60GB, I recommend directly computing each whitening matrix and then running the compression weight by weight, rather than caching all the whitening matrices beforehand (see the second sketch below).
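
A minimal sketch of the fp16 caching change (layer_profile, name, and scaling_diag_matrix follow the line cited above; the surrounding lines and the dev variable are assumptions, not the repository's exact code):

# During profiling: store each whitening matrix in half precision on the CPU,
# roughly halving the cache's footprint in CPU RAM.
layer_profile[name] = scaling_diag_matrix.half().cpu()

# Later, during compression: restore fp32 on the GPU before using the matrix.
scaling_diag_matrix = layer_profile[name].float().to(dev)  # dev: the compression GPU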
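
And an illustrative sketch of the weight-by-weight fallback; compute_whitening and compress_weight are hypothetical stand-ins for the repository's profiling and SVD-truncation routines, not its actual function names:

import torch

def compress_low_memory(layers, compute_whitening, compress_weight, ratio):
    # Hold at most one whitening matrix in memory at a time: recompute it per
    # weight, compress that weight, then free the matrix before moving on.
    for layer in layers:
        for name, weight in layer.named_parameters():
            scaling = compute_whitening(weight)      # recomputed on demand, not cached
            compress_weight(weight, scaling, ratio)  # truncated-SVD compression of this weight
            del scaling                              # release the matrix immediately
            torch.cuda.empty_cache()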
