recommended hardware resources #3

Closed · aswanthkrishna opened this issue Apr 3, 2024 · 3 comments

aswanthkrishna commented Apr 3, 2024

What are the minimum hardware resources required to test out this codebase for llama 7B?

tuidan (Member) commented Apr 9, 2024

Hi! We strongly recommend compressing llama 7B on an A100. However, it is still possible to run the compression with only 20GB of GPU memory via the following command:

python SVDLLM.py --step 1 --run_low_resource --ratio COMPRESSION_RATIO --model HUGGINGFACE_MODEL_REPO --whitening_nsamples WHITENING_SAMPLE_NUMBER --dataset WHITENING_DATASET --seed SAMPLING_SEED --model_seq_len MODEL_SEQ_LEN --save_path WHITENING_INFO_SAVING_PATH
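
For example, filled in for LLaMA-7B at a 20% compression ratio with 256 whitening samples from wikitext2 (illustrative values, matching the invocation pvti reports below):

python SVDLLM.py --step 1 --run_low_resource --ratio 0.2 --model jeffwan/llama-7b-hf --whitening_nsamples 256 --dataset wikitext2 --seed 3 --model_seq_len 2048 --save_path .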

tuidan closed this as completed Apr 9, 2024
pvti (Contributor) commented Oct 11, 2024

Hi @tuidan,

I've been running the following command on an A100 80GB GPU with ~90GB of CPU RAM:

python SVDLLM.py --model jeffwan/llama-7b-hf --step 1 --ratio 0.2 --whitening_nsamples 256 --dataset wikitext2 --seed 3 --model_seq_len 2048 --save_path . --run_low_resource

However, the program consistently gets suspended due to CPU RAM shortages. It seems the issue might be related to the loop iterating through layers in this part of the code:

for i in tqdm(range(len(layers))):

Could you share the machine configuration you used to run this? Also, do you have any further recommendations for resolving this issue?

Thanks in advance for your help!

tuidan commented Oct 11, 2024

In fact, our code requires at least 100GB of CPU RAM because it profiles and caches the whitening matrices of all the weight matrices before running the compression. We do this because recomputing a whitening matrix takes about 10-15 minutes; with the matrices cached, the compression itself takes less than 5 minutes.

  • Considering that your CPU RAM is about 90GB, I recommend converting the whitening matrix to fp16 before caching it, i.e., modify the code in this line to layer_profile[name] = scaling_diag_matrix.half().cpu() (see the first sketch below).

  • If your CPU memory is less than 60GB, I recommend directly computing each whitening matrix and then running the compression weight by weight, rather than caching all the whitening matrices beforehand (see the second sketch below).
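
A minimal sketch of the fp16 caching change (layer_profile, name, and scaling_diag_matrix follow the line cited above; the surrounding lines and the dev variable are assumptions, not the repository's exact code):

# During profiling: store each whitening matrix in half precision on the CPU,
# roughly halving the cache's footprint in CPU RAM.
layer_profile[name] = scaling_diag_matrix.half().cpu()

# Later, during compression: restore fp32 on the GPU before using the matrix.
scaling_diag_matrix = layer_profile[name].float().to(dev)  # dev: the compression GPU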
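
And an illustrative sketch of the weight-by-weight fallback; compute_whitening and compress_weight are hypothetical stand-ins for the repository's profiling and SVD-truncation routines, not its actual function names:

import torch

def compress_low_memory(layers, compute_whitening, compress_weight, ratio):
    # Hold at most one whitening matrix in memory at a time: recompute it per
    # weight, compress that weight, then free the matrix before moving on.
    for layer in layers:
        for name, weight in layer.named_parameters():
            scaling = compute_whitening(weight)      # recomputed on demand, not cached
            compress_weight(weight, scaling, ratio)  # truncated-SVD compression of this weight
            del scaling                              # release the matrix immediately
            torch.cuda.empty_cache()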
