recommended hardware resources #3
Hi! We strongly recommend compressing llama 7B on an A100. However, it is still possible to run the compression with only 20 GB of GPU memory using the following command:
Hi @tuidan, I've been running the following command on an A100 80GB GPU with ~90 GB of CPU RAM:

However, the program is consistently killed due to CPU RAM shortages. The issue seems to be related to the loop iterating over layers in this part of the code: Line 136 in 7dc65bd

Could you share the machine configuration you used? And do you have any recommendations for resolving this? Thanks in advance for your help!
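One generic way to reduce peak CPU RAM in a loop over layers (an illustrative pattern, not the repository's actual fix; `load_layer` and `process_layer` are hypothetical stand-ins) is to materialize one layer at a time, keep only a small per-layer result, and release the weights before the next iteration:

```python
import gc
import numpy as np

def load_layer(idx: int, dim: int = 256) -> np.ndarray:
    # Hypothetical stand-in for loading one transformer layer's weights.
    rng = np.random.default_rng(idx)
    return rng.standard_normal((dim, dim))

def process_layer(weights: np.ndarray) -> float:
    # Hypothetical stand-in for the per-layer compression step;
    # returns a small summary so the full weights need not be retained.
    return float(np.linalg.norm(weights))

def process_all(num_layers: int) -> list[float]:
    results = []
    for idx in range(num_layers):
        weights = load_layer(idx)       # only one layer resident at a time
        results.append(process_layer(weights))
        del weights                     # drop the reference to the weights...
        gc.collect()                    # ...and reclaim the memory promptly
    return results
```

With this structure, peak memory is roughly one layer's weights plus the accumulated per-layer summaries, instead of all layers at once.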
In fact, our code requires at least 100 GB of CPU RAM, since it needs to profile and cache the whitening matrices of all the weight matrices before running the compression. We do this because recomputing a whitening matrix takes about 10-15 minutes; with the cache, the compression itself takes less than 5 minutes.
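The compute-once-then-cache strategy described above can be sketched generically (this is an illustrative pattern, not the repository's actual code; the cache directory and the `compute_whitening` helper are hypothetical):

```python
import os
import numpy as np

CACHE_DIR = "whitening_cache"  # hypothetical on-disk cache location

def compute_whitening(layer_id: int, dim: int = 8) -> np.ndarray:
    # Stand-in for the expensive profiling step (10-15 min in practice):
    # here we just build a small symmetric PSD matrix deterministically.
    rng = np.random.default_rng(layer_id)
    a = rng.standard_normal((dim, dim))
    return a @ a.T  # symmetric positive semi-definite, like a covariance

def get_whitening(layer_id: int) -> np.ndarray:
    """Return the whitening matrix for a layer, computing it only once."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, f"layer_{layer_id}.npy")
    if os.path.exists(path):
        return np.load(path)           # fast path: reuse the cached matrix
    mat = compute_whitening(layer_id)  # slow path: profile once, then cache
    np.save(path, mat)
    return mat
```

After the first call for a given layer, subsequent calls read the matrix from disk instead of recomputing it, which is the trade-off behind the 100 GB CPU RAM requirement mentioned above.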
What are the minimum hardware resources required to test this codebase with llama 7B?