lock q.to to fix accelerate invalid argument #1947
Conversation
…positional argument: 'scope'
Are you still getting the Q.moveTo() asserts on …?
@Qubitium Sometimes. Python is 3.13.7t.
@avtc Are you using the tinygrad hacked p2p driver by any chance?
Yep.
@avtc Btw, do you have flash attention installed? Quantization forwarding uses less vram if flash-attn is involved. GPT-QModel will auto-enable it by default if you have it installed.
@Qubitium No, only those packages that were in requirements.txt, which I built with, but I will install flash-attn to try.
The blocking crash that you saw should be fixed. Flash Attention is not a hard requirement, but it is recommended: many models support it (not all), and for those that do, there is an observable reduction in vram usage during forwarding. You will see it in the GPT-QModel loading logs when it is enabled. project.toml is all we have now, but the install is no different. The only difference is that there is no longer a specific file to install just the requirements, as there was before.
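For context, the "auto-enable if installed" behavior described above can be approximated with the pattern below. This is only a sketch of the general idea, not GPT-QModel's internal code; the model id is illustrative, while `flash_attn`, `attn_implementation`, and `"flash_attention_2"` follow the standard flash-attn / transformers conventions.

```python
# Sketch: pick flash attention when the flash-attn package is importable,
# otherwise fall back to PyTorch SDPA. Not GPT-QModel's actual internals.
import importlib.util

from transformers import AutoModelForCausalLM


def pick_attn_implementation() -> str:
    # flash-attn installs the "flash_attn" module.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


model = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4.5-Air",  # illustrative model id from this thread
    attn_implementation=pick_attn_implementation(),
    torch_dtype="auto",
    device_map="auto",
)
```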
@Qubitium I am using the same script for GLM-4.5-Air, 4bit, 1 sample. The error appeared after merging …
I am using venv. |
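For readers without the script in hand, a 4-bit quantization run with a single calibration sample typically looks like the sketch below. The API calls (`GPTQModel.load`, `QuantizeConfig`, `quantize`, `save`) are taken from the project's documented usage pattern and are assumptions here, not the exact script used in this thread.

```python
# Hedged sketch of a 4-bit GPTQ quantization run with one calibration sample.
# Model id, output path, and calibration text are placeholders.
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "zai-org/GLM-4.5-Air"      # illustrative model id
out_dir = "GLM-4.5-Air-4bit-gptq"

quant_config = QuantizeConfig(bits=4, group_size=128)

# One calibration sample, as mentioned above.
calibration = ["GPTQ calibration text sample goes here."]

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration)
model.save(out_dir)
```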
Btw, I still have to use this lock.
I was able to use a clean venv and install the latest …
Closed with #1963 |
Revert removal of lock over Q.moveTo, to be able to run on multi-gpu
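The lock being restored in #1963 is, in essence, a mutex that serializes weight moves between devices so concurrent moves from multiple workers do not race on multi-gpu setups. A minimal illustration of that pattern is below; `Q.moveTo` is an internal GPT-QModel helper, so the name `move_to` and the structure here are assumptions for illustration, not the actual patch.

```python
# Illustrative sketch of the "lock over Q.moveTo" pattern: serialize
# device-to-device moves of quantized weights so concurrent threads do not
# trip "invalid argument" errors on multi-gpu setups.
# The move_to name is hypothetical; this is not the #1963 diff.
import threading

import torch

_move_lock = threading.Lock()


def move_to(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Only one thread performs a .to() at a time; serializing the copy is
    # cheap compared to debugging cross-device races.
    with _move_lock:
        return tensor.to(device, non_blocking=False)
```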