Can't start training: ValueError: XBLOCK is not defined [PyTorch 2.0 issue] #1160
Comments
Note that on the nightly torch version 2.1.0.dev20230330 it doesn't happen, but the autotuning takes an ENORMOUS amount of time; I waited for at least 30 minutes and it ended up in issue #1146.
Definitely a torch problem. You should check their GitHub issues.
I couldn't find the specific issue, but overall PyTorch 2.0 seems very rough for now. I downgraded back to 1.13.1 and also installed xformers 0.0.18 with conda, because the version on pip doesn't support PyTorch 1. Now it seems to work fine, thanks. Let's keep this issue open until it's resolved at least in the stable version of PyTorch 2.
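The downgrade described above can be sketched as follows. Only the versions (PyTorch 1.13.1, xformers 0.0.18) come from the thread; the exact commands weren't posted, so the channel names and invocation are assumptions:

```shell
# Assumed commands -- versions are from the thread, channels are a guess.
# Downgrade PyTorch to the last 1.x release reported working:
conda install pytorch=1.13.1 -c pytorch

# Install xformers 0.0.18 from its conda channel; the pip wheel of this
# version is built against PyTorch 2, so conda is used instead:
conda install xformers=0.0.18 -c xformers
```

Running these in the environment the webUI uses (rather than the base environment) is what makes the downgrade take effect for training.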
I'm seeing the same thing with torch 2.0.0 on an RTX 4090. Looks like this issue: pytorch/pytorch#97018. And this fix: pytorch/pytorch#95556 (merged 5 hours ago).
@rkfg thanks. Based on your solution, I commented out the venv-related code in webui.sh and searched for the correct xformers package at https://anaconda.org/xformers/xformers/files. I then installed it using
Glad to hear that! You can also install the package even more easily with
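The command elided here was presumably a direct install from the xformers conda channel mentioned above, letting conda resolve the matching build instead of picking a file manually; the exact invocation wasn't preserved, so this is a hedged sketch:

```shell
# Assumed one-liner: conda resolves the build matching the installed
# PyTorch from the xformers channel.
conda install xformers -c xformers
```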
Good to know there's progress, thanks!
I'm not sure, but it could be related to this issue: facebookresearch/xformers#708. |
Yes, that's where I found the way to install xformers for PT 1. I just checked the conda site and it has versions for both PT 1 and 2; in my case the correct version, xformers 0.0.18, was installed, probably because I had PT 1 installed before that.
This issue is stale because it has been open for 5 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
1. Please find the following lines in the console and paste them below.
2. Describe the bug
When starting the training, the process stops immediately with a long traceback. I had this extension working a while ago, before A1111 and PyTorch were updated to their current versions; I'm not sure if that's related. I tried to run the webUI with both venv and conda, and the outcome is exactly the same. I also tried turning various options on and off, such as memory attention (default/xformers), precision (fp16/bf16), using extended Lora or not, and choosing different base models (SD 1.5 and Liberty). No difference whatsoever. Sometimes it does a few cycles of triton autotune, but in the end it can't compile that code. The following log is from the conda run.
3. Provide logs
4. Environment
What OS? Debian testing
What GPU are you using? RTX 3090 Ti