
Generated images are completely black?! 😵 What am I doing wrong? #15

Closed
illtellyoulater opened this issue Mar 8, 2022 · 10 comments

Comments

@illtellyoulater

Hello,
I am on Windows 10, and my GPU is a PNY NVIDIA GTX 1660 Ti (6 GB).
I installed V-Diffusion like so:

  • conda create --name v-diffusion python=3.8
  • conda activate v-diffusion
  • conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch (as per the PyTorch website instructions)
  • pip install requests tqdm

The problem is that when I run cfg_sample.py or clip_sample.py, the generated images are completely black, even though the inference process seems to run fine and without errors.
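
For reference, I'm invoking the scripts with a text prompt as the only argument, along the lines of the README examples (the prompt here is just a placeholder):

```
python cfg_sample.py "a landscape painting"
python clip_sample.py "a landscape painting"
```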

Things I've tried:

  • installing a previous PyTorch version with conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
  • removing V-Diffusion conda environment completely and recreating it anew
  • uninstalling the NVIDIA drivers and performing a clean driver install (I tried both the NVIDIA Studio and the Game Ready drivers)
  • uninstalling and reinstalling Conda completely

But nothing helped... and at this point I don't know what else to try...

The only interesting piece of information I could gather is that this problem also happens with another text-to-image project called Big Sleep, where, just like with V-Diffusion, the inference process appears to run correctly but the generated images are all black.

I think there must be some simple detail I'm overlooking... which is making me go insane... 😵
Please let me know if you think you can help!
Thanks!

@crowsonkb
Owner

I'm not entirely sure what's going on, but if you are getting an all-black image, it is probably because at some point during the inference process a value becomes NaN (due to floating-point overflow or some other problem); the NaN then propagates to the entire image, which ends up displayed as black. This is probably happening with Big Sleep too, but I don't know why there either...
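
If you want to confirm that, a quick ad-hoc check (not something that's in the repo, just a sketch) is to inspect the output tensor in cfg_sample.py right before it gets converted to an image:

```python
import torch

def report_nonfinite(x: torch.Tensor, label: str = "sample") -> None:
    """Print how many NaN/Inf entries a tensor contains."""
    nans = torch.isnan(x).sum().item()
    infs = torch.isinf(x).sum().item()
    print(f"{label}: {nans} NaNs, {infs} Infs out of {x.numel()} values")

# e.g. call report_nonfinite(out) on the sampled tensor just before it is
# clamped and saved as a PIL image; nonzero counts confirm the NaN theory.
```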

@illtellyoulater
Author

illtellyoulater commented Mar 8, 2022

@crowsonkb you nailed it :) This is exactly what's happening, at least for Big Sleep, where I reported the image containing NaN values in more detail: please see lucidrains/big-sleep#129 (comment). I really hope you can provide some more insight!
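
For anyone who wants to double-check on the image side, a throwaway snippet like this works (the filename is just an example of one of the saved outputs):

```python
from PIL import Image
import numpy as np

img = np.asarray(Image.open("out_0.png"))  # one of the all-black outputs
print(img.min(), img.max(), img.mean())    # all zeros for the black images
```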

I'm very inexperienced with ML and the related libraries, so I cannot fully debug this on my own. But I need to get to the bottom of it, because I'm a little concerned these problems could be caused by the GPU I recently got (a PNY NVIDIA GTX 1660 Ti 6 GB); if it turns out to be faulty I will have to return it... I know it seems unlikely, but why am I the only one facing this issue? 😔

Just when I was so excited that my new purchase would finally let me run bigger models, now I have to deal with this weird, mysterious problem...

@illtellyoulater
Author

illtellyoulater commented Mar 8, 2022

Hey @crowsonkb, hold on, look at this!
Another ML project is joining the "black images" party...

In fact, I've just found out that in my case glide-text2im is also generating black images!!! 😮

Now this is starting to get a little weird, isn't it?

@crowsonkb
Owner

That is really weird! If you run cc12m_1_cfg in CPU mode (use cfg_sample.py with --device cpu and set --steps 25 or something so it goes faster), you get an image output that isn't black, right?
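
That is, something along these lines (the prompt is just an example):

```
python cfg_sample.py "a watercolor landscape" --device cpu --steps 25
```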

@illtellyoulater
Author

Yes, right! Not black at all! I get a very colorful image using the CPU... This is so puzzling and frustrating at the same time...

@crowsonkb
Owner

I would really suspect the GPU might be bad at this point tbh.

@illtellyoulater
Author

illtellyoulater commented Mar 19, 2022

Just a quick update. I think we can rule out a faulty GPU, as I finally got Big Sleep and a couple of other projects working. In those cases the solution was installing the latest version of torch with pip, like so:
pip3 install torch==1.11.0+cu115 torchvision==0.12.0+cu115 -f https://download.pytorch.org/whl/torch_stable.html

However, this did not work for either v-diffusion or glide-text2im, which continue to generate NaN values.
All I can say for now is I suspect it has to do with the clip library...

@illtellyoulater
Author

illtellyoulater commented Mar 21, 2022

OK, apparently, for some reason I'm only having problems with projects involving OpenAI models.
Those models are the ones generating NaN values, while models from other projects seem to work just fine (provided I'm using the torch==1.11.0+cu115 package).

What could be preventing the OpenAI models in particular from working as expected?
Is it possible that some models are built or trained without taking compatibility with some recent GPUs into account?
Please let me know anything you suspect might be remotely useful! 🤔
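
One hypothesis I'd like to test (I may well be wrong about this): openai/CLIP keeps its weights in float16 when the model is loaded on CUDA, and half precision is exactly where overflow-to-NaN problems tend to show up. If that's the cause, casting the model back to full precision should make the NaNs disappear, something like:

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model = model.float()  # cast the fp16 CUDA weights back to fp32 to rule out half-precision NaNs
```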

@woctezuma

woctezuma commented Mar 21, 2022

Provide links to:

  • recent projects which work,
  • OpenAI projects which do not work.

Then compare the requirements.
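
For example:

```
# in the environment that works:
pip freeze > working.txt
# in the environment that fails:
pip freeze > failing.txt
diff working.txt failing.txt
```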

@illtellyoulater
Author

illtellyoulater commented Mar 29, 2022

I fixed it by installing (with pip) a version of torch built against CUDA Toolkit v10.2, like this:

pip install torch==1.10.1+cu102 torchvision==0.11.2+cu102 -f https://download.pytorch.org/whl/torch_stable.html

With any CUDA Toolkit version higher than that it would not work... I'm not sure whether this strict requirement comes from my system's particular configuration, from this project, or from the CLIP project.

Anyway, using v10.2 was enough to make cfg_sample.py work, but not to make clip_sample.py work.

To fix that as well, I had to make the following change to clip.py, at line 118:

https://github.com/openai/CLIP/blob/main/clip/clip.py#L118-L123

where, before the if condition, I had to add:

name = "ViT-B/32"

in order to force this model.
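
The patched section then looks roughly like this (the surrounding lines are copied from openai/CLIP at the time of writing; only the first assignment is my addition):

```python
# clip/clip.py, inside load(), just before the model-name check:
name = "ViT-B/32"  # my addition: hard-code the smallest model so it fits in 6 GB of VRAM
if name in _MODELS:
    model_path = _download(_MODELS[name], download_root or os.path.expanduser("~/.cache/clip"))
elif os.path.isfile(name):
    model_path = name
else:
    raise RuntimeError(f"Model {name} not found; available models = {available_models()}")
```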
Otherwise, with other models, I would get one of:

  • a CUDA out of memory error,
  • a CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling 'cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)',
  • or a RuntimeError: mat1 dim 1 must match mat2 dim 0.

That's it!
Maybe it doesn't work 100%, but it works, and I'm already having fun with it! :)
Thank you for working on this!
