RTX 5090 Tested Against FLUX DEV, SD 3.5 Large, SD 3.5 Medium, SDXL, SD 1.5, AMD 9950X + RTX 3090 TI #116
FurkanGozukara announced in Tutorials
Full tutorial: https://www.youtube.com/watch?v=jHlGzaDLkto
In this video, I have extensively compared the speed of the RTX 5090 on the FLUX DEV, FLUX Fill, SD 3.5 Large, SD 3.5 Medium, Stable Diffusion XL (SDXL) and Stable Diffusion 1.5 (SD 1.5) models. For each benchmark, I compared the RTX 5090 against the RTX 3090 Ti to show the speed improvement. I also tested FP8 vs 16-bit precision for the FLUX, SD 3.5 Large and SD 3.5 Medium models. Furthermore, since one of my followers requested it, I tested the speed impact of changing the prompt on the FLUX DEV model. The full system specs are provided below.
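The FP8 vs 16-bit comparison largely comes down to how many bytes each weight occupies. A rough estimate, assuming the FLUX.1 dev transformer has about 12 billion parameters (an approximate public figure) and ignoring activations, the VAE, the text encoders and framework overhead, all of which add more on top:

```python
# Rough estimate of VRAM needed just to hold model weights.
# Assumptions: ~12e9 parameters for the FLUX.1 dev transformer (approximate);
# activations, VAE, text encoders (e.g. T5 XXL) and framework overhead are
# NOT counted, so real usage is higher.

GIB = 2**30
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_gib(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / GIB


def headroom_gib(vram_gib: float, num_params: float, precision: str) -> float:
    """VRAM left after loading the weights (negative means they cannot fit)."""
    return vram_gib - weight_gib(num_params, precision)


if __name__ == "__main__":
    FLUX_DEV_PARAMS = 12e9
    GPUS_GIB = {"RTX 3090 Ti": 24, "RTX 5090": 32}
    for precision in ("fp16", "fp8"):
        need = weight_gib(FLUX_DEV_PARAMS, precision)
        for name, vram in GPUS_GIB.items():
            left = headroom_gib(vram, FLUX_DEV_PARAMS, precision)
            print(f"{precision}: {need:.1f} GiB of weights, {left:.1f} GiB left on {name}")
```

At 16-bit this comes to roughly 22.4 GiB of weights, leaving under 2 GiB on a 24 GB card before the text encoder and Windows take their share, which is consistent with the offloading the video expects on the RTX 3090 Ti.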
I used SwarmUI with the ComfyUI backend, so you can think of these benchmarks as having been run directly on ComfyUI. As far as I know, no other interface/UI currently supports the RTX 5000 series.
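The Torch upgrade described in the video is presumably needed because a PyTorch build only ships native kernels for the GPU architectures it was compiled for, and the RTX 5090 reports CUDA compute capability 12.0 (`sm_120`). A quick heuristic check, assuming PyTorch is installed; `sm_tag`, `build_supports_gpu` and `report` are hypothetical helpers, and the check deliberately ignores PTX just-in-time compilation, so treat it as a hint rather than a definitive test:

```python
# Heuristic: does the installed PyTorch build include native kernels for GPU 0?
# Assumption: an RTX 5090 reports compute capability (12, 0), i.e. "sm_120".
# PTX JIT forward compatibility is ignored on purpose.


def sm_tag(capability: tuple[int, int]) -> str:
    """Turn a (major, minor) compute capability into an arch tag like 'sm_120'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_supports_gpu(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """True if the build's arch list contains this exact architecture."""
    return sm_tag(capability) in arch_list


def report() -> None:
    """Print what the local PyTorch build was compiled for."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible to this PyTorch build")
        return
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print(f"torch {torch.__version__}, CUDA {torch.version.cuda}")
    print(f"GPU 0 is {sm_tag(cap)}; build archs: {archs}")
    print("native kernels present:", build_supports_gpu(cap, archs))


if __name__ == "__main__":
    report()
```

On a build that predates Blackwell support, the arch list typically stops at `sm_90`, so this check returns `False` for an RTX 5090.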
🔗Automatic Installer and Model Downloader RTX 5000 Series Support⤵️
🔗Manually Install RTX 5000 Series Support⤵️
🔗SECourses 10000+ Members Discord⤵️
🔗SECourses Amazing Generative AI GitHub⤵️
🔗SECourses AI APPs Index⤵️
🔗GPU Specs and the used Machine Tutorial⤵️
🔗RTX 5090 Benchmarking Video Series Playlist⤵️
00:00:00 Introduction to the video and PC configuration and specs
00:00:30 Backends of ComfyUI and the GPUs and machine config
00:00:40 Testing FLUX DEV model performance on RTX 5090 and RTX 3090 Ti
00:00:53 RTX 5090 and RTX 3090 Ti GPU sensor values while generating images on FLUX Dev
00:02:10 How to change precision to 8-bit (FP8) vs 16-bit on SwarmUI including FP16 T5 XXL
00:03:34 Testing FLUX Fill (inpainting and outpainting) model performance on RTX 5090 and RTX 3090 Ti
00:04:56 Testing speed impact of changing prompt when generating new images
00:06:23 Testing Stable Diffusion 3.5 Medium (SD 3.5 Medium) model performance on RTX 5090 and RTX 3090 Ti
00:08:27 Testing Stable Diffusion 3.5 Large (SD 3.5 Large) model performance on RTX 5090 and RTX 3090 Ti
00:09:02 Where you can learn more about the GPU and the machine that is being used for benchmarking
00:09:42 What kind of overclocking I am applying to the RTX 5090, and the overclocking levels
00:10:12 Testing Stable Diffusion XL (SDXL) model performance on RTX 5090 and RTX 3090 Ti
00:11:41 Testing Stable Diffusion 1.5 (SD 1.5) model performance on RTX 5090 and RTX 3090 Ti
00:12:54 Comparison table and evaluation of the models and the speeds of RTX 5090 and RTX 3090 Ti
00:15:30 How to install SwarmUI and make it work with RTX 5000 series GPUs, and how to download AI models ultra fast
00:17:01 How to use our unified SwarmUI model downloader to download FLUX, SD 1.5, SDXL, SD 3.5 models and more models
00:19:00 How to upgrade Torch / ComfyUI backend to make it work on RTX 5000 series GPUs
00:19:58 Which folder will be deleted and replaced with the pre-made backend
Full PC Specs
CPU : AMD Ryzen 9 9950X 4.3 GHz AM5
Motherboard : ASUS ROG STRIX X870E-E GAMING WIFI
GPU 1 - MSI RTX 5090 GAMING TRIO OC
GPU 2 - Gainward RTX 3090 Ti Phantom
CASE : Cooler Master HAF700 Evo H700E-WGNN-S00 Gaming Full Tower PC Case White
RAM : Corsair 96GB(2x48) Vengeance RGB Black 6400MHz CL32 DDR5 Ram (CMH96GX5M2B6400C32)
RAM is running at 6000 MT/s
RAM Timings : 30-38-38-38-76 (tCAS-tRCD-tRP-tRAS-tRC)
Disk : 2x Samsung 990 Pro MZ-V9P4T0BW 4 TB
PSU : Cooler Master V Platinum V2 MPZ-G002-AFAP-BEU Gen5.1 1600 W
CPU Cooling : Arctic Liquid Freezer III 420 ACFRE00137A
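The RAM lines above trade transfer rate for tighter timings. A quick way to compare the rated profile with the setting actually in use is CAS (first-word) latency in nanoseconds; this is a standard back-of-the-envelope formula, not a measurement taken on this machine:

```python
def cas_latency_ns(cas_cycles: int, transfer_rate_mts: int) -> float:
    """CAS latency in ns. DDR moves data twice per clock, so the I/O clock in
    MHz is half the transfer rate in MT/s, and one cycle lasts
    1000 / clock_mhz nanoseconds."""
    clock_mhz = transfer_rate_mts / 2
    return cas_cycles * 1000 / clock_mhz


if __name__ == "__main__":
    print("rated  6400 MT/s CL32:", cas_latency_ns(32, 6400), "ns")
    print("in use 6000 MT/s CL30:", cas_latency_ns(30, 6000), "ns")
```

Both settings work out to 10 ns, so running at 6000 MT/s with CL30 keeps the same CAS latency as the rated profile while giving up about 6% of peak bandwidth (6000 vs 6400 MT/s).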
So let's begin testing #rtx5090 and #rtx3090ti and AMD #overclock
Video Transcription
00:00:00 Greetings, everyone. Today, I am going to test the RTX 5090 on SwarmUI with FLUX models,
00:00:08 Stable Diffusion XL (SDXL) models, Stable Diffusion 3.5 models, and also Stable Diffusion
00:00:16 1.5 models. SwarmUI does not natively support the RTX 5000 series yet, but don't worry:
00:00:26 at the end of the video, I will show you how to install it and download the models. So,
00:00:31 as usual, I have added two backends: one is the RTX 5090 and the other is the RTX 3090 Ti. And
00:00:40 I have set the FLUX DEV parameters: the model is selected, with 20 steps. I am going to generate
00:00:46 10 images at 1 megapixel resolution, so let's see the speed. Let's generate. While it is generating,
00:00:54 let's watch the GPU sensor values. So, let's reset, and let's also reset this one. Currently,
00:01:02 I am generating images with FP8 precision. When we go to the server logs, we will see the speed
00:01:10 of the generations, including the per-step speed. Unfortunately, it is lower than the real speed,
00:01:19 because I am recording a video right now. I will stop recording and generate again
00:01:24 to show you the optimal speed without recording. The images are being generated
00:01:31 right now. I have generated a few images with the video recording turned off, and we
00:01:38 can see that it takes around 10 seconds per image on the RTX 5090 and 25
00:01:47 to 26 seconds on the RTX 3090 Ti. So, the step speed is around 2.2 it/s versus 1.27
00:01:59 s/it. We can see that the RTX 5090 is 2.5 times faster at 8-bit precision. Now, I am going
00:02:10 to change the precision to 16-bit from here,
00:02:14 and I am also going to select the 16-bit precision text encoder (T5 XXL) from here. This certainly improves
00:02:23 quality, so let's see the new speed. Let's generate. This will possibly not fit
00:02:30 into the VRAM of the RTX 3090 Ti, especially if it is your primary GPU, because Windows
00:02:39 will also be using some VRAM. But let's see its speed. It will probably do some offloading,
00:02:47 so it may be slower. Now, both GPUs are running. You can see the power draw in watts
00:02:53 right here. Let's go to the server debug log. So, these are the speeds. Now,
00:02:59 I will turn off video recording to get accurate timing. Alright, a few images have been generated,
00:03:07 and we can see that the RTX 5090 is faster at 16-bit precision. It now takes 9.55 seconds per image,
00:03:20 and the RTX 3090 Ti takes around 30 seconds per image. So, the RTX 5090 is 3 times faster
00:03:31 than the RTX 3090 Ti. Now, I am going to test the FLUX Fill dev model. Let's edit this image and
00:03:41 inpaint part of it, perhaps here and here, with the prompt "green car light". It is not
00:03:50 important; we just want to see how it performs. The model is selected, Init Image Creativity is 1,
00:03:56 and Init Image Reset To Norm is 1. We also need to set the FLUX guidance scale to 30. Let's
00:04:05 generate some images with the FLUX Fill dev model. Okay, the generations have started. Now,
00:04:12 I will stop video recording and test its maximum performance. Let me also show you the status of
00:04:22 the GPUs, their usage and their power draw in watts, as you can see right now. It is using almost the maximum
00:04:28 power, and these are the power limits. We can also see the values here: the GPU temperature, hotspot
00:04:36 temperature and everything else is displayed on the screen. You can pause the video and check them
00:04:41 out if you wish. Alright, the images have been generated. I slightly modified it and made it bigger,
00:04:48 and we can see that the FLUX Fill model takes almost the same time as the FLUX dev model. Now,
00:04:57 I am going to test the speed impact of changing the prompt, because someone asked me to compare
00:05:05 changing prompts versus keeping the same one. For this test, I am going to use the FLUX dev model. I am
00:05:12 going to generate two images. Let's load it now. I will stop video recording to see the real impact
00:05:19 of changing the prompt: how much time it takes and how much difference it makes compared to
00:05:26 generating multiple images with the same prompt. Alright, changing the prompt added a significant amount
00:05:34 of time to the generations, from around 9.6 to 9.7 seconds to 12.5 to 13.4 seconds. So,
00:05:47 it increases the image generation time by around 30% whenever you change the prompt. We can see the
00:05:54 same behavior on the RTX 3090 Ti as well: the time increased from 29.5 to 30 seconds
00:06:05 to 36 to 37 seconds. Changing the prompt has such a dramatic impact because the prompt has to be encoded with the
00:06:16 T5 XXL text encoder model, which adds a delay whenever you change the prompt. Now,
00:06:23 I will show you the generation speed of the SD 3.5 Medium model. I am selecting this
00:06:33 model, and let's see the generation speeds. Let's generate 10 images. I will set the CFG scale to 7,
00:06:41 and the rest stays the same. I am going to test with FP8 first, then with FP16. Let's
00:06:50 begin. During the first generation, I will show you the values, then I will turn off the recording
00:06:56 and test again. So, let's see the GPU values. The generations have started. We can see that
00:07:05 it is using less VRAM because we are currently at FP8 precision. We can see the CPU
00:07:13 usage as well. Let's reset. Okay, memory clock, GPU clock. The GPU clock is just mind-blowing,
00:07:21 as you can see right now. Let's see the it/s while recording a video. Yes, it
00:07:29 is a mind-blowing 4.7 it/s on the RTX 5090, as you can see, and the RTX 3090 Ti is at 2.38 it/s.
00:07:39 All images have been generated with the SD 3.5 Medium model. We can see the generations. Now,
00:07:47 I will turn off video recording and test again. Alright, the generations have been completed. It
00:07:53 takes 3.88 seconds on the RTX 5090 and 8.6 seconds on the RTX 3090 Ti. Now, let's try generating at
00:08:06 16-bit precision to see whether there is any difference. The 16-bit precision image
00:08:13 generations took the same amount of time; I didn't see any significant difference. So,
00:08:21 since both precisions fit into the VRAM, there are no issues and no difference. I am
00:08:27 downloading the Large model as well, and now I am going to test the Large model the same way. Okay,
00:08:34 we are currently generating images with the SD 3.5 Large model, and you can see that both GPUs are
00:08:40 running at maximum performance, with 100% usage. The model fits
00:08:49 into the VRAM of both GPUs even at 16-bit precision. We can see that they are using their entire power budget: 575
00:08:59 watts and 450 watts. If you want to learn more about the GPU, I have already made a tutorial video about it. It is
00:09:06 in the playlist linked in this video's description. I recommend watching that first video
00:09:13 to learn more about the GPU and how I am using it in my machine. Alright, 10 images
00:09:21 have been generated while I was not recording, and we can see that for the Stable Diffusion 3.5 Large
00:09:29 model, the RTX 5090 takes around 9 seconds per image, while the RTX 3090 Ti takes around
00:09:39 21 seconds per image. Note that I am running my RTX 5090 slightly overclocked. You
00:09:49 can see +1,500 on the memory clock and +300 on the GPU clock. Now, I will repeat the test on SDXL
00:09:59 and SD 1.5 models. All of the models are tested at 1 megapixel, except SD 1.5, which I will test
00:10:08 at 768x768. For SDXL testing, I am going to use Juggernaut XL version 11,
00:10:19 and let's see. Okay, the tests are starting very quickly. We can see the speed. Wow, the speed is
00:10:27 almost real time. This is 20 steps, and this is not a Lightning model; you are watching the real-time
00:10:34 generation speed on the RTX 5090. It is just mind-blowing. Yes, it has already completed. Let's generate
00:10:43 20 more images. You can see the values here. It is just so fast, and the images are just
00:10:51 popping up. Let's see the it/s. Wow, it is an amazing 8.8 it/s. Just
00:11:02 mind-blowing. To see the real speed, I will now stop recording. Alright,
00:11:09 SDXL takes only 2.2 seconds on the RTX 5090 and 5 seconds on the RTX 3090 Ti. We are still using 20 steps.
00:11:21 Let's see the step speeds in the debug menu. So, we see that it is 10.1 it/s for
00:11:32 the RTX 5090 on SDXL and 4.2 it/s for the RTX 3090 Ti. It is time to test SD 1.5. For
00:11:45 SD 1.5, I am going to use Realistic Vision version 6. I am going to change the resolution to
00:11:53 768x768; otherwise, the resolution is too large for SD 1.5. Let's see the speed. Okay,
00:12:03 even this resolution is sometimes too big, but the speed is just mind-blowing, as you can see.
00:12:11 The images are just popping up almost instantly. Wow, just mind-blowing. Now,
00:12:18 let me stop recording and check. Alright, the speed is just mind-blowing. We can see 1.2 seconds
00:12:27 for the RTX 5090 and 2.2 seconds for the RTX 3090 Ti. Let's see the step speed. These speeds are just
00:12:36 mind-blowing: we can see 19 it/s on the RTX 5090 and almost 10 it/s on the RTX 3090 Ti. So, these are
00:12:46 the speeds for SD 1.5 at 768x768 resolution. This is just amazing. So, I have prepared a table
00:12:56 so you can see everything in a single place. I also did some extra testing: Stable Diffusion
00:13:04 1.5 at 512-pixel resolution. The average image generation takes 1.14 seconds on the RTX 3090 Ti and
00:13:20 0.64 seconds on the RTX 5090. These are all at 20 steps. The step speeds in the table are calculated from the
00:13:32 average per-image times, which also include VAE decoding, so the actual sampling speeds are a little faster.
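The step speeds in the table can be reproduced from the per-image averages. A small sketch using only the SD 1.5 averages quoted in the video; the helper names are illustrative, and because the averages include VAE decoding, `steps / seconds` is a lower bound on the true sampling speed:

```python
# Derive per-step speed and speedup from average per-image times.
# Values are the SD 1.5 averages quoted in the video (20 steps each).
# The averages include VAE decoding and other per-image overhead, so
# steps / seconds is a LOWER bound on the real sampling speed.

STEPS = 20

AVG_SECONDS_PER_IMAGE = {
    "SD 1.5 @ 512px": {"RTX 3090 Ti": 1.14, "RTX 5090": 0.64},
    "SD 1.5 @ 768px": {"RTX 3090 Ti": 2.23, "RTX 5090": 1.2},
}


def implied_it_per_s(seconds_per_image: float, steps: int = STEPS) -> float:
    """Iterations per second implied by the whole-image time."""
    return steps / seconds_per_image


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast GPU is on the same workload."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    for setting, times in AVG_SECONDS_PER_IMAGE.items():
        old, new = times["RTX 3090 Ti"], times["RTX 5090"]
        print(
            f"{setting}: 3090 Ti ~{implied_it_per_s(old):.1f} it/s, "
            f"5090 ~{implied_it_per_s(new):.1f} it/s, "
            f"speedup {speedup(old, new):.2f}x"
        )
```

At 768 px this gives about 9.0 and 16.7 it/s, a little below the roughly 10 and 19 it/s read from the debug log, which is what you would expect if the per-image time also covers VAE decoding.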
00:13:41 With SD 1.5 at 768-pixel resolution, the average time increases to 2.23 seconds for the 3090
00:13:52 Ti and 1.2 seconds for the RTX 5090. You can see the relative speed differences
00:14:02 here as well, and the gap grows as we move to more demanding models. For example,
00:14:09 on SDXL, the RTX 5090 is 2.35 times faster than the RTX 3090 Ti. With the Stable Diffusion 3.5 Large model,
00:14:21 it is 2.42 times faster at FP8 precision and
00:14:31 2.5 times faster at 16-bit precision. And when we run FLUX at 16-bit precision, which gives the highest quality,
00:14:41 the RTX 5090 is 3.08 times faster. This is a really significant speed difference. We
00:14:53 can see that it takes 10.67 seconds on average to generate one image. The fastest generation took
00:15:02 9.39 seconds and the slowest took 20.68 seconds. These numbers are probably not very accurate, since I used
00:15:13 AI to parse the logs; as we have seen, it was around 9 to 10 seconds on average. The relative
00:15:23 speeds, however, are probably accurate. So, this is the table. Just pause the video and look at it carefully if
00:15:30 you wish. Now, this is the part where I explain how to install SwarmUI and make it compatible with the RTX 5000 series.
00:15:40 For installation, there is a post on this GitHub
00:15:48 repository that you can follow. However, I have also prepared a one-click installer. The one-click
00:15:55 installer is included in our easy, ultra-fast and robust unified SwarmUI model downloader. The
00:16:02 link is in the video description. In this post, just download the SwarmUI
00:16:08 model downloader version 20 zip file and extract it into any folder you wish. I will extract it into my E
00:16:16 drive. Once you extract the contents, you will see the SwarmUI installer: SwarmUI
00:16:24 Windows installer. This starts the official installation. Let me demonstrate quickly. So,
00:16:30 it is starting to install. It is pretty fast. This is the official installation, as
00:16:36 I said. You see, the installer has started at this URL. You can also see that
00:16:41 it has started here. Then agree, choose customize settings, modern dark, next,
00:16:49 just yourself, ComfyUI local. I am not going to download anything, and yes, I am sure I want to install.
00:16:56 Wait for the installation to complete. Meanwhile, you can download any models that you want.
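The video does not show how the downloader reaches these speeds, and the following is not its code. One common technique for multi-threaded downloads (the Torch upgrade step later mentions 8 threads) is to split a file into byte ranges and fetch them concurrently with HTTP `Range` requests. A standard-library sketch, assuming the server reports `Content-Length` and answers range requests with `206 Partial Content`; all function names are hypothetical:

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_bytes: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, total_bytes) into contiguous, inclusive (start, end) byte ranges."""
    if total_bytes <= 0 or parts <= 0:
        raise ValueError("total_bytes and parts must be positive")
    parts = min(parts, total_bytes)
    base, extra = divmod(total_bytes, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def range_header(start: int, end: int) -> dict[str, str]:
    """HTTP Range header for an inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}


def remote_size(url: str) -> int:
    """Ask the server for the file size with a HEAD request."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])


def fetch_range(url: str, dest: str, start: int, end: int, chunk: int = 1 << 20) -> None:
    """Download one byte range and write it at the matching offset in dest."""
    req = urllib.request.Request(url, headers=range_header(start, end))
    with urllib.request.urlopen(req) as resp, open(dest, "r+b") as f:
        if resp.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {resp.status}")
        f.seek(start)
        while block := resp.read(chunk):
            f.write(block)


def parallel_download(url: str, dest: str, threads: int = 8) -> None:
    """Pre-size dest, then fetch all ranges concurrently."""
    total = remote_size(url)
    with open(dest, "wb") as f:
        f.truncate(total)  # allocate the full file so each thread can seek into it
    ranges = split_ranges(total, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() forces evaluation so any worker exception is re-raised here
        list(pool.map(lambda r: fetch_range(url, dest, *r), ranges))


if __name__ == "__main__":
    print(split_ranges(10, 3))
```

For example, `split_ranges(10, 3)` yields `[(0, 3), (4, 6), (7, 9)]`. Streaming each range in 1 MiB chunks keeps memory flat even for multi-gigabyte model files; a production downloader would also add retries and resume support.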
00:17:02 We have a very fast downloader. You see the Windows download models.bat file. Double-click it and
00:17:08 choose run anyway. It can download all of these models. Just read here. You see FP16 FLUX
00:17:18 Dev, FP8 FLUX Dev, FP16 Stable Diffusion 3.5 Large, FP8 scaled Stable Diffusion 3.5 Large,
00:17:27 T5 XXL FP16 version, FP8 scaled T5 version (which works better), Stable Diffusion 3.5 Medium,
00:17:38 FP8 scaled Stable Diffusion 3.5 Medium. If it is scaled, it means that its weights are optimized.
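The transcript only says that scaled weights are "optimized". A common approach, and only an assumption about these particular files, is to store weights in float8 E4M3 together with a scale factor chosen so that the largest weight maps to E4M3's maximum of 448, instead of letting small weights underflow. The pure-Python sketch below simulates E4M3 rounding with a single per-tensor scale; real checkpoints may use a different scaling granularity:

```python
import math

E4M3_MAX = 448.0    # largest finite float8 E4M3 (fn variant) value
E4M3_MIN_EXP = -6   # smallest normal exponent; below it values are subnormal
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest float8 E4M3 value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    if math.isinf(x):
        return math.copysign(E4M3_MAX, x)
    mag = abs(x)
    # Subnormals share the spacing of the smallest normal binade.
    exp = max(math.floor(math.log2(mag)), E4M3_MIN_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # gap between representable values here
    q = round(mag / step) * step         # Python's round() is round-half-to-even
    return math.copysign(min(q, E4M3_MAX), x)


def quantize_per_tensor(weights: list[float]) -> tuple[list[float], float]:
    """Scaled FP8: one scale so the largest |w| lands on E4M3_MAX."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(codes: list[float], scale: float) -> list[float]:
    """Recover approximate weights from FP8 codes and the stored scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.001, -0.0005, 0.0002]  # small-magnitude weights
    plain = [round_to_e4m3(v) for v in w]
    codes, scale = quantize_per_tensor(w)
    print("plain cast :", plain)
    print("scaled fp8 :", dequantize(codes, scale))
```

With a plain cast, the two smallest weights fall below half of E4M3's smallest subnormal step and become zero; with the per-tensor scale, all three survive with a relative error of a few percent at most.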
00:17:45 So, it is still FP8, but it is better than plain FP8. We have the segmentation
00:17:53 models, the Genmo Mochi 1 text-to-video model (which still works), and FLUX Redux,
00:18:00 Depth, Canny, and the FLUX Fill model for inpainting and outpainting. We have the newest, improved
00:18:06 CLIP Large model; in all my experimentation so far, it works better than the default CLIP
00:18:12 Large model. We have the very best deterministic upscale models, which I recommend. So, you can
00:18:19 download any of them: just type whichever ones you want like this, and it will download all of
00:18:24 them with amazing speed. Let me demonstrate. The first file is a small one, so it is slower,
00:18:31 but it becomes very fast when downloading a bigger file. Yes, it is downloading
00:18:36 at 100, 200 megabytes per second. This depends on your internet speed,
00:18:43 of course, but it is optimized to use your entire network bandwidth: 200 megabytes per second,
00:18:50 300 megabytes per second. You can use this downloader to download anything you wish. Okay,
00:18:56 let's check our installation. I think it has completed. Yes, SwarmUI has loaded. Now,
00:19:03 I am going to upgrade the ComfyUI backend automatically so that
00:19:10 it will work with the RTX 5090 and the rest of the RTX 50 series, such as the RTX 5080, 5070 and 5060. Currently,
00:19:21 they are not natively supported, so you need to do this. So, let's close this. Then double-click
00:19:27 SwarmUI upgrade Torch. This downloads the pre-made ComfyUI backend into the correct
00:19:35 folder. You see, it is downloading with 8 threads, so it is super fast. Then it will delete the older
00:19:42 one. Be careful with that, because if you have other extensions, installations or other files
00:19:48 in the ComfyUI backend, they will all be deleted; nothing outside it is touched. Alternatively, you can make a new
00:19:55 fresh installation and install it there. So, which folder is this going to delete? It is going to
00:20:00 delete the dlbackend > ComfyUI > Python embedded folder. You see,
00:20:08 it has deleted it; next it will extract the new content, and it will not do anything else. So,
00:20:14 whatever is installed inside this Python embedded folder will be deleted and replaced
00:20:19 with the pre-made version by the ComfyUI developers themselves, and SwarmUI will start supporting the RTX 5000
00:20:28 series. That is all for today. Hopefully, I will be making many more AI-related and other test
00:20:36 videos for the RTX 5090. Keep watching; they will be in this playlist: RTX 5090 benchmarks,
00:20:43 tests and experiments. So, stay subscribed, leave a reply, and tell me which videos and which
00:20:50 AI models you would like to see, and I will try to make them. See you later. Thank you.
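The prompt-change test earlier in the video attributes the roughly 30% slowdown to re-running the T5 XXL text encoder. The toy sketch below shows why an unchanged prompt avoids that cost, assuming the backend reuses the encoded conditioning when the prompt text is identical (ComfyUI, as far as we know, only re-executes graph nodes whose inputs changed, which has a similar effect). `encode_prompt` and `generate` are hypothetical stand-ins, not SwarmUI or ComfyUI code:

```python
from functools import lru_cache


@lru_cache(maxsize=16)
def encode_prompt(prompt: str) -> int:
    """Stand-in for a T5 XXL forward pass; cached by exact prompt text."""
    return hash(prompt)  # placeholder for the real conditioning tensor


def generate(prompt: str, seed: int, steps: int = 20) -> tuple[int, int, int]:
    """Stand-in for one image generation: encode (cached), then sample."""
    conditioning = encode_prompt(prompt)
    # ... the diffusion sampling loop would run here for `steps` steps ...
    return conditioning, seed, steps


if __name__ == "__main__":
    for seed in range(4):
        generate("a red sports car", seed)  # same prompt: encoder runs once
    generate("a green sports car", 0)       # new prompt: encoder runs again
    # cache misses = number of times the encoder actually ran
    print("encoder runs:", encode_prompt.cache_info().misses)
```

Run as a script, this prints `encoder runs: 2`: five images, but only two encoder passes, because only the second prompt is new.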