How does Stable Diffusion work?
Stable Diffusion works by repeatedly applying a denoising autoencoder guided with text input.

An autoencoder tries to compress an image into a small vector then decompress back to the original image. Textual guidance can be added to improve its efficiency, by specifying what content it should or should not generate. A denoising autoencoder has the primary aim of restoring an image combined with noise back to the form before adding noise.

When we apply such autoencoder repeatedly to any image, the output will become less noisy and more resembles the input text. With enough iterations, we can get a clean image even when we feed pure noise as input. This workflow of repeatedly applying an autoencoder, together with a properly trained autoencoder on wide variety of objects, forms a Stable Diffusion model.
Can you explain it in human words, not in that dork gibberish?
It takes the text you give it and turns it into an image.
Oh, okay. But I'd like a bit more detail, please. How does it actually function, in simple terms?