# ✨ LLM and Stable Diffusion Movie Generator ✨

```
@author:
- Github: https://github.com/StepanTita
- LinkedIn: https://www.linkedin.com/in/stepan-tita/
```

## 🚀 Project Description:
Embark on a journey of educational exploration with this small and captivating project on Diffusion models. This Colab Notebook serves as the MVP, enabling the generation of a compelling story using LLM and transforming it into an enchanting movie.

### 🤖 Models:
**LLMs:**
- [Mistral 7B](https://colab.research.google.com/drive/1x6Gsa_pgmq_g9GNo1BYI0v8DEX_IuYKm#scrollTo=WMDLxQ8UcbtM)

**Diffusion:**
- [Stable Diffusion XL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)

## 📚 Resources:
- [Stable Diffusion Dreaming Gist](https://gist.github.com/nateraw/c989468b74c616ebbc6474aa8cdd9e53)
- [Stable Diffusion using Hugging Face](https://towardsdatascience.com/stable-diffusion-using-hugging-face-501d8dbdd8)
- [YouTube: Stable Diffusion Explained](https://youtu.be/sFztPP9qPRc?si=7yobh5yIvKppqwqv)
- [YouTube: Stable Diffusion Paper Explained](https://youtu.be/HoKDTa5jHvg?si=EoTEzfRGDLEJEAwB)


In [None]:
! pip install ctransformers[cuda] torch torchvision diffusers["torch"]==0.24.0 transformers["torch"]==4.36.2 scipy ftfy accelerate xformers numba

Collecting ctransformers[cuda]
  Downloading ctransformers-0.2.27-py3-none-any.whl (9.9 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.9/9.9 MB[0m [31m10.8 MB/s[0m eta [36m0:00:00[0m
Collecting diffusers[torch]==0.24.0
  Downloading diffusers-0.24.0-py3-none-any.whl (1.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m27.6 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting transformers[torch]==4.36.2
  Downloading transformers-4.36.2-py3-none-any.whl (8.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m8.2/8.2 MB[0m [31m47.3 MB/s[0m eta [36m0:00:00[0m
Collecting ftfy
  Downloading ftfy-6.1.3-py3-none-any.whl (53 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m53.4/53.4 kB[0m [31m6.7 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting accelerate
  Downloading accelerate-0.25.0-py3-none-any.whl (265 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m265.7/265.7 kB[0

In [None]:
! apt install ffmpeg

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 24 not upgraded.


In [None]:
import os

import numpy as np

import matplotlib.pyplot as plt

import torch
import torch.nn.functional as F

from diffusers import EulerDiscreteScheduler, StableDiffusionXLPipeline

from tqdm import tqdm
from PIL import Image

from ctransformers import AutoModelForCausalLM

import re

from numba import cuda

In [None]:
llm = AutoModelForCausalLM.from_pretrained(
    'TheBloke/Mistral-7B-OpenOrca-GGUF',
    model_file='mistral-7b-openorca.Q4_K_M.gguf',
    model_type='mistral',
    gpu_layers=32
)
llm

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

<ctransformers.llm.LLM at 0x7ce10f88b880>

### Details of the story

In [None]:
details = 'The story happens in the world of God Of War, where old Kratos gives his final battle with the Egypt Gods\' pantheon' # @param {type: "string"}

In [None]:
prompt = f'''
<|im_start|>user
Create a small story in the following format:
- break it down into scenes and name each scene, so that each scene is just a single sentence (or couple sentences) describing what is going on in the scene
- next scene should naturally follow the previous scene
- try to include as much details about how the scene looks like as possible
- do not make the characters talk, only narrate through the scenes view

In your story, incorporate the following details:
{details}
<|im_end|>
<|im_start|>system
'''

In [None]:
reply = llm(
    prompt,
    temperature=0.8
)
reply

" Scene 1: The Calm Before the Storm\nIn the peaceful village of Jotunheim, an elderly Kratos lives a quiet life with his son Atreus. They enjoy their days fishing and hunting together, far from the chaos of the Gods' wars.\n\nScene 2: A Vision of Egypt\nOne night, Kratos has a vivid dream of the Egyptian pantheon planning to unleash a terrible power upon the world. He wakes up with a heavy heart, knowing he must face his past once more.\n\nScene 3: The Journey Begins\nKratos and Atreus set out on a journey through the treacherous realms of the Egyptian pantheon in search of ancient secrets and power that might give them the advantage to combat their dark forces.\n\nScene 4: Encountering the Gods\nAs they venture deeper into the realm, Kratos and Atreus encounter various gods who have been corrupted by the darkness. They must use their wits and strength to overcome these challenges.\n\nScene 5: The Final Battle\nKratos and Atreus finally confront the leaders of the Egyptian pantheon in

In [None]:
def preprocess(sent):
    return re.sub('Scene \d: ', '', sent.strip())

In [None]:
scenes = list(map(preprocess, reply.split('\n\n')))
scenes

["The Calm Before the Storm\nIn the peaceful village of Jotunheim, an elderly Kratos lives a quiet life with his son Atreus. They enjoy their days fishing and hunting together, far from the chaos of the Gods' wars.",
 'A Vision of Egypt\nOne night, Kratos has a vivid dream of the Egyptian pantheon planning to unleash a terrible power upon the world. He wakes up with a heavy heart, knowing he must face his past once more.',
 'The Journey Begins\nKratos and Atreus set out on a journey through the treacherous realms of the Egyptian pantheon in search of ancient secrets and power that might give them the advantage to combat their dark forces.',
 'Encountering the Gods\nAs they venture deeper into the realm, Kratos and Atreus encounter various gods who have been corrupted by the darkness. They must use their wits and strength to overcome these challenges.',
 'The Final Battle\nKratos and Atreus finally confront the leaders of the Egyptian pantheon in a climactic battle']

In [None]:
len(scenes)

5

In [None]:
torch.cuda.empty_cache()

In [None]:
# device = cuda.get_current_device()
# device.reset()

In [None]:
device_id = 0

In [None]:
device = torch.device(f'cuda:{device_id}' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda', index=0)

In [None]:
config = {
    'stable_diffusion_checkpoint': 'stabilityai/stable-diffusion-xl-base-1.0',
    # 'refiner_checkpoint': 'stabilityai/stable-diffusion-xl-refiner-1.0',

    'negative_prompt': 'multiple images, static, painting, illustration, sd character, low quality, low resolution, greyscale, monochrome, nose, cropped, lowres, jpeg artifacts, deformed iris, deformed pupils, bad eyes, semi-realistic worst quality, bad lips, deformed mouth, deformed face, deformed fingers, deformed toes standing still, posing',

    'seed': 255,

    'high_noise_fraction': 0,

    'scheduler_kwargs': {
        "beta_end": 0.012,
        "beta_schedule": "scaled_linear", # one of ["linear", "scaled_linear"]
        "beta_start": 0.00085,
        "interpolation_type": "linear", # one of ["linear", "log_linear"]
        "num_train_timesteps": 1000,
        "prediction_type": "epsilon", # one of ["epsilon", "sample", "v_prediction"]
        "steps_offset": 1,
        "timestep_spacing": "leading", # one of ["linspace", "leading"]
        "trained_betas": None,
        "use_karras_sigmas": False,
    }
}

In [None]:
if config['seed'] is not None:
    generator = [torch.Generator(device=device).manual_seed(config['seed'])]
else:
    generator = [torch.Generator(device=device)]

In [None]:
sdpipe = StableDiffusionXLPipeline.from_pretrained(
    config['stable_diffusion_checkpoint'],
    torch_dtype=torch.float16,
    variant='fp16',
    use_safetensors=True,
    scheduler=EulerDiscreteScheduler(**config['scheduler_kwargs']),
)

sdpipe.to(device)

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

StableDiffusionXLPipeline {
  "_class_name": "StableDiffusionXLPipeline",
  "_diffusers_version": "0.24.0",
  "_name_or_path": "stabilityai/stable-diffusion-xl-base-1.0",
  "feature_extractor": [
    null,
    null
  ],
  "force_zeros_for_empty_prompt": true,
  "image_encoder": [
    null,
    null
  ],
  "scheduler": [
    "diffusers",
    "EulerDiscreteScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "text_encoder_2": [
    "transformers",
    "CLIPTextModelWithProjection"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "tokenizer_2": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}

In [None]:
def slerp(t, v0, v1, DOT_THRESHOLD=0.9995, eps=1e-9):
    """ helper function to spherically interpolate two arrays v1 v2 """

    if not isinstance(v0, np.ndarray):
        inputs_are_torch = True
        input_device = v0.device
        v0 = v0.cpu().numpy().astype(float)
        v1 = v1.cpu().numpy().astype(float)

    dot = np.sum(v0 * v1 / (np.linalg.norm(v0) * np.linalg.norm(v1)))
    if np.abs(dot) > DOT_THRESHOLD:
        v2 = (1 - t) * v0 + t * v1
    else:
        theta_0 = np.arccos(dot)
        sin_theta_0 = np.sin(theta_0)
        theta_t = theta_0 * t
        sin_theta_t = np.sin(theta_t)
        s0 = np.sin(theta_0 - theta_t) / sin_theta_0
        s1 = sin_theta_t / sin_theta_0
        v2 = s0 * v0 + s1 * v1

    if inputs_are_torch:
        v2 = torch.from_numpy(v2).to(input_device)

    return v2.half()

In [None]:
def stable_diffusion_dream(
        prompts, # prompts to dream about
        negative_prompts,
        seeds=[243, 523],
        name = 'dreams', # name of this project, for the output directory
        rootdir = './dreams',
        num_steps = 72,  # number of steps between each pair of sampled points
        # --------------------------------------
        # args you probably don't want to change
        num_inference_steps = 50,
        guidance_scale = 7.5,
        width = 1024,
        height = 1024,
        show_int=False,
        config=None,
        # --------------------------------------
):
    assert len(prompts) == len(seeds)
    assert torch.cuda.is_available()
    assert height % 8 == 0 and width % 8 == 0

    # init the output dir
    outdir = os.path.join(rootdir, name)
    os.makedirs(outdir, exist_ok=True)


    # get the conditional text embeddings based on the prompts
    prompt_embeddings = []
    for prompt, neg_prompt in zip(prompts, negative_prompts):
        with torch.no_grad():
            embed = sdpipe.encode_prompt(prompt=prompt, negative_prompt=neg_prompt)

        prompt_embeddings.append(embed)

    # Take first embed and set it as starting point, leaving rest as list we'll loop over.
    prompt_embedding_a, *prompt_embeddings = prompt_embeddings

    # Take first seed and use it to generate init noise
    init_seed, *seeds = seeds
    init_a = torch.randn(
        (1, sdpipe.unet.in_channels, height // 8, width // 8),
        device=device,
        generator=torch.Generator(device=device).manual_seed(init_seed)
    )

    sdpipe.scheduler.set_timesteps(num_inference_steps)

    frame_index = 0
    for p, prompt_embedding_b in enumerate(prompt_embeddings):

        init_b = torch.randn(
            (1, sdpipe.unet.in_channels, height // 8, width // 8),
            generator=torch.Generator(device='cuda').manual_seed(seeds[p]),
            device=device
        )

        for i, t in enumerate(tqdm(np.linspace(0, 1, num_steps))):
            torch.cuda.empty_cache()

            if not os.path.exists(f'{outdir}/frames'):
                os.mkdir(f'{outdir}/frames')

            cond_embedding = slerp(float(t), prompt_embedding_a[0], prompt_embedding_b[0])
            pool_cond_embedding = slerp(float(t), prompt_embedding_a[2], prompt_embedding_b[2])
            init = slerp(float(t), init_a, init_b)

            image = sdpipe(
                latents=init,
                prompt_embeds=cond_embedding,
                pooled_prompt_embeds=pool_cond_embedding,

                negative_prompt_embeds=prompt_embedding_b[1],
                negative_pooled_prompt_embeds=prompt_embedding_b[3],

                guidance_scale=guidance_scale,
                num_inference_steps=num_inference_steps,

                denoising_end=config['high_noise_fraction'],
                generator=generator,
            ).images[0]

            outpath = os.path.join(outdir, f'frames/{frame_index:04}.jpeg')
            image.save(outpath)
            frame_index += 1

            if show_int:
                plt.figure(figsize=(8, 8))
                plt.imshow(image)
                plt.axis('off')
                plt.show()

        prompt_embedding_a = prompt_embedding_b
        init_a = init_b

### Configuration:
* **NAME** - movie name (no spaces)
* **num_inference_steps** - number of steps for the model to gradual denoise process (recommended >50)
* **num_steps** - number of transition steps between scenes in the movie. For the smooth transition recommended at least 50, but big values might take much longer to generate a movie (this is the number of intermediate images between initial scene and final scene). Transition happens between each pair of scenes, so for $n$ scenes this means $n-1$ transitions, so $n*(n-1)$ total images.
* **guidance_scale** - float number to control how much an image will condition on the text of the scene. If the number is closer to 0, then the model will almost be ignoring the condition. If the value is higher, then the model will follow more literal interpretation
* **FPS** (frames per second) - for a somewhat smooth movie recommended to have at least 10

In [26]:
NAME = 'movie' # @param {type: "string"}
num_inference_steps = 50 # @param {type: "number"}
nums_steps = 5 # @param {type: "number"}
guidance_scale = 7.5 # @param {type: "number"}
FPS = 10 # @param {type: "number"}

In [None]:
stable_diffusion_dream(
    prompts=scenes,
    negative_prompts=[config['negative_prompt']] * len(scenes),
    name='movie',
    guidance_scale=guidance_scale,
    show_int=False,
    seeds=[i for i in range(len(scenes))],
    num_steps=nums_steps,
    width=1024,
    height=1024,
    num_inference_steps=num_inference_steps,
    config=config
)

  (1, sdpipe.unet.in_channels, height // 8, width // 8),
  (1, sdpipe.unet.in_channels, height // 8, width // 8),
  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

 20%|██        | 1/5 [00:51<03:26, 51.55s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

 40%|████      | 2/5 [01:39<02:27, 49.27s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

 60%|██████    | 3/5 [02:27<01:37, 48.71s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

 80%|████████  | 4/5 [03:15<00:48, 48.44s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

100%|██████████| 5/5 [04:03<00:00, 48.67s/it]
  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

 20%|██        | 1/5 [00:47<03:10, 47.72s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

 40%|████      | 2/5 [01:35<02:23, 47.74s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

 60%|██████    | 3/5 [02:23<01:35, 47.80s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

 80%|████████  | 4/5 [03:11<00:47, 47.87s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

100%|██████████| 5/5 [03:59<00:00, 47.86s/it]
  0%|          | 0/5 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

 20%|██        | 1/5 [00:48<03:12, 48.02s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

 40%|████      | 2/5 [01:36<02:24, 48.03s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

 60%|██████    | 3/5 [02:24<01:36, 48.02s/it]

  0%|          | 0/50 [00:00<?, ?it/s]

In [None]:
LOC = f'dreams/{NAME}'

In [None]:
! ffmpeg -r $FPS -f image2 -s 1024x1024 -i $LOC/frames/%04d.jpeg -vcodec libx264 -crf 10 -pix_fmt yuv420p $NAME\.mp4

In [None]:
from IPython.display import HTML
from base64 import b64encode

mp4 = open(f'{NAME}.mp4','rb').read()

data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

In [None]:
HTML(f"""
<video width=400 controls>
      <source src="{data_url}" type="video/mp4">
</video>
""")