**Aeronautics Institute of Technology – ITA**

**Computer Vision – CM-203**

**Professors:** 

Marcos Ricardo Omena de Albuquerque Maximo

Gabriel Adriano de Melo


**Instructions:**

Before submitting your lab, make sure that everything runs correctly in sequence: first, **restart the kernel** (`Runtime->Restart Runtime` in Colab or `Kernel->Restart` in Jupyter). Then, execute all cells (`Runtime->Run All` in Colab or `Cell->Run All` in Jupyter) and verify that all cells run without any errors, especially the automatic grading ones, i.e., the ones with `assert`s.

**Do not delete the answer cells**, i.e., the ones that contain `WRITE YOUR CODE HERE` or `WRITE YOUR ANSWER HERE`, because they contain metadata with the ids of the cells for the grading system. For the same reason, **do not delete the test cells**, i.e., the ones with `assert`s. The autograding system executes all the code sequentially, adding extra tests in the test cells. There is no problem in creating new cells, as long as you do not delete answer or test cells. Moreover, keep your solutions within the reserved spaces.

The notebooks are implemented to be compatible with Google Colab, and they install the dependencies and download the datasets automatically. The commands which start with ! (exclamation mark) are bash commands and can be executed in a Linux terminal.

---

# Visual Attention Laboratory

This lab is a simplified implementation and verification of a convolutional attention model, a Transformer head, and inference by diffusion with a generative model. It lets you exercise the main concepts of the attention lecture.

You will need a GPU with CUDA to run Stable Diffusion (one is available in Colab). To use the GPU in Colab, go to `Edit > Notebook settings` or `Runtime > Change runtime type` and select `GPU` as the hardware accelerator. For automatic evaluation, we only execute the `assert`s, so you don't need to worry if you can't run this on a GPU in Colab.
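
Before running the Stable Diffusion cells, it may help to confirm that PyTorch can actually see a GPU. The snippet below is a minimal sketch; the `device` variable is only illustrative (the helper functions in this notebook hard-code `"cuda"`).

```python
import torch

# Pick the GPU when CUDA is visible to PyTorch, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")

if device == "cuda":
    # Name of the first visible GPU (index 0).
    print(torch.cuda.get_device_name(0))
```

If this prints `cpu` in Colab, the runtime type has probably not been switched to `GPU` yet.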

This lab was originally created by Gabriel Melo. Marcos Maximo translated it to English and made some minor improvements.

In [None]:
!pip install -Uq diffusers==0.7.2 transformers==4.24.0 fastcore==1.5.29 # opencv-contrib-python==4.6.0.66 torch==1.12.1
# Google Drive blocked the download of a large file, but you can download it manually and move it to the folder if you prefer
# You can later define local_files_only=True in model.from_pretrained
#! [ ! -d ~/.cache/huggingface ] && mkdir -p ~/.cache/huggingface && gdown -O ~/.cache/models.tar 1S2cvcS-XlNZFam5kLSwfyWFcViA-on80 && tar -xf ~/.cache/models.tar -C ~/.cache/ && rm ~/.cache/models.tar

import cv2
import numpy as np
from PIL import Image
from tqdm import tqdm
from collections import OrderedDict
from matplotlib import pyplot as plt

import torch
import torch.nn as nn
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, LMSDiscreteScheduler, DDPMScheduler
import logging
import warnings
hf_token = '' # PUT YOUR Hugging Face token HERE to download the models
execute_transformer = False
logging.disable(logging.WARNING)
warnings.filterwarnings("ignore")

## Convolutional Attention

For images, we have the so-called spatial attention. This is used in many state-of-the-art convolutional models:

$\textbf{M}_{s}\left(F\right) = \sigma\left(f^{7 \times 7}\left(\left[\text{AvgPool}\left(F\right);\text{MaxPool}\left(F\right)\right]\right)\right)$

$\textbf{M}_{s}\left(F\right) = \sigma\left(f^{7 \times 7}\left(\left[\mathbf{F}^{s}_{avg};\mathbf{F}^{s}_{max} \right]\right)\right)$

![Spatial Attention Module](https://production-media.paperswithcode.com/methods/Screen_Shot_2020-06-25_at_1.27.27_PM_CjrAZaI.png)

Implement the function below, which is related to the spatial attention module (SAM) of the CBAM model from [Woo et al. (2018)](https://arxiv.org/abs/1807.06521). (3 points)

Use the following operations:
- `torch.cat`: concatenate two or more tensors;
- `torch.max`: compute the maximum value across a dimension;
- `torch.mean`: compute the average across a dimension;
- `torch.sigmoid`: apply the sigmoid activation function to each element of the tensor (element-wise).

Check PyTorch's documentation for each function so you better understand how to use them.
Note that `torch.max` and `torch.mean` remove the reduced dimension by default, which changes the shape of the tensor (and `torch.max` over a dimension returns both the values and their indices). To correct the shape, you can use `.unsqueeze`/`.view`/`.reshape`/`tensor[:, None, :, :]`, or you can set the parameter `keepdim` to `True` to keep the dimension.
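
As a small illustration of the shape behavior described above, the sketch below reduces a toy tensor over its channel dimension with and without `keepdim`. The tensor `x` and the variable names are made up for this example; it only shows shapes and is not the attention module itself.

```python
import torch

# Toy tensor of shape (N, C, H, W) = (1, 3, 2, 4).
x = torch.arange(24, dtype=torch.float32).reshape(1, 3, 2, 4)

# Without keepdim, the reduced dimension (dim=1) disappears.
mean_drop = torch.mean(x, dim=1)
print(mean_drop.shape)  # torch.Size([1, 2, 4])

# With keepdim=True, the reduced dimension is kept with size 1.
mean_keep = torch.mean(x, dim=1, keepdim=True)
print(mean_keep.shape)  # torch.Size([1, 1, 2, 4])

# torch.max over a dimension returns a (values, indices) pair.
max_keep, max_idx = torch.max(x, dim=1, keepdim=True)
print(max_keep.shape)  # torch.Size([1, 1, 2, 4])

# Tensors that agree on every other dimension can be concatenated on dim=1.
stacked = torch.cat([mean_keep, max_keep], dim=1)
print(stacked.shape)  # torch.Size([1, 2, 2, 4])
```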

To multiply two tensors element-wise, use `*` or `torch.multiply`.

To execute the convolutional operation, use it as a function call, i.e. as `operation(input)`. 
<details><summary><b>---Hint---</b></summary>
<p>
In SAM, the average and maximum values are computed across the channels. In a tensor of shape (N, C, H, W), the channels are at the dimension 1 (C).
</p>
</details>

In [None]:
def spatial_attention(convolutional_operation: nn.Module, features: torch.tensor) -> torch.tensor:
    """
    Implements the spatial attention module (SAM) by applying an attention map to the
    features in a multiplicative form. Note: concatenates the Avg before the Max.
    
    :param convolutional_operation: convolutional operation (using a filter 𝑓) which 
                                         receives an image with two channels as input
                                         and returns an image with a single channel.
    :param features: tensor of shape (N, C, H, W), where N is the batch size,
                             C is the number of channels, H is the height, and W is the width.
    :return: tensor of shape (N, C, H, W), the resulting tensor after the spatial attention operation.
    """
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return result

In [None]:
conv = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=3, stride=1, padding=1)
conv.load_state_dict(OrderedDict([('weight', torch.tensor([[[ [-0.0165, -0.0119, -0.1987],
                                                              [ 0.1247, -0.0875,  0.0907],
                                                              [-0.1658,  0.2204,  0.1959]],
                                                             [[-0.0925, -0.1122, -0.2284],
                                                              [ 0.0141,  0.0105, -0.0061],
                                                              [ 0.0526,  0.1581, -0.1757]] ]])),
                                  ('bias', torch.tensor([0.0675]))]))
features = torch.tensor([[[[-0.0165, -0.0119, -0.1987, 1, 1, 1, 1],
                           [ 0.1247, -0.0875,  0.0907, 1, 1, 1, 1],
                           [ 0.1247, -0.0875,  0.0907, 1, 1, 1, 1],
                           [ 0.1247, -0.0875,  0.0907, 1, 1, 1, 1],
                           [-0.1658,  0.2204,  0.1959, 1, 1, 1, 1]],
                          [[-0.0165, -0.0119, -0.1987, 1, 1, 1, 1],
                           [ 0.1247, -0.0875,  0.0907, 1, 1, 1, 1],
                           [ 0.1247, -0.0875,  0.0907, 1, 1, 1, 1],
                           [ 0.1247, -0.0875,  0.0907, 1, 1, 1, 1],
                           [-0.1658,  0.2204,  0.1959, 1, 1, 1, 1]],
                          [[-0.0165, -0.0119, -0.1987, 1, 1, 1, 1],
                           [ 0.1247, -0.0875,  0.0907, 1, 1, 1, 1],
                           [ 0.1247, -0.0875,  0.0907, 1, 1, 1, 1],
                           [ 0.1247, -0.0875,  0.0907, 1, 1, 1, 1],
                           [-0.1658,  0.2204,  0.1959, 1, 1, 1, 1]]
                        ]])
result = spatial_attention(conv, features)
assert result.requires_grad
assert torch.norm(result - torch.tensor(
       [[[[-0.0087, -0.0060, -0.1107,  0.6073,  0.6223,  0.6223,  0.5974],
          [ 0.0656, -0.0469,  0.0407,  0.4866,  0.4599,  0.4599,  0.5403],
          [ 0.0660, -0.0440,  0.0401,  0.4788,  0.4599,  0.4599,  0.5403],
          [ 0.0628, -0.0473,  0.0402,  0.4758,  0.4599,  0.4599,  0.5403],
          [-0.0879,  0.1103,  0.0852,  0.3873,  0.3902,  0.3902,  0.4741]],

         [[-0.0087, -0.0060, -0.1107,  0.6073,  0.6223,  0.6223,  0.5974],
          [ 0.0656, -0.0469,  0.0407,  0.4866,  0.4599,  0.4599,  0.5403],
          [ 0.0660, -0.0440,  0.0401,  0.4788,  0.4599,  0.4599,  0.5403],
          [ 0.0628, -0.0473,  0.0402,  0.4758,  0.4599,  0.4599,  0.5403],
          [-0.0879,  0.1103,  0.0852,  0.3873,  0.3902,  0.3902,  0.4741]],

         [[-0.0087, -0.0060, -0.1107,  0.6073,  0.6223,  0.6223,  0.5974],
          [ 0.0656, -0.0469,  0.0407,  0.4866,  0.4599,  0.4599,  0.5403],
          [ 0.0660, -0.0440,  0.0401,  0.4788,  0.4599,  0.4599,  0.5403],
          [ 0.0628, -0.0473,  0.0402,  0.4758,  0.4599,  0.4599,  0.5403],
          [-0.0879,  0.1103,  0.0852,  0.3873,  0.3902,  0.3902,  0.4741]]]])).item() < 1e-3

## Transformers

The Transformer is an attention model initially proposed for Natural Language Processing (NLP) to allow highly parallelizable operations and facilitate training on long sequences. It has already been adapted to Computer Vision through the Vision Transformer (ViT), which essentially treats the image as a sequence of patches.
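
To make the "sequence of patches" idea concrete, here is a minimal sketch that cuts a batch of images into non-overlapping square patches and flattens each one into a vector. The helper `image_to_patches` is hypothetical (it is not part of this lab or of any library), and it assumes that the height and width are divisible by the patch size.

```python
import torch

def image_to_patches(images: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Hypothetical helper: (N, C, H, W) -> (N, num_patches, C * P * P)."""
    n, c, h, w = images.shape
    p = patch_size
    # Split the height and width into (blocks, pixels inside each block).
    x = images.reshape(n, c, h // p, p, w // p, p)
    # Bring the two block indices forward: (N, H/P, W/P, C, P, P).
    x = x.permute(0, 2, 4, 1, 3, 5)
    # Flatten the block grid into a sequence and each patch into a vector.
    return x.reshape(n, (h // p) * (w // p), c * p * p)

images = torch.arange(2 * 3 * 4 * 4, dtype=torch.float32).reshape(2, 3, 4, 4)
patches = image_to_patches(images, patch_size=2)
print(patches.shape)  # torch.Size([2, 4, 12])
```

Each row of `patches` then plays the role that a token embedding plays in NLP, with the sequence length equal to the number of patches.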

![Transformer Architecture](https://lilianweng.github.io/posts/2018-06-24-attention/transformer.png)

The Transformer was initially proposed by [Vaswani et al. (2017)](https://arxiv.org/abs/1706.03762v5), following the equations below:

$\text{Attention}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \text{softmax}\left(\frac{\mathbf{Q}\mathbf{K}^\top}{\sqrt{d_k}}\right)\mathbf{V}$

Here $d_k$ is the dimension of the projected queries and keys.

$\begin{aligned}
\text{MultiHead}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) &= [\text{head}_1; \dots; \text{head}_h]\mathbf{W}^O \\
\text{where head}_i &= \text{Attention}(\mathbf{Q}\mathbf{W}^Q_i, \mathbf{K}\mathbf{W}^K_i, \mathbf{V}\mathbf{W}^V_i)
\end{aligned}$

Implement the attention head of the Transformer. (4 points)

Use the method `.transpose(dim1, dim2)` to swap the dimensions `dim1` and `dim2` of a tensor. To do a regular matrix multiplication across the last two dimensions of a tensor, use the operator `@` or the function `torch.matmul`. To obtain the dimensions of a tensor, use `.shape` or `.size()`.
<details><summary><b>---Hint---</b></summary>
<p>
- To use the projection operations, use them as functions, similarly to what you have done in the previous exercise.
- Use `tensor.transpose(-1, -2)` to transpose the last two dimensions of a tensor.
</p>
</details>
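
The sketch below illustrates how these operations behave on 4-dimensional tensors: `@` multiplies the last two dimensions and treats the leading ones as batch dimensions. The tensors `a` and `b` and their sizes are arbitrary values chosen for this example, not the lab's inputs.

```python
import torch

torch.manual_seed(0)
n, h, l, d_k = 2, 3, 5, 4          # arbitrary illustrative sizes
a = torch.randn(n, h, l, d_k)
b = torch.randn(n, h, l, d_k)

# Swap the last two dimensions: (N, H, L, D_k) -> (N, H, D_k, L).
b_t = b.transpose(-1, -2)
print(b_t.shape)     # torch.Size([2, 3, 4, 5])

# Batched matrix product over the last two dimensions: (N, H, L, L).
scores = a @ b_t
print(scores.shape)  # torch.Size([2, 3, 5, 5])

# A softmax over the last dimension turns each row into weights summing to 1.
weights = torch.softmax(scores, dim=-1)
row_sums = weights.sum(dim=-1)
print(row_sums.shape)  # torch.Size([2, 3, 5])
```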

In [None]:
def transformer_head(q: torch.tensor, k: torch.tensor, v: torch.tensor, 
                     proj_q: nn.Module, proj_k: nn.Module, proj_v: nn.Module, 
                     reproj: nn.Module, softmax: nn.Module):
    """
    Implements the head of the Transformer architecture given by equation
    (softmax((q' @ k'.T) / sqrt(D_k)) @ v')' where ' represents the operation of linear
    projection for each tensor on its vector space.
    
    :param q: query tensor of shape (N, L, D).
    :param k: key tensor of shape (N, L, D).
    :param v: value tensor of shape (N, L, D).
    :param proj_q: operation that does a linear projection of a tensor (N, L, D) on 
                   H vector spaces of dimension D_k, resulting in a tensor (N, H, L, D_k).
    :param proj_k: operation that does a linear projection of a tensor (N, L, D) on
                   H vector spaces of dimension D_k, resulting in a tensor (N, H, L, D_k).
    :param proj_v: operation that does a linear projection of a tensor (N, L, D) on
                   H vector spaces of dimension D_v, resulting in a tensor (N, H, L, D_v).
    :param reproj: operation that concatenates the H heads of a tensor (N, H, L, D_v)
                   and applies a linear projection, resulting in a tensor (N, L, D).
    :param softmax: softmax operation over the tensor last dimension.
    :return: tensor of shape (N, L, D), final result of the attention module.
    """
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return result

class Projection(nn.Linear):
    def __init__(self, dim_in, num_projs, dim_out, **kwargs):
        super().__init__(dim_in, num_projs * dim_out, **kwargs)
        self.num_projs = num_projs
        
    def forward(self, x):
        x_flat = super().forward(x)
        return x_flat.reshape(*x_flat.shape[:2], self.num_projs, -1).transpose(1, 2)

class ReProjection(nn.Linear):
    def __init__(self, dim_in, num_projs, dim_out, **kwargs):
        super().__init__(dim_in * num_projs, dim_out, **kwargs)
        self.num_projs = num_projs
    
    def forward(self, x):
        x_flat = x.transpose(1, 2).contiguous()
        return super().forward(x_flat.reshape(*x_flat.shape[:2], -1))

In [None]:
softmax = nn.Softmax(dim=-1)
proj_q = Projection(4, 2, 3)
proj_k = Projection(4, 2, 3)
proj_v = Projection(4, 2, 5)
reproj = ReProjection(5, 2, 4)
proj_q.load_state_dict(OrderedDict([('weight', torch.tensor([[-0.2903,  0.1144, -0.2388,  0.3808],
                      [ 0.4571, -0.1722, -0.3059,  0.0529],
                      [-0.0764,  0.4670, -0.2218, -0.3888],
                      [ 0.4820, -0.0895,  0.0496, -0.4707],
                      [ 0.0030, -0.0348,  0.4132,  0.3539],
                      [-0.2953,  0.2528,  0.2744, -0.0833]])),
    ('bias', torch.tensor([-0.2345, -0.0744,  0.1075, -0.1458, -0.4157, -0.3114]))]))
proj_k.load_state_dict(OrderedDict([('weight', torch.tensor([[ 0.2761, -0.4321,  0.3839,  0.1454],
                      [-0.3980,  0.1969, -0.4166, -0.2317],
                      [-0.1690, -0.1395,  0.3167, -0.3027],
                      [-0.1422, -0.2583, -0.4430, -0.0448],
                      [ 0.4761,  0.1354,  0.1436,  0.3219],
                      [ 0.3956,  0.0885,  0.2519,  0.1227]])),
    ('bias', torch.tensor([0.4507, 0.3675, 0.3032, 0.3873, 0.4495, 0.1289]))]))
proj_v.load_state_dict(OrderedDict([('weight', torch.tensor([[ 0.2719, -0.0664,  0.3742,  0.3409],
                      [ 0.1177,  0.4588, -0.3498, -0.4507],
                      [-0.4851, -0.3594,  0.2124, -0.1817],
                      [-0.3162, -0.1503, -0.1955,  0.3816],
                      [ 0.0005, -0.0776, -0.4964, -0.1608],
                      [ 0.0581, -0.1783, -0.2951,  0.0964],
                      [-0.3771,  0.1194, -0.4692,  0.1051],
                      [-0.4773, -0.0826,  0.4722, -0.2247],
                      [-0.1419,  0.0064,  0.3859,  0.1678],
                      [ 0.2845, -0.4944,  0.4023, -0.2722]])),
('bias', torch.tensor([ 0.1750,  0.1422, -0.3162,  0.2938,  0.0050, -0.0249,  0.2706, -0.2545, 0.0081, -0.2179]))]))
reproj.load_state_dict(OrderedDict([('weight',
    torch.tensor([[ 0.2090, -0.1364,  0.2534,  0.0466, -0.1310,  0.1216,  0.0299, -0.2586, 0.3095, -0.2708],
            [-0.2939,  0.2829, -0.0410,  0.2643,  0.0008, -0.3008,  0.2150,  0.0737, -0.0611, -0.0701],
            [ 0.2135, -0.1298, -0.3017,  0.2684, -0.0917, -0.1428,  0.1363, -0.1012, 0.2472,  0.2877],
            [ 0.1128,  0.0359, -0.1215,  0.2214,  0.2173, -0.1789,  0.1038,  0.0059, 0.2911, -0.0398]])),
             ('bias', torch.tensor([ 0.2898,  0.0237, -0.2518, -0.2610]))]))
sequence = torch.tensor([[
                   [ 0.1177,  0.4588, -0.3498, -0.4507],
                   [-0.0764,  0.4670, -0.2218, -0.3888],
                   [-0.3771,  0.1194, -0.4692,  0.1051],
                   [-0.1419,  0.0064,  0.3859,  0.1678],
                   [-0.3162, -0.1503, -0.1955,  0.3816],
                   ]])
assert torch.norm(transformer_head(sequence, sequence, sequence, proj_q, proj_k, proj_v, reproj, softmax) -\
    torch.tensor([[[ 0.3581,  0.3061, -0.1407, -0.0638],
                   [ 0.3595,  0.3037, -0.1386, -0.0639],
                   [ 0.3618,  0.3005, -0.1352, -0.0635],
                   [ 0.3583,  0.3030, -0.1361, -0.0625],
                   [ 0.3604,  0.3014, -0.1349, -0.0626]]])).item() < 1e-3

## Diffusion Model

![Generated Examples](https://techcrunch.com/wp-content/uploads/2022/08/53118410-9cce-468a-8bf6-1b8ce4dd1390_1600x925.webp?resize=1200,694)

The models stored in Google Drive were distributed by [Hugging Face](https://huggingface.co/), and the example below comes from [fast.ai](https://github.com/fastai/diffusion-nbs/).

In [None]:
def tokenize_encode_string(string_list, tokenizer, text_encoder):
    ml = tokenizer.model_max_length
    inp = tokenizer(string_list, padding="max_length", max_length=ml, truncation=True, return_tensors="pt")
    encoded = text_encoder(inp.input_ids.to("cuda"))[0].half()
    emp = tokenizer([""] * len(string_list), padding="max_length", max_length=ml, truncation=True, return_tensors="pt")
    empty = text_encoder(emp.input_ids.to("cuda"))[0].half()
    return torch.cat([empty, encoded])

def show_image(latent_space, vae, indice=0):
    with torch.no_grad():
        normalized_image = vae.decode(1 / 0.18215 * latent_space).sample[indice]
    image = (normalized_image/2+0.5).clamp(0,1).detach().cpu().permute(1, 2, 0).numpy()
    return Image.fromarray((image*255).round().astype("uint8"))

def diffusion_loop(encoded_text, unet, scheduler, steps=70, g=7.5, width=512, height=512, seed=42):
    torch.manual_seed(seed)
    scheduler.set_timesteps(steps)
    latent_space = torch.randn((encoded_text.shape[0]//2, unet.in_channels, height//8, width//8))
    latent_space = latent_space.to("cuda").half() * scheduler.init_noise_sigma
    for step in tqdm(scheduler.timesteps):
        inp = scheduler.scale_model_input(torch.cat([latent_space] * 2), step)
        with torch.no_grad():
            u, t = unet(inp, step, encoder_hidden_states=encoded_text).sample.chunk(2)
        noise = u + g*(t-u)
        latent_space = scheduler.step(noise, step, latent_space).prev_sample
    return latent_space

The functions above handle the tokenization and encoding of the textual inputs, the conversion from the latent space to RGB image pixels, and, mainly, the diffusion loop, in which the latent space is iteratively updated by removing the noise estimated by the model at each iteration.
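
Inside `diffusion_loop`, the line `noise = u + g*(t-u)` mixes the unconditional prediction `u` with the text-conditioned prediction `t`, a combination commonly called classifier-free guidance. The sketch below reproduces only that line on toy vectors; the function `guided_noise` and the values are illustrative, not part of the lab.

```python
import torch

def guided_noise(uncond: torch.Tensor, text: torch.Tensor, g: float) -> torch.Tensor:
    """Illustrative helper: move from the unconditional prediction toward the text one."""
    return uncond + g * (text - uncond)

uncond = torch.tensor([0.0, 1.0])  # toy unconditional noise prediction
text = torch.tensor([1.0, 3.0])    # toy text-conditioned noise prediction

print(guided_noise(uncond, text, 0.0))  # tensor([0., 1.])       -> ignores the text
print(guided_noise(uncond, text, 1.0))  # tensor([1., 3.])       -> pure text prediction
print(guided_noise(uncond, text, 7.5))  # tensor([ 7.5000, 16.0000]) -> extrapolates past it
```

With `g > 1`, the result is pushed beyond the text-conditioned prediction, which is why larger values of `g` make the image follow the prompt more strongly.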

Investigate the Stable Diffusion model by inputting your own string to guide the diffusion. We strongly suggest that you give an input in English, since the text encoder (CLIP) was mainly trained using English. (1 point)

In [None]:
def generate_textual_input() -> str:
    """
    Returns an input string for the diffusion model.
    
    :return: string to guide the diffusion.
    """
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return input_string

In [None]:
input_string = generate_textual_input()
assert input_string

Now, run Stable Diffusion with your input (set `execute_transformer` to `True` at the beginning of the notebook):

In [None]:
assert execute_transformer # This assert won't be evaluated
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14", use_auth_token=hf_token, torch_dtype=torch.float16)
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", use_auth_token=hf_token, torch_dtype=torch.float16).to("cuda")
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema", use_auth_token=hf_token, torch_dtype=torch.float16).to("cuda")
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=hf_token, subfolder="unet", torch_dtype=torch.float16).to("cuda")

In [None]:
assert execute_transformer # This assert won't be evaluated
scheduler = LMSDiscreteScheduler(beta_start=0.00085,beta_end=0.012,beta_schedule="scaled_linear",num_train_timesteps=1000)
encoded_text = tokenize_encode_string([input_string], tokenizer, text_encoder)
generated_images = diffusion_loop(encoded_text, unet, scheduler, steps=70, seed=42)
show_image(generated_images, vae)

Implement the simple update used by the diffusion loop below, where the original image is recovered by subtracting its respective noise, both weighted by factors. (2 points)

Since DDPM (Denoising Diffusion Probabilistic Models) is a simple scheduler, it needs more diffusion steps, each one with a smaller amplitude.
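
To see where the two factors come from, the NumPy sketch below rebuilds the linear beta schedule with the default values of `simple_diffusion_loop` and checks, on synthetic data, that the weighted subtraction recovers a known clean sample. It assumes the standard DDPM forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, which is not stated in this notebook; `x0`, `eps`, and `t` are made-up values.

```python
import numpy as np

# Same default schedule values as simple_diffusion_loop.
train_steps, beta_start, beta_end = 1000, 0.00085, 0.012
betas = np.linspace(beta_start, beta_end, train_steps)
alpha_prod = np.cumprod(1.0 - betas)  # cumulative product of the alphas
beta_prod = 1.0 - alpha_prod

# Factors passed to the simple update at each timestep.
noise_factor = np.sqrt(beta_prod / alpha_prod)
image_factor = 1.0 / np.sqrt(alpha_prod)

# Synthetic check (assumed forward process): noise a known sample, then undo it.
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(1, 4, 8, 8))
eps = rng.standard_normal(x0.shape)
t = 500
x_t = np.sqrt(alpha_prod[t]) * x0 + np.sqrt(beta_prod[t]) * eps
x0_recovered = image_factor[t] * x_t - noise_factor[t] * eps

print(np.allclose(x0_recovered, x0))
print(noise_factor[0], noise_factor[-1])  # grows with the timestep
```

In the actual loop the true noise is unknown and replaced by the U-Net estimate, so the recovered image is only approximate and is refined over many steps.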

In [None]:
def simple_update(estimated_noise: torch.tensor, latent_image: torch.tensor, 
                  noise_factor: float, image_factor: float) -> torch.tensor:
    """
    Implements a simple update of the scheduler to remove the noise present in the latent image.
    The noise is multiplied by noise_factor and the image by the image_factor, which depend on the diffusion step.
    
    :param estimated_noise: noise tensor in the latent space of shape (N, C, H, W).
    :param latent_image: image tensor in the latent space of shape (N, C, H, W).
    :param noise_factor: factor that multiplies the noise.
    :param image_factor: factor that multiplies the latent image.
    :return: tensor of shape (N, C, H, W) estimated from the noise removal of the image, weighted by their
             respective factors.
    """
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return reconstructed_latent_image

def simple_diffusion_loop(encoded_text, unet, steps=70, g=7.5, width=512, height=512, seed=42, train_steps=1000, beta_start=0.00085, beta_end=0.012):
    torch.manual_seed(seed)
    betas = np.linspace(beta_start, beta_end, train_steps)
    alphas = 1.0 - betas
    alpha_prod = np.pad(np.cumprod(alphas, axis=0), (0, 1), constant_values=1)
    beta_prod = 1 - alpha_prod
    latent_space = torch.randn((encoded_text.shape[0]//2, unet.in_channels, height//8, width//8)).to("cuda").half()
    for t in tqdm(np.arange(0, train_steps, train_steps // steps)[::-1]):
        inp = torch.cat([latent_space] * 2)
        with torch.no_grad():
            uncond, text = unet(inp, t, encoder_hidden_states=encoded_text).sample.chunk(2)
        ruido_estimado = uncond + g*(text-uncond)
        latent_space_original = torch.clamp(simple_update(ruido_estimado, latent_space, 
                                beta_prod[t]**0.5/alpha_prod[t]**0.5, 1/alpha_prod[t]**0.5), -1, 1)
        coeff_original = alpha_prod[t-1]**0.5 * betas[t]/beta_prod[t]
        coeff_current = alphas[t]**0.5 * beta_prod[t-1]/beta_prod[t]
        latent_space = coeff_original*latent_space_original + coeff_current*latent_space
        variance = np.clip(beta_prod[t-1]/beta_prod[t] * betas[t], 1e-20, None)
        ruido = torch.randn(ruido_estimado.shape, device=ruido_estimado.device, dtype=ruido_estimado.dtype)
        latent_space = latent_space + variance**0.5 * ruido
    return latent_space

In [None]:
assert torch.norm(simple_update(torch.tensor([[[[0.1, -0.1], [0.2,-0.2]]]]), 
                                torch.tensor([[[[2, 3], [4,3]]]]), 0.4, 0.9) - \
                  torch.tensor([[[[1.76, 2.74],[3.52, 2.78]]]])) < 1e-6


Despite being called simple, and despite being the simplest scheduler actually used during model training, DDPM requires a number of iterations close to the one used in training (in this case, one thousand steps) to converge adequately. For this reason, the other method ends up being more than an order of magnitude faster for inference.

In [None]:
assert execute_transformer # This assert won't be evaluated
generated_images = simple_diffusion_loop(encoded_text, unet, steps=1000, seed=42)
show_image(generated_images, vae)

We encourage you to explore the model with other textual inputs.

# Your data and feedback:

Write feedback on the lab so we can improve it for the coming years.

In the following variables, write the number of hours spent on this lab, the perceived difficulty, and the expected grade (you may delete the `raise` and the comments):

In [None]:
# meta_eval manual_graded_answer 0

horas_gastas = None    # 1.5   - Float number with the number of hours spent 
dificuldade_lab = None # 0     - Float number from 0.0 to 10.0 (inclusive)
nota_esperada = None   # 10    - Float number from 0.0 to 10.0 (inclusive)

# WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
raise NotImplementedError()

Write any other comments or feedback about the lab below. If there is anything about the lab that you did not understand, please also comment here.

If you find any typo or bug in the lab, please comment below so we can fix it.

WRITE YOUR SOLUTION HERE! (do not change this first line):

**ATTENTION**

**DISCURSIVE QUESTION**

WRITE YOUR ANSWER HERE (do not delete this cell so the ID is not lost)


**End of the lab!**