In [1]:
!pip install diffusers

Collecting diffusers
  Downloading diffusers-0.27.2-py3-none-any.whl.metadata (18 kB)
Downloading diffusers-0.27.2-py3-none-any.whl (2.0 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m27.2 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25hInstalling collected packages: diffusers
Successfully installed diffusers-0.27.2


# Understanding Stable Diffusion

# Latent Diffusion Models: Theory and Algorithmic Steps

## Introduction

Latent Diffusion Models (LDMs) are a specific type of diffusion model optimized for generating high-quality images in a more memory-efficient manner by operating in a lower-dimensional latent space. Unlike traditional diffusion models that work directly on image pixels and consume considerable memory, LDMs achieve the diffusion process in latent space, significantly reducing the memory requirements. This process involves training the model to denoise random Gaussian noise step by step to eventually produce an image.

## Main Components

There are three main components in a latent diffusion model:

1. **CLIP Text Encoder**: Converts input text into text embeddings.
2. **Variational Auto Encoder (VAE)**: Compresses and decompresses images into and from a lower-dimensional latent space.
3. **U-Net**: Predicts the noise to be removed from the noised latent representations to reconstruct the original image data.

## CLIP Text Encoder

### Basics
The CLIP Text Encoder takes text as input and produces text embeddings. These embeddings represent the text in a form that is close in the latent space to the representation of images encoded by a similar process.

### Algorithmic Steps
1. **Tokenization**: Break down the input text into sub-words or tokens and convert these into numerical representations using a lookup table.
2. **Text to Embedding Conversion**: Utilize the CLIPTextModel to convert the numerical tokens into embeddings that encapsulate the semantic meaning of the text.

### Role in Latent Diffusion
The text embeddings generated by the CLIP Text Encoder are used as one of the inputs to the U-Net model. This enables the generation of images that are semantically related to the input text.

## Variational Auto Encoder (VAE)

### Basics
The VAE consists of two main parts: an encoder and a decoder. The encoder compresses an image into a lower-dimensional latent representation, and the decoder attempts to reconstruct the image from this latent representation.

### Algorithmic Steps
1. **Encoding**: Compress an input image into a latent space representation.
2. **Decoding**: Reconstruct the image from the latent representation.

### Role in Latent Diffusion
The VAE is essential for reducing the computational load of the diffusion process. By operating in latent space, the diffusion model requires less computational power and memory, facilitating the generation of high-resolution images.

## U-Net

### Basics
The U-Net architecture takes two inputs: noisy latents and text embeddings. It outputs the predicted noise residuals to be subtracted from the noisy latents, effectively denoising them.

### Algorithmic Steps
1. **Adding Noise**: Apply a series of noise levels to the latent representations according to a predetermined schedule.
2. **Noise Prediction**: For each noise level, predict the noise present in the noisy latents.
3. **Denoising**: Subtract the predicted noise from the noisy latents to move closer to the original image representation.

### Role in Latent Diffusion
The U-Net is crucial for the iterative denoising process in latent diffusion. By gradually reducing noise from the latents, it guides the generation process to produce images that correspond to the textual description provided.

## Conclusion

Latent diffusion models represent an advanced approach in the field of generative models, especially for tasks like image synthesis. By breaking down the process into distinct components like the CLIP Text Encoder, VAE, and U-Net, LDMs manage to efficiently generate high-quality images from textual descriptions.


# Importing Required Libaries

In [2]:
import torch, logging

## disable warnings
logging.disable(logging.WARNING)  

## Imaging  library
from PIL import Image
from torchvision import transforms as tfms

## Basic libraries
import numpy as np
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import display
import shutil
import os

## For video display
from IPython.display import HTML
from base64 import b64encode


## Import the CLIP artifacts 
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, LMSDiscreteScheduler
from IPython.display import display, clear_output
import os



In [3]:
# os.mkdir("/kaggle/working/steps2")

### Setting CPU/GPU device

In [4]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#### `load_image`

**Description:**

Loads an image from a specified path, converts it to RGB, and resizes it to a specified dimension.

**Parameters:**

- `p` (str): Path to the image file.
- `size` (tuple, optional): The dimensions to resize the image to. Default is `(512, 512)`.

**Returns:**

- `Image`: An image object in RGB format with the specified dimensions.

**Example Usage:**

```python
image = load_image("path/to/image.jpg", size=(512, 512))


In [5]:
## Helper functions
def load_image(p):
    '''
    Function to load images from a defined path
    '''
    return Image.open(p).convert('RGB').resize((512,512))




#### `pil_to_latents`

**Description:**

Converts a PIL image to a latent representation suitable for input into a VAE model.

**Parameters:**

- `image` (PIL.Image): The image to convert.

**Returns:**

- `Tensor`: The latent representation of the image.

**Example Usage:**

```python
latents = pil_to_latents(image)


In [6]:
def pil_to_latents(image):
    '''
    Function to convert image to latents
    '''
    init_image = tfms.ToTensor()(image).unsqueeze(0) * 2.0 - 1.0
    init_image = init_image.to(device="cuda", dtype=torch.float16) 
    init_latent_dist = vae.encode(init_image).latent_dist.sample() * 0.18215
    return init_latent_dist



#### `latents_to_pil`

**Description:**

Converts latents back into a PIL image, suitable for visualization and further processing.

**Parameters:**

- `latents` (Tensor): The latent representation to convert back to an image.

**Returns:**

- `List[Image]`: A list of image objects generated from the latent representations.

**Example Usage:**

```python
images = latents_to_pil(latents)
for img in images:
    img.show()


In [7]:
def latents_to_pil(latents):
    '''
    Function to convert latents to images
    '''
    latents = (1 / 0.18215) * latents
    with torch.no_grad():
        image = vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
    images = (image * 255).round().astype("uint8")
    pil_images = [Image.fromarray(image) for image in images]
    return pil_images



#### `text_enc`

**Description:**

Encodes textual prompts into embeddings using a CLIP text model.

**Parameters:**

- `prompts` (List[str]): A list of textual prompts to encode.
- `maxlen` (int, optional): Maximum length of the encoded text. Defaults to the model's maximum length.

**Returns:**

- `Tensor`: The encoded text embeddings.

**Example Usage:**

```python
embeddings = text_enc(["A clear blue sky", "The London Bridge"])


In [8]:

def text_enc(prompts, maxlen=None):
    '''
    A function to take a texual promt and convert it into embeddings
    '''
    if maxlen is None: maxlen = tokenizer.model_max_length
    inp = tokenizer(prompts, padding="max_length", max_length=maxlen, truncation=True, return_tensors="pt") 
    return text_encoder(inp.input_ids.to("cuda"))[0].half()



#### `prompt_2_img`

**Description:**

Converts text prompts into images using a latent diffusion model.

**Parameters:**

- `prompts` (List[str]): Text prompts to convert into images.
- `g` (float): Guidance scale. Higher values enforce stronger adherence to the text prompt.
- `seed` (int): Random seed for generating images.
- `steps` (int): Number of diffusion steps.
- `dim` (int): Dimension of the generated images.
- `save_int` (bool): Whether to save intermediate images.

**Returns:**

- `List[Image]`: A list of generated image objects corresponding to the text prompts.

**Example Usage:**

```python
images = prompt_2_img(["A sunset over a mountain range", "A cat playing with a ball"], g=7.5, seed=42, steps=100, dim=512, save_int=True)
for img in images:
    img.show()


In [21]:
# def prompt_2_img(prompts, g=7.5, seed=100, steps=70, dim=512, save_int=True):
#     """
#     Diffusion process to convert prompt to image
#     """
    
#     # Defining batch size
#     bs = len(prompts) 
    
#     # Converting textual prompts to embedding
#     text = text_enc(prompts) 
    
#     # Adding an unconditional prompt , helps in the generation process
#     uncond =  text_enc([""] * bs, text.shape[1])
#     emb = torch.cat([uncond, text])
    
#     # Setting the seed
#     if seed: torch.manual_seed(seed)
    
#     # Initiating random noise
#     latents = torch.randn((bs, unet.in_channels, dim//8, dim//8))
    
#     # Setting number of steps in scheduler
#     scheduler.set_timesteps(steps)
    
#     # Adding noise to the latents 
#     latents = latents.to("cuda").half() * scheduler.init_noise_sigma

#     print("Processing text prompts:", prompts)
#     # Just before the loop starts:
#     print("Visualizing initial latents...")
#     latents_norm = torch.norm(latents.view(latents.shape[0], -1), dim=1).mean().item()
#     print(f"Initial Latents Norm: {latents_norm}")

#     # Iterating through defined steps
#     for i,ts in enumerate(tqdm(scheduler.timesteps)):
#         # We need to scale the i/p latents to match the variance
#         inp = scheduler.scale_model_input(torch.cat([latents] * 2), ts)
        
#         # Predicting noise residual using U-Net
#         with torch.no_grad(): u,t = unet(inp, ts, encoder_hidden_states=emb).sample.chunk(2)
            
#         # Performing Guidance
#         pred = u + g*(t-u)
        
#         # Conditioning  the latents
#         latents = scheduler.step(pred, ts, latents).prev_sample
        
#         # Inside your loop, after `latents` have been updated:
#         latents_norm = torch.norm(latents.view(latents.shape[0], -1), dim=1).mean().item()
#         print(f"Step {i+1}/{steps} Latents Norm: {latents_norm}")
        
#         from IPython.display import display, clear_output
#         if save_int and i%13==0: 
#             image_path = f'steps2/la_{i:04d}.jpeg'
#             latents_to_pil(latents)[0].save(image_path)
#             display(latents_to_pil(latents)[0])  # Display the new image

#     return latents_to_pil(latents)


import torch
from PIL import Image
import io

def prompt_2_img(prompts, g=7.5, seed=100, steps=70, dim=512, save_int=True):
    """
    Diffusion process to convert prompt to image, modified to yield images for Streamlit.
    """
    
    bs = len(prompts) 
    text = text_enc(prompts)
    uncond = text_enc([""] * bs, text.shape[1])
    emb = torch.cat([uncond, text])
    
    if seed:
        torch.manual_seed(seed)
    
    latents = torch.randn((bs, unet.in_channels, dim//8, dim//8))
    scheduler.set_timesteps(steps)
    latents = latents.to("cuda").half() * scheduler.init_noise_sigma

    for i, ts in enumerate(scheduler.timesteps):
        inp = scheduler.scale_model_input(torch.cat([latents] * 2), ts)
        with torch.no_grad():
            u, t = unet(inp, ts, encoder_hidden_states=emb).sample.chunk(2)
        pred = u + g*(t-u)
        latents = scheduler.step(pred, ts, latents).prev_sample

        if save_int and i % (steps // 10) == 0:  # Yield 10 images throughout the process
            image = latents_to_pil(latents)[0]
            buf = io.BytesIO()
            image.save(buf, format="JPEG")
            byte_im = buf.getvalue()
            yield byte_im  # Yield image in bytes format for Streamlit to display

    final_image = latents_to_pil(latents)
    final_buf = io.BytesIO()
    final_image[0].save(final_buf, format="JPEG")
    final_byte_im = final_buf.getvalue()
    yield final_byte_im  # Yield the final image

# Replace `display(latents_to_pil(latents)[0])` with `yield byte_im`


# Overview — The Diffusion Process

- The stable diffusion model takes the textual input and a seed. 
- The textual input is then passed through the CLIP model to generate textual embedding of size 77x768 and the seed is used to generate Gaussian noise of size 4x64x64 which becomes the first latent image representation.
- Next, the U-Net iteratively denoises the random latent image representations while conditioning on the text embeddings. 
- The output of the U-Net is predicted noise residual, which is then used to compute conditioned latents via a scheduler algorithm. 
- This process of denoising and text conditioning is repeated N times (We will use 50) to retrieve a better latent image representation. 
- Once this process is complete, the latent image representation (4x64x64) is decoded by the VAE decoder to retrieve the final output image (3x512x512).



In [10]:

## Initiating tokenizer and encoder.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.float16)
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.float16).to("cuda")

## Initiating the VAE
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae", torch_dtype=torch.float16).to("cuda")

## Initializing a scheduler and Setting number of sampling steps
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
scheduler.set_timesteps(50)

## Initializing the U-Net model
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet", torch_dtype=torch.float16).to("cuda")


tokenizer_config.json:   0%|          | 0.00/905 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/961k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/525k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/389 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.22M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/4.52k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.71G [00:00<?, ?B/s]

vae/config.json:   0%|          | 0.00/551 [00:00<?, ?B/s]

vae/diffusion_pytorch_model.safetensors:   0%|          | 0.00/335M [00:00<?, ?B/s]

unet/config.json:   0%|          | 0.00/743 [00:00<?, ?B/s]

unet/diffusion_pytorch_model.safetensors:   0%|          | 0.00/3.44G [00:00<?, ?B/s]

In [22]:
images = prompt_2_img(["Create a highly detailed and realistic image showing the neck of a person with fair skin. The neck should display a typical ringworm infection, characterized by a clearly visible, circular red rash with a slightly raised, scaly border and healthy-looking skin in the center. The person's hair and other features are not in focus, ensuring the emphasis remains on the neck and the detailed depiction of the skin condition"], save_int=True)
for img in images:display(img)


  latents = torch.randn((bs, unet.in_channels, dim//8, dim//8))


b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t\x08\n\x0c\x14\r\x0c\x0b\x0b\x0c\x19\x12\x13\x0f\x14\x1d\x1a\x1f\x1e\x1d\x1a\x1c\x1c $.\' ",#\x1c\x1c(7),01444\x1f\'9=82<.342\xff\xdb\x00C\x01\t\t\t\x0c\x0b\x0c\x18\r\r\x182!\x1c!22222222222222222222222222222222222222222222222222\xff\xc0\x00\x11\x08\x02\x00\x02\x00\x03\x01"\x00\x02\x11\x01\x03\x11\x01\xff\xc4\x00\x1f\x00\x00\x01\x05\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\xff\xc4\x00\xb5\x10\x00\x02\x01\x03\x03\x02\x04\x03\x05\x05\x04\x04\x00\x00\x01}\x01\x02\x03\x00\x04\x11\x05\x12!1A\x06\x13Qa\x07"q\x142\x81\x91\xa1\x08#B\xb1\xc1\x15R\xd1\xf0$3br\x82\t\n\x16\x17\x18\x19\x1a%&\'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz\x83\x84\x85\x86\x87\x88\x89\x8a\x92\x93\x94\x95\x96\x97\x98\x99\x9a\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xd2\xd3\

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t\x08\n\x0c\x14\r\x0c\x0b\x0b\x0c\x19\x12\x13\x0f\x14\x1d\x1a\x1f\x1e\x1d\x1a\x1c\x1c $.\' ",#\x1c\x1c(7),01444\x1f\'9=82<.342\xff\xdb\x00C\x01\t\t\t\x0c\x0b\x0c\x18\r\r\x182!\x1c!22222222222222222222222222222222222222222222222222\xff\xc0\x00\x11\x08\x02\x00\x02\x00\x03\x01"\x00\x02\x11\x01\x03\x11\x01\xff\xc4\x00\x1f\x00\x00\x01\x05\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\xff\xc4\x00\xb5\x10\x00\x02\x01\x03\x03\x02\x04\x03\x05\x05\x04\x04\x00\x00\x01}\x01\x02\x03\x00\x04\x11\x05\x12!1A\x06\x13Qa\x07"q\x142\x81\x91\xa1\x08#B\xb1\xc1\x15R\xd1\xf0$3br\x82\t\n\x16\x17\x18\x19\x1a%&\'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz\x83\x84\x85\x86\x87\x88\x89\x8a\x92\x93\x94\x95\x96\x97\x98\x99\x9a\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xd2\xd3\

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t\x08\n\x0c\x14\r\x0c\x0b\x0b\x0c\x19\x12\x13\x0f\x14\x1d\x1a\x1f\x1e\x1d\x1a\x1c\x1c $.\' ",#\x1c\x1c(7),01444\x1f\'9=82<.342\xff\xdb\x00C\x01\t\t\t\x0c\x0b\x0c\x18\r\r\x182!\x1c!22222222222222222222222222222222222222222222222222\xff\xc0\x00\x11\x08\x02\x00\x02\x00\x03\x01"\x00\x02\x11\x01\x03\x11\x01\xff\xc4\x00\x1f\x00\x00\x01\x05\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\xff\xc4\x00\xb5\x10\x00\x02\x01\x03\x03\x02\x04\x03\x05\x05\x04\x04\x00\x00\x01}\x01\x02\x03\x00\x04\x11\x05\x12!1A\x06\x13Qa\x07"q\x142\x81\x91\xa1\x08#B\xb1\xc1\x15R\xd1\xf0$3br\x82\t\n\x16\x17\x18\x19\x1a%&\'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz\x83\x84\x85\x86\x87\x88\x89\x8a\x92\x93\x94\x95\x96\x97\x98\x99\x9a\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xd2\xd3\

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t\x08\n\x0c\x14\r\x0c\x0b\x0b\x0c\x19\x12\x13\x0f\x14\x1d\x1a\x1f\x1e\x1d\x1a\x1c\x1c $.\' ",#\x1c\x1c(7),01444\x1f\'9=82<.342\xff\xdb\x00C\x01\t\t\t\x0c\x0b\x0c\x18\r\r\x182!\x1c!22222222222222222222222222222222222222222222222222\xff\xc0\x00\x11\x08\x02\x00\x02\x00\x03\x01"\x00\x02\x11\x01\x03\x11\x01\xff\xc4\x00\x1f\x00\x00\x01\x05\x01\x01\x01\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\xff\xc4\x00\xb5\x10\x00\x02\x01\x03\x03\x02\x04\x03\x05\x05\x04\x04\x00\x00\x01}\x01\x02\x03\x00\x04\x11\x05\x12!1A\x06\x13Qa\x07"q\x142\x81\x91\xa1\x08#B\xb1\xc1\x15R\xd1\xf0$3br\x82\t\n\x16\x17\x18\x19\x1a%&\'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz\x83\x84\x85\x86\x87\x88\x89\x8a\x92\x93\x94\x95\x96\x97\x98\x99\x9a\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xd2\xd3\

KeyboardInterrupt: 

In [13]:
import os
import matplotlib.pyplot as plt

def visualize_steps(folder='/kaggle/working/steps2'):
    # Get list of all files in the folder
    files = os.listdir(folder)
    # Filter only image files
    image_files = [file for file in files if file.endswith('.jpeg')]
    num_steps = len(image_files)
    count = 0

    fig, axs = plt.subplots(1, num_steps, figsize=(20, 10))
    for ax, img_file in zip(axs, sorted(image_files)):  # Sort to ensure proper ordering
        img_path = os.path.join(folder, img_file)
        img = plt.imread(img_path)
        ax.imshow(img)
        ax.axis('off')  # Turn off axis numbering and ticks
        
        count += 1
        # Add arrows (except for the last image)
        if count < 6:
            ax.arrow(0.6, 0.5, 0.5, 0, head_width=0.1, head_length=0.1, fc='k', ec='k',
                     transform=ax.transAxes, clip_on=False)

    plt.show()

# Call the function to display the visualization
visualize_steps()


FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/working/steps2'

In [16]:
import os
os.mkdir("/kaggle/working/steps2")

In [15]:
!pip install streamlit
!npm install localtunnel

Collecting streamlit
  Downloading streamlit-1.33.0-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.8.1b0-py2.py3-none-any.whl.metadata (3.9 kB)
Collecting watchdog>=2.1.5 (from streamlit)
  Downloading watchdog-4.0.0-py3-none-manylinux2014_x86_64.whl.metadata (37 kB)
Downloading streamlit-1.33.0-py2.py3-none-any.whl (8.1 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m8.1/8.1 MB[0m [31m66.5 MB/s[0m eta [36m0:00:00[0m:00:01[0m0:01[0m
[?25hDownloading pydeck-0.8.1b0-py2.py3-none-any.whl (4.8 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m4.8/4.8 MB[0m [31m82.0 MB/s[0m eta [36m0:00:00[0m:00:01[0m
[?25hDownloading watchdog-4.0.0-py3-none-manylinux2014_x86_64.whl (82 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m83.0/83.0 kB[0m [31m4.8 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: watchdog, pydeck, streamlit
Successfully installed pydec

In [32]:
%%writefile app.py

import torch, logging

import torch
from PIL import Image
import io

## disable warnings
logging.disable(logging.WARNING)  

## Imaging  library
from PIL import Image
from torchvision import transforms as tfms

## Basic libraries
import numpy as np
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
from IPython.display import display
import shutil
import os

## For video display
from IPython.display import HTML
from base64 import b64encode


## Import the CLIP artifacts 
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, LMSDiscreteScheduler
from IPython.display import display, clear_output
import os

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Helper functions
def load_image(p):
    '''
    Function to load images from a defined path
    '''
    return Image.open(p).convert('RGB').resize((512,512))

def pil_to_latents(image):
    '''
    Function to convert image to latents
    '''
    init_image = tfms.ToTensor()(image).unsqueeze(0) * 2.0 - 1.0
    init_image = init_image.to(device="cuda", dtype=torch.float16) 
    init_latent_dist = vae.encode(init_image).latent_dist.sample() * 0.18215
    return init_latent_dist

def latents_to_pil(latents):
    '''
    Function to convert latents to images
    '''
    latents = (1 / 0.18215) * latents
    with torch.no_grad():
        image = vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
    images = (image * 255).round().astype("uint8")
    pil_images = [Image.fromarray(image) for image in images]
    return pil_images


def text_enc(prompts, maxlen=None):
    '''
    A function to take a texual promt and convert it into embeddings
    '''
    if maxlen is None: maxlen = tokenizer.model_max_length
    inp = tokenizer(prompts, padding="max_length", max_length=maxlen, truncation=True, return_tensors="pt") 
    return text_encoder(inp.input_ids.to("cuda"))[0].half()

def prompt_2_img(prompts, g=7.5, seed=100, steps=70, dim=512, save_int=True):
    """
    Diffusion process to convert prompt to image, modified to yield images for Streamlit.
    """
    
    bs = len(prompts) 
    text = text_enc(prompts)
    uncond = text_enc([""] * bs, text.shape[1])
    emb = torch.cat([uncond, text])
    
    if seed:
        torch.manual_seed(seed)
    
    latents = torch.randn((bs, unet.in_channels, dim//8, dim//8))
    scheduler.set_timesteps(steps)
    latents = latents.to("cuda").half() * scheduler.init_noise_sigma

    for i, ts in enumerate(scheduler.timesteps):
        inp = scheduler.scale_model_input(torch.cat([latents] * 2), ts)
        with torch.no_grad():
            u, t = unet(inp, ts, encoder_hidden_states=emb).sample.chunk(2)
        pred = u + g*(t-u)
        latents = scheduler.step(pred, ts, latents).prev_sample

        if save_int and i % (steps // 70) == 0:  # Yield 10 images throughout the process
            image = latents_to_pil(latents)[0]
            buf = io.BytesIO()
            image.save(buf, format="JPEG")
            byte_im = buf.getvalue()
            yield byte_im  # Yield image in bytes format for Streamlit to display

    final_image = latents_to_pil(latents)
    final_buf = io.BytesIO()
    final_image[0].save(final_buf, format="JPEG")
    final_byte_im = final_buf.getvalue()
    yield final_byte_im  # Yield the final image

## Initiating tokenizer and encoder.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.float16)
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.float16).to("cuda")

## Initiating the VAE
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae", torch_dtype=torch.float16).to("cuda")

## Initializing a scheduler and Setting number of sampling steps
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
scheduler.set_timesteps(50)

## Initializing the U-Net model
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet", torch_dtype=torch.float16).to("cuda")


# import streamlit as st
# import torch
# from torchvision.utils import make_grid
# from PIL import Image
# import io

# st.title('Image Generation from Text Prompts')


# import streamlit as st

# prompt = st.text_input('Enter a text prompt:', 'Example prompt')
# if st.button('Generate Images'):
#     image_placeholder = st.empty()
#     progress_bar = st.progress(0)

#     generator = prompt_2_img([prompt], save_int=True)
#     total_steps = 70  # total number of steps you expect
#     for step, image_bytes in enumerate(generator):
#         image_placeholder.image(image_bytes)
#         progress_bar.progress((step + 1) / total_steps)

# import streamlit as st
# from PIL import Image
# import io

# # Title of the app
# st.title('Skin Rash Generator')

# st.markdown("""
# Welcome to the Skin Rash Generator! Use the commands below to generate images of common skin rashes based on your preferences.
# """)

# # Dropdown menus for user input
# skin_rash_type = st.selectbox('Skin Rash Type:', ['Eczema', 'Ringworm', 'Dermatitis'])
# skin_color = st.selectbox('Skin Color:', ['Fair', 'Brown', 'Black'])
# affected_area = st.selectbox('Affected Area:', ['Chest', 'Neck', 'Hand'])

# # Example command displayed
# st.markdown('Example Command: "Generate a few images of a Ringworm type of rash at the back of the neck area on fair skin."')

# # Generate button
# if st.button('Generate Images'):
#     # Create prompt from user input
#     prompt = f"Generate images of {skin_rash_type} type of rash on {affected_area} of {skin_color} skin."
    
#     # Placeholder for displaying images
#     image_placeholder = st.empty()
    
#     # Progress bar setup
#     progress_bar = st.progress(0)
    
#     # Assuming prompt_2_img is already imported and appropriately modified to handle the prompt
#     generator = prompt_2_img([prompt], save_int=True)
#     total_steps = 70  # Total number of steps expected in the generation process
    
#     # Display generated images
#     for step, image_bytes in enumerate(generator):
#         image_placeholder.image(image_bytes)
#         progress_bar.progress((step + 1) / total_steps+1)


# import streamlit as st

# # Custom CSS to inject for better control over the Streamlit layout
# st.markdown(
#     """
#     <style>
#     .big-font {
#         font-size:20px !important;
#         font-weight: bold;
#     }
#     .image-gen {
#         padding-top: 10px;
#         padding-bottom: 30px;
#     }
#     </style>
#     """, unsafe_allow_html=True)

# # Title of the app
# st.title('🔬 Skin Rash Generator')

# # Welcome message with enhanced markdown
# st.markdown("""
# Welcome to the Skin Rash Generator! Use the commands below to generate images of common skin rashes based on your preferences.
# """, unsafe_allow_html=True)

# # Layout with columns for the dropdowns
# col1, col2, col3 = st.columns(3)
# with col1:
#     skin_rash_type = st.selectbox('Skin Rash Type:', ['Eczema', 'Ringworm', 'Dermatitis'], index=1, help='Select the type of skin rash you want to generate.')

# with col2:
#     skin_color = st.selectbox('Skin Color:', ['Fair', 'Brown', 'Black'], index=1, help='Choose the skin color for the generated image.')

# with col3:
#     affected_area = st.selectbox('Affected Area:', ['Chest', 'Neck', 'Hand'], index=1, help='Select the part of the body affected by the rash.')

# # Button to generate images with a custom class for additional padding
# if st.button('🖼️ Generate Images', help='Click to generate images based on your selections'):
#     prompt = f"Generate images of {skin_rash_type} type of rash on {affected_area} of {skin_color} skin."
#     # Display the prompt being generated
#     st.markdown(f"### Generating Images For: \n`{prompt}`")

#     image_placeholder = st.empty()
#     progress_bar = st.progress(0)
    
#     # Assuming prompt_2_img is already imported and appropriately modified to handle the prompt
#     generator = prompt_2_img([prompt], save_int=True)
#     total_steps = 70  # Total number of steps expected in the generation process
    
#     for step, image_bytes in enumerate(generator):
#         with st.spinner('Generating images...'):
#             image_placeholder.image(image_bytes, use_column_width=True)
#             progress_bar.progress((step + 1) / total_steps)

# # Adding a footer
# st.markdown("""
# ---
# <p style='text-align: center;'>Powered by AI | Streamlit App</p>
# """, unsafe_allow_html=True)

import streamlit as st

# Custom CSS to inject for better control over the Streamlit layout
st.markdown(
    """
    <style>
    .big-font {
        font-size:20px !important;
        font-weight: bold;
    }
    .image-gen {
        padding-top: 10px;
        padding-bottom: 30px;
    }
    </style>
    """, unsafe_allow_html=True)

# Title of the app
st.title('🔬 Skin Rash Generator')

# Welcome message with enhanced markdown
st.markdown("""
Welcome to the Skin Rash Generator! Use the commands below to generate images of common skin rashes based on your preferences.
""", unsafe_allow_html=True)

# Layout with columns for the dropdowns
col1, col2, col3 = st.columns(3)
with col1:
    skin_rash_type = st.selectbox('Skin Rash Type:', ['Eczema', 'Ringworm', 'Dermatitis'], index=1, help='Select the type of skin rash you want to generate.')

with col2:
    skin_color = st.selectbox('Skin Color:', ['Fair', 'Brown', 'Black'], index=1, help='Choose the skin color for the generated image.')

with col3:
    affected_area = st.selectbox('Affected Area:', ['Chest', 'Neck', 'Hand'], index=1, help='Select the part of the body affected by the rash.')

# Button to generate images with a custom class for additional padding
if st.button('🖼️ Generate Images', help='Click to generate images based on your selections'):
    prompt = f"Generate images of {skin_rash_type} type of rash on {affected_area} of {skin_color} skin."
    
    # Display the prompt being generated
    st.markdown(f"### Generating Images For: \n`{prompt}`")
    
    image_placeholder = st.empty()
    progress_bar = st.progress(0)
    
    # Assuming prompt_2_img is already imported and appropriately modified to handle the prompt
    prompt = "Create a highly detailed and realistic image showing the neck of a person with really exteremely dark brown skin. The neck should display a typical ringworm infection, characterized by a clearly visible, circular red rash with a slightly raised, scaly border and healthy-looking skin in the center. The emphasis remains on the neck and the detailed depiction of the skin condition"
    generator = prompt_2_img([prompt], save_int=True)
    total_steps = 70  # Total number of steps expected in the generation process
    
    for step, image_bytes in enumerate(generator):
        with st.spinner('Generating images...'):
            print("hello")
            print((step + 1) / total_steps)
            progress_bar.progress((step + 1) / total_steps)
            image_placeholder.image(image_bytes, use_column_width=True)N


Overwriting app.py


In [33]:
# Your public ip is the password to the localtunnel
!curl ipv4.icanhazip.com

35.197.122.150


In [None]:
!streamlit run app.py &>./logs.txt & npx localtunnel --port 8501

your url is: https://swift-parents-remain.loca.lt


In [None]:
# add code for rash statment directly. ..  like palm could be referred in different ways . process that and