# Final Project - Synthetic Image Generation

## Problem Statement

Generate common skin rashes based on textual commands varying the following three factors:
- Skin rash type (e.g., eczema, ringworm, dermatitis)
- Skin color (e.g., fair, brown, black)
- Affected area (e.g., chest, neck, hand)

An example command is "generate a few images of a ringworm-type rash at the back of the neck on fair skin". Build an interface and a deep generative model to process such queries and visualize the output. Please work with 3-4 rash types that you are comfortable looking at.

Hint: Explore using a latent diffusion model and fine-tuning its CLIP component. Make sure to collect some training images from the internet.

This is a group project, so you will collaborate with another student. You will present on Wednesday the 24th, between noon and 3 pm. Please prepare about a dozen slides. This is your final, so aim for a robust approach and implementation.

Prof. Das

## Approach Explanation


To generate synthetic images of common skin rashes, we built a latent diffusion pipeline in PyTorch and exposed it through a Streamlit interface. Here's an overview of our approach:

1. **Data Preprocessing and Model Selection**:
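   - We load the necessary libraries and pretrained models: CLIP (Contrastive Language-Image Pre-training, `openai/clip-vit-large-patch14`) for text encoding, and the Variational Autoencoder (VAE) and U-Net from `CompVis/stable-diffusion-v1-4`. The U-Net predicts the noise to remove at each diffusion step, and the VAE decodes the resulting latents into images.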

2. **Interface Design**:
   - Using Streamlit, we provide a simple interface in which users describe the rash in a free-text field. The app extracts the skin color, rash type, and affected area from the description with regular expressions and falls back to defaults (fair, eczema, hand) for any value it cannot find.

3. **Image Generation**:
   - Upon clicking the "Generate Images" button, the extracted values are inserted into a detailed prompt template. One image is generated from the user's original description and one from this enhanced prompt, so the two can be compared side by side.
   - We employ a diffusion process to convert each textual prompt into an image. Starting from random noise, the latent representation is denoised over multiple steps (70 by default), with each step guided toward the provided prompt.
   - Progress is displayed using a progress bar to keep users informed about the generation process.

4. **Results Presentation**:
   - Throughout the generation process, interim images are displayed to the user to provide real-time feedback.
   - Once generation is complete, the final images are displayed together with the values extracted from the description.

5. **Future Enhancements**:
   - We aim to further optimize the generation process for better efficiency and quality.
   - Additionally, we plan to incorporate more diverse skin rash types and expand the range of affected areas and skin colors for increased flexibility.
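
The prompt-guided denoising in step 3 uses classifier-free guidance: at every step the U-Net predicts the noise once for an empty prompt and once for the actual prompt, and `prompt_2_img` blends the two as `u + g*(t-u)` with a guidance scale `g` of 7.5. The sketch below reproduces only that arithmetic on plain Python lists, to show how the scale behaves. The function name `guided_noise` and the sample numbers are illustrative assumptions, not part of the app.

```python
def guided_noise(uncond, cond, guidance_scale=7.5):
    """Blend unconditional and prompt-conditioned noise predictions element-wise.

    Mirrors the expression `u + g * (t - u)` used in the diffusion loop.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy "noise predictions" standing in for U-Net outputs (illustrative values).
uncond = [0.0, 1.0, -2.0]
cond = [1.0, 1.0, 0.0]

print(guided_noise(uncond, cond, guidance_scale=0.0))  # [0.0, 1.0, -2.0], prompt ignored
print(guided_noise(uncond, cond, guidance_scale=1.0))  # [1.0, 1.0, 0.0], plain conditional prediction
print(guided_noise(uncond, cond, guidance_scale=7.5))  # [7.5, 1.0, 13.0], pushed past the conditional prediction
```

A scale above 1 extrapolates beyond the conditional prediction, which tends to make the image follow the prompt more closely at some cost to variety.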


In [18]:
!pip install diffusers



In [19]:
!pip install streamlit
!npm install localtunnel

up to date, audited 23 packages in 726ms

3 packages are looking for funding
  run `npm fund` for details

2 moderate severity vulnerabilities

To address all issues (including breaking changes), run:
  npm audit fix --force

Run `npm audit` for details.


In [165]:
%%writefile app.py
## Standard library
import io
import logging
import re

## disable warnings
logging.disable(logging.WARNING)

## Tensor and imaging libraries
import torch
from PIL import Image
from torchvision import transforms as tfms

## Import the CLIP artifacts and the Stable Diffusion components
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, LMSDiscreteScheduler
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def extract_values(text):
    # Regex patterns to extract skin color, rash type, and affected body part
    colors = r"(fair|dark|brown|black|white)"
    rash_types = r"(eczema|ringworm|dermatitis)"
    body_parts = r"(hand|chest|neck|face|leg|arm)"
    
    # Search text for matches
    color_match = re.search(colors, text, re.IGNORECASE)
    rash_match = re.search(rash_types, text, re.IGNORECASE)
    body_part_match = re.search(body_parts, text, re.IGNORECASE)
    
    # Extract matched values or use defaults
    color = color_match.group(0) if color_match else 'fair'
    rash_type = rash_match.group(0) if rash_match else 'eczema'
    body_part = body_part_match.group(0) if body_part_match else 'hand'
    
    return color, rash_type, body_part

def magic_prompt(color, rash_type, body_part):
    # Construct the detailed prompt using the extracted values
    return f"Create a highly detailed and realistic image showing the {body_part} of a person with {color} skin. The {body_part} should display a typical {rash_type} infection, characterized by a clearly visible rash."


## Helper functions
def load_image(p):
    '''
    Function to load images from a defined path
    '''
    return Image.open(p).convert('RGB').resize((512,512))

def pil_to_latents(image):
    '''
    Function to convert image to latents
    '''
    init_image = tfms.ToTensor()(image).unsqueeze(0) * 2.0 - 1.0
    init_image = init_image.to(device=device, dtype=torch.float16)
    init_latent_dist = vae.encode(init_image).latent_dist.sample() * 0.18215
    return init_latent_dist

def latents_to_pil(latents):
    '''
    Function to convert latents to images
    '''
    latents = (1 / 0.18215) * latents
    with torch.no_grad():
        image = vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
    images = (image * 255).round().astype("uint8")
    pil_images = [Image.fromarray(image) for image in images]
    return pil_images


def text_enc(prompts, maxlen=None):
    '''
    A function to take a textual prompt and convert it into embeddings
    '''
    if maxlen is None: maxlen = tokenizer.model_max_length
    inp = tokenizer(prompts, padding="max_length", max_length=maxlen, truncation=True, return_tensors="pt") 
    # Inference only, so skip gradient tracking for the text encoder
    with torch.no_grad():
        return text_encoder(inp.input_ids.to(device))[0].half()

def prompt_2_img(prompts, g=7.5, seed=100, steps=70, dim=512, save_int=True):
    """
    Diffusion process to convert prompt to image, modified to yield images for Streamlit.
    """
    
    bs = len(prompts) 
    text = text_enc(prompts)
    uncond = text_enc([""] * bs, text.shape[1])
    emb = torch.cat([uncond, text])
    
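    # Seed whenever a seed is given, including seed=0
    if seed is not None:
        torch.manual_seed(seed)
    
    # Latents are 8x smaller than the output image in each spatial dimension
    latents = torch.randn((bs, unet.config.in_channels, dim//8, dim//8))
    scheduler.set_timesteps(steps)
    latents = latents.to(device).half() * scheduler.init_noise_sigma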

    for i, ts in enumerate(scheduler.timesteps):
        inp = scheduler.scale_model_input(torch.cat([latents] * 2), ts)
        with torch.no_grad():
            u, t = unet(inp, ts, encoder_hidden_states=emb).sample.chunk(2)
        pred = u + g*(t-u)
        latents = scheduler.step(pred, ts, latents).prev_sample

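        # Yield an intermediate image every steps // 70 steps (every step at the default of 70);
        # max(..., 1) avoids a modulo-by-zero when steps < 70
        if save_int and i % max(steps // 70, 1) == 0: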
            image = latents_to_pil(latents)[0]
            buf = io.BytesIO()
            image.save(buf, format="JPEG")
            byte_im = buf.getvalue()
            yield byte_im  # Yield image in bytes format for Streamlit to display

    final_image = latents_to_pil(latents)
    final_buf = io.BytesIO()
    final_image[0].save(final_buf, format="JPEG")
    final_byte_im = final_buf.getvalue()
    yield final_byte_im  # Yield the final image

## Initiating tokenizer and encoder.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.float16).to(device)

## Initiating the VAE
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae", torch_dtype=torch.float16).to(device)

## Initializing a scheduler and Setting number of sampling steps
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
scheduler.set_timesteps(50)

## Initializing the U-Net model
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet", torch_dtype=torch.float16).to(device)


import streamlit as st

# Custom CSS to inject for better control over the Streamlit layout
st.markdown(
    """
    <style>
    .big-font {
        font-size:20px !important;
        font-weight: bold;
    }
    .image-gen {
        padding-top: 10px;
        padding-bottom: 30px;
    }
    </style>
    """, unsafe_allow_html=True)

# Title of the app
st.title('🔬 Skin Rash Generator')

# Welcome message with enhanced markdown
st.markdown("""
Welcome to the Skin Rash Generator! Use the commands below to generate images of common skin rashes based on your preferences.
""", unsafe_allow_html=True)

# Assume other imports and function definitions (extract_values, magic_prompt) are already included as discussed

# Text input for user description
# The hint is passed as a placeholder so it is not submitted as the description itself
description_input = st.text_input('Describe the rash:', placeholder='Type a detailed description of the skin rash including skin color, rash type, and affected area.', help='Enter details like skin color, type of rash, and body part affected.')

# Button to trigger image generation
if st.button('🖼️ Generate Images'):
    image_placeholder = st.empty()
    progress_bar = st.progress(0)

    # Extract values from the user's description
    color, rash_type, body_part = extract_values(description_input)
    
    st.write("Extracting values from the description. Replacing with default values if not found.")
    extracted_values = f"Skin Color - {color}, Rash Type - {rash_type}, Body Part - {body_part}"
    st.write("Extracted values:", extracted_values)
    
    # Generate detailed prompt based on extracted values
    detailed_prompt = magic_prompt(color, rash_type, body_part)

    generator_original = prompt_2_img([description_input], save_int=True)
    generator_new = prompt_2_img([detailed_prompt], save_int=True)
    total_steps = 70  # Total number of steps expected in the generation process

    # Set up layout for images and progress bar
    col1, col2 = st.columns(2)
    with col1:
        st.write("Original Image:")
        original_placeholder = st.empty()
    with col2:
        st.write("Enhanced Prompt Image:")
        new_placeholder = st.empty()
        
    for step, image_bytes in enumerate(generator_original):
        with st.spinner('Generating images...'):
            # st.progress expects a fraction between 0.0 and 1.0
            pg = min(step / total_steps, 1.0)
            progress_text = f"Generating Image, Progress: {int(pg*100)}%"
            progress_bar.progress(pg, text=progress_text)
            # Refresh the preview every 13 steps and always show the final image
            if step % 13 == 0 or step == total_steps:
                original_placeholder.image(image_bytes, use_column_width=True)

    for step, image_bytes in enumerate(generator_new):
        with st.spinner('Generating images with new prompt...'):
            pg = min(step / total_steps, 1.0)
            progress_text = f"Generating Image, Progress: {int(pg*100)}%"
            progress_bar.progress(pg, text=progress_text)
            if step % 13 == 0 or step == total_steps:
                new_placeholder.image(image_bytes, use_column_width=True)



Overwriting app.py
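
One caveat of the `extract_values` patterns in `app.py` is that they match substrings, so "arm" is found inside "warm" and "leg" inside "legend". A possible refinement, sketched below under the assumption that whole-word matching is what we want, wraps each alternative list in `\b` word boundaries. The names `find_term` and `extract_values_strict` are hypothetical and do not appear in the app; the term lists and defaults are copied from `extract_values`.

```python
import re

# Vocabulary and defaults copied from extract_values in app.py
COLORS = ("fair", "dark", "brown", "black", "white")
RASH_TYPES = ("eczema", "ringworm", "dermatitis")
BODY_PARTS = ("hand", "chest", "neck", "face", "leg", "arm")


def find_term(text, terms, default):
    """Return the first whole-word match from `terms` (lowercased), else `default`."""
    pattern = r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b"
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1).lower() if match else default


def extract_values_strict(text):
    """Whole-word variant of extract_values: returns (color, rash_type, body_part)."""
    return (
        find_term(text, COLORS, "fair"),
        find_term(text, RASH_TYPES, "eczema"),
        find_term(text, BODY_PARTS, "hand"),
    )


command = "generate a few images of a ringworm type of rash at the back of the neck area on a fair skin"
print(extract_values_strict(command))  # ('fair', 'ringworm', 'neck')

# The substring pattern picks "arm" out of "warm"; the whole-word version falls back to the default.
print(re.search(r"(hand|chest|neck|face|leg|arm)", "a warm patch", re.IGNORECASE).group(0))  # arm
print(find_term("a warm patch", BODY_PARTS, "hand"))  # hand
```

The trade-off is that whole-word matching also rejects plurals and compounds such as "legs" or "forearm", which then fall back to the default, so the vocabulary would need to list those forms explicitly.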


In [166]:
# Your public ip is the password to the localtunnel
!curl ipv4.icanhazip.com

34.90.55.162


In [None]:
!streamlit run app.py &>./logs.txt & npx localtunnel --port 8501

your url is: https://kind-cooks-fall.loca.lt


In [158]:
# Make image of rash for hand area fair skin eczema

In [99]:
# for i in range(70):
#     print(int((i+1)*100/70))
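
The scratch cell above experiments with turning a step index into a percentage. Streamlit's `st.progress` accepts either an integer from 0 to 100 or a float from 0.0 to 1.0, so a small helper that clamps the fraction keeps the progress bar valid even when the generator yields one more frame than `total_steps`. This is a minimal sketch; `progress_fraction` and `progress_label` are hypothetical helper names, not functions from `app.py`.

```python
def progress_fraction(step, total_steps):
    """Return step / total_steps clamped to the range [0.0, 1.0]."""
    if total_steps <= 0:
        return 1.0  # nothing to wait for; treat as complete
    return max(0.0, min(step / total_steps, 1.0))


def progress_label(step, total_steps):
    """Build the progress text shown next to the bar."""
    percent = int(progress_fraction(step, total_steps) * 100)
    return f"Generating Image, Progress: {percent}%"


print(progress_fraction(35, 70))  # 0.5
print(progress_fraction(71, 70))  # 1.0, clamped
print(progress_label(70, 70))     # Generating Image, Progress: 100%
```

In the app, the clamped value could be passed directly as `progress_bar.progress(progress_fraction(step, total_steps), text=progress_label(step, total_steps))`.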