# __Transforming Images with Text Prompts Using the Stable Diffusion Pipeline__

## __Problem Statement__
Create a system using the Stable Diffusion model from the diffusers library to transform images based on textual prompts, including both positive and negative descriptors.

The system should be capable of loading images from URLs or local paths and applying AI-driven modifications to these images, with a focus on adjusting various diffusion strengths to achieve different visual effects.

## __Steps to Perform:__

- Step 1: Import Necessary Libraries
- Step 2: Load Pretrained Model
- Step 3: Define Helper Functions
- Step 4: Load and Display Image
- Step 5: Define Prompts and Transform Image
- Step 6: Experiment with Different Diffusion Strengths

### __Step 1: Import Necessary Libraries__

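- Import torch, requests, and Image from PIL for model loading, image downloading, and image processing
- Import StableDiffusionDepth2ImgPipeline from diffusers for the depth-to-image pipeline
- Import urllib.parse and os for the URL and file path checks in the helper functions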

**Note:**

- Install the diffusers, transformers, scipy, ftfy, and accelerate libraries using pip
- These libraries are essential for the image transformation process
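
Because the install step upgrades several packages at once, it can help to confirm what actually ended up in the environment. The sketch below is one possible way to do that with the standard library only; the helper name `installed_versions` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
from importlib import metadata

# Packages this notebook relies on (torch is assumed to be preinstalled)
REQUIRED = ["diffusers", "transformers", "scipy", "ftfy", "accelerate", "torch"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Print a small report so version mismatches are easy to spot
for name, version in installed_versions(REQUIRED).items():
    print(f"{name:<14}{version or 'not installed'}")
```

For reference, the pipeline output later in this notebook reports diffusers 0.32.2, so a very different version in the report may explain behavioral differences.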

In [1]:
# Install necessary libraries
!pip install --quiet --upgrade diffusers transformers scipy ftfy
!pip install --quiet --upgrade accelerate

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gensim 4.3.3 requires scipy<1.14.0,>=1.7.0, but you have scipy 1.15.2 which is incompatible.

In [3]:
# Import necessary libraries
import torch
import requests
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline
import urllib.parse as parse
import os


### __Step 2: Load Pretrained Model__
- Set the model ID to __stabilityai/stable-diffusion-2-depth__
- Load the pretrained model using __StableDiffusionDepth2ImgPipeline.from_pretrained__ with the model ID and a half-precision data type (torch.float16), then move it to the GPU

In [4]:
# Load the model weights in half precision (fp16) for faster inference and reduced VRAM usage
model_id = "stabilityai/stable-diffusion-2-depth"
pipeline = StableDiffusionDepth2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipeline.to("cuda")  # Move the model to GPU for faster performance


StableDiffusionDepth2ImgPipeline {
  "_class_name": "StableDiffusionDepth2ImgPipeline",
  "_diffusers_version": "0.32.2",
  "_name_or_path": "stabilityai/stable-diffusion-2-depth",
  "depth_estimator": [
    "transformers",
    "DPTForDepthEstimation"
  ],
  "feature_extractor": [
    "transformers",
    "DPTImageProcessor"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
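
The loading cell above assumes a CUDA GPU is available. As a hedged sketch, the device and data type could instead be chosen at runtime, falling back to full precision on CPU, where half precision is often unsupported or slow. The helper `pick_device_and_dtype` is hypothetical and not part of the original notebook; it takes a plain boolean so that it can be tried without a GPU.

```python
def pick_device_and_dtype(cuda_available):
    """Return (device, dtype name): fp16 on a GPU, fp32 on CPU."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"

print(pick_device_and_dtype(True))   # GPU runtime
print(pick_device_and_dtype(False))  # CPU-only runtime

# In the notebook, the result would be used roughly like this:
#   device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
#   pipeline = StableDiffusionDepth2ImgPipeline.from_pretrained(
#       model_id, torch_dtype=getattr(torch, dtype_name))
#   pipeline.to(device)
```

If GPU memory is tight, diffusers pipelines also provide `pipeline.enable_attention_slicing()`, which trades some speed for lower memory use.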

### __Step 3: Define Helper Functions__
- Define __check_url__ to verify if a given string is a valid URL
- Define __load_image__ to load an image from either a URL or a local file path

In [5]:
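# Function to check if a string is a URL
def check_url(string):
    try:
        result = parse.urlparse(string)
        # Require a scheme, a host, and a path (e.g. "https://host/image.png")
        return all([result.scheme, result.netloc, result.path])
    except (ValueError, AttributeError, TypeError):
        # urlparse rejects malformed input and non-string values
        return False

# Function to load an image from a URL or local path
def load_image(image_path):
    if check_url(image_path):
        response = requests.get(image_path, stream=True, timeout=30)
        response.raise_for_status()  # Fail early on HTTP errors such as 404
        return Image.open(response.raw).convert("RGB")
    elif os.path.exists(image_path):
        return Image.open(image_path).convert("RGB")
    # Neither a valid URL nor an existing file: raise instead of returning None
    raise ValueError(f"Not a valid URL or existing file path: {image_path}")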
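
To make the behavior of __check_url__ concrete, the sketch below applies the same rule (scheme, host, and path must all be present) to a few inputs. The `example.com` URLs and the file paths are placeholders chosen for illustration, not resources used by the notebook.

```python
from urllib.parse import urlparse

def check_url(string):
    """Same rule as the helper above: scheme, host, and path must all be present."""
    result = urlparse(string)
    return all([result.scheme, result.netloc, result.path])

examples = [
    "https://example.com/photos/cat.png",  # full URL: treated as remote
    "https://example.com",                 # no path: not treated as a URL
    "photos/cat.png",                      # relative local path
    "/home/user/cat.png",                  # absolute local path
]
for text in examples:
    print(f"{text!r:45} -> {check_url(text)}")
```

Only the first input is classified as a URL. A bare domain without a path is rejected, which is acceptable here because an image URL always points to a specific resource.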


### __Step 4: Load and Display Image__
- Load an external image using its URL
- Resize it to 512x512 pixels (the resize ignores the original aspect ratio)

In [6]:
# Load the image from URL and resize to 512x512
img = load_image("https://images.unsplash.com/photo-1465101162946-4377e57745c3?q=80&w=3878&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D")
img = img.resize((512, 512))
img  # As the last expression in a notebook cell, this displays the image
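
Resizing straight to 512x512 stretches any image that is not square. One alternative, sketched below, is to scale the longer side to about 512 pixels and snap both sides to a multiple of 8, which suits Stable Diffusion because its VAE works on latents downsampled by a factor of 8. The helper `fit_size` and the 4000x3000 input are illustrative assumptions, not part of the original notebook.

```python
def fit_size(width, height, max_side=512, multiple=8):
    """Scale (width, height) so the longer side is about max_side, keeping the
    aspect ratio and snapping both sides to a multiple of `multiple`."""
    scale = max_side / max(width, height)

    def snap(value):
        # Round to the nearest multiple, but never go below one multiple
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)

print(fit_size(4000, 3000))  # a hypothetical 4:3 landscape photo -> (512, 384)

# In the notebook this could replace the fixed resize:
#   img = img.resize(fit_size(*img.size))
```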


### __Step 5: Define Prompts and Transform Image__
- Define a positive prompt describing the desired result
- Define a negative prompt listing what the transformed image should not contain
- Transform the image using both prompts at a strength of 0.7 and save the result


In [9]:
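# Define the positive and negative prompts
prompt = "A serene sunset over a calm lake with a couple"
# The negative prompt lists concepts to steer away from, so it needs no "no" prefix.
# Note: "people" works against "a couple" in the positive prompt; drop one if that is unintended.
n_prompt = "buildings, people"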

# Transform the image using both prompts with a specified strength
transformed_img = pipeline(prompt=prompt, image=img, negative_prompt=n_prompt, strength=0.7, num_inference_steps=30).images[0]

# Save the transformed image
transformed_img.save("transformed_image.png")
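
A negative prompt is usually written as a plain list of unwanted concepts, since the pipeline already steers away from whatever the text describes; a leading "no" is generally unnecessary. The sketch below shows one way to normalize such a list. The helper `build_negative_prompt` is hypothetical and only illustrates the idea.

```python
def build_negative_prompt(terms):
    """Join unwanted concepts into one comma-separated negative prompt,
    dropping a leading "no " and skipping case-insensitive duplicates."""
    cleaned = []
    for term in terms:
        term = term.strip()
        if term.lower().startswith("no "):
            term = term[3:].strip()
        if term and term.lower() not in (c.lower() for c in cleaned):
            cleaned.append(term)
    return ", ".join(cleaned)

print(build_negative_prompt(["no buildings", "no people", "Buildings"]))
# -> buildings, people
```

The resulting string can be passed as `negative_prompt` in the pipeline call.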



### __Step 6: Experiment with Different Diffusion Strengths__
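- Loop through different values of diffusion strength (0.1, 0.4, and 1.0)
- For each strength value, transform the image using the same positive and negative prompts and save the result under a strength-specific file name
- A higher strength adds more noise to the input image, so the result departs further from the original; with num_inference_steps=30, these runs performed 3, 12, and 30 denoising steps respectively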

In [10]:
# Experiment with different diffusion strengths
for strength in [0.1, 0.4, 1.0]:
    transformed_img = pipeline(prompt=prompt, image=img, negative_prompt=n_prompt, strength=strength, num_inference_steps=30).images[0]

    # Save the image after transformation at each strength level
    transformed_img.save(f"transformed_img_strength_{strength}.png")
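
The strength value also determines how many denoising steps actually run. As a rough sketch, assuming the schedule is trimmed to about `num_inference_steps * strength` steps, the arithmetic can be checked without loading the model. The helper `effective_steps` is illustrative; its results agree with the step counts the runs in this notebook reported (21 steps at strength 0.7, and 3, 12, and 30 steps in the loop).

```python
def effective_steps(num_inference_steps, strength):
    """Approximate number of denoising steps an image-to-image run performs."""
    return min(int(num_inference_steps * strength), num_inference_steps)

for strength in [0.1, 0.4, 0.7, 1.0]:
    print(f"strength={strength}: {effective_steps(30, strength)} of 30 steps")
```

At a strength of 0.1 only 3 steps run, so the output stays very close to the input; raising `num_inference_steps` is one way to give low-strength runs more steps to work with.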



### __Conclusion__
- The code demonstrates a depth-to-image pipeline: the model estimates a depth map from the input image and uses it, together with the text prompts, to generate a new image that keeps the structure of the original scene.
- It shows how to load images from URLs and local paths and how to apply AI-driven transformations to them based on textual descriptions.
- The final part of the code varies the diffusion strength to show how far the result departs from the input image: low values stay close to the original, while a value of 1.0 largely regenerates the image from the prompts and the depth map.