# Install Required Libraries
Install the libraries needed for image generation with Stable Diffusion: `diffusers` provides the pipeline, `transformers` provides the text encoder and tokenizer, and `accelerate` speeds up model loading.


In [None]:
!pip install diffusers transformers accelerate
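
Before moving on, it can help to confirm that the installation worked. The sketch below uses only the standard library; the helper name `installed_versions` is an assumption made for this example, and `torch` is added to the list because later cells import it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The three packages installed above, plus torch, which later cells import.
wanted = ["diffusers", "transformers", "accelerate", "torch"]
for name, ver in installed_versions(wanted).items():
    print(f"{name}: {ver or 'not installed'}")
```

Any package reported as `not installed` needs to be installed before the import cell will run.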




# Introduction to Generative AI
Generative AI creates new content, such as images, text, or audio, with machine-learning models. Image generation is often described in terms of two roles:
- **Generator**: Creates new images from an input (e.g., a text prompt).
- **Discriminator (optional)**: Judges the generated content. It is the defining component of GANs and is not part of the Stable Diffusion pipeline used here.

In this notebook, we'll use a pre-trained Stable Diffusion model to generate images from text prompts.


# Import Libraries
Import the necessary libraries for using the Stable Diffusion model.


In [None]:
from diffusers import StableDiffusionPipeline
import torch
from PIL import Image



# Load a Pre-trained Model
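Here we use the `diffusers` library to load the pre-trained Stable Diffusion v1.5 weights from the Hugging Face Hub, then move the pipeline to the GPU when one is available. The first run downloads several gigabytes of model files. The loaded pipeline generates images from a text prompt.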


In [None]:
# Load the Stable Diffusion pipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

print("Model loaded successfully!")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Model loaded successfully!


# Generate an Image
Provide a text prompt to the Stable Diffusion model to generate an image.


In [None]:
# Text prompt
prompt = input("Enter prompt : ")

# Generate the image
image = pipe(prompt).images[0]

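# Display the image inline (image.show() opens an external viewer,
# which does not work in hosted notebooks such as Colab)
display(image)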


Enter prompt : A futuristic cityscape at sunset, with flying cars and neon lights
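
Each call to `pipe(prompt)` starts from fresh random noise, so the same prompt produces a different image every time. The sketch below shows one way to make a run repeatable by passing a seeded `torch.Generator`. The helper name `generate_image` and the default values are assumptions chosen for illustration; `num_inference_steps`, `guidance_scale`, and `generator` are keyword arguments of the pipeline call.

```python
import torch


def generate_image(pipe, prompt, seed=0, steps=30, guidance_scale=7.5):
    """Run the pipeline with a fixed seed so a prompt can be regenerated later."""
    # A CPU generator means the initial noise does not depend on
    # whether a GPU is present.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=steps,      # fewer steps run faster, often with less detail
        guidance_scale=guidance_scale,  # higher values follow the prompt more closely
        generator=generator,
    )
    return result.images[0]
```

With the pipeline loaded earlier, `generate_image(pipe, prompt, seed=42)` would take the place of `pipe(prompt).images[0]`. Small numerical differences between hardware can still change the result slightly.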



# Save the Generated Image
Save the generated image as a PNG file in the current working directory so it can be reused later.


In [None]:
# Save the image
image.save("generated_image.png")
print("Image saved as 'generated_image.png'")


Image saved as 'generated_image.png'
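
A fixed file name means every run overwrites the previous image. One possible alternative, sketched below, derives the file name from the prompt. The helper `filename_for_prompt`, the `outputs` directory, and the six-word limit are assumptions made for this example.

```python
import re
from pathlib import Path


def filename_for_prompt(prompt, directory="outputs", max_words=6):
    """Build a PNG path whose name comes from the first words of the prompt."""
    # Keep only lowercase letters and digits; punctuation and spaces act as separators.
    words = re.findall(r"[a-z0-9]+", prompt.lower())
    slug = "-".join(words[:max_words]) or "image"  # fall back if no usable words remain
    return Path(directory) / f"{slug}.png"


path = filename_for_prompt(
    "A futuristic cityscape at sunset, with flying cars and neon lights"
)
print(path)
```

To use the result, create the folder with `path.parent.mkdir(parents=True, exist_ok=True)` and then call `image.save(path)`.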


# Generator and Discriminator Concepts
- **Generator**: Here, the Stable Diffusion pipeline plays the generator role, turning a text prompt into an image. Unlike a GAN generator, it builds the image by iteratively denoising random noise (50 steps by default).
- **Discriminator**: Not used here. In GANs (Generative Adversarial Networks), the discriminator is trained to tell real images from generated ones, and its output is the training signal for the generator.

This distinction helps in understanding how adversarial generative models work:
1. The generator creates content.
2. The discriminator evaluates it and provides feedback (only in adversarial setups such as GANs).
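
The feedback loop between the two roles can be made concrete with the standard GAN losses. The sketch below is a worked example in plain Python, not code from this notebook's pipeline. It assumes the discriminator outputs the probability that an image is real; the values `0.9` and `0.2` are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: low when real images score high and generated ones score low."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator is fooled."""
    return -math.log(d_fake)


# Hypothetical discriminator outputs (probability that an image is real).
d_real = 0.9  # score for a real training image
d_fake = 0.2  # score for a generated image

print(f"discriminator loss: {discriminator_loss(d_real, d_fake):.4f}")
print(f"generator loss:     {generator_loss(d_fake):.4f}")
```

The generator loss falls as `d_fake` rises, which is the sense in which the discriminator "provides feedback": the generator is updated to produce images that the discriminator scores as more likely to be real.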


# Install Libraries for Discriminator
We need `torchvision` to use a pre-trained ResNet-50 model for evaluating or classifying the generated image.


In [None]:
!pip install torchvision




# Load a Pre-trained Discriminator
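Here, we load a ResNet-50 model pre-trained on ImageNet. ResNet-50 is an image classifier rather than a true GAN discriminator, so it only stands in for one: it labels the generated image instead of judging whether it looks real.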


In [None]:
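import torchvision.transforms as transforms
from torchvision.models import resnet50, ResNet50_Weights

# Load ResNet-50 pre-trained on ImageNet
# (weights= replaces the deprecated pretrained=True argument in torchvision >= 0.13)
discriminator = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)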
discriminator.eval()

# Transformation for input image
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize image to 224x224 for ResNet
    transforms.ToTensor(),          # Convert image to tensor
    transforms.Normalize(           # Normalize using ImageNet mean and std
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

print("Discriminator (ResNet-50) loaded successfully!")


Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:01<00:00, 100MB/s]


Discriminator (ResNet-50) loaded successfully!
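
To see what the `ToTensor` and `Normalize` steps do to a single pixel, the arithmetic can be reproduced in plain Python. This is an illustrative sketch: the helper `normalize_pixel` is an assumption for this example, while the mean and standard deviation values are the ones used in the transform above.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb]  # the scaling ToTensor applies
    return [
        (x - mean) / std  # the per-channel step Normalize applies
        for x, mean, std in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)
    ]


print(normalize_pixel((255, 255, 255)))  # a white pixel
print(normalize_pixel((0, 0, 0)))        # a black pixel
```

After normalization the channel values are roughly centered on zero, which matches the input statistics the ImageNet-trained weights were trained on.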


# Evaluate the Generated Image
We use the pre-trained ResNet-50 to classify the generated image and print its five most likely ImageNet labels with their probabilities. This is only a loose analogy for a discriminator: the scores describe what the image appears to contain, not how realistic it is.


In [None]:
# Convert PIL image to tensor for ResNet
input_image = transform(image).unsqueeze(0)  # Add batch dimension

# Pass the image through the discriminator
with torch.no_grad():
    output = discriminator(input_image)

# Load human-readable ImageNet class labels (a JSON list of 1000 names)
!wget -qO imagenet-simple-labels.json https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json
import json
with open("imagenet-simple-labels.json", "r") as f:
    labels = json.load(f)

# Get top 5 predictions
probabilities = torch.nn.functional.softmax(output[0], dim=0)
top5_prob, top5_catid = torch.topk(probabilities, 5)

print("Top 5 Predictions for the Generated Image:")
for prob, catid in zip(top5_prob, top5_catid):
    # .item() turns each 0-dim tensor into a plain Python number
    print(f"{labels[catid.item()]}: {prob.item():.4f}")


Top 5 Predictions for the Generated Image:
fountain: 0.5196
stage: 0.1648
monitor: 0.0281
limousine: 0.0208
front curtain: 0.0142
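
The softmax and top-k steps used above can be reproduced in plain Python to show what they compute. This is a minimal sketch with hypothetical logits for a four-class model; the helper names `softmax` and `top_k` are assumptions for this example and are not the `torch` functions themselves.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, k):
    """Return (index, probability) pairs for the k largest probabilities."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a four-class model.
logits = [2.0, 0.5, 1.0, -1.0]
for index, prob in top_k(softmax(logits), 2):
    print(f"class {index}: {prob:.4f}")
```

Because the probabilities are spread over all 1000 ImageNet classes, the five values printed for the generated image sum to less than 1.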
