Image Generation Using Diffusion Model

This task involves generating images using a diffusion model, with a specific focus on the Stable Diffusion inpainting pipeline. The primary goal is to transform random noise or partially masked images into coherent and visually appealing outputs based on textual prompts. The notebook provided outlines a comprehensive process from setting up the environment to experimenting with different model parameters.

  • Data Loading and Preprocessing: I loaded an image from the movie "Dreamer" (2005) and defined bounding boxes to specify the region of interest. This region was used to generate binary masks that guide the inpainting process. Using the YOLO model for segmentation, I created masks for the selected area, then visualized and saved them. These masks tell the inpainting model which parts of the image to modify.
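
    The sketch below illustrates this mask-generation step. It assumes the ultralytics YOLO segmentation API; the image path, bounding-box coordinates, and weight file are placeholders rather than the exact values from the notebook.

    ```python
    import numpy as np
    from PIL import Image, ImageDraw
    from ultralytics import YOLO

    # Hypothetical frame from the film; replace with the file used in the notebook.
    image = Image.open("dreamer_frame.jpg").convert("RGB")

    # Region of interest as (x1, y1, x2, y2) -- illustrative coordinates.
    bbox = (100, 150, 400, 500)

    # Run a YOLO segmentation checkpoint and rasterize the detected instance
    # polygons that fall inside the region of interest into a binary mask.
    seg_model = YOLO("yolov8n-seg.pt")  # assumed segmentation weights
    result = seg_model(image)[0]

    mask = Image.new("L", image.size, 0)
    draw = ImageDraw.Draw(mask)
    if result.masks is not None:
        for polygon, box in zip(result.masks.xy, result.boxes.xyxy):
            x1, y1, x2, y2 = box.tolist()
            # Keep instances whose box centre falls inside the chosen region.
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            if bbox[0] <= cx <= bbox[2] and bbox[1] <= cy <= bbox[3]:
                draw.polygon([(float(x), float(y)) for x, y in polygon], fill=255)

    mask.save("mask.png")
    ```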

  • Model Initialization: I initialized the Stable Diffusion inpainting pipeline from Hugging Face's diffusers library, leveraging GPU acceleration for efficiency.
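
    A minimal sketch of this initialization, assuming the commonly used runwayml/stable-diffusion-inpainting checkpoint from the Hugging Face Hub (the exact checkpoint used in the notebook may differ):

    ```python
    import torch
    from diffusers import StableDiffusionInpaintPipeline

    # Load the inpainting pipeline in half precision and move it to the GPU.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")
    ```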

Image Generation:

The image generation process began with a basic prompt, "a realistic green dragon," applied to the masked image. I used a predefined number of inference steps and logged the results using Comet for tracking and comparison. Subsequent experiments involved adjusting various hyperparameters:

  • Number of Inference Steps: I varied the number of diffusion steps to observe how it impacted the quality and detail of the generated images. For instance, increasing the number of steps generally improved image quality but also increased computation time.

  • Guidance Scale: I experimented with different guidance scales, which influence the adherence of the generated image to the provided prompt. A higher guidance scale tends to produce images that more closely match the prompt but may reduce diversity.

  • Strength Parameter: This parameter controls the degree to which the original image is altered. By adjusting the strength, I explored the balance between retaining original image features and introducing new elements from the prompt.

  • Negative Prompts: I also introduced negative prompts to instruct the model on what not to generate. For example, using the negative prompt "cartoon" while asking for "a realistic green dragon" helped steer the model away from creating cartoonish features, emphasizing a more realistic style.

  • Logging and Experimentation: Throughout the experiments, I used Comet to log parameters and results systematically. Logging facilitated a thorough analysis of how different settings affected the generated images, enabling a structured comparison of outcomes. A sketch of one such logged run follows this list.
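
The sketch below combines these settings in a single inpainting call logged to Comet, reusing the pipe object from the initialization sketch and the mask saved earlier. The parameter values, project name, and file names are illustrative assumptions rather than the exact settings from the notebook, and a valid Comet API key is assumed to be configured in the environment.

```python
from comet_ml import Experiment
from PIL import Image

# Hypothetical Comet project; assumes COMET_API_KEY is set in the environment.
experiment = Experiment(project_name="sd-inpainting-experiments")

# Illustrative hyperparameters covering the knobs discussed above.
params = {
    "prompt": "a realistic green dragon",
    "negative_prompt": "cartoon",
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "strength": 0.9,
}
experiment.log_parameters(params)

# Original frame and the binary mask produced earlier, resized to the
# resolution the pipeline expects.
init_image = Image.open("dreamer_frame.jpg").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("L").resize((512, 512))

# `pipe` is the StableDiffusionInpaintPipeline initialized earlier.
result = pipe(image=init_image, mask_image=mask_image, **params).images[0]

experiment.log_image(result, name="realistic_green_dragon")
experiment.end()
```

Varying num_inference_steps, guidance_scale, and strength across runs while keeping the logging code fixed is what makes the side-by-side comparison in Comet straightforward.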
