BrunooCS/Text-To-Image-Diffusion-Model
Diffusion Model

Forward Diffusion Process

A powerful conditional diffusion model for image generation from text descriptions.

Python PyTorch


Overview

This project implements a conditional diffusion model that generates images from text descriptions. The model learns to gradually denoise random Gaussian noise into coherent images, guided by text prompts.

Generation Process

Key Features

  • Text-to-Image Generation: Convert textual descriptions into high-quality images
  • Web Interface: User-friendly interface for image generation
  • Conditional Generation: Fine-grained control over generated content
  • Interactive Process Visualization: Watch the denoising process in real-time
  • Multi-Class Support: Generate images across various categories

Architecture

The model consists of several key components:

├── models/
│   ├── modules.py        # Neural network building blocks
│   ├── text_encoder.py   # Text embedding module
│   ├── time_encoder.py   # Timestep encoding module
│   └── unet.py           # Conditional UNet architecture
│
├── output_ImageNet/      # Generated outputs and visualizations
│
├── diffusion.py          # Core diffusion model implementation
├── plot_func.py          # Visualization utilities
├── preprocess.py         # Data loading and preprocessing
└── web.py                # Web interface for image generation

Sample Generation

Web Interface

The project includes a sleek web interface for easy interaction with the model:

Web Interface

Features:

  • Text prompt input
  • Multiple image generation
  • Process visualization
  • Real-time generation progress

Model Components

1. Diffusion Process

  • Forward diffusion adds noise gradually
  • Reverse diffusion learns to remove noise
  • Conditional generation guided by text embeddings
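The forward process above has a convenient closed form: instead of adding noise step by step, `x_t` can be sampled directly from `x_0`. A minimal PyTorch sketch of this (the linear beta schedule values are common DDPM defaults and an assumption here, not necessarily what `diffusion.py` uses):

```python
import torch

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and the cumulative alpha products used below."""
    betas = torch.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t
    return betas, alpha_bar

def q_sample(x0, t, alpha_bar):
    """Sample x_t ~ q(x_t | x_0) in one shot:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)  # broadcast over C, H, W
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
    return xt, eps
```

The returned `eps` is kept around because the reverse model is trained to predict exactly this noise.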

2. Architecture Details

  • UNet backbone with skip connections
  • Text conditioning through cross-attention
  • Time embedding using sinusoidal positions
  • Batch normalization for stable training
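The sinusoidal time embedding mentioned above can be sketched as follows; this is the standard Transformer-style formulation, and the actual interface in `time_encoder.py` may differ:

```python
import math
import torch

def sinusoidal_embedding(t, dim):
    """Map integer timesteps t (shape [B]) to sin/cos features (shape [B, dim]).

    Each of the dim/2 frequency channels oscillates at a different rate,
    giving the UNet a smooth, unique code for every timestep."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t.float()[:, None] * freqs[None, :]          # [B, half]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```

In practice this embedding is usually passed through a small MLP before being added into the UNet's residual blocks.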

3. Training Process

  • Dataset: Tiny ImageNet subset
  • Text embeddings: SentenceTransformer
  • Loss: MSE between predicted and actual noise
  • Optimizer: AdamW with gradient scaling

Sample Generations

Generate images from text descriptions like:

  • "red apple"
  • "golden retriever"
  • "sunset over mountains"
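At inference time a prompt like these is embedded and fed to the reverse process. A minimal DDPM ancestral-sampling sketch (again assuming a `model(x, t, text_emb)` call; `diffusion.py` may organize this loop differently):

```python
import torch

@torch.no_grad()
def sample(model, text_emb, betas, shape, device="cpu"):
    """Start from pure Gaussian noise and denoise for T steps,
    injecting fresh noise at every step except the last."""
    T = betas.shape[0]
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)
    for i in reversed(range(T)):
        t = torch.full((shape[0],), i, device=device, dtype=torch.long)
        eps_pred = model(x, t, text_emb)
        # Posterior mean: (x - (1 - a_t)/sqrt(1 - abar_t) * eps) / sqrt(a_t)
        mean = (x - (1 - alphas[i]) / (1 - alpha_bar[i]).sqrt() * eps_pred) \
               / alphas[i].sqrt()
        noise = torch.randn_like(x) if i > 0 else torch.zeros_like(x)
        x = mean + betas[i].sqrt() * noise
    return x
```

The README names SentenceTransformer for text embeddings, so `text_emb` here would come from encoding the prompt with that model before calling `sample`.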

License

This project is licensed under the MIT License - see the LICENSE file for details.


About

Text to image diffusion model from scratch with the Tiny ImageNet dataset, PyTorch, and a web interface for image generation.