This document provides a summary of the Jupyter notebooks found in this directory, detailing their purpose, key technologies, and processes.
- Purpose: Demonstrates audio generation from text prompts using diffusion models.
- Model: Utilizes the `AudioLDMPipeline` with the `cvssp/audioldm-s-full-v2` pretrained model.
- Libraries: `diffusers`, `transformers`, `ftfy`, `scipy`, `torch`, `soundfile`, `IPython.display`.
- Process: Installs dependencies, loads the pipeline onto a CUDA device (GPU), generates audio based on a positive and optional negative text prompt, saves the output as a `.wav` file, and provides an embedded player (see the sketch below).
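A minimal sketch of the core steps, assuming the dependencies are installed and a CUDA GPU is available; the prompts are illustrative and the 16 kHz output rate reflects AudioLDM's vocoder:

```python
import torch
import scipy.io.wavfile
from diffusers import AudioLDMPipeline

# Load the pretrained AudioLDM pipeline onto the GPU
pipe = AudioLDMPipeline.from_pretrained(
    "cvssp/audioldm-s-full-v2", torch_dtype=torch.float16
).to("cuda")

# Generate audio from a positive prompt, steering away from a negative one
audio = pipe(
    prompt="gentle rain falling on a tin roof",  # illustrative prompt
    negative_prompt="low quality, distorted",    # optional
    num_inference_steps=10,
    audio_length_in_s=5.0,
).audios[0]

# Save the waveform as a .wav file (AudioLDM outputs 16 kHz mono audio)
scipy.io.wavfile.write("generated_audio.wav", rate=16000, data=audio)
```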
- Purpose: Provides a comprehensive introduction to PyTorch, focusing on the core data structure: Tensors. It covers the "what, why, and who" of PyTorch.
- Key Concepts: Tensor creation (scalars, vectors, matrices, tensors), random tensors, tensor datatypes (`float32`, `float16`, `int8`), getting tensor info (`shape`, `dtype`, `device`), tensor operations (arithmetic, matrix multiplication), aggregation (min, max, mean, sum, argmin, argmax), reshaping/viewing/stacking/squeezing/permuting, indexing, NumPy interoperability (`from_numpy`, `numpy`), reproducibility (`torch.manual_seed`), and running computations on GPUs (CUDA/MPS), including device management.
- Libraries: `torch`, `numpy`.
- Structure: A detailed tutorial with explanations in markdown and corresponding code examples for each fundamental concept (condensed in the sketch below). Includes exercises for practice.
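A compact sketch of the fundamentals the tutorial walks through; shapes and values here are illustrative:

```python
import torch
import numpy as np

torch.manual_seed(42)                      # reproducibility

# Creation and inspection
x = torch.rand(3, 4)                       # random tensor
print(x.shape, x.dtype, x.device)          # torch.Size([3, 4]) torch.float32 cpu

# Datatype conversion and arithmetic
y = x.to(torch.float16) * 10

# Matrix multiplication (inner dimensions must match)
z = x @ x.T                                # (3, 4) @ (4, 3) -> (3, 3)

# Aggregation and reshaping
print(z.min(), z.max(), z.mean(), z.argmax())
print(x.reshape(4, 3).shape, x.unsqueeze(0).shape)

# NumPy interoperability
arr = x.numpy()
back = torch.from_numpy(arr)

# Device management: use CUDA when available (the tutorial also covers Apple MPS)
device = "cuda" if torch.cuda.is_available() else "cpu"
x_gpu = x.to(device)
```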
- Purpose: Demonstrates image-to-image translation using Stable Diffusion. It takes an initial image and a text prompt to generate a new image that blends the content of the initial image with the style or elements described in the prompt.
- Model: Utilizes `AutoPipelineForImage2Image` from the `diffusers` library, specifically loading the `runwayml/stable-diffusion-v1-5` pretrained model.
- Libraries: `diffusers`, `transformers`, `ftfy`, `accelerate`, `torch`, `requests`, `PIL`, `io`.
- Process: Installs necessary libraries, sets up the device (CUDA), loads the image-to-image pipeline, loads and prepares an initial input image from a URL, defines a target text prompt, uses a generator with a fixed seed for reproducibility, and runs the pipeline to generate the transformed image (see the sketch below).
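A condensed sketch of the pipeline usage, assuming the libraries are installed; the image URL, prompt, and `strength` value are placeholders:

```python
import io
import requests
import torch
from PIL import Image
from diffusers import AutoPipelineForImage2Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Fetch and prepare the initial image (URL is a placeholder)
resp = requests.get("https://example.com/input.jpg")
init_image = Image.open(io.BytesIO(resp.content)).convert("RGB").resize((512, 512))

# Fixed seed so the run is reproducible
generator = torch.Generator(device="cuda").manual_seed(42)

result = pipe(
    prompt="a watercolor painting of the scene",  # illustrative prompt
    image=init_image,
    strength=0.75,   # how far the output may drift from the initial image
    generator=generator,
).images[0]
result.save("img2img_output.png")
```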
- Purpose: Demonstrates how to achieve deterministic and reproducible image generation using Stable Diffusion. It highlights the role of random seeds in controlling the output.
- Model: Uses the `DiffusionPipeline` from `diffusers` with the `runwayml/stable-diffusion-v1-5` pretrained model.
- Libraries: `diffusers`, `transformers`, `ftfy`, `accelerate`, `PIL`, `torch`, `io`, `matplotlib`.
- Process: Installs libraries, loads the pipeline, defines a prompt, and, crucially, uses `torch.Generator` instances with specific `manual_seed` values. It first generates multiple images from the same prompt but with different seeds to show variability, then generates images using different prompts but the same seed to demonstrate reproducibility and controlled variation (see the sketch below).
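A sketch of the seeding pattern the notebook relies on; prompts and seed values are illustrative:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a castle in the clouds"  # illustrative prompt

# Same prompt, different seeds -> visibly different images
images_varied = [
    pipe(prompt, generator=torch.Generator("cuda").manual_seed(s)).images[0]
    for s in (0, 1, 2)
]

# Different prompts, same seed -> controlled variation from shared initial noise
for p in ("a castle in the clouds", "a castle in the desert"):
    img = pipe(p, generator=torch.Generator("cuda").manual_seed(0)).images[0]
```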
- Purpose: Demonstrates image generation using a pre-trained diffusion model pipeline. The active example focuses on unconditional generation using a model specialized for butterflies (`anton-l/ddpm-butterflies-128`). A commented-out section suggests conditional (text-to-image) generation using Stable Diffusion (`runwayml/stable-diffusion-v1-5`) was also considered.
- Model: Primarily uses `DiffusionPipeline` with the `anton-l/ddpm-butterflies-128` model.
- Libraries: `diffusers`, `torch`.
- Process: Installs `diffusers`, loads the pipeline, sends it to the CUDA device, generates an image (unconditionally in the active code), saves the image as `generated_image.png`, and displays it (see the sketch below).
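The unconditional path reduces to a few lines; a minimal sketch, assuming a CUDA device:

```python
from diffusers import DiffusionPipeline

# Load the butterfly DDPM model and move it to the GPU
pipe = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128").to("cuda")

# Unconditional generation: no prompt, the model samples from pure noise
image = pipe().images[0]
image.save("generated_image.png")
image  # in a notebook cell, this displays the image inline
```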
- Purpose: Demonstrates text-guided image inpainting using Stable Diffusion. It shows how to replace a masked area in an image based on a textual description.
- Model: Uses the `StableDiffusionInpaintPipeline` with the `runwayml/stable-diffusion-inpainting` model, which is specifically fine-tuned for inpainting tasks.
- Libraries: `diffusers`, `transformers`, `ftfy`, `accelerate`, `PIL`, `torch`, `io`, `matplotlib`.
- Process: Installs dependencies, loads the inpainting pipeline, loads an initial image and a corresponding mask image (where the mask defines the area to be modified), defines a text prompt describing the desired content for the masked area, runs the pipeline to generate the inpainted image (see the sketch below), and displays the original, mask, and generated images side by side.
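A minimal sketch of the inpainting call, assuming the image and mask already exist on disk; paths and the prompt are placeholders:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# White pixels in the mask mark the region to repaint (placeholder paths)
init_image = Image.open("input.png").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="a fluffy orange cat sitting on a bench",  # illustrative prompt
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("inpainted.png")
```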
- Purpose: Demonstrates how to train a custom object detection model using the `detecto` library, which simplifies working with PyTorch's object detection models (like Faster R-CNN).
- Library: `detecto` (a high-level wrapper around PyTorch for object detection). Also uses `torchvision.transforms` and `numpy`.
- Process (condensed in the sketch after this list):
  - Mounts Google Drive.
  - Installs `detecto`.
  - Defines paths for images and label files.
  - Sets up custom image augmentations.
  - Creates `detecto` Datasets and DataLoaders.
  - Initializes and trains a `detecto.core.Model`.
  - Performs prediction on a test image and visualizes results.
  - Shows how to filter predictions based on confidence thresholds.
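A sketch of that workflow using `detecto`'s `Dataset`/`DataLoader`/`Model` pattern; class names, paths, and hyperparameters are placeholders:

```python
from torchvision import transforms
from detecto.core import Dataset, DataLoader, Model
from detecto.utils import read_image
from detecto.visualize import show_labeled_image

# Custom augmentations applied to each training image
augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ToTensor(),
])

# Pascal-VOC-style XML labels alongside the images (placeholder paths)
dataset = Dataset("labels/", "images/", transform=augmentations)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

# A Faster R-CNN model with the custom object classes (placeholders)
model = Model(["cat", "dog"])
model.fit(loader, epochs=10, verbose=True)

# Predict on a test image and keep only confident detections
image = read_image("test.jpg")
labels, boxes, scores = model.predict(image)
keep = scores > 0.6
show_labeled_image(image, boxes[keep], [l for l, k in zip(labels, keep) if k])
```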
- Purpose: Demonstrates fine-tuning a pre-trained YOLOv8 model for a custom object detection task (mask detection) and performing inference.
- Library: `ultralytics`. Also uses `torch` and `os`.
- Process (condensed in the sketch after this list):
  - Handles file paths potentially involving Google Drive and local Colab storage.
  - Installs `ultralytics`.
  - Loads a pre-trained YOLOv8 model.
  - Sets up the compute device (CUDA/CPU).
  - Fine-tunes the model on a custom dataset using `model.train()`.
  - Exports the trained model.
  - Performs inference using the `yolo predict` command-line tool.
  - Includes helper code for Colab environment specifics.
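A sketch of the fine-tuning and inference flow, assuming a dataset YAML in the Ultralytics format; paths, model size, and hyperparameters are placeholders:

```python
import torch
from ultralytics import YOLO

device = 0 if torch.cuda.is_available() else "cpu"

# Start from pre-trained YOLOv8 weights (nano variant as a placeholder)
model = YOLO("yolov8n.pt")

# Fine-tune on the custom mask-detection dataset (placeholder YAML path)
model.train(data="mask_dataset.yaml", epochs=50, imgsz=640, device=device)

# Export the trained model for deployment
model.export(format="onnx")

# Inference can then be run from the command line, e.g.:
#   yolo predict model=runs/detect/train/weights/best.pt source=test_image.jpg
```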