Advanced video generation system using:
- Qwen2.5-VL: Vision-language model for evolutionary prompt generation
- SDXL: High-quality image generation with LoRA support
- Signal Mapping: Map frames to signals (0.0-1.0) for dynamic video creation
- FFmpeg: Professional video encoding with signal-based frame selection
Pipeline:
- VLM Analysis: Analyze a target image and generate N evolutionary prompts
- Image Generation: Use SDXL img2img to generate keyframes from prompts
- Interpolation: Create smooth transitions between keyframes
- Signal Mapping: Map all frames to normalized signal values (0.0 to 1.0)
- Video Creation: Generate videos using signal functions (sine, linear, etc.)
How Signal Mapping Works:

Every frame is mapped to a decimal value between 0.0 and 1.0:
- 100 frames → frame 0 = 0.00, frame 1 ≈ 0.01, ..., frame 99 = 1.00
- Create videos by sampling frames based on signal functions
- Example: Sinusoidal signal creates oscillating forward/backward motion
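As a concrete sketch of that mapping (assuming linear spacing with the last frame pinned to exactly 1.0; these helper names are illustrative, not the project's API):

# Illustrative frame <-> signal mapping, not the project's mapper itself
def signal_for_frame(frame_idx: int, num_frames: int) -> float:
    # Linear spacing: frame 0 -> 0.0, last frame -> 1.0
    return frame_idx / (num_frames - 1)

def frame_for_signal(signal: float, num_frames: int) -> int:
    # Nearest frame for a signal value in [0.0, 1.0]
    return round(signal * (num_frames - 1))

print(signal_for_frame(0, 100))    # 0.0
print(signal_for_frame(99, 100))   # 1.0
print(frame_for_signal(1.0, 100))  # 99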
# Install FFmpeg
sudo apt install ffmpeg
# Install Python dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130
pip install diffusers transformers accelerate peft qwen-vl-utils
pip install sentence-transformers chromadb pillow opencv-python
pip install pypdf2 python-docx tqdm
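After installing, a quick sanity check that the core libraries import and that CUDA is visible (a minimal snippet, nothing project-specific):

import torch
import diffusers
import transformers

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers", diffusers.__version__, "| transformers", transformers.__version__)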
# Create project structure
python -c "from config.paths import Paths; Paths.create_directories()"# Step 1: Generate images with evolutionary prompts
python scripts/2_generate_images.py \\
--initial-image data/input/neuron.png \\
--evolution "the dot becomes a neuron with multidirectional branches" \\
--num-prompts 13 \\
--interpolations 5
# Step 2: Create video with signal
python scripts/3_create_video.py \
    --project generation_20241124_153045 \
    --signal sine \
    --frequency 1.0 \
    --duration 10 \
    --fps 30

# Full image generation usage
python scripts/2_generate_images.py \
    --initial-image data/input/target.png \
    --evolution "description of evolution" \
    --num-prompts 13 \
    --num-keyframes 13 \
    --analysis-type "technical" \
    --interpolations 5 \
    --strength 0.25 \
    --guidance 15.0 \
    --steps 40

Options:
- --initial-image: Target/reference image for reverse engineering
- --evolution: Description of the evolution (e.g., "dot to complex network")
- --num-prompts: Maximum prompt sequence length (default: 13)
- --num-keyframes: Number of keyframes to generate (default: same as --num-prompts)
- --analysis-type: Type of description generated for the initial image (default: detailed)
- --interpolations: Frames between keyframes for smoothness (default: 5)
- --strength: How much each frame changes (0.15-0.35)
- --guidance: Guidance scale for prompt adherence
- --steps: Number of inference steps per image
- --use-txt2img: Start from txt2img instead of the initial image
- --txt2img-prompt: Custom prompt for txt2img
- --lora: Path to LoRA weights
- --use-rag: Enable RAG guideline enhancement
# Linear progression (0 to 1)
python scripts/3_create_video.py \
    --project your_project_name \
    --signal linear \
    --duration 10 \
    --fps 30

# Sinusoidal oscillation
python scripts/3_create_video.py \
    --project your_project_name \
    --signal sine \
    --frequency 2.0 \
    --duration 15 \
    --fps 30

# Custom mathematical expression
python scripts/3_create_video.py \
    --project your_project_name \
    --signal custom \
    --custom-expr "sin(2*pi*t) * exp(-t)" \
    --duration 10

Available Signals:
- linear: Straight progression 0→1
- reverse: Reverse progression 1→0
- sine: Sinusoidal wave (oscillating)
- cosine: Cosine wave
- triangle: Triangle wave (linear up/down)
- sawtooth: Sawtooth wave (linear up, instant reset)
- square: Square wave (binary on/off)
- ease: Smooth ease in/out
- bounce: Bouncing effect
- custom: Custom mathematical expression
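Each of these is just a function from normalized time t ∈ [0, 1] to a signal value in [0, 1]. A sketch of a few shapes under that convention (the project's SignalFunctions may differ in details such as phase or argument names):

import numpy as np

def linear(t):
    return t                                              # 0 -> 1

def sine(t, frequency=1.0):
    return 0.5 + 0.5 * np.sin(2 * np.pi * frequency * t)  # oscillates in [0, 1]

def triangle(t):
    return 1.0 - abs(2.0 * (t % 1.0) - 1.0)               # linear up, then linear down

def sawtooth(t):
    return t % 1.0                                        # linear up, instant reset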
# Create pairing template for training data
python scripts/create_pairing_template.py --image-dir data/training_data

# Train LoRA
python scripts/1_train_lora.py \
    --dataset-dir data/training_data \
    --epochs 15 \
    --lora-rank 16 \
    --save-every 5

# Generate with the trained LoRA checkpoint
python scripts/2_generate_images.py \
    --initial-image data/input/dot.png \
    --evolution "dot becomes complex form" \
    --lora models/lora_weights/custom_lora/checkpoint_epoch_15

# Analyze image
python scripts/debug_vlm.py \
    --image data/input/test.png \
    --mode analyze \
    --analysis-type detailed

# Extract features
python scripts/debug_vlm.py \
    --image data/input/test.png \
    --mode features

# Compare two images
python scripts/debug_vlm.py \
    --image data/input/img1.png \
    --image2 data/input/img2.png \
    --mode compare
python scripts/debug_vlm.py \\
--image data/input/test.png \\
--mode prompts \\
--evolution "becomes more complex" \\
--num-prompts 5 \\
--save prompts_debug.jsonsdxl_video_project/
├── config/
│   ├── model_config.py            # Model configurations
│   └── paths.py                   # Path management
│
├── data/
│   ├── input/                     # Input images
│   ├── guidelines/                # Visual guidelines (for RAG)
│   ├── training_data/             # LoRA training data
│   └── output/                    # Generated outputs
│       └── project_name/
│           ├── frames/            # All generated frames
│           ├── prompts.json       # Generated prompts
│           ├── signal_mapping.json
│           ├── metadata.json
│           └── video_*.mp4        # Generated videos
│
├── models/
│   ├── lora_weights/              # Trained LoRA models
│   ├── checkpoints/               # Training checkpoints
│   └── chroma_db/                 # RAG vector database
│
├── src/
│   ├── vlm/
│   │   ├── image_analyzer.py      # Image analysis
│   │   └── prompt_generator.py    # Evolutionary prompts
│   │
│   ├── sdxl/
│   │   ├── generator.py           # SDXL generation
│   │   ├── interpolator.py        # Frame interpolation
│   │   ├── mapper.py              # Signal mapping
│   │   └── lora_trainer.py        # LoRA training
│   │
│   ├── video/
│   │   ├── video_generator.py     # FFmpeg video creation
│   │   └── effects.py             # Video effects
│   │
│   └── rag/
│       └── guideline_rag.py       # RAG system (later)
│
└── scripts/
    ├── 1_train_lora.py            # Train LoRA
    ├── 2_generate_images.py       # Generate evolutionary images
    ├── 3_create_video.py          # Create signal-based video
    └── debug_vlm.py               # VLM debugging tool
# Generate images
python scripts/2_generate_images.py \
    --initial-image data/input/neuron.png \
    --evolution "single neuron to complex neural network" \
    --num-prompts 10 \
    --interpolations 7

# Create linear video
python scripts/3_create_video.py \
    --project generation_* \
    --signal linear \
    --duration 12 \
    --fps 30

# Train LoRA on your style
python scripts/1_train_lora.py \
    --dataset-dir data/training_data/scientific_style \
    --output models/lora_weights/scientific \
    --epochs 20

# Generate with LoRA
python scripts/2_generate_images.py \
    --initial-image data/input/cell.png \
    --evolution "cell division process" \
    --lora models/lora_weights/scientific \
    --num-prompts 15

# Create video with sinusoidal motion
python scripts/3_create_video.py \
    --project generation_* \
    --signal sine \
    --frequency 0.5 \
    --duration 20 \
    --visualize-signal

# Generate starting with txt2img
python scripts/2_generate_images.py \
    --initial-image data/input/reference.png \
    --use-txt2img \
    --txt2img-prompt "a simple geometric shape, minimal" \
    --evolution "shape evolves into a complex fractal pattern" \
    --num-prompts 20

# Create bounce effect video
python scripts/3_create_video.py \
    --project generation_* \
    --signal bounce \
    --duration 8

Signal Mapping System:

The signal mapper creates a bijection between frames and signal values:
from src.sdxl.mapper import SignalMapper, SignalFunctions
import numpy as np

# Create mapper
mapper = SignalMapper(num_frames=100)

# Get frame for signal value
frame_idx = mapper.get_frame_for_signal(0.75)  # Returns frame 75

# Get signal for frame
signal = mapper.get_signal_for_frame(50)  # Returns 0.50

# Create custom signal sequence
def custom_signal(t):
    return (np.sin(4 * np.pi * t) + 1) / 2

sequence = mapper.create_signal_sequence(
    signal_function=custom_signal,
    duration_seconds=10,
    fps=30,
)

# Combine signals
def combined_signal(t):
    sine = SignalFunctions.sine(t, frequency=2.0)
    ease = SignalFunctions.ease_in_out(t)
    return 0.7 * sine + 0.3 * ease
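A combined signal like this plugs straight into the sequence API shown above (reusing the mapper and combined_signal defined in the preceding snippet):

# Sample the combined signal at 30 fps for 10 seconds of video
sequence = mapper.create_signal_sequence(
    signal_function=combined_signal,
    duration_seconds=10,
    fps=30,
)

Because both components stay in [0, 1], the 0.7/0.3 weighted sum does too, so the result remains a valid signal.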
# Complex mathematical expression
python scripts/3_create_video.py \
    --project my_project \
    --signal custom \
    --custom-expr "0.5 + 0.5*sin(4*pi*t) * exp(-2*t)"
Configuration:

Edit config/model_config.py:

DEFAULT_NUM_PROMPTS = 13       # Max sequence length
DEFAULT_INTERPOLATIONS = 5     # Smoothness
DEFAULT_STRENGTH = 0.25        # SDXL change amount
DEFAULT_GUIDANCE_SCALE = 15.0  # Prompt adherence
DEFAULT_FPS = 30               # Video frame rate

Tips for Smooth Evolution:
- Use low strength (0.20-0.30)
- More interpolations (7-10)
- Gradual prompt changes
- More keyframes with smaller evolutionary steps
Signal Selection:
- sine and cosine create oscillating motion
- triangle creates smooth forward-backward loops
- ease creates cinematic slow-start/slow-end
- Combine signals for complex motion

Maximum Quality:
- Use high inference steps (50+)
- Higher guidance scale (15-20) for consistency
- Train LoRA for specific style consistency
- Use RAG for guideline adherence
Out of Memory:
- Reduce image resolution (--width 768 --height 768)
- Reduce batch size for LoRA training
- Lower inference steps
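If those options are not enough, diffusers itself ships several memory savers that can be enabled on the pipeline (stock diffusers calls; the checkpoint and pipeline construction here are assumptions, substitute whatever the project actually loads):

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline

# Assumed base model; swap in the project's actual checkpoint
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe.enable_attention_slicing()   # Lower peak VRAM at some speed cost
pipe.enable_vae_slicing()         # Decode through the VAE in slices
pipe.enable_model_cpu_offload()   # Park idle submodules in system RAM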
Inconsistent Results:
- Increase guidance scale (18-20)
- Reduce strength (0.15-0.20)
- Use LoRA for style consistency
- More keyframes with smaller changes
FFmpeg Not Found:
sudo apt install ffmpeg # Ubuntu
brew install ffmpeg  # macOS
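To confirm the binary is actually visible to Python before running the video step, a quick check (minimal sketch):

import shutil
import subprocess

# Fail early if ffmpeg is not on PATH
ffmpeg = shutil.which("ffmpeg")
if ffmpeg is None:
    raise RuntimeError("ffmpeg not found on PATH")
print(subprocess.run([ffmpeg, "-version"], capture_output=True, text=True).stdout.splitlines()[0])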
RTX 4090 (24GB VRAM):
- VLM prompt generation: ~10-15s
- SDXL keyframe: ~4-6s per image
- Frame interpolation: ~0.1s per frame
- FFmpeg encoding: ~5-10s for 30s video
Total for 13 keyframes + 5 interpolations:
- ~100 frames generated
- ~5-7 minutes total pipeline
- Videos from existing frames: ~10s
MIT License
Acknowledgments:
- Qwen2.5-VL by Alibaba Qwen Team
- Stable Diffusion XL by Stability AI
- FFmpeg for video encoding