# 🎨 Enhanced Doodle-to-Real Image Generator

Transform your doodles into realistic images or artistic masterpieces!

**Made by Rohith Cherukuri**

## 🚀 Features:
- ✅ Convert doodles to realistic representations
- ✅ Apply artistic styles (oil painting, watercolor, etc.)
- ✅ Handle complex prompts like "oil painting of my doodle's real image"
- ✅ GPU-accelerated processing
- ✅ Public API via ngrok

## 📋 Instructions:
1. **Enable GPU**: Runtime → Change runtime type → GPU → T4
2. **Run all cells** in order
3. **Draw a doodle** and describe what you want
4. **Get results**: GPU generation typically takes 30-60 seconds (see Performance below)


## 🔧 Setup Environment

In [None]:
# Check GPU availability
import sys
import torch

print("🎮 GPU Check:")
if torch.cuda.is_available():
    print(f"✅ CUDA Available: {torch.cuda.get_device_name(0)}")
    print(f"📊 VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
else:
    print("❌ No GPU detected - Make sure to enable GPU in Runtime settings!")

# sys.version reports the interpreter version; torch.version reports PyTorch's
print(f"\n🐍 Python: {sys.version.split()[0]}")
print(f"🔥 PyTorch: {torch.__version__}")

In [None]:
# Install dependencies with NumPy compatibility fix
print("📦 Installing dependencies...")

# Fix NumPy compatibility issue first
print("🔧 Fixing NumPy compatibility...")
!pip install -q "numpy<2.0" --force-reinstall

# Core dependencies
!pip install -q fastapi==0.109.2
!pip install -q "uvicorn[standard]==0.27.1"
!pip install -q python-multipart==0.0.9
!pip install -q python-dotenv==1.0.1
!pip install -q Pillow==10.2.0
!pip install -q pydantic==2.6.1
!pip install -q aiofiles==23.2.1

# AI/ML Dependencies - GPU optimized
!pip install -q diffusers==0.26.3
!pip install -q transformers==4.38.2
!pip install -q accelerate==0.27.2
!pip install -q controlnet-aux==0.0.10
!pip install -q opencv-python-headless==4.9.0.80  # Headless version for Colab
!pip install -q safetensors==0.4.2
!pip install -q huggingface-hub==0.20.3
!pip install -q requests==2.31.0

# Additional for enhanced processing
!pip install -q xformers  # For memory optimization

# Install ngrok for public URL
!pip install -q pyngrok

print("✅ Dependencies installed with NumPy compatibility fix!")
print("⚠️ You may need to restart runtime if you see import errors")
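
After the runtime restarts, it can help to confirm that the pinned versions are the ones actually installed, since a later unpinned install may replace an earlier pin. The following is a minimal sketch using only the standard library. `check_pins` and `PINS` are hypothetical names introduced here, and the three pins are copied from the install cell as an example.

```python
from importlib import metadata

# Example pins copied from the install cell; extend the dict as needed.
PINS = {
    "diffusers": "0.26.3",
    "transformers": "4.38.2",
    "accelerate": "0.27.2",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    if not problems:
        print("All pinned packages match.")
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package matches its pin. A `None` in the second position means the package is missing entirely.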

In [None]:
# Restart runtime to apply NumPy fix (run this cell, then continue)
print("🔄 Restarting runtime to apply NumPy compatibility fix...")
print("⚠️ After this cell runs, continue with the next cells")

import os
os.kill(os.getpid(), 9)

## 📁 Setup Enhanced Processor

**After runtime restart, continue from here:**

In [None]:
# Create the enhanced doodle processor
enhanced_processor_code = '''
import torch
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter
from typing import Optional, Tuple, Dict, List
import logging
import cv2
import base64
import io
from diffusers import (
    StableDiffusionControlNetPipeline, 
    ControlNetModel,
    StableDiffusionImg2ImgPipeline
)
from controlnet_aux import CannyDetector

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class EnhancedDoodleProcessor:
    def __init__(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        logger.info(f"🎮 Using device: {self.device}")
        
        # Model instances (lazy loaded)
        self.controlnet_canny = None
        self.pipe_controlnet = None
        self.pipe_img2img = None
        self.canny_detector = None
        
        # Style configurations
        self.style_configs = {
            "oil_painting": {
                "prompt_suffix": "oil painting, thick brushstrokes, canvas texture, artistic masterpiece, detailed",
                "negative_prompt": "digital art, photograph, realistic, low quality, blurry",
                "guidance_scale": 8.0,
                "strength": 0.8
            },
            "watercolor": {
                "prompt_suffix": "watercolor painting, soft brushstrokes, flowing colors, artistic, beautiful",
                "negative_prompt": "digital art, harsh lines, photograph, low quality",
                "guidance_scale": 7.5,
                "strength": 0.7
            },
            "realistic": {
                "prompt_suffix": "photorealistic, detailed, high quality, professional photography, sharp focus",
                "negative_prompt": "cartoon, anime, painting, artistic, low quality, blurry",
                "guidance_scale": 7.0,
                "strength": 0.9
            }
        }
    
    def process_doodle_with_prompt(self, image: Image.Image, prompt: str) -> Image.Image:
        try:
            processing_plan = self._parse_complex_prompt(prompt)
            logger.info(f"🎯 Processing: {processing_plan['description']}")
            
            # Convert doodle to realistic first if needed
            converted = False
            if processing_plan["needs_realism_first"]:
                realistic_image = self._doodle_to_realistic(image, processing_plan["object_description"])
                if realistic_image is not None:
                    image = realistic_image
                    converted = True
                    logger.info("✅ Converted doodle to realistic representation")
            
            # Apply artistic style
            if processing_plan["target_style"] != "realistic":
                styled_image = self._apply_artistic_style(
                    image, 
                    processing_plan["final_prompt"],
                    processing_plan["target_style"]
                )
                if styled_image is not None:
                    return styled_image
            elif converted:
                # Realistic target: the ControlNet result is already the final image
                return image
            
            # Nothing above produced a final image, so use the filter-based fallback
            return self._enhanced_fallback_processing(image, prompt)
            
        except Exception as e:
            logger.error(f"❌ Error: {e}")
            return self._enhanced_fallback_processing(image, prompt)
    
    def _parse_complex_prompt(self, prompt: str) -> Dict:
        prompt_lower = prompt.lower()
        
        # Detect if user wants realistic conversion first
        needs_realism_first = any(phrase in prompt_lower for phrase in [
            "real image", "realistic", "what this represents", "actual object",
            "real version", "photorealistic version", "of my doodle"
        ])
        
        # Detect target style
        target_style = "realistic"
        if "oil painting" in prompt_lower:
            target_style = "oil_painting"
        elif "watercolor" in prompt_lower:
            target_style = "watercolor"
        elif "realistic" in prompt_lower or "photorealistic" in prompt_lower:
            target_style = "realistic"
        
        # Extract object description
        object_description = self._extract_object_description(prompt)
        
        # Build description
        if needs_realism_first and target_style != "realistic":
            description = f"Convert to realistic {object_description}, then apply {target_style} style"
        elif target_style != "realistic":
            description = f"Apply {target_style} style directly"
        else:
            description = f"Make realistic {object_description}"
        
        return {
            "needs_realism_first": needs_realism_first,
            "target_style": target_style,
            "object_description": object_description,
            "final_prompt": prompt,
            "description": description
        }
    
    def _extract_object_description(self, prompt: str) -> str:
        common_objects = [
            "house", "tree", "car", "person", "dog", "cat", "bird", "flower",
            "sun", "moon", "star", "mountain", "ocean", "boat", "plane",
            "chair", "table", "cup", "book", "phone", "computer", "animal"
        ]
        
        prompt_lower = prompt.lower()
        for obj in common_objects:
            if obj in prompt_lower:
                return obj
        
        return "object from sketch"
    
    def _doodle_to_realistic(self, image: Image.Image, object_description: str) -> Optional[Image.Image]:
        try:
            if not self._load_controlnet_models():
                return None
            
            image = self._prepare_image(image)
            control_image = self.canny_detector(image)
            
            realistic_prompt = f"photorealistic {object_description}, detailed, high quality, professional photography, masterpiece"
            negative_prompt = "cartoon, anime, sketch, drawing, doodle, low quality, blurry, ugly"
            
            logger.info(f"🎨 Converting doodle to realistic {object_description}")
            
            result = self.pipe_controlnet(
                prompt=realistic_prompt,
                negative_prompt=negative_prompt,
                image=control_image,
                num_inference_steps=20,
                guidance_scale=7.0,
                controlnet_conditioning_scale=1.0,
                generator=torch.Generator(device=self.device).manual_seed(42)
            ).images[0]
            
            return result
            
        except Exception as e:
            logger.error(f"❌ Realistic conversion failed: {e}")
            return None
    
    def _apply_artistic_style(self, image: Image.Image, prompt: str, style: str) -> Optional[Image.Image]:
        try:
            if not self._load_img2img_model():
                return None
            
            style_config = self.style_configs.get(style, self.style_configs["oil_painting"])
            image = self._prepare_image(image)
            
            styled_prompt = f"{prompt}, {style_config['prompt_suffix']}"
            
            logger.info(f"🎨 Applying {style} style")
            
            result = self.pipe_img2img(
                prompt=styled_prompt,
                negative_prompt=style_config["negative_prompt"],
                image=image,
                strength=style_config["strength"],
                guidance_scale=style_config["guidance_scale"],
                num_inference_steps=20,
                generator=torch.Generator(device=self.device).manual_seed(42)
            ).images[0]
            
            return result
            
        except Exception as e:
            logger.error(f"❌ Style application failed: {e}")
            return None
    
    def _load_controlnet_models(self) -> bool:
        try:
            if self.pipe_controlnet is None:
                logger.info("📥 Loading ControlNet models...")
                
                self.controlnet_canny = ControlNetModel.from_pretrained(
                    "lllyasviel/sd-controlnet-canny",
                    torch_dtype=torch.float16 if self.device == "cuda" else torch.float32
                )
                
                self.pipe_controlnet = StableDiffusionControlNetPipeline.from_pretrained(
                    "runwayml/stable-diffusion-v1-5",
                    controlnet=self.controlnet_canny,
                    torch_dtype=torch.float16 if self.device == "cuda" else torch.float32,
                    safety_checker=None,
                    requires_safety_checker=False
                )
                
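                # CPU offload manages device placement itself and needs CUDA,
                # so only move the pipeline manually when running on CPU
                if self.device == "cuda":
                    self.pipe_controlnet.enable_model_cpu_offload()
                else:
                    self.pipe_controlnet.to(self.device)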
                self.canny_detector = CannyDetector()
                
                logger.info("✅ ControlNet models loaded!")
            return True
        except Exception as e:
            logger.error(f"❌ Failed to load ControlNet: {e}")
            return False
    
    def _load_img2img_model(self) -> bool:
        try:
            if self.pipe_img2img is None:
                logger.info("📥 Loading img2img model...")
                
                self.pipe_img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
                    "runwayml/stable-diffusion-v1-5",
                    torch_dtype=torch.float16 if self.device == "cuda" else torch.float32,
                    safety_checker=None,
                    requires_safety_checker=False
                )
                
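                # CPU offload manages device placement itself and needs CUDA,
                # so only move the pipeline manually when running on CPU
                if self.device == "cuda":
                    self.pipe_img2img.enable_model_cpu_offload()
                else:
                    self.pipe_img2img.to(self.device)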
                
                logger.info("✅ Img2img model loaded!")
            return True
        except Exception as e:
            logger.error(f"❌ Failed to load img2img: {e}")
            return False
    
    def _prepare_image(self, image: Image.Image, size: int = 512) -> Image.Image:
        if image.mode != 'RGB':
            image = image.convert('RGB')
        
        width, height = image.size
        if width > height:
            new_width = size
            new_height = int((height * size) / width)
        else:
            new_height = size
            new_width = int((width * size) / height)
        
        new_width = (new_width // 8) * 8
        new_height = (new_height // 8) * 8
        
        return image.resize((new_width, new_height), Image.Resampling.LANCZOS)
    
    def _enhanced_fallback_processing(self, image: Image.Image, prompt: str) -> Image.Image:
        logger.info("🔄 Using enhanced fallback processing")
        
        prompt_lower = prompt.lower()
        
        if "oil painting" in prompt_lower:
            return self._simulate_oil_painting(image)
        elif "watercolor" in prompt_lower:
            return self._simulate_watercolor(image)
        elif "realistic" in prompt_lower:
            return self._enhance_realism(image)
        else:
            return self._general_artistic_enhancement(image)
    
    def _simulate_oil_painting(self, image: Image.Image) -> Image.Image:
        blurred = image.filter(ImageFilter.GaussianBlur(radius=1.5))
        color_enhancer = ImageEnhance.Color(blurred)
        vibrant = color_enhancer.enhance(1.4)
        contrast_enhancer = ImageEnhance.Contrast(vibrant)
        contrasted = contrast_enhancer.enhance(1.2)
        edges = contrasted.filter(ImageFilter.EDGE_ENHANCE_MORE)
        textured = Image.blend(contrasted, edges, 0.3)
        posterized = textured.quantize(colors=64).convert('RGB')
        return Image.blend(textured, posterized, 0.4)
    
    def _simulate_watercolor(self, image: Image.Image) -> Image.Image:
        blurred = image.filter(ImageFilter.GaussianBlur(radius=3))
        color_enhancer = ImageEnhance.Color(blurred)
        desaturated = color_enhancer.enhance(0.8)
        white_overlay = Image.new('RGB', image.size, (255, 255, 255))
        transparent = Image.blend(desaturated, white_overlay, 0.2)
        soft_edges = transparent.filter(ImageFilter.SMOOTH_MORE)
        bleeding = soft_edges.filter(ImageFilter.GaussianBlur(radius=1))
        return Image.blend(soft_edges, bleeding, 0.3)
    
    def _enhance_realism(self, image: Image.Image) -> Image.Image:
        sharpened = image.filter(ImageFilter.SHARPEN)
        contrast_enhancer = ImageEnhance.Contrast(sharpened)
        contrasted = contrast_enhancer.enhance(1.3)
        color_enhancer = ImageEnhance.Color(contrasted)
        enhanced = color_enhancer.enhance(1.1)
        return enhanced.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))
    
    def _general_artistic_enhancement(self, image: Image.Image) -> Image.Image:
        soft = image.filter(ImageFilter.GaussianBlur(radius=0.5))
        color_enhancer = ImageEnhance.Color(soft)
        colorful = color_enhancer.enhance(1.2)
        edges = colorful.filter(ImageFilter.EDGE_ENHANCE)
        return Image.blend(colorful, edges, 0.3)
'''

with open('enhanced_doodle_processor.py', 'w') as f:
    f.write(enhanced_processor_code)

print("✅ Enhanced processor created!")
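
One thing to watch in `_extract_object_description` is that it uses substring matching, so a prompt containing "cartoon" is reported as "car". A possible refinement is whole-word matching with a regular expression. The sketch below compares the two approaches side by side; `extract_object_word` is a hypothetical alternative and not part of the processor, and accepting a trailing plural "s" is an assumption made here.

```python
import re

# Same object list as in the processor above.
COMMON_OBJECTS = [
    "house", "tree", "car", "person", "dog", "cat", "bird", "flower",
    "sun", "moon", "star", "mountain", "ocean", "boat", "plane",
    "chair", "table", "cup", "book", "phone", "computer", "animal",
]


def extract_object_substring(prompt):
    """Substring matching, mirroring the processor's current behaviour."""
    prompt_lower = prompt.lower()
    for obj in COMMON_OBJECTS:
        if obj in prompt_lower:
            return obj
    return "object from sketch"


def extract_object_word(prompt):
    """Whole-word matching; a trailing plural 's' is also accepted."""
    prompt_lower = prompt.lower()
    for obj in COMMON_OBJECTS:
        if re.search(rf"\b{re.escape(obj)}s?\b", prompt_lower):
            return obj
    return "object from sketch"


if __name__ == "__main__":
    for text in ["cartoon version of this sketch", "two houses by the ocean"]:
        print(text, "|", extract_object_substring(text), "|", extract_object_word(text))
```

Both versions still return the first match in list order, so a prompt naming several objects resolves to whichever appears earliest in `COMMON_OBJECTS`.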

In [None]:
# Create FastAPI application with enhanced processing
api_code = '''
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel
import base64
import io
import os
import uuid
from PIL import Image
from datetime import datetime
from enhanced_doodle_processor import EnhancedDoodleProcessor

app = FastAPI(
    title="Enhanced Doodle-to-Real Image API",
    description="Transform doodles into realistic images or artistic masterpieces",
    version="2.0.0"
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

os.makedirs("uploads", exist_ok=True)
app.mount("/uploads", StaticFiles(directory="uploads"), name="uploads")

# Initialize processor
processor = EnhancedDoodleProcessor()

class EnhancedDesignRequest(BaseModel):
    prompt: str
    sketch: str

class EnhancedDesignResponse(BaseModel):
    image_url: str
    generation_id: str
    prompt_used: str
    processing_time: float
    processing_description: str

@app.post("/generate-enhanced-design", response_model=EnhancedDesignResponse)
async def generate_enhanced_design(request: EnhancedDesignRequest):
    try:
        start_time = datetime.now()
        
        if not request.prompt.strip():
            raise HTTPException(status_code=400, detail="Prompt required")
        
        # Decode image
        try:
            if request.sketch.startswith('data:image'):
                request.sketch = request.sketch.split(',')[1]
            
            image_data = base64.b64decode(request.sketch)
            sketch_image = Image.open(io.BytesIO(image_data))
        except Exception as e:
            raise HTTPException(status_code=400, detail=f"Invalid image: {e}")
        
        # Process with enhanced processor
        generation_id = str(uuid.uuid4())
        
        # Get processing plan for description
        processing_plan = processor._parse_complex_prompt(request.prompt)
        
        processed_image = processor.process_doodle_with_prompt(
            sketch_image, request.prompt
        )
        
        # Save result
        filename = f"enhanced_{generation_id}.png"
        filepath = os.path.join("uploads", filename)
        # PNG is lossless, so a JPEG-style quality setting does not apply
        processed_image.save(filepath, "PNG")
        
        processing_time = (datetime.now() - start_time).total_seconds()
        
        return EnhancedDesignResponse(
            image_url=f"/uploads/{filename}",
            generation_id=generation_id,
            prompt_used=request.prompt,
            processing_time=processing_time,
            processing_description=processing_plan["description"]
        )
        
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Generation failed: {e}")

@app.get("/")
async def root():
    return {
        "message": "🎨 Enhanced Doodle-to-Real Image API",
        "version": "2.0.0",
        "features": [
            "Doodle to realistic conversion",
            "Artistic style application",
            "Complex prompt understanding",
            "GPU acceleration"
        ],
        "examples": [
            "oil painting of my doodle's real image",
            "make this sketch realistic",
            "watercolor version of this drawing",
            "photorealistic house from this doodle"
        ]
    }

@app.get("/health")
async def health():
    return {"status": "ok", "message": "Enhanced processor ready! 🚀"}

@app.get("/examples")
async def get_examples():
    return {
        "prompt_examples": [
            {
                "prompt": "oil painting of my doodle's real image",
                "description": "Converts doodle to realistic object, then applies oil painting style"
            },
            {
                "prompt": "make this house sketch photorealistic",
                "description": "Creates a photorealistic house from your sketch"
            },
            {
                "prompt": "watercolor painting of this tree drawing",
                "description": "Applies watercolor style to your tree sketch"
            },
            {
                "prompt": "realistic version of this car doodle",
                "description": "Converts car doodle into realistic car image"
            }
        ]
    }
'''

with open('main.py', 'w') as f:
    f.write(api_code)

print("✅ Enhanced API created!")
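
The endpoint above accepts the sketch either as raw base64 or as a data URL, and it relies on the image decoder to reject bad input. A stricter decoding step could look like the sketch below. `decode_sketch` and the 5 MiB cap `MAX_SKETCH_BYTES` are hypothetical and are not part of `main.py`.

```python
import base64
import binascii

MAX_SKETCH_BYTES = 5 * 1024 * 1024  # hypothetical upload cap (5 MiB)


def decode_sketch(sketch, max_bytes=MAX_SKETCH_BYTES):
    """Strip an optional data-URL prefix and strictly decode the base64 payload."""
    if sketch.startswith("data:image"):
        # A data URL looks like "data:image/png;base64,<payload>"
        _, separator, payload = sketch.partition(",")
        if not separator:
            raise ValueError("data URL has no comma between header and payload")
    else:
        payload = sketch
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"sketch is not valid base64: {exc}") from exc
    if not raw:
        raise ValueError("sketch is empty")
    if len(raw) > max_bytes:
        raise ValueError(f"sketch exceeds {max_bytes} bytes")
    return raw
```

In the endpoint, a `ValueError` from this helper would map naturally onto the existing `400` response for invalid images.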

## 🌐 Start Enhanced Server

In [None]:
# Setup ngrok
from pyngrok import ngrok
import getpass

print("🔑 Ngrok Setup:")
print("For unlimited usage, get your authtoken from:")
print("https://dashboard.ngrok.com/get-started/your-authtoken")

auth_token = getpass.getpass("Ngrok authtoken (optional): ")

if auth_token.strip():
    ngrok.set_auth_token(auth_token)
    print("✅ Ngrok authenticated!")
else:
    print("⚠️ No authtoken provided; ngrok may limit or refuse tunnels without one")

ngrok.kill()
print("🌐 Ngrok ready!")

In [None]:
# Working Colab server - Using subprocess approach
print("🚀 Starting Enhanced Doodle-to-Real Image Server in Google Colab")
print("Made by Rohith Cherukuri")
print("="*70)

import subprocess
import sys
import time
import requests
from pyngrok import ngrok
import os
import signal

# Kill existing ngrok tunnels
ngrok.kill()

# Kill any existing uvicorn processes
try:
    subprocess.run(['pkill', '-f', 'uvicorn'], capture_output=True)
    time.sleep(2)
except OSError:
    pass  # pkill is unavailable, so there is nothing to clean up

print("📦 Starting FastAPI server using subprocess...")

# Start server using subprocess (most reliable in Colab).
# Output goes to a log file: nothing reads a PIPE here, and a full pipe would stall the server.
server_log = open('server.log', 'w')
server_process = subprocess.Popen([
    sys.executable, '-m', 'uvicorn', 'main:app',
    '--host', '0.0.0.0',
    '--port', '8000',
    '--workers', '1'
], stdout=server_log, stderr=subprocess.STDOUT)

print("⏳ Waiting for server to start...")

# Wait for server to be ready
server_ready = False
for attempt in range(15):  # Up to 15 attempts, at least 1 second apart
    time.sleep(1)
    try:
        response = requests.get('http://127.0.0.1:8000/health', timeout=3)
        if response.status_code == 200:
            server_ready = True
            print("✅ Server is running successfully!")
            break
    except requests.exceptions.RequestException:
        print(f"⏳ Attempt {attempt + 1}/15 - Server starting...")

if not server_ready:
    print("❌ Server failed to start. Checking logs...")
    # Stop the half-started process, then show what it logged
    server_process.terminate()
    try:
        server_process.wait(timeout=5)
    except subprocess.TimeoutExpired:
        server_process.kill()
    server_log.close()
    try:
        with open('server.log') as log_file:
            print("Server output:")
            print(log_file.read())
    except OSError:
        print("Could not get server logs")
    
    print("\n🔍 Troubleshooting:")
    print("1. Make sure you ran all previous cells")
    print("2. Check that main.py was created successfully")
    print("3. Restart runtime if needed")
else:
    # Server is running, create ngrok tunnel
    print("🌐 Creating public tunnel with ngrok...")
    public_url = ngrok.connect(8000)
    
    print("\n" + "="*70)
    print("🎨 ENHANCED DOODLE-TO-REAL IMAGE GENERATOR")
    print("Made by Rohith Cherukuri")
    print("="*70)
    print(f"🌍 PUBLIC URL: {public_url}")
    print(f"📋 Copy this URL to your frontend!")
    print(f"🔗 API Docs: {public_url}/docs")
    print(f"❤️ Health Check: {public_url}/health")
    print(f"💡 Examples: {public_url}/examples")
    print("\n🚀 FEATURES:")
    print("  ✅ Doodle → Realistic Image")
    print("  ✅ Artistic Style Transfer")
    print("  ✅ Complex Prompt Understanding")
    print("  ✅ GPU Acceleration")
    print("\n📝 EXAMPLE PROMPTS:")
    print("  • 'oil painting of my doodle's real image'")
    print("  • 'make this house sketch photorealistic'")
    print("  • 'watercolor version of this tree drawing'")
    print("  • 'realistic car from this doodle'")
    print("="*70)
    
    print("\n✅ Server is ready! Test it now:")
    print(f"   🔗 Health: {public_url}/health")
    print(f"   📚 Docs: {public_url}/docs")
    print("\n🎯 Ready to process your doodles!")
    
    # Keep the server running
    try:
        print("\n💡 Keep this cell running while using the API")
        print("   Press the stop button to shutdown the server")
        
        while True:
            time.sleep(10)
            # Check if server process is still running
            if server_process.poll() is not None:
                print("❌ Server process stopped")
                break
                
    except KeyboardInterrupt:
        print("\n🛑 Shutting down server...")
    finally:
        # Cleanup
        try:
            server_process.terminate()
            server_process.wait(timeout=5)
        except subprocess.TimeoutExpired:
            # The server ignored the polite request, so force it to stop
            server_process.kill()
        ngrok.kill()
        print("✅ Server stopped")
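
The startup check above makes a fixed 15 attempts, which may be too few when the server's imports are slow. A reusable polling helper with a configurable attempt budget might look like the sketch below. It uses only the standard library, and `http_ok` and `wait_for_health` are hypothetical names that are not part of the notebook.

```python
import time
import urllib.request


def http_ok(url, timeout=3.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are all OSError
        return False


def wait_for_health(url, attempts=60, delay=1.0, probe=http_ok, sleep=time.sleep):
    """Poll `probe(url)`; return the 1-based attempt that succeeded, or None."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        sleep(delay)
    return None


if __name__ == "__main__":
    result = wait_for_health("http://127.0.0.1:8000/health", attempts=5)
    print("ready" if result else "not ready", result)
```

Passing `probe` and `sleep` as parameters keeps the loop testable without a running server.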

## 🧪 Test the Enhanced Processor

In [None]:
# Test the enhanced processor with sample prompts
from enhanced_doodle_processor import EnhancedDoodleProcessor
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

# Create a simple test doodle
def create_test_doodle():
    img = Image.new('RGB', (512, 512), 'white')
    draw = ImageDraw.Draw(img)
    
    # Draw a simple house
    # Base
    draw.rectangle([150, 300, 350, 450], outline='black', width=3)
    # Roof
    draw.polygon([130, 300, 250, 200, 370, 300], outline='black', width=3)
    # Door
    draw.rectangle([220, 350, 280, 450], outline='black', width=2)
    # Windows
    draw.rectangle([170, 320, 210, 360], outline='black', width=2)
    draw.rectangle([290, 320, 330, 360], outline='black', width=2)
    
    return img

# Create test doodle
test_doodle = create_test_doodle()

# Display the test doodle
plt.figure(figsize=(6, 6))
plt.imshow(test_doodle)
plt.title("Test Doodle: Simple House")
plt.axis('off')
plt.show()

print("✅ Test doodle created!")
print("\n🧪 You can now test with prompts like:")
print("  • 'oil painting of my doodle's real image'")
print("  • 'make this house sketch photorealistic'")
print("  • 'watercolor painting of this house'")

## 📋 Usage Instructions

### 🎯 How to Use:

1. **Copy the ngrok URL** from above
2. **Update your frontend** to send `POST` requests to the `/generate-enhanced-design` endpoint
3. **Draw a doodle** of any object
4. **Use these example prompts**:

### 🎨 Example Prompts:

**For Realistic Conversion:**
- `"make this house sketch photorealistic"`
- `"realistic version of this car doodle"`
- `"turn this tree drawing into a real tree"`

**For Artistic Styles:**
- `"oil painting of my doodle's real image"`
- `"watercolor version of what this represents"`
- `"oil painting of this house"`

**Complex Requests:**
- `"oil painting of my doodle's real image"`
- `"photorealistic version then make it watercolor"`
- `"what would this look like as a real object"`
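
How a prompt is routed depends entirely on keyword matching in `_parse_complex_prompt`. The sketch below restates those keyword rules as a standalone function so the example prompts can be checked without loading any models. `route_prompt` is a hypothetical helper; the phrase list and style keywords are copied from the processor.

```python
# Phrases that trigger the doodle-to-realistic step, copied from the processor.
REALISM_PHRASES = [
    "real image", "realistic", "what this represents", "actual object",
    "real version", "photorealistic version", "of my doodle",
]


def route_prompt(prompt):
    """Return (needs_realism_first, target_style) using the processor's keyword rules."""
    prompt_lower = prompt.lower()
    needs_realism_first = any(phrase in prompt_lower for phrase in REALISM_PHRASES)
    if "oil painting" in prompt_lower:
        target_style = "oil_painting"
    elif "watercolor" in prompt_lower:
        target_style = "watercolor"
    else:
        target_style = "realistic"
    return needs_realism_first, target_style


if __name__ == "__main__":
    examples = [
        "oil painting of my doodle's real image",
        "photorealistic version then make it watercolor",
        "oil painting of this house",
        "what would this look like as a real object",
    ]
    for text in examples:
        print(f"{text!r}: {route_prompt(text)}")
```

Prompts that come back as `(False, "realistic")`, such as "what would this look like as a real object", match none of the trigger phrases. In the processor as written, those appear to skip both diffusion steps and go straight to the filter-based fallback, so including a word like "realistic" in the prompt is the safer choice.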

### 🔧 API Endpoints:

- `POST /generate-enhanced-design` - Main processing endpoint
- `GET /examples` - Get example prompts
- `GET /health` - Health check
- `GET /docs` - API documentation
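
A client call to the main endpoint could look like the following minimal sketch. It uses only the standard library and sends the two JSON fields that `EnhancedDesignRequest` defines (`prompt` and `sketch`). `build_request` and `generate` are hypothetical helper names, and the 300-second timeout is an assumption chosen to cover a slow first generation.

```python
import base64
import json
import urllib.request


def build_request(base_url, prompt, image_bytes):
    """Build a POST request for /generate-enhanced-design with a base64 sketch."""
    payload = {
        "prompt": prompt,
        "sketch": base64.b64encode(image_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate-enhanced-design",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, image_path, timeout=300):
    """Send the sketch at `image_path` and return the parsed JSON response."""
    with open(image_path, "rb") as image_file:
        request = build_request(base_url, prompt, image_file.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The response's `image_url` is a server-relative path such as `/uploads/enhanced_<id>.png`, so the client needs to prepend the ngrok base URL before fetching the image.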

### ⚡ Performance:

- **GPU Processing**: 30-60 seconds
- **Fallback Processing**: 5-10 seconds
- **First Generation**: Slower, because the models are downloaded and loaded on first use
- **Subsequent Generations**: Faster, since the already-loaded models are reused
