# Dynamic Style Visualizer Code
The code works as described on Google Colab. To run it in a local environment, you may need to set up the NVIDIA CUDA drivers separately and add checks to confirm that the code recognizes the NVIDIA GPU.
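
As a rough illustration of such a check, the sketch below reports whether the NVIDIA driver tool and a CUDA-enabled PyTorch are visible on the machine. The function name `describe_gpu_support` and the report keys are hypothetical and not part of the notebook. The check assumes that `nvidia-smi` being on the `PATH` is a reasonable hint that the driver is installed.

```python
import importlib.util
import shutil


def describe_gpu_support():
    """Collect basic facts about GPU visibility without failing if a library is missing."""
    report = {
        # nvidia-smi ships with the NVIDIA driver; if it is absent, the driver is likely missing.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda": False,
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the check also runs without PyTorch

            report["torch_cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install counts as "no CUDA" for the purposes of this report.
            report["torch_cuda"] = False
    return report


print(describe_gpu_support())
```

If `nvidia_smi_found` is true but `torch_cuda` is false, the installed PyTorch build is probably CPU-only or does not match the driver's CUDA version.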


First, install all the dependencies by running the cell below. Colab may ask you to restart the session; if it does, restart it. After the restart, skip the package installation cell and run the remaining code cells to get the desired output.

In [1]:
# Install dependencies (TensorFlow is pinned to 2.12.0; the other packages are unpinned)
!pip install tensorflow==2.12.0 tensorflow-hub gradio diffusers transformers accelerate nltk sentence-transformers



The NLP part of the code needs the NLTK data to be set up properly, with every package downloaded successfully. During implementation we repeatedly hit errors about missing packages, which stopped emotion recognition from working.
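
One way to catch a missing package early is to look each resource up before the models run. The sketch below is a minimal example of this. The helper `find_missing_resources` and the `REQUIRED` list are hypothetical names, and the resource paths assume NLTK's usual `category/name` layout. The lookup function is a parameter so that the helper can be exercised without NLTK; by default it uses `nltk.data.find`, which raises `LookupError` when a resource cannot be located.

```python
def find_missing_resources(resources, find=None):
    """Return the resource paths that the lookup function cannot locate."""
    if find is None:
        import nltk  # lazy import: only needed when no lookup function is supplied

        find = nltk.data.find
    missing = []
    for resource in resources:
        try:
            find(resource)
        except LookupError:
            missing.append(resource)
    return missing


# Assumed on-disk resource paths for the four packages the notebook downloads.
REQUIRED = [
    "tokenizers/punkt",
    "taggers/averaged_perceptron_tagger",
    "corpora/wordnet",
    "corpora/omw-1.4",
]
```

Calling `find_missing_resources(REQUIRED)` after the download cell should return an empty list, provided the download directory has been added to `nltk.data.path`.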

In [2]:
# Configure NLTK properly with improved error handling
import nltk
from nltk.tokenize.punkt import PunktSentenceTokenizer
import os
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger('DynamicStyleVisualizer')

# Setup NLTK data path
nltk_data_path = '/content/nltk_data'
os.makedirs(nltk_data_path, exist_ok=True)

# Register the custom path first so lookups still work if a download fails
nltk.data.path.append(nltk_data_path)

try:
    packages = ['punkt', 'averaged_perceptron_tagger', 'wordnet', 'omw-1.4']
    # nltk.download returns False (rather than raising) when a package fails
    failed = [pkg for pkg in packages
              if not nltk.download(pkg, download_dir=nltk_data_path)]
    if failed:
        logger.error(f"NLTK packages failed to download: {failed}")
    else:
        logger.info("NLTK data downloaded successfully")
except Exception as e:
    logger.error(f"Failed to download NLTK data: {str(e)}")

# Create a custom sentence tokenizer that doesn't rely on punkt_tab
def custom_sent_tokenize(text):
    """Custom sentence tokenizer that uses PunktSentenceTokenizer directly"""
    try:
        # Initialize the tokenizer without loading from punkt_tab
        tokenizer = PunktSentenceTokenizer()
        return tokenizer.tokenize(text)
    except Exception as e:
        logger.error(f"Sentence tokenization failed: {str(e)}")
        # Fallback to simple split by period if tokenizer fails
        return [s.strip() + "." for s in text.split(".") if s.strip()]

[nltk_data] Downloading package punkt to /content/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /content/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to /content/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /content/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
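
The fallback branch of `custom_sent_tokenize` splits only on periods, which matters for stories that contain questions or exclamations. The sketch below reproduces that expression on a sample sentence and compares it with a regex-based splitter. `fallback_split` and `regex_split` are hypothetical names used only for this illustration, and the regex version is one possible alternative rather than the notebook's method.

```python
import re


def fallback_split(text):
    """Same expression as the fallback branch of custom_sent_tokenize."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def regex_split(text):
    """Hypothetical alternative: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


story = "The forest was calm. A storm rolled in! Was anyone home?"

print(fallback_split(story))
# ['The forest was calm.', 'A storm rolled in! Was anyone home?.']

print(regex_split(story))
# ['The forest was calm.', 'A storm rolled in!', 'Was anyone home?']
```

With the period-only split, the last two sentences would be sent to the image generator as a single scene, and a stray period is appended after the question mark.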


In [3]:
import tensorflow as tf
import tensorflow_hub as hub
from diffusers import StableDiffusionPipeline
import torch
import numpy as np
from PIL import Image
import gradio as gr
import re
import functools

# Import new NLP components
from sentence_transformers import SentenceTransformer
from transformers import pipeline

In [4]:
# Model Manager Class for Modularity
class ModelManager:
    def __init__(self):
        self.device = self._setup_device()
        self.sd_pipe = None
        self.stylize_fn = None
        self.style_encoder = None
        self.emotion_classifier = None
        self.models_loaded = False

    def _setup_device(self):
        """Configure GPU with fallback to CPU"""
        if torch.cuda.is_available():
            logger.info("GPU is available. Using CUDA.")
            device = torch.device("cuda")
            torch.backends.cudnn.benchmark = True
        else:
            logger.info("GPU not available. Falling back to CPU.")
            device = torch.device("cpu")
        return device

    def load_models(self):
        """Load all required models with proper error handling"""
        if self.models_loaded:
            logger.info("Models already loaded")
            return True

        try:
            # Stable Diffusion with appropriate settings based on device
            logger.info("Loading Stable Diffusion model...")
            if self.device.type == "cuda":
                self.sd_pipe = StableDiffusionPipeline.from_pretrained(
                    "stabilityai/stable-diffusion-2-1",
                    torch_dtype=torch.float16,
                    safety_checker=None
                ).to(self.device)
            else:
                # CPU-optimized settings
                self.sd_pipe = StableDiffusionPipeline.from_pretrained(
                    "stabilityai/stable-diffusion-2-1",
                    safety_checker=None
                ).to(self.device)
                logger.info("Using CPU for Stable Diffusion. Processing will be slower.")

            # Style Transfer
            logger.info("Loading style transfer model...")
            hub_module = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')
            self.stylize_fn = hub_module.signatures['serving_default']

            # NLP Models
            logger.info("Loading NLP models...")
            self.style_encoder = SentenceTransformer('all-MiniLM-L6-v2')
            self.emotion_classifier = pipeline(
                "text-classification",
                model="j-hartmann/emotion-english-distilroberta-base",
                return_all_scores=True,
                device=0 if self.device.type == "cuda" else -1  # Use GPU if available
            )

            self.models_loaded = True
            logger.info("All models loaded successfully")
            return True

        except Exception as e:
            logger.error(f"Failed to load models: {str(e)}")
            return False

# Initialize the model manager
model_manager = ModelManager()


In [5]:
# Style configuration with enhanced metadata
STYLE_MAPPING = {
    'dreamy': {
        'url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg/1024px-Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg',
        'keywords': ['peaceful', 'golden', 'serene', 'tranquil', 'calm', 'mystical', 'ethereal'],
        'description': "Serene, ethereal scenes with soft lighting and dreamy atmosphere"
    },
    'dark': {
        'url': 'https://upload.wikimedia.org/wikipedia/commons/c/c5/Edvard_Munch%2C_1893%2C_The_Scream%2C_oil%2C_tempera_and_pastel_on_cardboard%2C_91_x_73_cm%2C_National_Gallery_of_Norway.jpg',
        'keywords': ['dark', 'stormy', 'shadow', 'gloomy', 'frightening', 'ominous', 'tense'],
        'description': "Dramatic, ominous scenes with shadows and emotional intensity"
    },
    'vibrant': {
        'url': 'https://upload.wikimedia.org/wikipedia/commons/b/b4/Vassily_Kandinsky%2C_1913_-_Composition_7.jpg',
        'keywords': ['bright', 'colorful', 'lively', 'energetic', 'vivid', 'festive', 'dynamic'],
        'description': "Colorful, energetic scenes with vivid details and dynamic composition"
    }
}

# Emotion to style mapping
EMOTION_PRIORITY = {
    'neutral': 'dreamy',
    'fear': 'dark',
    'sadness': 'dark',
    'joy': 'vibrant',
    'surprise': 'dreamy',
    'anger': 'dark',
    'disgust': 'dark',
    'love': 'vibrant',
    'confusion': 'dreamy',
    'anticipation': 'vibrant'
}

# Default style to use if analysis fails
DEFAULT_STYLE = 'vibrant'

In [6]:
@functools.lru_cache(maxsize=None)
def load_style_image(style_url):
    """Load and preprocess style image to 256x256"""
    try:
        image_path = tf.keras.utils.get_file(os.path.basename(style_url)[-128:], style_url)
        img = tf.io.decode_image(tf.io.read_file(image_path), channels=3, dtype=tf.float32)[tf.newaxis, ...]
        img = tf.image.resize(img, (256, 256))
        return tf.nn.avg_pool(img, ksize=[3,3], strides=[1,1], padding='SAME')
    except Exception as e:
        logger.error(f"Failed to load style image: {str(e)}")
        # Return a default colored image if loading fails
        return tf.ones([1, 256, 256, 3], dtype=tf.float32) * 0.5  # Gray image

In [7]:
def analyze_mood_enhanced(text):
    """Enhanced mood detection using both keyword analysis and ML models"""
    # Initial values
    keyword_scores = {mood: 0 for mood in STYLE_MAPPING.keys()}
    emotion_style = DEFAULT_STYLE
    emotion_confidence = 0.0
    semantic_style = DEFAULT_STYLE
    semantic_confidence = 0.0

    try:
        # Legacy keyword-based analysis
        text_lower = text.lower()
        keyword_scores = {mood: sum(1 for kw in data['keywords'] if kw in text_lower)
                         for mood, data in STYLE_MAPPING.items()}
        logger.debug(f"Keyword analysis results: {keyword_scores}")
    except Exception as e:
        logger.error(f"Keyword analysis failed: {str(e)}")

    # Emotion-based analysis
    try:
        if model_manager.emotion_classifier is not None:
            emotion_results = model_manager.emotion_classifier(text)[0]
            dominant_emotion = max(emotion_results, key=lambda x: x['score'])
            emotion_style = EMOTION_PRIORITY.get(dominant_emotion['label'], DEFAULT_STYLE)
            emotion_confidence = dominant_emotion['score']
            logger.debug(f"Emotion analysis: {dominant_emotion['label']} with confidence {emotion_confidence}")
    except Exception as e:
        logger.error(f"Emotion analysis failed: {str(e)}")

    # Semantic similarity analysis
    similarities = {style: 0.0 for style in STYLE_MAPPING.keys()}
    try:
        if model_manager.style_encoder is not None:
            text_embedding = model_manager.style_encoder.encode(text)
            style_embeddings = {
                style: model_manager.style_encoder.encode(data['description'])
                for style, data in STYLE_MAPPING.items()
            }
            similarities = {
                style: float(np.dot(text_embedding, style_emb) /
                      (np.linalg.norm(text_embedding) * np.linalg.norm(style_emb)))
                for style, style_emb in style_embeddings.items()
            }
            semantic_style = max(similarities, key=similarities.get)
            semantic_confidence = similarities[semantic_style]
            logger.debug(f"Semantic analysis: {semantic_style} with confidence {semantic_confidence}")
    except Exception as e:
        logger.error(f"Semantic analysis failed: {str(e)}")

    # Combine analyses with weights
    try:
        combined_scores = {}
        for style in STYLE_MAPPING.keys():
            combined_scores[style] = (
                (0.2 * keyword_scores.get(style, 0)) +
                (0.4 * (1.0 if style == emotion_style else 0.0) * emotion_confidence) +
                (0.4 * similarities.get(style, 0.0))
            )

        best_mood = max(combined_scores, key=combined_scores.get)
        logger.info(f"Selected mood: {best_mood} with score {combined_scores[best_mood]}")
    except Exception as e:
        logger.error(f"Mood combination failed: {str(e)}")
        best_mood = DEFAULT_STYLE

    # Extract matched keywords for debugging
    matched_keywords = []
    try:
        matched_keywords = [kw for kw in STYLE_MAPPING[best_mood]['keywords'] if kw in text_lower]
    except Exception:
        pass

    # Return additional analysis details for debugging
    analysis_details = {
        'keyword_match': keyword_scores,
        'emotion_analysis': {'style': emotion_style, 'confidence': emotion_confidence},
        'semantic_analysis': {'style': semantic_style, 'confidence': semantic_confidence},
        'combined_scores': combined_scores
    }

    return best_mood, matched_keywords, analysis_details
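
To make the weighting in `analyze_mood_enhanced` concrete, the sketch below repeats the same 0.2 / 0.4 / 0.4 combination on made-up inputs. The helper `combine_scores` and all the numbers are hypothetical. Only the formula is taken from the function above.

```python
WEIGHTS = {"keyword": 0.2, "emotion": 0.4, "semantic": 0.4}


def combine_scores(keyword_scores, emotion_style, emotion_confidence, similarities):
    """Weighted sum per style, mirroring the formula in analyze_mood_enhanced."""
    combined = {}
    for style in keyword_scores:
        # The emotion term only contributes to the style the emotion maps to.
        emotion_term = emotion_confidence if style == emotion_style else 0.0
        combined[style] = (
            WEIGHTS["keyword"] * keyword_scores[style]
            + WEIGHTS["emotion"] * emotion_term
            + WEIGHTS["semantic"] * similarities.get(style, 0.0)
        )
    return combined


# Hypothetical inputs: two 'dreamy' keywords matched, classifier says 'joy' -> 'vibrant'.
keyword_scores = {"dreamy": 2, "dark": 0, "vibrant": 0}
similarities = {"dreamy": 0.30, "dark": 0.10, "vibrant": 0.20}

combined = combine_scores(keyword_scores, "vibrant", 0.9, similarities)
best = max(combined, key=combined.get)
print(best, {style: round(score, 2) for style, score in combined.items()})
# dreamy {'dreamy': 0.52, 'dark': 0.04, 'vibrant': 0.44}
```

Because the keyword term is a raw count and not a value between 0 and 1, two keyword hits here outweigh an emotion prediction with 0.9 confidence. Dividing the count by the number of keywords per style would be one way to put the three terms on a comparable scale.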

In [8]:
def generate_content_image(prompt):
    """Generate base image using Stable Diffusion with detailed prompts"""
    try:
        if model_manager.sd_pipe is None:
            logger.error("Stable Diffusion model not loaded")
            return Image.new('RGB', (512, 512), color='gray')

        detailed_prompt = f"{prompt}, highly detailed, realistic, cinematic lighting"
        logger.info(f"Generating image for prompt: {detailed_prompt[:50]}...")

        # Adjust settings based on device
        steps = 50 if model_manager.device.type == "cuda" else 25  # Fewer steps on CPU

        with torch.autocast(model_manager.device.type):
            result = model_manager.sd_pipe(
                detailed_prompt,
                guidance_scale=7.5,
                height=512,
                width=512,
                num_inference_steps=steps
            )

        logger.info("Image generation completed successfully")
        return result.images[0]
    except Exception as e:
        logger.error(f"Image generation failed: {str(e)}")
        # Return gray image with error text
        from PIL import ImageDraw
        img = Image.new('RGB', (512, 512), color='gray')
        draw = ImageDraw.Draw(img)
        draw.text((10, 10), f"Generation Error: {str(e)[:100]}", fill="white")
        return img

In [9]:
# ========== IMAGE PROCESSING FUNCTIONS ==========
def process_scene(scene_text, style_image, mood, keywords, analysis_details=None):
    """Process individual scene through full pipeline with debugging details"""
    try:
        # Generate content image
        content_image = generate_content_image(scene_text)

        # Convert content image to TensorFlow tensor
        content_tensor = tf.image.resize(
            tf.keras.preprocessing.image.img_to_array(content_image)[tf.newaxis, ...] / 255.0,
            (256, 256)
        )

        # Apply Neural Style Transfer
        if model_manager.stylize_fn is None:
            logger.error("Style transfer model not loaded")
            styled_image = content_image
        else:
            try:
                outputs = model_manager.stylize_fn(
                    placeholder=content_tensor,
                    placeholder_1=style_image
                )

                # Convert styled output to PIL Image
                styled_array = (np.clip(outputs['output_0'].numpy()[0], 0, 1) * 255).astype(np.uint8)
                styled_image = Image.fromarray(styled_array)
                logger.info("Style transfer completed successfully")
            except Exception as e:
                logger.error(f"Style transfer failed: {str(e)}")
                styled_image = content_image  # Fall back to content image if style transfer fails

        # Return all debugging details
        return {
            "source_image": content_image,
            "styled_image": styled_image,
            "style_applied": mood,
            "keywords_used": keywords,
            "scene_text": scene_text[:100]+"..." if len(scene_text) > 100 else scene_text,
            "analysis_details": analysis_details
        }
    except Exception as e:
        logger.error(f"Scene processing failed: {str(e)}")
        error_image = Image.new('RGB', (512, 512), color='red')
        return {
            "source_image": error_image,
            "styled_image": error_image,
            "style_applied": "error",
            "keywords_used": [],
            "scene_text": scene_text[:50]+"..." if len(scene_text) > 50 else scene_text,
            "analysis_details": {"error": str(e)}
        }

In [10]:
def process_story(story_text):
    """Main processing pipeline with enhanced NLP analysis"""
    logger.info("Processing story...")

    # Ensure models are loaded
    if not model_manager.models_loaded:
        success = model_manager.load_models()
        if not success:
            logger.error("Failed to load required models")
            return [{"error": "Failed to load models"}]

    outputs = []

    try:
        # Use our custom sentence tokenizer
        scenes = custom_sent_tokenize(story_text)
        logger.info(f"Story split into {len(scenes)} scenes")

        for i, scene in enumerate(scenes):
            if len(scene.strip()) < 5:
                logger.info(f"Skipping scene {i+1}: too short")
                continue

            logger.info(f"Processing scene {i+1}/{len(scenes)}")

            # Analyze mood with enhanced NLP approach
            mood, keywords, analysis_details = analyze_mood_enhanced(scene)

            # Load style image based on mood
            style_url = STYLE_MAPPING[mood]['url']
            style_image = load_style_image(style_url)

            # Process scene and collect all details
            scene_details = process_scene(scene, style_image, mood, keywords, analysis_details)
            outputs.append(scene_details)

        logger.info(f"Story processing completed: {len(outputs)} images generated")
        return outputs

    except Exception as e:
        logger.error(f"Story processing failed: {str(e)}")
        error_details = {
            "source_image": Image.new('RGB', (512, 512), color='red'),
            "styled_image": Image.new('RGB', (512, 512), color='red'),
            "style_applied": "error",
            "keywords_used": [],
            "scene_text": "Processing error",
            "analysis_details": {"error": str(e)}
        }
        return [error_details]

In [None]:
# Gradio Interface
with gr.Blocks(theme=gr.themes.Soft()) as app:
    gr.Markdown("# 📖 Dynamic Story Visualizer with Enhanced NLP")

    with gr.Row():
        with gr.Column(scale=3):
            story_input = gr.Textbox(
                label="Your Story",
                placeholder="Once upon a time in a peaceful forest...",
                lines=5
            )
        with gr.Column(scale=1):
            generate_btn = gr.Button("Generate Visual Story 🎨", variant="primary")
            status = gr.Textbox(label="Status", value="Ready")
            device_info = gr.Textbox(
                label="Device Info",
                value=f"Using {'GPU (CUDA)' if torch.cuda.is_available() else 'CPU'}"
            )

    with gr.Row():
        source_gallery = gr.Gallery(label="Source Images", columns=3, object_fit="contain")
        styled_gallery = gr.Gallery(label="Stylized Images", columns=3, object_fit="contain")

    with gr.Row():
        style_info = gr.Textbox(label="Style Analysis Details", lines=10)

    def wrapper_fn(story_text):
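        if not story_text or len(story_text.strip()) < 10:
            # wrapper_fn is a generator, so the message must be yielded;
            # a plain `return value` would never reach the Gradio outputs.
            yield [[], [], "Please enter a longer story", "Error: Story too short"]
            return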

        try:
            yield [[], [], "", "Starting processing..."]

            # Load models if not already loaded
            if not model_manager.models_loaded:
                yield [[], [], "", "Loading models..."]
                success = model_manager.load_models()
                if not success:
                    yield [[], [], "", "Failed to load models. Please try again."]
                    return

            # Process the story
            scenes_details = process_story(story_text)

            if not scenes_details or "error" in scenes_details[0]:
                yield [[], [], "", f"Error: {scenes_details[0].get('error', 'Unknown error')}"]
                return

            source_images = []
            styled_images = []
            style_details = []

            for detail in scenes_details:
                try:
                    source_caption = f"Source: {detail['scene_text']}"
                    styled_caption = f"Styled ({detail['style_applied']}): {detail['scene_text']}"

                    source_images.append((detail["source_image"], source_caption))
                    styled_images.append((detail["styled_image"], styled_caption))

                    # Enhanced style details with NLP analysis
                    analysis = detail.get('analysis_details', {})
                    style_details.append(
                        f"Scene: {detail['scene_text']}\n"
                        f"Style Applied: {detail['style_applied']}\n"
                        f"Keywords Used: {', '.join(detail['keywords_used'])}\n"
                        f"Emotion Analysis: {analysis.get('emotion_analysis', {})}\n"
                        f"Semantic Score: {analysis.get('semantic_analysis', {})}\n"
                        f"---"
                    )
                except Exception as e:
                    logger.error(f"Error formatting scene result: {str(e)}")

            yield [source_images, styled_images, "\n\n".join(style_details), "Processing complete!"]

        except Exception as e:
            error_message = f"❌ Error: {str(e)}"
            logger.error(error_message)
            yield [[], [], "", error_message]

    generate_btn.click(
        fn=wrapper_fn,
        inputs=story_input,
        outputs=[source_gallery, styled_gallery, style_info, status]
    )

    gr.Markdown("""
    ## How to Use
    1. Enter your story in the text box
    2. Click "Generate Visual Story"
    3. Wait for the AI to process each sentence and generate images
    4. Review the source images, stylized results, and analysis details

    Note: Processing may take longer on CPU environments.
    """)

app.launch(server_name="0.0.0.0", share=True, debug=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://5b7ae2229be4c0ba32.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Device set to use cuda:0
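
A note on the `wrapper_fn` click handler in the Gradio cell: it contains `yield` statements, so Python treats it as a generator. In a generator, a value passed to `return` is not produced as an item, so any status message meant for the interface has to be yielded. The sketch below shows the difference in plain Python. Both handler names are hypothetical, and the two-element lists stand in for the notebook's four outputs.

```python
def handler_with_return(text):
    """Generator whose early exit uses `return value`."""
    if not text:
        # This list becomes StopIteration.value and is never yielded as an item.
        return ["", "Error: story too short"]
    yield ["", "Starting..."]


def handler_with_yield(text):
    """Generator whose early exit yields the message, then stops."""
    if not text:
        yield ["", "Error: story too short"]
        return
    yield ["", "Starting..."]


print(list(handler_with_return("")))
# []

print(list(handler_with_yield("")))
# [['', 'Error: story too short']]
```

A consumer that iterates over the handler, as Gradio is understood to do for generator functions, receives nothing from the first version when the input is empty.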

