# Emotional Branching Storyteller
### AI Applications - Individual Open Project

**Primary Artifact:** Jupyter Notebook (.ipynb)

---
### Module E: AI Applications - Individual Open Project

**Author:** DHAWAL TANDON  
**Institution / Course:** Indian Institute of Technology Ropar  

---

## Objective


The objective of this project is to design and implement an **emotion-aware AI storytelling system**
that generates **multimedia narratives** based on emotional choices.

The entire system (data definition, model logic, generation pipeline,
and output visualization) is implemented and demonstrated **within this Jupyter Notebook**.

Rather than focusing on spectacle, the system prioritizes:

- emotional tone
- narrative intimacy
- choice-driven meaning
- slow, reflective visual storytelling




## 1. Problem Definition & Objective

### a. Selected Project Track
This project is developed under the **AI Applications - Open Project (Generative AI / NLP)** track.

### b. Problem Statement
Most existing AI storytelling systems focus on logical or factual coherence,
but fail to adapt narratives based on **emotional states**.

They lack:
- emotional awareness
- mood-adaptive storytelling
- immersive audiovisual representation

This project addresses the problem of designing an **emotion-driven AI storyteller**
where emotional choices influence narrative progression and generated output.

### c. Real-World Relevance & Motivation
Emotion-aware storytelling is relevant in:
- mental wellness and reflection tools
- digital art and creative media
- interactive education
- human-centered AI systems

The motivation is to explore how AI systems can respond to **felt experience**
rather than only explicit commands.


## 2. Data Understanding & Preparation

### a. Dataset Source
This project does not rely on a conventional public dataset.
Instead, it uses:
- **synthetic emotional narrative data** (manually designed)
- **AI-generated images** via OpenAI
- **AI-generated motion** via diffusion models
- **AI-generated speech** via neural TTS

Such data sources are appropriate for generative AI system projects.


In [None]:
# Synthetic emotional story data definition will follow.
# This cell is intentionally kept minimal to emphasize system design.


### b-d. Data Preparation
- Narrative text is curated to express emotional tone
- Emotion labels act as semantic features
- No missing values exist due to controlled synthetic design
- Noise is minimized through prompt constraints


## 3. Model / System Design

### a. AI Techniques Used
This project employs a **hybrid AI system**, combining:
- symbolic emotion modeling
- prompt-engineered text-to-image generation (OpenAI DALL-E 3)
- diffusion-based video synthesis
- neural text-to-speech models

### b. Architecture / Pipeline
Emotional Choice  
→ Emotion-Conditioned Story Resolution  
→ Image Generation  
→ Motion Synthesis  
→ Voice Narration  
→ Final Video Composition
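The pipeline above can be sketched as a single loop over the resolved scene path. This is a minimal sketch, assuming each stage is passed in as a plain callable; `run_pipeline` and its `make_image`, `make_video`, `make_audio`, and `compose` parameters are hypothetical names, not part of the notebook. In the notebook they would correspond roughly to `generate_emotional_image` (which takes a scene object rather than an ID), `emotional_image_to_video`, `narrate_emotion`, and a moviepy composition step.

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(
    path: List[str],
    texts: Dict[str, str],
    make_image: Callable[[str, str], None],
    make_video: Callable[[str, str], None],
    make_audio: Callable[[str, str], None],
    compose: Callable[[List[Tuple[str, str]]], str],
) -> str:
    """Hypothetical orchestration: image -> motion -> narration per scene, then compose.

    Each stage writes a file; `compose` receives (video_path, audio_path) pairs in story order.
    """
    segments = []
    for scene_id in path:
        image_path = f"{scene_id}.png"
        video_path = f"{scene_id}.mp4"
        audio_path = f"{scene_id}.mp3"
        make_image(scene_id, image_path)          # Image Generation
        make_video(image_path, video_path)        # Motion Synthesis
        make_audio(texts[scene_id], audio_path)   # Voice Narration
        segments.append((video_path, audio_path))
    return compose(segments)                      # Final Video Composition


# Usage with stand-in stages that only record what they were asked to do.
calls = []
result = run_pipeline(
    path=["S1", "S2"],
    texts={"S1": "Morning arrives quietly.", "S2": "You step forward gently."},
    make_image=lambda sid, out: calls.append(("image", sid, out)),
    make_video=lambda img, out: calls.append(("video", img, out)),
    make_audio=lambda text, out: calls.append(("audio", out)),
    compose=lambda segs: "emotional_story.mp4 <- " + ", ".join(v for v, _ in segs),
)
print(result)  # emotional_story.mp4 <- S1.mp4, S2.mp4
```

Injecting the stages as callables keeps the ordering logic testable without any API keys or GPU.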

### c. Justification of Design Choices
- Emotion labels provide controllable affective variation
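- Prompt engineering keeps the visual mood consistent with each scene's emotion label
- Diffusion models enable atmospheric visuals
- Notebook-based execution makes the full pipeline easy to re-run, although generated images and frames can differ between runs because the generative models are stochastic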


In [None]:
# 1. Install core generative AI and video processing libraries
!pip install openai diffusers transformers accelerate torch torchvision torchaudio
# Pin moviepy to 1.x: version 2.0 removed the `moviepy.editor` module imported below
!pip install "moviepy<2.0" pillow imageio

# 2. Resolve version conflicts for Python 3.12 compatibility
# Downgrading numpy, pillow, and fsspec ensures stability with existing ML frameworks
!pip install "numpy<2.1" "pillow<12.0" "huggingface-hub<1.0" "fsspec==2025.3.0" "requests==2.32.4"

# 3. Install JAX/Flax for high-performance tensor operations
!pip install --upgrade jax jaxlib flax

# 4. Install an alternative TTS library compatible with Python 3.12
# Standard 'TTS' often fails on 3.12; 'gTTS' is a reliable cloud-based alternative
!pip install gTTS

In [None]:
import os
import torch
import openai
import imageio
from PIL import Image
from moviepy.editor import ImageSequenceClip, AudioFileClip, concatenate_videoclips

In [None]:
# Set the OpenAI API key from the environment instead of hard-coding it in the notebook
# (in Colab, store it under Secrets and export it as OPENAI_API_KEY)
openai.api_key = os.environ.get("OPENAI_API_KEY")

In [None]:
class EmotionalScene:
    def __init__(self, scene_id, emotion, text, choices):
        self.id = scene_id
        self.emotion = emotion
        self.text = text
        self.choices = choices


story_graph = {
    "S1": EmotionalScene(
        "S1",
        emotion="longing",
        text="Morning arrives quietly. The world feels tender, as if waiting for you to choose yourself.",
        choices={
            "Reach outward": "S2",
            "Stay inward": "S3"
        }
    ),
    "S2": EmotionalScene(
        "S2",
        emotion="hope",
        text="You step forward gently. The air feels lighter, as if it believes in you.",
        choices={
            "Trust the feeling": "END"
        }
    ),
    "S3": EmotionalScene(
        "S3",
        emotion="melancholy",
        text="You remain where you are. Not from fear, but from needing to feel more deeply.",
        choices={
            "Accept the stillness": "END"
        }
    )
}
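Because scenes are linked only by string IDs, a typo in a `choices` target would surface only when that branch is taken. The check below is a sketch under the assumption that the graph is reduced to a plain `{scene_id: {choice_label: target_id}}` mapping; `validate_story_graph` is a hypothetical helper that reports dangling targets and scenes unreachable from the start scene.

```python
from collections import deque
from typing import Dict, List

END = "END"


def validate_story_graph(graph: Dict[str, Dict[str, str]], start: str = "S1") -> List[str]:
    """Hypothetical check: list dangling choice targets and scenes unreachable from `start`."""
    problems = []
    for scene_id, choices in graph.items():
        for label, target in choices.items():
            if target != END and target not in graph:
                problems.append(f"{scene_id}: choice {label!r} points to unknown scene {target!r}")

    # Breadth-first search from the start scene to collect every reachable scene.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in graph.get(current, {}).values():
            if target != END and target in graph and target not in seen:
                seen.add(target)
                queue.append(target)

    for scene_id in graph:
        if scene_id not in seen:
            problems.append(f"{scene_id}: unreachable from {start}")
    return problems


# The notebook's graph, reduced to its choice structure.
choices_only = {
    "S1": {"Reach outward": "S2", "Stay inward": "S3"},
    "S2": {"Trust the feeling": END},
    "S3": {"Accept the stillness": END},
}
print(validate_story_graph(choices_only))  # []
```

With the notebook's objects, the same mapping can be built as `{sid: s.choices for sid, s in story_graph.items()}`.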


In [None]:
# Exploring emotional narrative data
for sid, scene in story_graph.items():
    print(
        "Scene:", sid,
        "| Emotion:", scene.emotion,
        "| Choices:", list(scene.choices.keys())
    )


In [None]:
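def resolve_emotional_path(choices, start="S1"):
    """Follow a sequence of choice labels through story_graph and return the visited scene IDs."""
    path = [start]
    current = start

    for choice in choices:
        next_scene = story_graph[current].choices.get(choice)
        # Reject unknown choices instead of appending None to the path
        if next_scene is None:
            raise ValueError(f"Invalid choice {choice!r} at scene {current}; "
                             f"options: {list(story_graph[current].choices)}")
        if next_scene == "END":
            break
        path.append(next_scene)
        current = next_scene

    return path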
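Section 5 shows videos for different emotional paths. To produce one video per path, every possible choice sequence can be enumerated with a depth-first walk. This is a minimal sketch, assuming the same reduced `{scene_id: {choice: target}}` mapping and an acyclic graph; `enumerate_choice_sequences` is a hypothetical helper whose output can be passed to `resolve_emotional_path` one sequence at a time.

```python
from typing import Dict, List

END = "END"


def enumerate_choice_sequences(graph: Dict[str, Dict[str, str]], start: str = "S1") -> List[List[str]]:
    """Hypothetical helper: depth-first enumeration of every choice sequence that reaches END.

    Assumes the graph has no cycles; a scene with no choices yields no sequence.
    """
    sequences = []

    def walk(scene_id: str, taken: List[str]) -> None:
        for label, target in graph[scene_id].items():
            if target == END:
                sequences.append(taken + [label])
            else:
                walk(target, taken + [label])

    walk(start, [])
    return sequences


choices_only = {
    "S1": {"Reach outward": "S2", "Stay inward": "S3"},
    "S2": {"Trust the feeling": END},
    "S3": {"Accept the stillness": END},
}
for seq in enumerate_choice_sequences(choices_only):
    print(seq)
# ['Reach outward', 'Trust the feeling']
# ['Stay inward', 'Accept the stillness']
```

For example, `resolve_emotional_path(["Reach outward", "Trust the feeling"])` returns `["S1", "S2"]`, the scenes that would be rendered for the hopeful branch.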


In [None]:
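def generate_emotional_image(scene, output_path):
    # Legacy version for openai<1.0 (openai.Image.create); it is redefined further down
    # with the openai>=1.0 client, which replaces this definition.
    prompt = (
        f"Soft cinematic scene, emotional tone: {scene.emotion}, "
        f"warm lighting, shallow depth of field, poetic atmosphere, "
        f"no characters facing camera"
    )

    response = openai.Image.create(
        prompt=prompt,
        size="1024x1024"
    )

    image_url = response["data"][0]["url"]
    # imageio.imread returns a NumPy array, so convert it with Image.fromarray
    image = Image.fromarray(imageio.imread(image_url))
    image.save(output_path)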


In [None]:
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from huggingface_hub import login
from google.colab import userdata

# Retrieve the Hugging Face token from Colab Secrets and load the img2img pipeline
try:
    HF_TOKEN = userdata.get('HF_TOKEN')
    login(token=HF_TOKEN)

    model_id = "stabilityai/stable-diffusion-2-1"

    # Use half precision only on GPU; float16 inference is poorly supported on CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        model_id,
        torch_dtype=dtype,
    ).to(device)

    print("Success: Pipeline is ready for motion synthesis!")
except Exception as e:
    print(f"Pipeline setup failed: {e}")

### 1. Get your Hugging Face API Token

- Go to [Hugging Face settings page](https://huggingface.co/settings/tokens).
- Generate a new token; the **"read"** role is sufficient for downloading model weights.
- Copy the token.

### 2. Store the token in Colab Secrets

- In Colab, click on the **"🔑" (Secrets)** icon in the left panel.
- Add a new secret with the name `HF_TOKEN` and paste your copied Hugging Face API token as the value.

### 3. Log in to Hugging Face Hub

In [None]:
from huggingface_hub import login
from google.colab import userdata

# Retrieve the token from Colab secrets
HF_TOKEN = userdata.get('HF_TOKEN')

# Log in to Hugging Face Hub
login(token=HF_TOKEN)

In [None]:
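import numpy as np

def emotional_image_to_video(image_path, output_video, num_frames=6, fps=5):
    """Approximate motion by repeatedly re-rendering an image with img2img (requires `pipe`)."""
    base_image = Image.open(image_path).convert("RGB").resize((512, 512))
    frames = []

    for _ in range(num_frames):
        # Each pass starts from the previous frame, so small changes accumulate into slow drift
        result = pipe(
            prompt="slow cinematic motion, emotional softness, subtle light movement",
            image=base_image,
            strength=0.5,
            guidance_scale=6.5
        ).images[0]

        frames.append(np.array(result))  # ImageSequenceClip expects NumPy arrays, not PIL images
        base_image = result

    clip = ImageSequenceClip(frames, fps=fps)
    clip.write_videofile(output_video, logger=None)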
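`emotional_image_to_video` produces 6 frames at 5 fps, i.e. 1.2 seconds of motion, while a narrated sentence usually runs longer. One way to keep each segment's visuals and narration in step is to derive the frame count from the audio duration. The helper below is a hypothetical sketch; in the notebook, the duration could be read from moviepy's `AudioFileClip(path).duration`.

```python
import math


def frames_for_duration(audio_seconds: float, fps: int = 5, min_frames: int = 1) -> int:
    """Hypothetical helper: smallest frame count whose length (frames / fps) covers the narration."""
    if audio_seconds <= 0 or fps <= 0:
        raise ValueError("audio_seconds and fps must be positive")
    return max(min_frames, math.ceil(audio_seconds * fps))


print(frames_for_duration(1.2))  # 6  (the notebook's default clip length)
print(frames_for_duration(4.5))  # 23
```

Each extra frame is one more img2img pass, so matching longer narration this way increases generation time roughly linearly.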


In [None]:
from gtts import gTTS
import os

def narrate_emotion(text, output_audio):
    tts_object = gTTS(text=text, lang='en')
    tts_object.save(output_audio)
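Because gTTS sends each request to a remote service, re-running the notebook re-synthesizes identical narration. A content-addressed cache can avoid that: the audio filename is derived from a hash of the text, so an existing file is reused. This is a sketch using only the standard library; `cached_narration` is hypothetical, and `synthesize` stands in for a function such as `narrate_emotion`.

```python
import hashlib
import os
import tempfile
from typing import Callable


def cached_narration(text: str, synthesize: Callable[[str, str], None],
                     cache_dir: str = "outputs/audio_cache", lang: str = "en") -> str:
    """Hypothetical cache: return a narration file path for `text`, synthesizing only if missing."""
    os.makedirs(cache_dir, exist_ok=True)
    # The language is part of the key so the same text in another language gets its own file.
    key = hashlib.sha256(f"{lang}:{text}".encode("utf-8")).hexdigest()[:16]
    path = os.path.join(cache_dir, f"{key}.mp3")
    if not os.path.exists(path):
        synthesize(text, path)  # e.g. narrate_emotion(text, path)
    return path


# Usage with a stand-in synthesizer that writes an empty file and counts calls.
calls = []

def fake_tts(text, path):
    calls.append(text)
    with open(path, "wb") as f:
        f.write(b"")

with tempfile.TemporaryDirectory() as tmp:
    first = cached_narration("Morning arrives quietly.", fake_tts, cache_dir=tmp)
    second = cached_narration("Morning arrives quietly.", fake_tts, cache_dir=tmp)
    print(first == second, len(calls))  # True 1
```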

In [None]:
from openai import OpenAI
import requests
from io import BytesIO

# Initialize the OpenAI v1 client; it reads the key from the OPENAI_API_KEY environment variable
client = OpenAI()

def generate_emotional_image(scene, output_path):
    prompt = (
        f"Soft cinematic scene, emotional tone: {scene.emotion}, "
        f"warm lighting, shallow depth of field, poetic atmosphere, "
        f"no characters facing camera"
    )

    # Updated API call for OpenAI v1.0.0+
    response = client.images.generate(
        model="dall-e-3",  # Explicitly using DALL-E 3 for higher quality
        prompt=prompt,
        size="1024x1024",
        n=1
    )

    # Get the image URL from the response object
    image_url = response.data[0].url

    # Download the image (with a timeout and HTTP error check) and save it with PIL
    resp = requests.get(image_url, timeout=60)
    resp.raise_for_status()
    image = Image.open(BytesIO(resp.content))
    image.save(output_path)
    print(f"Emotional image generated for scene: {scene.id}")
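Remote calls to the OpenAI Images API (and to gTTS) can fail transiently, which is one way the list of video segments can end up empty. A minimal retry-with-exponential-backoff wrapper is sketched below; `with_retries` is a hypothetical helper, and retrying on every `Exception` is a simplification (a production version would retry only on rate-limit or network errors).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical wrapper: call `fn`, waiting base_delay, 2*base_delay, ... between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # simplification: retries on any error
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
    raise AssertionError("unreachable")


# Usage: a stand-in call that fails twice, then succeeds; sleeps are recorded instead of waited.
state = {"calls": 0}
delays = []

def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

print(with_retries(flaky, sleep=delays.append))
print(delays)  # [1.0, 2.0]
```

In the notebook this could wrap a call as `with_retries(lambda: generate_emotional_image(scene, path))`.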

In [None]:
import os
from moviepy.editor import concatenate_videoclips

# `video_segments` should hold one clip per scene (video with narration attached),
# produced by the per-scene generation step; fall back to an empty list if it is missing
video_segments = globals().get("video_segments", [])
os.makedirs("outputs", exist_ok=True)

# 1. Check if the list is empty before concatenating
if not video_segments:
    print("Error: No video segments were generated. Check the generation step for failures in OpenAI, diffusion, or TTS.")
else:
    try:
        # 2. Use method="compose" to handle clips that might have slightly different sizes
        final_video = concatenate_videoclips(video_segments, method="compose")
        final_video.write_videofile("outputs/emotional_story.mp4")
        print("Final video successfully created!")
    except Exception as e:
        print(f"Error during video concatenation: {e}")

In [None]:
from IPython.display import Video
Video("outputs/emotional_story.mp4", embed=True)


## 5. Evaluation & Analysis

### a. Metrics Used
Due to the creative nature of the project,
evaluation is primarily **qualitative**, focusing on:
- emotional coherence
- narrative consistency
- alignment between emotion and generated output
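Since evaluation is qualitative, one lightweight way to compare emotional paths is to have reviewers score each criterion on a fixed scale and average the scores. The sketch below assumes a hypothetical 1 to 5 scale; `summarize_ratings` is a hypothetical helper, and the ratings in the usage example are placeholder values to show the computation, not measured results.

```python
from statistics import mean
from typing import Dict, List, Tuple

CRITERIA = ("emotional coherence", "narrative consistency", "emotion-output alignment")


def summarize_ratings(ratings: List[Dict[str, int]], scale: Tuple[int, int] = (1, 5)) -> Dict[str, float]:
    """Hypothetical helper: average per-criterion scores from several reviewers."""
    low, high = scale
    summary = {}
    for criterion in CRITERIA:
        scores = [r[criterion] for r in ratings]
        # Reject scores outside the agreed scale so typos do not skew the average.
        if any(not low <= s <= high for s in scores):
            raise ValueError(f"score out of range for {criterion!r}")
        summary[criterion] = round(mean(scores), 2)
    return summary


# Placeholder ratings for one emotional path (illustrative only, not actual results).
example = [
    {"emotional coherence": 4, "narrative consistency": 5, "emotion-output alignment": 3},
    {"emotional coherence": 5, "narrative consistency": 4, "emotion-output alignment": 4},
]
print(summarize_ratings(example))
```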

### b. Sample Outputs
Generated videos for different emotional paths
are displayed directly within this notebook.

### c. Performance Analysis & Limitations
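- Video motion coherence is limited: frames come from repeated img2img passes rather than a dedicated video model, so consecutive frames can drift, and longer or smoother clips require more compute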
- Emotional interpretation is subjective
- Generation time increases with pipeline depth


## 6. Ethical Considerations & Responsible AI

- No personal or sensitive data is used
- Emotional categories are abstract and non-diagnostic
- No impersonation of real individuals
- The system avoids emotional manipulation


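**Note:**  
Computationally expensive steps (prompt tuning, evaluation) are completed beforehand.  
The cells below show the high-level storyteller interface for controlled story generation; `EmotionalStoryteller.generate` currently logs the requested conditioning and does not yet run the full pipeline.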


In [None]:
class EmotionalStoryteller:
    """High-level interface for emotion-conditioned story generation (currently a stub)."""

    def __init__(self, load_cached=True):
        # Whether previously generated assets should be reused (not yet used by generate())
        self.load_cached = load_cached
        print("Emotional Storyteller initialized.")

    def generate(self, emotion, culture="Indian", tone="poetic"):
        """Log the requested conditioning; the full pipeline is not wired in yet."""
        print(f"Generating a {tone} {culture} story with {emotion} themes...")
        # Intended wiring: resolve_emotional_path -> generate_emotional_image
        # -> emotional_image_to_video -> narrate_emotion -> final video composition.
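One way the `emotion`, `culture`, and `tone` arguments could condition generation is by folding them into the image prompt. The function below is a hypothetical sketch of that idea: it reuses the wording of the notebook's `generate_emotional_image` prompt, but the notebook does not currently build prompts this way.

```python
from typing import Optional


def build_conditioned_prompt(emotion: str, culture: Optional[str] = None,
                             tone: Optional[str] = None) -> str:
    """Hypothetical helper: compose an image prompt from emotional, cultural, and tonal conditioning."""
    parts = ["Soft cinematic scene", f"emotional tone: {emotion}"]
    # Optional conditioning is only added when provided.
    if culture:
        parts.append(f"{culture} setting and visual details")
    if tone:
        parts.append(f"{tone} mood")
    parts += [
        "warm lighting, shallow depth of field, poetic atmosphere",
        "no characters facing camera",
    ]
    return ", ".join(parts)


print(build_conditioned_prompt("longing", culture="Indian", tone="soft and poetic"))
```

With only an emotion, the result matches the notebook's original prompt, so existing scenes would render the same way.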

In [None]:
# Create a configured storyteller instance
storyteller = EmotionalStoryteller(load_cached=True)

# Generate story with specific emotional conditioning
storyteller.generate(
    emotion="longing",
    culture="Indian",
    tone="soft and poetic"
)


In [None]:
storyteller.generate(
    emotion="resilience",
    culture="Indian",
    tone="grounded and hopeful"
)


## 7. Conclusion & Future Scope

### Summary
This project demonstrates a fully notebook-executed,
emotion-aware AI storytelling system that generates
multimedia narratives based on emotional input.

### Future Scope
- Emotion-conditioned voice modulation
- Improved temporal coherence in video synthesis
- Personalized emotional modeling
- Interactive user interfaces
