### 🧙 Lab 7 – Interactive Storytelling App with Choices

In this lab, you'll build a **Gradio app** that generates a branching story — like a role-playing game.  
The AI writes a chapter, then offers **three possible directions**. You choose what happens next.

Each time you click a button, a new chapter is written, and the story continues until it reaches an ending.
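
The loop behind each click can be sketched without Gradio or a model: the app keeps one running story string, appends the option that was clicked, asks for a new chapter, and appends that too. This is a minimal sketch, not the lab's solution; `fake_chapter`, `start`, `next_chapter`, and `story_state` are hypothetical names, and `fake_chapter` stands in for the model call so the block runs offline.

```python
# Minimal sketch of the click loop; fake_chapter is a hypothetical stand-in for the LLM.
story_state = {"story": ""}

def fake_chapter(story_so_far):
    # A real app would send story_so_far to the model; here we only count chapters.
    n = story_so_far.count("Chapter") + 1
    return f"Chapter {n}: something happens.\n1. Go left\n2. Go right\n3. Wait"

def start(idea):
    # Reset the running story and produce the first chapter.
    story_state["story"] = f"Story idea: {idea.strip()}\n\n"
    return next_chapter(None)

def next_chapter(choice_index):
    # Record the clicked option (skipped on the very first turn).
    if choice_index is not None:
        story_state["story"] += f"\nUser chose option {choice_index}.\n\n"
    chapter = fake_chapter(story_state["story"])
    story_state["story"] += chapter + "\n"
    return chapter

first = start("A lighthouse keeper finds a map")
second = next_chapter(2)
print(second.splitlines()[0])
```

Because the whole story string is sent back on every turn, the model always sees earlier chapters and choices, at the cost of a prompt that grows with each click.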

**🛠️ TODO**

Find the `# TODO` in the code and complete the `ollama.chat(...)` call.  
You'll need to pass a **system message** and a **user prompt** — just like in the earlier labs.
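
As a reminder of the message format, here is a minimal sketch of the two-message list. `build_messages` is a hypothetical helper introduced only for illustration, and the prompt strings are placeholders; the `ollama.chat` call is shown in comments because it needs a running Ollama server with the model pulled.

```python
def build_messages(system_prompt, user_prompt):
    """Return the system + user message list in role/content form."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You are an interactive story generator.",
    "Here is the story so far: ...",
)

# With Ollama running and the model available, the call would look like:
#   response = ollama.chat(model="gemma3:4b-it-qat", messages=messages)
#   chapter = response["message"]["content"]
print([m["role"] for m in messages])
```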

In [None]:
import gradio as gr
import ollama

from diffusers import StableDiffusion3Pipeline
import torch

from kokoro import KPipeline
import soundfile as sf
import tempfile

pipe = StableDiffusion3Pipeline.from_pretrained(
    "ckpt/stable-diffusion-3.5-medium",
    torch_dtype=torch.bfloat16
).to("cuda")

MODEL = "gemma3:4b-it-qat"

# Persistent state
context = {
    "story": "",
}

# System prompt to guide generation
SYSTEM_PROMPT = """
You are an interactive story generator.

Each time you're asked to continue, you will:
- Write one new chapter of the story (1 paragraph)
- End with exactly three options for what could happen next

Make the choices diverse and exciting, and make them part of the output. 
Clearly number the options at the end, like:
1. ...
2. ...
3. ...

Do not ask the user to type anything. They will choose by clicking a button.
"""

def generate_image(prompt):
    if not prompt.strip():
        return None
    
    system_prompt = "Anime style."
    
    image = pipe(
        prompt=f"{system_prompt} {prompt}",
        num_inference_steps=20,
        guidance_scale=5,
        width=512,
        height=256
    ).images[0]
    
    return image

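def generate_speech(text):
    voice = "bm_lewis"
    if not text.strip():
        return None

    # 'b' selects British English, matching the "bm_" prefix of the voice
    pipeline = KPipeline(lang_code='b')
    generator = pipeline(text, voice=voice)

    # Reserve a temporary .wav path, then append every audio segment to it
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmpfile:
        path = tmpfile.name

    wrote_audio = False
    with sf.SoundFile(path, mode="w", samplerate=24000, channels=1) as wav:
        # The pipeline yields the text in segments; write all of them,
        # not just the first, so the whole chapter is spoken
        for _, _, audio in generator:
            wav.write(audio)
            wrote_audio = True

    return path if wrote_audio else None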

def start_story(story_idea):
    if story_idea.strip() == "":
        # Return one value per output component: text, image, audio
        return "Please enter a story idea.", None, None
    context["story"] = f"Story idea: {story_idea.strip()}\n\n"
    return continue_story(choice_index=None)

def continue_story(choice_index):
    # Add previous choice info if not the first turn
    if choice_index is not None:
        context["story"] += f"\nUser chose option {choice_index}.\n\n"

    user_prompt = f"""Here is the story so far:

{context['story']}

Please continue the story with one new chapter and three numbered choices.
One option should lead to a dead end and the story should not continue from it.
If the story ended, just say "The End" and do not provide any options.
Each chapter should be no more than 1 paragraph!
"""

    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt}
        ]
    )

    chapter = response["message"]["content"].strip()
    context["story"] += chapter + "\n"

    image = generate_image(chapter)

    voiceover = generate_speech(chapter)

    return chapter, image, voiceover

def launch_app():
    with gr.Blocks() as demo:
        gr.Markdown("## 🧙 Ctrl the Narrative")

        with gr.Row():
            idea_input = gr.Textbox(label="Enter your story idea")
            start_btn = gr.Button("Start Story")

        output_box = gr.Textbox(label="Current Chapter", lines=12)
        voiceover = gr.Audio(label="Generated Speech")
        image_box = gr.Image(label="Generated Image", width=512, height=256)

        with gr.Row():
            btn1 = gr.Button("Choose Option 1")
            btn2 = gr.Button("Choose Option 2")
            btn3 = gr.Button("Choose Option 3")

        start_btn.click(start_story, inputs=idea_input, outputs=[output_box, image_box, voiceover])
        btn1.click(lambda: continue_story(1), outputs=[output_box, image_box, voiceover])
        btn2.click(lambda: continue_story(2), outputs=[output_box, image_box, voiceover])
        btn3.click(lambda: continue_story(3), outputs=[output_box, image_box, voiceover])

    demo.launch(server_name="0.0.0.0", server_port=8080)

if __name__ == "__main__":
    launch_app()

Loading pipeline components...:   0%|          | 0/9 [00:00<?, ?it/s]

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

* Running on local URL:  http://0.0.0.0:8080
* To create a public link, set `share=True` in `launch()`.


Token indices sequence length is longer than the specified maximum sequence length for this model (336 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ["a den of shadows and cheap ale. i 'd spent the last three days gathering information, bribing dockworkers, and generally making myself a nuisance. now, i was here, a battered leather - bound journal clutched in my hand, detailing every boast, every threat, every weakness o ’ racio had ever uttered. the tavern door groaned open, revealing a haze of smoke and the raucous laughter of sailors and criminals. o ’ racio was exactly where i ’ d expected him to be, slumped over a table, a half - empty tankard of grog beside him. he looked up as i entered, his eyes cold and calculating. “ well, well,” he drawled, a cruel smile twisting his lips. “ looks like someone finally decided to pay the piper.” he gestured to

  0%|          | 0/20 [00:00<?, ?it/s]



  WeightNorm.apply(module, name, dim)
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ["le above a noodle bar specializing in synthetic protein, smelled of stale ramen and desperation. chronosync had been breathing down my neck for weeks, demanding a decryption key – a key i didn ’ t possess, and a debt i swore i ’ d never owe. tonight, though, felt different. a low - level scanner, scavenged from a discarded drone, picked up a coded signal emanating from a dilapidated warehouse district on the far side of the city. it wasn 't a standard chronosync frequency ; it was … organic. this felt like a lead, a chance to finally vanish, or perhaps, a trap laid by someone far more dangerous than the corporation. i adjusted the grip on my pulse pistol and headed out, the rain mirroring the unsettling feeling in my gut. 1. follow the signal directly to the warehouse, risking a confrontation with whatever is broadcasting it. 2. attempt to trace

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ["activated my comm, attempting to contact a few of my less - than - reliable associates, but the line was jammed. chronosync had tightened its grip, erecting digital barriers across the city ’ s lower networks. hours bled into a monotonous cycle of lukewarm protein paste and flickering surveillance feeds. just as i was starting to believe my luck was truly out, a distorted message crackled through my implant – a private channel, untraceable. “ meet me. the serpent ’ s tooth. midnight.” no sender. no context. just the ominous directive. i checked my pulse pistol, its metallic coldness a familiar comfort. the serpent ’ s tooth was a notorious dive bar in the red district, a haven for smugglers, hackers, and generally undesirable elements. it was a place chronosync actively avoided, yet venturing there felt like stepping into the heart of the city's underbelly, a gamble with potentially

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['a faint metallic tang. around me, holographic projections flickered – not of the * star wanderer *, but of a colossal, intricate structure, resembling a geometric heart pulsating with an eerie, blue light. it was clearly artificial, and impossibly ancient. a voice, resonant and utterly devoid of emotion, filled the chamber. “ welcome, silas vance. you have been deemed … worthy.” a figure materialized before me, a being of pure energy contained within a vaguely humanoid form. it didn ’ t seem hostile, but radiated an overwhelming sense of observation. " this is the nexus – a repository for lost timelines and forgotten civilizations. your vessel was merely a ripple, a stray echo drawn into this space. you are now part of the collection.” the being gestured towards a bank of shimmering displays, each depicting a fragmented timeline : civilizations rising and falling, stars born and dyi

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: [', visceral need to cling to something, anything, that connected me to my past. as i documented the cyclical destruction of a particularly advanced civilization, a flicker in the projections drew my attention. it wasn ’ t a new timeline, but a reflection of my own consciousness, observing * me * documenting * it *. this wasn ’ t a passive observation ; it was … judging. a wave of chilling awareness washed over me – i wasn ’ t merely being recorded, i was being scrutinized, weighed against some impossible, undefined standard. suddenly, a voice, identical to my own but laced with an unnerving coldness, echoed throughout the chamber. " your efforts are … quaint. a futile attempt to impose order on chaos. you perceive memory as a shield, but it is merely a distraction." the being shifted, its form flickering with increasing instability. “ the nexus does not tolerate sentimentality.” befo
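
The truncation warnings above appear because the whole chapter is passed as the image prompt, while CLIP only reads the first 77 tokens. One possible mitigation, offered as a sketch rather than as part of the lab, is to drop the numbered options and keep only the opening words before calling `generate_image`. `shorten_for_image` and the 40-word default are assumptions; word count is only a rough proxy for CLIP tokens, and the sketch assumes each option starts on its own line.

```python
import re

def shorten_for_image(chapter, max_words=40):
    """Drop the numbered options and keep roughly the first max_words words."""
    kept = []
    for line in chapter.splitlines():
        # Stop at the first line that looks like a numbered option ("1. ...").
        if re.match(r"\s*\d+\.\s", line):
            break
        kept.append(line)
    words = " ".join(kept).split()
    return " ".join(words[:max_words])

chapter = (
    "The tavern door groaned open, revealing a haze of smoke.\n"
    "1. Confront him\n2. Slip away\n3. Order a drink"
)
print(shorten_for_image(chapter, max_words=6))
```

In the app, this would mean calling `generate_image(shorten_for_image(chapter))` instead of `generate_image(chapter)`.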
