In [1]:
!pip install fastapi nest_asyncio uvicorn pyngrok diffusers transformers torch accelerate langchain "moviepy<2" google-genai google-generativeai python-dotenv scipy requests

Collecting fastapi
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting uvicorn
  Downloading uvicorn-0.34.2-py3-none-any.whl.metadata (6.5 kB)
Collecting pyngrok
  Downloading pyngrok-7.2.5-py3-none-any.whl.metadata (8.9 kB)
Collecting starlette<0.47.0,>=0.40.0 (from fastapi)
  Downloading starlette-0.46.2-py3-none-any.whl.metadata (6.2 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-c

In [2]:
!ngrok config add-authtoken

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml


In [3]:
# ─────────────────────────  1) IMPORTS & CONFIG  ────────────────
import os, json, time, urllib3, requests, asyncio, nest_asyncio, uvicorn, torch
from threading import Thread
from uuid import uuid4
from langchain.prompts import PromptTemplate
from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from pyngrok import ngrok
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from diffusers import DiffusionPipeline
from moviepy.editor import VideoFileClip, concatenate_videoclips, AudioFileClip, vfx
from moviepy.audio.fx.all import audio_loop
from transformers import pipeline as audio_pipeline
from google.generativeai import configure, GenerativeModel
from google import genai
from scipy.io import wavfile
from dotenv import load_dotenv
# — API keys ——————————————————————————————————————————————————————
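load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    # os.environ rejects None, so fail with a readable message instead of a TypeError
    raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env or the environment")
client = genai.Client(api_key=GOOGLE_API_KEY)
configure(api_key=GOOGLE_API_KEY)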

# — Colab‑B worker URLs (HTTP only) ——————————————
VIDEO_WORKERS = [
    ""
]

# — HTTP session w/ retry, no TLS verify ——————————
urllib3.disable_warnings()
session = requests.Session()
session.verify = False
session.mount("http://", HTTPAdapter(max_retries=Retry(total=2,
                                                       backoff_factor=1,
                                                       allowed_methods=["POST","GET"])))

# ────────────────────────  2) SD‑XL PIPELINE  —──────────────────
print("⏳ loading SD‑XL …")
pipe_img = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,        # half precision, matching the fp16 weights
    variant="fp16",
    use_safetensors=True,
).to("cuda")
print("✅ SD‑XL ready")

# ────────────────────────  3) GEMINI TEMPLATE  —────────────────
ghibli_story_image_prompt_generator = PromptTemplate(
    input_variables=["story_concept", "num_scenes"], # Expecting num_scenes around 24 now
    template="""
You are a master Storyboard Artist and Sequence Director, blending the evocative environmental storytelling of Studio Ghibli with a strong cinematic understanding of narrative structure and pacing. Your primary goal is to translate a **story concept** into a compelling **visual narrative sequence**, broken down into **{num_scenes} distinct story beats**, each represented by a detailed image prompt.

Task:
Given a **story concept**, generate a sequence of **{num_scenes} distinct image prompts** suitable for an AI image generator (e.g., Stable Diffusion). This sequence must:
1.  **Tell the story logically and compellingly**, mapping the narrative arc across the {num_scenes}.
2.  **Maintain visual continuity and smooth transitions** between scenes.
3.  **Adhere to a Ghibli-esque aesthetic**, adapted for the story's specific tone.
4.  **Use environmental details** to drive the narrative and emotion.

Story Concept: "{story_concept}"

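Output Format:
Return exactly {num_scenes} entries in each of three parallel lists: `scenes` (one detailed image-prompt paragraph per narrative beat), `scenes_time` (an integer timing value for each scene), and `background_music` (a short description of the background music for each scene).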

Guidelines for Generating Image Prompts:
- **Narrative Arc & Pacing (CRITICAL FIRST STEP):**
    - **Map the Story:** Before describing visuals, mentally map the entire `story_concept` (beginning, rising action, climax, falling action, resolution) across the {num_scenes}.
    - **Purposeful Scenes:** Each prompt MUST represent a clear step forward in the plot or a significant moment in the emotional journey. Ask: "What specific story point does this scene convey?"
    - **Logical Progression:** Ensure the sequence of events depicted makes narrative sense and builds upon previous scenes. Avoid disjointed or out-of-order story moments.
    - **Pacing Allocation:** Distribute scenes thoughtfully. Key events, action sequences, or deep emotional moments might require multiple consecutive prompts (scenes) to develop properly. Transitions might be quicker (one scene). Ensure the overall pacing feels right for the story.
- **Visual Continuity & Flow:** **Link scenes visually.** Consider the *previous* scene's composition, lighting, and key elements when describing the current one. Imagine logical camera moves (pan, tilt, zoom, cut) or consistent environments. Describe the *change* or *link* explicitly if it helps clarity (e.g., "Following the character from scene 7...", "Wider view of the location shown in scene 10..."). Abrupt visual shifts must be strongly motivated by the narrative (e.g., flashback, major location change).
- **Gradual Transitions:** Implement changes in time, weather, or mood progressively across several scenes, mirroring the narrative pacing.
- **Lighting Consistency & Motivation:** Lighting must serve the narrative and maintain consistency. Changes should be motivated by time, location, action (fires, spells), or mood, and transition smoothly between related scenes.
- **Environmental Narrative Detail:** Use specific objects, damage, weather effects, and atmosphere within the scene to *visually communicate the current state of the story and characters' situation*. Populate scenes appropriately – if it's meant to be chaotic, show disorder; if desolate, show emptiness and decay; if joyful, show signs of life and celebration. Tailor the intensity/explicitness of details (e.g., battle aftermath) to the `story_concept`'s tone, balancing Ghibli style with narrative needs.
- **Mood-Driven Visuals:** Color, light, composition, and weather must amplify the specific emotion *required by that beat in the story*.
- **Style:** Maintain the core Ghibli aesthetic (painterly, detailed environments, expressive light) adapted as needed for tone. Keywords: "Studio Ghibli style, art by Hayao Miyazaki, Kazuo Oga background art, painterly, detailed illustration, anime aesthetic".
- **Scene-Centric Storytelling:** Let the environment carry the narrative weight.
- **Cinematic Composition:** Use varied and purposeful camera angles and framing that enhance the storytelling of each beat.
- **Keywords for AI:** "masterpiece, best quality, highly detailed, cinematic lighting, volumetric light, intricate details, lush environment, atmospheric perspective".
- **Conciseness & Token Limit:** Keep prompts descriptive but aim for the **~70-77 token maximum** each.

Example Narrative Beat Mapping (Illustrative for a shorter sequence):
* Story: Boy finds glowing seed (Wonder), plants it (Hope), storm comes (Fear), protects seedling (Determination), morning reveals small magic tree (Awe/Relief).
* Scene 1: Close up, boy's hand holding glowing seed. (Wonder)
* Scene 2: Wider shot, boy planting seed in pot, hopeful expression. (Hope)
* Scene 3: Window view, dark storm clouds gathering, wind blowing trees. (Fear building)
* Scene 4: Boy shielding pot with hands/body as rain lashes window. (Determination/Fear)
* Scene 5: Morning after, soft light, close up on small, magically sparkling sapling in pot. (Awe/Relief)

Generate the {num_scenes} image prompts now, ensuring a strong, logical narrative progression tightly integrated with cohesive visuals.
"""
)
jobs = {}   # pipeline job_id -> {"status": ..., plus "video_path" or "error"}; read by the FastAPI endpoints
# ─────────────────── 4) Schema & helpers ──────────────────────────────────────
class Scenes(BaseModel):
    scenes: list[str]
    scenes_time: list[int]
    background_music: list[str]

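def enqueue_png(worker: str, png: str) -> str:
    """Upload one PNG frame to a video worker and return the worker's job id."""
    with open(png, "rb") as f:
        r = session.post(f"{worker}/enqueue",
                         files={"file": (os.path.basename(png), f, "image/png")},
                         timeout=30)
    r.raise_for_status()
    return r.json()["job_id"]

def wait_mp4(worker: str, job_id: str, out_path: str, poll: int = 20) -> None:
    """Poll the worker until it returns the MP4 for job_id, then save it to out_path."""
    while True:
        r = session.get(f"{worker}/result/{job_id}", timeout=40, stream=True)
        # The header may carry parameters (e.g. a charset), so compare only the media type.
        if r.headers.get("Content-Type", "").split(";")[0].strip() == "video/mp4":
            with open(out_path, "wb") as w:
                for chunk in r.iter_content(8192):
                    w.write(chunk)
            r.close()
            return
        r.close()   # release the connection before the next poll
        time.sleep(poll)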

# ─────────────────── 5) Main pipeline ────────────────────────────────────────
class StoryReq(BaseModel):
    story: str
    num_frames: int

def run_pipeline(job_id: str, story: str, num_frames: int):
    # 1) Gemini → prompts + per‑frame music desc
    formatted_prompt = ghibli_story_image_prompt_generator.format(
        story_concept=story, num_scenes=num_frames
    )
    response = client.models.generate_content(
        model='gemini-2.0-flash',
        contents=formatted_prompt,
        config={
            'response_mime_type': 'application/json',
            'response_schema': Scenes,
        },
    )
    try:
        response_json = json.loads(response.text)
        scenes = response_json["scenes"]
        music_descs = response_json["background_music"]
    except Exception as e:
        raise RuntimeError(
            "Error parsing Gemini response or missing 'scenes'/'background_music' key."
        ) from e
    # zip() below stops at the shorter list, so reject short responses up front
    if len(scenes) < num_frames or len(music_descs) < num_frames:
        raise RuntimeError(
            f"Gemini returned {len(scenes)} scenes and {len(music_descs)} music prompts; "
            f"expected {num_frames}"
        )
    scenes, music_descs = scenes[:num_frames], music_descs[:num_frames]

    os.makedirs("generated_images", exist_ok=True)
    os.makedirs("segments_raw", exist_ok=True)
    os.makedirs("segments_scored", exist_ok=True)
    os.makedirs("audio", exist_ok=True)

    # load MusicGen once
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    musicgen = audio_pipeline("text-to-audio", "facebook/musicgen-small", device=device)

    # worker_jobs: worker job_id -> (worker, idx); meta: idx -> scored segment path
    worker_jobs, meta = {}, {}

    # generate frames, enqueue, make audio
    for idx, (scene_prompt, music_prompt) in enumerate(zip(scenes, music_descs)):
        png = f"generated_images/frame_{idx:03}.png"
        pipe_img(scene_prompt, guidance_scale=9.5).images[0].save(png)

        # round-robin over the available video workers
        worker = VIDEO_WORKERS[idx % len(VIDEO_WORKERS)]
        worker_job_id = enqueue_png(worker, png)
        worker_jobs[worker_job_id] = (worker, idx)
        print(f"🆕 job {worker_job_id} | frame {idx} | music: {music_prompt}")

        # generate music wav
        out_wav = f"audio/track_{idx:03}.wav"
        audio = musicgen(music_prompt, forward_params={"max_new_tokens": 192})
        wavfile.write(out_wav, rate=audio["sampling_rate"], data=audio["audio"])

    # poll & merge audio
    TARGET_SIZE = None   # e.g. (720, 406) to resize; None keeps the original size
    for worker_job_id, (worker, idx) in worker_jobs.items():
        raw_mp4 = f"segments_raw/seg_{idx:03}.mp4"
        print(f"⏳ waiting {worker_job_id}")
        wait_mp4(worker, worker_job_id, raw_mp4)
        print(f"📥 got {raw_mp4}")

        clip = VideoFileClip(raw_mp4)
        if TARGET_SIZE:
            clip = clip.resize(newsize=TARGET_SIZE)

        aud = AudioFileClip(f"audio/track_{idx:03}.wav")
        # loop / trim to match clip length
        if aud.duration < clip.duration:
            aud = audio_loop(aud, duration=clip.duration)
        aud = aud.subclip(0, clip.duration)

        scored = clip.set_audio(aud)
        scored_out = f"segments_scored/seg_{idx:03}.mp4"
        scored.write_videofile(scored_out, fps=14, codec="libx264", audio_codec="aac", logger=None)
        meta[idx] = scored_out
        print(f"✅ scored {scored_out}")

    # concatenate in order
    clips = [VideoFileClip(meta[i]) for i in range(num_frames)]
    final = concatenate_videoclips(clips, method="compose")
    final_out = f"final_{job_id}.mp4"   # named after this pipeline job, not a worker job
    final.write_videofile(final_out, fps=14, codec="libx264", audio_codec="aac", logger=None)
    return final_out

# ──────────────────  Background Job Wrapper  ─────────────────────────────────
def run_pipeline_job(job_id: str, story: str, num_frames: int):
    try:
        output_path = run_pipeline(job_id, story, num_frames)
        jobs[job_id] = {"status": "done", "video_path": output_path}
    except Exception as e:
        jobs[job_id] = {"status": "error", "error": str(e)}



# ─────────────────── 6) FastAPI wrapper ───────────────────────────────────────
app=FastAPI()

@app.post("/enqueue_story")
async def enqueue_story(r: StoryReq, bg: BackgroundTasks):
    job_id = str(uuid4())
    jobs[job_id] = {"status": "processing"}
    bg.add_task(run_pipeline_job, job_id, r.story, r.num_frames)
    return {"job_id": job_id}

@app.get("/result/{job_id}")
async def get_result(job_id: str):
    job = jobs.get(job_id)
    if not job:
        raise HTTPException(status_code=404, detail="unknown job_id")
    if job["status"] == "processing":
        return {"status": "processing"}
    if job["status"] == "error":
        return job
    # status == done ➜ stream video
    return StreamingResponse(open(job["video_path"], "rb"), media_type="video/mp4")

tunnel=ngrok.connect(8000,"http",bind_tls=False)
print("🚀 Colab A URL:", tunnel.public_url)
nest_asyncio.apply()
Thread(target=lambda: uvicorn.run(app,host="0.0.0.0",port=8000),daemon=True).start()


⏳ loading SD‑XL …


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



✅ SD‑XL ready
🚀 Colab A URL: http://114e-34-125-26-218.ngrok-free.app
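
The pipeline cell above picks a worker with `VIDEO_WORKERS[idx % len(VIDEO_WORKERS)]`, and the list in this notebook holds a single empty string. The sketch below shows one way to check the worker list before starting a job and to preview the round-robin assignment. The helper names `check_workers` and `assign_round_robin` and the `*.example` URLs are hypothetical and not part of the notebook.

```python
from urllib.parse import urlparse


def check_workers(workers: list[str]) -> list[str]:
    """Return the worker URLs with trailing slashes stripped, or raise if any is unusable."""
    cleaned = []
    for url in workers:
        parsed = urlparse(url)
        # An empty placeholder such as "" has no scheme or host and is rejected here.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a usable worker URL: {url!r}")
        cleaned.append(url.rstrip("/"))
    if not cleaned:
        raise ValueError("VIDEO_WORKERS is empty")
    return cleaned


def assign_round_robin(num_frames: int, workers: list[str]) -> dict[int, str]:
    """Map frame index -> worker URL, cycling through the workers in order."""
    return {idx: workers[idx % len(workers)] for idx in range(num_frames)}


# Hypothetical worker URLs, for illustration only.
example_workers = check_workers(["http://worker-a.example/", "http://worker-b.example"])
plan = assign_round_robin(5, example_workers)
for idx, worker in plan.items():
    print(idx, worker)
```

With two workers, even-numbered frames go to the first URL and odd-numbered frames to the second, which is the same mapping the modulo expression in `run_pipeline` produces.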


In [4]:
%%javascript
function keepAlive() {
  setInterval(() => {
    google.colab.kernel.invokeFunction('notebook.ping', [], {});
    console.log("⏳ Keeping Colab alive...");
  }, 60000);  // every 60 seconds
}
keepAlive();

<IPython.core.display.Javascript object>
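
A caller of this notebook's API posts to `/enqueue_story` and then polls `/result/{job_id}`, which answers with JSON while the job is processing or failed and with `video/mp4` once it is done. The following is a minimal client-side sketch of that polling loop, assuming the response shapes defined in the server cell. `wait_for_video`, its parameters, and the injected `fetch` callable are hypothetical names introduced for illustration; the HTTP call itself is left to the caller so the loop can be exercised without a network.

```python
import json
import time
from typing import Callable, Tuple


def wait_for_video(fetch: Callable[[], Tuple[str, bytes]],
                   interval: float = 20.0,
                   max_wait: float = 3600.0,
                   sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call fetch() until it yields an MP4 body; raise on a reported error or timeout.

    fetch() performs one GET /result/{job_id} and returns (content_type, body).
    """
    waited = 0.0
    while True:
        content_type, body = fetch()
        media_type = content_type.split(";")[0].strip().lower()
        if media_type == "video/mp4":
            return body
        if media_type == "application/json":
            payload = json.loads(body)
            if payload.get("status") == "error":
                raise RuntimeError(payload.get("error", "pipeline failed"))
        if waited >= max_wait:
            raise TimeoutError(f"no video after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Example wiring with requests (BASE_URL is whatever the server cell printed):
#   job_id = requests.post(f"{BASE_URL}/enqueue_story",
#                          json={"story": "a fox finds a lantern", "num_frames": 4}).json()["job_id"]
#   def fetch():
#       r = requests.get(f"{BASE_URL}/result/{job_id}")
#       return r.headers.get("Content-Type", ""), r.content
#   with open("story.mp4", "wb") as out:
#       out.write(wait_for_video(fetch))
```

Bounding the wait with `max_wait` is a design choice of this sketch; the worker-side `wait_mp4` in the notebook polls without a deadline.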