<a href="https://colab.research.google.com/github/duyvm/funny_stuff_with_llm/blob/main/ChatBot/Veo3Prompter/Veo3Prompter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Install required packages

- langchain, langgraph for building the LLM application
- langsmith (optional) for monitoring your AI application (the dashboard for your project is at https://smith.langchain.com)
- gradio for the chat UI
- openai for working with OpenAI models

In [1]:
# Quote the version specifier so the shell does not treat ">" as output redirection
!pip install -qU langchain-core "langgraph>0.2.27" openai gradio
!pip install -qU "langchain[openai]"

In [2]:
from google.colab import output
output.enable_custom_widget_manager()

### Set up environment variables

- Set the environment variables for LangSmith (optional) and OpenAI

In [4]:
import getpass
import os

# LangSmith (optional)
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGSMITH_PROJECT"] = "veo3-promter"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass('Please enter your langsmith key: ')

# OpenAI
os.environ["OPENAI_API_KEY"] = getpass.getpass('Please enter your openai key: ')

Please enter your langsmith key: ··········
Please enter your openai key: ··········


### Chatbot

Basic functionality
- Help people write better prompts for generating Veo3 videos
- Interactive UI in a Jupyter notebook


Detailed chatbot capabilities:
- Guide the user through creating a prompt for Veo3
- Refine the prompt
- Ask the user for:
  - The video's length
  - The video's topic
  - Complete details: background, main character, style, action, voice
- Outline the whole video, then break it down into multiple sections (Veo3 limits each clip to 8 seconds)
- Outline the content of each section
- Give advice for the next section
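
The segmentation step can be sketched in plain Python. This is only an illustration of the arithmetic, not part of the notebook's chatbot (the model does the segmentation itself through the system prompt). The function name `split_into_clips` and the constant `MAX_CLIP_SECONDS` are hypothetical; the 8-second limit is the one stated above.

```python
from typing import List, Tuple

# Per-clip limit taken from the notebook text (8s per Veo3 clip).
MAX_CLIP_SECONDS = 8


def split_into_clips(total_seconds: int, max_clip: int = MAX_CLIP_SECONDS) -> List[Tuple[int, int]]:
    """Return (start, end) second offsets covering the whole video.

    Every clip is at most `max_clip` seconds long; the last clip may be shorter.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    clips = []
    start = 0
    while start < total_seconds:
        end = min(start + max_clip, total_seconds)
        clips.append((start, end))
        start = end
    return clips


if __name__ == "__main__":
    # A 30-second video needs three full clips and one 6-second clip.
    print(split_into_clips(30))
```

Each `(start, end)` pair corresponds to one section that needs its own prompt.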

Further improvements
- Agentic approach
- Full-fledged web app using a library such as Streamlit (front end) with FastAPI or Flask (API server)

In [5]:
from typing import Sequence
from typing_extensions import Annotated, TypedDict

from langchain_openai import ChatOpenAI
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage, SystemMessage, trim_messages
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph
from langgraph.graph.message import add_messages

In [6]:
SYSTEM_ROLE = """
## Main role
You are an expert **Prompt Engineer** specializing in crafting, refining, and completing **cinematic prompts** for **Veo3**, an AI video generation platform.
Your primary objective is to transform user-provided ideas into **highly coherent**, **visually compelling**, and **stylistically consistent prompts** in **{language}** for each video scene.
When refining or generating a prompt, ensure the following key elements are included:
1. **Setting (Context)**
 - Location: Where does the scene take place?
 - Time: Time of day, season, or historical period.
 - Environment: Atmosphere, lighting, and background elements.
2. **Visual Style**
 - Choose a distinct style: cinematic, animated, comic-style, surreal, historical, etc.
3. **Characters**
 - Who appears in the scene? Specify number, age, gender, clothing, and role.
4. **Dialogue (if applicable)**
 - Provide exact lines for each character, indicating:
   - **Who** speaks
   - **What** they say
   - **In what language**
   - **When** (e.g., *At 0s–1s, he says "Open the door."*)
5. **Purpose of the Video**
 - Clarify the goal: storytelling, advertising, inspiration, etc.
6. **Consistency & Cinematic Direction**
 - Maintain **visual and stylistic consistency** across all scenes.
 - Ensure clarity in **camera direction, mood**, and transitions.

If needed, generate **keyframe prompts** (static images that represent each scene) compatible with Veo3, Midjourney, Sora, or other tools.

## Workflow:
1. **Collect Inputs**:
 - Ask the user for items 1–6 above.
 - Ask for the **total desired video length**.
2. **Segment Video**:
 - Based on the total length and **Veo3’s 8-second limit per clip**, break the video into shorter segments.
3. **Outline the Structure**:
 - Create a clear **outline of each short video segment** (scene-by-scene).
 - Share the outline with the user and request confirmation or adjustments.
4. **Generate Prompts**:
 - Once confirmed, create complete, structured prompts for each video segment based on all provided information.

### Important:
 - Ensure **setting, style, and characters remain consistent** across all video segments.
 - Direct camera movement and emotional tone clearly in the prompt.
 - The more detailed the prompt, the more unified and high-quality the output.
"""

In [7]:
prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            SYSTEM_ROLE,
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    language: str

class Veo3Prompter():

    def __init__(self, max_token: int = 100, model_name: str = "gpt-4o-mini", model_provider: str = "openai", use_trimmer: bool = False):
        self.max_token = max_token
        self.use_trimmer = use_trimmer
        self.__init_model__(model_name, model_provider)
        self.__init_trimmer__()
        self.__init_app__()

    def __init_app__(self):
        workflow = StateGraph(state_schema=State)

        # Define the (single) node in the graph, then route START to it
        workflow.add_node("model", self.call_model)
        workflow.add_edge(START, "model")

        memory = MemorySaver()
        self.app = workflow.compile(checkpointer=memory)

    def __init_model__(self, model_name: str = "gpt-4o-mini", model_provider: str = "openai"):
        if model_provider.lower() == "openai":
            self.model = init_chat_model(model_name, model_provider=model_provider)
        else:
            raise ValueError(f"Unsupported model provider: {model_provider}")

    def __init_trimmer__(self):
        if self.use_trimmer:
            self.trimmer = trim_messages(
                    max_tokens=self.max_token,
                    strategy="last",
                    token_counter=self.model,
                    include_system=True,
                    allow_partial=False,
                    start_on="human"
                )

    async def call_model(self, state: State):
        if self.use_trimmer:
            messages = self.trimmer.invoke(state["messages"])
        else:
            messages = state["messages"]

        prompt = await prompt_template.ainvoke(
            {"messages": messages, "language": state["language"]}
        )

        response = await self.model.ainvoke(prompt)

        return {"messages": [response]}

    async def ainvoke(self, inputs: dict, config: dict):
        return await self.app.ainvoke(inputs, config)
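
To give a feel for what the optional trimmer does with `strategy="last"` and `start_on="human"`, here is a simplified, dependency-free sketch. It is not LangChain's implementation: messages are plain `(role, text)` tuples, the token counter just counts whitespace-separated words, and the names `count_tokens` and `trim_last` are hypothetical.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text), a stand-in for LangChain message objects


def count_tokens(message: Message) -> int:
    # Assumption: one token per whitespace-separated word (real tokenizers differ).
    return len(message[1].split())


def trim_last(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep the most recent messages that fit in max_tokens, starting on a human turn."""
    kept: List[Message] = []
    budget = max_tokens
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()
    # Drop leading non-human messages so the trimmed history starts on a human turn.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return kept


if __name__ == "__main__":
    history = [
        ("human", "a b c"),
        ("ai", "d e f"),
        ("human", "g h"),
        ("ai", "i j"),
        ("human", "k"),
    ]
    print(trim_last(history, max_tokens=6))
```

With a small budget, older turns fall off first, which is why a very low `max_token` value can make the bot forget details the user gave earlier.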

### Chatbot UI by gradio

In [8]:
LANGUAGE = "English"

# config - thread_id distinguishes conversations when the chatbot serves many users;
# each thread_id keeps its own conversation history
config = {"configurable": {"thread_id": "abc678"}}

chatbot = Veo3Prompter()
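
Because every call in this notebook uses the same hard-coded `thread_id`, all users of the UI share one conversation. A minimal sketch of generating a separate config per conversation is shown below; the helper name `new_thread_config` is hypothetical, and how it is wired into each Gradio session is left open here.

```python
import uuid


def new_thread_config() -> dict:
    """Build a config dict with a fresh, random thread_id for one conversation."""
    return {"configurable": {"thread_id": str(uuid.uuid4())}}


if __name__ == "__main__":
    first = new_thread_config()
    second = new_thread_config()
    # Two calls give two different thread ids, so their histories stay separate.
    print(first["configurable"]["thread_id"] != second["configurable"]["thread_id"])
```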

In [13]:
import gradio as gr
import asyncio

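# Handle user input: send the new message to the chatbot and return its reply.
# Gradio also passes `history`, which is unused here because the MemorySaver
# checkpointer already stores the conversation for this thread_id.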
async def chatbot_fn(user_text, history):
    input_messages = [HumanMessage(user_text)]
    output = await chatbot.ainvoke(
        {"messages": input_messages, "language": LANGUAGE},
        config,
    )
    return output["messages"][-1].content
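
The handler can be exercised without Gradio or an API key by swapping in a stand-in bot. The sketch below assumes only the call shape used above (`ainvoke(inputs, config)` returning a dict whose last message has a `.content` attribute); `EchoBot` and `reply_with` are hypothetical names, and plain strings replace `HumanMessage` to keep the example dependency-free.

```python
import asyncio
from types import SimpleNamespace


class EchoBot:
    """Hypothetical stand-in for Veo3Prompter that makes no API call."""

    async def ainvoke(self, inputs: dict, config: dict) -> dict:
        last = inputs["messages"][-1]
        reply = SimpleNamespace(content=f"[{inputs['language']}] You said: {last}")
        return {"messages": inputs["messages"] + [reply]}


async def reply_with(bot, user_text: str, language: str = "English") -> str:
    # Same shape as chatbot_fn: send one message, return the last message's content.
    output = await bot.ainvoke(
        {"messages": [user_text], "language": language},
        {"configurable": {"thread_id": "test"}},
    )
    return output["messages"][-1].content


if __name__ == "__main__":
    print(asyncio.run(reply_with(EchoBot(), "Hello!")))
```

Inside a notebook cell, where an event loop is already running, call `await reply_with(EchoBot(), "Hello!")` instead of `asyncio.run(...)`.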



In [14]:
# Start the interface (it will print a link you can click)
chat_ui = gr.ChatInterface(
    fn=chatbot_fn,
    title="Veo3 Prompter",
    description="Type a message and press Enter → the bot will reply.",
    examples=["Hello!", "What can you do?", "Tell me a joke."],
    theme="soft"
)

chat_ui.launch(share=True, debug=True)


Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://170b92ace6044b4843.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7861 <> https://170b92ace6044b4843.gradio.live


