# VideoChat Lite on Colab: Cloud-Based Video Labeling

This notebook walks through the core logic behind the VideoChat application's **lite mode**, adapted to run directly in Google Colab. Instead of running GPU-intensive models locally, this workflow sends all analysis to Google-hosted models through either Google AI Studio or Vertex AI.

We will walk through:
1.  **Environment Setup**: Cloning the application code and installing necessary packages.
2.  **Authentication & Configuration**: Securely providing API keys and defining the target video.
3.  **Asset Preparation**: Programmatically downloading and preparing a video, just like the web app does.
4.  **Cloud-Based Labeling**: Calling the function that sends the video and context to the selected Google Cloud model for analysis.
5.  **Result Processing**: Displaying the structured JSON labels returned by the model and saving them as a Croissant metadata file.

**Requirements**:
*   A Google Account to run this Colab notebook.
*   A Google AI Studio API Key or a configured Google Cloud Platform (GCP) project for Vertex AI.

## 1. Environment Setup

First, we'll clone the project repository from GitHub to get access to all the necessary Python helper scripts (`app.py`, `inference_logic.py`, etc.). Then, we'll install the required packages listed in `requirements-lite.txt`.

In [None]:
# Clone the repository (replace with your actual repository URL)
!git clone https://github.com/username/vChat.git
%cd vChat

# Install dependencies using pip
!pip install -r requirements-lite.txt

# Add the cloned repository directory to Python's path to allow imports
import sys
import os
if '/content/vChat' not in sys.path:
    sys.path.append('/content/vChat')

# Now we can import the necessary functions from the application
import asyncio
import pprint
from pathlib import Path
from app import prepare_video_assets_async, generate_and_save_croissant_metadata
from inference_logic import run_gemini_labeling_pipeline, run_vertex_labeling_pipeline
from factuality_logic import parse_vtt

# Patch asyncio so that asyncio.run() works inside the notebook's already-running event loop
import nest_asyncio
nest_asyncio.apply()

print("✅ Setup complete. You can now proceed to the next step.")
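
The setup cell hardcodes `/content/vChat` and appends it to `sys.path` without checking that the clone succeeded. A minimal sketch of a more defensive variant follows; the helper name `add_repo_to_path` is hypothetical and not part of the application code.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir: str) -> bool:
    """Append repo_dir to sys.path if it exists and is not already listed.

    Returns True when the directory is on sys.path afterwards, and False
    when the directory does not exist (for example, if the clone failed).
    """
    repo = Path(repo_dir)
    if not repo.is_dir():
        return False
    resolved = str(repo.resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return True
```

Calling `add_repo_to_path('/content/vChat')` before the `from app import ...` lines would surface a failed clone as a `False` return value rather than a later `ModuleNotFoundError`.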

## 2. Authentication & Configuration

Now, let's configure the model and provide the necessary credentials.

**➡️ Action Required**:
1. Choose which cloud model to use (`MODEL_SELECTION`).
2. Fill in the `VIDEO_URL`.
3. Provide credentials for your chosen model:
    *   **For Gemini**: Click the **Key icon (🔑)** in the left sidebar, add a new secret named `GEMINI_API_KEY`, and paste your key there. Get one from [Google AI Studio](https://aistudio.google.com/app/apikey).
    *   **For Vertex AI**: You will be prompted to log in to your Google account when you run the cell. Make sure to fill in your `VERTEX_PROJECT_ID`.

In [None]:
from google.colab import userdata, auth

# --- USER CONFIGURATION ---

# Choose which cloud model to use: 'gemini' or 'vertex'
MODEL_SELECTION = 'gemini'

# Provide the URL of the video you want to analyze
VIDEO_URL = "https://www.youtube.com/watch?v=Ad_TEk94B9w" # Example: A cat playing

# --- Credentials (handled by Colab) ---

# 1. For Google AI Studio ('gemini') - Fetches from Colab Secrets
GEMINI_API_KEY = None
GEMINI_MODEL_NAME = "models/gemini-1.5-pro-latest"
if MODEL_SELECTION == 'gemini':
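    try:
        GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')
        print("Successfully loaded Gemini API Key from Colab Secrets.")
    except userdata.SecretNotFoundError:
        print("ERROR: Gemini API Key not found. Please add it to Colab Secrets (🔑).")
    except userdata.NotebookAccessError:
        # The secret exists but this notebook has not been granted access to it
        print("ERROR: This notebook cannot read the secret. Enable notebook access in Colab Secrets (🔑).")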

# 2. For Google Cloud Vertex AI ('vertex')
VERTEX_PROJECT_ID = "your-gcp-project-id-here"  # <-- IMPORTANT: SET YOUR GCP PROJECT ID
VERTEX_LOCATION = "us-central1"
VERTEX_MODEL_NAME = "gemini-1.5-pro-preview-0409"
if MODEL_SELECTION == 'vertex':
    print("Authenticating for Vertex AI... Please follow the pop-up prompt.")
    auth.authenticate_user()
    print("✅ Vertex AI authentication successful.")

print("\nConfiguration loaded. Ready to run the analysis pipeline.")
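
The configuration cell reports "Configuration loaded" even when the key is missing or the project ID is still the placeholder. One possible pre-flight check is sketched below; `validate_config` and `PLACEHOLDER_PROJECT_ID` are hypothetical names introduced for illustration, not part of the application.

```python
from typing import List, Optional

# Placeholder string used in the configuration cell above.
PLACEHOLDER_PROJECT_ID = "your-gcp-project-id-here"


def validate_config(model_selection: str,
                    gemini_api_key: Optional[str],
                    vertex_project_id: str) -> List[str]:
    """Return a list of human-readable configuration problems (empty if none)."""
    problems = []
    if model_selection == "gemini":
        if not gemini_api_key:
            problems.append("GEMINI_API_KEY is missing from Colab Secrets.")
    elif model_selection == "vertex":
        if not vertex_project_id or vertex_project_id == PLACEHOLDER_PROJECT_ID:
            problems.append("VERTEX_PROJECT_ID still holds the placeholder value.")
    else:
        problems.append(
            f"Unknown MODEL_SELECTION {model_selection!r}; use 'gemini' or 'vertex'."
        )
    return problems
```

Running `validate_config(MODEL_SELECTION, GEMINI_API_KEY, VERTEX_PROJECT_ID)` at the end of the configuration cell and printing any returned messages would catch these problems before the video download starts.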

## 3. The Labeling Pipeline Function

The function below encapsulates the entire end-to-end process. It mirrors the exact steps the full web application takes when you click the "Generate & Append Labels" button.
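
This notebook consumes the labeling functions from `inference_logic` as async generators that yield progress strings followed by a result dictionary. A minimal sketch of that consumption pattern is shown here, using a hypothetical stand-in generator (`fake_pipeline`) that makes no API calls; the messages and the result payload are invented for illustration.

```python
import asyncio
from typing import AsyncIterator, Optional, Union


async def fake_pipeline() -> AsyncIterator[Union[str, dict]]:
    """Hypothetical stand-in for a labeling pipeline; makes no API calls."""
    yield "Uploading video...\n"
    yield "Waiting for model response...\n"
    yield {"video_context_summary": "placeholder summary"}


async def collect_result(generator) -> Optional[dict]:
    """Print progress strings and return the last dict yielded, if any."""
    result = None
    async for message in generator:
        if isinstance(message, dict):
            # A dict is treated as the final result
            result = message
        elif isinstance(message, str):
            # A string is treated as a progress update
            print(f"  -> {message.strip()}")
    return result
```

In a notebook cell, `labels = await collect_result(fake_pipeline())` would print the two progress lines and bind the dictionary to `labels`. Returning `None` when no dictionary arrives lets the caller distinguish a failed run from an empty result.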

In [None]:
async def run_full_labeling_pipeline():
    """Main async function to orchestrate the labeling process."""
    final_labels = None
    csv_row_data = None

    # --- Step 1: Prepare Video Assets ---
    print(f"\n--- [Step 1] Preparing Assets for: {VIDEO_URL} ---")
    paths = await prepare_video_assets_async(VIDEO_URL)
    video_path = paths.get("video")
    transcript_path = paths.get("transcript")
    metadata = paths.get("metadata", {})
    print(f"  -> Video processed and saved to: {video_path}")
    print(f"  -> Transcript found at: {transcript_path}")
    print(f"  -> Extracted Metadata: {metadata}")

    # --- Step 2: Prepare Context for the Model ---
    print("\n--- [Step 2] Preparing Textual Context ---")
    caption = metadata.get("caption", "No caption available.")
    transcript_text = "No transcript available." 
    if transcript_path and Path(transcript_path).exists():
        transcript_text = parse_vtt(transcript_path)
    print("  -> Caption and transcript are ready.")

    # --- Step 3: Run Cloud-Based Labeling ---
    print(f"\n--- [Step 3] Running Labeling with '{MODEL_SELECTION.capitalize()}' Model ---")
    
    pipeline_generator = None
    if MODEL_SELECTION == 'gemini':
        if not GEMINI_API_KEY: 
            print("ERROR: Cannot proceed without a Gemini API Key.")
            return
        gemini_config = {"api_key": GEMINI_API_KEY, "model_name": GEMINI_MODEL_NAME}
        pipeline_generator = run_gemini_labeling_pipeline(video_path, caption, transcript_text, gemini_config, include_comments=True)
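    elif MODEL_SELECTION == 'vertex':
        # Fail early if the placeholder project ID was never replaced
        if not VERTEX_PROJECT_ID or VERTEX_PROJECT_ID == "your-gcp-project-id-here":
            print("ERROR: Cannot proceed without a valid VERTEX_PROJECT_ID.")
            return
        vertex_config = {"project_id": VERTEX_PROJECT_ID, "location": VERTEX_LOCATION, "model_name": VERTEX_MODEL_NAME, "api_key": None}
        pipeline_generator = run_vertex_labeling_pipeline(video_path, caption, transcript_text, vertex_config, include_comments=True)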
    else:
        print(f"ERROR: Invalid model selection '{MODEL_SELECTION}'. Choose 'gemini' or 'vertex'.")
        return

    # The pipeline yields progress messages and finally the result dictionary
    async for message in pipeline_generator:
        if isinstance(message, dict): # This is the final result
            final_labels = message
        elif isinstance(message, str): # This is a progress update
            print(f"  -> {message.strip()}")

    if not final_labels:
        print("\nERROR: Failed to retrieve labels from the model.")
        return

    # --- Step 4: Display Parsed Results ---
    print("\n--- [Step 4] Successfully Parsed JSON Labels ---")
    pprint.pprint(final_labels)
    
    # --- Step 5: Generate Croissant Metadata ---
    print("\n--- [Step 5] Generating Croissant Metadata File ---")
    # Recreate the data structure expected by the metadata function
    def get_score(value):
        return value.get('score', '') if isinstance(value, dict) else value

    disinfo_analysis = final_labels.get("disinformation_analysis", {})
    sentiment_tactics = disinfo_analysis.get("sentiment_and_bias_tactics", {})
    
    csv_row_data = {
        "id": metadata.get("id", ""),
        "link": metadata.get("link", VIDEO_URL),
        "caption": caption,
        "videocontext": final_labels.get("video_context_summary", ""),
        "politicalbias": get_score(final_labels.get("political_bias", "")),
        "criticism": get_score(final_labels.get("criticism_level", "")),
        "videoaudiopairing": get_score(final_labels.get("video_audio_pairing", "")),
        "videocaptionpairing": get_score(final_labels.get("video_caption_pairing", "")),
        "audiocaptionpairing": get_score(final_labels.get("audio_caption_pairing", "")),
        "disinfo_level": disinfo_analysis.get("disinformation_level", ""),
        "disinfo_intent": disinfo_analysis.get("disinformation_intent", ""),
        "disinfo_threat_vector": disinfo_analysis.get("threat_vector", ""),
        "disinfo_emotional_charge": sentiment_tactics.get("emotional_charge", ""),
    }
    
    metadata_path = await generate_and_save_croissant_metadata(csv_row_data)
    print(f"  -> Metadata file saved to: {metadata_path}")
    print("\n--- Pipeline Finished ---")
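
Step 5 flattens the nested label dictionary into a single row. The sketch below shows how `get_score` and the chained `.get(..., {})` calls behave on a small payload; `sample_labels` is invented for illustration and is not real model output.

```python
def get_score(value):
    """Return value['score'] for dict-shaped labels, else the value itself."""
    return value.get("score", "") if isinstance(value, dict) else value


# Hypothetical label payload, invented purely for illustration.
sample_labels = {
    "political_bias": {"score": 3, "reasoning": "placeholder"},
    "criticism_level": 2,
    "disinformation_analysis": {
        "disinformation_level": "low",
        "sentiment_and_bias_tactics": {"emotional_charge": 1},
    },
}

# Defaulting to {} keeps the chained lookups safe when a section is missing.
disinfo = sample_labels.get("disinformation_analysis", {})
tactics = disinfo.get("sentiment_and_bias_tactics", {})

row = {
    "politicalbias": get_score(sample_labels.get("political_bias", "")),
    "criticism": get_score(sample_labels.get("criticism_level", "")),
    "videoaudiopairing": get_score(sample_labels.get("video_audio_pairing", "")),
    "disinfo_level": disinfo.get("disinformation_level", ""),
    "disinfo_emotional_charge": tactics.get("emotional_charge", ""),
}
print(row)
```

Dict-shaped labels are reduced to their `score`, plain values pass through unchanged, and a key that is absent from the payload (`video_audio_pairing` here) becomes an empty string instead of raising a `KeyError`.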

## 4. Run the Pipeline

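Now, we execute the main function. This will trigger the video download, API calls, and processing. You will see real-time progress updates printed below. Calling `asyncio.run` here works inside the notebook's already-running event loop only because `nest_asyncio.apply()` was called during setup.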

In [None]:
asyncio.run(run_full_labeling_pipeline())
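
Outside a notebook, or when `nest_asyncio` has not been applied, the way a coroutine is started depends on whether an event loop is already running. A minimal sketch of that distinction follows; `run_coroutine` is a hypothetical helper, not part of the application.

```python
import asyncio


def run_coroutine(coro):
    """Run coro to completion from synchronous code.

    With no event loop running (a plain script), this uses asyncio.run.
    If a loop is already running (typical in Jupyter/Colab), it raises a
    RuntimeError that points the caller to `await` instead.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running: safe to start one.
        return asyncio.run(coro)
    # A loop is running; close the coroutine so it is not left un-awaited.
    coro.close()
    raise RuntimeError(
        "An event loop is already running; use `await` in the cell instead."
    )
```

In a Colab cell, `await run_full_labeling_pipeline()` is an alternative to `asyncio.run(...)`, since IPython supports top-level `await`.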