## File Set-Up
- Sets up Python module search path by adding the repository root to `sys.path` for reliable imports
- Imports shared directory paths from `src.paths` module (PROJECT_ROOT, RAW_DIR, PROCESSED_DIR, REFERENCE_DIR, FIGURES_DIR)
- Defines input file paths: raw YouTube comments JSON, comments with transcripts JSON, and analyzed transcripts CSV
- Defines output file paths: comment analysis CSV and state file for resumable processing
- Prints configuration summary showing project root and key input/output paths

## Configuration / Setup
- **nb_dir**: Current working directory (auto-detected)
- **repo_root**: Repository root directory (the parent of the working directory when that directory is named `notebooks`, otherwise the working directory itself)
- **COMMENTS_JSON**: `RAW_DIR / "youtube_comments.json"` → `data/raw/youtube_comments.json`
- **COMMENTS_WITH_TRANSCRIPTS_JSON**: `PROCESSED_DIR / "comments_with_transcripts.json"` → `data/processed/comments_with_transcripts.json`
- **TRANSCRIPTS_CSV**: `PROCESSED_DIR / "GPT_5_Mini_Transcripts_SummaryV2.csv"` → `data/processed/GPT_5_Mini_Transcripts_SummaryV2.csv`
- **COMMENT_ANALYSIS_CSV**: `PROCESSED_DIR / "GPT_5_Mini_Comment_Analysis.csv"` → `data/processed/GPT_5_Mini_Comment_Analysis.csv`
- **STATE_FILE**: `PROCESSED_DIR / "comments_analysis_state.json"` → `data/processed/comments_analysis_state.json`

## Inputs
- **src.paths module**: Must be importable from the repository root (that is, live at `src/paths.py`) and export PROJECT_ROOT, RAW_DIR, PROCESSED_DIR, REFERENCE_DIR, FIGURES_DIR
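- **youtube_comments.json** (optional): Raw YouTube comments data in `data/raw/`; used as the fallback when the enriched file is absent
- **comments_with_transcripts.json** (optional): Enriched comments with transcript context in `data/processed/`; preferred when present. At least one of the two comment files must exist for the analysis cell to run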
- **GPT_5_Mini_Transcripts_SummaryV2.csv**: Analyzed video transcripts with summaries in `data/processed/`
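
The two comment files are alternatives: the enriched file is preferred and the raw file is the fallback. A minimal sketch of that choice follows, assuming a hypothetical helper named `pick_comments_input` that is not part of the notebook (the notebook makes the same decision inline with an `exists()` check):

```python
from pathlib import Path
from typing import Sequence


def pick_comments_input(candidates: Sequence[Path]) -> Path:
    """Return the first existing path, in preference order.

    Hypothetical helper for illustration only.
    """
    for path in candidates:
        if path.exists():
            return path
    raise FileNotFoundError(
        "None of the comment inputs exist: " + ", ".join(str(p) for p in candidates)
    )
```

It could be called as `pick_comments_input([COMMENTS_WITH_TRANSCRIPTS_JSON, COMMENTS_JSON])`. Raising early when neither file exists gives a clearer message than a failure deep inside the JSON loader.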

## Outputs
- **Console output**: Prints project root path and key file paths for verification
- No files are created or modified by this cell (configuration only)

## Notes / Assumptions
- Assumes the kernel's working directory is either `notebooks/` or the repository root, with the `src/` package at the repository root; launched from any other directory, the `src.paths` import fails
- Expects `src/paths.py` to define and export all directory path constants
- The cell is purely declarative: no data processing occurs; it only sets up variables for subsequent cells
- Subsequent cells (such as the comment analysis cell) depend on these path variables being defined
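
For readers without access to `src/paths.py`, the following is one plausible shape for it, not the project's actual file. The `data/raw` and `data/processed` locations follow the resolved paths listed under Configuration; the `data/reference` and `figures` locations, the example root, and the `build_paths` helper are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict


def build_paths(project_root: Path) -> Dict[str, Path]:
    """Derive the shared directory constants from a project root (hypothetical layout)."""
    data_dir = project_root / "data"
    return {
        "PROJECT_ROOT": project_root,
        "RAW_DIR": data_dir / "raw",
        "PROCESSED_DIR": data_dir / "processed",
        "REFERENCE_DIR": data_dir / "reference",  # assumed location
        "FIGURES_DIR": project_root / "figures",  # assumed location
    }


# Example root chosen for illustration only
paths = build_paths(Path("/tmp/example_repo"))
print(paths["PROCESSED_DIR"] / "GPT_5_Mini_Comment_Analysis.csv")
```

In a real `src/paths.py`, the root would typically be derived from `Path(__file__).resolve().parents[1]` and the values exposed as module-level constants rather than a dict.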

In [None]:
# === Setup: Ensure repo root is on sys.path for imports ===
import sys
from pathlib import Path

nb_dir = Path.cwd().resolve()
repo_root = nb_dir.parent if nb_dir.name == "notebooks" else nb_dir
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))

# === Import shared paths ===
from src.paths import PROJECT_ROOT, RAW_DIR, PROCESSED_DIR, REFERENCE_DIR, FIGURES_DIR

# === Notebook-specific file paths ===
# Input files
COMMENTS_JSON = RAW_DIR / "youtube_comments.json"
COMMENTS_WITH_TRANSCRIPTS_JSON = PROCESSED_DIR / "comments_with_transcripts.json"
TRANSCRIPTS_CSV = PROCESSED_DIR / "GPT_5_Mini_Transcripts_SummaryV2.csv"

# Output files
COMMENT_ANALYSIS_CSV = PROCESSED_DIR / "GPT_5_Mini_Comment_Analysis.csv"
STATE_FILE = PROCESSED_DIR / "comments_analysis_state.json"

print(f"Project root: {PROJECT_ROOT}")
print(f"Comments input: {COMMENTS_JSON}")
print(f"Comment analysis output: {COMMENT_ANALYSIS_CSV}")

## 01_Comment Analysis with context from Analyzed Transcripts

- Processes YouTube comment threads in parallel using OpenAI GPT to extract structured electrical engineering insights
- Filters parent comments to only analyze those with ≥4 tokens (using tiktoken or whitespace fallback)
- For each eligible parent comment, sends video context (title + transcript summary) and comment thread to GPT with a strict JSON schema prompt
- GPT extracts question/answer excerpts and summaries, overall reply summary, and electrical engineering topic/subtopic breakdowns (percentages summing to 100)
- Writes analysis results to CSV with 10 columns: VideoID, CommentID, QuestionExcerpt, QuestionSummary, AnswerExcerpt, AnswerSummary, ReplySummary, ReplyCount, Topic, Sub-Topic
- Implements graceful CTRL+C handling: workers finish their current comment, the writer saves state after each written row, and a rerun skips already-analyzed (VideoID, CommentID) pairs
- Uses a dedicated writer thread to serialize CSV writes and state persistence, while video workers run in parallel (configurable via `VIDEO_WORKERS`)
- Includes retry logic with exponential backoff for OpenAI API calls (up to 5 attempts per comment) and random jitter between calls to avoid rate limits
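
The single-writer pattern described in the list above can be illustrated in isolation. This is a simplified sketch, not the notebook's code: the fake worker, the in-memory `sink` list standing in for the CSV file, and the row contents are all assumptions. Only the queue, sentinel, and lock structure mirror the description.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Set, Tuple

Row = Tuple[str, str]  # (pair_key, payload); stands in for a CSV row


def writer_loop(q: "queue.Queue", sink: List[Row],
                seen: Set[str], lock: threading.Lock) -> None:
    """Drain batches from the queue; only this thread touches the sink."""
    while True:
        batch = q.get()
        try:
            if batch is None:  # sentinel: no more batches will arrive
                return
            for pair_key, payload in batch:
                with lock:
                    if pair_key in seen:  # skip duplicates at write time
                        continue
                    sink.append((pair_key, payload))
                    seen.add(pair_key)
        finally:
            q.task_done()


def fake_video_worker(video_id: str, q: "queue.Queue") -> None:
    """Pretend to analyze three comments, then queue one batch per video."""
    q.put([(f"{video_id}|c{i}", f"analysis of c{i}") for i in range(3)])


def run_demo(video_ids: List[str], workers: int = 4) -> List[Row]:
    q: "queue.Queue" = queue.Queue()
    sink: List[Row] = []
    seen: Set[str] = set()
    lock = threading.Lock()
    writer = threading.Thread(target=writer_loop, args=(q, sink, seen, lock))
    writer.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for vid in video_ids:
            pool.submit(fake_video_worker, vid, q)
    # Leaving the with-block waits for all workers, so every batch is queued
    q.put(None)
    writer.join()
    return sink


rows = run_demo(["vidA", "vidB", "vidA"])
print(len(rows))
```

Because "vidA" is submitted twice, the second batch is dropped by the duplicate check, which is the same guard that makes reruns safe. Funnelling all writes through one thread means the output never needs its own file lock.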

## Configuration / Setup

- **OPENAI_API_KEY**: Must be set in environment (`.env` or exported); required for GPT API calls
- **MODEL_NAME**: Defaults to `"gpt-5-mini"` (override with `OPENAI_MODEL` env var)
- **VIDEO_WORKERS**: Parallel video processing threads (default 4; set via `VIDEO_WORKERS` env var)
- **MIN_PARENT_TOKENS**: Minimum token count for parent comments (default 4)
- **USE_TIKTOKEN**: If `True`, uses tiktoken `cl100k_base` encoding; if `False` or tiktoken unavailable, falls back to whitespace split
- **MAX_RETRIES**: Maximum number of API attempts per comment (hard-coded to 5; not read from the environment)
- **MIN_DELAY_S / MAX_DELAY_S**: Random jitter range between API calls (0.2–0.6 seconds)
- **OUTPUT_CSV**: Resolves to `COMMENT_ANALYSIS_CSV` from cell 1 → `data/processed/GPT_5_Mini_Comment_Analysis.csv`
- **STATE_FILE**: Resolves to `data/processed/comments_analysis_state.json` (for resumable processing)
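
To make the retry settings concrete, the sketch below computes the deterministic part of the wait schedule. It assumes the base delay following failed attempt number `attempt` (counting from 0) is `min(60, 2 ** attempt)` seconds, with up to 0.5 s of random jitter added on top, which matches the formula in the code cell. The helper name `backoff_schedule` is illustrative.

```python
from typing import List


def backoff_schedule(max_attempts: int, cap_s: float = 60.0) -> List[float]:
    """Base delay (seconds, before jitter) following each failed attempt."""
    return [min(cap_s, float(2 ** attempt)) for attempt in range(max_attempts)]


schedule = backoff_schedule(5)
print(schedule)       # [1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(schedule))  # 31.0
```

With five attempts the 60 s cap never applies (it would first matter on a seventh attempt, where `2 ** 6 = 64`), so a comment that fails every attempt spends at most about half a minute in backoff, excluding API latency.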

## Inputs

- **COMMENTS_WITH_TRANSCRIPTS_JSON** (`data/processed/comments_with_transcripts.json`): Preferred input; enriched comments with transcript context
    - Falls back to **COMMENTS_JSON** (`data/raw/youtube_comments.json`) if not found
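- **TRANSCRIPTS_CSV** (`data/processed/GPT_5_Mini_Transcripts_SummaryV2.csv`): Video transcript summaries; must contain a video ID column (`VideoID`, `video_id`, `Id`, or `id`). A column with "summary" in its name is expected; if none is found, a warning is printed and every video gets an empty summary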
- **OpenAI API**: GPT model accessed via `openai` Python client using `OPENAI_API_KEY`
- **Existing output CSV** (if present): Read to identify already-processed (VideoID, CommentID) pairs for resumption
- **State file** (`comments_analysis_state.json`): Read at startup to resume from previous runs
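
The resume set is the union of the pairs found in the existing output CSV and those recorded in the state file. A minimal sketch of that merge, using the standard-library `csv` module on in-memory strings rather than pandas on real files (a simplification for illustration; `processed_pairs_from` is a hypothetical name):

```python
import csv
import io
import json
from typing import Set


def processed_pairs_from(csv_text: str, state_text: str) -> Set[str]:
    """Union of pair keys found in the output CSV and in the state JSON."""
    pairs: Set[str] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        vid, cid = row.get("VideoID"), row.get("CommentID")
        if vid and cid:  # ignore rows missing either identifier
            pairs.add(f"{vid}|{cid}")
    pairs.update(json.loads(state_text).get("processed_pairs", []))
    return pairs


csv_text = "VideoID,CommentID,Topic\nv1,c1,x\nv1,c2,y\n"
state_text = json.dumps({"processed_pairs": ["v1|c2", "v2|c9"]})
print(sorted(processed_pairs_from(csv_text, state_text)))
# ['v1|c1', 'v1|c2', 'v2|c9']
```

Reading both sources is deliberately redundant: if the state file is lost or stale, the CSV alone is enough to avoid re-analyzing rows that were already written.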

## Outputs

- **GPT_5_Mini_Comment_Analysis.csv** (`data/processed/GPT_5_Mini_Comment_Analysis.csv`): Main output with 10 columns per comment:
    - `VideoID`, `CommentID`, `QuestionExcerpt`, `QuestionSummary`, `AnswerExcerpt`, `AnswerSummary`, `ReplySummary`, `ReplyCount`, `Topic` (formatted as "Topic1:40%; Topic2:35%; ..."), `Sub-Topic` (same format)
- **comments_analysis_state.json** (`data/processed/comments_analysis_state.json`): Persistent state file with list of processed (VideoID|CommentID) pairs; updated after each row write
- **Console output**: Prints per-video completion messages (video ID, count of processed comments, video title) and final summary (videos submitted, workers used)
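
Downstream analysis usually needs the `Topic` and `Sub-Topic` strings turned back into numbers. The following is a small sketch of a parser for the "Label:NN%; Label:NN%" format, assuming labels never contain the `"; "` separator; the `parse_breakdown` name is hypothetical and not part of the notebook.

```python
from typing import List, Tuple


def parse_breakdown(cell: str) -> List[Tuple[str, int]]:
    """Split 'Label:40%; Other:60%' into [('Label', 40), ('Other', 60)]."""
    pairs: List[Tuple[str, int]] = []
    for part in cell.split("; "):
        if not part.strip():
            continue  # tolerate empty cells
        # rpartition splits on the last colon, so colons inside a label survive
        label, _, pct = part.rpartition(":")
        pairs.append((label.strip(), int(pct.strip().rstrip("%"))))
    return pairs


print(parse_breakdown("Branch circuits:40%; Grounding and bonding:35%; Voltage drop:25%"))
# [('Branch circuits', 40), ('Grounding and bonding', 35), ('Voltage drop', 25)]
```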

## Notes / Assumptions

- Requires `src.paths` module to define `PROCESSED_DIR`, `RAW_DIR` (see cell 1)
- Expects input JSON structure: list of video objects, each with `VideoID`, `Title`, and `Comments` (list of comment dicts with `CommentID`, `Text`, `Author`, `PublishedAt`, optional `ParentID` and `Replies`)
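- Transcripts CSV must have a video ID column named exactly `VideoID`, `video_id`, `Id`, or `id` (tried in that order; the match is case-sensitive). The summary column is found by a case-insensitive search for "summary" in the column names; if it is missing, processing continues with empty summaries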
- GPT is instructed to return valid JSON with exact schema; failure to parse after retries skips the comment
- Topic/subtopic breakdowns are normalized to sum to exactly 100% (proportional rounding with drift correction)
- Only parent comments (no `ParentID`) are analyzed; replies are included as context in the prompt
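- CTRL+C triggers graceful shutdown: each worker finishes its in-flight comment and queues the rows it has, videos not yet started return immediately, and the writer saves state as it writes; re-running resumes from the last written row. Rows are buffered per video, so a hard kill (as opposed to CTRL+C) loses the unwritten rows of any video still in progress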
- The cell defines `main()` and runs it via `if __name__ == "__main__":` (suitable for both notebook and script execution)
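
A worked example helps show what "proportional rounding with drift correction" does. The sketch below follows the same steps on bare numbers; the notebook's function operates on lists of dicts, so `normalize_percents` is a simplified stand-in rather than the actual implementation.

```python
from typing import List


def normalize_percents(values: List[float]) -> List[int]:
    """Scale values to integers summing to 100, following the described steps."""
    ints = [int(round(v)) for v in values]
    total = sum(ints)
    if not ints or total == 100:
        return ints
    if total <= 0:
        return [100] + [0] * (len(ints) - 1)  # fallback: first item takes all
    scaled = [int(round(v * 100.0 / total)) for v in ints]
    drift = 100 - sum(scaled)
    # Assign any rounding drift to the largest entry
    largest = max(range(len(scaled)), key=lambda i: scaled[i])
    scaled[largest] += drift
    return scaled


print(normalize_percents([50, 30, 30]))  # [46, 27, 27]
print(normalize_percents([0, 0]))        # [100, 0]
```

For the input 50/30/30 the total is 110, so each value is scaled by 100/110 and rounded to 45/27/27. That sums to 99, and the missing point is added to the largest entry, giving 46/27/27.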

In [None]:
import os
import re
import csv
import json
import time
import random
import signal
import threading
import queue
from pathlib import Path
from typing import Dict, Any, List

import pandas as pd
from dotenv import load_dotenv
from openai import OpenAI
from concurrent.futures import ThreadPoolExecutor, as_completed

# =========================
# ======== CONFIG =========
# =========================
# Using portable paths from configuration cell
OUTPUT_CSV = COMMENT_ANALYSIS_CSV

# Model and API
load_dotenv()
MODEL_NAME = os.getenv("OPENAI_MODEL", "gpt-5-mini")       # set OPENAI_MODEL if needed
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")               # must be set

# Concurrency
VIDEO_WORKERS = int(os.getenv("VIDEO_WORKERS", "4"))       # how many videos to process in parallel

# API behavior
MAX_RETRIES = 5
MIN_DELAY_S = 0.2          # small jitter between calls (optional)
MAX_DELAY_S = 0.6

# === Token length filter ===
# "skip any comments that are 3 tokens or shorter" => require at least 4 tokens
MIN_PARENT_TOKENS = 4
USE_TIKTOKEN = True        # set to False to force whitespace tokenization

# Output CSV headers (exact order required)
CSV_HEADERS = [
    "VideoID",
    "CommentID",
    "QuestionExcerpt",
    "QuestionSummary",
    "AnswerExcerpt",
    "AnswerSummary",
    "ReplySummary",
    "ReplyCount",
    "Topic",
    "Sub-Topic",
]

# =========================
# ====== SIGNAL/STATE =====
# =========================
stop_requested = False

def _sigint_handler(sig, frame):
    global stop_requested
    stop_requested = True
    print("\nCTRL+C detected — finishing current item(s) and saving state...")

signal.signal(signal.SIGINT, _sigint_handler)

# =========================
# ====== UTILITIES  =======
# =========================
def ensure_api():
    if not OPENAI_API_KEY:
        raise RuntimeError("OPENAI_API_KEY is not set. Please export it before running the script.")

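def ensure_output_csv(path: Path):
    """Create the output CSV with header if it doesn't exist."""
    if not path.exists():
        # Make sure the target directory exists (e.g., on a fresh checkout)
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(CSV_HEADERS)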

def load_existing_pairs_from_output(path: Path) -> set:
    """Read (VideoID|CommentID) pairs already written to the output CSV."""
    processed = set()
    if path.exists():
        try:
            df = pd.read_csv(path, dtype=str)
            if {"VideoID", "CommentID"}.issubset(df.columns):
                for _, row in df[["VideoID", "CommentID"]].dropna().iterrows():
                    processed.add(f"{row['VideoID']}|{row['CommentID']}")
        except Exception as e:
            print(f" Could not read existing output CSV to resume: {e}")
    return processed

def save_state(state: Dict[str, Any], state_path: Path):
    try:
        with open(state_path, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2, ensure_ascii=False)
    except Exception as e:
        print(f" Warning: failed to save state file: {e}")

def load_state(state_path: Path) -> Dict[str, Any]:
    if state_path.exists():
        try:
            with open(state_path, "r", encoding="utf-8") as f:
                return json.load(f)
        except Exception as e:
            print(f" Warning: failed to read state file: {e}")
    return {"processed_pairs": []}

def find_summary_column(df: pd.DataFrame) -> str:
    """Try to find a column whose name contains 'summary' (case-insensitive)."""
    for col in df.columns:
        if re.search(r"summary", str(col), flags=re.I):
            return col
    return ""

def load_transcript_summaries(csv_path: Path) -> Dict[str, str]:
    """Map VideoID -> summary string."""
    df = pd.read_csv(csv_path, dtype=str, keep_default_na=False)
    vid_col = None
    for candidate in ["VideoID", "video_id", "Id", "id"]:
        if candidate in df.columns:
            vid_col = candidate
            break
    if not vid_col:
        raise ValueError("No VideoID column found in transcripts CSV.")

    summary_col = find_summary_column(df)
    if not summary_col:
        print(" No column containing 'summary' found. Proceeding with empty summaries.")
    summaries = {}
    for _, row in df.iterrows():
        vid = str(row.get(vid_col, "")).strip()
        if not vid:
            continue
        summaries[vid] = str(row.get(summary_col, "")).strip() if summary_col else ""
    return summaries

def load_comments_json(json_path: Path) -> List[Dict[str, Any]]:
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)

def normalize_breakdown(items: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """
    Ensure percentages sum to 100. If not, normalize proportionally and round to integers, fixing rounding drift.
    key: 'percent' in items
    """
    if not items:
        return items
    # Convert to ints
    arr = []
    for it in items:
        try:
            pct = int(round(float(it.get(key, 0))))
        except Exception:
            pct = 0
        arr.append({**it, key: pct})
    total = sum(x[key] for x in arr)
    if total == 100:
        return arr
    if total <= 0:
        # make the first 100, rest 0 as a fallback
        if arr:
            arr[0][key] = 100
            for i in range(1, len(arr)):
                arr[i][key] = 0
        return arr
    # proportional renormalization
    factor = 100.0 / total
    arr = [{**x, key: int(round(x[key] * factor))} for x in arr]
    # fix rounding drift
    drift = 100 - sum(x[key] for x in arr)
    if drift != 0:
        idx = max(range(len(arr)), key=lambda i: arr[i][key])
        arr[idx][key] += drift
    return arr

def format_breakdown_for_csv(items: List[Dict[str, Any]], name_key: str, pct_key: str) -> str:
    """
    Convert breakdown list to a compact CSV-friendly string:
    e.g., "Rough-in wiring:40%; Branch circuits:35%; Grounding and bonding:25%"
    """
    parts = []
    for it in items:
        # Coerce to str so a null or non-string label from the model cannot raise
        label = str(it.get(name_key, "") or "").strip()
        pct = it.get(pct_key, 0)
        parts.append(f"{label}:{pct}%")
    return "; ".join(parts)

# === Token length helper ===
def token_len(text: str) -> int:
    """
    Returns approximate token count.
    - If tiktoken is available and USE_TIKTOKEN=True, uses cl100k_base.
    - Otherwise falls back to a simple whitespace split.
    """
    if not text:
        return 0
    if USE_TIKTOKEN:
        try:
            import tiktoken
            enc = tiktoken.get_encoding("cl100k_base")
            return len(enc.encode(text))
        except Exception:
            pass
    # Fallback heuristic based on non-whitespace sequences
    return len(re.findall(r"\S+", text))

# =========================
# ===== OPENAI CALLS  =====
# =========================
SYSTEM_PROMPT = (
    "You are an expert electrical engineering analyst extracting structured data from "
    "YouTube comment threads. Use ONLY electrical engineering terminology for topics and subtopics.\n\n"
    "Return exactly ONE valid JSON object with this schema:\n"
    "{\n"
    '  "QuestionExcerpt": string,\n'
    '  "QuestionSummary": string,\n'
    '  "ReplySummary": string,\n'
    '  "AnswerExcerpt": string,\n'
    '  "AnswerSummary": string,\n'
    '  "TopicBreakdown": [{"topic": string, "percent": integer}, ...],\n'
    '  "SubtopicBreakdown": [{"subtopic": string, "percent": integer}, ...]\n'
    "}\n\n"
    "Rules:\n"
    "- Topics and subtopics MUST be electrical engineering terms (e.g., branch circuits, grounding/bonding, NEC 210.52, AFCI/GFCI protection, conductor ampacity, voltage drop, service equipment, panelboards, luminaires, switching, conduit fill, wire gauge, etc.).\n"
    "- Include 2–4 items in TopicBreakdown and 2–4 in SubtopicBreakdown; integer percentages that SUM to 100 in each list.\n"
    '- If the parent comment does not ask a question, set "QuestionExcerpt" and "QuestionSummary" to "No Question Present".\n'
    '- If replies contain a clear solution, include a ≤25-word verbatim "AnswerExcerpt" from the best reply and a 1–2 sentence "AnswerSummary". Otherwise set both to "No Solution Present".\n'
    "- ReplySummary is 1–2 sentences summarizing the overall replies (agreement, debate, solution, etc.). Use only the provided replies.\n"
    "- Prefer precise EE phrasing and NEC references when applicable.\n"
    "- Do NOT use markdown or code fences; return a plain JSON object."
)

def build_user_payload(video_summary: str,
                       video_title: str,
                       parent_comment: Dict[str, Any],
                       replies: List[Dict[str, Any]]) -> str:
    """
    Send a compact JSON payload as the user message.
    """
    payload = {
        "video_context": {
            "title": video_title or "",
            "summary": video_summary or ""
        },
        "parent_comment": {
            "id": parent_comment.get("CommentID", ""),
            "author": parent_comment.get("Author", ""),
            "text": parent_comment.get("Text", ""),
            "published_at": parent_comment.get("PublishedAt", "")
        },
        "replies": [
            {
                "id": r.get("CommentID", ""),
                "author": r.get("Author", ""),
                "text": r.get("Text", ""),
                "published_at": r.get("PublishedAt", "")
            }
            for r in (replies or [])
        ]
    }
    return json.dumps(payload, ensure_ascii=False)

def call_gpt(client: OpenAI, user_payload: str) -> Dict[str, Any]:
    """
    Call GPT with robust retry/backoff and force JSON response.
    """
    for attempt in range(MAX_RETRIES):
        try:
            resp = client.chat.completions.create(
                model=MODEL_NAME,
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": user_payload},
                ],
                response_format={"type": "json_object"},
            )
            content = resp.choices[0].message.content
            data = json.loads(content)
            return data
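        except Exception as e:
            if attempt == MAX_RETRIES - 1:
                # Out of attempts: give up now instead of sleeping one more time
                raise RuntimeError(
                    f"Failed to get a valid response from the model after {MAX_RETRIES} attempts."
                ) from e
            wait = min(60, 2 ** attempt + random.random() * 0.5)
            print(f" API error: {e}; retrying in {wait:.1f}s (attempt {attempt+1}/{MAX_RETRIES})")
            time.sleep(wait)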

    raise RuntimeError("Failed to get a valid response from the model after retries.")

# =========================
# === WRITER THREAD LOOP ==
# =========================
def writer_loop(row_queue: "queue.Queue",
                output_csv: Path,
                state_file: Path,
                processed_pairs: set,
                processed_pairs_lock: threading.Lock):
    """
    Dedicated writer thread that:
      - writes rows to CSV,
      - updates processed_pairs,
      - persists state after EACH row,
      - prints per-video completion after batch.
    Queue items:
      - dict(type="batch", rows=[row,...], pairs=[pair_key,...], video_id=..., video_title=...)
      - None => sentinel to exit
    """
    ensure_output_csv(output_csv)

    with open(output_csv, "a", newline="", encoding="utf-8") as f_out:
        writer = csv.writer(f_out)

        while True:
            item = row_queue.get()
            if item is None:  # sentinel
                row_queue.task_done()
                break

            try:
                if item.get("type") == "batch":
                    rows: List[List[Any]] = item.get("rows", [])
                    pairs: List[str] = item.get("pairs", [])
                    video_id = item.get("video_id", "")
                    video_title = item.get("video_title", "")

                    written = 0
                    for pair_key, row in zip(pairs, rows):
                        # double-check for duplicates at write time
                        with processed_pairs_lock:
                            if pair_key in processed_pairs:
                                continue
                            writer.writerow(row)
                            f_out.flush()
                            processed_pairs.add(pair_key)
                            # persist state after each row
                            state = {"processed_pairs": list(processed_pairs)}
                            save_state(state, state_file)
                            written += 1
                    # Per-video completion message after batch actually written
                    print(f" Finished video {video_id} — processed {written} parent comments. ({video_title})")
            finally:
                row_queue.task_done()

# =========================
# ====== VIDEO WORKER =====
# =========================
def process_one_video(video: Dict[str, Any],
                      transcript_summaries: Dict[str, str],
                      processed_pairs: set,
                      processed_pairs_lock: threading.Lock,
                      row_queue: "queue.Queue"):
    """
    Process a single video sequentially (parent comments), then push one batch to the writer queue.
    """
    if stop_requested:
        return

    client = OpenAI()

    video_id = str(video.get("VideoID", "")).strip()
    video_title = str(video.get("Title", "") or "").strip()
    summary = transcript_summaries.get(video_id, "")

    comments = video.get("Comments", []) or []
    # parent comments only and >= MIN_PARENT_TOKENS tokens
    parent_comments = [
        c for c in comments
        if not c.get("ParentID") and token_len(str(c.get("Text", ""))) >= MIN_PARENT_TOKENS
    ]

    rows_to_write: List[List[Any]] = []
    pairs_to_write: List[str] = []

    for parent in parent_comments:
        if stop_requested:
            break

        comment_id = str(parent.get("CommentID", "")).strip()
        pair_key = f"{video_id}|{comment_id}"

        # Skip if already processed (thread-safe read)
        with processed_pairs_lock:
            if pair_key in processed_pairs:
                continue

        replies = parent.get("Replies", []) or []
        reply_count = len(replies)

        # Build payload and call GPT
        payload = build_user_payload(summary, video_title, parent, replies)
        try:
            result = call_gpt(client, payload)
        except Exception as e:
            print(f" Skipping comment {comment_id} on video {video_id} due to repeated API errors: {e}")
            continue

        # Normalize breakdowns
        topics = normalize_breakdown(result.get("TopicBreakdown", []), "percent")
        subs = normalize_breakdown(result.get("SubtopicBreakdown", []), "percent")

        # Prepare CSV row
        question_excerpt = str(result.get("QuestionExcerpt", "")).strip()
        question_summary = str(result.get("QuestionSummary", "")).strip()
        reply_summary = str(result.get("ReplySummary", "")).strip()

        answer_excerpt = str(result.get("AnswerExcerpt", "")).strip() or "No Solution Present"
        answer_summary = str(result.get("AnswerSummary", "")).strip() or "No Solution Present"

        topic_str = format_breakdown_for_csv(topics, "topic", "percent")
        subtopic_str = format_breakdown_for_csv(subs, "subtopic", "percent")

        row = [
            video_id,
            comment_id,
            question_excerpt,
            question_summary,
            answer_excerpt,
            answer_summary,
            reply_summary,
            reply_count,
            topic_str,
            subtopic_str,
        ]
        rows_to_write.append(row)
        pairs_to_write.append(pair_key)

        # tiny jitter helps avoid rate limits
        time.sleep(random.uniform(MIN_DELAY_S, MAX_DELAY_S))

    # Send one batch to writer (writer will print the per-video completion after writing)
    if rows_to_write:
        row_queue.put({
            "type": "batch",
            "rows": rows_to_write,
            "pairs": pairs_to_write,
            "video_id": video_id,
            "video_title": video_title
        })
    else:
        # Still print a completion message for transparency (0 processed)
        row_queue.put({
            "type": "batch",
            "rows": [],
            "pairs": [],
            "video_id": video_id,
            "video_title": video_title
        })

# =========================
# ====== MAIN LOGIC  ======
# =========================
def main():
    ensure_api()

    # Prep output and state
    ensure_output_csv(OUTPUT_CSV)
    processed_pairs = load_existing_pairs_from_output(OUTPUT_CSV)
    state = load_state(STATE_FILE)
    # Merge state pairs (if any) into processed set
    for p in state.get("processed_pairs", []):
        processed_pairs.add(p)

    # Lock to protect processed_pairs across threads
    processed_pairs_lock = threading.Lock()

    # Load inputs (use comments with transcripts if available, else raw comments)
    if COMMENTS_WITH_TRANSCRIPTS_JSON.exists():
        comments_by_video = load_comments_json(COMMENTS_WITH_TRANSCRIPTS_JSON)
    else:
        comments_by_video = load_comments_json(COMMENTS_JSON)
    transcript_summaries = load_transcript_summaries(TRANSCRIPTS_CSV)

    total_videos = len(comments_by_video)

    # Start the writer thread
    row_queue: "queue.Queue" = queue.Queue()
    writer_thread = threading.Thread(
        target=writer_loop,
        args=(row_queue, OUTPUT_CSV, STATE_FILE, processed_pairs, processed_pairs_lock),
        daemon=True
    )
    writer_thread.start()

    # Submit video tasks
    futures = []
    with ThreadPoolExecutor(max_workers=VIDEO_WORKERS) as executor:
        for video in comments_by_video:
            if stop_requested:
                break
            futures.append(executor.submit(
                process_one_video,
                video,
                transcript_summaries,
                processed_pairs,
                processed_pairs_lock,
                row_queue
            ))

        # Wait for all submitted futures to finish
        try:
            for _ in as_completed(futures):
                if stop_requested:
                    # We still wait for running tasks to finish their current comment
                    pass
        except KeyboardInterrupt:
            # Redundant safety; SIGINT is already handled
            pass

    # Ensure all queued rows are written
    row_queue.join()
    # Stop writer thread
    row_queue.put(None)
    writer_thread.join()

    if stop_requested:
        print("Progress saved. Resume by running the script again.")
    else:
        print(f"All done. Submitted {len(futures)}/{total_videos} videos with {VIDEO_WORKERS} workers.")

# =========================
# ========= RUN ===========
# =========================
if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        # Redundant safety — we already trap SIGINT
        print("\n Interrupted. Progress saved where possible.")

### 02_Comment Analysis Results

In [None]:
# Code here visualizing the comment analysis results