# 🎬 YouTube Content Integrity Analyzer

This notebook uses PraisonAI to analyze YouTube video captions for harmful content (hate speech, misinformation, violence, fraud) and to evaluate their educational value, relevance, integrity, and clarity. The agent is instructed to score each requested category using only the transcript text, without assumptions about the video's tone or intent.

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Dhivya-Bharathy/PraisonAI/blob/main/examples/cookbooks/Qwen2.5_InstructionAgent.ipynb)

# Dependencies

In [None]:
!pip install praisonaiagents openai youtube-transcript-api --quiet

# Set Up API Key

In [None]:
from praisonaiagents import Agent
import os
import json

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "Enter your api key"  # <-- Replace with your actual OpenAI API key

# Get YouTube Transcript

In [None]:
from youtube_transcript_api import YouTubeTranscriptApi

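def get_youtube_transcript(video_id, languages=('en',)):
    """Return the caption text for video_id as one string, or None on failure."""
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=languages)
        # Each entry is a dict; only its 'text' field is used here
        text = " ".join(entry['text'] for entry in transcript)
        return text
    except Exception as e:
        # Report why the fetch failed instead of failing silently
        print(f"Transcript fetch failed for {video_id}: {e}")
        return None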
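
The join step above can be tried offline. The following is a minimal sketch that assumes each transcript entry is a dict with a `text` field, as the function above does; the sample entries and the `join_entries` helper are made up for illustration and are not part of the notebook or the library.

```python
# Hand-written sample entries (hypothetical); real ones come from the transcript API.
sample_entries = [
    {"text": "hello and welcome", "start": 0.0, "duration": 2.0},
    {"text": "  ", "start": 2.0, "duration": 0.5},
    {"text": "to this video", "start": 2.5, "duration": 1.5},
]

def join_entries(entries):
    """Join caption fragments into one string, skipping blank fragments."""
    return " ".join(e["text"].strip() for e in entries if e["text"].strip())

print(join_entries(sample_entries))
```

Skipping blank fragments is a design choice in this sketch; the notebook's own function joins every entry as-is.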

# Prompts (role, goal, instructions)

In [None]:
ROLE = """
Act as a content integrity and quality analyst trained to evaluate YouTube video captions. You are an expert in content policy enforcement, linguistic analysis, and media integrity.
You specialize in detecting harmful content (e.g., hate speech, misinformation, fraud) and evaluating the clarity, educational value, and relevance of language in video captions, using only textual evidence and without assumptions about tone or intent.
"""

GOAL = """
Your task is to assess whether specific content categories are present based solely on the language in the transcript. Your judgments should be precise, unbiased, and strictly based on linguistic clues — do not rely on assumptions about video intent or tone.
"""

INSTRUCTIONS = """
You will receive a list of categories and a YouTube transcript. Analyze the transcript to determine the presence of the specified categories.
Only use these categories: ["hatred", "misinformation", "violence", "fraud", "educational", "relevance", "integrity", "clarity"].
Discard any other category. If none remain, return an error.
For each valid category, assign a score from 0 to 10 and provide a reason.
Return a JSON object with:
  - categories: list of {name, score, connotation, reason}
  - overall: {score, reason}
  - error: ""
  - content_summary: brief summary of the transcript
Start your response with { and end with }.
"""

# Main Agent Function

In [None]:
def analyze_youtube_content(video_url, categories, custom_prompts=None):
    # Extract the video ID from watch?v=<id> or youtu.be/<id> URLs
    import re
    match = re.search(r"(?:[?&]v=|youtu\.be/)([A-Za-z0-9_-]+)", video_url)
    if not match:
        return {"error": "Invalid YouTube URL"}
    video_id = match.group(1)

    transcript = get_youtube_transcript(video_id)
    if not transcript:
        return {"error": "Could not fetch transcript for this video."}

    # Prepare the prompt for the agent
    prompt = f"Categories: {categories}\n\nTranscript:\n{transcript}\n"
    if custom_prompts:
        for cp in custom_prompts:
            prompt += f"\nAdditional user prompt: {cp}"

    agent = Agent(
        role=ROLE,
        goal=GOAL,
        instructions=INSTRUCTIONS,
    )
    response = agent.start(prompt)
    return response
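
The function above returns whatever the agent produces. Assuming that is a string containing the requested JSON object (possibly wrapped in extra text or a Markdown fence), the sketch below turns it into a dict. `parse_agent_json` is a hypothetical helper, not part of PraisonAI.

```python
import json

def parse_agent_json(response):
    """Extract and parse the outermost {...} object from an agent reply."""
    if isinstance(response, dict):
        return response  # already structured, e.g. an error dict
    text = str(response)
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return {"error": "No JSON object found in agent response"}
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError as exc:
        return {"error": f"Could not parse agent response as JSON: {exc}"}

# Hand-written reply used only to exercise the helper.
reply = '```json\n{"error": "", "overall": {"score": 8.5, "reason": "ok"}}\n```'
print(parse_agent_json(reply)["overall"]["score"])
```

Returning an error dict on failure keeps the result shape consistent with the error dicts that `analyze_youtube_content` already returns.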

# Example Usage (with a valid video URL)

In [None]:
# Example usage with a valid video URL (TED Talk with English transcript)
video_url = "https://www.youtube.com/watch?v=Ks-_Mh1QhMc"  # TED: Your body language may shape who you are
categories = ["hatred", "educational", "clarity"]
result = analyze_youtube_content(video_url, categories)
print(result)

{
  "categories": [
    {
      "name": "educational",
      "score": 9,
      "connotation": "positive",
      "reason": "The transcript provides a detailed explanation of body language and its effects on personal and professional outcomes. It includes scientific studies and practical advice, making it highly educational."
    },
    {
      "name": "clarity",
      "score": 8,
      "connotation": "positive",
      "reason": "The transcript is clear in its explanation of concepts related to body language and power dynamics. It uses examples and studies to illustrate points, although the length and detail might be overwhelming for some."
    }
  ],
  "overall": {
    "score": 8.5,
    "reason": "The transcript is highly educational and clear, providing valuable insights into body language and its impact on personal empowerment. It effectively communicates scientific findings and practical applications."
  },
  "error": "",
  "content_summary": "The transcript discusses the impact of b