# Backend Component Testing

This notebook tests the EXACT functionality of each backend component:

1. **backend/youtube** - Takes video ID(s), returns metadata via YouTube Data API v3
2. **backend/subtitles** - Takes video ID, returns list of chunks
3. **backend/openai** - Takes chunk text, returns title + 3 fields from OpenAI

---

## 1. Setup and Imports

Import all necessary modules and configure environment.

In [2]:
import os
import sys
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Import backend modules
from youtube.metadata import extract_video_id, fetch_video_metadata, fetch_batch_metadata
from subtitles.extractor import extract_and_chunk_subtitles
from openai_api.enrichment import enrich_chunk
from prompts import PROMPTS
from config import OPENAI_MODEL, OPENAI_TEMPERATURE

print("‚úÖ All modules imported successfully!")
print(f"üì¶ OpenAI Model: {OPENAI_MODEL}")
print(f"üå°Ô∏è  Temperature: {OPENAI_TEMPERATURE}")

‚úÖ All modules imported successfully!
üì¶ OpenAI Model: gpt-4o-mini
üå°Ô∏è  Temperature: 0.5


---

## 2. Test YouTube Metadata Retrieval

**Specification:** Takes in a YouTube video ID (or multiple IDs) and uses the Data API v3 to get metadata.

### Test 2.1: Single Video ID

In [3]:
# Test with single video ID
video_id = "dQw4w9WgXcQ"

print(f"üìπ Testing with video ID: {video_id}")
print("=" * 60)

metadata = fetch_video_metadata(video_id)

if metadata:
    print("‚úÖ SUCCESS - Metadata retrieved!")
    print(f"\nüìä Metadata Structure:")
    print(f"  ‚Ä¢ video_id: {metadata['video_id']}")
    print(f"  ‚Ä¢ title: {metadata['title']}")
    print(f"  ‚Ä¢ channel_title: {metadata['channel_title']}")
    print(f"  ‚Ä¢ channel_id: {metadata['channel_id']}")
    print(f"  ‚Ä¢ published_at: {metadata['published_at']}")
    print(f"  ‚Ä¢ duration: {metadata['duration']}")
    print(f"  ‚Ä¢ view_count: {metadata['view_count']:,}")
    print(f"  ‚Ä¢ like_count: {metadata['like_count']:,}")
    print(f"  ‚Ä¢ thumbnail_url: {metadata['thumbnail_url'][:50]}...")
else:
    print("‚ùå FAILED - Could not retrieve metadata")

üìπ Testing with video ID: dQw4w9WgXcQ
‚úÖ SUCCESS - Metadata retrieved!

üìä Metadata Structure:
  ‚Ä¢ video_id: dQw4w9WgXcQ
  ‚Ä¢ title: Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)
  ‚Ä¢ channel_title: Rick Astley
  ‚Ä¢ channel_id: UCuAXFkgsw1L7xaCfnd5JJOw
  ‚Ä¢ published_at: 2009-10-25T06:57:33Z
  ‚Ä¢ duration: PT3M34S
  ‚Ä¢ view_count: 1,701,939,817
  ‚Ä¢ like_count: 18,582,124
  ‚Ä¢ thumbnail_url: https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg...
‚úÖ SUCCESS - Metadata retrieved!

üìä Metadata Structure:
  ‚Ä¢ video_id: dQw4w9WgXcQ
  ‚Ä¢ title: Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)
  ‚Ä¢ channel_title: Rick Astley
  ‚Ä¢ channel_id: UCuAXFkgsw1L7xaCfnd5JJOw
  ‚Ä¢ published_at: 2009-10-25T06:57:33Z
  ‚Ä¢ duration: PT3M34S
  ‚Ä¢ view_count: 1,701,939,817
  ‚Ä¢ like_count: 18,582,124
  ‚Ä¢ thumbnail_url: https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg...


### Test 2.2: Multiple Video IDs (Batch)

In [4]:
# Test with multiple video IDs
video_ids = ["dQw4w9WgXcQ", "9bZkp7q19f0"]

print(f"üìπ Testing with {len(video_ids)} video IDs")
print("=" * 60)

metadata_list = fetch_batch_metadata(video_ids)

if metadata_list:
    print(f"‚úÖ SUCCESS - Retrieved {len(metadata_list)} videos!")
    print(f"\nüìä Batch Results:")
    for i, meta in enumerate(metadata_list, 1):
        print(f"\n  Video {i}:")
        print(f"    ‚Ä¢ ID: {meta['video_id']}")
        print(f"    ‚Ä¢ Title: {meta['title'][:50]}...")
        print(f"    ‚Ä¢ Channel: {meta['channel_title']}")
        print(f"    ‚Ä¢ Views: {meta['view_count']:,}")
else:
    print("‚ùå FAILED - Could not retrieve batch metadata")

üìπ Testing with 2 video IDs
‚úÖ SUCCESS - Retrieved 2 videos!

üìä Batch Results:

  Video 1:
    ‚Ä¢ ID: dQw4w9WgXcQ
    ‚Ä¢ Title: Rick Astley - Never Gonna Give You Up (Official Vi...
    ‚Ä¢ Channel: Rick Astley
    ‚Ä¢ Views: 1,701,939,817

  Video 2:
    ‚Ä¢ ID: 9bZkp7q19f0
    ‚Ä¢ Title: PSY - GANGNAM STYLE(Í∞ïÎÇ®Ïä§ÌÉÄÏùº) M/V...
    ‚Ä¢ Channel: officialpsy
    ‚Ä¢ Views: 5,727,528,210
‚úÖ SUCCESS - Retrieved 2 videos!

üìä Batch Results:

  Video 1:
    ‚Ä¢ ID: dQw4w9WgXcQ
    ‚Ä¢ Title: Rick Astley - Never Gonna Give You Up (Official Vi...
    ‚Ä¢ Channel: Rick Astley
    ‚Ä¢ Views: 1,701,939,817

  Video 2:
    ‚Ä¢ ID: 9bZkp7q19f0
    ‚Ä¢ Title: PSY - GANGNAM STYLE(Í∞ïÎÇ®Ïä§ÌÉÄÏùº) M/V...
    ‚Ä¢ Channel: officialpsy
    ‚Ä¢ Views: 5,727,528,210


---

## 3. Test Subtitle Chunk Extraction

**Specification:** Takes in a YouTube video ID and returns the LIST OF CHUNKS for that video.

In [5]:
# Test subtitle extraction and chunking
video_id = "m3ojamMNbKM"

print(f"üìπ Extracting subtitles for: {video_id}")
print("=" * 60)

chunks = extract_and_chunk_subtitles(video_id)

if chunks:
    print(f"‚úÖ SUCCESS - Extracted {len(chunks)} chunks!")
    print(f"\nüìä Chunk Structure:")
    
    for i, chunk in enumerate(chunks):
        print(f"\n  Chunk {i}:")
        print(f"    ‚Ä¢ text: {chunk['text'][:100]}...")
        print(f"    ‚Ä¢ word_count: {chunk['word_count']}")
        print(f"    ‚Ä¢ sentence_count: {chunk['sentence_count']}")
    
    # Save first chunk for OpenAI test
    test_chunk_text = chunks[0]['text']
    print(f"\nüíæ Saved first chunk for OpenAI testing")
else:
    print("‚ùå FAILED - Could not extract subtitles")
    test_chunk_text = None

üìπ Extracting subtitles for: m3ojamMNbKM
‚úÖ SUCCESS - Extracted 12 chunks!

üìä Chunk Structure:

  Chunk 0:
    ‚Ä¢ text: it's you that goes backwards it's you're not telling me that you fail you're not you've never said t...
    ‚Ä¢ word_count: 1000
    ‚Ä¢ sentence_count: 25

  Chunk 1:
    ‚Ä¢ text: time and I don't know if this is gonna be kind of oddly stubborn but pick something sure let's say i...
    ‚Ä¢ word_count: 1000
    ‚Ä¢ sentence_count: 25

  Chunk 2:
    ‚Ä¢ text: the Battle of kawatche yeah because I hadn't I didn't know anything about it I was this is insane it...
    ‚Ä¢ word_count: 1000
    ‚Ä¢ sentence_count: 25

  Chunk 3:
    ‚Ä¢ text: easier for me to cook takes less time things that great yeah so it feels that's not the issue okay s...
    ‚Ä¢ word_count: 1000
    ‚Ä¢ sentence_count: 25

  Chunk 4:
    ‚Ä¢ text: so let's talk let's dig into one of these and try to understand what that rubber band is so pick som...
    ‚Ä¢ word_count: 1000
    ‚Ä¢ sentence

---

## 4. Test OpenAI Processing

**Specification:** Takes in a subtitle chunk (only the text) and returns the title and 3 fields from OpenAI based on the prompt.

In [6]:
# Prepare prompts
prompt_dict = {
    'title': PROMPTS['short_title']['template'],
    'field_1': PROMPTS['ai_field_1']['template'],
    'field_2': PROMPTS['ai_field_2']['template'],
    'field_3': PROMPTS['ai_field_3']['template']
}

# Test with chunk text from previous step
if test_chunk_text:
    print(f"ü§ñ Processing chunk with OpenAI")
    print("=" * 60)
    print(f"üìù Input text preview: {test_chunk_text[:150]}...")
    print(f"\n‚è≥ Calling OpenAI API...")
    
    result = enrich_chunk(
        text=test_chunk_text,
        prompts=prompt_dict,
        model=OPENAI_MODEL,
        temperature=OPENAI_TEMPERATURE
    )
    
    if result:
        print(f"\n‚úÖ SUCCESS - OpenAI enrichment complete!")
        print(f"\nüìä Output Structure:")
        print(f"  ‚Ä¢ title: {result.get('title', 'N/A')}")
        print(f"  ‚Ä¢ field_1: {result.get('field_1', 'N/A')[:100]}...")
        print(f"  ‚Ä¢ field_2: {result.get('field_2', 'N/A')[:100]}...")
        print(f"  ‚Ä¢ field_3: {result.get('field_3', 'N/A')[:100]}...")
    else:
        print("‚ùå FAILED - OpenAI enrichment failed")
else:
    print("‚ö†Ô∏è  Skipping OpenAI test - no chunk text available")

ü§ñ Processing chunk with OpenAI
üìù Input text preview: it's you that goes backwards it's you're not telling me that you fail you're not you've never said that to me right you say that when I am making prog...

‚è≥ Calling OpenAI API...

‚úÖ SUCCESS - OpenAI enrichment complete!

üìä Output Structure:
  ‚Ä¢ title: Overcoming Feelings of Powerlessness and Addiction
  ‚Ä¢ field_1: In this video segment, the conversation revolves around feelings of powerlessness and hopelessness r...
  ‚Ä¢ field_2: - The conversation revolves around feelings of powerlessness and hopelessness in daily life, particu...
  ‚Ä¢ field_3: addiction, feelings of powerlessness, mental health, personal growth, self-improvement...

‚úÖ SUCCESS - OpenAI enrichment complete!

üìä Output Structure:
  ‚Ä¢ title: Overcoming Feelings of Powerlessness and Addiction
  ‚Ä¢ field_1: In this video segment, the conversation revolves around feelings of powerlessness and hopelessness r...
  ‚Ä¢ field_2: - The conversation re

---

## 5. Integration Test: End-to-End Pipeline

Test all components together in sequence: **YouTube ‚Üí Subtitles ‚Üí OpenAI**

In [None]:
# End-to-end integration test
test_video_id = "dQw4w9WgXcQ"

print("üöÄ STARTING END-TO-END INTEGRATION TEST")
print("=" * 60)

# Step 1: Get metadata
print(f"\n[1/3] üìπ Fetching video metadata...")
metadata = fetch_video_metadata(test_video_id)
if metadata:
    print(f"      ‚úÖ Got metadata: {metadata['title'][:50]}...")
else:
    print(f"      ‚ùå Failed to get metadata")

# Step 2: Extract and chunk subtitles
print(f"\n[2/3] üìù Extracting subtitle chunks...")
chunks = extract_and_chunk_subtitles(test_video_id)
if chunks:
    print(f"      ‚úÖ Got {len(chunks)} chunks")
else:
    print(f"      ‚ùå Failed to extract chunks")

# Step 3: Enrich first chunk with OpenAI
if chunks and len(chunks) > 0:
    print(f"\n[3/3] ü§ñ Processing first chunk with OpenAI...")
    
    enriched = enrich_chunk(
        text=chunks[0]['text'],
        prompts=prompt_dict,
        model=OPENAI_MODEL,
        temperature=OPENAI_TEMPERATURE
    )
    
    if enriched:
        print(f"      ‚úÖ OpenAI enrichment complete!")
    else:
        print(f"      ‚ùå OpenAI enrichment failed")
else:
    print(f"\n[3/3] ‚ö†Ô∏è  Skipping OpenAI - no chunks available")
    enriched = None

# Final results
print(f"\n" + "=" * 60)
print(f"üìä INTEGRATION TEST RESULTS")
print("=" * 60)

if metadata:
    print(f"\n‚úÖ Video Metadata:")
    print(f"   ‚Ä¢ Title: {metadata['title']}")
    print(f"   ‚Ä¢ Channel: {metadata['channel_title']}")
    print(f"   ‚Ä¢ Views: {metadata['view_count']:,}")

if chunks:
    print(f"\n‚úÖ Subtitle Chunks:")
    print(f"   ‚Ä¢ Total chunks: {len(chunks)}")
    print(f"   ‚Ä¢ First chunk words: {chunks[0]['word_count']}")
    print(f"   ‚Ä¢ First chunk sentences: {chunks[0]['sentence_count']}")

if enriched:
    print(f"\n‚úÖ OpenAI Enrichment:")
    print(f"   ‚Ä¢ Title: {enriched.get('title', 'N/A')}")
    print(f"   ‚Ä¢ Field 1: {enriched.get('field_1', 'N/A')[:80]}...")
    print(f"   ‚Ä¢ Field 2: {enriched.get('field_2', 'N/A')[:80]}...")
    print(f"   ‚Ä¢ Field 3: {enriched.get('field_3', 'N/A')[:80]}...")

if metadata and chunks and enriched:
    print(f"\nüéâ ALL COMPONENTS WORKING CORRECTLY!")
else:
    print(f"\n‚ö†Ô∏è  Some components failed - check logs above")

---

## Summary

This notebook tested the EXACT functionality of each backend component:

### ‚úÖ Component Specifications Verified:

1. **backend/youtube/metadata.py**
   - ‚úì Takes in video ID(s)
   - ‚úì Uses YouTube Data API v3
   - ‚úì Returns metadata dictionary

2. **backend/subtitles/extractor.py**
   - ‚úì Takes in video ID
   - ‚úì Returns LIST OF CHUNKS
   - ‚úì Each chunk has: text, word_count, sentence_count

3. **backend/openai_api/enrichment.py**
   - ‚úì Takes in chunk text (only)
   - ‚úì Takes in prompts (required)
   - ‚úì Returns title + 3 fields

All components work independently and integrate correctly! üéâ