# 🎬 YouTube Channel Search Model

**Purpose:** Search videos within specific YouTube channels and check for English transcripts

**What this model does:**
- 🔍 Finds YouTube channels by name
- 📺 Searches videos within specific channels
- 📝 Checks if videos have English subtitles/transcripts
- 🎯 Filters results based on transcript availability

**Architecture:** Uses YouTube Data API v3 + youtube-transcript-api

## 📦 Step 1: Import Required Libraries

Import the necessary tools to work with YouTube API and handle transcripts.

In [1]:
# Import necessary libraries
import os
from googleapiclient.discovery import build
from youtube_transcript_api import YouTubeTranscriptApi
from dotenv import load_dotenv

# Load API key from environment variables
load_dotenv()
api_key = os.getenv('YOUTUBE_API_KEY')
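If `YOUTUBE_API_KEY` is not set, `os.getenv` returns `None`, and the problem typically only shows up later when a request is made. A minimal sketch of an early check follows; the helper name `require_api_key` is hypothetical and not part of the notebook.

```python
import os


def require_api_key(var_name="YOUTUBE_API_KEY"):
    """Return the key stored in the environment, or raise if it is unset or blank.

    Hypothetical helper for illustration; the notebook itself calls os.getenv directly.
    """
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; add it to your .env file or environment"
        )
    return key
```

Calling `api_key = require_api_key()` right after `load_dotenv()` would make a missing key fail immediately with a readable message.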

## 🔧 Step 2: Build the Model Class

The `YouTubeSearcher` class contains all the core functionality for searching YouTube channels and videos.

In [2]:
class YouTubeSearcher:
    """
    YouTube Channel Search Model
    
    Main features:
    1. Convert channel names to channel IDs
    2. Search videos within specific channels
    3. Check transcript availability
    4. Extract transcript text
    5. Filter videos by transcript availability
    """
    
    def __init__(self, api_key):
        """Initialize YouTube API connection"""
        self.youtube = build('youtube', 'v3', developerKey=api_key)
    
    def get_channel_id(self, channel_name):
        """
        Find YouTube channel ID from channel name
        
        Args:
            channel_name: String - Name of the channel (e.g., "TED")
            
        Returns:
            String - Channel ID or None if not found
        """
        try:
            request = self.youtube.search().list(
                part='snippet',
                q=channel_name,
                type='channel',
                maxResults=1
            )
            response = request.execute()
            
            if response['items']:
                return response['items'][0]['snippet']['channelId']
            return None
            
        except Exception as e:
            print(f"Error: {e}")
            return None
    
    def search_videos(self, query, channel_id, max_results=5):
        """
        Search for videos within a specific channel
        
        Args:
            query: String - Search query (e.g., "python tutorial")
            channel_id: String - YouTube channel ID
            max_results: Integer - Number of videos to return (default: 5)
            
        Returns:
            List of dictionaries containing video information
        """
        try:
            request = self.youtube.search().list(
                part='snippet',
                q=query,
                channelId=channel_id,  # Filter to specific channel
                type='video',
                maxResults=max_results,
                relevanceLanguage='en'
            )
            response = request.execute()
            
            # Extract video information
            videos = []
            for item in response['items']:
                video_info = {
                    'video_id': item['id']['videoId'],
                    'title': item['snippet']['title'],
                    'channel': item['snippet']['channelTitle'],
                    'description': item['snippet']['description'],
                    'url': f"https://www.youtube.com/watch?v={item['id']['videoId']}"
                }
                videos.append(video_info)
            
            return videos
            
        except Exception as e:
            print(f"Error: {e}")
            return []
    
    def check_transcript(self, video_id):
        """
        Check if video has English transcript available
        
        Args:
            video_id: String - YouTube video ID
            
        Returns:
            Tuple (has_transcript, is_auto_generated)
            - has_transcript: Boolean - True if English transcript exists
            - is_auto_generated: Boolean - True if auto-generated
        """
        try:
            # NOTE: list_transcripts() is the pre-1.0 youtube-transcript-api
            # interface; newer releases changed it, so pin the version or adapt.
            transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
            
            # find_transcript prefers manually created transcripts and falls
            # back to auto-generated ones, so read is_generated to tell them apart
            transcript = transcript_list.find_transcript(['en'])
            return True, transcript.is_generated
            
        except Exception:
            # No English transcript, transcripts disabled, or the lookup failed
            return False, False
    
    def get_transcript_text(self, video_id):
        """
        Extract the actual transcript text from a video
        
        Args:
            video_id: String - YouTube video ID
            
        Returns:
            String - Full transcript text or None if not available
        """
        try:
            # NOTE: list_transcripts() and dict-style entries are the pre-1.0
            # youtube-transcript-api interface; newer releases changed both.
            transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
            
            # find_transcript returns a manual English transcript when one
            # exists, otherwise an auto-generated one
            transcript = transcript_list.find_transcript(['en'])
            transcript_data = transcript.fetch()
            return " ".join(entry['text'] for entry in transcript_data)
            
        except Exception:
            # No English transcript, transcripts disabled, or the fetch failed
            return None
    
    def search_with_transcripts(self, channel_name, query, max_results=5):
        """
        Complete search pipeline: find channel, search videos, filter by transcripts
        
        This is the main function that combines all steps:
        1. Get channel ID from channel name
        2. Search videos in that channel
        3. Check each video for English transcripts
        4. Return only videos with transcripts
        
        Args:
            channel_name: String - Channel name (e.g., "Khan Academy")
            query: String - Search query (e.g., "python")
            max_results: Integer - Number of results (default: 5)
            
        Returns:
            List of videos with transcript information
        """
        # Step 1: Get channel ID
        channel_id = self.get_channel_id(channel_name)
        if not channel_id:
            return []
        
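        # Step 2: Search videos (over-fetch so some remain after filtering;
        # the search endpoint documents a maximum of 50 results per request)
        videos = self.search_videos(query, channel_id, min(max_results * 3, 50))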
        
        # Step 3: Filter by transcript availability
        videos_with_transcripts = []
        for video in videos:
            has_transcript, is_auto = self.check_transcript(video['video_id'])
            
            if has_transcript:
                video['has_transcript'] = True
                video['is_auto_generated'] = is_auto
                videos_with_transcripts.append(video)
                
                # Stop when we have enough
                if len(videos_with_transcripts) >= max_results:
                    break
        
        return videos_with_transcripts

## 🚀 Step 3: Initialize the Model

Create an instance of the YouTubeSearcher with your API key.

In [3]:
# Create the searcher instance
searcher = YouTubeSearcher(api_key)

## ✅ Test the Model

Let's test both methods: searching by channel name and by channel ID directly:

In [6]:
# Get user input for search method
print("Choose search method:")
print("1. Search by Channel Name")
print("2. Search by Channel ID")
method = input("Enter choice (1 or 2): ")

if method == "1":
    # Method 1: Search by Channel Name
    channel_name = input("\nEnter channel name (e.g., TEDx Talks): ")
    channel_id = searcher.get_channel_id(channel_name)
    if channel_id:
        print(f"✓ Found Channel ID: {channel_id}")
    else:
        print("✗ Channel not found!")
        channel_id = None
        
elif method == "2":
    # Method 2: Search by Channel ID Directly
    channel_id = input("\nEnter channel ID (e.g., UCsT0YIqwnpJCM-mx7-gSA4Q): ")
    print(f"✓ Using Channel ID: {channel_id}")
else:
    print("Invalid choice!")
    channel_id = None

# Search for videos if channel_id is valid
if channel_id:
    query = input("\nEnter search query: ")
    max_results = int(input("Enter number of videos to search (e.g., 5): "))
    
    print(f"\n🔍 Searching for '{query}' in channel...")
    videos = searcher.search_videos(query, channel_id, max_results=max_results)
    
    print(f"\n✓ Found {len(videos)} videos:\n")
    for i, video in enumerate(videos, 1):
        print(f"{i}. {video['title']}")
        print(f"   Video ID: {video['video_id']}")
        # check_transcript returns a (has_transcript, is_auto_generated) tuple
        transcript_status = searcher.check_transcript(video['video_id'])
        print(f"   Has transcript: {transcript_status}")
        print()

Choose search method:
1. Search by Channel Name
2. Search by Channel ID
✓ Found Channel ID: UCBwmMxybNva6P_5VmxjzwqA

🔍 Searching for 'pythons' in channel...

✓ Found 5 videos:

1. Python Tutorial for Beginners - Full Course (with Notes &amp; Practice Questions)
   Video ID: ERCMXc8x7mc
   Has transcript: (False, False)

2. Is Python the Coding Language of the Future? A Brief Analysis
   Video ID: PnPc2xDwMvQ
   Has transcript: (False, False)

3. Python Tutorial for Beginners | Learn Python in 1.5 Hours
   Video ID: vLqTf2b6GZw
   Has transcript: (False, False)

4. Java or C++ or Python | Which language is best for Placements?
   Video ID: 1g3kYtJf6Tw
   Has transcript: (False, False)

5. Python Tutorial for Beginners (Full Course) at @shradhaKD  | Republic Day Gift
   Video ID: KrBnRcpWGEI
   Has transcript: (False, False)
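
The first title in the output contains `&amp;`, which suggests the search results arrive HTML-escaped. A small sketch of decoding titles with the standard library before displaying them; `clean_title` is a hypothetical helper, not a method of `YouTubeSearcher`.

```python
import html


def clean_title(raw_title):
    """Decode HTML entities such as &amp; or &#39; in a title string."""
    return html.unescape(raw_title)


print(clean_title("Full Course (with Notes &amp; Practice Questions)"))
# → Full Course (with Notes & Practice Questions)
```

The same call could be applied to `item['snippet']['title']` inside `search_videos` if decoded titles are wanted everywhere.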



## 📚 Model Usage Guide

### Method 1: Basic Search
```python
# Find a channel
channel_id = searcher.get_channel_id("TED")

# Search videos in that channel
videos = searcher.search_videos("artificial intelligence", channel_id, max_results=5)
```

### Method 2: Check Transcripts
```python
# Check if a specific video has transcript
has_transcript, is_auto = searcher.check_transcript(video_id)

# Get the actual transcript text
transcript_text = searcher.get_transcript_text(video_id)
```
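
Because `check_transcript` returns a pair rather than a single boolean, printing the result directly shows something like `(False, False)`. A minimal sketch of turning the pair into a readable label; `describe_transcript` is a hypothetical helper, and the `status` value below is a stand-in for a real call.

```python
def describe_transcript(has_transcript, is_auto_generated):
    """Map the (has_transcript, is_auto_generated) pair to a short label."""
    if not has_transcript:
        return "no English transcript"
    if is_auto_generated:
        return "auto-generated English transcript"
    return "manual English transcript"


status = (True, True)  # stand-in for searcher.check_transcript(video_id)
print(describe_transcript(*status))
# → auto-generated English transcript
```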

### Method 3: Complete Pipeline (Recommended)
```python
# All-in-one: Search with transcript filter
videos = searcher.search_with_transcripts(
    channel_name="Khan Academy",
    query="python programming",
    max_results=5
)
```
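
Each item returned by `search_with_transcripts` is a dictionary with the keys built in `search_videos` plus `has_transcript` and `is_auto_generated`. A short sketch of formatting such a list for display; `format_results` is a hypothetical helper, and the sample entry is made-up placeholder data, not a real video.

```python
def format_results(videos):
    """Build one display line per video from the pipeline's result dictionaries."""
    lines = []
    for i, video in enumerate(videos, 1):
        kind = "auto-generated" if video["is_auto_generated"] else "manual"
        lines.append(f"{i}. {video['title']} [{kind}] {video['url']}")
    return lines


# Placeholder entry shaped like the pipeline's output (values are invented)
sample = [{
    "video_id": "VIDEO_ID_1",
    "title": "Example title",
    "channel": "Example channel",
    "description": "",
    "url": "https://www.youtube.com/watch?v=VIDEO_ID_1",
    "has_transcript": True,
    "is_auto_generated": False,
}]

for line in format_results(sample):
    print(line)
```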

---

## 🏗️ Model Architecture

### Core Components:

1. **API Connection (`__init__`)**: 
   - Establishes connection to YouTube Data API v3
   - Requires valid API key

2. **Channel Resolution (`get_channel_id`)**:
   - Converts channel name → channel ID
   - Uses search API with type='channel'

3. **Video Search (`search_videos`)**:
   - Searches videos within specific channel
   - Uses channelId parameter to filter
   - Returns video metadata

4. **Transcript Detection (`check_transcript`)**:
   - Checks for manual English transcripts
   - Falls back to auto-generated if needed
   - Returns availability status

5. **Transcript Extraction (`get_transcript_text`)**:
   - Retrieves actual transcript content
   - Combines all subtitle entries

6. **Complete Pipeline (`search_with_transcripts`)**:
   - Orchestrates all steps
   - Filters results by transcript availability
   - Returns curated video list

### Data Flow:
```
Channel Name → Channel ID → Video Search → Transcript Check → Filtered Results
```
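
The last two stages of this flow can be exercised without network access by passing the transcript check in as a function. The sketch below mirrors the filtering loop in `search_with_transcripts`, with two assumptions: `filter_with_transcripts` is a hypothetical standalone function, and it copies each dictionary instead of mutating it as the class method does.

```python
def filter_with_transcripts(videos, check_transcript, max_results=5):
    """Keep the first max_results videos for which check_transcript reports a transcript.

    check_transcript is any callable returning (has_transcript, is_auto_generated).
    """
    kept = []
    for video in videos:
        has_transcript, is_auto = check_transcript(video["video_id"])
        if not has_transcript:
            continue
        # Copy the dictionary so the caller's list is left untouched
        kept.append({**video, "has_transcript": True, "is_auto_generated": is_auto})
        if len(kept) >= max_results:
            break  # Stop once enough videos have been collected
    return kept


# Stub data standing in for search_videos() and check_transcript() results
fake_videos = [{"video_id": "a"}, {"video_id": "b"}, {"video_id": "c"}]
fake_status = {"a": (True, False), "b": (False, False), "c": (True, True)}

result = filter_with_transcripts(fake_videos, fake_status.get, max_results=5)
print([v["video_id"] for v in result])  # → ['a', 'c']
```

Injecting the check as a parameter keeps the filtering logic testable with stubs, independent of API quota or transcript availability.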

### Key Features:
- ✅ Real-time search (no pre-downloading)
- ✅ Channel-specific filtering
- ✅ Transcript availability checking
- ✅ Support for manual & auto-generated transcripts
- ✅ Error handling at each step

---

## 🎯 This is the exact model used in the Streamlit application!