# YouTube Video Data Scraping Assignment

## Q1-Q5: Scrape video URL, thumbnail URL, title, views, and posting time for the first five videos, and save to CSV

The following Python program uses `requests` and `BeautifulSoup` to scrape the required data from a YouTube search results page. Note that YouTube renders search results with JavaScript, so the HTML returned by `requests` may contain few or none of the `<a>` tags this code looks for, in which case the resulting table is empty. Selenium, which drives a real browser, is the more reliable choice; the `requests` approach is shown here for demonstration only.

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Example YouTube search URL (replace with your target URL)
url = 'https://www.youtube.com/results?search_query=python+programming'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

# Fail fast on network problems or an HTTP error status
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Collect anchor tags that link to a watch page, keeping only the first five
# unique hrefs. If the results were rendered by JavaScript, this may be empty.
seen = set()
unique_videos = []
for a in soup.find_all('a', href=True):
    href = a['href']
    if '/watch?v=' in href and href not in seen:
        seen.add(href)
        unique_videos.append(a)
        if len(unique_videos) == 5:
            break

video_data = []
for v in unique_videos:
    video_url = 'https://www.youtube.com' + v['href']
    title = v.get('title') or v.get_text(strip=True)
    # Build the thumbnail URL from the video ID in the href's 'v' parameter
    video_id = v['href'].split('v=')[1].split('&')[0]
    thumbnail_url = f'https://img.youtube.com/vi/{video_id}/0.jpg'
    # Views and posting time are not on the anchor tag itself; getting them
    # requires parsing other parts of the page or using Selenium
    video_data.append({
        'Video URL': video_url,
        'Thumbnail URL': thumbnail_url,
        'Title': title,
        'Views': 'N/A',
        'Posted': 'N/A'
    })

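# Save to CSV (explicit columns keep the header row even if no videos were found)
csv_filename = 'youtube_videos.csv'
columns = ['Video URL', 'Thumbnail URL', 'Title', 'Views', 'Posted']
df = pd.DataFrame(video_data, columns=columns)
df.to_csv(csv_filename, index=False)
df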
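
The cell above leaves `Views` and `Posted` as `'N/A'`. One possible way to fill them without Selenium is to read the JSON that the search page appears to embed in a `<script>` tag. The sketch below is a rough illustration under stated assumptions, not a definitive implementation: the `ytInitialData` marker and the key names `videoRenderer`, `viewCountText`, and `publishedTimeText` are assumptions about an undocumented page structure that may change at any time, and the helper names (`extract_initial_data`, `iter_video_renderers`, `renderer_to_row`, `first_rows`) are hypothetical. It is exercised here on a synthetic HTML string rather than a live page.

```python
import json
import re

# Assumption: the page embeds its data as `var ytInitialData = {...};`.
# This marker is not a documented interface and may change.
_MARKER = re.compile(r'ytInitialData\s*=\s*')


def extract_initial_data(html):
    """Return the JSON value that follows the marker, or None if absent/invalid."""
    match = _MARKER.search(html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it
        data, _ = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return data


def iter_video_renderers(node):
    """Yield every dict stored under a 'videoRenderer' key (assumed key name)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == 'videoRenderer' and isinstance(value, dict):
                yield value
            else:
                yield from iter_video_renderers(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_video_renderers(item)


def renderer_to_row(renderer):
    """Map one renderer dict to the CSV columns; missing fields become 'N/A'."""
    video_id = renderer.get('videoId', '')
    runs = renderer.get('title', {}).get('runs', [])
    title = ''.join(run.get('text', '') for run in runs) or 'N/A'
    return {
        'Video URL': f'https://www.youtube.com/watch?v={video_id}',
        'Thumbnail URL': f'https://img.youtube.com/vi/{video_id}/0.jpg',
        'Title': title,
        'Views': renderer.get('viewCountText', {}).get('simpleText', 'N/A'),
        'Posted': renderer.get('publishedTimeText', {}).get('simpleText', 'N/A'),
    }


def first_rows(html, limit=5):
    """Return up to `limit` rows, or an empty list if no data is found."""
    data = extract_initial_data(html)
    if data is None:
        return []
    rows = []
    for renderer in iter_video_renderers(data):
        rows.append(renderer_to_row(renderer))
        if len(rows) == limit:
            break
    return rows


# Synthetic sample that mimics the assumed structure (not real YouTube output)
sample_html = (
    '<script>var ytInitialData = {"contents": [{"videoRenderer": {'
    '"videoId": "abc123", "title": {"runs": [{"text": "Demo video"}]}, '
    '"viewCountText": {"simpleText": "1,234 views"}, '
    '"publishedTimeText": {"simpleText": "2 days ago"}}}]};</script>'
)
rows = first_rows(sample_html)
print(rows[0]['Views'], '|', rows[0]['Posted'])
```

If the assumptions hold for a live page, `first_rows(response.text)` could be passed to `pd.DataFrame` in place of `video_data`. When the marker or keys are missing, the helpers return an empty list or `'N/A'` values rather than raising, so the CSV step still runs.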