#### Q.) What is Selenium and how do web drivers work?
##### Selenium is an open-source framework designed for testing. The Selenium Web Driver is just one part of the overall Selenium library. Selenium allows us to run our scripts in an automated process on the web driver, which helps us to decrease the time to scrape data. As it uses automation, we don’t have to manually do anything. Selenium supports a lot of web drivers, e.g., Chrome, Mozilla, Safari, etc.

##### Selenium Web Drivers basically work in 3 steps:
- ##### First, the JSON wire protocol converts the test commands into an HTTP request.
- ##### The Browser Driver, which is already present in each browser, is initialised before any script is executed.
##### Once everything is ready, the driver starts to send requests to the browser.
- ##### Each Web Driver can directly call its browser. Whenever a script is executed, an HTTP request is generated by the JSON wire protocol. The browser driver uses an HTTP server for getting HTTP requests. After receiving and executing the request, the status is sent back to the HTTP server, which forwards it back to the script.

- WebDriver is a tool that lets you control a web browser from your code. It's like a remote control for your browser, allowing you to automate  tasks, test websites, and scrape data. Think of it as the bridge between your program and the web.

# Youtube Web Scraping | Understanding the Tags

In [36]:
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd
from tqdm import tqdm
import numpy as np
import time

In [4]:
browser = webdriver.Chrome()
browser.get('https://www.youtube.com/c/GeeksforGeeksVideos/videos')

In [14]:
soup = BeautifulSoup(browser.page_source,'html.parser')

# Youtube Web Scraping | Data from Channel Page

In [16]:
data = []
for sp in soup.find_all('ytd-rich-item-renderer'):

    try:
        title      = sp.find('a', class_ ='yt-simple-endpoint focus-on-expand style-scope ytd-rich-grid-media').text
    except:
        title = np.nan
    try:
        video_link = sp.find('a', class_ ='yt-simple-endpoint focus-on-expand style-scope ytd-rich-grid-media').get('href')
    except:
        video_link = np.nan
    try:
        views = sp.find_all('span' , class_ = 'inline-metadata-item style-scope ytd-video-meta-block')[0].text
    except:
        views = np.nan
    try:
        date_time  = sp.find_all('span' , class_ = 'inline-metadata-item style-scope ytd-video-meta-block')[1].text
    except:
        date_time = np.nan
    try:
        thumbnail_link = sp.find('img').get('src').split('?')[0]
    except:
        thumbnails_link = np.nan

    data.append([title,views,date_time,video_link,thumbnail_link])

In [20]:
data

[['Three 90 Challenge | 1.5 Cr Refunded | GeeksforGeeks',
  '728 views',
  '1 day ago',
  '/watch?v=28wXxweOOFM',
  'https://i.ytimg.com/vi/28wXxweOOFM/hqdefault.jpg'],
 ['From Tier 3 College to 20LPA Package 🤑 at Harman as a Senior Software Engineer | My Journey',
  '2.2K views',
  '3 days ago',
  '/watch?v=cGPpUdaISGE',
  'https://i.ytimg.com/vi/cGPpUdaISGE/hqdefault.jpg'],
 ['How I Built an AirBnb Clone from Scratch | MERN Stack Project | GeeksforGeeks',
  '1.1K views',
  '4 days ago',
  '/watch?v=koTBLYNq63I',
  'https://i.ytimg.com/vi/koTBLYNq63I/hqdefault.jpg'],
 ['Master Web3 in 2024 (The Ultimate Roadmap) | Block Chain, Crypto Currency |',
  '1.2K views',
  '7 days ago',
  '/watch?v=CKrM5ouILBg',
  'https://i.ytimg.com/vi/CKrM5ouILBg/hqdefault.jpg'],
 ['🚀 Do This to Learn DSA and Get a Job | *Free Resources Inside* | The Ultimate DSA Roadmap 📚',
  '3.1K views',
  '11 days ago',
  '/watch?v=VdLUuIeuXqM',
  'https://i.ytimg.com/vi/VdLUuIeuXqM/hqdefault.jpg'],
 ['Learn Sorting in 

In [22]:
df = pd.DataFrame(data, columns = [ 'title','views','date_time','video_link','thumbnail_link'])

In [24]:
df.to_csv('Data.csv')

In [26]:
df.isnull().sum()

title             0
views             0
date_time         0
video_link        0
thumbnail_link    0
dtype: int64

# Youtube Web Scraping | Video Data Scraping

In [28]:
browser.get('https://www.youtube.com/')
time.sleep(2)

data = []
for link in tqdm(df['video_link']):
    link = 'https://www.youtube.com/' + link
    browser.get(link)

    time.sleep(3)

    soup =  BeautifulSoup(browser.page_source, 'html.parser')

    try:
        title   = soup.find('yt-formatted-string', class_ = 'style-scope ytd-watch-metadata').text
    except:
        title = np.nan
    try:
        views   = soup.find('span', class_ = 'bold style-scope yt-formatted-string').text
    except:
        views = np.nan
    try:
        likes = soup.find('segmented-like-dislike-button-view-model', class_ = 'YtSegmentedLikeDislikeButtonViewModelHost style-scope ytd-menu-renderer').text
    except:
        likes = np.nan
    try:
        description = soup.find('ytd-text-inline-expander','style-scope ytd-watch-metadata').text
    except:
        description = np.nan
        
    data.append([title,views,likes,link,description])

100%|████████████████████████████████████████████████████████████████████████████| 1888/1888 [3:00:28<00:00,  5.74s/it]


In [72]:
data

122

In [30]:
df = pd.read_csv('Data.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,title,views,date_time,video_link,thumbnail_link
0,0,Three 90 Challenge | 1.5 Cr Refunded | Geeksfo...,728 views,1 day ago,/watch?v=28wXxweOOFM,https://i.ytimg.com/vi/28wXxweOOFM/hqdefault.jpg
1,1,From Tier 3 College to 20LPA Package 🤑 at Harm...,2.2K views,3 days ago,/watch?v=cGPpUdaISGE,https://i.ytimg.com/vi/cGPpUdaISGE/hqdefault.jpg
2,2,How I Built an AirBnb Clone from Scratch | MER...,1.1K views,4 days ago,/watch?v=koTBLYNq63I,https://i.ytimg.com/vi/koTBLYNq63I/hqdefault.jpg
3,3,Master Web3 in 2024 (The Ultimate Roadmap) | B...,1.2K views,7 days ago,/watch?v=CKrM5ouILBg,https://i.ytimg.com/vi/CKrM5ouILBg/hqdefault.jpg
4,4,🚀 Do This to Learn DSA and Get a Job | *Free R...,3.1K views,11 days ago,/watch?v=VdLUuIeuXqM,https://i.ytimg.com/vi/VdLUuIeuXqM/hqdefault.jpg


In [38]:
for link in df['video_link']:
    link = ('https://www.youtube.com/' + link)
    browser.get(link)
    
    break

WebDriverException: Message: disconnected: not connected to DevTools
  (failed to check if window was closed: disconnected: not connected to DevTools)
  (Session info: chrome=128.0.6613.138)
Stacktrace:
	GetHandleVerifier [0x00007FF6FB269412+29090]
	(No symbol) [0x00007FF6FB1DE239]
	(No symbol) [0x00007FF6FB09B1DA]
	(No symbol) [0x00007FF6FB0828FC]
	(No symbol) [0x00007FF6FB0827C0]
	(No symbol) [0x00007FF6FB09DAC1]
	(No symbol) [0x00007FF6FB135ED9]
	(No symbol) [0x00007FF6FB116493]
	(No symbol) [0x00007FF6FB0E09D1]
	(No symbol) [0x00007FF6FB0E1B31]
	GetHandleVerifier [0x00007FF6FB58871D+3302573]
	GetHandleVerifier [0x00007FF6FB5D4243+3612627]
	GetHandleVerifier [0x00007FF6FB5CA417+3572135]
	GetHandleVerifier [0x00007FF6FB325EB6+801862]
	(No symbol) [0x00007FF6FB1E945F]
	(No symbol) [0x00007FF6FB1E4FB4]
	(No symbol) [0x00007FF6FB1E5140]
	(No symbol) [0x00007FF6FB1D461F]
	BaseThreadInitThunk [0x00007FFC94BA257D+29]
	RtlUserThreadStart [0x00007FFC9524AF28+40]


In [34]:
soup = BeautifulSoup(browser.page_source,'html.parser')
views = soup.find('span','bold style-scope yt-formatted-string').text

WebDriverException: Message: disconnected: not connected to DevTools
  (failed to check if window was closed: disconnected: not connected to DevTools)
  (Session info: chrome=128.0.6613.138)
Stacktrace:
	GetHandleVerifier [0x00007FF6FB269412+29090]
	(No symbol) [0x00007FF6FB1DE239]
	(No symbol) [0x00007FF6FB09B1DA]
	(No symbol) [0x00007FF6FB0828FC]
	(No symbol) [0x00007FF6FB0827C0]
	(No symbol) [0x00007FF6FB09DAC1]
	(No symbol) [0x00007FF6FB135ED9]
	(No symbol) [0x00007FF6FB116493]
	(No symbol) [0x00007FF6FB0E09D1]
	(No symbol) [0x00007FF6FB0E1B31]
	GetHandleVerifier [0x00007FF6FB58871D+3302573]
	GetHandleVerifier [0x00007FF6FB5D4243+3612627]
	GetHandleVerifier [0x00007FF6FB5CA417+3572135]
	GetHandleVerifier [0x00007FF6FB325EB6+801862]
	(No symbol) [0x00007FF6FB1E945F]
	(No symbol) [0x00007FF6FB1E4FB4]
	(No symbol) [0x00007FF6FB1E5140]
	(No symbol) [0x00007FF6FB1D461F]
	BaseThreadInitThunk [0x00007FFC94BA257D+29]
	RtlUserThreadStart [0x00007FFC9524AF28+40]


In [52]:
views

'728 views'

In [74]:
df = pd.DataFrame(data,columns = ['title','views','likes','link','description']) 

In [76]:
df

Unnamed: 0,title,views,likes,link,description
0,Three 90 Challenge | 1.5 Cr Refunded | Geeksfo...,728 views,13,https://www.youtube.com//watch?v=28wXxweOOFM,Three 90 Challenge : ENDING SOON \nREGISTER NO...
1,From Tier 3 College to 20LPA Package 🤑 at Harm...,2.2K views,139,https://www.youtube.com//watch?v=cGPpUdaISGE,"""From a Tier 3 college to a 20 LPA package as ..."
2,How I Built an AirBnb Clone from Scratch | MER...,1.1K views,39,https://www.youtube.com//watch?v=koTBLYNq63I,"In this video, we dive into building a fully f..."
3,Master Web3 in 2024 (The Ultimate Roadmap) | B...,1.2K views,62,https://www.youtube.com//watch?v=CKrM5ouILBg,Unlock the future of the digital economy with ...
4,🚀 Do This to Learn DSA and Get a Job | *Free R...,3.1K views,214,https://www.youtube.com//watch?v=VdLUuIeuXqM,🚀 Do This to Learn DSA and Get a Job | Free Re...
...,...,...,...,...,...
117,Kyun Hai Rohan #successsepareshan? | GeeksforG...,275K views,62,https://www.youtube.com//watch?v=v41P3Y5Xol8,Kya aapko bhi chaiye aise pareshani?\nToh Roha...
118,TREE DATA STRUCTURES | What is Tree? | DSA Cou...,6.8K views,90,https://www.youtube.com//watch?v=AylOfzYJ2qE,"Welcome to the next video of our DSA Course, w..."
119,Three 90 Challenge Extended | 100+ Refunds Pro...,33K views,35,https://www.youtube.com//watch?v=FVCERWiQSDY,Start the Three 90 Challenge Today: https://bi...
120,TYPESCRIPT vs JAVASCRIPT | Which one to Choose...,1.1K views,29,https://www.youtube.com//watch?v=BFf8n2RrKtM,Let us dive into the comparison between TypeSc...


In [40]:
df = df.dropna()

In [42]:
df.to_csv('GFG.csv')