As practice, you can have fun trying to scrape YouTube. We will provide you with a notebook to set up the scraper. Here are some ideas for what to do:

1. Scrape the text from each span tag

2. How many images are on YouTube's homepage?

3. Can you find the URL of the link with title = "Movies"? Music? Sports?

4. Now, try connecting to and scraping https://www.youtube.com/results?search_query=stairway+to+heaven

        a. Can you get the names of the first few videos in the search results?

        b. Next, connect to one of the search result videos - https://www.youtube.com/watch?v=qHFxncb1gRY

        c. Can you find the "related" videos? What are their titles? Durations? URLs? Number of views?

        d. Try finding (and scraping) the description of the video.

In [155]:
import requests
from bs4 import BeautifulSoup
from collections import namedtuple
from urllib.parse import urljoin

In [3]:
base_url = 'https://www.youtube.com/'

In [4]:
response = requests.get(base_url)
response

<Response [200]>

In [5]:
html = response.content

In [8]:
soup = BeautifulSoup(html, 'lxml')

In [9]:
with open('youtube.html', 'wb') as yt_file:
    yt_file.write(soup.prettify('utf-8'))

### 1. Scrape the text from each span tag

In [11]:
span_tags = soup.find_all('span')

In [28]:
span_text = [tag.text.strip() for tag in span_tags if tag.text.strip()]
span_text

['IN',
 'IN',
 'Sign in',
 'Search',
 'Trending',
 'Home',
 'Home',
 'Home',
 'Trending',
 'Trending',
 'Trending',
 'History',
 'History',
 'History',
 'Get YouTube Premium',
 'Get YouTube Premium',
 'Get YouTube Premium',
 'Music',
 'Music',
 'Music',
 'Sports',
 'Sports',
 'Sports',
 'Gaming',
 'Gaming',
 'Gaming',
 'Films',
 'Films',
 'Films',
 'News',
 'News',
 'News',
 'Live',
 'Live',
 'Live',
 'Fashion',
 'Fashion',
 'Fashion',
 'Learning',
 'Learning',
 'Learning',
 'Spotlight',
 'Spotlight',
 'Spotlight',
 '360° Video',
 '360° Video',
 '360° Video',
 'Browse channels',
 'Browse channels',
 'Browse channels',
 'Sign in',
 'Watch Queue',
 'Queue',
 'Remove allDisconnect',
 'Remove all',
 'Disconnect',
 'The next video is starting',
 'stop',
 'stop',
 'Loading...',
 'Recommended',
 '3:56',
 '3:56',
 '- Duration: 3:56.',
 '5:17',
 '5:17',
 '- Duration: 5:17.',
 '17:49',
 '17:49',
 '- Duration: 17:49.',
 '1:19:30',
 '1:19:30',
 '- Duration: 1:19:30.',
 '23:08',
 '23:08',
 '- Du

In [29]:
len(span_text)

405

### 2. How many images are on YouTube's homepage?

In [30]:
img_tags = soup.find_all('img')

In [32]:
len(img_tags)

119

### 3. Can you find the URL of the link with title = "Movies"? Music? Sports?

The current HTML page has no tag with `title="Movies"`, but it has one tag each for Music and Sports.

Both are `a` tags.

In [42]:
music_tag = soup.find('a', title='Music')
sports_tag = soup.find('a', title='Sports')
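Because the page has no tag with `title="Movies"`, `find` returns `None` for that title, and indexing the result with `['href']` would raise a `TypeError`. The following is a minimal sketch of a guarded lookup. The HTML snippet, the `href` values, and the helper name `href_for_title` are made up for illustration and are not YouTube's actual markup; the sketch uses the built-in `html.parser` so it does not depend on `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not real YouTube markup).
sample_html = """
<ul>
  <li><a href="/channel/music-id" title="Music">Music</a></li>
  <li><a href="/channel/sports-id" title="Sports">Sports</a></li>
</ul>
"""
sample_soup = BeautifulSoup(sample_html, 'html.parser')


def href_for_title(soup, title):
    """Return the href of the first <a> with the given title, or None."""
    tag = soup.find('a', title=title)
    # find() returns None when no tag matches, so guard before reading href.
    return tag.get('href') if tag is not None else None


for name in ('Movies', 'Music', 'Sports'):
    print(name, href_for_title(sample_soup, name))
```

With this sample, the lookup for Movies yields `None` instead of an exception, while Music and Sports yield their relative links.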

In [39]:
music_tag['href']

'/channel/UC-9-kyTW8ZkZNDHQJ6FgpwQ'

The link obtained is a relative URL. We can convert it into an absolute URL.

In [41]:
relative_url_music = music_tag['href']
music_url = urljoin(base_url, relative_url_music)
music_url

'https://www.youtube.com/channel/UC-9-kyTW8ZkZNDHQJ6FgpwQ'

In [43]:
relative_url_sports = sports_tag['href']
sports_url = urljoin(base_url, relative_url_sports)
sports_url

'https://www.youtube.com/channel/UCEgdi0XIXXZ-qJOFPf4JSKw'
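The `urljoin` calls above resolve a root-relative path against the site's base URL. As a small sketch of how `urljoin` treats a few common cases (the variable names here are illustrative only):

```python
from urllib.parse import urljoin

base = 'https://www.youtube.com/'

# A root-relative path replaces the path of the base URL.
music = urljoin(base, '/channel/UC-9-kyTW8ZkZNDHQJ6FgpwQ')

# An already-absolute URL is returned unchanged.
absolute = urljoin(base, 'https://www.youtube.com/watch?v=qHFxncb1gRY')

# The base may be a full page URL; for a link starting with '/',
# only the scheme and host of the base are kept.
related = urljoin('https://www.youtube.com/watch?v=qHFxncb1gRY',
                  '/watch?v=Nnu1E5Kslig')

print(music)
print(absolute)
print(related)
```

This is why the page URL itself can serve as the base when resolving links found on that page.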

### Scraping https://www.youtube.com/results?search_query=stairway+to+heaven

In [45]:
search_url = 'https://www.youtube.com/results?search_query=stairway+to+heaven'
r = requests.get(search_url)
html = r.content
soup = BeautifulSoup(html, 'lxml')
with open('YTsearch.html', 'wb') as yt_file:
    yt_file.write(soup.prettify('utf-8'))

### Get the names of the first few videos

Each search result sits in a `div` tag with `class="yt-lockup-content"`; the video title is the text of the first `a` tag inside it.

In [46]:
div_tags = soup.find_all('div', class_='yt-lockup-content')

In [52]:
video_titles = [tag.find('a').text for tag in div_tags]

In [53]:
video_titles

['Led Zeppelin -  Stairway to Heaven Live',
 'Stairway to Heaven Led Zeppelin Lyrics',
 'Led Zeppelin - Stairway To Heaven (Official Audio)',
 'Led Zeppelin - Stairway To Heaven',
 'Heart - Stairway to Heaven (Live at Kennedy Center Honors) [FULL VERSION]',
 'Stairway To Heaven',
 'Led Zeppelin "Stairway to Heaven" performed by The Classic Rock Show',
 'Led Zeppelin Live Aid 1985 3 Stairway to Heaven Stereo',
 'STAIRWAY TO HEAVEN -  Flashmob - LEGENDADO PORTUGUÊS - INGLÊS',
 'stairway to heaven backwards',
 'stairway to heaven love song',
 'Stairway To Heaven Led Zeppelin Guitar Lesson + Tutorial',
 'Metallica: Nothing Else Matters (Official Music Video)',
 "Led Zeppelin Celebration Day 10 dicembre 2007 presso l'O2 Arena di Londra",
 'The Kennedy Center Honors Led Zeppelin 2012',
 'Deep Purple - Highway Star 1972 Video HQ',
 'Led Zeppelin - Achilles Last Stand (Live Knebworth 1979)',
 "Legendary Licks You Think Are Easy (but aren't)",
 'Homenagem ao Led Zeppelin com Stairway to Heaven'

### Find the "related" videos. What are their titles? Durations? URLs? Number of views?

https://www.youtube.com/watch?v=qHFxncb1gRY

In [54]:
url = 'https://www.youtube.com/watch?v=qHFxncb1gRY'
r = requests.get(url)
html = r.content
soup = BeautifulSoup(html, 'lxml')
with open('YTsearchpt2.html', 'wb') as yt_file:
    yt_file.write(soup.prettify('utf-8'))

In [99]:
anchor_tag = soup.find_all('li', class_="video-list-item related-list-item")

In [133]:
related_titles = [tag.find('a')['title'].strip() for tag in anchor_tag 
                  if tag.find('a',class_='content-link')]
related_titles

['Led Zeppelin - Stairway To Heaven',
 '「Eagles」Hotel California lyrics (HD)',
 'Scorpions - Still Loving You ( lyrics )',
 'Pink Floyd - Another Brick In The Wall',
 'Lynyrd Skynyrd - Simple Man (Subtitulado Español/Inglés)',
 'Metallica - Nothing else matter lyrics',
 'The Animals - House Of The Rising Sun (LYRICS)',
 'Dire Straits-Sultans of Swing (with lyrics)',
 'Have You Ever Seen The Rain ? - Creedence Clearwater Revival (lyrics)',
 'Eagles - Hotel California',
 'Pink Floyd - Comfortably Numb With Lyrics',
 'Metallica - "Nothing Else Matters" Lyrics (HD)',
 'Hotel California',
 'Nirvana - Smells Like Teen Spirit (Lyrics)',
 'Eric Clapton - Tears In Heaven (lyrics)',
 "Guns N' Roses - Knockin' on Heaven's door Lyrics",
 'Aerosmith - Dream on- Lyrics',
 'Pink Floyd-Wish You Were Here (Lyrics)',
 'Deep Purple - Soldier of Fortune Lyrics']

In [161]:
related_durations = [tag.find('span', class_='video-time').text for tag in anchor_tag
             if tag.find('span', class_='video-time')]
related_durations

['8:02',
 '6:30',
 '6:27',
 '10:16',
 '5:53',
 '6:37',
 '4:30',
 '5:51',
 '2:36',
 '6:31',
 '6:25',
 '6:31',
 '7:27',
 '5:03',
 '4:37',
 '5:37',
 '4:28',
 '5:18',
 '3:14']
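The durations are scraped as strings, which makes them awkward to sort or sum. One possible way to convert them to seconds is sketched below; the helper name `duration_to_seconds` is hypothetical, and the sketch assumes every value has the form `M:SS` or `H:MM:SS`, as in the output seen so far.

```python
def duration_to_seconds(duration):
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    # Each further field is worth 60 times less than the one before it.
    for part in duration.strip().split(':'):
        seconds = seconds * 60 + int(part)
    return seconds


sample_durations = ['8:02', '10:16', '1:19:30']
print([duration_to_seconds(d) for d in sample_durations])
# [482, 616, 4770]
```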

In [143]:
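# The href values are relative URLs, so name the list accordingly.
relative_urls = [tag.find('a')['href'].strip() for tag in anchor_tag
                 if tag.find('a', class_='content-link')]

In [159]:
# Resolve each relative link against the page URL to get an absolute URL.
related_urls = [urljoin(url, link) for link in relative_urls]
related_urls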

['https://www.youtube.com/watch?v=Nnu1E5Kslig',
 'https://www.youtube.com/watch?v=w1pK5anfAVM',
 'https://www.youtube.com/watch?v=EYyarcp5LtU',
 'https://www.youtube.com/watch?v=Y9d72n2fX6g',
 'https://www.youtube.com/watch?v=fx4qXtdaNpE',
 'https://www.youtube.com/watch?v=x7bIbVlIqEc',
 'https://www.youtube.com/watch?v=y2oKRKZnEoA',
 'https://www.youtube.com/watch?v=nRIiyCWRGTo',
 'https://www.youtube.com/watch?v=pmTiK9jp970',
 'https://www.youtube.com/watch?v=EqPtz5qN7HM',
 'https://www.youtube.com/watch?v=XpqjEnRU6uM',
 'https://www.youtube.com/watch?v=TJJ_qnoF_yI',
 'https://www.youtube.com/watch?v=jVHhV3A5C5c',
 'https://www.youtube.com/watch?v=ukWaogFC0O8',
 'https://www.youtube.com/watch?v=ZqtyQuXo9zM',
 'https://www.youtube.com/watch?v=FoYD-IT-jRI',
 'https://www.youtube.com/watch?v=L9srmft6STc',
 'https://www.youtube.com/watch?v=1tGO1Y4FGpI',
 'https://www.youtube.com/watch?v=d9hmm6MZ3GY']

In [160]:
# The slice [:-5] drops the last 5 characters, assumed to be the word 'views'.
related_views = [tag.find('span', class_='stat view-count').text[:-5].strip() for tag in anchor_tag
                 if tag.find('span', class_='stat view-count')]

related_views

['2,87,99,739',
 '7,64,205',
 '5,67,06,864',
 '36,06,768',
 '75,56,783',
 '7,10,06,979',
 '1,88,36,818',
 '5,31,074',
 '1,05,30,006',
 '47,00,76,198',
 '86,36,850',
 '56,44,910',
 '49,56,955',
 '12,48,387',
 '1,61,80,224',
 '17,04,652',
 '72,54,409',
 '2,96,01,307',
 '85,28,985']
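The view counts come back as strings with Indian-style digit grouping (for example `'2,87,99,739'`), so they cannot be compared numerically as they stand. A minimal sketch of turning them into integers follows; the helper name `parse_view_count` is hypothetical, and the sketch assumes the only non-digit characters are commas.

```python
def parse_view_count(text):
    """Convert a comma-grouped count such as '2,87,99,739' into an int."""
    # The grouping style does not matter once the commas are removed.
    return int(text.replace(',', ''))


sample_views = ['2,87,99,739', '7,64,205', '47,00,76,198']
print([parse_view_count(v) for v in sample_views])
# [28799739, 764205, 470076198]
```

With integer counts, the related videos could then be sorted by popularity, for instance with `sorted(..., reverse=True)`.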

In [157]:
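# Record type for one related video; named after the type it creates.
Video = namedtuple('Video', 'Title Duration URL Views')

In [162]:
# zip pairs the lists by position, so they must be in the same order.
for fields in zip(related_titles, related_durations, related_urls, related_views):
    print(Video(*fields))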

Video(Title='Led Zeppelin - Stairway To Heaven', Duration='8:02', URL='https://www.youtube.com/watch?v=Nnu1E5Kslig', Views='2,87,99,739')
Video(Title='「Eagles」Hotel California lyrics (HD)', Duration='6:30', URL='https://www.youtube.com/watch?v=w1pK5anfAVM', Views='7,64,205')
Video(Title='Scorpions - Still Loving You ( lyrics )', Duration='6:27', URL='https://www.youtube.com/watch?v=EYyarcp5LtU', Views='5,67,06,864')
Video(Title='Pink Floyd - Another Brick In The Wall', Duration='10:16', URL='https://www.youtube.com/watch?v=Y9d72n2fX6g', Views='36,06,768')
Video(Title='Lynyrd Skynyrd - Simple Man (Subtitulado Español/Inglés)', Duration='5:53', URL='https://www.youtube.com/watch?v=fx4qXtdaNpE', Views='75,56,783')
Video(Title='Metallica - Nothing else matter lyrics', Duration='6:37', URL='https://www.youtube.com/watch?v=x7bIbVlIqEc', Views='7,10,06,979')
Video(Title='The Animals - House Of The Rising Sun (LYRICS)', Duration='4:30', URL='https://www.youtube.com/watch?v=y2oKRKZnEoA', Views=

### Getting the description of the video

In [171]:
soup.find('p', {'id':'eow-description'}).text

'"Stairway to Heaven" is a song by the English rock band Led Zeppelin. It was composed by guitarist Jimmy Page and vocalist Robert Plant for the band\'s fourth unnamed studio album, (see Led Zeppelin IV (1971)). The song was voted #3 in 2000 by VH1 on their list of the 100 Greatest Rock Songs.[1] It was the most requested song on FM radio stations in the United States in the 1970s, despite never having been released as a single there.[2] In November 2007, through download sales promoting Led Zeppelin\'s Mothership release, "Stairway to Heaven" hit #37 on the UK Singles Chart.[3]'