## Web Scraping with `Requests` and `BeautifulSoup`


#### 1. Import `requests` and `BeautifulSoup`, then fetch a URL of your choice:

In [1]:
import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.pokemon.com/us/')

In [2]:
response.status_code

200
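A `200` status code means the request succeeded. As a minimal sketch, the check can be made explicit so that a failed request raises an exception instead of quietly returning an error page. The helper name `fetch_html` and the 10-second timeout are assumptions for illustration, not part of the original notebook:

```python
import requests


def fetch_html(url, timeout=10):
    """Fetch `url` and return its decoded HTML, raising on HTTP errors.

    The function name and the default timeout are illustrative choices.
    """
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    # and does nothing for successful ones.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html('https://www.pokemon.com/us/')` should then return the same markup as `response.text`, assuming the site answers with a success status.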

- The raw HTML of the response, as bytes:

In [3]:
response.content

b'\n\n\n\n\n\n<!DOCTYPE html>\n<html class="no-js " lang="en">\n<head>\n  <meta charset="utf-8" />\n  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" /><script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VQ4OWFZXGwIAXFZTBgI=",licenseKey:"ba34eb72cb",applicationID:"1087113"};window.NREUM||(NREUM={}),__nr_require=function(t,e,n){function r(n){if(!e[n]){var o=e[n]={exports:{}};t[n][0].call(o.exports,function(e){var o=t[n][1][e];return r(o||e)},o,o.exports)}return e[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({1:[function(t,e,n){function r(t){try{c.console&&console.log(t)}catch(e){}}var o,i=t("ee"),a=t(27),c={};try{o=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(c.console=!0,o.indexOf("dev")!==-1&&(c.dev=!0),o.indexOf("nr_dev")!==-1&&(c.nrDev=!0))}catch(s){}c.nrDev&&i.on("internal-error",function(t){r(t.stack)}),c.dev&&i.on("fn-err",function(

- This is not easy to understand; let's get a more readable version with BeautifulSoup instead:

In [4]:
def makesoup(response):
    return BeautifulSoup(response.content, 'html5lib')

makesoup(response)

<!DOCTYPE html>
<html class="no-js" lang="en"><head>
  <meta charset="utf-8"/>
  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/><script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VQ4OWFZXGwIAXFZTBgI=",licenseKey:"ba34eb72cb",applicationID:"1087113"};window.NREUM||(NREUM={}),__nr_require=function(t,e,n){function r(n){if(!e[n]){var o=e[n]={exports:{}};t[n][0].call(o.exports,function(e){var o=t[n][1][e];return r(o||e)},o,o.exports)}return e[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({1:[function(t,e,n){function r(t){try{c.console&&console.log(t)}catch(e){}}var o,i=t("ee"),a=t(27),c={};try{o=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(c.console=!0,o.indexOf("dev")!==-1&&(c.dev=!0),o.indexOf("nr_dev")!==-1&&(c.nrDev=!0))}catch(s){}c.nrDev&&i.on("internal-error",function(t){r(t.stack)}),c.dev&&i.on("fn-err",function(t,e,n){r(n.stack)}),c.

- We can also use BeautifulSoup's `prettify()` method, which returns the document as an indented string:

In [5]:
def makeprettysoup(response):
    return BeautifulSoup(response.content, 'html5lib').prettify()

makeprettysoup(response)

'<!DOCTYPE html>\n<html class="no-js" lang="en">\n <head>\n  <meta charset="utf-8"/>\n  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>\n  <script type="text/javascript">\n   (window.NREUM||(NREUM={})).loader_config={xpid:"VQ4OWFZXGwIAXFZTBgI=",licenseKey:"ba34eb72cb",applicationID:"1087113"};window.NREUM||(NREUM={}),__nr_require=function(t,e,n){function r(n){if(!e[n]){var o=e[n]={exports:{}};t[n][0].call(o.exports,function(e){var o=t[n][1][e];return r(o||e)},o,o.exports)}return e[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({1:[function(t,e,n){function r(t){try{c.console&&console.log(t)}catch(e){}}var o,i=t("ee"),a=t(27),c={};try{o=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(c.console=!0,o.indexOf("dev")!==-1&&(c.dev=!0),o.indexOf("nr_dev")!==-1&&(c.nrDev=!0))}catch(s){}c.nrDev&&i.on("internal-error",function(t){r(t.stack)}),c.dev&&i.on("fn-err",function(t,e,n)
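The notebook displays the return value of `prettify()` as a string literal, so the line breaks show up as `\n`. Passing the string to `print()` renders the indentation instead. The sketch below uses a small inline document (an assumption, chosen so it runs without a network request) and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

# A tiny stand-in document; the real page would come from response.content.
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

pretty = BeautifulSoup(html, "html.parser").prettify()

# prettify() returns a str; print() shows it with real line breaks,
# one tag or text node per line.
print(pretty)
```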

#### 2. Create a `BeautifulSoup` object and navigate the data structure:

In [6]:
soup = BeautifulSoup(response.text, 'html.parser')
soup


<!DOCTYPE html>

<html class="no-js" lang="en">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/><script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VQ4OWFZXGwIAXFZTBgI=",licenseKey:"ba34eb72cb",applicationID:"1087113"};window.NREUM||(NREUM={}),__nr_require=function(t,e,n){function r(n){if(!e[n]){var o=e[n]={exports:{}};t[n][0].call(o.exports,function(e){var o=t[n][1][e];return r(o||e)},o,o.exports)}return e[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({1:[function(t,e,n){function r(t){try{c.console&&console.log(t)}catch(e){}}var o,i=t("ee"),a=t(27),c={};try{o=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(c.console=!0,o.indexOf("dev")!==-1&&(c.dev=!0),o.indexOf("nr_dev")!==-1&&(c.nrDev=!0))}catch(s){}c.nrDev&&i.on("internal-error",function(t){r(t.stack)}),c.dev&&i.on("fn-err",function(t,e,n){r(n.stack)}),c.d

In [7]:
soup.title

<title>The Official Pokémon Website | Pokemon.com | 
  
    Explore the World of Pokémon
  
</title>

In [8]:
soup.title.name

'title'

In [9]:
soup.title.string

'The Official Pokémon Website | Pokemon.com | \n  \n    Explore the World of Pokémon\n  \n'
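The title string keeps the line breaks and indentation from the page source. One possible way to clean it up, shown here as a sketch on a copy of the `<title>` markup from the output above, is to collapse every run of whitespace into a single space:

```python
from bs4 import BeautifulSoup

# The <title> element as it appears in the output above.
html = (
    "<title>The Official Pokémon Website | Pokemon.com | \n"
    "  \n"
    "    Explore the World of Pokémon\n"
    "  \n"
    "</title>"
)
soup = BeautifulSoup(html, "html.parser")

raw_title = soup.title.string
# str.split() with no arguments splits on any run of whitespace,
# so joining the pieces with single spaces normalizes the text.
clean_title = " ".join(raw_title.split())
print(clean_title)
```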

In [10]:
soup.title.parent.name

'head'

In [11]:
soup.p

<p>See how the lightweight Whimsicott has what it takes to bring down the mightiest Pokémon TCG opponents.</p>

In [12]:
soup.a

<a class="nav-toggle" href="#"><span class="icon icon_hamburger"></span></a>

In [13]:
soup.div

<div id="gus-wrapper">
<div class="pokemon-gus-container" data-api="https://www.pokemon.com/api/gus/" data-locale="en" data-site="pcom" data-width="913px"></div>
</div>

In [14]:
soup.a['class']

['nav-toggle']

In [15]:
soup.a['href']

'#'

In [16]:
soup.div['id']

'gus-wrapper'
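Subscript access such as `soup.a['href']` raises a `KeyError` when the attribute is absent, while `.get()` returns `None`. The sketch below illustrates the difference on the first link from the output above, parsed from an inline string so that it runs on its own:

```python
from bs4 import BeautifulSoup

html = '<a class="nav-toggle" href="#"><span class="icon icon_hamburger"></span></a>'
soup = BeautifulSoup(html, "html.parser")
link = soup.a

print(link["href"])        # subscript access; raises KeyError if the attribute is missing
print(link.get("title"))   # .get() returns None when the attribute is missing
print(link.attrs)          # all attributes of the tag as a dict
print(soup.span["class"])  # class is multi-valued, so it comes back as a list
```

Because `class` can hold several values, it is returned as a list even when there is only one, which is why `soup.a['class']` gives `['nav-toggle']` while `soup.a['href']` gives a plain string.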

#### 3. Use the `find()` method to get a result from a specific tag:

In [17]:
soup.find('a')

<a class="nav-toggle" href="#"><span class="icon icon_hamburger"></span></a>

In [18]:
soup.find('p')

<p>See how the lightweight Whimsicott has what it takes to bring down the mightiest Pokémon TCG opponents.</p>

In [19]:
soup.find('div')

<div id="gus-wrapper">
<div class="pokemon-gus-container" data-api="https://www.pokemon.com/api/gus/" data-locale="en" data-site="pcom" data-width="913px"></div>
</div>

In [20]:
soup.find(id="gus-wrapper")

<div id="gus-wrapper">
<div class="pokemon-gus-container" data-api="https://www.pokemon.com/api/gus/" data-locale="en" data-site="pcom" data-width="913px"></div>
</div>
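Besides a tag name or an `id`, `find()` also accepts a CSS class (through the `class_` keyword, since `class` is reserved in Python) and arbitrary attributes (through `attrs`). It returns `None` when nothing matches. A short sketch on a trimmed copy of the `gus-wrapper` markup shown above:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the output above.
html = (
    '<div id="gus-wrapper">'
    '<div class="pokemon-gus-container" data-locale="en" data-site="pcom"></div>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

by_class = soup.find("div", class_="pokemon-gus-container")
# Attribute names containing hyphens cannot be keyword arguments, so use attrs.
by_attr = soup.find("div", attrs={"data-site": "pcom"})
missing = soup.find("table")

print(by_class["data-locale"])
print(by_attr["class"])
print(missing)
```

Since a failed search yields `None`, it is worth checking the result before subscripting it.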

#### 4. Use `find_all()` to get all the results of a given type:

In [21]:
soup.find_all('a')

[<a class="nav-toggle" href="#"><span class="icon icon_hamburger"></span></a>,
 <a href="/us/pokemon-trainer-club/login">
 <div class="avatar-icon-wrapper">
 <img alt="View Profile" class="avatar-icon avatar-icon-mobile" src="https://assets.pokemon.com/static2/_ui/img/chrome/profile-navigation/profile-nav-avatar.png"/>
 </div>
 </a>,
 <a href="/us/pokemon-trainer-club/login">
 <div class="avatar-icon-wrapper">
 <img alt="Log In / Sign Up" class="avatar-icon avatar-icon-mobile" src="https://assets.pokemon.com/static2/_ui/img/chrome/profile-navigation/profile-nav-signup.png"/>
 </div>
 <div class="sign-in-text-wrapper sign-in-text-wrapper-mobile">
                             Log In
                         </div>
 </a>,
 <a data-content-category="" data-content-download="" data-content-id="" data-content-location="" data-content-type="Sidebar" data-content-variation="sidebarLeft" href="/us/" target="_self">
 <span class="fill"></span>
 <span class="icon icon_home">
 </span>
 <span class

In [22]:
all_links = soup.find_all('a')
all_images = soup.find_all('img')
print(len(all_links))
print(len(all_images))

120
52


In [23]:
all_links[0]

<a class="nav-toggle" href="#"><span class="icon icon_hamburger"></span></a>

In [24]:
all_links[-1]

<a class="closeBtn button button-orange no-arrow right" href="#"><i class="icon icon_multiply"></i> Close</a>

In [25]:
all_images[:3]

[<img alt="View Profile" class="avatar-icon avatar-icon-mobile" src="https://assets.pokemon.com/static2/_ui/img/chrome/profile-navigation/profile-nav-avatar.png"/>,
 <img alt="Log In / Sign Up" class="avatar-icon avatar-icon-mobile" src="https://assets.pokemon.com/static2/_ui/img/chrome/profile-navigation/profile-nav-signup.png"/>,
 <img alt="Sign In" class="avatar-icon" src="https://assets.pokemon.com/static2/_ui/img/chrome/profile-navigation/profile-nav-signup.png"/>]
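`find_all()` takes the same filters as `find()`, plus a `limit` argument. As a sketch, the example below keeps only the `<a>` elements that have an `href` and resolves relative links against the page URL with `urllib.parse.urljoin`. The inline HTML and the `BASE_URL` name are assumptions made so the example runs offline:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.pokemon.com/us/"

# Small stand-in for the page; the third link deliberately has no href.
html = (
    '<a class="nav-toggle" href="#"></a>'
    '<a href="/us/pokemon-trainer-club/login">Log In</a>'
    '<a id="no-href">placeholder</a>'
)
soup = BeautifulSoup(html, "html.parser")

# href=True matches any <a> that has an href attribute, whatever its value.
links_with_href = soup.find_all("a", href=True)
# limit stops the search after the given number of matches.
first_only = soup.find_all("a", limit=1)
# Turn relative paths into absolute URLs.
absolute_urls = [urljoin(BASE_URL, a["href"]) for a in links_with_href]

print(len(links_with_href), len(first_only))
print(absolute_urls)
```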

#### 5. Get the `alt` attributes from all the `<img>` elements:

In [26]:
for img in soup.find_all('img'):
    print(img.get('alt'))

View Profile
Log In / Sign Up
Sign In
Profile
Profile
Profile
Top Deck Academy: Small Pokémon Can Do Big Damage, Too
Fletchling Features in March Community Day
Compete in the Spikemuth Cup
See Urshifu’s Two Sword & Shield—Battle Styles Cards
Pokémon Journeys: The Series Recap
Test Your Pokémon TCG: Sword & Shield Series Knowledge with This Background Pokémon Quiz
Top Deck Academy: Small Pokémon Can Do Big Damage, Too
See Urshifu’s Two <em>Sword & Shield—Battle Styles</em> Cards
<em>Pokémon Journeys: The Series</em> Recap
Fletchling Features in March Community Day
Compete in the Spikemuth Cup
Test Your Pokémon TCG: <em>Sword & Shield</em> Series Knowledge with This Background Pokémon Quiz
Pidgey
Rampardos
Palkia
Stoutland
Duosion
Delphox
Aegislash
Toucannon
Komala
Stonjourner
Pidgey
Rampardos
Palkia
Stoutland
Duosion
Delphox
Aegislash
Toucannon
Komala
Stonjourner
Get a Most Melodic Pikachu
Return to the Sinnoh Region
Celebrate the Season of Legends in Pokémon GO from March 1 to June 1
E