# Web Scraping Lab

I am going to use Python and several Python libraries, such as Beautiful Soup, to find the most subscribed individual YouTube channels that are not backed by a big corporation or a large team. Beautiful Soup is a Python library designed for pulling data out of HTML and XML files. It parses a document into a tree of Python objects, which we can then navigate and filter to extract exactly the data we are looking for. In this lab I will be working with HTML.

In [5]:
# mamba is not available in this environment (the shell reports "mamba: command not found"), so install everything with pip
!pip install beautifulsoup4==4.10.0
!pip install lxml==4.6.4
!pip install html5lib==1.1
# !pip install requests==2.26.0


In [8]:
from bs4 import BeautifulSoup  # this module helps with web scraping
import requests  # this module helps us to download a web page

Let us consider the following HTML:

In [29]:
%%html
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h3><b id="boldest">PewDiePie</b></h3>
<p> Subscribers: 111,000,000 </p>
<h3> MrBeast </h3>
<p> Subscribers: 92,100,000 </p>
<h3> Kids Diana Show </h3>
<p> Subscribers: 91,900,000 </p>
</body>
</html>

In [30]:
# We can store the information as a string in the variable labeled html

html ="<!DOCTYPE html><html><head><title>Page Title</title></head><body><h3><b id='boldest'>PewDiePie</b></h3><p> Subscribers: 111,000,000 </p><h3> MrBeast</h3><p> Subscribers: 92,100, 000 </p><h3> Kids Diana Show </h3><p> Subscribers: 91,900, 000</p></body></html>"

In [31]:
# To parse a document, we pass it to the BeautifulSoup constructor

soup = BeautifulSoup(html, "html.parser")

In [32]:
# We can use the method prettify() to display HTML in a nested structure

print(soup.prettify())

<!DOCTYPE html>
<html>
 <head>
  <title>
   Page Title
  </title>
 </head>
 <body>
  <h3>
   <b id="boldest">
    PewDiePie
   </b>
  </h3>
  <p>
   Subscribers: 111,000,000
  </p>
  <h3>
   MrBeast
  </h3>
  <p>
   Subscribers: 92,100, 000
  </p>
  <h3>
   Kids Diana Show
  </h3>
  <p>
   Subscribers: 91,900, 000
  </p>
 </body>
</html>


### Tags

A Tag object corresponds to an HTML tag in the original document. We can use Tag objects to find the title of the page and the top subscribed channel based on our criteria.

In [33]:
tag_object = soup.title
print('tag object:', tag_object)

tag object: <title>Page Title</title>


In [37]:
# We can also see the tag type

print("tag object type:",type(tag_object))

tag object type: <class 'bs4.element.Tag'>


In [39]:
# If there is more than one tag with the same name, the first element with that tag name is returned

tag_object = soup.h3
tag_object

<h3><b id="boldest">PewDiePie</b></h3>
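Because `soup.h3` stops at the first match, a short sketch of collecting every channel heading may help. It assumes the same channel markup as the `html` string above (with the stray spaces in the subscriber counts removed) and uses `find_all()`, which the Find All section covers in more detail.

```python
from bs4 import BeautifulSoup

# Same channel data as the html string above (subscriber counts tidied up)
html = (
    "<html><body>"
    "<h3><b id='boldest'>PewDiePie</b></h3><p> Subscribers: 111,000,000 </p>"
    "<h3> MrBeast</h3><p> Subscribers: 92,100,000 </p>"
    "<h3> Kids Diana Show </h3><p> Subscribers: 91,900,000</p>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Attribute-style access returns only the first matching tag
first_h3 = soup.h3

# find_all() returns every matching tag, in document order
all_h3 = soup.find_all("h3")
channel_names = [h3.get_text(strip=True) for h3 in all_h3]
print(channel_names)  # ['PewDiePie', 'MrBeast', 'Kids Diana Show']
```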

### Children, Parents, and Siblings

In [40]:
# We can navigate the tag tree and find the child

tag_child = tag_object.b
tag_child

#b is the child of h3


<b id="boldest">PewDiePie</b>

In [41]:
# Parent tag

parent_tag = tag_child.parent
parent_tag

<h3><b id="boldest">PewDiePie</b></h3>

In [43]:
tag_object

<h3><b id="boldest">PewDiePie</b></h3>

In [42]:
# The parent of the h3 element is the body element

tag_object.parent

<body><h3><b id="boldest">PewDiePie</b></h3><p> Subscribers: 111,000,000 </p><h3> MrBeast</h3><p> Subscribers: 92,100, 000 </p><h3> Kids Diana Show </h3><p> Subscribers: 91,900, 000</p></body>

In [46]:
# The next sibling of the h3 element is the paragraph element

sibling_1 = tag_object.next_sibling
sibling_1

<p> Subscribers: 111,000,000 </p>

In [47]:
sibling_2 = sibling_1.next_sibling
sibling_2

<h3> MrBeast</h3>
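One caveat with `next_sibling`: it returns the very next node, which can be a whitespace string rather than a tag when the HTML has line breaks between elements (as in the `%%html` cell). The sketch below uses a hypothetical multi-line version of the channel markup to show the difference; `find_next_sibling()` skips text nodes and returns the next tag that matches.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: same structure as before, but with line breaks between tags
pretty_html = """<body>
<h3><b id="boldest">PewDiePie</b></h3>
<p> Subscribers: 111,000,000 </p>
<h3> MrBeast </h3>
</body>"""
soup = BeautifulSoup(pretty_html, "html.parser")
h3 = soup.h3

# The newline after </h3> is itself a sibling (a NavigableString)
raw_sibling = h3.next_sibling
print(raw_sibling == "\n")  # True

# find_next_sibling() skips text nodes and returns the next matching tag
next_tag = h3.find_next_sibling("p")
print(next_tag)  # <p> Subscribers: 111,000,000 </p>

next_h3 = h3.find_next_sibling("h3")
print(next_h3.get_text(strip=True))  # MrBeast
```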

### HTML Attributes

A tag can have attributes. The tag `<b id="boldest">` has an attribute `id` whose value is `boldest`. We can access a tag's attributes by treating the tag like a dictionary.

In [48]:
tag_child["id"]

'boldest'

In [49]:
# The attrs property returns all of the tag's attributes as a dictionary

tag_child.attrs

{'id': 'boldest'}

In [50]:
# The get method also works to access an attribute's value

tag_child.get("id")

'boldest'
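Square-bracket access and `get()` behave differently when an attribute is missing. A minimal sketch, assuming the same `<b id="boldest">` tag and a hypothetical `title` attribute that it does not have:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h3><b id='boldest'>PewDiePie</b></h3>", "html.parser")
tag_child = soup.b

# Square-bracket access raises KeyError when the attribute is missing
try:
    tag_child["title"]
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True

# get() returns None, or a default you supply, instead of raising
print(tag_child.get("title"))              # None
print(tag_child.get("title", "untitled"))  # untitled

# has_attr() checks for an attribute without reading its value
print(tag_child.has_attr("id"))  # True
```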

### Navigable String

A string corresponds to a bit of text within a tag. Beautiful Soup uses the NavigableString class to contain these bits of text.

In [51]:
tag_string = tag_child.string
tag_string

'PewDiePie'

In [52]:
# We can verify that the type is NavigableString

type(tag_string)

bs4.element.NavigableString

In [53]:
# A NavigableString is just like a Unicode string and we can convert it to a string object

unicode_string = str(tag_string)
unicode_string

'PewDiePie'
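`.string` only works when a tag has a single string to return. The hypothetical `<p>` below mixes text with a nested `<b>` tag, so its `.string` is `None`, while `get_text()` still collects all of the text:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<h3><b id='boldest'>PewDiePie</b></h3>"
    "<p>Top channel: <b>PewDiePie</b> with 111,000,000 subscribers</p>",
    "html.parser",
)

# <h3> has one child (<b>) that holds one string, so .string passes it through
h3_string = soup.h3.string
print(h3_string)  # PewDiePie

# <p> has three children (text, <b>, text), so .string cannot choose one
p_string = soup.p.string
print(p_string)  # None

# get_text() joins every string inside the tag
p_text = soup.p.get_text()
print(p_text)  # Top channel: PewDiePie with 111,000,000 subscribers
```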

### Filter

Filtering lets us find complex patterns, such as tags with a given name or attribute value, so that we can extract and interpret the data we need. Consider the following HTML table of country GDPs.

In [54]:
%%html
<table>
    <tr>
        <td id = "Rank" > GDP Rank </td>
        <td>Country</td>
        <td> GDP</td>
    </tr>
    <tr>
        <td>1</td>
        <td><a href ="https://en.wikipedia.org/wiki/United_States">United States</a></td>
        <td>20.89 trillion</td>
    </tr>
    <tr>
        <td>2</td>
        <td><a href = "https://en.wikipedia.org/wiki/China">China</a></td>
        <td>14.72 trillion</td>
    </tr>
    <tr>
        <td>3</td>
        <td><a href = "https://en.wikipedia.org/wiki/Japan">Japan</a></td>
        <td>5.07 trillion</td>
    </tr>
</table>

| GDP Rank | Country | GDP |
|---|---|---|
| 1 | United States | 20.89 trillion |
| 2 | China | 14.72 trillion |
| 3 | Japan | 5.07 trillion |


In [66]:
# We can store it as a string

table = '<table><tr><td id = "Rank" > GDP Rank </td><td>Country</td><td> GDP</td></tr><tr><td>1</td><td><a href ="https://en.wikipedia.org/wiki/United_States">United States</a></td><td>20.89 trillion</td></tr><tr><td>2</td><td><a href = "https://en.wikipedia.org/wiki/China">China</a></td><td>14.72 trillion</td></tr><tr><td>3</td><td><a href = "https://en.wikipedia.org/wiki/Japan">Japan</a></td><td>5.07 trillion</td></tr></table>'

In [67]:
table_bs = BeautifulSoup(table,"html.parser")

### Find All

The find_all() method searches through all of a tag's descendants and returns every tag that matches our filters.

In [76]:
table_rows=table_bs.find_all("tr")
table_rows

[<tr><td id="Rank"> GDP Rank </td><td>Country</td><td> GDP</td></tr>,
 <tr><td>1</td><td><a href="https://en.wikipedia.org/wiki/United_States">United States</a></td><td>20.89 trillion</td></tr>,
 <tr><td>2</td><td><a href="https://en.wikipedia.org/wiki/China">China</a></td><td>14.72 trillion</td></tr>,
 <tr><td>3</td><td><a href="https://en.wikipedia.org/wiki/Japan">Japan</a></td><td>5.07 trillion</td></tr>]

In [77]:
first_row=table_rows[0]
first_row

<tr><td id="Rank"> GDP Rank </td><td>Country</td><td> GDP</td></tr>

In [78]:
print(type(first_row))

<class 'bs4.element.Tag'>


In [79]:
first_row.td

<td id="Rank"> GDP Rank </td>

In [80]:
# If we iterate through the list, each element corresponds to a row in the table

for i,row in enumerate(table_rows):
    print("row",i,"is",row)

row 0 is <tr><td id="Rank"> GDP Rank </td><td>Country</td><td> GDP</td></tr>
row 1 is <tr><td>1</td><td><a href="https://en.wikipedia.org/wiki/United_States">United States</a></td><td>20.89 trillion</td></tr>
row 2 is <tr><td>2</td><td><a href="https://en.wikipedia.org/wiki/China">China</a></td><td>14.72 trillion</td></tr>
row 3 is <tr><td>3</td><td><a href="https://en.wikipedia.org/wiki/Japan">Japan</a></td><td>5.07 trillion</td></tr>


In [81]:
for i,row in enumerate(table_rows):
    print("row",i)
    cells=row.find_all("td")
    for j,cell in enumerate (cells):
        print("column",j,"cell",cell)

row 0
column 0 cell <td id="Rank"> GDP Rank </td>
column 1 cell <td>Country</td>
column 2 cell <td> GDP</td>
row 1
column 0 cell <td>1</td>
column 1 cell <td><a href="https://en.wikipedia.org/wiki/United_States">United States</a></td>
column 2 cell <td>20.89 trillion</td>
row 2
column 0 cell <td>2</td>
column 1 cell <td><a href="https://en.wikipedia.org/wiki/China">China</a></td>
column 2 cell <td>14.72 trillion</td>
row 3
column 0 cell <td>3</td>
column 1 cell <td><a href="https://en.wikipedia.org/wiki/Japan">Japan</a></td>
column 2 cell <td>5.07 trillion</td>
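Printing whole `<td>` tags is useful for exploring, but usually we want just the text. A sketch that turns the same GDP table into a list of rows, each a list of stripped cell strings (the names `rows`, `header`, and `body` are only for illustration):

```python
from bs4 import BeautifulSoup

# Same GDP table string as above
table = (
    '<table><tr><td id = "Rank" > GDP Rank </td><td>Country</td><td> GDP</td></tr>'
    '<tr><td>1</td><td><a href ="https://en.wikipedia.org/wiki/United_States">United States</a></td><td>20.89 trillion</td></tr>'
    '<tr><td>2</td><td><a href = "https://en.wikipedia.org/wiki/China">China</a></td><td>14.72 trillion</td></tr>'
    '<tr><td>3</td><td><a href = "https://en.wikipedia.org/wiki/Japan">Japan</a></td><td>5.07 trillion</td></tr></table>'
)
table_bs = BeautifulSoup(table, "html.parser")

# Each <tr> becomes a list of the visible, whitespace-stripped cell texts
rows = [
    [cell.get_text(strip=True) for cell in row.find_all("td")]
    for row in table_bs.find_all("tr")
]

header, body = rows[0], rows[1:]
print(header)   # ['GDP Rank', 'Country', 'GDP']
print(body[0])  # ['1', 'United States', '20.89 trillion']
```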


In [82]:
# If we pass a list, find_all() matches any tag whose name is in that list

list_input = table_bs.find_all(name=["tr", "td"])
list_input

[<tr><td id="Rank"> GDP Rank </td><td>Country</td><td> GDP</td></tr>,
 <td id="Rank"> GDP Rank </td>,
 <td>Country</td>,
 <td> GDP</td>,
 <tr><td>1</td><td><a href="https://en.wikipedia.org/wiki/United_States">United States</a></td><td>20.89 trillion</td></tr>,
 <td>1</td>,
 <td><a href="https://en.wikipedia.org/wiki/United_States">United States</a></td>,
 <td>20.89 trillion</td>,
 <tr><td>2</td><td><a href="https://en.wikipedia.org/wiki/China">China</a></td><td>14.72 trillion</td></tr>,
 <td>2</td>,
 <td><a href="https://en.wikipedia.org/wiki/China">China</a></td>,
 <td>14.72 trillion</td>,
 <tr><td>3</td><td><a href="https://en.wikipedia.org/wiki/Japan">Japan</a></td><td>5.07 trillion</td></tr>,
 <td>3</td>,
 <td><a href="https://en.wikipedia.org/wiki/Japan">Japan</a></td>,
 <td>5.07 trillion</td>]
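Besides a single name or a list, `find_all()` also accepts a compiled regular expression or a function as a filter. A short sketch against the same GDP table; `is_linked_cell` is a hypothetical helper written for this example:

```python
import re

from bs4 import BeautifulSoup

table = (
    '<table><tr><td id="Rank"> GDP Rank </td><td>Country</td><td> GDP</td></tr>'
    '<tr><td>1</td><td><a href="https://en.wikipedia.org/wiki/United_States">United States</a></td><td>20.89 trillion</td></tr>'
    '<tr><td>2</td><td><a href="https://en.wikipedia.org/wiki/China">China</a></td><td>14.72 trillion</td></tr>'
    '<tr><td>3</td><td><a href="https://en.wikipedia.org/wiki/Japan">Japan</a></td><td>5.07 trillion</td></tr></table>'
)
table_bs = BeautifulSoup(table, "html.parser")

# A compiled regex is matched against each tag name with re.search()
t_tags = table_bs.find_all(re.compile("^t"))
tag_names = sorted({tag.name for tag in t_tags})
print(tag_names)  # ['table', 'td', 'tr']


def is_linked_cell(tag):
    """Return True for <td> cells that contain a link."""
    return tag.name == "td" and tag.find("a") is not None


linked_cells = table_bs.find_all(is_linked_cell)
linked_countries = [cell.get_text() for cell in linked_cells]
print(linked_countries)  # ['United States', 'China', 'Japan']
```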

### Attributes

If find_all() receives a keyword argument it does not recognize, such as id or href, it turns that argument into a filter on the tag's attributes.

In [85]:
table_bs.find_all(id= "Rank")

[<td id="Rank"> GDP Rank </td>]
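Some attribute names cannot be written as Python keyword arguments: `class` is a reserved word, and names such as `data-rank` contain a hyphen. Beautiful Soup provides the `class_` keyword and the `attrs` dictionary for these cases. The markup below is hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a hyphenated data-* attribute and a class attribute
html = (
    '<td data-rank="1" class="country">United States</td>'
    '<td data-rank="2" class="country">China</td>'
)
soup = BeautifulSoup(html, "html.parser")

# data-rank is not a valid keyword argument, so pass it through attrs
rank_two = [td.get_text() for td in soup.find_all(attrs={"data-rank": "2"})]
print(rank_two)  # ['China']

# class is a reserved word in Python, so Beautiful Soup uses class_
country_names = [td.get_text() for td in soup.find_all("td", class_="country")]
print(country_names)  # ['United States', 'China']
```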

In [87]:
# We can find the elements that link to the United States Wikipedia page

list_input = table_bs.find_all(href="https://en.wikipedia.org/wiki/United_States")
list_input

[<a href="https://en.wikipedia.org/wiki/United_States">United States</a>]

In [89]:
# If we set href to True, find_all() returns every element that has an href attribute

list_input = table_bs.find_all(href=True)
list_input

[<a href="https://en.wikipedia.org/wiki/United_States">United States</a>,
 <a href="https://en.wikipedia.org/wiki/China">China</a>,
 <a href="https://en.wikipedia.org/wiki/Japan">Japan</a>]

In [92]:
# To keep only links whose href contains "https", pass a compiled regular expression.
# (find_all("https" == True) does not work: the comparison is evaluated first and gives
# False, and find_all(False) matches every tag in the document.)
import re

list_input = table_bs.find_all(href=re.compile("https"))
list_input

[<a href="https://en.wikipedia.org/wiki/United_States">United States</a>,
 <a href="https://en.wikipedia.org/wiki/China">China</a>,
 <a href="https://en.wikipedia.org/wiki/Japan">Japan</a>]

In [93]:
# We can also search for strings instead of tags

table_bs.find_all(string="United States")

['United States']
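Two related tools, sketched on the same GDP table: `string=` also accepts a regular expression, and `find()` returns only the first match (or `None`) instead of a list.

```python
import re

from bs4 import BeautifulSoup

table = (
    '<table><tr><td id="Rank"> GDP Rank </td><td>Country</td><td> GDP</td></tr>'
    '<tr><td>1</td><td><a href="https://en.wikipedia.org/wiki/United_States">United States</a></td><td>20.89 trillion</td></tr>'
    '<tr><td>2</td><td><a href="https://en.wikipedia.org/wiki/China">China</a></td><td>14.72 trillion</td></tr>'
    '<tr><td>3</td><td><a href="https://en.wikipedia.org/wiki/Japan">Japan</a></td><td>5.07 trillion</td></tr></table>'
)
table_bs = BeautifulSoup(table, "html.parser")

# string= with a regex matches every text node that contains "trillion"
trillions = table_bs.find_all(string=re.compile("trillion"))
print(trillions)  # ['20.89 trillion', '14.72 trillion', '5.07 trillion']

# find() returns the first match only
first_link_text = table_bs.find("a").get_text()
print(first_link_text)  # United States

# ...and None when nothing matches (this table has no <th> cells)
missing = table_bs.find("th")
print(missing)  # None
```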

### Downloading and Scraping the Contents of a Web Page

In [110]:
# We can download the contents of the following web page

url = "https://www.reddit.com/"

# requests.get() downloads the page, and .text returns the response body as a string

data = requests.get(url).text

In [111]:
# Create a BeautifulSoup object

soup = BeautifulSoup(data,"html.parser")

In [112]:
# Scrape all links

for link in soup.find_all("a", href = True):
    print(link.get("href"))

/
https://ads.reddit.com?utm_source=d2x_consumer&utm_name=top_nav_cta
https://www.reddit.com/login/?dest=https%3A%2F%2Fwww.reddit.com%2F
https://www.reddit.com/register/?dest=https%3A%2F%2Fwww.reddit.com%2F
/hot/
/new/
/top/
/rising/
/hot/
/new/
/top/
/rising/
/r/Damnthatsinteresting/
https://www.reddit.com/r/Damnthatsinteresting/comments/tjqoh3/ukrainian_troops_are_now_deploying_panzerfaust3it/
/r/Damnthatsinteresting/comments/tjqoh3/ukrainian_troops_are_now_deploying_panzerfaust3it/
/r/Damnthatsinteresting/?f=flair_name%3A%22Image%22
/r/Damnthatsinteresting/comments/tjqoh3/ukrainian_troops_are_now_deploying_panzerfaust3it/
/r/Damnthatsinteresting/comments/tjqoh3/ukrainian_troops_are_now_deploying_panzerfaust3it/
/r/funny/
https://www.reddit.com/r/funny/comments/tjpeji/i_spotted_the_cameras_during_my_graduation/
/r/funny/comments/tjpeji/i_spotted_the_cameras_during_my_graduation/
/r/funny/comments/tjpeji/i_spotted_the_cameras_during_my_graduation/
/r/funny/comments/tjpeji/i_spotted_th
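Many of the scraped links above are relative paths such as `/hot/`. A minimal sketch of turning them into absolute URLs with the standard library's `urljoin()`, assuming `https://www.reddit.com/` as the base URL and a small hypothetical snippet in place of the live page:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://www.reddit.com/"

# Hypothetical snippet mixing relative links, an absolute link, and an <a> without href
html = (
    '<a href="/hot/">Hot</a>'
    '<a href="/r/funny/">r/funny</a>'
    '<a href="https://ads.reddit.com">Ads</a>'
    '<a name="top">Anchor without href</a>'
)
soup = BeautifulSoup(html, "html.parser")

# href=True skips <a> tags without an href; urljoin() leaves absolute URLs unchanged
absolute_links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
for link in absolute_links:
    print(link)
```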

### Scrape all image tags

In [113]:
for img in soup.find_all("img"):
    print(img)
    print(img.get("src"))

<img class="_3BcAFuYpz37S0WFvgyWCUN" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADAAAAAwCAYAAABXAvmHAAAE9...
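The first image above is an inline `data:` URI rather than a link to a file. A sketch of collecting only fetchable image URLs; the snippet and the base URL are assumptions for illustration:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical snippet: an inline base64 image, a scheme-relative URL, and an <img> without src
html = (
    '<img src="data:image/png;base64,iVBORw0KGgo=">'
    '<img src="//upload.wikimedia.org/flag.png" alt="flag">'
    '<img alt="missing src">'
)
soup = BeautifulSoup(html, "html.parser")

image_urls = []
for img in soup.find_all("img", src=True):  # skip <img> tags with no src
    src = img["src"]
    if src.startswith("data:"):  # inline image data, not a URL to download
        continue
    # urljoin() fills in the scheme for //host/path style URLs
    image_urls.append(urljoin("https://en.wikipedia.org/", src))

print(image_urls)  # ['https://upload.wikimedia.org/flag.png']
```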

### Scrape data from HTML tables

In [117]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/HTMLColorCodes.html"
data = requests.get(url).text

In [119]:
soup=BeautifulSoup(data,"html.parser")

In [120]:
table=soup.find("table")

In [122]:
for row in table.find_all("tr"):
    cols=row.find_all("td")
    color_name=cols[2].string
    color_code=cols[3].string
    print("{}--->{}".format(color_name,color_code))

Color Name--->None
lightsalmon--->#FFA07A
salmon--->#FA8072
darksalmon--->#E9967A
lightcoral--->#F08080
coral--->#FF7F50
tomato--->#FF6347
orangered--->#FF4500
gold--->#FFD700
orange--->#FFA500
darkorange--->#FF8C00
lightyellow--->#FFFFE0
lemonchiffon--->#FFFACD
papayawhip--->#FFEFD5
moccasin--->#FFE4B5
peachpuff--->#FFDAB9
palegoldenrod--->#EEE8AA
khaki--->#F0E68C
darkkhaki--->#BDB76B
yellow--->#FFFF00
lawngreen--->#7CFC00
chartreuse--->#7FFF00
limegreen--->#32CD32
lime--->#00FF00
forestgreen--->#228B22
green--->#008000
powderblue--->#B0E0E6
lightblue--->#ADD8E6
lightskyblue--->#87CEFA
skyblue--->#87CEEB
deepskyblue--->#00BFFF
lightsteelblue--->#B0C4DE
dodgerblue--->#1E90FF
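The first line, `Color Name--->None`, comes from the header row: `.string` returns `None` when a cell does not have a single string to return, for example when its text is split across nested tags. We have not inspected the page's exact markup, so the sketch below uses a hypothetical stand-in table to show the problem and a more robust loop that skips the header row and uses `get_text()`:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the color table; the real page's markup may differ
html = """<table>
<tr><td>No.</td><td>Swatch</td><td><strong>Color</strong> Name</td><td>Hex Code</td></tr>
<tr><td>1</td><td></td><td>lightsalmon</td><td>#FFA07A</td></tr>
<tr><td>2</td><td></td><td>salmon</td><td>#FA8072</td></tr>
</table>"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

# The header cell has two children (<strong> and a text node), so .string is None
header_cell = rows[0].find_all("td")[2]
header_string = header_cell.string
header_text = header_cell.get_text(" ", strip=True)  # join the pieces with a space
print(header_string)  # None
print(header_text)    # Color Name

# Skip the header row and read the remaining cells with get_text()
color_codes = {}
for row in rows[1:]:
    cols = row.find_all("td")
    color_codes[cols[2].get_text(strip=True)] = cols[3].get_text(strip=True)
print(color_codes)  # {'lightsalmon': '#FFA07A', 'salmon': '#FA8072'}
```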


### Scrape data from HTML tables into a DataFrame using BeautifulSoup and Pandas

In [150]:
import pandas as pd

In [204]:
url = "https://en.wikipedia.org/wiki/World_population"
data  = requests.get(url).text

In [205]:
soup = BeautifulSoup(data,"html.parser")

In [206]:
tables = soup.find_all("table")

In [207]:
len(tables)

26

In [217]:
for index,table in enumerate(tables):
    if ("10 most densely populated countries" in str(table)):
        table_index = index
print(table_index)

5
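Searching `str(table)` matches the phrase anywhere inside a table, and the loop above keeps the last match. A sketch that checks only each table's `<caption>` and stops at the first hit; the two-table page below is hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables; only the second has the caption we want
html = """
<table><caption>World population milestones</caption><tr><td>1804</td></tr></table>
<table class="wikitable"><caption>10 most densely populated countries <small>(with population above 5 million)</small></caption>
<tr><th>Rank</th><th>Country</th></tr><tr><td>1</td><td>Singapore</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")

# Compare against the caption text only; next() stops at the first matching table
# (it raises StopIteration if no caption matches)
table_index = next(
    i
    for i, table in enumerate(tables)
    if table.caption and "10 most densely populated countries" in table.caption.get_text()
)
print(table_index)  # 1
```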


In [218]:
print(tables[table_index].prettify())

<table class="wikitable sortable" style="text-align:right">
 <caption>
  10 most densely populated countries
  <small>
   (with population above 5 million)
  </small>
 </caption>
 <tbody>
  <tr>
   <th>
    Rank
   </th>
   <th>
    Country
   </th>
   <th>
    Population
   </th>
   <th>
    Area
    <br/>
    <small>
     (km
     <sup>
      2
     </sup>
     )
    </small>
   </th>
   <th>
    Density
    <br/>
    <small>
     (pop/km
     <sup>
      2
     </sup>
     )
    </small>
   </th>
  </tr>
  <tr>
   <td>
    1
   </td>
   <td align="left">
    <span class="flagicon">
     <img alt="" class="thumbborder" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/4/48/Flag_of_Singapore.svg/23px-Flag_of_Singapore.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/4/48/Flag_of_Singapore.svg/35px-Flag_of_Singapore.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/4/48/Flag_of_Singapo

In [219]:
# Collect each row as a dictionary and build the DataFrame once at the end
# (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0)
rows = []

for row in tables[table_index].tbody.find_all("tr"):
    col = row.find_all("td")
    if col:  # the header row uses <th> cells, so it has no <td> cells
        rank = col[0].text.strip()
        country = col[1].text.strip()  # strip() removes the stray newlines around names like Palestine
        population = col[2].text.strip()
        area = col[3].text.strip()
        density = col[4].text.strip()
        rows.append({"Rank": rank, "Country": country, "Population": population, "Area": area, "Density": density})

population_data = pd.DataFrame(rows, columns=["Rank", "Country", "Population", "Area", "Density"])
population_data

|   | Rank | Country | Population | Area | Density |
|---|---|---|---|---|---|
| 0 | 1 | Singapore | 5704000 | 710 | 8033 |
| 1 | 2 | Bangladesh | 172390000 | 143998 | 1197 |
| 2 | 3 | Palestine | 5266785 | 6020 | 847 |
| 3 | 4 | Lebanon | 6856000 | 10452 | 656 |
| 4 | 5 | Taiwan | 23604000 | 36193 | 652 |
| 5 | 6 | South Korea | 51781000 | 99538 | 520 |
| 6 | 7 | Rwanda | 12374000 | 26338 | 470 |
| 7 | 8 | Haiti | 11578000 | 27065 | 428 |
| 8 | 9 | Netherlands | 17700000 | 41526 | 426 |
| 9 | 10 | Israel | 9490000 | 22072 | 430 |
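Every column of `population_data` holds strings, so operations such as sorting by population would compare text rather than numbers. A minimal sketch of converting them with `pd.to_numeric()`, using a few rows copied from the output above; removing commas is only a safeguard in case the scraped text contains thousands separators:

```python
import pandas as pd

# A few rows shaped like population_data above: every value is a string
population_data = pd.DataFrame(
    {
        "Rank": ["1", "2", "3"],
        "Country": ["Singapore", "Bangladesh", "Palestine"],
        "Population": ["5704000", "172390000", "5266785"],
        "Area": ["710", "143998", "6020"],
        "Density": ["8033", "1197", "847"],
    }
)

# Convert the numeric columns, dropping any thousands separators first
numeric_cols = ["Rank", "Population", "Area", "Density"]
population_data[numeric_cols] = population_data[numeric_cols].apply(
    lambda col: pd.to_numeric(col.str.replace(",", "", regex=False))
)

print(population_data.dtypes)
print(population_data.sort_values("Population", ascending=False)["Country"].tolist())
# ['Bangladesh', 'Singapore', 'Palestine']
```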


### Scrape data from HTML tables into a DataFrame using BeautifulSoup and read_html

In [227]:
pd.read_html(str(tables[5]), flavor ="bs4")

[   Rank      Country  Population  Area(km2)  Density(pop/km2)
 0     1    Singapore     5704000        710              8033
 1     2   Bangladesh   172390000     143998              1197
 2     3    Palestine     5266785       6020               847
 3     4      Lebanon     6856000      10452               656
 4     5       Taiwan    23604000      36193               652
 5     6  South Korea    51781000      99538               520
 6     7       Rwanda    12374000      26338               470
 7     8        Haiti    11578000      27065               428
 8     9  Netherlands    17700000      41526               426
 9    10       Israel     9490000      22072               430]

In [228]:
# We can also read the other tables on the Wikipedia page by changing the index

pd.read_html(str(tables[0]), flavor ="bs4")

[                                                    #  \
 0                                                   1   
 1                                                   2   
 2                                                   3   
 3                                                   4   
 4                                                   5   
 5                                                   6   
 6                                                   7   
 7                                                   8   
 8                                                   9   
 9                                                  10   
 10                                                NaN   
 11  Notes: .mw-parser-output .reflist{font-size:90...   
 
                               Most populous countries  \
 0                                            China[B]   
 1                                               India   
 2                                       United States   
 3          

In [230]:
pd.read_html(str(tables[2]), flavor ="bs4")

[  World population milestones in billions (Worldometers estimates)        \
                                                         Population     1   
 0                                               Year                1804   
 1                                      Years elapsed                   —   
 
                                                          
       2     3     4     5     6     7     8     9    10  
 0  1927  1960  1974  1987  1999  2011  2023  2037  2056  
 1   123    33    14    13    12    12    12    14    19  ]

In [231]:
pd.read_html(str(tables[4]), flavor ="bs4")

[   Rank        Country  Population % of world         Date  \
 0     1          China  1412397440      17.8%  17 Mar 2022   
 1     2          India  1374305400      17.3%  17 Mar 2022   
 2     3  United States   332566652      4.19%  17 Mar 2022   
 3     4      Indonesia   269603400      3.40%   1 Jul 2020   
 4     5       Pakistan   220892331      2.78%   1 Jul 2020   
 5     6         Brazil   214389640      2.70%  17 Mar 2022   
 6     7        Nigeria   206139587      2.60%   1 Jul 2020   
 7     8     Bangladesh   172388884      2.17%  17 Mar 2022   
 8     9         Russia   146748590      1.85%   1 Jan 2020   
 9    10         Mexico   127792286      1.61%   1 Jul 2020   
 
            Source(official or UN)  
 0   National population clock[91]  
 1   National population clock[92]  
 2   National population clock[93]  
 3  National annual projection[94]  
 4               UN Projection[95]  
 5   National population clock[96]  
 6               UN Projection[95]  
 7   Nati

In [232]:
# read_html always returns a list of DataFrames, so we must select the one we want

population_data_read_html = pd.read_html(str(tables[5]), flavor='bs4')[0]

population_data_read_html

|   | Rank | Country | Population | Area(km2) | Density(pop/km2) |
|---|---|---|---|---|---|
| 0 | 1 | Singapore | 5704000 | 710 | 8033 |
| 1 | 2 | Bangladesh | 172390000 | 143998 | 1197 |
| 2 | 3 | Palestine | 5266785 | 6020 | 847 |
| 3 | 4 | Lebanon | 6856000 | 10452 | 656 |
| 4 | 5 | Taiwan | 23604000 | 36193 | 652 |
| 5 | 6 | South Korea | 51781000 | 99538 | 520 |
| 6 | 7 | Rwanda | 12374000 | 26338 | 470 |
| 7 | 8 | Haiti | 11578000 | 27065 | 428 |
| 8 | 9 | Netherlands | 17700000 | 41526 | 426 |
| 9 | 10 | Israel | 9490000 | 22072 | 430 |


In [234]:
# We can also use the match parameter to specify the table we want instead of the index

pd.read_html(url, match="World population milestones in billions ", flavor='bs4')[0]

World population milestones in billions (Worldometers estimates)

|   | Population | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Year | 1804 | 1927 | 1960 | 1974 | 1987 | 1999 | 2011 | 2023 | 2037 | 2056 |
| 1 | Years elapsed | — | 123 | 33 | 14 | 13 | 12 | 12 | 12 | 14 | 19 |
