## Web scraping BBC News

#### To get started, install Beautiful Soup (the beautifulsoup4 package) into your environment from the command line. We also need the requests module to download the HTML from the BBC. Run one of these in your terminal:

conda install beautifulsoup4 requests -y

#### OR

pip install beautifulsoup4 requests

In [42]:
import requests
from bs4 import BeautifulSoup as bs

In [33]:
#Link to the article on BBC.com

url = 'https://www.bbc.com/news/science-environment-53119686'

#### How to locate elements in the browser

Open the webpage in your browser, right-click anywhere on the page, and choose ‘Inspect’.

Hover the mouse over any of the text, and the browser will show which div contains that paragraph.

If that doesn't work, press Ctrl+Shift+C (Cmd+Shift+C on macOS) and try again.

In this case, the element with the class "story-body__introduction" contains the highlighted paragraph.

The div with the class "story-body__inner" and the property "articleBody" contains the entire article.
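To see how attributes found in the inspector translate into BeautifulSoup lookups, here is a minimal sketch on a hand-written HTML fragment. The fragment is an assumption made up for illustration: it only imitates the kind of class and property attributes described above and is not the real BBC markup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates the attributes seen in the inspector.
SAMPLE_HTML = """
<div class="story-body">
  <h1 class="story-body__h1">Example headline</h1>
  <div class="story-body__inner" property="articleBody">
    <p class="story-body__introduction">First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ (with a trailing underscore) matches the HTML class attribute,
# because "class" is a reserved word in Python.
intro = soup.find(class_="story-body__introduction")

# Other attributes, such as property, can be passed as keyword arguments.
body = soup.find(property="articleBody")

intro_text = intro.text
paragraphs = [p.text for p in body.find_all("p")]
print(intro_text)
print(paragraphs)
```

Trying lookups on a small fragment like this is a quick way to check a selector before pointing it at a live page.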

In [11]:
def get_bbc_text(url: str) -> list:
    """Return the article text as a list of paragraph strings."""
    
    #Download the article page
    article = requests.get(url)
    
    #Pass the page content to BeautifulSoup
    soup = bs(article.content, 'html.parser')
    
    #Find the article body; it holds the paragraphs (p) as raw HTML
    body = soup.find(property = 'articleBody')
    
    #Extract the text of each paragraph into a new list using a list comprehension
    text = [p.text for p in body.find_all("p")]
    return text
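Note that `soup.find` returns `None` when nothing matches, so `body.find_all` would raise an `AttributeError` if the page markup differs from what the function expects. A minimal defensive sketch follows, assuming a hypothetical helper `extract_paragraphs` (not part of this notebook) that takes an HTML string so it can be tried without network access:

```python
from bs4 import BeautifulSoup


def extract_paragraphs(html: str) -> list:
    """Return paragraph texts from the articleBody element, or [] if it is missing.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find(property="articleBody")
    if body is None:
        # The layout is not what we expected; return an empty list
        # instead of failing with an AttributeError.
        return []
    # get_text(strip=True) also trims surrounding whitespace.
    return [p.get_text(strip=True) for p in body.find_all("p")]


found = extract_paragraphs('<div property="articleBody"><p> Hello </p></div>')
missing = extract_paragraphs("<div><p>No article body here</p></div>")
print(found)
print(missing)
```

Whether to return an empty list or raise a descriptive error is a design choice; returning `[]` keeps a batch of URLs running when one page has a different layout.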

In [29]:
def get_bbc_title(url: str) -> str:
    article = requests.get(url)
    soup = bs(article.content, 'html.parser')
    
    #Finding the title in the soup
    title = soup.find(class_= 'story-body__h1').text
    return title

In [30]:
x = get_bbc_text(url)
y = get_bbc_title(url)

In [20]:
print(x)

["We've just become a little less ignorant about Planet Earth.", 'The initiative that seeks to galvanise the creation of a full map of the ocean floor says one-fifth of this task has now been completed.', 'When the Nippon Foundation-GEBCO Seabed 2030 Project was launched in 2017, only 6% of the global ocean bottom had been surveyed to what might be called modern standards.', 'That number now stands at 19%, up from 15% in just the last year.', 'Some 14.5 million sq km of new bathymetric (depth) data was included in the GEBCO grid in 2019 - an area equivalent to almost twice that of Australia.', 'It does, however, still leave a great swathe of the planet unmapped to an acceptable degree.', '"Today we stand at the 19% level. That means we\'ve got another 81% of the oceans still to survey, still to map. That\'s an area about twice the size of Mars that we have to capture in the next decade," project director Jamie McMichael-Phillips told BBC News.', 'The map at the top of this page illustr

In [31]:
print(y)

One-fifth of Earth's ocean floor is now mapped
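The list returned by `get_bbc_text` can be joined into a single string when the whole article is needed as one piece of text. A small sketch, using two made-up paragraphs in place of the real output:

```python
# Made-up stand-in for the list returned by get_bbc_text(url).
paragraphs = ["First paragraph.", "Second paragraph with more words."]

# Separate paragraphs with a blank line so the structure stays visible.
article_text = "\n\n".join(paragraphs)

# A rough word count: split on any whitespace.
word_count = len(article_text.split())

print(article_text)
print(word_count)
```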


## Refactoring into a class

In [49]:
class BBC:
    def __init__(self, url:str):
        article = requests.get(url)
        
        #Storing the soup as self.soup makes it an instance attribute, accessible from any method of the class
        self.soup = bs(article.content, "html.parser")
        
        self.body = self.get_body()
        self.title = self.get_title()
        
    def get_body(self) -> list:
        body = self.soup.find(property = "articleBody")
        return [p.text for p in body.find_all("p")]
    
    def get_title(self) -> str:
        return self.soup.find(class_="story-body__h1").text
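Because `__init__` performs the download, the class above cannot be exercised without network access. One possible variation, sketched here under the assumption of a hypothetical `Article` class (the name, the `from_url` classmethod, and the `timeout` default are all illustrative choices, not part of the notebook), separates fetching from parsing:

```python
from typing import Union

import requests
from bs4 import BeautifulSoup


class Article:
    """Hypothetical variant of the BBC class that parses HTML passed to it."""

    def __init__(self, html: Union[str, bytes]):
        self.soup = BeautifulSoup(html, "html.parser")
        self.body = self.get_body()
        self.title = self.get_title()

    @classmethod
    def from_url(cls, url: str, timeout: float = 10.0) -> "Article":
        # Download the page and raise an exception on HTTP errors (4xx/5xx).
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return cls(response.content)

    def get_body(self) -> list:
        body = self.soup.find(property="articleBody")
        if body is None:
            return []
        return [p.text for p in body.find_all("p")]

    def get_title(self) -> str:
        heading = self.soup.find(class_="story-body__h1")
        return heading.text if heading is not None else ""


# Offline usage with a made-up fragment; no request is sent here.
sample = (
    '<h1 class="story-body__h1">Example headline</h1>'
    '<div property="articleBody"><p>One.</p><p>Two.</p></div>'
)
parsed_sample = Article(sample)
print(parsed_sample.title)
print(parsed_sample.body)
```

With this layout, `Article.from_url(url)` would play the role of `BBC(url)`, while the plain constructor allows the parsing logic to be checked against saved or hand-written HTML.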

In [50]:
parsed = BBC('https://www.bbc.com/news/science-environment-53119686')

In [51]:
parsed.title

"One-fifth of Earth's ocean floor is now mapped"

In [52]:
parsed.body

["We've just become a little less ignorant about Planet Earth.",
 'The initiative that seeks to galvanise the creation of a full map of the ocean floor says one-fifth of this task has now been completed.',
 'When the Nippon Foundation-GEBCO Seabed 2030 Project was launched in 2017, only 6% of the global ocean bottom had been surveyed to what might be called modern standards.',
 'That number now stands at 19%, up from 15% in just the last year.',
 'Some 14.5 million sq km of new bathymetric (depth) data was included in the GEBCO grid in 2019 - an area equivalent to almost twice that of Australia.',
 'It does, however, still leave a great swathe of the planet in need of mapping to an acceptable degree.',
 '"Today we stand at the 19% level. That means we\'ve got another 81% of the oceans still to survey, still to map. That\'s an area about twice the size of Mars that we have to capture in the next decade," project director Jamie McMichael-Phillips told BBC News.',
 'The map at the top of 