# Extract search results with BeautifulSoup: PBS.org - part 03
In the previous notebook, we scraped only the first page of the results. At the time of writing, there were 30 pages. By adding an extra for-loop to the code, we will traverse all of them. Before we do this, we will make the code dynamic so that you can scrape multiple keywords from the site if you want to.

### 1. Retrieve how many pages there are
This will vary per website, but luckily PBS.org displays the final page number in the pagination overview. If you click on one of the page links, the URL in your browser changes to something like:
`https://www.pbs.org/newshour/search-results?q=%22artificial+intelligence%22&pnb=2` where `&pnb=2` is the current page. Again, this differs from site to site, but it gives us a convenient way to request each page.
So now we need to know how many pages there are. Looking at the HTML, the simplest strategy is to take the last element with the class `pagination__number`.

In [2]:
import requests
from bs4 import BeautifulSoup

# %22 is the URL-encoded double quote; quoting the phrase makes the site search for the exact combination "artificial intelligence"
url = 'https://www.pbs.org/newshour/search-results?q=%22artificial%20intelligence%22'

# get url
page = requests.get(url)

# transform to soup
soup = BeautifulSoup(page.content, 'html.parser')

# search for pagination links
pages = soup.find_all(class_='pagination__number')

# [-1] selects last item in a list
last_page = pages[-1].get_text()

# convert to int
number_of_pages = int(last_page)

number_of_pages


50
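
Taking the last `pagination__number` element works for PBS.org, but `pages[-1]` raises an `IndexError` when a search has only one page and no pagination, and `int()` fails if the last pagination entry on some other site is a label such as "Next". Below is a minimal sketch of a more defensive lookup that works on the link texts only. The helper name `last_page_number` and the sample texts are assumptions for illustration; they are not part of the notebook.

```python
def last_page_number(link_texts, default=1):
    """Return the highest page number found in pagination link texts.

    Falls back to `default` when no numeric entry is present,
    e.g. when a search has only one page of results.
    """
    numbers = [
        int(text.strip())
        for text in link_texts
        if text.strip().isdecimal()
    ]
    return max(numbers, default=default)


# hypothetical link texts, as returned by element.get_text()
print(last_page_number(['1', '2', '3', '…', '50']))  # 50
print(last_page_number([' 2 ', 'Next']))             # 2
print(last_page_number([]))                          # 1
```

In this notebook it could be called as `last_page_number(p.get_text() for p in pages)`.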

### 2. Create URL list
Now that we have the total number of pages, we can build the URL list. The `url_list` should look like:
`['https://www.pbs.org/newshour/search-results?q=%22artificial+intelligence%22&pnb=1',
 'https://www.pbs.org/newshour/search-results?q=%22artificial+intelligence%22&pnb=2', ...
 'https://www.pbs.org/newshour/search-results?q=%22artificial+intelligence%22&pnb=30']`
This can be achieved with a for-loop (or a list comprehension) over a `range()`.

In [None]:
import urllib.parse

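def build_search_url(page):
    url = 'https://www.pbs.org/newshour/search-results?'
    # the double quotes ask for the exact phrase; urlencode turns them into %22
    params = {'q': '"artificial intelligence"', 'pnb': page}
    encoded = urllib.parse.urlencode(params)
    return url + encoded

# range() excludes its end value, so add 1 to include the last page
url_list = [build_search_url(n) for n in range(1, number_of_pages + 1)]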
url_list
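
The introduction mentions scraping more than one keyword. Below is a minimal sketch of how the URL builder could take the keyword as a parameter. The names `build_keyword_url`, `BASE_URL` and `keyword_urls`, and the second keyword, are illustrative and not part of the original notebook. `urllib.parse.urlencode` turns the double quotes into `%22` and the space into `+`, and `parse_qs` shows that the encoding can be reversed.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = 'https://www.pbs.org/newshour/search-results?'


def build_keyword_url(keyword, page):
    # wrap the keyword in double quotes so the site searches for the exact phrase
    params = {'q': f'"{keyword}"', 'pnb': page}
    return BASE_URL + urlencode(params)


example = build_keyword_url('artificial intelligence', 2)
print(example)
# https://www.pbs.org/newshour/search-results?q=%22artificial+intelligence%22&pnb=2

# decode the query string again to check the round trip
print(parse_qs(urlsplit(example).query))
# {'q': ['"artificial intelligence"'], 'pnb': ['2']}

# hypothetical: three pages each for two keywords
keywords = ['artificial intelligence', 'machine learning']
keyword_urls = [build_keyword_url(k, n) for k in keywords for n in range(1, 4)]
print(len(keyword_urls))  # 6
```

In practice the number of pages differs per keyword, so the page count from step 1 would have to be retrieved for each keyword separately.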

### 3. Retrieve all the article URLs and save them in a list 
Use the `url_list` and collect all the URLs of the articles of each page. The `article_list` should only contain the URLs of the articles.

In [15]:
import requests
from bs4 import BeautifulSoup
import time
from itertools import chain

def urls_from_search_page(soup):
    # each search result title wraps a link to the article
    results = soup.find_all(class_='search-result__title')
    return [result.find('a')['href'] for result in results]

def soup_from_search_page(search_page_url):
    print('Retrieving', search_page_url)
    res = requests.get(search_page_url)
    # name the parser explicitly to avoid a bs4 warning
    return BeautifulSoup(res.content, 'html.parser')

# one list of article URLs per search page
article_list = [
    urls_from_search_page(soup_from_search_page(url))
    for url in url_list
]

# flatten the list of lists into a single list
article_list = list(chain.from_iterable(article_list))
len(article_list)

Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intelligence%22&pnb=1
Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intelligence%22&pnb=2
Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intelligence%22&pnb=3
Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intelligence%22&pnb=4
Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intelligence%22&pnb=5
Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intelligence%22&pnb=6
Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intelligence%22&pnb=7
Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intelligence%22&pnb=8
Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intelligence%22&pnb=9
Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intelligence%22&pnb=10
Retrieving https://www.pbs.org/newshour/search-results?q=%22artifical+intellige

500
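
The flattening step keeps every URL it finds, so an article that appeared on two result pages would be downloaded twice. Whether PBS.org ever repeats results is an assumption, not something observed here. The sketch below only shows how `chain.from_iterable` flattens per-page lists and how `dict.fromkeys` drops duplicates while keeping the original order. The URLs are made up for illustration.

```python
from itertools import chain

# hypothetical per-page results; the second page repeats one article
pages = [
    ['https://example.org/a', 'https://example.org/b'],
    ['https://example.org/b', 'https://example.org/c'],
]

# flatten the list of lists into one list
flat = list(chain.from_iterable(pages))
print(len(flat))  # 4

# dict keys are unique and keep insertion order (Python 3.7+)
unique = list(dict.fromkeys(flat))
print(unique)
# ['https://example.org/a', 'https://example.org/b', 'https://example.org/c']
```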

### 4. Go through the list of articles and save the individual files
Look at the previous notebooks to solve this part, and don't forget to use `article_list`. Downloading everything can take a while (roughly 15 minutes).

In [None]:
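import os
import time

import requests
from tqdm import tqdm

# make sure the output folder exists before writing to it
os.makedirs('./data', exist_ok=True)

def save_article(article_url):
    page = requests.get(article_url)
    # turn the URL path into a flat filename by replacing every / with -
    filename = article_url.replace('https://www.pbs.org/newshour/', '').replace('/', '-') + '.html'
    destination = './data/' + filename

    # write as UTF-8 so the result does not depend on the platform default
    with open(destination, 'w', encoding='utf-8') as f:
        f.write(page.text)
    # pause briefly between requests to avoid overloading the server
    time.sleep(.5)

for article in tqdm(article_list):
    save_article(article)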
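
The filename is derived from the article URL by stripping the site prefix and replacing the slashes. Pulling that step into its own function makes it easy to check the result before downloading anything. This is only a sketch: `filename_from_url`, `PREFIX` and the sample URL are hypothetical, and stripping a trailing slash is an added assumption so that the name does not end in `-`.

```python
PREFIX = 'https://www.pbs.org/newshour/'


def filename_from_url(article_url):
    # drop the common prefix and any trailing slash, then flatten the path
    path = article_url.replace(PREFIX, '').strip('/')
    return path.replace('/', '-') + '.html'


# hypothetical article URL, for illustration only
print(filename_from_url('https://www.pbs.org/newshour/show/some-ai-story/'))
# show-some-ai-story.html
```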
