# Introduction
Hello! In this tutorial we will scrape a more complicated web page from Wikipedia. This is a continuation of [Part 1](https://onefortheroad.github.io/python/tutorial/2017/04/29/web-scraping-part-1/), where we learned the basics of web scraping.

When we left off in Part 1, we had a *pandas* dataframe containing the Top 100 Canadian Beers. I'd like to add some **geospatial** information to our beer list so I can plan a pilgrimage to these fantastic breweries. (Actually, we'll use this geospatial information in a future tutorial on visualization.) Wikipedia's [List of Breweries in Canada](https://en.wikipedia.org/wiki/List_of_breweries_in_Canada) is a fine place to start. Let's go!


# 1. Import Libraries

```python
import re
import requests
from bs4 import BeautifulSoup
import pandas as pd
```

# 2. Download the web page

```python
url = 'https://en.wikipedia.org/wiki/List_of_breweries_in_Canada'
page = requests.get(url)
```
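Before parsing, it can be worth confirming that the download actually succeeded. The snippet below is a minimal sketch, not part of the original notebook: `fetch_page` is a hypothetical helper, the 10-second timeout is an arbitrary assumption, and the `get` parameter exists only so the function can be tried without a network connection.

```python
import requests


def fetch_page(url, timeout=10, get=requests.get):
    """Download a page and return the response, raising on HTTP errors.

    `fetch_page` is a hypothetical helper; `get` defaults to requests.get
    and can be swapped for a stand-in when working offline.
    """
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses,
    # so a failed download is not silently parsed as if it were the page.
    response.raise_for_status()
    return response
```

With this helper, `page = fetch_page(url)` would give the same kind of response object as the `requests.get(url)` call, so `page.content` is still available for parsing.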

# 3. Examine the HTML
Looking at the [wiki](https://en.wikipedia.org/wiki/List_of_breweries_in_Canada), the breweries are listed by province.  The HTML for the breweries in Alberta looks like this:

```html
<h3><span class="mw-headline" id="Alberta">Alberta</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/w/index.php?title=List_of_breweries_in_Canada&amp;action=edit&amp;section=2" title="Edit section: Alberta">edit</a><span class="mw-editsection-bracket">]</span></span></h3>
<ul>
<li>Alley Kat Brewing Company (<a href="/wiki/Edmonton" title="Edmonton">Edmonton</a>)</li>
...
</ul>
```

We can start thinking of the structure, and hence our parse logic, as follows:
- Each heading `<h3>` contains a `<span>` with class `mw-headline`, whose text gives the province
- For each province, the *list item* `<li>` tags in the `<ul>` that follows the heading give the individual breweries
- Repeat for each province
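
The parse logic above can be sketched on a small, self-contained sample before touching the real page. This is only an illustration under some assumptions: the sample HTML is abridged from the Alberta snippet, the second brewery entry is a made-up placeholder, each heading is assumed to be followed by its own `<ul>`, and the built-in `html.parser` is used so the sketch runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Abridged sample modelled on the Alberta section shown earlier.
# The "Placeholder Brewery" entry is invented purely for illustration.
sample_html = """
<h3><span class="mw-headline" id="Alberta">Alberta</span></h3>
<ul>
<li>Alley Kat Brewing Company (<a href="/wiki/Edmonton" title="Edmonton">Edmonton</a>)</li>
</ul>
<h3><span class="mw-headline" id="British_Columbia">British Columbia</span></h3>
<ul>
<li>Placeholder Brewery (<a href="/wiki/Placeholder" title="Placeholder">Placeholder</a>)</li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

breweries_by_province = {}
for heading in sample_soup.find_all("h3"):
    # The province name lives in the span with class "mw-headline".
    headline = heading.find("span", class_="mw-headline")
    if headline is None:
        continue
    # Assumption: the brewery list is the next <ul> sibling of the heading.
    brewery_list = heading.find_next_sibling("ul")
    if brewery_list is None:
        continue
    breweries_by_province[headline.get_text()] = [
        item.get_text() for item in brewery_list.find_all("li")
    ]
```

Here `breweries_by_province` ends up as a dictionary keyed by province name, with one list of brewery strings per province. The real page may not follow this layout everywhere, which is why it helps to check the first province by hand.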

# 4. Parse the HTML
Before we go all-in and try to parse the entire page, let's start with the first province in the wiki page and see if we can get the first brewery there. Coding gradually and testing often like this makes debugging easier and development faster.

We'll first turn our `page` object into a Beautiful Soup object, then start looking for the headings denoting provinces:

```python
soup = BeautifulSoup(page.content, 'lxml')

provinces = soup.find_all('h3')
```

```python
provinces
```

[<h3><span class="mw-headline" id="Alberta">Alberta</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/w/index.php?title=List_of_breweries_in_Canada&amp;action=edit&amp;section=2" title="Edit section: Alberta">edit</a><span class="mw-editsection-bracket">]</span></span></h3>,
 <h3><span class="mw-headline" id="British_Columbia">British Columbia</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/w/index.php?title=List_of_breweries_in_Canada&amp;action=edit&amp;section=3" title="Edit section: British Columbia">edit</a><span class="mw-editsection-bracket">]</span></span></h3>,
 <h3><span class="mw-headline" id="Manitoba">Manitoba</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/w/index.php?title=List_of_breweries_in_Canada&amp;action=edit&amp;section=4" title="Edit section: Manitoba">edit</a><span class="mw-editsection-bracket">]</span></span></h3>,
 <h3><span class="mw-h