# Beer

I like beer...a lot.  The goal of this project is to build a dataset describing a variety of beers, scraped from `beeradvocate.com`.  The site lists a few hundred recent beer reviews.  Once we have the data, we will try to use it to build a beer recommendation application.


![](images/beer.png)

In [1]:
import requests
from bs4 import BeautifulSoup

In [4]:
url = 'http://www.beeradvocate.com/beer/'

In [6]:
# verify=False disables TLS certificate verification; remove it if the request succeeds without it
response = requests.get(url, verify=False)



In [7]:
soup = BeautifulSoup(response.text, 'html.parser')

## Strategy

My goal is to build a dataset whose features capture the information typically presented in these reviews.  For example, the first entry I retrieved was

**TITLE:** Bush League Sour Ale

**COMPANY**: 4th Meridian Brewing Co.

**STYLE**: American Wild Ale

**ABV:** 4.50%

**SCORE**: 3.74

**LOOK**: 3.5

**SMELL**: 3.75

**TASTE**: 3.75

**FEEL**: 3.75

If we can get this information for each entry, we'll be off to a good start.  Every review has the same structure, but an early glance at the page suggests we will run into problems if we try to collect a separate list for each field.  It is safer to grab each review as a whole and slice the fields out of it.  To find the limits of the review container, we can search in the browser's inspector.  A quick look shows that each posting is bounded by a `div` tag with the following `id`:

```html
<div id="rating_fullview_container">
```
    
Searching the soup for this `id` returns the HTML of the first review on the page.  Next, we can paste that HTML into an `%%HTML` magic cell and confirm that it renders the content we want.



In [8]:
soup.find('div', {'id': 'rating_fullview_container'})

<div ba-user="894557" class="user-comment" id="rating_fullview_container"><div id="rating_fullview_user"><div style="padding:3px; background:#E8E8E8;"><a class="username" href="/community/members/hodgson.894557/"><img alt="Photo of Hodgson" border="0" height="48" src="styles/default/xenforo/avatars/avatar_male_m.png" width="48"/></a></div></div><div id="rating_fullview_content_2"><a href="/beer/profile/12949/367058/"><img alt="Loon Lager" border="0" height="90" src="https://cdn.beeradvocate.com/im/placeholder-beer.jpg" style="float:right; margin:0px 0px 10px 10px;" width="45"/></a><h6><a href="/beer/profile/12949/367058/">Loon Lager</a></h6><br/><a href="/beer/profile/12949/">Barley Days Brewery</a><br/><a href="/beer/style/155/">American Pale Lager</a> / 4.50% ABV<br/><br/><span class="BAscore_norm">3.86</span><span class="rAvg_norm">/5</span><span style="font-size:1.5em;font-weight:700;"></span>  <span class="muted">rDev 0%</span> | Score: 4.43<br/><span class="muted">look: 4.5 | sme

In [9]:
%%HTML
<div ba-user="168458" class="user-comment" id="rating_fullview_container"><div id="rating_fullview_user"><div style="padding:3px; background:#E8E8E8;"><a class="username" href="/community/members/biboergosum.168458/"><img alt="Photo of biboergosum" border="0" height="48" src="https://cdn.beeradvocate.com/data/avatars/m/168/168458.jpg?1330550786" width="48"/></a></div></div><div id="rating_fullview_content_2"><a href="/beer/profile/47607/366919/"><img alt="Bush League Sour Ale" border="0" height="90" src="https://cdn.beeradvocate.com/im/placeholder-beer.jpg" style="float:right; margin:0px 0px 10px 10px;" width="45"/></a><h6><a href="/beer/profile/47607/366919/">Bush League Sour Ale</a></h6><br><a href="/beer/profile/47607/">4th Meridian Brewing Co.</a><br><a href="/beer/style/171/">American Wild Ale</a> / 4.50% ABV<br><br><span class="BAscore_norm">3.74</span><span class="rAvg_norm">/5</span><span style="font-size:1.5em;font-weight:700;"></span>  <span class="muted">rDev 0%</span> | Score: 3.74<br><span class="muted">look: 3.5 | smell: 3.75 | taste: 3.75 | feel: 3.75 |  overall: 3.75</span><br><br><div class="comment">8oz glass at Beer Revolution YEG Oliver Square. No idea what kind of sour this is supposed to be.  More later.<br/>
<br/>
This beer appears a hazy, pale golden straw colour, with one solid finger of puffy, finely foamy, and mildly creamy off-white head, which leaves a few instances of randomly streaky lace around the glass as it quickly dissipates. <br/>
<br/>
It smells of grainy and crackery cereal malt, some mixed domestic citrus peel, a tame earthy yeastiness, and plain leafy, herbal, and floral green hop bitters. The taste is gritty and grainy pale malt, some muddled melon and citrus fruit bowl notes, faintly tart yeast, and more understated leafy, musty, and dead grassy hoppiness. <br/>
<br/>
The carbonation is fairly active in its palate-tickling frothiness, the body a so-so medium weight, and generally smooth, with nothing really getting uppity, as such, here.  It finishes trending dry, the lingering fruitiness exhibiting its sour tendencies. <br/>
<br/>
Overall - this sure is a refreshing and enjoyable version of the style, with the fruity character contributing in large part.  Nothing bush league about this offering, unless you are the kind of sour snob who insists on losing tooth enamel.</div><br><i class="fas fa-align-left"></i> <span class="muted">1,176 characters</span><br><br><div><span class="muted"><a class="username" href="/community/members/biboergosum.168458/">biboergosum</a>, <a href="/beer/profile/47607/366919/?ba=biboergosum#review">A moment ago</a></span></div></br></br></br></br></br></br></br></br></br></br></div></div>
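
The ratings line in the rendered review (`look: 3.5 | smell: 3.75 | ...`) and the `4.50% ABV` fragment follow a regular pattern, so one possible way to slice the numeric fields out of a review is with a few regular expressions.  The following is only a minimal sketch and is not part of the original notebook: the function name `parse_review_text` and the `sample` string are hypothetical, and it assumes that the text of a review (for example from `post.get_text(' ')`) looks roughly like the sample.

```python
import re

# Patterns inferred from the single review shown above; they are assumptions
# about the page layout, not a documented format.
RATING_RE = re.compile(r'(look|smell|taste|feel|overall):\s*(\d+(?:\.\d+)?)')
ABV_RE = re.compile(r'(\d+(?:\.\d+)?)%\s*ABV')
SCORE_RE = re.compile(r'Score:\s*(\d+(?:\.\d+)?)')


def parse_review_text(text):
    """Pull the numeric fields out of the plain text of one review."""
    fields = {name: float(value) for name, value in RATING_RE.findall(text)}
    abv = ABV_RE.search(text)
    score = SCORE_RE.search(text)
    # Missing values are stored as None rather than raising an error
    fields['abv'] = float(abv.group(1)) if abv else None
    fields['score'] = float(score.group(1)) if score else None
    return fields


# Hypothetical sample assembled from the values in the review above
sample = ('American Wild Ale / 4.50% ABV 3.74 /5 rDev 0% | Score: 3.74 '
          'look: 3.5 | smell: 3.75 | taste: 3.75 | feel: 3.75 |  overall: 3.75')
fields = parse_review_text(sample)
print(fields)
```

If a review omits one of the sub-ratings, the corresponding key is simply absent from the result, which is worth keeping in mind when the records are later combined into a table.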


In [10]:
posts = soup.find_all('div', {'id': 'rating_fullview_container'})

In [12]:
posts[0].find_all('a')

[<a class="username" href="/community/members/hodgson.894557/"><img alt="Photo of Hodgson" border="0" height="48" src="styles/default/xenforo/avatars/avatar_male_m.png" width="48"/></a>,
 <a href="/beer/profile/12949/367058/"><img alt="Loon Lager" border="0" height="90" src="https://cdn.beeradvocate.com/im/placeholder-beer.jpg" style="float:right; margin:0px 0px 10px 10px;" width="45"/></a>,
 <a href="/beer/profile/12949/367058/">Loon Lager</a>,
 <a href="/beer/profile/12949/">Barley Days Brewery</a>,
 <a href="/beer/style/155/">American Pale Lager</a>,
 <a class="username" href="/community/members/hodgson.894557/">Hodgson</a>,
 <a href="/beer/profile/12949/367058/?ba=Hodgson#review">37 minutes ago</a>]

In [13]:
for ent in posts:
    print(ent.h6.text)

Loon Lager
Spruce Budd
Punkin Ale
Budweiser Reserve Copper Lager (Aged on Jim Beam Bourbon Barrel Staves)
Barrel Runner
Abbot Ale
Johnny’s American Session IPA
Milkshake IPA #3
Pipeworks / ColdFire - Tropic Of Unicorn
Fresh Off The Farm
Porter Baltycki
Franziskaner Hefe-Weisse
Frisson
Playafication
Session Watermelon Wheat
Jelly King
Sweet Action
Revivalists
The Boysenberry
Schlafly Sour Blonde Ale
Speedway Stout - Cinnamon Vanilla - Bourbon Barrel-Aged
Resin
Scrimshaw Pilsner
Humulo Nimbus
Rose ale
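
The `h6` tag gives us the title, and the anchor tags listed earlier suggest a way to get the company and style as well: their `href` values appear to follow distinct patterns.  The sketch below is one hedged way to use that, assuming the URL scheme seen in that single output holds in general.  The name `classify_links` is hypothetical, and the `links` list is copied by hand from the output; in the notebook it could be built with something like `[(a['href'], a.text) for a in posts[0].find_all('a')]`.

```python
import re

# Patterns inferred from the hrefs in the output above; they are assumptions
# about the site's URL scheme, not a documented API.
BEER_RE = re.compile(r'^/beer/profile/\d+/\d+/$')
BREWERY_RE = re.compile(r'^/beer/profile/\d+/$')
STYLE_RE = re.compile(r'^/beer/style/\d+/$')


def classify_links(links):
    """Map (href, text) pairs to the title, company and style of one review."""
    record = {}
    for href, text in links:
        if not text:  # anchors that only wrap an image have no text
            continue
        if BEER_RE.match(href):
            record['title'] = text
        elif BREWERY_RE.match(href):
            record['company'] = text
        elif STYLE_RE.match(href):
            record['style'] = text
    return record


# (href, text) pairs copied from the first review's anchor tags
links = [
    ('/community/members/hodgson.894557/', ''),
    ('/beer/profile/12949/367058/', ''),
    ('/beer/profile/12949/367058/', 'Loon Lager'),
    ('/beer/profile/12949/', 'Barley Days Brewery'),
    ('/beer/style/155/', 'American Pale Lager'),
    ('/community/members/hodgson.894557/', 'Hodgson'),
    ('/beer/profile/12949/367058/?ba=Hodgson#review', '37 minutes ago'),
]
record = classify_links(links)
print(record)
```

Matching on the `href` rather than on the position of each anchor should be a little more robust if a review has extra or missing links.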


## Scrape All Pages

If we follow the link to the next page of reviews, we see that the URL contains a query string, so the pages look like

```
'https://www.beeradvocate.com/beer/?start=25'
```

where the number increases in multiples of 25 from one page to the next.  The script below takes advantage of this structure and retrieves a few hundred beer names.

In [14]:
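titles = []
# Each page lists 25 reviews; step through the offsets 0, 25, ..., 475
for num in range(0, 500, 25):
    link = url + '?start=' + str(num)
    response = requests.get(link, verify=False)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Every review on the page lives in its own container div
    posts = soup.find_all('div', {'id': 'rating_fullview_container'})
    for ent in posts:
        # The beer name is the text of the review's <h6> heading
        titles.append(ent.h6.text)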
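
As a small aside, the page URLs can also be generated up front, which makes it easy to inspect them before sending any requests.  This is only a sketch: `page_urls` is a hypothetical helper, and the page size of 25 is an assumption based on the URL pattern above.

```python
from urllib.parse import urlencode


def page_urls(base_url, n_pages, per_page=25):
    """Return the URLs of the first n_pages pages of reviews."""
    return [
        base_url + '?' + urlencode({'start': page * per_page})
        for page in range(n_pages)
    ]


urls = page_urls('https://www.beeradvocate.com/beer/', 20)
print(urls[:3])
```

An alternative with the same effect is to let `requests` build the query string by passing `params={'start': num}` to `requests.get`.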



In [18]:
type(titles)

list

In [19]:
len(titles)

270

In [26]:
titles

['Midnight Ember',
 'Coors Light',
 'First Anniversary Blend (Off the Cuff)',
 'Loon Lager',
 'Spruce Budd',
 'Punkin Ale',
 'Budweiser Reserve Copper Lager (Aged on Jim Beam Bourbon Barrel Staves)',
 'Barrel Runner',
 'Abbot Ale',
 'Johnny’s American Session IPA',
 'Milkshake IPA #3',
 'Pipeworks / ColdFire - Tropic Of Unicorn',
 'Fresh Off The Farm',
 'Porter Baltycki',
 'Franziskaner Hefe-Weisse',
 'Frisson',
 'Playafication',
 'Session Watermelon Wheat',
 'Jelly King',
 'Sweet Action',
 'Revivalists',
 'The Boysenberry',
 'Schlafly Sour Blonde Ale',
 'Speedway Stout - Cinnamon Vanilla - Bourbon Barrel-Aged',
 'Resin',
 'The Black Sow',
 'Edmund Fitzgerald Porter',
 'Rokoko',
 'Death of the Author',
 'Quince Essential Hazy Ale',
 'Thunder Gun Express',
 'Lervig Lucky Jack',
 'Grodziskie',
 'Bare Tree: The Blend',
 'Black Forest',
 'Stone / Societe - The Skedaddler',
 '1000 IBU',
 'Gweilo Blazon IPA',
 'Midnight Ember',
 'Coors Light',
 'First Anniversary Blend (Off the Cuff)',
 'Loo
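
The printed list repeats some names (for example, `'Midnight Ember'` and `'Coors Light'` each appear more than once).  If the aim is a list of distinct beer names, one simple option is to drop the repeats while keeping the original order.  The sketch below uses a hypothetical helper, `unique_in_order`, and a short hand-made sample rather than the full scraped list.  Note that this would not be appropriate if repeated names are meant to count as separate reviews.

```python
def unique_in_order(items):
    """Drop repeated items, keeping the first occurrence of each."""
    # dict preserves insertion order, so the keys come back in first-seen order
    return list(dict.fromkeys(items))


# Short hand-made sample using names from the output above
sample_titles = ['Midnight Ember', 'Coors Light', 'Loon Lager',
                 'Midnight Ember', 'Coors Light']
unique_titles = unique_in_order(sample_titles)
print(unique_titles)
```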

### PROBLEM

Complete the code snippet above so that it also collects the other fields for each review, and create a DataFrame from the retrieved lists.  What are the top 10 reviews by overall score?  By ABV?