# Beer

I like beer...a lot.  The goal of this project is to build a dataset describing a variety of beers, scraped from `beeradvocate.com`.  The site lists a few hundred recent reviews of different beers.  Once we have the data, we will try to use it to build a beer recommendation application.


![](images/beer.png)

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
url = 'https://www.beeradvocate.com/beer/'

In [3]:
response = requests.get(url)

In [4]:
soup = BeautifulSoup(response.text, 'html.parser')

## Strategy

My goal is to build a dataset whose features capture the information typically presented in these reviews.  For example, the first entry I retrieved was

**TITLE:** Bush League Sour Ale

**COMPANY**: 4th Meridian Brewing Co.

**STYLE**: American Wild Ale

**ABV:** 4.50%

**SCORE**: 3.74

**LOOK**: 3.5

**SMELL**: 3.75

**TASTE**: 3.75

**FEEL**: 3.75

If we can get this information for each entry, we'll be off to a good start.  Every review follows the same structure, but at an early glance it looks like collecting a separate page-wide list for each field would cause problems; it is safer to grab each review as a whole and slice the fields out of it.  To find the boundaries of a review's container, we can search in the browser inspector.  A quick look shows that a `div` tag with a particular `id` bounds each posting.

```html
<div id="rating_fullview_container">
```

Searching the soup for this `div` returns the raw HTML of the first review.  Next, we can paste that HTML into an `%%HTML` magic cell and check that it renders what we want.



In [5]:
soup.find('div', {'id': 'rating_fullview_container'})

<div ba-user="621888" class="user-comment" id="rating_fullview_container"><div id="rating_fullview_user"><div style="padding:3px; background:#E8E8E8;"><a class="username" href="/community/members/emperorbevis.621888/"><img alt="Photo of EmperorBevis" border="0" height="48" src="https://cdn.beeradvocate.com/data/avatars/m/621/621888.jpg?1352062679" width="48"/></a></div></div><div id="rating_fullview_content_2"><a href="/beer/profile/139/316041/"><img alt="Steam Beer" border="0" height="90" src="https://cdn.beeradvocate.com/im/placeholder-beer.jpg" style="float:right; margin:0px 0px 10px 10px;" width="45"/></a><h6><a href="/beer/profile/139/316041/">Steam Beer</a></h6><br/><a href="/beer/profile/139/">Shipyard Brewing Company</a><br/><a href="/beer/style/132/">California Common / Steam Beer</a> / 4.10% ABV<br/><br/><span class="BAscore_norm">3.32</span><span class="rAvg_norm">/5</span><span style="font-size:1.5em;font-weight:700;"></span>  rDev <span style="color:#006600;">+2.2%</span> 

In [6]:
%%HTML
<div ba-user="168458" class="user-comment" id="rating_fullview_container"><div id="rating_fullview_user"><div style="padding:3px; background:#E8E8E8;"><a class="username" href="/community/members/biboergosum.168458/"><img alt="Photo of biboergosum" border="0" height="48" src="https://cdn.beeradvocate.com/data/avatars/m/168/168458.jpg?1330550786" width="48"/></a></div></div><div id="rating_fullview_content_2"><a href="/beer/profile/47607/366919/"><img alt="Bush League Sour Ale" border="0" height="90" src="https://cdn.beeradvocate.com/im/placeholder-beer.jpg" style="float:right; margin:0px 0px 10px 10px;" width="45"/></a><h6><a href="/beer/profile/47607/366919/">Bush League Sour Ale</a></h6><br><a href="/beer/profile/47607/">4th Meridian Brewing Co.</a><br><a href="/beer/style/171/">American Wild Ale</a> / 4.50% ABV<br><br><span class="BAscore_norm">3.74</span><span class="rAvg_norm">/5</span><span style="font-size:1.5em;font-weight:700;"></span>  <span class="muted">rDev 0%</span> | Score: 3.74<br><span class="muted">look: 3.5 | smell: 3.75 | taste: 3.75 | feel: 3.75 |  overall: 3.75</span><br><br><div class="comment">8oz glass at Beer Revolution YEG Oliver Square. No idea what kind of sour this is supposed to be.  More later.<br/>
<br/>
This beer appears a hazy, pale golden straw colour, with one solid finger of puffy, finely foamy, and mildly creamy off-white head, which leaves a few instances of randomly streaky lace around the glass as it quickly dissipates. <br/>
<br/>
It smells of grainy and crackery cereal malt, some mixed domestic citrus peel, a tame earthy yeastiness, and plain leafy, herbal, and floral green hop bitters. The taste is gritty and grainy pale malt, some muddled melon and citrus fruit bowl notes, faintly tart yeast, and more understated leafy, musty, and dead grassy hoppiness. <br/>
<br/>
The carbonation is fairly active in its palate-tickling frothiness, the body a so-so medium weight, and generally smooth, with nothing really getting uppity, as such, here.  It finishes trending dry, the lingering fruitiness exhibiting its sour tendencies. <br/>
<br/>
Overall - this sure is a refreshing and enjoyable version of the style, with the fruity character contributing in large part.  Nothing bush league about this offering, unless you are the kind of sour snob who insists on losing tooth enamel.</div><br><i class="fas fa-align-left"></i> <span class="muted">1,176 characters</span><br><br><div><span class="muted"><a class="username" href="/community/members/biboergosum.168458/">biboergosum</a>, <a href="/beer/profile/47607/366919/?ba=biboergosum#review">A moment ago</a></span></div></br></br></br></br></br></br></br></br></br></br></div></div>
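
The ABV and the sub-scores in the entry above arrive as plain text rather than in their own tags, so one possible way to pull them out is a regular expression. The sketch below works on two strings hand-copied from that entry; it assumes the text has already been extracted from the tags and that the `N% ABV` and `name: value` patterns hold for every review, which is worth checking against more entries. The variable names are illustrative only.

```python
import re

# Text hand-copied from the rendered entry above (tags already stripped).
style_line = "American Wild Ale / 4.50% ABV"
ratings_line = "look: 3.5 | smell: 3.75 | taste: 3.75 | feel: 3.75 |  overall: 3.75"

# Capture the number that sits directly before "% ABV".
abv_match = re.search(r"(\d+(?:\.\d+)?)%\s*ABV", style_line)
abv = float(abv_match.group(1)) if abv_match else None

# Each sub-score looks like "name: number"; collect them into a dict.
ratings = {
    name: float(value)
    for name, value in re.findall(r"(\w+):\s*(\d+(?:\.\d+)?)", ratings_line)
}

print(abv)
print(ratings)
```

Returning `None` when the ABV pattern is missing keeps a single odd entry from stopping the whole scrape.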


In [7]:
posts = soup.find_all('div', {'id': 'rating_fullview_container'})

In [27]:
soup.find_all('div', {'id': 'rating_fullview_container'}).find_all('a')

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
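
The error above comes from calling `find_all` on the list that `find_all` returns, instead of on each tag inside it. A minimal sketch of the fix, using a small inline page trimmed from the markup shown earlier so that it runs without a network request (`html`, `soup_demo`, and `posts_demo` are illustrative names):

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the scraped page: two review containers,
# trimmed from the markup shown earlier.
html = """
<div id="rating_fullview_container"><h6><a href="/beer/profile/139/316041/">Steam Beer</a></h6>
<a href="/beer/profile/139/">Shipyard Brewing Company</a>
<a href="/beer/style/132/">California Common / Steam Beer</a></div>
<div id="rating_fullview_container"><h6><a href="/beer/profile/47607/366919/">Bush League Sour Ale</a></h6>
<a href="/beer/profile/47607/">4th Meridian Brewing Co.</a>
<a href="/beer/style/171/">American Wild Ale</a></div>
"""

soup_demo = BeautifulSoup(html, "html.parser")
posts_demo = soup_demo.find_all("div", {"id": "rating_fullview_container"})

# find_all() returns a ResultSet (a list of tags), so call find_all('a')
# on each tag rather than on the list itself.
links_per_post = [[a.text for a in post.find_all("a")] for post in posts_demo]
print(links_per_post)
```

In this trimmed markup the three links in each container are the title, the brewery, and the style, in that order. The full page also has a username link ahead of them, so the positions would need adjusting there.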

In [10]:
for ent in posts:
    print(ent.h6.text)

Steam Beer
Stolen Canoe Pale Ale
Premium Lager
Brainstorm
Grande Cuvée Porter Baltique - Fûts De Bourbon Et Brandy
Holy Toledo Pilsener
London Porter
Kulture Clash
Wolters Pilsener
Genesee Brew House Dry Hopped Mosaic Cream Ale
All Nelson Everything
Shark Attack
Beatification
Native Texan
Millennial Hipster
420 Extra Pale Ale
Dr. Caligari
NEU BLK
Samuel Adams Black Lager
Angry Chair / Horus - Burkitshi
Fuller's London Porter
Permutation Series #58: Double IPA w/ Citra and Galaxy
Wolters Weizen
Johann Buys A Broat
Soursmith Apricot


In [27]:
url = 'https://www.beeradvocate.com/beer/'

## Scrape All Pages

If we follow the link to the next page of reviews, we notice that the URL contains a query string, giving pages that look like

```
'https://www.beeradvocate.com/beer/?start=25'
```

where the number increases in multiples of 25.  The script below takes advantage of this structure and retrieves a few hundred beer names.

In [28]:
titles = []
# Page offsets run 0, 25, ..., 475 (20 pages).
for num in range(0, 500, 25):
    link = url + '?start=' + str(num)
    response = requests.get(link)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Each review sits in its own container div; the beer name is in its <h6>.
    posts = soup.find_all('div', {'id': 'rating_fullview_container'})
    for ent in posts:
        titles.append(ent.h6.text)
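
The paging logic can also be separated from the fetching, which makes it possible to stop early when a page comes back empty and to try the loop without hitting the site. This is only a sketch: `page_url`, `collect_titles`, and the `get_titles` callable are hypothetical names, and the assumption that an empty page marks the end of the reviews has not been checked against the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.beeradvocate.com/beer/"
PAGE_SIZE = 25  # step between pages, per the ?start= pattern above


def page_url(page_index, base=BASE_URL, page_size=PAGE_SIZE):
    """Return the URL for the zero-based page_index."""
    return base + "?" + urlencode({"start": page_index * page_size})


def collect_titles(get_titles, max_pages=20):
    """Walk pages in order and stop early once a page yields no titles.

    get_titles is a hypothetical callable: URL in, list of titles out.
    """
    titles = []
    for i in range(max_pages):
        page_titles = get_titles(page_url(i))
        if not page_titles:
            break
        titles.extend(page_titles)
    return titles


# Stand-in for the network: two fake pages, everything else empty.
fake_pages = {page_url(0): ["Steam Beer"], page_url(1): ["London Porter"]}
demo_titles = collect_titles(lambda u: fake_pages.get(u, []))
print(demo_titles)
```

For a real run, `get_titles` would wrap the `requests.get` and `find_all` calls from the loop above.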

In [29]:
titles[::20]

['North Shore Iron Works',
 'Deftones Digital Bath IPA',
 'The Muddy Imperial Stout',
 'Cambridge River Porter',
 'Ode To The Afternoon Crew',
 'Session Raid',
 'Wijngaard Muscat & Riesling',
 'Super Session #7',
 'War Goose',
 'Nebulosity',
 'Big Eye - Ginger',
 'Open Doors',
 'Revolution',
 'Effortless Grapefruit',
 'Session Raid',
 'Wave Chaser IPA',
 'Stingray Imperial IPA',
 'Domestically Challenged',
 'Sun Juice',
 'Belhaven 90/~ Wee Heavy',
 'Keller Pils 2.0',
 'Abner',
 'Paled It!',
 'The Seven Faces Of Pepe Grano']

In [30]:
len(titles)

463
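
Note that `'Session Raid'` shows up twice in the sample above, so the 463 titles are not all distinct. From titles alone we cannot tell whether that is the same review fetched twice or two reviews of the same beer. A small sketch for spotting repeats and keeping the first occurrence of each name, shown on a short hand-made list rather than the scraped data:

```python
from collections import Counter

# Short hand-made list; the names come from the sample output above.
sample_titles = ["Session Raid", "War Goose", "Nebulosity", "Session Raid"]

# Count how often each title occurs and keep those seen more than once.
counts = Counter(sample_titles)
repeated = [title for title, n in counts.items() if n > 1]

# dict.fromkeys keeps the first occurrence of each key, in order.
unique_titles = list(dict.fromkeys(sample_titles))

print(repeated)
print(unique_titles)
```

Whether dropping repeats is the right call depends on the goal: for a recommender, several reviews of one beer are useful data rather than noise.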

### PROBLEM

Complete the code snippet above so that it also collects the other fields, and create a DataFrame from the retrieved lists.  Which 10 beers have the highest overall scores?  The highest ABV?