In [1]:
########## HIDE CODE BLOCKS ##########

from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')

In [2]:
########## INITIALIZATION ##########
import warnings  # imported explicitly rather than relying on the star import below

from scripts.Beer_Advisor_Helper import *
warnings.filterwarnings('ignore')

########## PRESENTATION MAIN ##########

    
display(show_overview_btn)
show_overview_btn.on_click(show_overview_text)
show_webscraping_btn.on_click(show_webscraping_text)
show_EDA_btn.on_click(show_EDA1_text)
show_boxhist_btn.on_click(goplot_boxhist)
show_superuserplot_btn.on_click(goplot_super_user)
show_correlation_btn.on_click(goplot_correlation)
show_beeradvisor_btn.on_click(show_beeradvisor_text)
show_beerrecs_btn.on_click(show_beerrecs)       

 
# Beer Advisor Overview

Hello, and welcome to Beer Advisor.

This project had four steps:
1. Scrape as much data as possible from [BeerAdvocate.com](https://www.beeradvocate.com).
2. Perform numerical analysis on user reviews.
3. Develop a recommender system that suggests beers to new users based on the scraped reviews.
4. Analyze textual data from reviews to advise brewers on new products.



### 1. Webscraping

__Beer Advocate__ is an online community where users can rate and review beers of every kind, from craft to mainstream. The website was scraped with a _scrapy spider_, which pulled information from each general product page and from every individual review.

    
- Beer Advocate boasts a database of nearly 300,000 beers (probably more at this point).
- Only beers with more than 100 user ratings were scraped (the count is conveniently listed on the beer list page).
- Three hours later, we have nearly 10,000 individual beers and 1.7 million individual reviews!
    

Depending on its origin (product page or individual review), each record was piped into one of two CSV files, producing tables similar to these:

#### Beers DataFrame

|   beer_id | beer_name                        | brewery                            | beer_style                 |   abv |   num_reviews |   ranking |
|----------:|:---------------------------------|:-----------------------------------|:---------------------------|------:|--------------:|----------:|
|      9128 | Motor City Brewing Ghettoblaster | Motor City Brewing Works           | English Dark Mild Ale      |   4.2 |            64 |     44196 |
|       205 | Spellbound IPA                   | Spellbound Brewing                 | American IPA               |   6.5 |            35 |     13651 |
|      9358 | Red Nose Winter Ale              | Natty Greene's Pub & Brewing Co.   | Winter Warmer              |   6.8 |            51 |     38773 |
|      6646 | Hunter Vanilla                   | 18th Street Brewery - Gary Taproom | English Sweet / Milk Stout |   8.5 |            61 |      1391 |
|      3753 | Bière De Miel Biologique         | Brasserie Dupont sprl              | Belgian Saison             |   8   |           184 |     16355 |

#### Reviews DataFrame
  
|   review_id |   beer_id | posted              | ratings                     |   score | username        |
|------------:|----------:|:--------------------|:----------------------------|--------:|:----------------|
|      675575 |      4399 | 2012-04-26 00:00:00 | [4.5, 4.5, 3.5, 4.0, 4.0]   |    3.95 | Rutager         |
|     1270933 |      7509 | 2016-04-19 00:00:00 | [4.0, 4.25, 4.0, 4.25, 4.0] |    4.09 | stortore        |
|      686394 |      4481 | 2009-10-21 00:00:00 | [4.0, 4.0, 4.5, 4.5, 4.5]   |    4.35 | Josievan        |
|     1347362 |      7959 | 2009-11-13 00:00:00 | [4.0, 3.0, 3.0, 2.5, 2.0]   |    2.81 | civilizedpsycho |
|     1119392 |      6675 | 2010-02-17 00:00:00 | [4.0, 4.5, 4.0, 4.0, 4.0]   |    4.12 | drizzam         |


Because of the sheer volume of reviews scraped from this website, the review text was omitted from this notebook to keep the data size down. Some _post-processed text_ will be discussed later.
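
The two-file split and the list-valued `ratings` column shown in the tables above can be sketched as follows. This is only an illustration, not the project's actual scrapy pipeline: the functions `route_items` and `read_reviews` are hypothetical names, the rule "an item with a `review_id` is a review" is an assumption, and in-memory buffers stand in for the real CSV files.

```python
import ast
import csv
import io

# Column names copied from the two tables above.
BEER_FIELDS = ["beer_id", "beer_name", "brewery", "beer_style",
               "abv", "num_reviews", "ranking"]
REVIEW_FIELDS = ["review_id", "beer_id", "posted", "ratings",
                 "score", "username"]


def route_items(items, beers_file, reviews_file):
    """Write each scraped item to the beers CSV or the reviews CSV."""
    beers = csv.DictWriter(beers_file, fieldnames=BEER_FIELDS)
    reviews = csv.DictWriter(reviews_file, fieldnames=REVIEW_FIELDS)
    beers.writeheader()
    reviews.writeheader()
    for item in items:
        # Assumption: only review items carry a review_id.
        writer = reviews if "review_id" in item else beers
        writer.writerow(item)


def read_reviews(reviews_file):
    """Read reviews back, turning the stringified ratings list into floats."""
    rows = []
    for row in csv.DictReader(reviews_file):
        row["ratings"] = [float(r) for r in ast.literal_eval(row["ratings"])]
        row["score"] = float(row["score"])
        rows.append(row)
    return rows


# One row from each table above, as the spider might yield them.
items = [
    {"beer_id": 205, "beer_name": "Spellbound IPA",
     "brewery": "Spellbound Brewing", "beer_style": "American IPA",
     "abv": 6.5, "num_reviews": 35, "ranking": 13651},
    {"review_id": 675575, "beer_id": 4399, "posted": "2012-04-26 00:00:00",
     "ratings": [4.5, 4.5, 3.5, 4.0, 4.0], "score": 3.95,
     "username": "Rutager"},
]

beers_csv, reviews_csv = io.StringIO(), io.StringIO()
route_items(items, beers_csv, reviews_csv)
reviews_csv.seek(0)
parsed = read_reviews(reviews_csv)
print(parsed[0]["username"], parsed[0]["ratings"])
```

Parsing with `ast.literal_eval` rather than `eval` keeps the round trip safe if a scraped field ever contains something other than a plain list literal.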

Let's look a little more closely at the data...





### 2. Numerical Analysis

In our dataset we have reviews from 57,023 individual users.

|   EDA  |   count |    mean |     std |   min |   25% |   50% |   75% |   __max__ |
|-------:|--------:|:-------:|:-------:|:-----:|:-----:|:-----:|:-----:|:-----:|
| summary|   57023 | 30.8128 | 133.207 |     1 |     1 |     2 |     9 |  __4175__ |



_[Interactive plot: boxplot and histogram]_




We can see that the vast majority of reviews are supplied by __fewer than 25%__ of users, so the dataset is _heavily skewed_ toward a set of __super users__.
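
A rough sketch of how this skew can be measured from a list of review authors is shown below. It is not the notebook's own code: the helper names (`summarize`, `top_share`) and the usernames are hypothetical, and the toy counts are invented purely to exercise the functions.

```python
from collections import Counter
from statistics import mean, quantiles, stdev


def summarize(counts):
    """Summary statistics of reviews per user, similar to the table above."""
    values = sorted(counts.values())
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values), "mean": mean(values), "std": stdev(values),
        "min": values[0], "25%": q1, "50%": q2, "75%": q3, "max": values[-1],
    }


def top_share(counts, fraction=0.25):
    """Share of all reviews written by the most active `fraction` of users."""
    values = sorted(counts.values(), reverse=True)
    k = max(1, round(len(values) * fraction))
    return sum(values[:k]) / sum(values)


# Invented toy data: one username entry per review.
toy_counts = {"user_a": 20, "user_b": 6, "user_c": 2, "user_d": 2,
              "user_e": 1, "user_f": 1, "user_g": 1, "user_h": 1}
usernames = [name for name, n in toy_counts.items() for _ in range(n)]

counts = Counter(usernames)          # reviews per user
summary = summarize(counts)
share = top_share(counts, fraction=0.25)
print(summary)
print(f"Top 25% of users wrote {share:.0%} of reviews")
print(counts.most_common(3))         # the most active users
```

`Counter.most_common(10)` applied to the real list of usernames would give a top-10 ranking of the most active reviewers.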

Let's try to find some of our __super users__. Here are the top 10:


|  user  |   StonedTrippin |   metter98 |   superspak |   brentk56 |   BEERchitect |   UCLABrewN84 |   zeff80 |   woodychandler |   jlindros |   NeroFiddled |
|:-------|----------------:|-----------:|------------:|-----------:|--------------:|--------------:|---------:|----------------:|-----------:|--------------:|
| posted |            4175 |       4056 |        3855 |       3753 |          3682 |          3581 |     3015 |            2959 |       2957 |          2834 |



_[Interactive plot: superuser plot]_






What's the impact of all these reviews? Do more reviews for a given product impact its favorability?




_[Interactive plot: correlational plot]_



Notice how the score stabilizes as the number of reviews for a given beer gets larger.

Does that imply that beers with thousands of reviews are more accurately scored?

    Perhaps it's simply a popularity bias.

Regardless, the more reviews a beer has, the higher its score tends to be.

And the higher its score, the higher its visibility.
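
One way to check both observations (scores stabilizing and rising with review count) is to bucket beers by number of reviews and compare the mean and spread of scores in each bucket. The sketch below assumes a hypothetical helper `score_spread_by_bucket` and arbitrary bucket edges; the `(num_reviews, score)` pairs are invented for illustration and are not taken from the scraped dataset.

```python
from statistics import mean, pstdev


def score_spread_by_bucket(beers, edges=(100, 1000)):
    """Group (num_reviews, score) pairs by review count and report the
    mean and spread (population standard deviation) of scores per bucket."""
    buckets = {}
    for num_reviews, score in beers:
        # Bucket index = how many edges this review count has reached.
        idx = sum(num_reviews >= edge for edge in edges)
        buckets.setdefault(idx, []).append(score)
    return {
        idx: {"n": len(scores), "mean": mean(scores), "spread": pstdev(scores)}
        for idx, scores in sorted(buckets.items())
    }


# Invented illustrative data: (num_reviews, score).
toy_beers = [
    (12, 2.9), (40, 4.6), (75, 3.4),        # fewer than 100 reviews
    (150, 3.7), (600, 4.1), (900, 3.9),     # 100 to 999 reviews
    (1500, 4.1), (3000, 4.2), (4000, 4.3),  # 1000 or more reviews
]

result = score_spread_by_bucket(toy_beers)
for idx, stats in result.items():
    print(idx, stats["n"], round(stats["mean"], 2), round(stats["spread"], 2))
```

A shrinking spread in the higher buckets would match the stabilization seen in the plot, while a rising mean would match the popularity effect. Neither result on its own says whether heavily reviewed beers are scored more accurately.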




### 3. Beer Advisor Recommender

With all of this data, we can start building a recommender system.

- Not at all like Netflix's or Facebook's. Beer Advisor is a _baby recommender_.

We employ a __User-Item Collaborative Filter__:
1. You input a series of Beers that you like _(or don't like)_ 

| Beers |   Lagunitas IPA |   Two Hearted Ale |   Sweet Action |   Hoegaarden Original White Ale |   Blue Moon Belgian White |   Club De Stella Artois |   Yuengling Traditional Lager |
|:------|----------------:|------------------:|---------------:|--------------------------------:|--------------------------:|------------------------:|------------------------------:|
| Scores |             3.2 |               4.1 |            2.7 |                             3.4 |                       3.1 |                     4.5 |                           4.2 |



2. We find other users who have rated those same beers!!

| index| username     |   Lagunitas IPA |   Two Hearted Ale |   Sweet Action |   Hoegaarden Original White Ale |   Blue Moon Belgian White |   Club De Stella Artois |   Yuengling Traditional Lager |
|:-----|:-------------|----------------:|------------------:|---------------:|--------------------------------:|--------------------------:|------------------------:|------------------------------:|
| 0    | KidDoc       |          nan    |              4.18 |            nan |                          nan    |                    nan    |                     nan |                           nan |
| 1    | NCSUdo       |            4.38 |              4.44 |            nan |                          nan    |                      3.57 |                     nan |                           nan |
| 2    | ThreePistols |          nan    |            nan    |            nan |                            4    |                    nan    |                     nan |                             4 |
| 3    | appenzeller  |            4.08 |            nan    |            nan |                            3.73 |                    nan    |                     nan |                           nan |
| -    | -            |               - |            -      |            -   |                            -    |                    -    |                     - |                           - |
| 6930 | FireorHigher |            nan  |            nan    |            nan |                            nan  |                    3.51    |                     nan |                           3.18 |

3. Measure the correlation between your scores and each of those users' scores

| user       |       corr |    Lagunitas IPA |   Two Hearted Ale |   Sweet Action |   Hoegaarden Original White Ale |   Blue Moon Belgian White |   Club De Stella Artois |   Yuengling Traditional Lager |
|:-----------|-----------:|-------:|------:|-------:|-------:|-------:|-------:|-------:|
| TheSarge   | -0.042 |   3.86 |  3.68 |    nan |   3.76 |   2.88 |    nan |   2.56 |
| SkunkWorks |  0.300  |   3.82 |  4.32 |    nan |   3.4  | nan    |    nan |   3    |
| HotHands   |  0.923  | nan    |  4.15 |    nan |   3.95 | nan    |    nan |   3.45 |
| jsprain1   |  0.136  |   3.65 |  4.5  |    nan |   3.66 |   3.59 |    nan |   3.73 |
| Tom_Banjo |  0.995 |   3    |   3.46 |   3    |   3.46 | nan    |    nan | nan    |
| beergoot  |  0.876 |   3.66 |   4.45 | nan    | nan    | nan    |    nan |   3.33 |

4. Multiply each user's scores by that user's correlation (across the rows), then sum down the columns
5. Output the 5 most highly scored beers, weighted by the closest users (_in terms of preference_)
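
The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the Beer Advisor implementation: the function names are hypothetical, missing ratings are represented by absent dictionary keys instead of `nan`, users with a non-positive correlation are skipped (an assumption), and the other users and the beers "Beer X" and "Beer Y" are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def recommend(my_scores, other_users, top_n=5):
    """Rank unseen beers by the correlation-weighted sum of others' scores."""
    totals = {}
    for ratings in other_users.values():
        # Steps 2 and 3: correlate over the beers both users have rated.
        shared = [beer for beer in my_scores if beer in ratings]
        corr = pearson([my_scores[b] for b in shared],
                       [ratings[b] for b in shared])
        if corr <= 0:
            continue  # assumption: ignore users with dissimilar taste
        # Step 4: weight this user's scores, then accumulate per beer.
        for beer, score in ratings.items():
            if beer not in my_scores:
                totals[beer] = totals.get(beer, 0.0) + corr * score
    # Step 5: highest weighted totals first.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


my_scores = {"Lagunitas IPA": 3.2, "Two Hearted Ale": 4.1, "Sweet Action": 2.7}
other_users = {  # invented users and ratings
    "user_a": {"Lagunitas IPA": 3.5, "Two Hearted Ale": 4.5,
               "Sweet Action": 3.0, "Beer X": 4.4, "Beer Y": 3.0},
    "user_b": {"Lagunitas IPA": 4.5, "Two Hearted Ale": 3.0, "Beer Y": 4.8},
    "user_c": {"Two Hearted Ale": 4.0, "Beer X": 4.0},
}

recs = recommend(my_scores, other_users)
print([beer for beer, _ in recs])
```

Here `user_b` has the opposite taste and `user_c` shares only one beer, so only `user_a` contributes to the ranking. A fuller version would likely normalize the weighted sums, for example by the total correlation per beer, so that frequently rated beers do not dominate.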

|    beer  | Turbo Nerd XIPA |       Vanilla Joe |       Innis & Gunn Lager Beer |        Filthy Dirty IPA |       Brunch Money |
|:---------|-----------:|-----------:|-----------:|-----------:|-----------:|
| corr.sum | 0.00584292 | 0.00584274 | 0.00584233 | 0.00584223 | 0.00584183 |

Let's take a look at the top beers and see why we might like them so much...



Name:  Filthy Dirty IPA
Brewery:  Parallel 49 Brewing Company
Family:  India Pale Ales
ABV:  7.2 	 BA Score:  3.79
Description:  None provided.


![No Image Available](https://cdn.beeradvocate.com/im/beers/133798.jpg)

*.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.*
Name:  Turbo Nerd XIPA
Brewery:  Separatist Beer Project / Separatist Bar + Bottle
Family:  India Pale Ales
ABV:  7.5 	 BA Score:  4.22
Description:  None provided.


![No Image Available](https://cdn.beeradvocate.com/im/beers/225343.jpg)

*.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.*
Name:  Vanilla Joe
Brewery:  Sante Adairius Rustic Ales
Family:  Porters
ABV:  6.8 	 BA Score:  4.39
Description:  None provided.


No Image Available

*.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.*
Name:  Brunch Money
Brewery:  Armadillo Ale Works
Family:  Specialty Beers
ABV:  10.0 	 BA Score:  4.03
Description:  None provided.


![No Image Available](https://cdn.beeradvocate.com/im/beers/116079.jpg)

*.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.**.*
