In [5]:
from IPython.core.display import HTML
HTML("""
<style>
code {
    padding:2px 4px !important;
    color: #c7254e !important;
    font-size: 90%;
    background-color: #f9f2f4 !important;
    border-radius: 4px !important;
    color: rgb(138, 109, 59);
}
mark {
    color: rgb(138, 109, 59) !important;
    font-weight: bold !important;
}
.container { width: 90% !important; }
table { font-size:15px !important; }
</style>
""")

In [6]:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt 
import os, sys

sys.path.insert(0, '../../src/data/')
import utils

%matplotlib inline 
path_raw = "../../data/raw/beer_reviews"

# LOAD DATA
# we assume the file we're after is a
# single .csv in path_raw
for file in os.listdir(path_raw):
    file = os.path.join(path_raw, file)
    if os.path.isfile(file) and '.csv' in file: 
        dat_raw = pd.read_csv(file)

# Introduction

This notebook serves as the master report presenting the analysis results for the CiBO data science exercise. The exercise is structured around a beer review data set available [online](https://s3.amazonaws.com/demo-datasets/beer_reviews.tar.gz), for which the following questions must be explored:
> 1. Which brewery produces the strongest beers by ABV%?
> 2. If you had to pick 3 beers to recommend using only this data, which would you pick?
> 3. Which of the factors (aroma, taste, appearance, palette) are most important in determining the overall quality of a beer?
> 4. If I enjoy a beer's aroma and appearance, which beer style should I try?

## Initial data exploration

The starting point of this analysis is to get a sense of what the data looks like and how it is organized; notebook [1.0](../explore/1.0_initial_look.ipynb) covers this initial investigation. The input data has the following characteristics and caveats:
- The dataset is composed of a series of beer reviews containing scores for numerous categories. Each data entry also has an associated brewery and beer style.
- The following attribute fields have special characters (e.g. accents) in them: `brewery_name`, `beer_style`, `beer_name`.
- Several attributes have missing values: `brewery_name`, `review_profilename` and `beer_abv`. We aren't too concerned about `brewery_name` since every brewery has an associated `brewery_id`. However, `review_profilename` will be needed if we want to compare across reviewers. A total of 17,043 beers have missing values for `beer_abv`.
- The distribution of reviews per beer is not normal: the majority of beers have only two reviews; however, on average, each beer is reviewed about 24 times. <img src='figures/1.0_initial_look-0.png'></img>
- Similarly, breweries are not represented by the same number of beers: the mean number of beers reviewed for the 5840 breweries in the dataset is ~11, whereas the median is 5. <img src='figures/1.0_initial_look-1.png'></img>
- Of the 66,055 beers represented, the most highly represented style is *American IPA* or *American Pale Ale (APA)*. <img src='figures/1.0_initial_look-2.png'></img>
- The distribution of alcohol by volume percentage (ABV%) is roughly normal with the presence of numerous outliers. <img src='figures/1.0_initial_look-4.png'></img>
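
The per-beer and per-brewery counts quoted above can be reproduced with two group-bys. The following is a minimal sketch on a toy table; the column names `beer_beerid` and `brewery_id` come from the dataset, while the rows themselves are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the review table (one row per review); values are illustrative only.
reviews = pd.DataFrame({
    "beer_beerid": [1, 1, 1, 1, 1, 1, 2, 2, 3],
    "brewery_id":  [10, 10, 10, 10, 10, 10, 10, 10, 20],
})

# Reviews per beer: a long right tail pulls the mean above the median.
reviews_per_beer = reviews.groupby("beer_beerid").size()
print(reviews_per_beer.mean(), reviews_per_beer.median())

# Distinct beers per brewery.
beers_per_brewery = reviews.groupby("brewery_id")["beer_beerid"].nunique()
print(beers_per_brewery.to_dict())
```

Comparing the mean against the median is a quick way to see the skew without plotting.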

## Q1: Which brewery produces the strongest beers by ABV%?

### Introduction

Notebook [2.0](../explore/2.0_brewery_highest_abv.ipynb) explores this question.

There are several issues with the ABV% data contained in the beer review data set:
1. As mentioned previously, numerous beers (17,043) have missing values for ABV%; this represents about 25% of the total beers available in the dataset. It is unclear why this data is missing. One potential way around this would be to construct a model that could predict ABV% based on the scored review features (e.g. `review_taste`) as well as the `beer_style`. This was not done as part of this analysis; instead, those beers without an ABV% were ignored.
2. Some breweries have many more beers associated with them than others (as shown above in the initial data exploration); it is therefore unclear whether having only a single strong beer qualifies a brewery as having the "strongest beers". <mark>For this analysis, we will assume that a brewery must have at least 5 beers to qualify for this question.</mark>
3. The initial data exploration also mentions the inherent noise in the ABV% metric, especially in the high-ABV range; this will have to be taken into account when selecting beers.
4. Because several of the breweries do not have any associated names, we will identify breweries by the `brewery_id` attribute throughout the analysis.

### Dealing with noisy ABV%

We begin by naively looking at which brewery has the highest ABV% beers, without removing any noisy data, in order to get a general sense of the data. To do that, we generate a *beer dataset* which contains only the metadata associated with each beer; this is done by grouping the starting dataset on `beer_beerid` and simply grabbing the first entry of each group. From this *beer dataset* we group on `brewery_id`, aggregate with *max*, and sort the breweries by max ABV%; the plot below shows the top 15 breweries with the highest ABV% beer.

<img src='figures/2.0_brewery_highest_abv-0.png'/>
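
The construction of the *beer dataset* and the per-brewery maximum described above can be sketched as follows. This is a minimal illustration on made-up rows, not the code from notebook 2.0; only the column names are taken from the dataset.

```python
import pandas as pd

# Toy review table; beer 4 has no ABV% recorded.
reviews = pd.DataFrame({
    "beer_beerid": [1, 1, 2, 3, 3, 4],
    "brewery_id":  [10, 10, 10, 20, 20, 30],
    "beer_abv":    [5.0, 5.0, 12.0, 8.5, 8.5, None],
})

# One row per beer: keep the first review's metadata for each beer id.
beers = reviews.drop_duplicates(subset="beer_beerid", keep="first")

# Ignore beers without an ABV%, then take the strongest beer per brewery.
max_abv = (
    beers.dropna(subset=["beer_abv"])
         .groupby("brewery_id")["beer_abv"]
         .max()
         .sort_values(ascending=False)
)
print(max_abv.head(15))
```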

Several interesting insights can be seen in this plot:
- Noisy high-ABV beers help rank several breweries very well.
- Brewery 6513 seems to be a general outlier when compared to the remaining breweries. In general it has very high ABV beers; its median beer ABV is higher than that of nearly all the other beers represented. It is possible that this brewery represents some sort of anomaly.

In order to remove the noise, we will use John Tukey's method of detecting outliers: any beer with an ABV% value that is more than Q3 + 1.5 * IQR or less than Q1 - 1.5 * IQR is considered an outlier, where IQR is the interquartile range (Q3 - Q1), Q1 is the first quartile, and Q3 is the third quartile.
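
As a minimal sketch of Tukey's fences (the helper name `tukey_mask` and the sample values are hypothetical, and the notebook's own implementation may differ):

```python
import pandas as pd

def tukey_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values that lie inside the Tukey fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    # Lower fence is Q1 - k*IQR, upper fence is Q3 + k*IQR.
    return values.between(q1 - k * iqr, q3 + k * iqr)

# Illustrative ABV% values with one implausibly strong beer.
abv = pd.Series([4.5, 5.0, 5.2, 5.5, 6.0, 6.5, 7.0, 57.7])
kept = abv[tukey_mask(abv)]
print(kept.tolist())
```

Note that the fences are computed from the data being filtered, so the result depends on whether they are derived from all beers at once or per group.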

After removing outliers, we can replot the same top 15 breweries:

<img src='figures/2.0_brewery_highest_abv-1.png'/>

- We can see that the ABV% values for all of the breweries are much more concentrated.
- We can also see that several breweries with very few beers enter the list. A brewery with very few beers is perhaps not representative of one that is able to produce strong beers consistently; we therefore set a threshold of 5 beers for a brewery to be considered in this analysis. The disadvantage is that 5 is a relatively arbitrary cutoff.
- Brewery 6513 did not contain any outliers; however, it now looks even more anomalous when compared to the remaining breweries. <mark>We therefore elect to call brewery 6513 erroneous and remove it from consideration.</mark>
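
The two filters above (a minimum number of beers per brewery and the exclusion of brewery 6513) can be sketched as below. The names `MIN_BEERS` and `EXCLUDED` and the toy ABV% values are assumptions made for illustration.

```python
import pandas as pd

MIN_BEERS = 5        # threshold chosen in this analysis
EXCLUDED = [6513]    # brewery treated as erroneous

# Toy beer table: brewery 1 has 5 beers, brewery 2 has 2, brewery 6513 has 6.
beers = pd.DataFrame({
    "brewery_id": [1] * 5 + [2] * 2 + [6513] * 6,
    "beer_abv":   [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0,
                   20.0, 22.0, 25.0, 27.0, 30.0, 35.0],
})

# Count beers with a known ABV% per brewery, broadcast back to each row.
n_beers = beers.groupby("brewery_id")["beer_abv"].transform("count")

eligible = beers[(n_beers >= MIN_BEERS) & ~beers["brewery_id"].isin(EXCLUDED)]
print(eligible["brewery_id"].unique())
```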

With both noisy beers and brewery 6513 removed, we arrive at the following distribution of breweries:

<img src='figures/2.0_brewery_highest_abv-3.png'/>

From the plot above, we can begin to see several candidates for highest beer ABV% brewery: 2097, 11031, 10796 and 13307:
- 10796 has two beers with the absolute highest ABV%
- 2097, 11031 and 13307 each have a cluster of high ABV% beers

### Choosing a brewery

If one were to choose a brewery by its single highest-ABV% beer, the winner would be brewery 10796; however, if one were to consider multiple high-ABV% beers together, a different brewery would be chosen. In the data presented above, the 95th percentile is 14.48. Counting the number of beers above this percentile for each brewery gives:

In [4]:
cutoff = 14.48
dat = pd.read_csv('../../data/interim/high_beer_ABV_breweries.csv')
beer_counts = dat[dat.beer_abv > cutoff].groupby('brewery_id').agg(['count','median','mean'])[['beer_abv']]
beer_counts.sort_values(('beer_abv','count'),ascending=False).head(5)

| brewery_id | beer_abv count | beer_abv median | beer_abv mean |
|---|---|---|---|
| 2097 | 9 | 15.0 | 15.388889 |
| 11031 | 8 | 16.3 | 16.15 |
| 13307 | 7 | 15.0 | 15.131429 |
| 10796 | 3 | 18.0 | 16.833333 |
| 15732 | 3 | 15.0 | 15.166667 |


The breweries 2097, 11031, and 13307 all have a high number of beers (9, 8 and 7, respectively) above the 95th percentile of ABV% among those breweries that produce high ABV% beers. Since these counts are quite similar, we use the median and mean ABV% of those beers to decide between them; of the three, brewery **11031** has the highest median (16.3) and mean (16.15).

### Summary

We have determined that brewery **11031 (Brouwerij De Molen)** most consistently produces the strongest beers by ABV%; this is contingent on the following assumptions:
- brewery 6513 is an anomaly.
- a brewery must produce at least 5 beers to be considered.
- a single very high ABV% beer does not automatically select the given brewery, since precedence is given to those breweries that produce multiple strong beers (i.e. 1 very strong beer is not better than 10 slightly less strong beers).

It should be reiterated that we are excluding about 25% of the beers due to missing ABV% values; this represents a large fraction of the total data and thus creates some concern that some of these missing beers might be high-ABV% beers. One way around this would be to build a predictive model based on the current data; this would be a difficult task because high-ABV% beers aren't very common, so most models would rarely predict a beer to be high ABV%.

## Q2: If you had to pick 3 beers to recommend using only this data, which would you pick?

### Introduction

Notebook [3.0](../explore/3.0_recommend_3_beers.ipynb) explores this question.

There are several issues with the given question:
- It is unclear by what metric beers should be recommended. <mark>For the purpose of this question, we will only use `review_overall` to gauge the "goodness" of a beer.</mark>
- Several of the reviews have no associated `review_profilename` value. <mark>Given that those reviews without a profilename represent a small fraction of the total reviews, we will exclude these reviews from the analysis.</mark>
- Beers are categorized by `beer_style`, so it is conceivable that a best beer could be chosen for each style. <mark>For the purpose of this question, the attribute `beer_style` is not taken into account.</mark>
- As was mentioned in the initial data exploration, the number of reviews per beer varies drastically between beers; therefore we cannot naively choose those beers with the highest `review_overall` scores.

### Removing suspicious reviews (trolls)

Before we begin using the review data to gauge which beers are best, we need to get a sense of the quality of the scores provided in the dataset. It is conceivable that there exist reviewers who are not interested in actually rating the beer, but rather in drinking it; we will call these reviewers trolls: that is, reviewers who do not provide accurate review scores.

In order to identify trolls, we subset the reviewers to those who have provided more than 3 reviews and for whom the standard deviation of their `review_overall` scores is 0; this means the reviewer gave the same score in every review they provided. We deem these reviews low quality and remove them from the dataset. All in all, the analysis identified 50 reviewers as trolls, who were subsequently removed from the analysis; the list of trolls can be found in *data/interim/trolls.csv*.
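
The troll rule above (more than 3 reviews, zero standard deviation in `review_overall`) can be sketched with a single group-by. The reviewer names and scores below are invented for illustration.

```python
import pandas as pd

# Toy reviews: "ann" always gives 5.0, "bob" varies, "cat" has too few reviews to judge.
reviews = pd.DataFrame({
    "review_profilename": ["ann"] * 4 + ["bob"] * 4 + ["cat"] * 2,
    "review_overall": [5.0, 5.0, 5.0, 5.0, 4.0, 3.5, 4.5, 4.0, 5.0, 5.0],
})

# Per-reviewer review count and standard deviation of the overall score.
stats = reviews.groupby("review_profilename")["review_overall"].agg(["count", "std"])

# Trolls: more than 3 reviews, all with an identical score.
trolls = stats[(stats["count"] > 3) & (stats["std"] == 0)].index

clean = reviews[~reviews["review_profilename"].isin(trolls)]
print(list(trolls), len(clean))
```

The count threshold matters: a reviewer with only two identical scores, like "cat" here, is not flagged.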

### Determining best beer

Since we cannot simply elect those beers with the highest `review_overall` score (since this would simply select all those beers with a single review of 5.0), we must take into account the number of times a beer was reviewed as well as its resulting score.

We identify top ranked beers by doing the following:
1. For each of the remaining non-troll reviewers <u>that have reviewed more than a single beer</u>, we sort the reviewed beers by decreasing `review_overall` score. We then descend the list of ranked beers until we've recorded at least 3 beers. In cases of multiple tied beer scores, we descend further into the list until the tie is broken. In this way, we are guaranteed to record each reviewer's "best" beers (this has been implemented in the function *get_highest_rated_beers()* found in *utils.py*).
2. Next, we count the number of times each beer was chosen as a best beer across all the reviewers.
3. We then rank the beers by the number of times each was voted best beer and add various other metrics. These data were written to the file found at *data/interim/best_beers.csv*.
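
The voting procedure in the steps above can be sketched as follows. This is not the *get_highest_rated_beers()* function from *utils.py*, which is not shown in this report; the helper `top_beers_with_ties` is a hypothetical stand-in, and it assumes each reviewer scored each beer at most once.

```python
import pandas as pd

def top_beers_with_ties(scores: pd.Series, n: int = 3) -> pd.Index:
    """Return the ids of the n best-scored beers, extended to include ties at the cutoff."""
    ranked = scores.sort_values(ascending=False)
    threshold = ranked.iloc[min(n, len(ranked)) - 1]
    return ranked[ranked >= threshold].index

# Toy reviews; "cat" reviewed a single beer and is therefore excluded from voting.
reviews = pd.DataFrame({
    "review_profilename": ["ann"] * 5 + ["bob"] * 3 + ["cat"],
    "beer_beerid":        [1, 2, 3, 4, 5, 1, 2, 6, 1],
    "review_overall":     [5.0, 4.5, 4.0, 4.0, 3.0, 4.0, 5.0, 3.5, 5.0],
})

n_reviews = reviews.groupby("review_profilename")["beer_beerid"].transform("size")
voters = reviews[n_reviews > 1]

# Each reviewer casts one vote for every beer in their personal top list.
votes = {}
for _, group in voters.groupby("review_profilename"):
    scores = group.set_index("beer_beerid")["review_overall"]
    for beer in top_beers_with_ties(scores):
        votes[int(beer)] = votes.get(int(beer), 0) + 1

best_counts = pd.Series(votes).sort_values(ascending=False)
print(best_counts)
```

In this toy data "ann" contributes four beers rather than three, because beers 3 and 4 tie at the cutoff score of 4.0.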


In [5]:
top10 = pd.read_csv('../../data/interim/best_beers.csv', index_col=0).head(10)
top10

| beer_beerid | counts as best beer | avg review_overall | avg review_aroma | avg review_appearance | avg review_taste | avg beer_abv | number of reviews | counts with top score | brewery_name | beer_style | beer_name |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7971 | 1362 | 4.590028 | 4.612188 | 4.388603 | 4.630985 | 8.0 | 2527 | 1067 | Russian River Brewing Company | American Double / Imperial IPA | Pliny The Elder |
| 695 | 909 | 4.34218 | 4.151082 | 4.512862 | 4.329522 | 8.5 | 2449 | 591 | Brouwerij Duvel Moortgat NV | Belgian Strong Pale Ale | Duvel |
| 731 | 898 | 4.516414 | 4.296465 | 4.381061 | 4.425505 | 5.4 | 1980 | 695 | Bayerische Staatsbrauerei Weihenstephan | Hefeweizen | Weihenstephaner Hefeweissbier |
| 17112 | 882 | 4.377609 | 4.533156 | 4.24253 | 4.51842 | 10.0 | 2443 | 630 | Bell's Brewery, Inc. | American Double / Imperial IPA | Bell's Hopslam Ale |
| 1093 | 854 | 4.330216 | 4.266593 | 4.154199 | 4.319399 | 7.0 | 2727 | 554 | Bell's Brewery, Inc. | American IPA | Two Hearted Ale |
| 11757 | 847 | 4.354658 | 4.412035 | 4.364454 | 4.5012 | 8.3 | 2501 | 507 | Founders Brewing Company | American Double / Imperial Stout | Founders Breakfast Stout |
| 2093 | 840 | 4.145485 | 4.213439 | 4.192156 | 4.325935 | 9.0 | 3289 | 396 | Dogfish Head Brewery | American Double / Imperial IPA | 90 Minute IPA |
| 34 | 791 | 4.298027 | 4.231977 | 4.193919 | 4.396093 | 9.0 | 2483 | 492 | Unibroue | Tripel | La Fin Du Monde |
| 276 | 772 | 4.245845 | 3.915539 | 3.995168 | 4.115385 | 5.6 | 2587 | 416 | Sierra Nevada Brewing Co. | American Pale Ale (APA) | Sierra Nevada Pale Ale |
| 412 | 765 | 4.174116 | 4.198553 | 4.374116 | 4.342122 | 9.0 | 3110 | 342 | North Coast Brewing Co. | Russian Imperial Stout | Old Rasputin Russian Imperial Stout |


The table above presents 10 of the "best" beers as ranked by `review_overall`. We can see that beer 7971 is a very highly ranked beer that has been reviewed many times (2527) and that received a perfect score of 5.0 a total of 1067 times (see column *counts with top score*). However, as we descend the column *counts as best beer*, the remaining beers seem to be roughly equally "good", with counts in the range of about 765 to 910. One issue with using just this count is that, as we saw previously, these beers weren't reviewed the same number of times. Take for example beer 2093, which was reviewed over 3,000 times; compare that to the slightly less than 2,000 reviews for beer 731.

In order to decide the remaining best beers, we will calculate the effect size of each beer's `review_overall` scores against those of all the remaining beers in the dataset. We will use [Cohen's d](https://en.wikipedia.org/wiki/Effect_size#Cohen.27s_d) to calculate the effect size and recommend those beers with the highest effect size.

In [6]:
effect_size = dict()
for beer in top10.index:
    a = dat_raw[dat_raw.beer_beerid==beer].review_overall
    b = dat_raw[dat_raw.beer_beerid!=beer].review_overall
    effect_size[beer] = utils.cohen_d(a,b)
    
effect_size = pd.DataFrame(effect_size, index=[0]).transpose()
effect_size.columns = ['cohen_d']
effect_size.sort_values('cohen_d', ascending=False).head(3)

| beer_beerid | cohen_d |
|---|---|
| 7971 | 1.077399 |
| 731 | 0.973617 |
| 17112 | 0.781491 |
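
The helper `utils.cohen_d` used in the cell above is not reproduced in this report. A minimal sketch of Cohen's d, assuming the common pooled-standard-deviation form (the actual helper may use a different variant), looks like this:

```python
import numpy as np

def cohen_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Two small samples whose means differ by half a pooled standard deviation.
print(cohen_d([2, 4, 6], [1, 3, 5]))
```

Because d is a standardized mean difference, it stays comparable across beers with very different review counts, which is the reason for using it here instead of the raw vote counts.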


### Summary

Given the data at hand, we would recommend the beers:
1. **7971 (Pliny The Elder, Russian River Brewing Company)**
2. **731 (Weihenstephaner Hefeweissbier, Bayerische Staatsbrauerei Weihenstephan)**
3. **17112 (Bell's Hopslam Ale, Bell's Brewery, Inc.)** 

since they are ranked as the best overall beers by the reviewers; this is contingent on:

- removing those reviews with no associated reviewer name
- removing the reviews from reviewers who reviewed only a single beer or who were deemed to be trolls
- judging beers only by their `review_overall` scores, on the assumption that this metric most accurately represents the "goodness" of a beer.

## Q3: Which of the factors (aroma, taste, appearance, palette) are most important in determining the overall quality of a beer?

### Introduction

Notebook [4.0](../explore/4.0_factors_for_beer_quality.ipynb) explores this question.

Another way of phrasing the question is to determine which of the factors (`review_aroma`, `review_taste`, `review_appearance` or `review_palate`) explains the most variance in the overall quality of beer (`review_overall`). Given the results from the previous two questions, we know that we should do the following:
- remove the reviews that have no associated `review_profilename`
- remove the numerous "troll" reviewers from the dataset (trolls written to *../../data/interim/trolls.csv*)

<mark>It is assumed that the overall quality of beer is directly measured by the attribute `review_overall`.</mark>

We can take multiple regression approaches to identifying which factor is more important in determining the overall quality of beer; in the analysis below, we will apply these multiple approaches to arrive at a consensus of which factor is most important.

### Analysis

Our first approach is to fit a univariate linear regression of `review_overall` on each factor separately; in doing so we can measure how strongly each factor is correlated with the overall quality. The table below shows the results of those regressions for each factor:

In [9]:
pd.read_csv('../../data/interim/beer_factors_linear_regression.csv', index_col=0).sort_values('r^2', ascending=False)

| factor | r^2 | p | std err |
|---|---|---|---|
| review_taste | 0.62378 | 0 | 0.000495 |
| review_palate | 0.492664 | 0 | 0.000535 |
| review_aroma | 0.379439 | 0 | 0.000605 |
| review_appearance | 0.251726 | 0 | 0.000587 |
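
For a univariate linear regression, r^2 equals the squared Pearson correlation between the factor and the target, so the ranking above can be sketched with NumPy alone. The data below are synthetic, generated so that taste dominates by construction; they are not the review data, and only two factors are simulated to keep the example short.

```python
import numpy as np
import pandas as pd

# Synthetic scores: overall depends strongly on taste and weakly on appearance.
rng = np.random.default_rng(0)
n = 500
taste = rng.normal(4.0, 0.5, n)
appearance = rng.normal(4.0, 0.5, n)
overall = 0.9 * taste + 0.1 * appearance + rng.normal(0, 0.2, n)
df = pd.DataFrame({
    "review_taste": taste,
    "review_appearance": appearance,
    "review_overall": overall,
})

# r^2 of each one-factor regression = squared correlation with the target.
factors = ["review_taste", "review_appearance"]
r2 = {f: np.corrcoef(df[f], df["review_overall"])[0, 1] ** 2 for f in factors}
result = pd.Series(r2, name="r^2").sort_values(ascending=False)
print(result)
```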


Using this technique, we can see that `review_taste` explains the most variance in `review_overall` (r^2 = 0.62). The issue with this technique, however, is that it treats each factor in isolation and does not consider any additive or interaction effects. To take those into account, we fit a linear model on all four factors and allow for full interaction between all terms; the results below show the summary of that analysis.

In [15]:
with open('../../data/interim/ols.txt', 'r') as fin:
    ols = fin.read()
print(ols)

                            OLS Regression Results                            
Dep. Variable:         review_overall   R-squared:                       0.664
Model:                            OLS   Adj. R-squared:                  0.664
Method:                 Least Squares   F-statistic:                 2.087e+05
Date:                Tue, 20 Jun 2017   Prob (F-statistic):               0.00
Time:                        21:32:52   Log-Likelihood:            -8.6659e+05
No. Observations:             1586266   AIC:                         1.733e+06
Df Residuals:                 1586250   BIC:                         1.733e+06
Df Model:                          15                                         
Covariance Type:            nonrobust                                         
                                                                coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------

The main takeaway from these results is that the r^2 value of this much more complex model (0.664) is not appreciably better than the univariate model using just `review_taste`. This further points us in the direction that `review_taste` has the most impact on `review_overall` scores.
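
To make "full interaction between all terms" concrete: with four factors there are 15 non-empty subsets (4 main effects, 6 two-way, 4 three-way and 1 four-way product), which matches `Df Model: 15` in the summary above. The summary has the layout of statsmodels output; the sketch below deliberately uses only NumPy and synthetic data, so it illustrates the model structure rather than reproducing the fit.

```python
from itertools import combinations

import numpy as np

factors = ["review_aroma", "review_taste", "review_appearance", "review_palate"]

# Synthetic scores: the target depends mainly on taste (column 1) and palate (column 3).
rng = np.random.default_rng(1)
X = rng.uniform(1, 5, size=(300, 4))
y = 0.2 + 0.7 * X[:, 1] + 0.2 * X[:, 3] + rng.normal(0, 0.3, 300)

# One design column per non-empty subset of factors: the product of its members.
terms = [c for k in range(1, 5) for c in combinations(range(4), k)]
design = np.column_stack(
    [np.ones(len(X))] + [X[:, list(c)].prod(axis=1) for c in terms]
)

# Ordinary least squares fit and the resulting R^2.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef
r_squared = 1 - resid.var() / y.var()
print(len(terms), round(r_squared, 3))
```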

A final approach is to train a Random Forest regression model on the entire dataset and use the feature importance scores it produces as a gauge for the most important factor. We use a cross-validated (3-fold) grid search to do a small amount of hyperparameter tuning; in this way we can use our entire dataset and still ensure we have a well-trained model. By inspecting the analysis Jupyter notebook, one can see that `review_taste` has a much higher feature importance than any of the other factors.
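
A minimal sketch of this approach with scikit-learn is shown below. The parameter grid, the number of trees and the synthetic data are assumptions chosen to keep the example fast; they are not the settings used in notebook 4.0.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

factors = ["review_aroma", "review_taste", "review_appearance", "review_palate"]

# Synthetic scores in which taste (column 1) drives the target by construction.
rng = np.random.default_rng(2)
X = rng.uniform(1, 5, size=(300, 4))
y = 0.7 * X[:, 1] + 0.2 * X[:, 3] + rng.normal(0, 0.3, 300)

# 3-fold cross-validated grid search over a small, hypothetical grid.
search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, 6], "min_samples_leaf": [1, 5]},
    cv=3,
)
search.fit(X, y)

# Impurity-based importances of the refitted best model; they sum to 1.
importances = dict(zip(factors, search.best_estimator_.feature_importances_))
top_factor = max(importances, key=importances.get)
print(search.best_params_, top_factor)
```

By default `GridSearchCV` refits the best configuration on all the data, which is what allows the importances to be read from a model trained on the full set.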

### Summary

By using numerous regression approaches, we were able to show that **review_taste** is the factor most important in determining the overall quality of beer.

## Q4: If I enjoy a beer's aroma and appearance, which beer style should I try?

### Introduction

Notebook [5.0](../explore/5.0_beer_style.ipynb) explores this question.

This question, as opposed to the previous one, is a classification-type problem. We are interested in identifying which class of `beer_style` is most strongly associated with high scores for both `review_aroma` and `review_appearance`. We will tackle this problem first by using simple descriptive stats, and then by using several machine learning models.

As we've identified in previous analyses, we will be removing any reviews that do not have an associated reviewer name and we will also be removing the "trolls".

### Descriptive stats

We begin by naively applying various descriptive stats to the two factors in question; furthermore, we engineer a combined feature (`combo`) by taking the mean of the `review_aroma` and `review_appearance` scores. These stats are all shown in the table below (sorted by *combo mean*).

In [11]:
pd.read_csv('../../data/interim/factor_stats.csv', index_col=0).head(10)

| beer_style | aroma mean | aroma median | aroma std | aroma max | appearance mean | appearance median | appearance std | appearance max | combo mean | combo median | combo std | combo max | num beers | mean abv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| American Double / Imperial Stout | 4.160604 | 4.0 | 0.570494 | 5.0 | 4.163563 | 4.0 | 0.515692 | 5.0 | 4.162084 | 4.25 | 0.446933 | 5.0 | 50696 | 10.60553 |
| Russian Imperial Stout | 4.076571 | 4.0 | 0.542298 | 5.0 | 4.21007 | 4.0 | 0.508619 | 5.0 | 4.14332 | 4.25 | 0.434802 | 5.0 | 54120 | 9.986842 |
| Quadrupel (Quad) | 4.132493 | 4.0 | 0.544134 | 5.0 | 4.117922 | 4.0 | 0.514125 | 5.0 | 4.125207 | 4.25 | 0.448639 | 5.0 | 18084 | 10.461493 |
| American Double / Imperial IPA | 4.097774 | 4.0 | 0.568202 | 5.0 | 4.078882 | 4.0 | 0.469201 | 5.0 | 4.088328 | 4.0 | 0.430182 | 5.0 | 85958 | 9.370207 |
| Gueuze | 4.117696 | 4.0 | 0.56012 | 5.0 | 4.034876 | 4.0 | 0.500074 | 5.0 | 4.076286 | 4.25 | 0.450927 | 5.0 | 6007 | 5.598439 |
| American Wild Ale | 4.126756 | 4.0 | 0.565143 | 5.0 | 4.005451 | 4.0 | 0.501612 | 5.0 | 4.066104 | 4.0 | 0.442322 | 5.0 | 17794 | 7.713634 |
| Eisbock | 4.156778 | 4.0 | 0.52895 | 5.0 | 3.964514 | 4.0 | 0.496758 | 5.0 | 4.060646 | 4.0 | 0.412955 | 5.0 | 2663 | 11.392852 |
| American Barleywine | 4.019348 | 4.0 | 0.524654 | 5.0 | 4.036376 | 4.0 | 0.478667 | 5.0 | 4.027862 | 4.0 | 0.408577 | 5.0 | 26721 | 10.703676 |
| Belgian IPA | 3.979666 | 4.0 | 0.511232 | 5.0 | 4.075199 | 4.0 | 0.481009 | 5.0 | 4.027432 | 4.0 | 0.402156 | 5.0 | 12467 | 8.349086 |
| Weizenbock | 4.044677 | 4.0 | 0.519257 | 5.0 | 4.009297 | 4.0 | 0.510222 | 5.0 | 4.026987 | 4.0 | 0.427077 | 5.0 | 9412 | 8.135482 |
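
A table of this shape can be produced with one group-by and a multi-function aggregation. The sketch below uses a toy table and the hypothetical column name `combo_score` for the engineered feature; it is an illustration, not the code that wrote *factor_stats.csv*.

```python
import pandas as pd

# Toy reviews for two styles; values are illustrative only.
reviews = pd.DataFrame({
    "beer_style": ["Stout", "Stout", "Lager", "Lager"],
    "review_aroma": [4.5, 4.0, 3.0, 3.5],
    "review_appearance": [4.0, 4.5, 3.5, 3.0],
})

# Engineered feature: mean of the aroma and appearance scores of each review.
reviews["combo_score"] = reviews[["review_aroma", "review_appearance"]].mean(axis=1)

stats = (
    reviews.groupby("beer_style")[["review_aroma", "review_appearance", "combo_score"]]
           .agg(["mean", "median", "std", "max"])
           .sort_values(("combo_score", "mean"), ascending=False)
)
print(stats)
```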


The table above shows the beer styles that are most highly rated (on average) in both aroma and appearance; these numbers suggest that *American Double / Imperial Stout* is the `beer_style` to recommend. It should be noted that *American Double / Imperial IPA* was reviewed more often than any other style in the table (85,958), and thus we can be more confident in the scores attributed to this style. It should also, perhaps, come as no surprise that the most highly ranked styles tend to have a high ABV% (as that would contribute to taste).

Another simple approach would be to filter out all the beers that scored low in both aroma and appearance; for example, if we filter out all beers that scored below 4.0 in these two factors, we might consider the beer styles that remain to be good candidates for the question at hand. We can then rank the beer styles by the ratio of beers that remain (relative to the total number each style started with). If we rank the styles at multiple cutoffs, we get a better sense of how they compare.

<img src='figures/5.0_beer_style-0.png'/>

The heatmap above shows how the highest rated `beer_style` values rank (0 is the highest rank) as the score cutoff is increased from 4 to 5. We can see that *Quadrupel (Quad)*, *American Double/Imperial Stout* and *Russian Imperial Stout* consistently rank as the highest or second-highest rated `beer_style` throughout the range of score cutoffs.
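
The cutoff-based ranking behind the heatmap can be sketched as follows. The rows are invented, the style names are shortened, and the sketch works per row (one scored item per row) and keeps items scoring at or above each cutoff in both factors; the notebook's exact filtering may differ.

```python
import pandas as pd

# Toy scores for three styles, four scored items each.
reviews = pd.DataFrame({
    "beer_style": ["Quad"] * 4 + ["Stout"] * 4 + ["Lager"] * 4,
    "review_aroma":      [4.5, 4.5, 4.0, 3.5, 5.0, 4.0, 4.0, 3.0, 4.0, 3.5, 3.0, 3.0],
    "review_appearance": [4.5, 4.5, 4.0, 4.0, 5.0, 4.5, 4.0, 3.5, 4.0, 3.0, 3.5, 3.0],
})

cutoffs = [4.0, 4.5, 5.0]
ranks = {}
for cutoff in cutoffs:
    # Keep items that reach the cutoff in both aroma and appearance.
    keep = (reviews["review_aroma"] >= cutoff) & (reviews["review_appearance"] >= cutoff)
    # Fraction of each style's items that survive the filter.
    ratio = keep.groupby(reviews["beer_style"]).mean()
    # Rank 0 is the style with the largest surviving fraction; ties share a rank.
    ranks[cutoff] = ratio.rank(ascending=False, method="min") - 1

rank_table = pd.DataFrame(ranks)
print(rank_table)
```

Each column of `rank_table` corresponds to one column of the heatmap.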

In general, the following beer styles show up highly ranked (top 5) across most cutoffs:
- Quadrupel (Quad)
- American Double/Imperial Stout
- Russian Imperial Stout
- American Double / Imperial IPA

It's interesting to note that the *Biere Brut* shows up highly ranked only at cutoffs above 4.5. Furthermore, many of the beers on this list look to be the same types of beers: IPA, Stout or Sours (Lambic/Fruit). This gives us confidence that we are pulling out the right groups of beer style.

### Machine learning classification

Several machine learning classification models were trained with this data set; then, simulated aroma and appearance scores were fed into the models and predictions for best beer style were acquired. Originally, all beer styles were included, however that produced very poor accuracy results. After subsetting the beer_styles to those shown on the heatmap we retrained the models on the training data and then calculated accuracies on a hold-out test set. 

The random forest classifier predicts the beer style of 'American Double / Imperial IPA' when predicting on scores of aroma and appearance up to 4.7; whereas for scores of 4.8 and above it predicts 'American Double / Imperial Stout'. The logistic regression model predicts 'American Double / Imperial IPA' all throughout the score range.

It should be noted that for all the predictions, the calculated probabilities were low; therefore the confidence in these predictions is not very high. Furthermore, it should be noted that the *American Double / Imperial IPA* style is quite over-represented, especially when compared to the *American Double / Imperial Stout*.
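
The idea of feeding simulated scores into a trained classifier and inspecting the predicted probabilities can be sketched as below. The data are synthetic, with heavily overlapping and imbalanced classes chosen to mimic the situation described above; the model settings are scikit-learn defaults and not those of notebook 5.0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

styles = np.array(["American Double / Imperial IPA", "American Double / Imperial Stout"])

# Synthetic (aroma, appearance) scores: twice as many IPA rows, nearly identical distributions.
rng = np.random.default_rng(3)
n_ipa, n_stout = 300, 150
X = np.vstack([
    rng.normal(4.0, 0.5, size=(n_ipa, 2)),
    rng.normal(4.1, 0.5, size=(n_stout, 2)),
])
y = np.array([0] * n_ipa + [1] * n_stout)

model = LogisticRegression().fit(X, y)

# Simulated reviewers who give the same score to aroma and appearance, from 4.0 to 5.0.
grid = np.linspace(4.0, 5.0, 11)
simulated = np.column_stack([grid, grid])
proba = model.predict_proba(simulated)
predicted = styles[proba.argmax(axis=1)]

for score, style, p in zip(grid, predicted, proba.max(axis=1)):
    print(f"{score:.1f} -> {style} (p = {p:.2f})")
```

When classes overlap this much, the larger class tends to win most predictions, which is one way class imbalance can shape the result.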

### Summary

All of the results for beer style that rank highly in aroma and appearance seem to fall into a single larger meta-category: beers that are strong, dark ales. This includes the styles:
- Quadrupel (Quad)
- American Double/Imperial Stout
- Russian Imperial Stout
- American Double / Imperial IPA

It is difficult to pick just a single style that would guarantee enjoyment in both aroma and appearance; our suggestion is therefore that any of the 4 beer styles above would be a good choice.

## Q5: Generate 10,000 random numbers (i.e. sample) from a Logistic distribution with parameters "location" = 10 and "scale" = 2.

Please refer to notebook [6.0](../explore/6.0_bonus.ipynb) for the proposed solution.
<img src='figures/6.0_bonus-0.png'/>
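
For reference, one way to draw such a sample is NumPy's built-in logistic sampler. This is a minimal sketch that may differ from the solution in notebook 6.0; the seed is arbitrary.

```python
import numpy as np

# Draw 10,000 samples from Logistic(location=10, scale=2).
rng = np.random.default_rng(42)
samples = rng.logistic(loc=10, scale=2, size=10_000)

# Sanity check against theory: mean = location, std = scale * pi / sqrt(3).
expected_std = 2 * np.pi / np.sqrt(3)
print(samples.mean(), samples.std(), expected_std)
```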