# Evolent Health: Beer Data Analysis
#### The project descriptions, questions and datasets were all downloaded from Strata Scratch. According to Strata Scratch, this data project has been used as a take-home assignment in the recruitment process for the data science positions at Evolent Health. 

### Assignment: 
- Rank the top 3 breweries which produce the strongest beers.
- Which year did beers enjoy the highest ratings?
- Based on the users' ratings, which factors are important among taste, aroma, appearance, and palate?
- If you were to recommend 3 beers to your friends based on this data, which ones would you recommend?
- Which beer style seems to be the favourite based on the reviews written by users? How do the written reviews compare to the overall review score for each beer style?
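The first and fourth questions are both ranking problems. Below is a minimal sketch, assuming "strongest" means the highest mean ABV across a brewery's distinct beers, and that a recommendation should only consider beers with enough reviews to trust their average `review_overall`. The helper names, the `min_reviews` threshold and the toy rows are illustrative assumptions, not part of the assignment.

```python
import pandas as pd

# Toy rows using the dataset's column names; the values are invented for illustration.
reviews = pd.DataFrame({
    "beer_beerId":    [1, 1, 2, 3, 3, 3, 4, 5],
    "beer_brewerId":  [10, 10, 10, 20, 20, 20, 30, 40],
    "beer_ABV":       [5.0, 5.0, 9.0, 12.0, 12.0, 12.0, 4.5, 8.0],
    "review_overall": [4.0, 4.5, 3.5, 4.5, 5.0, 4.0, 3.0, 5.0],
})


def top_breweries_by_abv(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Rank brewers by the mean ABV of their distinct beers (hypothetical helper).

    Each beer is counted once, so a heavily reviewed beer does not dominate
    its brewery. Rows without an ABV value are dropped first.
    """
    beers = df.dropna(subset=["beer_ABV"]).drop_duplicates(subset="beer_beerId")
    return beers.groupby("beer_brewerId")["beer_ABV"].mean().nlargest(n)


def recommend_beers(df: pd.DataFrame, n: int = 3, min_reviews: int = 2) -> pd.DataFrame:
    """Return the n beers with the best mean overall score among beers
    that have at least min_reviews reviews (hypothetical helper)."""
    stats = df.groupby("beer_beerId")["review_overall"].agg(["mean", "count"])
    eligible = stats[stats["count"] >= min_reviews]
    # Break ties on the mean score in favour of the beer with more reviews.
    return eligible.sort_values(["mean", "count"], ascending=False).head(n)


top = top_breweries_by_abv(reviews)
rec = recommend_beers(reviews)
print(top)
print(rec)
```

On the toy data, beer 5 has a perfect single score but is excluded by `min_reviews`; on the full dataset a much larger threshold would be appropriate.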

### About the dataset(s): 
- Source: https://platform.stratascratch.com/data-projects/beer-data-analysis
- Description: the provided compressed file `EvolentHealth_data_beer.tar.bz2` contains data about beers and their reviews.
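A `.tar.bz2` extension usually denotes a tar archive that was then bz2-compressed. If the download really is such an archive, decompressing it with `compression="bz2"` alone would leave tar header bytes in the stream. One hedged way to load it is to open the archive with the standard-library `tarfile` module and hand the first CSV member to `pd.read_csv`. The helper `read_csv_from_tar_bz2` is hypothetical and assumes the archive contains a single CSV file.

```python
import os
import tarfile
import tempfile

import pandas as pd


def read_csv_from_tar_bz2(path: str, **read_csv_kwargs) -> pd.DataFrame:
    """Read the first .csv member of a bz2-compressed tar archive (hypothetical helper)."""
    with tarfile.open(path, mode="r:bz2") as archive:
        for member in archive.getmembers():
            # Skip directories and non-CSV entries such as README files.
            if member.isfile() and member.name.endswith(".csv"):
                with archive.extractfile(member) as handle:
                    # Parse while the archive is still open.
                    return pd.read_csv(handle, **read_csv_kwargs)
    raise FileNotFoundError(f"no .csv member found in {path}")


# Usage: build a tiny throwaway archive, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "beer.csv")
    with open(csv_path, "w") as f:
        f.write("beer_ABV,beer_name\n5.0,Sausa Weizen\n6.2,Red Moon\n")
    archive_path = os.path.join(tmp, "sample.tar.bz2")
    with tarfile.open(archive_path, mode="w:bz2") as archive:
        archive.add(csv_path, arcname="beer.csv")
    sample = read_csv_from_tar_bz2(archive_path)

print(sample)
```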

### Data Dictionary:

**EvolentHealth_data_beer.tar.bz2**

| variable               | description                            |
|:-----------------------|:---------------------------------------|
| beer_ABV               | alcohol by volume                      |
| beer_beerId            | beer ID                                |
| beer_brewerId          | beer brewer ID                         |
| beer_name              | beer name                              |
| beer_style             | beer style                             |
| review_appearance      | review on the beer's appearance        |
| review_palette         | review on the beer's palate (mouthfeel) |
| review_overall         | overall beer review                    |
| review_taste           | review on the beer's taste             |
| review_profileName     | profile name of the reviewer           |
| review_aroma           | review on the beer's aroma             |
| review_text            | the full text of the review            |
| review_time            | timestamp when the review was made     |
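The second question needs a review year. The `review_time` values, such as `1234817823` in the first sample row, look like Unix epoch seconds. Under that assumption, `pd.to_datetime(..., unit="s")` recovers the date. Below is a minimal sketch that averages `review_overall` per calendar year (UTC). The ratings in the toy frame are invented, and `mean_rating_by_year` is a hypothetical helper.

```python
import pandas as pd

# Toy reviews: timestamps resemble the dataset's review_time values, ratings are invented.
reviews = pd.DataFrame({
    "review_time":    [1234817823, 1235915097, 1293735206, 1293735206],
    "review_overall": [1.5, 3.0, 4.0, 5.0],
})


def mean_rating_by_year(df: pd.DataFrame) -> pd.Series:
    """Average overall score per year, assuming review_time is epoch seconds (UTC)."""
    years = pd.to_datetime(df["review_time"], unit="s").dt.year.rename("year")
    return df.groupby(years)["review_overall"].mean()


by_year = mean_rating_by_year(reviews)
print(by_year)
print("best year:", by_year.idxmax())
```

Some years may have far fewer reviews than others, so it is worth checking the per-year review counts before naming one year the best.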

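```python
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Show up to 65 characters per cell when displaying long text columns such as review_text
pd.options.display.max_colwidth = 65

os.chdir(path="/Users/noel/Desktop/2023_projects/datasets")
df = pd.read_csv("EvolentHealth_data_beer.tar.bz2", compression="bz2")

print(df.columns)
df.info()  # info() prints its own summary and returns None, so it needs no print()
df.head()
```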

```
Index(['beer_ABV', 'beer_beerId', 'beer_brewerId', 'beer_name', 'beer_style',
       'review_appearance', 'review_palette', 'review_overall', 'review_taste',
       'review_profileName', 'review_aroma', 'review_text', 'review_time'],
      dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 528870 entries, 0 to 528869
Data columns (total 13 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   beer_ABV            508590 non-null  float64
 1   beer_beerId         528870 non-null  int64  
 2   beer_brewerId       528870 non-null  int64  
 3   beer_name           528870 non-null  object 
 4   beer_style          528870 non-null  object 
 5   review_appearance   528870 non-null  float64
 6   review_palette      528870 non-null  float64
 7   review_overall      528870 non-null  float64
 8   review_taste        528870 non-null  float64
 9   review_profileName  528755 non-null  object 
 10  review_aroma        528870 n
```
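The `info()` summary already shows gaps. `beer_ABV` has 508,590 non-null values out of 528,870 rows (20,280 missing), and `review_profileName` has 528,755 (115 missing). Below is a small sketch for tallying missing values per column and dropping rows only where a question actually needs the column. The toy frame and the `missing_report` helper are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame with the same kind of gaps as the real data; values are invented.
toy = pd.DataFrame({
    "beer_ABV":           [5.0, np.nan, 6.5, np.nan],
    "review_profileName": ["stcules", "stcules", None, "johnmichaelsen"],
    "review_overall":     [1.5, 3.0, 3.0, 4.0],
})


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and share of missing values per column (hypothetical helper)."""
    counts = df.isna().sum()
    return pd.DataFrame({"missing": counts, "share": counts / len(df)})


report = missing_report(toy)
print(report)

# Drop rows lacking ABV only for ABV-based questions, such as the strongest-beer ranking.
with_abv = toy.dropna(subset=["beer_ABV"])
print(len(with_abv))
```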

|   | beer_ABV | beer_beerId | beer_brewerId | beer_name | beer_style | review_appearance | review_palette | review_overall | review_taste | review_profileName | review_aroma | review_text | review_time |
|--:|--:|--:|--:|:--|:--|--:|--:|--:|--:|:--|--:|:--|--:|
| 0 | 5.0 | 47986 | 10325 | Sausa Weizen | Hefeweizen | 2.5 | 2.0 | 1.5 | 1.5 | stcules | 1.5 | A lot of foam. But a lot. In the smell some ba... | 1234817823 |
| 1 | 6.2 | 48213 | 10325 | Red Moon | English Strong Ale | 3.0 | 2.5 | 3.0 | 3.0 | stcules | 3.0 | Dark red color, light beige foam, average. In ... | 1235915097 |
| 2 | 6.5 | 48215 | 10325 | Black Horse Black Beer | Foreign / Export Stout | 3.0 | 2.5 | 3.0 | 3.0 | stcules | 3.0 | Almost totally black. Beige foam, quite compac... | 1235916604 |
| 3 | 5.0 | 47969 | 10325 | Sausa Pils | German Pilsener | 3.5 | 3.0 | 3.0 | 2.5 | stcules | 3.0 | Golden yellow color. White, compact foam, quit... | 1234725145 |
| 4 | 7.7 | 64883 | 1075 | Cauldron DIPA | American Double / Imperial IPA | 4.0 | 4.5 | 4.0 | 4.0 | johnmichaelsen | 4.5 | According to the website, the style for the Ca... | 1293735206 |
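For the third question, one reasonable approach (not necessarily the author's) relates each aspect score to `review_overall` in two ways. The first is the plain Pearson correlation. The second is the coefficient from an ordinary least-squares fit on z-scored columns, which accounts for the aspects being correlated with one another. The column is kept under its dataset spelling, `review_palette`. The helper name and the synthetic data below are assumptions used only to check that the function recovers a known ordering.

```python
import numpy as np
import pandas as pd

# Aspect columns as named in the dataset (the palate score is spelled "palette" there).
ASPECTS = ["review_taste", "review_aroma", "review_appearance", "review_palette"]


def aspect_importance(df: pd.DataFrame) -> pd.DataFrame:
    """Correlation and standardized OLS coefficient of each aspect vs review_overall
    (hypothetical helper)."""
    data = df[ASPECTS + ["review_overall"]].dropna()
    # Z-score every column so coefficients are comparable and no intercept is needed.
    z = (data - data.mean()) / data.std(ddof=0)
    coef, *_ = np.linalg.lstsq(
        z[ASPECTS].to_numpy(), z["review_overall"].to_numpy(), rcond=None
    )
    result = pd.DataFrame({
        "pearson_r": data[ASPECTS].corrwith(data["review_overall"]),
        "std_coef": pd.Series(coef, index=ASPECTS),
    })
    return result.sort_values("std_coef", ascending=False)


# Synthetic ratings with known weights (taste > aroma > appearance = palette).
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame(rng.uniform(1, 5, size=(n, len(ASPECTS))), columns=ASPECTS)
toy["review_overall"] = (
    0.6 * toy["review_taste"]
    + 0.2 * toy["review_aroma"]
    + 0.1 * toy["review_appearance"]
    + 0.1 * toy["review_palette"]
    + rng.normal(0, 0.1, n)
)

ranking = aspect_importance(toy)
print(ranking)
```

In the real reviews the four aspect scores are likely to be strongly correlated with each other, so the correlations and the regression coefficients can tell different stories. Reporting both is a hedge against over-reading either one.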
