![header](./images/beertaps.png)

# Beer-Recommendation System: Content Based Recommendations
Author: Ashli Dougherty 

# Overview

This project's goal is to build a recommendation system for beer enthusiasts. I am interested in creating both a content-based and a collaborative filtering recommendation system.
- A content-based system makes recommendations based on a beer's features. It lets any user enter a beer or a characteristic and returns the names of other beers they will (hopefully) enjoy drinking.
- A collaborative filtering system recommends items based on the ratings of other users. It compares beer drinker/reviewer profiles and then recommends items based on how similar those users are.
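To make the contrast between the two approaches concrete, here is a toy sketch of the user-to-user comparison described for the collaborative system. The ratings matrix, the users, and the "copy the single nearest neighbor" rule are hypothetical illustrations, and cosine similarity is only one common choice of user-similarity measure; this is not the project's collaborative model.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are beers, 0 means "not rated".
beers = ["Amber", "Double Bag", "Long Trail Ale", "Scurry"]
ratings = np.array([
    [5, 4, 0, 1],  # user 0
    [4, 5, 1, 0],  # user 1
    [1, 0, 5, 4],  # user 2
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two users' rating vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

target = 0  # recommend for user 0
sims = np.array([cosine(ratings[target], r) for r in ratings])
sims[target] = -np.inf  # a user is trivially most similar to themselves
neighbor = int(np.argmax(sims))

# Suggest beers the most similar user rated that the target user has not rated yet.
suggestions = [
    beer for beer, theirs, mine in zip(beers, ratings[neighbor], ratings[target])
    if mine == 0 and theirs > 0
]
print(neighbor, suggestions)  # 1 ['Long Trail Ale']
```

A real system would typically combine several neighbors' ratings instead of copying one neighbor, and would treat missing ratings more carefully than as zeros.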


***

# Business Understanding 

As of December 2021, there are more than [9,000 breweries](https://vinepair.com/booze-news/us-record-number-breweries-2021/#:~:text=Even%20after%20the%20setbacks%20of,beer%20producers%20in%20the%20U.S.) in the US alone. Even though some taprooms were forced to shut their doors during the pandemic, the craft beer business is still going strong. The [Brewers Association](https://www.brewersassociation.org/statistics-and-data/national-beer-stats/) expects craft breweries' volume share to grow in the post-pandemic market, and reported that craft beer retail sales exceeded $26 billion in 2021.    
  
Currently, there are mobile apps (like [Untappd](https://untappd.com/)) and websites (like [Beer Advocate](https://www.beeradvocate.com/)) that let you track and rate the beers you try, but consumers also want to choose their next sip (or pint) with confidence. With so many options on the market, deciding which beverage to buy next, which brewery to visit in person, or which booth to line up for at a festival can feel overwhelming. My goal is to provide a system that helps beer enthusiasts find new beers they are likely to love. Cheers!

***

# Content Based Recommendations

A content-based system does not require other users' data to make a recommendation to one user.

Content-based systems rely on the similarity between the item the user inputs and the other items in the catalog: if I like Beer X, the model comes back with other beers whose characteristics are similar to those of Beer X. Because no rating history is needed, this type of recommendation system sidesteps the "cold-start" problem, and it works well for niche interests such as craft beer. Recommendations like this are driven by distance or similarity metrics computed between item feature vectors. The following metrics can all be used:

- Cosine similarity
- Euclidean distance
- Manhattan distance
- Pearson correlation
- Jaccard similarity
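As a rough illustration, the sketch below computes each metric for two beers from this dataset, using the raw flavor scores (Astringency through Malty) of Amber and Double Bag from the `df.head()` output further down. Jaccard similarity works on sets rather than numeric vectors, so it is applied to two hypothetical sets of flavor tags. Everything is written in plain NumPy for transparency; `scipy.spatial.distance` and `sklearn.metrics.pairwise` provide library versions of these measures.

```python
import numpy as np

# Astringency, Body, Alcohol, Bitter, Sweet, Sour, Salty, Fruits, Hoppy, Spices, Malty
amber = np.array([13, 32, 9, 47, 74, 33, 0, 33, 57, 8, 111], dtype=float)
double_bag = np.array([12, 57, 18, 33, 55, 16, 0, 24, 35, 12, 84], dtype=float)

# Similarities: higher means more alike.
cosine_sim = amber @ double_bag / (np.linalg.norm(amber) * np.linalg.norm(double_bag))
pearson_r = np.corrcoef(amber, double_bag)[0, 1]

# Distances: lower means more alike.
euclidean = np.linalg.norm(amber - double_bag)
manhattan = np.abs(amber - double_bag).sum()

# Jaccard similarity = size of intersection / size of union (hypothetical tags).
tags_amber = {"malty", "sweet", "hoppy"}
tags_double_bag = {"malty", "sweet", "boozy"}
jaccard = len(tags_amber & tags_double_bag) / len(tags_amber | tags_double_bag)

print(f"cosine={cosine_sim:.3f} pearson={pearson_r:.3f} "
      f"euclidean={euclidean:.2f} manhattan={manhattan:.0f} jaccard={jaccard:.2f}")
```

Note that Malty is on a much larger scale than Salty in these raw scores, so the distances are dominated by a few columns; this is one reason to standardize features before comparing beers.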

***

# Imports & Functions

In [46]:
import pandas as pd
import numpy as np

from sklearn.metrics.pairwise import linear_kernel
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer,  make_column_selector as selector
from sklearn.impute import SimpleImputer

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [None]:
#stripping description of punctuation 
#df_tasting['Description'] = df_tasting['Description'].replace(r'[^\w\s]', "", regex=True)
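The commented-out line above strips punctuation with a regular expression. A small, self-contained demonstration follows; `df_tasting` and its `Description` column are not loaded in this section, so a hypothetical two-row stand-in is used.

```python
import pandas as pd

# Hypothetical stand-in for df_tasting (not loaded in this notebook section).
df_tasting = pd.DataFrame({
    "Description": ["Crisp, hoppy & bright!", "Rich; malty (toffee) notes."]
})

# [^\w\s] matches any character that is neither a word character nor whitespace,
# so replacing matches with "" removes punctuation but keeps letters, digits, spaces.
df_tasting["Description"] = df_tasting["Description"].replace(r"[^\w\s]", "", regex=True)
print(df_tasting["Description"].tolist())
```

Removing a standalone symbol such as `&` leaves a double space behind; a second pass replacing `r"\s+"` with `" "` would collapse it if needed.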

# Load Data

In [22]:
df = pd.read_csv('../BeerData/tasting_cleaned.csv')

In [23]:
df.drop(columns='Unnamed: 0', inplace=True)
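An `Unnamed: 0` column usually appears when a frame is saved with `to_csv()` and its default `index=True`, then read back without telling pandas that the first column is an index. Assuming that is how `tasting_cleaned.csv` was written, passing `index_col=0` to `read_csv` avoids creating the extra column in the first place. The CSV text below is a hypothetical miniature of that file.

```python
import io

import pandas as pd

# Hypothetical CSV written with DataFrame.to_csv() and index=True:
# the first, unnamed column holds the old row index.
csv_text = ",Name,beer_id,ABV\n0,Amber,251,5.3\n1,Double Bag,252,7.2\n"

# Reading it naively produces the 'Unnamed: 0' column dropped in the notebook...
naive = pd.read_csv(io.StringIO(csv_text))
print(naive.columns.tolist())

# ...whereas index_col=0 consumes that column as the index, so nothing needs dropping.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())
```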

In [24]:
df.head()

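|   | Name | beer_id | Style | Style Key | Brewery | ABV | Avg Rating | Astringency | Body | Alcohol | Bitter | Sweet | Sour | Salty | Fruits | Hoppy | Spices | Malty | AvgIBU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Amber | 251 | Brown Ale | 8 | Alaskan Brewing Co. | 5.3 | 3.65 | 13 | 32 | 9 | 47 | 74 | 33 | 0 | 33 | 57 | 8 | 111 | 37.5 |
| 1 | Double Bag | 252 | Brown Ale | 8 | Long Trail Brewing Co. | 7.2 | 3.9 | 12 | 57 | 18 | 33 | 55 | 16 | 0 | 24 | 35 | 12 | 84 | 37.5 |
| 2 | Long Trail Ale | 253 | Brown Ale | 8 | Long Trail Brewing Co. | 5.0 | 3.58 | 14 | 37 | 6 | 42 | 43 | 11 | 0 | 10 | 54 | 4 | 62 | 37.5 |
| 3 | Doppelsticke | 254 | Brown Ale | 8 | Uerige Obergärige Hausbrauerei | 8.5 | 4.15 | 13 | 55 | 31 | 47 | 101 | 18 | 1 | 49 | 40 | 16 | 119 | 37.5 |
| 4 | Scurry | 255 | Brown Ale | 8 | Off Color Brewing | 5.3 | 3.67 | 21 | 69 | 10 | 63 | 120 | 14 | 0 | 19 | 36 | 15 | 218 | 37.5 |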


In [None]:
# TODO: decide which columns to use (ABV onward?) and whether Style needs one-hot encoding

In [31]:
df.set_index('beer_id', inplace=True)

In [32]:
df.head(1)

| beer_id | Name | Style | Style Key | Brewery | ABV | Avg Rating | Astringency | Body | Alcohol | Bitter | Sweet | Sour | Salty | Fruits | Hoppy | Spices | Malty | AvgIBU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 251 | Amber | Brown Ale | 8 | Alaskan Brewing Co. | 5.3 | 3.65 | 13 | 32 | 9 | 47 | 74 | 33 | 0 | 33 | 57 | 8 | 111 | 37.5 |


In [37]:
df.dtypes

Name            object
Style           object
Style Key        int64
Brewery         object
ABV            float64
Avg Rating     float64
Astringency      int64
Body             int64
Alcohol          int64
Bitter           int64
Sweet            int64
Sour             int64
Salty            int64
Fruits           int64
Hoppy            int64
Spices           int64
Malty            int64
AvgIBU         float64
dtype: object

In [42]:
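def grab_numeric(df):
    """Return only the numeric (integer and float) columns of df."""
    return df.select_dtypes(include='number')

def grab_cat(df):
    """Return only the object (string/categorical) columns of df."""
    return df.select_dtypes(include=['object'])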
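A quick check of how dtype-based selection splits the columns is useful before building the pipeline. The toy frame below reuses values from the first row shown above. Note that `Style Key`, although it encodes a category, is stored as an integer and therefore lands on the numeric side, so it would be standardized rather than one-hot encoded. Selecting the categorical side with `exclude="number"` is shown here as an alternative to `include=['object']`; it also catches pandas' dedicated string dtype on versions or settings that use it.

```python
import pandas as pd

# One row of df's columns (subset), values copied from the df.head() output above.
toy = pd.DataFrame({
    "Name": ["Amber"],
    "Style": ["Brown Ale"],
    "Style Key": [8],
    "Brewery": ["Alaskan Brewing Co."],
    "ABV": [5.3],
    "Malty": [111],
})

numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)      # 'Style Key' is an integer code, so it counts as numeric
print(categorical_cols)
```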

In [47]:
# Numeric subpipeline: fill missing values with the column median, then standardize
subpipe_num = Pipeline(steps=[
    ('num_impute', SimpleImputer(strategy='median')),
    ('ss', StandardScaler())
])

# Categorical subpipeline: fill missing values with the most frequent category, then one-hot encode
# (scikit-learn >= 1.2 names this argument sparse_output; older versions call it sparse)
subpipe_cat = Pipeline(steps=[
    ('cat_impute', SimpleImputer(strategy='most_frequent')),
    ('ohe', OneHotEncoder(sparse_output=True, handle_unknown='ignore'))
])
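To see what the numeric subpipeline does, the sketch below rebuilds the same two steps and runs them on a hypothetical ABV column with one missing value: the gap is filled with the median of the observed values (7.2 here), and the column is then standardized to mean 0 and population standard deviation 1.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same structure as subpipe_num: median imputation, then standardization.
num_pipe = Pipeline(steps=[
    ("num_impute", SimpleImputer(strategy="median")),
    ("ss", StandardScaler()),
])

# Hypothetical ABV column (one feature, four beers) with one missing value.
abv = np.array([[5.3], [7.2], [np.nan], [8.5]])
scaled = num_pipe.fit_transform(abv)
print(scaled.ravel().round(3))
```

Because the imputed value equals an observed one, rows 1 and 2 end up with identical scaled values.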


In [48]:
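CT = ColumnTransformer(transformers=[
    # scales every numeric column, including the integer-coded 'Style Key'
    ('subpipe_num', subpipe_num, selector(dtype_include=np.number)),
    # one-hot encodes every object column: Name, Style, and Brewery
    ('subpipe_cat', subpipe_cat, selector(dtype_include=object))
], remainder='passthrough')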
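One way the transformed features could feed a content-based recommender is sketched below on the five rows shown by `df.head()` (a subset of the flavor columns). This is an illustrative sketch, not the notebook's final model: the explicit column lists, the hypothetical `recommend` helper, and the choice of `n` are assumptions. It also uses `cosine_similarity` rather than the imported `linear_kernel`: `linear_kernel` returns plain dot products, which equal cosine similarity only when rows are already L2-normalized (as with TF-IDF vectors), and `StandardScaler` output is not.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# The five rows from df.head() above, restricted to a few feature columns.
df = pd.DataFrame({
    "Name": ["Amber", "Double Bag", "Long Trail Ale", "Doppelsticke", "Scurry"],
    "Style": ["Brown Ale"] * 5,
    "ABV": [5.3, 7.2, 5.0, 8.5, 5.3],
    "Bitter": [47, 33, 42, 47, 63],
    "Sweet": [74, 55, 43, 101, 120],
    "Hoppy": [57, 35, 54, 40, 36],
    "Malty": [111, 84, 62, 119, 218],
}, index=pd.Index([251, 252, 253, 254, 255], name="beer_id"))

numeric_cols = ["ABV", "Bitter", "Sweet", "Hoppy", "Malty"]
categorical_cols = ["Style"]  # Name is deliberately not used as a feature

ct = ColumnTransformer(transformers=[
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])  # remainder defaults to "drop", so Name is excluded from the features
features = ct.fit_transform(df)

# Pairwise cosine similarity between every pair of beers (rows).
sim = cosine_similarity(features)

def recommend(beer_name, n=2):
    """Return the names of the n beers most similar to beer_name (excluding itself)."""
    idx = int(np.flatnonzero(df["Name"].to_numpy() == beer_name)[0])
    ranked = np.argsort(-sim[idx])            # most similar first
    ranked = [i for i in ranked if i != idx]  # drop the query beer itself
    return df["Name"].iloc[ranked[:n]].tolist()

recommendations = recommend("Amber")
print(recommendations)
```

Keeping `Name` and `Brewery` out of the one-hot step is a design choice: encoding each beer's own name gives every beer a private column, which is likely to dilute the similarity scores rather than help.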