<a href="https://colab.research.google.com/github/MonkeyWrenchGang/2021-ban7002/blob/main/Week_5_Beer_Recommender.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Beer Recommendation System Using a Simple Collaborative Filter

In this notebook, we'll create a simple beer recommendation system using a form of collaborative filtering. The underlying assumption of our approach is that if a user (let's call them User A) rates beers similarly to another user (User B), User A is likely to rate beers that they haven't tried yet in the same way that User B has.
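To make that assumption concrete, here is a minimal sketch of measuring how similarly two users rate, using made-up data. The reviewers `A`, `B`, `C`, the beer names, and the choice of Pearson correlation are all illustrative assumptions; the rest of this notebook takes a simpler route based on mean ratings.

```python
import pandas as pd

# Toy ratings (hypothetical data, not taken from the beer review sample)
ratings = pd.DataFrame({
    "reviewer": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "beer": ["IPA", "Stout", "Lager", "IPA", "Stout", "Lager", "Porter",
             "IPA", "Stout", "Lager"],
    "rating": [5.0, 4.0, 2.0, 5.0, 4.5, 1.5, 4.0, 1.0, 2.0, 5.0],
})

# Rows are reviewers, columns are beers; NaN marks beers a reviewer has not rated
matrix = ratings.pivot_table(index="reviewer", columns="beer", values="rating")

# Pearson correlation between reviewers, over the beers each pair has both rated
similarity = matrix.T.corr()

# The reviewer whose ratings correlate most strongly with A's
most_similar = similarity["A"].drop("A").idxmax()

# Beers the similar reviewer rated that A has not tried yet
candidates = matrix.loc[most_similar][matrix.loc["A"].isna()].dropna()
print(most_similar)
print(candidates)
```

Here User B agrees with User A on the three beers they share, so B's rating of the Porter becomes a guess at how A would rate it.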

**Our dataset, a sample from BeerAdvocate, includes beer reviews with the following columns:**

- reviewer: The username of the person who left the review.
- beer: The name of the beer being reviewed.
- full_beer_name: The brewery name followed by the beer name (brewery: beer name).
- rating: The rating given by the reviewer, on a scale of 1 to 5.

In [None]:
import pandas as pd
import seaborn as sns

In [None]:
beer = pd.read_csv("https://raw.githubusercontent.com/MonkeyWrenchGang/2023_BAN6005/main/module_4/data/beer_reviews_sample.csv")
beer.head()

# Distribution of Ratings

In [None]:
# Count how many reviews gave each rating value
sns.countplot(data=beer, x='rating', color='lightblue')

# Make a Pivot Table


---

The pivot table will drive the recommendations.

- Each row is a reviewer.
- Each column is a beer.
- Each value is the reviewer's average rating for that beer.

> Note: we need the nulls! They mark the beers a reviewer has not rated, as you'll see later.
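A small sketch on made-up data shows both behaviors: repeated reviews of the same beer are averaged, and missing reviewer/beer pairs come out as NaN. The reviewers `ann` and `bob` and the beer names are hypothetical.

```python
import pandas as pd

# Toy reviews (hypothetical data); "ann" reviewed the IPA twice
toy = pd.DataFrame({
    "reviewer": ["ann", "ann", "ann", "bob"],
    "full_beer_name": ["Brew Co: IPA", "Brew Co: IPA",
                       "Brew Co: Stout", "Brew Co: Stout"],
    "rating": [4.0, 5.0, 3.0, 4.5],
})

toy_pivot = toy.pivot_table(
    index="reviewer",
    columns="full_beer_name",
    values="rating")

# Duplicate reviews are averaged (pivot_table's default aggfunc is the mean)
print(toy_pivot.loc["ann", "Brew Co: IPA"])  # 4.5

# bob never reviewed the IPA, so that cell is NaN
print(toy_pivot.loc["bob"].isna())
```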


In [None]:
pivot_table = beer.pivot_table(
    index='reviewer',
    columns='full_beer_name',
    values='rating')

pivot_table.head()

## What is the MEAN rating for each Beer?


In [None]:
# Calculate mean rating of each beer
mean_ratings = pivot_table.mean()
mean_ratings.nlargest(10)

## Get the Ratings for a User!

In [None]:
beer['reviewer'].value_counts()

In [None]:
user = 'BuckeyeNation'
# Get the user's row: ratings for reviewed beers, NaN for the rest
user_reviews = pivot_table.loc[user]
user_reviews

## Not Reviewed


In [None]:
# Get the beers not yet rated by our user
not_reviewed = user_reviews[user_reviews.isna()]
not_reviewed

# Get Recommendations
---
These are highly rated beers that the user has not reviewed.
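One caveat with ranking by mean rating: a beer with a single 5-star review gets a mean of 5.0 and outranks beers with many strong reviews. The sketch below shows one possible guard on made-up data. The `min_reviews` threshold and the toy beers are assumptions for illustration, not part of this notebook's method.

```python
import numpy as np
import pandas as pd

# Toy reviewer-by-beer table (hypothetical data); NaN means "not reviewed"
toy_pivot = pd.DataFrame(
    {
        "One Hit Wonder": [5.0, np.nan, np.nan, np.nan],
        "Crowd Favorite": [4.5, 4.0, 4.5, np.nan],
        "Middling Lager": [3.0, 2.5, np.nan, 3.0],
    },
    index=["ann", "bob", "cat", "dan"],
)

mean_ratings = toy_pivot.mean()     # NaN cells are skipped
review_counts = toy_pivot.count()   # number of non-null ratings per beer

# Hypothetical threshold: ignore beers with fewer than 2 reviews
min_reviews = 2
trusted = mean_ratings[review_counts >= min_reviews]

print(mean_ratings.idxmax())  # best by raw mean
print(trusted.idxmax())       # best among beers with enough reviews
```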

In [None]:
# Get the mean ratings of the beers not yet rated by the user
recommendations = mean_ratings[not_reviewed.index]
recommendations.nlargest(10)

## Add Something a Little Fancy


---

How about taking the top 20 beers and randomly selecting 5 of them?
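Because the draw is random, each run returns a different set. A minimal sketch on made-up scores shows how a seed makes the draw repeatable; the `random_state` value and the toy scores are assumptions for illustration.

```python
import pandas as pd

# Toy mean ratings (hypothetical values)
scores = pd.Series(
    [4.8, 4.7, 4.6, 4.5, 4.4, 3.0, 2.0],
    index=["A", "B", "C", "D", "E", "F", "G"],
)

top = scores.nlargest(5)                  # shortlist: the 5 highest scores
pick_1 = top.sample(2, random_state=42)   # seeded draw from the shortlist
pick_2 = top.sample(2, random_state=42)   # same seed, same draw

print(pick_1.equals(pick_2))  # True
```

Note that `sample(5)` raises a `ValueError` when the shortlist holds fewer than 5 beers (with the default `replace=False`), which can happen for a user who has reviewed almost everything.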




In [None]:
recommendations.nlargest(20).sample(5)

## Pull It All Together



---



In [None]:
user = 'BuckeyeNation'
# Get the user's row: ratings for reviewed beers, NaN for the rest
user_reviews = pivot_table.loc[user]
# Get the beers not yet rated by our user
not_reviewed = user_reviews[user_reviews.isna()]
# Look up the mean rating of each unrated beer
recommendations = mean_ratings[not_reviewed.index]
# Sample 5 of the 20 highest-rated unrated beers
recommendations.nlargest(20).sample(5)



# Make a Function

In [None]:
def beer_recomender():
  user = 'BuckeyeNation'
  # Get the user's row: ratings for reviewed beers, NaN for the rest
  user_reviews = pivot_table.loc[user]
  # Get the beers not yet rated by our user
  not_reviewed = user_reviews[user_reviews.isna()]
  recommendations = mean_ratings[not_reviewed.index]
  return recommendations.nlargest(20).sample(5)

beer_recomender()

In [None]:
# Add a parameter so the function works for any user
def beer_recomender(user):
  # Get the user's row: ratings for reviewed beers, NaN for the rest
  user_reviews = pivot_table.loc[user]
  # Get the beers not yet rated by our user
  not_reviewed = user_reviews[user_reviews.isna()]
  recommendations = mean_ratings[not_reviewed.index]
  return recommendations.nlargest(20).sample(5)

beer_recomender(user='BuckeyeNation')

In [None]:
beer_recomender(user='mikesgroove')

# What Happens If the User Isn't in the Data?
---
Fall back to the overall top-rated beers: sample 5 of the 20 beers with the highest mean rating.

In [None]:
def beer_recomender(user):
  # Rebuild the reviewer-by-beer table and the mean rating per beer
  pivot_table = beer.pivot_table(
    index='reviewer',
    columns='full_beer_name',
    values='rating')
  mean_ratings = pivot_table.mean()
  # Unknown user: fall back to the overall top-rated beers
  if user not in pivot_table.index:
    print(f"User {user!r} not found; returning a sample of the top-rated beers.")
    return mean_ratings.nlargest(20).sample(5)
  # Get the user's row: ratings for reviewed beers, NaN for the rest
  user_reviews = pivot_table.loc[user]
  # Get the beers not yet rated by our user
  not_reviewed = user_reviews[user_reviews.isna()]
  # Rank the unrated beers by mean rating and sample 5 of the top 20
  recommendations = mean_ratings[not_reviewed.index]
  return recommendations.nlargest(20).sample(5)


beer_recomender('BuckeyeNation')

In [None]:
beer_recomender(user='Random')