# 🍺 Interactive Beer Recommendation Notebook

This notebook loads your trained models, defines all helper functions, and then lets users request recommendations from:

1. **Collaborative filtering (Neu-MF)**  
2. **Content-based autoencoder**  
3. **Hybrid “content → collaborative lift” to solve Cold Start**  
   - Collaborative lift via Jaccard similarity over seed beers  
   - Collaborative lift via co-occurrence counts of seed beers  
   - Collaborative lift via a user autoencoder embedding of seed beers   
---


## Device Configuration

This cell detects and configures the compute device for your models.

In [21]:
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps"  if torch.backends.mps.is_available()
    else "cpu"
)

print(f"Using device: {device}")

if device.type == "cuda":
    idx = device.index or 0
    print("GPU name: ", torch.cuda.get_device_name(idx))
elif device.type == "mps":
    print("Running on Apple Silicon GPU via MPS")
else:
    print("Running on CPU")

Using device: mps
Running on Apple Silicon GPU via MPS


## Load Prepared Datasets

This cell loads two key CSV files into pandas DataFrames:

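1. **`beer_content.csv`**: one row per beer with `beer_id`, `name`, and the cleaned tasting-note text (`all_text`) used by the content model.  
2. **`final_beers_reviews_breweries.csv`**: the merged beer, review, and brewery table used for collaborative filtering.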


In [22]:
import pandas as pd

try:
    df_beer_content = pd.read_csv('beer_content.csv')
    print("\nFinal Data Sample:")
    print(df_beer_content.head())
except Exception as e:
    print(f"Error loading beer_content.csv: {e}")

try:
    df_beers_reviews_breweries = pd.read_csv('final_beers_reviews_breweries.csv')
    print("\nFinal Data Sample:")
    print(df_beers_reviews_breweries.head())
except Exception as e:
    print(f"Error loading final_beers_reviews_breweries.csv: {e}")

Final Data Sample:
   beer_id               name  \
0        6           Turbodog   
1        7        Purple Haze   
2       10         Dubbel Ale   
3       17  Widmer Hefeweizen   
4       30     Trois Pistoles   

                                            all_text  
0  turbodog flavor flavors turbodog ale turbodog ...  
1  haze flavors haze flavor flavour haze haze bre...  
2  tasting dubbel dubbels flavored dubbel flavors...  
3  flavorful hefeweizens flavorful hefeweizen hef...  
4  beers trois brew trois breweries trois brews t...  

Final Data Sample:
              name state country                    style availability   abv  \
0  Older Viscosity    CA      US  American Imperial Stout     Rotating  12.0   
1  Older Viscosity    CA      US  American Imperial Stout     Rotating  12.0   
2  Older Viscosity    CA      US  American Imperial Stout     Rotating  12.0   
3  Older Viscosity    CA      US  American Imperial Stout     Rotating  12.0   
4  Older Viscosity    CA      US 

## Load Trained Content-Based Model and Artifacts

In this step, we:

- **Load the trained Autoencoder** for beer tasting notes, move it to the selected device, and set it to evaluation mode.
- **Load the precomputed beer embeddings** generated by the autoencoder.
- **Load the saved TF-IDF vectorizer** (via pickle) for transforming new tasting-note queries.

These components power the content-based recommendation pipeline.


In [23]:
beer_autoencoder_trained = torch.jit.load("beer_autoencoder_frozen.pt", map_location="cpu")
beer_autoencoder_trained.to(device)
beer_autoencoder_trained.eval()

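beerEmbeddings = torch.load("beer_embeddings_autoencoder.pt", map_location="cpu")
# Tensor.to() returns a new tensor rather than moving in place, so keep the result.
beerEmbeddings = beerEmbeddings.to(device)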

import pickle
with open("tfidf_vectorizer.pkl", "rb") as f:
    vectorizer = pickle.load(f)

## Load Trained Collaborative Filtering Model

In this step, we:

- **Load the trained NeuMF model** for collaborative filtering, move it to the selected device, and set it to evaluation mode.

This model, together with the collaborative embeddings, powers the user-based recommendation pipeline.


In [24]:
neu_mf_model_trained = torch.jit.load("beer_neu_mf_model_frozen.pt", map_location="cpu")
neu_mf_model_trained.to(device)
neu_mf_model_trained.eval()

RecursiveScriptModule(
  original_name=NeuMF_Recommender
  (gmf_user_embedding): RecursiveScriptModule(original_name=Embedding)
  (gmf_beer_embedding): RecursiveScriptModule(original_name=Embedding)
  (mlp_user_embedding): RecursiveScriptModule(original_name=Embedding)
  (mlp_beer_embedding): RecursiveScriptModule(original_name=Embedding)
  (user_bias): RecursiveScriptModule(original_name=Embedding)
  (beer_bias): RecursiveScriptModule(original_name=Embedding)
  (beer_content_fc): RecursiveScriptModule(
    original_name=Sequential
    (0): RecursiveScriptModule(original_name=Linear)
    (1): RecursiveScriptModule(original_name=ReLU)
  )
  (mlp): RecursiveScriptModule(
    original_name=Sequential
    (0): RecursiveScriptModule(original_name=Linear)
    (1): RecursiveScriptModule(original_name=ReLU)
    (2): RecursiveScriptModule(original_name=Dropout)
    (3): RecursiveScriptModule(original_name=Linear)
    (4): RecursiveScriptModule(original_name=ReLU)
    (5): RecursiveScriptModule(

In [25]:
# Run this cell for some necessary pre-processing for Collaborative Filtering Model

from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder

scaler = MinMaxScaler()
df_beers_reviews_breweries[['look', 'smell', 'taste', 'feel', 'overall', 'score']] = scaler.fit_transform(
    df_beers_reviews_breweries[[ 'look', 'smell', 'taste', 'feel', 'overall', 'score']]
)

beerMeta = df_beers_reviews_breweries[['beer_id', 'abv', 'style']].drop_duplicates().set_index('beer_id')

style_encoder = LabelEncoder()
beerMeta['style_encoded'] = style_encoder.fit_transform(beerMeta['style'])

beerFeatureTensor = torch.tensor(
    beerMeta[['abv', 'style_encoded']].values,
    dtype=torch.float32
)

beerFeatureIndexMap = {beer_id: i for i, beer_id in enumerate(beerMeta.index)}

# Create mapping dictionaries for users and beers
users = df_beers_reviews_breweries['username'].unique()
userIndexDict = {user: idx for idx, user in enumerate(users)}

beers = df_beers_reviews_breweries['beer_id'].unique()
beerIndexDict = {beer: idx for idx, beer in enumerate(beers)}

# Map the original columns to new index columns
df_beers_reviews_breweries['userIndex'] = df_beers_reviews_breweries['username'].map(userIndexDict)
df_beers_reviews_breweries['beerIndex'] = df_beers_reviews_breweries['beer_id'].map(beerIndexDict)

print("Number of unique users:", len(userIndexDict))
print("Number of unique beers:", len(beerIndexDict))

# Inverse mappings
indexBeerDict = {index: beer for beer, index in beerIndexDict.items()}
indexUserMap = {index: user for user, index in userIndexDict.items()}


Number of unique users: 15894
Number of unique beers: 500


In [26]:
# Run this cell for necessary utility methods for Collaborative Filtering Model

def topKRecommendedBeersForUser(model, userIndex, beerFeatureTensor, topK=10):

    model.to(device)
    model.eval()
    allBeerIndices = torch.arange(len(beerIndexDict)).to(device)
    userTensor = torch.tensor([userIndex] * len(beerIndexDict), dtype=torch.long).to(device)
    # Feature indices are same as beer indices (1-to-1 mapping assumed)
    beerFeatureIndices = allBeerIndices
    with torch.no_grad():
        predictedRatings = model(userTensor, allBeerIndices, beerFeatureIndices, beerFeatureTensor.to(device))
    topRatings, topBeerIndices = torch.topk(predictedRatings, topK)
    recommendedBeers = [indexBeerDict[index.item()] for index in topBeerIndices]
    return recommendedBeers,topRatings

beerDetails = df_beers_reviews_breweries.groupby('beer_id').agg({
    'name': 'first',
    'state': 'first',
    'country': 'first',
    'style': 'first',
    'availability': 'first',
    'abv': 'mean',
    'notes': 'first',
    'look': 'mean',
    'smell': 'mean',
    'taste': 'mean',
    'feel': 'mean',
    'overall': 'mean',
    'score': 'mean',
    'name_brewery': 'first',
    'city': 'first',
    'notes_brewery': 'first',
    'types': 'first'
}).reset_index()

def getBeerDetailsFromIds(beerIdList):
    df_idx = beerDetails.set_index('beer_id')
    df_out = df_idx.loc[beerIdList]
    return df_out.reset_index()

def getBeerDetailsFromIdsWithPredictedScore(beerIdList, predictedScores):
    df_beerDetails = getBeerDetailsFromIds(beerIdList)[['name', 'style', 'abv', 'score']]
    scores = predictedScores.detach().cpu().tolist()
    df_beerDetails['predicted_user_score'] = scores
    return df_beerDetails
    
def getActualTopKReviewedBeersForUser(username, topK=10):
    userReviews = df_beers_reviews_breweries[df_beers_reviews_breweries['username'] == username]
    return userReviews.sort_values(by='score', ascending=False).head(topK)

def getUsername(userIndex):
    username = indexUserMap.get(userIndex, "Unknown User")
    print(f"Username for user index {userIndex}: {username}")
    return username

In [27]:
# Run this cell for necessary utility methods for BeerTastingNotes Content Autoencoder

import re
def cleanText(text):
    text = text.lower()
    text = re.sub(r'[^a-z0-9\s]', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

def getBeerIdFromIndex(beerIndex):
    return df_beer_content.iloc[beerIndex]['beer_id']

def getBeerIdsFromIndices(beerIndexList):
    return df_beer_content.iloc[beerIndexList]['beer_id'].tolist()

def getBeerDetailsByIndex(beerIndex):
    beer_id = df_beer_content.iloc[beerIndex]['beer_id']
    name = df_beer_content.iloc[beerIndex]['name']
    print("Beer ID:", beer_id, "Name:", name)

def getBeersDetailsByIndices(beerIndices,similarity):
    for index in beerIndices:
        beer_id = df_beer_content.iloc[index.item()]['beer_id']
        name    = df_beer_content.iloc[index.item()]['name']
        print(f"{beer_id:5d} | {name:30s} | similarity={similarity[index.item()]:.3f}")


def getTopKSimilarBeersByIndex(beerIndex, topK):
    targetEmbedding = beerEmbeddings[beerIndex].unsqueeze(0)
    cosSim = F.cosine_similarity(targetEmbedding, beerEmbeddings)
    cosSim[beerIndex] = -1
    _, topIndices = torch.topk(cosSim, k=topK)
    getBeerDetailsByIndex(beerIndex)
    getBeersDetailsByIndices(topIndices,cosSim)
    return topIndices

def getTopKSimilarBeersByQuery(query, topK):
    queryClean = cleanText(query)
    queryTfidf = vectorizer.transform([queryClean])
    queryTensor = torch.tensor(queryTfidf.toarray(), dtype=torch.float).to(device)

    beer_autoencoder_trained.eval()
    with torch.no_grad():
        _, queryEmbedding = beer_autoencoder_trained(queryTensor)
    similarity = F.cosine_similarity(queryEmbedding, beerEmbeddings.to(device))
    _, topIndicesTensor = torch.topk(similarity, topK)
    getBeersDetailsByIndices(topIndicesTensor, similarity)
    return topIndicesTensor


---

# Neu-MF Collaborative Filtering Beer Recommender

This cell prompts for the number of recommendations and a user index, then displays personalized picks from the collaborative filtering model. Because the model scores beers through learned per-user embeddings, it can only recommend for users who already exist in the training data; this is the “cold start” problem.

---

## User Prompts

**How many beers would you like recommended?**  
*(Enter a number between 1 and 500)*
   - Prompts for an integer between **1** and **500** (Number of Beers).  
   - Validates input and repeats until a valid number is entered.

**Enter your User index**  
*(Enter your userIndex (0–15893))*
   - Prompts for an integer between **0** and **15893** (Number of Users).  
   - Validates input and repeats until a valid index is entered.
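
Both prompts follow the same validate-and-retry pattern, so the loop can be factored into one helper. The sketch below is an illustration rather than part of the original notebook: `prompt_int` and its `input_fn` parameter are hypothetical names, and `input_fn` exists only so the loop can be exercised with scripted answers instead of a live keyboard.

```python
def prompt_int(message, low, high, input_fn=input):
    """Ask until the user enters an integer in [low, high] (inclusive)."""
    while True:
        raw = input_fn(message)
        try:
            value = int(raw)
        except ValueError:
            print(f"Invalid input {raw!r}: please enter a whole number.")
            continue
        if low <= value <= high:
            return value
        print(f"Invalid input {value}: must be between {low} and {high}.")


# Example with scripted answers (hypothetical) instead of a live prompt.
answers = iter(["abc", "0", "5"])
n = prompt_int("How many beers would you like recommended? ", 1, 500,
               input_fn=lambda _msg: next(answers))
print(n)
```

In the notebook itself the call would presumably use the default `input_fn`, with the bounds taken from `len(beerIndexDict)` and `len(userIndexDict)` rather than hard-coded numbers.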

---

## Fetch and display recommendations
- Looks up the username for the given index.  
- Runs the trained Neu-MF collaborative filtering model to get top N beer IDs and predicted scores.
---

In [28]:
while True:
    try:
        numberOfBeersToRecommend = int(input("How many beers would you like recommended? "))
        if not (1 <= numberOfBeersToRecommend <= 500):
            raise ValueError("must be between 1 and 500")
        break
    except ValueError as e:
        print(f" Invalid number: {e}")

while True:
    try:
        u_str = input("Enter your userIndex (0–15893): ")
        userIndex = int(u_str)
        if not (0 <= userIndex < 15894):
            raise ValueError("must be between 0 and 15893")
        break
    except ValueError as e:
        print(f" Invalid userIndex: {e}")

try:
    username = getUsername(userIndex)
    print(f"\nWelcome back, {username}! Here are your top {numberOfBeersToRecommend} picks:")
    
    recommended_beer_ids, predictedRatings = topKRecommendedBeersForUser(
        neu_mf_model_trained,
        userIndex,
        beerFeatureTensor,
        topK=numberOfBeersToRecommend
    )
    
    df = getBeerDetailsFromIdsWithPredictedScore(recommended_beer_ids, predictedRatings)
    print(df.to_string(index=False))
    
except Exception as e:
    print(f" Could not find collaborative recommendations: {e}")


How many beers would you like recommended?  5
Enter your userIndex (0–15893):  65


Username for user index 65: Stinger80OH

Welcome back, Stinger80OH! Here are your top 5 picks:
                             name                    style   abv    score  predicted_user_score
                Pliny The Younger    American Imperial IPA 10.25 0.914736              0.920964
   CBS (Canadian Breakfast Stout)  American Imperial Stout 11.70 0.902547              0.920748
   Trappist Westvleteren 12 (XII) Belgian Quadrupel (Quad) 10.20 0.902712              0.914673
Bourbon County Brand Coffee Stout  American Imperial Stout 12.90 0.886392              0.912987
                     Heady Topper          New England IPA  8.00 0.906973              0.911953


---

# Content Autoencoder Based Beer Recommender

---

## Example Tasting-Note Queries

- **q1:**  imperial stout roasted chocolate espresso  
- **q2:**  ipa juicy citrus piney  
- **q3:**  hefeweizen german  
- **q4:**  belgian tripel apricot spicy vanilla  
- **q5:**  pumpkin cinnamon vanilla  
- **q6:**  smoked porter coffee cocoa roast  
- **q7:**  raspberry sour tart lemon  
- **q8:**  bourbon barrel aged stout roasty malty  
- **q9:**  roasty malty smooth  
- **q10:** pilsner pale citrus lemon  
- **q11:** heineken lager  

---

## User Prompts

**How many beers would you like recommended?**  
*(Enter a number between 1 and 500)*

**Enter your tasting notes**  
*(e.g. “bourbon barrel aged stout roasty malty”)*

---

## Fetch and display recommendations
- Transforms a text query into an embedding via the Autoencoder
- Displays the top‑K similar beers
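
The ranking step amounts to a cosine-similarity top-K search over the beer embeddings. Below is a minimal pure-Python sketch of that idea on made-up 3-dimensional vectors. The names `cosine` and `top_k_similar` and the toy embeddings are illustrative assumptions; the notebook performs the same operation with `F.cosine_similarity` and `torch.topk` on the autoencoder embeddings.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query_vec, embeddings, k):
    """Return the k (row_index, similarity) pairs most similar to query_vec."""
    scored = ((i, cosine(query_vec, emb)) for i, emb in enumerate(embeddings))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy embeddings, invented for illustration only.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
top = top_k_similar([1.0, 0.05, 0.0], toy_embeddings, k=2)
print([i for i, _ in top])
```

Because cosine similarity ignores vector length, embeddings that point in nearly the same direction all score close to 1, which is one possible reason the sample outputs in this notebook show many scores of 0.99 and above.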


In [12]:

while True:
    try:
        numberOfBeersToRecommend = int(input("How many beers would you like recommended? "))
        if numberOfBeersToRecommend < 1 or numberOfBeersToRecommend > 500:
            raise ValueError
        break
    except ValueError:
        print("Please enter an integer between 1 and 500.")

query = input("Enter your tasting notes (e.g. “bourbon barrel aged stout roasty malty”): ").strip()
if not query:
    print("No query given, defaulting to an empty string.")
    query = ""

# Content‐based seeds
print(f"\nContent‐based picks ({numberOfBeersToRecommend}) for “{query}”:")
try:
    beerIndicesTensor = getTopKSimilarBeersByQuery(query, topK=numberOfBeersToRecommend)

except Exception as e:
    print(f"  → Error during content lookup: {e}")

How many beers would you like recommended?  5
Enter your tasting notes (e.g. “bourbon barrel aged stout roasty malty”):   imperial stout roasted chocolate espresso



Content‐based picks (5) for “imperial stout roasted chocolate espresso”:
  754 | Guinness Draught               | simlarity=0.999
41702 | Theobroma                      | simlarity=0.999
16062 | Brewer's Reserve Bourbon Barrel Stout | simlarity=0.999
 2010 | Java Stout                     | simlarity=0.998
 8322 | New Holland The Poet           | simlarity=0.998


---

### Helper: `similarUsersJaccard`

This function finds the existing users whose “liked” beer sets best match the seed beers, using Jaccard similarity.

1. Builds each user’s full set of reviewed/liked beer IDs.  
2. For each user, computes  
   \- **Intersection** size = number of beers both the user and the seed list share  
   \- **Union** size = total distinct beers in either set  
   \- **Jaccard score** = intersection / union  
3. Ranks users by Jaccard score and returns the top N.

- **Inputs**  
  - `top_beer_ids` (list[int]): seed beer IDs from the content recommendation step  
  - `df` (DataFrame): all user–beer interactions (`userIndex`, `beer_id`, …)  
  - `top_n` (int): how many top users to return (default: 1)  

- **Returns**  
  - `List[int]`: the `top_n` user indices with highest Jaccard similarity to the seed set  
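
As a worked illustration of the score, the sketch below applies the same intersection-over-union formula to three invented users. The user IDs, beer IDs, and the `jaccard` helper name are toy assumptions, not data or code from the notebook.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


seeds = {6, 7, 10}
user_likes = {
    0: {6, 7, 10, 17, 30, 31},   # shares all 3 seeds but reviewed 6 beers
    1: {6, 7},                   # shares 2 seeds, reviewed nothing else
    2: {17, 30},                 # shares no seeds
}
scores = {uid: jaccard(likes, seeds) for uid, likes in user_likes.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

User 1 wins with 2/3 even though user 0 shares more seeds (3/6), because the union in the denominator penalizes long review histories.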


In [13]:
from collections import defaultdict

def similarUsersJaccard(top_beer_ids, df, top_n=1):
    top_beers = set(top_beer_ids)
    scores = {}
    # precompute each user’s full liked-set
    user_likes = {uid: set(group['beer_id']) for uid, group in df.groupby('userIndex')}
    for uid, likes in user_likes.items():
        inter = len(likes & top_beers)
        union = len(likes | top_beers)
        if union > 0 and inter > 0:
            scores[uid] = inter / union

    top_users = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_n]
    return [u for u, _ in top_users]


# Hybrid Content → Collaborative-Lift Beer Recommender (Jaccard)

This workflow combines the strengths of both the content-based autoencoder and collaborative filtering models to address the cold-start problem:

1. **Content-Based Seed Selection**  
   - Transform the user’s tasting-note query into an embedding via the Autoencoder.  
   - Retrieve the top _N_ beers that best match those notes.

2. **Collaborative “Lift”**  
   - Treat those _N_ seed beers as “liked” by a new user.  
   - Compute Jaccard similarity between the seed set and every existing user’s set of reviewed beers.  
   - Select the most similar existing user and surface their top _N_ beers (excluding the seeds).
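
`topKRecommendedBeersForUser`, as defined earlier in this notebook, returns the model's top-K for the chosen user without any filtering, so excluding the seeds has to be applied to its output. One possible way to do that is sketched here, assuming the ranked list has been over-fetched (for example by requesting _N_ plus the number of seeds). `exclude_seeds` is a hypothetical helper, not part of the notebook, and the IDs are toy values.

```python
def exclude_seeds(ranked_ids, seed_ids, k):
    """Keep the first k IDs from ranked_ids that are not seeds, preserving order."""
    seeds = set(seed_ids)
    kept = [beer_id for beer_id in ranked_ids if beer_id not in seeds]
    return kept[:k]


# Toy IDs for illustration: ranked best-first, with two seeds mixed in.
ranked = [754, 41702, 16062, 2010, 8322, 6, 7]
print(exclude_seeds(ranked, seed_ids=[41702, 2010], k=3))
```

If predicted scores are displayed alongside the IDs, they would need to be filtered with the same positions so the two lists stay aligned.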

---

## User Prompts

1. **How many beers would you like recommended?**  
   *(Enter a number between 1 and 500)*  
2. **Enter your tasting notes**  
   *(e.g. “bourbon barrel aged stout roasty malty”)*

---

## Example Tasting-Note Queries

- **q1:**  imperial stout roasted chocolate espresso  
- **q2:**  hefeweizen german  
- **q3:**  belgian tripel apricot spicy vanilla  
- **q4:**  pumpkin cinnamon vanilla  
- **q5:**  smoked porter coffee cocoa roast

---

## Hybrid Recommendation Steps

1. **Content-Based Picks**  
   - Display the top _N_ beers matching the user’s tasting-notes query.  
2. **Collaborative Lift**  
   - Identify the one existing user whose liked beers most overlap with those _N_ seeds (Jaccard similarity).  
   - Recommend that user’s top _N_ beers, excluding the original seeds.  



In [14]:
# Prompt user for inputs
while True:
    try:
        numberOfBeersToRecommend = int(input("How many beers would you like recommended? "))
        if numberOfBeersToRecommend < 1 or numberOfBeersToRecommend > 500:
            raise ValueError
        break
    except ValueError:
        print("Please enter an integer between 1 and 500.")

query = input("Enter your tasting notes (e.g. “bourbon barrel aged stout roasty malty”): ").strip()
if not query:
    print("No query given, defaulting to an empty string.")
    query = ""

# Content‐based seeds
print(f"\nContent‐based picks ({numberOfBeersToRecommend}) for “{query}”:")
try:
    beerIndicesTensor = getTopKSimilarBeersByQuery(query, topK=numberOfBeersToRecommend)
    beerIndicesList = beerIndicesTensor.detach().cpu().tolist()
    beerIdList = getBeerIdsFromIndices(beerIndicesList)
except Exception as e:
    print(f"  → Error during content lookup: {e}")
    beerIdList = []

# Collaborative “lift” from similar user
if beerIdList:
    try:
        similarUsers = similarUsersJaccard(
            top_beer_ids=beerIdList,
            df=df_beers_reviews_breweries,
            top_n=1
        )
        if not similarUsers:
            raise LookupError("no user found")
        similarUserIndex = similarUsers[0]
        print(' ')
        username = getUsername(similarUserIndex)
        print(f"User‐based picks ({numberOfBeersToRecommend}) from someone with a similar palate ({username}):")
        recommended_beer_ids, predictedRatings = topKRecommendedBeersForUser(
            neu_mf_model_trained,
            similarUserIndex,
            beerFeatureTensor,
            topK=numberOfBeersToRecommend
        )
        print(getBeerDetailsFromIdsWithPredictedScore(recommended_beer_ids, predictedRatings).to_string(index=False))
    except Exception as e:
        print(f"  → Could not find collaborative recommendations: {e}")
else:
    print("No seed beers to drive collaborative step.")


How many beers would you like recommended?  5
Enter your tasting notes (e.g. “bourbon barrel aged stout roasty malty”):  pumpkin coffee



Content‐based picks (5) for “pumpkin coffee”:
  132 | Ayinger Bräu Weisse            | simlarity=0.999
  924 | Franziskaner Hefe-Weisse Dunkel | simlarity=0.998
  727 | Aecht Schlenkerla Rauchbier Märzen | simlarity=0.997
 1932 | Pumpkinhead Ale                | simlarity=0.996
   74 | Post Road Pumpkin Ale          | simlarity=0.996
 
Username for user index 2424: PolishHurricane
User‐based picks (5) from someone with a similar palate (PolishHurricane):
                          name                    style   abv    score  predicted_user_score
             Pliny The Younger    American Imperial IPA 10.25 0.914736              0.895172
                  Heady Topper          New England IPA  8.00 0.906973              0.889271
               Pliny The Elder    American Imperial IPA  8.00 0.894939              0.873127
               Sip Of Sunshine    American Imperial IPA  8.00 0.882953              0.870841
Trappist Westvleteren 12 (XII) Belgian Quadrupel (Quad) 10.20 0.902712     

---

### Helper: `similarUserContentCooccurrence`

Given a set of beer IDs (the “seed” beers from the content model) and the full reviews DataFrame, this function:

1. Groups reviews by user.  
2. For each user, computes how many of the seed beer IDs they have also reviewed (“co-occurrence” count).  
3. Ranks users by that co-occurrence count.  
4. Returns the top N user indices (default N=1).

- **Inputs**  
  - `top_beer_ids`: list of beer IDs from the content recommendation step  
  - `df`: DataFrame of all user reviews (`userIndex`, `beer_id`, …)  
  - `top_n`: number of top users to return  

- **Output**  
  - List of the `top_n` user indices whose taste overlaps the most with the seed beers  
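
Unlike Jaccard similarity, a raw co-occurrence count is not normalized by how many beers a user has reviewed, so prolific reviewers tend to rank first. The sketch below illustrates this on invented users and beer IDs; it uses `collections.Counter` as a compact stand-in for the `defaultdict` loop in the helper, and all values are toy assumptions.

```python
from collections import Counter

seeds = {6, 7, 10}
user_likes = {
    0: {6, 7, 10, 17, 30, 31},   # reviewed many beers, including all 3 seeds
    1: {6, 7},                   # reviewed only 2 beers, both seeds
    2: {17, 30},                 # no overlap with the seeds
}

# Count shared seed beers per user, skipping users with no overlap.
cooccur = Counter({uid: len(likes & seeds)
                   for uid, likes in user_likes.items()
                   if likes & seeds})
top_users = [uid for uid, _ in cooccur.most_common(1)]
print(cooccur, top_users)
```

Here user 0 is selected with a count of 3, whereas a union-normalized score would favor user 1, whose entire history consists of seed beers.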


In [13]:
from collections import defaultdict

def similarUserContentCooccurrence(top_beer_ids, df, top_n=1):
    cooccur_counts = defaultdict(int)

    # For each user, count overlap
    for user_id, group in df.groupby('userIndex'):
        user_beers = set(group['beer_id'])
        shared = user_beers & set(top_beer_ids)
        if shared:
            cooccur_counts[user_id] = len(shared)

    top_users = sorted(cooccur_counts.items(), key=lambda x: x[1], reverse=True)[:top_n]

    return [user for user, _ in top_users]


# Hybrid Content → Collaborative-Lift Beer Recommender (Co-occurrence)

This workflow combines the strengths of both the content-based autoencoder and collaborative filtering models to address the cold-start problem:

1. **Content-Based Seed Selection**  
   - Transform the user’s tasting-note query into an embedding via the Autoencoder.  
   - Retrieve the top _N_ beers that best match those notes.

2. **Collaborative “Lift” via Co-occurrence**  
   - Treat those _N_ seed beers as “liked” by a new user.  
   - Count how many of those seed beers each existing user has also liked.  
   - Select the user with the highest overlap (co-occurrence) and surface their top _N_ beers (excluding the seeds).

---

## User Prompts

1. **How many beers would you like recommended?**  
   *(Enter a number between 1 and 500)*  
2. **Enter your tasting notes**  
   *(e.g. “bourbon barrel aged stout roasty malty”)*  

---

## Example Tasting-Note Queries

- **q1:**  imperial stout roasted chocolate espresso  
- **q2:**  hefeweizen german  
- **q3:**  belgian tripel apricot spicy vanilla  
- **q4:**  pumpkin cinnamon vanilla  
- **q5:**  smoked porter coffee cocoa roast

---

## Hybrid Recommendation Steps

1. **Content-Based Picks**  
   - Display the top _N_ beers matching the user’s tasting-notes query.  
2. **Collaborative Lift**  
   - Identify the one existing user whose liked beers have the greatest overlap with those _N_ seeds (co-occurrence count).  
   - Recommend that user’s top _N_ beers, excluding the original seeds.  


In [14]:
while True:
    try:
        numberOfBeersToRecommend = int(input("How many beers would you like recommended? "))
        if numberOfBeersToRecommend < 1 or numberOfBeersToRecommend > 500:
            raise ValueError
        break
    except ValueError:
        print("Please enter an integer between 1 and 500.")

query = input("Enter your tasting notes (e.g. “bourbon barrel aged stout roasty malty”): ").strip()
if not query:
    print("No query given, defaulting to an empty string.")
    query = ""

# Content‐based seeds
print(f"\nContent‐based picks ({numberOfBeersToRecommend}) for “{query}”:")
try:
    beerIndicesTensor = getTopKSimilarBeersByQuery(query, topK=numberOfBeersToRecommend)
    beerIndicesList = beerIndicesTensor.detach().cpu().tolist()
    beerIdList = getBeerIdsFromIndices(beerIndicesList)
except Exception as e:
    print(f"  → Error during content lookup: {e}")
    beerIdList = []

# Collaborative “lift” from similar user
if beerIdList:
    try:
        similarUsers = similarUserContentCooccurrence(
            top_beer_ids=beerIdList,
            df=df_beers_reviews_breweries,
            top_n=1
        )
        if not similarUsers:
            raise LookupError("no user found")
        similarUserIndex = similarUsers[0]
        print(' ')
        username = getUsername(similarUserIndex)
        print(f"User‐based picks ({numberOfBeersToRecommend}) from someone with a similar palate ({username}):")
        recommended_beer_ids, predictedRatings = topKRecommendedBeersForUser(
            neu_mf_model_trained,
            similarUserIndex,
            beerFeatureTensor,
            topK=numberOfBeersToRecommend
        )
        print(getBeerDetailsFromIdsWithPredictedScore(recommended_beer_ids, predictedRatings).to_string(index=False))
    except Exception as e:
        print(f"  → Could not find collaborative recommendations: {e}")
else:
    print("No seed beers to drive collaborative step.")


How many beers would you like recommended?  5
Enter your tasting notes (e.g. “bourbon barrel aged stout roasty malty”):  hefeweizen german



Content‐based picks (5) for “hefeweizen german”:
  924 | Franziskaner Hefe-Weisse Dunkel | simlarity=1.000
48434 | Kellerweis                     | simlarity=0.999
  132 | Ayinger Bräu Weisse            | simlarity=0.999
 1256 | Paulaner Hefe-Weissbier Naturtrüb | simlarity=0.999
 1946 | Franziskaner Hefe-Weisse       | simlarity=0.999
 
Username for user index 80: gatornation
User‐based picks (5) from someone with a similar palate (gatornation):
                          name                    style   abv    score  predicted_user_score
                  Heady Topper          New England IPA  8.00 0.906973              0.892603
             Pliny The Younger    American Imperial IPA 10.25 0.914736              0.886225
Trappist Westvleteren 12 (XII) Belgian Quadrupel (Quad) 10.20 0.902712              0.883244
                      Parabola   Russian Imperial Stout 12.70 0.879978              0.883177
CBS (Canadian Breakfast Stout)  American Imperial Stout 11.70 0.902547             

---

## Load Trained User Autoencoder Model and Artifacts

In this step, we:

- **Load the trained Autoencoder** for user beer likes, move it to the selected device, and set it to evaluation mode.
- **Load the precomputed user embeddings** generated by the autoencoder.

These components are used to find the existing user most similar to a new user’s seed beers in the recommendation pipeline.


In [16]:
user_autoencoder_trained = torch.jit.load("user_autoencoder_frozen.pt", map_location="cpu")
user_autoencoder_trained.to(device)
user_autoencoder_trained.eval()

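userEmbeddings = torch.load("user_embeddings_autoencoder.pt", map_location="cpu")
# Tensor.to() is not in-place; reassign so the embeddings actually live on the device.
userEmbeddings = userEmbeddings.to(device)
userEmbeddings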

tensor([[-0.4708,  0.6864,  0.6942,  ..., -0.3813,  1.0553, -1.7108],
        [ 0.4211, -0.2114,  1.1575,  ..., -2.4868,  2.5109, -0.2493],
        [-1.1366,  0.6097, -0.2067,  ..., -1.1771,  0.5580,  0.8170],
        ...,
        [-0.9689,  2.3252, -0.9861,  ..., -0.3556,  0.2636, -0.1046],
        [ 0.4801,  0.5090, -0.3059,  ..., -1.2790,  2.0390,  1.2585],
        [-0.7030,  0.2653,  0.7570,  ..., -0.5803,  1.3950,  0.3581]],
       device='mps:0')

In [86]:
# Run this cell for necessary utility methods for User Autoencoder
try:
    user_beer_likes = pd.read_csv('user_beer_likes.csv')
    print("final Data Sample:")
    print(user_beer_likes.head())
except Exception as e:
    print(f"Error loading user_beer_likes.csv: {e}")

def getUserDetailsByIndex_likes(userIndex):
    username = user_beer_likes.iloc[userIndex]['username']
    print("Index:", userIndex, "Username:", username)
    return username

def beerIdsToIndices_likes(beerIds):
    beerCols = [col for col in user_beer_likes.columns if col not in ('username', 'userIndex')]
    colIndex = {col: idx for idx, col in enumerate(beerCols)}
    indices = []
    for bid in beerIds:
        if str(bid) not in colIndex:
            raise KeyError(f"Beer ID {bid} not found among DataFrame columns")
        indices.append(colIndex[str(bid) ])
    return indices

def getTopKSimilarUsersFromBeers(beerIds,topK):
    beerIndices = beerIdsToIndices_likes(beerIds)
    seedVector = torch.zeros(500, device=device) # 500 num of beers
    seedVector[beerIndices] = 1.0
    with torch.no_grad():
        seedEmbedding = user_autoencoder_trained.encoder(seedVector.unsqueeze(0))  # (1, emb_dim)
    cosSim = F.cosine_similarity(seedEmbedding, userEmbeddings.to(device))  # (n_users,)
    _, topIndices = torch.topk(cosSim, k=topK)
    return topIndices

final Data Sample:
   username  6  7  10  17  30  31  33  34  39  ...  76816  77299  82250  \
0   --Dom--  0  0   0   0   0   0   0   0   1  ...      0      1      0   
1     -Rick  0  0   0   0   0   0   0   0   0  ...      0      0      1   
2   -steve-  0  0   0   0   0   0   0   0   0  ...      0      0      0   
3   00trayn  0  0   0   0   0   0   0   0   0  ...      0      0      0   
4  01001111  0  0   0   0   0   0   0   0   0  ...      0      0      0   

   84596  86149  89174  94350  99873  117177  148052  
0      0      0      0      0      0       0       0  
1      0      0      0      1      0       0       0  
2      0      0      0      1      0       0       0  
3      0      0      0      0      0       0       0  
4      1      0      0      1      0       0       0  

[5 rows x 501 columns]


# Hybrid Content → Collaborative-Lift Beer Recommender (User Autoencoder)

This workflow combines the strengths of both the content-based autoencoder and collaborative filtering models to address the cold-start problem:

1. **Content-Based Seed Selection**  
   - Transform the user’s tasting-note query into an embedding via the content Autoencoder.  
   - Retrieve the top _N_ beers that best match those notes.

2. **Collaborative Lift via User Autoencoder**  
   - Build a binary “like” vector of length 500, setting entries for the _N_ seed beers to 1.  
   - Pass this seed vector through the UserAutoencoder encoder to obtain a “seed-user” embedding.  
   - Compute cosine similarity between the seed embedding and every existing user’s embedding.  
   - Select the most similar user and recommend their top _N_ beers (excluding the original seeds).
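
The seed vector described above is a multi-hot encoding of the seed beer IDs over the beer columns. Here is a minimal pure-Python sketch, assuming a small made-up column order (`beer_columns`) and a hypothetical helper name `multi_hot`; in the notebook the column order comes from `user_beer_likes.csv` and the vector has 500 entries.

```python
def multi_hot(seed_ids, beer_columns):
    """Return a 0/1 list aligned with beer_columns, 1.0 where the beer is a seed."""
    position = {beer_id: i for i, beer_id in enumerate(beer_columns)}
    vector = [0.0] * len(beer_columns)
    for beer_id in seed_ids:
        if beer_id not in position:
            raise KeyError(f"Beer ID {beer_id} is not a known column")
        vector[position[beer_id]] = 1.0
    return vector


# Toy column order, invented for illustration.
beer_columns = [6, 7, 10, 17, 30]
print(multi_hot([7, 30], beer_columns))
```

The positions must follow the same column order the user autoencoder was trained on; otherwise the encoder would interpret the seeds as different beers.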

---

## User Prompts

1. **How many beers would you like recommended?**  
   *(Enter a number between 1 and 500)*  
2. **Enter your tasting notes**  
   *(e.g. “bourbon barrel aged stout roasty malty”)*  

---

## Example Tasting-Note Queries

- **q1:**  imperial stout roasted chocolate espresso  
- **q2:**  hefeweizen german  
- **q3:**  belgian tripel apricot spicy vanilla  
- **q4:**  pumpkin cinnamon vanilla  
- **q5:**  smoked porter coffee cocoa roast  

---

## Hybrid Recommendation Steps

1. **Content-Based Picks**  
   - Display the top _N_ beers matching the user’s tasting-notes query.  

2. **Collaborative Lift**  
   - Encode the seed beers into a user-style embedding.  
   - Find the existing user whose embedding has the highest cosine similarity to that seed embedding.  
   - Recommend that user’s top _N_ beers.  
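
The workflow overview says the final list excludes the original seed beers. Whether `topKRecommendedBeersForUser` already does this is not visible in this section, so the following is only a sketch of how such a filter might look. The helper name `exclude_seeds` and the ids and scores in the example are made up for illustration.

```python
def exclude_seeds(ranked_beer_ids, ranked_scores, seed_beer_ids, top_n):
    """Drop seed beers from a ranked list and keep the first top_n survivors.

    ranked_beer_ids and ranked_scores are parallel lists, best first.
    """
    seeds = set(seed_beer_ids)
    kept = [
        (beer_id, score)
        for beer_id, score in zip(ranked_beer_ids, ranked_scores)
        if beer_id not in seeds
    ]
    return kept[:top_n]

# Illustrative values only: two of the five candidates are seeds.
candidate_ids = [11, 74, 12, 1932, 13]
candidate_scores = [0.97, 0.95, 0.90, 0.88, 0.80]
seed_ids = [74, 1932]

final_picks = exclude_seeds(candidate_ids, candidate_scores, seed_ids, top_n=2)
print(final_picks)
```

To still end up with _N_ recommendations after filtering, the ranked list passed in should contain at least _N_ plus the number of seeds, for example by requesting `topK = 2 * N` from the collaborative model.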


In [88]:
while True:
    try:
        numberOfBeersToRecommend = int(input("How many beers would you like recommended? "))
        if numberOfBeersToRecommend < 1 or numberOfBeersToRecommend > 500:
            raise ValueError
        break
    except ValueError:
        print("Please enter an integer between 1 and 500.")

query = input("Enter your tasting notes (e.g. “bourbon barrel aged stout roasty malty”): ").strip()
if not query:
    print("No query given, defaulting to an empty string.")
    query = ""

# Content‐based seeds
print(f"\nContent‐based picks ({numberOfBeersToRecommend}) for “{query}”:")
try:
    beerIndicesTensor = getTopKSimilarBeersByQuery(query, topK=numberOfBeersToRecommend)
    beerIndicesList = beerIndicesTensor.detach().cpu().tolist()
    beerIdList = getBeerIdsFromIndices(beerIndicesList)
except Exception as e:
    print(f"  → Error during content lookup: {e}")
    beerIdList = []

# Collaborative “lift” from similar user
if beerIdList:
    try:
        userindex = getTopKSimilarUsersFromBeers(
            beerIds=beerIdList,
            topK=1
        )
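        # Check for an empty result explicitly: `not tensor` is also True when the only index is 0
        if userindex.numel() == 0: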
            raise LookupError("no user found")
        username = getUserDetailsByIndex_likes(userindex.item())
        userIndicesActual = df_beers_reviews_breweries.loc[
            df_beers_reviews_breweries['username'] == username, 'userIndex'
        ]
        similarUserIndex = userIndicesActual.iloc[0]
        verifyUsername = getUsername(similarUserIndex)
        print(' ')
        print(f"User‐based picks ({numberOfBeersToRecommend}) from someone with a similar palate ({username}):")
        recommended_beer_ids, predictedRatings = topKRecommendedBeersForUser(
            neu_mf_model_trained,
            similarUserIndex,
            beerFeatureTensor,
            topK=numberOfBeersToRecommend
        )
        print(getBeerDetailsFromIdsWithPredictedScore(recommended_beer_ids, predictedRatings).to_string(index=False))
    except Exception as e:
        print(f"  → Could not find collaborative recommendations: {e}")
else:
    print("No seed beers to drive collaborative step.")


How many beers would you like recommended?  5
Enter your tasting notes (e.g. “bourbon barrel aged stout roasty malty”):  pumpkin cinammon vanilla



Content‐based picks (5) for “pumpkin cinammon vanilla”:
   74 | Post Road Pumpkin Ale          | simlarity=0.999
 1932 | Pumpkinhead Ale                | simlarity=0.999
38394 | Pumking                        | simlarity=0.999
  100 | Blue Moon Harvest Moon Pumpkin Ale | simlarity=0.999
 6260 | Punkin Ale                     | simlarity=0.999
Index: 1791 Username: ConsumerOfBeer
Username for user index 12255: ConsumerOfBeer
 
User‐based picks (5) from someone with a similar palate (ConsumerOfBeer):
                          name                    style  abv    score  predicted_user_score
                      Parabola   Russian Imperial Stout 12.7 0.879978                   1.0
      Hunahpu's Imperial Stout  American Imperial Stout 11.0 0.873814                   1.0
Trappist Westvleteren 12 (XII) Belgian Quadrupel (Quad) 10.2 0.902712                   1.0
KBS (Kentucky Breakfast Stout)  American Imperial Stout 12.3 0.884183                   1.0
CBS (Canadian Breakfast Stout)  Ame