# Content-Based Recommendation

Content-based recommendation focuses on suggesting recipes based on the features of the recipes themselves, such as ingredients, dietary preferences, and nutritional values. By analyzing these features, the system can recommend similar recipes to users based on their previous interactions or preferences, helping to personalize suggestions effectively. Now that we have preprocessed the data, we will build a content-based recommendation system.


## Introduction  

In our **Recipe Recommender System**, we leverage **natural language processing (NLP)** techniques to enhance recipe recommendations. By combining **Word2Vec embeddings** for ingredients and **TF-IDF vectorization** for textual data, we ensure accurate and meaningful suggestions based on user input.  


Let us start by importing the necessary libraries.

In [29]:
import pandas as pd
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.tokenize import word_tokenize
import numpy as np

### High-Level Plan
**Weighting:**  
The ingredient similarity (Word2Vec) will be weighted the highest.
The text similarity over name, tags, and description (TF-IDF) will have a lower weight.

**User Input:**  
The user will provide everything as a single search string (e.g., "Italian pasta with tomato and cheese").
The system will tokenize the input: tokens found in the Word2Vec vocabulary are treated as ingredients, and the full string is matched against the recipe text (name, tags, and description) via TF-IDF.

**Recommendations:**  
The system will return the top 5 recommendations based on the combined similarity score.

**Steps:**
- Train Word2Vec on the ingredients column.
- Create TF-IDF vectors for the text_data column (which includes name, tags, and description).
- Calculate cosine similarity between the user input and every recipe, separately for the Word2Vec and the TF-IDF representations.
- Combine the two similarity scores with weights (0.6 for ingredients, 0.4 for text).
- Return the top 5 recipes.
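
The last steps of this plan can be sketched without any libraries, using made-up similarity scores in place of the real Word2Vec and TF-IDF similarities. The recipe names, the scores, and the helper `top_recipes` are placeholders chosen for illustration; only the 0.6 and 0.4 weights come from the code later in this notebook.

```python
def top_recipes(names, ingredient_scores, text_scores, top_n=5,
                ingredient_weight=0.6, text_weight=0.4):
    """Rank recipes by a weighted sum of two per-recipe similarity scores."""
    combined = [
        ingredient_weight * ingredient + text_weight * text
        for ingredient, text in zip(ingredient_scores, text_scores)
    ]
    ranked = sorted(zip(names, combined), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Made-up scores for three placeholder recipes.
names = ["recipe a", "recipe b", "recipe c"]
ingredient_scores = [0.9, 0.2, 0.5]
text_scores = [0.1, 0.8, 0.5]

top_two = top_recipes(names, ingredient_scores, text_scores, top_n=2)
for name, score in top_two:
    print(name, round(score, 2))
```

With these numbers, a recipe with a strong ingredient match outranks one with a strong text match, which is the intended effect of the heavier ingredient weight.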

In [30]:
df = pd.read_pickle("C:/Users/pd006/Desktop/internship_search/machine_learning/Recipe-Recommender-System/data/food.pkl")

In [31]:
# Combine 'name', 'tags', and 'description' into 'text_data'
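# fillna('') guards against missing names or descriptions, which would otherwise turn the whole string into NaN
df['text_data'] = df['name'].fillna('') + ' ' + df['tags'].apply(' '.join) + ' ' + df['description'].fillna('')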

In [32]:
# Train Word2Vec on ingredients
ingredient_sentences = df['ingredients'].tolist()
word2vec_model = Word2Vec(sentences=ingredient_sentences, vector_size=100, window=5, min_count=1, workers=4)
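
Assuming each row of `ingredients` is a list of strings (as the recommendation output further down suggests), every whole ingredient string such as `chicken breast` becomes a single Word2Vec token. The sketch below illustrates what that means for lookups, with a plain Python set standing in for the model vocabulary. The toy recipes and the helper `known_tokens` are assumptions for illustration, not part of the notebook.

```python
# Toy stand-in for df['ingredients']: one list of ingredient strings per recipe (assumed shape).
toy_ingredient_sentences = [
    ["chicken breast", "onion", "chicken broth"],
    ["chicken", "dry rub seasonings"],
]

# Each list element is treated as one token, so the vocabulary holds whole strings.
toy_vocabulary = {token for sentence in toy_ingredient_sentences for token in sentence}


def known_tokens(tokens, vocabulary):
    """Keep only the tokens that appear verbatim in the vocabulary."""
    return [token for token in tokens if token in vocabulary]


# Single-word query tokens only match single-word ingredients.
single_word_matches = known_tokens(["chicken", "breast", "onion"], toy_vocabulary)
# A multi-word ingredient matches only when it is looked up as one string.
phrase_matches = known_tokens(["chicken breast"], toy_vocabulary)

print(single_word_matches)
print(phrase_matches)
```

In this toy vocabulary, `breast` on its own is unknown even though `chicken breast` is present, so a query tokenized word by word can only hit single-word ingredients.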

In [33]:
# Function to get average Word2Vec vector for a list of ingredients
def get_average_word2vec(ingredients, model):
    vectors = [model.wv[word] for word in ingredients if word in model.wv]
    if len(vectors) > 0:
        return np.mean(vectors, axis=0)
    else:
        return np.zeros(model.vector_size)
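
To make the averaging concrete, here is a dependency-free sketch of the same idea with hand-written 2-dimensional vectors. The dictionary `toy_embeddings` and the function `average_vector` are hypothetical stand-ins for `model.wv` and `get_average_word2vec`; the real embeddings in this notebook have 100 dimensions.

```python
# Hypothetical 2-dimensional embeddings standing in for model.wv.
toy_embeddings = {
    "chicken": [1.0, 0.0],
    "onion": [0.0, 1.0],
}


def average_vector(tokens, embeddings, size):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return [0.0] * size
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# "saffron" is not in the toy embeddings, so it is skipped.
mixed = average_vector(["chicken", "onion", "saffron"], toy_embeddings, size=2)
# No known tokens at all, so the zero-vector fallback is used.
unknown = average_vector(["saffron"], toy_embeddings, size=2)

print(mixed)
print(unknown)
```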

In [34]:
# Convert 'text_data' to TF-IDF vectors
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(df['text_data'])
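
For intuition about what `fit_transform` produces, the following sketch computes TF-IDF weights by hand for a tiny corpus. It follows the smoothed formula that scikit-learn documents as its default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization of each row. Tokenization is simplified to a lowercase whitespace split and stop-word removal is omitted, so the numbers will not match `TfidfVectorizer` on real text. All names in the block are illustrative.

```python
import math
from collections import Counter

toy_documents = [
    "spicy chicken noodles",
    "creamy chicken casserole",
]

# Simplified tokenization (assumption): lowercase and split on whitespace.
tokenized = [document.lower().split() for document in toy_documents]
n_documents = len(tokenized)

# Document frequency: in how many documents each term occurs.
document_frequency = Counter(term for tokens in tokenized for term in set(tokens))


def tfidf_row(tokens):
    """Return a {term: weight} mapping with smoothed idf and L2 normalization."""
    counts = Counter(tokens)
    weights = {
        term: count * (math.log((1 + n_documents) / (1 + document_frequency[term])) + 1)
        for term, count in counts.items()
    }
    norm = math.sqrt(sum(weight ** 2 for weight in weights.values()))
    return {term: weight / norm for term, weight in weights.items()}


first_row = tfidf_row(tokenized[0])
print(first_row)
```

The term shared by both documents (`chicken`) receives a lower weight than the terms that occur in only one document, which is what lets TF-IDF emphasize distinctive words in recipe names, tags, and descriptions.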


In [35]:
# Function to recommend recipes
def recommend_recipes(user_input, df, word2vec_model, tfidf_vectorizer, tfidf_matrix, top_n=5):
    # Tokenize user input
    user_tokens = word_tokenize(user_input.lower())
    
    # Keep the tokens that are in the Word2Vec vocabulary. The model was trained on whole
    # ingredient strings, so only single-word ingredients (e.g. "chicken") can match here.
    user_ingredients = [word for word in user_tokens if word in word2vec_model.wv]
    
    # Get Word2Vec vector for user ingredients
    user_ingredient_vector = get_average_word2vec(user_ingredients, word2vec_model)
    
    # Get TF-IDF vector for user input
    user_tfidf_vector = tfidf_vectorizer.transform([user_input])
    
    # Calculate cosine similarity for Word2Vec (ingredients) in one vectorized call
    recipe_ingredient_vectors = np.vstack(
        [get_average_word2vec(ingredients, word2vec_model) for ingredients in df['ingredients']]
    )
    ingredient_similarities = cosine_similarity(
        user_ingredient_vector.reshape(1, -1), recipe_ingredient_vectors
    ).flatten()
    
    # Calculate cosine similarity for TF-IDF (text_data)
    tfidf_similarities = cosine_similarity(user_tfidf_vector, tfidf_matrix).flatten()
    
    # Combine similarities with weights
    # Ingredient similarity (Word2Vec) is weighted highest (0.6); text similarity (TF-IDF over name, tags, description) gets 0.4
    combined_similarities = 0.6 * np.array(ingredient_similarities) + 0.4 * tfidf_similarities
    
    # Sort recipes by combined similarity (assign returns a copy, so the caller's df is not modified)
    recommendations = df.assign(similarity=combined_similarities).sort_values(by='similarity', ascending=False).head(top_n)
    
    return recommendations[['name', 'tags', 'description', 'ingredients', 'similarity']]
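
One consequence of the function above is worth spelling out. If none of the query tokens is in the Word2Vec vocabulary, the query's ingredient vector is all zeros, its cosine similarity to every recipe is 0 (scikit-learn's `cosine_similarity` returns 0 for zero vectors), and the ranking is decided by the TF-IDF term alone, so no score can exceed 0.4. The pure-Python sketch below reproduces that behaviour; `cosine` is a hypothetical helper written for this illustration, not a function from the notebook.

```python
import math


def cosine(a, b):
    """Cosine similarity that returns 0.0 when either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


recipe_vector = [0.5, 0.5]
no_known_ingredients = [0.0, 0.0]  # query with no token in the vocabulary

ingredient_similarity = cosine(no_known_ingredients, recipe_vector)

# Even a perfect text match (TF-IDF similarity of 1.0) is scaled by the 0.4 weight.
best_possible_score = 0.6 * ingredient_similarity + 0.4 * 1.0

print(ingredient_similarity)
print(best_possible_score)
```

Scores from such queries are therefore not directly comparable with scores from queries that do contain known ingredients.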


In [38]:
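# word_tokenize needs NLTK tokenizer data; recent NLTK versions load it from 'punkt_tab'.
# Run this once per environment, before calling recommend_recipes.
import nltk
nltk.download('punkt_tab')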

[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\pd006\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt_tab.zip.


True

In [43]:
# Example usage
user_input = "spicy chicken noodles"
recommendations = recommend_recipes(user_input, df, word2vec_model, tfidf_vectorizer, tfidf_matrix, top_n=5)

In [44]:
recommendations[["name", "ingredients"]]

| | name | ingredients |
|---|---|---|
| 165647 | spicy cheesy chicken noodles | [velveeta cheese, rotel tomatoes, rotini noodl... |
| 121681 | nif s butterflied grilled whole chicken | [chicken, dry rub seasonings] |
| 54449 | creamy skillet chicken and noodles | [chicken breast, onion, chicken broth, condens... |
| 194870 | yummy chicken casserole | [chicken breasts, chicken flavor stuffing mix,... |
| 141225 | puerto rican chicken | [chicken, adobo seasoning, season-all salt] |


### Summary  

To enhance recipe recommendations, we utilize **Word2Vec** to model ingredient relationships and **TF-IDF** to analyze textual data. The **Word2Vec model** generates semantic embeddings for ingredients, while **TF-IDF** captures important features from recipe names, tags, and descriptions.  

When a user provides input, it is **tokenized**, and relevant ingredients are extracted by checking against the **Word2Vec vocabulary**. The input is also converted into a **TF-IDF vector** for textual comparison.  

To determine similarity, we compute **cosine similarity** for both **Word2Vec embeddings** (ingredients) and **TF-IDF vectors** (text). The final similarity score is a **weighted combination**, giving a weight of **0.6 to the ingredient similarity** and **0.4 to the text similarity** (recipe names, tags, and descriptions).  

Based on this **hybrid similarity approach**, recipes are ranked, and the **top 5 most relevant recipes** are returned as recommendations.  
