## Importing libraries and data

In [1]:
# Importing required libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import jaccard_score
# TF-IDF vectorizer used by the content-based recommender; common English stop words are dropped
vectorizer = TfidfVectorizer(stop_words='english')

In [2]:
# Loading ingredients csv file
ingredients = pd.read_csv('../data/ingredients.csv')

## Data Pre-processing

In [3]:
# Removing leading/trailing white space in the cuisine type column
ingredients['C_Type'] = ingredients['C_Type'].str.strip()
# Splitting each ingredient list on commas and normalizing every item (strip surrounding white space, lowercase)
ingredients['Describe Cleaned'] = ingredients['Describe'].apply(lambda x: [item.strip().lower() for item in x.split(',')])
# Counting the number of ingredients required for each meal
ingredients['ingredients_count'] = ingredients['Describe Cleaned'].apply(len)
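To make the cleaning step concrete, the sketch below applies the same three transformations to two hypothetical rows whose `Describe` strings are modelled on entries from the table shown next (including the inconsistent spacing and capitalization after commas). The small DataFrame `toy` is illustrative only and is not part of the dataset.

```python
import pandas as pd

# Hypothetical rows modelled on the 'Describe' column (note the irregular spacing/casing)
toy = pd.DataFrame({
    'C_Type': ['  French ', 'Dessert'],
    'Describe': ['broccoli,Bread Crumbs, anchovy fillets',
                 'egg yolks,lemon juice, unsalted butter'],
})

# Same cleaning steps as the pre-processing cell
toy['C_Type'] = toy['C_Type'].str.strip()
toy['Describe Cleaned'] = toy['Describe'].apply(
    lambda x: [item.strip().lower() for item in x.split(',')]
)
toy['ingredients_count'] = toy['Describe Cleaned'].apply(len)

print(toy[['C_Type', 'Describe Cleaned', 'ingredients_count']])
```

Splitting on `','` rather than `', '` is what makes entries like `broccoli,Bread Crumbs` come out as two clean items.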

In [4]:
ingredients

| Unnamed: 0 | Food_ID | Name | C_Type | Veg_Non | Describe | Describe Cleaned | ingredients_count |
|---|---|---|---|---|---|---|---|
| 0 | 1 | summer squash salad | Healthy Food | veg | white balsamic vinegar, lemon juice, lemon rin... | [white balsamic vinegar, lemon juice, lemon ri... | 10 |
| 1 | 2 | chicken minced salad | Healthy Food | non-veg | olive oil, chicken mince, garlic (minced), oni... | [olive oil, chicken mince, garlic (minced), on... | 16 |
| 2 | 3 | sweet chilli almonds | Snack | veg | almonds whole, egg white, curry leaves, salt, ... | [almonds whole, egg white, curry leaves, salt,... | 6 |
| 3 | 4 | tricolour salad | Healthy Food | veg | vinegar, honey/sugar, soy sauce, salt, garlic ... | [vinegar, honey/sugar, soy sauce, salt, garlic... | 11 |
| 4 | 5 | christmas cake | Dessert | veg | christmas dry fruits (pre-soaked), orange zest... | [christmas dry fruits (pre-soaked), orange zes... | 8 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 395 | 396 | Kimchi Toast | Korean | veg | cream cheese, chopped kimchi, scallions,count... | [cream cheese, chopped kimchi, scallions, coun... | 6 |
| 396 | 397 | Tacos de Gobernador (Shrimp, Poblano, and Chee... | Mexican | non-veg | poblano chiles, bacon, shrips, red salsa, garl... | [poblano chiles, bacon, shrips, red salsa, gar... | 7 |
| 397 | 398 | Melted Broccoli Pasta With Capers and Anchovies | French | non-veg | broccoli,Bread Crumbs, anchovy fillets, garli... | [broccoli, bread crumbs, anchovy fillets, garl... | 7 |
| 398 | 399 | Lemon-Ginger Cake with Pistachios | Dessert | non-veg | egg yolks,lemon juice, unsalted butter, all pu... | [egg yolks, lemon juice, unsalted butter, all ... | 7 |


## Hybrid Recommender Systems - Rule and Content based

In [5]:
# Coverage metric: the fraction of a recipe's required ingredients that the user already has
def CoverageMetric(A, R):
    # A represents the available ingredients
    set1 = set(A)
    # R represents the ingredients required to make a food item
    set2 = set(R)
    # Guard against recipes with an empty ingredient list (avoids division by zero)
    if not set2:
        return 0.0
    # The intersection holds the required ingredients that are already available
    intersection = set1.intersection(set2)
    return len(intersection) / len(set2)

The coverage metric above measures what fraction of the ingredients required for a food item the user already has. It deliberately does not penalize additional ingredients the user has, because the goal is to measure how much of the recipe's requirement is satisfied.
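As a worked example, coverage is |A ∩ R| / |R|, where A is the set of available ingredients and R the set of required ones. The sketch below uses `coverage_metric`, a hypothetical stand-alone copy of `CoverageMetric` (with an empty-recipe guard), together with the sample input used later in this notebook and the five ingredients of the egg and cheddar cheese sandwich: four of five are available, so the score is 4/5 = 0.8, and extra ingredients on the user's side leave it unchanged.

```python
def coverage_metric(available, required):
    """Fraction of required ingredients that are available: |A ∩ R| / |R|."""
    required = set(required)
    if not required:  # empty recipe: nothing to cover
        return 0.0
    return len(set(available) & required) / len(required)

available = ['pepper', 'egg', 'basil leaves', 'ham slices']
required = ['egg', 'salt', 'pepper', 'ham slices', 'basil leaves']

print(coverage_metric(available, required))  # 0.8 (4 of 5 required items)

# Unrelated extra ingredients do not lower the score
print(coverage_metric(available + ['rice', 'tofu'], required))  # 0.8
```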

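One major limitation of this metric is that it treats every ingredient as equally important: it does not distinguish base ingredients from minor ones when making a recommendation. The TF-IDF implementation handles this by weighting each ingredient according to how rare it is across recipes.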
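A small illustration of this limitation, using a hypothetical four-ingredient recipe and the same coverage formula: a user missing only salt and a user missing the main ingredient receive exactly the same score, because every ingredient counts as 1/|R|.

```python
def coverage_metric(available, required):
    required = set(required)
    return len(set(available) & required) / len(required) if required else 0.0

# Hypothetical recipe: one main ingredient plus three staples
recipe = ['anchovy fillets', 'garlic', 'pepper', 'salt']

missing_salt = ['anchovy fillets', 'garlic', 'pepper']   # lacks a staple
missing_main = ['garlic', 'pepper', 'salt']              # lacks the main ingredient

print(coverage_metric(missing_salt, recipe))  # 0.75
print(coverage_metric(missing_main, recipe))  # 0.75
```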

In [6]:
def hybrid_recommender(ing_df, c_type, veg, ingredients_lst, ingredients_max=20):
    # Convert the comma-separated input string into a list, cleaned the same way as 'Describe Cleaned'
    ingredients_lst = [item.strip().lower() for item in ingredients_lst.split(',')]
    
    # Rule-based step: filter by cuisine type and Veg/Non-Veg, then by max ingredients if provided
    df_filtered = ing_df[(ing_df['C_Type'] == c_type) & (ing_df['Veg_Non'] == veg)]
    if isinstance(ingredients_max, int):
        df_filtered = df_filtered[df_filtered['ingredients_count'] <= ingredients_max]
    
    # Stop early if no recipe survives the filters
    if df_filtered.empty:
        print('Not able to find correct recipe for you')
        return

    # Content-based step: compute the coverage score of each remaining recipe
    df_filtered = df_filtered.copy()
    df_filtered['c_score'] = df_filtered['Describe Cleaned'].apply(
        lambda ings: CoverageMetric(ingredients_lst, ings)
    )
    
    # Keep the top 5 recipes by coverage score
    top_recipes = df_filtered.nlargest(5, 'c_score')
    
    for _, row in top_recipes.iterrows():
        name = row['Name']
        describe = row['Describe Cleaned']
        print(150 * '-')
        print('Recommended Recipe Name:', name)
        # The coverage score is reported under the label "Similarity Score"
        print('Similarity Score:', round(row['c_score'], 2))
        print(f'{len(describe)} ingredients needed to make recipe:', set(describe))
        missing_ingredients = set(describe) - set(ingredients_lst)
        print(f'{len(missing_ingredients)} ingredients are missing to make recipe:', missing_ingredients)
        print('\n')


The function above first filters the data by the user's preferences (cuisine type, Veg/Non-Veg, and an optional maximum number of ingredients), then ranks the remaining food items by their coverage score against the given ingredients and keeps the top 5.

Finally, it prints each recommendation with the recipe name, its coverage score (shown as "Similarity Score"), the full list of required ingredients, and the ingredients the user is missing.
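The two stages of the hybrid recommender can be seen in isolation on a tiny, entirely hypothetical catalogue that reuses the notebook's column names: hard rules (cuisine, Veg/Non-Veg, ingredient cap) remove candidates first, and only the survivors are ranked by coverage with `DataFrame.nlargest`. The recipe names and ingredient lists below are invented for illustration.

```python
import pandas as pd

def coverage_metric(available, required):
    required = set(required)
    return len(set(available) & required) / len(required) if required else 0.0

# Hypothetical mini-catalogue with the same column names as the notebook
recipes = pd.DataFrame({
    'Name': ['huevos con jamon', 'fish tacos', 'bean tacos', 'shrimp tacos'],
    'C_Type': ['Mexican'] * 4,
    'Veg_Non': ['non-veg', 'non-veg', 'veg', 'non-veg'],
    'Describe Cleaned': [
        ['egg', 'ham slices', 'pepper'],
        ['fish', 'tortilla', 'lime', 'salt'],
        ['beans', 'tortilla'],
        ['shrimp', 'tortilla', 'pepper', 'garlic', 'salt', 'lime'],
    ],
})
recipes['ingredients_count'] = recipes['Describe Cleaned'].apply(len)

user = ['pepper', 'egg', 'ham slices']

# Step 1 (rules): keep only recipes that satisfy the hard constraints
mask = (
    (recipes['C_Type'] == 'Mexican')
    & (recipes['Veg_Non'] == 'non-veg')
    & (recipes['ingredients_count'] <= 5)
)
candidates = recipes[mask].copy()

# Step 2 (content): rank the survivors by coverage score
candidates['c_score'] = candidates['Describe Cleaned'].apply(lambda r: coverage_metric(user, r))
top = candidates.nlargest(2, 'c_score')
print(top[['Name', 'c_score']])
```

Here `bean tacos` is removed by the Veg/Non-Veg rule and `shrimp tacos` by the ingredient cap, even though the latter shares `pepper` with the user's list.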

In [7]:
# Calling the above function with some sample values
hybrid_recommender(ingredients, 'Mexican', 'non-veg', 'pepper, egg, basil leaves, ham slices', ingredients_max=20)

------------------------------------------------------------------------------------------------------------------------------------------------------
Recommended Recipe Name: egg and cheddar cheese sandwich
Similarity Score: 0.8
5 ingredients needed to make recipe: {'ham slices', 'basil leaves', 'salt', 'egg', 'pepper'}
1 ingredients are missing to make recipe: {'salt'}


------------------------------------------------------------------------------------------------------------------------------------------------------
Recommended Recipe Name: pan seared thigh of chicken
Similarity Score: 0.1
10 ingredients needed to make recipe: {'extra virgin olive oil', 'lemon', 'chicken thai', 'barley', 'salt', 'mushroom', 'brockley', 'cherry tomato', 'pepper', 'fresh thyme'}
9 ingredients are missing to make recipe: {'extra virgin olive oil', 'lemon', 'chicken thai', 'barley', 'salt', 'mushroom', 'brockley', 'cherry tomato', 'fresh thyme'}


------------------------------------------------------

## Content based Recommender Systems - TF-IDF

In [8]:
# Fitting the TF-IDF vectorizer on the raw ingredient descriptions and transforming them into a recipe-by-term matrix
tfidf_matrix = vectorizer.fit_transform(ingredients['Describe'])
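One detail worth knowing before the recommender uses `vectorizer.transform` on user input: `transform` reuses the vocabulary learned by `fit_transform`, so words that never occur in any recipe description are silently ignored. The three-recipe corpus and the query ingredient `truffle` below are hypothetical stand-ins for `ingredients['Describe']` and user input.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus standing in for ingredients['Describe']
corpus = [
    'egg, salt, pepper, ham slices, basil leaves',
    'chicken, garlic, salt, lemon',
    'broccoli, anchovy fillets, garlic, salt',
]

vec = TfidfVectorizer(stop_words='english')
matrix = vec.fit_transform(corpus)  # sparse matrix of shape (n_recipes, n_terms)
print(matrix.shape)

# transform() uses the fitted vocabulary; unseen words such as 'truffle' are dropped
query = vec.transform(['pepper, egg, truffle'])
print(query.nnz)  # 2 non-zero entries: 'pepper' and 'egg'
```

Note that the default tokenizer also splits multi-word ingredients, so `ham slices` becomes the two terms `ham` and `slices`.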

In [9]:
def contentbased_tfidf(user_ingredients, cuisine_type, veg_nonveg, vectorizer, tfidf_matrix, df, top_n=5):

    # Convert the user's input ingredients into a TF-IDF vector (using the fitted vocabulary)
    user_tfidf = vectorizer.transform([user_ingredients])

    # Compute cosine similarity between the input ingredients and every recipe
    cosine_sim = cosine_similarity(user_tfidf, tfidf_matrix)

    # Pair each recipe's row position with its similarity score
    sim_scores = list(enumerate(cosine_sim.flatten()))

    # Sort recipes by similarity score (highest first)
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Filter recommendations based on Cuisine Type and Veg/Non-Veg
    filtered_recommendations = []
    for idx, score in sim_scores:
        # idx is a row position in tfidf_matrix, so look it up positionally with iloc
        food_cuisine = df.iloc[idx]['C_Type']
        food_veg_nonveg = df.iloc[idx]['Veg_Non']

        if food_cuisine == cuisine_type and food_veg_nonveg == veg_nonveg:
            filtered_recommendations.append((idx, score))

        # Stop when enough recommendations are found
        if len(filtered_recommendations) >= top_n:
            break

    # Display results
    if not filtered_recommendations:
        print("Not able to find correct recipe for you")
    else:
        for idx, score in filtered_recommendations:
            print(150*'-')
            print(f"Recommended Recipe Name: {df.iloc[idx]['Name']}")
            print(f"Similarity: {round(score,2)}")
            print(f"Available Ingredients: {user_ingredients}")
            print(f"Required Ingredients: {df.iloc[idx]['Describe']}")

In [10]:
contentbased_tfidf('pepper, egg, basil leaves, ham slices', 'Mexican', 'non-veg', vectorizer, tfidf_matrix, ingredients)

------------------------------------------------------------------------------------------------------------------------------------------------------
Recommended Recipe Name: egg and cheddar cheese sandwich
Similarity: 0.99
Available Ingredients: pepper, egg, basil leaves, ham slices
Required Ingredients: egg, salt, pepper, ham slices, basil leaves
------------------------------------------------------------------------------------------------------------------------------------------------------
Recommended Recipe Name: cajun spiced turkey wrapped with bacon
Similarity: 0.1
Available Ingredients: pepper, egg, basil leaves, ham slices
Required Ingredients: turkey breast, cajun spice, spinach leaves (cooked and drained), garlic pods, salted butter, feta cheese, bacon strips, ground black pepper, for cajun spice:, onion powder, garlic powder, seasoning salt, paprika, ground black pepper, cayenne pepper, oregano, thyme, red pepper flakes (if you like it spicy))
--------------------------

This is an additional approach that we experimented with. It appears to take base ingredients into account by giving lower weights to ingredients that are most common in the data, so rarer, more distinctive ingredients carry more weight in the similarity score.

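In this approach the rule-based part (filtering by cuisine type and Veg/Non-Veg) is almost the same as before. The major changes are in the content-based part: ingredients are vectorized with TF-IDF instead of being compared as plain sets, and recipes are ranked by cosine similarity instead of the coverage metric.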
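For reference, cosine similarity is cos(u, r) = (u · r) / (‖u‖ ‖r‖). The sketch below checks the formula by hand against scikit-learn's `cosine_similarity` on two hypothetical, already unit-length vectors over a four-term vocabulary. Since `TfidfVectorizer` L2-normalizes its rows by default (`norm='l2'`), cosine similarity between its output rows reduces to a plain dot product.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical TF-IDF-style vectors over the same 4-term vocabulary (both have norm 1)
user = np.array([[0.0, 0.6, 0.8, 0.0]])
recipe = np.array([[0.5, 0.5, 0.5, 0.5]])

# cos(u, r) = (u . r) / (||u|| * ||r||)
manual = (user @ recipe.T).item() / (np.linalg.norm(user) * np.linalg.norm(recipe))
library = cosine_similarity(user, recipe)[0, 0]

print(round(manual, 3), round(library, 3))  # 0.7 0.7
```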

This approach is relatively better than the previous one because it can prioritize rare items and base ingredients, which matter most. However, it struggles in some cases: chicken, for example, is one of the most common items in this data, so it receives a low weight even when it is the main ingredient, which occasionally leads to inaccurate results.