# FOODMATE RECOMMENDATION SYSTEM


## A. BUSINESS UNDERSTANDING.

## 1. Defining the question.

### a.) Specifying the data analytic question

> **Problem Statement**: The problem addressed by this dataset is the lack of easily accessible and comprehensive information about the nutritional values of common foods and products. This can make it difficult for people to make informed decisions about their diet and can contribute to health problems such as obesity and malnutrition.


### b.) Metric for success.

Our recommender system will be considered successful if it meets the following criteria:

1. Have a recall score of 80% and above.
2. Have a mean average precision of at least 90%.
3. Have a coverage of around 90%.
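
The three criteria can be made concrete with a small sketch. The definitions below are common textbook forms of recall@k, average precision@k and catalog coverage, not necessarily the exact ones used later in this notebook. The recipe ids are hypothetical, and averaging `average_precision_at_k` over all users would give the mean average precision.

```python
from typing import Hashable, Iterable, Sequence, Set


def recall_at_k(recommended: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def average_precision_at_k(recommended: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Sum of precision@i at every rank i <= k holding a relevant item, normalized by min(|relevant|, k)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def catalog_coverage(all_recommendations: Iterable[Sequence[Hashable]], catalog_size: int) -> float:
    """Share of the catalog that is recommended to at least one user."""
    recommended_items = set()
    for recs in all_recommendations:
        recommended_items.update(recs)
    return len(recommended_items) / catalog_size


# Hypothetical recipe ids, for illustration only
recommended = [101, 205, 309, 412]
relevant = {101, 309, 777}
print(recall_at_k(recommended, relevant, k=4))             # 2 of 3 relevant items found
print(average_precision_at_k(recommended, relevant, k=4))  # (1/1 + 2/3) / 3
print(catalog_coverage([[101, 205], [205, 309]], catalog_size=10))
```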

### c.) Understanding the context

The food and health industry is a highly competitive and crowded space. With so many diet plans, meal delivery services, and health apps available, it can be overwhelming for individuals to navigate and make informed decisions about their dietary choices. This creates a major business problem for stakeholders who want to provide effective solutions that meet the needs of their customers.

One major issue is the lack of personalized nutrition recommendations available in the market. Many existing meal delivery services and health apps provide generic diet plans that are not tailored to an individual's specific needs and preferences. This can lead to frustration and disappointment, as customers may not see the desired results and may eventually give up on their healthy eating goals altogether.

Another challenge is the time and effort required to plan and prepare healthy meals. Many individuals lead busy lives and do not have the time or energy to research and create nutritious meals every day. This can lead to unhealthy eating habits and may have negative consequences on their overall health and wellbeing.

To address these challenges, stakeholders need to develop innovative solutions that provide personalized and convenient dietary recommendations. The food recommender system with a chatbot offers a unique and effective solution that addresses these challenges. It utilizes advanced technology to analyze an individual's weight and BMI and provides personalized diet recommendations for breakfast, lunch, and dinner. It also includes easy-to-follow recipes for each meal, making healthy eating more accessible and convenient for individuals.

Overall, the business problem for stakeholders is to provide effective solutions that meet the needs of individuals in the highly competitive and crowded food and health industry. The food recommender system with a chatbot offers a unique and innovative solution that addresses the lack of personalized nutrition recommendations and the time and effort required to plan and prepare healthy meals.



### Main objective 

> To develop a food/recipe recommendation system that suggests nutritious food to individuals and gym instructors, thus promoting a healthy lifestyle.

### Specific objectives

- Identify the key features and factors that impact an individual's overall health, and determine which ones should be incorporated into the food recommendation system.
- Clean and preprocess the nutrition data available in the dataset, and combine it with external data sources to create a comprehensive nutrition database that can be used by the recommendation system.
- Develop and implement recommendation algorithms that can generate personalized food recommendations based on the user's individual characteristics such as age, gender, degree of physical activity, locally available foods, and dietary customs.
- Create a chatbot that can interact with users and collect dietary preferences and restrictions, as well as any other relevant information that can be used to personalize food recommendations.
- Integrate the recommendation algorithms and chatbot into a user-friendly and intuitive interface that allows users to easily access and interact with the system.
- Deploy the food recommendation system and chatbot, and conduct user testing to gather feedback and identify areas for improvement.

### Recording the experimental design.

- Research Question: Can a food/recipe recommendation system suggest nutritious food to users and thus promote a healthy lifestyle?
- Data source: The nutrition dataset was obtained from [here](https://www.kaggle.com/datasets/trolukovich/nutritional-values-for-common-foods-and-products).
This dataset contains information on approximately 8.8 thousand types of food. The dataset includes various features related to the nutritional value of each food item per 100-gram serving. There are 75 features in total, such as **calories**, **vitamin_d**, **zink**, **protein** and **lactose**. The feature names are self-explanatory, so a description is not provided.
- Variables: The variables in the merged data set are: 'id', 'name', 'minutes', 'nutrition', 'tags', 'ingredients', 'steps', 'calories', 'total fat (PDV)', 'sugar (PDV)', 'sodium (PDV)', 'protein (PDV)', 'saturated fat (PDV)', 'carbohydrates (PDV)'.
- Model evaluation: The metrics used to evaluate the model's performance are accuracy, recall and RMSE.
- Conclusions and recommendations.

###  Data relevance.

## B. DATA UNDERSTANDING.

In [1]:
#load the relevant libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px
from wordcloud import WordCloud, STOPWORDS
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import silhouette_score
from scipy.spatial.distance import euclidean, cosine, jaccard
import ipywidgets as widgets
from IPython.display import display
from surprise import SVD, KNNBasic, KNNWithMeans, Reader, Dataset, accuracy
from surprise.model_selection import train_test_split, cross_validate, GridSearchCV
import tensorflow as tf
from typing import Any, Text, Dict, List

# rasa_sdk is only needed for the chatbot actions; skip it when it is not installed
try:
    from rasa_sdk import Action, Tracker
    from rasa_sdk.executor import CollectingDispatcher
except ModuleNotFoundError:
    print("rasa_sdk is not installed; chatbot actions are unavailable.")

## Load the Data Set.

In [None]:
#loading the data set
recipes = pd.read_csv("/Users/mac/Downloads/maindata 2/RAW_recipes.csv")
nutrition = pd.read_csv("/Users/mac/Downloads/maindata 2/nutrition.csv",index_col=0)

In [None]:
#getting a preview of the recipes data set
recipes.head()

In [None]:
#get a preview of the nutrition data set
nutrition.head()

## Data Understanding

This project will include 2 datasets 

**Recipes**

**Nutrition**

The recipes data set was obtained from [here](https://www.kaggle.com/datasets/shuyangli94/food-com-recipes-and-user-interactions). It contains 230186 rows of recipes and 12 columns:
*   name - Recipe name
*   id - Recipe ID
*   minutes - Minutes to prepare recipe
*   contributor_id - User ID who submitted this recipe
*   submitted - Date recipe was submitted
*   tags - Food.com tags for recipe
*   nutrition - Nutrition information (calories (#), total fat (PDV), sugar (PDV), sodium (PDV), protein (PDV), saturated fat (PDV), and carbohydrates (PDV))
*   n_steps - Number of steps in recipe
*   steps - Text for recipe steps, in order
*   description - User-provided description
*   ingredients - List of ingredient names
*   n_ingredients - Number of ingredients



The nutrition dataset was obtained from [here](https://www.kaggle.com/datasets/trolukovich/nutritional-values-for-common-foods-and-products).

This dataset contains information on approximately 8.8 thousand types of food. The dataset includes various features related to the nutritional value of each food item per 100-gram serving. There are 75 features in total, such as **calories**, **vitamin_d**, **zink**, **protein** and **lactose**. The feature names are self-explanatory, so a description is not provided.

## C. DATA PREPROCESSING

In [None]:
#getting info of the nutrition data set.
nutrition.info()

In [None]:
# creating a new dataframe with the features relevant to our model.
# .copy() makes it independent of the original, so later column assignments are safe
nutrition_df = nutrition.loc[:, ['name','serving_size','calories','total_fat','saturated_fat','cholesterol','sodium','potassium']].copy()



> As seen, the nutrition data set has no missing data. However, the data set has object data types which need to be converted to numerical data types.

In [None]:
#creating a function that strips and converts features to float type
def clean_df(df, col_name):
    # Create a copy of the input DataFrame to avoid modifying the original data
    cleaned_df = df.copy()

    # Strip whitespace and remove every character that is not a digit or a dot
    cleaned_df[col_name] = cleaned_df[col_name].str.strip().str.replace(r'[^\d.]', '', regex=True)

    # Convert the column to float; values that cannot be parsed become NaN
    cleaned_df[col_name] = pd.to_numeric(cleaned_df[col_name], errors='coerce').astype(float)

    return cleaned_df


In [None]:
#cleaning all the columns in the nutrition data frame.
cols_to_clean = ['serving_size', 'calories', 'total_fat',
                 'saturated_fat', 'cholesterol', 'sodium', 'potassium']
for col in cols_to_clean:
    nutrition_df[col] = nutrition_df[col].astype(str)
    nutrition_df = clean_df(nutrition_df, col)
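
To illustrate what the cleaning step does to a single cell, here is a minimal standard-library sketch of the same idea: drop everything except digits and dots, then parse. The helper name `to_float` and the sample strings are hypothetical and only mirror the regular expression used in `clean_df`.

```python
import re


def to_float(raw: str) -> float:
    """Keep digits and dots, then parse; return NaN when nothing parseable is left."""
    digits = re.sub(r"[^\d.]", "", raw.strip())
    try:
        return float(digits)
    except ValueError:
        return float("nan")


# Hypothetical raw values, similar in shape to the nutrition columns
samples = ["100 g", "9.17 mg", "72g", ""]
print([to_float(s) for s in samples])
```

One consequence of this pattern is that minus signs are removed as well, so `"-5 g"` would be read as `5.0`. That is harmless for nutrient amounts, which cannot be negative, but it is worth keeping in mind if the function is reused on other columns.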

In [None]:
nutrition_df.head()

In [None]:
# renaming the columns 

nutrition_df = nutrition_df.rename(columns={'serving_size': 'serving_size(g)', 'total_fat': 'total_fat(g)', 
                                            'saturated_fat': 'saturated_fat(g)','cholesterol':'cholesterol(mg)',
                                            'sodium':'sodium(mg)','potassium':'potassium(mg)'})

In [None]:
# checking for the missing values
nutrition_df.isnull().sum()

In [None]:
# working on the missing values in the saturated fat column.
mean_value = nutrition_df["saturated_fat(g)"].mean()
# assign the result back instead of using inplace=True on a column selection
nutrition_df["saturated_fat(g)"] = nutrition_df["saturated_fat(g)"].fillna(mean_value)

In [None]:
#checking for duplicates.
nutrition_df.duplicated().sum()

> The dataset has no duplicates. 

In [None]:
#getting info of the recipes data set
recipes.info()

> Most columns are consistently populated, but some rows will need to be dropped and/or replaced.

In [None]:
# creating a new recipes data frame with the relevant features.
recipes_df = recipes.loc[:, ['id','name','minutes','nutrition','tags','ingredients','steps']].copy()

> For better analysis of the recipes data set, the nutrition column has to be split into separate columns.

In [None]:

#retrieving individual nutrients from the recipes data set.
recipes_df[['calories','total fat (PDV)','sugar (PDV)','sodium (PDV)','protein (PDV)','saturated fat (PDV)','carbohydrates (PDV)']] = recipes.nutrition.str.split(",",expand=True)
recipes_df['calories'] =  recipes_df['calories'].apply(lambda x: x.replace('[','')) 
recipes_df['carbohydrates (PDV)'] =  recipes_df['carbohydrates (PDV)'].apply(lambda x: x.replace(']',''))  
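
As an alternative to splitting on commas and stripping the brackets by hand, each nutrition string can be parsed as a Python list literal. The sketch below assumes every value looks like `"[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]"`, which is an assumption about the raw file. The helper name `parse_nutrition` and the example values are hypothetical.

```python
import ast

NUTRIENT_NAMES = [
    "calories", "total fat (PDV)", "sugar (PDV)", "sodium (PDV)",
    "protein (PDV)", "saturated fat (PDV)", "carbohydrates (PDV)",
]


def parse_nutrition(raw: str) -> dict:
    """Parse a string such as '[51.5, 0.0, ...]' into a nutrient-name -> float mapping."""
    values = ast.literal_eval(raw)  # safely evaluates the list literal
    if len(values) != len(NUTRIENT_NAMES):
        raise ValueError(f"expected {len(NUTRIENT_NAMES)} values, got {len(values)}")
    return dict(zip(NUTRIENT_NAMES, (float(v) for v in values)))


# Made-up example in the assumed format
example = "[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]"
print(parse_nutrition(example))
```

Parsing the literal yields numbers directly, so a separate string-cleaning pass over these seven columns would not be needed.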

In [None]:
#previewing the data frame without the nutrition column
#(the result is not assigned, so recipes_df itself keeps the column)
recipes_df.drop(['nutrition'],axis=1).head()

In [None]:
#cleaning all the columns in the recipes data frame.
cols_to_clean = ['calories',
                 'total fat (PDV)', 'sugar (PDV)', 'sodium (PDV)', 'protein (PDV)',
                 'saturated fat (PDV)', 'carbohydrates (PDV)']
for col in cols_to_clean:
    recipes_df[col] = recipes_df[col].astype(str)
    recipes_df = clean_df(recipes_df, col)

In [None]:
recipes_df.head()

In [None]:
#checking for null values.
recipes_df.isnull().sum()


In [None]:
#dropping all rows with missing values 
recipes_df.dropna(inplace=True)

In [None]:
#confirming there are no missing values
recipes_df.isnull().sum()

## External Data Source validation

The prevalence of food recommendation systems has increased significantly, with a multitude of factors affecting an individual's overall health, including genetics, exercise, sleep, and other external factors. Nutrition has been identified as one of the most significant modifiable factors, and even minor changes in one's diet can result in substantial outcomes.

Calorie counting has become a popular technique employed by medical professionals and nutritionists to recommend appropriate diets. For individuals of healthy weight, consuming roughly 2000 calories per day is ideal. However, a diversified, balanced, and nutritious diet will vary based on individual characteristics, cultural context, locally available foods, and dietary customs.

The Food.com website was the source of the data used in this study. Food.com is a digital brand and social networking service that features recipes from both home cooks and celebrity chefs, as well as food news, new and classic shows, and pop culture. The website was launched in September 2017 and offers recipes, photos, articles, and video content on the web as well as video streaming and smartphone apps.

## Exploratory Data Analysis.

In [None]:
#descriptive statistics for nutrition data set.
nutrition_df.describe()

> As seen above, the average serving size is 100 g. On average, a serving contains 226.28 calories.

In [None]:
recipes_df.columns

In [None]:
#descriptive statistics for recipes data set.
recipes_df.describe()

> On average, a meal takes up to 40 minutes to prepare.

## Nutrition dataset

In [None]:
nutrition_df.columns

In [None]:

# Create a scatter plot for each nutrient column
fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(30, 16))
nutrient_cols = ['total_fat(g)', 'saturated_fat(g)', 'cholesterol(mg)', 'sodium(mg)', 'potassium(mg)']

# Flatten the axes array to iterate over it in a 1D fashion
axs = axs.flatten()

# Loop over each nutrient column to create a scatter plot
for i, col in enumerate(nutrient_cols):
    axs[i].scatter(nutrition_df['calories'], nutrition_df[col])
    axs[i].set_xlabel('Calories')
    axs[i].set_ylabel(col.capitalize())

# Remove any unused axes
for ax in axs[5:]:
    ax.remove()

# Adjust the spacing between the plots
plt.subplots_adjust(wspace=0.4, hspace=0.4)

# Show the plots
plt.show()


In [None]:
#distribution of serving size
plt.figure(figsize=(15, 8))
sns.set_style('whitegrid')
sns.set(font_scale=1.5)
plt.title('Distribution of serving size')
sns.histplot(data=nutrition_df,x='serving_size(g)')
plt.show()

> In the dataset used, the serving size per food is 100 grams.

In [None]:
# Create a correlation matrix of the columns of interest
corr_matrix = nutrition_df[['calories','total_fat(g)',
       'saturated_fat(g)', 'cholesterol(mg)', 'sodium(mg)', 'potassium(mg)']].corr()
mask = np.zeros_like(corr_matrix)
mask[np.triu_indices_from(mask)] = True
# Create the heatmap using seaborn
plt.figure(figsize=(20,7))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm',mask=mask)
# Show the plot
plt.title('Correlation between the columns')
plt.show()

The highest correlation is seen between the number of calories and the amount of total fat in a food. A user who wants to reduce their calorie intake would therefore need to avoid foods with high total fat.

## Recipes dataset

> As the recipes dataset is too big and would take up too much computational power, a random sample will be retrieved.

In [None]:
# getting the recipes random sample.
recipes_sample_df = recipes_df.sample(n=2000, random_state=42)

In [None]:
# Concatenate all ingredients into a single string
all_ingredients = ' '.join(recipes_df['ingredients'])

# Create a word cloud
plt.figure(figsize=(20,7))
wordcloud = WordCloud(width=800, height=400, max_words=200, background_color='white').generate(all_ingredients)

# Display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

The most popular ingredients used by the users for their recipes include garlic, cloves, olive oil and black pepper.

In [None]:
sns.histplot(data=recipes_df,x='calories');
plt.title('Distribution of calories')
plt.show()

# Implementing the solution

### Outliers

In [None]:
# getting numerical columns
num_cols = recipes_df.select_dtypes(exclude = ['object'])

In [None]:
#checking for maximum value for calories
recipes_df['calories'].max()

In [None]:
#plotting to check for outliers
fig, axes = plt.subplots(3, 3, figsize=(20, 20))

cols = ['calories', 'total fat (PDV)', 'sugar (PDV)', 
        'sodium (PDV)', 'protein (PDV)', 'saturated fat (PDV)', 
        'carbohydrates (PDV)']

for i, col in enumerate(cols):
    sns.boxplot(ax=axes[i//3, i%3], x=recipes_df[col])

In [None]:
#create a function that removes outliers
def outlierRemover(df, columns):
    for x in columns:
        q25,q75 = np.percentile(df[x],[25,75])
        intr_qr = q75-q25
        min_val = q25-(1.5*intr_qr)
        max_val = q75+(1.5*intr_qr)
        df.loc[df[x] < min_val, x] = np.nan
        df.loc[df[x] > max_val, x] = np.nan
    return df
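
The fences used by `outlierRemover` follow the usual 1.5 × IQR rule. As a worked example, the sketch below recomputes them with the standard library on a hypothetical list of calorie values. The `percentile` helper uses linear interpolation, which is the default method of `np.percentile`.

```python
def percentile(sorted_values, q):
    """Percentile with linear interpolation between the two nearest ranks."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def iqr_fences(values):
    """Return the (lower, upper) fences of the 1.5 * IQR rule."""
    ordered = sorted(values)
    q25, q75 = percentile(ordered, 25), percentile(ordered, 75)
    iqr = q75 - q25
    return q25 - 1.5 * iqr, q75 + 1.5 * iqr


# Hypothetical calorie values with one extreme recipe
calories = [120, 150, 180, 200, 220, 250, 300, 2500]
low, high = iqr_fences(calories)
outliers = [v for v in calories if v < low or v > high]
print(low, high, outliers)  # q25 = 172.5, q75 = 262.5, IQR = 90
```

Here only the 2500-calorie value falls outside the fences. In the notebook such a value is set to NaN and the row is dropped afterwards.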

In [None]:
#calling the removing outlier function
num_cols = ["calories", 'total fat (PDV)', 'sugar (PDV)', 'sodium (PDV)',
       'protein (PDV)', 'saturated fat (PDV)', 'carbohydrates (PDV)']
recipes_df = outlierRemover(recipes_df, num_cols)

In [None]:
#confirming number of missing values
recipes_df.isnull().sum()

In [None]:
# Dropping rows with missing values
recipes_df = recipes_df.dropna()

## Normalize the data 

In [None]:
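scaler = MinMaxScaler()
# keep only the numeric nutrition features
recipes_data = recipes_df.drop(columns=['name','id','minutes','nutrition','tags','ingredients','steps'])
# fit the scaler once, and keep the column names and index of the original rows
normalized_recipes = scaler.fit_transform(recipes_data)
normalized_recipes_df = pd.DataFrame(normalized_recipes, columns=recipes_data.columns, index=recipes_data.index)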
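
For reference, min-max scaling maps each feature to the range [0, 1] with (x - min) / (max - min). The pure-Python sketch below shows the arithmetic on one hypothetical column. Returning zeros for a constant column is the convention chosen here to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical calorie column
print(min_max_scale([100, 150, 300]))  # (150 - 100) / (300 - 100) = 0.25 for the middle value
```

Because every feature ends up on the same scale, no single nutrient dominates a distance or similarity computed across the columns.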

## D. MODELLING

### Baseline Model.

In [None]:

class CalorieBasedRecommender:
    
    def __init__(self, df):
        self.df = df
        
    def recommend(self, target_calories, num_recommendations=5):
        recommendations = []
        for index, row in self.df.iterrows():
            if row['calories'] <= target_calories:
                recommendations.append(row['name'])
            if len(recommendations) == num_recommendations:
                break
        return recommendations


This is the baseline model, a class that takes in the dataframe. The recommend method takes as input a target calorie value and the number of recommendations desired, and returns a list of the first foods in the dataframe whose calories are less than or equal to the target calorie value.
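
A possible variation on this baseline, sketched below, ranks the eligible foods by how close they come to the target instead of returning them in row order. It is not the notebook's method. The function name `recommend_under_target` and the food records are hypothetical, and plain dictionaries stand in for dataframe rows.

```python
def recommend_under_target(foods, target_calories, num_recommendations=5):
    """Return the names of foods at or below the target, closest to the target first."""
    eligible = [food for food in foods if food["calories"] <= target_calories]
    eligible.sort(key=lambda food: target_calories - food["calories"])
    return [food["name"] for food in eligible[:num_recommendations]]


# Hypothetical foods with calories per 100 g
foods = [
    {"name": "oatmeal", "calories": 380},
    {"name": "apple", "calories": 52},
    {"name": "butter", "calories": 717},
    {"name": "rice", "calories": 130},
]
print(recommend_under_target(foods, target_calories=400, num_recommendations=2))
```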

In [None]:
recommender = CalorieBasedRecommender(nutrition_df)
target_calories = 1000
num_recommendations = 20

recommendations = recommender.recommend(target_calories, num_recommendations)


In [None]:
recommendations

In [None]:

class CalorieBasedRecommender1:
    
    def __init__(self, df):
        self.df = df
        
    def recommend(self, target_calories, num_recommendations=5):
        recommendations = []
        for index, row in self.df.iterrows():
            if row['calories'] == target_calories:
                recommendations.append((row['name'], row['ingredients'],row['steps']))
            if len(recommendations) == num_recommendations:
                break
        return recommendations


In [None]:
recommender = CalorieBasedRecommender1(recipes_df)
target_calories = 51.5
num_recommendations = 1

recommendations = recommender.recommend(target_calories, num_recommendations)

In [None]:
recommendations

## KNN Model

Due to the size of the data set and the lack of computational power, a sample has to be created from the main data set.

In [None]:
recipes_sample = recipes_df.sample(n=20000, random_state=42)

In [None]:
# Define the format of the dataframe
reader = Reader(rating_scale=(0, 5))

# Convert your dataframe to a Surprise dataset
# Surprise reads the three columns, in order, as (user, item, rating)
data = Dataset.load_from_df(recipes_sample[['id', 'name', 'minutes']], reader)

#generating a trainset
dataset = data.build_full_trainset()
print('Number of users: ', dataset.n_users, '\n')
print('Number of items: ', dataset.n_items)

In [None]:
# KNN algorithms
print('KNN_BASIC')
print('*************************************************************************')

cv_knn_basic = cross_validate(KNNBasic(), data, cv=5, n_jobs=5, verbose=True)

print('KNN_MEANS')
print('*************************************************************************')


cv_knn_means = cross_validate(KNNWithMeans(), data, cv=5, n_jobs=5, verbose=True)

> For KNN_MEANS, the mean RMSE across all folds is 26.9996, with a standard deviation of 6.9996. The mean MAE is 102.2880. The fit time is approximately 18.4 seconds per fold, and the test time is approximately 0.09 seconds per fold.

> For KNN_BASIC, the mean RMSE across all folds is 117.5033, with a standard deviation of 37.5033. The mean MAE is 143.7001. The fit time is approximately 19.4 seconds per fold, and the test time is approximately 0.05 seconds per fold.

Overall, KNN_MEANS appears to perform better than KNN_BASIC, with lower RMSE and MAE values.
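
For readers less familiar with the two error measures, the sketch below computes RMSE and MAE by hand on a few hypothetical ratings. It follows the standard definitions: RMSE is the square root of the mean squared error and MAE is the mean absolute error.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical true and predicted ratings
actual = [3.0, 4.0, 5.0, 2.0]
predicted = [2.5, 4.0, 3.0, 2.5]
print(rmse(actual, predicted))  # sqrt(4.5 / 4)
print(mae(actual, predicted))   # 3.0 / 4
```

On the same set of predictions RMSE is never smaller than MAE, because squaring gives large errors more weight. This makes a useful sanity check when reading cross-validation output.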

## SVD Model

In [None]:
#svd
print('SVD')
print('*************************************************************************')

cv_svd = cross_validate(SVD(), data, cv=5, n_jobs=5, verbose=True)

In [None]:
#summary of SVD and KNN results
print('Evaluation Results:')
print('Algorithm\t RMSE\t\t MAE')
print()


print('KNN Basic', '\t', round(cv_knn_basic['test_rmse'].mean(), 4), '\t\t', round(cv_knn_basic['test_mae'].mean(), 4))
print('KNN Means', '\t', round(cv_knn_means['test_rmse'].mean(), 4), '\t', round(cv_knn_means['test_mae'].mean(), 4))
print()
print('SVD', '\t\t', round(cv_svd['test_rmse'].mean(), 4), '\t', round(cv_svd['test_mae'].mean(), 4))

In [None]:
#compare accuracy of each algorithm


# Plotting graphs for comparing accuracy of each algo
x_algo = ['KNN Basic', 'KNN Means', 'SVD']
all_algos_cv = [cv_knn_basic, cv_knn_means, cv_svd]

rmse_cv = [round(res['test_rmse'].mean(), 4) for res in all_algos_cv]
mae_cv = [round(res['test_mae'].mean(), 4) for res in all_algos_cv]

plt.figure(figsize=(20,5))

# RMSE graph
plt.subplot(1, 2, 1)
plt.title('Comparison of Algorithms on RMSE', loc='center', fontsize=15)
plt.plot(x_algo, rmse_cv, label='RMSE', color='darkgreen', marker='o')
plt.xlabel('Algorithms', fontsize=15)
plt.ylabel('RMSE Value', fontsize=15)
plt.legend()
plt.grid(ls='dashed')

# MAE graph
plt.subplot(1, 2, 2)
plt.title('Comparison of Algorithms on MAE', loc='center', fontsize=15)
plt.plot(x_algo, mae_cv, label='MAE', color='navy', marker='o')
plt.xlabel('Algorithms', fontsize=15)
plt.ylabel('MAE Value', fontsize=15)
plt.legend()
plt.grid(ls='dashed')
#plt.savefig('Images/RMSE_MAE')
plt.show()

## Hyperparameter tuning model

In [None]:
# define the parameter grid for k
param_grid = {'k': [5, 10, 15, 20, 25]}

# perform grid search for KNNBasic
knn_basic_grid = GridSearchCV(KNNBasic, param_grid, measures=['rmse'], cv=5)
knn_basic_grid.fit(data)

# get the best k value and its corresponding RMSE score for KNNBasic
best_k_basic = knn_basic_grid.best_params['rmse']['k']
best_rmse_basic = knn_basic_grid.best_score['rmse']

# perform grid search for KNNWithMeans
knn_means_grid = GridSearchCV(KNNWithMeans, param_grid, measures=['rmse'], cv=5)
knn_means_grid.fit(data)

# get the best k value and its corresponding RMSE score for KNNWithMeans
best_k_means = knn_means_grid.best_params['rmse']['k']
best_rmse_means = knn_means_grid.best_score['rmse']

# print the best k values and their corresponding RMSE scores
print('Best k value and RMSE score for KNNBasic: k = {}, RMSE = {:.4f}'.format(best_k_basic, best_rmse_basic))
print('Best k value and RMSE score for KNNWithMeans: k = {}, RMSE = {:.4f}'.format(best_k_means, best_rmse_means))

The optimal k is 5. Next, check which similarity measure performs best out of pearson, cosine, msd, and pearson baseline.
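
To give a feel for how these measures differ, the sketch below computes three of them by hand on two hypothetical rating vectors. The MSD similarity is written as 1 / (msd + 1), which is assumed here to match the definition in the Surprise documentation. Pearson baseline is left out because it depends on fitted baseline estimates.

```python
import math


def cosine_sim(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pearson_sim(a, b):
    """Pearson correlation, i.e. cosine similarity of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_sim([x - mean_a for x in a], [y - mean_b for y in b])


def msd_sim(a, b):
    """Similarity derived from the mean squared difference (assumed form: 1 / (msd + 1))."""
    msd = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return 1 / (msd + 1)


# Hypothetical vectors: b is a scaled copy of a
a = [1, 2, 3]
b = [2, 4, 6]
print(cosine_sim(a, b), pearson_sim(a, b), msd_sim(a, b))
```

Cosine and Pearson both treat the scaled copy as a perfect match, while MSD penalizes the difference in magnitude. This is one reason the measures can rank neighbours differently.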

In [None]:
# comparing similarity measures

knn_means_cosine = cross_validate(KNNWithMeans(k=5, sim_options={'name':'cosine'}), data, cv=5, n_jobs=5, verbose=False)
knn_means_pearson = cross_validate(KNNWithMeans(k=5, sim_options={'name':'pearson'}), data, cv=5, n_jobs=5, verbose=False)
knn_means_msd = cross_validate(KNNWithMeans(k=5, sim_options={'name':'msd'}), data, cv=5, n_jobs=5, verbose=False)
knn_means_pearson_baseline = cross_validate(KNNWithMeans(k=5, sim_options={'name':'pearson_baseline'}), data, cv=5, n_jobs=5, verbose=False)


x_distance = ['cosine', 'pearson', 'msd', 'pearson_baseline',]
all_distances_cv = [knn_means_cosine, knn_means_pearson, knn_means_msd, knn_means_pearson_baseline]

rmse_cv = [round(res['test_rmse'].mean(), 4) for res in all_distances_cv]
mae_cv = [round(res['test_mae'].mean(), 4) for res in all_distances_cv]

plt.figure(figsize=(20,5))

plt.subplot(1, 2, 1)
plt.title('Comparison of Distance Metrics on RMSE', loc='center', fontsize=15)
plt.plot(x_distance, rmse_cv, label='RMSE', color='darkgreen', marker='o')
plt.xlabel('Distance Metrics', fontsize=15)
plt.ylabel('RMSE Value', fontsize=15)
plt.legend()
plt.grid(ls='dashed')

plt.subplot(1, 2, 2)
plt.title('Comparison of Distance Metrics on MAE', loc='center', fontsize=15)
plt.plot(x_distance, mae_cv, label='MAE', color='navy', marker='o')
plt.xlabel('Distance Metrics', fontsize=15)
plt.ylabel('MAE Value', fontsize=15)
plt.legend()
plt.grid(ls='dashed')
plt.savefig('Images/Comparison_of_Distance_metrics')
plt.show()

> Based on the hyperparameter tuning above, the best model is KNN-Means with cosine similarity and k = 5. The RMSE is 4350.

Use gridsearch to optimize SVD model for number of epochs, regularization, and learning rate. 

In [None]:
#Parameter space
svd_param_grid = {'n_epochs': [20, 25, 30, 40, 50],
                  'lr_all': [0.007, 0.009, 0.01, 0.02],
                  'reg_all': [0.02, 0.04, 0.1, 0.2]}

# This will take 20 to 30 minutes.
gs_svd = GridSearchCV(SVD, svd_param_grid, measures=['rmse', 'mae'], cv=5, n_jobs=5)
gs_svd.fit(data)

print('Best value for SVD  -RMSE:', round(gs_svd.best_score['rmse'], 4), '; MAE:', round(gs_svd.best_score['mae'], 4))
print('Optimal params RMSE =', gs_svd.best_params['rmse'])
print('optimal params MAE =', gs_svd.best_params['mae'])
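
The runtime note above follows from the size of the search. An exhaustive grid search fits one model per parameter combination and per fold, which the sketch below counts for the grid defined in this cell.

```python
from itertools import product

svd_param_grid = {'n_epochs': [20, 25, 30, 40, 50],
                  'lr_all': [0.007, 0.009, 0.01, 0.02],
                  'reg_all': [0.02, 0.04, 0.1, 0.2]}

# Every combination of the three parameter lists
combinations = [dict(zip(svd_param_grid, values))
                for values in product(*svd_param_grid.values())]

n_folds = 5
print(len(combinations))            # 5 * 4 * 4 combinations
print(len(combinations) * n_folds)  # model fits across all folds
print(combinations[0])
```

Trimming any one of the lists shrinks the number of fits proportionally, which is the simplest way to shorten the search.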

## Test Predictions 

Evaluate to see if the KNN and SVD models above are working as expected and choose the best.

In [None]:
# fit knn_means model on training set
dataset = data.build_full_trainset()
final_knn_model = KNNWithMeans(k=5, sim_options={'name': 'cosine'})
final_knn_model.fit(dataset)

In [None]:
#fit svd model on training set
final_svd_model = SVD(n_epochs=20, lr_all=0.007, reg_all=0.02)
final_svd_model.fit(dataset)

In [None]:
# Specify the recipe ID for which you want to get the list of similar recipes
rec_id = 31490

final_df = recipes_df[recipes_df['id'] == rec_id][['id', 'name', 'minutes']]
final_df.head(20)

In [None]:
def recommend_recipes_by_calories(df, target_calories, num_recipes=10):
    # Keep recipes whose calorie count is within 100 calories of the target
    similar_recipes = df[df['calories'].between(target_calories - 100, target_calories + 100)]

    # Sort the recipes by how close their calorie count is to the target
    similar_recipes = similar_recipes.sort_values(by='calories', key=lambda s: (s - target_calories).abs())

    # Select the N recipes closest to the target
    recommended_recipes = similar_recipes.head(num_recipes)

    return recommended_recipes


In [None]:
target_calories = 100
num_recipes = 2

recommended_recipes = recommend_recipes_by_calories(recipes_df, target_calories, num_recipes)


The recommend_recipes_by_calories function takes as input the recipes data frame, a desired calorie count and the number of recommendations to return. It keeps the recipes whose calories fall within 100 calories of the target, sorts them by their distance to the target, and returns the N closest recipes as a data frame.

The function relies on the calorie column only. It does not use the KNN or SVD models fitted above, and it does not rank recipes by the cosine similarity of their normalized nutritional profiles.
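
A content-based ranking on top of the calorie filter could look like the sketch below. It is only an illustration under stated assumptions: each recipe carries a vector of already normalized nutrition features, and the recipe records, the `features` field and the function name `recommend_similar` are hypothetical. It returns (recipe id, similarity score) tuples, best match first.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_similar(recipes, query_id, target_calories, tolerance=100, top_n=2):
    """Filter recipes by calorie window, then rank them by similarity to the query recipe."""
    query = next(r for r in recipes if r["id"] == query_id)
    candidates = [r for r in recipes
                  if r["id"] != query_id
                  and abs(r["calories"] - target_calories) <= tolerance]
    scored = [(r["id"], cosine_similarity(query["features"], r["features"]))
              for r in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical recipes with normalized nutrition features
recipes = [
    {"id": 1, "calories": 120, "features": [0.2, 0.1, 0.7]},
    {"id": 2, "calories": 150, "features": [0.2, 0.1, 0.6]},
    {"id": 3, "calories": 90,  "features": [0.9, 0.8, 0.1]},
    {"id": 4, "calories": 600, "features": [0.2, 0.1, 0.7]},
]
print(recommend_similar(recipes, query_id=1, target_calories=100))
```

In this toy data recipe 4 has the same profile as the query but is excluded by the calorie window, so the filter and the similarity ranking play separate roles.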

## E. EVALUATION.