<a href="https://colab.research.google.com/github/Mugangasia/NutriPal-Recipe-Recommendation-System-/blob/main/NutriPal_Recipe_Recommendation_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Business Understanding

In the rapidly growing health and wellness industry, individuals are increasingly seeking effective solutions to make informed dietary choices and improve their overall well-being. However, navigating the vast array of diet plans, meal delivery services, and health apps can be overwhelming. Stakeholders in this industry face a critical challenge of providing personalized and accurate nutrition recommendations that meet the unique needs and preferences of individuals.

A significant problem is the lack of tailored nutrition guidance available in the market. Existing solutions often offer generic diet plans that do not consider individual factors such as age, gender, body composition, dietary restrictions, and cultural preferences. Consequently, individuals may experience frustration and disappointment when these solutions fail to deliver the desired results, leading to a decline in motivation and a higher likelihood of abandoning their healthy eating goals.

Furthermore, the fast-paced nature of modern lifestyles presents another obstacle. Many individuals struggle to find the time and energy required to research, plan, and prepare nutritious meals regularly. This often results in resorting to unhealthy eating habits, negatively impacting their overall health and well-being.

By providing accurate and personalized nutrition recommendations, stakeholders in the health and wellness industry can differentiate their offerings, enhance customer satisfaction, and foster long-term adherence to healthy eating habits. Additionally, the utilization of advanced technologies and user-friendly interfaces can create a competitive advantage and position stakeholders as leaders in the market.


## Problem Statement

* Lack of personalized nutrition recommendations: Existing solutions fail to provide tailored nutrition recommendations that consider individual characteristics, resulting in suboptimal outcomes and reduced motivation for individuals seeking to improve their dietary habits.

* Time and effort constraints for meal planning: Busy lifestyles make it challenging for individuals to dedicate sufficient time and effort to plan and prepare healthy meals regularly, leading to a reliance on unhealthy food choices.


## Objectives
### Main Objective
Develop a food and recipe recommendation system that suggests nutritious food to individuals and promotes a healthy lifestyle.

### Specific Objectives


*   Identify the key features and factors that impact an individual's overall health, and determine which ones should be incorporated into the food recommendation system.
*   Clean and preprocess the nutrition data available in the dataset, and combine it with external data sources to create a comprehensive nutrition database that can be used by the recommendation system.

*   Develop and implement recommendation algorithms that can generate personalized food recommendations based on the user's individual characteristics such as age, gender, degree of physical activity, locally available foods, and dietary customs.
*   Create a chatbot that can interact with users and collect information, such as dietary preferences and restrictions, that can be used to personalize food recommendations.

*   Integrate the recommendation algorithms and chatbot into a user-friendly and intuitive interface that allows users to easily access and interact with the system.
*   Deploy the food recommendation system and chatbot, and conduct user testing to gather feedback and identify areas for improvement.






## Metrics of Success
Our recommender system will be considered successful if it meets the following criteria:

* A recall score of 80% or above.
* A mean average precision (MAP) of at least 90%.
* A coverage of around 90%.
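
The criteria above can be made concrete with a small sketch. It assumes that each user has a ranked list of recommended recipe ids and a set of relevant ids, that the precision criterion refers to mean average precision at k, and that coverage means the share of the catalog that is ever recommended. The function names and the toy data are hypothetical and are not part of this notebook.

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)


def average_precision_at_k(recommended, relevant, k):
    """Average of precision@i over the ranks i <= k at which a relevant item appears."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def mean_average_precision(all_recommended, all_relevant, k):
    """Mean of average_precision_at_k over all users."""
    scores = [average_precision_at_k(rec, rel, k)
              for rec, rel in zip(all_recommended, all_relevant)]
    return sum(scores) / len(scores)


def catalog_coverage(all_recommended, catalog, k):
    """Share of catalog items that show up in at least one user's top-k list."""
    recommended_items = {item for rec in all_recommended for item in rec[:k]}
    return len(recommended_items & set(catalog)) / len(catalog)


# Toy data (made up): two users, six recipes in the catalog
catalog = ["r1", "r2", "r3", "r4", "r5", "r6"]
recs = [["r1", "r2", "r3"], ["r4", "r1", "r5"]]
relevant = [{"r1", "r3"}, {"r5"}]

print("recall@3 (user 0):", recall_at_k(recs[0], relevant[0], 3))
print("MAP@3:", mean_average_precision(recs, relevant, 3))
print("coverage@3:", catalog_coverage(recs, catalog, 3))
```

Other definitions of these metrics exist (for example, normalizing average precision by the number of relevant items only), so the thresholds should be read against whichever definition the evaluation finally uses.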

In [None]:
# Import the relevant libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
#loading the data
Nutrition = pd.read_csv('nutrition.csv')
Nutrition.head()

Unnamed: 0.1,Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,...,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,0,Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,...,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
1,1,"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,...,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
2,2,"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,...,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
3,3,"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,...,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
4,4,"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,...,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g


In [None]:
#getting information on the Nutrition data set
Nutrition.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8789 entries, 0 to 8788
Data columns (total 77 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   Unnamed: 0                   8789 non-null   int64 
 1   name                         8789 non-null   object
 2   serving_size                 8789 non-null   object
 3   calories                     8789 non-null   int64 
 4   total_fat                    8789 non-null   object
 5   saturated_fat                7199 non-null   object
 6   cholesterol                  8789 non-null   object
 7   sodium                       8789 non-null   object
 8   choline                      8789 non-null   object
 9   folate                       8789 non-null   object
 10  folic_acid                   8789 non-null   object
 11  niacin                       8789 non-null   object
 12  pantothenic_acid             8789 non-null   object
 13  riboflavin                   8789

In [None]:
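# Creating a data set with the relevant features
# (.copy() makes Nutrition_df independent of Nutrition, so later edits do not trigger SettingWithCopyWarning)
Nutrition_df = Nutrition.loc[:, ['name','serving_size','calories','total_fat','saturated_fat','cholesterol','sodium','potassium']].copy()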


In [None]:
# Function that strips unit suffixes from a feature column and converts it to float
def clean_df(df, col_name):
    """Return a copy of df in which col_name holds only numeric values as floats."""
    # Work on a copy so the input DataFrame is not modified
    cleaned_df = df.copy()

    # Strip whitespace, then remove every character that is not a digit or a decimal point
    cleaned_df[col_name] = cleaned_df[col_name].str.strip().str.replace(r'[^\d.]', '', regex=True)

    # Convert to numeric; values that cannot be parsed (e.g. empty strings) become NaN
    cleaned_df[col_name] = pd.to_numeric(cleaned_df[col_name], errors='coerce').astype(float)

    return cleaned_df
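
To see what the stripping step does to values such as `100 g`, `0.1g`, or a missing entry, here is a small self-contained sketch on made-up values. It applies the same regex idea to a single Series through a helper named `strip_units`, which is a hypothetical name and not part of the notebook.

```python
import pandas as pd


def strip_units(series):
    """Drop every character except digits and '.', then convert to float."""
    digits_only = (
        series.astype(str)            # None/NaN become the strings 'None'/'nan'
        .str.strip()
        .str.replace(r"[^\d.]", "", regex=True)
    )
    # Empty strings (e.g. from 'None') cannot be parsed and are coerced to NaN
    return pd.to_numeric(digits_only, errors="coerce")


sample = pd.Series(["100 g", "0.1g", "9.00 mg", None, "1mg"])
cleaned = strip_units(sample)
print(cleaned)
```

Two limitations of this approach are worth keeping in mind: a minus sign or exponent would be stripped along with the unit, and units are discarded rather than converted, so a column mixing `g` and `mg` would end up on inconsistent scales.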


In [None]:
# Cleaning the numeric columns in the nutrition data frame
cols_to_clean = ['serving_size', 'calories', 'total_fat',
                 'saturated_fat', 'cholesterol', 'sodium', 'potassium']
for col in cols_to_clean:
    # Cast to str first because clean_df relies on the .str accessor
    Nutrition_df[col] = Nutrition_df[col].astype(str)
    Nutrition_df = clean_df(Nutrition_df, col)

In [None]:
Nutrition.head()

Unnamed: 0.1,Unnamed: 0,name,serving_size,calories,total_fat,saturated_fat,cholesterol,sodium,choline,folate,...,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,0,Cornstarch,100 g,381,0.1g,,0,9.00 mg,0.4 mg,0.00 mcg,...,0.05 g,0.009 g,0.016 g,0.025 g,0.00 mg,0.0 g,0.09 g,0.00 mg,0.00 mg,8.32 g
1,1,"Nuts, pecans",100 g,691,72g,6.2g,0,0.00 mg,40.5 mg,22.00 mcg,...,71.97 g,6.180 g,40.801 g,21.614 g,0.00 mg,0.0 g,1.49 g,0.00 mg,0.00 mg,3.52 g
2,2,"Eggplant, raw",100 g,25,0.2g,,0,2.00 mg,6.9 mg,22.00 mcg,...,0.18 g,0.034 g,0.016 g,0.076 g,0.00 mg,0.0 g,0.66 g,0.00 mg,0.00 mg,92.30 g
3,3,"Teff, uncooked",100 g,367,2.4g,0.4g,0,12.00 mg,13.1 mg,0,...,2.38 g,0.449 g,0.589 g,1.071 g,0,0,2.37 g,0,0,8.82 g
4,4,"Sherbet, orange",100 g,144,2g,1.2g,1mg,46.00 mg,7.7 mg,4.00 mcg,...,2.00 g,1.160 g,0.530 g,0.080 g,1.00 mg,0.0 g,0.40 g,0.00 mg,0.00 mg,66.10 g


In [None]:
# checking for the missing values
Nutrition_df.isnull().sum()

name                0
serving_size        0
calories            0
total_fat           0
saturated_fat    1590
cholesterol         0
sodium              0
potassium           0
dtype: int64

In [None]:
# Imputing the missing saturated_fat values with the column mean
mean_value = Nutrition_df["saturated_fat"].mean()
Nutrition_df["saturated_fat"] = Nutrition_df["saturated_fat"].fillna(mean_value)


In [None]:
#checking for duplicates.
Nutrition_df.duplicated().sum()

0

In [None]:
# Loading the recipes data; malformed lines are skipped with a warning
# (on_bad_lines replaces error_bad_lines, which was deprecated in pandas 1.3 and removed in 2.0)
recipes = pd.read_csv('RAW_recipes.csv', on_bad_lines='warn')

recipes.head(10)

Skipping line 808: expected 12 fields, saw 34
Skipping line 4143: expected 12 fields, saw 17
Skipping line 5720: expected 12 fields, saw 39
Skipping line 8972: expected 12 fields, saw 13
Skipping line 13097: expected 12 fields, saw 16
Skipping line 20577: expected 12 fields, saw 17
Skipping line 22226: expected 12 fields, saw 13
Skipping line 23967: expected 12 fields, saw 13
Skipping line 28984: expected 12 fields, saw 18
Skipping line 30616: expected 12 fields, saw 16
Skipping line 31483: expected 12 fields, saw 21
Skipping line 32348: expected 12 fields, saw 14
Skipping line 35731: expected 12 fields, saw 36
Skipping line 37330: expected 12 fields, saw 13
Skipping line 38182: expected 12 fields, saw 21



Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients
0,arriba baked winter squash mexican style,137739,55,47892,2005-09-16,"['60-minutes-or-less', 'time-to-make', 'course...","[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]",11,"['make a choice and proceed with recipe', 'dep...",autumn is my favorite time of year to cook! th...,"['winter squash', 'mexican seasoning', 'mixed ...",7.0
1,a bit different breakfast pizza,31490,30,26278,2002-06-17,"['30-minutes-or-less', 'time-to-make', 'course...","[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]",9,"['preheat oven to 425 degrees f', 'press dough...",this recipe calls for the crust to be prebaked...,"['prepared pizza crust', 'sausage patty', 'egg...",6.0
2,all in the kitchen chili,112140,130,196586,2005-02-25,"['time-to-make', 'course', 'preparation', 'mai...","[269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0]",6,"['brown ground beef in large pot', 'add choppe...",this modified version of 'mom's' chili was a h...,"['ground beef', 'yellow onions', 'diced tomato...",13.0
3,alouette potatoes,59389,45,68585,2003-04-14,"['60-minutes-or-less', 'time-to-make', 'course...","[368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0]",11,['place potatoes in a large pot of lightly sal...,"this is a super easy, great tasting, make ahea...","['spreadable cheese with garlic and herbs', 'n...",11.0
4,amish tomato ketchup for canning,44061,190,41706,2002-10-25,"['weeknight', 'time-to-make', 'course', 'main-...","[352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0]",5,['mix all ingredients& boil for 2 1 / 2 hours ...,my dh's amish mother raised him on this recipe...,"['tomato juice', 'apple cider vinegar', 'sugar...",8.0
5,apple a day milk shake,5289,0,1533,1999-12-06,"['15-minutes-or-less', 'time-to-make', 'course...","[160.2, 10.0, 55.0, 3.0, 9.0, 20.0, 7.0]",4,"['combine ingredients in blender', 'cover and ...",,"['milk', 'vanilla ice cream', 'frozen apple ju...",4.0
6,aww marinated olives,25274,15,21730,2002-04-14,"['15-minutes-or-less', 'time-to-make', 'course...","[380.7, 53.0, 7.0, 24.0, 6.0, 24.0, 6.0]",4,['toast the fennel seeds and lightly crush the...,my italian mil was thoroughly impressed by my ...,"['fennel seeds', 'green olives', 'ripe olives'...",9.0
7,backyard style barbecued ribs,67888,120,10404,2003-07-30,"['weeknight', 'time-to-make', 'course', 'main-...","[1109.5, 83.0, 378.0, 275.0, 96.0, 86.0, 36.0]",10,['in a medium saucepan combine all the ingredi...,this recipe is posted by request and was origi...,"['pork spareribs', 'soy sauce', 'fresh garlic'...",22.0
8,bananas 4 ice cream pie,70971,180,102353,2003-09-10,"['weeknight', 'time-to-make', 'course', 'main-...","[4270.8, 254.0, 1306.0, 111.0, 127.0, 431.0, 2...",8,"['crumble cookies into a 9-inch pie plate , or...",,"['chocolate sandwich style cookies', 'chocolat...",6.0
9,beat this banana bread,75452,70,15892,2003-11-04,"['weeknight', 'time-to-make', 'course', 'main-...","[2669.3, 160.0, 976.0, 107.0, 62.0, 310.0, 138.0]",12,"['preheat oven to 350 degrees', 'butter two 9x...",from ann hodgman's,"['sugar', 'unsalted butter', 'bananas', 'eggs'...",9.0
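
The lines skipped while loading `RAW_recipes.csv` are lost silently apart from the messages above. If they need to be inspected later, one option is to pass a callable as `on_bad_lines`, which pandas supports from version 1.4 with `engine="python"`. The sketch below uses an in-memory CSV with made-up values; `record_bad_line` and `bad_rows` are hypothetical names.

```python
import io

import pandas as pd

bad_rows = []  # collects the fields of every malformed line


def record_bad_line(fields):
    """Called by pandas with the list of string fields of a malformed line."""
    bad_rows.append(fields)
    return None  # returning None tells pandas to skip the line


# The second data line has four fields instead of three
csv_text = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    on_bad_lines=record_bad_line,
    engine="python",  # callables are only accepted by the python engine
)

print(df)
print("skipped:", bad_rows)
```

The python engine is slower than the default C engine, so this is mainly useful for diagnosing why lines are malformed rather than for routine loading.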


In [None]:
#getting info of the recipes data set.
recipes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39802 entries, 0 to 39801
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   name            39801 non-null  object 
 1   id              39801 non-null  object 
 2   minutes         39801 non-null  object 
 3   contributor_id  39800 non-null  object 
 4   submitted       39800 non-null  object 
 5   tags            39799 non-null  object 
 6   nutrition       39799 non-null  object 
 7   n_steps         39799 non-null  object 
 8   steps           39798 non-null  object 
 9   description     38810 non-null  object 
 10  ingredients     39793 non-null  object 
 11  n_ingredients   39790 non-null  float64
dtypes: float64(1), object(11)
memory usage: 3.6+ MB


In [None]:
# Creating a new data frame with the relevant features of the recipes data set
# (.copy() makes recipes_df independent of recipes before new columns are added)
recipes_df = recipes.loc[:, ['id','name','minutes','nutrition','tags','ingredients','steps']].copy()

In [None]:
# Splitting the nutrition column into individual nutrient columns
recipes_df[['calories', 'total fat (PDV)', 'sugar (PDV)', 'sodium (PDV)', 'protein (PDV)', 'saturated fat (PDV)', 'carbohydrates (PDV)']] = recipes['nutrition'].str.split(",", expand=True)

# Removing the list brackets; regex=False treats '[' and ']' as literal characters
recipes_df['calories'] = recipes_df['calories'].str.replace('[', '', regex=False)
recipes_df['carbohydrates (PDV)'] = recipes_df['carbohydrates (PDV)'].str.replace(']', '', regex=False)
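
Because each `nutrition` value is the text form of a Python list, an alternative to splitting on commas is to parse it with `ast.literal_eval` from the standard library. The sketch below is one possible approach, not the method used in this notebook; `parse_nutrition` and `NUTRIENT_NAMES` are hypothetical names, and the column order is assumed to match the split above.

```python
import ast

NUTRIENT_NAMES = [
    "calories", "total fat (PDV)", "sugar (PDV)", "sodium (PDV)",
    "protein (PDV)", "saturated fat (PDV)", "carbohydrates (PDV)",
]


def parse_nutrition(raw):
    """Turn a string like '[51.5, 0.0, ...]' into a dict of floats, or None if malformed."""
    try:
        values = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return None  # missing value, truncated list, or not a literal
    if not isinstance(values, (list, tuple)) or len(values) != len(NUTRIENT_NAMES):
        return None  # wrong shape
    try:
        return {name: float(v) for name, v in zip(NUTRIENT_NAMES, values)}
    except (TypeError, ValueError):
        return None  # an element that is not a number


# Example using the first row shown in recipes.head(10)
print(parse_nutrition("[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]"))
# A truncated string is rejected rather than partially parsed
print(parse_nutrition("[4270.8, 254.0, 1306.0"))
```

Parsing the whole list at once makes malformed rows explicit (they come back as `None`) instead of producing misaligned columns.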


In [None]:
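# Dropping the nutrition column now that it has been split
# (the result is assigned back; drop() alone would leave recipes_df unchanged)
recipes_df = recipes_df.drop(columns=['nutrition'])
recipes_df.head()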

Unnamed: 0,id,name,minutes,tags,ingredients,steps,calories,total fat (PDV),sugar (PDV),sodium (PDV),protein (PDV),saturated fat (PDV),carbohydrates (PDV)
0,137739,arriba baked winter squash mexican style,55,"['60-minutes-or-less', 'time-to-make', 'course...","['winter squash', 'mexican seasoning', 'mixed ...","['make a choice and proceed with recipe', 'dep...",51.5,0.0,13.0,0.0,2.0,0.0,4.0
1,31490,a bit different breakfast pizza,30,"['30-minutes-or-less', 'time-to-make', 'course...","['prepared pizza crust', 'sausage patty', 'egg...","['preheat oven to 425 degrees f', 'press dough...",173.4,18.0,0.0,17.0,22.0,35.0,1.0
2,112140,all in the kitchen chili,130,"['time-to-make', 'course', 'preparation', 'mai...","['ground beef', 'yellow onions', 'diced tomato...","['brown ground beef in large pot', 'add choppe...",269.8,22.0,32.0,48.0,39.0,27.0,5.0
3,59389,alouette potatoes,45,"['60-minutes-or-less', 'time-to-make', 'course...","['spreadable cheese with garlic and herbs', 'n...",['place potatoes in a large pot of lightly sal...,368.1,17.0,10.0,2.0,14.0,8.0,20.0
4,44061,amish tomato ketchup for canning,190,"['weeknight', 'time-to-make', 'course', 'main-...","['tomato juice', 'apple cider vinegar', 'sugar...",['mix all ingredients& boil for 2 1 / 2 hours ...,352.9,1.0,337.0,23.0,3.0,0.0,28.0


In [None]:
# Checking for missing values
missing_values = recipes_df.isnull().sum()
print(missing_values)


In [None]:
# Applying clean_df to all numerical columns
cols_to_clean = ['calories', 'total fat (PDV)', 'sugar (PDV)', 'sodium (PDV)',
                 'protein (PDV)', 'saturated fat (PDV)', 'carbohydrates (PDV)']
for col in cols_to_clean:
    recipes_df[col] = recipes_df[col].astype(str)
    recipes_df = clean_df(recipes_df, col)

In [None]:
#dropping all rows with missing values
recipes_df.dropna(inplace=True)

In [None]:
#confirming there are no missing values
recipes_df.isnull().sum()

In [None]:
# Confirming the data columns have been cleaned
recipes_df.head()

### Exploratory Data Analysis.

In [None]:
#descriptive statistics for nutrition data set.
Nutrition_df.describe()

In [None]:
# Listing the columns in the recipes data frame
recipes_df.columns

In [None]:
#descriptive statistics for recipes data set.
recipes_df.describe()

### Nutrition Data set

In [None]:
Nutrition_df.columns

In [None]:
# Scatter plots of calories against each of the other nutrients
nutrient_labels = {
    'total_fat': 'Total Fat',
    'saturated_fat': 'Saturated Fat',
    'cholesterol': 'Cholesterol',
    'sodium': 'Sodium',
    'potassium': 'Potassium',
}

for col, label in nutrient_labels.items():
    plt.scatter(Nutrition_df['calories'], Nutrition_df[col])
    plt.xlabel('Calories')
    plt.ylabel(label)
    plt.title(f'Scatter Plot: Calories vs {label}')
    plt.show()


* There is a strong positive correlation between total fat and calories: as total fat increases, calories increase.
* There is a weak positive correlation between saturated fat and calories: as saturated fat increases, calories increase only slightly.
* There is no apparent correlation between cholesterol and calories: higher cholesterol is not accompanied by higher calories.
* There is no apparent correlation between sodium and calories: higher sodium is not accompanied by higher calories.
* There is no apparent correlation between potassium and calories: higher potassium is not accompanied by higher calories.

In [None]:


# Select the relevant nutrient columns
nutrients = Nutrition_df[['calories', 'total_fat', 'saturated_fat', 'cholesterol', 'sodium', 'potassium']]

# Calculate the correlation matrix
correlation_matrix = nutrients.corr()

# Plot the correlation matrix as a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Matrix of Nutrients')
plt.show()


In [None]:
# Select the relevant nutrient columns
nutrients = Nutrition_df[['calories', 'total_fat', 'saturated_fat', 'cholesterol', 'sodium', 'potassium']]

# Calculate the correlation matrix
correlation_matrix = nutrients.corr()

# Print the correlation matrix
print(correlation_matrix)
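
By default, `DataFrame.corr()` computes the Pearson coefficient for each pair of columns. As a reminder of what each cell in the matrix measures, the sketch below computes the coefficient by hand. `pearson_r` is a hypothetical helper, and the example values are copied from the five rows shown by `Nutrition.head()`, which is far too few to draw conclusions from.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only (first five rows of the nutrition data)
calories = [381, 691, 25, 367, 144]
total_fat = [0.1, 72.0, 0.2, 2.4, 2.0]
print(round(pearson_r(calories, total_fat), 3))
```

A coefficient near zero only rules out a linear relationship; two nutrients can still be related in a non-linear way, which is one reason to read the matrix alongside the scatter plots.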


### Recipes dataset

In [None]:
# getting the recipes random sample.
recipes_sample_df = recipes_df.sample(n=2000, random_state=42)

In [None]:
#confirming number of missing values
recipes_df.isnull().sum()