# Food Recommendation Based on User Meal History

## Introduction
This notebook builds a food recommender from a dataset of food items. It uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization and the K-Nearest Neighbors (KNN) algorithm to find the food items most similar to a user's meal history.

## Import the Python libraries used in this project

In [None]:
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Data Understanding

Read the food data using the `read_csv` function from pandas.

In [None]:
food_data = pd.read_csv('/content/drive/MyDrive/Data nutrisi/nutrition_dataset.csv')

To get a brief summary of the food data, including the number of rows and columns, the data type of each column, and the number of non-null values per column, we use the pandas `info()` method.

In [None]:
food_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77 entries, 0 to 76
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   id              77 non-null     object 
 1   makanan         77 non-null     object 
 2   kalori          77 non-null     float64
 3   protein         77 non-null     float64
 4   lemak           77 non-null     float64
 5   karbohidrat     77 non-null     float64
 6   Kategori        77 non-null     object 
 7   vegan/nonvegan  77 non-null     object 
 8   tag             77 non-null     object 
dtypes: float64(4), object(5)
memory usage: 5.5+ KB


To summarize the categorical (object-typed) columns, we use the pandas `describe(include="O")` method.

In [None]:
food_data.describe(include = "O")

| | id | makanan | Kategori | vegan/nonvegan | tag |
|---|---|---|---|---|---|
| count | 77 | 77 | 77 | 77 | 77 |
| unique | 77 | 77 | 7 | 2 | 55 |
| top | M-001 | ayam bakar | Ikan | non-vegan | goreng, gurih |
| freq | 1 | 1 | 21 | 41 | 8 |


To display the first 5 rows, we use the pandas `head()` method.

In [None]:
food_data.head()

| | id | makanan | kalori | protein | lemak | karbohidrat | Kategori | vegan/nonvegan | tag |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M-001 | ayam bakar | 242.0 | 30.0 | 14.0 | 10.0 | Daging | non-vegan | panggang, gurih, manis |
| 1 | M-002 | ayam geprek | 240.0 | 24.0 | 15.0 | 12.0 | Daging | non-vegan | goreng, pedas |
| 2 | M-003 | Bakso | 200.0 | 15.0 | 10.0 | 15.0 | Serealia | vegan | rebus, rempah, gurih, pedas |
| 3 | M-004 | gado-gado | 137.0 | 6.1 | 3.2 | 21.0 | Sayur | vegan | mentah, gurih, pedas |
| 4 | M-005 | Mie ayam | 102.0 | 6.2 | 3.9 | 10.5 | Serealia | vegan | mie, rebus, gurih |


## Data Preprocessing

We drop the `Kategori` column because it is not used in the rest of the analysis.

In [None]:
food_data = food_data.drop(['Kategori'], axis = 1)
food_data.head()

| | id | makanan | kalori | protein | lemak | karbohidrat | vegan/nonvegan | tag |
|---|---|---|---|---|---|---|---|---|
| 0 | M-001 | ayam bakar | 242.0 | 30.0 | 14.0 | 10.0 | non-vegan | panggang, gurih, manis |
| 1 | M-002 | ayam geprek | 240.0 | 24.0 | 15.0 | 12.0 | non-vegan | goreng, pedas |
| 2 | M-003 | Bakso | 200.0 | 15.0 | 10.0 | 15.0 | vegan | rebus, rempah, gurih, pedas |
| 3 | M-004 | gado-gado | 137.0 | 6.1 | 3.2 | 21.0 | vegan | mentah, gurih, pedas |
| 4 | M-005 | Mie ayam | 102.0 | 6.2 | 3.9 | 10.5 | vegan | mie, rebus, gurih |


We create a new feature, 'tag_makanan', by merging the 'vegan/nonvegan' column with the 'tag' column. This gathers all descriptive information about each food into one string, which is later matched against the user's preference data. We then keep only the columns used as the food dataset: 'id', 'makanan', 'kalori', 'protein', 'lemak', 'karbohidrat', and 'tag_makanan'.

In [None]:
food_data['tag_makanan'] = ''
for index, row in food_data.iterrows():
    vegan = ' '.join(row['vegan/nonvegan'].split(',')).lower()
    tag = ' '.join(row['tag'].replace(' ', '').split(',')).lower()
    food_data.at[index, 'tag_makanan'] = vegan + ' ' + tag



food_data = food_data[['id','makanan', 'kalori', 'protein', 'lemak', 'karbohidrat', 'tag_makanan']]
food_data


| | id | makanan | kalori | protein | lemak | karbohidrat | tag_makanan |
|---|---|---|---|---|---|---|---|
| 0 | M-001 | ayam bakar | 242.0 | 30.0 | 14.0 | 10.0 | non-vegan panggang gurih manis |
| 1 | M-002 | ayam geprek | 240.0 | 24.0 | 15.0 | 12.0 | non-vegan goreng pedas |
| 2 | M-003 | Bakso | 200.0 | 15.0 | 10.0 | 15.0 | vegan rebus rempah gurih pedas |
| 3 | M-004 | gado-gado | 137.0 | 6.1 | 3.2 | 21.0 | vegan mentah gurih pedas |
| 4 | M-005 | Mie ayam | 102.0 | 6.2 | 3.9 | 10.5 | vegan mie rebus gurih |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 72 | M-073 | Telur ayam dadar | 61.9 | 251.0 | 16.3 | 19.4 | non-vegan goreng gurih |
| 73 | M-074 | Telur bebek dadar | 55.1 | 301.0 | 20.0 | 23.7 | non-vegan goreng gurih |
| 74 | M-075 | Tempe pasar goreng | 336.0 | 20.0 | 28.0 | 7.8 | vegan goreng gurih |
| 75 | M-076 | Teri balado | 25.8 | 365.0 | 23.7 | 22.3 | non-vegan goreng asin gurih |
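As an aside, the row-by-row loop used to build 'tag_makanan' can also be expressed with pandas string methods. The following is a minimal sketch, assuming the same cleaning rules as the loop (remove spaces inside 'tag', turn commas into spaces, lowercase everything). The `sample` frame is a hypothetical two-row excerpt used only for illustration.

```python
import pandas as pd

# Hypothetical two-row excerpt with the same columns as the food data
sample = pd.DataFrame({
    "vegan/nonvegan": ["non-vegan", "vegan"],
    "tag": ["panggang, gurih, manis", "rebus, rempah, gurih, pedas"],
})

# Remove spaces, replace commas with single spaces, then lowercase
tag = (
    sample["tag"]
    .str.replace(" ", "", regex=False)
    .str.replace(",", " ", regex=False)
    .str.lower()
)
vegan = sample["vegan/nonvegan"].str.replace(",", " ", regex=False).str.lower()

# Concatenate the two cleaned columns into one tag string per food
sample["tag_makanan"] = vegan + " " + tag
print(sample["tag_makanan"].tolist())
```

On a dataset of 77 rows the speed difference is negligible, so this is mainly a readability choice.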


In [None]:
food_data.to_csv('food_data_final.csv')

## Making Recommendations
We use a KNN (K-Nearest Neighbors) model to recommend foods that match a user's preferences. The user's preference is derived from their meal history, while the KNN model is fitted on the food tag feature of the food dataset. By computing the cosine distance between the preference vector and each food's tag vector, we find the foods nearest to the user's preference. We then select the top 4 foods by the returned indices, so that the recommendations align with what the user tends to eat.
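For reference, the cosine distance used by scikit-learn's 'cosine' metric is one minus the cosine similarity. The symbols here are illustrative assumptions: $\mathbf{u}$ stands for the user's preference vector and $\mathbf{v}$ for one food's TF-IDF vector.

$$
d_{\cos}(\mathbf{u}, \mathbf{v}) = 1 - \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
$$

Because TF-IDF weights are non-negative, this distance lies between 0 (same direction, identical tag profile) and 1 (no tags in common).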

### Create and Train the Model
First, we take the 'tag_makanan' feature from the food data and convert it into a sparse TF-IDF matrix using `TfidfVectorizer`. Then we convert that matrix into a dense NumPy array, which is the input for the KNN model.

In [None]:
features = food_data['tag_makanan']
vectorizer = TfidfVectorizer()
tf_idf_tag = vectorizer.fit_transform(features)

# Convert the sparse TF-IDF matrix to a dense array (one row per food item)
tf_idf_tag_array = tf_idf_tag.toarray()

Next, we create a KNN model with the number of nearest neighbors (`n_neighbors`) set to 4 and the 'cosine' metric, and fit it on the array from the previous step.

In [None]:
model = NearestNeighbors(n_neighbors=4, metric = 'cosine')


model.fit(tf_idf_tag_array)

### Test The Model

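To determine the user's preference, we use the user's meal history stored in the `user_history_meals` dictionary, which maps each food tag to a consumption count. We set a threshold of 2: a tag becomes part of the user's preference only if its count is strictly greater than the threshold. With the values below, 'gurih' (3) and 'goreng' (5) qualify, while 'pedas' (2) and 'rebus' (1) do not.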

In [None]:
user_history_meals = {
    "pedas": 2,
    "gurih": 3,
    "rebus": 1,
    "goreng": 5
}
user_preference = []

threshold = 2

for key, value in user_history_meals.items():
    if value > threshold:
        user_preference.append(key)

user_preference = ' '.join(user_preference)
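The thresholding step can also be wrapped in a small function, which makes the strict comparison easier to see and to reuse. This is a minimal sketch; `build_preference` is a hypothetical helper name, not part of the notebook.

```python
def build_preference(history, threshold=2):
    """Join the tags whose count is strictly greater than the threshold."""
    return " ".join(tag for tag, count in history.items() if count > threshold)


history = {"pedas": 2, "gurih": 3, "rebus": 1, "goreng": 5}
preference = build_preference(history)
print(preference)  # gurih goreng
```

Lowering the threshold to 1 would also admit 'pedas', since its count of 2 would then exceed the threshold.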



Just like the food tags, the user preference string is converted into a TF-IDF vector, this time with the already fitted vectorizer's `transform()` method so that it shares the same vocabulary as the food data. We then convert the vector into an array.

After that, we use the trained model's `kneighbors()` method to find the foods closest to the user's preference. This method returns the distances from the preference vector to its nearest neighbors, together with the row indices of those foods.

Finally, we take the top 4 foods based on the indices found in the previous step.

In [None]:
user_preference_tf_idf = vectorizer.transform([user_preference]).toarray()


distances, indices = model.kneighbors(user_preference_tf_idf)

top_4_food = food_data.iloc[indices[0]]

top_4_food

| | id | makanan | kalori | protein | lemak | karbohidrat | tag_makanan |
|---|---|---|---|---|---|---|---|
| 74 | M-075 | Tempe pasar goreng | 336.0 | 20.0 | 28.0 | 7.8 | vegan goreng gurih |
| 11 | M-012 | Bakwan | 280.0 | 8.2 | 10.2 | 39.0 | vegan goreng gurih |
| 69 | M-070 | Taoge goreng | 79.2 | 88.0 | 3.2 | 2.1 | vegan goreng gurih |
| 67 | M-068 | Tahu goreng | 115.0 | 9.7 | 8.5 | 2.5 | vegan goreng gurih |
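The whole query step can be reproduced on a small scale to inspect the distances next to the recommended items. The sketch below is self-contained and uses a hypothetical five-item tag list (`toy_tags`) instead of the real dataset, with `n_neighbors=2` to keep the output short. All `toy_` names are assumptions made for this illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical miniature version of the 'tag_makanan' column
toy_tags = [
    "non-vegan panggang gurih manis",
    "non-vegan goreng pedas",
    "vegan rebus rempah gurih pedas",
    "vegan goreng gurih",
    "vegan mie rebus gurih",
]

# Fit the vectorizer and the neighbor index on the food tags
toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_tags).toarray()
toy_model = NearestNeighbors(n_neighbors=2, metric="cosine").fit(toy_matrix)

# Reuse the fitted vectorizer so the query shares the same vocabulary
query = toy_vectorizer.transform(["gurih goreng"]).toarray()
toy_distances, toy_indices = toy_model.kneighbors(query)

# Neighbors come back ordered from nearest to farthest
for dist, idx in zip(toy_distances[0], toy_indices[0]):
    print(f"{toy_tags[idx]!r}: cosine distance {dist:.3f}")
```

The nearest item is the one whose tags contain both 'goreng' and 'gurih', which mirrors the result above where all four recommendations share the profile 'vegan goreng gurih'.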


## Save Model
To deploy the model in a cloud environment, we save the fitted objects as .pkl (pickle) files. Pickle serializes a Python object, including its fitted parameters, so both the TF-IDF vectorizer and the KNN model are stored. When a request arrives, we can load the pickle files and use the restored objects for inference right away.

In [None]:
import pickle as pkl

with open('tfidf_model_fix.pkl', 'wb') as file:
    pkl.dump(vectorizer, file)

with open('knn_model_fix.pkl', 'wb') as file:
    pkl.dump(model, file)
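The loading side is not shown in this notebook. The sketch below illustrates one way it could look, as a round trip through a temporary directory so that it runs on its own. The file names `tfidf_demo.pkl` and `knn_demo.pkl` and the three-item `tags` list are hypothetical. Note that unpickling executes code embedded in the file, so pickle files should only be loaded from a trusted source.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical miniature dataset standing in for 'tag_makanan'
tags = ["vegan goreng gurih", "non-vegan goreng pedas", "vegan mie rebus gurih"]
vec = TfidfVectorizer()
knn = NearestNeighbors(n_neighbors=1, metric="cosine")
knn.fit(vec.fit_transform(tags).toarray())

with tempfile.TemporaryDirectory() as tmp:
    vec_path = Path(tmp) / "tfidf_demo.pkl"
    knn_path = Path(tmp) / "knn_demo.pkl"

    # Save both fitted objects, as in the cell above
    vec_path.write_bytes(pickle.dumps(vec))
    knn_path.write_bytes(pickle.dumps(knn))

    # Load them back, as a serving process would do at startup
    loaded_vec = pickle.loads(vec_path.read_bytes())
    loaded_knn = pickle.loads(knn_path.read_bytes())

# The restored objects answer queries the same way as the originals
query = loaded_vec.transform(["gurih goreng"]).toarray()
_, loaded_idx = loaded_knn.kneighbors(query)
_, original_idx = knn.kneighbors(vec.transform(["gurih goreng"]).toarray())
print(tags[loaded_idx[0][0]])
```

Both files are needed at inference time: the vectorizer turns the preference string into a vector, and the KNN model looks up the nearest foods for that vector.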