### Recipe Recommendation Algorithms
In this notebook we build recommendation systems for recipes. Based on what is available in our dataset, we will create the following recommendation algorithms.

1. `Demographic Filtering`

* In this algorithm we generalise over all recipes and recommend the same recipes to every user. We use it to recommend good recipes when the user has neither:
  * search history
  * recently liked recipes
* As soon as the user has either of these, we switch from this algorithm to one that is tailored to the user.

> Demographic filtering is a simple algorithm that recommends items on the premise that recipes which are popular and critically acclaimed have a higher probability of being liked by the average audience.



2. `Content Based Filtering`
* Further on, we create an algorithm that recommends recipes to a user who has:
  * search history about recipes
  * recently liked recipes
* This system uses recipe metadata, such as `category`, `author`, `description` and `difficult`, to make these recommendations.

> The general idea behind these recommender systems is that if a person liked a particular item, he or she will also like an item that is similar to it.

* For `Content Based Filtering` we are going to create `2` algorithms: one will be used for search and the other for recommendation.
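
The switch between the two approaches can be sketched as follows. This is a minimal illustration, not code from the notebook: `pick_strategy` and its arguments are hypothetical names, and it assumes that any non-empty search history or list of liked recipes is enough to move a user to content based filtering.

```python
def pick_strategy(search_history, liked_recipes):
    """Return the name of the recommender to use for one user (hypothetical helper)."""
    # With no signal about the user's taste, fall back to popular recipes.
    if not search_history and not liked_recipes:
        return "demographic"
    # Otherwise there is something to compare other recipes against.
    return "content_based"


new_user = pick_strategy(search_history=[], liked_recipes=[])
returning_user = pick_strategy(search_history=["salmon"], liked_recipes=[])
print(new_user, returning_user)
```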

### Data

The data used in this notebook was scraped from [bbcgoodfood.com](https://www.bbcgoodfood.com/); the scraping and cleaning process can be found in [these notebooks](https://github.com/CrispenGari/web-scrapping-python/tree/main/bs4/00_RECIPES). The data files can also be found on [my gists](https://gist.github.com/CrispenGari/794a10de80b0bc3f5ff3a7b99ebb88de). We will be working with the following files:

```shell
- recipes.json
- health.json
- baking.json
- budget.json
- inspiration.json
- flattened_recipes.json
```
### Data Preparation

The data that we are going to have here is in `json` format.
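
As a rough sketch of what one record might look like: the field names below are the columns that the code later in this notebook reads, but every value is made up for illustration, and the assumption that each file is a JSON array of such records is not confirmed by the source.

```python
import json

# Hypothetical sample: only the field names come from the notebook's code,
# all values are invented.
sample = """
[
  {
    "id": "00000000-0000-0000-0000-000000000000",
    "name": "Example recipe",
    "description": "A short description of the dish.",
    "rattings": 4.5,
    "vote_count": 12,
    "author": "Jane Doe",
    "difficult": "Easy",
    "subcategory": "Lunch",
    "dish_type": "Main course",
    "maincategory": "recipes"
  }
]
"""

records = json.loads(sample)
columns = sorted(records[0])
print(columns)
```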


### Imports
In the following code cell we are going to import all the packages that we are going to use in this notebook.


In [1]:
import pandas as pd
import numpy as np
import random
import time
import os
import json
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
from ast import literal_eval

from google.colab import drive, files

### Demographic Recommendation

Our system makes demographic recommendations per category, for the following categories:

```shell
- recipes
- health
- baking
- budget
- inspiration
```

So we need to create `5` different demographic models, each recommending recipes from its own category. Before anything else:


1. Choose a metric to `score` or rate each recipe.
2. Calculate the `score` for every recipe.
3. `Sort` the scores and recommend the best-rated recipes to the users.

> We could use the average rating of a recipe as the score, but that would not be fair: a recipe with a `4.3` average rating and only `3` votes cannot be considered better than a recipe with a `3.8` average rating and `40` votes. So we'll use `IMDB's weighted rating (wr)`, which is given as:

<p align="center"><img src="https://camo.githubusercontent.com/3210726e3fc7a95bd6b46a0a4557997c8f32350911442a32c8e929c8e131cd46/68747470733a2f2f696d6167652e6962622e636f2f6a59575a70392f77722e706e67" alt="img" /></p>
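
Written out in LaTeX, in the form that matches the `weighted_rating` method used in this notebook, the formula reads:

$$
WR = \frac{v}{v+m}\,R + \frac{m}{v+m}\,C
$$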


where:

* $v$ - is the number of votes for the recipe;
* $m$ - is the minimum votes required to be listed in the chart. We will use the `90th` percentile as our cutoff. In other words, for a recipe to feature in the charts, it must have more votes than at least `90%` of the recipes in the list;
* $R$ - is the average rating of the recipe; and
* $C$ - is the mean rating across the whole dataset.
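
To see how the formula treats the two recipes from the example above, here is a small worked calculation. The values of `m` and `C` are assumptions picked for illustration; in the notebook they are computed from the data.

```python
def weighted_rating(v, R, m, C):
    """Blend a recipe's own average rating R with the overall mean rating C."""
    return v / (v + m) * R + m / (v + m) * C


# Assumed values for illustration only.
m = 25    # hypothetical vote cutoff
C = 3.5   # hypothetical mean rating over all recipes

few_votes = weighted_rating(v=3, R=4.3, m=m, C=C)    # 4.3 average, 3 votes
many_votes = weighted_rating(v=40, R=3.8, m=m, C=C)  # 3.8 average, 40 votes
print(round(few_votes, 3), round(many_votes, 3))
```

With these assumed values the recipe with 40 votes ends up ahead (about 3.68 against about 3.59), because a score backed by only 3 votes is pulled strongly towards the overall mean.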




In [68]:
class DR:
  """Demographic recommender: ranks the recipes of one file by weighted rating."""
  def __init__(self, filename: str):
    self.filename = filename
    self.dataframe = pd.read_json(filename)
    # C: mean rating over all recipes ('rattings' is the column as named in the data)
    self.C = self.dataframe.rattings.mean()
    # m: vote count at the 90th percentile
    self.m = self.dataframe.vote_count.quantile(0.9)

  def weighted_rating(self, row):
    # Called once per row by DataFrame.apply(axis=1)
    v = row.vote_count
    R = row.rattings
    return (v/(v+self.m) * R) + (self.m/(self.m+v) * self.C)

  def __call__(self):
    self.dataframe['score'] = self.dataframe.apply(self.weighted_rating, axis=1)
    self.dataframe.sort_values('score', ascending=False, inplace=True)
    # ids of the 5 highest-scoring recipes
    return self.dataframe['id'].head().tolist()

DR('recipes.json')()


['c5aa4a2e-79bc-4c9c-af9b-52e244ec1220',
 'f174da06-5cec-4338-b268-317876fface8',
 'c771aeac-b2df-46f0-8abb-093a71bee95c',
 'c83a5362-cba4-4f59-a398-1058bdcf6300',
 '0cae51e1-3159-484f-9403-37fa80dea3b7']
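
The `DR` class scores every recipe in the file. The description of $m$ as a chart cutoff suggests a variant that first drops recipes with fewer than $m$ votes. The sketch below shows that variant in plain Python on made-up data. `top_recipes` is a hypothetical helper, not part of the notebook, and it assumes that `statistics.quantiles` with `method="inclusive"` is an acceptable stand-in for the linear interpolation pandas uses by default.

```python
from statistics import mean, quantiles


def top_recipes(records, n=5):
    """Return ids of the n best-scoring recipes among those with at least m votes."""
    C = mean(r["rattings"] for r in records)  # mean rating over all recipes
    votes = [r["vote_count"] for r in records]
    # Index 8 of the 9 decile cut points is the 90th percentile.
    m = quantiles(votes, n=10, method="inclusive")[8]

    def score(r):
        v, R = r["vote_count"], r["rattings"]
        return v / (v + m) * R + m / (v + m) * C

    qualified = [r for r in records if r["vote_count"] >= m]
    ranked = sorted(qualified, key=score, reverse=True)
    return [r["id"] for r in ranked[:n]]


# Made-up data: ten recipes with invented vote counts and ratings.
votes = [3, 5, 8, 10, 12, 15, 20, 40, 60, 100]
ratings = [4.3, 5.0, 4.0, 3.5, 4.5, 4.1, 3.9, 3.8, 4.2, 4.6]
toy = [
    {"id": f"r{i}", "vote_count": v, "rattings": R}
    for i, (v, R) in enumerate(zip(votes, ratings))
]

top = top_recipes(toy)
print(top)
```

On a set this small only the most-voted recipe clears the cutoff, so the result has a single id. On a real file roughly the top tenth of recipes by vote count would remain.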

### Content Based Filtering and Recommendation

Now that we have created a simple demographic recommendation algorithm for users who have neither `search-history` nor `recipes-liked`, we want to create other algorithms that recommend recipes based on `search-history` and `recipes-liked`, using the `description` and other metadata.

So we are going to use `2` algorithms: one that uses the description to recommend recipes and another that uses metadata. Both functions return a list of `recipe` ids that the system recommends.


In [64]:
def get_recommendations_from_description(name, filename):
  """
  The idea is that when you like or search for the recipe with this name,
  you probably want recipes similar to that one.
  """
  dataframe = pd.read_json(filename)
  # TF-IDF vectors of the descriptions, ignoring English stop words
  tfidf = TfidfVectorizer(stop_words='english')
  tfidf_matrix = tfidf.fit_transform(dataframe.description)
  # TF-IDF rows are L2-normalised by default, so the dot product is the cosine similarity
  cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
  # Map recipe name -> row index
  indices = pd.Series(dataframe.index, index=dataframe['name']).drop_duplicates()
  idx = indices[name]
  # Rank all recipes by similarity to the requested one
  sim_scores = list(enumerate(cosine_sim[idx]))
  sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
  # Top 11 entries; this usually includes the requested recipe itself (similarity 1.0)
  sim_scores = sim_scores[:11]
  recipe_indices = [i[0] for i in sim_scores]
  return dataframe['id'].iloc[recipe_indices].tolist()
get_recommendations_from_description("Smoked salmon, quinoa & dill lunch pot", "recipes.json")

['7bac6ec3-fb66-4543-80f5-4b692de7a811',
 '2eb91edf-06c5-4df6-bd8d-2f46cede0e02',
 '7b2562da-aabd-4fcd-9924-a924bcee1560',
 'bcd9ff50-69a5-4289-9aa5-f3a8f1bc9fd0',
 '6896cade-c070-4ef1-9863-149c4599c78a',
 'eafda57d-db94-4ed5-8a5c-fee0688b61e1',
 '3dca0ef0-5503-450f-bc4e-681b332adf7c',
 '25fb0357-f11d-49f1-8f43-27cedcabda69',
 '83773965-f6a2-44f9-a579-a812e1349caa',
 '4043cb11-be1c-47fd-9aa9-1df2ef322e21',
 '3e4f3819-86bf-4d55-984b-1b52957db766']
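
The function above calls `linear_kernel` (a plain dot product) yet names the result `cosine_sim`. That works because `TfidfVectorizer` L2-normalises each row by default, and the dot product of unit-length vectors equals their cosine similarity. A minimal check of that identity in plain Python, with arbitrary example vectors:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    # Scale the vector to unit length.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Arbitrary example vectors standing in for two rows of a term matrix.
a = [1.0, 2.0, 0.0]
b = [2.0, 1.0, 1.0]

plain = cosine(a, b)
via_dot = dot(l2_normalize(a), l2_normalize(b))
print(plain, via_dot)
```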

In [65]:
features = ['author', 'difficult', 'subcategory', 'dish_type', 'maincategory']
def clean_data(x):
  # Lowercase and remove spaces so that a multi-word value becomes a single token
  if isinstance(x, str):
    return x.replace(" ", "").lower()
  else:
    return ''

def create_soup(x):
  # Note: author and difficult are joined without a space, so they form one combined token
  return x['author'] + x['difficult'] + ' ' + x['subcategory']  + ' ' + x['dish_type']+ ' ' + x['maincategory']

def get_recommendations_from_meta_data(name, filename):
  dataframe = pd.read_json(filename)
  for feature in features:
    dataframe[feature] = dataframe[feature].apply(clean_data)
  # One "soup" string of metadata tokens per recipe
  dataframe['soup'] = dataframe.apply(create_soup, axis=1)
  # Token counts, then pairwise cosine similarity between recipes
  count = CountVectorizer(stop_words='english')
  count_matrix = count.fit_transform(dataframe.soup)
  cosine_sim = cosine_similarity(count_matrix, count_matrix)
  # Map recipe name -> row index
  indices = pd.Series(dataframe.index, index=dataframe.name)
  idx = indices[name]
  sim_scores = list(enumerate(cosine_sim[idx]))
  sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
  # Top 11 entries; this usually includes the requested recipe itself
  sim_scores = sim_scores[0:11]
  recipes_indices = [i[0] for i in sim_scores]
  return dataframe['id'].iloc[recipes_indices].tolist()
get_recommendations_from_meta_data("Smoked salmon, quinoa & dill lunch pot", "recipes.json")

['7bac6ec3-fb66-4543-80f5-4b692de7a811',
 'e2707dfa-22de-4290-9239-c2a4c64e285d',
 '945da5ed-e263-4d28-a791-8e94c3c5f57c',
 '25009e0a-0e60-4169-9f76-f464225549a9',
 '63ff8ff6-2871-4e92-96fd-39964c423a39',
 '9e62330f-5ea0-4c2a-b0c8-edf386b63fb7',
 'c095701c-a11e-4f47-90a0-a3ea20f7c752',
 'dfe9a8ed-576e-4d01-b2f9-b622d09da03b',
 '25ad8549-7f6c-47cc-9c4b-79df9d80e3a9',
 '7d28c52d-7c2b-450b-9e70-3f7e9319ff1e',
 '97ba01d2-9fbc-4b43-93da-5061256c99c4']
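
To make the metadata "soup" concrete, the sketch below runs the same cleaning and concatenation on a single hypothetical row (all values are invented). It also shows, as an alternative that the notebook does not use, what the tokens would look like if every feature were separated by a space.

```python
features = ['author', 'difficult', 'subcategory', 'dish_type', 'maincategory']


def clean_data(x):
    # Same normalisation as in the notebook: lowercase, no spaces, '' for non-strings.
    return x.replace(" ", "").lower() if isinstance(x, str) else ''


def create_soup(x):
    # Same concatenation as in the notebook (no space between author and difficult).
    return (x['author'] + x['difficult'] + ' ' + x['subcategory'] + ' '
            + x['dish_type'] + ' ' + x['maincategory'])


# Hypothetical row; the values are invented for illustration.
row = {'author': 'Jane Doe', 'difficult': 'Easy', 'subcategory': 'Lunch',
       'dish_type': 'Main course', 'maincategory': 'recipes'}
cleaned = {f: clean_data(row[f]) for f in features}

soup = create_soup(cleaned)
# Alternative (an assumption, not the notebook's behaviour): one token per feature.
spaced_soup = ' '.join(cleaned[f] for f in features)

print(soup.split())
print(spaced_soup.split())
```

In the first form the author and the difficulty only match another recipe when both agree at once. In the second form each can contribute to the similarity on its own.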

### Refs

1. [00_MOVIE_RECOMMENTATION_SYSTEM.ipynb](https://github.com/CrispenGari/recommentation-algorithms/blob/main/00_MOVIE_RECOMMENTATION_SYSTEM/00_MOVIE_RECOMMENTATION_SYSTEM.ipynb)