# 🦾 Prototyping models for Meet Fresh recommender - Product recommendations
The Meet Fresh product design includes both ingredient-level and product-level recommendations. Ingredient-level recommendation models are outlined separately. Here we build a proof of concept (POC) that recommends products by content-based filtering, using a customer-ingredient rating matrix and a product-ingredient feature matrix.

In [None]:
import pandas as pd
import re
import string
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense

from scipy import sparse
from sklearn.decomposition import PCA

import nltk
nltk.download('stopwords')
nltk.download('punkt')

%matplotlib inline

#### Step 1: Establish assumptions about ingredient ratings
For product-level recommendations, we use each product's ingredient components as the feature space. We therefore still rely on the customer-ingredient rating data, but updated to reflect each customer's final ingredient selection. Ingredient-level recommendations are provided first, based on the initial ingredient ratings; customers may then select additional recommended ingredients before proceeding to product recommendations.

In production, the product-level recommendation step would run on this updated customer-ingredient data. Because customers are not required to rate the ingredients recommended to them, we replace the ratings in the customer-ingredient interaction matrix with 1/0 indicators of whether an ingredient has been selected by the time the customer proceeds to the product recommendation stage.
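
As a minimal sketch of this 1/0 conversion, the toy example below pivots a few made-up ratings into a customer-by-ingredient table and marks every non-zero cell as selected. The customer IDs, ingredient names, and ratings are hypothetical, and the sketch uses pandas only, whereas the prototype cell further down does the same step with TensorFlow.

```python
import pandas as pd

# Hypothetical toy ratings in the same long format as the ratings table.
ratings = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "ingredient_name": ["Boba", "Taro", "Boba", "Sago"],
    "meetfresh_rating": [4.0, 2.5, 5.0, 3.0],
})

# One row per customer, one column per ingredient; unrated cells become 0.
wide = ratings.pivot_table(values="meetfresh_rating", index="customer_id",
                           columns="ingredient_name", aggfunc="mean", fill_value=0)

# Any non-zero rating counts as "selected" (1); everything else stays 0.
selected = wide.ne(0).astype(int)
print(selected)
```

One caveat of this encoding: a genuine rating of exactly 0 cannot be told apart from an unrated ingredient, so it would also end up as 0.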


In [None]:
%%bigquery ratings_orig_df
SELECT * FROM `dsxl-ai-advanced-program.meetfresh.ft_customer_ingredient_ratings`


In [None]:
ratings_orig_df['customer_id'] = ratings_orig_df['customer_id'].astype(int)
ratings_orig_df['meetfresh_rating'] = ratings_orig_df['meetfresh_rating'].astype(float)
ratings_orig_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500029 entries, 0 to 500028
Data columns (total 4 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   customer_id       500029 non-null  int64  
 1   ingredient_id     500029 non-null  object 
 2   ingredient_name   500029 non-null  object 
 3   meetfresh_rating  500029 non-null  float64
dtypes: float64(1), int64(1), object(2)
memory usage: 15.3+ MB


In [None]:
# for this prototype, we only need to turn the original ratings into 1/0 indicators
# in production, a real-time data streaming pipeline would capture the final ingredient selections made by customers
df = ratings_orig_df[['customer_id','ingredient_name','meetfresh_rating']]
customer_ingredients_df = df.pivot_table(values='meetfresh_rating', index=['customer_id'],
                                         columns='ingredient_name', aggfunc='mean', fill_value=0)

customer_ingredients = tf.constant(customer_ingredients_df, dtype = tf.float64)

customer_ingredients = tf.where(tf.not_equal(customer_ingredients, 0), tf.ones_like(customer_ingredients), customer_ingredients)
customer_ingredients

<tf.Tensor: shape=(155594, 22), dtype=float64, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 1., 1., 0.]])>

In [None]:
%%bigquery product_df
SELECT * FROM `dsxl-ai-advanced-program.meetfresh.dim_product`


In [None]:
# pre-process the product ingredient data: one row per (product, ingredient) pair
# .copy() avoids pandas' SettingWithCopyWarning when adding the new column below
prod_df = product_df[['product_name','product_ingredient_name']].copy()
prod_df['ingredients'] = prod_df['product_ingredient_name'].str.split(',')
prod_feature_df = prod_df.explode('ingredients')[['product_name','ingredients']]
prod_feature_df['ingredients'] = prod_feature_df['ingredients'].str.strip()

# clean up some value inconsistencies from primary dim_ingredient data table
prod_feature_df.loc[prod_feature_df['ingredients'] == 'Mini Q','ingredients'] = 'Mini Q (Mini Taro Ball)'
prod_feature_df.loc[prod_feature_df['ingredients'] == 'Coco Sago Soup','ingredients'] = 'Coco Sago'
prod_feature_df.loc[prod_feature_df['ingredients'] == 'Grass jelly','ingredients'] = 'Grass Jelly'

# after this step this dataframe only contains those with ingredients breakdown
prod_feature_df['indicator'] = 1
prod_feature_df = prod_feature_df.pivot_table(values = 'indicator', index=['product_name'],
                                         columns='ingredients', fill_value=0)
prod_feature_df


product_name,Almond Flakes,Almond Pudding,Almond Soup,Black Sugar Boba,Boba,Caramel Pudding,Chocolate Chip,Chocolate Chips,Chocolate Egg Waffle,Chocolate Syrup,...,Rice Balls,Sago,Sesame Rice Balls,Shaved Ice,Strawberry,Strawberry Syrup,Taro,Taro Balls,Taro Paste,Tofu Pudding
Black Sugar Boba Milky Shaved Ice,0,0,0,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Boba & Caramel Pudding,0,0,0,0,1,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Boba Tofu Pudding,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
Chocolate Chip Egg Waffle,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Chocolate Deluxe Egg Waffle,0,0,0,0,0,0,0,1,0,1,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Strawberry Milk Shaved Ice,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,1,0,0,0,0
Taro Ball Tofu Pudding,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,1
Taro Paste Volcano Shaved Ice,0,0,0,0,1,1,0,0,0,0,...,0,0,0,0,0,0,1,0,1,0
Taro Tofu Pudding,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,1


In [None]:
# create the product-ingredient feature indicator matrix
# we only include products with ingredient breakdown data, and keep exactly the primary ingredients
# found in customer_ingredients_df, in the same column order, so the two matrices line up for the matmul
prod_feature_df = prod_feature_df.reindex(columns=customer_ingredients_df.columns, fill_value=0)
product_ingredients = tf.constant(prod_feature_df, dtype = tf.float64)
product_ingredients

<tf.Tensor: shape=(65, 22), dtype=float64, numpy=
array([[0., 0., 1., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.],
       ...,
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 0., 0.]])>
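
The matrix product in Step 2 is only meaningful if both matrices list the same ingredients in the same column order. The following is a small sketch of one way to enforce that with `DataFrame.reindex`; the customer IDs, product names, and ingredient columns are hypothetical and chosen only to show a reordered, an extra, and a missing column.

```python
import pandas as pd

# Hypothetical toy matrices; the names are illustrative only.
customers = pd.DataFrame(
    [[1, 0, 1], [0, 1, 0]],
    index=[101, 102],
    columns=["Boba", "Sago", "Taro"],
)
products = pd.DataFrame(
    [[1, 1, 0], [0, 1, 1]],
    index=["Product A", "Product B"],
    columns=["Taro", "Boba", "Whipped Cream"],  # different order, one extra, one missing
)

# Reindex on the customer columns: drops "Whipped Cream", adds "Sago" as all zeros,
# and puts the columns in the customer matrix's order.
aligned = products.reindex(columns=customers.columns, fill_value=0)

assert list(aligned.columns) == list(customers.columns)
print(aligned)
```

An ingredient that no product contains simply becomes an all-zero column, so it contributes nothing to any product score.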

#### Step 2: Determine product recommendations

In [None]:
# make sure that the dimensions match (both matrices must have the same number of ingredient columns)
print(f'customer_ingredients type is: {type(customer_ingredients)}, with dimension {customer_ingredients.shape}', '\n',
      f'product_ingredients type is: {type(product_ingredients)}, with dimension {product_ingredients.shape}')

customer_ingredients type is: <class 'tensorflow.python.framework.ops.EagerTensor'>, with dimension (155594, 22) 
 product_ingredients type is: <class 'tensorflow.python.framework.ops.EagerTensor'>, with dimension (65, 22)


In [None]:
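# next compute the customer-product score matrix from the two matrices defined above:
# each raw score is the number of selected ingredients that a product contains
# then normalize each customer's scores to sum to 1; divide_no_nan returns 0 instead of NaN
# for customers whose scores are all zero
predicted_products = tf.matmul(customer_ingredients, tf.transpose(product_ingredients))
predicted_products = tf.math.divide_no_nan(predicted_products, tf.reduce_sum(predicted_products, axis=1, keepdims=True))
predicted_products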

<tf.Tensor: shape=(155594, 65), dtype=float64, numpy=
array([[0.        , 0.03030303, 0.03030303, ..., 0.03030303, 0.        ,
        0.03030303],
       [0.        , 0.03030303, 0.03030303, ..., 0.03030303, 0.        ,
        0.03030303],
       [0.        , 0.03571429, 0.03571429, ..., 0.03571429, 0.        ,
        0.03571429],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.02      , 0.02      , ..., 0.02      , 0.        ,
        0.02      ],
       [0.01587302, 0.        , 0.        , ..., 0.04761905, 0.        ,
        0.        ]])>

Unlike the ingredient-level recommendations, where customers provide ratings, we do not need to mask any products here, because customers do not rate products at any point in the process. All product-level scores can be sorted and presented to customers as computed.
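
To make the scoring and sorting concrete, here is a toy sketch of the same three operations (matrix product, row normalization, top-k selection). It uses NumPy as a stand-in for the `tf.matmul`, `tf.reduce_sum`, and `tf.nn.top_k` calls in this notebook, and every matrix value and product name in it is hypothetical.

```python
import numpy as np

# Hypothetical toy data: 3 customers x 3 ingredients, 4 products x 3 ingredients.
customer_ingredients = np.array([
    [1, 0, 1],   # selected ingredients 0 and 2
    [0, 1, 0],   # selected ingredient 1
    [0, 0, 0],   # selected nothing
], dtype=float)
product_ingredients = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
], dtype=float)
product_names = np.array(["P0", "P1", "P2", "P3"])  # hypothetical names

# Raw score = number of ingredients shared by the customer and the product.
scores = customer_ingredients @ product_ingredients.T

# Normalize each row to sum to 1, leaving all-zero rows at 0 instead of NaN.
row_sums = scores.sum(axis=1, keepdims=True)
normalized = np.divide(scores, row_sums, out=np.zeros_like(scores), where=row_sums != 0)

# Indices of the k highest scores per customer; a stable sort keeps the
# lower index first when scores tie.
k = 2
top_k = np.argsort(-normalized, axis=1, kind="stable")[:, :k]
print(product_names[top_k])
```

Because the features are binary, many products tie on score, and ties fall back to index order. A customer with no overlapping ingredients (the last row) therefore just receives the first `k` products, which is worth handling separately in a production system.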

In [None]:
# for each customer, grab the indices of the top num_recommendations products by score
# note: when scores tie, tf.nn.top_k returns the lower-index product first
num_recommendations = 10
top_products = tf.nn.top_k(predicted_products, k=num_recommendations).indices
top_products

<tf.Tensor: shape=(155594, 10), dtype=int32, numpy=
array([[29, 30, 31, ..., 15, 16, 17],
       [29, 30, 31, ..., 15, 16, 17],
       [ 1,  2,  9, ..., 20, 21, 22],
       ...,
       [13, 29, 30, ...,  2,  3,  4],
       [15, 25, 33, ..., 27, 28, 29],
       [13, 62,  6, ..., 20, 21, 23]], dtype=int32)>

In [None]:
# show the recommended product names for the first 10 customers
customer_list = customer_ingredients_df.reset_index()['customer_id']
product_list = prod_feature_df.index

for i in range(10):
    product_names = [product_list[index] for index in top_products[i].numpy()]
    print('customer_id {}: {}'.format(customer_list[i], product_names))

customer_id 1: ['Icy Grass Jelly Combo A', 'Icy Grass Jelly Combo B', 'Icy Grass Jelly Combo C', 'Boba & Caramel Pudding', 'Boba Tofu Pudding', 'Cold Coco Sago Soup Signature', 'Double Taro Signature', 'Hot Almond Soup Combo A', 'Hot Almond Soup Combo B', 'Hot Almond Soup Combo C']
customer_id 2: ['Icy Grass Jelly Combo A', 'Icy Grass Jelly Combo B', 'Icy Grass Jelly Combo C', 'Boba & Caramel Pudding', 'Boba Tofu Pudding', 'Cold Coco Sago Soup Signature', 'Double Taro Signature', 'Hot Almond Soup Combo A', 'Hot Almond Soup Combo B', 'Hot Almond Soup Combo C']
customer_id 4: ['Boba & Caramel Pudding', 'Boba Tofu Pudding', 'Cold Coco Sago Soup Signature', 'Hot Almond Soup Combo A', 'Hot Almond Soup Combo B', 'Hot Almond Soup Combo C', 'Hot Almond Soup Signature', 'Hot Grass Jelly Soup Combo A', 'Hot Grass Jelly Soup Combo B', 'Hot Grass Jelly Soup Combo C']
customer_id 5: ['Icy Grass Jelly Combo A', 'Icy Grass Jelly Combo B', 'Icy Grass Jelly Combo C', 'Boba & Caramel Pudding', 'Boba Tof