<a href="https://colab.research.google.com/github/ben-ogden/pinecone-examples/blob/main/workshop-product_recommender.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Product Recommendation Engine

Learn how to build a product recommendation engine using collaborative filtering and Pinecone.

In this example, we will generate product recommendations for ecommerce customers based on previous orders and trending items. This example covers preparing the vector embeddings, creating and deploying the Pinecone service, writing data to Pinecone, and finally querying Pinecone to receive a ranked list of recommended products.

---

🚨 _Note that running this on CPU is slow! If you are running on Google Colab, go to **Runtime > Change runtime type > Hardware accelerator > GPU** to switch to GPU._

---

## Data Preparation

In [5]:
!pip install -qU numpy pandas scipy

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
bigframes 0.18.0 requires pandas<2.1.4,>=1.5.0, but you have pandas 2.1.4 which is incompatible.
google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 2.1.4 which is incompatible.

**Import Python Libraries**

In [6]:
import os
import time
import numpy as np
import pandas as pd
import scipy.sparse as sparse
import itertools

In [7]:
!pip install -qU kaggle

  Preparing metadata (setup.py) ... done
  Building wheel for kaggle (setup.py) ... done


In [8]:
try:
    import kaggle
except OSError as e:
    print(e)

Could not find kaggle.json. Make sure it's located in /root/.kaggle. Or use the environment method.


The first time you `import kaggle` you will see an `OSError` because your Kaggle credentials have not yet been added to the `/root/.kaggle/kaggle.json` file. You can find these credentials on [Kaggle](https://kaggle.com) by opening your account settings from your profile in the top-right corner of the page and creating an API token. This will download a `kaggle.json` file which contains your username and secret key. The cell below reads both values from Colab secrets named `KAGGLE_USERNAME` and `KAGGLE_KEY` and writes the file:

In [9]:
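import json
import os
from google.colab import userdata

# Make sure the config directory exists before writing the credentials file
os.makedirs('/root/.kaggle', exist_ok=True)

with open('/root/.kaggle/kaggle.json', 'w') as fp:
    json.dump(
        {
            "username": userdata.get('KAGGLE_USERNAME'),
            "key": userdata.get('KAGGLE_KEY')
        },
        fp
    )

# Restrict the file to the current user so the key is not readable by others
os.chmod('/root/.kaggle/kaggle.json', 0o600)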
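
The error message above also mentions an "environment method". The following is a minimal sketch of that alternative, assuming the Kaggle client picks up the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables when it is imported. `configure_kaggle_env` is a hypothetical helper name and the credentials shown are placeholders.

```python
import os


def configure_kaggle_env(username: str, key: str) -> None:
    """Expose Kaggle credentials through environment variables.

    Hypothetical helper: call it before `import kaggle` so that the client
    can authenticate without a kaggle.json file (assumed behaviour).
    """
    if not username or not key:
        raise ValueError("Both a Kaggle username and an API key are required.")
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


# Placeholder values; in Colab these could come from userdata.get(...)
configure_kaggle_env("your-username", "your-api-key")
```

With this approach no credentials file is written to disk, which can be convenient in short-lived notebook sessions.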

Now we can download the dataset:

In [10]:
!kaggle competitions download -c instacart-market-basket-analysis

Downloading instacart-market-basket-analysis.zip to /content
 99% 195M/196M [00:07<00:00, 27.1MB/s]
100% 196M/196M [00:07<00:00, 27.5MB/s]


This downloads a single zip archive that contains further zipped CSV files. We extract the outer archive first and then each of the inner ones:

In [11]:
import zipfile

files = [
    'instacart-market-basket-analysis.zip',
    'order_products__train.csv.zip',
    'order_products__prior.csv.zip',
    'products.csv.zip',
    'orders.csv.zip'
]

for filename in files:
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall('./')

Now we can move on to loading the dataset.

**Load the (Example) Instacart Data**

We are going to use the [Instacart Market Basket Analysis](https://www.kaggle.com/c/instacart-market-basket-analysis/data) dataset for this task.

The data used throughout this example is a set of files describing customers' orders over time. The main focus is on the `orders.csv` file, where each row links a user to an order: it records the `user_id` (the user who placed the order) and the `order_id`. Note that this table contains no information about products. The products belonging to each order are stored in the `order_products__*.csv` files.

In [12]:
order_products_train = pd.read_csv('order_products__train.csv')
order_products_prior = pd.read_csv('order_products__prior.csv')
products = pd.read_csv('products.csv')
orders = pd.read_csv('orders.csv')

order_products = pd.concat([order_products_train, order_products_prior])

**Preparing data for the model**


The Collaborative Filtering model used in this example requires only users’ historical preferences on a set of items. As there is no explicit rating in the data we are using, the purchase quantity can represent a “confidence” in terms of how strong the interaction was between the user and the products.

The `data` dataframe will store these confidence values and serve as the basis for the model.

In [13]:
customer_order_products = pd.merge(orders, order_products, how='inner',on='order_id')

# creating a table with "confidences"
data = customer_order_products.groupby(['user_id', 'product_id'])[['order_id']].count().reset_index()
data.columns=["user_id", "product_id", "total_orders"]
data.product_id = data.product_id.astype('int64')

# Create a lookup frame so we can get the product names back in readable form later.
products_lookup = products[['product_id', 'product_name']].drop_duplicates()
products_lookup['product_id'] = products_lookup.product_id.astype('int64')
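
To make the "confidence" idea concrete, here is a small standard-library sketch of what the `groupby(...).count()` step computes. The user and product IDs are toy values, not taken from the Instacart data.

```python
from collections import Counter

# Toy (user_id, product_id) pairs, one per product line in an order.
order_lines = [
    (1, 101), (1, 101), (1, 205),
    (2, 101), (2, 309), (2, 309), (2, 309),
]

# Count how often each user bought each product; this count plays the
# role of the `total_orders` confidence column.
confidence = Counter(order_lines)

rows = sorted((user, product, n) for (user, product), n in confidence.items())
for user, product, n in rows:
    print(user, product, n)
```

Each printed row corresponds to one row of `data`: a user, a product, and the number of orders in which that user bought that product.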

We will create two prototype users here and add them to our `data` dataframe. Each user buys only specific products:
- The first user will be buying only **Mineral Water**
- The second user will be buying baby products: **No More Tears Baby Shampoo** and **Baby Wash & Shampoo**

These users will be used later to query the model and examine its results.

In [14]:
data_new = pd.DataFrame([[data.user_id.max() + 1, 22802, 97],
                         [data.user_id.max() + 2, 26834, 89],
                         [data.user_id.max() + 2, 12590, 77]
                        ], columns=['user_id', 'product_id', 'total_orders'])
data_new

| | user_id | product_id | total_orders |
|---|---|---|---|
| 0 | 206210 | 22802 | 97 |
| 1 | 206211 | 26834 | 89 |
| 2 | 206211 | 12590 | 77 |


In [15]:
data = pd.concat([data, data_new]).reset_index(drop = True)
data.tail()

| | user_id | product_id | total_orders |
|---|---|---|---|
| 13863744 | 206209 | 48697 | 1 |
| 13863745 | 206209 | 48742 | 2 |
| 13863746 | 206210 | 22802 | 97 |
| 13863747 | 206211 | 26834 | 89 |
| 13863748 | 206211 | 12590 | 77 |


In the next step, we extract the unique user and item IDs in order to create a CSR (Compressed Sparse Row) matrix.


In [16]:
users = list(np.sort(data.user_id.unique()))
items = list(np.sort(products.product_id.unique()))
purchases = list(data.total_orders)

# map zero-based index positions to user IDs
index_to_user = pd.Series(users)

# reverse mapping from user ID to row position in the sparse matrices below;
# the matrices are indexed by the raw ID (row 0 is unused), so the + 1 relies
# on user IDs being contiguous and starting at 1
user_to_index = pd.Series(data=index_to_user.index + 1, index=index_to_user.values)

# map zero-based index positions to item IDs
index_to_item = pd.Series(items)

# reverse mapping from item ID to zero-based index position
item_to_index = pd.Series(data=index_to_item.index, index=index_to_item.values)

# Get the rows and columns for our new matrix
products_rows = data.product_id.astype(int)
users_cols = data.user_id.astype(int)

# Create a sparse matrix for our users and products containing number of purchases
sparse_product_user = sparse.csr_matrix((purchases, (products_rows, users_cols)), shape=(len(items) + 1, len(users) + 1))
sparse_product_user.data = np.nan_to_num(sparse_product_user.data, copy=False)

sparse_user_product = sparse.csr_matrix((purchases, (users_cols, products_rows)), shape=(len(users) + 1, len(items) + 1))
sparse_user_product.data = np.nan_to_num(sparse_user_product.data, copy=False)
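
For readers unfamiliar with the CSR layout, the sketch below builds the three underlying arrays by hand from a few toy triples. It is only an illustration of the format under the assumption that there are no duplicate entries; `to_csr` is a hypothetical helper, and `scipy.sparse.csr_matrix` does this work in the cell above.

```python
def to_csr(triples, n_rows):
    """Build CSR arrays (data, indices, indptr) from (row, col, value) triples."""
    data, indices, indptr = [], [], [0]
    by_row = sorted(triples)  # group entries by row, then by column
    pos = 0
    for row in range(n_rows):
        # Consume every entry that belongs to the current row
        while pos < len(by_row) and by_row[pos][0] == row:
            _, col, value = by_row[pos]
            indices.append(col)
            data.append(value)
            pos += 1
        # indptr[row + 1] marks where this row's entries end
        indptr.append(len(data))
    return data, indices, indptr


# Toy user x product purchase counts: (user_row, product_col, total_orders)
triples = [(0, 2, 5), (2, 0, 1), (2, 3, 7), (0, 0, 3)]
data, indices, indptr = to_csr(triples, n_rows=3)

# The non-zero entries of row r live in data[indptr[r]:indptr[r + 1]]
row2 = list(zip(indices[indptr[2]:indptr[3]], data[indptr[2]:indptr[3]]))
```

Only the non-zero purchase counts are stored, which is what makes a matrix with hundreds of thousands of users and tens of thousands of products fit in memory.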

## Implicit Model

In this section we will demonstrate how to create and train a recommender model using the **implicit** library. The recommendation model is based on the algorithms described in the paper [Collaborative Filtering for Implicit Feedback Datasets](https://www.researchgate.net/publication/220765111_Collaborative_Filtering_for_Implicit_Feedback_Datasets) with performance optimizations described in [Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.379.6473&rep=rep1&type=pdf).
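
As a brief reminder of the formulation in the first paper, the raw interaction count $r_{ui}$ (here `total_orders`) is split into a binary preference $p_{ui}$ and a confidence $c_{ui}$ scaled by a constant $\alpha$ (which corresponds to `alpha_val` in the training cell, assuming the library follows the paper's convention). The user factors $x_u$ and item factors $y_i$ minimise a confidence-weighted squared error with regularization strength $\lambda$. How the library applies these terms internally may differ in detail.

$$
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha \, r_{ui}
$$

$$
\min_{x, y} \; \sum_{u, i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
+ \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
$$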


In [17]:
!pip install -qU implicit

In [18]:
import implicit
from implicit import evaluation

# split data into train and test sets
train_set, test_set = evaluation.train_test_split(sparse_user_product, train_percentage=0.9)

# initialize a model
model = implicit.als.AlternatingLeastSquares(factors=100,
                                             regularization=0.05,
                                             iterations=50,
                                             num_threads=1)

alpha_val = 15
train_set = (train_set * alpha_val).astype('double')

# train the model on a sparse user-item matrix of confidence weights
model.fit(train_set, show_progress = True)


We will evaluate the model using the library's built-in `ranking_metrics_at_k` function:

In [19]:
test_set = (test_set * alpha_val).astype('double')
evaluation.ranking_metrics_at_k(model, train_set, test_set, K=100,
                         show_progress=True, num_threads=1)

{'precision': 0.27462592032041505,
 'map': 0.04413022171027158,
 'ndcg': 0.1437475336225694,
 'auc': 0.6547401854375167}
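
As a rough guide to reading the `precision` figure, here is a sketch of the textbook precision-at-k definition. This is an assumption about what the metric measures in general; the library's own implementation may normalise differently (for example, by the number of held-out items when that is smaller than K). The product IDs are toy values.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


recommended = [22802, 13176, 24852, 21137, 47209]  # ranked model output (toy IDs)
held_out = {24852, 47209, 99999}                   # items in the user's test set

p_at_5 = precision_at_k(recommended, held_out, k=5)
print(p_at_5)
```

Two of the five recommended items appear in the held-out set here, so the sketch reports a precision of two fifths for this user.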

This is what item and user factors look like. These vectors will be stored in our vector index later and used for recommendation.

In [20]:
model.item_factors[1:3]

array([[ 4.86363890e-03, -3.51347681e-03,  1.18922982e-02,
         2.12346949e-02, -1.93563628e-03, -6.15045009e-03,
        -1.24394323e-03,  2.65446529e-02, -6.29687263e-03,
         3.46705958e-04, -1.04795815e-02,  2.46070325e-02,
         1.42868422e-02,  1.31567428e-02, -7.17208674e-03,
         2.58469436e-06,  1.42325405e-02,  5.96568454e-03,
        -5.15353028e-03,  1.36528471e-02,  5.95494593e-03,
        -1.31513993e-03,  1.23108143e-03, -1.14448536e-02,
        -5.26455755e-04,  9.00922343e-03,  2.08124574e-02,
         5.25700627e-03,  3.28439474e-02,  1.42087834e-02,
         9.07266606e-03, -6.66492386e-03,  1.50988214e-02,
         3.08601838e-03, -4.65596328e-03, -1.35251954e-02,
         2.04247087e-02,  1.23158731e-02,  1.45075023e-02,
         1.15914587e-02,  2.01413296e-02,  5.27187112e-05,
         6.97884941e-03, -2.80817039e-03,  1.56567097e-02,
         1.22134145e-02, -7.75416754e-03,  2.11812044e-03,
        -3.05169318e-02,  4.36338101e-04,  3.43702571e-0

In [21]:
model.user_factors[1:3]

array([[-1.87397277e+00, -1.41261125e+00, -5.46095550e-01,
        -5.87379932e-01, -1.17200232e+00, -8.91933858e-01,
        -6.40960455e-01,  1.64989963e-01, -1.29865468e-01,
         4.71674055e-01, -1.03679240e+00,  2.37171960e+00,
         4.82920229e-01,  6.59824073e-01,  5.32044992e-02,
         1.51624000e+00,  2.87712485e-01,  5.31544685e-01,
        -4.38841552e-01,  9.51839328e-01,  2.12342978e+00,
         1.03661251e+00, -4.48441565e-01, -2.10318065e+00,
         5.04051387e-01,  1.40737224e+00, -6.95613682e-01,
         7.90376216e-02,  9.29752290e-01,  7.99878836e-01,
         3.67368549e-01, -1.10910773e+00,  7.53536403e-01,
         4.23777670e-01, -1.11155534e+00, -1.46048427e+00,
         1.44817460e+00, -1.53088540e-01, -7.16688573e-01,
         2.54461952e-02,  1.23308174e-01,  7.78118491e-01,
        -1.25791475e-01,  9.74291787e-02,  5.14509618e-01,
         6.85907364e-01, -1.58471131e+00, -2.02171254e+00,
        -1.09998953e+00,  6.63737535e-01, -7.10377991e-0

## Configure Pinecone

Install and set up Pinecone.

In [22]:
# pinecone.init(...) and pinecone.Index(index_name=...) used below belong to the v2 client API
!pip install -qU "pinecone-client<3"


In [24]:
import pinecone

In [25]:
# Load Pinecone API key
api_key = userdata.get('PINECONE_API_KEY')
# Set the Pinecone environment; you can find it next to the API key in the console
env = userdata.get('PINECONE_ENVIRONMENT')

pinecone.init(api_key=api_key, environment=env)

[Get a Pinecone API key](http://app.pinecone.io/) if you don't have one.

In [26]:
# List all indexes associated with your key; this should be empty on the first run
pinecone.list_indexes()

[]

**Create an Index**

In [27]:
# Set a name for your index
index_name = 'product-recommender'

In [28]:
# Delete any existing index with the same name, then create a fresh one
if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)
pinecone.create_index(name=index_name, dimension=100)

**Connect to the new index**

In [29]:
index = pinecone.Index(index_name=index_name)

## Load Data

We now upload all items (products that one can buy) and display a few examples of products with their vector representations.


In [30]:
!pip install -qU torch


In [31]:
import torch

# Get all of the items
all_items_titles = [{'title': title} for title in products_lookup['product_name']]
all_items_ids = [str(product_id) for product_id in products_lookup['product_id']]

# Transform items into factors
items_factors = model.item_factors

device = "cuda" if torch.cuda.is_available() else "cpu"

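# On a GPU build, implicit keeps the factors in a GPU matrix that must be copied to host memory first
if hasattr(items_factors, "to_numpy"):
    items_factors = items_factors.to_numpy()

# Skip row 0: product IDs start at 1, so the first row of the factor matrix is unused
item_embeddings = items_factors[1:].tolist()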

# Prepare item factors for upload
items_to_insert = list(zip(all_items_ids, item_embeddings, all_items_titles))
display(items_to_insert[:2])

[('1',
  [0.004863638896495104,
   -0.003513476811349392,
   0.011892298236489296,
   0.021234694868326187,
   -0.001935636275447905,
   -0.006150450091809034,
   -0.0012439432321116328,
   0.02654465287923813,
   -0.00629687262699008,
   0.00034670595778152347,
   -0.010479581542313099,
   0.024607032537460327,
   0.014286842197179794,
   0.013156742788851261,
   -0.007172086741775274,
   2.5846943572105374e-06,
   0.014232540503144264,
   0.005965684540569782,
   -0.005153530277311802,
   0.013652847148478031,
   0.0059549459256231785,
   -0.0013151399325579405,
   0.0012310814345255494,
   -0.011444853618741035,
   -0.0005264557548798621,
   0.0090092234313488,
   0.02081245742738247,
   0.005257006268948317,
   0.032843947410583496,
   0.014208783395588398,
   0.009072666056454182,
   -0.00666492385789752,
   0.015098821371793747,
   0.0030860183760523796,
   -0.004655963275581598,
   -0.01352519541978836,
   0.02042470872402191,
   0.012315873056650162,
   0.014507502317428589,
  

**Insert items into the index**

In [32]:
from tqdm.auto import tqdm

BATCH_SIZE = 100

print('Index statistics before upsert:', index.describe_index_stats())

for i in tqdm(range(0, len(items_to_insert), BATCH_SIZE)):
    index.upsert(vectors=items_to_insert[i:i+BATCH_SIZE])

print('Index statistics after upsert:', index.describe_index_stats())

Index statistics before upsert: {'dimension': 100,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}


Index statistics after upsert: {'dimension': 100,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 49688}},
 'total_vector_count': 49688}
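
The upsert loop sends the vectors in slices of `BATCH_SIZE`. The same slicing pattern can be factored into a small generator, sketched below with placeholder data instead of real vectors. `batched` is a hypothetical helper name and is not part of the Pinecone client.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Same counts as the notebook: 49,688 vectors in batches of 100.
fake_vectors = list(range(49688))
batches = list(batched(fake_vectors, 100))
print(len(batches), len(batches[-1]))
```

For 49,688 items and a batch size of 100 this yields 497 batches, the last of which holds the remaining 88 items.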


This helper function is used to analyse the recommendations later.
It returns the top N products that a user bought in the past, ranked by how many times the user ordered each product.

In [33]:
def products_bought_by_user_in_the_past(user_id: int, top: int = 10):

    selected = data[data.user_id == user_id].sort_values(by=['total_orders'], ascending=False)

    selected['product_name'] = selected['product_id'].map(products_lookup.set_index('product_id')['product_name'])
    selected = selected[['product_id', 'product_name', 'total_orders']].reset_index(drop=True)
    if selected.shape[0] < top:
        return selected

    return selected[:top]


## Query for Recommendations

We now retrieve the user factors for the two prototype users we created earlier, plus one arbitrary existing user, and display them so you can see what these factors look like.

In [68]:
user_ids = [206210, 206211, 111111]
user_factors = model.user_factors[user_to_index[user_ids]]

display(user_factors)

array([[-0.7706813 , -0.6228603 ,  0.25665346, -0.5694573 , -0.05912157,
        -0.88797593, -0.5554997 ,  0.27156895,  0.61488545,  0.2805311 ,
        -0.2396878 ,  0.7415056 , -0.40337327,  0.4912151 ,  0.25589284,
         0.5227414 , -0.7331012 ,  1.0916723 , -0.24043101, -0.52741635,
         0.16291273,  0.22507975, -0.31680998, -1.4631283 , -0.20690207,
         0.4915874 , -0.11645824, -0.5647357 ,  0.18124792, -0.03952505,
         0.06562759, -0.49943838,  0.27199635,  0.5223039 , -0.6681625 ,
        -0.5360947 , -0.08303327, -0.67583114, -0.10948445,  0.21020076,
         0.5627407 ,  0.5052254 ,  0.20957701,  0.5437263 , -0.27449813,
         0.4695584 , -1.1682717 , -0.5499969 , -0.40575063,  1.1634142 ,
        -0.3060769 , -0.16221562,  0.5601063 , -1.3281413 ,  0.26473796,
         0.08705077,  0.10177101, -0.6210664 , -0.03426556, -0.1103953 ,
         0.84007764,  0.13313206, -0.04408165, -0.09110891,  0.49866924,
        -0.89355206,  0.11108791, -0.06239279, -0.2

### Model recommendations

We will now retrieve recommendations from our model directly, just to have these results as a baseline.

In [75]:
print("Model recommendations\n")

start_time = time.process_time()
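# Pass each user's own row of the user-item matrix. Filtering of already-purchased
# items is disabled so that the results are comparable with the index query later,
# which also returns previously purchased products.
recommendations0 = model.recommend(userid=user_ids[0], user_items=sparse_user_product[user_ids[0]], filter_already_liked_items=False)
recommendations1 = model.recommend(userid=user_ids[1], user_items=sparse_user_product[user_ids[1]], filter_already_liked_items=False)
recommendations2 = model.recommend(userid=user_ids[2], user_items=sparse_user_product[user_ids[2]], filter_already_liked_items=False)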
print("Time needed for retrieving recommended products: " + str(time.process_time() - start_time) + ' seconds.\n')

print('\nRecommendations for person 0:')
for recommendation in recommendations0[0]:
    print(products_lookup[products_lookup.product_id == recommendation]['product_name'].values)

print('\nRecommendations for person 1:')
for recommendation in recommendations1[0]:
    print(products_lookup[products_lookup.product_id == recommendation]['product_name'].values)

print('\nRecommendations for person 2:')
for recommendation in recommendations2[0]:
    print(products_lookup[products_lookup.product_id == recommendation]['product_name'].values)

Model recommendations

Time needed for retrieving recommended products: 0.03411008400053106 seconds.


Recommendations for person 0:
['Sparkling Water']
['Mineral Water']
['Smartwater']
['Sparkling Natural Mineral Water']
['Soda']
['Distilled Water']
['Coconut Water']
['Zero Calorie Cola']
['Organic Coconut Water']
['Orange & Lemon Flavor Variety Pack Sparkling Fruit Beverage']

Recommendations for person 1:
['Baby Wipes Sensitive']
['Vitamin D Organic Whole Milk']
['Organic Whole Milk with DHA Omega-3']
['Free and Gentle High Efficiency Liquid Laundry Detergent']
['YoBaby Peach Pear Yogurt']
['Red Raspberries']
['Whole Almonds']
['Free and Gentle Liquid Laundry Detergent']
['YoKids Blueberry & Strawberry/Vanilla Yogurt']
['Organic Ezekiel 4:9 Sesame Bread']

Recommendations for person 2:
['Coke Classic']
['Iceberg Lettuce']
['Large Lemon']
['Whole Milk']
['Soda']
['Fridge Pack Cola']
['2% Reduced Fat Milk']
['Cucumber Kirby']
['Limes']
['Cola']


### Query the index

Let's now query the index to check how quickly we retrieve results. Please note that query speed depends in part on your internet connection.

In [77]:
# Query by user factors
user_embeddings = user_factors.tolist()

start_time = time.process_time()
query_results = index.query(queries=user_embeddings, top_k=10, include_metadata=True)
print("Time needed for retrieving recommended products using Pinecone: " + str(time.process_time() - start_time) + ' seconds.\n')

for _id, res in zip(user_ids, query_results.results):
    print(f'user_id={_id}')
    df = pd.DataFrame(
        {
            'products': [match.metadata['title'] for match in res.matches],
            'scores': [match.score for match in res.matches]
        }
    )
    print("Recommendation: ")
    display(df)
    print("Top buys from the past: ")
    display(products_bought_by_user_in_the_past(_id, top=15))

Time needed for retrieving recommended products using Pinecone: 0.08365781399970729 seconds.

user_id=206210
Recommendation: 


| | products | scores |
|---|---|---|
| 0 | Mineral Water | 0.91313 |
| 1 | Sparkling Water | 0.657853 |
| 2 | Zero Calorie Cola | 0.647207 |
| 3 | Orange & Lemon Flavor Variety Pack Sparkling F... | 0.612793 |
| 4 | Organic Coconut Water | 0.598766 |
| 5 | Tall Kitchen Bag With Febreze Odor Shield | 0.588826 |
| 6 | Popcorn | 0.586775 |
| 7 | XL Pick-A-Size Paper Towel Rolls | 0.568167 |
| 8 | Extra Fancy Unsalted Mixed Nuts | 0.558868 |
| 9 | Organic Variety Pack | 0.553084 |


Top buys from the past: 


| | product_id | product_name | total_orders |
|---|---|---|---|
| 0 | 22802 | Mineral Water | 97 |


user_id=206211
Recommendation: 


| | products | scores |
|---|---|---|
| 0 | Baby Wash & Shampoo | 0.738772 |
| 1 | No More Tears Baby Shampoo | 0.683696 |
| 2 | Size 5 Cruisers Diapers Super Pack | 0.555848 |
| 3 | Size 6 Baby Dry Diapers | 0.542459 |
| 4 | Baby Wipes Sensitive | 0.539 |
| 5 | White Buttermints | 0.503045 |
| 6 | Strawberry Yogurt Melts | 0.471988 |
| 7 | Stage 1 Newborn Hypoallergenic Liquid Detergent | 0.471483 |
| 8 | Head-to-Toe Baby Wash | 0.462627 |
| 9 | 1pk 270ct Refill | 0.461261 |


Top buys from the past: 


| | product_id | product_name | total_orders |
|---|---|---|---|
| 0 | 26834 | No More Tears Baby Shampoo | 89 |
| 1 | 12590 | Baby Wash & Shampoo | 77 |


user_id=111111
Recommendation: 


| | products | scores |
|---|---|---|
| 0 | Coke Classic | 0.581112 |
| 1 | Banana | 0.523755 |
| 2 | Large Lemon | 0.481204 |
| 3 | Lemon-Lime Fridge Pack Soda | 0.460208 |
| 4 | Limes | 0.43854 |
| 5 | Original Restaurant Style Tortilla Chips | 0.438514 |
| 6 | Iceberg Lettuce | 0.431055 |
| 7 | Strawberries | 0.425819 |
| 8 | Naturally Hickory Smoked Thick Cut Bacon | 0.408348 |
| 9 | Fresh Cut Golden Sweet Whole Kernel Corn | 0.404015 |


Top buys from the past: 


| | product_id | product_name | total_orders |
|---|---|---|---|
| 0 | 16696 | Coke Classic | 5 |
| 1 | 3512 | Black Beans Reduced Sodium | 2 |
| 2 | 47472 | Cheesy Enchilada Hamburger Helper | 2 |
| 3 | 32047 | Original Apple 100% Juice | 2 |
| 4 | 24852 | Banana | 2 |
| 5 | 28993 | Iceberg Lettuce | 2 |
| 6 | 12403 | Original Mashed Potatoes | 1 |
| 7 | 18017 | Organic Tomatoes | 1 |
| 8 | 5727 | Sandwich Potato Bread | 1 |
| 9 | 46979 | Asparagus | 1 |
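
The scores returned by the index are similarity values between a user vector and each item vector. The sketch below shows a brute-force version of that ranking, assuming the index uses cosine similarity (our understanding is that this is the metric applied when none is passed to `create_index`). The vectors and item names are toy values, and a real index uses approximate search rather than scoring every item.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, items, k):
    """Return the k (item_id, score) pairs most similar to `query`."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional factors; the notebook uses 100 dimensions.
item_vectors = {
    "water": [1.0, 0.0, 0.0],
    "soda": [0.8, 0.6, 0.0],
    "diapers": [0.0, 0.0, 1.0],
}
user_vector = [2.0, 0.5, 0.0]

ranking = top_k(user_vector, item_vectors, k=2)
```

Because cosine similarity normalises vector length while a plain inner product does not, a cosine-based index can order items somewhat differently from a model that ranks by inner product, which may explain differences between the two recommendation lists.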


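*Note* In this run, the Pinecone query reported about 0.08 seconds and the in-memory model call about 0.03 seconds, so both return recommendations within a fraction of a second. These figures come from `time.process_time()`, which measures CPU time rather than elapsed time, and the end-to-end latency of a Pinecone query also depends on your internet connection.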
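
The timings in this notebook are taken with `time.process_time()`, which counts CPU time only and therefore ignores time spent waiting on the network. The sketch below contrasts it with `time.perf_counter()`, which measures elapsed wall-clock time. `timed` is a hypothetical helper, and `time.sleep` stands in for waiting on a remote response.

```python
import time


def timed(fn):
    """Run `fn` once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


# Sleeping uses almost no CPU, much like waiting for a network reply.
wall, cpu = timed(lambda: time.sleep(0.05))
print(f"wall-clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

For a like-for-like latency comparison between a local model call and a remote query, the wall-clock figure is usually the more informative one.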

All that’s left to do is surface these recommendations on the shopping site, or feed them into other applications.

## Clean up

Delete the index once you are sure that you do not want to use it anymore. Once it is deleted, you cannot reuse it.

In [None]:
pinecone.delete_index(index_name)

## Summary

In this example we used [Pinecone](https://www.pinecone.io/) to quickly build and deploy a product recommendation engine based on collaborative filtering.

Once deployed, the product recommendation engine can index new data, retrieve recommendations in milliseconds, and send results to production applications.