# Recommender ChatBot: Collaborative filtering recommender using user embeddings from ChromaDB

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os

BASE_PATH    = '../..'
API_PATH     = f'{BASE_PATH}/chat-bot-api'
LIB_PATH     = f'{BASE_PATH}/lib'


os.environ['TMP_PATH']         = f'{BASE_PATH}/tmp'
os.environ['DATASET_PATH']     = f'{BASE_PATH}/datasets'
os.environ['WEIGHTS_PATH']     = f'{BASE_PATH}/weights'
os.environ['METRICS_PATH']     = f'{BASE_PATH}/metrics'
os.environ['MONGODB_URL']      = 'mongodb://0.0.0.0:27017'
os.environ['MONGODB_DATABASE'] = 'chatbot'
os.environ['CHROMA_HOST']      = '0.0.0.0'
os.environ['CHROMA_PORT']      = '9090'

In [3]:
import sys
sys.path.append(LIB_PATH)
sys.path.append(API_PATH)

import pytorch_common.util as pu
from app_context import AppContext

2024-02-24 17:57:17.360600: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-24 17:57:18.087334: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-02-24 17:57:18.098481: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your 

<Figure size 640x480 with 0 Axes>

## Setup

In [4]:
pu.LoggerBuilder().on_console().build()

<RootLogger root (INFO)>

In [5]:
ctx = AppContext()

2024-02-24 17:57:19,505 - INFO - Load pretrained SentenceTransformer: all-mpnet-base-v2
2024-02-24 17:57:20,066 - INFO - Use pytorch device: cuda
2024-02-24 17:57:20,069 - INFO - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
2024-02-24 17:57:20,091 - INFO - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.


## Recommender

In [157]:
recommendations = await ctx.database_user_item_filtering_recommender.recommend(
    user_id            = "adrianmarino@gmail.com",
    text_query         = 'I want to see sci-fi movies',
    k_sim_users        = 10,
    max_items_by_user  = 20,
    text_query_limit   = 1500,
    min_rating_by_user = 4,
    not_seen           = True
)

2024-02-24 18:31:56,112 - INFO - Found 10 similar users
2024-02-24 18:31:56,116 - INFO - Found 449 similar users interactions
2024-02-24 18:31:56,116 - INFO - Select 134 similar users interactions
2024-02-24 18:31:56,364 - INFO - Found 1500 items by text query
2024-02-24 18:31:56,374 - INFO - Select 31 similar user items by text query
2024-02-24 18:31:56,377 - INFO - Select 31 similar user unseen items


In [158]:
recommendations.show_seen()

Not Found items!


In [159]:
recommendations.show(
    sort_by        = ['user_sim_weighted_pred_rating_score'],
    k              = 10
)

| Item | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Poster | | | | | | | | | | |
| Genres | Comedy, Drama, Sci-fi | Adventure, Comedy, Sci-fi | Action, Comedy, Sci-fi | Drama, Sci-fi | Adventure, Animation, Children, Drama | Action, Adventure, Comedy, Sci-fi | Adventure, Animation, Children, Romance, Sci-fi | Action, Adventure, Sci-fi | Action, Drama, Western | Sci-fi, Thriller |
| Mean Rating | 4.117647 | 4.101064 | 3.786517 | 3.953947 | 3.995122 | 3.908297 | 4.114754 | 4.208738 | 3.960526 | 3.185185 |
| Predicted Rating | 4.75918 | 4.655716 | 4.494987 | 4.472497 | 4.469231 | 4.438426 | 4.423465 | 4.387161 | 4.349764 | 4.32298 |
| User sim weighted rating score | 0.967706 | 0.964074 | 0.89013 | 0.92949 | 0.939126 | 0.919856 | 0.967026 | 0.98891 | 0.931108 | 0.74877 |
| User sim weighted predicted rating score | 0.989114 | 0.967877 | 0.934463 | 0.929787 | 0.929066 | 0.923806 | 0.919341 | 0.911609 | 0.904342 | 0.898704 |
| User Item Similarity | 0.989114 | 0.989386 | 0.989386 | 0.989386 | 0.98934 | 0.990567 | 0.989114 | 0.98891 | 0.989462 | 0.989386 |


### Notes
* Movies seen by similar users, weighted by the user's predicted rating.
* Ordered by **User sim weighted predicted rating score**.
* **User sim weighted predicted rating score** = `similar_user_similarity` (0..1) * `user_predicted_rating` (normalized to 0..1).
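
The formula above can be checked against the table with a minimal sketch. It assumes that "normalized to 0..1" means dividing each predicted rating by the largest predicted rating among the candidate items. That assumption is consistent with the values shown, but it is not confirmed by the recommender's code. `weighted_score` is a hypothetical helper, not part of the project API.

```python
def weighted_score(similarities, ratings):
    """Scale each rating by the maximum rating (assumed normalization),
    then multiply it by the matching similarity."""
    max_rating = max(ratings)
    return [sim * (rating / max_rating) for sim, rating in zip(similarities, ratings)]


# Values copied from the first three columns of the table above.
user_item_sim    = [0.989114, 0.989386, 0.989386]
predicted_rating = [4.75918, 4.655716, 4.494987]

scores = weighted_score(user_item_sim, predicted_rating)
print([round(score, 6) for score in scores])
```

Under this assumption the computed scores agree with the **User sim weighted predicted rating score** row to within rounding of the displayed inputs.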

In [160]:
recommendations.show(
    sort_by        = ['user_sim_weighted_rating_score'],
    k              = 10
)

| Item | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Poster | | | | | | | | | | |
| Genres | Action, Adventure, Sci-fi | Comedy, Drama, Sci-fi | Adventure, Animation, Children, Romance, Sci-fi | Adventure, Comedy, Sci-fi | Action, Sci-fi, Thriller, Imax | Action, Horror, Sci-fi, Thriller | Adventure, Animation, Children, Drama | Drama, Sci-fi, Thriller | Drama, Sci-fi, Thriller | Action, Drama, Western |
| Mean Rating | 4.208738 | 4.117647 | 4.114754 | 4.101064 | 4.089855 | 4.045455 | 3.995122 | 3.976636 | 3.960526 | 3.960526 |
| Predicted Rating | 4.387161 | 4.75918 | 4.423465 | 4.655716 | 4.305705 | 4.299498 | 4.469231 | 4.261951 | 4.251078 | 4.349764 |
| User sim weighted rating score | 0.98891 | 0.967706 | 0.967026 | 0.964074 | 0.961174 | 0.951075 | 0.939126 | 0.934566 | 0.932148 | 0.931108 |
| User sim weighted predicted rating score | 0.911609 | 0.989114 | 0.919341 | 0.967877 | 0.894867 | 0.893892 | 0.929066 | 0.885773 | 0.884812 | 0.904342 |
| User Item Similarity | 0.98891 | 0.989114 | 0.989114 | 0.989386 | 0.989114 | 0.989462 | 0.98934 | 0.989114 | 0.990567 | 0.989462 |


### Notes
* Movies seen by similar users, weighted by each movie's mean rating.
* Ordered by **User sim weighted rating score**.
* **User sim weighted rating score** = `similar_user_similarity` (0..1) * `mean_movie_rating` (normalized to 0..1).
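
As a rough illustration of this ordering, the sketch below scores a few items by mean rating and sorts them in descending order. It assumes the mean rating is normalized by dividing by the highest mean rating among the candidates, which matches the displayed values but is not confirmed by the source. The item names and the `rank_by_mean_rating` helper are hypothetical.

```python
def rank_by_mean_rating(items):
    """Return (name, score) pairs sorted by similarity * normalized mean rating.

    `items` is a list of (name, user_similarity, mean_rating) tuples.
    Normalizing by the maximum mean rating is an assumption.
    """
    max_mean = max(mean for _, _, mean in items)
    scored = [(name, sim * (mean / max_mean)) for name, sim, mean in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Similarity and mean rating copied from the first four columns of the
# table above, deliberately listed out of order; the names are placeholders.
items = [
    ("item_d", 0.989386, 4.101064),
    ("item_b", 0.989114, 4.117647),
    ("item_a", 0.98891,  4.208738),
    ("item_c", 0.989114, 4.114754),
]

ranking = rank_by_mean_rating(items)
for name, score in ranking:
    print(name, round(score, 6))
```

Because the user similarities are nearly identical across items, the ranking is driven almost entirely by the mean rating.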

In [161]:
recommendations.show(
    sort_by        = ['user_item_sim'],
    k              = 10
)

| Item | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Poster | | | | | | | | | | |
| Genres | Drama, Sci-fi, Thriller | Crime, Drama, Mystery | Action, Adventure, Comedy, Sci-fi | Horror, Mystery, Thriller | Action, Drama, Western | Action, Adventure, Comedy, Fantasy, Horror | Action, Horror, Sci-fi, Thriller | Sci-fi, Thriller | Drama, Sci-fi | Action, Adventure, Sci-fi |
| Mean Rating | 3.960526 | 3.876404 | 3.908297 | 2.875 | 3.960526 | 3.934211 | 4.045455 | 3.185185 | 3.953947 | 3.487603 |
| Predicted Rating | 4.251078 | 4.219775 | 4.438426 | 3.568155 | 4.349764 | 4.157475 | 4.299498 | 4.32298 | 4.472497 | 3.946019 |
| User sim weighted rating score | 0.932148 | 0.912349 | 0.919856 | 0.675975 | 0.931108 | 0.924922 | 0.951075 | 0.74877 | 0.92949 | 0.819862 |
| User sim weighted predicted rating score | 0.884812 | 0.878297 | 0.923806 | 0.741918 | 0.904342 | 0.864364 | 0.893892 | 0.898704 | 0.929787 | 0.820338 |
| User Item Similarity | 0.990567 | 0.990567 | 0.990567 | 0.989565 | 0.989462 | 0.989462 | 0.989462 | 0.989386 | 0.989386 | 0.989386 |


### Notes
* Movies seen by similar users.
* Ordered by **User Item Similarity** (the similar-user similarity assigned to each movie).
* A sample of movies is taken from each similar user; each movie is then assigned a similarity, and the results are ordered by that similarity.
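
The steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's implementation: `candidates_by_user_similarity` is a hypothetical function, the sample is taken as each user's top-rated items, and a movie seen by several similar users keeps the highest similarity. The parameters `max_items_by_user` and `min_rating` mirror the arguments passed to `recommend` earlier in the notebook.

```python
def candidates_by_user_similarity(similar_users, interactions,
                                  max_items_by_user=20, min_rating=4):
    """Collect items from similar users and order them by user similarity.

    similar_users: {user_id: similarity in 0..1}
    interactions:  {user_id: [(item_id, rating), ...]}
    """
    best_similarity = {}
    for user_id, similarity in similar_users.items():
        # Keep only well-rated items, highest rating first (assumed sampling rule).
        liked = [pair for pair in interactions.get(user_id, []) if pair[1] >= min_rating]
        liked.sort(key=lambda pair: pair[1], reverse=True)
        for item_id, _ in liked[:max_items_by_user]:
            # Assumption: an item shared by several users keeps the highest similarity.
            best_similarity[item_id] = max(similarity, best_similarity.get(item_id, 0.0))
    return sorted(best_similarity.items(), key=lambda pair: pair[1], reverse=True)


# Toy data: user ids, item ids and ratings are made up for the example.
similar_users = {"u1": 0.990567, "u2": 0.989462}
interactions = {
    "u1": [("A", 5), ("B", 3), ("C", 4)],
    "u2": [("A", 4), ("D", 5)],
}

result = candidates_by_user_similarity(similar_users, interactions, max_items_by_user=2)
print(result)
```

Every item taken from the same user shares that user's similarity in this sketch. The table above shows a similar pattern, with the same **User Item Similarity** value repeated across several items.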