### load models
- the finalized version uses a total of 4 models:
    1. SentenceTransformer
        - specifically PhraseBERT, which handles phrases that are similar in meaning but lexically dissimilar
        - embeds the query; the embedding is used to find the most similar genres
    2. CrossEncoder
        - scores the similarity between two documents (the query and a summary) with greater accuracy
        - needs the raw query and the raw summary together as input, so the summaries cannot be pre-embedded
            - hence there is a trade-off between accuracy and speed
    3. Rake (Rapid Automatic Keyword Extraction)
        - extracts keywords from the query
    4. ALSRecommender
        - uses a collaborative filtering model, AlternatingLeastSquares, to personalize recommendations
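
To make the genre-matching step concrete, here is a minimal sketch of picking the closest genres to an embedded query by cosine similarity. The 3-dimensional vectors are made-up stand-ins for real PhraseBERT embeddings, and `top_k_genres` is a hypothetical helper, not a function from the pipeline.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_genres(query_vec, genre_vecs, k=3):
    """Return the k genre names whose vectors are closest to query_vec."""
    ranked = sorted(genre_vecs, key=lambda g: cosine(query_vec, genre_vecs[g]), reverse=True)
    return ranked[:k]

# Toy vectors (assumption): real embeddings would come from the genre embedding model.
genre_vecs = {
    "Detective": [0.9, 0.1, 0.0],
    "Police": [0.8, 0.2, 0.1],
    "Climbing": [0.0, 0.1, 0.9],
    "Biography": [0.1, 0.9, 0.2],
}
query_vec = [1.0, 0.1, 0.0]  # pretend embedding of the keyword "detectives"
top_genres = top_k_genres(query_vec, genre_vecs, k=2)
print(top_genres)
```

Because genre vectors do not depend on the query, they can be computed once and stored, which is what makes this stage cheap compared with the CrossEncoder.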
        

In [1]:
from sentence_transformers import SentenceTransformer
from sentence_transformers.cross_encoder import CrossEncoder
from rake_nltk import Rake
from models import ALSRecommender



In [2]:
genre_embedding_model = SentenceTransformer('whaleloops/phrase-bert')
ranking_model = CrossEncoder('cross-encoder/stsb-TinyBERT-L-4')
keyword_model = Rake()
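
For intuition about what the keyword model does, the following is a simplified sketch of the RAKE idea: split the text at stopwords and punctuation, then score each remaining phrase by the sum of its words' degree-to-frequency ratios. It is not the `rake_nltk` implementation; the stopword list is a tiny illustrative one and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption); rake_nltk relies on NLTK's list.
STOPWORDS = {"a", "an", "and", "the", "of", "about", "in", "on", "for", "to"}

def candidate_phrases(text):
    """Split text into runs of content words, breaking at stopwords and punctuation."""
    phrases, current = [], []
    for token in re.findall(r"\w+|[^\w\s]", text.lower()):
        if token.isalnum() and token not in STOPWORDS:
            current.append(token)
        elif current:
            phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases

def rake_scores(text):
    """Score each candidate phrase by the sum of degree/frequency of its words."""
    phrases = candidate_phrases(text)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    scored = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

keywords = rake_scores("assassination stories and detectives investigating")
print(keywords)
```

On this query the sketch yields two keyword phrases, which is consistent with the two genre lists that `semanticSearch` returns for it further down.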

In [2]:
# load the saved ALS model from disk (the constructor arguments are left empty here)
directory = r"C:\Users\tanch\Documents\NTU\NTU Year 4\Semester 1\CZ4125 - Developing Data Products\Assignments\Team Assignment - book recommendation\code\models"
model = ALSRecommender(model_name = "AlternatingLeastSquares", 
                       config_dict = {}, 
                       train_csr = None, 
                       user_map = None, 
                       item_map = None)
model.load(directory)

### get pipeline functions
- referring to semanticSearch() and personalizedSearch()

In [4]:
# import only the two pipeline functions used in this notebook
from pipeline import semanticSearch, personalizedSearch

### connect mongo
- **40516** books

In [3]:
from pymongo import MongoClient
import certifi
ca = certifi.where()
client = MongoClient("mongodb+srv://tartiniglia:W.I.T.C.H.@atlascluster.tv8xjir.mongodb.net/?retryWrites=true&w=majority",
                     serverSelectionTimeoutMS = 5000, tlsCAFile=ca)
db = client["bookEater"]
book_collection = db["Books"]
full_genre_collection = db["full_genre"]
# from pymongo import MongoClient
# import certifi
# ca = certifi.where()
# client = MongoClient("mongodb+srv://tanchingfhen:978775!Mj@dataproducts.hcjk1ct.mongodb.net/?retryWrites=true&w=majority", tlsCAFile=ca)
# db = client["DP"] 
# book_collection = db["books"]
# full_genre_collection = db["full_genre"]

In [5]:
from pymongo import MongoClient
import certifi
ca = certifi.where()
client2 = MongoClient("mongodb+srv://tanchingfhen:978775!Mj@dataproducts.hcjk1ct.mongodb.net/?retryWrites=true&w=majority", tlsCAFile=ca)
user_collection = client2["DP"]["users"]
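
As an alternative to writing the connection string inline, the URI can be assembled from environment variables. The sketch below is one possible approach; the variable names `MONGO_USER`, `MONGO_PASSWORD` and `MONGO_HOST`, the fallback values, and the helper `build_mongo_uri` are all hypothetical.

```python
import os
from urllib.parse import quote_plus

def build_mongo_uri(user, password, host):
    """Build an SRV-style MongoDB URI, percent-escaping the credentials."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}"
        "/?retryWrites=true&w=majority"
    )

# Hypothetical environment variable names; the fallbacks are placeholders only.
uri = build_mongo_uri(
    os.environ.get("MONGO_USER", "demo_user"),
    os.environ.get("MONGO_PASSWORD", "p@ss/word"),
    os.environ.get("MONGO_HOST", "cluster.example.mongodb.net"),
)
# The result could then be passed on as: MongoClient(uri, tlsCAFile=certifi.where())
```

Escaping matters because characters such as `@` or `/` in a password would otherwise be parsed as part of the URI structure.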

In [9]:
cur = user_collection.find({})

In [11]:
next(cur)

{'_id': '276726',
 'User-ID': '276726',
 'password': '1{m2G-U`p-M1XmS',
 'interactions': ['0155061224']}

## semanticSearch
- results are filtered by "desired_genres", then reranked
- the top results below cover both detective work and murder
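
The filter-then-rerank flow can be sketched as follows, assuming the candidates have already passed the genre filter. The scorer `token_overlap` is a deliberately crude stand-in for the CrossEncoder, and `rerank` is a hypothetical helper; only the parameter names `stage_one_limit` and `stage_two_limit` come from the `semanticSearch` call.

```python
import re

def token_overlap(query, text):
    """Stand-in relevance score: fraction of query tokens that appear in the text."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0

def rerank(query, candidates, score=token_overlap, stage_one_limit=200, stage_two_limit=3):
    """Cap the candidate pool, score every survivor against the query, keep the best."""
    pool = candidates[:stage_one_limit]
    # "Summary" is a list of strings in the printed documents, so join it first.
    ranked = sorted(pool, key=lambda b: score(query, " ".join(b["Summary"])), reverse=True)
    return ranked[:stage_two_limit]

# Made-up documents for illustration only.
books = [
    {"ISBN": "A", "Summary": ["Stories of detectives on patrol."]},
    {"ISBN": "B", "Summary": ["A cookbook of soups."]},
    {"ISBN": "C", "Summary": ["Detectives investigating an assassination."]},
]
top = rerank("assassination stories and detectives investigating", books, stage_two_limit=2)
print([b["ISBN"] for b in top])
```

The pairwise scorer runs once per candidate, so `stage_one_limit` bounds the cost of the slow stage while `stage_two_limit` only trims the final list.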

In [37]:
query = "assassination stories and detectives investigating"

In [38]:
%%timeit
results, desired_genres = semanticSearch(
    query = query, 
    book_collection = book_collection, 
    genre_collection = full_genre_collection, 
    genre_embedder = genre_embedding_model, 
    ranking_model = ranking_model, 
    keyword_model = keyword_model,
    stage_one_limit = 200, 
    stage_two_limit = 3, 
    num_genre_per_keyword = 3
)

6.26 s ± 473 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [17]:
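# genres the algorithm infers from the query, one list per extracted keyword
# results must contain at least one genre from each list
# note: names assigned inside a %%timeit cell are not kept, so semanticSearch
# has to be run once without %%timeit before this cell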
desired_genres

[['Detective', 'Police', 'Occult Detective'],
 ['Conspiracy Theories', 'Hoaxes', 'Murder Mystery']]
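
The rule "at least one genre from each list" is an AND across the lists and an OR within each list. A minimal sketch, with both a pure Python check and an equivalent MongoDB filter document, is shown below. The field name `Genre` is taken from the printed documents; whether the pipeline issues this exact query is an assumption, and both helper names are hypothetical.

```python
def matches_desired_genres(book_genres, desired_genres):
    """True if the book has at least one genre from every group."""
    owned = set(book_genres)
    return all(owned & set(group) for group in desired_genres)

def genre_filter_query(desired_genres, field="Genre"):
    """MongoDB filter: an $and of one $in clause per group (empty input matches all)."""
    if not desired_genres:
        return {}
    return {"$and": [{field: {"$in": list(group)}} for group in desired_genres]}

desired_genres = [
    ["Detective", "Police", "Occult Detective"],
    ["Conspiracy Theories", "Hoaxes", "Murder Mystery"],
]
print(matches_desired_genres(["Crime", "Detective", "Murder Mystery"], desired_genres))
print(genre_filter_query(desired_genres))
```

On an array field, `$in` matches when any element of the array appears in the given list, which gives the OR within a group.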

In [20]:
# results below are ranked by CrossEncoder 
for doc in results:
    print(doc["ISBN"])
    print(doc["Summary"])
    print(doc["Genre"])
    print()

0446609323
['The breathtaking suspense of ', " and the authenticity of N.Y.P.D. Blue: Welcome to James Patterson's classic superthriller, ", ". A courageous federal agent, a powerful and resourceful woman lawyer - only they can possibly stop the unspeakable from happening. New York City is under siege by a secret militia group - and that's just the beginning of the relentless terror of ", '.']
['Action', 'Crime', 'Detective', 'Fiction', 'Murder Mystery', 'Mystery', 'Mystery Thriller', 'Suspense', 'Terrorism', 'Thriller', 'War']

2253182389
["Curieux endroit, curieuse affaire… St\xa0Anselm, collège de théologie de l'Église anglicane perdu sur la côte sud de l'Angleterre, n'est pas à proprement parler le lieu où l'on s'attend à une mort violente. Il est vrai que lorsque l'infirmière-gouvernante Margaret Munroe découvre le corps inanimé du jeune élève Ronald Treeves au bas d'une falaise, il ne s'agit encore que d'un accident… thèse évidemment réfutée par le richissime père adoptif du jeun

In [34]:
%%timeit
query = "a life story about bouldering"
results, desired_genres = semanticSearch(
    query = query, 
    book_collection = book_collection, 
    genre_collection = full_genre_collection, 
    genre_embedder = genre_embedding_model, 
    ranking_model = ranking_model, 
    keyword_model = keyword_model,
    stage_one_limit = 200, 
    stage_two_limit = 3, 
)

4.14 s ± 1.17 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [35]:
desired_genres

[['Biography', 'History', 'Autobiography'],
 ['Climbing', 'Mountaineering', 'Rock N Roll']]

In [36]:
for doc in results:
    print(doc["ISBN"])
    print(doc["Summary"])
    print(doc["Genre"])
    print()

0525945784
['Erik Weihenmayer was born with retinoscheses, a degenerative eye disorder that would progressively unravel his retinas. Erik learned from doctors that he was destined to lose his sight by age thirteen. Yet from early on, he was determined to rise above this devastating disability and lead a fulfilling, exciting life. In Touch the Top of the World, Erik recalls his struggle to push past the limits placed on him by his visual impairment -- and by a seeing world. He speaks movingly of the role his family played in his battle to break through the barriers of blindness: the mother who prayed for the miracle that would restore her son\'s sight; the father who encouraged him to strive for that unreachable mountaintop.Erik was the first blind man to summit McKinley. Soon he became the first blind person to scale the infamous 3000-foot rock wall of El Capitan and then Argentina\'s Aconcagua, the highest peak outside of Asia. He was married to his longtime sweetheart at 13,000 feet 

## personalizedSearch
- these are personalized recommendations for user 210959

In [39]:
%%timeit
# arguments: book collection, ALS model, user ID, and 3 (presumably the number of recommendations)
personalizedSearch(book_collection, model, 210959, 3)

330 ms ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
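
For intuition about why this call is fast, a trained matrix factorization model such as ALS typically scores items with a dot product between a user vector and each item vector. The sketch below illustrates that scoring step only; the factor values and item IDs are made up, `recommend` is a hypothetical helper, and the internals of `ALSRecommender` may differ.

```python
def recommend(user_vec, item_factors, seen, n=3):
    """Rank unseen items by the dot product of the user and item factor vectors."""
    scores = {
        item: sum(u * v for u, v in zip(user_vec, vec))
        for item, vec in item_factors.items()
        if item not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Made-up 2-d latent factors (assumption); a real model learns these from interactions.
user_vec = [1.0, 0.2]
item_factors = {
    "book_a": [0.9, 0.1],
    "book_b": [0.8, 0.3],
    "book_c": [0.1, 0.9],
    "book_d": [0.7, 0.0],
}
picks = recommend(user_vec, item_factors, seen={"book_a"}, n=2)
print(picks)
```

No text model runs at query time in this path, which is consistent with the much lower latency measured above compared with `semanticSearch`.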
