Created by: Rosamund

## Overview

NOTE:
Additional packages required:
- pip install -U sentence-transformers==3.0.0
- pip install cornac

To avoid TqdmWarning: IProgress not found:
- pip install --upgrade jupyter ipywidgets


A Sentence Transformer is a type of natural language processing model designed specifically to produce sentence embeddings in which semantically similar sentences lie close together. Sentence embeddings are fixed-length numerical representations that capture the semantic meaning of a sentence.
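Because every sentence maps to a vector of the same length, two synopses can be compared with cosine similarity. Below is a minimal NumPy sketch. The 4-dimensional vectors are made-up stand-ins for real embeddings; the model used later in this notebook produces 768-dimensional ones.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for sentence embeddings (values are made up).
emb_a = np.array([0.2, 0.8, 0.1, 0.0])
emb_b = np.array([0.25, 0.7, 0.0, 0.1])
emb_c = np.array([-0.6, 0.1, 0.9, 0.3])

print(cosine_similarity(emb_a, emb_b))  # close to 1: similar direction
print(cosine_similarity(emb_a, emb_c))  # near 0: unrelated direction
```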

Reference: https://github.com/VishalS-HK/product-recommendation-system-BERT/blob/main/Product_Recommendation_System_BERT.ipynb

## Import Libraries

In [11]:
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
import tqdm
import pickle
import cornac
from cornac.data import Reader
from cornac.datasets import citeulike
from cornac.eval_methods import RatioSplit
from cornac.data import TextModality, FeatureModality

## Extract data

In [6]:
anime_df = pd.read_csv('../data/anime_final_cleaned.csv')

In [7]:
print(f'No. of rows: {anime_df.shape[0]:,}')
print(f'No. of columns: {anime_df.shape[1]:,}')

No. of rows: 11,094
No. of columns: 22


In [8]:
anime_df.head(3)

Unnamed: 0,anime_id,name,type,episodes,mal_score,members,studio,release-season,release-year,release-date,...,themes,demographics,synopsis,image_url,rating,va_list,staff_list,recommended_review_count,mixedfeelings_review_count,notrecommended_review_count
0,32281,Kimi no Na wa.,Movie,1,9.37,200630,['CoMix Wave Films'],summer,2016.0,,...,[],,"Mitsuha Miyamizu, a high school girl, yearns t...",https://cdn.myanimelist.net/images/anime/5/870...,PG-13 - Teens 13 or older,"['Kamishiraishi, Mone', 'Kamiki, Ryunosuke', '...","['Bezerra, Wendel', 'Kawamura, Genki', 'Itou, ...",808.0,88.0,50.0
1,5114,Fullmetal Alchemist: Brotherhood,TV,64,9.26,793665,['Bones'],spring,2009.0,,...,['Military'],Shounen,After a horrific alchemy experiment goes wrong...,https://cdn.myanimelist.net/images/anime/1208/...,R - 17+ (violence & profanity),"['Park, Romi', 'Kugimiya, Rie', 'Miki, Shinich...","['Cook, Justin', 'Maruyama, Hiroo', 'Yonai, No...",912.0,59.0,39.0
2,28977,Gintama°,TV,51,9.25,114262,['Bandai Namco Pictures'],spring,2015.0,,...,"['Gag Humor', 'Historical', 'Parody', 'Samurai']",Shounen,"Gintoki, Shinpachi, and Kagura return as the f...",https://cdn.myanimelist.net/images/anime/3/720...,PG-13 - Teens 13 or older,"['Sugita, Tomokazu', 'Kugimiya, Rie', 'Sakaguc...","['Miyawaki, Chizuru', 'Takamatsu, Shinji', 'Yo...",79.0,3.0,1.0


Print out the synopses at positions 0 and 100

In [9]:
anime_df.iloc[0]['synopsis']

"Mitsuha Miyamizu, a high school girl, yearns to live the life of a boy in the bustling city of Tokyo—a dream that stands in stark contrast to her present life in the countryside. Meanwhile in the city, Taki Tachibana lives a busy life as a high school student while juggling his part-time job and hopes for a future in architecture.    One day, Mitsuha awakens in a room that is not her own and suddenly finds herself living the dream life in Tokyo—but in Taki's body! Elsewhere, Taki finds himself living Mitsuha's life in the humble countryside. In pursuit of an answer to this strange phenomenon, they begin to search for one another.      revolves around Mitsuha and Taki's actions, which begin to have a dramatic impact on each other's lives, weaving them into a fabric held together by fate and circumstance.    [Written by MAL Rewrite]"

In [10]:
anime_df.iloc[100]['synopsis']

"Fuu Kasumi is a young and clumsy waitress who spends her days peacefully working in a small teahouse. That is, until she accidentally spills a drink all over one of her customers! With a group of samurai now incessantly harassing her, Fuu desperately calls upon another samurai in the shop, Mugen, who quickly defeats them with his wild fighting technique, utilizing movements reminiscent to that of breakdancing. Unfortunately, Mugen decides to pick a fight with the unwilling ronin Jin, who wields a more precise and traditional style of swordfighting, and the latter proves to be a formidable opponent. The only problem is, they end up destroying the entire shop as well as accidentally killing the local magistrate's son.    For their crime, the two samurai are captured and set to be executed. However, they are rescued by Fuu, who hires the duo as her bodyguards. Though she no longer has a place to return to, the former waitress wishes to find a certain samurai who smells of sunflowers and 

Preprocessing:
The synopses end with the attribution '[Written by MAL Rewrite]', which we want to remove.

In [11]:
anime_df['synopsis'] = anime_df['synopsis'].str.replace(r'\s*\[Written by MAL Rewrite\]\s*','',regex=True)
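The square brackets in the tag have to be escaped because `regex=True` treats `[...]` as a character class, and the surrounding `\s*` also strips the padding before the tag. The sketch below runs the same pattern on a few made-up strings, including a missing value, and then counts remaining occurrences with a literal (`regex=False`) search instead of reading the text by eye.

```python
import numpy as np
import pandas as pd

# Toy synopses (made up) mimicking the attribution suffix and a missing value.
s = pd.Series([
    "A story about fate.    [Written by MAL Rewrite]",
    "No attribution here.",
    np.nan,
])

# Same pattern as the cell above: drop the tag and the whitespace around it.
cleaned = s.str.replace(r'\s*\[Written by MAL Rewrite\]\s*', '', regex=True)
print(cleaned.tolist())  # ['A story about fate.', 'No attribution here.', nan]

# Literal search for any leftover tags; missing values count as "not found".
remaining = cleaned.str.contains('[Written by MAL Rewrite]', regex=False, na=False).sum()
print(remaining)  # 0
```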

Check whether the phrase is still there:

In [12]:
anime_df.iloc[0]['synopsis']

"Mitsuha Miyamizu, a high school girl, yearns to live the life of a boy in the bustling city of Tokyo—a dream that stands in stark contrast to her present life in the countryside. Meanwhile in the city, Taki Tachibana lives a busy life as a high school student while juggling his part-time job and hopes for a future in architecture.    One day, Mitsuha awakens in a room that is not her own and suddenly finds herself living the dream life in Tokyo—but in Taki's body! Elsewhere, Taki finds himself living Mitsuha's life in the humble countryside. In pursuit of an answer to this strange phenomenon, they begin to search for one another.      revolves around Mitsuha and Taki's actions, which begin to have a dramatic impact on each other's lives, weaving them into a fabric held together by fate and circumstance."

In [13]:
anime_df.iloc[100]['synopsis']

"Fuu Kasumi is a young and clumsy waitress who spends her days peacefully working in a small teahouse. That is, until she accidentally spills a drink all over one of her customers! With a group of samurai now incessantly harassing her, Fuu desperately calls upon another samurai in the shop, Mugen, who quickly defeats them with his wild fighting technique, utilizing movements reminiscent to that of breakdancing. Unfortunately, Mugen decides to pick a fight with the unwilling ronin Jin, who wields a more precise and traditional style of swordfighting, and the latter proves to be a formidable opponent. The only problem is, they end up destroying the entire shop as well as accidentally killing the local magistrate's son.    For their crime, the two samurai are captured and set to be executed. However, they are rescued by Fuu, who hires the duo as her bodyguards. Though she no longer has a place to return to, the former waitress wishes to find a certain samurai who smells of sunflowers and 

Ensure that there are no NaN values in the synopsis column

In [28]:
anime_df['synopsis'].isna().sum()

0

Convert the synopses to strings

In [29]:
anime_df['synopsis'] = anime_df['synopsis'].astype(str)
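The cast is safe here because the check above found no missing synopses. In general, though, pandas 2.x turns a NaN into the literal text 'nan' under `astype(str)`, which would then be embedded as if it were a real synopsis (newer pandas versions with the default string dtype may keep the value missing instead). A small sketch on toy data showing the explicit alternative of filling first:

```python
import numpy as np
import pandas as pd

s = pd.Series(["some synopsis", np.nan])

# On pandas 2.x this gives ['some synopsis', 'nan']; behaviour differs on newer versions.
print(s.astype(str).tolist())

# Filling missing values explicitly makes the result version-independent.
print(s.fillna("").astype(str).tolist())  # ['some synopsis', '']
```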

## Sentence Transformers

Instantiate the BERT sentence transformer

In [18]:
model = SentenceTransformer('bert-base-nli-mean-tokens')
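The `mean-tokens` suffix in the model name refers to mean pooling: the sentence embedding is the average of BERT's token embeddings, with padding positions ignored. The NumPy sketch below shows only that pooling step on made-up shapes and values, not real BERT outputs.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, skipping padded positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                         # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                         # avoid division by zero
    return summed / counts

# Two toy "sentences", up to 3 tokens each, 2-dimensional token vectors.
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],   # 2 real tokens + 1 padding slot
    [[2.0, 2.0], [4.0, 0.0], [6.0, 4.0]],   # 3 real tokens
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

print(mean_pool(tokens, mask))
# [[2. 3.]
#  [4. 2.]]
```

Note that the sentence-transformers model card for `bert-base-nli-mean-tokens` marks it as deprecated in favour of newer pretrained models, so comparing against one of those may be worthwhile.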



Generate embeddings

In [30]:
# Encode every synopsis in the DataFrame
synopsis_list = anime_df['synopsis'].tolist()
sentence_embeddings = model.encode(synopsis_list, show_progress_bar=True)

# Note: .encode() has an optional argument, normalize_embeddings (bool, optional): whether to
# normalize the returned vectors to length 1. If so, the faster dot product (util.dot_score) can be
# used instead of cosine similarity. Defaults to False.

Batches:   0%|          | 0/347 [00:00<?, ?it/s]
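The note in the cell above works because cosine similarity is the dot product divided by both vector norms, and those norms are 1 after normalization. A quick NumPy check with random vectors (not real embeddings) that the two quantities coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 768)).astype(np.float32)  # random stand-ins for embeddings

# Conceptually what normalize_embeddings=True does: scale each row to unit L2 norm.
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)

# Cosine similarity computed on the raw vectors ...
norms = np.linalg.norm(emb, axis=1)
cosine = (emb @ emb.T) / np.outer(norms, norms)

# ... equals the plain dot product of the normalized vectors.
dot = unit @ unit.T

print(np.allclose(cosine, dot, atol=1e-5))  # True
```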

In [32]:
sentence_embeddings.shape

(11094, 768)
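The numbers above are consistent with each other: `encode` uses a default `batch_size` of 32, and the embeddings are returned as float32, so the batch count in the progress bar and the in-memory size of the matrix can be checked by hand.

```python
import math

n_synopses, dim = 11_094, 768
batch_size = 32          # default batch_size of SentenceTransformer.encode
bytes_per_value = 4      # float32

n_batches = math.ceil(n_synopses / batch_size)
n_bytes = n_synopses * dim * bytes_per_value

print(n_batches)                     # 347
print(f"{n_bytes / 2**20:.1f} MiB")  # 32.5 MiB
```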

Save the embeddings

In [35]:
with open('../data/synopsis_embeddings_110624.pkl', 'wb') as f:
    pickle.dump(sentence_embeddings, f)

Load the embeddings

In [2]:
with open('../data/synopsis_embeddings_110624.pkl', 'rb') as f:
    synopsis_embedding = pickle.load(f)

print(type(synopsis_embedding))
print(synopsis_embedding.shape)

<class 'numpy.ndarray'>
(11094, 768)
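Pickle works, but for a single NumPy array `np.save`/`np.load` is a common alternative: the `.npy` format stores shape and dtype, and `np.load` refuses to unpickle objects unless `allow_pickle=True` is passed. A sketch with a hypothetical filename and random stand-in data:

```python
import os
import tempfile

import numpy as np

embeddings = np.random.default_rng(0).normal(size=(3, 768)).astype(np.float32)  # stand-in data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synopsis_embeddings.npy")  # hypothetical filename
    np.save(path, embeddings)
    loaded = np.load(path)  # allow_pickle defaults to False

print(loaded.shape, loaded.dtype)  # (3, 768) float32
```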


Load the ratings

In [20]:
filepath = '../data/rating_final.csv'
anime_ratings = pd.read_csv(filepath, sep="|")

In [28]:
anime_ratings.head()

Unnamed: 0,user_id,anime_id,rating
0,1,20,-1
1,1,24,-1
2,1,79,-1
3,1,226,-1
4,1,241,-1
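The first rows all carry a rating of -1. Assuming this file follows the convention of the public MyAnimeList/Kaggle anime ratings data, where -1 means the user watched the title without scoring it, it is worth measuring how common that value is, because `RatioSplit(rating_threshold=0.5)` later in this notebook binarizes ratings for ranking metrics and only counts values at or above the threshold as positive. A toy pandas sketch (the values are made up):

```python
import pandas as pd

# Toy ratings in the same (user_id, anime_id, rating) layout; values are made up.
ratings = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2],
    "anime_id": [20, 24, 79, 20, 226],
    "rating":   [-1, -1, 8, 10, -1],
})

# Share of rows with rating -1 (assumed to mean "watched but not scored").
unscored = (ratings["rating"] == -1).mean()
print(f"{unscored:.0%} of rows have rating -1")  # 60% of rows have rating -1

# Rows that a 0.5 threshold would treat as positive feedback.
positives = ratings[ratings["rating"] >= 0.5]
print(len(positives))  # 2
```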


In [31]:
anime_ratings_tuple = []
for uir in anime_ratings.values:
    anime_ratings_tuple.append(tuple(uir))

In [34]:
print(type(anime_ratings_tuple))
print(anime_ratings_tuple[0])  # first (user_id, anime_id, rating) tuple

<class 'list'>
(1, 20, -1)
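The loop above works; a more idiomatic equivalent uses `DataFrame.itertuples`. Selecting the columns explicitly also guarantees the (user, item, rating) order and keeps any extra column, such as a saved index, out of the tuples. A sketch on toy data:

```python
import pandas as pd

ratings = pd.DataFrame({
    "user_id": [1, 1],
    "anime_id": [20, 24],
    "rating": [-1, -1],
})

# Explicit column order, no index, and name=None so pandas yields plain tuples.
uir_tuples = list(
    ratings[["user_id", "anime_id", "rating"]].itertuples(index=False, name=None)
)
print(uir_tuples[0])  # (1, 20, -1)
```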


Define the item feature modality (synopsis embeddings)
- Reference: https://cornac.readthedocs.io/en/latest/api_ref/data.html#module-cornac.data.modality

In [35]:
item_ids = anime_df['anime_id'].values.tolist()

item_text_modality = FeatureModality(
    features=synopsis_embedding,
    ids=item_ids
)
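`FeatureModality` receives `features` and a parallel `ids` list, so row i of the embedding matrix is expected to describe `ids[i]`. That holds here because both come from `anime_df` in the same order. With `exclude_unknowns=True`, it also matters how many rated items have an embedding at all. A small sanity-check sketch on toy data (the ids and shapes are made up):

```python
import numpy as np
import pandas as pd

# Toy stand-ins: 3 items with 4-dimensional embeddings, plus a few ratings.
item_ids = [32281, 5114, 28977]
embeddings = np.zeros((3, 4), dtype=np.float32)
ratings = pd.DataFrame({"user_id": [1, 1, 2], "anime_id": [5114, 20, 28977]})

# Row i of the feature matrix must describe item_ids[i].
assert len(item_ids) == embeddings.shape[0], "one embedding row per id"
assert len(set(item_ids)) == len(item_ids), "ids must be unique"

# Fraction of rating rows whose item has an embedding.
covered = ratings["anime_id"].isin(item_ids).mean()
print(f"{covered:.0%} of rating rows reference an item with an embedding")  # 67% ...
```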

Testing on downstream modelling. For comparison, first load the CiteULike text and feedback data that ships with Cornac:

In [25]:
# The necessary data can be loaded as follows
docs, item_ids = citeulike.load_text()
feedback = citeulike.load_feedback(reader=Reader(item_set=item_ids))
feedback


[('1', '70', 1.0),
 ('1', '495', 1.0),
 ('1', '1631', 1.0),
 ('1', '2317', 1.0),
 ('1', '2526', 1.0),
 ('1', '2846', 1.0),
 ('1', '2931', 1.0),
 ('1', '3171', 1.0),
 ('1', '3297', 1.0),
 ('1', '3332', 1.0),
 ('1', '3404', 1.0),
 ('1', '3591', 1.0),
 ('1', '3595', 1.0),
 ('1', '3770', 1.0),
 ('1', '3950', 1.0),
 ('1', '4626', 1.0),
 ('1', '4662', 1.0),
 ('1', '4871', 1.0),
 ('1', '4889', 1.0),
 ('1', '5114', 1.0),
 ('1', '5324', 1.0),
 ('1', '5325', 1.0),
 ('1', '5614', 1.0),
 ('1', '5991', 1.0),
 ('1', '6103', 1.0),
 ('1', '6874', 1.0),
 ('1', '6968', 1.0),
 ('1', '7106', 1.0),
 ('1', '7801', 1.0),
 ('1', '7867', 1.0),
 ('1', '8903', 1.0),
 ('1', '9907', 1.0),
 ('1', '10008', 1.0),
 ('1', '10204', 1.0),
 ('1', '10272', 1.0),
 ('1', '10288', 1.0),
 ('1', '10508', 1.0),
 ('1', '10588', 1.0),
 ('1', '11009', 1.0),
 ('1', '11105', 1.0),
 ('1', '11226', 1.0),
 ('1', '11320', 1.0),
 ('1', '11650', 1.0),
 ('1', '11853', 1.0),
 ('1', '11919', 1.0),
 ('1', '12684', 1.0),
 ('1', '12716', 1.0),
 

In [36]:
# Define an evaluation method to split feedback into train and test sets
ratio_split = RatioSplit(
    data=anime_ratings_tuple,
    test_size=0.2,
    exclude_unknowns=True,
    verbose=True,
    seed=123,
    rating_threshold=0.5,
    item_text=item_text_modality,
)

# Instantiate DMRL recommender
dmrl_recommender = cornac.models.dmrl.DMRL(
    batch_size=4096,
    epochs=20,
    log_metrics=False,
    learning_rate=0.01,
    num_factors=2,
    decay_r=0.5,
    decay_c=0.01,
    num_neg=3,
    embedding_dim=100,
)

# Evaluate with Precision@30 and Recall@300
rec_300 = cornac.metrics.Recall(k=300)
prec_30 = cornac.metrics.Precision(k=30)

# Put everything together into an experiment and run it
cornac.Experiment(
    eval_method=ratio_split, models=[dmrl_recommender], metrics=[prec_30, rec_300]
).run()

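ValueError: input_modality has to be instance of TextModality but <class 'cornac.data.modality.FeatureModality'>

DMRL expects item_text to be a TextModality, so passing the precomputed synopsis embeddings as a FeatureModality raises this error.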
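For reference when reading the metrics configured above, the sketch below gives the standard per-user definitions of Precision@k and Recall@k. Cornac typically averages such per-user scores over test users; its exact handling of edge cases may differ from this sketch. A large k such as 300 makes recall easier to achieve, while Precision@30 rewards getting relevant items near the very top.

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """Precision@k and Recall@k for a single user.

    ranked_items:   items ordered by predicted score, best first
    relevant_items: held-out positive items for this user
    """
    top_k = ranked_items[:k]
    hits = len(set(top_k) & set(relevant_items))
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall

# Toy example: 2 of the 3 relevant items appear in the top 5.
ranked = ["a", "b", "c", "d", "e", "f"]
relevant = {"b", "e", "z"}
print(precision_recall_at_k(ranked, relevant, k=5))  # (0.4, 0.6666666666666666)
```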