# Recommenders 3 -- Sequence Recommenders (45m) 

## Goals of this practical:

- Understand the sequence recommendation framework (~5min)
- Load/Format dataset (~5min)
- Understand/train the prod2vec model (~10min)
- Evaluate (~10min)
- Visualize (~10min)
- Fiddle (~5min)



In [1]:
#! pip install gensim --upgrade

In [2]:
import pandas as pd
import numpy as np

# Sequence Recommenders:

> What will you click next ?

The sequence recommendation setting is a particular case of the implicit collaborative filtering setting. Given a sequence of items $i_0, i_1, ..., i_n$, the goal is to predict the next items $i_{n+1}, ...$ the user will consume. Playlist continuation is a neat use case of sequence recommenders: you've been listening to these songs, what should you listen to next?


This setting differs from classical collaborative filtering because the history is the recent trace rather than the full set of saved interactions. It is also possible to do sequence recommendation without any user-specific latent profile.

#### Here we propose to explore this unpersonalized sequence recommendation setting

## Data used : [smallest movie-lens dataset](https://grouplens.org/datasets/movielens/)

Here we'll use the same data as before, but instead of $(user,item,rating)$ triplets or $(user,item)$ interactions, we'll work with item sequences: $user: [item, item,...]$
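To make the change of format concrete, here is a minimal sketch of how interaction triplets can be regrouped into one chronological sequence per user. The function name `build_sequences` and the toy values are hypothetical and only serve as illustration; the notebook itself does this with pandas further down.

```python
from collections import defaultdict

def build_sequences(interactions):
    """Group (user, item, timestamp) triplets into one chronological item list per user."""
    per_user = defaultdict(list)
    # Sorting by timestamp first guarantees that each user's list is in chronological order.
    for user, item, _timestamp in sorted(interactions, key=lambda t: t[2]):
        per_user[user].append(item)
    return dict(per_user)

# Toy interactions (hypothetical values, not taken from MovieLens)
toy = [(1, 10, 300), (2, 20, 100), (1, 11, 100), (1, 12, 200), (2, 21, 50)]
sequences = build_sequences(toy)
print(sequences)
```

Each value of the resulting dictionary plays the role of one "sentence" for the model trained later.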

## Loading Data (same as before but in chronological order):

In [3]:
# We load the ratings
ratings = pd.read_csv("dataset/ratings.csv")
ratings = ratings.sort_values("timestamp",ascending=True)
print(ratings.iloc[0]["timestamp"] < ratings.iloc[-1]["timestamp"] ) # just checking

# Let's check what the ratings look like
ratings.head(5)

True


Unnamed: 0,userId,movieId,rating,timestamp
66719,429,595,5.0,828124615
66716,429,588,5.0,828124615
66717,429,590,5.0,828124615
66718,429,592,5.0,828124615
66712,429,432,3.0,828124615


In [4]:
# We also load titles and create an id2title dictionary
titleCSV = pd.read_csv("dataset/movies.csv")
id2title = titleCSV[["movieId","title"]].set_index("movieId").to_dict()["title"]

# Let's check what the titles look like
titleCSV.head(5)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


## (a) Create sequence datasets:
For this task, we need sequences of items as data:

## (Todo): extract all movie sequences (in chronological order) from the dataset:


In this dataset, each user has seen at least 20 movies.


- We need to extract all movie rating sequences (there is one per user) from the dataset:

`sequences_of_movies = [[movieid,...],[movieid,...],...]`

In [5]:
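# One list of movie ids per user; ratings is already sorted by timestamp, so each list is chronological
sequences_of_movies = [list(ratings[ratings["userId"] == uid]["movieId"].values) for uid in ratings["userId"].unique()]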

## (Todo): Create a train/test dataset

Here, the task is to predict the last 5 items of each sequence.

In [6]:
train_seq,test_seq = [],[]

# We take the last 5 items for the test sequence and the remaining ones for the train sequence
for seq in sequences_of_movies:
    train_seq.append(seq[:-5])
    test_seq.append(seq[-5:])
    
last_consumed_item = [seq[-1] for seq in train_seq] # We save the last consumed item for each list
                                                    # We'll use it as a starting point

## (Todo): Create the list of the most popular movies

- Here, popularity is the number of times a movie appears across all sequences

In [7]:
from collections import Counter
from itertools import chain

# Count how many times each movie appears across all sequences
counts = Counter(chain.from_iterable(sequences_of_movies))
most_popular = [movie_id for movie_id, _ in counts.most_common()]
num_items = len(most_popular)

# We check what the first 10 most popular movies look like
print(most_popular[:10])

[356, 318, 296, 593, 2571, 260, 480, 110, 589, 527]


``` python
#Most popular looks like this:
[356,318,296,2571,593,260,480,110,589,...]
 ```

## Word2Vec skip-gram <=> Prod2Vec


### Word2Vec

The MAIN idea of word2vec is to maximise the similarity (dot product) between the vectors for words which appear close together (in the context of each other) in text, and minimise the similarity of words that do not. 

This can be applied to products instead of words: it clusters similar products together.
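As an illustration of what "appearing in the context of each other" means, the sketch below enumerates the (target, context) pairs that a skip-gram model would try to bring closer for a single item sequence. It is a simplified view: the helper `skipgram_pairs` is hypothetical, and real implementations typically also randomise the effective window and subsample frequent items, which is left out here.

```python
def skipgram_pairs(sequence, window=2):
    """Return (target, context) pairs for every item and its neighbours within `window`."""
    pairs = []
    for pos, target in enumerate(sequence):
        start = max(0, pos - window)
        end = min(len(sequence), pos + window + 1)
        for ctx_pos in range(start, end):
            if ctx_pos != pos:  # an item is not its own context
                pairs.append((target, sequence[ctx_pos]))
    return pairs

# Item ids are treated as "words", exactly like in a sentence
toy_sequence = ["356", "318", "296", "593"]
pairs = skipgram_pairs(toy_sequence, window=1)
print(pairs)
```

Pairs that never show up in this list are the ones negative sampling pushes apart.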


#### Paper Abstract:
> In recent years online advertising has become increasingly ubiquitous and effective. Advertisements shown to visitors fund sites and apps that publish digital content, manage social networks, and operate e-mail services. Given such large variety of internet resources, determining an appropriate type of advertising for a given platform has become critical to financial success. Native advertisements, namely ads that are similar in look and feel to content, have had great success in news and social feeds. However, to date there has not been a winning formula for ads in e-mail clients. In this paper we describe a system that leverages user purchase history determined from e-mail receipts to deliver highly personalized product ads to Yahoo Mail users. We propose to use a novel neural language-based algorithm specifically tailored for delivering effective product recommendations, which was evaluated against baselines that included showing popular products and products predicted based on co-occurrence. We conducted rigorous offline testing using a large-scale product purchase data set, covering purchases of more than 29 million users from 172 e-commerce websites. Ads in the form of product recommendations were successfully tested on online traffic, where we observed a steady 9% lift in click-through rates over other ad formats in mail, as well as comparable lift in conversion rates. Following successful tests, the system was launched into production during the holiday season of 2014

[Prod2Vec Model](https://arxiv.org/abs/1606.07154)



## Gensim provides a widely used Python implementation of the word2vec algorithms:

We can just use these raw implementations. The only thing to do is to consider items as words:

In [8]:
import gensim
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)



train_seq_str = [list(map(str,seq)) for seq in train_seq] # we just say that our items id's are strings..
    

# The following configuration is the baseline for this practical (not gensim's library defaults)
# In recent gensim versions the "size" argument is called "vector_size" and "iter" is called "epochs";
# the old names raise an "unexpected keyword argument" error.
w2v = gensim.models.word2vec.Word2Vec(sentences=train_seq_str,
                                vector_size=50, window=10,        ### embedding size and context window
                                min_count=0,                      ### keep every item, even rare ones
                                sample=0.001, ns_exponent=0.75, workers=10,
                                sg=1, hs=0, negative=15,          ### sg=1 trains a skip-gram model => Prod2Vec (sg=0 would be CBOW)
                                cbow_mean=0,                      ### only applies to CBOW, has no effect when sg=1
                                epochs=50)

2022-04-29 16:29:51,301 : INFO : collecting all words and their counts
2022-04-29 16:29:51,302 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2022-04-29 16:29:51,313 : INFO : collected 9616 word types from a corpus of 97786 raw words and 610 sentences
2022-04-29 16:29:51,314 : INFO : Creating a fresh vocabulary
2022-04-29 16:29:51,337 : INFO : Word2Vec lifecycle event {'msg': 'effective_min_count=0 retains 9616 unique words (100.0%% of original 9616, drops 0)', 'datetime': '2022-04-29T16:29:51.337484', 'gensim': '4.1.2', 'python': '3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19043-SP0', 'event': 'prepare_vocab'}
2022-04-29 16:29:51,339 : INFO : Word2Vec lifecycle event {'msg': 'effective_min_count=0 leaves 97786 word corpus (100.0%% of original 97786, drops 0)', 'datetime': '2022-04-29T16:29:51.339985', 'gensim': '4.1.2', 'python': '3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.192

### A few things:

In [9]:
print(w2v.wv.vectors[0])              # The vector of index 0

# Some attributes have been renamed in newer versions of gensim:
# the "index2word" attribute is now called "index_to_key"
print(w2v.wv.index_to_key[0])           # key stored at index 0: the movieId 356 (as a string)

# The "vocab" attribute has been removed. Use the KeyedVectors .key_to_index dict, the .index_to_key list,
# and the methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
# Here we use the key_to_index attribute
print(w2v.wv.key_to_index["356"])      # Inverse mapping: key -> index

[-3.0074254e-01 -6.4518154e-01 -1.9506796e-01  3.7094378e-01
 -5.3686053e-01 -2.0066336e-01  7.6492526e-02 -4.5271805e-01
 -6.3443112e-01 -5.9595663e-02 -1.6577612e-01 -6.4148939e-01
 -2.9267016e-01  4.8130924e-01  5.4462796e-01  7.8360009e-01
  1.7059387e-01 -1.3491103e-01 -2.9239038e-01 -3.2902139e-01
 -5.0423551e-01  2.9892731e-01  1.7792910e-01 -6.2749010e-01
 -3.3014727e-01  7.9042345e-01 -2.7299440e-01 -1.7648122e-01
  4.0832553e-02  4.4780809e-01 -1.8865852e-01  1.1530963e-02
  5.9730673e-01 -1.2696083e-01 -1.7503755e-01 -1.8936023e-01
 -6.0698751e-02 -6.2536383e-01  2.8162514e-04  2.0446047e-02
  5.4630548e-01 -4.3348706e-01 -1.0088697e-01  2.1373785e-01
  1.1573187e-01 -6.2917203e-01  3.8120708e-01 -2.6983368e-01
 -6.1774734e-02  1.8207285e-01]
356
0


## **Major note**

As I have modified some of the function calls to make the code runnable again, I don't know whether this affects the rest of the notebook, but hopefully it does not. If any of the results seem strange or out of place, I will try to fix the code I've modified.

## Getting similar items:

The heart of the algorithm is the similar item search. As in word2vec, we simply use the cosine similarity between item vectors to find "similar items".
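The search itself can be sketched without gensim. The snippet below ranks items by cosine similarity, the quantity gensim's `most_similar` sorts by; the helper names and the toy 2-dimensional vectors are hypothetical and only meant to show the mechanics.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query, vectors, topn=2):
    """Rank every other key by decreasing cosine similarity to `query`."""
    scored = [(key, cosine_similarity(vectors[query], vec))
              for key, vec in vectors.items() if key != query]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:topn]

# Toy embeddings (hypothetical values)
toy_vectors = {"a": [1.0, 0.0], "b": [2.0, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
print(most_similar("a", toy_vectors, topn=2))
```

Because the vectors are normalised by their length, only their direction matters: `"b"` is the closest to `"a"` even though it is twice as long.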

### We can search by id's

In [10]:
def get_similar_ids(w2vmodel,iid,num=5):
    # key_to_index is a dict, so the membership test is a constant-time lookup
    if str(iid) in w2vmodel.wv.key_to_index:
        return [int(sim_id) for sim_id,_ in w2vmodel.wv.most_similar(str(iid),topn=num)]
    else:
        return []

get_similar_ids(w2v,last_consumed_item[0],num=5)

[231, 410, 292, 185, 434]

### Or by vector

In [11]:
w2v.wv[str(last_consumed_item[0])]

array([ 0.12833856, -0.9748843 , -0.588088  ,  0.1255272 , -0.37283424,
       -0.3683326 , -0.582962  , -0.04290681, -0.52555585, -0.16514345,
        0.22018643, -0.02168382, -1.0273906 ,  0.2903019 ,  0.9128572 ,
        0.5097222 ,  0.1810354 , -0.23724912,  0.20306313, -0.5100846 ,
       -0.12417614,  0.25936684,  0.22079636, -1.1885504 , -0.39398322,
        0.47633925, -0.36126354, -0.19677164, -0.48905674,  0.9990013 ,
        0.42597494,  0.5236524 ,  0.8724856 , -0.5313531 ,  0.16758032,
       -0.41889164,  0.30819002, -0.65761197,  0.39206782, -0.02582163,
        0.32290298, -0.48980114,  0.21802032,  0.18339832,  0.8385788 ,
       -1.1030978 ,  0.07435931, -0.35324025, -0.18675724, -0.2055685 ],
      dtype=float32)

In [12]:
def get_similar_vectors(w2vmodel,vec,num=5):
        return [int(iid) for iid,_ in w2vmodel.wv.most_similar(positive=[vec],topn=num)] 

get_similar_vectors(w2v,w2v.wv[str(last_consumed_item[0])],num=5) # items are strings

[339, 231, 410, 292, 185]

### Let's see if this works

We can query by id

In [13]:
ID = 1
NUM_SIM = 3

print("Movies similar to: ", id2title[ID])
print("")
for x in get_similar_ids(w2v,ID,NUM_SIM):
    print("--> ",id2title[x])

Movies similar to:  Toy Story (1995)

-->  Forrest Gump (1994)
-->  Toy Story 2 (1999)
-->  Lion King, The (1994)


We can also query by vector

**NOTE:** the 1st results can be the item(s) you've used to query

In [24]:
ID = 1
NUM_SIM = 4
print("Movies similar to: ", id2title[ID])
print("")
for x in get_similar_vectors(w2v,w2v.wv[str(ID)],NUM_SIM):
    print("--> ",id2title[x])

Movies similar to:  Toy Story (1995)

-->  Toy Story (1995)
-->  Forrest Gump (1994)
-->  Toy Story 2 (1999)
-->  Lion King, The (1994)


Our result was the following (for ID = 1 & NUM_SIM = 3)

Movies similar to:  Toy Story (1995)
>-  Beauty and the Beast (1991)
>-   Toy Story 2 (1999)
>-   Lion King, The (1994)

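Using vectors also lets us combine several items into a single query, for example by adding their vectors or, as in the cell below, by taking their element-wise maximum.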

In [25]:
ID1 = 2571
ID2 = 589
NUM_SIM = 10

vec = np.max([w2v.wv[str(ID1)],w2v.wv[str(ID2)]],axis=0)

print("Movies similar to: ", id2title[ID1] , "+",  id2title[ID2] )
print("")
for x in get_similar_vectors(w2v,vec ,NUM_SIM):
    print("--> ",id2title[x])

Movies similar to:  Matrix, The (1999) + Terminator 2: Judgment Day (1991)

-->  Matrix, The (1999)
-->  Star Wars: Episode V - The Empire Strikes Back (1980)
-->  Venom (1982)
-->  Saving Private Ryan (1998)
-->  Terminator, The (1984)
-->  Terminator 2: Judgment Day (1991)
-->  Sixth Sense, The (1999)
-->  Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)
-->  Die Hard (1988)
-->  Gladiator (2000)


Let's try with another combination of IDs and the same number of similar movies (10).

In [27]:
# The IDs have been generated randomly from the most_popular list
ID1 = most_popular[6759]
ID2 = most_popular[3921]
NUM_SIM = 10

vec = np.max([w2v.wv[str(ID1)],w2v.wv[str(ID2)]],axis=0)

print("Movies similar to: ", id2title[ID1] , "+",  id2title[ID2] )
print("")
for x in get_similar_vectors(w2v,vec ,NUM_SIM):
    print("--> ",id2title[x])

Movies similar to:  Dark Victory (1939) + Once Bitten (1985)

-->  Once Bitten (1985)
-->  Dark Victory (1939)
-->  Unprecedented: The 2000 Presidential Election (2002)
-->  Hotel Chevalier (Part 1 of 'The Darjeeling Limited') (2007)
-->  Brave New World (1998)
-->  Grand Hotel (1932)
-->  I'm Here (2010)
-->  Royal Wedding (1951)
-->  Dr. Jekyll and Mr. Hyde (1931)
-->  Jacket, The (2005)


Ok, we now have a good base for our sequence recommendation algorithm, let's write something to evaluate our predictions

## (Todo) write a `get_relevance_list(proposed_ids,real_ids)` function:

This function will be used to compare proposed items with real items:


- A relevant item is an item which is in the ground truth
- It returns a list of 0's and 1's whose length is the number of proposed items: 0 means the item is not relevant, 1 means it is relevant.

- `get_relevance_list([1,2,3,4],[1,4,5,6])` should return `[1,0,0,1]` because items 1 and 4 are relevant.


In [29]:
def get_relevance_list(proposed_ids,real_ids):
    real_ids = set(real_ids)
    return [1 if x in real_ids else 0 for x in proposed_ids]
get_relevance_list([1,2,3,4],[1,4,5,6]) #returns [1,0,0,1]

[1, 0, 0, 1]

### Let's test our function on our data

In [30]:
get_relevance_list(most_popular[:25],test_seq[1])

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [31]:
get_relevance_list(get_similar_ids(w2v,last_consumed_item[0],25),test_seq[0])

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

## Ok, now, let's write prediction functions:

- `predict_pop` will recommend the k most popular items
- `predict_w2v` will recommend the k items most similar to the last one consumed

#### (TODO) : complete those functions

In [34]:
def predict_pop(last_seen,k):
    # Here the last_seen argument is useless as we only return the most popular movies
    return most_popular[:k]

def predict_w2v(last_seen,k):
    return get_similar_ids(w2v, last_seen, k)

# data is the list of last consumed items, truth is the list of test sequences
# (k == -1 or k == 0 means "rank every item")
def get_predictions(predict_func,data,truth,k=5):
    if k == -1 or k == 0:
        k = num_items
    return [get_relevance_list(predict_func(last_seen,k),will_see) for last_seen,will_see in zip(data,truth)]

**Note**: The `get_predictions(...)` function returns the relevance lists associated with the predictions

### The following cells should return a list of lists

In [35]:
print(get_predictions(predict_pop,last_consumed_item[:5],test_seq[:5],3))
print(get_predictions(predict_w2v,last_consumed_item[:5],test_seq[:5],3))

[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]]


expected output: 
```
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]

```

## The return of the MRR and nDCG functions
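
As a reminder, and as a restatement of the code in the next cell, the metrics can be written as follows for a relevance list $r = (r_1, ..., r_n)$ given in rank order (ranks start at 1) and a collection $L$ of such lists:

$$
\mathrm{RR}(r) =
\begin{cases}
\dfrac{1}{\min\{j : r_j \neq 0\}} & \text{if some } r_j \neq 0 \\
0 & \text{otherwise}
\end{cases}
\qquad
\mathrm{MRR}(L) = \frac{1}{|L|} \sum_{r \in L} \mathrm{RR}(r)
$$

$$
\mathrm{DCG@}k(r) = \sum_{j=1}^{\min(k,n)} \frac{r_j}{\log_2(j+1)}
\qquad
\mathrm{nDCG@}k(r) = \frac{\mathrm{DCG@}k(r)}{\mathrm{DCG@}k(r^{\downarrow})}
$$

Here $r^{\downarrow}$ denotes $r$ sorted in decreasing order (the ideal ranking), and $\mathrm{nDCG@}k$ is set to 0 when the denominator is 0.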

In [40]:
test_list = [[0,0,1],[0,1,0],[1,0,0],[0,0,0]]

def rr(list_items):
    relevant_indexes = np.asarray(list_items).nonzero()[0]
    
    if len(relevant_indexes) > 0:
        return 1/(relevant_indexes[0]+1) # arrays are indexed from 0
    else:
        return 0

def mrr(list_list_items):
    return np.mean([rr(list_item) for list_item in list_list_items])

mrr(test_list) #0.4583333333333333

# The dcg@k is the sum of the relevance, penalized gradually
def dcg_at_k(r, k):
    """Score is discounted cumulative gain (dcg)
        r: Relevance scores (list or numpy) in rank order
            (first element is the first item)
        k: Number of results to consider
        
    """
    # np.asfarray was removed in NumPy 2.0; np.asarray with dtype=float is equivalent
    r = np.asarray(r, dtype=float)[:k]
    if r.size:
        return np.sum(r / np.log2(np.arange(2, r.size + 2)))
        
    return 0.

# test values
# r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
# dcg_at_k(r, 1) => 3.0
# dcg_at_k(r, 2) => 4.2618595071429155
r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
print(dcg_at_k(r, 1))
print(dcg_at_k(r, 2))

def mean_dcg(rel_lists,k):
    return np.mean([dcg_at_k(rel_list,k) for rel_list in rel_lists])

# And its normalized version
def ndcg_at_k(r, k):
    """
        r: Relevance scores (list or numpy) in rank order
            (first element is the first item)
        k: Number of results to consider
    """
    dcg_max =  dcg_at_k(sorted(r)[::-1],k) 
    if not dcg_max:
        return 0.
    return dcg_at_k(r, k) / dcg_max

# test values
# r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
# ndcg_at_k(r, 1) => 1.0
# ndcg_at_k(r, 4) => 0.794285
    
r = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]    
ndcg_at_k(r, 4)

def mean_ndcg(rel_lists,k):
    return np.mean([ndcg_at_k(rel_list,k) for rel_list in rel_lists])

3.0
4.2618595071429155


## Let's see how this naïve way of predicting the next items performs

In [41]:
pop_preds = get_predictions(predict_pop,last_consumed_item,test_seq,-1)
w2v_preds = get_predictions(predict_w2v,last_consumed_item,test_seq,-1)

print("1/MRR")  # inverse of the mean reciprocal rank: lower is better
print(1/mrr(pop_preds))
print(1/mrr(w2v_preds))
print("")
print("DCG")
print(mean_dcg(pop_preds,5))
print(mean_dcg(w2v_preds,5))
print("")
print("nDCG")
print(mean_ndcg(pop_preds,5))
print(mean_ndcg(w2v_preds,5))

1/MRR
17.87234931650308
11.694294147313169

DCG
0.051054704276776476
0.08992996018517257

nDCG
0.017315723982695277
0.030900502616531756


### (TODO) Can we do better ?

Now, try a different strategy: 


- Items already in the history should be excluded from the predictions
- Instead of basing the prediction on the last seen item only, we'll take all the `seen[-n:]` items (the horizon) into account
- To aggregate the candidates, each item simply keeps its best (minimum) rank over the similarity lists
- Ties are broken using the history offset, i.e. the position of the history item that produced the candidate

Example: 

> Let's say you chose to use the two last seen items `[item 44, item 398]` to predict the following items

Therefore, using the `get_similar_ids` method on both items will yield two ranked lists of **similar** item ids:
 - Similar to item 44: `[item 1, item 33, item 5]`
 - Similar to item 398: `[item 25, item 1, item 5]`

Scores (rank, offset):
```
scores := {item 1: (0,0), item 33: (1,0), item 5: (2,0), item 25: (0,1)}
```
 
 
Then, aggregation by best rank should yield: `[1,25,33,5]`
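
The aggregation rule of this example can be sketched independently of the model. The helper below is hypothetical (its name and the `exclude` argument are not part of the notebook) and assumes that the offset is the position of the history item in the horizon, as in the example.

```python
def aggregate_by_best_rank(similar_lists, exclude=()):
    """Merge ranked lists, keeping for each item its best (rank, offset) pair.

    similar_lists[offset] is the ranked list of items similar to the
    history item at position `offset`; items in `exclude` are dropped.
    """
    excluded = set(exclude)
    scores = {}
    for offset, ranked in enumerate(similar_lists):
        for rank, item in enumerate(ranked):
            if item in excluded:
                continue
            candidate = (rank, offset)
            # Tuples compare lexicographically: the lower rank wins, the offset breaks ties.
            if item not in scores or candidate < scores[item]:
                scores[item] = candidate
    return sorted(scores, key=scores.get)

similar_to_44 = [1, 33, 5]
similar_to_398 = [25, 1, 5]
merged = aggregate_by_best_rank([similar_to_44, similar_to_398], exclude=[44, 398])
print(merged)  # [1, 25, 33, 5]
```

Passing the history items through `exclude` is one simple way to make sure that already seen movies are never recommended again.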


In [46]:
def predict_max_w2v(seen,k,horizon=2):
    # Take the last `horizon` seen movies and prepare the score dictionary
    last_seen = seen[-horizon:]
    scores = dict()
    
    # Iterate over the last seen movies (hist_pos is the position within the horizon)
    for hist_pos, movie in enumerate(last_seen):
        # Movies most similar to this history item
        pred_sim = get_similar_ids(w2v, movie, k)

        # Iterate over the similar ids (sim_rank is the position in the similarity list)
        for sim_rank, item in enumerate(pred_sim):
            # Keep only the first score seen for each item (in case it appears several times)
            if item not in scores:
                scores[item] = (hist_pos, sim_rank)
    
    # Return the items sorted by (hist_pos, sim_rank). Note that this orders by history position
    # first, which is not the min-rank aggregation described above, and that history items
    # are not filtered out; the scores printed below come from this ordering.
    return [item for item, _ in sorted(scores.items(), key=lambda kv: kv[1])]


w2v_best_preds = get_predictions(predict_max_w2v,train_seq,test_seq,-1)

print(1/mrr(w2v_best_preds))
print(mean_dcg(w2v_best_preds,5))
print(mean_ndcg(w2v_best_preds,5))

14.833507857440182
0.06765394054546285
0.023082468573992283


#### => Not really better

## Let's visualize learned embeddings

Just like in the 1st practical, we propose to visualize the learned item embeddings with the [TensorFlow projector](https://projector.tensorflow.org/).

In [60]:
# This function saves embeddings (a numpy array) and associated labels into tsv files.

def save_embeddings(embs,dict_label,path="saved_word_vectors"):
    """
    embs is Numpy.array(N,size)
    dict_label is {str(word)->int(idx)} or {int(idx)->str(word)}
    """
    def int_first(k,v):
        if isinstance(k, int):
            return (k,v)
        else:
            return (v,k)

    np.savetxt(f"{path}_vectors.tsv", embs, delimiter="\t")

    #labels 
    if dict_label:
        sorted_labs = np.array([lab for idx,lab in sorted([int_first(k,v) for k,v in dict_label.items()])])
        print(sorted_labs)
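        # Write as UTF-8 so that accented titles are stored the same way on every platform
        with open(f"{path}_metadata.tsv","w",encoding="utf-8") as metadata_file: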
            for x in sorted_labs: #hack for space
                if len(x.strip()) == 0:
                    x = f"space-{len(x)}"
                    
                metadata_file.write(f"{x}\n")

In [61]:
vec2title = {i:id2title[int(mid)] for i,mid in enumerate(w2v.wv.index_to_key)}

In [62]:
save_embeddings(w2v.wv.vectors,vec2title)

['Forrest Gump (1994)' 'Shawshank Redemption, The (1994)'
 'Pulp Fiction (1994)' ... 'Little Miss Marker (1980)'
 'Late Marriage (Hatuna Meuheret) (2001)'
 'Andrew Dice Clay: Dice Rules (1991)']


We now have two new files, "saved_word_vectors_vectors.tsv" and "saved_word_vectors_metadata.tsv". Let's quickly check what's inside these files.

Note: for some reason, I had to delete the last empty row from the metadata file to make it usable with the pandas library.

In [64]:
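# We use the pandas library to read the files (tsv files are csv files with a tab separator)
# Note: the files have no header row, so pandas uses the first line as column names here;
# passing header=None would keep that line as data.
pd.read_csv("saved_word_vectors_metadata.tsv", sep="\t")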

Unnamed: 0,Forrest Gump (1994)
0,"Shawshank Redemption, The (1994)"
1,Pulp Fiction (1994)
2,"Matrix, The (1999)"
3,"Silence of the Lambs, The (1991)"
4,Star Wars: Episode IV - A New Hope (1977)
...,...
9610,"Facing Windows (Finestra di fronte, La) (2003)"
9611,"Eighth Day, The (Huitième jour, Le) (1996)"
9612,Little Miss Marker (1980)
9613,Late Marriage (Hatuna Meuheret) (2001)


In [65]:
pd.read_csv("saved_word_vectors_vectors.tsv", sep="\t")

Unnamed: 0,-1.308204531669616699e-01,-9.373525977134704590e-01,-1.903456151485443115e-01,8.866167664527893066e-01,-2.364774942398071289e-01,-4.611333608627319336e-01,-3.990370035171508789e-01,4.677708260715007782e-03,-1.037948727607727051e+00,-4.934164285659790039e-01,...,4.002290666103363037e-01,-6.634631752967834473e-01,-4.466506242752075195e-01,4.443196654319763184e-01,2.412061840295791626e-01,-3.203268349170684814e-01,5.735866725444793701e-02,-1.806772649288177490e-01,-1.404088884592056274e-01,2.504607439041137695e-01
0,0.191728,-0.337386,-0.133508,0.583781,0.054748,-0.359095,-0.285114,-0.410546,-1.203608,-0.559385,...,1.016586,-0.961386,0.080649,0.595540,0.835402,-0.125174,0.412031,-0.797428,-0.610982,0.626047
1,-0.666284,0.405146,0.261234,0.966058,-0.186964,-0.124168,0.119240,0.573752,-0.516475,-0.772345,...,1.760169,-0.822677,-1.047733,1.061342,-0.173978,-0.848040,0.719524,-0.480560,0.090974,0.356793
2,0.082193,-0.953947,0.041595,0.446591,0.255977,-0.500937,0.635074,1.076976,-1.133167,-0.281876,...,1.177377,-0.482142,0.625845,0.280632,0.309906,-0.222284,0.277571,-0.852292,-0.221956,0.398970
3,0.151991,0.003838,-0.415382,0.517119,0.114304,0.106529,-0.197031,0.141737,-1.134646,-0.624014,...,0.848399,-0.904153,-0.054174,0.541750,0.535916,-0.321128,0.732958,-0.565319,-0.470649,0.639322
4,-0.412334,-0.637332,-0.116872,0.365549,-0.076938,-0.705615,-0.593116,-0.198439,-0.859540,-0.925937,...,1.011723,0.839060,0.433474,0.125990,0.484771,-0.472632,0.437122,-1.083687,0.757188,0.826228
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9610,-0.014765,-0.089912,-0.059354,0.109058,-0.114996,-0.218520,-0.085448,0.297680,-0.398717,-0.181221,...,0.491519,-0.310350,0.096876,0.256428,0.243989,-0.265557,0.234304,-0.477310,0.192237,0.193429
9611,-0.046386,-0.061463,-0.041949,0.089724,-0.117973,-0.276404,-0.012729,0.356145,-0.470224,-0.170283,...,0.606497,-0.333302,0.091526,0.202758,0.252701,-0.215177,0.185192,-0.528985,0.254846,0.247713
9612,-0.041802,-0.048932,-0.048242,0.124537,-0.147270,-0.242986,0.016474,0.335927,-0.451703,-0.144594,...,0.556747,-0.310317,0.086251,0.209893,0.212469,-0.215106,0.196116,-0.526052,0.217108,0.252636
9613,-0.046434,-0.076680,-0.023759,0.085632,-0.139343,-0.194416,-0.024847,0.274385,-0.364096,-0.104656,...,0.514042,-0.265569,0.092049,0.186155,0.145397,-0.195049,0.184652,-0.446149,0.144857,0.226070


## How to:

- Now, [open this link](https://projector.tensorflow.org/) and select "Load".
- Look for saved_word_vectors_vectors.tsv and saved_word_vectors_metadata.tsv.

=> These are, respectively, the latent representations of the items and their labels

## Hyperparameters matter when using Word2Vec for Item recommendation:


> Skip-gram with negative sampling, a popular variant of Word2vec originally designed and tuned to create word embeddings for Natural Language Processing, has been used to create item embeddings with successful applications in recommendation. While these fields do not share the same type of data, neither evaluate on the same tasks, recommendation applications tend to use the same already tuned hyperparameters values, even if optimal hyperparameters values are often known to be data and task dependent. We thus investigate the marginal importance of each hyperparameter in a recommendation setting through large hyperparameter grid searches on various datasets. Results reveal that optimizing neglected hyperparameters, namely negative sampling distribution, number of epochs, subsampling parameter and window-size, significantly improves performance on a recommendation task, and can increase it by an order of magnitude. Importantly, we find that optimal hyperparameters configurations for Natural Language Processing tasks and Recommendation tasks are noticeably different. 

[Hyperparameters matter](https://arxiv.org/abs/1804.04212)

#### It turns out that hyperparameters are really important for this task, especially the negative sampling exponent. Try training multiple models to see how the `ns_exponent` parameter changes the results:
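
Before retraining, it can help to see what `ns_exponent` does. In gensim, negative samples are drawn with a probability proportional to the item count raised to the power `ns_exponent`. The sketch below reproduces that distribution on hypothetical counts; the function name and the toy items are illustrative only.

```python
def negative_sampling_distribution(counts, ns_exponent):
    """Probability of drawing each item as a negative sample: p(i) proportional to count(i) ** ns_exponent."""
    weights = {item: count ** ns_exponent for item, count in counts.items()}
    total = sum(weights.values())
    return {item: weight / total for item, weight in weights.items()}

# Hypothetical item counts
toy_counts = {"blockbuster": 300, "popular": 100, "niche": 5}

for exponent in (1.0, 0.75, 0.0, -0.4):
    dist = negative_sampling_distribution(toy_counts, exponent)
    print(exponent, {item: round(p, 3) for item, p in dist.items()})
```

With an exponent of 1.0 the sampling follows the raw frequencies, 0.0 makes it uniform, and a negative value such as -0.4 draws rare items more often than popular ones.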


In [None]:
# Same setup as before, with a smaller window, fewer epochs and a negative ns_exponent
# (size and iter are replaced by vector_size and epochs, as before)
w2v = gensim.models.word2vec.Word2Vec(sentences=train_seq_str,
                                vector_size=50, window=3,         ### embedding size and context window
                                min_count=0,                      ### keep every item, even rare ones
                                sample=0.001, ns_exponent=-0.4, workers=10,
                                sg=1, hs=0, negative=15,          ### sg=1 trains a skip-gram model => Prod2Vec (sg=0 would be CBOW)
                                cbow_mean=0,                      ### only applies to CBOW, has no effect when sg=1
                                epochs=25)



2022-04-29 17:26:17,522 : INFO : collecting all words and their counts
2022-04-29 17:26:17,523 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2022-04-29 17:26:17,532 : INFO : collected 9616 word types from a corpus of 97786 raw words and 610 sentences
2022-04-29 17:26:17,533 : INFO : Creating a fresh vocabulary
2022-04-29 17:26:17,555 : INFO : Word2Vec lifecycle event {'msg': 'effective_min_count=0 retains 9616 unique words (100.0%% of original 9616, drops 0)', 'datetime': '2022-04-29T17:26:17.555719', 'gensim': '4.1.2', 'python': '3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19043-SP0', 'event': 'prepare_vocab'}
2022-04-29 17:26:17,556 : INFO : Word2Vec lifecycle event {'msg': 'effective_min_count=0 leaves 97786 word corpus (100.0%% of original 97786, drops 0)', 'datetime': '2022-04-29T17:26:17.556219', 'gensim': '4.1.2', 'python': '3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.192

In [None]:
pop_preds = get_predictions(predict_pop,last_consumed_item,test_seq,-1)
w2v_preds = get_predictions(predict_w2v,last_consumed_item,test_seq,-1)

In [None]:
print(1/mrr(pop_preds))
print(1/mrr(w2v_preds))

print(mean_dcg(pop_preds,5))
print(mean_dcg(w2v_preds,5))

print(mean_ndcg(pop_preds,5))
print(mean_ndcg(w2v_preds,5))

17.87234931650308
12.268373706288253
0.051054704276776476
0.09120217861070297
0.017315723982695277
0.03121773183451276


## Still got time? Try building a smarter item selection mechanism:

- You could, for example, cluster items in groups (using k-means) and propose the most popular items of the last seen group
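
A minimal sketch of that idea, assuming the cluster labels are already available: in the notebook they could come from k-means on the item vectors, for instance with `sklearn.cluster.KMeans(n_clusters=...).fit_predict(w2v.wv.vectors)`, which is an assumption and not something the practical provides. All function and variable names below are hypothetical.

```python
from collections import defaultdict

def build_cluster_ranking(item2cluster, popularity_order):
    """Map each cluster id to its items, most popular first."""
    ranking = defaultdict(list)
    for item in popularity_order:  # already sorted by decreasing popularity
        if item in item2cluster:
            ranking[item2cluster[item]].append(item)
    return dict(ranking)

def predict_cluster_pop(last_seen, k, item2cluster, cluster_ranking, popularity_order):
    """Recommend the k most popular items from the cluster of the last seen item."""
    cluster = item2cluster.get(last_seen)
    if cluster is None:  # unknown item: fall back to global popularity
        return popularity_order[:k]
    return [item for item in cluster_ranking[cluster] if item != last_seen][:k]

# Toy data (hypothetical cluster labels)
toy_item2cluster = {356: 0, 318: 0, 296: 1, 593: 1, 2571: 1}
toy_popularity = [356, 318, 296, 593, 2571]
toy_ranking = build_cluster_ranking(toy_item2cluster, toy_popularity)
print(predict_cluster_pop(2571, 2, toy_item2cluster, toy_ranking, toy_popularity))
```

To evaluate it with `get_predictions`, the extra arguments can be fixed with a `lambda` or `functools.partial` so that the function keeps the `(last_seen, k)` signature.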