# Exercise 4: Baselines II

### Recommendation Pipeline Steps:


|#|Step|Check|Description|
|---|---|---|---|
|<font color='grey'>0</font>|<font color='grey'>Customer requirements</font>|<font color='grey'>-</font>|<font color='grey'>Get an idea of what an expected solution should do</font>|
|1|Prepare your data|✔️|<font color='grey'>Get acquainted with the data</font>|
|2|Come up with a baseline solution|✔️|<font color='grey'>Trivial, stable, explainable solution</font>|
|<font color='red'>**3**</font>|<font color='red'>**Evaluate your solution**</font>|❌|<font color='red'>**Train/Test/(Val) split, calculate metrics**</font>|
|4|Come up with improvements|✔️|<font color='grey'>Re-design/improve your recommender</font>|
|<font color='grey'>5</font>|<font color='grey'>Deploy and support</font>|<font color='grey'>-</font>|<font color='grey'>Make sure your solution works under real-world conditions</font>|

We have already done quite a lot of work. We know our data (LFM 2B) and what is available to us; in Exercise 3 we came up with a simple POP baseline recommender; we know a lot of different recommendation techniques to compare against the baseline (and going to learn even more).

In order to complete the picture we need to split our data and learn how to evaluate our candidate solutions. After that we'll be able to alternate between development and evaluation to produce the final solution for our recommendation problem.


In [1]:
import pandas as pd
import numpy as np
import random as rnd

You can rely on the three files below being placed next to your jupyter notebook:

* 'sampled_1000_items_inter.txt' - data about user-item interactions;
* 'sampled_1000_items_tracks.txt' - track-related information;
* 'sampled_1000_items_demo.txt' - user-related information;

## <font color='red'>TASK 1/3</font>: Data Split (4 points)
Write a function that randomly samples a given proportion of interactions for a test set (e.g. 0.2 == 20% of ALL interactions).

### (2 points):
* It should receive a name of a file, containing interaction data in LFM2B format (as in the previous exercise) as input.
* The function is expected to **randomly** split the records from the file, approximately in given proportion into Train and Test sets (proportion of 0.2 means that the number of Test records to the number of Train records should be 20:80).
* The function needs to save the result into two separate files in LFM2B format, one for Test interactions, one for Train interactions.

### (2 more points):
* Make sure that every **item from the Test** set is also **present in the Train** set.

Please follow the signature below:

In [2]:
def split_interactions(inter_file = 'sampled_1000_items_inter.txt',
                       proportion = 0.2,
                       res_test_file = 'sampled_1000_items_inter_TEST.txt',
                       res_train_file = 'sampled_1000_items_inter_TRAIN.txt'):
    
    '''
    inter_file - string - path to the file with interaction data in LFM2B format;
    proportion - float - proportion of records from inter_file to become the Test Set;
    res_test_file - string - Test records will be saved here;
    res_train_file - string - Train records will be saved here;
    
    returns - nothing, but saves the two files in LF2B format;
    '''
    rnd.seed() # new seed to get a new split on every call
    interactions = pd.read_csv(inter_file, sep='\t', header=None, names=['user','item','num_inters'])

    #---------------------#
    # code something here #
    test = pd.DataFrame()
    train = interactions
    number_needed_samples = len(interactions) * proportion
    collected_samples_so_far = len(test)
    interactions_it = set(list(interactions['item']))
    
    while  collected_samples_so_far < number_needed_samples:
        
        if len(interactions_it) == 0:
            interactions_it = set(list(interactions['item']))
            random_item = rnd.sample(interactions_it, 1) # choose random itemID
            interactions_it.remove(random_item[0])
            inter2 = interactions.drop(test.index, axis=0, inplace=False)
            max_number_random_item = len(inter2[inter2['item'] == random_item[0]])-1
        else:
            
            random_item = rnd.sample(interactions_it, 1) # choose random itemID

            max_number_random_item = len(interactions[interactions['item'] == random_item[0]])-1
            
            interactions_it.remove(random_item[0]) # potential items shrinked 
        
        
        remaining_free_space = number_needed_samples - collected_samples_so_far

        number_samples_for_testset = rnd.randint(0, min(max_number_random_item, int(remaining_free_space)+1) )
        
        collected_samples_so_far += number_samples_for_testset
        
        df_a = interactions[interactions['item']==random_item[0]]
        random_df = df_a.sample(number_samples_for_testset,random_state=rnd.seed())
        
        test = pd.concat([test, random_df])
        
    train = train.drop(test.index, axis=0, inplace=False) # remove the testset samples from training set
    
 
    #---------------------#
    
    # saving the res files
    # train and test - pd.DataFrames
    train.to_csv(res_train_file, index=False, header=False, sep='\t') 
    test.to_csv(res_test_file, index=False, header=False, sep='\t')
    return

## <font color='red'>TASK 2/3</font>: Evaluation function (4 points)
### (2 points): Method
Write an evaluation function which receives the recomender to be evaluated, "train" and "test" matrices and calculates MRR (Mean Reciprocal Rank) over the users from the test set:

$MRR = \frac{1}{|T_{users}|}\sum \limits_{i=1}^{|T_{users}|}\frac{1}{rank_i}$

$T_{users}$ - set of users ended up in the Test Set

$rank_i$ - rank of the **first relevant** track shown to the user (position of our best guess) for $i$-th user

###### Example:
Imagine that the table below is a recommendation list for user $i$, sorted in the order of descending recommendation score (the system assumes that the tracks on the top are more likely to interest the user). In the last column we have ground truth information that allows us to evaluate the system's recommendation.

In this case $rank_i = 2$, as the highest on the list item, user actually interacted with, is on the second position.<br>
This way $RR$ for user $i$ is equal to $\frac{1}{2}$. <br>
$MRR$ would be mean of $RR$ over all users we know ground truth about.


| Rank | Track Name | Track ID | Interacted with in<br>the test set? |
| ---    |   ---  |   ---  |   ---  |
| 1 | Dido - Thank You | 432  | False |
| 2 | U2 - Vertigo | 12  | **True** |
| 3 | Botch - Lobster Song | 57 | False |
| 4 | U2 - Walk On | 311 | **True** |

* What should $RR$ be equal to if the system didn't guess any of the tracks the user interacted with? Why?
* What other metric could we use to take into account hits beyond the first one?

* What should $RR$ be equal to if the system didn't guess any of the tracks the user interacted with? Why?<br>
RR should be 0 because the lower the rank the closer $RR=\frac{1}{rank_i}$ comes to 0. <br><br>
* What other metric could we use to take into account hits beyond the first one?<br>
Rank Correlation:
Kendall's Tau $\tau$ is not only considering the correctly ranked item pairs but also the incorrectly ones. Compare the interacted items of a user marked as high rank with the recommended items.

In [3]:
def eval_MRR(rec_func, train, test, topK = 10):
    '''
    rec_func - recommendation function, allowing for call: rec_func(train, user_id, topK)
    train - 2D np.array - Train interaction matrix, as produced by inter_matr_binary from Ex3
    test - 2D np.array - Test interaction matrix, as produced by inter_matr_binary from Ex3
    topK - int - length of the recommended list rec_func should provide
    
    returns - float - MRR score
    '''
    MRR = 0

    #---------------------#
    # code something here # 
   
    
    number_users_testset = test.shape[0]
    for user_id in range(number_users_testset):
        
        list_rec = list(rec_func(train,user_id,topK))

        user_row = test[user_id]
        interaction_items = np.where(user_row==1) # get indices of items interested in. These indices are
        # also the item IDs!

        interacted = list(interaction_items[0])
        long_list = list_rec
        long_list.extend(interacted)
      

        if len(long_list) != len(set(long_list)) and len(interacted)>0: # there are duplicates
    
            for inter_item in interacted:
                if inter_item in list_rec:
                    
                    
                    rank_index = list_rec.index(inter_item)
                    break
            MRR += float(1/(int(rank_index)+1))  

    MRR = float(MRR/number_users_testset)
    #---------------------#
    
    return MRR

### (1 point): Evaluate your TopKPOP
Use the evaluation function you have written to evaluate **recTopKPop** - your own baseline recommender from Exercise 3 (Task 4).
Use your function **inter_matr_binary** from Exercise 3 (Task 1) to convert the Train and Test files into numpy arrays.

Copy needed code from Exercise 3 into function-dummies below. Run the cell and allow the result to remain in the **\_eval_res_0** variable.

In [4]:
def inter_matr_binary(usr_path = 'sampled_1000_items_demo.txt',
                      itm_path = 'sampled_1000_items_tracks.txt',
                      inter_path = 'sampled_1000_items_inter.txt'):
    '''
    usr_path - string path to the file with users data;
    itm_path - string path to the file with item data;
    inter_path - string path to the file with interaction data;
    
    returns - 2D np.array, rows - users, columns - items;
    '''
    
    res = None
    
    # ---------------------------------#
    # your old converter function here #
    tracks = pd.read_csv(itm_path, delimiter='\t')
    numb_col_tracks = tracks.shape[1]
    tracks = pd.read_csv(itm_path, delimiter='\t',names=[i for i in range(numb_col_tracks)]) 
    numb_tracks = tracks.shape[0]
    
    users = pd.read_csv(usr_path, delimiter='\t')
    numb_col_usr = users.shape[1]
    users = pd.read_csv(usr_path, delimiter='\t', names=[e for e in range(numb_col_usr)])
    numb_usr = users.shape[0] # alter sol: len(users.index)
    
    res = np.zeros(shape=(numb_usr, numb_tracks)) # we need to fill this array with interactions
    
    inter = pd.read_csv(inter_path, delimiter='\t')
    numb_col_inter = inter.shape[1]
    inter = pd.read_csv(inter_path, delimiter='\t', names=[str(l) for l in range(numb_col_inter)])

    
    for usr_id in range(numb_usr):
        small_inter = inter[inter['0']== usr_id]
        list_track_id = small_inter['1'].tolist()
        res[usr_id][list_track_id] = 1
    # ---------------------------------#
    
    return res

def recTopKPop(prepaired_data: np.array,
               user: int,
               top_k: int) -> np.array:
    '''
    prepaired_data - np.array from the Ex3 Task 1;
    user - user_id, integer;
    top_k - expected length of the resulting list;
    
    returns - list/array of top K popular items that the user has never seen
              (sorted in the order of descending popularity);
    '''
    pop_res = None
    
    # --------------------------#
    # your old recommender here #
    active_usr_row = prepaired_data[user]
    active_usr_find_0 = np.where(active_usr_row==0.0)[0].tolist() # these indices of features can be kept because
    # user hasn't seen them so far
    
    filteredUsr_array = prepaired_data[:,active_usr_find_0] #get all rows of similar users
    sum_col = np.sum(filteredUsr_array, axis=0).tolist()
    ind_li = active_usr_find_0

    rec1 = [(l,i) for l,i in zip(sum_col,ind_li)] # create list of tuples (numer_interact, ItemID)
    tup_sort = sorted(rec1, key = lambda x: x[0], reverse=True) # sort by interaction amount
    pop_res=[]

    for i in tup_sort[:top_k]:
        pop_res.append(i[1])

 
    # --------------------------#
    
    return np.array(pop_res)

# now running it all: #

split_interactions()

# after this the Test and Train files should be saved nearby:
# 'sampled_1000_items_inter_TEST.txt' and 'sampled_1000_items_inter_TRAIN.txt' are ...
# ... exppected to exist

# creating the train matrix from the new saved file
_train_matr_0 = inter_matr_binary(usr_path = 'sampled_1000_items_demo.txt',
                      itm_path = 'sampled_1000_items_tracks.txt',
                      inter_path = 'sampled_1000_items_inter_TRAIN.txt')

# creating the test matrix from the new saved file
_test_matr_0 = inter_matr_binary(usr_path = 'sampled_1000_items_demo.txt',
                      itm_path = 'sampled_1000_items_tracks.txt',
                      inter_path = 'sampled_1000_items_inter_TEST.txt')

_eval_res_0 = eval_MRR(recTopKPop, _train_matr_0, _test_matr_0)
print('\nTask 2 evaluation result: ', _eval_res_0)



Task 2 evaluation result:  0.010046444342926755


### (1 point): Try different splits
Compute MRR of recTopKPop on 10 different splits. Investigate the mean and standard deviation of the resulting distribution of scores (use numpy methods).

put a 1D array of your results into the variable **_eval_10_res**, put the mean into **_eval_10_mean**, put the standard deviation into **_eval_10_std**;

In [5]:
_eval_10_res_0 = None
_eval_10_mean_0 = None
_eval_10_std_0 = None

# ----------------------------------------#
# compute MRR on 10 different data splits #
MRRs = []
for splitter in [0.1,0.2,0.3,0.4,0.45,0.5,0.6,0.7,0.8,0.9]:
    split_interactions(proportion=splitter)
    
    train = inter_matr_binary(usr_path = 'sampled_1000_items_demo.txt',
                      itm_path = 'sampled_1000_items_tracks.txt',
                      inter_path = 'sampled_1000_items_inter_TRAIN.txt')
    test = inter_matr_binary(usr_path = 'sampled_1000_items_demo.txt',
                      itm_path = 'sampled_1000_items_tracks.txt',
                      inter_path = 'sampled_1000_items_inter_TEST.txt')
    
    MRR_value = eval_MRR(recTopKPop,train,test)
    MRRs.append(MRR_value)
    
_eval_10_res_0 = np.asarray(MRRs)
_eval_10_mean_0 = _eval_10_res_0.mean()
_eval_10_std_0 = _eval_10_res_0.std()
   
# ----------------------------------------#

print('MRR scores of recTopKPOP baseline recommender on 10 different splits:\n',
      _eval_10_res_0,
      '\n\nMean MRR on 10 runs: ', _eval_10_mean_0,
      '\n\nWith standard deviation: ', _eval_10_std_0)

MRR scores of recTopKPOP baseline recommender on 10 different splits:
 [0.00105325 0.01720396 0.01019978 0.01328993 0.01057186 0.04622792
 0.03623293 0.03203125 0.03864011 0.03999709] 

Mean MRR on 10 runs:  0.02454480797445626 

With standard deviation:  0.014945581767028522


**Do the results differ from split to split? Why is it so?**

The more items we have in the testset the more interactions are in this set. So we get a higher chance that one of these interactions also appear in the recommender list. That often leads to a higher MRR score.

## <font color='red'>TASK 3/3</font>: <font color='darkblue'>(BONUS) Improve your Baseline (4 points or ~30%)</font>
<font color='darkblue'>This is a bonus task, it can grant you up to 0.3 of a full Exercise in case you didn't have max score on all of the previous exercises/test.

Take your implementation of recTopKPop() as a base and write a new function recTopKPop_improved() (use the template below).
Try to make it perform better in terms of MRR than the old baseline through pre-filtering the train matrix before calculating top K most popular items for every user.

**Assumption: users with similar demographic characteristics (location, age, ...) share their interests.**

The only difference between **recTopKPop_improved()** and its older version is that for a given user **U** the former calculates track popularity basing only on users who are similar to the user **U** according to some demographic parameters (e.g. user from UK will get recommended what is popular among UK users).

**Come up with your own definition of demograpich similarity that gives improvement in MRR!**

**Test your solution against the baseline on at least 10 different data splits! Try to achieve consistently superiour performance for recTopKPop_improved() through selecting filtering parameters.**

Evaluate your new recommender and allow the result to be stored in the **\_eval_res_1** variable (Your function needs to allow evaluation via **eval_MRR()**, **make sure that it is callable with three parameters, same as old topKPOP**). 'Default-value' or hardcode additional parameters if you need them. Define helper functions if you want, just make sure to preserve the interface of the improved function.</font>

In [6]:
def recTopKPop_improved(prepaired_data: np.array,
               user: int,
               top_k: int,
               usr_path = 'sampled_1000_items_demo.txt',
               itm_path = 'sampled_1000_items_tracks.txt') -> np.array:
    '''
    prepaired_data - np.array from the Ex3 Task 1;
    user - user_id, integer;
    top_k - expected length of the resulting list;
    
    usr_path, itm_path - files containing user and item related,
                         use (some of) them for pre-filtering;
    
    returns - list/array of top K popular (demograpy aware) items that the user
              has never seen (sorted in the order of descending popularity);
    '''
    pop_res = None
    
    #---------------------#
    # code something here #
   
    usr_info = pd.read_csv(usr_path, delimiter='\t')
    numb_col_usr = usr_info.shape[1]
    usr_info = pd.read_csv(usr_path, delimiter='\t',names=[i for i in range(numb_col_usr)]) 

    col_names = usr_info.columns
    
    #we delete all columns where it has value 1:
    active_usr_row = prepaired_data[user]

    active_usr_find_0 = np.where(active_usr_row==0.0)[0].tolist() # these indices of features can be kept because
    # werent seen by active user
            

    check_type_list = list(usr_info.loc[user])

    if not pd.isna(check_type_list[0]): # then we can improve recommender
        similar_usr_df = usr_info[usr_info[col_names[0]]==check_type_list[0]] # df with users from same country
        
        if len(similar_usr_df) <= 1: # there are no similar users to active user: recommend most popular items

            # regular recommender:
            
            filteredUsr_array = prepaired_data[:,active_usr_find_0] #get all rows of similar users
            
            sum_col = np.sum(filteredUsr_array, axis=0).tolist()
            ind_li = active_usr_find_0
            rec1 = [(l,i) for l,i in zip(sum_col,ind_li)]
            tup_sort = sorted(rec1, key = lambda x: x[0], reverse=True) # sort by interaction amount
            pop_res=[]
            for i in tup_sort[:top_k]:
                pop_res.append(i[1])
            
        
        else: # there exist similar users to active user
            
            similarUsrID = similar_usr_df.index.tolist() # which users are similar
            
            similarUsr_array = prepaired_data[similarUsrID][:,active_usr_find_0] #get all rows of similar users
            
            sum_col = np.sum(similarUsr_array, axis=0).tolist()
            
           
            number_interact = len(np.where(np.array(sum_col) > 0)[0])
            ind_li = active_usr_find_0 # get itemIDs which are relevant for top_k

            rec1 = [(l,i) for l,i in zip(sum_col,ind_li)] # create list of tuples (numer_interact, ItemID)
            tup_sort = sorted(rec1, key = lambda x: x[0], reverse=True) # sort by interaction amount
            pop_res=[]

            if number_interact < top_k: # PROBLEM
                missing_k = top_k - number_interact

                # top itemIDs of similar users
                for i in tup_sort[:number_interact]:
                       pop_res.append(i[1])

                # top itemIDs in general (most popular):  
                # regular recommender:
                active_usr_row = prepaired_data[user]
                active_usr_find_0 = np.where(active_usr_row==0.0)[0].tolist() # these indices of features can be kept because

                filteredUsr_array = prepaired_data[:,active_usr_find_0] #get all rows of similar users

                sum_col = np.sum(filteredUsr_array, axis=0).tolist()
                ind_li = active_usr_find_0

                rec1 = [(l,i) for l,i in zip(sum_col,ind_li)] # create list of tuples (numer_interact, ItemID)
                tup_sort = sorted(rec1, key = lambda x: x[0], reverse=True) # sort by interaction amount
                pop_res2=[]

                for i in tup_sort[:missing_k]:
                    pop_res2.append(i[1])

              
                # Get a final list with at first items of similar users and then items which are more general:
                pop_res.extend(pop_res2)


            else: # we have enough interactions to create top_k recommender list
                rec1 = [(l,i) for l,i in zip(sum_col,ind_li)]
                tup_sort = sorted(rec1, key = lambda x: x[0], reverse=True) # sort by interaction amount
                pop_res=[]
                for i in tup_sort[:top_k]:
                    pop_res.append(i[1])

            
       
    else: # of current user no country is given, therefore use top popular items in general
        # regular recommender:
        filteredUsr_array = prepaired_data[:,active_usr_find_0] #get all rows of similar users

        sum_col = np.sum(filteredUsr_array, axis=0).tolist()
        ind_li = active_usr_find_0
        
        rec1 = [(l,i) for l,i in zip(sum_col,ind_li)] # create list of tuples (numer_interact, ItemID)
        tup_sort = sorted(rec1, key = lambda x: x[0], reverse=True) # sort by interaction amount
        pop_res=[]

        for i in tup_sort[:top_k]:
            pop_res.append(i[1])

    
    #---------------------#    
    
    return np.array(pop_res)

_eval_res_1 = eval_MRR(recTopKPop_improved, _train_matr_0, _test_matr_0)
print('\nTask 3 evaluation result: ', _eval_res_1)


Task 3 evaluation result:  0.011659367061377119


### <font color='darkblue'> Demonstrate your solution in action</font>
<font color='darkblue'>Similar to Task 2, test **both** recTopKPOP_improved and recTopKPOP on 10 different splits.

Put the results as 1D arrays into **_eval_10_res_1** for the improved version and **_eval_10_res_1_old** for the old version.
    <br>
    
**In each of the arrays $i$-th position should correspond to the same $i$-th split (in order to make the results comparable).**
    <br>
    
Calculate and assign appropriate values for mean- and std- related variables (on these new 10 splits, for both old and new versions).
</font>

In [7]:
# results of the recTopKPOP_improved
_eval_10_res_1 = None
_eval_10_mean_1 = None
_eval_10_std_1 = None

# results of the old recTopKPOP
_eval_10_res_1_old = None
_eval_10_mean_1_old = None
_eval_10_std_1_old = None

# ----------------------------------------#
# compute MRR on 10 different data splits #
MRRs = []
MRR2s = []
for splitter in [0.1,0.2,0.3,0.4,0.45,0.5,0.6,0.7,0.8,0.9]:
    split_interactions(proportion=splitter)
    
    train = inter_matr_binary(usr_path = 'sampled_1000_items_demo.txt',
                      itm_path = 'sampled_1000_items_tracks.txt',
                      inter_path = 'sampled_1000_items_inter_TRAIN.txt')
    test = inter_matr_binary(usr_path = 'sampled_1000_items_demo.txt',
                      itm_path = 'sampled_1000_items_tracks.txt',
                      inter_path = 'sampled_1000_items_inter_TEST.txt')
    
    MRR_value = eval_MRR(recTopKPop,train,test)
    MRRs.append(MRR_value)
    
    
    MRR_value2 = eval_MRR(recTopKPop_improved,train,test)
    MRR2s.append(MRR_value2)
    
    
    
_eval_10_res_1_old = np.asarray(MRRs)
_eval_10_mean_1_old = _eval_10_res_1_old.mean()
_eval_10_std_1_old = _eval_10_res_1_old.std()

_eval_10_res_1 = np.asarray(MRR2s)
_eval_10_mean_1 = _eval_10_res_1.mean()
_eval_10_std_1 = _eval_10_res_1.std()
# ----------------------------------------#

print('MRR scores of recTopKPOP_IMPROVED recommender on 10 different splits:\n',
      _eval_10_res_1,
      '\nMRR of the old recTopKPOP recommender on the same 10 splits in the same order:\n',
      _eval_10_res_1_old,
      '\n\nMean MRR new vs old: ', _eval_10_mean_1,' vs ',_eval_10_mean_1_old,
      '\n\nStd new vs old: ', _eval_10_std_1,' vs ',_eval_10_std_1_old)

MRR scores of recTopKPOP_IMPROVED recommender on 10 different splits:
 [0.0036414  0.00490482 0.01527593 0.01460891 0.04036244 0.04023394
 0.05132701 0.05066192 0.05555072 0.04393184] 
MRR of the old recTopKPOP recommender on the same 10 splits in the same order:
 [0.00167504 0.00080081 0.00707841 0.00416911 0.03056362 0.02995594
 0.04305338 0.02837048 0.04857156 0.03274659] 

Mean MRR new vs old:  0.03204989292300862  vs  0.022698494275127486 

Std new vs old:  0.019175537909843558  vs  0.016852958834391484


## Reminder
* Test before you submit.
* Cleanup the notebook after your tests, don't leave unnecessary code;
* Restart kernel and run all cells to make sure the notebook is runnable from the beginning to the end;

In [8]:
# Leave this cell the way it is, please.