# Content-based recommendations
- Daniel-Alexandru Bejan (474404)
- Patrick Schaper (534366)

In [None]:
from IPython.core.display import HTML
from movie_display import movie_display
import pandas as pd

# Example of showing movie information

In [None]:
# Load movies into a dataframe
df = pd.read_json('./dataset/imdbdata.json', orient='columns')

# Display information about some movies
HTML(movie_display.show([df.iloc[i] for i in range(5)]))

## Analyze the data set

In [None]:
df.shape

The data set has 9125 entries and 18 features. Next, we look at the data type of the features.

In [None]:
df.info()

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows',None)
df

## Demographic Filtering
*While researching online, we found [this Kaggle notebook](https://www.kaggle.com/code/yadukaggle/demographic-filtering-recommendation), which uses demographic filtering based on the IMDB formula as a first recommendation. Although this was not part of our task, we liked the approach and applied it to our data.*

For the first step of the recommendation, we use the rating of each film. This requires a suitable metric. The plain average rating would not be fair: a movie with an average rating of 9.5 from only 5 votes cannot be considered better than a movie with an average rating of 7.9 from 55 votes. Therefore, we use the weighted rating (WR) from IMDB, which is defined as follows:

$$\text{Weighted Rating (WR)} = \left(\frac{v}{v+m} \cdot R\right) + \left(\frac{m}{v+m} \cdot C\right)$$

- v is the number of votes for the movie (imdbVotes)
- m is the minimum number of votes required to be listed in the chart
- R is the average rating of the movie (imdbRating)
- C is the mean rating across the whole data set (df['imdbRating'].mean())
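
To make the formula concrete, here is a small sketch with made-up numbers. The values of `m` and `C` below are hypothetical and not taken from the data set. It illustrates how a film with few votes is pulled towards the global mean, while a film with many votes keeps a score close to its own rating.

```python
def weighted_rating(v, R, m, C):
    """IMDB-style weighted rating: blend the film's rating R with the global mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical values, chosen only for illustration
m = 50    # minimum number of votes
C = 6.7   # mean rating over all films

few_votes = weighted_rating(v=5, R=9.5, m=m, C=C)
many_votes = weighted_rating(v=5000, R=7.9, m=m, C=C)

print(round(few_votes, 2))   # 6.95
print(round(many_votes, 2))  # 7.89
```

Although the first film has the higher raw rating, its weighted score ends up lower because only 5 votes support it.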

Some movies are not rated and contain "N/A" in these columns, and the vote counts use a comma as thousands separator. We therefore replace "N/A" with 0 and remove the commas before converting the values to numbers.

In [None]:
df['imdbRating'] = df['imdbRating'].str.replace('N/A','0')
df['imdbVotes'] = df['imdbVotes'].str.replace('N/A','0')
df['imdbVotes'] = df['imdbVotes'].str.replace(',','')

In [None]:
C = pd.to_numeric(df['imdbRating']).mean()
print(C)

The mean rating is therefore 6.7 on a scale that goes up to 10. Next, we define `m`, the minimum number of votes a film needs to be considered for a recommendation at all. We use the 80th percentile as the cutoff: to be included in the chart, a film must have at least as many votes as 80% of the films in the data set.
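
The cutoff can be illustrated on a toy example. The vote counts below are invented; the sketch uses NumPy's `np.quantile`, which, like the pandas `quantile` method used in this notebook, interpolates linearly by default.

```python
import numpy as np

# Hypothetical vote counts for ten films
votes = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# 80th percentile with linear interpolation (the default)
m = np.quantile(votes, 0.8)

# Only films with at least m votes qualify
qualified = votes[votes >= m]

print(m)          # 82.0
print(qualified)  # [ 90 100]
```

With ten films, roughly the top 20% (here two films) pass the cutoff.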

In [None]:
m = pd.to_numeric(df['imdbVotes']).quantile(0.8)
print(m)

In [None]:
qualify_movies = df.copy().loc[pd.to_numeric(df['imdbVotes']) >= m]
qualify_movies.shape

We see that only 1825 films made it to the list of qualifying films. Now we need to calculate our metric for each qualified movie. To do this, we define a function, weighted_rating():

In [None]:
def weighted_rating(x, m=m, C=C):
    v = pd.to_numeric(x['imdbVotes'])
    R = pd.to_numeric(x['imdbRating'])
    return (v/(v+m) * R) + (m/(m+v) * C)

In [None]:
qualify_movies['imdbCalculatedScore'] = qualify_movies.apply(weighted_rating, axis=1)

In [None]:
#Sort movies
qualify_movies = qualify_movies.sort_values('imdbCalculatedScore', ascending=False)

#Print the top 15 movies
qualify_movies[['Title', 'imdbVotes', 'imdbRating', 'imdbCalculatedScore']].head(15)

This gives us a first recommendation based on the ratings of all users. Next, we look at recommendations based on a single film the user likes.

## Content Based Filtering
Next, we come to the actual task of this assessment: content-based filtering.

The requirement was to include only the "Plot" and "Writer" columns. However, we felt the "Title" was just as important; the reason will become clear later. We always focus on "Plot" and "Writer" first, and the "Title" column follows as an addition.

The first step is to select only the required columns and copy them into a new dataset.

In [None]:
movie_df = df.loc[:, ['imdbId','Title','Plot','Writer']]

In [None]:
movie_df.isnull().sum()

In [None]:
movie_df.head(20)

## Cleaning Data
To prepare the texts for processing, we first remove unnecessary strings and convert everything to lowercase.
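
The effect of the writer clean-up is easiest to see on a single string. The sketch below applies the same regular expressions as the next cell, but with Python's `re` module and on an invented example; `clean_writer` is a hypothetical helper, not part of the notebook.

```python
import re

def clean_writer(raw):
    """Mirror the notebook's Writer clean-up on a single string."""
    text = re.sub(r" \([^)]*\)", "", raw)  # drop role notes such as " (screenplay)"
    text = re.sub(r"\s+", "", text)        # glue first and last name into one token
    text = text.replace(",", ", ")         # separate the writers again
    return text.lower()

print(clean_writer("Jane Doe (screenplay), John Smith (story by)"))
# janedoe, johnsmith
```

Gluing first and last name together means that each writer becomes a single token, so two different writers who share a first name are not counted as a match.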

In [None]:
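# Remove role notes in parentheses, e.g. " (screenplay)"
movie_df['Writer'] = movie_df['Writer'].str.replace(r' \([^)]*\)', '', regex=True)
# Remove all whitespace so that each writer becomes a single token
movie_df['Writer'] = movie_df['Writer'].str.replace(r'\s+', '',regex=True)
# Put a space after each comma to separate the writers again
movie_df['Writer'] = movie_df['Writer'].str.replace(r',', ', ',regex=True)
movie_df['Writer'] = movie_df['Writer'].str.lower()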

movie_df['Plot'] = movie_df['Plot'].str.lower()

movie_df['Title'] = movie_df['Title'].str.lower()

movie_df.head(20)

Now we remove morphological and inflectional endings with the **Snowball Stemmer**. The Snowball stemmer for English is also known as the Porter2 stemming algorithm, because it is an improved version of the Porter stemmer. It is slightly more aggressive than the original Porter stemmer.

In [None]:
import nltk
from nltk.stem.snowball import SnowballStemmer
ss = SnowballStemmer(language='english')

In [None]:
def stem(text):
    # Stem every whitespace-separated token and join the stems again
    return " ".join(ss.stem(word) for word in text.split())

In [None]:
movie_df['Plot'] = movie_df['Plot'].apply(stem)
movie_df['Writer'] = movie_df['Writer'].apply(stem)
movie_df['Title'] = movie_df['Title'].apply(stem)

movie_df.head(20)

## Words to numbers

We start with the CountVectorizer, which converts a collection of text documents into a matrix of token counts. Note that we use the `stop_words` parameter to exclude frequent words that carry little meaning, such as "a" or "of".

At this point, we look only at the "Plot" and "Writer" columns. The "Title" column follows later, using the same procedure.

Hint:
- Bag of Words (BOW): converts words to counts, without any semantic information.
- TF-IDF: also converts words to numbers, but weights each word by how distinctive it is.
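
The difference between the two representations can be sketched in plain Python on an invented two-document corpus. The weighting below uses a smoothed inverse document frequency, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation, which to our understanding matches the scikit-learn defaults; treat it as an illustration rather than a replacement for the library.

```python
import math
from collections import Counter

docs = ["toy cowboy toy", "space ranger toy"]  # hypothetical mini corpus
vocab = sorted({word for doc in docs for word in doc.split()})

# Bag of words: raw token counts per document
bow = [[Counter(doc.split())[word] for word in vocab] for doc in docs]

# TF-IDF: counts scaled by a smoothed inverse document frequency,
# then each document vector is normalised to length 1
n_docs = len(docs)
doc_freq = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
idf = [math.log((1 + n_docs) / (1 + df_j)) + 1 for df_j in doc_freq]

tfidf = []
for row in bow:
    weighted = [count * weight for count, weight in zip(row, idf)]
    norm = math.sqrt(sum(value * value for value in weighted))
    tfidf.append([value / norm for value in weighted])

print(vocab)  # ['cowboy', 'ranger', 'space', 'toy']
print(bow)    # [[1, 0, 0, 2], [0, 1, 1, 1]]
print([round(value, 2) for value in tfidf[1]])
```

In the second document every word occurs once, so BOW treats them equally. TF-IDF gives "toy" a lower weight than "ranger" or "space", because "toy" appears in both documents.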

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
cv_plot = CountVectorizer(analyzer='word', stop_words='english')
cv_writer = CountVectorizer(analyzer='word', stop_words='english')

In [None]:
cv_plot.fit_transform(movie_df['Plot']).toarray().shape

In [None]:
cv_writer.fit_transform(movie_df['Writer']).toarray().shape

In [None]:
X_plot = cv_plot.fit_transform(movie_df['Plot']).toarray()
print("BOW Matrix Plot:")
pd.DataFrame(X_plot[0:20,400:500], columns=cv_plot.get_feature_names_out()[400:500])

After creating our bag of words, we check the result. To do this, we display the first 20 records and the vocabulary columns 400 to 500. For example, we see that the 7th record contains the word "accu" once.

In [None]:
X_writer = cv_writer.fit_transform(movie_df['Writer']).toarray()
print("BOW Matrix Writer:")
pd.DataFrame(X_writer[0:20,200:300], columns=cv_writer.get_feature_names_out()[200:300])

The second approach to turning words into numbers is TF-IDF. It works similarly, but additionally weights the words. Note that we use the `TfidfTransformer`, which works on the result of the BOW step.

In [None]:
from sklearn.feature_extraction.text import TfidfTransformer
cv_plot_tfidf = TfidfTransformer()
cv_writer_tfidf = TfidfTransformer()

In [None]:
X_plot_tfidf = cv_plot_tfidf.fit_transform(X_plot).toarray()
print("TFIDF Matrix Plot:")
pd.DataFrame(X_plot_tfidf[0:20,400:500], columns=cv_plot.get_feature_names_out()[400:500])

In [None]:
X_writer_tfidf = cv_writer_tfidf.fit_transform(X_writer).toarray()
print("TFIDF Matrix Writer:")
pd.DataFrame(X_writer_tfidf[0:20,200:300], columns=cv_writer.get_feature_names_out()[200:300])

## Similarities and differences using cosine similarity

Now that we know which words appear where and how often, we can use cosine similarity to determine which movies have similar text content and which do not. We do this for both bag of words (BOW) and TF-IDF.
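
As a reminder of what is being computed, the following sketch calculates the cosine similarity of two count vectors by hand. The vectors are invented, and the helper `cosine` is only for illustration; the notebook itself relies on scikit-learn.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # here an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

film_a = [1, 1, 0]  # hypothetical word counts
film_b = [1, 0, 1]

print(round(cosine(film_a, film_b), 3))  # 0.5
print(round(cosine(film_a, film_a), 3))  # 1.0
```

The two films share one of their two words, which gives a similarity of 0.5. A film compared with itself always yields 1, which is why the film itself has to be skipped when ranking.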

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
cosine_plot_bow = cosine_similarity(X_plot)
cosine_plot_bow

In [None]:
cosine_plot_bow.shape

In [None]:
cosine_writer_bow = cosine_similarity(X_writer)
cosine_writer_bow

In [None]:
cosine_writer_bow.shape

In [None]:
cosine_plot_tf = cosine_similarity(X_plot_tfidf)
cosine_plot_tf

In [None]:
cosine_plot_tf.shape

In [None]:
cosine_writer_tf = cosine_similarity(X_writer_tfidf)
cosine_writer_tf

In [None]:
cosine_writer_tf.shape

## What movies are similar...

Now we have everything we need to make a recommendation based on Plot or Writer. For this, we wrote a function that returns recommendations for a given movie.

Note: this function grew during the work, and some of its options are only used later.
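
The core of the ranking can be sketched on a single, invented similarity row. Note one difference: the notebook's function skips the film itself by slicing off the first entries, which assumes the film ranks first. The hypothetical helper `top_matches` below filters by index instead, so it is a variation and not the function used in this notebook.

```python
def top_matches(similarity_row, own_index, k):
    """Return the k (index, score) pairs with the highest score, excluding the film itself."""
    ranked = sorted(enumerate(similarity_row), key=lambda pair: pair[1], reverse=True)
    return [pair for pair in ranked if pair[0] != own_index][:k]

# Hypothetical similarities of film 0 to films 0..4
row = [1.0, 0.2, 0.7, 0.0, 0.4]

print(top_matches(row, own_index=0, k=3))
# [(2, 0.7), (4, 0.4), (1, 0.2)]
```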

In [None]:
def recommendation(movieIndex, cosine, output="print", number_of_recommendations=6, movie_count=1):
    distance = cosine[movieIndex]
    # Sort by similarity, best first, and skip the first movie_count entries
    # (with the default of 1 this skips entry 0, the film itself)
    movie_list = sorted(list(enumerate(distance)), reverse=True, key=lambda x:x[1])[movie_count:number_of_recommendations+1]
    
    if output == "print":
        for i in movie_list:
            print(f'{df.iloc[i[0]].Title:{30}} {i[1]:{20}}')
        print("-"*50)
    if output == "tupel":
        back = []
        for i in movie_list:
            back.append((df.iloc[i[0]].Title,i[1]))
        return back
    if output == "titles":
        back = []
        for i in movie_list:
            back.append(df.iloc[i[0]].Title)
        return back

To make our results comparable, we now fix one film for all tests.

In [None]:
testMovieId = 0 # Toy Story
#testMovieId = 6916 # The Dark Knight
HTML(movie_display.show([df.iloc[testMovieId]]))

What are our different cosine values for our test film?

In [None]:
recommendation(testMovieId,cosine_plot_tf)
recommendation(testMovieId,cosine_plot_bow)
recommendation(testMovieId,cosine_writer_tf)
recommendation(testMovieId,cosine_writer_bow)

Our first test film was the children's film "Toy Story". Based on the plot, we got a horror movie called "Chuck the Killer Doll" as a recommendation, even though "Toy Story 2" and "Toy Story 3" exist. For this reason, we decided to include the title.

As an aside, this was a funny result.

In [None]:
cv_title = CountVectorizer(analyzer='word', stop_words='english')
X_title = cv_title.fit_transform(movie_df['Title']).toarray()

cv_title_tfidf = TfidfTransformer()
X_title_tfidf = cv_title_tfidf.fit_transform(X_title).toarray()
cosine_title_bow = cosine_similarity(X_title)
cosine_title_tf = cosine_similarity(X_title_tfidf)

In [None]:
recommendation(testMovieId,cosine_plot_tf)
recommendation(testMovieId,cosine_plot_bow)
recommendation(testMovieId,cosine_writer_tf)
recommendation(testMovieId,cosine_writer_bow)
recommendation(testMovieId,cosine_title_tf)
recommendation(testMovieId,cosine_title_bow)

For a better result presentation we wrote a small function that shows us the movies in HTML based on the title.

In [None]:
def showMovieInHtml(names=None):
    # Look up each title in df and render the matching rows as HTML.
    # If a title occurs more than once, the first match is used.
    movies = []
    for name in names or []:
        movies.append(df.loc[df['Title'] == name].iloc[0])
    return HTML(movie_display.show(movies))

In [None]:
print('Plot tfidf:')
showMovieInHtml(recommendation(testMovieId,cosine_plot_tf,"titles"))

In [None]:
print('Plot bow:')
showMovieInHtml(recommendation(testMovieId,cosine_plot_bow,"titles"))

In [None]:
print('Writer tfidf:')
showMovieInHtml(recommendation(testMovieId,cosine_writer_tf,"titles"))

In [None]:
print('Writer bow:')
showMovieInHtml(recommendation(testMovieId,cosine_writer_bow,"titles"))

In [None]:
print('Title tfidf:')
showMovieInHtml(recommendation(testMovieId,cosine_title_tf,"titles"))

In [None]:
print('Title bow:')
showMovieInHtml(recommendation(testMovieId,cosine_title_bow,"titles"))

## Only the best of the best

To create a clear list of the best films, our first approach is to simply take the films with the highest similarity value, regardless of whether it comes from BOW or TF-IDF.
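
The idea of merging several ranked lists and keeping each film only once can be sketched as follows. The titles and scores are invented, and `merge_best` is a hypothetical helper that only illustrates the principle.

```python
def merge_best(*ranked_lists, k=3):
    """Merge (title, score) lists, keep each title's best score, return the top k."""
    best = {}
    for ranked in ranked_lists:
        for title, score in ranked:
            if score > best.get(title, float("-inf")):
                best[title] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical results from two similarity matrices
plot_hits = [("Film A", 0.41), ("Film B", 0.30)]
title_hits = [("Film C", 0.71), ("Film A", 0.58)]

print(merge_best(plot_hits, title_hits, k=2))
# [('Film C', 0.71), ('Film A', 0.58)]
```

"Film A" appears in both lists but is returned only once, with the higher of its two scores.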

In [None]:
def get_top_list(movieId, unique=True, number_of_recommendations=6):
    top_list = recommendation(movieId,cosine_plot_tf,'tupel',number_of_recommendations) + \
        recommendation(movieId,cosine_plot_bow,'tupel',number_of_recommendations) + \
        recommendation(movieId,cosine_writer_tf,'tupel',number_of_recommendations) + \
        recommendation(movieId,cosine_writer_bow,'tupel',number_of_recommendations) + \
        recommendation(movieId,cosine_title_tf,'tupel',number_of_recommendations) + \
        recommendation(movieId,cosine_title_bow,'tupel',number_of_recommendations)
    top_list = sorted(top_list, key=lambda x: x[1], reverse=True)
    if unique==False:
        return top_list
    else:
        titles = []
        back = []
        for movie in top_list:
            if movie[0] not in titles:
                titles.append(movie[0])
                back.append(movie)
        return back[:number_of_recommendations]

In [None]:
test_movie_top_list = get_top_list(testMovieId, False)
for movie in test_movie_top_list:
    print(f'{movie[0]:<{50}} {movie[1]:<{30}}')

In [None]:
test_movie_top_list = get_top_list(testMovieId)
for movie in test_movie_top_list:
    print(f'{movie[0]:<{50}} {movie[1]:<{30}}')

In [None]:
test_movie_top_list = get_top_list(testMovieId,True)
test_movie_titles = []
for movie in test_movie_top_list:
    test_movie_titles.append(movie[0])

showMovieInHtml(test_movie_titles)

## Another approach

Another approach to making recommendations is to add up the cosine similarities. This has the advantage that we can weight the individual columns, for example by counting the title twice. To keep this assignment within scope, we set every weight to 1.
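
For illustration, a weighted variant could look like the sketch below. The function `weighted_cosine_sum`, the weights and the small matrices are assumptions made for this example; the notebook itself uses a weight of 1 everywhere.

```python
import numpy as np

def weighted_cosine_sum(matrices, weights):
    """Add similarity matrices of equal shape, each scaled by its weight."""
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per matrix")
    total = np.zeros_like(matrices[0], dtype=float)
    for matrix, weight in zip(matrices, weights):
        total += weight * matrix
    return total

# Hypothetical 2x2 similarity matrices
plot_sim = np.array([[1.0, 0.2], [0.2, 1.0]])
title_sim = np.array([[1.0, 0.5], [0.5, 1.0]])

# Count the title twice as much as the plot
combined = weighted_cosine_sum([plot_sim, title_sim], weights=[1, 2])
print(combined[0, 1])  # 0.2 + 2 * 0.5 = 1.2
```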

In [None]:
import numpy as np

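def calculate_cosine_sum(array_cosines):
    # Add up all similarity matrices; every matrix gets the weight 1
    total = np.array(0)  # named "total" to avoid shadowing the built-in sum()
    for cos in array_cosines:
        total = np.add(total, cos * 1)
    return total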

In [None]:
test_movie_consine_sum = calculate_cosine_sum([cosine_plot_tf, cosine_plot_bow, cosine_writer_tf, cosine_writer_bow, cosine_title_tf, cosine_title_bow])

recommendation(testMovieId, test_movie_consine_sum)

In [None]:
def get_top_list_from_sum(movieId, consine_sum, number_of_recommendations=6):
    top_list = recommendation(movieId, consine_sum, 'tupel', number_of_recommendations)
    top_list = sorted(top_list, key=lambda x: x[1], reverse=True)
    titles = []
    back = []
    for movie in top_list:
        if movie[0] not in titles:
            titles.append(movie[0])
            back.append(movie)
    return back[:number_of_recommendations]

# Movie recommendation based on one movie

In [None]:
from IPython.display import clear_output, display
import ipywidgets as widgets

In [None]:
cosine_sum_for_one = np.array(0)

selected_movie_for_one = widgets.Dropdown(
    options=list(zip(df.Title, df.index)),
    description='Select a movie:\n ',
    disabled=False,
    layout={'width': 'max-content'}
)
recommendations_for_one = widgets.BoundedIntText(
    min=0,
    max=len(df),  # IntText has no min/max; BoundedIntText enforces the range
    value=3,
    description='Number of recommendations:\n ',
    disabled=False
)
merge_strategy_for_one = widgets.RadioButtons(
    options=[('Each matrix by itself',0), ('Sum matrix',1)],
    description='Merge strategy:',
    disabled=False
)
button_for_one = widgets.Button(
    description='Recommendation',
    disabled=False,
)

def execute_function_for_one(_):
    global cosine_sum_for_one
    with out_for_one:
        clear_output()
        recommendation_titles = []

        if merge_strategy_for_one.value == 0:
            recommendation_list = get_top_list(selected_movie_for_one.value, True, recommendations_for_one.value)
        elif merge_strategy_for_one.value == 1:
            # Build the summed matrix only once and reuse it afterwards
            if cosine_sum_for_one.size == 1:
                cosine_sum_for_one = calculate_cosine_sum([cosine_plot_tf, cosine_plot_bow, cosine_writer_tf, cosine_writer_bow, cosine_title_tf, cosine_title_bow])
            recommendation_list = get_top_list_from_sum(selected_movie_for_one.value, cosine_sum_for_one, recommendations_for_one.value)

        for movie in recommendation_list:
            recommendation_titles.append(movie[0])

        print("Selected Movie:")
        display(HTML(movie_display.show([df.iloc[selected_movie_for_one.value]])))

        print("Recommendation(s):")
        display(showMovieInHtml(recommendation_titles))
            
button_for_one.on_click(execute_function_for_one)
out_for_one = widgets.Output()

box_for_one = widgets.VBox([selected_movie_for_one, recommendations_for_one, merge_strategy_for_one, button_for_one, out_for_one])
box_for_one

# Movie recommendation based on multiple movies - First approach!

Next, we tried to implement the recommendation based on several films. The first idea was to create one artificial film from the selected films, meaning that the selected movies are merged before the cosine calculation takes place.

Below is a copy-and-paste prototype that performs a new cosine calculation on every request. Since we found this to be computationally very expensive and therefore not a practical solution, we left it at the prototype. For this reason, there are no different merge strategies.

Be careful when using it! It works, but may take a little while and may crash the browser.
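
A side note on the cost: for plain bag-of-words counts with a fixed vocabulary, the vector of the concatenated texts equals the sum of the individual count vectors. The sketch below therefore builds the merged film by adding rows of an invented matrix `X` and compares it against all films with a single matrix-vector product. This is a possible cheaper variant under the assumption that no film has an all-zero vector; it is not what the prototype does, and it does not reproduce the TF-IDF part.

```python
import numpy as np

# Hypothetical bag-of-words matrix: 4 films x 3 vocabulary words
X = np.array([
    [2, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 3],
], dtype=float)

selected = [0, 1]                  # films chosen by the user
profile = X[selected].sum(axis=0)  # counts of the concatenated texts

# Cosine similarity of the profile to every film in one step
norms = np.linalg.norm(X, axis=1) * np.linalg.norm(profile)
scores = (X @ profile) / norms
scores[selected] = -np.inf         # never recommend the selected films

best = int(np.argmax(scores))
print(best)  # 2
```

Because only one row of similarities is computed, no full film-by-film matrix has to be rebuilt for each request.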

In [None]:
selected_movies_for_more = []
multi_movie_df = movie_df[['Title','Plot','Writer']].copy()  # copy, so that movie_df itself is never modified

movies_for_more = widgets.Dropdown(
    options=list(zip(df.Title, df.index)),
    description='Select a movie:\n ',
    disabled=False,
    layout={'width': 'max-content'}
)
button_add_for_more = widgets.Button(
    description='Add movie',
    disabled=False,
)
button_remove_all_for_more = widgets.Button(
    description='Delete all selected movies',
    disabled=False,
)
recommendations_for_more = widgets.BoundedIntText(
    min=0,
    max=len(df),  # IntText has no min/max; BoundedIntText enforces the range
    value=3,
    description='Number of recommendations:\n ',
    disabled=False
)
button_recommendation_for_more = widgets.Button(
    description='Recommendation',
    disabled=False,
)
def execute_function_add_for_more(_):
    with out_add_for_more:
        clear_output()
        if not movies_for_more.value in selected_movies_for_more:
            selected_movies_for_more.append(movies_for_more.value)
        movies_df_for_more = []
        for movie in selected_movies_for_more:
            movies_df_for_more.append(df.iloc[movie])
        print("Selected Movie(s):")
        display(HTML(movie_display.show(movies_df_for_more)))

def execute_function_remove_all_for_more(_):
    with out_add_for_more:
        clear_output()
        selected_movies_for_more.clear()
        
def merge_movies():
    # Concatenate title, plot and writer of all selected films into one artificial film
    multi_title = ''
    multi_plot = ''
    multi_writer = ''
    selected_movies_for_more.sort()
    for movieId in selected_movies_for_more:
        multi_title += ' ' + multi_movie_df.iloc[movieId]['Title']
        multi_plot += ' ' + multi_movie_df.iloc[movieId]['Plot']
        multi_writer += ' ' + multi_movie_df.iloc[movieId]['Writer']
    new_multi_movie = [multi_title, multi_plot, multi_writer]
    # Append the merged film as the last row, then remove the originals
    multi_movie_df.loc[len(multi_movie_df)] = new_multi_movie
    # Caution: dropping rows shifts the positions of all later rows relative to df
    multi_movie_df.drop(index=selected_movies_for_more, inplace=True)

def create_cosine():
    pass
    
def get_top_list_for_more(movieId, unique=True, number_of_recommendations=6):
    cv_title_multi = CountVectorizer(analyzer='word', stop_words='english')
    cv_plot_multi = CountVectorizer(analyzer='word', stop_words='english')
    cv_writer_multi = CountVectorizer(analyzer='word', stop_words='english')
    
    cv_title_tfidf_multi = TfidfTransformer()
    cv_plot_tfidf_multi = TfidfTransformer()
    cv_writer_tfidf_multi = TfidfTransformer()
    
    X_title_multi = cv_title_multi.fit_transform(multi_movie_df['Title']).toarray()
    X_plot_multi = cv_plot_multi.fit_transform(multi_movie_df['Plot']).toarray()
    X_writer_multi = cv_writer_multi.fit_transform(multi_movie_df['Writer']).toarray()
    
    X_title_tfidf_multi = cv_title_tfidf_multi.fit_transform(X_title_multi).toarray()
    X_plot_tfidf_multi = cv_plot_tfidf_multi.fit_transform(X_plot_multi).toarray()
    X_writer_tfidf_multi = cv_writer_tfidf_multi.fit_transform(X_writer_multi).toarray()
    
    cosine_title_tf_multi = cosine_similarity(X_title_tfidf_multi)
    cosine_plot_tf_multi = cosine_similarity(X_plot_tfidf_multi)
    cosine_writer_tf_multi = cosine_similarity(X_writer_tfidf_multi)
    
    cosine_title_bow_multi = cosine_similarity(X_title_multi)
    cosine_plot_bow_multi = cosine_similarity(X_plot_multi)
    cosine_writer_bow_multi = cosine_similarity(X_writer_multi)
    
    print("Cosine done")
    
    top_list_for_more = recommendation(movieId,cosine_plot_tf_multi,'tupel',number_of_recommendations, len(selected_movies_for_more)) + \
        recommendation(movieId,cosine_plot_bow_multi,'tupel',number_of_recommendations, len(selected_movies_for_more)) + \
        recommendation(movieId,cosine_writer_tf_multi,'tupel',number_of_recommendations, len(selected_movies_for_more)) + \
        recommendation(movieId,cosine_writer_bow_multi,'tupel',number_of_recommendations, len(selected_movies_for_more)) + \
        recommendation(movieId,cosine_title_tf_multi,'tupel',number_of_recommendations, len(selected_movies_for_more)) + \
        recommendation(movieId,cosine_title_bow_multi,'tupel',number_of_recommendations, len(selected_movies_for_more))
    top_list_for_more = sorted(top_list_for_more, key=lambda x: x[1], reverse=True)
    if unique==False:
        return top_list_for_more
    else:
        titles = []
        back = []
        for movie in top_list_for_more:
            if movie[0] not in titles:
                titles.append(movie[0])
                back.append(movie)
        return back[:number_of_recommendations]
    
def execute_function_recommendation_for_more(_):
    global multi_movie_df
    with out_recommendation_for_more:
        clear_output()
        merge_movies()
        create_cosine()
        recommendation_titles = []
        # The merged film is the last row of multi_movie_df
        recommendation_list = get_top_list_for_more(len(multi_movie_df)-1, True, recommendations_for_more.value)
        for movie in recommendation_list:
            recommendation_titles.append(movie[0])
        print("Recommendation(s):")
        display(showMovieInHtml(recommendation_titles))
        # Restore a fresh copy for the next request
        multi_movie_df = movie_df[['Title','Plot','Writer']].copy()
            
button_add_for_more.on_click(execute_function_add_for_more)
button_remove_all_for_more.on_click(execute_function_remove_all_for_more)
button_recommendation_for_more.on_click(execute_function_recommendation_for_more)
out_add_for_more = widgets.Output()
out_recommendation_for_more = widgets.Output()

box_for_more = widgets.VBox([movies_for_more, button_add_for_more, button_remove_all_for_more, recommendations_for_more, button_recommendation_for_more, out_add_for_more, out_recommendation_for_more])
box_for_more

# Movie recommendation based on multiple movies

The second and final approach is to make a recommendation for each film separately and then combine the recommendations at the end. Because each film has the same weight in our case, the per-film results can simply be combined into one list and sorted by similarity.
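
A closely related variant is to literally add the similarity rows of the selected films and rank the remaining films by the summed score. The sketch below shows this on an invented similarity matrix; `recommend_for_many` is a hypothetical helper and differs from the functions in the next cell, which merge the per-film top lists instead of summing scores.

```python
import numpy as np

def recommend_for_many(similarity, selected, k):
    """Sum the similarity rows of the selected films and return the k best other films."""
    scores = similarity[selected].sum(axis=0)
    scores[selected] = -np.inf         # exclude the films the user already picked
    ranked = np.argsort(scores)[::-1]  # indices from best to worst
    return ranked[:k].tolist()

# Hypothetical similarity matrix for four films
similarity = np.array([
    [1.0, 0.1, 0.6, 0.3],
    [0.1, 1.0, 0.2, 0.7],
    [0.6, 0.2, 1.0, 0.0],
    [0.3, 0.7, 0.0, 1.0],
])

print(recommend_for_many(similarity, selected=[0, 1], k=2))
# [3, 2]
```

Film 3 wins because it is fairly similar to both selected films, whereas film 2 is similar to only one of them.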

In [None]:
cosine_sum_for_multi = np.array(0)
selected_movies_for_multi = []

movies_for_multi = widgets.Dropdown(
    options=list(zip(df.Title, df.index)),
    description='Select a movie:\n ',
    disabled=False,
    layout={'width': 'max-content'}
)
button_add_for_multi = widgets.Button(
    description='Add movie',
    disabled=False,
)
button_remove_all_for_multi = widgets.Button(
    description='Delete all selected movies',
    disabled=False,
)
recommendations_for_multi = widgets.BoundedIntText(
    min=0,
    max=len(df),  # IntText has no min/max; BoundedIntText enforces the range
    value=3,
    description='Number of recommendations:\n ',
    disabled=False
)
merge_strategy_for_multi = widgets.RadioButtons(
    options=[('Each matrix by itself',0), ('Sum matrix',1)],
    description='Merge strategy:',
    disabled=False
)
button_recommendation_for_multi = widgets.Button(
    description='Recommendation',
    disabled=False,
)
def get_top_list_for_multi(movieIds, number_of_recommendations=6):
    top_list = []
    for movieId in movieIds:
        top_list = top_list + \
            recommendation(movieId,cosine_plot_tf,'tupel',number_of_recommendations) + \
            recommendation(movieId,cosine_plot_bow,'tupel',number_of_recommendations) + \
            recommendation(movieId,cosine_writer_tf,'tupel',number_of_recommendations) + \
            recommendation(movieId,cosine_writer_bow,'tupel',number_of_recommendations) + \
            recommendation(movieId,cosine_title_tf,'tupel',number_of_recommendations) + \
            recommendation(movieId,cosine_title_bow,'tupel',number_of_recommendations)
    top_list = sorted(top_list, key=lambda x: x[1], reverse=True)
    titles = []
    back = []
    for movie in top_list:
        if movie[0] not in titles:
            titles.append(movie[0])
            back.append(movie)
    return back[:number_of_recommendations]

def get_top_list_from_sum_for_multi(movieIds, consine_sum, number_of_recommendations=6):
    top_list = []
    for movieId in movieIds:
        top_list = top_list + \
                   recommendation(movieId, consine_sum, 'tupel', number_of_recommendations)
    top_list = sorted(top_list, key=lambda x: x[1], reverse=True)
    titles = []
    back = []
    for movie in top_list:
        if movie[0] not in titles:
            titles.append(movie[0])
            back.append(movie)
    return back[:number_of_recommendations]

def execute_function_add_for_multi(_):
    with out_add_for_multi:
        clear_output()
        if not movies_for_multi.value in selected_movies_for_multi:
            selected_movies_for_multi.append(movies_for_multi.value)
        movies_df_for_multi = []
        for movie in selected_movies_for_multi:
            movies_df_for_multi.append(df.iloc[movie])
        print("Selected Movie(s):")
        display(HTML(movie_display.show(movies_df_for_multi)))

def execute_function_remove_all_for_multi(_):
    with out_add_for_multi:
        clear_output()
        selected_movies_for_multi.clear()

def execute_function_recommendation_for_multi(_):
    global cosine_sum_for_multi
    with out_recommendation_for_multi:
        clear_output()
        recommendation_titles = []

        if merge_strategy_for_multi.value == 0:
            recommendation_list = get_top_list_for_multi(selected_movies_for_multi, recommendations_for_multi.value)
        elif merge_strategy_for_multi.value == 1:
            # Build the summed matrix only once and reuse it afterwards
            if cosine_sum_for_multi.size == 1:
                cosine_sum_for_multi = calculate_cosine_sum([cosine_plot_tf, cosine_plot_bow, cosine_writer_tf, cosine_writer_bow, cosine_title_tf, cosine_title_bow])
            # Use this section's own counter widget, not the one from the single-movie section
            recommendation_list = get_top_list_from_sum_for_multi(selected_movies_for_multi, cosine_sum_for_multi, recommendations_for_multi.value)

        for movie in recommendation_list:
            recommendation_titles.append(movie[0])

        print("Recommendation(s):")
        display(showMovieInHtml(recommendation_titles))

button_add_for_multi.on_click(execute_function_add_for_multi)
button_remove_all_for_multi.on_click(execute_function_remove_all_for_multi)
button_recommendation_for_multi.on_click(execute_function_recommendation_for_multi)
out_add_for_multi = widgets.Output()
out_recommendation_for_multi = widgets.Output()

box_for_multi = widgets.VBox([movies_for_multi, button_add_for_multi, button_remove_all_for_multi, merge_strategy_for_multi, recommendations_for_multi, button_recommendation_for_multi, out_add_for_multi, out_recommendation_for_multi])
box_for_multi