
# 1. Singular Value Decomposition

**Definition.** Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $A^T A$, which are real and nonnegative. The real numbers $\sigma_i=\sqrt{\lambda_i}$ for $i = 1,2,\ldots,n$ are called the **singular values** of the matrix $A$.

**Theorem.** If $A$ is a real $m \times n$ matrix and $\sigma_1 \geqslant \sigma_2 \geqslant \cdots \geqslant \sigma_r > 0$ are the positive singular values of $A$, then $r$ is the rank of $A$ and $A=P\Sigma_A Q^T$, where $P$ ($m \times m$) and $Q$ ($n \times n$) are orthogonal matrices.
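The theorem can be checked numerically. The sketch below relies on NumPy's built-in `np.linalg.svd` (not on the algorithm that follows) and uses the 7 × 5 matrix $B$ from the second example. It counts the singular values above a round-off tolerance, chosen to match the default tolerance of `np.linalg.matrix_rank`, and compares that count with the rank; it then rebuilds $\Sigma_A$ and checks $B = P\Sigma_A Q^T$. The names `P`, `s`, `Qt`, `tol` and `Sigma` are illustrative choices.

```python
import numpy as np

# The 7 x 5 matrix B from the second SVD example
B = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]])

# NumPy's SVD: P is m x m, s holds the singular values (descending), Qt is Q^T (n x n)
P, s, Qt = np.linalg.svd(B, full_matrices=True)

# Count singular values that are positive up to round-off (same rule as matrix_rank's default)
tol = s.max() * max(B.shape) * np.finfo(float).eps
r = int(np.sum(s > tol))
print("singular values:", s)
print("number of positive singular values:", r)
print("rank(B):", np.linalg.matrix_rank(B))

# Rebuild the m x n matrix Sigma_A and check B = P Sigma_A Q^T
m, n = B.shape
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)
print("B == P @ Sigma @ Q^T:", np.allclose(B, P @ Sigma @ Qt))
```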

**ALGORITHM**

**Input**: An $m \times n$ matrix $A$.

**Output**: Orthogonal matrices $P$ and $Q$, and the matrix $\Sigma_A$ of singular values of $A$.

**Steps**: \
1. Find the real, nonnegative eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A^T A$, ordered so that $\lambda_1 \geqslant \lambda_2 \geqslant \cdots \geqslant \lambda_r>0$ and $\lambda_i=0$ for $i>r$ (the number $r$ is the rank of $A$), together with corresponding orthonormal eigenvectors $q_1, q_2, \ldots, q_n$.
2. Form the matrix $Q=\begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}$.
3. Compute $p_i=\frac{A q_i}{\left\|A q_i\right\|} \in \mathbb{R}^m$ for each $i=1,2, \ldots, r$ and extend these vectors to an orthonormal basis $\left\{p_1, \ldots, p_r, \ldots, p_m\right\}$ of $\mathbb{R}^m$.
4. Form the matrix $P=\begin{bmatrix} p_1 & \cdots & p_r & \cdots & p_m \end{bmatrix}$.
5. Compute the singular values $\sigma_1, \ldots, \sigma_n$ of $A$, where $\sigma_i=\sqrt{\lambda_i}$ for each $i$.
6. Form the matrix $\Sigma_A=\left[\begin{array}{cc}\operatorname{diag}\left(\sigma_1, \ldots, \sigma_r\right) & 0 \\ 0 & 0\end{array}\right]_{m \times n}$.
7. Then $A=P \Sigma_A Q^T$.
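To make the steps concrete, here is a hand computation for the $3 \times 2$ matrix $A$ used in the first code example (an illustrative derivation; the numbers agree with that example's printed $P$, $\Sigma_A$ and $Q$). Steps 1 and 2 give the eigenvalues and eigenvectors of $A^TA$:

$$A=\begin{bmatrix}1&1\\0&1\\-1&1\end{bmatrix},\qquad A^TA=\begin{bmatrix}2&0\\0&3\end{bmatrix},\qquad \lambda_1=3,\ \lambda_2=2,\ r=2,\qquad q_1=\begin{bmatrix}0\\1\end{bmatrix},\ q_2=\begin{bmatrix}1\\0\end{bmatrix}$$

Step 3 normalizes $Aq_1$ and $Aq_2$, and $p_3$ is the unit vector orthogonal to both (unique up to sign):

$$p_1=\frac{Aq_1}{\|Aq_1\|}=\frac{1}{\sqrt3}\begin{bmatrix}1\\1\\1\end{bmatrix},\qquad p_2=\frac{Aq_2}{\|Aq_2\|}=\frac{1}{\sqrt2}\begin{bmatrix}1\\0\\-1\end{bmatrix},\qquad p_3=\frac{1}{\sqrt6}\begin{bmatrix}1\\-2\\1\end{bmatrix}$$

Steps 4 to 7 then assemble the factorization:

$$A=P\Sigma_AQ^T=\begin{bmatrix}\frac{1}{\sqrt3}&\frac{1}{\sqrt2}&\frac{1}{\sqrt6}\\\frac{1}{\sqrt3}&0&-\frac{2}{\sqrt6}\\\frac{1}{\sqrt3}&-\frac{1}{\sqrt2}&\frac{1}{\sqrt6}\end{bmatrix}\begin{bmatrix}\sqrt3&0\\0&\sqrt2\\0&0\end{bmatrix}\begin{bmatrix}0&1\\1&0\end{bmatrix}$$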

In [None]:
import numpy as np
from scipy.linalg import null_space
np.set_printoptions(suppress = True)

def gramschmidt(A):
    # Gram-Schmidt: orthogonalize the columns of A (the columns are not normalized here)
    m = np.size(A, 0)
    n = np.size(A, 1)
    E = np.copy(A[:, 0:1])
    for k in range(1, n):
        S = np.zeros((m, 1))
        for i in range(0, k):
            # projection of column k onto the i-th orthogonal vector found so far
            S = S + ((A[:, k:(k + 1)].T @ E[:, i:(i + 1)])/(E[:, i:(i + 1)].T @ E[:, i:(i + 1)]))*E[:, i:(i + 1)]
        E = np.concatenate((E, A[:, k:(k + 1)] - S), axis = 1)
    return E

def normal(N):
    # normalize every column of N to unit length (modifies N in place)
    n = np.size(N, 1)
    for i in range(0, n):
        N[:, [i]] = N[:, [i]]/((N[:, [i]].T @ N[:, [i]])**(1/2))
    return N

def svd(A):
    U = np.copy(A)
    V = U.T @ U                                 # matrix A.T@A
    u, v = np.linalg.eigh(V)                    # A.T@A is symmetric: eigh gives real eigenvalues and orthonormal eigenvectors
    m = np.size(U, 0)
    n = np.size(U, 1)
    idx = np.argsort(u)[::-1]                   # sort the eigenvalues in descending order
    u = u[idx]
    v = v[:, idx]                               # reorder the eigenvectors to match their eigenvalues
    r = np.linalg.matrix_rank(U)                # rank(A)

    # build Q from the orthonormal eigenvectors q_1, ..., q_n
    Q = normal(np.copy(v))

    # build P: first p_i = A q_i / ||A q_i|| for i = 1, ..., r
    P = normal(U @ Q[:, :r])
    # then extend with an orthonormal basis of the null space of A.T (m - r vectors, none when r = m),
    # so P is m x m for both tall (m > n) and wide (m <= n) matrices;
    # null_space already returns orthonormal columns, Gram-Schmidt and normalization are a safeguard
    ns = null_space(U.T)
    nns = normal(gramschmidt(ns))
    P = np.concatenate((P, nns), axis = 1)

    # build sigma (m x n) with sigma_i = sqrt(lambda_i) on the diagonal
    sigma = np.zeros((m, n))
    s = np.sqrt(np.clip(u, 0, None))            # clip tiny negative round-off so sqrt does not return NaN
    for i in range(0, r):
        sigma[i, i] = s[i]

    return V, P, Q, sigma

In [None]:
A = np.array([[1, 1], [0, 1], [-1, 1]])

V, P, Q, sigma = svd(A)
print('matrix A =\n', A)
print('\nmatrix A.T@A =\n', V)
print('\nnull space of A.T =\n', null_space(A.T))
print('\nmatrix P =\n', P)
print('\nmatrix sigma =\n', sigma)
print('\nmatrix Q =\n', Q)
print('\ntranspose of Q =\n', Q.T)
print('\nCheck A = P@sigma@Q.T : \n', np.round(P@sigma@Q.T))

matrix A =
 [[ 1  1]
 [ 0  1]
 [-1  1]]

matrix A.T@A =
 [[2 0]
 [0 3]]

null space of A.T =
 [[ 0.40824829]
 [-0.81649658]
 [ 0.40824829]]

matrix P =
 [[ 0.57735027  0.70710678  0.40824829]
 [ 0.57735027  0.         -0.81649658]
 [ 0.57735027 -0.70710678  0.40824829]]

matrix sigma =
 [[1.73205081 0.        ]
 [0.         1.41421356]
 [0.         0.        ]]

matrix Q =
 [[0. 1.]
 [1. 0.]]

transpose of Q =
 [[0. 1.]
 [1. 0.]]

Check A = P@sigma@Q.T : 
 [[ 1.  1.]
 [ 0.  1.]
 [-1.  1.]]
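`np.round` can hide small reconstruction errors, and the theorem also requires $P$ and $Q$ to be orthogonal. The sketch below rebuilds the printed result from its exact values ($1/\sqrt3 \approx 0.577$, $1/\sqrt2 \approx 0.707$, $1/\sqrt6 \approx 0.408$) and checks both properties with a tolerance via `np.allclose` instead of rounding. The variable names simply mirror the example above.

```python
import numpy as np

# Exact values behind the printed P, sigma and Q for A = [[1, 1], [0, 1], [-1, 1]]
s3, s2, s6 = np.sqrt(3), np.sqrt(2), np.sqrt(6)
P = np.array([[1/s3,  1/s2,  1/s6],
              [1/s3,  0.0,  -2/s6],
              [1/s3, -1/s2,  1/s6]])
sigma = np.array([[s3, 0.0],
                  [0.0, s2],
                  [0.0, 0.0]])
Q = np.array([[0.0, 1.0],
              [1.0, 0.0]])
A = np.array([[1, 1], [0, 1], [-1, 1]])

# Orthogonality: P^T P = I (3 x 3) and Q^T Q = I (2 x 2)
print(np.allclose(P.T @ P, np.eye(3)), np.allclose(Q.T @ Q, np.eye(2)))
# Reconstruction compared with a tolerance instead of rounding
print(np.allclose(P @ sigma @ Q.T, A))
```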


In [None]:
B = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]])

V, P, Q, sigma = svd(B)
print('matrix B =\n', B)
print('\nmatrix B.T@B =\n', V)
print('\nnull space of B.T =\n', null_space(B.T))
print('\nmatrix P =\n', P)
print('\nmatrix sigma =\n', sigma)
print('\nmatrix Q =\n', Q)
print('\ntranspose of Q =\n', Q.T)
print('\nCheck B = P@sigma@Q.T : \n', np.round(P@sigma@Q.T))

matrix B =
 [[1 1 1 0 0]
 [3 3 3 0 0]
 [4 4 4 0 0]
 [5 5 5 0 0]
 [0 2 0 4 4]
 [0 0 0 5 5]
 [0 1 0 2 2]]

matrix B.T@B =
 [[51 51 51  0  0]
 [51 56 51 10 10]
 [51 51 51  0  0]
 [ 0 10  0 45 45]
 [ 0 10  0 45 45]]

null space of B.T =
 [[-0.87439372  0.19838647  0.37573457 -0.18786729]
 [-0.32220159  0.07310258 -0.75597437  0.37798719]
 [ 0.00683366 -0.80226531  0.18460376 -0.09230188]
 [ 0.36273277  0.55827341  0.2307547  -0.11537735]
 [ 0.          0.         -0.2        -0.4       ]
 [-0.         -0.         -0.          0.        ]
 [ 0.          0.          0.4         0.8       ]]

matrix P =
 [[ 0.13759913  0.02361145 -0.01080847 -0.87439372  0.19838647  0.37573457
  -0.18786729]
 [ 0.41279738  0.07083435 -0.03242542 -0.32220159  0.07310258 -0.75597437
   0.37798719]
 [ 0.5503965   0.09444581 -0.04323389  0.00683366 -0.80226531  0.18460376
  -0.09230188]
 [ 0.68799563  0.11805726 -0.05404236  0.36273277  0.55827341  0.2307547
  -0.11537735]
 [ 0.15277509 -0.59110096  0.65365084  0.          0.    



# 2. Similarity Measures

## A. Cosine Similarity

With the `sklearn` library, we can simply use `from sklearn.metrics.pairwise import cosine_similarity`. Here, we will instead build cosine similarity from scratch. The formula for cosine similarity is

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

In [None]:
import numpy as np

# Define two vectors
A = np.array([2, 1, 2, 3, 2, 9])
B = np.array([3, 4, 2, 4, 5, 5])

print("A:", A)
print("B:", B)

# Compute the cosine similarity of vectors A and B
cosine = np.dot(A, B)/(np.linalg.norm(A)*np.linalg.norm(B))
print("Cosine Similarity:", cosine)

A: [2 1 2 3 2 9]
B: [3 4 2 4 5 5]
Cosine Similarity: 0.8188504723485274


In [None]:
# The same computation written out term by term
cosine_similarity = (2*3 + 1*4 + 2*2 + 3*4 + 2*5 + 9*5) / \
                    (np.sqrt(2**2 + 1**2 + 2**2 + 3**2 + 2**2 + 9**2) * \
                    np.sqrt(3**2 + 4**2 + 2**2 + 4**2 + 5**2 + 5**2))
print(cosine_similarity)

0.8188504723485274


In [None]:
# Define one matrix and one vector
A = np.array([[2, 1, 2],
              [3, 2, 9],
              [-1, 2, -3]])
B = np.array([3, 4, 2])
print("A:\n", A)
print("B:\n", B)

# Compute the cosine similarity of each row of A with B
cosine = np.dot(A, B)/(np.linalg.norm(A, axis = 1)*np.linalg.norm(B))
print("Cosine Similarity:\n", cosine)

A:
 [[ 2  1  2]
 [ 3  2  9]
 [-1  2 -3]]
B:
 [3 4 2]
Cosine Similarity:
 [ 0.86657824  0.67035541 -0.04962917]


In [None]:
# The same row-by-row computation written out term by term
A1_dot_B = 2*3 + 1*4 + 2*2
A2_dot_B = 3*3 + 2*4 + 9*2
A3_dot_B = -1*3 + 2*4 + -3*2

A1_norm = np.sqrt(2**2 + 1**2 + 2**2)
A2_norm = np.sqrt(3**2 + 2**2 + 9**2)
A3_norm = np.sqrt((-1)**2 + 2**2 + (-3)**2)
B_norm = np.sqrt(3**2 + 4**2 + 2**2)

cosine_similarity_A1 = A1_dot_B / (A1_norm * B_norm)
cosine_similarity_A2 = A2_dot_B / (A2_norm * B_norm)
cosine_similarity_A3 = A3_dot_B / (A3_norm * B_norm)

print("Cosine similarity of row 1 of A with B: ", cosine_similarity_A1)
print("Cosine similarity of row 2 of A with B: ", cosine_similarity_A2)
print("Cosine similarity of row 3 of A with B: ", cosine_similarity_A3)

Cosine similarity of row 1 of A with B:  0.8665782448262421
Cosine similarity of row 2 of A with B:  0.6703554099445802
Cosine similarity of row 3 of A with B:  -0.049629166698546515


In [None]:
# Define two matrices
A = np.array([[1, 2, 2],
              [3, 2, 2],
              [-2, 1, -3]])
B = np.array([[4, 2, 4],
              [2, -2, 5],
              [3, 4, -4]])

print("A:\n", A)
print("B:\n", B)

# Compute the cosine similarity between corresponding rows of A and B (axis = 1 works row by row)
cosine = np.sum(A*B, axis = 1)/(np.linalg.norm(A, axis = 1)*np.linalg.norm(B, axis = 1))
print("Cosine Similarity:\n", cosine)

A:
 [[ 1  2  2]
 [ 3  2  2]
 [-2  1 -3]]
B:
 [[ 4  2  4]
 [ 2 -2  5]
 [ 3  4 -4]]
Cosine Similarity:
 [0.88888889 0.5066404  0.41739194]
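The per-row formula extends to an all-pairs similarity matrix: normalize every row to unit length, then take one matrix product. The helper `cosine_similarity_matrix` below is a hypothetical NumPy sketch that assumes no all-zero rows (those would cause a division by zero). Entry $(i, j)$ compares row $i$ of the first matrix with row $j$ of the second, the same layout that `sklearn.metrics.pairwise.cosine_similarity` returns, and the diagonal reproduces the row-by-row values above.

```python
import numpy as np

def cosine_similarity_matrix(X, Y):
    """All-pairs cosine similarity: entry (i, j) compares row i of X with row j of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Divide every row by its Euclidean norm so each row becomes a unit vector
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Dot products of unit vectors are exactly the cosines
    return X_unit @ Y_unit.T

A = np.array([[1, 2, 2], [3, 2, 2], [-2, 1, -3]])
B = np.array([[4, 2, 4], [2, -2, 5], [3, 4, -4]])

S = cosine_similarity_matrix(A, B)
print(S)
# The diagonal holds the similarities of corresponding rows
print(np.diag(S))
```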


## B. Euclidean Distance

With the `sklearn` library, we can simply use `from sklearn.metrics.pairwise import euclidean_distances`. Here, we will instead compute the Euclidean distance from scratch. The formula for the Euclidean distance is

$$d(x, y) = \sqrt{\sum_{i = 1}^n (x_i - y_i)^2}$$

In [None]:
import numpy as np

# Define two points in 2D
point1 = [3, 4]
point2 = [7, 1]

# Compute the Euclidean distance
distance = np.sqrt((point2[0] - point1[0])**2 + (point2[1] - point1[1])**2)
print("Euclidean Distance (2D):", distance)

Euclidean Distance (2D): 5.0


In [None]:
# Define two points in 3D
point1 = [1, 2, 3]
point2 = [4, 5, 6]

# Calculate Euclidean distance
distance = ((point2[0] - point1[0])**2 +
            (point2[1] - point1[1])**2 +
            (point2[2] - point1[2])**2)**0.5
print("Euclidean Distance (3D):", distance)

Euclidean Distance (3D): 5.196152422706632


In [None]:
# Define two lists of points (each point is itself a list)
points1 = [[1, 2], [3, 4], [5, 6]]
points2 = [[7, 8], [9, 10], [11, 12]]

# Compute the Euclidean distance between corresponding points
distances = []
for p1, p2 in zip(points1, points2):
    distance = sum((p2[i] - p1[i])**2 for i in range(len(p1)))**0.5
    distances.append(distance)

print("Pairwise Euclidean Distances:", distances)

Pairwise Euclidean Distances: [8.48528137423857, 8.48528137423857, 8.48528137423857]
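With NumPy arrays the loop can be vectorized. The sketch below computes the same distances between corresponding points with `np.linalg.norm(..., axis=1)`, and then an all-pairs distance matrix via broadcasting, which has the same layout as `sklearn.metrics.pairwise.euclidean_distances` (entry $(i, j)$ is the distance from `points1[i]` to `points2[j]`). Converting the lists to float arrays is an assumption of this sketch.

```python
import numpy as np

points1 = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
points2 = np.array([[7, 8], [9, 10], [11, 12]], dtype=float)

# Distances between corresponding points: norm of each difference vector
rowwise = np.linalg.norm(points2 - points1, axis=1)
print(rowwise)

# All-pairs distances via broadcasting: shape (3, 1, 2) - (1, 3, 2) -> (3, 3, 2)
diff = points1[:, np.newaxis, :] - points2[np.newaxis, :, :]
all_pairs = np.sqrt(np.sum(diff ** 2, axis=-1))
print(all_pairs)
```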


# 3. Content-based Filtering

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Reading movies file
movies = pd.read_csv("movies.csv", sep = ",", encoding = "latin-1", usecols = ["title", "genres"])
movies.head()

| | title | genres |
|---|---|---|
| 0 | Toy Story | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | Jumanji | Adventure\|Children\|Fantasy |
| 2 | Grumpier Old Men | Comedy\|Romance |
| 3 | Waiting to Exhale | Comedy\|Drama\|Romance |
| 4 | Father of the Bride Part II | Comedy |


In [None]:
# Break up the big genre string into a list of genres
movies["genres"] = movies["genres"].str.split("|")
# Convert each list back to a single string (its list representation) so TfidfVectorizer can tokenize it
movies["genres"] = movies["genres"].fillna("").astype("str")

## A. Movie Recommendations Based on Genre

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer(analyzer = "word", ngram_range = (1, 2), min_df = 1, stop_words = "english")
tfidf_matrix = tf.fit_transform(movies["genres"])
tfidf_matrix.shape

(9742, 177)
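`TfidfVectorizer` turns each genre string into a weighted term vector. To show what those weights are, here is a simplified NumPy sketch of the formula scikit-learn documents for its defaults (`smooth_idf=True`, `norm="l2"`): raw term counts multiplied by $\mathrm{idf}(t) = \ln\frac{1+n}{1+\mathrm{df}(t)} + 1$, followed by L2 normalization of each row. The toy documents are hypothetical, tokenization is a plain whitespace split, and unlike the cell above it uses unigrams only without stop-word removal, so its numbers will not match `tfidf_matrix`.

```python
import numpy as np

# Hypothetical toy documents: genre lists joined with spaces
docs = ["adventure animation children comedy fantasy",
        "adventure children fantasy",
        "comedy romance",
        "comedy drama romance"]

# Vocabulary: every distinct token, sorted alphabetically
tokens = [d.split() for d in docs]
vocab = sorted({t for doc in tokens for t in doc})
col = {t: j for j, t in enumerate(vocab)}

# Term frequencies: raw counts per document
tf = np.zeros((len(docs), len(vocab)))
for i, doc in enumerate(tokens):
    for t in doc:
        tf[i, col[t]] += 1

# Smoothed inverse document frequency: idf(t) = ln((1 + n) / (1 + df(t))) + 1
n = len(docs)
df = np.count_nonzero(tf, axis=0)
idf = np.log((1 + n) / (1 + df)) + 1

# TF-IDF weights, then L2-normalize each row so cosine similarity reduces to a dot product
tfidf = tf * idf
tfidf = tfidf / np.linalg.norm(tfidf, axis=1, keepdims=True)

cos = tfidf @ tfidf.T
print(np.round(cos, 3))
```

Because every row is a unit vector, the dot product `tfidf @ tfidf.T` already equals the cosine similarity matrix.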

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
cosine_sim[:4, :4]

array([[1.        , 0.31379419, 0.0611029 , 0.05271111],
       [0.31379419, 1.        , 0.        , 0.        ],
       [0.0611029 , 0.        , 1.        , 0.35172407],
       [0.05271111, 0.        , 0.35172407, 1.        ]])

In [None]:
# Build a 1-dimensional array with movie titles
titles = movies["title"]
indices = pd.Series(movies.index, index = movies["title"])

# Function that gets movie recommendations based on the cosine similarity score of movie genres
def genre_recommendations(title):
    idx = indices[title]
    # Pair every movie index with its similarity to the query movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort by similarity, highest first
    sim_scores = sorted(sim_scores, key = lambda x: x[1], reverse = True)
    # Skip position 0 (expected to be the query movie itself) and keep the next 20
    sim_scores = sim_scores[1:21]
    movie_indices = [i[0] for i in sim_scores]
    return titles.iloc[movie_indices]
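The slice `sim_scores[1:21]` assumes the query movie lands at position 0 after sorting. When several movies share exactly the same genres, their similarities tie at 1.0 and Python's stable sort keeps tied items in index order, so a movie with a smaller index can come first; the query movie then appears in its own recommendations and one real neighbour is dropped. The sketch below reproduces this on a hypothetical toy similarity matrix and shows a hypothetical helper `top_k_similar` that excludes the query index explicitly.

```python
import numpy as np

def top_k_similar(sim_row, idx, k):
    """Indices of the k items most similar to item `idx`, excluding `idx` itself."""
    scores = np.asarray(sim_row, dtype=float).copy()
    # Exclude the query item explicitly instead of assuming it sorts to position 0
    scores[idx] = -np.inf
    # Stable sort on the negated scores: highest similarity first, ties keep index order
    order = np.argsort(-scores, kind="stable")
    return order[:k]

# Hypothetical similarity matrix: items 0 and 1 have identical genres, so they tie at 1.0
sim = np.array([[1.0, 1.0, 0.3, 0.0],
                [1.0, 1.0, 0.3, 0.0],
                [0.3, 0.3, 1.0, 0.5],
                [0.0, 0.0, 0.5, 1.0]])

# The slicing approach: sort, then drop position 0
naive = [i for i, _ in sorted(enumerate(sim[1]), key=lambda x: x[1], reverse=True)[1:3]]
print(naive)                                 # [1, 2]: the query item 1 is returned, item 0 is lost
print(top_k_similar(sim[1], idx=1, k=2))     # [0 2]
```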

In [None]:
genre_recommendations("Dark Knight ").head(20)

| | title |
|---|---|
| 8387 | Need for Speed |
| 8149 | Grandmaster, The (Yi dai zong shi) |
| 123 | Apollo 13 |
| 8026 | Life of Pi |
| 8396 | Noah |
| 38 | Dead Presidents |
| 341 | Bad Company |
| 347 | Faster Pussycat! Kill! Kill! |
| 430 | Menace II Society |
| 568 | Substitute, The |


## B. Movie Recommendations Based on Title

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer(analyzer = "word", ngram_range = (1, 2), min_df = 1, stop_words = "english")
tfidf_matrix = tf.fit_transform(movies["title"])
tfidf_matrix.shape

(9742, 20558)

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
cosine_sim[:4, :4]

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [None]:
# Build a 1-dimensional array with movie titles
titles = movies["title"]
indices = pd.Series(movies.index, index = movies["title"])

# Function that gets movie recommendations based on the cosine similarity score of movie titles
def title_recommendations(title):
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key = lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:21]
    movie_indices = [i[0] for i in sim_scores]
    return titles.iloc[movie_indices]

In [None]:
title_recommendations('Dark Knight ').head(20)

| | title |
|---|---|
| 7768 | Dark Knight Rises, The |
| 8032 | Batman: The Dark Knight Returns, Part 1 |
| 8080 | Batman: The Dark Knight Returns, Part 2 |
| 140 | First Knight |
| 2417 | Cry in the Dark, A |
| 5778 | Alone in the Dark |
| 7375 | Knight and Day |
| 3576 | Black Knight |
| 3190 | Knight's Tale, A |
| 6858 | Alone in the Dark II |


# 4. Collaborative Filtering

In [None]:
import math
import operator

#Building Custom Data for Movie Rating
review = {
    "Marlon Brando": {
    "The Godfather": 5.00,
    "The Godfather Part II": 4.29,
    "Apocalypse Now": 5.00,
    "Jaws": 1.
    },
    "Stephen King": {
    "The Shawshank Redemption": 4.89,
    "The Shining": 4.93 ,
    "The Green Mile": 4.87,
    "The Godfather": 1.33,
    },
    "Steven Spielberg": {
    "Raiders of the Lost Ark": 5.0,
    "Jaws": 4.89,
    "Saving Private Ryan": 4.78,
    "Star Wars Episode IV - A New Hope": 4.33,
    "Close Encounters of the Third Kind": 4.77,
    "The Godfather":  1.25,
    "The Godfather Part II": 1.72
    },
    "George Lucas":{
    "Star Wars Episode IV - A New Hope": 5.00
    },
    "Al Pacino": {
    "The Godfather": 4.02,
    "The Godfather Part II": 5.00,
    },
    "Robert DeNiro": {
    "The Godfather": 3.07,
    "The Godfather Part II": 4.29,
    "Raging Bull": 5.00,
    "Goodfellas":  4.89
    },
    "Robert Duvall": {
    "The Godfather": 3.80,
    "The Godfather Part II": 3.61,
    "Apocalypse Now": 4.26
    },
    "Jack Nicholson": {
    "The Shining": 5.0,
    "One Flew Over The Cuckoos Nest": 5.0,
    "The Godfather": 2.22,
    "The Godfather Part II": 3.34
    },
    "Morgan Freeman": {
    "The Shawshank Redemption": 4.98,
    "The Shining": 4.42,
    "Apocalypse Now": 1.63,
    "The Godfather": 1.12,
    "The Godfather Part II": 2.16
    },
    "Harrison Ford": {
    "Raiders of the Lost Ark": 5.0,
    "Star Wars Episode IV - A New Hope": 4.84,
    },
    "Tom Hanks": {
    "Saving Private Ryan": 3.78,
    "The Green Mile": 4.96,
    "The Godfather": 1.04,
    "The Godfather Part II": 1.03
    },
    "Francis Ford Coppola": {
    "The Godfather": 5.00,
    "The Godfather Part II": 5.0,
    "Jaws": 1.24,
    "One Flew Over The Cuckoos Nest": 2.02
    },
    "Martin Scorsese": {
    "Raging Bull": 5.0,
    "Goodfellas": 4.87,
    "Close Encounters of the Third Kind": 1.14,
    "The Godfather": 4.00
    },
    "Diane Keaton": {
    "The Godfather": 2.98,
    "The Godfather Part II": 3.93,
    "Close Encounters of the Third Kind": 1.37
    },
    "Richard Dreyfuss": {
    "Jaws": 5.0,
    "Close Encounters of the Third Kind": 5.0,
    "The Godfather": 1.07,
    "The Godfather Part II": 0.63
    },
    "Joe Pesci": {
    "Raging Bull": 4.89,
    "Goodfellas": 5.0,
    "The Godfather": 4.87,
    "Star Wars Episode IV - A New Hope": 1.32
    }
}

## A. Movies Reviewed by Both Critics

In [None]:
# Function to get common movies between Users
def get_common_movies(criticA, criticB):
    return [movie for movie in review[criticA] if movie in review[criticB]]

In [None]:
get_common_movies("Marlon Brando", "Robert DeNiro")

['The Godfather', 'The Godfather Part II']

In [None]:
get_common_movies("Steven Spielberg", "Tom Hanks")

['Saving Private Ryan', 'The Godfather', 'The Godfather Part II']

In [None]:
get_common_movies("Martin Scorsese", "Joe Pesci")

['Raging Bull', 'Goodfellas', 'The Godfather']

## B. Ratings of the Movies Reviewed by Both Critics

In [None]:
# Function to get reviews from the common movies
def get_reviews(criticA,criticB):
    common_movies = get_common_movies(criticA, criticB)
    return [(review[criticA][movie], review[criticB][movie]) for movie in common_movies]

In [None]:
get_reviews("Marlon Brando", "Robert DeNiro")

[(5.0, 3.07), (4.29, 4.29)]

In [None]:
get_reviews("Steven Spielberg", "Tom Hanks")

[(4.78, 3.78), (1.25, 1.04), (1.72, 1.03)]

In [None]:
get_reviews("Martin Scorsese", "Joe Pesci")

[(5.0, 4.89), (4.87, 5.0), (4.0, 4.87)]

## C. Mencari Similarity

In [None]:
# Function to get the Euclidean distance from a list of rating pairs (rating_A, rating_B)
def euclidean_distance(points):
    squared_diffs = [(point[0] - point[1]) ** 2 for point in points]
    summed_squared_diffs = sum(squared_diffs)
    distance = math.sqrt(summed_squared_diffs)
    return distance

In [None]:
# Function to convert a distance into a similarity: the smaller the distance, the higher the similarity (and vice versa)
# Adding 1 to the denominator avoids division by zero when the distance is 0 (identical ratings) and keeps the result in (0, 1]
def similarity(reviews):
    return 1 / (1 + euclidean_distance(reviews))

In [None]:
# Function to get similarity between 2 users
def get_critic_similarity(criticA, criticB):
    reviews = get_reviews(criticA, criticB)
    return similarity(reviews)

In [None]:
get_critic_similarity("Marlon Brando", "Robert DeNiro")

0.341296928327645

In [None]:
get_critic_similarity("Steven Spielberg", "Tom Hanks")

0.4478352722730117

In [None]:
get_critic_similarity("Martin Scorsese", "Joe Pesci")

0.5300793497254199
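One caveat: when two critics share no movie, `get_reviews` returns an empty list, the summed squared differences are 0, and `similarity` returns $1/(1+0) = 1.0$, the maximum possible score. In the `review` data, George Lucas and Harrison Ford share no movie with Marlon Brando, yet each gets similarity 1.0 with him. The sketch below is a hypothetical guarded variant, `critic_similarity_guarded`, that returns 0 when there is no overlap; it uses a small subset of the `review` dictionary so it runs on its own. The recommendations shown in the next section were produced with the original functions, so adopting this variant would change them.

```python
import math

# Small subset of the `review` dictionary above
review = {
    "Marlon Brando": {"The Godfather": 5.00, "The Godfather Part II": 4.29,
                      "Apocalypse Now": 5.00, "Jaws": 1.0},
    "George Lucas": {"Star Wars Episode IV - A New Hope": 5.00},
    "Joe Pesci": {"Raging Bull": 4.89, "Goodfellas": 5.0, "The Godfather": 4.87,
                  "Star Wars Episode IV - A New Hope": 1.32},
}

def get_reviews(criticA, criticB):
    # Rating pairs for the movies both critics reviewed
    return [(review[criticA][m], review[criticB][m]) for m in review[criticA] if m in review[criticB]]

def critic_similarity_guarded(criticA, criticB):
    """Hypothetical variant: 0 when the critics share no movie, else 1 / (1 + distance)."""
    pairs = get_reviews(criticA, criticB)
    if not pairs:
        # No common movie means no evidence of similarity
        return 0.0
    distance = math.sqrt(sum((a - b) ** 2 for a, b in pairs))
    return 1 / (1 + distance)

print(critic_similarity_guarded("Marlon Brando", "George Lucas"))  # 0.0
print(critic_similarity_guarded("Marlon Brando", "Joe Pesci"))
```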

## D. Building the Recommendation System

In [None]:
# Function to recommend movies to a critic based on the ratings of the most similar critics
def recommend_movies(critic, num_suggestions):
    # Similarity scores between this critic and every other critic
    similarity_scores = [(get_critic_similarity(critic, other), other) for other in review if other != critic]
    # Keep the num_suggestions most similar critics
    # (this limits the number of neighbours used, not the number of movies returned)
    similarity_scores.sort()
    similarity_scores.reverse()
    similarity_scores = similarity_scores[0:num_suggestions]

    # For each candidate movie: (sum of neighbour similarities, list of similarity-weighted ratings)
    recommendations = {}
    for sim_score, other in similarity_scores:
        reviewed = review[other]
        for movie in reviewed:
            # Only recommend movies the critic has not reviewed yet
            if movie not in review[critic]:
                # Weight the neighbour's rating by their similarity
                weight = sim_score * reviewed[movie]
                if movie in recommendations:
                    sim_sum, weights = recommendations[movie]
                    recommendations[movie] = (sim_sum + sim_score, weights + [weight])
                else:
                    recommendations[movie] = (sim_score, [weight])

    # Normalize: weighted average rating = sum(similarity * rating) / sum(similarity)
    for movie in recommendations:
        sim_sum, weights = recommendations[movie]
        recommendations[movie] = sum(weights) / sim_sum

    # Sort movies by predicted rating, highest first
    sorted_recommendations = sorted(recommendations.items(), key = operator.itemgetter(1), reverse = True)
    return sorted_recommendations

In [None]:
recommend_movies("Marlon Brando", 4)

[('Goodfellas', 5.000000000000001),
 ('Raiders of the Lost Ark', 5.0),
 ('Raging Bull', 4.89),
 ('Star Wars Episode IV - A New Hope', 3.8157055214723923),
 ('One Flew Over The Cuckoos Nest', 2.02)]
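Each score is a similarity-weighted average: for a movie $m$, $\hat r_m = \frac{\sum_o \mathrm{sim}(c, o)\, r_{o,m}}{\sum_o \mathrm{sim}(c, o)}$, where $o$ runs over the selected neighbours who rated $m$. As a worked check, the sketch below recomputes the Star Wars score for Marlon Brando by hand. Working from the `review` data, his four neighbours are Harrison Ford and George Lucas (similarity 1.0 each, since they share no movie with him), Joe Pesci and Francis Ford Coppola; only the first three rated Star Wars. The helper `similarity_from_pairs` is a hypothetical restatement of `similarity`.

```python
import math

def similarity_from_pairs(pairs):
    # Same formula as `similarity` above: 1 / (1 + Euclidean distance over common movies)
    return 1 / (1 + math.sqrt(sum((a - b) ** 2 for a, b in pairs)))

# Neighbours of Marlon Brando who rated Star Wars, in the order recommend_movies visits them
sims = {
    "Harrison Ford": similarity_from_pairs([]),           # no common movie -> 1.0
    "George Lucas": similarity_from_pairs([]),            # no common movie -> 1.0
    "Joe Pesci": similarity_from_pairs([(5.00, 4.87)]),   # common movie: The Godfather
}

# Their ratings of "Star Wars Episode IV - A New Hope"
star_wars = {"Harrison Ford": 4.84, "George Lucas": 5.00, "Joe Pesci": 1.32}

# Weighted average: sum(sim * rating) / sum(sim)
numerator = sum(sims[c] * r for c, r in star_wars.items())
denominator = sum(sims[c] for c in star_wars)
print(numerator / denominator)   # about 3.8157, as in the output above
```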

In [None]:
recommend_movies("Robert DeNiro", 4)

[('Raiders of the Lost Ark', 5.0),
 ('Star Wars Episode IV - A New Hope', 4.92),
 ('Close Encounters of the Third Kind', 1.2744773851327365)]

In [None]:
recommend_movies("Steven Spielberg", 4)

[('The Shawshank Redemption', 4.928285762244913),
 ('The Green Mile', 4.87),
 ('The Shining', 4.71304734727882),
 ('Apocalypse Now', 1.63)]

In [None]:
recommend_movies("Tom Hanks", 4)

[('Raiders of the Lost Ark', 5.0),
 ('Jaws', 5.0),
 ('Close Encounters of the Third Kind', 5.0),
 ('The Shining', 4.93),
 ('Star Wars Episode IV - A New Hope', 4.92),
 ('The Shawshank Redemption', 4.89)]