### Матричные факторизации

В данной работе вам предстоит познакомиться с практической стороной матричных разложений.
Работа поделена на 4 задания:
1. Вам необходимо реализовать SVD разложения используя SGD на explicit данных
2. Вам необходимо реализовать матричное разложения используя ALS на implicit данных
3. Вам необходимо реализовать матричное разложения используя BPR(pair-wise loss) на implicit данных
4. Вам необходимо реализовать матричное разложения используя WARP(list-wise loss) на implicit данных

Мягкий дедлайн 28 Сентября (пишутся замечания, выставляется оценка, есть возможность исправить до жесткого дедлайна)

Жесткий дедлайн 5 Октября (Итоговая проверка)

In [1]:
import implicit
import pandas as pd
import numpy as np
import scipy.sparse as sp

from lightfm.datasets import fetch_movielens



В данной работе мы будем работать с explicit датасетом movieLens, в котором представленны пары user_id movie_id и rating выставленный пользователем фильму

Скачать датасет можно по ссылке https://grouplens.org/datasets/movielens/1m/

In [2]:
ratings = pd.read_csv('RecSysHSE/ml-1m/ratings.dat', delimiter='::', header=None, 
        names=['user_id', 'movie_id', 'rating', 'timestamp'], 
        usecols=['user_id', 'movie_id', 'rating'], engine='python')

In [3]:
movie_info = pd.read_csv('RecSysHSE/ml-1m/movies.dat', delimiter='::', header=None, 
        names=['movie_id', 'name', 'category'], engine='python')

In [4]:
ratings = ratings.sort_values(by = ['user_id', 'movie_id'])

Explicit данные

In [5]:
ratings.head(10)

Unnamed: 0,user_id,movie_id,rating
40,1,1,5
25,1,48,5
39,1,150,5
44,1,260,4
23,1,527,5
49,1,531,4
33,1,588,4
8,1,594,4
10,1,595,5
51,1,608,4


Для того, чтобы преобразовать текущий датасет в Implicit, давайте считать что позитивная оценка это оценка >=4

In [6]:
implicit_ratings = ratings.loc[(ratings['rating'] >= 4)]
implicit_ratings['rating'] = 1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [7]:
implicit_ratings.head(10)

Unnamed: 0,user_id,movie_id,rating
40,1,1,1
25,1,48,1
39,1,150,1
44,1,260,1
23,1,527,1
49,1,531,1
33,1,588,1
8,1,594,1
10,1,595,1
51,1,608,1


Удобнее работать с sparse матричками, давайте преобразуем DataFrame в CSR матрицы

In [8]:
users = implicit_ratings["user_id"]
movies = implicit_ratings["movie_id"]
user_item = sp.coo_matrix((np.ones_like(users), (users, movies)))
user_item_t_csr = user_item.T.tocsr()
user_item_csr = user_item.tocsr()

В качестве примера воспользуемся ALS разложением из библиотеки implicit

Зададим размерность латентного пространства равным 64, это же определяет размер user/item эмбедингов

In [9]:
model = implicit.als.AlternatingLeastSquares(factors=64, iterations=100, calculate_training_loss=True)



В качестве loss здесь всеми любимый RMSE

In [10]:
model.fit(user_item_t_csr)

HBox(children=(FloatProgress(value=0.0), HTML(value='')))




Построим похожие фильмы по 1 movie_id = Истории игрушек

In [11]:
movie_info.head(5)

Unnamed: 0,movie_id,name,category
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


In [12]:
get_similars = lambda item_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                        for x in model.similar_items(item_id)]

Как мы видим, симилары действительно оказались симиларами.

Качество симиларов часто является хорошим способом проверить качество алгоритмов.

P.S. Если хочется поглубже разобраться в том как разные алгоритмы формируют разные латентные пространства, рекомендую загружать полученные вектора в tensorBoard и смотреть на сформированное пространство

In [13]:
get_similars(1, model)

['0    Toy Story (1995)',
 '3045    Toy Story 2 (1999)',
 "2286    Bug's Life, A (1998)",
 '33    Babe (1995)',
 '2315    Babe: Pig in the City (1998)',
 '584    Aladdin (1992)',
 '1526    Hercules (1997)',
 '2252    Pleasantville (1998)',
 '2692    Iron Giant, The (1999)',
 '1838    Mulan (1998)']

Давайте теперь построим рекомендации для юзеров

Как мы видим юзеру нравится фантастика, значит и в рекомендациях ожидаем увидеть фантастику

In [14]:
get_user_history = lambda user_id, implicit_ratings : [movie_info[movie_info["movie_id"] == x]["name"].to_string() 
                                            for x in implicit_ratings[implicit_ratings["user_id"] == user_id]["movie_id"]]

In [15]:
get_user_history(4, implicit_ratings)

['257    Star Wars: Episode IV - A New Hope (1977)',
 '476    Jurassic Park (1993)',
 '1023    Die Hard (1988)',
 '1081    E.T. the Extra-Terrestrial (1982)',
 '1180    Raiders of the Lost Ark (1981)',
 '1183    Good, The Bad and The Ugly, The (1966)',
 '1196    Alien (1979)',
 '1220    Terminator, The (1984)',
 '1366    Jaws (1975)',
 '1885    Rocky (1976)',
 '1959    Saving Private Ryan (1998)',
 '2297    King Kong (1933)',
 '2623    Run Lola Run (Lola rennt) (1998)',
 '2878    Goldfinger (1964)',
 '2882    Fistful of Dollars, A (1964)',
 '3349    Thelma & Louise (1991)',
 '3399    Hustler, The (1961)',
 '3633    Mad Max (1979)']

Получилось! 

Мы действительно порекомендовали пользователю фантастику и боевики, более того встречаются продолжения тех фильмов, которые он высоко оценил

In [16]:
get_recommendations = lambda user_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                               for x in model.recommend(user_id, user_item_csr)]

In [17]:
get_recommendations(4, model)

['585    Terminator 2: Judgment Day (1991)',
 '1271    Indiana Jones and the Last Crusade (1989)',
 '1182    Aliens (1986)',
 '1284    Butch Cassidy and the Sundance Kid (1969)',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '3402    Close Encounters of the Third Kind (1977)',
 '2502    Matrix, The (1999)',
 '1884    French Connection, The (1971)',
 '847    Godfather, The (1972)',
 '3458    Predator (1987)']

Теперь ваша очередь реализовать самые популярные алгоритмы матричных разложений

Что будет оцениваться:
1. Корректность алгоритма
2. Качество получившихся симиларов
3. Качество итоговых рекомендаций для юзера

### Задание 1. Не использую готовые решения, реализовать SVD разложение используя SGD на explicit данных

In [18]:
from sklearn.neighbors import NearestNeighbors

In [32]:
class SGD:
    def __init__(self, r = 64, reg_par = 0.01, lr = 0.01, eps = 5e-4, iters = int(5e6)):
        self.r = r
        self.reg_par = reg_par
        self.lr = lr
        self.eps = eps
        self.iters = iters
        
    def fit(self, M):   
        row = np.max(M['user_id'])
        col = np.max(M['movie_id'])
        self.W = np.random.uniform(0, 1/np.sqrt(self.r), (row, self.r))
        self.H = np.random.uniform(0, 1/np.sqrt(self.r), (self.r, col))
        #B = np.random.uniform(0, 1/np.sqrt(self.r), (row, col))
        self.B_u = np.zeros((row, 1))
        self.B_i = np.zeros((1, col))
        self.mu = M['rating'].mean()
        total_len = len(M)
        rmse = np.linalg.norm((self.W@self.H + self.B_u + self.B_i + self.mu)[M['user_id']-1, M['movie_id']-1]
                                                   - M['rating'])/total_len
        t = 0
        while t < self.iters:
            k = np.random.randint(total_len)
            i = M.iloc[k]['user_id']-1
            j = M.iloc[k]['movie_id']-1     
            w, h, b_u, b_i = self.W[i,:], self.H[:,j], self.B_u[i], self.B_i[:,j]
            #b = np.array([B[i[t],j[t]] for t in range(len(i))])
            error = w@h + b_u + b_i + self.mu - M.iloc[k]['rating']

            self.W[i,:] -= self.lr*(error*h.T + self.reg_par*w)
            self.H[:, j] -= self.lr*(error*w.T + self.reg_par*h)
            #B[i, j] -= self.lr*error
            self.B_u[i] -= self.lr*(error + self.reg_par*b_u)
            self.B_i[:,j] -= self.lr*(error + self.reg_par*b_i)
            if t%500000 == 0:
                rmse = np.linalg.norm((self.W@self.H + self.B_u + self.B_i + self.mu)[M['user_id']-1, M['movie_id']-1]
                                                   - M['rating'])/total_len
                if rmse < self.eps:
                    break
                print("rmse", rmse)
            t+=1
        return self.W, self.H, self.B_u, self.B_i, self.mu 
    
    def similar_items(self, item_id):
        item_id-=1
        nbrs = NearestNeighbors(n_neighbors=6, algorithm = 'kd_tree' ).fit(self.H.T)
        distances, indices = nbrs.kneighbors((self.H.T)[item_id].reshape(1,-1))
        return list(zip(indices[0] + 1, distances[0]))
    
    def recommend(self, user_id, M, movie_info):
        user_movie_matrix = self.W@self.H + self.B_u + self.B_i + self.mu
        user_row = user_movie_matrix[user_id - 1]
        watched_movies = M[M['user_id'] == user_id]['movie_id']
        unwatched_movies = [i for i in movie_info['movie_id'] if i not in watched_movies]
        unwatched_movies.sort(key = lambda x: user_row[x-1], reverse = True)
        return list(zip(unwatched_movies[:10], user_row[unwatched_movies[:10]]))

In [19]:
get_recommendations = lambda user_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                               for x in model.recommend(user_id, ratings, movie_info)]

In [34]:
model = SGD()
W, H, B_u, B_i, mu = model.fit(ratings)

rmse 0.001145023688262441
rmse 0.0009305519983153932
rmse 0.0009128954555596631
rmse 0.0009044049328397242
rmse 0.000898116603460106
rmse 0.0008919042464388948
rmse 0.0008844346384120224
rmse 0.0008756991067001193
rmse 0.0008654110640417119
rmse 0.0008539762637158209


In [35]:
get_similars(1, model)

['0    Toy Story (1995)',
 '3045    Toy Story 2 (1999)',
 '1949    Bambi (1942)',
 '2011    Lady and the Tramp (1955)',
 '584    Aladdin (1992)',
 '591    Beauty and the Beast (1991)']

In [36]:
get_recommendations(4, model)

['847    Godfather, The (1972)',
 '2836    Sanjuro (1962)',
 '1189    To Kill a Mockingbird (1962)',
 '910    Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)',
 '892    Rear Window (1954)',
 '1950    Seven Samurai (The Magnificent Seven) (Shichin...',
 '1162    Paths of Glory (1957)',
 '735    Close Shave, A (1995)',
 '315    Shawshank Redemption, The (1994)',
 '901    Maltese Falcon, The (1941)']

### Задание 2. Не использую готовые решения, реализовать матричное разложение используя ALS на implicit данных

In [28]:
class ALS:
    def __init__(self, r = 64, reg_par = 0.001, lr = 1e-3, eps = 1e-3, iters = 100):
        self.r = r
        self.reg_par = reg_par
        self.lr = lr
        self.eps = eps
        self.iters = iters
        self.movie_info = movie_info
        
    def fit(self, M):   
        row = np.max(M['user_id'])
        col = np.max(M['movie_id'])
        self.W = np.random.uniform(0, 1/np.sqrt(self.r), (row, self.r))
        self.H = np.random.uniform(0, 1/np.sqrt(self.r), (self.r, col))
        total_len = len(M)
        rmse = np.linalg.norm((self.W@self.H)[M['user_id']-1, M['movie_id']-1]
                                                   - M['rating'])/total_len
        t = 0
        while t < self.iters:
            WH = self.W@self.H
            diff = WH
            diff[M['user_id']-1, M['movie_id']-1] = WH[M['user_id']-1, M['movie_id']-1]- M['rating']
            
            if t % 2 == 0:
                self.W = self.W - self.lr * (diff @ self.H.T + self.reg_par * self.W)
            else:
                self.H = self.H - self.lr * (self.W.T @diff  + self.reg_par * self.H)
                
            if t%10 == 0:
                rmse = np.linalg.norm((self.W@self.H)[M['user_id']-1, M['movie_id']-1]
                                                   - M['rating'])/total_len
                print("rmse", rmse)
                
            if rmse < self.eps:
                break
            t+=1
        
        return self.W, self.H  
    
    def similar_items(self,item_id):
        item_id-=1
        nbrs = NearestNeighbors(n_neighbors=10).fit(self.H.T)
        distances, indices = nbrs.kneighbors((self.H.T)[item_id].reshape(1,-1))
        return list(zip(indices[0] + 1, distances[0]))
    
    def recommend(self, user_id, M, movie_info):
        user_movie_matrix = self.W@self.H
        user_row = user_movie_matrix[user_id - 1]
        watched_movies = M[M['user_id'] == user_id]['movie_id']
        unwatched_movies = [i for i in movie_info['movie_id'] if i not in watched_movies]
        unwatched_movies.sort(key = lambda x: user_row[x-1], reverse = True)
        return list(zip(unwatched_movies[:10], user_row[unwatched_movies[:10]]))
    

In [29]:
model = ALS()
W, H = model.fit(implicit_ratings)

rmse 0.0012476559992730177
rmse 0.0011873506361842945
rmse 0.001129319010494593
rmse 0.0010986231303928575
rmse 0.0010862891137024039
rmse 0.001076605644001585
rmse 0.0010617294794914153
rmse 0.001042069134595005
rmse 0.0010228374209121562
rmse 0.001005967096743313


In [30]:
get_similars(1, model)

['0    Toy Story (1995)',
 '3045    Toy Story 2 (1999)',
 '1245    Groundhog Day (1993)',
 "2286    Bug's Life, A (1998)",
 '584    Aladdin (1992)',
 '33    Babe (1995)',
 '352    Forrest Gump (1994)',
 '2327    Shakespeare in Love (1998)',
 '360    Lion King, The (1994)',
 "1854    There's Something About Mary (1998)"]

In [31]:
get_recommendations(4, model)

['257    Star Wars: Episode IV - A New Hope (1977)',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '1180    Raiders of the Lost Ark (1981)',
 '1192    Star Wars: Episode VI - Return of the Jedi (1983)',
 '1220    Terminator, The (1984)',
 '1196    Alien (1979)',
 '1182    Aliens (1986)',
 '585    Terminator 2: Judgment Day (1991)',
 '2502    Matrix, The (1999)',
 '1959    Saving Private Ryan (1998)']

### Задание 3. Не использую готовые решения, реализовать матричное разложение BPR на implicit данных

In [105]:
class BPR:
    def __init__(self, r = 64, reg_par = 0.001, lr = 1e-3, eps = 1e-5, iters = int(1e6)):
        self.r = r
        self.reg_par = reg_par
        self.lr = lr
        self.eps = eps
        self.iters = iters
        self.movie_info = movie_info
    

    def fit(self, M, movie_info): 
        
        def partial_BPR(x_uij, partial_x):
            exp_x = np.exp(-x_uij)
            return exp_x / (1 + exp_x) * partial_x
        
        row = np.max(M['user_id'])
        col = np.max(M['movie_id'])
        self.W = np.random.uniform(0, 1/np.sqrt(self.r), (row, self.r))
        self.H = np.random.uniform(0, 1/np.sqrt(self.r), (self.r, col))
        total_len = len(M)
        rmse = np.linalg.norm((self.W@self.H)[M['user_id']-1, M['movie_id']-1]
                                                   - M['rating'])/total_len
        
        watched_movies = {}
        unwatched_movies = {}
        
        for u in range(1, row + 1):
            watched_movies[u] = np.array(M[M['user_id'] == u]['movie_id'])
            unwatched_movies[u] = np.array([i for i in movie_info['movie_id'] if i not in watched_movies[u]])
        print("end of pre-count")
        
        t = 0
        while t < self.iters:
            k = np.random.randint(total_len)
            u = M.iloc[k]['user_id']-1
            #watched_movies = M[M['user_id'] == u + 1]['movie_id']
            i = np.random.choice(watched_movies[u+1])-1
            #unwatched_movies = [i for i in movie_info['movie_id'] if i not in watched_movies]
            j = np.random.choice(unwatched_movies[u+1])-1
            
            WH_i = self.W[u]@self.H[:,i] + self.B_i[:,i]
            WH_j = self.W[u]@self.H[:,j] + self.B_i[:,j]
            x_uij = WH_i - WH_j
            
            self.W[u] += self.lr * (partial_BPR(x_uij, self.H[:,i] - self.H[:,j]) - self.reg_par * self.W[u])
            self.H[:,i] += self.lr * (partial_BPR(x_uij, self.W[u]) - self.reg_par * self.H[:,i])
            self.H[:,j] += self.lr * (partial_BPR(x_uij, -self.W[u]) - self.reg_par * self.H[:,j])
            
   
            if t%100000 == 0:
                rmse = np.linalg.norm((self.W@self.H)[M['user_id']-1, M['movie_id']-1]
                                                   - M['rating'])/total_len
                print("rmse", rmse)
                
            if rmse < self.eps:
                break
            t+=1
        
        return self.W, self.H  
    
    def similar_items(self, item_id):
        item_id-=1
        nbrs = NearestNeighbors(n_neighbors=10, algorithm = 'kd_tree' ).fit(self.H.T)
        distances, indices = nbrs.kneighbors((self.H.T)[item_id].reshape(1,-1))
        return list(zip(indices[0] + 1, distances[0]))
    
    def recommend(self, user_id, M, movie_info):
        user_movie_matrix = self.W@self.H
        user_row = user_movie_matrix[user_id - 1]
        watched_movies = M[M['user_id'] == user_id]['movie_id']
        unwatched_movies = [i for i in movie_info['movie_id'] if i not in watched_movies]
        unwatched_movies.sort(key = lambda x: user_row[x-1], reverse = True)
        return list(zip(unwatched_movies[:10], user_row[unwatched_movies[:10]]))
    

In [106]:
%%time
model = BPR(iters = int(1e6))
W, H = model.fit(implicit_ratings, movie_info)

end of pre-count
rmse 0.0009919170868044694
rmse 0.0009770220793227254
rmse 0.0009631845530290926
rmse 0.0009502903026334152
rmse 0.0009380619840987668
rmse 0.0009265139883945947
rmse 0.0009154732609761482
rmse 0.0009049266659662641
rmse 0.0008948757501341175
rmse 0.0008851596311048403
Wall time: 5min 54s


In [107]:
get_similars(1, model)

['0    Toy Story (1995)',
 '476    Jurassic Park (1993)',
 '373    Speed (1994)',
 '1179    Princess Bride, The (1987)',
 '1180    Raiders of the Lost Ark (1981)',
 '2647    Ghostbusters (1984)',
 '589    Silence of the Lambs, The (1991)',
 '1195    GoodFellas (1990)',
 '315    Shawshank Redemption, The (1994)',
 '2502    Matrix, The (1999)']

In [108]:
get_recommendations(4, model)

['2789    American Beauty (1999)',
 '257    Star Wars: Episode IV - A New Hope (1977)',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '1180    Raiders of the Lost Ark (1981)',
 '1959    Saving Private Ryan (1998)',
 '589    Silence of the Lambs, The (1991)',
 '1192    Star Wars: Episode VI - Return of the Jedi (1983)',
 '2693    Sixth Sense, The (1999)',
 '2502    Matrix, The (1999)',
 "523    Schindler's List (1993)"]

### Задание 4. Не использую готовые решения, реализовать матричное разложение WARP на implicit данных

In [127]:
class WARP:
    def __init__(self, r = 64, reg_par = 0.0001, lr = 1e-4, eps = 1e-5, iters = int(1e6)):
        self.r = r
        self.reg_par = reg_par
        self.lr = lr
        self.eps = eps
        self.iters = iters
        self.movie_info = movie_info
    

    def fit(self, M, movie_info):     
        row = np.max(M['user_id'])
        col = np.max(M['movie_id'])
        self.W = np.random.uniform(0, 1/np.sqrt(self.r), (row, self.r))
        self.H = np.random.uniform(0, 1/np.sqrt(self.r), (self.r, col))
        total_len = len(M)
        rmse = np.linalg.norm((self.W@self.H)[M['user_id']-1, M['movie_id']-1]
                                                   - M['rating'])/total_len
        
        watched_movies = {}
        unwatched_movies = {}
        for u in range(1, row + 1):
            watched_movies[u] = np.array(M[M['user_id'] == u]['movie_id'])
            unwatched_movies[u] = np.array([i for i in movie_info['movie_id'] if i not in watched_movies[u]])
        print("end of pre-count")
        
        t = 0
        while t < self.iters:
            k = np.random.randint(total_len)
            u = M.iloc[k]['user_id']-1
            i = np.random.choice(watched_movies[u+1])-1
            unwatched_movies_for_u = unwatched_movies[u+1] - 1
           
            WH = self.W[u]@self.H[:, i]
            count = 0
            for j in np.random.permutation(unwatched_movies_for_u):
                count  += 1
                if self.W[u]@self.H[:, j] + 1 > WH:
                    self.W[u] -= self.lr * (np.log(len(unwatched_movies_for_u) / count ) * (self.H[:, j] - self.H[:, i]) + \
                                            self.reg_par*self.W[u])
                    self.H[:, i] += self.lr * (np.log(len(unwatched_movies_for_u) / count ) * self.W[u] - \
                                               self.reg_par * self.H[:, i])
                    self.H[:, j] -= self.lr * (np.log(len(unwatched_movies_for_u) / count ) * self.W[u] + \
                                               self.reg_par * self.H[:, j])
                    break            
   
            if t%100000 == 0:
                rmse = np.linalg.norm((self.W@self.H)[M['user_id']-1, M['movie_id']-1]
                                                   - M['rating'])/total_len
                print("rmse", rmse)
                
            if rmse < self.eps:
                break
            t+=1
        
        return self.W, self.H  
    
    def similar_items(self,item_id):
        item_id-=1
        nbrs = NearestNeighbors(n_neighbors=10).fit(self.H.T)
        distances, indices = nbrs.kneighbors((self.H.T)[item_id].reshape(1,-1))
        return list(zip(indices[0] + 1, distances[0]))
    
    def recommend(self, user_id, M, movie_info):
        user_movie_matrix = self.W@self.H
        user_row = user_movie_matrix[user_id - 1]
        watched_movies = M[M['user_id'] == user_id]['movie_id']
        unwatched_movies = [i for i in movie_info['movie_id'] if i not in watched_movies]
        unwatched_movies.sort(key = lambda x: user_row[x-1], reverse = True)
        return list(zip(unwatched_movies[:10], user_row[unwatched_movies[:10]]))
    

In [128]:
%%time
model = WARP(iters = int(1e6))
W, H = model.fit(implicit_ratings, movie_info)

end of pre-count
rmse 0.0009875700471273963
rmse 0.0009620998988249654
rmse 0.0009368593304117218
rmse 0.0009115692308429302
rmse 0.0008863046618560765
rmse 0.0008610523108657894
rmse 0.0008359013135150505
rmse 0.0008109487363241949
rmse 0.0007866829093494168
rmse 0.0007637264332646435
Wall time: 10min 2s


In [129]:
get_similars(1, model)

['0    Toy Story (1995)',
 '1239    Stand by Me (1986)',
 '2647    Ghostbusters (1984)',
 '1220    Terminator, The (1984)',
 '293    Pulp Fiction (1994)',
 '1081    E.T. the Extra-Terrestrial (1982)',
 '3509    Gladiator (2000)',
 '2928    Being John Malkovich (1999)',
 '1245    Groundhog Day (1993)',
 '352    Forrest Gump (1994)']

In [130]:
get_recommendations(4, model)

['2789    American Beauty (1999)',
 '257    Star Wars: Episode IV - A New Hope (1977)',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '1959    Saving Private Ryan (1998)',
 '589    Silence of the Lambs, The (1991)',
 '2502    Matrix, The (1999)',
 '1192    Star Wars: Episode VI - Return of the Jedi (1983)',
 '1180    Raiders of the Lost Ark (1981)',
 '315    Shawshank Redemption, The (1994)',
 '2693    Sixth Sense, The (1999)']