<h2> Content Based Recommender System - Text </h2>

The goal of this notebook is to implement a content-based recommender system on the MovieLens 100k dataset.

The movie profile is built from the movie genres and the words in the movie titles.

<b> Approach: </b>

The user profile is either a weighted average of the profiles of the movies the user rated, or the average profile of the movies the user liked (rating >= 3) minus the average profile of the movies the user disliked (with a lower weight for the disliked movies).

The recommended movies are those closest (e.g. by cosine similarity) to the user profile vector.
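
The approach can be illustrated on a tiny example before touching the real data. The sketch below uses made-up genre flags and ratings (all names and values are hypothetical, not taken from MovieLens) to build one user profile and rank movies by cosine similarity.

```python
import numpy as np

# Hypothetical toy data: 4 movies described by 3 genre flags
# (columns: Action, Comedy, Drama).
movie_profiles = np.array([
    [1, 0, 0],   # movie 0: Action
    [0, 1, 0],   # movie 1: Comedy
    [0, 1, 1],   # movie 2: Comedy + Drama
    [1, 0, 1],   # movie 3: Action + Drama
], dtype=float)

# One user's ratings mapped to +1 (liked) / -1 (disliked); NaN = not rated.
user_ratings = np.array([-1.0, 1.0, 1.0, np.nan])

rated = ~np.isnan(user_ratings)
# User profile: mean of the rated movie profiles, each weighted by its rating.
user_profile = (movie_profiles[rated] * user_ratings[rated, None]).mean(axis=0)

# Cosine similarity between the user profile and every movie profile.
norms = np.linalg.norm(movie_profiles, axis=1) * np.linalg.norm(user_profile)
scores = movie_profiles @ user_profile / norms

ranking = np.argsort(scores)[::-1]  # positions of the movies, best match first
print(user_profile)
print(ranking)
```

The disliked Action movie pushes the Action weight below zero, so the Comedy and Drama titles rank first for this toy user.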

The implementation is based on this blog post [website] and this [website2]

[website]: https://towardsdatascience.com/movie-recommendation-system-based-on-movielens-ef0df580cd0e
[website2]: https://heartbeat.fritz.ai/recommender-systems-with-python-part-i-content-based-filtering-5df4940bd831

In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
BINARY_OPTION = True
META_TFIDF = True

<b> Data loading </b>

In [None]:
column_names = ['user_id', 'item_id', 'rating', 'timestamp']
folder = "C:\\Asi\\BD\\eCommerce\\2020\\ml-100k\\"
ratings = pd.read_csv(folder+'u.data',sep='\t',names=column_names)
# u.data is tab-separated; read_csv defaults to sep=',', so pass sep explicitly
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype
---  ------     --------------   -----
 0   user_id    100000 non-null  int64
 1   item_id    100000 non-null  int64
 2   rating     100000 non-null  int64
 3   timestamp  100000 non-null  int64
dtypes: int64(4)
memory usage: 3.1 MB


In [None]:
ratings.head()

Unnamed: 0,user_id,item_id,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


In [None]:
def brating(row):
    # Binarize the rating: +1 for liked (rating >= 3), -1 for disliked
    if row['rating'] >= 3:
        val = 1
    else:
        val = -1
    return val


ratings['binary_rating'] = ratings.apply(brating, axis=1)
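
Row-wise `apply` calls the Python function once per row. A vectorized alternative is sketched below on a small hypothetical frame (`sample` is an illustrative name, not part of the notebook); it applies the same rule as `brating`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same 'rating' column as the ratings frame.
sample = pd.DataFrame({'rating': [5, 3, 2, 1, 4]})

# +1 where rating >= 3, otherwise -1 (same rule as brating).
sample['binary_rating'] = np.where(sample['rating'] >= 3, 1, -1)
print(sample)
```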
    

<b> Movie profile </b>

In [None]:
item_col = ['item_id','movie title','release date','video release date','IMDb URL','unknown','Action','Adventure','Animation',
              'Children','Comedy','Crime','Documentary','Drama','Fantasy',
              'Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western']
movie_titles = pd.read_csv(folder+"u.item",sep='|',encoding='ISO-8859-1',names=item_col)
movie_titles.head()

Unnamed: 0,item_id,movie title,release date,video release date,IMDb URL,unknown,Action,Adventure,Animation,Children,...,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,1,Toy Story (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,...,0,0,0,0,0,0,0,0,0,0
1,2,GoldenEye (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?GoldenEye%20(...,0,1,1,0,0,...,0,0,0,0,0,0,0,1,0,0
2,3,Four Rooms (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Four%20Rooms%...,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,4,Get Shorty (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Get%20Shorty%...,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5,Copycat (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Copycat%20(1995),0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


<b> Data Pre-Processing </b>

In [None]:
movie_titles['movie title']=movie_titles['movie title'].str.lower()

In [None]:
movie_titles.head()

Unnamed: 0,item_id,movie title,release date,video release date,IMDb URL,unknown,Action,Adventure,Animation,Children,...,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,1,toy story (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,...,0,0,0,0,0,0,0,0,0,0
1,2,goldeneye (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?GoldenEye%20(...,0,1,1,0,0,...,0,0,0,0,0,0,0,1,0,0
2,3,four rooms (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Four%20Rooms%...,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,4,get shorty (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Get%20Shorty%...,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5,copycat (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Copycat%20(1995),0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


<b> Text feature extraction - TFIDF </b>

In [None]:
tf = TfidfVectorizer(analyzer='word', min_df=5, stop_words='english',max_df=0.85)
tfidf_matrix = tf.fit_transform(movie_titles['movie title'])
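
To see what the `min_df` filter does, here is a minimal sketch on a hypothetical four-title corpus (the titles and the lower threshold `min_df=2` are chosen only for illustration). Terms that occur in fewer than `min_df` titles are dropped from the vocabulary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus of lower-cased titles.
titles = [
    "star wars (1977)",
    "star trek: generations (1994)",
    "star trek: first contact (1996)",
    "toy story (1995)",
]

# min_df=2 keeps only terms that occur in at least two titles.
toy_tf = TfidfVectorizer(analyzer='word', min_df=2, stop_words='english')
toy_matrix = toy_tf.fit_transform(titles)

vocab = sorted(toy_tf.vocabulary_)   # surviving terms
print(vocab)
print(toy_matrix.shape)              # (number of titles, number of surviving terms)
```

A title whose words are all filtered out, such as "toy story (1995)" here, ends up with an all-zero text vector and is then described by its genres alone.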

In [None]:
print(tf.get_feature_names())  # on scikit-learn >= 1.2, use tf.get_feature_names_out()

['1939', '1940', '1941', '1944', '1946', '1947', '1950', '1951', '1954', '1955', '1956', '1957', '1958', '1959', '1960', '1962', '1963', '1965', '1967', '1968', '1971', '1974', '1975', '1976', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', 'adventures', 'america', 'american', 'amityville', 'angels', 'away', 'bad', 'big', 'blue', 'body', 'boy', 'boys', 'bride', 'cat', 'city', 'day', 'days', 'dead', 'death', 'der', 'die', 'dog', 'escape', 'family', 'fear', 'garden', 'girl', 'girls', 'good', 'great', 'hard', 'home', 'hood', 'house', 'ii', 'iii', 'il', 'island', 'kid', 'king', 'kiss', 'la', 'land', 'le', 'les', 'life', 'line', 'little', 'lost', 'love', 'man', 'men', 'money', 'mountain', 'movie', 'mr', 'mrs', 'murder', 'new', 'night', 'old', 'paris', 'princess', 'red', 'robin', 'sea', 'secret', 'star', 'story', 'summer', 'sun', 'time', 'trek', 'wedding', 'white', 'wife', 'wild', 

In [None]:
tfidf_matrix_arr = tfidf_matrix.toarray()
type(tfidf_matrix_arr)

numpy.ndarray

In [None]:
tfidf_matrix_arr.shape

(1682, 123)

In [None]:
tfidf_matrix_arr[:5]

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.40186591, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.  

In [None]:
movie_titles.shape

(1682, 24)

In [None]:
movie_profile = movie_titles[['item_id','Action','Adventure','Animation',
              'Children','Comedy','Crime','Documentary','Drama','Fantasy',
              'Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western']].set_index('item_id')
movie_profile.sort_index(axis=0, inplace=True)

In [None]:
for i,word in enumerate(tf.get_feature_names()):
    movie_profile[word] = tfidf_matrix_arr[:,i]
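
Adding one column at a time works, but the same result can be built in a single step. The sketch below uses small hypothetical stand-ins (`genres`, `term_matrix`, `terms`) for the genre table, the dense term matrix, and the feature names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the genre table and the dense term matrix.
genres = pd.DataFrame({'Action': [1, 0], 'Drama': [0, 1]},
                      index=pd.Index([1, 2], name='item_id'))
term_matrix = np.array([[0.0, 0.7], [0.5, 0.0]])
terms = ['star', 'story']

# Wrap the term matrix in a DataFrame that shares the genre index,
# then join the two blocks column-wise in one step.
term_df = pd.DataFrame(term_matrix, index=genres.index, columns=terms)
profile = pd.concat([genres, term_df], axis=1)
print(profile)
```

This relies on the rows of the term matrix being in the same order as the genre index, which is also what the loop above assumes.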

In [None]:
movie_profile.head()

item_id,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,...,summer,sun,time,trek,wedding,white,wife,wild,woman,world
1,0,0,1,1,1,0,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1,1,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0,0,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1,0,0,0,1,0,0,1,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0,0,0,0,0,1,0,1,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


<b> Text feature extraction - Bag of words </b>

In [None]:
vectorizer = CountVectorizer(min_df=5, stop_words='english',max_df=0.85)
bow_matrix = vectorizer.fit_transform(movie_titles['movie title'])

In [None]:
print(vectorizer.get_feature_names())

['1939', '1940', '1941', '1944', '1946', '1947', '1950', '1951', '1954', '1955', '1956', '1957', '1958', '1959', '1960', '1962', '1963', '1965', '1967', '1968', '1971', '1974', '1975', '1976', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', 'adventures', 'america', 'american', 'amityville', 'angels', 'away', 'bad', 'big', 'blue', 'body', 'boy', 'boys', 'bride', 'cat', 'city', 'day', 'days', 'dead', 'death', 'der', 'die', 'dog', 'escape', 'family', 'fear', 'garden', 'girl', 'girls', 'good', 'great', 'hard', 'home', 'hood', 'house', 'ii', 'iii', 'il', 'island', 'kid', 'king', 'kiss', 'la', 'land', 'le', 'les', 'life', 'line', 'little', 'lost', 'love', 'man', 'men', 'money', 'mountain', 'movie', 'mr', 'mrs', 'murder', 'new', 'night', 'old', 'paris', 'princess', 'red', 'robin', 'sea', 'secret', 'star', 'story', 'summer', 'sun', 'time', 'trek', 'wedding', 'white', 'wife', 'wild', 

In [None]:
def get_top_n_words(corpus, n=None):
    """
    List the top n words in a vocabulary according to occurrence in a text corpus.
    """
    # Same vocabulary filters as the vectorizer above
    vec = CountVectorizer(min_df=5, stop_words='english', max_df=0.85).fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0)  # total count of each term over the corpus
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]

In [None]:
get_top_n_words(movie_titles['movie title'],n=30)

[('1996', 298),
 ('1995', 296),
 ('1994', 237),
 ('1997', 235),
 ('1993', 130),
 ('1998', 53),
 ('1992', 41),
 ('man', 33),
 ('love', 29),
 ('1990', 24),
 ('1991', 24),
 ('life', 20),
 ('dead', 15),
 ('1986', 15),
 ('1989', 15),
 ('la', 15),
 ('star', 14),
 ('day', 14),
 ('1987', 14),
 ('time', 14),
 ('1982', 13),
 ('1981', 13),
 ('night', 13),
 ('big', 12),
 ('king', 11),
 ('1988', 11),
 ('ii', 11),
 ('little', 11),
 ('paris', 11),
 ('city', 10)]

In [None]:
bow_matrix_arr = bow_matrix.toarray()
type(bow_matrix_arr)

numpy.ndarray

In [None]:
movie_profile = movie_titles[['item_id','Action','Adventure','Animation',
              'Children','Comedy','Crime','Documentary','Drama','Fantasy',
              'Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western']].set_index('item_id')
movie_profile.sort_index(axis=0, inplace=True)

In [None]:
for i,word in enumerate(vectorizer.get_feature_names()):
    movie_profile[word] = bow_matrix_arr[:,i]

In [None]:
movie_profile.head()

item_id,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,...,summer,sun,time,trek,wedding,white,wife,wild,woman,world
1,0,0,1,1,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1,0,0,0,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,1,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0


<b> User profile </b>

In [None]:
# user profile
if BINARY_OPTION:
    rating_column = 'binary_rating'
else:
    rating_column = 'rating'
user_x_movie = pd.pivot_table(ratings, values=rating_column, index=['item_id'], columns = ['user_id'])
user_x_movie.sort_index(axis=0, inplace=True)
userIDs = user_x_movie.columns
user_profile = pd.DataFrame(columns = movie_profile.columns)

In [None]:
for i in range(len(user_x_movie.columns)):
    # working_df: each movie's feature row multiplied by this user's rating (NaN for unrated movies)
    working_df = movie_profile.mul(user_x_movie.iloc[:, i], axis=0)
    # user_profile: mean over the rated movies (mean() skips the NaN rows)
    user_profile.loc[userIDs[i]] = working_df.mean(axis=0)
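
The per-user loop can also be written as one matrix product. The sketch below is an assumption-laden toy (the frames `toy_movies` and `toy_ratings` are hypothetical) that checks the two formulations agree: the rating-weighted sum of movie profiles divided by the number of movies each user rated.

```python
import numpy as np
import pandas as pd

# Hypothetical toy inputs: 3 movies x 2 features, and 3 movies x 2 users.
toy_movies = pd.DataFrame({'Action': [1, 0, 1], 'Drama': [0, 1, 1]},
                          index=[1, 2, 3])
toy_ratings = pd.DataFrame({10: [1, -1, np.nan], 20: [np.nan, 1, 1]},
                           index=[1, 2, 3])

# Loop version: per-user mean over the rated movies (NaN rows are skipped).
loop_profile = pd.DataFrame(columns=toy_movies.columns, dtype=float)
for user in toy_ratings.columns:
    loop_profile.loc[user] = toy_movies.mul(toy_ratings[user], axis=0).mean(axis=0)

# Matrix version: rating-weighted sum divided by the number of rated movies.
weighted_sum = toy_ratings.fillna(0).T.dot(toy_movies)
rated_count = toy_ratings.notna().sum(axis=0)
matrix_profile = weighted_sum.div(rated_count, axis=0)

print(matrix_profile)
```

On the full dataset the matrix form avoids 943 separate row assignments, though the loop is easier to read and fast enough at this size.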

In [None]:
if META_TFIDF:
    df = movie_profile.sum()  # per-feature total over all movies
    idf = (len(movie_titles)/df).apply(np.log)  # log of the inverse document frequency
    TFIDF = movie_profile.mul(idf.values)  # scale each feature column by its IDF
else:
    TFIDF = movie_profile.copy()
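
The effect of the IDF weighting in the `META_TFIDF` branch is easiest to see on a small case. In this hypothetical four-movie table, a feature shared by most movies gets a small weight and a rare feature gets a large one.

```python
import numpy as np
import pandas as pd

# Hypothetical binary feature table: 4 movies, 2 features.
toy_profile = pd.DataFrame({'Drama': [1, 1, 1, 0], 'Western': [0, 0, 0, 1]})

doc_freq = toy_profile.sum()                    # number of movies per feature
toy_idf = np.log(len(toy_profile) / doc_freq)   # log(N / df)
toy_weighted = toy_profile.mul(toy_idf, axis=1) # scale each column by its IDF

print(toy_idf)
print(toy_weighted)
```

Here Drama appears in 3 of 4 movies and gets weight log(4/3), while Western appears once and gets log(4), so matching on a rare feature contributes more to the similarity.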

<b> Predict </b>

In [None]:
# recommendation prediction: similarity between every user profile and every movie profile
cosine_similarity_user_item = cosine_similarity(user_profile, TFIDF)

In [None]:
cosine_similarity_user_item.shape

(943, 1682)

In [None]:
def predict_most_similar_items_per_user(user_id,num_items=10):
    result = np.argsort(cosine_similarity_user_item[user_profile.index.get_loc(user_id),:])[::-1][:num_items]
    ret_result = [movie_profile.index[i] for i in result]
    return ret_result
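
The function above ranks all movies, including ones the user has already rated. One possible refinement, shown here only as a sketch on hypothetical scores (the `exclude` mask is an assumption, not part of the notebook), is to push already-rated items to the end before taking the top n.

```python
import numpy as np

# Hypothetical similarity scores for one user over 5 movies.
scores = np.array([0.10, 0.80, 0.30, 0.95, 0.60])
# Hypothetical mask: True where the user has already rated the movie.
already_rated = np.array([False, False, False, True, False])

def top_n(scores, n, exclude=None):
    """Return positions of the n highest scores, best first."""
    s = scores.astype(float).copy()
    if exclude is not None:
        s[exclude] = -np.inf   # excluded items sort last
    return np.argsort(s)[::-1][:n]

print(top_n(scores, 3))
print(top_n(scores, 3, exclude=already_rated))
```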

In [None]:
user_profile.head()

Unnamed: 0,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,...,summer,sun,time,trek,wedding,white,wife,wild,woman,world
1,0.136029,0.036765,0.022059,-0.025735,0.1875,0.055147,0.018382,0.290441,0.007353,0.003676,...,0.0,0.0,0.0,0.018382,-0.003676,0.011029,0.0,0.0,0.003676,0.0
2,0.129032,0.048387,0.016129,0.032258,0.225806,0.112903,0.0,0.5,0.016129,0.032258,...,0.0,0.0,0.032258,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0
3,0.074074,0.074074,0.0,0.0,-0.037037,0.074074,0.018519,0.074074,0.0,0.0,...,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,-0.018519
4,0.25,0.083333,0.0,0.0,0.166667,0.166667,0.041667,0.25,0.0,0.0,...,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0
5,0.148571,0.097143,0.068571,0.028571,0.114286,0.04,0.0,0.028571,0.0,0.005714,...,0.0,0.0,-0.005714,0.011429,-0.005714,0.005714,0.0,0.0,0.0,0.0


User 2 likes Drama (0.50), Comedy (0.23), Action (0.13)

In [None]:
res = predict_most_similar_items_per_user(2)

In [None]:
res

[1295, 1012, 345, 903, 990, 268, 246, 875, 1300, 1315]

In [None]:
[movie_titles.loc[movie_titles['item_id'] == x, 'movie title'] for x in res]

[1294    kicked in the head (1997)
 Name: movie title, dtype: object,
 1011    private parts (1997)
 Name: movie title, dtype: object,
 344    deconstructing harry (1997)
 Name: movie title, dtype: object,
 902    afterglow (1997)
 Name: movie title, dtype: object,
 989    anna karenina (1997)
 Name: movie title, dtype: object,
 267    chasing amy (1997)
 Name: movie title, dtype: object,
 245    chasing amy (1997)
 Name: movie title, dtype: object,
 874    she's so lovely (1997)
 Name: movie title, dtype: object,
 1299    'til there was you (1997)
 Name: movie title, dtype: object,
 1314    inventing the abbotts (1997)
 Name: movie title, dtype: object]
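
The list comprehension returns one single-row Series per id, which makes the output noisy. A more compact lookup is sketched below on a hypothetical three-row table (`toy_titles` is illustrative): index the titles by `item_id` once, then select the ranked ids in order.

```python
import pandas as pd

# Hypothetical slice of the movie table.
toy_titles = pd.DataFrame({
    'item_id': [1, 2, 3],
    'movie title': ['toy story (1995)', 'goldeneye (1995)', 'four rooms (1995)'],
})

# Index titles by item_id once, then look ids up in ranked order.
title_by_id = toy_titles.set_index('item_id')['movie title']
ranked_ids = [3, 1]
ranked_titles = title_by_id.loc[ranked_ids].tolist()
print(ranked_titles)
```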

<b> Similarity </b>

In [None]:
user_profile.head()

Unnamed: 0,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,...,summer,sun,time,trek,wedding,white,wife,wild,woman,world
1,0.136029,0.036765,0.022059,-0.025735,0.1875,0.055147,0.018382,0.290441,0.007353,0.003676,...,0.0,0.0,0.0,0.018382,-0.003676,0.011029,0.0,0.0,0.003676,0.0
2,0.129032,0.048387,0.016129,0.032258,0.225806,0.112903,0.0,0.5,0.016129,0.032258,...,0.0,0.0,0.032258,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0
3,0.074074,0.074074,0.0,0.0,-0.037037,0.074074,0.018519,0.074074,0.0,0.0,...,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,-0.018519
4,0.25,0.083333,0.0,0.0,0.166667,0.166667,0.041667,0.25,0.0,0.0,...,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0
5,0.148571,0.097143,0.068571,0.028571,0.114286,0.04,0.0,0.028571,0.0,0.005714,...,0.0,0.0,-0.005714,0.011429,-0.005714,0.005714,0.0,0.0,0.0,0.0


In [None]:
movie_profile.head()

item_id,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,...,summer,sun,time,trek,wedding,white,wife,wild,woman,world
1,0,0,1,1,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1,0,0,0,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,1,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
user_similarity = cosine_similarity(user_profile)
movie_similarity = cosine_similarity(movie_profile)

In [None]:
user_similarity.shape

(943, 943)

In [None]:
movie_similarity.shape

(1682, 1682)

In [None]:
def get_similar_users(user_id,num_users=10):
    result = np.argsort(user_similarity[:,user_profile.index.get_loc(user_id)])[::-1][:num_users]
    ret_result = [user_profile.index[i] for i in result]
    return ret_result
    

In [None]:
def get_similar_movies(movie_id=3,num_movies=10):
    result = np.argsort(movie_similarity[:,movie_profile.index.get_loc(movie_id)])[::-1][:num_movies]
    ret_result = [movie_profile.index[i] for i in result]
    return ret_result
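
Because every item has similarity 1 with itself, both helper functions return the query user or movie among their top results. A minimal sketch of dropping the query from the ranking, on hypothetical feature rows (`similar_without_self` is an illustrative helper, not part of the notebook):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical feature rows for 4 movies.
features = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])
sim = cosine_similarity(features)

def similar_without_self(sim, pos, n):
    """Rank rows by similarity to row `pos`, leaving `pos` itself out."""
    order = np.argsort(sim[:, pos])[::-1]
    return [int(i) for i in order if i != pos][:n]

print(similar_without_self(sim, 0, 2))
```

The function works on row positions, so with the notebook's frames the positions would still need to be mapped back through the profile index.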

In [None]:
res = get_similar_users(1)

In [None]:
res

[1, 916, 92, 682, 339, 59, 645, 457, 514, 429]

In [None]:
movie_titles[movie_titles['movie title'].str.contains("star wars")]

Unnamed: 0,item_id,movie title,release date,video release date,IMDb URL,unknown,Action,Adventure,Animation,Children,...,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
49,50,star wars (1977),01-Jan-1977,,http://us.imdb.com/M/title-exact?Star%20Wars%2...,0,1,1,0,0,...,0,0,0,0,0,1,1,0,1,0


In [None]:
res_movie = get_similar_movies(50)

In [None]:
res_movie

[50, 181, 172, 271, 498, 380, 222, 227, 228, 450]

In [None]:
similar_titles = [movie_titles[movie_titles['item_id'] == x]['movie title'] for x in res_movie]

In [None]:
similar_titles

[49    star wars (1977)
 Name: movie title, dtype: object,
 180    return of the jedi (1983)
 Name: movie title, dtype: object,
 171    empire strikes back, the (1980)
 Name: movie title, dtype: object,
 270    starship troopers (1997)
 Name: movie title, dtype: object,
 497    african queen, the (1951)
 Name: movie title, dtype: object,
 379    star trek: generations (1994)
 Name: movie title, dtype: object,
 221    star trek: first contact (1996)
 Name: movie title, dtype: object,
 226    star trek vi: the undiscovered country (1991)
 Name: movie title, dtype: object,
 227    star trek: the wrath of khan (1982)
 Name: movie title, dtype: object,
 449    star trek v: the final frontier (1989)
 Name: movie title, dtype: object]

In [None]:
movie_titles[movie_titles['movie title'].str.contains("star trek")]

Unnamed: 0,item_id,movie title,release date,video release date,IMDb URL,unknown,Action,Adventure,Animation,Children,...,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
221,222,star trek: first contact (1996),22-Nov-1996,,http://us.imdb.com/M/title-exact?Star%20Trek:%...,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0
226,227,star trek vi: the undiscovered country (1991),01-Jan-1991,,http://us.imdb.com/M/title-exact?Star%20Trek%2...,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0
227,228,star trek: the wrath of khan (1982),01-Jan-1982,,http://us.imdb.com/M/title-exact?Star%20Trek:%...,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0
228,229,star trek iii: the search for spock (1984),01-Jan-1984,,http://us.imdb.com/M/title-exact?Star%20Trek%2...,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0
229,230,star trek iv: the voyage home (1986),01-Jan-1986,,http://us.imdb.com/M/title-exact?Star%20Trek%2...,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0
379,380,star trek: generations (1994),01-Jan-1994,,http://us.imdb.com/M/title-exact?Star%20Trek:%...,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0
448,449,star trek: the motion picture (1979),01-Jan-1979,,http://us.imdb.com/M/title-exact?Star%20Trek:%...,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0
449,450,star trek v: the final frontier (1989),01-Jan-1989,,http://us.imdb.com/M/title-exact?Star%20Trek%2...,0,1,1,0,0,...,0,0,0,0,0,0,1,0,0,0
