In [1]:
import pandas as pd
import numpy as np

# MovieLens dataset

Files:
```
u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.
              Each user has rated at least 20 movies.  Users and items are
              numbered consecutively from 1.  The data is randomly
              ordered. This is a tab separated list of
              user id | item id | rating | timestamp.
              The time stamps are unix seconds since 1/1/1970 UTC

u.info     -- The number of users, items, and ratings in the u data set.

u.item     -- Information about the items (movies); this is a tab separated
              list of
              movie id | movie title | release date | video release date |
              IMDb URL | unknown | Action | Adventure | Animation |
              Children's | Comedy | Crime | Documentary | Drama | Fantasy |
              Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
              Thriller | War | Western |
              The last 19 fields are the genres, a 1 indicates the movie
              is of that genre, a 0 indicates it is not; movies can be in
              several genres at once.
              The movie ids are the ones used in the u.data data set.

u.genre    -- A list of the genres.

u.user     -- Demographic information about the users; this is a tab
              separated list of
              user id | age | gender | occupation | zip code
              The user ids are the ones used in the u.data data set.

u.occupation -- A list of the occupations.

u1.base    -- The data sets u1.base and u1.test through u5.base and u5.test
u1.test       are 80%/20% splits of the u data into training and test data.
u2.base       Each of u1, ..., u5 have disjoint test sets; this is for
u2.test       5 fold cross validation (where you repeat your experiment
u3.base       with each training and test set and average the results).
u3.test       These data sets can be generated from u.data by mku.sh.
u4.base
u4.test
u5.base
u5.test

ua.base    -- The data sets ua.base, ua.test, ub.base, and ub.test
ua.test       split the u data into a training set and a test set with
ub.base       exactly 10 ratings per user in the test set.  The sets
ub.test       ua.test and ub.test are disjoint.  These data sets can
              be generated from u.data by mku.sh.
```

[MovieLens](https://movielens.org) is a personalized movie recommendation system.
Several datasets have been built from its database, the smallest being MovieLens 100k.
It contains 100,000 ratings from 943 users on 1,682 movies.
Various information is available about the users (e.g., age, gender, occupation, zip code) and the movies (e.g., release date, genre).
Additional features can be retrieved from external sources, since the movie titles are available.
Two graphs can be built from this dataset, one over users and one over movies, and they can be connected through the ratings.

The main purpose of this data is to build a recommender system, which can be formulated as a semi-supervised learning problem: given a user, can you predict the rating that they will give to a new movie?
Graph neural networks can be used for this purpose, but other graph-based approaches can be explored as well.
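
To make the prediction target concrete, the ratings can be arranged as a user-by-movie matrix in which most entries are missing. The snippet below is a minimal sketch on made-up toy ratings (the values and the names `toy_ratings`, `rating_matrix`, `observed`, and `density` are assumptions for illustration); the column names follow the ones used for `u.data` in this notebook.

```python
import pandas as pd

# Toy ratings in the same (user_id, movie_id_ml, rating) layout as u.data.
# The values are made up for illustration.
toy_ratings = pd.DataFrame({
    "user_id":     [1, 1, 2, 3, 3],
    "movie_id_ml": [10, 20, 10, 20, 30],
    "rating":      [5, 3, 4, 2, 1],
})

# Users as rows, movies as columns; unobserved (user, movie) pairs stay NaN.
rating_matrix = toy_ratings.pivot(index="user_id", columns="movie_id_ml", values="rating")

# Observed entries are the known labels; the NaN cells are what a recommender has to predict.
observed = rating_matrix.notna()
density = observed.to_numpy().mean()

print(rating_matrix)
print(f"fraction of observed entries: {density:.2f}")
```

On the full dataset the same matrix would have 943 rows and 1,682 columns with 100,000 observed entries, i.e. roughly 6% of the cells are filled.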

| Users graph | Description                       |                         Amount |
| ----------- | --------------------------------- | -----------------------------: |
| nodes       | users                             |                            943 |
| edges       | similar features                  | depends how the graph is built |
| features    | age, gender, occupation, zip code |                              4 |
| labels      | ratings of the movies             |                           100k |

| Movies graph | Description               |                         Amount |
| ------------ | ------------------------- | -----------------------------: |
| nodes        | movies                    |                           1682 |
| edges        | similar features          | depends how the graph is built |
| features     | name, release date, genre |                  2 + 19 genres |
| labels       | ratings given by users    |                           100k |

* **Data acquisition**: already collected and packaged
* **Requires down-sampling**: no
* **Network creation**: needs to be built from features
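
Since the network has to be built from features, one possible construction is a k-nearest-neighbour graph over the user features. The snippet below is only a sketch under stated assumptions: the toy users, the choice of `k`, the standardization of age, the one-hot encoding, and the omission of the zip code are all illustrative choices, not something prescribed by the dataset.

```python
import numpy as np
import pandas as pd

# Toy users in the layout of u.user (values are illustrative).
toy_users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "user_age": [24, 53, 23, 25],
    "user_gender": ["M", "F", "M", "M"],
    "user_occupation": ["technician", "other", "writer", "technician"],
})

# Numeric feature matrix: standardized age plus one-hot encoded categorical columns.
age = (toy_users["user_age"] - toy_users["user_age"].mean()) / toy_users["user_age"].std()
onehot = pd.get_dummies(toy_users[["user_gender", "user_occupation"]], dtype=float)
features = np.column_stack([age.to_numpy(), onehot.to_numpy()])

# Pairwise squared Euclidean distances between users.
diff = features[:, None, :] - features[None, :, :]
dist = (diff ** 2).sum(axis=-1)
np.fill_diagonal(dist, np.inf)  # a user is never its own neighbour

# Connect each user to its k nearest neighbours, then symmetrize the graph.
k = 1  # assumption: chosen small because the toy example has only four users
neighbours = np.argsort(dist, axis=1)[:, :k]
adjacency = np.zeros(dist.shape, dtype=bool)
rows = np.repeat(np.arange(len(toy_users)), k)
adjacency[rows, neighbours.ravel()] = True
adjacency = adjacency | adjacency.T

print(adjacency.astype(int))
```

The dense pairwise distance array is fine for 943 users, but it grows quadratically, so a larger node set would call for a sparse or approximate neighbour search instead.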

Resources:
* [Data](https://grouplens.org/datasets/movielens/)
* Papers using graph neural networks:
  * [Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks](https://arxiv.org/abs/1704.06803)
  * [Graph Convolutional Matrix Completion](https://arxiv.org/abs/1706.02263)

In [2]:
column_ratings = ["user_id", "movie_id_ml", "rating", "rating_timestamp"]

df_ratings = pd.read_csv('data/u.data', delimiter='\t', names=column_ratings) 
df_ratings["rating_timestamp"] = pd.to_datetime(df_ratings["rating_timestamp"], unit="s")
df_ratings.head()

Unnamed: 0,user_id,movie_id_ml,rating,rating_timestamp
0,196,242,3,1997-12-04 15:55:49
1,186,302,3,1998-04-04 19:22:22
2,22,377,1,1997-11-07 07:18:36
3,244,51,2,1997-11-27 05:02:03
4,166,346,1,1998-02-02 05:33:16


In [3]:
column_item = ["movie_id_ml", "title", "release", "vrelease", "url", "unknown",
               "action", "adventure", "animation", "childrens", "comedy",
               "crime", "documentary", "drama", "fantasy", "noir", "horror",
               "musical", "mystery", "romance", "scifi", "thriller",
               "war", "western"]

# Note: the README describes u.item as tab separated, but the file is read here as pipe-delimited.
df_movie = pd.read_csv('data/u.item', delimiter='|', names=column_item, encoding="ISO-8859-1")
df_movie = df_movie.drop(columns=["vrelease"])
# Keep only the last "-"-separated token of the release date, i.e. the year (a missing date becomes the string "nan").
df_movie["release"] = df_movie["release"].apply(lambda x: str(x).split("-")[-1])

print(df_movie.dtypes)
df_movie.head()

movie_id_ml     int64
title          object
release        object
url            object
unknown         int64
action          int64
adventure       int64
animation       int64
childrens       int64
comedy          int64
crime           int64
documentary     int64
drama           int64
fantasy         int64
noir            int64
horror          int64
musical         int64
mystery         int64
romance         int64
scifi           int64
thriller        int64
war             int64
western         int64
dtype: object


Unnamed: 0,movie_id_ml,title,release,url,unknown,action,adventure,animation,childrens,comedy,...,fantasy,noir,horror,musical,mystery,romance,scifi,thriller,war,western
0,1,Toy Story (1995),1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,...,0,0,0,0,0,0,0,0,0,0
1,2,GoldenEye (1995),1995,http://us.imdb.com/M/title-exact?GoldenEye%20(...,0,1,1,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2,3,Four Rooms (1995),1995,http://us.imdb.com/M/title-exact?Four%20Rooms%...,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,4,Get Shorty (1995),1995,http://us.imdb.com/M/title-exact?Get%20Shorty%...,0,1,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
4,5,Copycat (1995),1995,http://us.imdb.com/M/title-exact?Copycat%20(1995),0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
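
The year extraction above works on strings, so a missing release date ends up as the literal string `"nan"`. A possible alternative, sketched below, parses the date and keeps missing values as proper missing values. It assumes the raw release field looks like `01-Jan-1995`; the toy series and the names `raw_release`, `as_string`, and `as_year` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy release dates in the assumed raw format, with one missing value.
raw_release = pd.Series(["01-Jan-1995", "22-Mar-1996", np.nan])

# String-based extraction as used in the cell above: the missing value becomes "nan".
as_string = raw_release.apply(lambda x: str(x).split("-")[-1])

# Date-based extraction: unparseable or missing values become <NA> in a nullable integer column.
as_year = (
    pd.to_datetime(raw_release, format="%d-%b-%Y", errors="coerce")
    .dt.year
    .astype("Int64")
)

print(as_string.tolist())
print(as_year.tolist())
```

If the year is later used as a merge key, both sides of the merge need the same dtype, so switching to integers here would require the same change on the other table.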


In [4]:
column_user = ["user_id", "user_age", "user_gender", "user_occupation", "user_zipcode"]

df_users = pd.read_csv('data/u.user', delimiter='|', names=column_user, encoding = "ISO-8859-1") 
df_users.head()

Unnamed: 0,user_id,user_age,user_gender,user_occupation,user_zipcode
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213


In [5]:
df_ml1 = pd.merge(df_movie, df_ratings, on="movie_id_ml")
df_ml_full = pd.merge(df_ml1, df_users, on="user_id")

# Drop the trailing "(year)" token so that titles can be matched against IMDb titles later on.
df_ml_full["title"] = df_ml_full["title"].apply(
    lambda x: " ".join(x.split(" ")[:-1]).strip() if x.split(" ")[-1].startswith("(") else x
)
print(df_ml_full.shape)
df_ml_full.head()

(100000, 30)


Unnamed: 0,movie_id_ml,title,release,url,unknown,action,adventure,animation,childrens,comedy,...,thriller,war,western,user_id,rating,rating_timestamp,user_age,user_gender,user_occupation,user_zipcode
0,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,...,0,0,0,308,4,1998-02-17 17:28:52,60,M,retired,95076
1,4,Get Shorty,1995,http://us.imdb.com/M/title-exact?Get%20Shorty%...,0,1,0,0,0,1,...,0,0,0,308,5,1998-02-17 17:51:30,60,M,retired,95076
2,5,Copycat,1995,http://us.imdb.com/M/title-exact?Copycat%20(1995),0,0,0,0,0,0,...,1,0,0,308,4,1998-02-17 18:20:08,60,M,retired,95076
3,7,Twelve Monkeys,1995,http://us.imdb.com/M/title-exact?Twelve%20Monk...,0,0,0,0,0,0,...,0,0,0,308,4,1998-02-17 18:07:27,60,M,retired,95076
4,8,Babe,1995,http://us.imdb.com/M/title-exact?Babe%20(1995),0,0,0,0,1,1,...,0,0,0,308,5,1998-02-17 17:31:36,60,M,retired,95076


# IMDb dataset

Files: 
```
cast_info.csv
name.csv
title.csv
movie_keyword.csv
movie_companies.csv
company_name.csv
keyword.csv
role_type.csv
```

The IMDb datasets contain information such as crew, rating, and genre for every entertainment product in the IMDb database. The Kaggle dataset listed under Resources below is a much smaller but similar dataset and can be used in place of the full IMDb one. The goal of this project is to analyze this database in graph form and to attempt to recover missing information from data on cast/crew co-appearance in movies. The graph analysis requires building a network first, and there are two ways to do so, depending on which entities are taken as the nodes.

The first approach is to construct a social network of cast/crew members in which the edges are weighted by co-appearance. For example, actor_1 becomes strongly connected to actor_2 if they have appeared in many movies together. The edge weight could be the number of entertainment products in which the two people both participated. As a signal on this graph, we can take the aggregate rating of the movies each person has participated in.

|          | Description                            |                          Amount |
| -------- | -------------------------------------- | ------------------------------: |
| nodes    | cast/crew                              | millions (IMDb), ~8500 (Kaggle) |
| edges    | co-appearance in movies/TV/etc.        |                  O(10) per node |
| features | ratings of movies taken part in        |                  O(10) per node |
| labels   | movie genre                            |      unknown (IMDb), 3 (Kaggle) |
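
The co-appearance weights of this first approach can be computed by joining the cast table with itself on the movie. The snippet below is a minimal sketch on a made-up toy table (`toy_cast`, `pairs`, and `edge_weights` are illustrative names); the columns `person_id` and `movie_id` follow the ones used for `cast_info.csv` in this notebook.

```python
import pandas as pd

# Toy cast table: one row per (person, movie) credit. Values are made up.
toy_cast = pd.DataFrame({
    "person_id": [1, 2, 3, 1, 2, 1, 3],
    "movie_id":  [100, 100, 100, 200, 200, 300, 300],
})

# A person may be credited several times on the same movie; count each pair once per movie.
credits = toy_cast[["person_id", "movie_id"]].drop_duplicates()

# Self-join on movie_id yields every ordered pair of people sharing a movie.
pairs = credits.merge(credits, on="movie_id", suffixes=("_a", "_b"))

# Keep each unordered pair once and drop self-pairs.
pairs = pairs[pairs["person_id_a"] < pairs["person_id_b"]]

# Edge weight = number of movies in which both people appear.
edge_weights = (
    pairs.groupby(["person_id_a", "person_id_b"])
    .size()
    .rename("n_shared")
    .reset_index()
)

print(edge_weights)
```

The self-join produces a number of rows that is quadratic in the cast size of each movie, so on the full cast table it is only practical after restricting to a subset of movies or roles.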

A second approach is to create a movie network in which movies are strongly connected if they share many crew/cast members (or according to some other similarity measure combining this with genres, running times, release years, etc.). There are more options for the signal the students could consider on this graph, as they could use either the movie ratings or the genre labels.

|          | Description                                           |                          Amount |
| -------- | ----------------------------------------------------- | ------------------------------: |
| nodes    | movies                                                | millions (IMDb), ~5000 (Kaggle) |
| edges    | count of common cast/crew + other feature similarity. |                  O(10) per node |
| features | average rating                                        |                               1 |
| labels   | movie genre                                           |      unknown (IMDb), 3 (Kaggle) |
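
For this second approach, the number of cast/crew members shared by two movies can be read off the product of a movie-by-person incidence matrix with its transpose. The snippet below sketches this on a made-up toy table; it uses a dense matrix for readability, which is an assumption that only holds for small subsets, and the names `incidence`, `shared`, and `movie_adjacency` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy cast table: one row per (person, movie) credit. Values are made up.
toy_cast = pd.DataFrame({
    "person_id": [1, 2, 3, 1, 2, 1, 3, 4],
    "movie_id":  [100, 100, 100, 200, 200, 300, 300, 400],
})

# Binary incidence matrix B: rows are movies, columns are people.
incidence = pd.crosstab(toy_cast["movie_id"], toy_cast["person_id"]).clip(upper=1)

# (B @ B.T)[i, j] counts the people credited on both movie i and movie j.
shared = incidence.to_numpy() @ incidence.to_numpy().T
np.fill_diagonal(shared, 0)  # no self-loops

movie_adjacency = pd.DataFrame(shared, index=incidence.index, columns=incidence.index)
print(movie_adjacency)
```

Other similarity terms (genres, release years, and so on) could be added to these counts, with weights that would have to be chosen by the analyst.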

There is plenty of additional information available for extra work. For instance, the students could try to predict movie revenue, potentially using extra metadata. Note, however, that the original dataset has on the order of **millions** of instances, so a smaller subset should be used.

* **Data acquisition**: already collected and packaged
* **Requires down-sampling**: yes if using the original datasets from IMDb, no if using the subsampled dataset from Kaggle
* **Network creation**: needs to be built from features

Resources:
* <https://www.imdb.com/interfaces>
* <https://www.kaggle.com/tmdb/tmdb-movie-metadata/home>

In [6]:
column_cast = ["cast_id", "person_id", "movie_id", "person_role_id", "note", "nr_order", "role_id"]

df_cast = pd.read_csv('data/cast_info.csv', delimiter=',', names=column_cast, encoding = "ISO-8859-1", low_memory=False) 
df_cast['role_id'] = pd.to_numeric(df_cast['role_id'], errors='coerce')
print(df_cast.dtypes)
print(df_cast.shape)
df_cast = df_cast.drop(columns=["note", "nr_order"])
df_cast.head()

cast_id             int64
person_id           int64
movie_id            int64
person_role_id    float64
note               object
nr_order           object
role_id           float64
dtype: object
(36243322, 7)


Unnamed: 0,cast_id,person_id,movie_id,person_role_id,role_id
0,1,1,968504,1.0,1.0
1,2,2,2163857,1.0,1.0
2,3,2,2324528,2.0,1.0
3,4,3,1851347,,1.0
4,5,4,1681365,3.0,1.0


In [7]:
column_people = ["person_id", "name", "imdb_idx", "imdb_id", "gender", "name_cf", "name_nf", "surname", "md5"]

df_people = pd.read_csv('data/name.csv', delimiter=',', names=column_people, encoding = "ISO-8859-1", low_memory=False) 
print(df_people.dtypes)
print(df_people.shape)
df_people = df_people.drop(columns=["imdb_idx", "imdb_id", "md5", "name_cf", "name_nf", "surname"])
df_people.head()

person_id      int64
name          object
imdb_idx      object
imdb_id      float64
gender        object
name_cf       object
name_nf       object
surname       object
md5           object
dtype: object
(4167491, 9)


Unnamed: 0,person_id,name,gender
0,3343,"Abela, Mike",m
1,446,"A., David",m
2,126,"-Alverio, Esteban Rodriguez",m
3,1678,"Abbas, Athar",m
4,3610,"Aberer, Leo",m


In [8]:
columns_movies = ["movie_id", "title", "imdb_idx",
                  "movie_kind", "release", "imdb_id", "phonetic", "episode_id",
                  "season", "episode", "series_years", "md5"]

df_movies = pd.read_csv('data/title.csv', delimiter=',', names=columns_movies, encoding = "ISO-8859-1", low_memory=False) 
df_movies = df_movies.dropna(subset=['release'])
df_movies["release"] = df_movies["release"].apply(lambda x : str(int(x)).split("-")[-1])

print(df_movies.dtypes)
print(df_movies.shape)
df_movies = df_movies.drop(columns=["imdb_idx", "imdb_id", "phonetic", "md5", "episode_id", "episode", "movie_kind", "season", "series_years"])
df_movies.head(5)

movie_id          int64
title            object
imdb_idx         object
movie_kind       object
release          object
imdb_id         float64
phonetic         object
episode_id       object
season           object
episode         float64
series_years     object
md5              object
dtype: object
(2455890, 12)


Unnamed: 0,movie_id,title,release
0,80889,(#1.66),1980
1,5156,Josie Duggar's 1st Shoes,2010
2,197772,(#2.8),1962
3,111913,(2012-09-13),2012
5,40704,Anniversary,1971


In [9]:
column_movie_keyword = ["mkid", "movie_id", "keyword_id"]

df_movie_keyword = pd.read_csv('data/movie_keyword.csv', delimiter=',', names=column_movie_keyword, encoding = "ISO-8859-1") 
print(df_movie_keyword.dtypes)
print(df_movie_keyword.shape)
df_movie_keyword = df_movie_keyword.drop(columns=["mkid"])
df_movie_keyword.head()

mkid          int64
movie_id      int64
keyword_id    int64
dtype: object
(4523930, 3)


Unnamed: 0,movie_id,keyword_id
0,2,1
1,11,2
2,22,2
3,44,3
4,24,2


In [10]:
column_movie_companies = ["mcid", "movie_id", "company_id", "ctype", "note"]

df_movie_companies = pd.read_csv('data/movie_companies.csv', delimiter=',', names=column_movie_companies, encoding = "ISO-8859-1") 
print(df_movie_companies.dtypes)
print(df_movie_companies.shape)
df_movie_companies = df_movie_companies.drop(columns=["mcid", "ctype"])

df_movie_companies.head()

mcid           int64
movie_id       int64
company_id     int64
ctype          int64
note          object
dtype: object
(2609129, 5)


Unnamed: 0,movie_id,company_id,note
0,2,1,(2006) (USA) (TV)
1,2,1,(2006) (worldwide) (TV)
2,11,2,(2012) (worldwide) (all media)
3,44,3,(2013) (USA) (all media)
4,50,4,(2011) (UK) (TV)


In [11]:
column_companies = ["company_id", "name", "country", "imdb_id", "name_nf", "name_sf", "md5"]

df_companies = pd.read_csv('data/company_name.csv', delimiter=',', names=column_companies, encoding = "ISO-8859-1", low_memory=False) 
print(df_companies.dtypes)
print(df_companies.shape)
df_companies = df_companies.drop(columns=["imdb_id", "name_nf", "name_sf", "md5"])
df_companies.head()

company_id     int64
name          object
country       object
imdb_id       object
name_nf       object
name_sf       object
md5           object
dtype: object
(234997, 7)


Unnamed: 0,company_id,name,country
0,34634,Comfilm.de,[de]
1,63635,Dusty Nose Productions,[us]
2,35051,WTTW National Productions,[us]
3,23380,Film House,[us]
4,31373,AVI Group,[us]


In [12]:
column_keyword = ["keyword_id", "keyword", "phonetic"]

df_keyword = pd.read_csv('data/keyword.csv', delimiter=',', names=column_keyword, encoding = "ISO-8859-1") 
print(df_keyword.dtypes)
print(df_keyword.shape)
df_keyword = df_keyword.drop(columns=["phonetic"])
df_keyword.head()

keyword_id     int64
keyword       object
phonetic      object
dtype: object
(134170, 3)


Unnamed: 0,keyword_id,keyword
0,2068,handcuffed-to-a-bed
1,157,jane-austen
2,8309,narcotic
3,1059,woods
4,3991,hanging


In [13]:
columns_roles = ["role_id", "role"]

df_roles = pd.read_csv('data/role_type.csv', delimiter=',', names=columns_roles, encoding = "ISO-8859-1") 
print(df_roles.dtypes)
print(df_roles.shape)
df_roles.head()

role_id     int64
role       object
dtype: object
(12, 2)


Unnamed: 0,role_id,role
0,1,actor
1,2,actress
2,3,producer
3,4,writer
4,5,cinematographer


# Merge IMDb and MovieLens

In [22]:
df = pd.merge(df_ml_full, df_movies, on=["title", "release"])

In [23]:
pd.options.display.max_columns = 100
print(df.shape)
df.head()

(79538, 31)


Unnamed: 0,movie_id_ml,title,release,url,unknown,action,adventure,animation,childrens,comedy,crime,documentary,drama,fantasy,noir,horror,musical,mystery,romance,scifi,thriller,war,western,user_id,rating,rating_timestamp,user_age,user_gender,user_occupation,user_zipcode,movie_id
0,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,308,4,1998-02-17 17:28:52,60,M,retired,95076,2445635
1,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,287,5,1997-09-27 04:21:28,21,M,salesman,31211,2445635
2,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,148,4,1997-10-16 16:30:11,33,M,engineer,97006,2445635
3,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,280,4,1998-04-04 14:33:46,30,F,librarian,22903,2445635
4,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,66,3,1997-12-31 20:48:44,23,M,student,80521,2445635
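
The merged frame has 79,538 rows, compared with 100,000 ratings before the merge. An inner merge on `title` and `release` silently drops ratings whose movie has no IMDb match, and it duplicates ratings when several IMDb entries share the same key. The snippet below is a sketch of how both effects could be inspected; the toy frames and their values are made up, while `how` and `indicator` are standard `merge` parameters.

```python
import pandas as pd

# Toy stand-ins for the MovieLens side (left) and the IMDb side (right). Values are made up.
left = pd.DataFrame({
    "title":   ["Toy Story", "Heat", "Obscure Film"],
    "release": ["1995", "1995", "1994"],
    "rating":  [4, 5, 3],
})
right = pd.DataFrame({
    "title":    ["Toy Story", "Heat", "Heat"],
    "release":  ["1995", "1995", "1995"],
    "movie_id": [11, 22, 33],
})

# A left merge with indicator=True keeps unmatched rows and labels where each row came from.
checked = left.merge(right, on=["title", "release"], how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

# Keys that occur more than once on the right side duplicate the matching left rows.
key_counts = right.groupby(["title", "release"]).size()
ambiguous_keys = key_counts[key_counts > 1]

print(unmatched[["title", "release"]])
print(ambiguous_keys)
```

Applied to the real frames, the same check would show which titles fail to match and which (title, year) keys are ambiguous.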


In [24]:
movie_ids = list(df.movie_id.unique())

In [25]:
df1 = pd.merge(df, df_movie_keyword[df_movie_keyword.movie_id.isin(movie_ids)], on="movie_id")
print(df1.shape)
df1.head()

(7101818, 32)


Unnamed: 0,movie_id_ml,title,release,url,unknown,action,adventure,animation,childrens,comedy,crime,documentary,drama,fantasy,noir,horror,musical,mystery,romance,scifi,thriller,war,western,user_id,rating,rating_timestamp,user_age,user_gender,user_occupation,user_zipcode,movie_id,keyword_id
0,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,308,4,1998-02-17 17:28:52,60,M,retired,95076,2445635,834
1,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,308,4,1998-02-17 17:28:52,60,M,retired,95076,2445635,2956
2,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,308,4,1998-02-17 17:28:52,60,M,retired,95076,2445635,66752
3,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,308,4,1998-02-17 17:28:52,60,M,retired,95076,2445635,93318
4,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,308,4,1998-02-17 17:28:52,60,M,retired,95076,2445635,73783
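
Merging the keywords in long form repeats every rating once per keyword of its movie, which is why the frame grows from 79,538 to 7,101,818 rows (roughly 89 times as many). If one row per rating is preferred, a possible alternative is to collect the keywords of each movie into a list before merging. This is a sketch on made-up toy frames, not the approach taken in this notebook; all names below are illustrative.

```python
import pandas as pd

# Toy ratings and movie-keyword links. Values are made up.
toy_ratings = pd.DataFrame({
    "movie_id": [1, 1, 2],
    "user_id":  [10, 11, 10],
    "rating":   [4, 5, 3],
})
toy_movie_keyword = pd.DataFrame({
    "movie_id":   [1, 1, 1, 2],
    "keyword_id": [7, 8, 9, 7],
})

# Long form, as in the cell above: one row per (rating, keyword) combination.
long_form = toy_ratings.merge(toy_movie_keyword, on="movie_id")

# Compact form: one list of keyword ids per movie, merged back as a single column.
keywords_per_movie = (
    toy_movie_keyword.groupby("movie_id")["keyword_id"]
    .agg(list)
    .rename("keyword_ids")
    .reset_index()
)
compact = toy_ratings.merge(keywords_per_movie, on="movie_id", how="left")

print(len(long_form), len(compact))
print(compact)
```

The trade-off is that list-valued columns do not round-trip cleanly through CSV, so the long form may still be the more convenient on-disk layout.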


In [27]:
# df2 = pd.merge(df1, df_movie_companies[df_movie_companies.movie_id.isin(movie_ids)], on="movie_id")
# print(df2.shape)
# df2.head()

In [30]:
# df4 = pd.merge(df3, df_companies, on="company_id")  #.drop(columns=["ctype", "note", "company_id", "episode_id", "season", "episode"])

In [33]:
df_all = pd.merge(df1, df_keyword, on="keyword_id")  #.drop(columns=["keyword_id", "mkid", "mcid"])
df_all = df_all.drop(columns=["keyword_id", "movie_id_ml"])
print(df_all.shape)
df_all.head()

(7101818, 31)


Unnamed: 0,title,release,url,unknown,action,adventure,animation,childrens,comedy,crime,documentary,drama,fantasy,noir,horror,musical,mystery,romance,scifi,thriller,war,western,user_id,rating,rating_timestamp,user_age,user_gender,user_occupation,user_zipcode,movie_id,keyword
0,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,308,4,1998-02-17 17:28:52,60,M,retired,95076,2445635,1990s
1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,287,5,1997-09-27 04:21:28,21,M,salesman,31211,2445635,1990s
2,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,148,4,1997-10-16 16:30:11,33,M,engineer,97006,2445635,1990s
3,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,280,4,1998-04-04 14:33:46,30,F,librarian,22903,2445635,1990s
4,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,66,3,1997-12-31 20:48:44,23,M,student,80521,2445635,1990s


In [37]:
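# The DataFrame index is written as well; it shows up as an "Unnamed: 0" column when the file is read back.
df_all.to_csv("data/movielens_and_imdb.csv", sep=',')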

# Read check

In [2]:

df = pd.read_csv('data/movielens_and_imdb.csv', delimiter=',', encoding = "ISO-8859-1") 


In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,title,release,url,unknown,action,adventure,animation,childrens,comedy,...,western,user_id,rating,rating_timestamp,user_age,user_gender,user_occupation,user_zipcode,movie_id,keyword
0,0,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,...,0,308,4,1998-02-17 17:28:52,60,M,retired,95076,2445635,1990s
1,1,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,...,0,287,5,1997-09-27 04:21:28,21,M,salesman,31211,2445635,1990s
2,2,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,...,0,148,4,1997-10-16 16:30:11,33,M,engineer,97006,2445635,1990s
3,3,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,...,0,280,4,1998-04-04 14:33:46,30,F,librarian,22903,2445635,1990s
4,4,Toy Story,1995,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,1,...,0,66,3,1997-12-31 20:48:44,23,M,student,80521,2445635,1990s
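
The `Unnamed: 0` column in the frame read back from disk is the index that `to_csv` writes by default. The snippet below is a small sketch of the round trip with and without the index, using an in-memory buffer and a made-up toy frame instead of the real file.

```python
import io

import pandas as pd

toy = pd.DataFrame({"title": ["Toy Story", "Heat"], "rating": [4, 5]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False leaves the index out, so the columns survive the round trip unchanged.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```

Timestamps also come back as plain strings unless they are parsed on load, for example with `parse_dates=["rating_timestamp"]` in `read_csv`.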
