## Movie Recommendation System
In this notebook, we use the Alternating Least Squares (ALS) algorithm through the Spark APIs to predict movie ratings on the [MovieLens small dataset](https://grouplens.org/datasets/movielens/latest/).

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Spark Setup

In [None]:
# Spark setup
!apt-get update
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://apache.osuosl.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
!tar xf spark-3.1.2-bin-hadoop3.2.tgz


In [None]:
# Set up Spark
!pip install -q findspark
!pip install py4j

import os
# Point Java and Spark at the locations installed in the previous cell
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.1.2-bin-hadoop3.2"
os.environ["PYSPARK_PYTHON"] = "python3"
# Shell commands started with "!" inherit os.environ, so this confirms the setting
!echo $JAVA_HOME

import findspark
findspark.init(os.environ["SPARK_HOME"])

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()

/usr/lib/jvm/java-8-openjdk-amd64


In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import math

## Part 1: Data ETL and Data Exploration

In [None]:
from pyspark.sql import SparkSession
spark = SparkSession \
  .builder \
  .appName("movie analysis") \
  .config("spark.some.config.option", "some-value") \
  .getOrCreate()

In [None]:
movies_df = spark.read.load("/content/drive/MyDrive/Documents/Projects/Movie_Recommendation_System/ml-latest-small/movies.csv", format='csv', header = True)
ratings_df = spark.read.load("/content/drive/MyDrive/Documents/Projects/Movie_Recommendation_System/ml-latest-small/ratings.csv", format='csv', header = True)
links_df = spark.read.load("/content/drive/MyDrive/Documents/Projects/Movie_Recommendation_System/ml-latest-small/links.csv", format='csv', header = True)
tags_df = spark.read.load("/content/drive/MyDrive/Documents/Projects/Movie_Recommendation_System/ml-latest-small/tags.csv", format='csv', header = True)

In [None]:
movies_df.show(5)

+-------+--------------------+--------------------+
|movieId|               title|              genres|
+-------+--------------------+--------------------+
|      1|    Toy Story (1995)|Adventure|Animati...|
|      2|      Jumanji (1995)|Adventure|Childre...|
|      3|Grumpier Old Men ...|      Comedy|Romance|
|      4|Waiting to Exhale...|Comedy|Drama|Romance|
|      5|Father of the Bri...|              Comedy|
+-------+--------------------+--------------------+
only showing top 5 rows



In [None]:
ratings_df.show(5)

+------+-------+------+---------+
|userId|movieId|rating|timestamp|
+------+-------+------+---------+
|     1|      1|   4.0|964982703|
|     1|      3|   4.0|964981247|
|     1|      6|   4.0|964982224|
|     1|     47|   5.0|964983815|
|     1|     50|   5.0|964982931|
+------+-------+------+---------+
only showing top 5 rows



In [None]:
links_df.show(5)

+-------+-------+------+
|movieId| imdbId|tmdbId|
+-------+-------+------+
|      1|0114709|   862|
|      2|0113497|  8844|
|      3|0113228| 15602|
|      4|0114885| 31357|
|      5|0113041| 11862|
+-------+-------+------+
only showing top 5 rows



In [None]:
tags_df.show(5)

+------+-------+---------------+----------+
|userId|movieId|            tag| timestamp|
+------+-------+---------------+----------+
|     2|  60756|          funny|1445714994|
|     2|  60756|Highly quotable|1445714996|
|     2|  60756|   will ferrell|1445714992|
|     2|  89774|   Boxing story|1445715207|
|     2|  89774|            MMA|1445715200|
+------+-------+---------------+----------+
only showing top 5 rows



In [None]:
tmp1 = ratings_df.groupBy("userId").count().toPandas()['count'].min()
tmp2 = ratings_df.groupBy("movieId").count().toPandas()['count'].min()
print('For the users that rated movies and the movies that were rated:')
print('Minimum number of ratings per user is {}'.format(tmp1))
print('Minimum number of ratings per movie is {}'.format(tmp2))

For the users that rated movies and the movies that were rated:
Minimum number of ratings per user is 20
Minimum number of ratings per movie is 1


In [None]:
tmp1 = sum(ratings_df.groupBy("movieId").count().toPandas()['count'] == 1)
tmp2 = ratings_df.select('movieId').distinct().count()
print('{} out of {} movies are rated by only one user'.format(tmp1, tmp2))

3446 out of 9724 movies are rated by only one user


## Part 2: Spark SQL and OLAP EDA

In [None]:
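# registerTempTable is deprecated; createOrReplaceTempView is its replacement
movies_df.createOrReplaceTempView("movies")
ratings_df.createOrReplaceTempView("ratings")
links_df.createOrReplaceTempView("links")
tags_df.createOrReplaceTempView("tags")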

### Number of Users

In [None]:
num_users = spark.sql("SELECT COUNT(DISTINCT userId) as num_users FROM ratings")
num_users.show()

+---------+
|num_users|
+---------+
|      610|
+---------+



### Number of Movies

In [None]:
num_movies = spark.sql("SELECT COUNT(DISTINCT movieId) as num_movies FROM movies")
num_movies.show()

+----------+
|num_movies|
+----------+
|      9742|
+----------+



### Number of movies that are rated by users

In [None]:
rated_movies = spark.sql("SELECT COUNT(DISTINCT movieId) as rated_movies FROM ratings")
rated_movies.show()

+------------+
|rated_movies|
+------------+
|        9724|
+------------+



### Movies that are not rated

In [None]:
not_rated_movies = spark.sql("SELECT * FROM movies WHERE movieId NOT IN (SELECT movieId FROM ratings)")
not_rated_movies.show()

+-------+--------------------+--------------------+
|movieId|               title|              genres|
+-------+--------------------+--------------------+
|   1076|Innocents, The (1...|Drama|Horror|Thri...|
|   2939|      Niagara (1953)|      Drama|Thriller|
|   3338|For All Mankind (...|         Documentary|
|   3456|Color of Paradise...|               Drama|
|   4194|I Know Where I'm ...|   Drama|Romance|War|
|   5721|  Chosen, The (1981)|               Drama|
|   6668|Road Home, The (W...|       Drama|Romance|
|   6849|      Scrooge (1970)|Drama|Fantasy|Mus...|
|   7020|        Proof (1991)|Comedy|Drama|Romance|
|   7792|Parallax View, Th...|            Thriller|
|   8765|This Gun for Hire...|Crime|Film-Noir|T...|
|  25855|Roaring Twenties,...|Crime|Drama|Thriller|
|  26085|Mutiny on the Bou...|Adventure|Drama|R...|
|  30892|In the Realms of ...|Animation|Documen...|
|  32160|Twentieth Century...|              Comedy|
|  32371|Call Northside 77...|Crime|Drama|Film-...|
|  34482|Bro

### Movie Genres

In [None]:
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType
genresSplit = udf(lambda x: x.split('|'), ArrayType(StringType()))
spark.udf.register("genresSplit", genresSplit)
movie_genres = spark.sql("SELECT DISTINCT EXPLODE(genresSplit(genres)) as genres FROM movies ORDER BY 1")
movie_genres.show()

+------------------+
|            genres|
+------------------+
|(no genres listed)|
|            Action|
|         Adventure|
|         Animation|
|          Children|
|            Comedy|
|             Crime|
|       Documentary|
|             Drama|
|           Fantasy|
|         Film-Noir|
|            Horror|
|              IMAX|
|           Musical|
|           Mystery|
|           Romance|
|            Sci-Fi|
|          Thriller|
|               War|
|           Western|
+------------------+



### Movie count by genres

In [None]:
category_count = spark.sql("SELECT genres, COUNT(movieId) as count FROM (SELECT EXPLODE(genresSplit(genres)) as genres, movieId FROM movies) GROUP BY 1 ORDER BY 2 DESC")
category_count.show()

+------------------+-----+
|            genres|count|
+------------------+-----+
|             Drama| 4361|
|            Comedy| 3756|
|          Thriller| 1894|
|            Action| 1828|
|           Romance| 1596|
|         Adventure| 1263|
|             Crime| 1199|
|            Sci-Fi|  980|
|            Horror|  978|
|           Fantasy|  779|
|          Children|  664|
|         Animation|  611|
|           Mystery|  573|
|       Documentary|  440|
|               War|  382|
|           Musical|  334|
|           Western|  167|
|              IMAX|  158|
|         Film-Noir|   87|
|(no genres listed)|   34|
+------------------+-----+



### Movies by genres

In [None]:
category_movies = spark.sql("SELECT genres, concat_ws(',', collect_list(title)) as movies FROM (SELECT EXPLODE(genresSplit(genres)) as genres, title FROM movies) GROUP BY genres")
category_movies.show()

+------------------+--------------------+
|            genres|              movies|
+------------------+--------------------+
|             Crime|Heat (1995),Casin...|
|           Romance|Grumpier Old Men ...|
|          Thriller|Heat (1995),Golde...|
|         Adventure|Toy Story (1995),...|
|             Drama|Waiting to Exhale...|
|               War|Richard III (1995...|
|       Documentary|Nico Icon (1995),...|
|           Fantasy|Toy Story (1995),...|
|           Mystery|Copycat (1995),Ci...|
|           Musical|Pocahontas (1995)...|
|         Animation|Toy Story (1995),...|
|         Film-Noir|Devil in a Blue D...|
|(no genres listed)|La cravate (1957)...|
|              IMAX|Apollo 13 (1995),...|
|            Horror|Dracula: Dead and...|
|           Western|Desperado (1995),...|
|            Comedy|Toy Story (1995),...|
|          Children|Toy Story (1995),...|
|            Action|Heat (1995),Sudde...|
|            Sci-Fi|Powder (1995),Cit...|
+------------------+--------------------+

## Part 3: Spark ALS based Recommendation System
We will use Spark ML to predict the ratings, so let's take the ratings loaded from "ratings.csv", drop the timestamp column, and cast the remaining columns so that each row is a (user, item, rating) tuple.

In [None]:
ratings_df.show()

+------+-------+------+---------+
|userId|movieId|rating|timestamp|
+------+-------+------+---------+
|     1|      1|   4.0|964982703|
|     1|      3|   4.0|964981247|
|     1|      6|   4.0|964982224|
|     1|     47|   5.0|964983815|
|     1|     50|   5.0|964982931|
|     1|     70|   3.0|964982400|
|     1|    101|   5.0|964980868|
|     1|    110|   4.0|964982176|
|     1|    151|   5.0|964984041|
|     1|    157|   5.0|964984100|
|     1|    163|   5.0|964983650|
|     1|    216|   5.0|964981208|
|     1|    223|   3.0|964980985|
|     1|    231|   5.0|964981179|
|     1|    235|   4.0|964980908|
|     1|    260|   5.0|964981680|
|     1|    296|   3.0|964982967|
|     1|    316|   3.0|964982310|
|     1|    333|   5.0|964981179|
|     1|    349|   4.0|964982563|
+------+-------+------+---------+
only showing top 20 rows



In [None]:
movie_ratings=ratings_df.drop('timestamp')

In [None]:
# Data type convert
from pyspark.sql.types import IntegerType, FloatType
movie_ratings = movie_ratings.withColumn("userId", movie_ratings["userId"].cast(IntegerType()))
movie_ratings = movie_ratings.withColumn("movieId", movie_ratings["movieId"].cast(IntegerType()))
movie_ratings = movie_ratings.withColumn("rating", movie_ratings["rating"].cast(FloatType()))

In [None]:
movie_ratings.show()

+------+-------+------+
|userId|movieId|rating|
+------+-------+------+
|     1|      1|   4.0|
|     1|      3|   4.0|
|     1|      6|   4.0|
|     1|     47|   5.0|
|     1|     50|   5.0|
|     1|     70|   3.0|
|     1|    101|   5.0|
|     1|    110|   4.0|
|     1|    151|   5.0|
|     1|    157|   5.0|
|     1|    163|   5.0|
|     1|    216|   5.0|
|     1|    223|   3.0|
|     1|    231|   5.0|
|     1|    235|   4.0|
|     1|    260|   5.0|
|     1|    296|   3.0|
|     1|    316|   3.0|
|     1|    333|   5.0|
|     1|    349|   4.0|
+------+-------+------+
only showing top 20 rows



### ALS Model Selection and Evaluation

We tune the ALS hyperparameters (`maxIter`, `rank`, and `regParam`) with a grid search, scoring each combination by its cross-validated RMSE.
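As a reminder of what these hyperparameters control, the textbook form of the regularized matrix-factorization objective that ALS minimizes can be written as below. This is a sketch of the standard formulation, not a transcription of Spark's source: the symbols $x_u$, $y_i$, $k$, $\lambda$ and $\mathcal{K}$ are notation introduced here, and Spark's implementation additionally scales the regularization term by the number of ratings per user and per item.

$$
\min_{X,\,Y} \; \sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - x_u^{\top} y_i \right)^2 \; + \; \lambda \left( \sum_{u} \lVert x_u \rVert^2 + \sum_{i} \lVert y_i \rVert^2 \right), \qquad x_u,\, y_i \in \mathbb{R}^{k}
$$

Here $\mathcal{K}$ is the set of observed (user, movie) pairs, $k$ corresponds to `rank`, $\lambda$ to `regParam`, and `maxIter` is the number of alternating sweeps in which the user factors are solved with the movie factors held fixed, and vice versa.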

In [None]:
# import package
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import CrossValidator,ParamGridBuilder

In [None]:
#Create test and train set
(training,test)=movie_ratings.randomSplit([0.8,0.2])

In [None]:
#Create ALS model
als = ALS(maxIter=5, rank=10, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating", coldStartStrategy="drop")

In [None]:
#Tune model using ParamGridBuilder
param_grid = ParamGridBuilder()\
  .addGrid(als.maxIter, [3, 5, 10, 15])\
  .addGrid(als.rank, [5, 10, 15, 20])\
  .addGrid(als.regParam, [2, 1, 0.5, 0.1])\
  .build()

In [None]:
# Define evaluator as RMSE
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")

In [None]:
# Build Cross validation 
cv = CrossValidator(estimator=als, estimatorParamMaps=param_grid, evaluator=evaluator, numFolds=5)
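
Before fitting, it helps to estimate how much work this search implies. The sketch below simply counts the combinations in the grid defined above and multiplies by the number of folds; the variable names are illustrative and not part of the notebook.

```python
from itertools import product

# Values copied from the parameter grid and the CrossValidator above
max_iter_values = [3, 5, 10, 15]
rank_values = [5, 10, 15, 20]
reg_param_values = [2, 1, 0.5, 0.1]
num_folds = 5

# Every (maxIter, rank, regParam) combination is trained once per fold
combinations = list(product(max_iter_values, rank_values, reg_param_values))
cv_fits = len(combinations) * num_folds
print(len(combinations), cv_fits)  # prints: 64 320
```

So the search trains 320 ALS models during cross-validation, plus one final refit of the best combination on the full training set, which is why the fit step can take a while on a single Colab machine.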

In [None]:
#Fit ALS model to training data
cv_model = cv.fit(training)

In [None]:
#Extract best model from the tuning exercise using ParamGridBuilder
best_model = cv_model.bestModel

### Model testing
Finally, generate predictions on the held-out test set and compute the test RMSE.
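
For reference, RMSE is the square root of the mean squared difference between actual and predicted ratings. A minimal plain-Python sketch follows; the `rmse` helper and the toy values are assumptions made for illustration and are not taken from the model output.

```python
import math

def rmse(ratings, predictions):
    """Root-mean-square error between two equal-length sequences."""
    if len(ratings) != len(predictions):
        raise ValueError("ratings and predictions must have the same length")
    squared_errors = [(r - p) ** 2 for r, p in zip(ratings, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy values chosen for illustration only
toy_ratings = [4.0, 3.0, 5.0, 2.0]
toy_predictions = [3.5, 3.0, 4.0, 3.0]
print(rmse(toy_ratings, toy_predictions))  # prints: 0.75
```

An RMSE of 0.75 here means the toy predictions are off by about three quarters of a star on average, with larger misses weighted more heavily than small ones.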

In [None]:
#Generate predictions and evaluate using RMSE
predictions=best_model.transform(test)
rmse = evaluator.evaluate(predictions)

In [None]:
#Print evaluation metrics and model parameters
best_params = cv_model.getEstimatorParamMaps()[np.argmin(cv_model.avgMetrics)]
print("RMSE = "+str(rmse))
print("**Best Model Parameters**")

for i, j in best_params.items():
  print(" " + i.name + ":", j)

RMSE = 0.8846200777501773
**Best Model Parameters**
 maxIter: 15
 rank: 5
 regParam: 0.1


In [None]:
predictions.show()

+------+-------+------+----------+
|userId|movieId|rating|prediction|
+------+-------+------+----------+
|   133|    471|   4.0| 3.0043478|
|   436|    471|   3.0| 3.5471473|
|   182|    471|   4.5|  3.631299|
|   218|    471|   4.0| 3.4296155|
|   610|    471|   4.0| 3.4486492|
|   171|    471|   3.0| 4.7256274|
|   312|    471|   4.0|  3.802827|
|   216|    471|   3.0| 2.9339616|
|   608|    471|   1.5|  2.918876|
|   159|   1088|   4.0| 2.6713765|
|   606|   1088|   3.0| 3.4569247|
|   387|   1088|   1.5|  2.884079|
|   391|   1088|   1.0| 3.0994167|
|    10|   1088|   3.0| 3.0802805|
|    68|   1088|   3.5|  3.164614|
|   104|   1088|   3.0| 3.6109524|
|   587|   1238|   4.0| 3.4363818|
|   268|   1238|   5.0|  3.572527|
|    19|   1238|   3.0|  3.163278|
|   425|   1342|   3.5| 1.9199643|
+------+-------+------+----------+
only showing top 20 rows



### Model Performance

In [None]:
alldata=best_model.transform(movie_ratings)
rmse = evaluator.evaluate(alldata)
print ("RMSE = "+str(rmse))

RMSE = 0.6900217372368342


In [None]:
alldata.createOrReplaceTempView("alldata")

In [None]:
spark.sql("SELECT * FROM alldata").show()

+------+-------+------+----------+
|userId|movieId|rating|prediction|
+------+-------+------+----------+
|   191|    148|   5.0| 4.9185295|
|   133|    471|   4.0| 3.0043478|
|   597|    471|   2.0| 3.6863744|
|   385|    471|   4.0|  3.405606|
|   436|    471|   3.0| 3.5471473|
|   602|    471|   4.0| 3.3631961|
|    91|    471|   1.0| 2.6150086|
|   409|    471|   3.0|  3.476297|
|   372|    471|   3.0| 3.0022192|
|   599|    471|   2.5|   2.66378|
|   603|    471|   4.0| 4.1922054|
|   182|    471|   4.5|  3.631299|
|   218|    471|   4.0| 3.4296155|
|   474|    471|   3.0| 3.4533317|
|   500|    471|   1.0|  2.204967|
|    57|    471|   3.0| 3.1132672|
|   462|    471|   2.5| 3.2656152|
|   387|    471|   3.0| 3.1440275|
|   610|    471|   4.0| 3.4486492|
|   217|    471|   2.0| 2.3340392|
+------+-------+------+----------+
only showing top 20 rows



In [None]:
spark.sql("SELECT * FROM movies JOIN alldata ON movies.movieId = alldata.movieId").show()

+-------+--------------------+------+------+-------+------+----------+
|movieId|               title|genres|userId|movieId|rating|prediction|
+-------+--------------------+------+------+-------+------+----------+
|    148|Awfully Big Adven...| Drama|   191|    148|   5.0| 4.9185295|
|    471|Hudsucker Proxy, ...|Comedy|   133|    471|   4.0| 3.0043478|
|    471|Hudsucker Proxy, ...|Comedy|   597|    471|   2.0| 3.6863744|
|    471|Hudsucker Proxy, ...|Comedy|   385|    471|   4.0|  3.405606|
|    471|Hudsucker Proxy, ...|Comedy|   436|    471|   3.0| 3.5471473|
|    471|Hudsucker Proxy, ...|Comedy|   602|    471|   4.0| 3.3631961|
|    471|Hudsucker Proxy, ...|Comedy|    91|    471|   1.0| 2.6150086|
|    471|Hudsucker Proxy, ...|Comedy|   409|    471|   3.0|  3.476297|
|    471|Hudsucker Proxy, ...|Comedy|   372|    471|   3.0| 3.0022192|
|    471|Hudsucker Proxy, ...|Comedy|   599|    471|   2.5|   2.66378|
|    471|Hudsucker Proxy, ...|Comedy|   603|    471|   4.0| 4.1922054|
|    4

## Part 4: Model Application

### Recommend movies to users with id: 575, 232

In [None]:
!pip install koalas
import databricks.koalas as ks



In [146]:
# top 10 recommendations for all users
all_recs = best_model.recommendForAllUsers(10)
all_recs_ks = all_recs.to_koalas()
movies_ks = movies_df.to_koalas()

In [None]:
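# function to recommend the top k movies to a given user

def topKRecommendation(k, id, model):
  '''
  k: number of recommendations
  id: user id
  model: the trained model
  '''
  # Ask the model for the top k items of this user only, looked up by userId
  user_df = spark.createDataFrame([(int(id),)], ['userId'])
  rows = model.recommendForUserSubset(user_df, k).collect()
  if not rows:
    return "There is no user with the given id."
  # Each recommendation is a (movieId, rating) row; movieId is a string in movies_ks
  recs = [str(rec['movieId']) for rec in rows[0]['recommendations']]
  return movies_ks[movies_ks['movieId'].isin(recs)]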

In [None]:
topKRecommendation(10, 575, best_model)

Unnamed: 0,movieId,title,genres
799,1046,Beautiful Thing (1996),Drama|Romance
1948,2582,Twin Dragons (Shuang long hui) (1992),Action|Comedy
2926,3925,Stranger Than Paradise (1984),Comedy|Drama
3685,5075,Waydowntown (2000),Comedy
5037,7842,Dune (2000),Drama|Fantasy|Sci-Fi
5539,26614,"Bourne Identity, The (1988)",Action|Adventure|Drama|Mystery|Thriller
6813,60943,Frozen River (2008),Drama
7536,84847,Emma (2009),Comedy|Drama|Romance
7742,90888,Immortals (2011),Action|Drama|Fantasy
9699,185029,A Quiet Place (2018),Drama|Horror|Thriller


In [None]:
topKRecommendation(10, 232, best_model)

Unnamed: 0,movieId,title,genres
181,213,Burnt by the Sun (Utomlyonnye solntsem) (1994),Drama
2597,3473,Jonah Who Will Be 25 in the Year 2000 (Jonas q...,Comedy
3320,4495,Crossing Delancey (1988),Comedy|Romance
4251,6201,Lady Jane (1986),Drama|Romance
5013,7767,"Best of Youth, The (La meglio gioventù) (2003)",Drama
5136,8235,Safety Last! (1923),Action|Comedy|Romance
5467,26171,Play Time (a.k.a. Playtime) (1967),Comedy
6444,51931,Reign Over Me (2007),Drama
7704,89904,The Artist (2011),Comedy|Drama|Romance
8110,100714,Before Midnight (2013),Drama|Romance


### Find similar movies for the movies with id: 463, 471


In [None]:
item_factors=best_model.itemFactors.to_koalas()

In [143]:
def similarMovies(movieId, matrix = 'cosine_similarity'):
  '''
  movieId: movie id
  matrix: distance calculation method, 'cosine_similarity' or 'euclidean_distance'
  '''
  if matrix not in ('cosine_similarity', 'euclidean_distance'):
    raise ValueError("matrix must be 'cosine_similarity' or 'euclidean_distance'")
  # Collect the learned item factors once; the table is small enough for pandas
  factors_pd = item_factors.to_pandas()
  target = factors_pd.loc[factors_pd['id'].astype(str) == str(movieId), 'features']
  if target.empty:
    return "There is no movie with the given id."
  movie_factors = np.array(target.iloc[0])
  rows = []
  for other_id, factors in zip(factors_pd['id'], factors_pd['features']):
    factors = np.array(factors)
    if matrix == 'cosine_similarity':
      score = np.dot(movie_factors, factors)/(np.linalg.norm(movie_factors)*np.linalg.norm(factors))
    else:
      score = np.linalg.norm(movie_factors - factors)
    rows.append({'movieId': str(other_id), matrix: score})
  # Most similar first: descending similarity, ascending distance
  similar_movies = pd.DataFrame(rows).sort_values(by=matrix, ascending=(matrix == 'euclidean_distance'))
  # Skip the first row, which is the query movie itself, and keep the next 10
  top_movies = similar_movies[1:11]
  output = top_movies.merge(movies_ks.to_pandas(), on='movieId', how='inner')
  return output[['movieId','title','genres']]
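
The two metrics do not always agree, which is why the two result lists for the same movie can differ. Cosine similarity compares only the direction of the factor vectors, while Euclidean distance also depends on their length. A minimal plain-Python sketch with toy 2-dimensional vectors follows; the helper names and the vectors are hypothetical and unrelated to the trained model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy factor vectors, chosen for illustration only
query = [1.0, 1.0]
candidates = {
    "same_direction_far": [3.0, 3.0],   # parallel to the query but much longer
    "other_direction_near": [1.0, 0.5], # close to the query but at an angle
}

# Rank candidates from most to least similar under each metric
by_cosine = sorted(candidates, key=lambda name: cosine_similarity(query, candidates[name]), reverse=True)
by_euclidean = sorted(candidates, key=lambda name: euclidean_distance(query, candidates[name]))
print(by_cosine)
print(by_euclidean)
```

Cosine similarity ranks the parallel vector first, while Euclidean distance prefers the nearby one, so which metric to use depends on whether the magnitude of the learned factors should count as part of "similar".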

In [138]:
similarMovies(463)

'There is no movie with the given id.'

In [144]:
similarMovies(471, 'cosine_similarity')

Unnamed: 0,movieId,title,genres
0,946,To Be or Not to Be (1942),Comedy|Drama|War
1,3159,Fantasia 2000 (1999),Animation|Children|Musical|IMAX
2,1454,SubUrbia (1997),Comedy|Drama
3,3088,Harvey (1950),Comedy|Fantasy
4,115617,Big Hero 6 (2014),Action|Animation|Comedy
5,1446,Kolya (Kolja) (1996),Comedy|Drama
6,57274,[REC] (2007),Drama|Horror|Thriller
7,80551,Eat Pray Love (2010),Drama|Romance
8,198,Strange Days (1995),Action|Crime|Drama|Mystery|Sci-Fi|Thriller
9,236,French Kiss (1995),Action|Comedy|Romance


In [145]:
similarMovies(471, 'euclidean_distance')

Unnamed: 0,movieId,title,genres
0,3159,Fantasia 2000 (1999),Animation|Children|Musical|IMAX
1,946,To Be or Not to Be (1942),Comedy|Drama|War
2,1454,SubUrbia (1997),Comedy|Drama
3,115617,Big Hero 6 (2014),Action|Animation|Comedy
4,57274,[REC] (2007),Drama|Horror|Thriller
5,1446,Kolya (Kolja) (1996),Comedy|Drama
6,3088,Harvey (1950),Comedy|Fantasy
7,198,Strange Days (1995),Action|Crime|Drama|Mystery|Sci-Fi|Thriller
8,187593,Deadpool 2 (2018),Action|Comedy|Sci-Fi
9,236,French Kiss (1995),Action|Comedy|Romance
