# Movie Recommendation System

In [1]:
import pandas as pd
import numpy as np
from math import sqrt

## Fetch the data

In [27]:
movies_df = pd.read_csv( 'movies.csv' )
ratings_df = pd.read_csv( 'ratings.csv' )

## Preprocessing & cleaning

### Movies dataframe

In [28]:
movies_df.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


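This dataframe has two problems:

   • The release year, e.g. "(1995)", is embedded in the title column; it is better to strip it from the title and store it in its own column.

   • The genres are packed into a single "|"-separated string and must be split before they can be used.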

### Title correction

In [29]:
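# Capture the four-digit year inside the parentheses, e.g. "(1995)".
movies_df['year'] = movies_df.title.str.extract( r'\((\d{4})\)', expand = False )

# Remove the "(year)" part and any leftover whitespace from the title.
movies_df['title'] = movies_df.title.str.replace( r'\(\d{4}\)', '', regex = True )
movies_df['title'] = movies_df['title'].str.strip()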

movies_df.head()

Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,Adventure|Animation|Children|Comedy|Fantasy,1995
1,2,Jumanji,Adventure|Children|Fantasy,1995
2,3,Grumpier Old Men,Comedy|Romance,1995
3,4,Waiting to Exhale,Comedy|Drama|Romance,1995
4,5,Father of the Bride Part II,Comedy,1995
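A bare `\d{4}` pattern takes the first run of four digits it finds, which is the release year only when the title itself contains no such run. The sketch below uses two sample titles picked for illustration (not rows checked against movies.csv) to compare it with a pattern anchored on the parentheses.

```python
import pandas as pd

# Sample titles chosen only to illustrate how the two patterns behave.
titles = pd.Series(['Toy Story (1995)', '2010: The Year We Make Contact (1984)'])

# First four digits found anywhere in the string.
loose = titles.str.extract(r'(\d{4})', expand=False)

# Four digits enclosed in parentheses.
anchored = titles.str.extract(r'\((\d{4})\)', expand=False)

print(loose.tolist())     # ['1995', '2010']
print(anchored.tolist())  # ['1995', '1984']
```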


First we split the genres on "|" so that each movie holds a list of genres.

### Genres correction

In [30]:
movies_df['genres'] = movies_df.genres.str.split('|')
movies_df.head()

Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995
2,3,Grumpier Old Men,"[Comedy, Romance]",1995
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995
4,5,Father of the Bride Part II,[Comedy],1995


For each movie in movies_df, the code below walks through its list of genres and sets the matching genre column in Genresofmovies_df to 1 for that row; every remaining cell is then filled with 0.

In [31]:
Genresofmovies_df = movies_df.copy()
for index, row in movies_df.iterrows():
    for genre in row['genres']:
        Genresofmovies_df.at[index, genre] = 1
Genresofmovies_df = Genresofmovies_df.fillna(0)
Genresofmovies_df.head()

Unnamed: 0,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,1.0,1.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3,Grumpier Old Men,"[Comedy, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,5,Father of the Bride Part II,[Comedy],1995,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Now we have a movies dataframe with cleaned titles and one 0/1 column per genre.
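On a larger catalog the row-by-row loop can become slow. As a sketch of one possible alternative, pandas' `Series.str.get_dummies` builds the same kind of 0/1 columns directly from the original "|"-separated string (that is, before it is split into lists). The small frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical three-movie catalog with genres still in "|"-separated form.
movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'genres':  ['Adventure|Children', 'Comedy|Romance', 'Comedy'],
})

# One 0/1 column per distinct genre, in alphabetical order.
genre_flags = movies['genres'].str.get_dummies(sep='|')

# Attach the flags to the movie IDs.
one_hot = pd.concat([movies[['movieId']], genre_flags], axis=1)
print(one_hot)
```

Unlike the loop, this yields integer flags and alphabetically ordered columns, so the column order would differ from the table shown above.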

### Ratings dataframe

In [5]:
ratings_df.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


We do not use the timestamp in this exercise, so we remove it.

In [32]:
ratings_df = ratings_df.drop('timestamp', axis = 1)
ratings_df.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


## Find the desired movie

In [37]:
result = movies_df[ movies_df['title'].str.contains( 'Founder', case=False, na=False )]
print(result)

      movieId        title   genres  year
9440   166946  The Founder  [Drama]  2016


# Recommenders / Content Based

Now we collect the list of movies the user has watched, along with the rating the user gave each one.

In [38]:
userInput = [
            { 'title' : 'Sherlock Holmes', 'rating' : 8 },
            { 'title' : 'Miss Sloane',     'rating' : 9 },
            { 'title' : 'Safe',            'rating' : 7 },
            { 'title' : "Taken",           'rating' : 6 },
            { 'title' : 'The Founder',     'rating' : 7 }
         ] 
inputMovies = pd.DataFrame( userInput )
inputMovies

Unnamed: 0,title,rating
0,Sherlock Holmes,8
1,Miss Sloane,9
2,Safe,7
3,Taken,6
4,The Founder,7


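Now we look up the movie IDs for the titles the user entered and add them to the inputMovies dataframe. Note that the title "Safe" matches two different movies (1995 and 2012), so the merged result has six rows and both of those movies receive the same rating.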

In [39]:
inputId = movies_df[ movies_df['title'].isin( inputMovies['title'].tolist() )]

inputMovies = pd.merge(inputId, inputMovies)
inputMovies = inputMovies.drop('genres', axis = 1).drop('year', axis = 1)
inputMovies

Unnamed: 0,movieId,title,rating
0,190,Safe,7
1,59369,Taken,6
2,73017,Sherlock Holmes,8
3,94405,Safe,7
4,166568,Miss Sloane,9
5,166946,The Founder,7
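Because the merge is done on the title alone, a title shared by several movies is matched more than once. One way to disambiguate, sketched here under the assumption that the user also supplies the release year (a field the userInput list above does not have), is to merge on both columns. The frames are small stand-ins, and the year is kept as a string because that is what `str.extract` produces.

```python
import pandas as pd

# Stand-in for movies_df with two movies that share a title.
catalog = pd.DataFrame({
    'movieId': [190, 94405, 59369],
    'title':   ['Safe', 'Safe', 'Taken'],
    'year':    ['1995', '2012', '2008'],
})

# Hypothetical user input that includes the year.
user_input = pd.DataFrame({
    'title':  ['Safe', 'Taken'],
    'year':   ['2012', '2008'],
    'rating': [7, 6],
})

# Title only: both "Safe" rows match.
by_title = pd.merge(catalog, user_input[['title', 'rating']], on='title')

# Title and year: only the 2012 "Safe" matches.
by_title_year = pd.merge(catalog, user_input, on=['title', 'year'])

print(len(by_title), len(by_title_year))  # 3 2
```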


> The 'content' in this exercise is the genres of the movies, so we must first find the genres of the input movies in order to build the user profile.

In [40]:
userMovies = Genresofmovies_df[ Genresofmovies_df['movieId'].isin( inputMovies['movieId'].tolist() )]
userMovies

Unnamed: 0,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
161,190,Safe,[Thriller],1995,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6746,59369,Taken,"[Action, Crime, Drama, Thriller]",2008,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7214,73017,Sherlock Holmes,"[Action, Crime, Mystery, Thriller]",2009,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7876,94405,Safe,"[Action, Crime, Thriller]",2012,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9436,166568,Miss Sloane,[Thriller],2016,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9440,166946,The Founder,[Drama],2016,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


> Now we have the genres! Let's reset the index.

We then separate the genre columns from the rest of the movie information, because the user's interests are modelled on the genres alone.

In [43]:
userMovies = userMovies.reset_index( drop = True )

userGenreTable = userMovies.drop('movieId', axis = 1).drop('title', axis = 1).drop('genres', axis = 1).drop('year', axis = 1)
userGenreTable

Unnamed: 0,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


We are going to build the user's profile so that we can recommend movies based on the user's interests.

In [45]:
userProfile = userGenreTable.transpose().dot( inputMovies['rating'] )
userProfile

Adventure              0.0
Animation              0.0
Children               0.0
Comedy                 0.0
Fantasy                0.0
Romance                0.0
Drama                 13.0
Action                21.0
Crime                 21.0
Thriller              37.0
Horror                 0.0
Mystery                8.0
Sci-Fi                 0.0
War                    0.0
Musical                0.0
Documentary            0.0
IMAX                   0.0
Western                0.0
Film-Noir              0.0
(no genres listed)     0.0
dtype: float64

Code description:

• First, the table that contains the genres of the user's movies is transposed (genres become rows).

• Then its dot product is taken with the ratings in the 'rating' column of the inputMovies DataFrame.

• The result is a Series in which each element is the sum of the ratings of the input movies that belong to that genre. For example, Drama gets 6 + 7 = 13 from Taken and The Founder.

> In short, this code computes a weight for each genre from the ratings of the movies that carry it.
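As a small check of the description above, the sketch below rebuilds by hand the five non-zero genre columns of userGenreTable together with the six input ratings, and reproduces the non-zero profile values shown in the output. The names `genre_rows`, `ratings` and `profile` are illustrative.

```python
import pandas as pd

# Rows follow the order of inputMovies: Safe (1995), Taken, Sherlock Holmes,
# Safe (2012), Miss Sloane, The Founder. Only non-zero genre columns are kept.
genre_rows = pd.DataFrame(
    [[0, 0, 0, 1, 0],
     [1, 1, 1, 1, 0],
     [0, 1, 1, 1, 1],
     [0, 1, 1, 1, 0],
     [0, 0, 0, 1, 0],
     [1, 0, 0, 0, 0]],
    columns=['Drama', 'Action', 'Crime', 'Thriller', 'Mystery'],
)
ratings = pd.Series([7, 6, 8, 7, 9, 7])

# Genres become rows; each genre row is dotted with the rating vector.
profile = genre_rows.T.dot(ratings)
print(profile.to_dict())
# {'Drama': 13, 'Action': 21, 'Crime': 21, 'Thriller': 37, 'Mystery': 8}
```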

We build a table of the genres of all movies.

In [47]:
genreTable = Genresofmovies_df.set_index( Genresofmovies_df['movieId'] )
genreTable = genreTable.drop('movieId', axis = 1).drop('title', axis = 1).drop('genres', axis = 1).drop('year', axis = 1)
print( genreTable.shape )
genreTable.head()

(9742, 20)


movieId,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
1,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


> Weighting genres by the user profile

In [48]:
recommendationTable_df = ( (genreTable * userProfile).sum(axis = 1) )/( userProfile.sum() )
recommendationTable_df.head()

movieId
1    0.00
2    0.00
3    0.00
4    0.13
5    0.00
dtype: float64

Code description:

•• (genreTable * userProfile): multiplies every row of genreTable element-wise by userProfile, so each genre a movie has is weighted by the user's interest in that genre.

•• .sum(axis=1): sums each row, which gives the total weighted score of each movie.

•• userProfile.sum(): the sum of all user profile values. Dividing by it normalizes the movie scores to the range 0 to 1.
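To make the three steps concrete, here is a minimal sketch on a reduced table: the profile values come from the output above, while the genre table keeps only six columns and three movies (IDs 1, 4 and 90738, with their genres cut down to those columns). The names `profile`, `genre_table` and `scores` are illustrative.

```python
import pandas as pd

# User profile from the output above, plus one zero-weight genre.
profile = pd.Series({'Drama': 13.0, 'Action': 21.0, 'Crime': 21.0,
                     'Thriller': 37.0, 'Mystery': 8.0, 'Comedy': 0.0})

# Reduced genre table: movie 1 (Comedy), movie 4 (Comedy, Drama),
# movie 90738 (Action, Crime, Drama, Mystery, Thriller).
genre_table = pd.DataFrame(
    {'Drama':    [0, 1, 1],
     'Action':   [0, 0, 1],
     'Crime':    [0, 0, 1],
     'Thriller': [0, 0, 1],
     'Mystery':  [0, 0, 1],
     'Comedy':   [1, 1, 0]},
    index=pd.Index([1, 4, 90738], name='movieId'),
)

# Weight each genre flag, add up per movie, normalize by the profile total.
scores = (genre_table * profile).sum(axis=1) / profile.sum()
print(scores.round(2).to_dict())  # {1: 0.0, 4: 0.13, 90738: 1.0}
```

Movie 4 scores 13 / 100 = 0.13 because Drama is its only genre with a non-zero weight, and a movie that covers every genre in the profile reaches 1.0.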

> Sorting movies by their *score*, i.e. by how *close* they are to the user's interests

In [50]:
recommendationTable_df = recommendationTable_df.sort_values( ascending = False )
print( recommendationTable_df.shape )
recommendationTable_df.head()

(9742,)


movieId
90738     1.0
165347    1.0
79132     1.0
8860      1.0
26701     1.0
dtype: float64

> We have the suggested movies

Let's show them to the user

In [56]:
movies_df.loc[ movies_df['movieId'].isin( recommendationTable_df.head(30).keys() )].sort_values(by = 'year', ascending = False)

Unnamed: 0,movieId,title,genres,year
9594,175585,Shot Caller,"[Action, Crime, Drama, Thriller]",2017
9411,165347,Jack Reacher: Never Go Back,"[Action, Crime, Drama, Mystery, Thriller]",2016
9193,150548,Sherlock: The Abominable Bride,"[Action, Crime, Drama, Mystery, Thriller]",2016
8651,120637,Blackhat,"[Action, Crime, Drama, Mystery, Thriller]",2015
8849,132618,Kite,"[Action, Crime, Drama, Mystery, Thriller]",2014
7736,90738,"Double, The","[Action, Crime, Drama, Mystery, Thriller]",2011
7441,81132,Rubber,"[Action, Adventure, Comedy, Crime, Drama, Film...",2010
7372,79132,Inception,"[Action, Crime, Drama, Mystery, Sci-Fi, Thrill...",2010
7260,74510,"Girl Who Played with Fire, The (Flickan som le...","[Action, Crime, Drama, Mystery, Thriller]",2009
7176,72167,"Boondock Saints II: All Saints Day, The","[Action, Crime, Drama, Thriller]",2009


> •••  $The$ $End$  •••

# Recommenders / Collaborative Filtering

## Data customization

In this method we don't need the content features, so we drop them.

In [57]:
cf_df = movies_df.drop('genres', axis = 1)
cf_df.head()

Unnamed: 0,movieId,title,year
0,1,Toy Story,1995
1,2,Jumanji,1995
2,3,Grumpier Old Men,1995
3,4,Waiting to Exhale,1995
4,5,Father of the Bride Part II,1995


In [58]:
ratings_df.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


## Filtering

> The input movies are the same as before, so we simply reuse them

In [59]:
inputMovies

Unnamed: 0,movieId,title,rating
0,190,Safe,7
1,59369,Taken,6
2,73017,Sherlock Holmes,8
3,94405,Safe,7
4,166568,Miss Sloane,9
5,166946,The Founder,7


In [61]:
userSubset = ratings_df[ ratings_df['movieId'].isin( inputMovies['movieId'].tolist() )]
print( userSubset.shape )
userSubset.head()

(111, 3)


Unnamed: 0,userId,movieId,rating
312,4,190,2.0
1212,10,73017,3.0
2147,18,73017,4.5
3461,21,59369,3.0
3486,21,73017,2.5


• From the rating table we take the subset of ratings that other users have given to the movies our target user has rated.

In [71]:
userSubsetGroup = userSubset.groupby( ['userId'] )

Now we have found the *other users* who have rated the same movies.

The code below sorts them so that users with the most movies in common with the input come first.

In [77]:
userSubsetGroup = sorted( userSubsetGroup,  key = lambda x : len(x[1]), reverse = True )
userSubsetGroup[ 0 : 3 ]

[((380,),
         userId  movieId  rating
  57791     380    59369     5.0
  57844     380    73017     3.0
  57917     380    94405     3.0
  58069     380   166946     4.0),
 ((414,),
         userId  movieId  rating
  64703     414    59369     3.5
  64786     414    73017     4.0
  64961     414   166568     4.0),
 ((551,),
         userId  movieId  rating
  84880     551    59369     4.5
  84886     551    73017     4.0
  84927     551   166568     5.0)]
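If only the overlap counts are of interest, they can also be read off without materializing the groups. The sketch below assumes a frame shaped like userSubset and uses a few made-up rows; `overlap` is an illustrative name.

```python
import pandas as pd

# A few hypothetical rows in the shape of userSubset.
subset = pd.DataFrame({
    'userId':  [380, 380, 380, 414, 414, 4],
    'movieId': [59369, 73017, 94405, 59369, 73017, 190],
    'rating':  [5.0, 3.0, 3.0, 3.5, 4.0, 2.0],
})

# Number of input movies each user has rated, largest overlap first.
overlap = subset.groupby('userId').size().sort_values(ascending=False)
print(overlap.to_dict())  # {380: 3, 414: 2, 4: 1}
```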

> From the sorted groups we keep only the first 100 users, to reduce the amount of data a little.

In [80]:
userSubsetGroup = userSubsetGroup[0:100]

> We use the *Pearson correlation* to measure the similarity between the user's input and each user in the subset
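For reference, the coefficient can be written in the sum-of-squares form that the Sxx, Syy and Sxy variables in the next cell correspond to. The symbols are notation introduced here: $x_i$ are the target user's ratings, $y_i$ the other user's ratings of the same movies, and $n$ the number of movies they share.

$$
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad
S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad
S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}, \qquad
S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}
$$

The value of $r$ lies between -1 and 1 and does not change if either set of ratings is rescaled by a positive factor, so the 10-point input ratings can be compared with the 5-point ratings in the dataset.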

In [103]:
pearsonCorrelationDict = {}

# Sort once by movie ID so the input ratings line up with each group's ratings.
inputMovies = inputMovies.sort_values( by = 'movieId' )

for name, group in userSubsetGroup:
    group = group.sort_values( by = 'movieId' )

    # Number of input movies this user has rated.
    nRatings = len(group)

    # Keep only the input movies that this user has also rated.
    temp_df = inputMovies[ inputMovies['movieId'].isin( group['movieId'].tolist() )]

    # Ratings of the target user and of the compared user, in the same movie order.
    tempRatingList = temp_df['rating'].tolist()
    tempGroupList = group['rating'].tolist()

    # Components of the Pearson correlation.
    Sxx = sum([ i**2 for i in tempRatingList ]) - pow( sum(tempRatingList), 2 ) / float(nRatings)
    Syy = sum([ i**2 for i in tempGroupList  ]) - pow( sum(tempGroupList),  2 ) / float(nRatings)
    Sxy = sum( i*j for i, j in zip(tempRatingList, tempGroupList) ) - sum(tempRatingList) * sum(tempGroupList) / float(nRatings)

    # With zero variance on either side the correlation is undefined; store 0.
    if Sxx != 0 and Syy != 0:
        pearsonCorrelationDict[name] = Sxy / sqrt(Sxx * Syy)
    else:
        pearsonCorrelationDict[name] = 0

After Sxx, Syy and Sxy are calculated, Pearson's correlation coefficient follows as Sxy / sqrt(Sxx * Syy) and is stored in pearsonCorrelationDict under the user's ID. It indicates the *degree of correlation* between the two sets of ratings; when Sxx or Syy is zero the coefficient is undefined and 0 is stored instead.
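The same computation can be checked by hand for a single user. The sketch below wraps it in a helper called `pearson` (a name introduced here for illustration) and applies it to user 414, whose ratings for movies 59369, 73017 and 166568 appear in the grouped output above; the target user rated those movies 6, 8 and 9.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists; 0 if undefined."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    if sxx == 0 or syy == 0:
        return 0
    return sxy / sqrt(sxx * syy)

# Movies 59369, 73017 and 166568, in movie-ID order.
target_ratings = [6, 8, 9]
user_414_ratings = [3.5, 4.0, 4.0]

r = pearson(target_ratings, user_414_ratings)
print(round(r, 6))  # 0.944911
```

The result agrees with the value stored for user 414 in the dictionary output.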

In [102]:
pearsonCorrelationDict.items()

dict_items([((380,), -0.8528028654224417), ((414,), 0.9449111825230704), ((551,), 0.3273268353539889), ((21,), -1.0), ((52,), 1.0), ((68,), -1.0), ((105,), 1.0), ((119,), 1.0), ((177,), 1.0), ((212,), 1.0), ((232,), 1.0), ((249,), 0), ((274,), -1.0), ((292,), 1.0), ((328,), 0), ((331,), 1.0), ((393,), 1.0), ((408,), 1.0), ((432,), -1.0), ((448,), 1.0), ((514,), 0), ((560,), -1.0), ((561,), 1.0), ((599,), 1.0), ((610,), 0), ((4,), 0), ((10,), 0), ((18,), 0), ((28,), 0), ((62,), 0), ((63,), 0), ((73,), 0), ((76,), 0), ((80,), 0), ((103,), 0), ((104,), 0), ((106,), 0), ((111,), 0), ((139,), 0), ((141,), 0), ((195,), 0), ((211,), 0), ((219,), 0), ((227,), 0), ((233,), 0), ((247,), 0), ((275,), 0), ((279,), 0), ((280,), 0), ((282,), 0), ((288,), 0), ((298,), 0), ((305,), 0), ((318,), 0), ((325,), 0), ((326,), 0), ((332,), 0), ((339,), 0), ((341,), 0), ((351,), 0), ((352,), 0), ((365,), 0), ((376,), 0), ((381,), 0), ((382,), 0), ((418,), 0), ((434,), 0), ((460,), 0), ((466,), 0), ((483,), 0)

> From the Pearson correlation dictionary we build a Pearson dataframe.

This gives us a dataframe of similarity values.

### Making similarity data frame

We also merge in the movie IDs and ratings from the ratings dataframe and sort the values.

In [91]:
pearsonDF = pd.DataFrame.from_dict( pearsonCorrelationDict, orient = 'index' )
pearsonDF.columns = ['similarityIndex']
pearsonDF['userId'] = pearsonDF.index
pearsonDF['userId'] = pearsonDF['userId'].apply( lambda x : x[0] )
pearsonDF.index = range( len(pearsonDF) )
pearsonDF.head()

Unnamed: 0,similarityIndex,userId
0,-0.852803,380
1,0.944911,414
2,0.327327,551
3,-1.0,21
4,1.0,52


In [92]:
topUsers = pearsonDF.sort_values(by = 'similarityIndex', ascending = False)[0:50]
topUsers.head()

Unnamed: 0,similarityIndex,userId
8,1.0,177
4,1.0,52
17,1.0,408
15,1.0,331
7,1.0,119


In [93]:
topUsersRating = topUsers.merge( ratings_df, left_on='userId', right_on='userId', how='inner' )
topUsersRating.head()

Unnamed: 0,similarityIndex,userId,movieId,rating
0,1.0,177,1,5.0
1,1.0,177,2,3.5
2,1.0,177,7,1.0
3,1.0,177,11,3.0
4,1.0,177,16,3.0


### Weighting ratings

> Each rating is multiplied by the similarity of the user who gave it, so ratings from more similar users count for more.

In [94]:
topUsersRating['weightedRating'] = topUsersRating['similarityIndex'] * topUsersRating['rating']
topUsersRating.head()

Unnamed: 0,similarityIndex,userId,movieId,rating,weightedRating
0,1.0,177,1,5.0,5.0
1,1.0,177,2,3.5,3.5
2,1.0,177,7,1.0,1.0
3,1.0,177,11,3.0,3.0
4,1.0,177,16,3.0,3.0


In [95]:
tempTopUsersRating = topUsersRating.groupby('movieId').sum()[['similarityIndex','weightedRating']]
tempTopUsersRating.columns = ['sum_similarityIndex','sum_weightedRating']
tempTopUsersRating.head()

movieId,sum_similarityIndex,sum_weightedRating
1,7.944911,31.779645
2,5.944911,19.834734
3,2.944911,8.279645
5,2.944911,7.889822
6,3.944911,15.334734


In [96]:
recommendation_df = pd.DataFrame()
recommendation_df['weighted average recommendation score'] = tempTopUsersRating['sum_weightedRating'] / tempTopUsersRating['sum_similarityIndex']
recommendation_df['movieId'] = tempTopUsersRating.index
recommendation_df.head()

movieId,weighted average recommendation score,movieId
1,4.0,1
2,3.336422,2
3,2.811509,3
5,2.679138,5
6,3.887219,6


Code description:

• This code calculates the *weighted average score* of each movie from the *ratings of similar users* and stores it in a DataFrame.

• The second line computes the weighted average score for each movie by dividing the sum of weighted ratings (sum_weightedRating) by the sum of similarity indexes (sum_similarityIndex).

• The third line adds the *movie IDs* to recommendation_df from the index of tempTopUsersRating, so that we have *the ID of the movie for each weighted average score*.
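A worked example may help here. The sketch below follows the same steps (weight, sum per movie, divide) on a handful of made-up ratings; the similarity values and ratings are hypothetical and chosen so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical ratings by similar users, already joined with their similarity.
rated = pd.DataFrame({
    'userId':          [52, 414, 52, 414, 551],
    'movieId':         [1, 1, 2, 2, 2],
    'similarityIndex': [1.0, 0.5, 1.0, 0.5, 0.25],
    'rating':          [5.0, 2.0, 3.0, 4.0, 1.0],
})

# Weight each rating by the similarity of the user who gave it.
rated['weightedRating'] = rated['similarityIndex'] * rated['rating']

# Per-movie totals, then the weighted average.
sums = rated.groupby('movieId')[['similarityIndex', 'weightedRating']].sum()
score = sums['weightedRating'] / sums['similarityIndex']
print(score.to_dict())  # {1: 4.0, 2: 3.0}
```

For movie 1 this is (1.0 * 5 + 0.5 * 2) / (1.0 + 0.5) = 6 / 1.5 = 4.0, so the rating from the more similar user pulls the average towards 5.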

In [97]:
recommendation_df = recommendation_df.sort_values( by = 'weighted average recommendation score', ascending = False )
recommendation_df.head(10)

movieId,weighted average recommendation score,movieId
175431,5.0,175431
142020,5.0,142020
866,5.0,866
69134,5.0,69134
69529,5.0,69529
138835,5.0,138835
971,5.0,971
955,5.0,955
5650,5.0,5650
2495,5.0,2495


Now we have found the movies rated highest by similar users.
> Only one thing remains ...

### Recommended movies

In [99]:
movies_df.loc[ movies_df['movieId'].isin( recommendation_df.head(25)['movieId'].tolist() )].sort_values(by = 'year', ascending = False).reset_index()

Unnamed: 0,index,movieId,title,genres,year
0,9094,143511,Human,[Documentary],2015
1,8301,106642,"Day of the Doctor, The","[Adventure, Drama, Sci-Fi]",2013
2,7065,69529,Home,[Documentary],2009
3,7045,69134,Antichrist,"[Drama, Fantasy]",2009
4,6977,66511,Berlin Calling,"[Comedy, Drama]",2008
5,6293,47997,Idiocracy,"[Adventure, Comedy, Sci-Fi, Thriller]",2006
6,4231,6159,All the Real Girls,"[Drama, Romance]",2003
7,4358,6375,Gigantic (A Tale of Two Johns),[Documentary],2002
8,4354,6370,"Spanish Apartment, The (L'auberge espagnole)","[Comedy, Drama, Romance]",2002
9,661,866,Bound,"[Crime, Drama, Romance, Thriller]",1996
