In [1]:
import pandas as pd
import numpy as np


Load the data into pandas DataFrames:

In [2]:
movies = pd.read_csv('movies2.csv')  # Movie information, including genres
ratings = pd.read_csv('ratings2.csv')  # Ratings by users

Merge the movies and ratings datasets to associate movies with their ratings:

In [3]:
movie_ratings = pd.merge(movies, ratings, on='movieId')
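Because `pd.merge` performs an inner join by default, movies with no ratings drop out of the merged table. The sketch below uses small made-up tables (`demo_movies` and `demo_ratings` are hypothetical names, not the notebook's data) to show that effect. It also passes `validate='one_to_many'`, which raises an error if a `movieId` appears more than once on the movies side.

```python
import pandas as pd

# Made-up tables for illustration only.
demo_movies = pd.DataFrame({'movieId': [1, 2, 3], 'title': ['A', 'B', 'C']})
demo_ratings = pd.DataFrame({'movieId': [1, 1, 2], 'rating': [4.0, 5.0, 3.0]})

# Inner join (the default): one output row per rating.
# validate='one_to_many' checks that movieId is unique in demo_movies.
demo_merged = pd.merge(demo_movies, demo_ratings, on='movieId',
                       validate='one_to_many')

# Movies that never appear in the ratings table are silently dropped.
unrated = demo_movies[~demo_movies['movieId'].isin(demo_ratings['movieId'])]
print(len(demo_merged), list(unrated['title']))
```

Dropping unrated movies is harmless here, since a movie without ratings cannot pass the minimum-ratings filter anyway.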

Group the data by movie and calculate the number of ratings (v) and the average rating (R):

In [4]:
# Calculate number of ratings (v) and average rating (R) for each movie
movie_stats = movie_ratings.groupby('movieId').agg({
    'rating': ['mean', 'count']
})
movie_stats.columns = ['R', 'v']
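As an alternative sketch, pandas named aggregation produces the `R` and `v` columns in one step, without renaming a two-level column index afterwards. The `toy` table below is made up for illustration.

```python
import pandas as pd

# Made-up ratings table for illustration only.
toy = pd.DataFrame({
    'movieId': [1, 1, 1, 2],
    'rating': [4.0, 5.0, 3.0, 2.0],
})

# Named aggregation: column R holds the mean rating, v the number of ratings.
toy_stats = toy.groupby('movieId')['rating'].agg(R='mean', v='count')
print(toy_stats)
```

The result is indexed by `movieId`, just like `movie_stats` above.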


C: the mean rating across the whole dataset.
m: the minimum number of ratings a movie needs to be considered for recommendation. A reasonable choice is the 90th percentile of the per-movie rating counts.

In [5]:
C = movie_ratings['rating'].mean()  # Mean rating across the whole dataset
m = movie_stats['v'].quantile(0.90)  # Set m to the 90th percentile of the number of ratings
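These two values feed the weighted score computed later in the notebook, v/(v+m) * R + m/(v+m) * C. The minimal sketch below shows how that score behaves: a movie with few ratings is pulled toward the global mean C, while a movie with many ratings keeps a score close to its own mean R. The function name `weighted_rating` and all numbers are hypothetical, chosen only for illustration.

```python
def weighted_rating(v, R, m, C):
    """Blend a movie's own mean rating R with the global mean C.

    v is the movie's number of ratings; m is the minimum-ratings threshold.
    """
    return v / (v + m) * R + m / (v + m) * C


# Hypothetical values, not taken from the dataset.
C_demo, m_demo = 3.0, 10

# With v == m the score is halfway between R and C.
few = weighted_rating(v=10, R=5.0, m=m_demo, C=C_demo)
# With v much larger than m the score stays close to R.
many = weighted_rating(v=1000, R=5.0, m=m_demo, C=C_demo)
print(few, many)
```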

Keep only movies with at least m ratings (v >= m), so that recommendations rest on enough ratings to be reliable.

In [6]:
# .copy() makes an independent DataFrame, so adding a column later does not trigger SettingWithCopyWarning
qualified_movies = movie_stats[movie_stats['v'] >= m].copy()

Calculate the weighted score for each movie, score = v/(v+m) * R + m/(v+m) * C:

In [7]:
qualified_movies['score'] = (qualified_movies['v'] / (qualified_movies['v'] + m) * qualified_movies['R']) + (m / (m + qualified_movies['v']) * C)


To recommend movies in a user's preferred genre (e.g., "Action"), filter the movies dataframe by genre, then merge the result with the qualified_movies dataframe to get the qualified movies in that genre.

In [8]:
genre = 'Action'
genre_movies = movies[movies['genres'].str.contains(genre, case=False)]
recommended_movies = pd.merge(genre_movies, qualified_movies, on='movieId')
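Note that `str.contains` matches substrings, so a query such as "Film" would also match "Film-Noir". Where exact genre matches matter, one option is to split the genre string and compare whole labels. This is a sketch, assuming genres are stored as a single pipe-separated string per movie (an assumption about the file format). The helper `has_genre` and the `catalogue` table are hypothetical.

```python
import pandas as pd

# Made-up catalogue; assumes pipe-separated genre strings.
catalogue = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['A', 'B', 'C'],
    'genres': ['Action|Sci-Fi', 'Film-Noir|Crime', 'Drama'],
})


def has_genre(genres, wanted):
    """Return True if `wanted` equals one of the pipe-separated labels (case-insensitive)."""
    return wanted.lower() in [g.lower() for g in genres.split('|')]


# Substring match: 'Film' hits 'Film-Noir'.
substring_hits = catalogue[catalogue['genres'].str.contains('Film', case=False)]
# Whole-label match: 'Film' is not a genre label, so nothing matches.
exact_hits = catalogue[catalogue['genres'].apply(lambda g: has_genre(g, 'Film'))]
print(len(substring_hits), len(exact_hits))
```

For a query like "Action", which is a complete label, both approaches select the same rows.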

Finally, sort the movies by their calculated weighted score in descending order and pick the top n movies to recommend.

In [9]:
top_n_recommendations = recommended_movies.sort_values('score', ascending=False).head(10)

Display the results:

In [10]:
print(top_n_recommendations[['title', 'score']])

                                                 title     score
124                                  Fight Club (1999)  4.176763
112                                 Matrix, The (1999)  4.114653
55   Star Wars: Episode V - The Empire Strikes Back...  4.082786
19           Star Wars: Episode IV - A New Hope (1977)  4.053112
57   Raiders of the Lost Ark (Indiana Jones and the...  4.038848
101                         Saving Private Ryan (1998)  4.020876
56                          Princess Bride, The (1987)  4.020047
40                                 Blade Runner (1982)  4.014919
22   Léon: The Professional (a.k.a. The Professiona...  4.003812
59   Good, the Bad and the Ugly, The (Buono, il bru...  3.996961
