### About Dataset

https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset

**The MoviesLens Dataset** describe ratings and free-text tagging activities from MovieLens, a movie recommendation service. It contains 20000263 ratings and 465564 tag applications across 27278 movies. These data were created by 138493 users between January 09, 1995 and March 31, 2015. This dataset was generated on October 17, 2016.

Users were selected at random for inclusion. All selected users had rated at least 20 movies.

<br>

<hr>

### Import Libraries

In [1]:
import pandas as pd

<br>

<hr>

### Functions

In [6]:
def create_user_movie_df(movie,rating):
    df = movie.merge(rating, how="left", on="movieId")
    comment_counts = pd.DataFrame(df["title"].value_counts())
    rare_movies = comment_counts[comment_counts["title"] <= 1000].index
    common_movies = df[~df["title"].isin(rare_movies)]
    user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")
    return user_movie_df

def user_based_recommender(random_user, user_movie_df, ratio=60, cor_th=0.65, score=3.5):
    random_user_df = user_movie_df[user_movie_df.index == random_user]
    movies_watched = random_user_df.columns[random_user_df.notna().any()].tolist()
    movies_watched_df = user_movie_df[movies_watched]
    user_movie_count = movies_watched_df.T.notnull().sum()
    user_movie_count = user_movie_count.reset_index()
    user_movie_count.columns = ["userId", "movie_count"]
    perc = len(movies_watched) * ratio / 100
    users_same_movies = user_movie_count[user_movie_count["movie_count"] > perc]["userId"]

    final_df = pd.concat([movies_watched_df[movies_watched_df.index.isin(users_same_movies)],
                          random_user_df[movies_watched]])

    corr_df = final_df.T.corr().unstack().sort_values().drop_duplicates()
    corr_df = pd.DataFrame(corr_df, columns=["corr"])
    corr_df.index.names = ['user_id_1', 'user_id_2']
    corr_df = corr_df.reset_index()

    top_users = corr_df[(corr_df["user_id_1"] == random_user) & (corr_df["corr"] >= cor_th)][
        ["user_id_2", "corr"]].reset_index(drop=True)

    top_users = top_users.sort_values(by='corr', ascending=False)
    top_users.rename(columns={"user_id_2": "userId"}, inplace=True)
    rating = pd.read_csv('dataset/rating.csv')
    top_users_ratings = top_users.merge(rating[["userId", "movieId", "rating"]], how='inner')
    top_users_ratings['weighted_rating'] = top_users_ratings['corr'] * top_users_ratings['rating']

    recommendation_df = top_users_ratings.groupby('movieId').agg({"weighted_rating": "mean"})
    recommendation_df = recommendation_df.reset_index()

    movies_to_be_recommend = recommendation_df[recommendation_df["weighted_rating"] > score].sort_values("weighted_rating", ascending=False)
    movie = pd.read_csv('dataset/movie.csv')
    return movies_to_be_recommend.merge(movie[["movieId", "title"]])

<br>

<hr>

### Read Dataset 

In [4]:
movie = pd.read_csv('dataset/movie.csv')
rating = pd.read_csv('dataset/rating.csv')

In [7]:
user_movie_df = create_user_movie_df(movie,rating)

random_user = int(pd.Series(user_movie_df.index).sample(1).values)

user_based_recommender(random_user, user_movie_df, cor_th=0.70, score=4)

Unnamed: 0,movieId,weighted_rating,title
0,4223,5.0,Enemy at the Gates (2001)
1,277,4.5,Miracle on 34th Street (1994)
2,1036,4.5,Die Hard (1988)
3,1285,4.5,Heathers (1989)
4,1777,4.5,"Wedding Singer, The (1998)"
5,1961,4.5,Rain Man (1988)
6,4698,4.5,Orphans (1997)
7,5620,4.5,Sweet Home Alabama (2002)
