Implement a popularity‐aware content‐based recommendation technique. Write a program that: 
A) Accepts the user ID as input (on the console) 
B) Displays the user profile in terms of the rated items 

In [5]:
import pandas as pd

input_user = int(input("Insert user ID: "))

# the files use "::" as separator; a multi-character separator requires the Python parsing engine
df_ratings = pd.read_table("ratings.dat", sep="::", engine="python", header=None, names=["user_id", "movie_id", "rating", "timestamp"])

df_movies = pd.read_table("movies.dat", sep="::", engine="python", header=None, names=["movie_id", "title", "genre"], encoding="ISO-8859-1")

# attach title and genre to every rating row
merged_df = pd.merge(df_ratings, df_movies, on="movie_id", how="left")

rated_movies = merged_df[merged_df.user_id == input_user]   # df with all the movies rated by our user
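As a rough illustration of parts A and B, the sketch below loads a few made-up rows in the same "::"-separated layout and prints one user's rated items. The sample rows and the helper name show_profile are assumptions for illustration only; they are not taken from ratings.dat or movies.dat.

```python
import io
import pandas as pd

# Made-up sample rows in the "::"-separated layout (illustrative only).
ratings_text = (
    "1::10::5::978300760\n"
    "1::20::3::978302109\n"
    "2::10::4::978301968\n"
)
movies_text = (
    "10::Toy Story (1995)::Animation|Children's|Comedy\n"
    "20::Heat (1995)::Action|Crime|Thriller\n"
)

# A multi-character separator needs the Python parsing engine.
ratings = pd.read_csv(io.StringIO(ratings_text), sep="::", engine="python",
                      header=None,
                      names=["user_id", "movie_id", "rating", "timestamp"])
movies = pd.read_csv(io.StringIO(movies_text), sep="::", engine="python",
                     header=None, names=["movie_id", "title", "genre"])

# Attach title and genre to every rating row.
merged = ratings.merge(movies, on="movie_id", how="left")


def show_profile(merged, user_id):
    """Return title, genre and rating of every movie rated by user_id."""
    rows = merged.loc[merged["user_id"] == user_id, ["title", "genre", "rating"]]
    return rows.reset_index(drop=True)


profile = show_profile(merged, 1)
print(profile.to_string(index=False))
```

Selecting only title, genre and rating keeps the printed profile readable; the timestamp and the IDs are still available in the merged frame if needed.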




C) Prints the top‐10 recommendations on the console. To implement the algorithm: 
   1) Create a user profile based on the genres of the movies. Count how often each movie 
     genre appeared in the set of the movies that the user has liked (i.e., when the rating is 
     greater than 3). 

In [6]:
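liked_movies = rated_movies[rated_movies.rating > 3]     # df with only the movies with rating greater than 3

# count the genres of each liked movie, then sum the counts over all liked movies
# (Series.value_counts replaces the deprecated top-level pd.value_counts)
user_profile = liked_movies.genre.apply(lambda row: pd.Series(row.split("|")).value_counts()).sum(axis=0)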

user_profile

Drama        52.0
Romance      13.0
Comedy       14.0
Action       26.0
Crime         6.0
Adventure    11.0
Mystery       1.0
Sci-Fi        8.0
War           8.0
Western       3.0
Thriller     15.0
Film-Noir     1.0
dtype: float64
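The same genre count can also be written without a per-row apply. The following is a minimal sketch on a made-up ratings table; the function name build_user_profile and the sample rows are assumptions, and the result is an integer count rather than the float count shown above.

```python
import pandas as pd


def build_user_profile(rated, threshold=3):
    """Count how often each genre occurs among the movies rated above threshold."""
    liked = rated[rated["rating"] > threshold]
    # Split "A|B|C" into lists, put one genre per row, then count the genres.
    return liked["genre"].str.split("|").explode().value_counts()


# Made-up rated movies for one user (illustrative only).
rated = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genre": ["Action|Drama", "Drama", "Comedy|Romance", "Drama|War"],
    "rating": [5, 4, 2, 4],
})

profile = build_user_profile(rated)
print(profile)
```

Movie C is rated 2, so Comedy and Romance do not enter the profile at all.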

2) Determine the similarity of each recommendable movie to this user profile. Implement a 
   simple strategy that simply determines the overlap in genres, ignoring how many movies 
   of a certain genre the user has liked. Inspect the outcomes of this recommendation 
   strategy for a few users. 

In [7]:
def similarity(movie_genres, user_profile):
    # fraction of the profile's genres that also occur in the movie (a value between 0 and 1);
    # how many movies of each genre the user has liked is deliberately ignored
    common_genres = set(movie_genres) & set(user_profile.index)
    return len(common_genres) / len(user_profile)

# rating rows of movies the user has not rated yet (one row per rating, so titles repeat)
recommendable_movies = merged_df[~merged_df.movie_id.isin(rated_movies.movie_id)].copy()

# create a new column called 'similarity' which contains the similarity between the
# user's profile and each recommendable movie
recommendable_movies["similarity"] = recommendable_movies["genre"].apply(lambda x: similarity(x.split("|"), user_profile))

sorted_movies = recommendable_movies.sort_values(by="similarity", ascending=False)

sorted_movies = sorted_movies.drop_duplicates(subset=["title"])   # keep a single row per title

top_recommendations = sorted_movies.head(10)

top_recommendations



index,user_id,movie_id,rating,timestamp,title,genre,similarity
776938,4640,2322,1,964018744,Soldier (1998),Action|Adventure|Sci-Fi|Thriller|War,0.416667
941178,5680,1264,5,958613183,Diva (1981),Action|Drama|Mystery|Romance|Thriller,0.416667
475982,2921,160,3,971670456,Congo (1995),Action|Adventure|Mystery|Sci-Fi,0.333333
679330,4070,1676,4,965455815,Starship Troopers (1997),Action|Adventure|Sci-Fi|War,0.333333
679358,4070,1197,3,965453303,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance,0.333333
216981,1317,1127,5,975213881,"Abyss, The (1989)",Action|Adventure|Sci-Fi|Thriller,0.333333
767124,4569,1391,1,964470264,Mars Attacks! (1996),Action|Comedy|Sci-Fi|War,0.333333
855407,5136,1215,4,962091166,Army of Darkness (1993),Action|Adventure|Comedy|Horror|Sci-Fi,0.333333
855404,5136,1200,5,962094529,Aliens (1986),Action|Sci-Fi|Thriller|War,0.333333
767119,4569,1377,3,964470341,Batman Returns (1992),Action|Adventure|Comedy|Crime,0.333333
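To make the overlap score concrete, here is a small worked sketch on a made-up catalogue. It scores movies from a table with one row per movie, so no duplicate titles arise and no user_id, rating or timestamp columns of other users are carried along. The names genre_overlap, catalogue and already_rated are assumptions introduced for this example.

```python
import pandas as pd


def genre_overlap(movie_genres, profile_genres):
    """Share of the profile's genres that the movie covers, between 0 and 1."""
    profile_genres = set(profile_genres)
    if not profile_genres:  # an empty profile would otherwise divide by zero
        return 0.0
    return len(set(movie_genres) & profile_genres) / len(profile_genres)


# Made-up profile genres and movie catalogue (illustrative only).
profile_genres = ["Drama", "Action", "War", "Comedy"]
catalogue = pd.DataFrame({
    "movie_id": [1, 2, 3, 4],
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genre": ["Action|Sci-Fi|War", "Horror", "Comedy|Drama|War", "Drama"],
})
already_rated = {3}  # movie IDs the user has rated

# Keep only movies the user has not rated, then score and rank them.
candidates = catalogue[~catalogue["movie_id"].isin(already_rated)].copy()
candidates["similarity"] = candidates["genre"].apply(
    lambda g: genre_overlap(g.split("|"), profile_genres)
)
ranked = candidates.sort_values("similarity", ascending=False)
print(ranked[["title", "genre", "similarity"]].to_string(index=False))
```

Movie A shares Action and War with the four profile genres, giving 2/4 = 0.5; Movie D shares only Drama (0.25); Movie B shares nothing (0.0).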


3) Extend the algorithm as follows. When recommending, remove all movies that have no 
overlap with the given user profile. Rank the remaining items based on their popularity. 
Again, test your method with a few users. 

In [8]:
def new_similarity(movie_genres, user_profile):
    # return the set of genres shared by the movie and the user profile;
    # an empty set means the movie has no overlap with the profile
    common_genres = set(movie_genres) & set(user_profile.index)
    return common_genres

# new_similarity function is applied to the genres of recommendable_movies,
# then we filter out the movies that have no genre in common with the user profile
recommendable_movies['common_genres'] = recommendable_movies['genre'].apply(lambda x: new_similarity(x.split('|'), user_profile))
recommendable_movies = recommendable_movies[recommendable_movies['common_genres'].apply(lambda x: len(x) > 0)]

# TODO: the popularity ranking is not implemented yet; a per-movie 'rating_count' column
# still has to be added to recommendable_movies before sorting on it
#sorted_movies = recommendable_movies.sort_values(by="rating_count", ascending=False)

recommendable_movies.head(10)


index,user_id,movie_id,rating,timestamp,title,genre,similarity,common_genres
2,1,914,3,978301968,My Fair Lady (1964),Musical|Romance,0.083333,{Romance}
3,1,3408,4,978300275,Erin Brockovich (2000),Drama,0.083333,{Drama}
4,1,2355,5,978824291,"Bug's Life, A (1998)",Animation|Children's|Comedy,0.083333,{Comedy}
5,1,1197,3,978302268,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance,0.333333,"{Adventure, Comedy, Action, Romance}"
6,1,1287,5,978302039,Ben-Hur (1959),Action|Adventure|Drama,0.25,"{Adventure, Action, Drama}"
7,1,2804,5,978300719,"Christmas Story, A (1983)",Comedy|Drama,0.166667,"{Comedy, Drama}"
9,1,919,4,978301368,"Wizard of Oz, The (1939)",Adventure|Children's|Drama|Musical,0.166667,"{Adventure, Drama}"
12,1,2398,4,978302281,Miracle on 34th Street (1947),Drama,0.083333,{Drama}
13,1,2918,4,978302124,Ferris Bueller's Day Off (1986),Comedy,0.083333,{Comedy}
15,1,2791,4,978302188,Airplane! (1980),Comedy,0.083333,{Comedy}
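The output above is still unsorted, because the popularity step is left open in the cell. One possible way to complete it is sketched below on made-up data, under the assumption that "popularity" means the number of ratings a movie has received (other definitions, such as the mean rating, are equally plausible). The function name recommend_by_popularity and the sample tables are hypothetical.

```python
import pandas as pd


def recommend_by_popularity(ratings, movies, user_id, profile_genres, n=10):
    """Drop movies without genre overlap, then rank the rest by number of ratings."""
    profile_genres = set(profile_genres)
    seen = set(ratings.loc[ratings["user_id"] == user_id, "movie_id"])
    candidates = movies[~movies["movie_id"].isin(seen)]

    # Keep only movies that share at least one genre with the profile.
    overlap = candidates["genre"].apply(
        lambda g: bool(set(g.split("|")) & profile_genres)
    ).astype(bool)
    candidates = candidates[overlap].copy()

    # Assumed popularity measure: how many ratings each movie received.
    counts = ratings.groupby("movie_id").size()
    candidates["rating_count"] = (
        candidates["movie_id"].map(counts).fillna(0).astype(int)
    )
    return candidates.sort_values("rating_count", ascending=False).head(n)


# Made-up ratings and movie catalogue (illustrative only).
ratings = pd.DataFrame({
    "user_id":  [1, 2, 2, 2, 3, 3, 4, 4],
    "movie_id": [1, 2, 3, 4, 2, 3, 2, 5],
    "rating":   [5, 4, 3, 2, 5, 4, 4, 3],
})
movies = pd.DataFrame({
    "movie_id": [1, 2, 3, 4, 5],
    "title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
    "genre": ["Drama", "Drama|War", "Comedy", "Horror", "Action|Drama"],
})

top = recommend_by_popularity(ratings, movies, user_id=1, profile_genres=["Drama"])
print(top[["title", "genre", "rating_count"]].to_string(index=False))
```

Movie C has two ratings but no Drama, so it is filtered out before ranking; Movie B (three ratings) ends up ahead of Movie E (one rating).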
