## Implicit vs. explicit data

As mentioned in the video exercise, feedback used in recommendation engines can be explicit or implicit.

The dataset listening_history_df has been loaded for you. It contains columns identifying the users and the songs they listened to, along with:

    - Skipped Track: A Boolean column recording whether the user skipped the song or listened to it to the end.
    - Rating: The score out of 10 the user gave the song.

In this exercise, you will explore the data and use what you find to identify which column reflects explicit feedback and which reflects implicit feedback.

### Instructions 1/2
    - Inspect the first 5 rows of listening_history_df.
    - Print the number of unique values in the Rating and Skipped Track columns.
    - Display a histogram of the values in the Rating column.


In [None]:
import matplotlib.pyplot as plt  # used by plt.show() below

# Inspect the listening_history_df DataFrame
print(listening_history_df.head())

# Calculate the number of unique values
print(listening_history_df[['Rating', 'Skipped Track']].nunique())

# Display a histogram of the values in the Rating column
listening_history_df['Rating'].hist()
plt.show()
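
Since listening_history_df itself is not shown here, the following is a minimal sketch on hypothetical toy data of how the two kinds of feedback differ. The `User` and `Song` column names and all values are assumptions made for illustration. The Boolean `Skipped Track` column can only ever take two values and is inferred from behavior (implicit), while `Rating` is a score the user chose directly (explicit).

```python
import pandas as pd

# Hypothetical toy data; the real listening_history_df is not reproduced here.
toy_listening_df = pd.DataFrame({
    "User": ["u1", "u1", "u2", "u2", "u3", "u3"],
    "Song": ["s1", "s2", "s1", "s3", "s2", "s3"],
    "Skipped Track": [False, True, False, False, True, False],
    "Rating": [8, 2, 9, 7, 3, 10],
})

# A Boolean column has at most two distinct values; a rating scale has many more.
unique_counts = toy_listening_df[["Rating", "Skipped Track"]].nunique()
print(unique_counts)

# Implicit feedback is often aggregated, e.g. the share of plays skipped per song.
skip_rate = toy_listening_df.groupby("Song")["Skipped Track"].mean()
print(skip_rate)
```

In this toy data, the song that is always skipped (`s2`) is also the one with the lowest ratings, which is the kind of agreement between implicit and explicit signals you might look for in the real data.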

## Introduction to non-personalized recommendations

One of the most basic ways to make recommendations is to go with the knowledge of the crowd and recommend what is already the most popular. In this exercise, you will calculate how often each movie in the dataset has been watched and find the most frequently watched movies.

The DataFrame user_ratings_df, which is a subset of the MovieLens dataset, has been loaded for you. This table contains identifiers for each movie and the user who watched it, along with the rating they gave it.

### Instructions 1/2
    - Calculate the number of times each movie occurs in the dataset.
    - Print the titles of the top five most frequently seen movies.

In [None]:
# Get the counts of occurrences of each movie title
movie_popularity = user_ratings_df["title"].value_counts()

# Inspect the most common values
print(movie_popularity.head().index)
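
The same popularity count can be tried on a small, self-contained table. This is a sketch on hypothetical toy ratings: the `userId` column name and every value are assumptions, and only `title` and `rating` mirror the columns used above.

```python
import pandas as pd

# Hypothetical toy ratings; not taken from the real user_ratings_df.
toy_ratings_df = pd.DataFrame({
    "userId": [1, 2, 3, 1, 2, 1],
    "title": ["Movie A", "Movie A", "Movie A", "Movie B", "Movie B", "Movie C"],
    "rating": [4.0, 3.5, 4.5, 2.0, 3.0, 5.0],
})

# value_counts() returns a Series sorted by count, most frequent first.
toy_popularity = toy_ratings_df["title"].value_counts()

# The index holds the titles; tolist() turns the top entries into a plain list.
top_titles = toy_popularity.head(2).index.tolist()
print(top_titles)  # ['Movie A', 'Movie B']
```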

## Improved non-personalized recommendations

Just because a movie has been watched by a lot of people doesn't necessarily mean viewers enjoyed it. To understand how a viewer actually felt about a movie, more explicit data is useful. Thankfully, you also have ratings from each of the viewers in the MovieLens dataset.

In this exercise, you will find the average rating of each movie in the dataset, and then find the movie with the highest average rating.

You will use the same user_ratings_df as you used in the previous exercise, which has been loaded for you.

### Instructions
    - Find the average rating for each of the movies and store it as a DataFrame called average_rating_df.
    - Sort the average_rating_df DataFrame by the average rating column from highest to lowest and store it as sorted_average_ratings.
    - Print the entries for the five highest-rated movies in sorted_average_ratings.

In [None]:
# Find the mean of the ratings given to each title
average_rating_df = user_ratings_df[["title", "rating"]].groupby('title').mean()

# Order the entries by highest average rating to lowest
sorted_average_ratings = average_rating_df.sort_values(by="rating", ascending=False)

# Inspect the top movies
print(sorted_average_ratings.head())
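
It can help to look at the number of ratings next to each average. The sketch below uses hypothetical toy data (the `userId` column and all values are assumptions) and pandas named aggregation to compute both at once. A title with a single perfect score ends up first, which shows how an average on its own can mislead.

```python
import pandas as pd

# Hypothetical toy ratings; not taken from the real user_ratings_df.
toy_ratings_df = pd.DataFrame({
    "userId": [1, 2, 3, 1, 2, 1],
    "title": ["Movie A", "Movie A", "Movie A", "Movie B", "Movie B", "Movie C"],
    "rating": [4.0, 3.5, 4.5, 2.0, 3.0, 5.0],
})

# Mean rating and number of ratings per title, highest mean first.
toy_summary = (
    toy_ratings_df.groupby("title")["rating"]
    .agg(mean_rating="mean", n_ratings="count")
    .sort_values(by="mean_rating", ascending=False)
)
print(toy_summary)
```

Here "Movie C" tops the list with a mean of 5.0, but that mean rests on one rating only.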

## Combining popularity and reviews

In the past two exercises, you have used the two most common non-personalized recommendation methods to find movies to suggest. As you may have noticed, they both have their weaknesses.

Finding the most frequently watched movies shows you what has been watched, but not how people explicitly feel about it. Averaging the ratings has the opposite problem: you have viewers' explicit feedback, but for movies with only a few ratings, individual preferences skew the average.

In this exercise, you will combine the two previous methods to find the average rating only for movies that have been reviewed more than 50 times.

### Instructions 1/3
    - Generate a list of the names of the movies appearing more than 50 times in user_ratings_df and store it as popular_movies.

In [None]:
# Create a list of only the frequently watched movies
movie_popularity = user_ratings_df["title"].value_counts()
popular_movies = movie_popularity[movie_popularity > 50].index

print(popular_movies)

### Instructions 2/3
    - Filter the original user_ratings_df DataFrame by the popular_movies list to create a popular_movies_rankings DataFrame and print the results.

In [None]:
# Create a list of only movies appearing > 50 times in the dataset
movie_popularity = user_ratings_df["title"].value_counts()
popular_movies = movie_popularity[movie_popularity > 50].index

# Use this popular_movies list to filter the original DataFrame
popular_movies_rankings = user_ratings_df[user_ratings_df["title"].isin(popular_movies)]

# Inspect the movies watched over 50 times
print(popular_movies_rankings)

### Instructions 3/3
    - Find the average rating given to the frequently watched films in popular_movies_rankings and store it as popular_movies_average_rankings.
    - Print the five highest-rated entries in popular_movies_average_rankings.

In [None]:
# Create a list of only movies appearing > 50 times in the dataset
movie_popularity = user_ratings_df["title"].value_counts()
popular_movies = movie_popularity[movie_popularity > 50].index

# Use this popular_movies list to filter the original DataFrame
popular_movies_rankings = user_ratings_df[user_ratings_df["title"].isin(popular_movies)]

# Find the average rating given to these frequently watched films
popular_movies_average_rankings = popular_movies_rankings[["title", "rating"]].groupby('title').mean()
print(popular_movies_average_rankings.sort_values(by="rating", ascending=False).head())
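
The three steps above could be wrapped in one reusable function. This is only a sketch: `top_rated_popular`, its parameters `min_ratings` and `n`, and the toy data are all hypothetical names and values introduced for illustration. The function assumes the input has `title` and `rating` columns, as user_ratings_df does.

```python
import pandas as pd

def top_rated_popular(ratings_df, min_ratings=50, n=5):
    """Return the n highest mean ratings among titles rated more than min_ratings times."""
    # Step 1: count how often each title occurs and keep the frequent ones.
    counts = ratings_df["title"].value_counts()
    popular_titles = counts[counts > min_ratings].index
    # Step 2: keep only the rows belonging to those titles.
    popular_rows = ratings_df[ratings_df["title"].isin(popular_titles)]
    # Step 3: average the ratings per title and sort from highest to lowest.
    mean_ratings = popular_rows.groupby("title")["rating"].mean()
    return mean_ratings.sort_values(ascending=False).head(n)

# Hypothetical toy ratings; the threshold is lowered to 1 to suit the tiny table.
toy_ratings_df = pd.DataFrame({
    "userId": [1, 2, 3, 1, 2, 1],
    "title": ["Movie A", "Movie A", "Movie A", "Movie B", "Movie B", "Movie C"],
    "rating": [4.0, 3.5, 4.5, 2.0, 3.0, 5.0],
})
top_toy = top_rated_popular(toy_ratings_df, min_ratings=1)
print(top_toy)
```

With the threshold applied, "Movie C" and its single 5.0 rating drop out, and "Movie A" ranks first. The threshold is a judgment call: a higher value gives more reliable averages but excludes more titles.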