
# 1: Fetch Genre List from TMDB

This step fetches the genre list from the TMDB API and builds a mapping from genre IDs to genre names. Step 3 uses this mapping to convert the genre IDs in each movie record into readable names.

In [2]:
import requests

def get_genres(api_key):
    """Return a {genre_id: genre_name} dict from TMDB, or None if the request fails."""
    url = f"https://api.themoviedb.org/3/genre/movie/list?api_key={api_key}&language=en-US"
    response = requests.get(url, timeout=10)  # avoid hanging on a stalled connection
    if response.status_code != 200:
        return None
    genres = response.json()['genres']
    return {genre['id']: genre['name'] for genre in genres}

# Example usage
# Read the key from the environment instead of hard-coding a secret in the notebook
import os
api_key = os.environ['TMDB_API_KEY']
genre_map = get_genres(api_key)
print(genre_map)


{28: 'Action', 12: 'Adventure', 16: 'Animation', 35: 'Comedy', 80: 'Crime', 99: 'Documentary', 18: 'Drama', 10751: 'Family', 14: 'Fantasy', 36: 'History', 27: 'Horror', 10402: 'Music', 9648: 'Mystery', 10749: 'Romance', 878: 'Science Fiction', 10770: 'TV Movie', 53: 'Thriller', 10752: 'War', 37: 'Western'}
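The function above indexes the payload with ['genres'] directly, so an unexpected response body would raise a KeyError. A minimal sketch of a more tolerant parser follows, assuming the payload shape implied by the code above. The helper name build_genre_map and the sample payload are illustrative and not part of the original notebook.

```python
def build_genre_map(payload):
    """Turn a TMDB-style genre payload into {id: name}.

    Assumes the shape {'genres': [{'id': int, 'name': str}, ...]}.
    Returns an empty dict when the payload is missing or has no
    'genres' key, so later .get() lookups still work.
    """
    genres = payload.get('genres', []) if isinstance(payload, dict) else []
    # Skip malformed entries that lack either field
    return {g['id']: g['name'] for g in genres if 'id' in g and 'name' in g}


# Hand-written payload in the same shape as the API response (illustrative)
sample_payload = {'genres': [{'id': 18, 'name': 'Drama'}, {'id': 80, 'name': 'Crime'}]}
sample_map = build_genre_map(sample_payload)
print(sample_map)
```

Returning an empty dict rather than None is a design choice for this sketch: downstream code that calls .get() on the mapping keeps running, at the cost of silently producing empty genre lists.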


# 2: Fetch Movie Details from TMDB using IMDb ID

This step fetches movie details from TMDB using an IMDb ID, the unique identifier a movie has in the IMDb database. TMDB accepts this ID as an external source, and the response includes the movie's title, genre IDs, rating, and popularity.

In [3]:
def get_movie_details(imdb_id, api_key):
    """Look up a title by IMDb ID; return the parsed JSON response, or None on failure."""
    url = f"https://api.themoviedb.org/3/find/{imdb_id}?api_key={api_key}&external_source=imdb_id"
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    else:
        return None

# Example usage
imdb_id = 'tt0111161'  # Example IMDb ID for The Shawshank Redemption
movie_details = get_movie_details(imdb_id, api_key)
print(movie_details)


{'movie_results': [{'backdrop_path': '/avedvodAZUcwqevBfm8p4G2NziQ.jpg', 'id': 278, 'title': 'The Shawshank Redemption', 'original_title': 'The Shawshank Redemption', 'overview': 'Imprisoned in the 1940s for the double murder of his wife and her lover, upstanding banker Andy Dufresne begins a new life at the Shawshank prison, where he puts his accounting skills to work for an amoral warden. During his long stretch in prison, Dufresne comes to be admired by the other inmates -- including an older prisoner named Red -- for his integrity and unquenchable sense of hope.', 'poster_path': '/9cqNxx0GxF0bflZmeSMuL5tnGzr.jpg', 'media_type': 'movie', 'adult': False, 'original_language': 'en', 'genre_ids': [18, 80], 'popularity': 227.544, 'release_date': '1994-09-23', 'video': False, 'vote_average': 8.705, 'vote_count': 26561}], 'person_results': [], 'tv_results': [], 'tv_episode_results': [], 'tv_season_results': []}
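As the output shows, the lookup returns several result lists, and the movie itself sits inside 'movie_results'. A small sketch of pulling out the first match safely is below. The helper name first_movie_result is hypothetical, and the sample response is a trimmed copy of the output above.

```python
def first_movie_result(find_response):
    """Return the first entry of 'movie_results', or None if there is none."""
    if not find_response:
        return None  # the request failed and returned None
    results = find_response.get('movie_results') or []
    return results[0] if results else None


# Trimmed version of the response printed above
sample_response = {
    'movie_results': [
        {'id': 278, 'title': 'The Shawshank Redemption', 'genre_ids': [18, 80]}
    ],
    'person_results': [],
    'tv_results': [],
}
movie = first_movie_result(sample_response)
print(movie['title'])
```

Checking for an empty list matters because an IMDb ID that TMDB does not recognise may still produce a successful response with no entries in 'movie_results'.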


# 3: Preprocess Movie Data

In this step, we preprocess the raw movie data: we keep only the relevant fields, convert genre IDs to genre names using the genre_map created in Step 1, and collect the results in a DataFrame for further analysis.

In [4]:
import pandas as pd

def preprocess_movie_data(movie_details, genre_map):
    if 'movie_results' not in movie_details:
        print("Key 'movie_results' not found in movie_details")
        return pd.DataFrame()

    movies = []
    for result in movie_details['movie_results']:
        movie = {
            'id': result.get('id'),
            'title': result.get('title'),
            'genres': [genre_map.get(genre_id) for genre_id in result.get('genre_ids', [])],
            'rating': result.get('vote_average'),
            'popularity': result.get('popularity')
        }
        movies.append(movie)
    return pd.DataFrame(movies)

# Example usage
movie_data = preprocess_movie_data(movie_details, genre_map)
print(movie_data.head())


    id                     title          genres  rating  popularity
0  278  The Shawshank Redemption  [Drama, Crime]   8.705     227.544
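Note that genre_map.get(genre_id) returns None for an ID missing from the mapping, so a genre list can end up containing None entries. The sketch below shows one way to skip unknown IDs instead. The helper name ids_to_names and the ID 99999 are made up for illustration.

```python
def ids_to_names(genre_ids, genre_map):
    """Map genre IDs to names, skipping IDs that genre_map does not contain."""
    return [genre_map[g] for g in genre_ids if g in genre_map]


partial_map = {18: 'Drama', 80: 'Crime'}
# 99999 is a deliberately unknown ID used only for this example
names = ids_to_names([18, 80, 99999], partial_map)
print(names)
```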


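# 4: Fetch Multiple Movies

This step pages through TMDB's popular-movies endpoint until num_movies records are collected, applying the same field selection and genre mapping as Step 3. The user-movie matrix itself is built in Step 5.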

In [5]:
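def fetch_movies(api_key, genre_map, num_movies=10):
    """Fetch up to num_movies popular movies from TMDB as a DataFrame."""
    movies = []
    # The page count assumes 20 results per page; the breaks below stop early once enough are collected
    for page in range(1, num_movies // 20 + 2):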
        url = f"https://api.themoviedb.org/3/movie/popular?api_key={api_key}&language=en-US&page={page}"
        response = requests.get(url)
        if response.status_code == 200:
            results = response.json()['results']
            for result in results:
                movie = {
                    'id': result.get('id'),
                    'title': result.get('title'),
                    'genres': [genre_map.get(genre_id) for genre_id in result.get('genre_ids', [])],
                    'rating': result.get('vote_average'),
                    'popularity': result.get('popularity')
                }
                movies.append(movie)
                if len(movies) >= num_movies:
                    break
        if len(movies) >= num_movies:
            break
    return pd.DataFrame(movies)

# Example usage
num_movies = 10  # Number of movies to fetch
movie_data = fetch_movies(api_key, genre_map, num_movies)
print(movie_data.head())


        id                   title                                  genres  \
0   533535    Deadpool & Wolverine       [Action, Comedy, Science Fiction]   
1   762441  A Quiet Place: Day One     [Horror, Science Fiction, Thriller]   
2   573435   Bad Boys: Ride or Die       [Action, Crime, Thriller, Comedy]   
3  1022789            Inside Out 2  [Animation, Family, Adventure, Comedy]   
4   519182         Despicable Me 4     [Animation, Family, Comedy, Action]   

   rating  popularity  
0   7.900   23022.328  
1   7.006    6197.201  
2   7.656    5462.371  
3   7.632    4449.143  
4   7.192    3557.455  
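The loop bound num_movies // 20 + 2 can allow one page more than needed (for example, two pages when exactly 20 movies are requested), and the function relies on the break statements to stop early. A sketch of computing the exact page count with ceiling division is below. The helper name pages_needed is hypothetical, and the page size of 20 is an assumption taken from the constant in the original loop.

```python
import math


def pages_needed(num_movies, page_size=20):
    """Number of pages required to collect num_movies results."""
    if num_movies <= 0:
        return 0
    return math.ceil(num_movies / page_size)


for wanted in (10, 20, 21, 45):
    print(wanted, pages_needed(wanted))
```

With this helper, the loop could iterate over range(1, pages_needed(num_movies) + 1) and would never request a page it does not need.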


# 5: Create User-Movie Matrix

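In this step, we create a user-movie matrix, which is essential for collaborative filtering. The matrix holds user ratings, with users as rows and movies as columns. Because the notebook has no real rating data, the function below uses a small hard-coded set of sample ratings for three users, and unrated movies are filled with 0.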

In [6]:
def create_user_movie_matrix(movie_data):
    if 'id' not in movie_data.columns:
        print("Column 'id' not found in movie_data")
        return pd.DataFrame()

    # Hard-coded sample ratings: three users, two ratings each, over the first six movies
    user_ratings = {
        'user_id': [1, 2, 1, 3, 2, 3],
        'movie_id': movie_data['id'].iloc[:6].tolist(),
        'rating': [5, 4, 3, 2, 3, 4]
    }
    ratings_df = pd.DataFrame(user_ratings)

    # Pivot to one row per user and one column per movie; unrated cells become 0
    user_movie_matrix = ratings_df.pivot_table(index='user_id', columns='movie_id', values='rating').fillna(0)
    return user_movie_matrix

user_movie_matrix = create_user_movie_matrix(movie_data)
print(user_movie_matrix.head())


movie_id  519182   533535   573435   762441   799583   1022789
user_id                                                       
1             0.0      5.0      3.0      0.0      0.0      0.0
2             3.0      0.0      0.0      4.0      0.0      0.0
3             0.0      0.0      0.0      0.0      4.0      2.0
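Filling missing cells with 0 makes "not rated" indistinguishable from a rating of 0. The sketch below builds the same kind of matrix from made-up ratings and keeps the gaps as NaN alongside the zero-filled version, so the difference is visible. All IDs and ratings here are illustrative.

```python
import pandas as pd

# Made-up ratings: users 1 and 2 both rated movie 101
ratings = pd.DataFrame({
    'user_id':  [1, 1, 2, 3],
    'movie_id': [101, 102, 101, 103],
    'rating':   [5, 3, 4, 2],
})

# Without fillna, unrated cells stay as NaN
matrix_with_gaps = ratings.pivot_table(index='user_id', columns='movie_id', values='rating')
# Zero-filled version, as used in the notebook
dense_matrix = matrix_with_gaps.fillna(0)
# Count how many movies each user actually rated
rated_counts = matrix_with_gaps.notna().sum(axis=1)

print(matrix_with_gaps)
print(rated_counts)
```

Keeping the NaN version around is useful later, for example to tell which movies a user has not seen yet.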


# 6: Calculate Similarity

This step calculates the similarity between users using the user-movie matrix. We will use cosine similarity, a common metric for calculating similarity in collaborative filtering systems.

In [7]:
from sklearn.metrics.pairwise import cosine_similarity

def calculate_similarity(user_movie_matrix):
    cosine_sim = cosine_similarity(user_movie_matrix)
    return cosine_sim

cosine_sim = calculate_similarity(user_movie_matrix)
print(cosine_sim)


[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
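The result is an identity matrix because, in the sample ratings, no two users rated the same movie: every pair of distinct rows has a zero dot product, so each user is similar only to themselves. Cosine similarity between rating vectors u and v is their dot product divided by the product of their lengths. The sketch below computes it directly with NumPy (not scikit-learn) on made-up ratings where two users overlap. The function name cosine_sim_matrix is hypothetical.

```python
import numpy as np


def cosine_sim_matrix(matrix):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    m = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for users with no ratings
    unit = m / norms         # scale every row to unit length
    return unit @ unit.T     # dot products of unit vectors are the cosines


# Made-up ratings, one row per user and one column per movie
toy_ratings = np.array([
    [5, 3, 0],  # user A
    [4, 0, 0],  # user B shares the first movie with A
    [0, 0, 2],  # user C shares no movie with A or B
])
sim = cosine_sim_matrix(toy_ratings)
print(np.round(sim, 3))
```

Users A and B get a non-zero similarity because they overlap on one movie, while user C stays at zero against both.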


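# 7: Recommend Movies

This step recommends movies for a given user. It selects the n users most similar to the target user, averages their ratings for each movie, and returns the movies with the highest average.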

In [8]:
def recommend_movies(user_id, user_movie_matrix, cosine_sim, n=10):
    if user_id not in user_movie_matrix.index:
        print(f"User ID {user_id} not found in user_movie_matrix")
        return pd.Series(dtype=float)

    user_index = user_movie_matrix.index.get_loc(user_id)
    # Pair every other user's row position with its similarity to the target user
    sim_scores = [(i, score) for i, score in enumerate(cosine_sim[user_index]) if i != user_index]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:n]  # top n similar users
    similar_user_indices = [i for i, _ in sim_scores]
    similar_users = user_movie_matrix.iloc[similar_user_indices]
    # Average the similar users' ratings per movie; unrated movies count as 0.
    # Movies the target user has already rated are not filtered out here.
    movie_recommendations = similar_users.mean(axis=0).sort_values(ascending=False)
    return movie_recommendations.head(n)

# Example usage
user_id = 1  # Example user ID
recommendations = recommend_movies(user_id, user_movie_matrix, cosine_sim)
print(recommendations)


movie_id
762441     2.0
799583     2.0
519182     1.5
1022789    1.0
533535     0.0
573435     0.0
dtype: float64
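The list above includes movies 533535 and 573435, which user 1 has already rated, and it weights every other user equally even though their similarity to user 1 is 0. The sketch below is one possible variant, not the notebook's method: it weights each neighbour's ratings by similarity and drops movies the user has already rated. The function name recommend_unseen, the toy matrix, and the hand-written similarity values are all illustrative assumptions.

```python
import numpy as np
import pandas as pd


def recommend_unseen(user_id, user_movie_matrix, sim, n=10):
    """Similarity-weighted scores for movies the user has not rated (0 = unrated)."""
    user_pos = user_movie_matrix.index.get_loc(user_id)
    weights = np.array(sim[user_pos], dtype=float)
    weights[user_pos] = 0.0  # ignore the user's own ratings
    total = weights.sum()
    if total == 0:
        return pd.Series(dtype=float)  # no similar users to learn from
    # Weighted average of the other users' ratings for every movie
    values = weights @ user_movie_matrix.to_numpy(dtype=float) / total
    scores = pd.Series(values, index=user_movie_matrix.columns)
    unseen = user_movie_matrix.loc[user_id] == 0
    return scores[unseen].sort_values(ascending=False).head(n)


# Made-up ratings and similarities for three users and three movies
toy = pd.DataFrame([[5, 3, 0], [4, 0, 2], [0, 0, 4]],
                   index=[1, 2, 3], columns=[101, 102, 103])
toy_sim = np.array([[1.0, 0.8, 0.0],
                    [0.8, 1.0, 0.3],
                    [0.0, 0.3, 1.0]])
recs = recommend_unseen(1, toy, toy_sim)
print(recs)
```

With the identity similarity matrix from Step 6, this variant would return an empty result for every user, which reflects that the sample ratings contain no overlap to learn from.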
