### Overview of Library Imports

The project relies on the following Python libraries:

- **pandas, numpy**: numerical calculations and data manipulation.
- **os, random**: file system access and generation of random numbers.
- **joblib**: saving/loading large objects and models.
- **TfidfVectorizer, cosine_similarity**: text feature extraction and similarity measurement.
- **MinMaxScaler**: rescales numeric features to a fixed range (0 to 1 by default).
- **ndcg_score**: measures recommendation ranking quality.
- **Surprise (SVD, Dataset, Reader, accuracy)**: collaborative filtering and evaluation tools.
- **train_test_split (Surprise)**: splits the data into training and test sets.
- **defaultdict**: a dictionary that supplies a default value for missing keys.
- **matplotlib, seaborn**: libraries for data visualization.

In [1]:
import pandas as pd
import numpy as np
import random
import os
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import ndcg_score
from surprise import SVD, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split
from collections import defaultdict
import matplotlib.pyplot as plt
import seaborn as sns
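
As a rough illustration of how two of these imports fit together, the sketch below vectorizes a few genre strings with TF-IDF and compares them with cosine similarity. The genre strings are made-up toy data, not taken from the IMDb files, and the variable names are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: genre strings for three made-up titles.
genre_text = [
    "Action Adventure",
    "Action Thriller",
    "Romance Drama",
]

# Turn each string into a TF-IDF weighted term vector.
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(genre_text)  # sparse, shape (3, n_terms)

# Pairwise cosine similarity between all rows, shape (3, 3).
similarity = cosine_similarity(tfidf_matrix)
```

The first two strings share the term "action", so they should score higher with each other than with the third, which shares no terms with either.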

Creates a **models** folder if it doesn't already exist; **exist_ok=True** prevents an error when the folder is already there.

In [2]:
os.makedirs("models", exist_ok=True)
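
One way the **models** folder might be used with joblib is sketched below. The helper names `save_artifact` and `load_artifact`, the `.joblib` file extension, and the demo object are assumptions for illustration; the notebook may organize its saved files differently.

```python
import os
import joblib


def save_artifact(obj, name, folder="models"):
    """Hypothetical helper: write obj to <folder>/<name>.joblib and return the path."""
    os.makedirs(folder, exist_ok=True)  # safe to call repeatedly
    path = os.path.join(folder, f"{name}.joblib")
    joblib.dump(obj, path)
    return path


def load_artifact(name, folder="models"):
    """Hypothetical helper: read back an object written by save_artifact."""
    return joblib.load(os.path.join(folder, f"{name}.joblib"))


# Round-trip a small placeholder object.
path = save_artifact({"alpha": 0.5}, "demo_config")
restored = load_artifact("demo_config")
```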

Loads IMDb datasets (title.basics, title.ratings, name.basics) from TSV files into pandas DataFrames for further processing.


In [3]:
print("[INFO] Loading IMDb datasets...")
title_basics = pd.read_csv("title.basics.tsv", sep="\t", low_memory=False)
title_ratings = pd.read_csv("title.ratings.tsv", sep="\t", low_memory=False)
name_basics = pd.read_csv("name.basics.tsv", sep="\t", low_memory=False)

[INFO] Loading IMDb datasets...
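
IMDb's TSV files mark missing fields with the literal string `\N`, which pandas does not treat as missing by default. The sketch below, using a made-up two-row sample rather than real IMDb data, shows how passing `na_values` to `pd.read_csv` might change what `isna()` and `dropna()` see. Whether the notebook should load its files this way is a judgment call that depends on the later steps.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the title.basics layout (not real IMDb data).
sample_tsv = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\tgenres\n"
    "tt0000001\tmovie\tExample One\t1999\tDrama\n"
    "tt0000002\tmovie\tExample Two\t\\N\t\\N\n"
)

# Default parsing keeps "\N" as an ordinary string.
raw = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Declaring "\N" as a missing-value marker converts it to NaN.
parsed = pd.read_csv(io.StringIO(sample_tsv), sep="\t", na_values="\\N")

raw_missing = int(raw["startYear"].isna().sum())
parsed_missing = int(parsed["startYear"].isna().sum())
```

In this sample, `raw_missing` stays at zero while `parsed_missing` counts the one placeholder row.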


- Filters **title_basics** to include only rows where the title type is **movie**.
- Selects key columns from **title_basics**: **tconst**, **titleType**, **primaryTitle**, **startYear**, and **genres**.
- Selects key columns from **title_ratings**: **tconst**, **averageRating**, and **numVotes**.
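- Keeps **nconst**, **primaryName**, and **knownForTitles** from **name_basics** and drops rows with missing values. Because the files were loaded without **na_values**, only true NaN cells are dropped; IMDb's **\N** placeholders remain as strings.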


In [4]:
title_basics = title_basics[title_basics["titleType"] == "movie"]
title_basics = title_basics[["tconst", "titleType", "primaryTitle", "startYear", "genres"]]
title_ratings = title_ratings[["tconst", "averageRating", "numVotes"]]
name_basics = name_basics[["nconst", "primaryName", "knownForTitles"]].dropna()
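
The filter-then-select pattern above can be checked on a small frame before running it on the full files. The sketch below uses made-up rows and combines the row filter and column selection in a single `.loc` call; the `reset_index` step is an optional addition and not part of the original cell.

```python
import pandas as pd

# Hypothetical toy rows in the title.basics layout (not real IMDb data).
toy_basics = pd.DataFrame({
    "tconst": ["tt01", "tt02", "tt03"],
    "titleType": ["movie", "short", "movie"],
    "primaryTitle": ["A", "B", "C"],
    "startYear": ["1999", "2001", "2010"],
    "genres": ["Drama", "Comedy", "Action,Thriller"],
    "runtimeMinutes": ["90", "12", "110"],
})

keep = ["tconst", "titleType", "primaryTitle", "startYear", "genres"]

# Keep only movie rows and the selected columns in one step.
movies = toy_basics.loc[toy_basics["titleType"] == "movie", keep]
movies = movies.reset_index(drop=True)  # optional: renumber rows from 0
```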