# Content-Based Movie Recommender
**Goal:** Build a system that, given a movie you like, recommends the most similar movies  
### How it works

- Load the engineered features  

- Calculate similarity between all movies  

- Given a movie, find the most similar ones

- Return top N recommendations  
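
The four steps above can be sketched end to end on a tiny toy dataset. This is a minimal illustration, assuming made-up titles and 3-dimensional features (not the real 973-dimensional embeddings); the names `toy_titles`, `toy_X` and the helper `recommend` are hypothetical and chosen so they do not clash with the notebook's own variables.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: 4 movies described by 3 made-up features
toy_titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_X = np.array([
    [1.0, 0.0, 1.0],
    [0.9, 0.1, 1.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.2],
])

# Steps 1-2: take the feature matrix and compare every movie with every other (4 x 4)
toy_sim = cosine_similarity(toy_X)

def recommend(title, n=2):
    """Steps 3-4 (hypothetical helper): return the n titles most similar to `title`."""
    i = toy_titles.index(title)
    # Indices sorted by descending similarity, skipping the movie itself
    order = [j for j in np.argsort(-toy_sim[i]) if j != i]
    return [toy_titles[j] for j in order[:n]]

print(recommend("Movie A", n=2))  # ['Movie B', 'Movie D']
```

"Movie B" points in almost the same direction as "Movie A", so it ranks first; "Movie C" shares no non-zero feature with "Movie A" and scores 0.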


### 1. Load the engineered features  

```python
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported!")

features_df = pd.read_csv('../date/processed/movies_features.csv')
titles_df   = pd.read_csv('../date/processed/movie_titles.csv')

X_features = features_df.values  # convert the DataFrame into a NumPy feature matrix
titles = titles_df['Title'].tolist()

print("Loaded data successfully!")
print(f"Shape of feature matrix: {X_features.shape[0]} movies × {X_features.shape[1]} embedding features")
print("\nFirst 5 movie titles:")
print(titles[:5])
```

```
Libraries imported!
Loaded data successfully!
Shape of feature matrix: 4320 movies × 973 embedding features

First 5 movie titles:
['Parasite', 'Interstellar', 'Barbie', 'Fight Club', 'La La Land']
```
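
Titles are matched to feature rows purely by position, so it can be worth checking that the two CSV files line up before computing any similarities. A minimal sketch, assuming both files hold one row per movie in the same order; `check_alignment` is a hypothetical name, and the toy DataFrames stand in for `features_df` and `titles_df`.

```python
import numpy as np
import pandas as pd

def check_alignment(features_df, titles_df):
    """Hypothetical sanity check for the loaded feature and title tables."""
    # Each feature row must correspond to exactly one title
    if len(features_df) != len(titles_df):
        raise ValueError(
            f"{len(features_df)} feature rows vs {len(titles_df)} titles"
        )
    # Duplicate titles would make similarity_df.loc[title] return several rows
    dupes = titles_df['Title'][titles_df['Title'].duplicated()].tolist()
    # Count missing values that np.nan_to_num will later replace with 0
    n_missing = int(features_df.isna().sum().sum())
    return {"duplicate_titles": dupes, "missing_values": n_missing}

# Toy stand-ins for the real CSV contents
toy_features = pd.DataFrame({"f1": [0.1, np.nan, 0.3], "f2": [1.0, 0.5, np.nan]})
toy_titles_df = pd.DataFrame({"Title": ["Parasite", "Barbie", "Barbie"]})
print(check_alignment(toy_features, toy_titles_df))
# → {'duplicate_titles': ['Barbie'], 'missing_values': 2}
```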


### 2. Calculate similarity between all movies  

```python
X_features = np.nan_to_num(X_features)  # replace NaN with 0 (and ±inf with large finite values)

similarity_matrix = cosine_similarity(X_features)  # pairwise cosine similarity between all movies

print(f"Similarity matrix shape: {similarity_matrix.shape}")
print(f"   → {similarity_matrix.shape[0]} movies × {similarity_matrix.shape[1]} movies")

similarity_df = pd.DataFrame(similarity_matrix, index=titles, columns=titles)

print("\nExample: How similar is a movie to the others?")
title = titles[28]
print(f"\nMovie: {title}")
print(similarity_df.loc[title].sort_values(ascending=False).head(6))
```

```
Similarity matrix shape: (4320, 4320)
   → 4320 movies × 4320 movies

Example: How similar is a movie to the others?

Movie: The Wolf of Wall Street
The Wolf of Wall Street    1.000000
Wall Street                0.502868
Gangs of New York          0.350599
Wolfs                      0.331385
Scarface                   0.330610
Mean Streets               0.312477
Name: The Wolf of Wall Street, dtype: float64
```
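
Cosine similarity compares the direction of two feature vectors and ignores their length: cos(a, b) = a·b / (‖a‖ ‖b‖). The sketch below, on random toy data, reproduces `cosine_similarity` with plain NumPy by scaling each row to unit length and taking dot products. It also shows a side effect worth knowing: an all-zero row (for example a movie whose features were all NaN before `np.nan_to_num`) gets similarity 0 with every movie, itself included. `cosine_matrix` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine_matrix(X):
    """Row-wise cosine similarity: scale each row to unit length, then take dot products."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # leave all-zero rows as zeros instead of dividing by 0
    X_unit = X / norms
    return X_unit @ X_unit.T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))          # toy data: 5 "movies" x 8 features
X[4] = 0.0                           # a movie with no usable features

S = cosine_matrix(X)
print(np.allclose(S, cosine_similarity(X)))   # True
print(S[4])                                   # a row of zeros
```

The full matrix grows quadratically: for 4,320 movies it holds 4320² ≈ 18.7 million float64 entries, roughly 149 MB.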


### 3. Return top N recommendations  

```python
def get_recommendations(movie_title, n_recommendations=10):
    """Return the n most similar movies to `movie_title` as a ranked DataFrame."""
    if movie_title not in similarity_df.index:
        print(f"Error: '{movie_title}' not found in the database!")
        return None

    similarity_scores = similarity_df.loc[movie_title]
    # Sort by similarity and skip the first entry, which is the movie itself
    top_similar = similarity_scores.sort_values(ascending=False).iloc[1:n_recommendations+1]

    recommendations = pd.DataFrame({
        'Rank': range(1, len(top_similar) + 1),
        'Movie': top_similar.index,
        'Similarity Score': top_similar.values,
        'Match %': (top_similar.values * 100).round(1)
    })

    return recommendations
```
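
`get_recommendations` reads one row of the precomputed 4320 × 4320 `similarity_df`. If only the feature matrix is kept in memory, the same ranking can be obtained by computing a single row of similarities on demand. A minimal sketch, assuming `X` is the NaN-free feature matrix and `titles` the aligned list of names; `recommend_from_features` is a hypothetical function, and it uses `np.argpartition` so that only the top k scores are fully sorted.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def recommend_from_features(movie_title, X, titles, n_recommendations=10):
    """Hypothetical variant of get_recommendations that never builds the full matrix."""
    idx = titles.index(movie_title)                    # position of the query movie
    scores = cosine_similarity(X[idx:idx + 1], X)[0]   # one row: query vs all movies
    scores[idx] = -np.inf                              # exclude the query movie itself
    k = min(n_recommendations, len(titles) - 1)
    top = np.argpartition(-scores, k - 1)[:k]          # unordered top-k indices
    top = top[np.argsort(-scores[top])]                # order them by descending score
    return pd.DataFrame({
        'Rank': range(1, k + 1),
        'Movie': [titles[i] for i in top],
        'Similarity Score': scores[top],
        'Match %': (scores[top] * 100).round(1),
    })

# Toy example (hypothetical data)
toy_titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_X = np.array([[1.0, 0.0, 1.0], [0.9, 0.1, 1.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.2]])
print(recommend_from_features("Movie A", toy_X, toy_titles, n_recommendations=2))
```

This trades memory for repeated computation: each query costs one pass over the feature matrix instead of a lookup in a precomputed table.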


```python
def search_movie(keyword):
    keyword = keyword.lower()
    matches = [title for title in titles if keyword in title.lower()]

    if len(matches) == 0:
        print(f"No movies found containing '{keyword}'.")
        return None

    print(f"Found {len(matches)} movies containing '{keyword}':\n")
    for m in matches:
        print(f"  → {m}")

    return matches
```
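
`search_movie` only finds exact substrings, so a misspelling such as "Intersteller" returns nothing. One possible extension, swapping in the standard-library `difflib` module for approximate matching, is sketched below; the function name `suggest_titles` and the 0.6 cutoff are assumptions, not part of the original project.

```python
import difflib

def suggest_titles(query, titles, n=3, cutoff=0.6):
    """Hypothetical helper: return up to n titles that approximately match `query`."""
    # Compare case-insensitively, but return the original spelling of each title
    lowered = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[m] for m in matches]

toy_titles = ['Parasite', 'Interstellar', 'Barbie', 'Fight Club', 'La La Land']
print(suggest_titles('Intersteller', toy_titles))  # ['Interstellar']
```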


```python
import pickle  # for serializing Python objects to a binary file
from pathlib import Path

# Only the titles and the feature matrix are saved; the similarity matrix
# is not stored and can be recomputed from X_features after loading.
recommender_system = {
    'titles': titles,
    'X_features': X_features,
}

output_path = Path('../models')
output_path.mkdir(parents=True, exist_ok=True)

with open(output_path / 'content_recommender.pkl', 'wb') as f:
    pickle.dump(recommender_system, f)

print("\nRecommender system saved successfully!")
print("Saved to: ../models/content_recommender.pkl")
print("\nModel includes:")
print(f"  • Feature matrix: {X_features.shape}")
print(f"  • Number of movies: {len(titles)}")
print(f"  • Embedding dimensions: {X_features.shape[1]}")
```

```

Recommender system saved successfully!
Saved to: ../models/content_recommender.pkl

Model includes:
  • Feature matrix: (4320, 973)
  • Number of movies: 4320
  • Embedding dimensions: 973
```
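
The pickled `recommender_system` dictionary holds only `titles` and `X_features`, so whoever loads it has to rebuild the similarities. A minimal round-trip sketch, assuming the same dictionary layout; it writes a toy model to a temporary directory instead of `../models`, and `load_recommender` is a hypothetical name.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def load_recommender(path):
    """Hypothetical loader: read the pickled dict and rebuild the similarity matrix."""
    with open(path, 'rb') as f:
        system = pickle.load(f)
    # The similarity matrix is not stored, so recompute it from the features
    system['similarity_matrix'] = cosine_similarity(system['X_features'])
    return system

# Round trip with toy data in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / 'content_recommender.pkl'
    toy_system = {
        'titles': ['Movie A', 'Movie B', 'Movie C'],
        'X_features': np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]),
    }
    with open(model_path, 'wb') as f:
        pickle.dump(toy_system, f)

    loaded = load_recommender(model_path)

print(loaded['similarity_matrix'].shape)  # (3, 3)
```

Only unpickle files you trust: `pickle.load` can execute arbitrary code embedded in the file.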
