# Knowledge-Based Filtering: Comprehensive Analysis

This notebook presents a complete analysis of knowledge-based filtering recommendation systems using the TMDB dataset. It combines theoretical foundations with practical implementations of both constraint-based and case-based approaches, along with comparative analysis.

---

## 1. Introduction to Knowledge-Based Filtering

Knowledge-based filtering is a recommendation technique that uses **explicit knowledge** about user requirements and item attributes to generate recommendations. Unlike collaborative filtering (which relies on user behavior patterns) or content-based filtering (which relies on item similarity), knowledge-based systems engage in a **dialogue with the user** to precisely determine their preferences.

### Key Principle:
*"The system helps users find items by understanding their explicit requirements and constraints, without needing historical interaction data."*

### How It Works:
1. **Elicit User Requirements**: Gather explicit preferences through dialogue or forms
2. **Apply Knowledge Rules**: Use domain knowledge to filter or rank items
3. **Generate Recommendations**: Present items that satisfy user requirements
4. **Refine Through Feedback**: Allow users to critique and refine recommendations

### Types of Knowledge-Based Filtering:

#### **Constraint-Based (Preference-Based)**
- User specifies **hard constraints** (requirements that must be satisfied)
- System filters items to find those matching all constraints
- Example: "Find action movies from the 1990s with rating > 7.0"

<img src="../../images/kb_constraints_architecture.png" alt="Constraint-Based Filtering Architecture" width="1200">

#### **Case-Based Reasoning (CBR)**
- User provides a **reference case** (example item they like)
- System finds similar items and allows **critiquing**
- User can iteratively refine by excluding unwanted features
- Example: "Find movies like 'Finding Nemo' but not animated"

<img src="../../images/kb_case_architecture.png" alt="Case-Based Filtering Architecture" width="1200">

### Advantages:
- ✅ **No Cold Start Problem**: Works immediately with new users and items
- ✅ **Transparency**: Clear explanation of why items are recommended
- ✅ **Precision**: Exact matching of user requirements
- ✅ **No Gray Sheep Problem**: Works for users with unique tastes
- ✅ **Domain Expertise**: Leverages expert knowledge about the domain

### Disadvantages:
- ❌ **User Effort Required**: Users must explicitly specify preferences
- ❌ **Limited Serendipity**: Unlikely to discover unexpected items
- ❌ **Knowledge Engineering**: Requires domain expertise to build
- ❌ **Scalability**: Can be computationally expensive with many constraints
- ❌ **No Learning**: Doesn't automatically adapt to user behavior

---

## 2. Dataset Description

We use the **Full TMDB Movies Dataset 2024** for our knowledge-based filtering experiments.

**Dataset Characteristics**:
- **Source**: The Movie Database (TMDB)
- **Size**: ~1 million movies (1,047,195 rows after removing duplicates)
- **Features**: Rich metadata including genres, keywords, production companies, countries, budget, revenue, ratings, runtime, release dates

**Key Attributes for Knowledge-Based Filtering**:
- **Numerical**: vote_average, vote_count, revenue, budget, runtime, popularity
- **Categorical**: genres, keywords, production_companies, production_countries, original_language
- **Temporal**: release_date
- **Textual**: title, overview

This rich feature set makes TMDB ideal for knowledge-based filtering, as it provides multiple dimensions for constraint specification and case-based similarity.

---

## 3. Constraint-Based Filtering

### Theory and Approach

Constraint-based filtering works by applying a set of **hard constraints** (requirements) to filter the item catalog. Each constraint eliminates items that don't satisfy the requirement, resulting in a set of items that match **all** specified criteria.

**Mathematical Formulation**:

Let:
- $I$ = set of all items (movies)
- $C = \{c_1, c_2, ..., c_n\}$ = set of constraints
- $c_i(item)$ = boolean function returning true if item satisfies constraint $i$

The recommendation set $R$ is:

$$R = \{item \in I \mid \forall c_i \in C: c_i(item) = true\}$$

**Constraint Types**:

1. **Range Constraints**: $min \leq attribute \leq max$
   - Example: $6.0 \leq vote\_average \leq 10.0$

2. **Categorical Constraints**: $attribute \in \{value_1, value_2, ...\}$
   - Example: $genre \in \{Action, Adventure\}$

3. **String Matching**: $pattern \subseteq attribute$
   - Example: $"star" \subseteq title$

4. **Temporal Constraints**: $date_1 \leq release\_date \leq date_2$
   - Example: $1980 \leq year \leq 1998$
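
The four constraint types can be sketched as boolean masks that are combined with a logical AND, mirroring the definition of $R$ above. This is a minimal illustration on a made-up four-row catalog (the `toy` DataFrame and its values are assumptions, not TMDB data):

```python
import pandas as pd

# Toy catalog; column names mirror the TMDB fields used in this notebook.
toy = pd.DataFrame({
    "title": ["Star Quest", "Quiet Town", "Starfall", "Deep Blue"],
    "vote_average": [7.1, 6.5, 5.2, 8.0],
    "genres": ["Action, Adventure", "Drama", "Action", "Adventure, Family"],
    "release_date": ["1995-06-01", "1990-03-15", "2005-09-09", "1985-01-20"],
})

# One boolean mask per constraint c_i, evaluated for every item at once.
constraints = [
    # Range constraint
    toy["vote_average"].between(6.0, 10.0),
    # Categorical constraint: at least one genre from the allowed set
    toy["genres"].str.split(", ").apply(
        lambda gs: bool(set(gs) & {"Action", "Adventure"})),
    # String matching constraint
    toy["title"].str.contains("star", case=False, regex=False),
    # Temporal constraint (ISO dates compare correctly as strings)
    toy["release_date"].between("1980-01-01", "1998-12-31"),
]

# R = items for which every constraint holds.
mask = pd.concat(constraints, axis=1).all(axis=1)
result = toy.loc[mask, "title"].tolist()
print(result)
```

Only "Star Quest" satisfies all four masks; each of the other rows fails exactly one constraint, which shows how a single unmet requirement removes an item.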

### Available Constraints

The following table shows the constraints available in our TMDB-based system:

| Constraint | Application |
|------------|-------------|
| **Minimum Average Rating** | Filter movies based on user ratings |
| **Minimum Vote Count** | Keep movies rated by enough users for the average to be meaningful |
| **Release Date Range** | Find movies released in a specific time period |
| **Minimum Revenue** | Find commercially successful movies |
| **Runtime Range** | Find movies of specific length |
| **Budget Range** | Find low-budget or high-budget movies |
| **Original Language** | Find movies in a specific language |
| **Genres** | Find movies of specific genres |
| **Production Countries** | Find movies produced in specific countries |
| **Production Companies** | Find movies from specific studios |
| **Keywords** | Find movies based on specific themes or topics |
| **Minimum Popularity** | Find trending or well-known movies |
| **Title Keyword** | Find movies with specific words in the title |

---

### Implementation

In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load TMDB dataset
movies = pd.read_csv('../../Datasets/TMDB/TMDB_movie_dataset_v11.csv')
movies.drop_duplicates(inplace=True)
movies.fillna('', inplace=True)

print(f"Dataset loaded: {len(movies)} movies")
print(f"\nColumns: {list(movies.columns)}")
print(f"\nSample data:")
movies.head()

Dataset loaded: 1047195 movies

Columns: ['id', 'title', 'vote_average', 'vote_count', 'status', 'release_date', 'revenue', 'runtime', 'adult', 'backdrop_path', 'budget', 'homepage', 'imdb_id', 'original_language', 'original_title', 'overview', 'popularity', 'poster_path', 'tagline', 'genres', 'production_companies', 'production_countries', 'spoken_languages', 'keywords']

Sample data:


|  | id | title | vote_average | vote_count | status | release_date | revenue | runtime | adult | backdrop_path | ... | original_title | overview | popularity | poster_path | tagline | genres | production_companies | production_countries | spoken_languages | keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27205 | Inception | 8.364 | 34495 | Released | 2010-07-15 | 825532764 | 148 | False | /8ZTVqvKDQ8emSGUEMjsS4yHAwrp.jpg | ... | Inception | Cobb, a skilled thief who commits corporate es... | 83.952 | /oYuLEt3zVCKq57qu2F8dT7NIa6f.jpg | Your mind is the scene of the crime. | Action, Science Fiction, Adventure | Legendary Pictures, Syncopy, Warner Bros. Pict... | United Kingdom, United States of America | English, French, Japanese, Swahili | rescue, mission, dream, airplane, paris, franc... |
| 1 | 157336 | Interstellar | 8.417 | 32571 | Released | 2014-11-05 | 701729206 | 169 | False | /pbrkL804c8yAv3zBZR4QPEafpAR.jpg | ... | Interstellar | The adventures of a group of explorers who mak... | 140.241 | /gEU2QniE6E77NI6lCU6MxlNBvIx.jpg | Mankind was born on Earth. It was never meant ... | Adventure, Drama, Science Fiction | Legendary Pictures, Syncopy, Lynda Obst Produc... | United Kingdom, United States of America | English | rescue, future, spacecraft, race against time,... |
| 2 | 155 | The Dark Knight | 8.512 | 30619 | Released | 2008-07-16 | 1004558444 | 152 | False | /nMKdUUepR0i5zn0y1T4CsSB5chy.jpg | ... | The Dark Knight | Batman raises the stakes in his war on crime. ... | 130.643 | /qJ2tW6WMUDux911r6m7haRef0WH.jpg | Welcome to a world without rules. | Drama, Action, Crime, Thriller | DC Comics, Legendary Pictures, Syncopy, Isobel... | United Kingdom, United States of America | English, Mandarin | joker, sadism, chaos, secret identity, crime f... |
| 3 | 19995 | Avatar | 7.573 | 29815 | Released | 2009-12-15 | 2923706026 | 162 | False | /vL5LR6WdxWPjLPFRLe133jXWsh5.jpg | ... | Avatar | In the 22nd century, a paraplegic Marine is di... | 79.932 | /kyeqWdyUXW608qlYkRqosgbbJyK.jpg | Enter the world of Pandora. | Action, Adventure, Fantasy, Science Fiction | Dune Entertainment, Lightstorm Entertainment, ... | United States of America, United Kingdom | English, Spanish | future, society, culture clash, space travel, ... |
| 4 | 24428 | The Avengers | 7.71 | 29166 | Released | 2012-04-25 | 1518815515 | 143 | False | /9BBTo63ANSmhC4e6r62OJFuK2GL.jpg | ... | The Avengers | When an unexpected enemy emerges and threatens... | 98.082 | /RYMX2wcKCBAr24UyPD7xwmjaTn.jpg | Some assembly required. | Science Fiction, Action, Adventure | Marvel Studios | United States of America | English, Hindi, Russian | new york city, superhero, shield, based on com... |


In [None]:
def recommend_movies(movies, title_keyword='', min_vote_average=0, min_vote_count=0,
                     start_date='1900-01-01', end_date='2100-01-01', min_revenue=0,
                     min_runtime=0, max_runtime=1000, min_budget=0, max_budget=float('inf'),
                     language='', genres=None, production_company='', production_country='',
                     keywords=None, min_popularity=0):
    """
    Constraint-based movie recommendation function.

    Parameters:
    -----------
    movies : DataFrame
        Movie dataset
    title_keyword : str
        Keyword to search in movie title
    min_vote_average : float
        Minimum average rating (0-10)
    min_vote_count : int
        Minimum number of votes
    start_date : str
        Earliest release date (YYYY-MM-DD)
    end_date : str
        Latest release date (YYYY-MM-DD)
    min_revenue : float
        Minimum revenue in dollars
    min_runtime : int
        Minimum runtime in minutes
    max_runtime : int
        Maximum runtime in minutes
    min_budget : float
        Minimum budget in dollars
    max_budget : float
        Maximum budget in dollars
    language : str
        Original language code (e.g., 'en', 'fr')
    genres : list, optional
        Genres to match; a movie qualifies if its genres contain at least one of them (substring match)
    production_company : str
        Production company name (partial match)
    production_country : str
        Production country name (partial match)
    keywords : list, optional
        Keywords to match; a movie qualifies if its keywords contain at least one of them (substring match)
    min_popularity : float
        Minimum popularity score

    Returns:
    --------
    DataFrame
        Filtered movies matching all constraints
    """
    filtered_movies = movies.copy()

    # Title keyword constraint
    if title_keyword:
        filtered_movies = filtered_movies[filtered_movies['title'].str.contains(title_keyword, case=False, na=False)]

    # Rating constraints
    if min_vote_average > 0:
        filtered_movies = filtered_movies[filtered_movies['vote_average'] >= min_vote_average]

    if min_vote_count > 0:
        filtered_movies = filtered_movies[filtered_movies['vote_count'] >= min_vote_count]

    # Date constraints
    filtered_movies = filtered_movies[(filtered_movies['release_date'] >= start_date) &
                                      (filtered_movies['release_date'] <= end_date)]

    # Revenue constraint
    if min_revenue > 0:
        filtered_movies = filtered_movies[filtered_movies['revenue'] >= min_revenue]

    # Runtime constraints
    filtered_movies = filtered_movies[(filtered_movies['runtime'] >= min_runtime) &
                                      (filtered_movies['runtime'] <= max_runtime)]

    # Budget constraints
    filtered_movies = filtered_movies[(filtered_movies['budget'] >= min_budget) &
                                      (filtered_movies['budget'] <= max_budget)]

    # Language constraint
    if language:
        filtered_movies = filtered_movies[filtered_movies['original_language'] == language]

    # Genre constraints
    if genres:
        filtered_movies = filtered_movies[filtered_movies['genres'].apply(
            lambda x: any(genre in str(x) for genre in genres))]

    # Production company constraint
    if production_company:
        filtered_movies = filtered_movies[filtered_movies['production_companies'].str.contains(
            production_company, case=False, na=False)]

    # Production country constraint
    if production_country:
        filtered_movies = filtered_movies[filtered_movies['production_countries'].str.contains(
            production_country, case=False, na=False)]

    # Keywords constraints
    if keywords:
        filtered_movies = filtered_movies[filtered_movies['keywords'].apply(
            lambda x: any(keyword in str(x) for keyword in keywords))]

    # Popularity constraint
    if min_popularity > 0:
        filtered_movies = filtered_movies[filtered_movies['popularity'] >= min_popularity]

    return filtered_movies
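
The genre and keyword filters in `recommend_movies` test `value in str(x)`, so they match substrings and accept a movie as soon as any one listed value is found. Where whole-token matching or "must contain all" semantics are wanted, a helper along the following lines could be used instead. `has_tokens` is a hypothetical function written for this sketch, and the `toy` rows are invented:

```python
import pandas as pd

def has_tokens(cell, wanted, mode="any"):
    """Match whole comma-separated tokens instead of substrings.

    Hypothetical helper for illustration; not part of the notebook's code.
    mode="any": at least one wanted token is present.
    mode="all": every wanted token is present.
    """
    tokens = {t.strip().lower() for t in str(cell).split(",") if t.strip()}
    wanted = {w.lower() for w in wanted}
    return wanted <= tokens if mode == "all" else bool(wanted & tokens)

toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["Science Fiction, Adventure", "Action, Adventure", "Drama"],
    "keywords": ["spacecraft, teleportation", "space, alien", "family"],
})

wanted_genres = ["Action", "Adventure"]
any_match = toy[toy["genres"].apply(has_tokens, wanted=wanted_genres)]
all_match = toy[toy["genres"].apply(has_tokens, wanted=wanted_genres, mode="all")]

# Substring test (as in recommend_movies) versus whole-token test.
substring = toy[toy["keywords"].apply(lambda x: "space" in str(x))]
whole_tok = toy[toy["keywords"].apply(has_tokens, wanted=["space"])]

print(any_match["title"].tolist(), all_match["title"].tolist())
print(substring["title"].tolist(), whole_tok["title"].tolist())
```

With any-matching, movie A passes on Adventure alone; with all-matching only B remains. Likewise the substring test lets "spacecraft" satisfy the keyword "space", while the token test does not.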

### Star Wars Themed Movies

**User Requirements**:
- Title contains "star"
- Minimum rating: 6.0
- Release date: 1980-1998
- Genres: Action, Adventure
- Keywords: space, alien, battle


In [None]:
recommendations_exp1 = recommend_movies(
    movies,
    title_keyword='star',
    min_vote_average=6.0,
    start_date='1980-01-01',
    end_date='1998-12-31',
    genres=['Action', 'Adventure'],
    keywords=['space', 'alien', 'battle']
)

print(f"Found {len(recommendations_exp1)} movies matching all constraints\n")
print("Recommendations:")
recommendations_exp1[['title', 'vote_average', 'release_date', 'genres', 'keywords']].head(10)

Found 16 movies matching all constraints

Recommendations:


|  | title | vote_average | release_date | genres | keywords |
|---|---|---|---|---|---|
| 954 | Starship Troopers | 7.03 | 1997-11-07 | Adventure, Action, Thriller, Science Fiction | spacecraft, space marine, army, moon, based on... |
| 1452 | Stargate | 6.978 | 1994-10-28 | Action, Adventure, Science Fiction | egypt, teleportation, pyramid, space travel, u... |
| 2462 | Star Trek II: The Wrath of Khan | 7.5 | 1982-06-04 | Action, Adventure, Science Fiction, Thriller | spacecraft, life and death, genetics, asteroid... |
| 2657 | Star Trek: First Contact | 7.3 | 1996-11-22 | Science Fiction, Action, Adventure, Thriller | spacecraft, teleportation, inventor, starship,... |
| 3115 | Star Trek IV: The Voyage Home | 7.2 | 1986-11-26 | Science Fiction, Adventure | spacecraft, saving the world, teleportation, s... |
| 3352 | Star Trek III: The Search for Spock | 6.6 | 1984-06-01 | Science Fiction, Action, Adventure, Thriller | spacecraft, friendship, teleportation, genesis... |
| 3425 | Star Trek: Generations | 6.5 | 1994-11-18 | Science Fiction, Action, Adventure, Thriller | android, spacecraft, teleportation, starship, ... |
| 3551 | Star Trek VI: The Undiscovered Country | 7.0 | 1991-12-06 | Science Fiction, Action, Adventure, Thriller | spacecraft, plan, farewell, court case, telepo... |
| 3700 | Star Trek: Insurrection | 6.4 | 1998-12-11 | Science Fiction, Action, Adventure, Thriller | spacecraft, teleportation, starship, fountain ... |
| 5581 | The Last Starfighter | 6.596 | 1984-07-13 | Adventure, Science Fiction, Action | android, flying car, space marine, trailer par... |


**Analysis**:

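By construction, every result has "star" in its title, a release date between 1980 and 1998, and a rating of at least 6.0; the ten rows shown are dominated by the Star Trek series. Note that the genre and keyword lists are matched with **any** semantics and by substring: *Star Trek IV: The Voyage Home* qualifies with Adventure but without Action, and the keyword "space" is also satisfied by "spacecraft".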

---

### Post-Apocalyptic Dramas

**User Requirements**:
- Minimum popularity: 20
- Minimum rating: 6.8
- Release date: After 2000
- Genres: Drama, Horror
- Keywords: post-apocalyptic, future

In [None]:
recommendations_exp2 = recommend_movies(
    movies,
    min_popularity=20,
    min_vote_average=6.8,
    start_date='2000-01-01',
    genres=['Drama', 'Horror'],
    keywords=['post-apocalyptic', 'future']
)

print(f"Found {len(recommendations_exp2)} movies matching all constraints\n")
print("Recommendations:")
recommendations_exp2[['title', 'vote_average', 'popularity', 'release_date', 'genres', 'keywords']].head(10)

Found 22 movies matching all constraints

Recommendations:


|  | title | vote_average | popularity | release_date | genres | keywords |
|---|---|---|---|---|---|---|
| 1 | Interstellar | 8.417 | 140.241 | 2014-11-05 | Adventure, Drama, Science Fiction | rescue, future, spacecraft, race against time,... |
| 60 | Logan | 7.8 | 54.194 | 2017-02-28 | Action, Drama, Science Fiction | future, experiment, immortality, self-destruct... |
| 115 | I Am Legend | 7.201 | 59.815 | 2007-12-12 | Drama, Science Fiction, Thriller | new york city, saving the world, based on nove... |
| 145 | Her | 7.862 | 37.608 | 2013-12-18 | Romance, Science Fiction, Drama | future, artificial intelligence (a.i.), comput... |
| 155 | A Quiet Place | 7.396 | 38.002 | 2018-04-03 | Horror, Drama, Science Fiction | pregnancy, fireworks, deaf, post-apocalyptic f... |
| 175 | Blade Runner 2049 | 7.544 | 70.54 | 2017-10-04 | Science Fiction, Drama | future, android, bounty hunter, artificial int... |
| 207 | Zombieland | 7.3 | 32.08 | 2009-10-07 | Comedy, Horror | washington dc, usa, sibling relationship, circ... |
| 256 | Dawn of the Planet of the Apes | 7.306 | 59.336 | 2014-06-26 | Science Fiction, Action, Drama, Thriller | dystopia, post-apocalyptic future, animal atta... |
| 325 | Bird Box | 6.854 | 27.913 | 2018-12-13 | Horror, Thriller, Drama | based on novel or book, post-apocalyptic futur... |
| 346 | Snowpiercer | 6.902 | 34.597 | 2013-08-01 | Action, Science Fiction, Drama | parent child relationship, winter, brothel, ch... |


**Analysis**:

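The results satisfy the popularity, rating, and date constraints, and most are dramas with futuristic or post-apocalyptic keywords. Because genres and keywords are matched with any semantics, a movie needs only one listed genre and one listed keyword: *Zombieland* (Comedy, Horror) qualifies on genre through Horror alone, and *Interstellar* matches on the "future" keyword.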

---

### Universal Pictures + France Productions

**User Requirements**:
- Minimum vote count: 4000
- Minimum budget: $10,000,000
- Production company: Universal
- Production country: France

In [None]:
recommendations_exp3 = recommend_movies(
    movies,
    min_vote_count=4000,
    min_budget=10000000,
    production_company='Universal',
    production_country='France'
)

print(f"Found {len(recommendations_exp3)} movies matching all constraints\n")
print("Recommendations:")
recommendations_exp3[['title', 'vote_average', 'vote_count', 'budget', 'production_companies', 'production_countries']].head(10)

Found 8 movies matching all constraints

Recommendations:


|  | title | vote_average | vote_count | budget | production_companies | production_countries |
|---|---|---|---|---|---|---|
| 30 | Inglourious Basterds | 8.215 | 20746 | 70000000 | The Weinstein Company, Universal Pictures, A B... | France, Germany, United States of America |
| 460 | Pride & Prejudice | 8.083 | 7394 | 28000000 | Focus Features, Universal Pictures, StudioCana... | France, United Kingdom, United States of America |
| 490 | The Bourne Ultimatum | 7.421 | 7127 | 70000000 | Universal Pictures, The Kennedy/Marshall Compa... | France, Germany, Spain, United States of America |
| 508 | Hot Fuzz | 7.554 | 6929 | 12000000 | Universal Pictures, Big Talk Productions, Stud... | France, United Kingdom, United States of America |
| 635 | Love Actually | 7.1 | 6047 | 40000000 | Working Title Films, DNA Films, Universal Pict... | France, United Kingdom |
| 787 | Casino | 8.004 | 5240 | 50000000 | Universal Pictures, Syalis DA, De Fina-Cappa, ... | France, United States of America |
| 841 | Non-Stop | 6.811 | 4987 | 50000000 | TF1, StudioCanal, Silver Pictures, Anton Capit... | Canada, France, United Kingdom, United States ... |
| 854 | The Purge: Election Year | 6.394 | 4919 | 10000000 | Why Not Productions, Universal Pictures, Blumh... | France, Japan, United States of America |


**Analysis**:

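All eight results have at least 4,000 votes and a budget of at least $10 million, list France among their production countries, and have a production company whose name contains "Universal". Because company and country are partial, case-insensitive matches, international co-productions qualify: every film shown lists more than one production country.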

---

### Discussion and Conclusions - Constraint-Based Filtering

**Use Cases**:

Constraint-based filtering is particularly useful when:
- Users have specific, well-defined requirements
- Domain has rich, structured metadata
- Transparency and explainability are important
- Example: A fan of 1990s Asian cinema can combine release-date, language, and production-country constraints to retrieve exactly that slice of the catalog

**Future Improvements**:

- Expand constraint types and make them more intelligent
- Dynamic adjustment based on user behavior
- More complex search conditions or constraint combinations
- Context-aware constraints (seasonal preferences, current trends)
- Soft constraints (preferences vs. requirements)
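
As a rough illustration of the last item, soft constraints can be modelled as weighted preferences that rank items rather than remove them, while hard constraints still filter. The sketch below is one possible design under assumed weights; the `toy` data, the weights, and the tie-breaking rule are all illustrative choices, not part of the notebook's implementation:

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_average": [7.5, 6.9, 8.1, 5.0],
    "runtime": [95, 150, 110, 100],
    "original_language": ["en", "en", "fr", "en"],
})

# Hard constraint: items that fail it are removed.
hard = toy["vote_average"] >= 6.0

# Soft constraints: (mask, weight) pairs; the weights are assumptions.
soft = [
    (toy["runtime"] <= 120, 2.0),              # prefers shorter movies
    (toy["original_language"] == "en", 1.0),   # mildly prefers English
]

# Score = total weight of the satisfied preferences.
score = sum(mask.astype(float) * weight for mask, weight in soft)

ranked = (toy.assign(score=score)[hard]
             .sort_values(["score", "vote_average"], ascending=False))
print(ranked[["title", "score"]])
```

Movie D satisfies both preferences but is still dropped by the hard rating constraint, while B survives with a low score instead of being filtered out for its long runtime.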

---

## 4. Case-Based Reasoning (CBR)

### Theory and Approach

Case-Based Reasoning is an advanced recommendation method where users can **interactively refine** their recommendation list. Unlike traditional content-based or collaborative filtering systems that work statically, case-based systems allow **dynamic modification** of results through critiquing.

**How It Works**:

1. **Select Reference Case**: User chooses a specific movie as a reference point
2. **Find Similar Cases**: System identifies movies with similar features using similarity metrics
3. **Present Recommendations**: Show top-N most similar movies
4. **Critique and Refine**: User provides feedback on unwanted features
5. **Iterate**: System excludes movies with critiqued features and finds new recommendations
6. **Converge**: Process continues until user is satisfied

**Mathematical Formulation**:

Let:
- $C$ = reference case (movie)
- $I$ = set of all items (movies)
- $sim(C, i)$ = similarity between reference case and item $i$
- $E$ = set of excluded features (from critiques)

Initial recommendations:

$$R_0 = \{i \in I \mid i \neq C\}, \text{ ranked by } sim(C, i)$$

After critiquing iteration $k$:

$$R_k = \{i \in R_{k-1} \mid \forall e \in E: e \notin features(i)\}$$

**Similarity Computation**:

We use **TF-IDF + Cosine Similarity** (same as content-based filtering):

1. **Feature Extraction**: Combine genres, keywords, production companies, overview into metadata string
2. **TF-IDF Vectorization**: Convert metadata to numerical vectors
3. **Cosine Similarity**: Calculate similarity between all movie pairs

$$sim(i, j) = \frac{\vec{v}_i \cdot \vec{v}_j}{\|\vec{v}_i\| \times \|\vec{v}_j\|}$$

Where $\vec{v}_i$ and $\vec{v}_j$ are TF-IDF vectors for movies $i$ and $j$.
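
To make the formula concrete, the following sketch vectorizes three invented metadata strings and checks scikit-learn's `cosine_similarity` against a direct evaluation of the expression above. The `docs` strings are assumptions made up for the example:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented metadata strings in the same "genres, keywords" style.
docs = [
    "Animation, Family, fish, ocean, father son relationship",
    "Animation, Family, ocean, sequel, fish",
    "Crime, Drama, mafia, new york city",
]

vectors = TfidfVectorizer().fit_transform(docs)  # sparse TF-IDF matrix
sim = cosine_similarity(vectors)                 # 3 x 3 similarity matrix

# Direct evaluation of sim(i, j) for the first two documents.
v0 = vectors[0].toarray().ravel()
v1 = vectors[1].toarray().ravel()
manual = v0 @ v1 / (np.linalg.norm(v0) * np.linalg.norm(v1))

print(np.round(sim, 3))
```

The first two strings share several terms and score well above zero, whereas the first and third share no terms at all, so their similarity is exactly zero.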

**Critiquing Mechanism**:

Users can critique recommendations by specifying unwanted features:
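- **Genre Critique**: "No Animation"
- **Keyword Critique**: "No kidnapping"
- **Production Company Critique**: "No Disney" (the `refine_recommendations` function in this notebook handles only genre and keyword critiques)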

The system then:
1. Excludes all movies containing the critiqued feature
2. Finds new similar movies from remaining candidates
3. Ensures recommendations list is replenished (e.g., to 20 movies)
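
The three steps can be sketched in a few lines of plain Python. This is a simplified stand-in for the notebook's functions: `critique` is a hypothetical helper, and the ranked list and feature sets are invented for the example:

```python
def critique(ranked, features, excluded, top_n=3):
    """Return the top_n most similar items that carry no excluded feature.

    ranked   : list of (item, similarity), sorted by similarity descending
    features : dict mapping item -> set of feature labels
    excluded : set of critiqued feature labels (the set E)
    """
    kept = [(item, s) for item, s in ranked if not (features[item] & excluded)]
    # Slicing after filtering lets lower-ranked survivors refill freed slots.
    return kept[:top_n]

ranked = [("A", 0.9), ("B", 0.8), ("C", 0.7), ("D", 0.6), ("E", 0.5)]
features = {
    "A": {"Animation", "Family"},
    "B": {"Drama"},
    "C": {"Horror"},
    "D": {"Comedy", "Family"},
    "E": {"Adventure"},
}

excluded = set()
round_0 = critique(ranked, features, excluded)   # no critiques yet
excluded |= {"Drama", "Horror"}                  # user critique
round_1 = critique(ranked, features, excluded)

print([item for item, _ in round_0])
print([item for item, _ in round_1])
```

After the critique, B and C disappear and D and E move up, so the list stays at three items while preserving the similarity order.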

---

### Implementation

In [None]:
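# Prepare data for case-based reasoning
movies_cbr = movies[(movies['vote_average'] != 0) &
                    (movies['revenue'] != 0) &
                    (movies['budget'] != 0) &
                    (movies['runtime'] > 20) &
                    (movies['vote_count'] > 100)].copy()

# Reset the index so that row labels equal row positions. The similarity
# matrix is addressed by position, while the title lookup and the critique
# exclusions are built from index labels, so the two must agree.
movies_cbr = movies_cbr.reset_index(drop=True)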

movies_cbr['release_date'] = movies_cbr['release_date'].fillna('')

# Convert numerical features to strings for TF-IDF
movies_cbr['vote_average'] = movies_cbr['vote_average'].astype(str)
movies_cbr['vote_count'] = movies_cbr['vote_count'].astype(str)
movies_cbr['runtime'] = movies_cbr['runtime'].astype(str)
movies_cbr['adult'] = movies_cbr['adult'].astype(str)
movies_cbr['popularity'] = movies_cbr['popularity'].astype(str)

# Create metadata column combining relevant features
# Note: This differs from content-based filtering implementation:
# - Content-based uses: vote_average, vote_count, release_date, runtime, adult, language, popularity, tagline
# - Case-based uses: genres, keywords, production_companies, overview
# This results in LOWER similarity scores but MORE FOCUSED recommendations based on content themes
movies_cbr['metadata'] = (movies_cbr['genres'] + ', ' +
                          movies_cbr['keywords'] + ', ' +
                          movies_cbr['production_companies'] + ', ' +
                          movies_cbr['overview'])

print(f"Prepared {len(movies_cbr)} movies for case-based reasoning")
print(f"\nSample metadata:")
movies_cbr[['title', 'metadata']].head(2)

Prepared 7059 movies for case-based reasoning

Sample metadata:


|  | title | metadata |
|---|---|---|
| 0 | Inception | Action, Science Fiction, Adventure, rescue, mi... |
| 1 | Interstellar | Adventure, Drama, Science Fiction, rescue, fut... |


In [None]:
# Create TF-IDF matrix
tfidf = TfidfVectorizer(ngram_range=(1, 2), min_df=0.01, max_df=0.9, stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies_cbr['metadata'])

# Calculate cosine similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

print(f"TF-IDF matrix shape: {tfidf_matrix.shape}")
print(f"Cosine similarity matrix shape: {cosine_sim.shape}")

TF-IDF matrix shape: (7059, 912)
Cosine similarity matrix shape: (7059, 7059)
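
The full matrix holds every pairwise similarity, but a single recommendation request only needs one row of it. As a hedged alternative for larger catalogs, that row can be computed on demand. The sketch below uses invented `docs` and relies on `TfidfVectorizer` L2-normalizing its rows by default, which makes the plain dot product from `linear_kernel` equal to the cosine similarity:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

docs = [
    "space alien battle",
    "space travel future",
    "family comedy",
    "alien invasion battle",
]
matrix = TfidfVectorizer().fit_transform(docs)   # rows are L2-normalized

full = cosine_similarity(matrix)                 # n x n dense matrix
row = linear_kernel(matrix[0], matrix).ravel()   # similarities of item 0 only

# Rank the other items by similarity to item 0 and keep the best two.
order = np.argsort(-row)
top = order[order != 0][:2]
print(top, np.round(row, 3))
```

The on-demand row matches the first row of the full matrix, at a cost proportional to the number of items rather than its square.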


In [None]:
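# Map each title to its row position in movies_cbr (and therefore in cosine_sim).
# For duplicate titles keep the first occurrence, so a lookup returns one position.
titles = movies_cbr['title']
indices = pd.Series(np.arange(len(movies_cbr)), index=movies_cbr['title'])
indices = indices[~indices.index.duplicated(keep='first')]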

def recommend_movies_case_based(movies, movie_title, cosine_sim, excluded_indices=set()):
    """
    Case-based movie recommendation with similarity scoring.

    Parameters:
    -----------
    movies : DataFrame
        Movie dataset
    movie_title : str
        Reference movie title
    cosine_sim : ndarray
        Cosine similarity matrix
    excluded_indices : set
        Indices of movies to exclude from recommendations

    Returns:
    --------
    DataFrame
        Recommended movies with similarity scores
    """
    if movie_title not in indices:
        return "Movie not found in the dataset"

    idx = indices[movie_title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    recommendations = []
    for index, score in sim_scores:
        if index not in excluded_indices and index != idx:
            recommendations.append({
                'index': index,
                'title': titles.iloc[index],
                'genres': movies.iloc[index]['genres'],
                'keywords': movies.iloc[index]['keywords'],
                'similarity_score': score
            })

    return pd.DataFrame(recommendations)
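
The function above reads `cosine_sim[idx]` and `.iloc[index]` by row position, so the values stored in `indices` need to be positions too. A filtered DataFrame keeps its original index labels, and mixing the two can silently select the wrong row. The toy example below (invented data) illustrates the pitfall and how `reset_index(drop=True)` removes it:

```python
import pandas as pd

df = pd.DataFrame({"title": ["A", "B", "C", "D"], "votes": [500, 50, 300, 200]})
subset = df[df["votes"] > 100]               # keeps index labels 0, 2, 3

# Using a label as if it were a position picks the wrong row.
label_of_c = subset[subset["title"] == "C"].index[0]   # label 2
wrong_row = subset.iloc[label_of_c]["title"]           # position 2 holds "D"

# After resetting, labels and positions coincide.
aligned = subset.reset_index(drop=True)      # labels are now 0, 1, 2
pos_of_c = aligned[aligned["title"] == "C"].index[0]   # position 1
right_row = aligned.iloc[pos_of_c]["title"]

print(label_of_c, wrong_row, pos_of_c, right_row)
```

No error is raised in the misaligned case, which is why this kind of mismatch tends to surface only as unexpectedly odd recommendations.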

In [None]:
def refine_recommendations(movies, movie_title, cosine_sim, excluded_criteria, excluded_indices):
    """
    Refine recommendations by excluding movies with critiqued features.

    Parameters:
    -----------
    movies : DataFrame
        Movie dataset
    movie_title : str
        Reference movie title
    cosine_sim : ndarray
        Cosine similarity matrix
    excluded_criteria : dict
        Dictionary of {criterion: set of values} to exclude
    excluded_indices : set
        Indices of movies to exclude

    Returns:
    --------
    DataFrame
        Refined recommendations
    """
    # Update excluded indices based on criteria
    for criterion, values in excluded_criteria.items():
        for value in values:
            if criterion == 'genres':
                excluded_indices.update(movies[movies['genres'].str.contains(value, case=False, na=False)].index)
            elif criterion == 'keywords':
                excluded_indices.update(movies[movies['keywords'].str.contains(value, case=False, na=False)].index)

    # Get new recommendations
    new_recommendations = recommend_movies_case_based(movies, movie_title, cosine_sim, excluded_indices)

    # Filter out any remaining movies with excluded criteria
    for criterion, values in excluded_criteria.items():
        for value in values:
            if criterion == 'genres':
                new_recommendations = new_recommendations[~new_recommendations['genres'].str.contains(value, case=False, na=False)]
            elif criterion == 'keywords':
                new_recommendations = new_recommendations[~new_recommendations['keywords'].str.contains(value, case=False, na=False)]

    return new_recommendations

### Finding Nemo with Iterative Refinement

**Reference Movie**: Finding Nemo

**Iteration 1 - Initial Recommendations**:

In [None]:
movie_title = 'Finding Nemo'
excluded_indices = set()
excluded_criteria = {'genres': set(), 'keywords': set()}

# Get initial recommendations
recommendations = recommend_movies_case_based(movies_cbr, movie_title, cosine_sim)

print(f"Initial recommendations for '{movie_title}':\n")
recommendations[['title', 'genres', 'keywords', 'similarity_score']].head(20)

Initial recommendations for 'Finding Nemo':



|  | title | genres | keywords | similarity_score |
|---|---|---|---|---|
| 0 | Short Term 12 | Drama | child abuse, parent child relationship, suicid... | 0.362222 |
| 1 | The Life Aquatic with Steve Zissou | Adventure, Comedy, Drama | parent child relationship, red cap, dysfunctio... | 0.360884 |
| 2 | Honey Boy | Drama | vietnam veteran, rehabilitation, motel, clown,... | 0.350984 |
| 3 | Daddy Day Care | Comedy, Family | competition, success, father, kindergarten, un... | 0.339121 |
| 4 | Rudderless | Music, Drama, Comedy | parent child relationship, rock band, mourning... | 0.3276 |
| 5 | Nothing in Common | Drama, Comedy, Romance | parent child relationship, divorce | 0.32179 |
| 6 | Ken Park | Drama | infidelity, california, home, pregnancy, haras... | 0.320136 |
| 7 | Despicable Me 2 | Animation, Comedy, Family | parent child relationship, adoptive father, se... | 0.312263 |
| 8 | The Croods | Animation, Adventure, Family, Fantasy, Comedy,... | daughter, parent child relationship, stone age... | 0.310726 |
| 9 | House at the End of the Street | Horror, Thriller | parent child relationship, child hero, cross d... | 0.303516 |


**Iteration 2 - Critique: Exclude Drama and Horror**

User feedback: "I want to watch with family, so exclude Drama and Horror genres"


In [None]:
# Add critique - exclude genres not suitable for family viewing
excluded_criteria['genres'].add('Drama')
excluded_criteria['genres'].add('Horror')

# Refine recommendations
recommendations = refine_recommendations(movies_cbr, movie_title, cosine_sim, excluded_criteria, excluded_indices)

print(f"\nRecommendations after excluding Drama and Horror:\n")
recommendations[['title', 'genres', 'keywords', 'similarity_score']].head(20)

**Iteration 3 - Critique: Exclude Inappropriate Content**

User feedback: "Also exclude movies with mature themes (drugs, death, darkness, kidnapping, sex, ransom)"


In [None]:
# Add more critiques - exclude keywords with mature/inappropriate content
excluded_criteria['keywords'].add('drugs')
excluded_criteria['keywords'].add('death')
excluded_criteria['keywords'].add('darkness')
excluded_criteria['keywords'].add('kidnapping')
excluded_criteria['keywords'].add('sex')
excluded_criteria['keywords'].add('ransom')

# Refine recommendations again
recommendations = refine_recommendations(movies_cbr, movie_title, cosine_sim, excluded_criteria, excluded_indices)

print(f"\nRecommendations after excluding mature/inappropriate content:\n")
recommendations[['title', 'genres', 'keywords', 'similarity_score']].head(20)

**Analysis**:

The walkthrough shows how critiquing steers the list toward **family-friendly viewing**:

1. **Initial**: Most of the top matches share the "parent child relationship" keyword, so animated family films (*Despicable Me 2*, *The Croods*) appear alongside dramas (*Short Term 12*, *Ken Park*) and a horror film (*House at the End of the Street*)
2. **After Critique 1**: Every movie whose genres include Drama or Horror is excluded, and the next most similar movies take their place
3. **After Critique 2**: Movies whose keywords contain any of the critiqued terms (drugs, death, darkness, kidnapping, sex, ransom) are excluded as well

This demonstrates the system's ability to:
- Understand user preferences through dialogue
- Dynamically adjust recommendations based on feedback
- Maintain similarity to the reference case while respecting constraints
- Provide increasingly personalized results through iteration
- Handle real-world scenarios (family-friendly content filtering)

---

### Discussion and Conclusions - Case-Based Reasoning

**Comparison with Content-Based Filtering**:

| Aspect | Content-Based | Case-Based Reasoning |
|--------|---------------|---------------------|
| **Starting Point** | User profile/history | Single reference item |
| **Interaction** | Static | Interactive (critiquing) |
| **Refinement** | Implicit (through ratings) | Explicit (through feedback) |
| **Flexibility** | Fixed similarity metric | Dynamic constraint addition |
| **User Control** | Low | High |
| **Metadata Used** | All features (ratings, dates, runtime, etc.) | Content features (genres, keywords, overview) |
| **Similarity Scores** | Higher (more features = more overlap) | Lower (fewer features = stricter matching) |

**Use Cases**:

Case-based reasoning is particularly useful when:
- Users can identify a reference item they like
- Users want to explore variations of a theme
- The domain has rich, critiquable features
- Interactive refinement is acceptable

Example scenarios:
- **Family-friendly filtering**: "Find movies like Finding Nemo but exclude Drama/Horror and mature themes"
- **Genre exploration**: "Find movies like Inception but not sci-fi" → "Also exclude action"
- **Actor preferences**: "Similar to The Godfather but exclude Marlon Brando"

**Future Improvements**:

- Implement more sophisticated critiquing (e.g., "more like X", "less like Y")
- Add an explanation of why each recommendation was suggested
- Learn from critique patterns to predict future preferences
- Combine with collaborative filtering for a hybrid approach
- Support multi-criteria critiquing in a single iteration
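
As one possible direction for "more like X" / "less like Y" critiquing, the sketch below adjusts the reference vector with a Rocchio-style update before re-ranking. This is an assumption about how such critiquing could be built, not something the notebook implements; `adjust_query`, the weights `beta` and `gamma`, and the toy feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def adjust_query(query, more_like=None, less_like=None, beta=0.5, gamma=0.5):
    """Rocchio-style update: move toward `more_like` and away from `less_like`."""
    adjusted = dict(query)
    for vector, weight in ((more_like, beta), (less_like, -gamma)):
        for feature, value in (vector or {}).items():
            adjusted[feature] = adjusted.get(feature, 0.0) + weight * value
    # Drop features whose weight fell to zero or below.
    return {f: w for f, w in adjusted.items() if w > 0}


# Hypothetical binary feature vectors (genres and keywords merged).
items = {
    "Movie B": {"animation": 1.0, "space": 1.0, "family": 1.0},
    "Movie C": {"ocean": 1.0, "documentary": 1.0},
}
reference = {"animation": 1.0, "ocean": 1.0, "family": 1.0}

before = {title: cosine(reference, vec) for title, vec in items.items()}

# Critique: "more like Movie B, less like Movie C".
query = adjust_query(reference, more_like=items["Movie B"], less_like=items["Movie C"])
after = {title: cosine(query, vec) for title, vec in items.items()}

for title in items:
    print(f"{title}: {before[title]:.3f} -> {after[title]:.3f}")
```

Unlike the hard exclusions used earlier, this kind of critique is soft: it shifts scores rather than removing items, so it could be layered on top of the exclusion sets.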

---

## 5. Comparative Analysis: Constraint-Based vs Case-Based

### Methodology Comparison

| Aspect | Constraint-Based | Case-Based Reasoning |
|--------|------------------|---------------------|
| **Input Required** | Explicit constraints (filters) | Reference item + critiques |
| **Interaction Model** | One-shot (specify all constraints upfront) | Iterative (refine through dialogue) |
| **Recommendation Basis** | Boolean matching (satisfy all constraints) | Similarity + exclusions |
| **User Effort** | High initial effort (must know all requirements) | Lower initial effort (start with example) |
| **Flexibility** | Rigid (all constraints must be met) | Flexible (soft similarity + hard exclusions) |
| **Result Set** | Can be empty if constraints are too strict | Rarely empty (ranks whatever items remain after exclusions) |
| **Transparency** | Very high (exact constraint matching) | High (similarity + excluded features) |
| **Serendipity** | Low (only exact matches) | Medium (similar items with variations) |
| **Computational Cost** | Low (simple filtering) | Medium (similarity computation + filtering) |
| **Learning Curve** | Low (familiar search interface) | Medium (must understand critiquing) |
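
The "Recommendation Basis" and "Result Set" rows can be made concrete with a small sketch. Everything here is hypothetical: the toy movies, the `constraint_filter` and `rank_by_similarity` helpers, and the use of Jaccard similarity over genres as a stand-in for the notebook's cosine similarity.

```python
# Hypothetical toy data for illustration only.
movies = [
    {"title": "Movie A", "genres": {"Action", "Thriller"}, "year": 1995, "rating": 7.4},
    {"title": "Movie B", "genres": {"Action", "Comedy"}, "year": 1998, "rating": 6.8},
    {"title": "Movie C", "genres": {"Drama"}, "year": 2004, "rating": 8.1},
]


def constraint_filter(items, genre, year_range, min_rating):
    """Boolean matching: an item is returned only if it satisfies every constraint."""
    first_year, last_year = year_range
    return [
        m for m in items
        if genre in m["genres"]
        and first_year <= m["year"] <= last_year
        and m["rating"] > min_rating
    ]


def jaccard(a, b):
    """Share of features two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(items, reference):
    """Soft matching: every other item gets a score, so the list is never empty."""
    others = [m for m in items if m is not reference]
    return sorted(others, key=lambda m: jaccard(m["genres"], reference["genres"]), reverse=True)


matches = [m["title"] for m in constraint_filter(movies, "Action", (1990, 1999), 7.0)]
too_strict = constraint_filter(movies, "Action", (1990, 1999), 8.0)
ranked = [m["title"] for m in rank_by_similarity(movies, movies[0])]

print(matches)     # ['Movie A']
print(too_strict)  # []
print(ranked)      # ['Movie B', 'Movie C']
```

Raising the rating threshold empties the constraint-based result, while the similarity ranking still returns every remaining item, including one with no genre overlap at all.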

### Hybrid Approach

In practice, the two approaches complement each other and can be combined in a single workflow:

1. **Start with Constraints**: Apply basic filters (language, rating, date range)
2. **Use Case-Based Refinement**: Find similar items within filtered set
3. **Allow Critiquing**: Let users exclude unwanted features
4. **Iterate**: Refine until user is satisfied

**Example Workflow** (counts are illustrative):
```
User: "Find action movies from 2010-2020 with rating > 7.0" (Constraint-Based)
System: Returns 500 movies
User: "Show me movies like Inception from this set" (Case-Based)
System: Returns 20 similar movies
User: "No sci-fi" (Critique)
System: Returns 15 action thrillers without sci-fi elements
User: "Perfect!"
```
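
A minimal sketch of this workflow follows, assuming toy data and a hypothetical `hybrid_recommend` helper. Jaccard similarity over genres and keywords stands in for whatever similarity measure a real system would use, and the titles, years, and ratings are invented.

```python
def jaccard(a, b):
    """Share of features two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_recommend(items, constraints, reference, excluded_genres=frozenset(), top_n=10):
    """Constraints first, then critiques, then similarity ranking against the reference."""
    # Step 1: hard constraints (every predicate must hold).
    pool = [m for m in items if all(check(m) for check in constraints)]
    # Step 3: critiques remove the reference itself and any excluded genres.
    pool = [
        m for m in pool
        if m["title"] != reference["title"] and not (m["genres"] & excluded_genres)
    ]
    # Step 2: rank what is left by similarity to the reference case.
    ref_features = reference["genres"] | reference["keywords"]
    scored = [(jaccard(m["genres"] | m["keywords"], ref_features), m["title"]) for m in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_n]]


# Hypothetical catalogue.
reference = {"title": "Reference", "genres": {"Action", "Science Fiction"},
             "keywords": {"heist", "dream"}, "year": 2010, "rating": 8.0}
movies = [
    reference,
    {"title": "Movie A", "genres": {"Action", "Thriller"}, "keywords": {"heist"},
     "year": 2012, "rating": 7.5},
    {"title": "Movie B", "genres": {"Action", "Science Fiction"}, "keywords": {"dream"},
     "year": 2014, "rating": 7.8},
    {"title": "Movie C", "genres": {"Action"}, "keywords": {"war"},
     "year": 2005, "rating": 7.9},
    {"title": "Movie D", "genres": {"Action", "Comedy"}, "keywords": {"heist"},
     "year": 2016, "rating": 6.5},
    {"title": "Movie E", "genres": {"Action", "Adventure"}, "keywords": {"treasure"},
     "year": 2018, "rating": 7.2},
]

# "Action movies from 2010-2020 with rating > 7.0"
constraints = [
    lambda m: "Action" in m["genres"],
    lambda m: 2010 <= m["year"] <= 2020,
    lambda m: m["rating"] > 7.0,
]

similar = hybrid_recommend(movies, constraints, reference)
no_scifi = hybrid_recommend(movies, constraints, reference,
                            excluded_genres={"Science Fiction"})

print(similar)   # ['Movie B', 'Movie A', 'Movie E']
print(no_scifi)  # ['Movie A', 'Movie E']
```

Applying the constraints before computing similarity keeps the expensive step confined to the items that can actually be recommended.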

### Strengths and Limitations Summary

**Comparison with Other Techniques**:

| Aspect | Content-Based | Collaborative | Knowledge-Based |
|--------|---------------|---------------|-----------------|
| **Cold Start (Users)** | ❌ Problem | ❌ Problem | ✅ No problem |
| **Cold Start (Items)** | ✅ No problem | ❌ Problem | ✅ No problem |
| **Gray Sheep** | ✅ No problem | ❌ Problem | ✅ No problem |
| **Serendipity** | ❌ Low | ✅ High | ❌ Low |
| **Transparency** | ✅ High | ❌ Low | ✅ Very high |
| **User Effort** | ✅ Low (passive) | ✅ Low (passive) | ❌ High (active) |
| **Personalization** | ✅ High | ✅ Very high | ⚠️ Medium |
| **Scalability** | ✅ Good | ⚠️ Challenging | ✅ Good |
| **Data Required** | Item features | User-item interactions | Item features + user input |

---