**LAB1 - NeSy Recommendation System**

Movie Recommendation System
Recommend movies to users based on explicit preferences.

We will go through probabilistic uncertainty, vague knowledge, commonsense reasoning, and similarity-based inference. More precisely:

* Classical Prolog: structured rules for explicit knowledge (e.g., “Alice likes Sci-Fi → Recommend Sci-Fi movies”)
* Probabilistic Prolog: uncertain preferences (e.g., “Alice probably likes Sci-Fi with 80% confidence”)
* Possibilistic Prolog: vague or imprecise knowledge (e.g., “If Alice likes Sci-Fi, she might like Cyberpunk”)
* Commonsense Reasoning: interpolation and analogy-based reasoning (e.g., “Alice likes Action and Sci-Fi → Infer she might like Cyberpunk”)
* Similarity-Based Reasoning: vector embeddings for contextual similarity (e.g., “Inception is similar to Interstellar → Recommend Interstellar”)

Let's do concept spaces

Let's start with the following steps:

* Step 1: Define the Logical Rules (Prolog)
* Step 2: Introduce Probabilistic Reasoning (Probabilistic Prolog)
* Step 3: Handle Vague Knowledge (Possibilistic Prolog)
* Step 4: Implement Commonsense Reasoning
* Step 5: Implement Similarity-Based Reasoning Using Vector Embeddings
* Step 6:
    - Integrate real-world datasets (IMDB, MovieLens)
    - Enhance user personalization (e.g., feedback loops)
    - Reason with Embeddings
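
As a rough illustration of Step 3, vague rules can be tried out without any Prolog engine. The sketch below assumes the usual possibilistic-logic convention: a conclusion is only as certain as the weakest link in its derivation (the minimum of the fact's and the rule's necessity degrees), and several derivations of the same conclusion combine by maximum. The degrees, the rule list, and the `infer_likes` helper are made up for this example.

```python
# Necessity degrees of known facts (hypothetical values, for illustration only).
facts = {
    ("likes", "alice", "sci_fi"): 1.0,
    ("likes", "alice", "action"): 0.7,
}

# Weighted rules: (premise genre, concluded genre, necessity of the rule).
# Reads as "if a user likes <premise>, they might like <conclusion>".
rules = [
    ("sci_fi", "cyberpunk", 0.6),
    ("action", "cyberpunk", 0.4),
]

def infer_likes(facts, rules, user):
    """Derive degrees for new genres: min along a derivation, max across derivations."""
    derived = {}
    for premise, conclusion, rule_degree in rules:
        fact_degree = facts.get(("likes", user, premise), 0.0)
        degree = min(fact_degree, rule_degree)
        if degree > 0.0:
            derived[conclusion] = max(derived.get(conclusion, 0.0), degree)
    return derived

print(infer_likes(facts, rules, "alice"))
```

Here the Sci-Fi derivation gives min(1.0, 0.6) = 0.6 and the Action derivation gives min(0.7, 0.4) = 0.4, so Cyberpunk ends up with degree 0.6.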


In [2]:
from pyswip import Prolog

## Logical rules

In [3]:
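# Create a handle on the SWI-Prolog engine
prolog = Prolog()

# Assert facts about Alice
prolog.assertz("likes(alice, sci_fi)")
prolog.assertz("genre(alice, cyberpunk)")
prolog.assertz("genre(alice, action)")

# Query every X such that genre(alice, X) holds
result = list(prolog.query("genre(alice, X)"))

# Print result
print(result)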


[{'X': 'cyberpunk'}, {'X': 'action'}]


In [None]:
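# Note: pyswip talks to one shared SWI-Prolog engine, so facts asserted by
# earlier cell runs persist and re-running this cell yields duplicate solutions.
prolog = Prolog()

# User preferences
prolog.assertz("likes(alice, sci_fi)")
prolog.assertz("likes(alice, action)")

# Movies and their genres
prolog.assertz("movie(interstellar, sci_fi)")
prolog.assertz("movie(inception, sci_fi)")
prolog.assertz("movie(matrix, sci_fi)")
prolog.assertz("movie(john_wick, action)")
prolog.assertz("movie(mad_max, action)")
prolog.assertz("movie(blade_runner, cyberpunk)")

# Similarity links between movies
prolog.assertz("similar(inception, interstellar)")
prolog.assertz("similar(matrix, blade_runner)")

# Genre rule: recommend any movie whose genre the user likes
prolog.assertz("recommend(User, Movie) :- likes(User, Genre), movie(Movie, Genre)")

# Similarity rule: recommend movies similar to one from a liked genre
prolog.assertz("recommend_similar(User, Movie) :- likes(User, Genre), movie(LikedMovie, Genre), similar(LikedMovie, Movie)")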

# Query for movie recommendations for Alice
result = list(prolog.query("recommend(alice, Movie)"))
print("Genre-Based Recommendations:", result)

# # Query for similarity-based recommendations for Alice
# similar_results = list(prolog.query("recommend_similar(alice, Movie)"))
# print("Similarity-Based Recommendations:", similar_results)


Genre-Based Recommendations: [{'Movie': 'interstellar'}, {'Movie': 'inception'}, {'Movie': 'matrix'}, {'Movie': 'interstellar'}, {'Movie': 'inception'}, {'Movie': 'matrix'}, {'Movie': 'interstellar'}, {'Movie': 'inception'}, {'Movie': 'matrix'}, {'Movie': 'interstellar'}, {'Movie': 'inception'}, {'Movie': 'matrix'}, {'Movie': 'john_wick'}, {'Movie': 'mad_max'}, {'Movie': 'john_wick'}, {'Movie': 'mad_max'}, {'Movie': 'interstellar'}, {'Movie': 'inception'}, {'Movie': 'matrix'}, {'Movie': 'interstellar'}, {'Movie': 'inception'}, {'Movie': 'matrix'}, {'Movie': 'john_wick'}, {'Movie': 'mad_max'}, {'Movie': 'john_wick'}, {'Movie': 'mad_max'}, {'Movie': 'interstellar'}, {'Movie': 'inception'}, {'Movie': 'matrix'}, {'Movie': 'interstellar'}, {'Movie': 'inception'}, {'Movie': 'matrix'}, {'Movie': 'interstellar'}, {'Movie': 'inception'}, {'Movie': 'matrix'}, {'Movie': 'interstellar'}, {'Movie': 'inception'}, {'Movie': 'matrix'}, {'Movie': 'john_wick'}, {'Movie': 'mad_max'}, {'Movie': 'john_wick
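
The repeated entries in the output above are most likely due to the same facts having been asserted several times: pyswip appears to share one SWI-Prolog engine across `Prolog()` instances, so every re-run of a cell adds another copy of each fact and multiplies the solutions. One option is to clear the predicates first (for example with `prolog.retractall(...)`); another, sketched below, is to deduplicate the query results on the Python side. The `unique_solutions` helper and the shortened `raw` list are illustrative stand-ins, not part of pyswip.

```python
def unique_solutions(solutions, variable):
    """Return the distinct bindings of `variable`, keeping first-seen order."""
    seen = set()
    ordered = []
    for solution in solutions:
        value = solution[variable]
        if value not in seen:
            seen.add(value)
            ordered.append(value)
    return ordered

# Shortened stand-in for the list returned by prolog.query(...)
raw = [
    {"Movie": "interstellar"}, {"Movie": "inception"}, {"Movie": "matrix"},
    {"Movie": "interstellar"}, {"Movie": "john_wick"}, {"Movie": "mad_max"},
    {"Movie": "john_wick"},
]

print(unique_solutions(raw, "Movie"))
```

Keeping the first-seen order preserves the order in which Prolog found the solutions, which a plain `set` would not.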

## Probabilistic reasoning

In [11]:
from problog.program import PrologString
from problog import get_evaluatable


problog_code = """
% Facts with probabilities
0.8::likes(alice, sci_fi).
0.6::likes(alice, cyberpunk).
0.7::likes(alice, action).

% Movie genres
movie(interstellar, sci_fi).
movie(inception, sci_fi).
movie(matrix, sci_fi).
movie(john_wick, action).
movie(mad_max, action).
movie(blade_runner, cyberpunk).

% Probabilistic similarity between movies
0.9::similar(inception, interstellar).
0.7::similar(matrix, blade_runner).

% Genre rule: recommend a movie from a genre the user likes
0.9::recommend(User, Movie) :- likes(User, Genre), movie(Movie, Genre).
% Similarity rule: recommend a movie similar to one from a liked genre
0.8::recommend(User, Movie) :- likes(User, Genre), movie(LikedMovie, Genre), similar(LikedMovie, Movie).

% Ask for the probability of every recommendation for Alice
query(recommend(alice, Movie)).
"""

# Load ProbLog program
problog_model = PrologString(problog_code)

# Evaluate the queries declared in the program
result = get_evaluatable().create_from(problog_model).evaluate()
print(result)

# Print recommendation probabilities
for key, value in result.items():
    if "recommend(alice" in str(key):  # Filter only recommendation results
        print(f"{key}: {value:.2f}")


Note that ProbLog only evaluates atoms declared with `query/1`. A program that declares no query evaluates to an empty dictionary, `{}`.
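
Under ProbLog's distribution semantics, the probability of a query is the total weight of the possible worlds in which it can be derived. As a hand check, the sketch below enumerates those worlds for a movie that both recommendation rules support through `likes(alice, sci_fi)` (0.8) and one `similar/2` fact (0.9). It assumes each annotated rule behaves like one extra independent switch (0.9 and 0.8); the switch names and helper functions are made up, and this is an illustration of the semantics, not ProbLog's actual inference algorithm.

```python
from itertools import product

# Independent probabilistic choices (names are hypothetical labels).
switches = {
    "likes_sci_fi": 0.8,   # 0.8::likes(alice, sci_fi).
    "similar": 0.9,        # 0.9::similar(inception, interstellar).
    "genre_rule": 0.9,     # 0.9:: annotation on the genre rule
    "similar_rule": 0.8,   # 0.8:: annotation on the similarity rule
}

def recommend_holds(world):
    """True if at least one rule derives the recommendation in this world."""
    via_genre = world["likes_sci_fi"] and world["genre_rule"]
    via_similarity = (world["likes_sci_fi"] and world["similar"]
                      and world["similar_rule"])
    return via_genre or via_similarity

def query_probability(switches, holds):
    """Sum the weights of all truth assignments in which `holds` is true."""
    names = list(switches)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= switches[name] if world[name] else 1.0 - switches[name]
        if holds(world):
            total += weight
    return total

print(f"{query_probability(switches, recommend_holds):.4f}")
```

The same number follows by hand: 0.8 × (1 − (1 − 0.9) × (1 − 0.9 × 0.8)) = 0.8 × 0.972 = 0.7776. A movie supported only by the genre rule gets 0.8 × 0.9 = 0.72.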


In [6]:
import pandas as pd
import torch

data = pd.read_csv("imdb_top_1000.csv")
data.head()

Unnamed: 0,Poster_Link,Series_Title,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
0,https://m.media-amazon.com/images/M/MV5BMDFkYT...,The Shawshank Redemption,1994,A,142 min,Drama,9.3,Two imprisoned men bond over a number of years...,80.0,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,2343110,28341469
1,https://m.media-amazon.com/images/M/MV5BM2MyNj...,The Godfather,1972,A,175 min,"Crime, Drama",9.2,An organized crime dynasty's aging patriarch t...,100.0,Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,1620367,134966411
2,https://m.media-amazon.com/images/M/MV5BMTMxNT...,The Dark Knight,2008,UA,152 min,"Action, Crime, Drama",9.0,When the menace known as the Joker wreaks havo...,84.0,Christopher Nolan,Christian Bale,Heath Ledger,Aaron Eckhart,Michael Caine,2303232,534858444
3,https://m.media-amazon.com/images/M/MV5BMWMwMG...,The Godfather: Part II,1974,A,202 min,"Crime, Drama",9.0,The early life and career of Vito Corleone in ...,90.0,Francis Ford Coppola,Al Pacino,Robert De Niro,Robert Duvall,Diane Keaton,1129952,57300000
4,https://m.media-amazon.com/images/M/MV5BMWU4N2...,12 Angry Men,1957,U,96 min,"Crime, Drama",9.0,A jury holdout attempts to prevent a miscarria...,96.0,Sidney Lumet,Henry Fonda,Lee J. Cobb,Martin Balsam,John Fiedler,689845,4360000


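Starting from the embeddings: use a bag-of-words model to find the most frequent words in the overviews, then use an SVM to separate groups of movies and read off interpretable directions in the embedding space.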
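
The bag-of-words step can be sketched with the standard library alone. The snippet below counts word frequencies over a few toy overview strings (invented for this example, not rows from the dataset) after dropping a tiny, hand-picked stop-word list. The SVM step is left out because it needs an external library.

```python
import re
from collections import Counter

# Tiny stop-word list for the example; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "in"}

def most_frequent_words(texts, top_n=5):
    """Count lower-cased word tokens across all texts and return the top_n."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(top_n)

# Toy overviews (hypothetical)
overviews = [
    "A hacker learns the city is a simulation",
    "A detective hunts a hacker in a rainy city",
    "A thief plants an idea inside a dream in the city",
]

print(most_frequent_words(overviews, top_n=2))
```

On the real data, the same function could be applied to `data["Overview"].tolist()`, and the resulting frequent words would then serve as candidate labels for the directions.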

In [None]:
# Implement Similarity-Based Reasoning Using Vector Embeddings
from sentence_transformers import SentenceTransformer

# Load pre-trained Sentence Transformer model
model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

# Encode movie overviews (plot summaries) into vector embeddings
movie_embeddings = model.encode(data["Overview"].tolist(), convert_to_tensor=True)


[0.99999994 0.99999994 0.9999999  0.99999994 1.         1.
 1.0000001  0.99999994 1.         1.         1.         1.
 1.0000001  1.         0.99999994 1.0000001  1.         1.
 1.0000001  1.         0.9999999  1.         1.         1.
 0.9999999  1.0000001  0.9999998  1.         1.         1.0000001
 1.         1.         0.99999994 1.         1.0000002  1.
 1.0000001  0.99999994 0.9999999  1.0000001  1.         0.99999994
 0.99999994 1.         1.0000001  1.0000001  0.9999999  1.
 1.0000001  0.99999994 1.0000001  0.99999994 0.9999999  1.
 1.         1.         1.         0.99999994 0.99999976 0.99999994
 1.         0.99999994 1.0000001  1.         1.         1.
 0.9999999  1.         1.         0.99999994 0.9999999  1.
 1.         1.         0.9999999  1.0000002  1.         1.
 1.         1.         0.99999994 0.99999994 0.99999994 1.0000001
 0.9999999  0.9999999  1.         1.         1.         0.9999999
 1.0000001  1.0000001  1.         1.         1.         1.0000001
 0.9999998  
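
Once every movie has a vector, similarity-based recommendation comes down to comparing vectors, typically with cosine similarity. The sketch below shows that step in plain Python on three made-up 3-dimensional vectors. The `toy_embeddings` values and the `most_similar` helper are assumptions for illustration, standing in for the real sentence embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query_title, embeddings):
    """Return the title whose vector is closest to the query's, excluding the query."""
    query_vec = embeddings[query_title]
    scores = {
        title: cosine_similarity(query_vec, vec)
        for title, vec in embeddings.items()
        if title != query_title
    }
    return max(scores, key=scores.get)

# Toy embeddings (hypothetical values, not model outputs)
toy_embeddings = {
    "inception": [0.9, 0.1, 0.3],
    "interstellar": [0.8, 0.2, 0.4],
    "john_wick": [0.1, 0.9, 0.2],
}

print(most_similar("inception", toy_embeddings))
```

A vector compared with itself always scores 1.0 up to floating-point rounding, which is why the query title is excluded from the candidates.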

We now know how to develop mention encoders, so we can develop one for movies:
* Take the list of movie names.
* For each name, find mention sentences in a corpus.
* Based on these sentences, extract representations using a BERT-family encoder.
* Average the mention vectors.
* Done.
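
The mention-finding and averaging steps of this recipe can be sketched in plain Python. The snippet assumes a small in-memory corpus, a naive regex sentence splitter, and a case-insensitive substring match on the title. The corpus text and the helper names are invented for illustration, and the encoding step is represented only by placeholder vectors.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def find_mentions(title, corpus):
    """Return every sentence in the corpus that mentions the title."""
    title_lower = title.lower()
    return [
        sentence
        for document in corpus
        for sentence in split_sentences(document)
        if title_lower in sentence.lower()
    ]

def average_vectors(vectors):
    """Component-wise mean of equal-length vectors; None if there are none."""
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

# Toy corpus (hypothetical text)
corpus = [
    "I rewatched Inception last night. The ending still divides people.",
    "Critics often compare Inception to Interstellar.",
]

mentions = find_mentions("Inception", corpus)
print(mentions)

# Placeholder vectors standing in for the encoder output of each mention
print(average_vectors([[1.0, 2.0], [3.0, 4.0]]))
```

Returning `None` for a title with no mentions makes the missing case explicit, which matters because many titles never appear verbatim in running text.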

In [12]:
from transformers import BertTokenizer, BertModel
import torch

In [13]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")


In [10]:
# Extract the overviews that mention a given movie title
movie_names = data['Series_Title'].tolist()
movie_overviews = data['Overview'].tolist()

def extract_mention_sentences(movie_name, movie_overviews):
    """Return the overviews that contain movie_name (case-insensitive)."""
    name = movie_name.lower()
    return [sentence for sentence in movie_overviews if name in sentence.lower()]

for name in movie_names:
    mention_sentences = extract_mention_sentences(name, movie_overviews)
    print(f"Movie: {name}")
    print(f"Mentioned Sentences: {mention_sentences}\n")


Movie: The Shawshank Redemption
Mentioned Sentences: []

Movie: The Godfather
Mentioned Sentences: []

Movie: The Dark Knight
Mentioned Sentences: []

Movie: The Godfather: Part II
Mentioned Sentences: []

Movie: 12 Angry Men
Mentioned Sentences: []

Movie: The Lord of the Rings: The Return of the King
Mentioned Sentences: []

Movie: Pulp Fiction
Mentioned Sentences: []

Movie: Schindler's List
Mentioned Sentences: []

Movie: Inception
Mentioned Sentences: []

Movie: Fight Club
Mentioned Sentences: ['An insomniac office worker and a devil-may-care soapmaker form an underground fight club that evolves into something much, much more.']

Movie: The Lord of the Rings: The Fellowship of the Ring
Mentioned Sentences: []

Movie: Forrest Gump
Mentioned Sentences: []

Movie: Il buono, il brutto, il cattivo
Mentioned Sentences: []

Movie: The Lord of the Rings: The Two Towers
Mentioned Sentences: []

Movie: The Matrix
Mentioned Sentences: []

Movie: Goodfellas
Mentioned Sentences: []

Movie: Sta

In [None]:
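# Encode sentences with BERT
def encode_sentence(sentences, tokenizer, model):
    # Tokenize the batch; pad to a common length and truncate over-long inputs
    inputs = tokenizer(sentences, return_tensors='pt', padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Use the hidden state of the [CLS] token as the sentence embedding
    sentences_embeddings = outputs.last_hidden_state[:, 0, :]
    return sentences_embeddings

# mention_sentences holds the mentions of the last movie processed in the loop; it must be non-empty
encoded_sentences = encode_sentence(mention_sentences, tokenizer, model)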

In [None]:
# average the embeddings
def average_embeddings(embeddings):
    return torch.mean(embeddings, dim=0)

average_embedding = average_embeddings(encoded_sentences)