---
# *Building personal AI pal that recommends what your **mood** wants, not just what the crowd watches.*

---
### Core Features

- **Text-Based Recommendations** – Input natural language tags; get content that matches the feel you're looking for.
- **Book Recommender** – Suggests books using thematic tag similarity.
- **Movie Recommender** – Finds movies based on user-defined themes or concepts.
- **Content-Based Filtering** – Works even if you’re a new user with no past data.

---
##  Datasets Used

#### Books
- **Source**: [GoodBooks-10k](https://www.kaggle.com/datasets/zygmunt/goodbooks-10k)
- **Details**: Contains 10,000 popular books with metadata and user tags.
- **Prep**: Cleaned and merged tags into a single text field per book for vectorization.

#### Movies
- **Source**: [MovieLens + Genome Tags](https://www.kaggle.com/datasets/grouplens/movielens)
- **Files Used**: `movies.csv`, `genome_tags.csv`, `genome_scores.csv`
- **Prep**: Selected top relevant tags per movie and combined with metadata.

---

In [17]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [40]:
# Loading the data sets of the books from kaggle
# for book reco
books = pd.read_csv("Data/books.csv")
book_tags = pd.read_csv("Data/book_tags.csv")
tags = pd.read_csv("Data/tags.csv")
#for movies reco
movies = pd.read_csv("Data/movie.csv")
genome_tags = pd.read_csv("Data/genome_tags.csv")
genome_scores = pd.read_csv("Data/genome_scores.csv")

In [43]:
books.head(1)

Unnamed: 0,id,book_id,best_book_id,work_id,books_count,isbn,isbn13,authors,original_publication_year,original_title,...,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
0,1,2767052,2767052,2792775,272,439023483,9780439000000.0,Suzanne Collins,2008.0,The Hunger Games,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...


In [44]:
tags.head(1)

Unnamed: 0,tag_id,tag_name
0,0,-


In [7]:
book_tags.head(2)

Unnamed: 0,goodreads_book_id,tag_id,count
0,1,30574,167697
1,1,11305,37174


#### Clean & Merge Tags for books

In [11]:
books.rename(columns={"id": "goodreads_book_id"}, inplace=True)
book_tags = book_tags.merge(tags, on="tag_id")
filtered_tags = book_tags[book_tags["count"] > 50]

# Group tags per book
tags_per_book = filtered_tags.groupby("goodreads_book_id")["tag_name"].apply(list).reset_index()
books_with_tags = books.merge(tags_per_book, on="goodreads_book_id")

books_with_tags = books_with_tags[["book_id", "title", "tag_name"]]
books_with_tags.rename(columns={"tag_name": "tags"}, inplace=True)

books_with_tags.to_csv("Data/cleaned_books.csv", index=False)

In [15]:
bookshelf = pd.read_csv("Data/cleaned_books.csv")

In [16]:
bookshelf.head(2)

Unnamed: 0,book_id,title,tags
0,2767052,"The Hunger Games (The Hunger Games, #1)","['to-read', 'fantasy', 'favorites', 'currently..."
1,3,Harry Potter and the Sorcerer's Stone (Harry P...,"['to-read', 'currently-reading', 'fantasy', 'f..."


### Clean and merge the data sets of Movies

In [48]:
genome_scores.head(1)

Unnamed: 0,movieId,tagId,relevance
0,1,1,0.025


In [54]:
genome_tags.head(1)

Unnamed: 0,tagId,tag
0,1,7


In [50]:
movies.head(1)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy


In [61]:
tagged_scores = genome_scores.merge(genome_tags, on="tagId")
##Filter tags by relevance score
filtered_tags = tagged_scores[tagged_scores["relevance"] > 0.1]

tags_per_movie = filtered_tags.groupby("movieId")["tag"].apply(list).reset_index()
movies_with_tags = movies.merge(tags_per_movie, on="movieId")
movies_with_tags = movies_with_tags[["movieId", "title", "tag"]]
movies_with_tags.to_csv("Data/cleaned_movies.csv", index=False)

In [62]:
cleaned_movies = pd.read_csv("Data/cleaned_movies.csv")
cleaned_movies.head(1)

Unnamed: 0,movieId,title,tag
0,1,Toy Story (1995),"['1930s', '1950s', '1970s', '1980s', '3d', '70..."


## Converting Tags List to Text

TF-IDF requires string type senetence input, so we convert the tag lists into joined strings per book.


In [63]:
def list_to_string(tags):
    if isinstance(tags, str):
        tags = eval(tags)
    result = ""
    for i in range(len(tags)):
        result += tags[i] + " "
    return result.strip()

bookshelf["tags_joined"] = bookshelf["tags"].apply(list_to_string)
movies_with_tags["tag"] = movies_with_tags["tag"].apply(list_to_string)

In [20]:
bookshelf.head(1)

Unnamed: 0,book_id,title,tags,tags_joined
0,2767052,"The Hunger Games (The Hunger Games, #1)","['to-read', 'fantasy', 'favorites', 'currently...",to-read fantasy favorites currently-reading yo...


In [64]:
movies_with_tags.head(1)

Unnamed: 0,movieId,title,tag
0,1,Toy Story (1995),1930s 1950s 1970s 1980s 3d 70mm 80s aardman st...


##### TF-IDF Vectorization of Tags

We use `TfidfVectorizer` from `sklearn` to turn the tag text into numerical vectors. This captures the importance of tags for each book.


In [66]:
book_vectorizer = TfidfVectorizer()
tfidf_book_matrix = book_vectorizer.fit_transform(bookshelf["tags_joined"])
similarity_book_matrix = cosine_similarity(tfidf_book_matrix)

movie_vectorizer = TfidfVectorizer()
tfidf_movies_matrix = movie_vectorizer.fit_transform(movies_with_tags["tag"])
similarity_movies_matrix = cosine_similarity(tfidf_movies_matrix)

# Funtion of recommending
### By title


In [75]:
def recommend_books(title, top_n=5):
    # Making the title lowercase so we can match it
    title = title.lower()

    # Trying to find the book in the dataset
    book_found = False
    for i in range(len(bookshelf)):
        if bookshelf.loc[i, 'title'].lower() == title:
            book_index = i
            book_found = True
            break
    if not book_found:
        print("Sorry, that book was not found!")
        return
    similarities = cosine_similarity(tfidf_book_matrix[book_index], tfidf_book_matrix)
    similarities = similarities[0]

    similar_books = []
    for i in range(len(similarities)):
        if i != book_index:  # for Skiping itself so that we wont recommend the book the user entered
            similar_books.append((i, similarities[i]))

    # Sort the list by weight score
    similar_books.sort(key=lambda x: x[1], reverse=True)
    print(f"Books similar to '{title.title()}':\n")
    for i in range(top_n):
        index = similar_books[i][0]
        score = round(similar_books[i][1], 2)
        book_name = bookshelf.loc[index, 'title']
        print(f"{book_name}  -->  Similarity Score: {score}")

        
def recommend_movies(movie_title, top_n=5):
    movie_title = movie_title.lower()
    # Trying to find the movie in the dataset
    found = False
    for i in range(len(movies_with_tags)):
        if movies_with_tags.loc[i, 'title'].lower() == movie_title:
            movie_index = i
            found = True
            break

    if not found:
        print("Sorry, that movie was not found!")
        return
    similarities = similarity_movies_matrix[movie_index]

    # List for (movie_index, similarity_score)
    similar_movies = [(i, similarities[i]) for i in range(len(similarities)) if i != movie_index]
    similar_movies.sort(key=lambda x: x[1], reverse=True)

    print(f"\nMovies similar to '{movie_title.title()}':\n")
    for i in range(top_n):
        index = similar_movies[i][0]
        score = round(similar_movies[i][1], 2)
        name = movies_with_tags.loc[index, 'title']
        print(f"{name}  -->  Similarity Score: {score}")


# Funtion of recommending
### By tags

In [110]:
def recommend_book_by_tags(input_tags, top_n=5):
    input_vector = book_vectorizer.transform([input_tags])
    similarities = cosine_similarity(input_vector, tfidf_book_matrix)[0]

    similar_books = []
    for i in range(len(similarities)):
        similar_books.append((i, similarities[i]))
    # Sort by similarity
    similar_books.sort(key=lambda x: x[1], reverse=True)
    print(f"\nBooks similar to tags like: '{input_tags}'\n")
    for i in range(top_n):
        book_index = similar_books[i][0]
        book_title = bookshelf.loc[book_index, 'title']
        score = round(similar_books[i][1], 2)
        print(f"{book_title}  -->  Similarity Score: {score}")

def recommend_movies_by_tags(input_tags, top_n=5):
    input_vector = movie_vectorizer.transform([input_tags])
    similarities = cosine_similarity(input_vector, tfidf_movies_matrix)
    similarities = similarities[0]

    # list for (movie_index, similarity_score)
    similar_movies = [(i, similarities[i]) for i in range(len(similarities))]
    similar_movies.sort(key=lambda x: x[1], reverse=True)
    print(f"\nTop {top_n} movies similar to tags: '{input_tags}'\n")
    for i in range(top_n):
        index = similar_movies[i][0]
        title = movies_with_tags.loc[index, 'title']
        score = round(similar_movies[i][1], 2)
        print(f"{title}  -->  Similarity Score: {score}")


# Some Examples as test


In [81]:
recommend_books("The Great Gatsby")

Books similar to 'The Great Gatsby':

Twilight (Twilight, #1)  -->  Similarity Score: 0.98
The Fault in Our Stars  -->  Similarity Score: 0.96
The Hunger Games (The Hunger Games, #1)  -->  Similarity Score: 0.95
Harry Potter and the Sorcerer's Stone (Harry Potter, #1)  -->  Similarity Score: 0.79
Private Games (Private #3)  -->  Similarity Score: 0.69


In [109]:
recommend_book_by_tags("adventure",2)


Books similar to tags like: 'adventure'

The Quiche of Death (Agatha Raisin, #1)  -->  Similarity Score: 0.42
The Lost Boy (Dave Pelzer #2)  -->  Similarity Score: 0.34


In [73]:
recommend_movies("Toy Story (1995)")


Movies similar to 'Toy Story (1995)':

Toy Story 2 (1999)  -->  Similarity Score: 0.91
Monsters, Inc. (2001)  -->  Similarity Score: 0.9
Toy Story 3 (2010)  -->  Similarity Score: 0.9
Bug's Life, A (1998)  -->  Similarity Score: 0.88
Up (2009)  -->  Similarity Score: 0.86


In [71]:
recommend_movies_by_tags("sci-fi space alien war", 3)


Top 3 movies similar to tags: 'sci-fi space alien war'

Final Countdown, The (1980)  -->  Similarity Score: 0.34
Earth vs. the Flying Saucers (1956)  -->  Similarity Score: 0.33
Wing Commander (1999)  -->  Similarity Score: 0.32


## CLose matched tags


In [102]:
import difflib

# For books
all_book_tags = set()
for tag_string in bookshelf["tags_joined"]:
    for tag in tag_string.split():
        all_book_tags.add(tag.lower())
book_tag_list = sorted(list(all_book_tags))

# For movies
all_movie_tags = set()
for tag_string in movies_with_tags["tag"]:
    for tag in tag_string.split():
        all_movie_tags.add(tag.lower())
movie_tag_list = sorted(list(all_movie_tags))

In [117]:
tag = input("Enter related to book tag: ").strip().lower()

close_matches = difflib.get_close_matches(tag, book_tag_list, n=3, cutoff=0.6)
print("Did you mean:", close_matches)

Enter a book tag:  i love action


Did you mean: ['innovation', 'action', 'legal-fiction']


In [118]:
tag = input("Enter related to movie tag: ").strip().lower()

close_matches = difflib.get_close_matches(tag, movie_tag_list, n=3, cutoff=0.6)
print("Did you mean:", close_matches)

Enter related to movie tag:  i love books


Did you mean: []


In [116]:
# recommend_books("The Great Gatsby")
# recommend_book_by_tags("adventure",2)
# recommend_movies("Toy Story (1995)")
# recommend_movies_by_tags("sci-fi space alien war", 3)

## Syntax:
### funtion(title/tag , number of suggestions)