## 🔍 Semantic Book Recommendations using Vector Search

<hr style="border:1px solid #ccc">

## 👨‍💻 Author: **Muhammad Haweras**

[![LinkedIn](https://img.shields.io/badge/LinkedIn-blue?logo=linkedin&logoColor=white)](https://www.linkedin.com/in/muhammad-haweras-7aa6b11b2/)
[![GitHub](https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white)](https://github.com/MuhammadHaweras)

<hr style="border:1px solid #ccc">

We implement a semantic search system using **LangChain**, **OpenAI embeddings**, and a **Chroma vector store** to recommend books based on the meaning of a user's query rather than on exact keyword matches.

---

### 📦 Load and Split Descriptions

We begin by loading the tagged book descriptions and splitting them into one chunk per book for embedding.

### 🧠 Create Vector Database with OpenAI Embeddings
We embed the descriptions and store them in a Chroma vector database.

### 🛠️ Create a Reusable Retrieval Function
We wrap the similarity search logic in a convenient function.



In [7]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

import pandas as pd

In [8]:
import os
from dotenv import load_dotenv

load_dotenv()

True
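
`OpenAIEmbeddings` reads its key from the `OPENAI_API_KEY` environment variable, which `load_dotenv()` populates from a `.env` file. A minimal sketch of an early check, assuming that variable name; `require_env` is a hypothetical helper, not part of the notebook or of any library:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Hypothetical usage, called once right after load_dotenv():
# api_key = require_env("OPENAI_API_KEY")
```

Failing here gives a clearer message than an authentication error raised later, when the embeddings are first requested.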

In [9]:
books = pd.read_csv('cleaned_books_data.csv')
books.head(10)

Unnamed: 0,isbn13,isbn10,title,authors,categories,thumbnail,description,published_year,average_rating,num_pages,ratings_count,full_title,tagged_description
0,9780002005883,0002005883,Gilead,Marilynne Robinson,Fiction,http://books.google.com/books/content?id=KQZCP...,A NOVEL THAT READERS and critics have been eag...,2004.0,3.85,247.0,361.0,Gilead,"9780002005883, A NOVEL THAT READERS and critic..."
1,9780002261982,0002261987,Spider's Web,Charles Osborne;Agatha Christie,Detective and mystery stories,http://books.google.com/books/content?id=gA5GP...,A new 'Christie for Christmas' -- a full-lengt...,2000.0,3.83,241.0,5164.0,Spider's Web: A Novel,"9780002261982, A new 'Christie for Christmas' ..."
2,9780006178736,0006178731,Rage of angels,Sidney Sheldon,Fiction,http://books.google.com/books/content?id=FKo2T...,"A memorable, mesmerizing heroine Jennifer -- b...",1993.0,3.93,512.0,29532.0,Rage of angels,"9780006178736, A memorable, mesmerizing heroin..."
3,9780006280897,0006280897,The Four Loves,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=XhQ5X...,Lewis' work on the nature of love divides love...,2002.0,4.15,170.0,33684.0,The Four Loves,"9780006280897, Lewis' work on the nature of lo..."
4,9780006280934,0006280935,The Problem of Pain,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=Kk-uV...,"""In The Problem of Pain, C.S. Lewis, one of th...",2002.0,4.09,176.0,37569.0,The Problem of Pain,"9780006280934, ""In The Problem of Pain, C.S. L..."
5,9780006380832,0006380832,Empires of the Monsoon,Richard Hall,"Africa, East",http://books.google.com/books/content?id=MuPEQ...,Until Vasco da Gama discovered the sea-route t...,1998.0,4.41,608.0,65.0,Empires of the Monsoon: A History of the India...,"9780006380832, Until Vasco da Gama discovered ..."
6,9780006470229,000647022X,The Gap Into Madness,Stephen R. Donaldson,"Hyland, Morn (Fictitious character)",http://books.google.com/books/content?id=4oXav...,A new-cover reissue of the fourth book in the ...,1994.0,4.15,743.0,103.0,The Gap Into Madness: Chaos and Order,"9780006470229, A new-cover reissue of the four..."
7,9780006472612,0006472613,Master of the Game,Sidney Sheldon,Adventure stories,http://books.google.com/books/content?id=TkTYp...,Kate Blackwell is an enigma and one of the mos...,1982.0,4.11,489.0,43540.0,Master of the Game,"9780006472612, Kate Blackwell is an enigma and..."
8,9780006482079,0006482074,Warhost of Vastmark,Janny Wurts,Fiction,http://books.google.com/books/content?id=uOL0f...,"Tricked once more by his wily half-brother, Ly...",1995.0,4.03,522.0,2966.0,Warhost of Vastmark,"9780006482079, Tricked once more by his wily h..."
9,9780006483014,0006483011,The Once and Future King,Terence Hanbury White,Arthurian romances,http://books.google.com/books/content?id=Jx6Bv...,An omnibus volume of the author's complete sto...,1996.0,4.04,823.0,2805.0,The Once and Future King,"9780006483014, An omnibus volume of the author..."


In [10]:
books['tagged_description']

0       9780002005883, A NOVEL THAT READERS and critic...
1       9780002261982, A new 'Christie for Christmas' ...
2       9780006178736, A memorable, mesmerizing heroin...
3       9780006280897, Lewis' work on the nature of lo...
4       9780006280934, "In The Problem of Pain, C.S. L...
                              ...                        
5688    9788173031014, This book tells the tale of a m...
5689    9788179921623, Wisdom to Create a Life of Pass...
5690    9788185300535, This collection of the timeless...
5691    9789027712059, Since the three volume edition ...
5692    9789042003408, This is a jubilant and rewardin...
Name: tagged_description, Length: 5693, dtype: object

* `TextLoader` reads from a file, so we first write the `tagged_description` column to a text file.

In [11]:
books['tagged_description'].to_csv('tagged_description.txt', index=False, header=False)
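
`to_csv` applies CSV quoting, which is why the chunks shown later begin with a `"` character, and a description that contains a line break would end up spread over several lines of the file. A hedged alternative, sketched in plain Python, collapses whitespace and writes exactly one line per description. The helper name `write_one_per_line` is an illustrative assumption.

```python
from pathlib import Path


def write_one_per_line(descriptions, path):
    """Write each description on its own line, without CSV quoting."""
    # " ".join(text.split()) collapses newlines and repeated spaces into single spaces.
    lines = [" ".join(str(text).split()) for text in descriptions]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# In the notebook this might be called as (assumption):
# write_one_per_line(books["tagged_description"], "tagged_description.txt")
```

The ISBN parsing used later in the notebook still works on such a file, since it only strips quotes that happen to be present.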


In [12]:
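raw_documents = TextLoader('tagged_description.txt').load()
# chunk_size=0 means no two lines can ever be merged, so every newline-separated
# description becomes its own chunk. The "longer than the specified 0" warnings
# this produces are expected and harmless.
text_splitter = CharacterTextSplitter(chunk_size=0, chunk_overlap=0, separator="\n")
documents = text_splitter.split_documents(raw_documents)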

Created a chunk of size 1171, which is longer than the specified 0
Created a chunk of size 1217, which is longer than the specified 0
Created a chunk of size 376, which is longer than the specified 0
Created a chunk of size 312, which is longer than the specified 0
Created a chunk of size 484, which is longer than the specified 0
Created a chunk of size 485, which is longer than the specified 0
Created a chunk of size 963, which is longer than the specified 0
Created a chunk of size 191, which is longer than the specified 0
Created a chunk of size 846, which is longer than the specified 0
Created a chunk of size 297, which is longer than the specified 0
Created a chunk of size 198, which is longer than the specified 0
Created a chunk of size 882, which is longer than the specified 0
Created a chunk of size 1091, which is longer than the specified 0
Created a chunk of size 1192, which is longer than the specified 0
Created a chunk of size 307, which is longer than the specified 0
Create
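
The goal of this split is one chunk per book description. The same result can be sketched without LangChain as below; `split_per_line` is a hypothetical helper rather than a LangChain API, and the sample text is made up for illustration.

```python
def split_per_line(text: str) -> list[str]:
    """Return one chunk per non-empty line, mirroring the one-book-per-chunk goal."""
    return [line.strip() for line in text.split("\n") if line.strip()]


# Made-up sample in the same "isbn, description" shape as the real file.
sample = "111, first description\n222, second description\n"
chunks = split_per_line(sample)
print(len(chunks))  # 2
```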

In [13]:
documents[0]

Document(metadata={'source': 'tagged_description.txt'}, page_content='"9780002005883, A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives. John Ames is a preacher, the son of a preacher and the grandson (both maternal and paternal) of preachers. It’s 1956 in Gilead, Iowa, towards the end of the Reverend Ames’s life, and he is absorbed in recording his family’s story, a legacy for the young son he will never see grow up. Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist. He is troubled, too, by his prodigal namesake, Jack (John Ames) Boughton, his best friend’s lost son who returns to Gilead searching for forgiveness and redemption. Told in John Ames’s joyous, rambling voice that finds beauty, humour and truth in the smallest of life’s details, 

In [14]:
db_books = Chroma.from_documents(
	documents,
	embedding=OpenAIEmbeddings(),
)

Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


In [15]:
query = "A book to teach history to my 10 year old son"
docs = db_books.similarity_search(query, k=5)

Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given
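
Conceptually, `similarity_search` embeds the query and returns the `k` stored chunks whose vectors lie closest to it. The toy sketch below shows that idea with cosine similarity over hand-made 3-dimensional vectors. The vectors and labels are invented for illustration, and Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k=2):
    """Return the k (label, vector) pairs most similar to query_vec, best first."""
    ranked = sorted(items, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return ranked[:k]


toy_index = [
    ("history for children", [0.9, 0.8, 0.1]),
    ("space opera", [0.1, 0.2, 0.9]),
    ("medieval diary", [0.7, 0.9, 0.2]),
]
query_vec = [1.0, 0.9, 0.0]
print([label for label, _ in top_k(query_vec, toy_index)])
# ['history for children', 'medieval diary']
```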


In [16]:
docs

[Document(id='2facacd0-1de6-4045-8896-9b18097d0553', metadata={'source': 'tagged_description.txt'}, page_content='"9781565841000, Criticizes the way history is presented in current textbooks, and suggests a fresh and more accurate approach to teaching American history."'),
 Document(id='73eea2f3-a98b-4392-8dbe-40afacd6efa3', metadata={'source': 'tagged_description.txt'}, page_content='"9780763621643, As a page in his uncle\'s castle in thirteenth-century England, eleven-year-old Tobias records in his journal his experiences learning how to hunt, play games of skill, and behave in noble society."'),
 Document(id='9a77a343-b95a-485a-95a5-a0850c0959f7', metadata={'source': 'tagged_description.txt'}, page_content='"9780465024964, Armies and empires, statesmen and tyrants--the acclaimed historian Robin Lane Fox vividly recounts the history of two great civilizations and one thousand years that forged the Western world"'),
 Document(id='f4a42336-f685-48e4-ab83-df00cd1a09e6', metadata={'sourc

In [17]:
books[books["isbn13"] == int(docs[0].page_content.split()[0].strip().replace('"', '').replace(',', ''))]

Unnamed: 0,isbn13,isbn10,title,authors,categories,thumbnail,description,published_year,average_rating,num_pages,ratings_count,full_title,tagged_description
5101,9781565841000,156584100X,Lies My Teacher Told Me,James W. Loewen,Education,http://books.google.com/books/content?id=dW6no...,Criticizes the way history is presented in cur...,1995.0,3.96,384.0,398.0,Lies My Teacher Told Me: Everything Your Ameri...,"9781565841000, Criticizes the way history is p..."
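
The lookup above takes the first whitespace-separated token of the chunk and strips the quote and comma around the ISBN. The same step can be sketched as a small helper that uses a regular expression and fails loudly when no ISBN is found. `extract_isbn13` is a hypothetical name, and the pattern assumes every chunk begins with an optional quote followed by exactly 13 digits.

```python
import re

# Optional leading whitespace and quote, then exactly 13 digits.
_ISBN13_AT_START = re.compile(r'^\s*"?(\d{13})\b')


def extract_isbn13(page_content: str) -> int:
    """Parse the leading ISBN-13 from a chunk such as '"9781565841000, Criticizes ...'."""
    match = _ISBN13_AT_START.match(page_content)
    if match is None:
        raise ValueError(f"no ISBN-13 at start of chunk: {page_content[:40]!r}")
    return int(match.group(1))


print(extract_isbn13('"9781565841000, Criticizes the way history is presented'))
# 9781565841000
```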


In [18]:
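def retrieve_recommendations(query: str, k: int = 5) -> pd.DataFrame:
    """Return the rows of `books` that match the top-k search hits for `query`."""
    recs = db_books.similarity_search(query, k=k)
    # Each chunk starts with the ISBN-13, e.g. '"9781565841000, Criticizes ...'.
    books_list = [
        int(rec.page_content.split()[0].strip().replace('"', '').replace(',', ''))
        for rec in recs
    ]
    # Note: isin() keeps the DataFrame's row order, not the similarity ranking.
    return books[books["isbn13"].isin(books_list)].head(k)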

In [19]:
retrieve_recommendations("A book to teach history to my 10 year old son")

Unnamed: 0,isbn13,isbn10,title,authors,categories,thumbnail,description,published_year,average_rating,num_pages,ratings_count,full_title,tagged_description
2676,9780465024964,0465024963,The Classical World,Robin Lane Fox,History,http://books.google.com/books/content?id=Srhcj...,"Armies and empires, statesmen and tyrants--the...",2005.0,3.92,672.0,1222.0,The Classical World: An Epic History from Home...,"9780465024964, Armies and empires, statesmen a..."
3651,9780743243780,0743243781,Teacher Man,Frank McCourt,Biography & Autobiography,http://books.google.com/books/content?id=Yhgcw...,The author describes his coming of age as a te...,2006.0,3.75,272.0,25726.0,Teacher Man: A Memoir,"9780743243780, The author describes his coming..."
3940,9780763621643,0763621641,Castle Diary,Richard Platt;Chris Riddell,Juvenile Fiction,http://books.google.com/books/content?id=rFUBA...,As a page in his uncle's castle in thirteenth-...,2003.0,3.84,128.0,1072.0,Castle Diary: The Journal of Tobias Burgess,"9780763621643, As a page in his uncle's castle..."
4845,9781400097036,1400097037,Arthur and George,Julian Barnes,Fiction,http://books.google.com/books/content?id=X5HRo...,Chronicles the lives of two boys--the son of a...,2007.0,3.7,445.0,11227.0,Arthur and George,"9781400097036, Chronicles the lives of two boy..."
5101,9781565841000,156584100X,Lies My Teacher Told Me,James W. Loewen,Education,http://books.google.com/books/content?id=dW6no...,Criticizes the way history is presented in cur...,1995.0,3.96,384.0,398.0,Lies My Teacher Told Me: Everything Your Ameri...,"9781565841000, Criticizes the way history is p..."
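
Because `isin` filters `books` in its existing row order, the table above is sorted by DataFrame index rather than by similarity rank: the top hit from the earlier search, *Lies My Teacher Told Me*, appears last. A hedged sketch of one way to keep the ranking follows. `order_by_rank` and the toy DataFrame are illustrative assumptions, not part of the notebook.

```python
import pandas as pd


def order_by_rank(frame: pd.DataFrame, ranked_isbns: list[int]) -> pd.DataFrame:
    """Return rows of `frame` whose isbn13 is in `ranked_isbns`, in that order."""
    rank = {isbn: position for position, isbn in enumerate(ranked_isbns)}
    matches = frame[frame["isbn13"].isin(ranked_isbns)].copy()
    # Temporary helper column used only for sorting.
    matches["rank"] = matches["isbn13"].map(rank)
    return matches.sort_values("rank").drop(columns="rank")


toy_books = pd.DataFrame({"isbn13": [111, 222, 333], "title": ["A", "B", "C"]})
print(order_by_rank(toy_books, [333, 111])["title"].tolist())  # ['C', 'A']
```

In the retrieval function, the list of parsed ISBNs is already in similarity order, so it could be passed as `ranked_isbns` in place of the final `isin(...).head(k)` step.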
