# LangChain and Cohere
Both can be used for free; you just need a `.env` file with your API key saved in it.

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma



In [2]:
from dotenv import load_dotenv

load_dotenv()

True
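The `True` above only tells us that `load_dotenv()` loaded something from the `.env` file, not that the key we need is actually there. A minimal sketch of an explicit check, assuming the key is stored under `OPENAI_API_KEY` (the variable `OpenAIEmbeddings` reads by default). `require_env` is a hypothetical helper written for this note, not part of python-dotenv:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail early with a readable message instead of a later authentication error
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would surface a missing key before any embedding request is made.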

In [None]:
import pandas as pd
import re

In [4]:
books = pd.read_csv("cleaned_books_dataset.csv")
books.head()

Unnamed: 0,isbn13,isbn10,title,authors,categories,thumbnail,description,published_year,average_rating,num_pages,ratings_count,title_and_subtitle,tag_description
0,9780002005883,2005883,Gilead,Marilynne Robinson,Fiction,http://books.google.com/books/content?id=KQZCP...,A NOVEL THAT READERS and critics have been eag...,2004.0,3.85,247.0,361.0,Gilead,9780002005883_A NOVEL THAT READERS and critics...
1,9780002261982,2261987,Spider's Web,Charles Osborne;Agatha Christie,Detective and mystery stories,http://books.google.com/books/content?id=gA5GP...,A new 'Christie for Christmas' -- a full-lengt...,2000.0,3.83,241.0,5164.0,Spider's Web_A Novel,9780002261982_A new 'Christie for Christmas' -...
2,9780006178736,6178731,Rage of angels,Sidney Sheldon,Fiction,http://books.google.com/books/content?id=FKo2T...,"A memorable, mesmerizing heroine Jennifer -- b...",1993.0,3.93,512.0,29532.0,Rage of angels,"9780006178736_A memorable, mesmerizing heroine..."
3,9780006280897,6280897,The Four Loves,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=XhQ5X...,Lewis' work on the nature of love divides love...,2002.0,4.15,170.0,33684.0,The Four Loves,9780006280897_Lewis' work on the nature of lov...
4,9780006280934,6280935,The Problem of Pain,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=Kk-uV...,"""In The Problem of Pain, C.S. Lewis, one of th...",2002.0,4.09,176.0,37569.0,The Problem of Pain,"9780006280934_""In The Problem of Pain, C.S. Le..."


In [5]:
# The TextLoader used below reads plain text files, not CSV files, so we need to convert first.
# Since we are going to vectorize the tagged descriptions, we write that column to a text file, one book per line.

books['tag_description'].to_csv("tag_description.txt", sep="\n", index=False, header=False)

In [None]:
raw_documents = TextLoader("tag_description.txt").load()
# For the splitter, we set chunk_size to a very small value (1) so that it splits at every separator,
# i.e. at each newline, where the next book starts. We don't want several books merged into one chunk.
# For the same reason, chunk_overlap is 0.
text_splitter = CharacterTextSplitter(chunk_size=1, chunk_overlap=0, separator="\n")
documents = text_splitter.split_documents(raw_documents)
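With `chunk_size=1` no two lines are ever merged, so the intended result is one chunk per book. The sketch below is a plain-Python stand-in for that result (it is not LangChain's implementation), plus a hypothetical sanity check that every chunk starts with a 13-digit ISBN followed by an underscore. The two sample lines are shortened versions of the rows shown earlier:

```python
def split_one_per_line(text: str) -> list[str]:
    """Plain-Python stand-in for the intended result: one chunk per non-empty line."""
    return [line.strip() for line in text.split("\n") if line.strip()]


def looks_like_tagged_description(chunk: str) -> bool:
    """Check that a chunk starts with a 13-digit ISBN and an underscore."""
    head, sep, _ = chunk.strip('"').partition("_")
    return sep == "_" and head.isdigit() and len(head) == 13


sample = (
    "9780002005883_A NOVEL THAT READERS and critics have been eagerly anticipating\n"
    "9780002261982_A new 'Christie for Christmas'\n"
)
chunks = split_one_per_line(sample)
all_valid = all(looks_like_tagged_description(c) for c in chunks)
```

Running the same kind of check on `documents` (for example comparing `len(documents)` with `len(books)`) is a cheap way to notice descriptions that contain embedded newlines and were split into several chunks.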

In [7]:
# Check the first chunk
documents[0]

Document(metadata={'source': 'tag_description.txt'}, page_content='9780002005883_A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives. John Ames is a preacher, the son of a preacher and the grandson (both maternal and paternal) of preachers. It’s 1956 in Gilead, Iowa, towards the end of the Reverend Ames’s life, and he is absorbed in recording his family’s story, a legacy for the young son he will never see grow up. Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist. He is troubled, too, by his prodigal namesake, Jack (John Ames) Boughton, his best friend’s lost son who returns to Gilead searching for forgiveness and redemption. Told in John Ames’s joyous, rambling voice that finds beauty, humour and truth in the smallest of life’s details, Gilea

In [None]:
# Here we vectorize the document chunks using the OpenAI embeddings.
# This call is billed, so make sure to run it only once (uncomment the lines below to run it).
# db_books = Chroma.from_documents(
#     documents,
#     embedding=OpenAIEmbeddings()
# )

In [None]:
# Now let's give a query and find the best matches to it using the vector embeddings
query = "A book to teach children about nature"
docs = db_books.similarity_search(query, k=10)
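Conceptually, a similarity search embeds the query and ranks the stored vectors by how close they are to it. The toy sketch below uses cosine similarity on hand-made 3-dimensional vectors. It is an illustration only: real embeddings have far more dimensions, and the vector store's actual distance metric and index may differ. All names and numbers here are made up for the example:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(store, key=lambda doc_id: cosine(query_vec, store[doc_id]), reverse=True)
    return ranked[:k]


# Hypothetical toy "embeddings"
toy_store = {
    "nature_book": [0.9, 0.1, 0.0],
    "math_book": [0.1, 0.9, 0.0],
    "novel": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]
best = top_k(toy_query, toy_store, k=2)
```

Here `best` lists `nature_book` first, because its vector points in nearly the same direction as the query.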

In [53]:
docs

[Document(id='1c18379e-94e8-45af-afd0-36c91ac92969', metadata={'source': 'tag_description.txt'}, page_content='9780786808069_Children will discover the exciting world of their own backyard in this introduction to familiar animals from cats and dogs to bugs and frogs. The combination of photographs, illustrations, and fun facts make this an accessible and delightful learning experience.'),
 Document(id='1314dd89-3f50-4037-b1fd-78780863fd0a', metadata={'source': 'tag_description.txt'}, page_content="9780786808380_Introduce your babies to birds, cats, dogs, and babies through fine art, illustration, and photographs. These books are a rare opportunity to expose little ones to a range of images on a single subject, from simple child's drawings and abstract art to playful photos. A brief text accompanies each image, introducing the baby to some basic -- and sometimes playful -- information about the subjects."),
 Document(id='29b9f75e-6c37-408c-8e6d-2a74029f0430', metadata={'source': 'tag_de

In [117]:
# Return the books whose descriptions best match a query
def recommended_books(query, top_k):

    docs = db_books.similarity_search(query, k=top_k)

    recommended_books_isbn13 = []
    for d in docs:
        # Some lines are wrapped in quotation marks, so strip those first,
        # then take the part before the first underscore, which is the ISBN-13
        isbn13 = int(d.page_content.strip('"').split("_")[0])
        recommended_books_isbn13.append(isbn13)

    selected_cols = ["title", "title_and_subtitle", "categories", "description", "average_rating"]
    # Note: isin() keeps the DataFrame's row order, not the similarity ranking
    return books[books['isbn13'].isin(recommended_books_isbn13)][selected_cols]
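Because the function filters with `isin`, the rows come back in DataFrame order rather than in order of similarity. The sketch below shows the ISBN parsing step with a validity check, and one way to keep the ranking, in plain Python with a list of dicts standing in for the DataFrame. Both helper names are hypothetical:

```python
def isbn_from_chunk(page_content: str) -> int:
    """Extract the leading ISBN-13 from 'ISBN_description', tolerating a leading quote."""
    head = page_content.strip().strip('"').split("_", 1)[0]
    if not (head.isdigit() and len(head) == 13):
        raise ValueError(f"no ISBN-13 at the start of: {page_content[:30]!r}")
    return int(head)


def order_by_rank(rows: list[dict], ranked_isbns: list[int]) -> list[dict]:
    """Return the rows in the order given by ranked_isbns, skipping unknown ISBNs."""
    by_isbn = {row["isbn13"]: row for row in rows}
    return [by_isbn[isbn] for isbn in ranked_isbns if isbn in by_isbn]


rows = [
    {"isbn13": 9780002005883, "title": "Gilead"},
    {"isbn13": 9780006178736, "title": "Rage of angels"},
]
ranked = [isbn_from_chunk('"9780006178736_A memorable, mesmerizing heroine'),
          isbn_from_chunk("9780002005883_A NOVEL THAT READERS and critics")]
ordered_titles = [row["title"] for row in order_by_rank(rows, ranked)]
```

In this example `ordered_titles` follows the ranking (`Rage of angels` first), whereas an `isin` filter would return the rows in their original order.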
    

In [118]:
pd.set_option("display.max_colwidth", None)

In [119]:
recommended_books("A book to teach children about nature", 5)

Unnamed: 0,title,title_and_subtitle,categories,description,average_rating
3214,"Moo, Baa, la la La!","Moo, Baa, la la La!",Animal sounds,"Children will love joining in and imitating the animal noises and sounds in this big, bold board book format, illustrated with Sandra Boynton's seriously silly signature animals.",4.2
3747,Baby Einstein: Neighborhood Animals,Baby Einstein: Neighborhood Animals,Juvenile Fiction,"Children will discover the exciting world of their own backyard in this introduction to familiar animals from cats and dogs to bugs and frogs. The combination of photographs, illustrations, and fun facts make this an accessible and delightful learning experience.",3.89
3748,Baby Einstein: Birds,Baby Einstein: Birds,Juvenile Fiction,"Introducing your baby to birds, cats, dogs, and babies through fine art, illsutration and photographs. These books are a rare opportunity to expose little ones to a range of images on a single subject, from simple child's drawings and abstract art to playful photos. A brief text accompanies each image, introducing baby to some basic -- and sometimes playful -- information on the subjects.",3.78
3749,Baby Einstein: Babies,Baby Einstein: Babies,Juvenile Fiction,"Introduce your babies to birds, cats, dogs, and babies through fine art, illustration, and photographs. These books are a rare opportunity to expose little ones to a range of images on a single subject, from simple child's drawings and abstract art to playful photos. A brief text accompanies each image, introducing the baby to some basic -- and sometimes playful -- information about the subjects.",4.03
3750,Baby Einstein: Dogs,Baby Einstein: Dogs,Juvenile Fiction,"Introduce your baby to birds, cats, dogs, and babies through fine art, illustration, and photographs. These books are a rare opportunity to exopse little ones to a range of images on a single subject, from simple child's drawings and abstract art to playful photos. A brief text accompanies each image, introducing baby to some basic -- and sometimes playful -- information about the subjects.",3.81


In [120]:
recommended_books("I just read harry potter and loved it", 5)

Unnamed: 0,title,title_and_subtitle,categories,description,average_rating
779,The Blue Sword,The Blue Sword,,"Harry, bored with her sheltered life in the remote orange-growing colony of Daria, discovers magic in herself when she is kidnapped by a native king with mysterious powers. Newbery Medal Honor Book, 1983.",4.23
3464,Harry Potter and the Goblet of Fire,Harry Potter and the Goblet of Fire,Juvenile Fiction,"The summer holidays are dragging on and Harry Potter can't wait for the start of the school year. It is his fourth year at Hogwarts School of Witchcraft and Wizardry and there are spells to be learnt and (unluckily) Potions and Divination lessons to be attended. But Harry can't know that the atmosphere is darkening around him, and his worst enemy is preparing a fate that it seems will be inescapable . . . With characteristic wit, fast-paced humour and marvellous emotional depth, J.K. Rowling has proved herself yet again to be a master story-teller.",4.55
3476,Harry Potter and the Prisoner of Azkaban,Harry Potter and the Prisoner of Azkaban,England,"Harry Potter is lucky to reach the age of thirteen, since he has survived the murderous attacks of the feared Dark Wizard Voldemort three times. But his hopes for a quiet term concentrating on Quidditch are dashed when a maniacal mass-murderer escapes from Azkaban, pursued by the soul-sucking Dementors who guard the prison. It's assumed that Hogwarts is the safest place for Harry to be. But is it a coincidence that he can feel eyes watching him in the dark, and should he be taking Professor Trelawney's ghoulish predictions seriously?",4.55
3982,Harry Potter and Philosophy,Harry Potter and Philosophy_If Aristotle Ran Hogwarts,Fiction,"Urging readers of the Harry Potter series to dig deeper than wizards, boggarts, and dementors, the authors of this unique guide collect the musings of seventeen philosophers on the series, who cover a wide range of Potter-related philosophical issues, including the difference between good and evil, the ethics of sorcery, and Aristotle's own school for wizards. Original.",4.48
4290,Ultimate Unofficial Guide to the Mysteries of Harry Potter,Ultimate Unofficial Guide to the Mysteries of Harry Potter,Literary Criticism,"A guide to J.K. Rowling's first four Harry Potter novels analyzes mysterious elements, themes, and puzzles hidden throughout the works and speculates about the plots and endings of future volumes.",4.05


In [121]:
recommended_books("I wanna teach math to my son who is 4 years old", 5)

Unnamed: 0,title,title_and_subtitle,categories,description,average_rating
1850,An Invisible Sign of My Own,An Invisible Sign of My Own,Fiction,"Mona Gray, a young second grade teacher who orders her universe with tidy numbers, finds chaos and romance with an odd new math teacher. A first novel. Reprint. 30,000 first printing.",3.68
2126,SCHOLASTIC SUCCESS WITH 4TH GRADE(WORKBOOK),SCHOLASTIC SUCCESS WITH 4TH GRADE(WORKBOOK),Education,"416 bright, colorful pages that give kids practice in the skills every 4th grader needs to be successful. Includes addition and subtraction, multiplication and division, fractions and decimals, problem solving, number concepts, reading comprehension, writing, grammar, maps, and lots more. For use with Grade 4.",4.57
3000,Teach Your Child to Read in 100 Easy Lessons,Teach Your Child to Read in 100 Easy Lessons,Education,"With more than half a million copies in print, Teach Your Child to Read in 100 Easy Lessons is the definitive guide to giving your child the reading skills needed now for a better chance at tomorrow, while bringing you and your child closer together. Is your child halfway through first grade and still unable to read? Is your preschooler bored with coloring and ready for reading? Do you want to help your child read, but are afraid you’ll do something wrong? Teach Your Child to Read in 100 Easy Lessons is a complete, step-by-step program that shows patents simply and clearly how to teach their children to read. Twenty minutes a day is all you need, and within 100 teaching days your child will be reading on a solid second-grade reading level. It’s a sensible, easy-to-follow, and enjoyable way to help your child gain the essential skills of reading. Everything you need is here—no paste, no scissors, no flash cards, no complicated directions—just you and your child learning together. One hundred lessons, fully illustrated and color-coded for clarity, give your child the basic and more advanced skills needed to become a good reader.",4.15
3199,My Name Is Maria Isabel,My Name Is Maria Isabel,Juvenile Fiction,"Third grader Marâia Isabel, born in Puerto Rico and now living in the United States, wants badly to fit in at school; and the teacher's writing assignment ""My Greatest Wish"" gives her that opportunity.",3.88
4519,The Curious Incident of the Dog in the Night-time,The Curious Incident of the Dog in the Night-time_A Novel,Autism,"Despite his overwhelming fear of interacting with people, Christopher, a mathematically-gifted, autistic fifteen-year-old boy, decides to investigate the murder of a neighbor's dog and uncovers secret information about his mother.",3.87


## How can we make this better?

Remember the `categories` column, which was a mess: many categories contained only a single book.
We can use a ***classification*** method to clean up that column of our books dataset. We could use an ML classifier for this, but why not use an LLM? LLMs are very good at text classification, and we can use just a ***prompt*** to do zero-shot classification.

We will see that in the next notebook.
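As a rough preview of the idea, zero-shot classification with a prompt means listing the allowed labels in the instruction and asking the model to pick one, without giving it any labeled examples. The sketch below only builds the prompt and maps a free-text answer back to a label; it does not call any model. The label set and both helper names are hypothetical and may differ from what the next notebook uses:

```python
def build_zero_shot_prompt(description: str, labels: list[str]) -> str:
    """Build a prompt asking a model to choose exactly one label for a description."""
    label_list = ", ".join(labels)
    return (
        f"Classify the following book description into exactly one of these categories: {label_list}.\n"
        f"Description: {description}\n"
        "Answer with the category name only."
    )


def parse_label(answer: str, labels: list[str], default: str = "Unknown") -> str:
    """Map the model's free-text answer back onto the allowed labels."""
    cleaned = answer.strip().strip(".").lower()
    for label in labels:
        if label.lower() == cleaned:
            return label
    return default


# Hypothetical label set, for illustration only
labels = ["Fiction", "Nonfiction", "Children's"]
prompt = build_zero_shot_prompt("A guide to teaching your child to read.", labels)
```

Keeping the label list in one place and validating the answer against it guards against the model returning a category that is not in the list.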