<a href="https://colab.research.google.com/github/anamika1302/Experimenting-with-LLM-s/blob/main/Langchain%2BRag%2BOpenAI%2Bchroma_db.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [37]:
# Retrieval Augmented Generation

In [38]:
!pip install langchain
!pip install unstructured

!pip install -U langchain-community
!pip install langchain-openai
!pip install chromadb



In [62]:

import os
import shutil
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate

In [54]:
data_path = "/content/data/books"

Chroma_path = "/content/chroma"

In [55]:
def load_documents(data_path):
  # Load every Markdown file directly under data_path as a Document
  loader = DirectoryLoader(data_path, glob = "*.md")
  documents = loader.load()
  return documents

In [61]:
PROMPT_TEMPLATE = """
You are a search engine. Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""

In [56]:
def split_text(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size = 1000,
        chunk_overlap = 20,
        length_function = len,
        add_start_index = True
    )

    chunks = text_splitter.split_documents(documents)
    print(f" split {len(documents)} documents into {len(chunks)} chunks.")
    document = chunks[0]
    print(document.page_content)
    print(document.metadata)
    return chunks
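
The effect of chunk_size and chunk_overlap can be seen on a small string without calling LangChain at all. The sketch below is a deliberately simplified fixed-width sliding window, not the recursive, separator-aware algorithm that RecursiveCharacterTextSplitter uses; the function name chunk_text and the dictionary layout are hypothetical, chosen only to mirror the page_content and start_index fields printed above.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=20):
    """Fixed-width sliding window (a simplification, not LangChain's algorithm)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append({"page_content": piece, "start_index": start})
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghij" * 5  # 50 characters
chunks = chunk_text(sample, chunk_size=20, chunk_overlap=5)
for chunk in chunks:
    print(chunk["start_index"], chunk["page_content"])
```

Each chunk starts chunk_size - chunk_overlap characters after the previous one, so the last 5 characters of one chunk are repeated at the start of the next. With an overlap of only 20 on 1000-character chunks, as in split_text, very little context is shared between neighbours.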

In [57]:
def save_to_chroma(chunks,Chroma_path):

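  # Start from a clean directory so stale vectors are not mixed in
  if os.path.exists(Chroma_path):
    shutil.rmtree(Chroma_path)

  # create new db from documents
  # OpenAIEmbeddings() reads the key from the OPENAI_API_KEY environment variable
  db = Chroma.from_documents(chunks, OpenAIEmbeddings(), persist_directory=Chroma_path)
  db.persist()
  print(f"Saved {len(chunks)} chunks to {Chroma_path}")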
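
An empty api_key string will not authenticate, so the key has to come from somewhere outside the notebook. One possible approach, sketched below, is to read it from an environment variable and fail early with a clear message. The helper name get_api_key and its environ parameter are hypothetical; OPENAI_API_KEY is the variable name the OpenAI client libraries conventionally look for.

```python
import os


def get_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the key stored in env_var, or raise if it is missing or blank."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before building or querying the database")
    return key


# Demonstration with a stand-in mapping so the example runs without a real key
demo_env = {"OPENAI_API_KEY": "sk-demo-not-a-real-key"}
print(get_api_key(environ=demo_env))
```

In the notebook itself, get_api_key() would be called with no arguments so that it reads the real process environment.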


In [58]:
def build_vector_db(data_path):
  documents = load_documents(data_path)
  chunks = split_text(documents)
  save_to_chroma(chunks,Chroma_path)


In [59]:
build_vector_db(data_path)

 split 1 documents into 196 chunks.
The Project Gutenberg eBook of Alice's Adventures in Wonderland

This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.

Title: Alice's Adventures in Wonderland

Author: Lewis Carroll

Release date: June 27, 2008 [eBook #11]
Most recently updated: March 30, 2021

Language: English

Credits: Arthur DiBianca and David Widger

START OF THE PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND 
[Illustration]

Alice’s Adventures in Wonderland

by Lewis Carroll

THE MILLENNIUM FULCRUM EDITION 3.0

Contents
{'source': '/content/data/books/alice_in_wonderland.md', 'start_in


In [68]:

def query_data(question,Chroma_path):
  # Reopen the persisted store with the same embedding model used to build it
  embedding_function = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment
  db = Chroma(persist_directory=Chroma_path, embedding_function=embedding_function)
  # Fetch the 3 most relevant chunks as (document, score) pairs
  matching_context = db.similarity_search_with_relevance_scores(question, k =3)
  # Give up when nothing is returned or the best match scores below 0.7
  if len(matching_context) == 0 or matching_context[0][1]<0.7:
    print("Unable to find matching content in vector database")
    return

  # Join the retrieved chunks and fill the prompt template
  context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in matching_context])
  prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
  prompt = prompt_template.format(context = context_text, question =question )
  print(prompt)

  # Ask the chat model, then report the answer with the source files it drew on
  model = ChatOpenAI()  # reads OPENAI_API_KEY from the environment
  response_text = model.invoke(prompt).content
  sources = [doc.metadata.get("source", None) for doc, _score in matching_context]
  formatted_response = f"Response: {response_text}\nSources: {sources}"
  print(formatted_response)
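
The retrieval step in query_data can be illustrated without a vector database or an API key. The sketch below ranks a few hand-written 2-D vectors by cosine similarity, applies the same 0.7 cut-off to the best match, and joins the surviving texts with the same separator. It is only an approximation under stated assumptions: the names cosine_similarity, retrieve, build_context and toy_store are hypothetical, the vectors are made up, and the relevance score Chroma actually reports depends on the distance metric the collection is configured with.

```python
import math

SEPARATOR = "\n\n---\n\n"  # same separator query_data uses between chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, k=3, threshold=0.7):
    """Return the k best (text, score) pairs, or [] if the best one is below threshold."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    best = scored[:k]
    if not best or best[0][1] < threshold:
        return []
    return best


def build_context(hits):
    """Join the retrieved texts into one context string for the prompt."""
    return SEPARATOR.join(text for text, _score in hits)


# Made-up embeddings standing in for real chunk vectors
toy_store = [
    ("rabbit hole", [1.0, 0.0]),
    ("tea party", [0.0, 1.0]),
    ("croquet", [0.6, 0.8]),
]

hits = retrieve([1.0, 0.0], toy_store, k=2)
context = build_context(hits)
print(context)

# A query pointing away from every stored vector is rejected by the threshold
misses = retrieve([-1.0, 0.0], toy_store, k=2)
print(misses)
```

As in query_data, only the top-ranked score is compared with the threshold, so lower-ranked chunks are passed to the prompt even when their own scores fall below 0.7.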




In [69]:
question = "What is the story of alice in wonderland? Please summarize it"
query_data(question,Chroma_path)

Human: 
You are a search engine. Answer the question based only on the following context:

Contents

CHAPTER I. Down the Rabbit-Hole
CHAPTER II. The Pool of Tears
CHAPTER III. A Caucus-Race and a Long Tale
CHAPTER IV. The Rabbit Sends in a Little Bill
CHAPTER V. Advice from a Caterpillar
CHAPTER VI. Pig and Pepper
CHAPTER VII. A Mad Tea-Party
CHAPTER VIII. The Queen’s Croquet-Ground
CHAPTER IX. The Mock Turtle’s Story
CHAPTER X. The Lobster Quadrille
CHAPTER XI. Who Stole the Tarts?
CHAPTER XII. Alice’s Evidence

CHAPTER I.
Down the Rabbit-Hole

Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into
the book her sister was reading, but it had no pictures or
conversations in it, “and what is the use of a book,” thought Alice
“without pictures or conversations?”

---

Contents

CHAPTER I. Down the Rabbit-Hole
CHAPTER II. The Pool of Tears
CHAPTER III. A Caucus-Race and a Long Tale
CHAPTER IV. The Rabbit S