# Prerequisites
* Store the following API keys in the `Secrets` panel of Google Colab:

  1. Hugging Face access token, named `HF_TOKEN`.

* Request access to `meta-llama/Meta-Llama-3-8B-Instruct` on Hugging Face.
* Create a folder named `FILES` and put your `txt` files in it.
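
One way to make the token available to the libraries used later is to read it from Colab's secrets and export it as an environment variable. This is a minimal sketch: `load_hf_token` is a hypothetical helper name, and the fallback to an already exported `HF_TOKEN` variable (for running outside Colab) is an assumption rather than part of the original notebook.

```python
import os


def load_hf_token(name: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from Colab secrets, else from the environment."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        token = userdata.get(name)
    except ImportError:
        # Assumption: outside Colab, the token was exported as an env variable.
        token = os.environ.get(name)
    if not token:
        raise RuntimeError(f"{name} is not set; add it to the Colab secrets first.")
    # huggingface_hub picks up the HF_TOKEN environment variable on download.
    os.environ[name] = token
    return token
```

Calling `load_hf_token()` once before loading the model should be enough, provided the secret has notebook access enabled.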

# Installing requirements

In [None]:
!pip install -qqq git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git@main
!pip install -qqq accelerate bitsandbytes
!pip install -qqq -U langchain-community
!pip install -qqq -U langchain_chroma
!pip install -qqq -U langchain-huggingface
!pip install -qqq -U gradio


# Accessing Files

In [None]:
import os
import glob
import gradio as gr
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [None]:
folder = "FILES/"
# Load every .txt file in the folder as a LangChain Document.
loader = DirectoryLoader(folder, glob="*.txt", loader_cls=TextLoader)
documents = loader.load()

In [None]:
documents

[Document(metadata={'source': 'FILES/text1.txt'}, page_content='The Medicines and Healthcare products Regulatory Agency (MHRA) has announced the selection of five healthcare technologies for its ‘AI Airlock’ scheme.\n\nAI Airlock aims to refine the process of regulating AI-driven medical devices and help fast-track their safe introduction to the UK’s National Health Service (NHS) and patients in need.\n\nThe technologies chosen for this scheme include solutions targeting cancer and chronic respiratory diseases, as well as advancements in radiology diagnostics. These AI systems promise to revolutionise the accuracy and efficiency of healthcare, potentially driving better diagnostic tools and patient care.\n\nThe AI Airlock, as described by the MHRA, is a “sandbox” environment—an experimental framework designed to help manufacturers determine how best to collect real-world evidence to support the regulatory approval of their devices.\n\nUnlike traditional medical devices, AI models conti

In [None]:
len(documents)

3

# Chunking
Once we've loaded the documents, we often want to transform them to better suit our application. The simplest example is splitting a long document into smaller chunks that fit into the model's context window. LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.

At a high level, text splitters work as follows:

1. Split the text up into small, semantically meaningful chunks (often sentences).
2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks).
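
The three steps above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: `split_with_overlap` is a hypothetical helper, it splits naively on `". "`, and it measures size in characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Greedy sentence merging with overlap carried between chunks."""
    # Step 1: split into small units (naively, on ". ").
    sentences = [s.strip() for s in text.split(". ") if s.strip()]

    chunks: list[str] = []
    current: list[str] = []  # sentences in the chunk being built

    for sentence in sentences:
        candidate = ". ".join(current + [sentence])
        # Step 2: keep adding sentences while the chunk still fits.
        if current and len(candidate) > chunk_size:
            # Step 3: emit the chunk, then seed the next one with trailing
            # sentences whose combined length stays within chunk_overlap.
            chunks.append(". ".join(current))
            overlap: list[str] = []
            for prev in reversed(current):
                if len(". ".join([prev] + overlap)) > chunk_overlap:
                    break
                overlap.insert(0, prev)
            current = overlap
        current.append(sentence)

    if current:
        chunks.append(". ".join(current))
    return chunks


text = "Alpha one. Beta two. Gamma three. Delta four. Epsilon five"
for chunk in split_with_overlap(text, chunk_size=30, chunk_overlap=12):
    print(repr(chunk))
```

Each printed chunk starts with the last sentence of the previous one, which is the overlap that keeps context across chunk boundaries. The sketch does not re-check the size after seeding the overlap, so a very long sentence can still exceed `chunk_size`.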

We will use `RecursiveCharacterTextSplitter`, which tries a list of separators in order (paragraph breaks, then line breaks, then spaces) so that related text stays together in the same chunk where possible.



In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

In [None]:
len(chunks)

20

In [None]:
chunks[9]

Document(metadata={'source': 'FILES/text3.txt'}, page_content='Google CEO Sundar Pichai has announced the launch of Gemini 2.0, a model that represents the next step in Google’s ambition to revolutionise AI.\n\nA year after introducing the Gemini 1.0 model, this major upgrade incorporates enhanced multimodal capabilities, agentic functionality, and innovative user tools designed to push boundaries in AI-driven technology.\n\nLeap towards transformational AI  \nReflecting on Google’s 26-year mission to organise and make the world’s information accessible, Pichai remarked, “If Gemini 1.0 was about organising and understanding information, Gemini 2.0 is about making it much more useful.”\n\nGemini 1.0, released in December 2022, was notable for being Google’s first natively multimodal AI model. The first iteration excelled at understanding and processing text, video, images, audio, and code. Its enhanced 1.5 version became widely embraced by developers for its long-context understanding, 

In [None]:
for chunk in chunks:
  if 'Clarifai' in chunk.page_content:
    print(chunk)
    print("******************************")

page_content='Artificial intelligence platform provider Clarifai has unveiled a new compute orchestration capability that promises to help enterprises optimise their AI workloads in any computing environment, reduce costs and avoid vendor lock-in.

Announced on December 3, 2024, the public preview release lets organisations orchestrate AI workloads through a unified control plane, whether those workloads are running on cloud, on-premises, or in air-gapped infrastructure. The platform can work with any AI model and hardware accelerator including GPUs, CPUs, and TPUs.

“Clarifai has always been ahead of the curve, with over a decade of experience supporting large enterprise and mission-critical government needs with the full stack of AI tools to create custom AI workloads,” said Matt Zeiler, founder and CEO of Clarifai. “Now, we’re opening up capabilities we built internally to optimise our compute costs as we scale to serve millions of models simultaneously.”' metadata={'source': 'FILES

# Creating Embeddings and storing them to vector database

We will now create embeddings. A text embedding is a piece of text projected into a high-dimensional latent space. The position of the text in this space is a vector, a long sequence of numbers. Think of the two-dimensional Cartesian coordinates from algebra class, but with many more dimensions, often 768 or 1536 (the model used here produces 1,024).

Mathematically, an embedding space, or latent space, is defined as a manifold in which similar items are positioned closer to one another than less similar items. In this case, sentences that are semantically similar should have similar embedded vectors and thus be closer together in the space.
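
To make "closer together" concrete, the sketch below compares toy 3-dimensional vectors using cosine similarity. The vectors are made-up values for illustration, not real model outputs, and `cosine_similarity` is a hypothetical helper written for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings: the first two "sentences" point in similar directions.
cat_sentence = [0.9, 0.1, 0.0]
kitten_sentence = [0.8, 0.2, 0.1]
invoice_sentence = [0.0, 0.2, 0.9]

print(cosine_similarity(cat_sentence, kitten_sentence))   # close to 1
print(cosine_similarity(cat_sentence, invoice_sentence))  # close to 0
```

When vectors are normalised to unit length, which is what `normalize_embeddings=True` requests in the embedding cell, cosine similarity reduces to a plain dot product.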

**Vector Embedding model used:** `BAAI/bge-large-en` (Open Source)

**Vector Database used:** `Chroma`

In [None]:
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

In [None]:
db_name = "bge_db"

In [None]:
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en",encode_kwargs={'normalize_embeddings':True})

In [None]:
if os.path.exists(db_name):
  Chroma(persist_directory = db_name,embedding_function = embeddings).delete_collection()

In [None]:
vectorstore = Chroma.from_documents(documents=chunks,embedding = embeddings,persist_directory=db_name)
print(f"Chroma Vectorstore created with {vectorstore._collection.count()} chunks")

Chroma Vectorstore created with 20 chunks


In [None]:
vectorstore_collection = vectorstore._collection
embedding_sample = vectorstore_collection.get(limit=1,include=["embeddings"])["embeddings"][0]
len(embedding_sample) ## size of sample embedding

1024

# Creating the Conversational Retrieval Chain:

Steps:
  
  1. Set up the LLM (open source).
  2. Create a retriever from the vector database.
  3. Create a `Retriever Chain`: it will retrieve the relevant data from the vector store.
      
      * Create a prompt that contains the user input, the chat history, and a message to generate a search query.
      * We will use the `create_history_aware_retriever` chain to retrieve the relevant data from the vector store.
      * It takes `llm, retriever, prompt` as input.
  4. The next step is to send the retrieved documents from the vector store, along with a prompt, to the LLM to get the response to the user input.
      * We create a prompt containing the context (retrieved documents from the vector store), the chat history, and the user input.
      * Next, we create a `Document Chain` using `create_stuff_documents_chain`, which will send the prompt to the LLM.
      * Finally, we combine `retriever_chain` and `document_chain` using `create_retrieval_chain` to create a conversational retrieval chain.
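
The data flow in steps 3 and 4 can be sketched with plain Python functions. This is an illustration of the idea only, not LangChain's code: every function name here is hypothetical, and the "LLM" and "vector store" are simple stand-ins.

```python
def history_aware_retrieve(question, chat_history, rewrite_query, search):
    """Step 3: rephrase the question only when there is history to resolve."""
    query = rewrite_query(chat_history, question) if chat_history else question
    return search(query)


def answer_from_documents(question, chat_history, docs, generate):
    """Step 4: 'stuff' all retrieved texts into one prompt for the LLM."""
    context = "\n\n".join(docs)
    return generate(context=context, chat_history=chat_history, question=question)


def conversational_chain(question, chat_history, rewrite_query, search, generate):
    """Combine both steps and return the inputs plus context and answer."""
    docs = history_aware_retrieve(question, chat_history, rewrite_query, search)
    answer = answer_from_documents(question, chat_history, docs, generate)
    return {"input": question, "chat_history": chat_history,
            "context": docs, "answer": answer}


# Stand-ins for the LLM and vector store, for illustration only.
corpus = {"clarifai": "Clarifai is an AI platform provider.",
          "gemini": "Gemini 2.0 is a Google model."}


def fake_rewrite(history, question):
    # Pretend the LLM resolves "it" using the previous user turn.
    return f"{history[-2]} {question}"


def fake_search(query):
    return [text for key, text in corpus.items() if key in query.lower()]


def fake_generate(context, chat_history, question):
    return f"Based on: {context}"


history = ["What is Clarifai?", "An AI platform provider."]
result = conversational_chain("Who founded it?", history,
                              fake_rewrite, fake_search, fake_generate)
print(result["context"])
```

The follow-up question "Who founded it?" contains no searchable keyword on its own. Rewriting it with the history is what lets the retriever find the Clarifai text.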



# Setting up llm

In [None]:
from transformers import pipeline, AutoTokenizer
import transformers
from langchain_huggingface import HuggingFacePipeline
import torch

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'

bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_id)


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [None]:
query_pipeline = pipeline(
    "text-generation",
    model=model,
    temperature = 0.0000001,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)


Device set to use cuda:0


In [None]:
query_pipeline("What is ai")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


[{'generated_text': 'What is ai-powered chatbot?\nAn AI-powered chatbot is a computer program that uses artificial intelligence (AI) and machine learning (ML) to simulate human-like conversations with users. Chatbots are designed to interact with users through text or voice interactions, and they can be integrated into various applications, such as messaging platforms, websites, and mobile apps.\nAI-powered chatbots use natural language processing (NLP) and machine learning algorithms to understand and respond to user input. They can be trained to recognize and respond to specific keywords, phrases, and intent, allowing them to provide personalized and relevant responses to users.\nSome common features of AI-powered chatbots include:\n1. Natural Language Processing (NLP): AI-powered chatbots use NLP to understand and interpret user input, including text and voice commands.\n2. Machine Learning (ML): Chatbots use ML algorithms to learn from user interactions and improve their responses 

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

llm = HuggingFacePipeline(pipeline=query_pipeline)



# Prompt To Generate Search Query For Retriever


In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts import MessagesPlaceholder

system_instruction = """Given the conversation that follows, generate a search query to look up information relevant to the conversation"""

prompt_search_query = ChatPromptTemplate.from_messages([
    ("system", system_instruction),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])


# Retriever Chain


In [None]:
from langchain.chains import create_history_aware_retriever


In [None]:
retriever = vectorstore.as_retriever(search_type="similarity",search_kwargs={"k": 3})  # Limiting to top 3 relevant documents

In [None]:
retriever_chain = create_history_aware_retriever(llm, retriever, prompt_search_query)

# Prompt To Get Response From LLM Based on Chat History


In [None]:
system_prompt = """
Given these texts:
-----
{context}
-----
Please answer the following question:
{input}
Return ONLY the direct, concise answer to the question as "Answer". Do not include any context, labels, or explanation. The answer should be to the point and nothing else .
"""
prompt_get_answer = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}")
])


# Document Chain

In [None]:
from langchain_core.output_parsers import BaseOutputParser

class AnswerParser(BaseOutputParser):
    def parse(self, output: str) -> str:
        """
        This custom parser will ensure that only the concise answer is returned.
        It will strip out anything like context, explanation, or extra details.
        """
        start = "Answer:"
        if start in output:
            answer = output.split(start, 1)[1].strip()  # Extract the portion after "Answer:"
        else:
            answer = output.strip()
        return answer
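
To see what this parsing logic does with typical model output, here is the same logic as a standalone function that does not depend on LangChain. `extract_answer` is a hypothetical name used only for this check, and the sample strings are made up.

```python
def extract_answer(output: str, marker: str = "Answer:") -> str:
    """Mirror of AnswerParser.parse as a plain function, for quick checks."""
    if marker in output:
        # Keep only what follows the first occurrence of the marker.
        return output.split(marker, 1)[1].strip()
    return output.strip()


print(extract_answer("Human: What is Clarifai?\nAnswer: An AI platform provider."))
print(extract_answer("  No marker here.  "))
```

The first call prints only the text after `Answer:`; the second has no marker, so the whole string is returned with surrounding whitespace removed.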


In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain = create_stuff_documents_chain(
    llm=llm,
    prompt=prompt_get_answer,
    output_parser=AnswerParser(),
)

We now have a `retriever_chain` that retrieves the relevant data from the vector store, and a `document_chain` that sends the chat history, the retrieved data, and the user input to the LLM.

# Conversational Retrieval Chain

In the final step, we combine `retriever_chain` and `document_chain` using `create_retrieval_chain` to create a conversational retrieval chain.

In [None]:
from langchain.chains import create_retrieval_chain
retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)

In [None]:
chat_history = []
response = retrieval_chain.invoke({
"chat_history":chat_history,
"input":"What is clarifai"
})

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [None]:
answer = response['answer']
context = response['context']
print("Answer:", answer)
print("Context:", context)

Answer: Artificial intelligence platform provider.
Context: [Document(metadata={'source': 'FILES/text2.txt'}, page_content='The compute orchestration capabilities build on Clarifai’s existing AI platform that, the company says, has processed over 2 billion operations in computer vision, language, and audio AI. The company reports maintaining 99.99%+ uptime and 24/7 availability for critical applications.\n\nThe compute orchestration capability is currently available in public preview. Organisations interested in testing the platform should contact Clarifai for access.'), Document(metadata={'source': 'FILES/text2.txt'}, page_content='Artificial intelligence platform provider Clarifai has unveiled a new compute orchestration capability that promises to help enterprises optimise their AI workloads in any computing environment, reduce costs and avoid vendor lock-in.\n\nAnnounced on December 3, 2024, the public preview release lets organisations orchestrate AI workloads through a unified co

In [None]:
from langchain_core.messages import HumanMessage, AIMessage

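def chat(question, history):
    # Gradio passes its own `history`; the chain uses the module-level
    # `chat_history` list of LangChain messages instead.
    result = retrieval_chain.invoke({"input": question, "chat_history": chat_history})
    # Store the reply as an AIMessage so the roles in the history stay correct.
    chat_history.extend([HumanMessage(content=question), AIMessage(content=result["answer"])])
    return result["answer"]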


In [None]:
gradio_interface = gr.ChatInterface(chat).launch()



Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://3b25ba52000a57b0ed.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
