# Before running:
* Make sure Ollama is installed and is running at least one model. This example uses the Llama 3.2 model.
* Ollama can be downloaded from https://ollama.com/ and needs little setup to run.
* Download a sentence-transformers model for embedding generation.
* The notebook assumes that the sentence-transformers model is saved in the ./pretrained/ directory.
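
A quick check of the last prerequisite can be sketched as follows. This is a minimal sketch, assuming that `save_pretrained()` leaves at least `config.json` and `tokenizer_config.json` in the directory; the helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# Assumption: these two files are written when the model and tokenizer are saved.
EXPECTED_FILES = ("config.json", "tokenizer_config.json")

def missing_model_files(model_dir="./pretrained/"):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

missing = missing_model_files()
if missing:
    print("Missing from ./pretrained/:", ", ".join(missing))
else:
    print("Embedding model files found.")
```

If anything is reported missing, run the download cell once before continuing.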

In [11]:
# Download the sentence-transformers embedding model.
# Run these lines once; there is no need to rerun them in later sessions.

# import os
# from transformers import AutoModel, AutoTokenizer

# # Specify the model name
# model_name = "sentence-transformers/all-MiniLM-L6-v2"

# # Download the model and tokenizer
# model = AutoModel.from_pretrained(model_name)
# tokenizer = AutoTokenizer.from_pretrained(model_name)

# # Save the model and tokenizer to a local directory
# local_model_path = "./pretrained"
# os.makedirs(local_model_path, exist_ok=True)  # does not fail if the directory already exists
# model.save_pretrained(local_model_path)
# tokenizer.save_pretrained(local_model_path)

# print(f"Model downloaded and saved to {local_model_path}")


In [2]:
# To use PDF sources instead, swap in a PDF document loader (for example, PyPDFLoader).
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# List of URLs to load documents from
urls = [
    "https://www.webmd.com/diabetes/diabetes-basics",
    "https://www.webmd.com/diabetes/diabetes-hyperglycemia",
    "https://www.webmd.com/diabetes/diabetes-hypoglycemia"
]
# Load documents from the URLs
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

USER_AGENT environment variable not set, consider setting it to identify your requests.
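
The warning above can probably be silenced by setting the variable before the loader is imported, since the check appears to run when the loader module is first loaded. A minimal sketch; the identifier string is a hypothetical placeholder.

```python
import os

# Hypothetical identifier; replace it with a string that describes your project.
os.environ.setdefault("USER_AGENT", "rag-notebook-example/0.1")

# Import the loader only after the variable is set, e.g.:
# from langchain_community.document_loaders import WebBaseLoader
print(os.environ["USER_AGENT"])
```

`setdefault` leaves an existing value untouched, so a user agent configured outside the notebook still wins.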


In [3]:
# Initialize a text splitter. With from_tiktoken_encoder, chunk_size and chunk_overlap are counted in tokens; adjust both as needed.
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
# Split the documents into chunks
doc_splits = text_splitter.split_documents(docs_list)
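
To see what `chunk_size` and `chunk_overlap` control, here is a simplified sliding-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one splits on separators and merges pieces up to the limit), and whitespace-separated words stand in for tokenizer tokens; `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap=0):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks

# Whitespace-separated words stand in for tokenizer tokens here.
words = "insulin helps move glucose from the blood into cells".split()
no_overlap = chunk_tokens(words, chunk_size=4)
with_overlap = chunk_tokens(words, chunk_size=4, chunk_overlap=2)
print(len(no_overlap), len(with_overlap))
```

With no overlap, each token lands in exactly one chunk. A nonzero overlap repeats the tail of one chunk at the head of the next, which can help when an answer straddles a boundary, at the cost of more chunks to embed.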

In [4]:
doc_splits[5]

Document(metadata={'source': 'https://www.webmd.com/diabetes/diabetes-basics', 'title': 'Diabetes: An Overview', 'description': "Diabetes is a lifelong disease. There's no cure but you can manage and control it. Let's understand the Symptoms, Types and Treatment options from the experts at WebMD.", 'language': 'en'}, page_content='Insulin and DiabetesTo understand why insulin is important in diabetes, it helps to know more about how your body uses food for energy. Your body is made up of millions of cells. To make energy, these cells need food in a very simple form. When you eat or drink, much of your food is broken down into a simple sugar called "glucose." Then, glucose is transported through your bloodstream to the cells of your body where it can be used to provide some of the energy your body needs for daily activities. The amount of glucose in your bloodstream is regulated by the hormone insulin. Insulin is always being released in small amounts by your pancreas. When the amount o

In [5]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

# Initialize HuggingFace embeddings from the local copy (works offline).
# The model saved earlier is 'sentence-transformers/all-MiniLM-L6-v2'.
embedding_model = HuggingFaceEmbeddings(model_name="./pretrained/")  # model_name is now the local directory, not the Hub name

# Create a FAISS vector store from the document chunks and their embeddings
vectorstore = FAISS.from_documents(doc_splits, embedding_model)

# Initialize a retriever that returns the 5 most similar chunks
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})

No sentence-transformers model found with name ./pretrained/. Creating a new one with mean pooling.
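
Conceptually, the retriever embeds the question and returns the `k` stored chunks whose vectors lie closest to it. The sketch below illustrates that idea in plain Python with Euclidean distance, which, as far as I know, is the default metric of the LangChain FAISS wrapper; it is not how FAISS is implemented, and `top_k_nearest` is a hypothetical name.

```python
import math

def top_k_nearest(query, vectors, k=5):
    """Return indices of the k vectors closest to query by Euclidean distance."""
    distances = [(math.dist(query, vec), idx) for idx, vec in enumerate(vectors)]
    distances.sort()
    return [idx for _, idx in distances[:k]]

# Toy 2-D "embeddings"; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
toy_vectors = [(0.0, 0.0), (1.0, 1.0), (0.9, 1.1), (5.0, 5.0)]
nearest = top_k_nearest((1.0, 1.0), toy_vectors, k=2)
print(nearest)
```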


In [6]:
# !pip install langchain_ollama  # library needed to communicate with Ollama
from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Define the prompt template for the LLM
prompt = PromptTemplate(
    template="""You are an assistant for question-answering tasks.
    Use the following documents to answer the question.
    If you don't know the answer, just say that you don't know.
    Use three sentences maximum and keep the answer concise:
    Question: {question}
    Documents: {documents}
    Answer:
    """,
    input_variables=["question", "documents"],
)
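
The `{question}` and `{documents}` placeholders are filled in when the chain is invoked. A minimal sketch of that substitution using plain `str.format`, which mirrors the f-string-style formatting `PromptTemplate` uses by default; the shortened template and the sample inputs are placeholders.

```python
TEMPLATE = (
    "Question: {question}\n"
    "Documents: {documents}\n"
    "Answer:"
)

def fill_prompt(question, documents):
    """Substitute values into the template placeholders."""
    return TEMPLATE.format(question=question, documents=documents)

filled = fill_prompt("sample question", "chunk one\nchunk two")
print(filled)
```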

In [7]:
# Initialize the LLM with Llama 3.2 model
llm = ChatOllama(
    model="llama3.2",
    temperature=0,
)

In [8]:
# Create a chain combining the prompt template and LLM
rag_chain = prompt | llm | StrOutputParser()
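
The `|` operator composes the three steps so that each one's output feeds the next. The toy class below sketches how such piping can be built with `__or__`; it is an illustration only, not LangChain's implementation, and `Step` plus the `fake_*` objects are hypothetical stand-ins.

```python
class Step:
    """Wrap a function so steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose: run this step first, then feed its output to the next one.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Hypothetical stand-ins for the prompt, the model, and the output parser.
fake_prompt = Step(lambda inputs: f"Question: {inputs['question']}")
fake_llm = Step(lambda text: {"content": text.upper()})
fake_parser = Step(lambda message: message["content"])

chain = fake_prompt | fake_llm | fake_parser
result = chain.invoke({"question": "what is glucose?"})
print(result)
```

In the real chain, `StrOutputParser()` plays the role of `fake_parser`: it extracts the plain text from the chat model's message object.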

In [9]:
# Define the RAG application class
class RAGApplication:
    def __init__(self, retriever, rag_chain):
        self.retriever = retriever
        self.rag_chain = rag_chain
    def run(self, question):
        # Retrieve relevant documents
        documents = self.retriever.invoke(question)
        # Extract content from retrieved documents
        doc_texts = "\n".join([doc.page_content for doc in documents])
        # Get the answer from the language model
        answer = self.rag_chain.invoke({"question": question, "documents": doc_texts})
        return answer
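
The retrieve-then-generate flow can be exercised without Ollama or FAISS by passing in stubs. This is a sketch under the assumption that only the `.invoke()` methods and the `page_content` attribute matter to `run`; the class is restated so the block runs on its own, and all `Stub*` names are hypothetical.

```python
class RAGApplication:
    def __init__(self, retriever, rag_chain):
        self.retriever = retriever
        self.rag_chain = rag_chain

    def run(self, question):
        documents = self.retriever.invoke(question)
        # Join the retrieved chunks with newlines into one context string.
        doc_texts = "\n".join(doc.page_content for doc in documents)
        return self.rag_chain.invoke({"question": question, "documents": doc_texts})

# Hypothetical stubs that mimic only the interface used above.
class StubDoc:
    def __init__(self, page_content):
        self.page_content = page_content

class StubRetriever:
    def invoke(self, question):
        return [StubDoc("first chunk"), StubDoc("second chunk")]

class StubChain:
    def invoke(self, inputs):
        # Echo the inputs so the wiring can be inspected.
        return f"{inputs['question']} | {inputs['documents']}"

app = RAGApplication(StubRetriever(), StubChain())
output = app.run("test question")
print(repr(output))
```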

In [10]:
# Initialize the RAG application
rag_application = RAGApplication(retriever, rag_chain)
# Example usage
question = "explain difference between hyperglycemia and hypoglycemia?"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

Question: explain difference between hyperglycemia and hypoglycemia?
Answer: Here is the answer:

Hyperglycemia and hypoglycemia are two distinct conditions related to blood sugar levels. Hyperglycemia occurs when blood sugar levels are too high, typically above 180 mg/dL after eating, while hypoglycemia occurs when blood sugar levels are too low, below normal levels. The main difference between the two is that hyperglycemia can lead to serious health problems if left untreated, whereas hypoglycemia can be treated with food or medication.


### Note that inference time might be longer if you don't have an NVIDIA GPU installed.
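
To measure how long a query takes on your machine, a small timing wrapper is enough. A minimal sketch; `timed` is a hypothetical helper, and the `sum` call is only a stand-in for `rag_application.run(question)`.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; in the notebook this would be rag_application.run(question).
result, seconds = timed(sum, range(1000))
print(f"took {seconds:.4f} s")
```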