# Demo 1 - RAG Pipeline with Chained Prompt Processing
By: [Lior Gazit](https://github.com/LiorGazit).  
Repo: [Agents-Over-The-Weekend](https://github.com/PacktPublishing/Agents-Over-The-Weekend/tree/main/Lior_Gazit/workshop_september_2025/)  
Running LLMs locally for free: this code uses [`LLMPop`](https://pypi.org/project/llmpop/), a package for spinning up local or remote LLMs through a unified, modular syntax.  

<a target="_blank" href="https://colab.research.google.com/github/PacktPublishing/Agents-Over-The-Weekend/blob/main/Lior_Gazit/workshop_september_2025/codes_for_Lior_Bootcamp_talk_sept2025_demo1.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a> (choose a GPU runtime in Colab for the fastest execution)  

```
Disclaimer: The content and ideas presented in this notebook are solely those of the author, Lior Gazit, and do not represent the views or intellectual property of the author's employer.
```

Installing:

In [1]:
%pip -q install llmpop
%pip -q install sentence-transformers faiss-cpu langchain langchain_core # tiktoken langsmith langchain_openai

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m31.4/31.4 MB[0m [31m34.9 MB/s[0m eta [36m0:00:00[0m
[?25h

**Imports:**

In [2]:
import os
import requests

Setting up the vector store and the retriever for the RAG pipeline:

In [3]:
# Import required libraries
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Example documents (could be clinical notes, financial filings, etc.)
documents = [
    "Patient has diabetes type 2 and shows high glucose levels.",
    "Recent financial filings show revenue growth despite supply chain issues.",
    "Patient diagnosed with hypertension, recommended lifestyle changes.",
    "The company's earnings call mentioned concerns over increased production costs.",
]

# Step 1: Create embeddings using SentenceTransformer
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')  # lightweight embedding model

# Generate embeddings for the documents
document_embeddings = embedding_model.encode(documents)

# Step 2: Setup FAISS vector store
dimension = document_embeddings.shape[1]
faiss_index = faiss.IndexFlatL2(dimension)
faiss_index.add(document_embeddings)

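# Step 3: Define the retriever function
def retriever(query, top_k=2):
    """Return the top_k documents closest to the query in embedding space."""
    # Generate embedding for the query (shape: 1 x dimension)
    query_embedding = embedding_model.encode([query])

    # Search the FAISS index; IndexFlatL2 returns squared L2 distances
    # (smaller means more similar) and the matching row indices
    distances, indices = faiss_index.search(query_embedding, top_k)

    # Map indices back to documents; FAISS pads with -1 when fewer than
    # top_k results exist, so skip those entries
    retrieved_docs = [documents[idx] for idx in indices[0] if idx != -1]

    return retrieved_docs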

# Example usage of the retriever
query = "What did the company say about production costs?"
context_docs = retriever(query)
print("\n\nRetrieved documents for context:\n", context_docs)

Retrieved documents for context:
 ["The company's earnings call mentioned concerns over increased production costs.", 'Recent financial filings show revenue growth despite supply chain issues.']
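To make the retrieval step concrete, here is a minimal pure-Python sketch of what an exhaustive (flat) L2 search does: score every stored vector against the query and keep the `top_k` smallest distances. The toy 3-dimensional vectors and the helper name `flat_l2_search` are illustrative assumptions, not part of the notebook; real embeddings have many more dimensions, and FAISS performs the same comparison far more efficiently.

```python
from typing import List, Sequence, Tuple


def flat_l2_search(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    top_k: int = 2,
) -> Tuple[List[float], List[int]]:
    """Brute-force nearest-neighbour search using squared L2 distance."""
    scored = []
    for idx, vec in enumerate(vectors):
        # Squared Euclidean distance between the query and this vector
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, idx))
    # Smaller distance means more similar; keep the top_k closest
    scored.sort()
    best = scored[:top_k]
    return [d for d, _ in best], [i for _, i in best]


# Toy "embeddings" (hypothetical values, for illustration only)
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [1.0, 0.0, 0.0]

distances, indices = flat_l2_search(toy_query, toy_vectors, top_k=2)
print(indices)  # prints [0, 2]
```

As in the `retriever` function, the returned indices are positions in the stored list, so they can be mapped straight back to the source documents.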


Prompting using a locally hosted LLM via Ollama:

In [4]:
from llmpop import init_llm
from langchain_core.prompts import ChatPromptTemplate

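query = "What is the patient's diagnosis given these notes?"
context_docs = retriever(query)
# Build the RAG prompt: retrieved context first, then the user question.
# Note: context_docs is a list, so it is inserted as a Python list literal.
question = f"Using the following context, answer the question:\n\n{context_docs}\n\nQ: {query}\n\n---\nA:"
# Initialize a local model served by Ollama
local_llm = init_llm(model="llama3.2:1b", provider="ollama")

# Chain the prompt template into the model and print the reply.
# Note: the template wraps `question` in a second "Q: ... A:" frame.
prompt = ChatPromptTemplate.from_template("Q: {question}\nA:")
print((prompt | local_llm).invoke({"question": question}).content)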

🚀 Installing Ollama...
🚀 Starting Ollama server...
→ Ollama PID: 2728
⏳ Waiting for Ollama to be ready…
Ready!

🚀 Pulling model 'llama3.2:1b'…
All done setting up Ollama (ChatOllama).

Given the context provided, it appears that the patient has both diabetes type 2 and hypertension (high blood pressure). The first note mentions the high glucose levels, which are a key indicator of diabetes type 2. However, the second note specifically mentions hypertension as part of the recommended lifestyle changes for the patient. This suggests that while the patient may have diabetes type 2, they also have hypertension, possibly indicating that their overall health management plan will focus on both conditions.
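The prompt in the cell above interpolates `context_docs` directly, so the model sees a Python list literal. One possible alternative, sketched here, is to render each retrieved document as a numbered line before adding the question. The helper name `build_rag_prompt` and the exact template wording are assumptions for illustration; the notebook's own prompt is the f-string in the cell above.

```python
from typing import List


def build_rag_prompt(query: str, docs: List[str]) -> str:
    """Format retrieved documents and a question into one prompt string."""
    # Number each document so the model can tell the passages apart
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))
    return (
        "Using the following context, answer the question:\n\n"
        f"{context}\n\n"
        f"Q: {query}\nA:"
    )


# Example documents copied from the notebook's toy corpus
example_docs = [
    "Patient has diabetes type 2 and shows high glucose levels.",
    "Patient diagnosed with hypertension, recommended lifestyle changes.",
]
example_prompt = build_rag_prompt(
    "What is the patient's diagnosis given these notes?", example_docs
)
print(example_prompt)
```

A string built this way could be passed to either model in place of `question`; whether it changes the answers would need to be checked by running both variants.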


Prompting via OpenAI's API (the paid route):

In [5]:
# In Colab, use getpass to securely prompt for your API key
from getpass import getpass
import openai

openai.api_key = getpass("Paste your OpenAI API key: ")

# Send the same RAG prompt (`question`) built in the previous cell
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role":"system","content":"You are a medical assistant."},
        {"role":"user",  "content": question}
    ]
)

answer_api = response.choices[0].message.content
print("\n\n")
print(answer_api)

Paste your OpenAI API key: ··········



The patient has been diagnosed with diabetes type 2 and hypertension.
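Outside an interactive Colab session, pasting the key on every run may be inconvenient. A minimal sketch of one option, assuming the key may already be stored in an environment variable, is to read it from there and fall back to `getpass` only when it is missing. The helper name `resolve_api_key` and its `prompt_fn` parameter are hypothetical and not part of the notebook.

```python
import os
from getpass import getpass
from typing import Callable


def resolve_api_key(
    env_var: str = "OPENAI_API_KEY",
    prompt_fn: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, or prompt for it if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to an interactive prompt (getpass does not echo input)
        key = prompt_fn(f"Paste your {env_var}: ").strip()
    if not key:
        raise ValueError(f"No API key provided for {env_var}")
    return key
```

With this helper, the assignment in the cell above could become `openai.api_key = resolve_api_key()`.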
