# Safely Store ENV variables for Gemini, Chroma DB connection, and vector embeddings
### Additional techniques: existing variables are cleared from memory to avoid accidental exposure. This matters because exposing PII and PHI in healthcare data is a significant security risk.
---
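
A minimal sketch of the clearing step described above, assuming the sensitive values live in `os.environ`. The names `SENSITIVE_KEYS`, `mask`, and `clear_sensitive` are hypothetical helpers for illustration, not part of any library, and the key list is an assumption about which variables count as sensitive.

```python
import os

# Assumed list of variables treated as sensitive; adjust for the real project.
SENSITIVE_KEYS = ("OPENAI_API_KEY", "GOOGLE_APPLICATION_CREDENTIALS")


def mask(value, visible=4):
    """Return a printable form that shows only the last `visible` characters."""
    if not value:
        return "<unset>"
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def clear_sensitive(env, keys=SENSITIVE_KEYS):
    """Remove the given keys from a mapping such as os.environ.

    Returns the names that were actually removed.
    """
    removed = []
    for key in keys:
        if env.pop(key, None) is not None:
            removed.append(key)
    return removed


# Demonstrated on a plain dict so the real environment is left untouched.
demo_env = {"OPENAI_API_KEY": "sk-demo-1234", "GOOGLE_CLOUD_LOCATION": "us-east4"}
print("OPENAI_API_KEY:", mask(demo_env["OPENAI_API_KEY"]))
print("cleared:", clear_sensitive(demo_env))
```

Calling `clear_sensitive(os.environ)` at the end of a session would apply the same idea to the live environment, and printing through `mask` avoids echoing full values into saved notebook output.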

In [None]:
# INSTALLS
#! pip install rank-bm25
#! pip install langchain-classic rank_bm25 langchain-community
#! pip install -U langchain-google-vertexai

In [18]:
# Imports: dotenv plus everything needed for the retriever tool
import os
from dotenv import load_dotenv
from langchain_chroma import Chroma 
from langchain.tools import tool
import langchainhub as hub
import chromadb
from langchain_google_genai import GoogleGenerativeAIEmbeddings #type:ignore
import vertexai
from vertexai.evaluation import EvalTask
from vertexai.language_models import TextEmbeddingModel
from langchain_core.runnables import Runnable

In [34]:

import os
from dotenv import load_dotenv
# Load environment variables from .env file first
load_dotenv()

# Set environment variables for this session. The API key is read from .env and never written in the notebook;
# the remaining values are configuration flags and paths, not secrets.
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY", "")
GOOGLE_GENAI_USE_VERTEXAI = os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "true"
GOOGLE_CLOUD_PROJECT = os.environ["GOOGLE_CLOUD_PROJECT"] = "gen-lang-client-0343643614"
GCLOUD_PROJECT = os.environ["GCLOUD_PROJECT"] = "gen-lang-client-0343643614"
GOOGLE_CLOUD_LOCATION = os.environ["GOOGLE_CLOUD_LOCATION"] = "us-east4"
GOOGLE_APPLICATION_CREDENTIALS = os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Users/briannamitchell/.config/gcp/vertex-sa.json"

In [35]:
import os
print("GOOGLE_CLOUD_PROJECT:", os.getenv("GOOGLE_CLOUD_PROJECT"))
print("GCLOUD_PROJECT:", os.getenv("GCLOUD_PROJECT"))
print("GOOGLE_APPLICATION_CREDENTIALS:", os.getenv("GOOGLE_APPLICATION_CREDENTIALS"))
print("GOOGLE_CLOUD_LOCATION:", os.getenv("GOOGLE_CLOUD_LOCATION"))
print("GOOGLE_GENAI_USE_VERTEXAI:", os.getenv("GOOGLE_GENAI_USE_VERTEXAI"))

GOOGLE_CLOUD_PROJECT: gen-lang-client-0343643614
GCLOUD_PROJECT: gen-lang-client-0343643614
GOOGLE_APPLICATION_CREDENTIALS: /Users/briannamitchell/.config/gcp/vertex-sa.json
GOOGLE_CLOUD_LOCATION: us-east4
GOOGLE_GENAI_USE_VERTEXAI: true


In [36]:
# Connect to the local persistent ChromaDB store so stored embeddings are reused instead of making unnecessary API calls (more cost-effective)
persistent_client = chromadb.PersistentClient(path="./chroma_db")

# Importing Gemini for embedding
embeddings = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")


# **Challenges + Mitigation strategies for Performing Vector Search:**
---
#### **Tradeoffs:**
- Originally, my plan was to use the same framework for both retrievers and fetch the data from my vector store in Chroma. However, my BM25 retriever returned no data in either framework, LangChain or LlamaIndex. I later found out why: because of how the BM25 retriever is built, the keyword search library needs to pull data from **raw text nodes**. That does not align with my current Chroma vector store architecture, which is built **with stored embeddings, not the original text**. This is why no data was returned in my prior iteration.

#### **How I solved for this/Why I chose this method?:**
To mitigate this, I switched to LlamaIndex for the BM25/lexical retriever, since it integrates more smoothly with the Chroma vector store architecture. I then used LlamaIndex to pass my list of nodes directly to the BM25 retriever, which solved the short-term problem.

#### **Improvement with Llama_Index:**
- Use raw text nodes for BM25 retrieval to ensure compatibility with the keyword search library
- Implement response enhancement by providing additional context or *page_content* for my document queries.
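
To illustrate why BM25 needs raw text rather than embeddings, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not the LlamaIndex retriever used in this project; `tokenize` and `bm25_scores` are hypothetical names, the tokenizer is deliberately naive, and `k1=1.5`, `b=0.75` are commonly used defaults chosen as an assumption. The sample texts are shortened from the NG12 entries used later in this notebook.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer; real pipelines usually strip punctuation too.
    return text.lower().split()


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each raw-text document in `corpus` against `query` with Okapi BM25."""
    if not corpus:
        return []
    tokenized = [tokenize(doc) for doc in corpus]
    n_docs = len(tokenized)
    avg_len = (sum(len(toks) for toks in tokenized) / n_docs) or 1.0
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Non-negative IDF variant (the +1 inside the log).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


raw_texts = [
    "Shortness of breath with cough or fatigue or chest pain or weight loss",
    "Bleeding, bruising or petechiae, unexplained: possible cancer Leukaemia",
    "Fracture unexplained, 60 and over: possible cancer Myeloma",
]
scores = bm25_scores("chest pain", raw_texts)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

Every quantity in the score (term frequency, document frequency, document length) is computed from tokens, so a store that only exposes vectors gives BM25 nothing to work with.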

---

## **Pitfall/Tradeoffs with my original approach:**
While testing combinations of LangChain and LlamaIndex for the different retrievers, I ran into multiple problems. The main one: my database and embeddings live in ChromaDB, and that becomes an issue for hybrid retrieval, where you **combine a semantic, dense retriever with a lexical, keyword-precise retriever**. The **root problem stemmed from BM25's architecture**. The semantic retriever integrates seamlessly with my DB because it uses the **vector_store from Chroma**. BM25 does not integrate well because it **must be given the queried Document objects as nodes**, **something I could only do with VectorStoreIndex in LlamaIndex**. Because my Chroma setup is tied to LangChain and VectorStoreIndex is tied to LlamaIndex, combining these two very different methods would add unnecessary overhead. I also had to work within the constraint of using Gemini 1.5 with Vertex AI embeddings, which is a closed-source solution.

- If I had to start over and wanted to integrate hybrid retrieval into my RAG successfully, I would use **OpenAI embeddings, GPT-4o, and ChromaDB** to keep a closed-source solution in production.

- If building a proof of concept locally, I would use **FAISS for my vector store** and replace my **model with Gemma from HuggingFace** for a more open-source solution.
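
For reference, one common way to combine a dense ranking with a lexical ranking in hybrid retrieval is reciprocal rank fusion (RRF). This notebook does not use RRF; the sketch below is only an illustration of the combining step, with a hypothetical function name and the conventional constant `k=60` taken as an assumption. The document ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1), and documents are sorted by their total.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties keep first-seen order.
    return sorted(fused, key=fused.get, reverse=True)


dense_ranking = ["a", "b", "c"]    # hypothetical ids from the semantic retriever
lexical_ranking = ["b", "d", "a"]  # hypothetical ids from the BM25 retriever
fused_ranking = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
print(fused_ranking)
```

RRF only needs ranks, not comparable scores, which is why it is often used when the two retrievers come from different frameworks.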

---


# **Final RAG Approach:**
---

## My main consideration...**How can I pivot without losing retrieval quality?**
My plan is to pivot to similarity search over my **vector_store in ChromaDB**, used as the retriever in LangChain. ChromaDB's built-in **Approximate Nearest Neighbors (ANN) algorithm** returns close matches quickly, even when ingesting large documents. On top of that, I apply **Maximal Marginal Relevance (MMR)**. It's a great pivot from my prior solution: I can balance **relevance and diversity** while keeping retrieval fast. In a live healthcare environment, my agent would come into contact with opinions from multiple providers. To generate responses that reflect live production, I need retrieval that can scale with that variety later on, which is why I chose this method.

**Benefits:**
- Prioritizes relevance and diversity during retrieval
- Compensates for random spikes during data ingestion
- Balances exploration with exploitation
- Ensures my retrieved documents are distinct from each other

# **Creating VectorStore for LangChain + Enhancing Responses by Verifying Context Retrieved: How I Improved Generator Output:** 
---
### Used response enhancement as a guideline for the model to follow when verifying retrieved content, building a better **knowledge base engine** for my agent to fetch from. This system design technique improves **RAG performance + quality**.




In [37]:
# adding response enhancement for generated outputs
from langchain_core.documents import Document
from langchain_chroma import Chroma  # same integration imported earlier; avoids shadowing it with the deprecated langchain_community class

#get collection name from chromaDB
#COLLECTION_NAME = "ng12"
#chroma_collection = client.get_collection(name=COLLECTION_NAME)
#embed_model = embeddings

# vector store
vectorstore = Chroma(
    collection_name="ng12",
    embedding_function=embeddings,
    client=persistent_client
)

#storage context
#storage_context = StorageContext.from_defaults(vectorstore=vectorstore)
#storage_context.persist()

documents = [
    Document(page_content="Shortness of breath with cough or fatigue or chest pain or weight loss or appetite loss (unexplained), 40 and over: possible cancer Lung or mesothelioma", metadata={"referral": "Urgent", "source": "Suspected cancer: recognition and referral (NG12) 2026", "page": "52"}),
    Document(page_content="Bleeding, bruising or petechiae, unexplained: possible cancer Leukaemia", metadata={"referral": "Very urgent", "source": "Suspected cancer: recognition and referral (NG12) 2026", "page": "43"}),
    Document(page_content="Fracture unexplained, 60 and over: possible cancer Myeloma", metadata={"referral": "Unexplained", "source": "Suspected cancer: recognition and referral (NG12) 2026", "page": "55"}),
    Document(page_content="Refer people using a suspected cancer pathway referral for oesophageal cancer if they: have dysphagia or, are aged 55 and over, with weight loss, and they have any of the following: upper abdominal pain, reflux, dyspepsia. [2015, amended 2025]", metadata={"referral": "Suspected cancer pathway referral", "source": "Suspected cancer: recognition and referral (NG12) 2026", "page": "11"}),
    Document(page_content="Skin lesion that raises the suspicion of a basal cell carcinoma: possible cancer Basal cell carcinoma  ", metadata={"referral": "Raises the suspicion of", "source": "Suspected cancer: recognition and referral (NG12) 2026", "page": "58"}),
    Document(page_content="Urinary urgency or frequency, increased and persistent or frequent, particularly more than 12 times per month in women, especially if 50 and over: possible cancer Ovarian", metadata={"referral": "Persistent", "source": "Suspected cancer: recognition and referral (NG12) 2026", "page": "60"}),
    Document(page_content="Upper abdominal pain with low haemoglobin levels or raised platelet count or nausea or vomiting, 55 and over: possible cancer Oesophageal or stomach ", metadata={"referral": "Non-urgent", "source": "Suspected cancer: recognition and referral (NG12) 2026", "page": "40"}),
    Document(page_content="Petechiae unexplained in children and young people: possible cancer Leukaemia", metadata={"referral": "Immediate", "source": "Suspected cancer: recognition and referral (NG12) 2026", "page": "74"})
]

##### Maximal Marginal Relevance:
---
I use my lambda (λ) value as a weight between diversity and relevance:
- λ = 0: maximum diversity
- λ = 1: maximum relevance

**Potential Tuning for MMR:**
- Navigational/Exact --> λ = 0.7 - 0.9: "Is this patient at 403 an urgent referral?"
- Balanced/Research --> λ = 0.5 - 0.7: "What is the best referral recommendation based on symptoms of patient403?"
- Exploratory/Diverse --> λ = 0.3 - 0.5: "Give me a summary of findings and determine referral status."
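
To make the effect of λ concrete, here is a minimal pure-Python sketch of MMR selection over toy 2-D vectors. It is not ChromaDB's or LangChain's implementation; `cosine` and `mmr_select` are hypothetical helpers, and the vectors are made up so that document 1 is a near-duplicate of document 0.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_mult):
    """Greedily pick k document indices, trading relevance against redundancy.

    Each step picks the candidate that maximises
        lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already selected)
    """
    selected = []
    candidates = list(range(len(doc_vecs)))

    def mmr_score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]  # doc 1 nearly duplicates doc 0

relevance_heavy = mmr_select(query, docs, k=2, lambda_mult=1.0)
diversity_heavy = mmr_select(query, docs, k=2, lambda_mult=0.3)
print(relevance_heavy, diversity_heavy)
```

With λ = 1.0 the near-duplicate is kept because only relevance counts; with λ = 0.3 the redundancy penalty pushes the selection toward the less similar document.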

In [38]:
# Create a retriever from the vector store and apply MMR

retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 8,
        "fetch_k": 50,
        "lambda_mult": 0.6
    }
)

# Techniques used (for my README):
- Vector store as retriever
- Response enhancement
- Maximal Marginal Relevance (MMR) to balance diversity and relevance in the retrieved documents
- Source attribution --> the model states where each piece of information comes from, which helps prevent hallucinations

# **Perform Context Engineering**
---
### Tradeoffs:

**Why did I choose this approach for context engineering?**

**Reasoning:** 
At first, my goal was to return a single string of relevant documents. However, I realized that my agent would have trouble parsing my actual data in production. Given how the data is formatted at retrieval time, I had to make some adjustments and go deeper into context engineering. One of the key constraints in building this agent was also to cite the specific sources within the relevant data retrieved from the NG12 documents. I could have kept my single string, but that would have meant significant fine-tuning and overhead later. Context engineering was the only plausible way to achieve this goal.

#### Adding Clinical Context Tool
---

# Source attribution Prompt
---

In [39]:
# attribution
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

attribution_prompt = ChatPromptTemplate.from_template(
    """ You are a precise and accurate clinical AI assistant that provides expert knowledge in
    clinical decision-making support for those who encounter direct patient care.
    You have inherited a persona profile module as your agentic architecture.
    Main Task: Your main task is to determine whether the presented patient requires an urgent referral or not.
    Use the information provided from the text corpus below:
    The National Institute for Health and Care Excellence (NICE) Guideline for Suspected cancer: recognition and referral NICE guideline.
    Once the main task is complete and accurate, you must provide a recommendation that decides post-referral instructions corresponding to the most relevant medical imaging practices.
    If patient does not meet urgent referral criteria, do not recommend any medical imaging practices.

    Answer the following question based ONLY on the provided sources. 
    For each fact or claim in your answer include a citation that refers to the source.

    Do not make up information or provide personal opinions in your responses without verifying answers with evidence.
    You must cite the specific sources you found from the NICE guidelines in this specific format below:
    This is how you are expected to format: [referral: insert referral type, source: name of text corpus, year published, page: insert page number]
    
    Your source attributes at the end of your responses will look like this template below in practice:
    [referral: Persistent, source: Suspected cancer: recognition and referral (NG12), 2026, page: 60]

    How your input and output will be formatted with citation sources used:
    Question: {question}

    Sources: {sources}

    Your answer:
    """
)

### Helper functions to further format the sources with citation numbers and generate attributed responses
---

In [40]:
# Build a numbered, citable source block from the retrieved documents
def format_sources_with_citations(documents):
    formatted_sources = []
    for i, doc in enumerate(documents, 1):
        # The header carries every field the prompt asks the model to cite
        source_info = f"[{i}] {doc.metadata.get('source', 'Unknown source')}"
        if doc.metadata.get('page'):
            source_info += f", page {doc.metadata['page']}"
        if doc.metadata.get('referral'):
            source_info += f", referral: {doc.metadata['referral']}"
        formatted_sources.append(f"{source_info}\n{doc.page_content}")
    return "\n\n".join(formatted_sources)


### Building RAG chain with my source attribution 
---

In [41]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.0)

def generated_attributed_response(question: str):
    retrieved_docs = retriever.invoke(question)
    sources_formatted  = format_sources_with_citations(retrieved_docs)
    attribution_chain = attribution_prompt | llm | StrOutputParser()

    response = attribution_chain.invoke({
        "question": question,
        "sources": sources_formatted
    })
    return response

In [None]:
question = "What are some symptoms that require an urgent referral?"
response = generated_attributed_response(question)
print(response)

### Consistency Checking for Accuracy
---
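
The notebook does not show how this check is performed, so the following is only a minimal sketch of one possible approach: parse the citations out of the generated response and confirm that each cited page belongs to a document that was actually retrieved. `CITATION_RE` and `check_citations` are hypothetical names, and the pattern assumes the citation layout shown in the example line of the attribution prompt.

```python
import re

# Matches citations shaped like the prompt's example:
# [referral: Persistent, source: Suspected cancer: recognition and referral (NG12), 2026, page: 60]
CITATION_RE = re.compile(
    r"\[referral:\s*(?P<referral>[^,\]]+),\s*"
    r"source:\s*(?P<source>[^\]]+?),\s*"
    r"page:\s*(?P<page>\d+)\]"
)


def check_citations(response, allowed_pages):
    """Extract citations and flag any whose page was not among the retrieved sources.

    `allowed_pages` is a set of page strings, e.g. taken from doc.metadata["page"].
    """
    citations = [m.groupdict() for m in CITATION_RE.finditer(response)]
    unsupported = [c for c in citations if c["page"] not in allowed_pages]
    return {
        "citations": citations,
        "unsupported": unsupported,
        # Consistent only if at least one citation exists and all are supported.
        "consistent": bool(citations) and not unsupported,
    }


sample_response = (
    "Persistent urinary urgency in women over 50 warrants assessment. "
    "[referral: Persistent, source: Suspected cancer: recognition and referral (NG12), 2026, page: 60]"
)
report = check_citations(sample_response, allowed_pages={"60", "52"})
print(report["consistent"], report["citations"])
```

In the RAG chain, `allowed_pages` could be built from the `page` metadata of the documents returned by the retriever for the same question; a response with no citations, or with a page outside that set, would then be flagged for review.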