# RAG Failure #10: The "Common Neighbor" Intersection Gap

## The Problem
A user asks: **"What is the interaction risk between Drug A and Drug B?"**
Standard RAG retrieves documents about Drug A and Drug B.
Even when RAG retrieves **all** relevant documents, LLMs often fail to synthesize **implicit, indirect connections** and favor explicit statements instead. If a document says "Drug A and B are both popular", the LLM latches onto that superficial link and ignores the technical chain that actually matters: `Drug A -> Inhibits Enzyme X` and `Drug B -> Needs Enzyme X`.

## The Scenario: Drug-Drug Interaction (DDI) Discovery
**Query:** "Is there a specific interaction risk between **Zenthorax** and **Vira-X**?"

**The Disjointed Data:**
1.  **Doc 1 (Zenthorax Profile):** "**Zenthorax** is a potent antifungal. It is a strong **CYP3A4 Inhibitor**."
2.  **Doc 2 (Vira-X Profile):** "**Vira-X** is a statin used for cholesterol. It is heavily metabolized by the **CYP3A4 Enzyme**."
3.  **Doc 3 (The Red Herring):** "Market Analysis: Both **Zenthorax** and **Vira-X** are FDA-approved, top-selling pills available at pharmacy counters."

**Naive RAG Failure:** Even with full context, the LLM sees Doc 3 explicitly mentioning both drugs together. It answers: *"Both are FDA-approved pills."* It misses the metabolic clash (Inhibition + Metabolism = Toxic Buildup).

**KG Solution:** We map `Drug -> Mechanism -> Protein`. We run an intersection query. We apply a logic rule: `If A Inhibits X AND B is Metabolized by X -> DANGER`.
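
Before bringing in an LLM, the rule itself can be sketched in isolation. The snippet below is a minimal illustration, assuming hand-written triples in place of extracted ones; `TRIPLES`, `build_index`, and `find_ddi` are hypothetical names used only for this sketch.

```python
# Hand-written triples standing in for an extracted graph (illustrative only).
TRIPLES = [
    ("Zenthorax", "INHIBITS", "CYP3A4"),
    ("Vira-X", "METABOLIZED_BY", "CYP3A4"),
    ("Zenthorax", "IS_A", "Oral Medication"),
    ("Vira-X", "IS_A", "Oral Medication"),
]


def build_index(triples):
    """Map each drug to a {target: relation} dictionary."""
    index = {}
    for drug, relation, target in triples:
        index.setdefault(drug, {})[target] = relation
    return index


def find_ddi(index, a, b):
    """Return (inhibitor, substrate, target) for every shared target matching the rule."""
    hits = []
    # Intersection query: targets that both drugs point to.
    shared = index.get(a, {}).keys() & index.get(b, {}).keys()
    for target in sorted(shared):
        rel_a, rel_b = index[a][target], index[b][target]
        # Logic rule: one drug inhibits the target, the other is metabolized by it.
        if rel_a == "INHIBITS" and rel_b == "METABOLIZED_BY":
            hits.append((a, b, target))
        elif rel_a == "METABOLIZED_BY" and rel_b == "INHIBITS":
            hits.append((b, a, target))
    return hits


index = build_index(TRIPLES)
print(find_ddi(index, "Zenthorax", "Vira-X"))
```

The shared `Oral Medication` target is in the intersection but does not satisfy the rule, so only the `CYP3A4` pairing is reported.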

In [None]:
# --- Step 1: Environment Setup ---
!pip install -q langchain langchain-community langchain-huggingface faiss-cpu networkx transformers sentence-transformers accelerate bitsandbytes

In [None]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from langchain_huggingface import HuggingFacePipeline, HuggingFaceEmbeddings
import networkx as nx

# --- Step 2: Load Model ---
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

print(f"Loading {model_id}...")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_new_tokens=256, 
    temperature=0.1, 
    do_sample=True
)

llm = HuggingFacePipeline(pipeline=pipe)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
print("Model loaded. Pipeline ready.")

Loading TinyLlama/TinyLlama-1.1B-Chat-v1.0...
Model loaded. Pipeline ready.


In [None]:
# Document now lives in langchain_core; the langchain.docstore path is deprecated.
from langchain_core.documents import Document

# --- Step 3: Simulate Data ---
# Note: The "Red Herring" (Doc 3) mentions both drugs but is scientifically irrelevant.
raw_texts = [
    "[Pharmacology] Zenthorax is a potent antifungal agent widely used in hospitals. Mechanistically, it acts as a strong CYP3A4 Inhibitor in the liver.",
    "[Pharmacokinetics] Vira-X is a statin prescribed for lowering cholesterol. It is a prodrug that is extensively metabolized by the CYP3A4 Enzyme to become active.",
    "[Red Herring] Market Analysis 2024: Both Zenthorax and Vira-X are leading FDA-approved oral medications found in most family pharmacies.",
    "[Safety] Common side effects of Zenthorax include nausea and dizziness.",
    "[Safety] Vira-X may cause muscle pain in rare cases."
]

docs = [Document(page_content=t) for t in raw_texts]
print(f"Created {len(docs)} Documents.")
for i, d in enumerate(docs):
    print(f"Doc {i+1}: {d.page_content}")

Created 5 Documents.
Doc 1: [Pharmacology] Zenthorax is a potent antifungal agent widely used in hospitals. Mechanistically, it acts as a strong CYP3A4 Inhibitor in the liver.
Doc 2: [Pharmacokinetics] Vira-X is a statin prescribed for lowering cholesterol. It is a prodrug that is extensively metabolized by the CYP3A4 Enzyme to become active.
Doc 3: [Red Herring] Market Analysis 2024: Both Zenthorax and Vira-X are leading FDA-approved oral medications found in most family pharmacies.
Doc 4: [Safety] Common side effects of Zenthorax include nausea and dizziness.
Doc 5: [Safety] Vira-X may cause muscle pain in rare cases.


In [None]:
# --- Step 4: Naive RAG (Full Retrieval) ---
from langchain_community.vectorstores import FAISS

print("\n--- NAIVE RAG (Full Context Failure) ---")
query = "Is there a specific interaction risk between Zenthorax and Vira-X?"
print(f"Query: {query}")

# 1. Indexing
vectorstore = FAISS.from_documents(docs, embeddings)

# 2. Retrieval
# We set k=5 to retrieve EVERYTHING. There is no "missing data" excuse here.
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
retrieved_docs = retriever.invoke(query)

print("\nRetrieved Context (k=5 - ALL DOCS):")
context_str = ""
for i, d in enumerate(retrieved_docs):
    print(f"{i+1}. {d.page_content}")
    context_str += d.page_content + "\n"

# 3. Generation
prompt = f"<|system|>\nAnswer the question based on the context. If no interaction is explicitly stated, say 'None'.\n<|user|>\nContext:\n{context_str}\nQuestion:\n{query}\n<|assistant|>"
response = llm.invoke(prompt)
cleaned_response = response.split("<|assistant|>")[-1].strip()

print("\nLLM Answer:")
print(cleaned_response)


--- NAIVE RAG (Full Context Failure) ---
Query: Is there a specific interaction risk between Zenthorax and Vira-X?

Retrieved Context (k=5 - ALL DOCS):
1. [Red Herring] Market Analysis 2024: Both Zenthorax and Vira-X are leading FDA-approved oral medications found in most family pharmacies.
2. [Pharmacology] Zenthorax is a potent antifungal agent widely used in hospitals. Mechanistically, it acts as a strong CYP3A4 Inhibitor in the liver.
3. [Pharmacokinetics] Vira-X is a statin prescribed for lowering cholesterol. It is a prodrug that is extensively metabolized by the CYP3A4 Enzyme to become active.
4. [Safety] Vira-X may cause muscle pain in rare cases.
5. [Safety] Common side effects of Zenthorax include nausea and dizziness.

LLM Answer:
Based on the context, both Zenthorax and Vira-X are FDA-approved oral medications found in most pharmacies. There is no specific interaction risk mentioned between them, although individual side effects like nausea (Zenthorax) and muscle pain (Vir
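
One way to see why the red herring is so attractive is to check which documents name both drugs. The sketch below is a simple lexical diagnostic, not a description of how FAISS ranks results (that uses dense embeddings); `DOCS`, `DRUGS`, `mentions`, and `co_mention_report` are hypothetical names, and the texts are copied from the simulated data above.

```python
import re

DOCS = [
    "[Pharmacology] Zenthorax is a potent antifungal agent widely used in hospitals. Mechanistically, it acts as a strong CYP3A4 Inhibitor in the liver.",
    "[Pharmacokinetics] Vira-X is a statin prescribed for lowering cholesterol. It is a prodrug that is extensively metabolized by the CYP3A4 Enzyme to become active.",
    "[Red Herring] Market Analysis 2024: Both Zenthorax and Vira-X are leading FDA-approved oral medications found in most family pharmacies.",
    "[Safety] Common side effects of Zenthorax include nausea and dizziness.",
    "[Safety] Vira-X may cause muscle pain in rare cases.",
]
DRUGS = ("Zenthorax", "Vira-X")


def mentions(text, name):
    """True if `name` appears in `text` as a whole token."""
    return re.search(rf"(?<!\w){re.escape(name)}(?!\w)", text) is not None


def co_mention_report(docs, drugs):
    """Return (doc_number, [drugs mentioned]) for each document, numbered from 1."""
    return [
        (i, [d for d in drugs if mentions(text, d)])
        for i, text in enumerate(docs, start=1)
    ]


for doc_id, found in co_mention_report(DOCS, DRUGS):
    label = "co-mention" if len(found) == len(DRUGS) else "single drug"
    print(f"Doc {doc_id}: {label} {found}")
```

Only Doc 3 mentions both drugs in one passage, and it is also the document that states nothing about mechanism. The two documents that carry the mechanism each mention one drug, so the link between them exists only through the shared enzyme name.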

In [None]:
# --- Step 5: Biomedical NER & Extraction ---
# We use a specialized prompt to extract PHARMACOKINETIC relationships.

kg = nx.DiGraph()

def extract_bio_relations(text):
    """
    Extracts (Drug, RELATION, Target) triples from the LLM output.
    Each output line is parsed on its own, so a reply containing several
    triples does not get merged into one malformed triple.
    """
    prompt = f"""<|system|>
    Extract the Drug and the biological interaction.
    Format: Drug | RELATION | Target
    Use relations: INHIBITS, METABOLIZED_BY, INDUCES, IS_A.
    <|user|>
    Text: {text}
    <|assistant|>"""
    
    raw = llm.invoke(prompt)
    out = raw.split("<|assistant|>")[-1].strip()
    triples = []
    for line in out.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Keep only well-formed lines: exactly three non-empty fields.
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples

print("\n--- BIOMEDICAL RELATION EXTRACTION ---")

for doc in docs:
    print(f"\nParsing: {doc.page_content}")
    
    for drug, rel, target in extract_bio_relations(doc.page_content):
        # Clean strings
        rel = rel.upper()
        target = target.replace(" Enzyme", "").strip() # Normalize 'CYP3A4 Enzyme' to 'CYP3A4'
        
        print(f"   [Raw LLM]: {drug} | {rel} | {target}")
        kg.add_edge(drug, target, relation=rel)
        print(f"   [Action]: Edge ({drug}) -[{rel}]-> ({target})")


--- BIOMEDICAL RELATION EXTRACTION ---

Parsing: [Pharmacology] Zenthorax is a potent antifungal agent widely used in hospitals. Mechanistically, it acts as a strong CYP3A4 Inhibitor in the liver.
   [Raw LLM]: Zenthorax | INHIBITS | CYP3A4
   [Action]: Edge (Zenthorax) -[INHIBITS]-> (CYP3A4)

Parsing: [Pharmacokinetics] Vira-X is a statin prescribed for lowering cholesterol. It is a prodrug that is extensively metabolized by the CYP3A4 Enzyme to become active.
   [Raw LLM]: Vira-X | METABOLIZED_BY | CYP3A4
   [Action]: Edge (Vira-X) -[METABOLIZED_BY]-> (CYP3A4)

Parsing: [Red Herring] Market Analysis 2024: Both Zenthorax and Vira-X are leading FDA-approved oral medications found in most family pharmacies.
   [Raw LLM]: Zenthorax | IS_A | Oral Medication
   [Action]: Edge (Zenthorax) -[IS_A]-> (Oral Medication)
...
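
A 1.1B-parameter model will not always return clean triples, so a deterministic fallback can be useful for sanity-checking the extraction. The sketch below is an assumption-heavy illustration: the regular expressions cover only the two phrasings that appear in this notebook's sample data, and `PATTERNS`, `SUBJECT`, and `rule_based_triples` are hypothetical names, not part of any library.

```python
import re

# Hypothetical patterns: they match only the phrasings used in the sample documents.
PATTERNS = [
    ("INHIBITS", re.compile(r"\b(CYP\w+)\s+inhibitor\b", re.IGNORECASE)),
    ("METABOLIZED_BY", re.compile(r"\bmetabolized by (?:the )?(CYP\w+)", re.IGNORECASE)),
]
# Subject = first capitalized token after an optional "[Tag]" prefix, followed by "is".
SUBJECT = re.compile(r"^(?:\[[^\]]*\]\s*)?([A-Z][\w-]*)\s+is\b")


def rule_based_triples(text):
    """Return (drug, relation, target) triples found by the patterns above."""
    subject = SUBJECT.match(text)
    if subject is None:
        return []  # no "<Drug> is ..." opener, so there is nothing to attribute
    drug = subject.group(1)
    triples = []
    for relation, pattern in PATTERNS:
        for target in pattern.findall(text):
            triples.append((drug, relation, target.upper()))
    return triples


doc_1 = "[Pharmacology] Zenthorax is a potent antifungal agent widely used in hospitals. Mechanistically, it acts as a strong CYP3A4 Inhibitor in the liver."
doc_3 = "[Red Herring] Market Analysis 2024: Both Zenthorax and Vira-X are leading FDA-approved oral medications found in most family pharmacies."
print(rule_based_triples(doc_1))
print(rule_based_triples(doc_3))
```

Comparing these triples with the LLM's output is one cheap way to flag documents where the model dropped or invented a mechanism edge. It is not a substitute for a real biomedical NER pipeline.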


In [None]:
# --- Step 6: The Solution (Logic-Based Intersection) ---

print("\n--- INTERSECTION & DDI LOGIC CHECK ---")
drug_a = "Zenthorax"
drug_b = "Vira-X"

print(f"Query: Risk between '{drug_a}' and '{drug_b}'?")

def check_ddi(a, b):
    if a not in kg or b not in kg:
        return "Drugs not found."
    
    # 1. Find Common Successors (Neighbors)
    targets_a = set(kg.successors(a))
    targets_b = set(kg.successors(b))
    intersection = targets_a.intersection(targets_b)
    
    print(f"\n1. Common Neighbors Analysis:")
    print(f"   Neighbors of {a}: {targets_a}")
    print(f"   Neighbors of {b}: {targets_b}")
    print(f"   INTERSECTION: {intersection}")
    
    print(f"\n2. Logic Check on Intersections:")
    explanations = []
    
    for target in intersection:
        print(f"   Checking '{target}'...")
        rel_a = kg[a][target]['relation']
        rel_b = kg[b][target]['relation']
        
        print(f"     {a} {rel_a} {target}")
        print(f"     {b} {rel_b} {target}")
        
        # THE LOGIC RULE
        # Work out which drug is the inhibitor and which is the substrate, so the
        # warning names the right drug regardless of the argument order.
        if rel_a == "INHIBITS" and rel_b == "METABOLIZED_BY":
            inhibitor, substrate = a, b
        elif rel_a == "METABOLIZED_BY" and rel_b == "INHIBITS":
            inhibitor, substrate = b, a
        else:
            print("     -> Logic: Harmless similarity.")
            continue
        
        print("     -> Logic: DDI ALERT! Inhibitor + Substrate = TOXICITY.")
        explanations.append(
            f"CRITICAL WARNING: Interaction detected. {inhibitor} INHIBITS {target}, "
            f"while {substrate} is METABOLIZED_BY {target}. "
            f"This combination can lead to toxic levels of {substrate}."
        )
            
    if not explanations:
        return "No critical interaction found."
    return "\n".join(explanations)

final_answer = check_ddi(drug_a, drug_b)

print(f"\nFinal Answer (Generated from Logic):\n{final_answer}")


--- INTERSECTION & DDI LOGIC CHECK ---
Query: Risk between 'Zenthorax' and 'Vira-X'?

1. Common Neighbors Analysis:
   Neighbors of Zenthorax: {'CYP3A4', 'Oral Medication'}
   Neighbors of Vira-X: {'CYP3A4', 'Oral Medication'}
   INTERSECTION: {'CYP3A4', 'Oral Medication'}

2. Logic Check on Intersections:
   Checking 'Oral Medication'...
     Zenthorax IS_A Oral Medication
     Vira-X IS_A Oral Medication
     -> Logic: Harmless similarity.

   Checking 'CYP3A4'...
     Zenthorax INHIBITS CYP3A4
     Vira-X METABOLIZED_BY CYP3A4
     -> Logic: DDI ALERT! Inhibitor + Substrate = TOXICITY.

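Final Answer (Generated from Logic):
CRITICAL WARNING: Interaction detected. Zenthorax INHIBITS CYP3A4, while Vira-X is METABOLIZED_BY CYP3A4. This combination can lead to toxic levels of Vira-X.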