# Answer-with-Evidence Pipeline

Core Tool Design Patterns (No Memory)

1. Deterministic Tool Pattern

Purpose: Fetch or compute facts
Properties:

No LLM

Predictable output

External or internal logic

Examples:

Search

Math

Parsing

Validation via rules
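
Here is a minimal sketch of the "Validation via rules" example. A deterministic tool can check that a quote returned by an LLM really occurs verbatim in the segment it cites. The helper `verify_quote` is hypothetical and not part of this notebook. It assumes the segment dictionaries that `segment_content()` produces later (`segment_id`, `type`, `text`).

```python
def verify_quote(quote: str, segment_id: str, segments: list[dict]) -> bool:
    """Return True only if `quote` occurs verbatim in the segment named `segment_id`.

    No LLM and no I/O: the same inputs always produce the same answer.
    """
    for segment in segments:
        if segment["segment_id"] == segment_id:
            return quote in segment["text"]
    # An unknown segment id fails closed
    return False


segments = [
    {"segment_id": "s1", "type": "heading", "text": "Generative artificial intelligence"},
    {"segment_id": "s2", "type": "paragraph",
     "text": "These models learn the underlying patterns and structures of their training data"},
]
print(verify_quote("learn the underlying patterns", "s2", segments))  # True
print(verify_quote("learn the underlying patterns", "s1", segments))  # False
```

Because the check is pure string containment, any paraphrase fails it, however close it is to the source.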

2. Interpretation Tool Pattern

Purpose: Convert raw data into meaning
Properties:

LLM-based

No external calls

Input constrained

Examples:

Summarize

Extract entities

Classify

Normalize text
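
One deterministic way to satisfy "Input constrained" is to cap how much source text reaches the LLM before the call. This is an assumption, not something the notebook does. The sketch keeps whole segments in order until a character budget runs out, so no segment is ever truncated and quotes can still be checked verbatim against it. `constrain_input` and `max_chars` are hypothetical names.

```python
def constrain_input(sources: list[dict], max_chars: int) -> list[dict]:
    """Keep whole segments, in source order, until `max_chars` of text is used up.

    Segments are never cut, and sources that contribute no segments are dropped.
    """
    remaining = max_chars
    constrained = []
    for source in sources:
        kept = []
        for segment in source.get("segments", []):
            cost = len(segment["text"])
            if cost > remaining:
                # Budget exhausted: return only what fits so far
                if kept:
                    constrained.append({**source, "segments": kept})
                return constrained
            kept.append(segment)
            remaining -= cost
        if kept:
            constrained.append({**source, "segments": kept})
    return constrained
```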

3. Evaluation Tool Pattern

Purpose: Judge quality against criteria
Properties:

LLM-based

Structured output

Conservative bias (ambiguous or malformed judgments count as failures)

Examples:

Relevance check

Completeness check

Risk detection
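
Here is a minimal sketch of "Structured output" combined with "Conservative bias". It assumes a hypothetical judge schema `{"passed": bool, "reason": str}`. The LLM's raw JSON reply is parsed deterministically, and anything malformed, non-boolean, or missing counts as a failure rather than a pass.

```python
import json


def parse_evaluation(raw_text: str) -> dict:
    """Turn a judge model's raw reply into a verdict, failing closed on anything odd."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return {"passed": False, "reason": "unparseable judge output"}
    if not isinstance(data, dict):
        return {"passed": False, "reason": "judge output is not a JSON object"}
    # Only a literal boolean true counts as a pass; "yes", 1, or a missing key all fail
    passed = data.get("passed") is True
    reason = str(data.get("reason", "no reason given"))
    return {"passed": passed, "reason": reason}
```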

4. Orchestration Pattern (Non-agentic)

Purpose: Decide what runs next
Properties:

Plain Python

No LLM reasoning: branching is ordinary code

Explicit control flow
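
Here is a sketch of non-agentic orchestration. The step functions are passed in as plain callables, which are hypothetical stand-ins for tools such as `get_search_results` or `draft_answer`. Every branch is an ordinary `if`, so the next step never depends on model reasoning, and the whole flow can be tested with stubs.

```python
from typing import Callable


def orchestrate(
    question: str,
    search: Callable[[str], list[dict]],
    extract: Callable[[str, list[dict]], list[dict]],
    draft: Callable[[str, list[dict]], str],
) -> dict:
    """Run search -> extract -> draft, stopping early with an explicit status."""
    sources = search(question)
    if not sources:
        return {"question": question, "answer": None, "status": "no_sources"}

    evidence = extract(question, sources)
    if not evidence:
        return {"question": question, "answer": None, "status": "no_evidence"}

    return {"question": question, "answer": draft(question, evidence), "status": "ok"}
```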

5. Repair / Fallback Pattern

Purpose: Improve or recover from failure
Properties:

Conditional (runs only when a check fails)

Targeted (repairs the failing step, not the whole pipeline)

Limited retries
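
The three properties can be sketched together in a small helper. The helper `run_with_repair` and its parameters are hypothetical. It is conditional because it re-runs only when a validity check fails. It is targeted because it re-runs a single step. It is limited because it gives up after `max_attempts` and returns a fixed fallback.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_repair(
    step: Callable[[int], T],
    is_valid: Callable[[T], bool],
    fallback: T,
    max_attempts: int = 2,
) -> T:
    """Run `step`, re-running it only while its result fails `is_valid`.

    `step` receives the 1-based attempt number, so a repair attempt can differ
    from the first one (for example, by using a stricter prompt).
    """
    for attempt in range(1, max_attempts + 1):
        result = step(attempt)
        if is_valid(result):
            return result
    # All attempts failed: return the fallback instead of raising
    return fallback
```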

In [1]:
from google import genai
from google.genai import types

In [2]:
import json

In [3]:
import google.generativeai as genai1


All support for the `google.generativeai` package has ended. It will no longer be receiving 
updates or bug fixes. Please switch to the `google.genai` package as soon as possible.
See README for more details:

https://github.com/google-gemini/deprecated-generative-ai-python/blob/main/README.md

  import google.generativeai as genai1


In [4]:
model = genai1.GenerativeModel('gemini-2.5-flash')

In [5]:
import os

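key = os.getenv("GOOGLE_API_KEY")
# Guard against a missing key: slicing None would raise a TypeError
print(f"Current Key in memory: {key[:5]}..." if key else "GOOGLE_API_KEY is not set")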

Current Key in memory: AIzaS...


Deterministic: Search

In [6]:
from serpapi import GoogleSearch

In [7]:
import os

In [8]:
from dotenv import load_dotenv

load_dotenv(override=True)

True

In [9]:
import re

def slugify(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def normalize_serpapi_results(results: list[dict]) -> list[dict]:
    """
    Pattern: Deterministic normalization of SerpAPI search results.

    Produces source descriptors ONLY.
    Does NOT fetch pages.
    Does NOT create segments.
    """

    normalized = []

    for r in results:
        title = r.get("title", "").strip()
        date = r.get("date", "").strip()

        source_id = slugify(f"{title}_{date}") if title else None

        normalized.append({
            "source_id": source_id,
            "title": title,
            "url": r.get("link"),
            "published_date": date,
            "provider": "serpapi"
        })

    return normalized


In [10]:
from ddgs import DDGS
from datetime import datetime

def get_search_results(query: str) -> list[dict]:
    """
    Searches for the latest information using DuckDuckGo (free, no API key).
    Returns normalized source descriptors, or an empty list if the search fails.
    """
    final_results = []
    num_results = 5
    blocked_domains = {"bing.com", "doubleclick.net"}
    id_counts = {}  # base source_id -> times seen, used to keep ids unique

    try:
        with DDGS() as ddgs:
            # Text search for general queries
            results = ddgs.text(
                query,
                region='us-en',
                safesearch='moderate',
                timelimit='d',  # 'd' mimics SerpAPI's 'qdr:d' (last 24 hours)
                max_results=num_results
            )

            for r in results:
                domain = r['href'].split('//')[-1].split('/')[0].replace('www.', '')
                if domain in blocked_domains:
                    continue

                # source_id = first domain label + current year (e.g. "en_2025");
                # a numeric suffix is added when two results share the same base id
                base_id = f"{domain.split('.')[0]}_{datetime.now().year}"
                id_counts[base_id] = id_counts.get(base_id, 0) + 1
                source_id = base_id if id_counts[base_id] == 1 else f"{base_id}_{id_counts[base_id]}"

                # Normalize to the pipeline's source-descriptor format
                final_results.append({
                    "source_id": source_id,
                    "title": r['title'],
                    "url": r['href'],
                    "published_date": "Recent (Last 24h)",
                    "retrieved_from": "duckduckgo"
                })

    except Exception as e:
        # Return an empty list: downstream steps index source["url"], so an
        # error dict here would raise a KeyError in fetch_and_parse_sources()
        print(f"DuckDuckGo search failed: {e}")
        return []

    return final_results

In [11]:
response_from_google_search=get_search_results("latest developments in generative AI")

In [12]:
response_from_google_search

[{'source_id': 'en_2025',
  'title': 'Generative artificial intelligence - Wikipedia',
  'url': 'https://en.wikipedia.org/wiki/Generative_artificial_intelligence',
  'published_date': 'Recent (Last 24h)',
  'retrieved_from': 'duckduckgo'},
 {'source_id': 'linkedin_2025',
  'title': 'New Innovations and Developments in Generative AI and LLMs - 2025',
  'url': 'https://www.linkedin.com/pulse/new-innovations-developments-generative-ai-llms-2025-vishal-goyal-au0vf',
  'published_date': 'Recent (Last 24h)',
  'retrieved_from': 'duckduckgo'},
 {'source_id': 'news_2025',
  'title': 'Google News - India outperforms world in generative AI use at work...',
  'url': 'https://news.google.com/stories/CAAqNggKIjBDQklTSGpvSmMzUnZjbmt0TXpZd1NoRUtEd2pzbXFDWkVCRVk4aUQ4MkdqZDJDZ0FQAQ?hl=en-IN&gl=IN&ceid=IN:en',
  'published_date': 'Recent (Last 24h)',
  'retrieved_from': 'duckduckgo'},
 {'source_id': 'ibm_2025',
  'title': 'What is GPT ( generative pre-trained transformer)? | IBM',
  'url': 'https://www.

Deterministic: Fetch + Parse

What “Complete Content” Actually Means
Included (Deterministically)

    Main article body

    Headings

    Paragraphs

    Lists (optional but useful)

    Tables (optional, often flattened)

Excluded (Deterministically)

    Navigation bars

    Footers

    Ads

    Cookie banners

    Sidebars

    Scripts / styles

    Related links sections (usually)

This is not subjective — it is rule-based boilerplate removal.
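
The following sketch makes the rule-based idea concrete without third-party dependencies, using the standard library's `html.parser`. Text inside an assumed set of boilerplate tags (`nav`, `footer`, `aside`, scripts, styles, and embeds) is dropped. All other text is kept in document order. It handles only tag-based boilerplate. Cookie banners and ad slots that are plain `div`s would need class- or id-based rules, which are not shown here.

```python
from html.parser import HTMLParser

# Assumed rule set: tags whose entire subtree is treated as boilerplate
EXCLUDED_TAGS = {"nav", "footer", "aside", "script", "style", "noscript", "iframe"}


class BoilerplateStripper(HTMLParser):
    """Collects text nodes that are not inside any excluded tag."""

    def __init__(self) -> None:
        super().__init__()
        self._open_excluded: list[str] = []  # stack of currently open excluded tags
        self.text_chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in EXCLUDED_TAGS:
            self._open_excluded.append(tag)

    def handle_endtag(self, tag):
        if self._open_excluded and self._open_excluded[-1] == tag:
            self._open_excluded.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and not self._open_excluded:
            self.text_chunks.append(text)


def strip_boilerplate(html: str) -> list[str]:
    parser = BoilerplateStripper()
    parser.feed(html)
    parser.close()
    return parser.text_chunks


page = (
    "<html><body><nav><a href='/'>Home</a></nav>"
    "<article><h1>Title</h1><p>Body text.</p></article>"
    "<footer>Copyright 2025</footer><script>var x = 1;</script></body></html>"
)
print(strip_boilerplate(page))  # ['Title', 'Body text.']
```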

In [13]:
import requests


def fetch_html(url: str) -> str | None:
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml"
    }

    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        return response.text
    except requests.RequestException:
        # Covers HTTP 4xx/5xx (raised by raise_for_status; HTTPError is a
        # RequestException subclass) as well as network, timeout and DNS errors
        return None


In [14]:
from bs4 import BeautifulSoup

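def clean_html(html: str) -> BeautifulSoup:
    """Parse HTML and drop tags that never hold main article content."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove non-content tags deterministically: scripts, styles and embeds,
    # plus the semantic containers for navigation bars, footers and sidebars
    for tag in soup(["script", "style", "noscript", "iframe", "nav", "footer", "aside"]):
        tag.decompose()

    return soup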


In [15]:
def extract_main_container(soup: BeautifulSoup):
    # Preferred tags in order
    candidates = [
        soup.find("article"),
        soup.find("main"),
        soup.find("div", {"id": "content"}),
        soup.find("div", {"class": "content"})
    ]

    for c in candidates:
        if c:
            return c

    # Fallback: <body>, or the whole document if the parser found no <body>
    return soup.body or soup


In [16]:
def segment_content(main_content) -> list[dict]:
    """Split the main container into ordered heading/paragraph segments with ids s1, s2, ..."""
    segments = []
    counter = 1
    for el in main_content.find_all(["h1", "h2", "h3", "h4", "h5", "h6", "p"], recursive=True):
        text = el.get_text(" ", strip=True)

        if not text:
            continue

        seg_type = "heading" if el.name.startswith("h") else "paragraph"

        segments.append({
            "segment_id": f"s{counter}",
            "type": seg_type,
            "text": text
        })

        counter += 1

    return segments


In [17]:
def fetch_and_parse_sources(sources: list[dict]) -> list[dict]:
    parsed_sources = []

    for source in sources:
        html = fetch_html(source["url"])
        if html is None:
            parsed_sources.append({
                "source_id": source["source_id"],
                "title": source["title"],
                "url": source["url"],
                "segments": [],
                "fetch_error": "fetch_failed"
            })
            continue
        clean_html_soup = clean_html(html)
        main_content = extract_main_container(clean_html_soup)
        # print(main_content)
        segments = segment_content(main_content)

        parsed_sources.append({
            "source_id": source["source_id"],
            "title": source["title"],
            "url": source["url"],
            "segments": segments
        })

    return parsed_sources


In [18]:
fetch_and_parse_sources(response_from_google_search)

[{'source_id': 'en_2025',
  'title': 'Generative artificial intelligence - Wikipedia',
  'url': 'https://en.wikipedia.org/wiki/Generative_artificial_intelligence',
  'segments': [{'segment_id': 's1',
    'type': 'heading',
    'text': 'Generative artificial intelligence'},
   {'segment_id': 's2',
    'type': 'paragraph',
    'text': 'Generative artificial intelligence ( Generative AI , or GenAI ) is a subfield of artificial intelligence that uses generative models to generate text, images , videos , audio , software code or other forms of data. These models learn the underlying patterns and structures of their training data and use them to produce new data [ 1 ] in response to input, which often comes in the form of natural language prompts . [ 2 ] [ 3 ]'},
   {'segment_id': 's3',
    'type': 'paragraph',
    'text': 'The prevalence of generative AI tools has increased significantly since the AI boom in the 2020s. This boom was made possible by improvements in deep neural networks , pa

Evidence Extraction (LLM)

In [19]:
def extract_text(response):
    if not response.candidates:
        return ""

    content = response.candidates[0].content
    if not content.parts:
        return ""

    return content.parts[0].text or ""


In [25]:
def evidence_extractor(query: str, evidence: list[dict]) -> dict:
    """
    Extracts verbatim evidence quotes from the already fetched and parsed source segments.
    Uses Gemini as an extraction tool; it does not fetch or parse pages itself.
    """

    Parser_System_Prompt='''
        You are an evidence extraction tool.

        Your task is to extract ONLY verbatim text spans from the provided source segments
        that directly answer the user's question.

        STRICT RULES (do not violate):

        1. You MUST copy text exactly as it appears in the source segments.
        2. You MUST reference the exact segment_id for every extracted quote.
        3. You MUST NOT paraphrase, summarize, or infer.
        4. You MUST NOT combine information from multiple segments into one quote.
        5. If the sources do NOT contain information that answers the question,
        return an empty evidence list.
        6. Do NOT use prior knowledge.
        7. Do NOT explain or add commentary.

        Output ONLY valid JSON matching the schema.
        Your response MUST be a JSON object with EXACTLY these top-level keys:
            - candidate_evidence
            - reasoning

        Do NOT include any other keys.
        '''

    user_payload = {
        "query": query,
        "evidence": evidence,
        # Must match the prompt and the orchestrator, which both use "candidate_evidence"
        "response_schema": {
            "candidate_evidence": [
                {
                    "source_id": "string",
                    "segment_id": "string",
                    "quote": "string"
                }
                ],
            "reasoning": "string"
        }
    }

    response = model.generate_content(
        [
            {
                "role": "user",
                "parts": [
                    Parser_System_Prompt,
                    json.dumps(user_payload)
                ]
            }
        ],
        generation_config={"temperature": 0.0, "response_mime_type": "application/json"}
    )

    raw_text = extract_text(response)
    print("RAW MODEL OUTPUT >>>")
    print(repr(raw_text))
    print("<<< END RAW OUTPUT")

    if not raw_text.strip():
        return {
            "candidate_evidence": [],
            "reasoning": "No extractable evidence found in provided sources."
        }

    return json.loads(raw_text)


STEP 2 — Evidence Selector Tool (Gemini)

Design Pattern Being Learned

LLM-as-Interpreter (Filter Pattern)

In [35]:
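def select_evidence(question: str, evidence: list[dict]) -> dict:
    """
    Filters evidence to only what is directly relevant to the question.
    Uses Gemini as an interpretation (selection) tool.
    """
    # Nothing to filter: skip the LLM call entirely
    if not evidence:
        return {"selected_evidence": [], "reasoning": "No candidate evidence to select from."}
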
    system_prompt = (
        "You are an evidence selector.\n"
        "Your task is to select ONLY the evidence entries that directly help "
        "answer the user question.\n\n"
        "Rules:\n"
        "- A quote is relevant ONLY if it explicitly answers the question.\n"
        "- Background, definitions, or tangential mentions MUST be excluded.\n"
        "- If none are relevant, return an empty list.\n"
        "- Respond ONLY in valid JSON matching the schema.\n"
        "- Do not include markdown, explanations, or text outside the JSON."
    )

    user_payload = {
        "question": question,
        "candidate_evidence": evidence,
        "response_schema": {
            "selected_evidence": [
                {
                    "source_id": "string",
                    "segment_id": "string",
                    "quote": "string"
                }
            ],
            "reasoning": "string"
        }
    }

    response = model.generate_content(
        [
            {
                "role": "user",
                "parts": [
                    system_prompt,
                    json.dumps(user_payload)
                ]
            }
        ],
        generation_config={"temperature": 0.0, "response_mime_type": "application/json"}
    )

    raw_text = extract_text(response)

    if not raw_text.strip():
        return {
            "selected_evidence": [],
            "reasoning": "Model returned no output; no evidence selected."
        }

    return json.loads(raw_text)
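
Because the selector is meant to act as a filter, its output should be a subset of its input. A deterministic post-check can enforce this. It is sketched here as the hypothetical `enforce_filter`, which drops any returned entry that is not an exact `(source_id, segment_id, quote)` match for a candidate. This catches quotes the model paraphrased or invented during selection.

```python
def enforce_filter(selected: list[dict], candidates: list[dict]) -> list[dict]:
    """Keep only selected entries that exactly match a candidate entry."""
    allowed = {(c["source_id"], c["segment_id"], c["quote"]) for c in candidates}
    # .get() tolerates entries with missing keys; those can never match and are dropped
    return [
        s for s in selected
        if (s.get("source_id"), s.get("segment_id"), s.get("quote")) in allowed
    ]
```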

3 Interpretation: Draft Answer

Convert raw facts into structured meaning.

Rules:

LLM-based

No external calls

Input tightly constrained

Output schema enforced
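
The "Output schema enforced" rule is easier to keep when the JSON returned by the LLM tools is checked deterministically before use. This is a minimal sketch that assumes the `{<list_key>: [...], "reasoning": ...}` shape requested by `evidence_extractor` and `select_evidence`. `validate_evidence_payload` is hypothetical. A check like this surfaces any mismatch between the key a prompt asks for and the key the orchestrator reads.

```python
REQUIRED_ITEM_KEYS = {"source_id", "segment_id", "quote"}


def validate_evidence_payload(payload: dict, list_key: str) -> list[str]:
    """Return a list of schema violations; an empty list means the payload is valid."""
    if not isinstance(payload, dict):
        return ["payload is not a JSON object"]

    errors = []
    expected_keys = {list_key, "reasoning"}
    if set(payload) != expected_keys:
        errors.append(f"top-level keys {sorted(payload)} != {sorted(expected_keys)}")

    items = payload.get(list_key)
    if not isinstance(items, list):
        errors.append(f"'{list_key}' is missing or not a list")
        return errors

    for i, item in enumerate(items):
        if not isinstance(item, dict) or set(item) != REQUIRED_ITEM_KEYS:
            errors.append(f"item {i} does not have exactly {sorted(REQUIRED_ITEM_KEYS)}")
        elif not all(isinstance(item[key], str) for key in REQUIRED_ITEM_KEYS):
            errors.append(f"item {i} has non-string values")
    return errors
```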

In [None]:
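def draft_answer(question: str, selected_evidence: list[dict]) -> str:
    # With no evidence there is nothing to ground an answer in, so skip the LLM call
    if not selected_evidence:
        return "The provided evidence is insufficient to answer this question."

    system_prompt = (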
        "You are an answer drafting tool.\n"
        "Your task is to write a concise answer to the user's question "
        "using ONLY the provided evidence quotes.\n\n"
        "STRICT RULES:\n"
        "- Use only information explicitly stated in the evidence.\n"
        "- Do NOT add, infer, or assume any information.\n"
        "- If the evidence is insufficient, say so clearly.\n"
        "- Do NOT mention sources or segment IDs.\n"
        "- Output plain text only."
    )

    user_payload = {
        "question": question,
        "selected_evidence": selected_evidence
    }
    response = model.generate_content(
        [
            {
                "role": "user",
                "parts": [
                    system_prompt,
                    json.dumps(user_payload)
                ]
            }
        ],
        generation_config={"temperature": 0.0}
    )

    # extract_text() returns "" for empty or blocked responses, whereas response.text raises
    return extract_text(response).strip()


Canonical Orchestration Flow

In [29]:
def answer_with_evidence(question: str) -> dict:
    # 1. Deterministic search
    sources = get_search_results(question)

    # 2. Deterministic fetch + parse + segmentation
    parsed_sources = fetch_and_parse_sources(sources)

    # 3. Evidence extraction (LLM)
    extracted_evidence = evidence_extractor(
        query=question,
        evidence=parsed_sources
    )

    # 4. Evidence selection (LLM)
    selected = select_evidence(
        question=question,
        evidence=extracted_evidence["candidate_evidence"]
    )

    # 5. Draft answer (LLM)
    answer_text = draft_answer(
        question=question,
        selected_evidence=selected["selected_evidence"]
    )

    return {
        "question": question,
        "answer": answer_text,
        "evidence": selected["selected_evidence"]
    }


In [36]:
answer_with_evidence("latest developments in generative AI")

RAW MODEL OUTPUT >>>
'{\n "candidate_evidence": [\n  {\n   "source_id": "linkedin_2025",\n   "segment_id": "s3",\n   "quote": "The Generative AI and Large Language Model landscape has undergone transformative changes in 2025, marked by breakthrough innovations in coding agents, revolutionary agentic patterns, and explosive enterprise adoption."\n  },\n  {\n   "source_id": "en_2025",\n   "segment_id": "s19",\n   "quote": "By mid 2025, despite continued consumer growth, many companies were increasingly abandoning generative AI pilot projects as they had difficulties with integration, data quality and unmet returns, leading analysts at Gartner and The Economist to characterize the period as entering the Gartner hype cycle \'s \\"trough of disillusionment\\" phase."\n  },\n  {\n   "source_id": "en_2025",\n   "segment_id": "s64",\n   "quote": "In January 2025, the United States Copyright Office (USCO) released extensive guidance regarding the use of AI tools in the creative process, and est

InvalidArgument: 400 Please use a valid role: user, model.