#### Hands-On: Simple Contextual Q&A Chain

Let's build a chain that answers a question using only a short piece of context supplied in the prompt itself. This is a precursor to the Retrieval-Augmented Generation (RAG) pattern covered later, minus the external data loading.

- Task: Create an LCEL chain that:
    - Takes two inputs: `context` (a paragraph of text) and `question` (a question about the context).
    - Uses a `ChatPromptTemplate` to instruct the LLM to answer the question using only the provided context. If the answer isn't in the context, it should say so.

##### Think about:
- How will you define the ChatPromptTemplate to accept both context and question?
- What instructions should you give the LLM in the prompt (e.g., a system message)?
- How will you structure the LCEL chain using the `|` operator?
- What will the input dictionary look like when you `.invoke()` the chain?

> #### context_text
> The Water Cycle, also known as the hydrologic cycle, describes the continuous movement of water on, above, and below the surface of the Earth.
> Water evaporates from the surface (like oceans, lakes), rises into the atmosphere, cools and condenses into clouds, and then falls back to the surface as precipitation (rain, snow).
> Some precipitation flows over the surface as runoff, eventually returning to rivers and oceans, while other water infiltrates the ground.


> - question1 = "What is another name for the Water Cycle?"
> - question2 = "Where does water evaporate from?"
> - question3 = "What happens to water after it infiltrates the ground?" (the answer is less direct)
> - question4 = "What is the chemical formula for water?" (the answer is not in the context)

In [None]:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser


load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

In [None]:
#1 LLM
llm = ChatOpenAI(
    openai_api_key=api_key,
    temperature=0,
    model="gpt-3.5-turbo",
    max_tokens=1000,
)

#2 Prompt Template
# Needs 'context' and 'question' variables
qa_system_prompt = """You are a strict assistant for question-answering tasks.
You must answer using only the information provided in the context.
You are not allowed to use any external knowledge or make assumptions.
If the answer is not present in the context, you must say "I don't know".
Use three sentences at most and keep the answer concise."""

qa_human_prompt = """Question: {question}
Context: {context}
Answer:"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        ("human", qa_human_prompt),
    ]
)

#3 Output Parser
output_parser = StrOutputParser()

#4 LCEL Chain
qa_chain = prompt | llm | output_parser

# --- Test the Chain ---

context_text = """
The Water Cycle, also known as the hydrologic cycle, describes the continuous movement of water on, above, and below the surface of the Earth.
Water evaporates from the surface (like oceans, lakes), rises into the atmosphere, cools and condenses into clouds, and then falls back to the surface as precipitation (rain, snow).
Some precipitation flows over the surface as runoff, eventually returning to rivers and oceans, while other water infiltrates the ground.
"""

questions = [
    "What is another name for the Water Cycle?",
    "Where does water evaporate from according to the text?",
    "What happens to water after it infiltrates the ground, based on this context?",
    "What is the chemical formula for water?"
]

In [None]:
# Testing the chain with different questions
for question in questions:
    print(f"Question: {question}")
    try:
        # Prepare input dictionary for chain
        input_dict = {
            "context": context_text,
            "question": question
        }
        answer = qa_chain.invoke(input_dict)
        print(f"Answer: {answer}\n")
    except Exception as e:
        print(f"Error: {e}\n")

### Method 1: Adding a validation layer to check that the answer comes from the context

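This is a naive word-overlap check, and not a very effective one: any answer word missing from the context is flagged (so faithful paraphrases fail unless their words are whitelisted), while a wrong answer assembled entirely from context words passes.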
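Both failure modes are easy to reproduce. This sketch uses a hypothetical, simplified `all_words_in_context` helper (no allow-list) and made-up answers to show a wrong answer that passes and a faithful paraphrase that fails:

```python
import re

def words(text: str) -> set[str]:
    # Lowercase word tokens, ignoring punctuation
    return set(re.findall(r"[a-z']+", text.lower()))

def all_words_in_context(context: str, answer: str) -> bool:
    # Hypothetical simplified check: every answer word must occur in the context
    return words(answer) <= words(context)

context = "Water evaporates from oceans and lakes, then falls back as rain."

wrong_answer = "Rain evaporates from water."       # wrong, but uses only context words
paraphrase = "It evaporates from seas and lakes."  # faithful, but "it" and "seas" are new words

print(all_words_in_context(context, wrong_answer))  # True
print(all_words_in_context(context, paraphrase))    # False
```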

In [None]:
import re

def validate_answer(context: str, answer: str) -> bool:
    """
    Naive lexical check of the answer against the context.
    Returns True if every answer word appears in the context or in an
    allow-list of common words, False otherwise.
    """
    def tokenize(text: str) -> set[str]:
        # Lowercase word tokens; punctuation is dropped, so "lakes." matches "lakes"
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    context_words = tokenize(context)
    answer_words = tokenize(answer)
    extra_words = answer_words - context_words

    # Allow common filler words (stop words, quantifiers, frequency adverbs)
    allowed_words = {
        "i", "don't", "know", "the", "is", "a", "an", "and", "of", "to", "in",
        "on", "for", "with", "as", "by", "at", "yes", "no", "not", "this",
        "that", "it", "its", "there", "where", "who", "which", "what", "when",
        "how", "why", "all", "any", "some", "many", "much", "more", "most",
        "less", "least", "few", "fewer", "none", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine", "ten", "first", "second",
        "third", "last", "next", "previous", "current", "future", "past",
        "present", "always", "never", "sometimes", "often", "rarely", "seldom",
        "usually", "generally", "occasionally", "frequently", "regularly",
        "periodically", "repeatedly", "constantly", "continuously",
        "infrequently", "sporadically", "intermittently", "another", "name",
        "such", "may", "can", "or", "be", "make", "after", "through", "way",
    }
    # Topic-specific words tailored to this example's answers. Whitelisting them
    # can hide genuine hallucinations (the context never mentions aquifers or plants),
    # which is one reason this method is brittle.
    allowed_words |= {
        "aquifers", "contribute", "pathway", "pathways", "soil", "absorbed",
        "replenish", "sources", "underground", "groundwater", "plants",
    }

    suspicious_words = extra_words - allowed_words
    print(f"Suspicious words: {suspicious_words}")
    return bool(answer_words) and not suspicious_words
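Instead of growing a hand-tuned allow-list, one alternative (swapped in here, not the method above) scores the fraction of the answer's content words that also occur in the context and flags answers below a threshold. A sketch under assumptions: `STOP_WORDS`, the helper names, and the 0.8 threshold are arbitrary illustrative choices:

```python
import re

# Illustrative stop words; a real project might use a curated list instead
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "and", "or",
              "it", "its", "as", "by", "for", "with", "from", "this", "that"}

def content_words(text: str) -> set[str]:
    # Lowercase tokens (letters, digits, apostrophes) minus stop words
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS}

def grounding_score(context: str, answer: str) -> float:
    """Fraction of the answer's content words that also occur in the context."""
    answer_words = content_words(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & content_words(context)) / len(answer_words)

def is_grounded(context: str, answer: str, threshold: float = 0.8) -> bool:
    # "I don't know" is the refusal the QA prompt asks for, so accept it as grounded
    if "i don't know" in answer.lower():
        return True
    return grounding_score(context, answer) >= threshold

demo_context = ("The Water Cycle, also known as the hydrologic cycle, "
                "describes the continuous movement of water.")
answer_ok = "It is also known as the hydrologic cycle."
answer_bad = "Water is H2O, made of hydrogen and oxygen."

print(grounding_score(demo_context, answer_ok))   # 1.0
print(grounding_score(demo_context, answer_bad))  # 0.2
```

This replaces the whitelist with a tunable threshold, but it shares the same blind spot: a wrong answer built from context words still scores highly.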

In [None]:
for question in questions:
    print(f"Question: {question}")
    input_dict = {"context": context_text, "question": question}
    answer = qa_chain.invoke(input_dict)
    is_valid = validate_answer(context_text, answer)
    
    print(f"Answer: {answer}")
    print("✅ Valid Answer\n" if is_valid else "❌ Possibly Hallucinated Answer\n")


### Method 2: Adding a validation layer to check that the answer comes from the context
#### Chain with Contextual Cross-Check using another LLM call

In [None]:
# Validator LLM (use same or different model)
validator_llm = ChatOpenAI(
    openai_api_key=api_key,
    temperature=0,
    model="gpt-3.5-turbo",
    max_tokens=1000,
)
# Validator Prompt Template
validator_system_prompt = """You are a strict evaluator.
Read the following context, a question, and an answer to that question.
Decide whether the answer can be fully supported by ONLY the context provided.
Answer with "VALID" if the answer is fully grounded in the context.
Answer with "INVALID" if any part of it comes from outside knowledge.
If the answer is "I don't know", answer "VALID" only if the context really does not contain the answer.

Context: {context}

Question: {question}

Answer: {answer}

Is this an answer that can be fully supported by the context?
Answer with "VALID" or "INVALID" only.
"""
validator_prompt = PromptTemplate.from_template(validator_system_prompt)

validator_chain = validator_prompt | validator_llm | output_parser

def qa_with_validation(context: str, question: str) -> dict:
    """Answer the question from the context, then ask a second LLM to check the answer."""
    input_dict = {"context": context, "question": question}
    answer = qa_chain.invoke(input_dict)

    # Normalize the verdict so replies like "valid." or " VALID\n" compare equal to "VALID"
    validation_result = validator_chain.invoke({
        "context": context,
        "question": question,
        "answer": answer
    }).strip().rstrip(".").upper()

    return {
        "question": question,
        "answer": answer,
        "validation": validation_result
    }
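The validator replies in free text, and models sometimes add punctuation, markdown, or extra words. Note also that a substring test such as `"VALID" in reply` would match `"INVALID"` as well. A small, hypothetical `parse_verdict` helper that normalizes the reply and treats anything unrecognized as a failure:

```python
def parse_verdict(raw: str) -> str:
    """Normalize a validator reply to "VALID", "INVALID", or "UNKNOWN"."""
    stripped = raw.strip()
    if not stripped:
        return "UNKNOWN"
    # Keep only the first word, uppercased, without surrounding punctuation or markdown
    first = stripped.split()[0].strip(".,:;!\"'*").upper()
    return first if first in {"VALID", "INVALID"} else "UNKNOWN"

for reply in ["VALID", "valid.", "INVALID", "**INVALID**", "Probably valid", ""]:
    print(repr(reply), "->", parse_verdict(reply))
```

Treating `UNKNOWN` like `INVALID` keeps the check conservative: an ambiguous verdict never marks an answer as grounded.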

In [None]:

# Example use inside your loop:
for question in questions:
    result = qa_with_validation(context_text, question)
    print(f"Q: {result['question']}")
    print(f"A: {result['answer']}")
    print("✅ VALID\n" if result["validation"] == "VALID" else "❌ INVALID\n")