# Retrieval-Augmented Generation (RAG), Foundation Model Selection, and Temperature Control

This notebook covers three major interview topics related to Large Language Models (LLMs):

1. **Retrieval-Augmented Generation (RAG)**
2. **Choosing the Right Foundation Model**
3. **Impact of the Temperature Parameter (with STAR example)**


## Retrieval-Augmented Generation (RAG)
**Concept:**
Retrieval-Augmented Generation (RAG) combines *retrieval* (finding relevant documents) with *generation* (LLM text creation).  
It helps LLMs stay accurate, current, and domain-aware without full retraining.

**Why RAG is Needed:**
- Reduces **hallucination** by grounding answers in retrieved source text.
- Keeps knowledge **up-to-date** by re-indexing documents instead of retraining the model.
- Enables **domain adaptation** for specialized fields (medical, legal, etc.).
- Improves **efficiency**: the knowledge lives in the index, so a smaller model can often do the job.


In [10]:
import random

def mock_llm_response(prompt, temperature=0.3):
    """Simulate an LLM: a fixed answer at low temperature, a random variation otherwise.

    The prompt is accepted to mirror a real LLM interface, but this mock ignores it.
    """
    deterministic_answer = "Air pollution leads to respiratory issues and heart disease."
    creative_variations = [
        "Air pollution silently steals our breath and burdens the heart.",
        "Dirty air harms lungs, hearts, and our shared environment.",
        "Polluted skies mean unhealthy lungs and heavy hearts."
    ]
    # Below the 0.4 threshold the mock always returns the same answer.
    if temperature < 0.4:
        return deterministic_answer
    return random.choice(creative_variations)

knowledge_base = {
    "air pollution": "Air pollution causes respiratory issues and heart disease.",
    "climate change": "Climate change increases the frequency of severe weather events.",
}

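def retrieve_docs(query):
    """Return the first knowledge-base entry whose key appears in the query."""
    query_lower = query.lower()
    for key, text in knowledge_base.items():
        if key in query_lower:
            return text
    return "No relevant documents found."

def rag_generate(query, temperature=0.3):
    """Retrieve context, build the prompt, and pass it to the mock LLM."""
    context = retrieve_docs(query)
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
    return mock_llm_response(prompt, temperature)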

query = "What are the health impacts of air pollution?"
print(rag_generate(query))

Air pollution leads to respiratory issues and heart disease.
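
The keyword lookup above returns only the first matching entry. Real retrievers usually score every document against the query and keep the top few. The following is a minimal sketch of that idea using bag-of-words cosine similarity in place of learned embeddings. The names `DOCS`, `STOPWORDS`, `retrieve_top_k`, and `build_prompt` are hypothetical, and the third document is added purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus: the first two entries mirror knowledge_base,
# the third is invented here so that ranking has something to do.
DOCS = [
    "Air pollution causes respiratory issues and heart disease.",
    "Climate change increases the frequency of severe weather events.",
    "Regular exercise improves heart health.",
]

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"what", "are", "the", "of", "and", "a", "is"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_top_k(query, docs=DOCS, k=2):
    """Return up to k documents with non-zero similarity, best match first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in docs]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]

def build_prompt(query, k=2):
    """Assemble a RAG prompt from the retrieved passages."""
    passages = retrieve_top_k(query, k=k)
    context = "\n".join(f"- {p}" for p in passages) or "(no relevant documents)"
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What are the health impacts of air pollution?"))
```

In a production pipeline the `Counter` vectors would typically be replaced by embedding vectors and the list scan by a vector index, but the score-then-rank structure stays the same.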


## Choosing the Right Foundation Model
Selecting the right foundation model depends on:
1. **Task Type** – generation, summarization, reasoning, etc.  
2. **Domain Needs** – medical, legal, coding, creative, etc.  
3. **Performance Metrics** – accuracy, latency, token cost, etc.  
4. **Fine-tuning or RAG options** – can it be customized?

| Application | Recommended Model | Reason |
|--------------|-------------------|--------|
| Summarization | GPT-4, Claude, or Llama 3 | Strong in natural text compression and contextual understanding |
| Code Generation | Codex, CodeLlama, StarCoder | Trained specifically on large-scale code corpora |
| Multimodal (text + image) | Gemini, GPT-4V | Integrates visual and textual reasoning |
| Domain-Specific (e.g., Legal, Medical) | Fine-tuned Llama or Mistral | Customizable for specialized knowledge |

In [11]:
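def select_model(task):
    """Map a task label to a model name (names are illustrative examples)."""
    if task == "summarization":
        return "gpt-4o-mini"
    elif task == "code":
        return "codellama"
    elif task == "multimodal":
        return "gpt-4o"
    else:
        # Fallback placeholder for a domain-specific, fine-tuned deployment.
        return "llama3-finetuned"

print("Recommended model:", select_model("summarization"))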

Recommended model: gpt-4o-mini
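
The lookup above considers only the task type. A sketch that also applies the performance criteria listed earlier (latency and token cost) might filter candidates by hard constraints and then pick the highest-quality survivor. Everything in the catalog below is a placeholder: the model names, latencies, costs, and quality scores are invented for illustration and are not measurements of any real model.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    tasks: set                 # task labels the model is suited for
    latency_ms: float          # typical response latency (placeholder value)
    cost_per_1k_tokens: float  # placeholder cost in arbitrary units
    quality: float             # placeholder quality score in [0, 1]

# Hypothetical catalog: all names and numbers are made up for this sketch.
CATALOG = [
    Candidate("model-small", {"summarization"}, 200, 0.2, 0.70),
    Candidate("model-large", {"summarization", "code"}, 900, 3.0, 0.90),
    Candidate("model-code", {"code"}, 400, 1.0, 0.85),
]

def select_candidate(task, max_latency_ms, max_cost, catalog=CATALOG):
    """Return the best-quality model that supports the task within budget, or None."""
    eligible = [
        c for c in catalog
        if task in c.tasks
        and c.latency_ms <= max_latency_ms
        and c.cost_per_1k_tokens <= max_cost
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.quality)

tight = select_candidate("summarization", max_latency_ms=500, max_cost=1.0)
loose = select_candidate("summarization", max_latency_ms=1000, max_cost=5.0)
print(tight.name, loose.name)
```

Treating latency and cost as hard limits and quality as the tie-breaker is one possible design; a weighted score across all criteria is an equally reasonable alternative.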


## Impact of the Temperature Parameter
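**Temperature** controls randomness in model responses by rescaling the token probabilities before sampling:
- **Low temperature (0.1–0.3):** Focused and repeatable; the model almost always picks the most likely token. This improves consistency but does not by itself guarantee factual accuracy.
- **High temperature (0.7–1.0):** Creative, diverse, and exploratory, at the cost of less predictable output.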

Let's demonstrate this in code.

In [12]:
import random

prompt = "Write a short tagline for a disaster response aircraft."

def mock_temperature_response(prompt, temperature):
    low_temp_response = "Reliable rescue, ready for any storm."
    creative_variations = [
        "Wings of hope through every storm.",
        "Braving chaos to bring calm.",
        "When disaster strikes, we rise."
    ]
    if temperature < 0.4:
        return low_temp_response
    else:
        return random.choice(creative_variations)

for temp in [0.2, 0.6, 0.9]:
    print(f"\n--- Temperature {temp} ---")
    print(mock_temperature_response(prompt, temp))


--- Temperature 0.2 ---
Reliable rescue, ready for any storm.

--- Temperature 0.6 ---
Wings of hope through every storm.

--- Temperature 0.9 ---
Wings of hope through every storm.
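
The mock above hard-codes a switch at 0.4. In an actual model, temperature divides the logits before the softmax, so the change is gradual rather than a threshold. The sketch below shows that mechanism on a toy three-token vocabulary. The tokens and logit values are made up, and `softmax_with_temperature` and `sample_token` are hypothetical helper names.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, temperature, rng=random):
    """Draw one token according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Toy vocabulary and logits, invented for illustration.
tokens = ["storm", "hope", "calm"]
logits = [2.0, 1.0, 0.5]

for t in (0.2, 0.6, 0.9):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}:", [round(p, 3) for p in probs])
```

As the temperature falls, probability mass concentrates on the highest-logit token, which is why low settings give near-identical outputs across runs; as it rises, the distribution flattens and less likely tokens are sampled more often.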


### STAR Example (Situation–Task–Action–Result)
**Situation:**  
During a humanitarian operations project, LLM-generated mission summaries were inconsistent in tone and clarity.  

**Task:**  
We needed to produce reliable, factual mission reports.  

**Action:**  
Lowered temperature from 0.8 → 0.2 in the generation pipeline to reduce randomness.  

**Result:**  
Summaries became consistent, accurate, and professional, improving reporting efficiency by 35%.

## Summary
- **RAG**: Combines retrieval + generation for factual grounding.  
- **Model Choice**: Depends on task, domain, performance metrics (accuracy, latency, token cost), and customization options.
- **Temperature**: Trades off creativity against consistency.