# Day 3 – Workshop 5: In-Context Learning & Advanced Techniques

**Objective:** Explore *few-shot prompting*, *chain-of-thought*, and a minimal *retrieval-augmented generation (RAG)* workflow. We’ll also **compare** standard prompts (no RAG) vs. RAG-based prompts to see the difference.

---
## Table of Contents
1. [Introduction & Recap](#Intro)
2. [Few-Shot & Chain-of-Thought Prompting](#FewShot)
3. [Retrieval-Augmented Generation (RAG) Intro](#RAG)
4. [Mini RAG Lab](#MiniRAG)
5. [Comparison: With RAG vs. Without RAG](#Compare)
6. [Optional: Gradio Web Interface](#Gradio)
7. [Discussion & Next Steps](#Discussion)

---

<a id="Intro"></a>

## 1. Introduction & Recap
Building on **Workshop 4**’s prompt engineering:
- We now **strengthen** LLM responses by providing **few-shot** examples or a **chain-of-thought** approach.
- We also introduce **RAG**: retrieve context from an external store to feed the LLM domain-specific knowledge.
- Finally, we’ll compare standard prompting vs. RAG-based prompting to highlight differences.

**Requirements**:
- A moderately sized model (e.g., `google/flan-t5-base`, used below, or GPT-Neo).
- `sentence-transformers` for RAG (or an equivalent for embeddings).  
- *(Optional)* `gradio` for a web UI.


<a id="FewShot"></a>

## 2. Few-Shot & Chain-of-Thought Prompting
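A **few-shot prompt** includes worked examples of the task before the real question. A **chain-of-thought** prompt explicitly asks the model to lay out its reasoning before giving an answer. We'll assume you have a model and tokenizer loaded from Workshop 4 or similar. If not, load one as shown below. FLAN-T5 is an encoder-decoder (sequence-to-sequence) model, so it needs `AutoModelForSeq2SeqLM`. `AutoModelForCausalLM` only works for decoder-only models such as GPT-Neo:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "google/flan-t5-base"  # or your choice; decoder-only models need AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
```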

```python
import torch
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
gen_model_name = "google/flan-t5-base"

# FLAN-T5 is a sequence-to-sequence model, so it uses the "text2text-generation" task.
gen_pipeline = pipeline("text2text-generation", model=gen_model_name, device=device)


def generate_llm_text(prompt, max_length=100, temperature=0.7):
    """Generate text for `prompt` with the pipeline above (sampling, so output varies run to run)."""
    sequences = gen_pipeline(prompt, max_length=max_length, do_sample=True, temperature=temperature)

    # The pipeline returns a list of dicts; return the first generation, or "" if there is none.
    if sequences:
        return sequences[0]["generated_text"]
    return ""


print("Model loaded successfully!")
```

Device set to use cuda:0


Model loaded successfully!


### Few-Shot Prompt Example
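We supply worked examples to guide the model. The prompt below contains a single Q/A pair, so strictly speaking it is *one-shot*; adding more pairs makes it few-shot.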

```python
few_shot_prompt = """
Q: How can companies reduce customer churn?
A: They can improve onboarding, offer loyalty benefits, and proactively address user feedback.

Q: How can manufacturers reduce production delays?
A:"""

response_fs = generate_llm_text(few_shot_prompt, max_length=120)
print("=== Few-Shot Prompt Output ===\n")
print(response_fs)
```

=== Few-Shot Prompt Output ===

The more products are completed, the longer they can keep the product or processes running smoothly.

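Hand-editing the prompt string gets awkward as examples accumulate. Here is a minimal sketch of a helper that formats any number of (question, answer) pairs in the same `Q:`/`A:` layout as the cell above. `build_few_shot_prompt` is a hypothetical name, and the second example pair is invented for illustration:

```python
def build_few_shot_prompt(examples, new_question):
    """Format (question, answer) pairs as Q:/A: blocks and leave the last answer open."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {new_question}\nA:")
    return "\n\n".join(blocks)


examples = [
    # Taken from the prompt in the cell above.
    ("How can companies reduce customer churn?",
     "They can improve onboarding, offer loyalty benefits, and proactively address user feedback."),
    # Hypothetical extra example, written for illustration only.
    ("How can retailers reduce stockouts?",
     "They can improve demand forecasting, set reorder points, and keep backup suppliers."),
]

few_shot_prompt_2 = build_few_shot_prompt(examples, "How can manufacturers reduce production delays?")
print(few_shot_prompt_2)
```

Pass `few_shot_prompt_2` to `generate_llm_text` as before. Keeping every example in one consistent format tends to make the pattern easier for the model to pick up.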

### Chain-of-Thought Prompt Example
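We explicitly ask the model to work through its reasoning step by step. Because the prompt gives no worked examples, only the instruction to think step by step, this is a *zero-shot* chain-of-thought prompt.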

```python
cot_prompt = """
Question: If I have 8 apples and I give away 3, how many do I have left?
Let’s think this through step by step:
"""
chain_output = generate_llm_text(cot_prompt, max_length=80)
print("=== Chain-of-Thought Output ===\n")
print(chain_output)
```

=== Chain-of-Thought Output ===

I have 8 + 3 = 10 apples left. The answer: 10.

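The output above is wrong on two counts: giving away 3 of 8 apples leaves 8 - 3 = 5, and the model's own sum is also off (8 + 3 is 11, not 10). Small models such as `flan-t5-base` often produce unreliable reasoning chains, and chain-of-thought prompting is generally reported to help larger models more. One common refinement combines both techniques from this section: show a worked example whose reasoning follows the pattern you want, then check the final answer programmatically. The sketch below is minimal. The pencil example is invented, and `FEW_SHOT_COT_PROMPT` and `extract_final_answer` are hypothetical names:

```python
import re

# A worked example with visible reasoning (numbers made up for illustration),
# followed by the apple question from the cell above.
FEW_SHOT_COT_PROMPT = """Question: If I have 10 pencils and I lose 4, how many do I have left?
Reasoning: I start with 10 pencils. Losing 4 means subtracting: 10 - 4 = 6.
Answer: 6

Question: If I have 8 apples and I give away 3, how many do I have left?
Reasoning:"""


def extract_final_answer(text):
    """Return the last integer following 'Answer:' or 'answer:', or None if there is none."""
    matches = re.findall(r"[Aa]nswer:\s*(-?\d+)", text)
    return int(matches[-1]) if matches else None


print(extract_final_answer("I have 8 + 3 = 10 apples left. The answer: 10."))  # → 10
print(extract_final_answer("I start with 8 apples. 8 - 3 = 5.\nAnswer: 5"))  # → 5
```

With the notebook's generator, `extract_final_answer(generate_llm_text(FEW_SHOT_COT_PROMPT, max_length=80)) == 8 - 3` turns the check into one line. Run it a few times, because sampling is enabled.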

<a id="RAG"></a>

## 3. Retrieval-Augmented Generation (RAG) Intro
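RAG combines **retrieval** with generation: relevant text is looked up in an external store and inserted into the prompt, so the model answers with **domain-specific** context it may not have seen during training.
We’ll build a tiny in-memory example, using `sentence-transformers` to compute the embeddings.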

```python
# !pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Example mini corpus
faqs = [
    {"question": "How do I reset my password?", "answer": "Click 'Forgot Password' on the login screen."},
    {"question": "How can I contact support?", "answer": "Email us at support@example.com."},
    {"question": "Where do I view my account details?", "answer": "Log in, then go to the 'My Account' page."},
    {"question": "What is the refund policy?", "answer": "You can request a refund within 30 days."},
    {"question": "How do I change my email address?", "answer": "Update your email from the 'Profile Settings' section."},
    {"question": "How can I track my order?", "answer": "Use the tracking link in your confirmation email or check 'Order History'."},
    {"question": "How do I update my billing information?", "answer": "Visit 'Billing & Payment' in your account settings to update your card details."},
    {"question": "How do I unsubscribe from marketing emails?", "answer": "Click the 'Unsubscribe' link at the bottom of any promotional email."},
    {"question": "Where can I find the user guide?", "answer": "You’ll find the guide in the 'Help Centre' on our website."},
    {"question": "Why can’t I log in?", "answer": "Ensure you’re using the correct credentials or reset your password if needed."},
    {"question": "Where do I find my order history?", "answer": "Go to your account dashboard and select 'Order History'."},
    {"question": "What payment methods are accepted?", "answer": "We accept major credit cards, PayPal, and direct bank transfers."},
    {"question": "How do I change my profile picture?", "answer": "Upload a new image under 'Profile Settings' in your account."},
    {"question": "Can I pause my subscription?", "answer": "Yes, visit the 'Subscription' page and select 'Pause Subscription'."},
    {"question": "How do I enable two-factor authentication (2FA)?", "answer": "Navigate to 'Security Settings' and follow the steps to enable 2FA."},
    {"question": "What is the cancellation process for the service?", "answer": "You can cancel anytime from the 'Account Settings' page."},
    {"question": "How do I restore a deleted item?", "answer": "Check the 'Recently Deleted' folder or contact support if it’s not there."},
    {"question": "Which browsers are supported?", "answer": "We support Chrome, Firefox, Safari, and the latest version of Edge."},
    {"question": "Is phone support available?", "answer": "Yes, our helpline number is listed on the 'Contact Us' page."},
    {"question": "What is the scheduled downtime for maintenance?", "answer": "We post maintenance schedules in advance on the 'System Status' page."},
    {"question": "How do I create an account?", "answer": "Click 'Sign Up' on the homepage and fill in the required information."},
    {"question": "Where can I read about the latest features?", "answer": "Visit our 'Blog' or check the 'Release Notes' in your account."},
    {"question": "How do I update my mailing address?", "answer": "Go to 'Account Settings', then click 'Edit' under 'Mailing Address'."},
    {"question": "What do I do if I encounter error code 403?", "answer": "Clear your cache, check your login status, and contact support if the error persists."}
]
embed_model = SentenceTransformer('all-mpnet-base-v2')
faq_embeddings = []
for i, item in enumerate(faqs):
    emb = embed_model.encode(item["question"], convert_to_tensor=True)
    faq_embeddings.append({
        "index": i,
        "question": item["question"],
        "answer": item["answer"],
        "embedding": emb
    })

print("Loaded FAQ embeddings.")
```

Loaded FAQ embeddings.

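The cell above embeds each FAQ question so it can be searched. The full retrieve → augment → generate loop can be sketched in pure Python. This sketch swaps in a crude bag-of-words vector for the learned embedding, purely for illustration: it only matches shared words, whereas `all-mpnet-base-v2` also matches paraphrases. `FAQS` copies three entries from the list above, and `retrieve`, `build_prompt`, and `rag` are hypothetical names:

```python
import math
import re
from collections import Counter

# Three entries copied from the FAQ list above.
FAQS = [
    {"question": "How do I reset my password?", "answer": "Click 'Forgot Password' on the login screen."},
    {"question": "How can I contact support?", "answer": "Email us at support@example.com."},
    {"question": "What is the refund policy?", "answer": "You can request a refund within 30 days."},
]


def bag_of_words(text):
    """Lower-cased word counts: a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, faqs, top_k=1):
    """Step 1 (retrieval): rank FAQ entries by how similar their questions are to the query."""
    q_vec = bag_of_words(query)
    ranked = sorted(faqs, key=lambda f: cosine(q_vec, bag_of_words(f["question"])), reverse=True)
    return ranked[:top_k]


def build_prompt(query, contexts):
    """Step 2 (augmentation): splice the retrieved entries into the prompt."""
    context_str = "\n".join(f"Q: {c['question']} A: {c['answer']}" for c in contexts)
    return f"Answer the user's question using the information below.\n{context_str}\n\nQuestion: {query}\nAnswer:"


def rag(query, generate):
    """Step 3 (generation): `generate` is any prompt -> text function."""
    return generate(build_prompt(query, retrieve(query, FAQS)))


print(rag("I forgot my password", lambda p: p))
```

Passing `lambda p: p` as the generator returns the assembled prompt itself, which is handy for checking what the model will see. Passing `generate_llm_text` instead sends the prompt to FLAN-T5.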

<a id="MiniRAG"></a>

## 4. Mini RAG Lab
Retrieve the best-matching FAQ entry and add it to the LLM prompt.

```python
def retrieve_faq_context(user_query, top_k=1):
    """Return the top_k FAQ entries whose questions are most similar to user_query."""
    query_emb = embed_model.encode(user_query, convert_to_tensor=True)
    scores = []
    for entry in faq_embeddings:
        sim = util.pytorch_cos_sim(query_emb, entry["embedding"]).item()
        scores.append((sim, entry))

    # Sort descending by similarity
    scores = sorted(scores, key=lambda x: x[0], reverse=True)

    # Return the top_k FAQ entries
    top_matches = scores[:top_k]
    return [m[1] for m in top_matches]


def debug_faq_retrieval(user_query):
    """Print every FAQ with its similarity score, highest first, to check retrieval."""
    query_emb = embed_model.encode(user_query, convert_to_tensor=True)
    all_scores = []

    for faq in faq_embeddings:
        sim = util.pytorch_cos_sim(query_emb, faq["embedding"]).item()
        all_scores.append((sim, faq["question"], faq["answer"]))

    # Sort by descending similarity
    all_scores = sorted(all_scores, key=lambda x: x[0], reverse=True)

    print(f"Query: {user_query}\n")
    for score, question, answer in all_scores:
        print(f"Similarity: {score:.4f} | Q: {question} | A: {answer}")


def rag_query(user_query):
    # 1) Retrieve the single best-matching FAQ entry
    top_ctx = retrieve_faq_context(user_query, top_k=1)[0]
    context_str = f"Q: {top_ctx['question']} A: {top_ctx['answer']}"

    # 2) Augment the prompt with it
    prompt = f"""
You are a helpful assistant.
Answer this User's question: {user_query}.
Here is some information to help you:
{context_str}



Answer:
"""
    # 3) Generate with a low temperature to stay close to the provided context
    return generate_llm_text(prompt, max_length=100, temperature=0.2)


# Example usage
test_query = "How do I enable two-factor authentication (2FA)?"
print("=== RAG Answer ===\n")
print(rag_query(test_query))

# Uncomment to print the similarity score of every FAQ for this query:
# debug_faq_retrieval(test_query)
```

=== RAG Answer ===

'Security Settings' is the section that contains the security settings for your device.

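`retrieve_faq_context` ranks the FAQs by the cosine similarity that `util.pytorch_cos_sim` computes between the query embedding $\mathbf{q}$ and each FAQ-question embedding $\mathbf{d}_i$:

$$
\operatorname{sim}(\mathbf{q}, \mathbf{d}_i) = \frac{\mathbf{q} \cdot \mathbf{d}_i}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d}_i \rVert} \in [-1, 1]
$$

A score of 1 means the two vectors point in the same direction. The function sorts all 24 scores in descending order and keeps the first `top_k`. `rag_query` uses only the single highest-scoring entry, so if that match is wrong, the prompt carries the wrong context. `debug_faq_retrieval` prints the full ranking for inspection.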

### Discussion
- Compare the output with or without RAG.
- In production, store a larger corpus in a **vector database** or similarity-search library such as Pinecone, Milvus, or FAISS instead of a Python list.

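The `faq_embeddings` list plus `retrieve_faq_context` works as a brute-force vector store, comparing the query against every entry. A vector database keeps the same add/search shape but indexes the vectors so search stays fast as the corpus grows. FAISS, for example, offers both exact (flat) and approximate indexes. The class below is a hypothetical, dependency-free sketch of that interface, not the API of any product named above. It normalizes each vector once at insert time, so every search reduces to dot products:

```python
import heapq
import math


class InMemoryVectorStore:
    """Hypothetical minimal store: (id, unit vector, payload) rows with brute-force search."""

    def __init__(self):
        self._rows = []

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("zero vectors have no direction")
        return [x / norm for x in vector]

    def add(self, item_id, vector, payload):
        # Store the unit vector so that a dot product equals cosine similarity.
        self._rows.append((item_id, self._normalize(vector), payload))

    def search(self, query, top_k=1):
        """Return up to top_k (score, id, payload) tuples, highest cosine similarity first."""
        unit_q = self._normalize(query)
        scored = (
            (sum(a * b for a, b in zip(unit_q, vec)), item_id, payload)
            for item_id, vec, payload in self._rows
        )
        # heapq.nlargest avoids sorting every row when top_k is small.
        return heapq.nlargest(top_k, scored, key=lambda row: row[0])
```

To fill it from this notebook, something like `store.add(i, embed_model.encode(item["question"]).tolist(), item["answer"])` for each FAQ would work, because `encode` returns a NumPy array by default.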
<a id="Compare"></a>

## 5. Comparison: With RAG vs. Without RAG
Let’s explicitly show how an **LLM responds** to the same user query in two scenarios:
1. **No context**: Just the user query.
2. **RAG**: The same user query, but augmented with the relevant FAQ.

### Activity
Try queries that might benefit from domain-specific knowledge. Compare the difference in output.

```python
user_query = "How do I enable two-factor authentication (2FA)?"

# 1) No context
no_context_answer = generate_llm_text(user_query)

# 2) RAG-based
rag_answer = rag_query(user_query)

print("=== No Context ===\n")
print(no_context_answer)
print("\n=== RAG-Based ===\n")
print(rag_answer)
```

=== No Context ===

Type the following into your web browser: f:config.flush:stdout.write:stdout.write

=== RAG-Based ===

Open the Settings menu and click on 'Security Settings'.


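*Note:* The RAG-based answer picks up 'Security Settings' from the retrieved FAQ, while the no-context answer is nonsensical. Grounding the prompt in retrieved content makes an accurate, domain-specific answer more likely but does not guarantee it: the model can still add details that are not in the context (here, the 'Settings menu').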

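To run the activity over several queries at once, a small harness can collect both answers side by side. This is a sketch under the assumption that both pipelines are plain `query -> text` functions. `compare_answers`, `print_comparison`, and the example queries are illustrative, not part of the original notebook:

```python
def compare_answers(queries, plain_fn, rag_fn):
    """Run every query through both pipelines; return one dict per query."""
    return [
        {"query": q, "no_context": plain_fn(q), "rag": rag_fn(q)}
        for q in queries
    ]


def print_comparison(rows):
    """Print the two answers for each query, one under the other."""
    for row in rows:
        print(f"Query:      {row['query']}")
        print(f"No context: {row['no_context']}")
        print(f"RAG:        {row['rag']}\n")


# Illustrative queries that touch topics covered by the FAQ list.
queries = [
    "What is your refund policy?",
    "Which browsers do you support?",
    "What should I do about error code 403?",
]
```

In the notebook, call `print_comparison(compare_answers(queries, generate_llm_text, rag_query))`. Because `generate_llm_text` samples (`do_sample=True`), calling `transformers.set_seed(0)` first should make reruns repeatable on the same hardware.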
<a id="Gradio"></a>

## 6. Optional: Gradio Web Interface
Try a web UI for **advanced** prompting or RAG queries. The callback below calls `rag_query`. To try a different approach, such as chain-of-thought, change what the callback calls.


```python
# !pip install gradio
import gradio as gr


def gradio_rag(query):
    # Swap rag_query for another function here to try a different approach.
    return rag_query(query)


demo = gr.Interface(
    fn=gradio_rag,
    inputs="text",
    outputs="text",
    title="Workshop 5: Advanced Techniques (RAG)",
    description="Enter a query; we'll retrieve an FAQ and generate a domain-specific response.",
)

# Launch the UI. In Colab, debug=True keeps the cell running (interrupt it to stop the server).
demo.launch(debug=True)
```

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().

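To choose the approach from the UI instead of editing `gradio_rag`, the callback can take a second "mode" input. The sketch below is minimal. `make_callback` and the mode labels are hypothetical, and the two generator functions are injected so the dispatch logic can be tested without a model:

```python
def make_callback(rag_fn, generate_fn):
    """Build a (query, mode) -> answer callback from injected query/prompt -> text functions."""

    def callback(query, mode):
        if mode == "RAG":
            return rag_fn(query)
        if mode == "Chain-of-thought":
            # A zero-shot cue like the one in the chain-of-thought cell of Section 2.
            prompt = f"Question: {query}\nLet's think this through step by step:\n"
            return generate_fn(prompt)
        if mode == "No context":
            return generate_fn(query)
        raise ValueError(f"Unknown mode: {mode!r}")

    return callback
```

One way to wire it up is `gr.Interface(fn=make_callback(rag_query, generate_llm_text), inputs=["text", gr.Radio(["RAG", "Chain-of-thought", "No context"], value="RAG")], outputs="text")`. Gradio passes the two input values to the callback positionally.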



<a id="Discussion"></a>

## 7. Discussion & Next Steps
In this workshop, you:
- Explored **few-shot** and **chain-of-thought** prompting in more depth.
- Saw how **RAG** can feed domain-specific context to an LLM.
- Compared **No-Context vs. RAG** side by side.
- Optionally integrated Gradio for an interactive interface.

**Next Steps**:
- Expand your RAG approach with a real vector DB.
- Try more examples for chain-of-thought.
- **Workshop 6**: apply these advanced techniques in a final **prototype**.

---
# End of Day 3 – Workshop 5 Notebook