<a href="https://colab.research.google.com/github/KinzaaSheikh/buildables_artificial_intelligence/blob/main/buildables_project_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part 1

## Give a recap of notebook 1

👉 Notebook 1 focused on theory: what LLMs are, agentic AI, and other background information. It was mostly a way to gauge our current understanding of AI and how much ground we still need to cover.

The coding part consisted of adding API keys to the Colab environment and testing that they worked.


## Give a recap of notebook 2

👉 In notebook 2 we got our hands dirty with NLP tasks and task preprocessing, learned a little about AI workflows, and got a glimpse of how chatbots can be created.

## Give a recap of notebook 3

👉 This week we were allowed to play around with prompting on our own.

# 📚 Chatbot Development Project


# 1) Use case — **Arabic grammar & vocabulary tutor for elder stroke survivors**

**Why I chose it (short):**
Older stroke survivors often face language impairment and need gentle, repetitive, scaffolded practice to (re)learn grammar and vocabulary. A patient chatbot can give short lessons, graded exercises, immediate corrective feedback, and encouragement at any time, which helps preserve a sense of purpose and provides cognitively appropriate practice.

**Target users & expected benefits**

* Primary: older adult Muslim stroke survivors who can read Arabic but need grammar/vocab practice (and caregivers supporting them).
* Benefits: short, repeatable lessons; adjustable difficulty; immediate corrective feedback; consistent encouragement; accessible pace (short messages, transliteration), 24/7 availability, low-cost scaling for clinics/centers.

**Limitations (what it cannot do)**

* Not a medical or speech-language pathology replacement (no clinical diagnosis or therapy plans).
* Will sometimes hallucinate details (dates/factual claims) — must not be used for medical/legal advice.
* Not a substitute for a trained SLP for severe aphasia; escalate to professionals when needed.
* Offline use requires local deployment or on-prem inference; by default this design uses Groq cloud inference.

---

# 2) Why worth solving with a chatbot — justification

* **Accessibility & repetition:** A chatbot tutor can offer short, repeatable drills tailored to cognitive load; conventional websites and apps are often less conversational and less forgiving in pacing.
* **Personalization:** Chatbots can adapt level and style instantly (e.g., simpler sentences, transliteration, larger fonts).
* **Low-cost availability:** A single model instance can serve many learners; human tutors are limited/expensive.
* **Immediate feedback loop:** Users get corrections and examples right away — crucial for retention.

**Specific tasks the chatbot will perform**

* Explain one grammar point in a single short paragraph (and provide simple Arabic example + transliteration).
* Provide 1–2 short practice exercises and a model answer.
* Give short, constructive feedback on user answers.
* Track progress per session (short-term) and optionally across sessions (persistent profile).
* Offer to repeat/slow down or change difficulty.
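
The progress-tracking and difficulty tasks above could be backed by a small per-session record. The sketch below is one possible shape, not part of the original design: the level names, the `SessionProgress` class, and the thresholds (three correct answers in a row to move up, two wrong in a row to move down) are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical difficulty labels; the design above only says "adjustable difficulty".
LEVELS = ["easy", "medium", "hard"]


@dataclass
class SessionProgress:
    """Short-term (per-session) tally of exercise attempts."""

    level: str = "easy"
    attempts: int = 0
    correct: int = 0
    streak: int = 0  # > 0: consecutive correct answers, < 0: consecutive wrong answers

    def record(self, is_correct: bool) -> None:
        """Record one graded answer and adjust the difficulty level if needed."""
        self.attempts += 1
        if is_correct:
            self.correct += 1
            self.streak = self.streak + 1 if self.streak > 0 else 1
        else:
            self.streak = self.streak - 1 if self.streak < 0 else -1
        self._adjust_level()

    def _adjust_level(self) -> None:
        idx = LEVELS.index(self.level)
        if self.streak >= 3 and idx < len(LEVELS) - 1:
            # Assumed rule: three correct in a row moves one level up.
            self.level = LEVELS[idx + 1]
            self.streak = 0
        elif self.streak <= -2 and idx > 0:
            # Assumed rule: two wrong in a row moves one level down.
            self.level = LEVELS[idx - 1]
            self.streak = 0
```

The current `level` could then be passed into the prompt (for example, "Use easy vocabulary only") so the model adapts its exercises.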

---

# 3) Select the LLM (Groq) — research-based pick + reasons

**Which models Groq supports (high-level):** Groq hosts Meta Llama 3 variants (8B and 70B), Mixtral (mixtral-8x7b), Gemma (7B), and others. ([docs.typingmind.com][1])

**Important model facts used in selection**

* Groq production Llama 3 models include `llama-3.1-8b-instant` (very low latency, large context). ([GroqCloud][2])
* Groq emphasizes ultra-low inference latency (deterministic LPU) — good for responsive chat. ([Groq][3])
* Groq provides a Python SDK with OpenAI-compatible chat completions. Quickstart examples exist. ([GroqCloud][4])

**Final model decision & rationale**

* **Primary model:** `llama-3.1-8b-instant` — chosen for **speed, cost, and adequate instruction-following** for tutoring tasks. It supports very large context windows (useful for session transcripts and lesson content) and is a production model on Groq. ([GroqCloud][2])
* **Fallback/high-quality option (optional):** `llama-3.3-70b-versatile` — use when you want the highest possible accuracy for complex, detailed explanations (tradeoff: higher cost). ([GroqCloud][2])
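
One way to switch between the two models is a small routing function. This is only a sketch: the `pick_model` helper, the `wants_detail` flag, and the trigger phrases are hypothetical, and the model IDs are the ones named above.

```python
PRIMARY_MODEL = "llama-3.1-8b-instant"       # fast default
DETAILED_MODEL = "llama-3.3-70b-versatile"   # optional higher-quality model

# Hypothetical trigger phrases; tune these against real transcripts.
DETAIL_HINTS = ("in detail", "more detail", "explain more")


def pick_model(user_message: str, wants_detail: bool = False) -> str:
    """Return the model ID to use for this request."""
    text = user_message.lower()
    if wants_detail or any(hint in text for hint in DETAIL_HINTS):
        return DETAILED_MODEL
    return PRIMARY_MODEL
```

Because most tutoring turns are short drills, the default path stays on the faster model and the larger one is used only on request.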

**Why not Gemma or Mixtral by default?**
Mixtral and Gemma are capable models, but the Llama 3 instruct models tend to perform better for careful language explanations and alignment; for an empathetic tutor persona with multilingual needs, Llama 3 is a safe, well-documented choice on Groq. ([docs.litellm.ai][5])

---

# 4) Conversation flow (design + examples)

**High-level policies**

* Keep replies short (2–4 sentences) unless the user asks for more.
* Use **English + Arabic script + simple transliteration** for examples.
* Always give *one* short example, *one* practice exercise, and a short rubric for feedback.
* Encourage and offer to repeat or slow down.

**Session memory**

* Short-term (in-memory): last N turns (e.g., last 6 user-assistant exchanges) to maintain context in a session.
* Optional persistent memory: user preferences (preferred difficulty, transliteration on/off) stored in DB (Redis/Postgres) — only if user opts in.
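
The short-term memory policy above can be sketched as a sliding window over the message list. The helper name `recent_window` is an assumption; note that "6 exchanges" means 12 messages, since each exchange is one user message plus one assistant message.

```python
from typing import Dict, List

Message = Dict[str, str]  # {"role": "user" | "assistant", "content": "..."}


def recent_window(history: List[Message], max_exchanges: int = 6) -> List[Message]:
    """Return at most the last `max_exchanges` user/assistant pairs."""
    if max_exchanges <= 0:
        return []
    window = history[-2 * max_exchanges:]
    # If the cut landed mid-exchange, drop the orphaned assistant reply
    # so the window always starts with a user message.
    if window and window[0]["role"] != "user":
        window = window[1:]
    return window
```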

**Greeting and goodbye**

* Greeting: “Salam — I’m Noor, your Arabic tutor. What would you like to practice for 5–10 minutes today?” (Offer options: grammar/vocab/quiz.)
* Goodbye: “Great work today — would you like to save this short lesson? See you next time!”

**Unknown questions / fallback**

* If the assistant is unsure: short fallback — “I’m not sure about that. Would you like a simple example instead, or should I ask a specialist?”
* If user requests medical or clinical advice: refuse and recommend a qualified professional.
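
The medical-refusal rule could be enforced before the model is called at all. The sketch below uses a crude keyword check; the keyword list, the `route_request` helper, and the refusal wording are illustrative assumptions, and a keyword filter will produce false positives (for example, a learner asking how to say "medicine" in Arabic).

```python
import re

# Hypothetical keyword list (English plus two Arabic terms: "medicine", "dose").
_MEDICAL_RE = re.compile(
    r"\b(medication|medicine|dosage|dose|diagnos\w*|symptom\w*|prescri\w*|blood pressure)\b"
    r"|دواء|جرعة",
    re.IGNORECASE,
)

MEDICAL_REFUSAL = (
    "I can't help with medical questions. Please ask your doctor or therapist. "
    "Would you like to practice Arabic instead?"
)


def route_request(user_message: str) -> str:
    """Return 'refuse_medical' for likely medical questions, else 'tutor'."""
    if _MEDICAL_RE.search(user_message):
        return "refuse_medical"
    return "tutor"
```

A caller would return `MEDICAL_REFUSAL` directly when the route is `"refuse_medical"` and only otherwise send the message to the model.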

**Sample user inputs & expected assistant outputs**

1. User: “Explain the Arabic dual (المثنى) briefly.”
   Assistant: short rule in English, Arabic example + transliteration, 1 quick exercise (convert singular to dual), and a simple model answer.

2. User: “I wrote: كتابان — is that correct for two books?”
   Assistant: Yes/No correction, highlight pattern (ending with ـان in nominative), one short corrective hint.

3. User: “I want an easy quiz.”
   Assistant: three one-word prompts; user types answers; assistant grades and gives short encouragement.
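
For one-word quiz answers like those in examples 2 and 3, grading does not have to go through the model. The sketch below compares a typed answer with the model answer after removing short-vowel marks and tatweel, so that a vocalized and an unvocalized spelling count as the same word. The helper names are assumptions, and stripping every combining mark is a simplification that also removes shadda.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character


def normalize_arabic(text: str) -> str:
    """Strip whitespace, tatweel, and combining marks (harakat) from Arabic text."""
    text = unicodedata.normalize("NFC", text.strip())
    return "".join(
        ch for ch in text
        if ch != TATWEEL and unicodedata.category(ch) != "Mn"
    )


def is_correct(user_answer: str, model_answer: str) -> bool:
    """Compare two answers while ignoring vocalization differences."""
    return normalize_arabic(user_answer) == normalize_arabic(model_answer)
```

NFC is used rather than NFD so that precomposed letters such as أ keep their hamza instead of losing it along with the vowel marks.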

---

# 5) Prompting technique — chosen method & **actual template**

**Technique(s) used**

* **System prompt + few-shot examples (few-shot prompting)** — gives consistent persona and output format.
* Use **structured output** (JSON-like or clearly delimited sections) so the UI can parse rule/example/exercise.
* Keep temperature low (0.2–0.4) for correctness and low hallucination.

**Why few-shot + system prompt?**

* Tutoring needs a predictable output format (rule + example + exercise + feedback). Few-shot examples teach that format clearly and robustly, while the system prompt sets the persona (patient tutor), language style, and pacing.

**Sample system prompt + two few-shot examples (copy/paste-friendly)**

```
SYSTEM:
You are Noor, a patient, empathetic Arabic grammar tutor for older adult learners.
Rules:
- Keep each reply short (max 4 sentences for explanation).
- Provide: [Rule], [Example (Arabic + transliteration)], [Exercise], [Model answer], [Tip].
- Use simple English to explain grammar; show Arabic script and transliteration for examples.
- Always offer an option: "Would you like to try an exercise?" at the end.

FEW-SHOT EXAMPLES:

USER: "Explain the Arabic dual (المثنى)."
ASSISTANT:
[Rule] The dual (المثنى) expresses exactly two items. In the nominative, add "انِ" or "ان" (ـانِ / ـان).
[Example] كتابان — kitābān (two books)
[Exercise] Convert "قلم" to the dual.
[Model answer] قلمان — qalamān
[Tip] Focus on the ending: ـان for two in a simple sentence.
Would you like to try an exercise?

USER: "How do you make a feminine plural for 'teacher'?"
ASSISTANT:
[Rule] To make the regular feminine plural (for people), drop the final "ة" and add "ات" (ـات).
[Example] معلمة — muʿallimah (female teacher); معلمات — muʿallimāt (female teachers)
[Exercise] Convert "ممرضة" to plural.
[Model answer] ممرضات — mumarriḍāt
[Tip] Watch the short vowels when pronouncing.
Would you like to try an exercise?
```

Use this system + few-shot block as the top of `messages` in the API call, then append user messages and recent session history.
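
The ordering described above can be sketched as a small helper. The function name `build_messages` and the choice to pass few-shot examples as (question, answer) pairs are assumptions; the role names follow the OpenAI-compatible chat format that the Groq SDK uses.

```python
from typing import Dict, List, Sequence, Tuple

Message = Dict[str, str]


def build_messages(
    system_prompt: str,
    few_shot_pairs: Sequence[Tuple[str, str]],
    history: List[Message],
    user_message: str,
) -> List[Message]:
    """Assemble: system prompt, few-shot pairs, recent history, new user turn."""
    messages: List[Message] = [{"role": "system", "content": system_prompt}]
    for question, answer in few_shot_pairs:
        # Each few-shot example is a full user/assistant pair.
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.extend(history)
    messages.append({"role": "user", "content": user_message})
    return messages
```

Sending each example as a user/assistant pair, rather than pasting the whole block into the system prompt, lets the model see the format as prior conversation turns.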

---

# 6) Minimal runnable implementation (FastAPI backend + tiny web UI)

Below is a minimal example you can run locally, written as a starting point toward production. It demonstrates:

* Groq Python SDK usage (chat completions)
* Short-term in-memory session memory (for demo only; replace with Redis/Postgres in prod)
* A structured assistant output format (labeled sections) that a UI can parse

> **Prereqs:** Python 3.10+ (the `str | None` annotation in `app.py` needs it), `pip install fastapi uvicorn groq`
> Set environment var: `export GROQ_API_KEY="YOUR_KEY_HERE"`

### `app.py` (FastAPI backend)

```python
# app.py
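import os
from pathlib import Path

from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse
from pydantic import BaseModel
from groq import Groq

# Basic config
GROQ_API_KEY = os.environ.get("GROQ_API_KEY")
if not GROQ_API_KEY:
    raise RuntimeError("Set GROQ_API_KEY environment variable")

client = Groq(api_key=GROQ_API_KEY)
MODEL = "llama-3.1-8b-instant"   # chosen for speed / cost tradeoff

app = FastAPI(title="Noor - Arabic Tutor (Groq)")

# Serve the demo page from the same origin so its relative fetch("/chat") works.
# Assumes index.html sits next to app.py.
@app.get("/")
def index():
    return FileResponse(Path(__file__).parent / "index.html")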

# In-memory sessions (demo). Use Redis/Postgres for production.
sessions = {}  # session_id -> list of message dicts {"role":..., "content":...}

class ChatRequest(BaseModel):
    session_id: str
    message: str
    user_name: str | None = None

# Basic system prompt (keeps persona and format)
SYSTEM_PROMPT = (
    "You are Noor, a patient Arabic grammar tutor for older adult learners. "
    "Always keep answers short and friendly. "
    "Format responses as sections labeled [Rule], [Example], [Exercise], [Model answer], [Tip]. "
    "Provide Arabic script and simple transliteration for examples. "
    "End with: 'Would you like to try an exercise?'\n"
)

# Optional few-shot: one user/assistant pair to lock the format.
# The assistant example needs a preceding user turn, otherwise the model
# sees an answer with no question.
FEW_SHOT = [
    {"role": "user", "content": "Explain the Arabic dual (المثنى)."},
    {"role": "assistant", "content":
        "[Rule] The dual (المثنى) denotes two items.\n"
        "[Example] كتابان — kitābān (two books)\n"
        "[Exercise] Convert 'قلم' to dual.\n"
        "[Model answer] قلمان — qalamān\n"
        "[Tip] Look for the ending ـان for two.\n"
        "Would you like to try an exercise?"
    }
]

@app.post("/chat")
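def chat_endpoint(req: ChatRequest):
    # Sync endpoint on purpose: client.chat.completions.create() below blocks,
    # and FastAPI runs plain `def` endpoints in a threadpool.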
    # Build session history (keep small window)
    history = sessions.setdefault(req.session_id, [])
    # Build messages: system + few-shot + recent history + user
    messages = [{"role":"system", "content": SYSTEM_PROMPT}] + FEW_SHOT
    # include last 6 messages to keep tokens manageable
    messages += history[-6:]
    messages.append({"role":"user", "content": req.message})

    try:
        completion = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            temperature=0.25,
            max_tokens=512,
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

    assistant_text = completion.choices[0].message.content

    # Simple fallback-on-uncertainty: if model hedges, offer to rephrase
    hedges = ["i don't know", "i'm not sure", "i could be wrong"]
    if any(h in assistant_text.lower() for h in hedges):
        assistant_text = (
            "Sorry, I don't have a confident short explanation for that. "
            "Would you like a simpler example instead, or to ask a human tutor?"
        )

    # Append user+assistant to session history
    history.append({"role":"user", "content": req.message})
    history.append({"role":"assistant", "content": assistant_text})

    return {"reply": assistant_text}
```
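
The reply text follows the bracketed section format, so a UI can split it into fields instead of showing one block of text. The sketch below is one way to do that; the `parse_sections` helper and the `closing` key are assumptions, and it assumes the final labeled section fits on one line, with anything after it treated as the closing question.

```python
import re
from typing import Dict

SECTION_LABELS = ("Rule", "Example", "Exercise", "Model answer", "Tip")

# Matches a label such as "[Rule]" at the start of a line.
_LABEL_RE = re.compile(
    r"^\[(" + "|".join(re.escape(label) for label in SECTION_LABELS) + r")\][ \t]*",
    re.MULTILINE,
)


def parse_sections(reply: str) -> Dict[str, str]:
    """Split a labeled reply into {label: text}; returns {} if no labels are found."""
    matches = list(_LABEL_RE.finditer(reply))
    sections: Dict[str, str] = {}
    for i, match in enumerate(matches):
        is_last = i + 1 == len(matches)
        end = len(reply) if is_last else matches[i + 1].start()
        body = reply[match.end():end].strip()
        if is_last:
            # Assumption: the last section is one line; the rest is the closing question.
            body, _, closing = body.partition("\n")
            if closing.strip():
                sections["closing"] = closing.strip()
        sections[match.group(1)] = body.strip()
    return sections
```

An empty result is a useful signal in itself: it means the model ignored the format, and the UI can fall back to showing the raw text.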

### Simple web UI (`index.html`)

```html
<!doctype html>
<html>
<head><meta charset="utf-8"><title>Noor — Arabic Tutor</title></head>
<body>
  <div style="max-width:600px;margin:20px auto;font-family:sans-serif">
    <h2>Noor — Arabic Tutor (demo)</h2>
    <div id="conv" style="border:1px solid #ddd;padding:12px;height:300px;overflow:auto"></div>
    <input id="msg" style="width:80%" placeholder="Type a short question (e.g., 'Explain the dual')" />
    <button id="send">Send</button>
  </div>

<script>
const sessionId = "demo-session-1"; // in prod create unique per user
document.getElementById("send").onclick = async () => {
  const msg = document.getElementById("msg").value.trim();
  if(!msg) return;
  addLine("You: " + msg);
  document.getElementById("msg").value = "";
  const res = await fetch("/chat", {
    method: "POST",
    headers: {"Content-Type":"application/json"},
    body: JSON.stringify({session_id: sessionId, message: msg})
  });
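  if (!res.ok) {
    addLine("Noor: Sorry, something went wrong. Please try again.");
    return;
  }
  const data = await res.json();
  addLine("Noor: " + data.reply);
};
function addLine(text){
  const el = document.getElementById("conv");
  // Use textContent (not innerHTML) so user or model text is never parsed as HTML
  const line = document.createElement("div");
  line.style.margin = "8px 0";
  line.style.whiteSpace = "pre-wrap"; // keep the line breaks between sections
  line.dir = "auto";                  // let the browser pick RTL for Arabic-first lines
  line.textContent = text;
  el.appendChild(line);
  el.scrollTop = el.scrollHeight;
}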
</script>
</body>
</html>
```

**Run it locally**

1. `export GROQ_API_KEY="..."`
2. `pip install fastapi uvicorn groq`
3. `uvicorn app:app --reload --port 8000`
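4. Open `http://localhost:8000/`. The page posts to the relative path `/chat`, so `index.html` must be served from the same origin as the API (for example, from a FastAPI route that returns the file); a separate static server on another port would also need CORS and an absolute API URL.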

---

# 7) Safety, evaluation & limitations in practice

* **Hallucinations:** Use low temperature and structured prompts; still verify critical facts externally.
* **Medical / clinical requests:** refuse and direct to professionals.
* **Privacy:** store minimal personal data; use encryption at rest; follow HIPAA-like rules if used for health (consult legal).
* **Monitoring:** log user ratings and sample transcripts (with consent) to fine-tune prompts and detect failures.
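
One simple failure signal for the monitoring point above is how often replies follow the required section format. The sketch below computes that rate over a batch of logged replies; the `format_compliance` helper is an assumption, and it only checks that the labels are present, not that the content is correct.

```python
from typing import Iterable

REQUIRED_LABELS = ("[Rule]", "[Example]", "[Exercise]", "[Model answer]", "[Tip]")


def format_compliance(replies: Iterable[str]) -> float:
    """Return the fraction of replies that contain every required section label."""
    replies = list(replies)
    if not replies:
        return 0.0
    compliant = sum(
        all(label in reply for label in REQUIRED_LABELS) for reply in replies
    )
    return compliant / len(replies)
```

A drop in this rate after a prompt change is a cheap early warning that the few-shot examples are no longer steering the output.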

---

# 8) Deployment & scaling notes (brief)

* For production, **replace in-memory sessions with Redis** and persist user preferences in a DB.
* Use Groq's production models and autoscaling; Groq provides deterministic low latency LPUs for fast inference. ([Groq][3])
* Containerize with Docker, use a load balancer and short-lived worker pods; monitor costs (Groq pricing page). ([Groq][6])
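
Replacing the in-memory `sessions` dict with Redis could look roughly like the sketch below. The `RedisSessionStore` class, the key prefix, and the default limits are assumptions; the client is expected to be a redis-py style object, and only its `rpush`, `ltrim`, `lrange`, and `expire` methods are used.

```python
import json
from typing import Any, Dict, List

Message = Dict[str, str]


class RedisSessionStore:
    """Keeps the most recent messages of each session in a capped Redis list."""

    def __init__(self, client: Any, max_messages: int = 12, ttl_seconds: int = 3600):
        self.client = client              # e.g. a redis.Redis instance (assumption)
        self.max_messages = max_messages  # 12 messages = 6 exchanges
        self.ttl_seconds = ttl_seconds    # idle sessions expire after this long

    @staticmethod
    def _key(session_id: str) -> str:
        return f"session:{session_id}"    # hypothetical key naming scheme

    def append(self, session_id: str, message: Message) -> None:
        key = self._key(session_id)
        self.client.rpush(key, json.dumps(message, ensure_ascii=False))
        # Keep only the newest `max_messages` entries.
        self.client.ltrim(key, -self.max_messages, -1)
        # Refresh the expiry on every write.
        self.client.expire(key, self.ttl_seconds)

    def history(self, session_id: str) -> List[Message]:
        raw = self.client.lrange(self._key(session_id), 0, -1)
        # json.loads accepts both str and bytes, so decode_responses can be on or off.
        return [json.loads(item) for item in raw]
```

With a store like this, the endpoint would call `store.history(session_id)` in place of `sessions.setdefault(...)` and `store.append(...)` in place of `history.append(...)`, so history survives restarts and is shared across workers.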

---

# 9) Sources & helpful links (most important citations)

* Groq Models / production list (Llama 3 variants): Groq docs. ([GroqCloud][2])
* Groq quickstart & Python SDK examples (how to call chat completions): Groq Quickstart / SDK. ([GroqCloud][4])
* Groq supported upstream models list & examples (Llama 3, Mixtral, Gemma): third-party summaries & docs. ([docs.typingmind.com][1])
* Groq speed / LPU / latency claims (for justification): groq.com pages & API benchmarking notes. ([Groq][3])

---


[1]: https://docs.typingmind.com/manage-and-connect-ai-models/groq-api-%28llama-3-mixtral-8x7b-gemma-7b%29 "Groq API (LLaMA 3, Mixtral 8x7b, Gemma 7b)"
[2]: https://console.groq.com/docs/models "Supported Models - GroqDocs"
[3]: https://groq.com/ "Groq is fast inference for AI builders"
[4]: https://console.groq.com/docs/quickstart "Quickstart - GroqDocs"
[5]: https://docs.litellm.ai/docs/providers/groq "Groq"
[6]: https://groq.com/pricing "On-demand Pricing for Tokens-as-a-Service"
