# **Buckle Up! We are starting our Week 2 roller coaster**

In our first week we covered some theoretical concepts and completed our setup, so it's time to start building!

## 📓**Conversational AI Concepts & Model Pipelines**

🎯 By the end of this week, you will:

- Understand LLMs, STT, TTS models and their roles.

- Know how to connect to LLMs with APIs (Groq as example).

- Use Python (requests + JSON) for API interaction.

- Start building a basic chatbot with memory and preprocessing.

---

## 🌟 Large Language Models (LLMs) 🌟

---

### ❗ **Question 1**: What is an LLM?

👉 It’s like a super-smart text predictor that can read, understand, and generate human-like sentences.

You give it some words → it guesses the next words in a way that makes sense.

For example:

1) You ask a question → it gives you an answer.

2) You write a sentence → it can complete it.

3) You give it a topic → it can write an essay, code, or even a story.

So, it's a type of AI trained on huge amounts of text data to generate or understand text.
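To make the "text predictor" idea concrete, here is a toy sketch. It is not how a real LLM works internally (real models use neural networks over tokens); it only counts which word follows which in a tiny made-up corpus and then guesses the most frequent follower. The function names and the corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follower_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follower_counts[current_word][next_word] += 1
    return follower_counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up corpus, for illustration only
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigram_model(corpus)

print(predict_next(model, "the"))     # cat
print(predict_next(model, "dragon"))  # None (word never seen in training)
```

An LLM does the same kind of "guess what comes next" step, but with billions of learned parameters instead of a lookup table, which is why it can continue text it has never seen before.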

---

### Types of LLMs

1. Encoder-only models (e.g., BERT)

    - Best for understanding text (classification, sentiment analysis, embeddings).

    - ❌ Not good at generating text.

2. Decoder-only models (e.g., GPT, LLaMA, Mistral)

    - Best for text generation (chatbots, writing, summarization).

    - What we use in chatbots.

3. Encoder-decoder models (e.g., T5, BART)

    - Good at transforming text (translation, summarization, Q&A).

### Must-Knows about LLMs

- They don’t “think” like humans → They predict text based on training.

- Garbage in → garbage out: Poor prompts = poor answers.

- Token limits: Models can only “see” a limited number of tokens (word pieces) at a time, counting both your prompt and the reply.

- Biases: Trained on internet text → may reflect biases/errors.
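The token-limit point above can be sketched with a rough estimate. The "about 4 characters per token" figure is only a rule of thumb for English text and is an assumption here; real tokenizers differ per model. The function names and the `reserved_for_reply` parameter are made up for this example.

```python
def estimate_tokens(text):
    """Very rough token estimate. Assumption: ~4 characters per token."""
    return max(1, len(text) // 4)


def fits_in_context(prompt, context_limit, reserved_for_reply=256):
    """Check whether the prompt plus room for the reply fits in the window."""
    return estimate_tokens(prompt) + reserved_for_reply <= context_limit


short_prompt = "What is conversational AI?"
long_prompt = "word " * 10_000  # 50,000 characters

print(fits_in_context(short_prompt, context_limit=8192))  # True
print(fits_in_context(long_prompt, context_limit=8192))   # False
```

For real projects, use the tokenizer or token counts reported by the model provider rather than this estimate.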

### 💡 **Quick Questions**: 

1. Why might a chatbot built on BERT (encoder-only) struggle to answer open-ended questions?

- Answer: BERT is encoder-only: it can understand and classify text but cannot generate new text. Open-ended questions require text generation, which is why a BERT-based chatbot would struggle.

---

## 🌟 Speech-to-Text (STT) 🌟

---

### ❗ **Question 2**: What is STT?

👉 It listens to your voice and turns it into written text.

- Converts **audio → text**.
- Enables voice input for conversational AI.
- Think of it as the **ears** of the chatbot.

**Popular STT Models**:

1) **Whisper (OpenAI)** – strong at multilingual speech recognition.
2) **Google Speech-to-Text API** – widely used, real-time transcription.
3) **Vosk** – lightweight, offline speech recognition.

**Common Usages**

1) Voice assistants (Alexa, Siri, Google Assistant).
2) Automated captions in meetings or lectures.
3) Voice-enabled customer support.

---

### Must-Knows about STT

- Accuracy depends on **noise, accents, clarity of speech**.

- Some models need **internet connection** (API-based), others run **offline**.

- Preprocessing audio (noise reduction) improves results.


### 💡 **Quick Questions**: 

2. Why do you think meeting transcription apps like Zoom or Google Meet struggle when multiple people talk at once?

- Answer: STT models are mostly trained to process one clear voice at a time. When multiple people talk simultaneously, the voices overlap and act like noise, which reduces transcription accuracy.

---

## 🌟 Text-to-Speech (TTS) 🌟

---

### ❗ **Question 3**: What is TTS?

👉 It takes written text and speaks it out loud in a human-like voice.

- Converts **text → audio (speech)**.
- Think of it as the **mouth** of the chatbot.
- Makes AI “speak” naturally.

**Popular TTS Models**:

1) **Google TTS** – supports many languages and voices.
2) **Amazon Polly** – lifelike voice synthesis with customization.
3) **ElevenLabs** – cutting-edge, realistic voice cloning.

**Common Usages**

1) Screen readers for visually impaired users.
2) AI chatbots with voice output.
3) Audiobooks or podcast generation.

---

### Must-Knows about TTS

- Some voices sound robotic; others use **neural TTS** for natural tones.

- Latency matters → If too slow, conversation feels unnatural.

- Some TTS services allow **custom voices**.

### 💡 **Quick Questions**: 

3. If you were designing a voice-based AI tutor, what qualities would you want in its TTS voice (tone, speed, clarity, etc.)?

- Answer: The TTS voice should be clear, natural, and easy to understand, with a calm and friendly tone, moderate speed, and good pronunciation/clarity so learners can follow along comfortably.

---

## 🌟 Using APIs for LLMs with Groq 🌟

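In [None]:
import os

from groq import Groq

# Assumption: the API key is stored in an environment variable named
# GROQ_API_KEY, so it is never hard-coded in the notebook.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Send a single user message and print the model's reply.
response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Hello! What is conversational AI?"}]
)

print(response.choices[0].message.content)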
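
One of this week's goals is to use plain `requests` + JSON for API interaction, so here is a minimal sketch of the same call without the SDK. The endpoint URL is Groq's OpenAI-compatible chat endpoint as I understand it, and the `GROQ_API_KEY` environment variable name is an assumption; check the Groq documentation before relying on either. `build_request` and `ask_groq` are names made up for this example.

```python
import json
import os

# Assumption: Groq's OpenAI-compatible chat endpoint (verify in the Groq docs)
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"


def build_request(prompt, api_key, model="llama-3.1-8b-instant"):
    """Build the HTTP headers and JSON payload for one chat request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload


def ask_groq(prompt):
    """Send the request and return the reply text (needs network + a real key)."""
    import requests  # third-party package: pip install requests

    api_key = os.environ["GROQ_API_KEY"]  # assumed variable name
    headers, payload = build_request(prompt, api_key)
    response = requests.post(
        GROQ_URL, headers=headers, data=json.dumps(payload), timeout=30
    )
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.json()["choices"][0]["message"]["content"]


# Inspect the JSON body without sending anything
headers, payload = build_request("Hello! What is conversational AI?", api_key="YOUR_KEY")
print(json.dumps(payload, indent=2))
```

Printing the payload first is a cheap way to see exactly what the SDK sends on your behalf: a model name plus a list of role/content messages.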


In [2]:
%pip install groq




---

## 🌟 Assignments 🌟

### 📝 Assignment 1: LLM Understanding

* Write a short note (3–4 sentences) explaining the difference between **encoder-only, decoder-only, and encoder-decoder LLMs**.
* Give one example usage of each.

**Answer:**
Encoder-only models focus on analyzing and understanding text, making them useful for extracting meaning or detecting patterns.

Decoder-only models specialize in producing text outputs, which makes them powerful for creative or conversational tasks.

Encoder-decoder models combine both abilities, so they can take input text, understand it, and generate a transformed version.

**Examples:**
Encoder-only → Identifying named entities in a news article.

Decoder-only → Generating code from a natural language description.

Encoder-decoder → Converting casual notes into a professional email.

### 📝 Assignment 2: STT/TTS Exploration

* Find **one STT model** and **one TTS model** (other than Whisper/Google).
* Write down:

  * What it does.
  * One possible application.
  
**Answer:**

**STT Model: Kaldi**

* **What it does:** An open-source tool that converts speech into text.
* **Application:** Used in research projects for building custom voice recognition systems.

**TTS Model: Coqui TTS**

* **What it does:** An open-source tool that creates speech from text with natural voices.
* **Application:** Can be used to make a personal voice assistant.

### 📝 Assignment 3: Build a Chatbot with Memory



In [None]:
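import os

from groq import Groq

# Assumption: the API key is stored in an environment variable named
# GROQ_API_KEY, so it is never hard-coded in the notebook.
client = Groq(api_key=os.environ["GROQ_API_KEY"])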

chat_history = []

print("Chatbot started! Type 'quit' to exit.\n")

while True:
    user_text = input("You: ")

    # stop condition
    if user_text.lower() == "quit":
        print("Bot: Goodbye! 👋")
        break

    # save user message
    chat_history.append({"role": "user", "content": user_text})

    # keep only last 5 messages
    chat_history = chat_history[-5:]

    # get response from Groq
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=chat_history
    )

    bot_reply = response.choices[0].message.content
    print("Bot:", bot_reply)

    # save bot reply to history
    chat_history.append({"role": "assistant", "content": bot_reply})


**Output:**
Chatbot started! Type 'quit' to exit.

You: Hello
Bot: Hi there! How are you doing today?

You: I'm fine, tell me a joke.
Bot: Sure! Why don’t programmers like nature? Because it has too many bugs. 🐛

You: Haha nice. What is AI?
Bot: AI stands for Artificial Intelligence. It is the ability of machines to perform tasks that normally require human intelligence.

You: And what is Machine Learning?
Bot: Machine Learning is a branch of AI where systems learn from data and improve automatically without being explicitly programmed.

You: Can you explain Deep Learning also?
Bot: Deep Learning is a subset of Machine Learning that uses neural networks with many layers to process complex patterns in data.

You: Ok now tell me what is NLP?
Bot: NLP, or Natural Language Processing, is a field of AI that helps computers understand and work with human language.

**Important Point:**

At this point, the chatbot only remembers the last 5 messages (your messages and its replies combined). That means the earliest messages, like “Hello” and the joke request, are no longer in its memory. It’s like keeping only the most recent few exchanges in your head and forgetting the older ones.

So if you now ask:
You: Do you still remember my first message?
Bot: Sorry, I only remember the recent part of our conversation. Could you remind me what you said first?
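
The sliding-window memory described above can be sketched as a small helper. The chatbot in this assignment has no system message; keeping one outside the window is an addition made here for illustration, and `trim_history` is a made-up name. No API call is needed to see what the model would and would not receive.

```python
def trim_history(history, max_messages=5):
    """Keep all system messages plus the most recent `max_messages` others."""
    system = [m for m in history if m["role"] == "system"]
    others = [m for m in history if m["role"] != "system"]
    return system + others[-max_messages:]


# Simulate four question/answer turns after a system message
history = [{"role": "system", "content": "You are a helpful tutor."}]
for i in range(1, 5):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history)
for message in trimmed:
    print(message["role"], "->", message["content"])
```

Here "question 1", "answer 1", and "question 2" fall out of the window, so the model can no longer refer to them. Note that a window counted in messages can start with an assistant reply; counting in full user/assistant pairs is a common alternative.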

In [None]:
import string

def clean_text(sentence):
    sentence = sentence.casefold()

    for p in string.punctuation:
        sentence = sentence.replace(p, "")
    
    sentence = " ".join(sentence.split())
    
    return sentence

sample = "  HELLo!!!  How ARE you?? "
result = clean_text(sample)
print("After Cleaning:", result)


**Output:**
After Cleaning: hello how are you

### 📝 Assignment 5: Text Preprocessing

* Write a function that:

    * Converts text to lowercase.
    * Removes punctuation & numbers.
    * Removes stopwords (`the, is, and...`).
    * Applies stemming or lemmatization.
    * Removes words shorter than 3 characters.
    * Keeps only nouns, verbs, and adjectives (using POS tagging).

In [None]:
import re
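import nltk  # type: ignore
from nltk.corpus import stopwords, wordnet  # type: ignore
from nltk.stem import WordNetLemmatizer  # type: ignore
from nltk.tokenize import word_tokenize  # type: ignore

# Download needed resources once
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("averaged_perceptron_tagger")
nltk.download("wordnet")
# Newer NLTK releases look for these resource names instead; on older
# releases the two calls below simply report that nothing was found.
nltk.download("punkt_tab")
nltk.download("averaged_perceptron_tagger_eng")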

# Convert POS tag to WordNet format
def get_wordnet_pos(tag):
    if tag.startswith("J"):
        return wordnet.ADJ   # adjective
    elif tag.startswith("V"):
        return wordnet.VERB  # verb
    elif tag.startswith("N"):
        return wordnet.NOUN  # noun
    else:
        return None

def simple_preprocess(text):
    # 1. Lowercase
    text = text.lower()

    # 2. Remove punctuation & numbers
    text = re.sub(r"[^a-z\s]", "", text)

    # 3. Tokenize
    tokens = word_tokenize(text)

    # 4. Remove stopwords & very short words
    stop_words = set(stopwords.words("english"))
    tokens = [w for w in tokens if w not in stop_words and len(w) >= 3]

    # 5. POS tagging
    tagged = nltk.pos_tag(tokens)

    # 6. Lemmatization (keep only nouns, verbs, adjectives)
    lemmatizer = WordNetLemmatizer()
    result = []
    for word, tag in tagged:
        wn_tag = get_wordnet_pos(tag)
        if wn_tag:
            result.append(lemmatizer.lemmatize(word, wn_tag))

    return result


# Example test with DIFFERENT sentence
text = "Dogs are running quickly in the park while children play games happily."
print(simple_preprocess(text))


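**Output:**
['dog', 'run', 'park', 'child', 'play', 'game']

Note: “quickly” and “happily” are adverbs, so the noun/verb/adjective filter drops them. Expect a list close to the one above; the exact tags can vary slightly with the NLTK tagger version.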


### 📝 Assignment 6: Reflection

* Answer in 2–3 sentences:

    * Why is context memory important in chatbots?
    * Why should beginners always check **API limits and pricing**?

**1. Why is context memory important in chatbots?**

Context memory helps chatbots remember past conversations so they can give more accurate and natural replies. Without it, the chatbot would treat every question as new and sound less helpful or human-like.

**2. Why should beginners always check API limits and pricing?**

Because APIs often have usage limits or costs, beginners should know them to avoid unexpected charges. Understanding limits also helps plan projects better and prevent apps from suddenly stopping when the free quota runs out.
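
One common way to cope with rate limits is to wait and retry with increasing delays (exponential backoff). The sketch below uses a made-up `RateLimited` exception as a stand-in for whatever "too many requests" error your API client raises; the function name and parameters are also invented for this example.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's 'too many requests' error."""


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `func`, retrying on RateLimited with delays of 1s, 2s, 4s, ..."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)


# Demo: a fake API call that fails twice, then succeeds
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited("slow down")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_call, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Retrying helps with short-term rate limits only. If the free quota is exhausted, no amount of retrying will help, which is another reason to read the limits and pricing page first.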

---

### **Hints:**

1) Stemming:
    - Cuts off word endings to get the “root.”
    - Very mechanical → may produce non-real words.
    - Example:
        - "studies" → "studi"
        - "running" → "run"

2) Lemmatization:
    - Smarter → uses vocabulary + grammar rules.
    - Returns a dictionary word (the **lemma**), and works best when it knows the word's part of speech.
    - Example:
        - "studies" → "study"
        - "running" → "run" (when treated as a verb)

3) Part-of-Speech (POS) tagging means labeling each word in a sentence with its grammatical role — like **noun, verb, adjective, adverb, pronoun, etc.**

    - Example:
        - Sentence → *“The cat is sleeping on the mat.”*

    - POS tags →
        - The → Determiner (DT)
        - cat → Noun (NN)
        - is → Verb (VBZ)
        - sleeping → Verb (VBG)
        - on → Preposition (IN)
        - the → Determiner (DT)
        - mat → Noun (NN)

    - **In short:** POS tagging helps machines understand **how words function in a sentence**, which is useful in NLP tasks like machine translation, text classification, and question answering.
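
The stemming and lemmatization hints above can be contrasted with a toy sketch. This is not NLTK's Porter stemmer or WordNet lemmatizer; the suffix rules and the tiny lookup table are made up for illustration, chosen so that they reproduce the examples in the hints.

```python
def toy_stem(word):
    """Strip common suffixes mechanically; the result may not be a real word."""
    if word.endswith("ies"):
        return word[:-3] + "i"  # studies -> studi
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]  # running -> runn
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]  # runn -> run (undo the doubled consonant)
        return stem
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]  # cats -> cat
    return word


# Toy vocabulary: a real lemmatizer uses a full dictionary plus POS information
TOY_LEMMAS = {"studies": "study", "running": "run", "children": "child"}


def toy_lemmatize(word):
    """Look the word up in the vocabulary; fall back to the word itself."""
    return TOY_LEMMAS.get(word, word)


for word in ["studies", "running", "children"]:
    print(f"{word}: stem={toy_stem(word)}, lemma={toy_lemmatize(word)}")
```

The stemmer turns "studies" into the non-word "studi" and leaves "children" untouched because no suffix rule matches, while the lookup-based lemmatizer returns "study" and "child". That difference is the trade-off between the two: stemming is fast and rule-based, lemmatization needs vocabulary knowledge.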


---

### ✅ Recap

This week you learned:

* **LLMs**: Types, uses, must-knows.
* **STT & TTS**: How they connect with LLMs.
* **APIs**: Connecting to LLMs with Groq.
* **Chatbots**: Building a first chatbot with memory and text preprocessing.