# **Buckle Up! We are starting our Week 2 roller coaster**

In our first week we covered some theoretical concepts and completed our setup, so it's time to start building!

## 📓**Conversational AI Concepts & Model Pipelines**

🎯 By the end of this week, you will:

- Understand LLMs, STT, TTS models and their roles.

- Know how to connect to LLMs with APIs (Groq as example).

- Use Python (requests + JSON) for API interaction.

- Start building a basic chatbot with memory and preprocessing.

---

## 🌟 Large Language Models (LLMs) 🌟

---

### ❗ **Question 1**: What is an LLM?

👉 It’s like a super-smart text predictor that can read, understand, and generate human-like sentences.

You give it some words → it guesses the next words in a way that makes sense.

For example:

1) You ask a question → it gives you an answer.

2) You write a sentence → it can complete it.

3) You give it a topic → it can write an essay, code, or even a story.

So, it's a type of AI trained on huge amounts of text data to generate or understand text.
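
To make the "text predictor" idea concrete, here is a minimal sketch of next-word prediction using simple word-pair counts. It is a toy, not how a real LLM works internally (real models use neural networks over tokens), and the corpus and the function names `train_bigrams` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Tiny made-up corpus, for illustration only
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigrams(corpus)
print(predict_next(model, "the"))    # most common word seen after "the"
print(predict_next(model, "zebra"))  # never seen in training, so None
```

The same principle (predict what comes next from what was seen in training) scales up to LLMs, only with far more data and a far richer model.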

---

### Types of LLMs

1. Encoder-only models (e.g., BERT)

    - Best for understanding text (classification, sentiment analysis, embeddings).

    - ❌ Not good at generating text.

2. Decoder-only models (e.g., GPT, LLaMA, Mistral)

    - Best for text generation (chatbots, writing, summarization).

    - What we use in chatbots.

3. Encoder-decoder models (e.g., T5, BART)

    - Good at transforming text (translation, summarization, Q&A).

### Must-Knows about LLMs

- They don’t “think” like humans → They predict text based on training.

- Garbage in → garbage out: Poor prompts = poor answers.

- Token limits: Models can only “see” a limited number of tokens (words or word pieces) at a time.

- Biases: Trained on internet text → may reflect biases/errors.
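
The token-limit point can be sketched as follows. This is only an approximation: it counts whitespace-separated words, whereas real tokenizers split text into sub-word pieces, so actual counts will differ. The helper names `rough_token_count` and `fit_to_budget` and the budget of 9 are assumptions for illustration.

```python
def rough_token_count(text):
    """Very rough token estimate: whitespace-separated words.
    Real tokenizers split text into sub-word pieces, so true counts differ."""
    return len(text.split())

def fit_to_budget(messages, max_tokens):
    """Keep the most recent messages whose combined rough token count fits."""
    kept = []
    used = 0
    for message in reversed(messages):      # walk from newest to oldest
        cost = rough_token_count(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = ["hello there", "tell me about large language models", "thanks a lot"]
print(fit_to_budget(history, max_tokens=9))
```

Here the oldest message is dropped because including it would exceed the budget, which is roughly what happens when a conversation outgrows a model's context window.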

### 💡 **Quick Questions**:

1. Why might a chatbot built on BERT (encoder-only) struggle to answer open-ended questions?

- Answer 👉 A chatbot built on BERT (encoder-only) might struggle with open-ended questions because BERT is designed mainly for understanding and analyzing text, not for generating new responses. Since it lacks a decoder, it cannot naturally produce long, coherent answers beyond classification or extraction tasks.

---

## 🌟 Speech-to-Text (STT) 🌟

---

### ❗ **Question 2**: What is STT?

👉 STT listens to your voice and turns it into written text.

- Converts **audio → text**.
- Enables voice input for conversational AI.
- Think of it as the **ears** of the chatbot.

**Popular STT Models**:

1) **Whisper (OpenAI)** – strong at multilingual speech recognition.
2) **Google Speech-to-Text API** – widely used, real-time transcription.
3) **Vosk** – lightweight, offline speech recognition.

**Common Usages**

1) Voice assistants (Alexa, Siri, Google Assistant).
2) Automated captions in meetings or lectures.
3) Voice-enabled customer support.

---

### Must-Knows about STT

- Accuracy depends on **noise, accents, clarity of speech**.

- Some models need **internet connection** (API-based), others run **offline**.

- Preprocessing audio (noise reduction) improves results.
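
As a rough illustration of audio preprocessing, the sketch below applies a noise gate and peak normalization to a list of samples in the range -1.0 to 1.0. This is a crude stand-in: real noise reduction usually works on the frequency content of the signal, and the threshold, target peak, and sample values here are made up.

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the loudest one reaches `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)        # silence: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]

def noise_gate(samples, threshold=0.05):
    """Zero out samples quieter than `threshold` (a crude noise reduction)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]

# Made-up samples: quiet background hiss around two louder speech samples
raw = [0.01, -0.02, 0.30, -0.45, 0.02, 0.00]
cleaned = peak_normalize(noise_gate(raw))
print(cleaned)
```

The gate runs first so that background hiss is removed before the remaining signal is scaled up.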


### 💡 **Quick Questions**:

2. Why do you think meeting transcription apps like Zoom or Google Meet struggle when multiple people talk at once?

- Answer 👉 Meeting transcription apps like Zoom or Google Meet struggle when multiple people talk at once because speech recognition models are usually optimized for single-speaker input. Overlapping voices create noise, making it hard for the system to separate speakers and correctly align words with the right person, which reduces transcription accuracy.

---

## 🌟 Text-to-Speech (TTS) 🌟

---

### ❗ **Question 3**: What is TTS?

👉 TTS takes written text and speaks it out loud in a human-like voice.

- Converts **text → audio (speech)**.
- Think of it as the **mouth** of the chatbot.
- Makes AI “speak” naturally.

**Popular TTS Models**:

1) **Google TTS** – supports many languages and voices.
2) **Amazon Polly** – lifelike voice synthesis with customization.
3) **ElevenLabs** – cutting-edge, realistic voice cloning.

**Common Usages**

1) Screen readers for visually impaired users.
2) AI chatbots with voice output.
3) Audiobooks or podcast generation.

---

### Must-Knows about TTS

- Some voices sound robotic; others use **neural TTS** for natural tones.

- Latency matters → If too slow, conversation feels unnatural.

- Some TTS services allow **custom voices**.
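
To see why latency matters, it helps to time each stage of the ears → brain → mouth pipeline. The sketch below uses placeholder functions (`fake_stt`, `fake_llm`, `fake_tts`) in place of real STT, LLM, and TTS calls. They are hypothetical stand-ins, so the timings they produce are not representative of real services.

```python
import time

def fake_stt(audio):
    """Placeholder for a real STT call: pretends the audio is already text."""
    return audio["transcript"]

def fake_llm(prompt):
    """Placeholder for a real LLM call: returns a canned reply."""
    return f"You said: {prompt}"

def fake_tts(text):
    """Placeholder for a real TTS call: returns bytes standing in for audio."""
    return text.encode("utf-8")

def run_turn(audio):
    """Run one STT -> LLM -> TTS turn and record seconds spent per stage."""
    timings = {}
    value = audio
    for name, stage in (("stt", fake_stt), ("llm", fake_llm), ("tts", fake_tts)):
        start = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - start
    return value, timings

speech, timings = run_turn({"transcript": "what is conversational ai"})
print(speech)
print({stage: f"{seconds * 1000:.3f} ms" for stage, seconds in timings.items()})
```

With real services plugged in, the per-stage timings show which part of the pipeline makes the conversation feel slow.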

### 💡 **Quick Questions**:

3. If you were designing a voice-based AI tutor, what qualities would you want in its TTS voice (tone, speed, clarity, etc.)?

- Answer 👉 If I were designing a voice-based AI tutor, I’d want the TTS voice to have a clear and natural tone, moderate speed that adapts to the learner’s pace, and good pronunciation for easy understanding. It should also convey a friendly and encouraging style, with slight variations in intonation so it doesn’t sound robotic or monotonous.

---

## 🌟 Using APIs for LLMs with Groq 🌟

In [2]:
!pip install groq


Collecting groq
  Downloading groq-0.31.0-py3-none-any.whl.metadata (16 kB)
Downloading groq-0.31.0-py3-none-any.whl (131 kB)
Installing collected packages: groq
Successfully installed groq-0.31.0


In [12]:
from groq import Groq
import getpass

api_key = getpass.getpass("Enter your Groq API key: ")

client = Groq(api_key=api_key)

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Hello! What is conversational AI?"}]
)

print(response.choices[0].message.content)


Enter your Groq API key: ··········
Conversational AI, also known as conversational artificial intelligence or chatbots, refers to the use of artificial intelligence (AI) to enable computers or online platforms to understand and respond to human language in a conversational manner.

Conversational AI technologies have advanced significantly in recent years, allowing machines to understand the nuances of human conversation, including tone, context, and intent. This enables them to respond in a way that mimics human-like communication.

Some common applications of conversational AI include:

1. **Chatbots**: Virtual assistants that respond to user queries, provide support, and offer solutions to customer inquiries.
2. **Virtual assistants**: Like Siri, Alexa, and Google Assistant, which can perform tasks, provide information, and control smart home devices.
3. **Customer service**: AI-powered chatbots and virtual agents help customers resolve issues, answer questions, and provide support
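
The cell above uses the `groq` SDK. The same call can also be sketched with plain `requests` + JSON against Groq's OpenAI-compatible chat completions endpoint. The helper names `build_request`, `extract_reply`, and `ask_groq` are made up for this example, and `"YOUR_API_KEY"` is a placeholder. The last two lines only print the JSON body, so no network call is made until `ask_groq` is called.

```python
import json

# Groq exposes an OpenAI-compatible chat completions endpoint
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(api_key, user_text, model="llama-3.1-8b-instant"):
    """Return the headers and JSON body for a single-turn chat request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return headers, payload

def extract_reply(response_json):
    """Pull the assistant text out of the parsed JSON response."""
    return response_json["choices"][0]["message"]["content"]

def ask_groq(api_key, user_text):
    """Send the request over HTTP; needs `pip install requests` and network access."""
    import requests  # imported here so the helpers above work without it
    headers, payload = build_request(api_key, user_text)
    response = requests.post(GROQ_URL, headers=headers, json=payload, timeout=30)
    response.raise_for_status()      # surface HTTP errors such as 401 or 429
    return extract_reply(response.json())

# Inspect the JSON body that would be sent (no network call is made here)
headers, payload = build_request("YOUR_API_KEY", "Hello! What is conversational AI?")
print(json.dumps(payload, indent=2))
```

Calling `ask_groq(api_key, "Hello!")` with a real key should return the same kind of text as `response.choices[0].message.content` in the SDK version.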

---

## 🌟 Assignments 🌟

### 📝 Assignment 1: LLM Understanding

* Write a short note (3–4 sentences) explaining the difference between **encoder-only, decoder-only, and encoder-decoder LLMs**.
* Give one example usage of each.


Answer:
1) Encoder-only = the wise teacher (only understands and deeply analyzes the input).
2) Decoder-only = the storyteller friend (only generates the next possible word).
3) Encoder–Decoder = the translator friend (first understands, then generates a new output).

Encoder-only LLMs (like BERT) focus on understanding and representing input text, making them useful for classification, embeddings, and semantic search. Decoder-only LLMs (like GPT) specialize in generating text by predicting the next token, ideal for chatbots, story generation, and code completion. Encoder–Decoder LLMs (like T5 or BART) combine both: the encoder processes the input while the decoder generates context-aware output, making them effective for tasks such as translation and summarization.

### 📝 Assignment 2: STT/TTS Exploration

* Find **one STT model** and **one TTS model** (other than Whisper/Google).
* Write down:

  * What it does.
  * One possible application.

Answer:
Speech-to-Text (STT) Model: **DeepSpeech (by Mozilla)**

1) What it does:
Converts spoken audio into written text using a recurrent neural network trained on large speech datasets.

2) One possible application:
Building a voice-controlled note-taking app where lectures or meetings are automatically transcribed into text.

Text-to-Speech (TTS) Model: **VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech, by Kakao Enterprise)**

1) What it does: End-to-end model that directly generates high-quality, natural-sounding speech from text without needing separate components (like Tacotron + vocoder).

2) One possible application: Audiobook generation, where long-form text can be converted into human-like narration automatically.

### 📝 Assignment 3: Build a Chatbot with Memory

* Write a Python program that:

  * Takes user input in a loop.
  * Sends it to Groq API.
  * Stores the last 5 messages in memory.
  * Ends when user types `"quit"`.

In [5]:
import os
from groq import Groq

# Read the key from the environment; never paste a real API key into a notebook
client = Groq(api_key=os.environ["GROQ_API_KEY"])

messages = []

while True:
    user_input = input("You: ")

    if user_input.lower() == "quit":
        print("Chat ended.")
        break

    # Add user message
    messages.append({"role": "user", "content": user_input})
    messages = messages[-5:]


    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages
    )

    reply = response.choices[0].message.content
    print("Bot:", reply)

    # Add bot reply
    messages.append({"role": "assistant", "content": reply})
    messages = messages[-5:]  # keep only last 5 messages


You: hello
Bot: Hello. How can I assist you today?
You: i want to know about AI, define it in 5 sentences.
Bot: Artificial Intelligence (AI) refers to the development of computer systems that can perform tasks that typically require human intelligence, such as learning, reasoning, and problem-solving. AI systems use algorithms and data to make decisions, often without being explicitly programmed for a specific task. These systems can be designed to simulate human-like intelligence, allowing them to adapt and improve over time through machine learning and experience. AI has numerous applications across various industries, including healthcare, finance, transportation, and education, and is used in everyday technologies like virtual assistants and image recognition software. Overall, AI aims to create intelligent machines that can think and act like humans, augmenting human capabilities and improving the efficiency and accuracy of various processes.
You: okay thanks
Bot: I'm glad I could
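
The "last 5 messages" memory can also be sketched with `collections.deque`, which drops the oldest entry automatically once `maxlen` is reached. This is an alternative to slicing with `messages[-5:]`, not a replacement for the solution above. The helper names `make_memory` and `remember` are made up, and the conversation is simulated so that no API call is needed.

```python
from collections import deque

MAX_MESSAGES = 5

def make_memory(max_messages=MAX_MESSAGES):
    """A deque with maxlen drops the oldest entry automatically when full."""
    return deque(maxlen=max_messages)

def remember(memory, role, content):
    """Append one chat message in the role/content format used by the chat API."""
    memory.append({"role": role, "content": content})

memory = make_memory()
for turn in range(1, 5):                    # simulate 4 question/answer turns
    remember(memory, "user", f"question {turn}")
    remember(memory, "assistant", f"answer {turn}")

# list(memory) is what would be passed as `messages=` to the API
for message in list(memory):
    print(message["role"], "->", message["content"])
```

With an odd window size such as 5, the remembered history can start with an assistant message. An even size keeps complete user/assistant pairs.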

### 📝 Assignment 4: Preprocessing Function

* Write a function to clean user input:

  * Lowercase text.
  * Remove punctuation.
  * Strip extra spaces.

Test with: `"  HELLo!!!  How ARE you?? "`


In [6]:
import string

def clean_text(text: str) -> str:
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = " ".join(text.split())
    return text

sample = " HELLo!!! How ARE you?? "
print("Original:", sample)
print("Cleaned :", clean_text(sample))


Original:  HELLo!!! How ARE you?? 
Cleaned : hello how are you


### 📝 Assignment 5: Text Preprocessing

* Write a function that:

    * Converts text to lowercase.
    * Removes punctuation & numbers.
    * Removes stopwords (`the, is, and...`).
    * Applies stemming or lemmatization.
    * Removes words shorter than 3 characters.
    * Keeps only nouns, verbs, and adjectives (using POS tagging).

In [10]:
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk import pos_tag, word_tokenize
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


True

In [11]:
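STOP_WORDS = set(stopwords.words("english"))  # build once; set lookup is fast
STEMMER = PorterStemmer()                     # reuse a single stemmer

def clean_text(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)      # drop punctuation and digits
    words = word_tokenize(text)
    words = [w for w in words if w not in STOP_WORDS]
    words = [w for w in words if len(w) >= 3]
    words = [STEMMER.stem(w) for w in words]
    # Note: tagging stemmed tokens is less reliable than tagging original words
    tagged = pos_tag(words)
    words = [w for w, t in tagged if t.startswith(("N", "V", "J"))]
    return words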

sample = "It's September 2025, only 3 and a half months left in 2025!"
print(clean_text(sample))


['septemb', 'half', 'month', 'left']


### 📝 Assignment 6: Reflection

* Answer in 2–3 sentences:

    * Why is context memory important in chatbots?
    * Why should beginners always check **API limits and pricing**?

Answer: Context memory is important in chatbots because it allows the system to remember past interactions, maintain continuity, and give more natural, human-like responses instead of treating each message in isolation. Beginners should always check API limits and pricing to avoid unexpected costs and to design their applications efficiently within usage constraints.
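
One way to stay within an API's rate limit is to count requests on the client side before sending them. The sketch below is a simple sliding-window counter. The limit of 3 requests per 60 seconds is hypothetical, so check your provider's documentation for the real limits. The class name `RequestBudget` is made up, and timestamps are passed in explicitly so the example is deterministic.

```python
from collections import deque

class RequestBudget:
    """Sliding-window counter: allow at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.sent_at = deque()              # timestamps of recent requests

    def allow(self, now):
        """Return True and record the request if it fits in the current window."""
        # Forget requests that have fallen out of the window
        while self.sent_at and now - self.sent_at[0] >= self.window_seconds:
            self.sent_at.popleft()
        if len(self.sent_at) >= self.max_requests:
            return False
        self.sent_at.append(now)
        return True

# Hypothetical limit of 3 requests per 60 seconds, for illustration only
budget = RequestBudget(max_requests=3, window_seconds=60)
results = [budget.allow(now=t) for t in (0, 10, 20, 30, 61)]
print(results)
```

In a real chatbot loop you would pass `time.monotonic()` as `now` and wait, or tell the user to slow down, whenever `allow` returns `False`.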

---

### **Hints:**

1) Stemming:
    - Cuts off word endings to get the “root.”
    - Very mechanical → may produce non-real words.
    - Example:
        - "studies" → "studi"
        - "running" → "run"

2) Lemmatization:
    - Smarter → uses vocabulary + grammar rules (often needs the word's part of speech).
    - Returns a real dictionary word (the **lemma**).
    - Example:
        - "studies" → "study"
        - "running" → "run"

3) Part-of-Speech (POS) tagging means labeling each word in a sentence with its grammatical role — like **noun, verb, adjective, adverb, pronoun, etc.**

    - Example:
        - Sentence → *“The cat is sleeping on the mat.”*

    - POS tags →
        - The → Determiner (DT)
        - cat → Noun (NN)
        - is → Verb (VBZ)
        - sleeping → Verb (VBG)
        - on → Preposition (IN)
        - the → Determiner (DT)
        - mat → Noun (NN)

    - **In short:** POS tagging helps machines understand **how words function in a sentence**, which is useful in NLP tasks like machine translation, text classification, and question answering.
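
The difference between stemming and lemmatization described in the hints can be sketched without any libraries. Both functions below are toys: `toy_stem` is far simpler than the Porter stemmer, and `TOY_LEMMAS` is a tiny hand-written table standing in for a real vocabulary such as WordNet.

```python
def toy_stem(word):
    """Mechanical suffix stripping; a toy, much simpler than the Porter stemmer."""
    if word.endswith("ies"):
        return word[:-3] + "i"              # "studies" -> "studi"
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo consonant doubling: "runn" -> "run"
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

# Tiny hand-written lemma table standing in for a real vocabulary
TOY_LEMMAS = {"studies": "study", "running": "run", "mice": "mouse"}

def toy_lemmatize(word):
    """Dictionary lookup: returns a real word, or the input when it is unknown."""
    return TOY_LEMMAS.get(word, word)

for word in ["studies", "running", "mice"]:
    print(word, "| stem:", toy_stem(word), "| lemma:", toy_lemmatize(word))
```

The stemmer only chops endings, so it leaves "mice" untouched and turns "studies" into the non-word "studi". The lookup maps each form to a real dictionary word.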


---

### ✅ Recap

This week you learned:

* **LLMs**: Types, uses, must-knows.
* **STT & TTS**: How they connect with LLMs.
* **APIs**: Connecting to LLMs with Groq.
* Built your first chatbot foundation.