# **Buckle Up! We are starting our Week 2 roller coaster**

In our first week we covered some theoretical concepts and completed our setup, so it's time to start building!

## 📓**Conversational AI Concepts & Model Pipelines**

🎯 By the end of this week, you will:

- Understand LLMs, STT, TTS models and their roles.

- Know how to connect to LLMs with APIs (Groq as example).

- Use Python (requests + JSON) for API interaction.

- Start building a basic chatbot with memory and preprocessing.

---

## 🌟 Large Language Models (LLMs) 🌟

---

### ❗ **Question 1**: What is an LLM?

👉 It’s like a super-smart text predictor that can read, understand, and generate human-like sentences.

You give it some words → it guesses the next words in a way that makes sense.

For example:

1) You ask a question → it gives you an answer.

2) You write a sentence → it can complete it.

3) You give it a topic → it can write an essay, code, or even a story.

So, it's a type of AI trained on huge amounts of text data to generate or understand text.
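
To make the "text predictor" idea concrete, here is a toy sketch that counts which word follows which in a tiny corpus and then guesses the most frequent follower. It is only an illustration: real LLMs use neural networks over tokens, not word counts, and the names `train_bigrams`, `predict_next`, and the sample corpus are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical mini corpus, just large enough to show the idea.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # cat
print(predict_next(model, "sat"))  # on
```

The predictor has no understanding of cats or mats; it only repeats patterns it has seen, which is the same reason an LLM's answers depend so heavily on its training data.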

---

### Types of LLMs

1. Encoder-only models (e.g., BERT)

    - Best for understanding text (classification, sentiment analysis, embeddings).

    - ❌ Not good at generating text.

2. Decoder-only models (e.g., GPT, LLaMA, Mistral)

    - Best for text generation (chatbots, writing, summarization).

    - What we use in chatbots.

3. Encoder-decoder models (e.g., T5, BART)

    - Good at transforming text (translation, summarization, Q&A).

### Must-Knows about LLMs

- They don’t “think” like humans → They predict text based on training.

- Garbage in → garbage out: Poor prompts = poor answers.

- Token limits: Models can only “see” a limited number of tokens (words or pieces of words) at a time.

- Biases: Trained on internet text → may reflect biases/errors.
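
The token limit is the reason chat history has to be trimmed. The sketch below keeps the newest messages that fit inside a budget. It assumes a very rough proxy, one token per whitespace-separated word; real tokenizers split text into sub-word pieces, so actual counts will differ. `rough_token_count` and `trim_to_budget` are hypothetical helper names.

```python
def rough_token_count(text):
    """Very rough proxy for token count: whitespace-separated words."""
    return len(text.split())


def trim_to_budget(messages, max_tokens):
    """Keep the most recent messages whose combined rough count fits max_tokens."""
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = rough_token_count(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "Tell me about speech recognition"},
    {"role": "assistant", "content": "It converts audio into text"},
    {"role": "user", "content": "And text to speech?"},
]

trimmed = trim_to_budget(history, max_tokens=10)
print([m["content"] for m in trimmed])
```

With a budget of 10, the oldest message is dropped and the two most recent ones are kept.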

### 💡 **Quick Questions**:

1. Why might a chatbot built on BERT (encoder-only) struggle to answer open-ended questions?

- Answer 👉 BERT is designed mainly for understanding and classifying text, not generating it. It cannot produce long, creative, or open-ended responses because it lacks a decoder to generate new text.

---

## 🌟 Speech-to-Text (STT) 🌟

---

### ❗ **Question 2**: What is STT?

👉 It listens to your voice and turns it into written text.

- Converts **audio → text**.
- Enables voice input for conversational AI.
- Think of it as the **ears** of the chatbot.

**Popular STT Models**:

1) **Whisper (OpenAI)** – strong at multilingual speech recognition.
2) **Google Speech-to-Text API** – widely used, real-time transcription.
3) **Vosk** – lightweight, offline speech recognition.

**Common Usages**

1) Voice assistants (Alexa, Siri, Google Assistant).
2) Automated captions in meetings or lectures.
3) Voice-enabled customer support.

---

### Must-Knows about STT

- Accuracy depends on **noise, accents, clarity of speech**.

- Some models need **internet connection** (API-based), others run **offline**.

- Preprocessing audio (noise reduction) improves results.
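
As a feel for what audio preprocessing means, here is a toy sketch that normalizes the volume and then silences very quiet samples (a "noise gate"). It assumes the audio is already a plain list of floats between -1 and 1; real noise reduction works on the frequency content of the signal and is far more sophisticated. The function names and the threshold value are illustrative only.

```python
def normalize(samples):
    """Scale samples so the loudest one has absolute value 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)
    return [s / peak for s in samples]


def noise_gate(samples, threshold=0.05):
    """Zero out samples quieter than `threshold` (a crude noise reducer)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


# Hypothetical raw samples: quiet hiss mixed with louder speech.
raw = [0.01, -0.02, 0.4, -0.5, 0.015, 0.25]
cleaned = noise_gate(normalize(raw))
print(cleaned)
```

The quiet values are replaced with silence while the louder "speech" samples are kept and scaled.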


### 💡 **Quick Questions**:

2. Why do you think meeting transcription apps like Zoom or Google Meet struggle when multiple people talk at once?

- Answer 👉 These apps struggle because overlapping voices make it hard for the model to separate and recognize each speaker’s words. Background noise and different accents can also reduce accuracy.

---

## 🌟 Text-to-Speech (TTS) 🌟

---

### ❗ **Question 3**: What is TTS?

👉 It takes written text and speaks it out loud in a human-like voice.

- Converts **text → audio (speech)**.
- Think of it as the **mouth** of the chatbot.
- Makes AI “speak” naturally.

**Popular TTS Models**:

1) **Google TTS** – supports many languages and voices.
2) **Amazon Polly** – lifelike voice synthesis with customization.
3) **ElevenLabs** – cutting-edge, realistic voice cloning.

**Common Usages**

1) Screen readers for visually impaired users.
2) AI chatbots with voice output.
3) Audiobooks or podcast generation.

---

### Must-Knows about TTS

- Some voices sound robotic; others use **neural TTS** for natural tones.

- Latency matters → If too slow, conversation feels unnatural.

- Some TTS services allow **custom voices**.
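
Since latency adds up across the ears (STT), the brain (LLM), and the mouth (TTS), it helps to time each stage separately. The sketch below chains three stand-in functions and records how long each takes. `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical placeholders, not real services; in a real project they would be replaced with actual API or model calls.

```python
import time


def fake_stt(audio):
    """Hypothetical stand-in for a speech-to-text call."""
    return "hello there"


def fake_llm(text):
    """Hypothetical stand-in for an LLM call."""
    return f"You said: {text}"


def fake_tts(text):
    """Hypothetical stand-in for a text-to-speech call; returns fake audio bytes."""
    return text.encode("utf-8")


def timed(stage, func, value, timings):
    """Run func(value), record the elapsed seconds under `stage`, return the result."""
    start = time.perf_counter()
    result = func(value)
    timings[stage] = time.perf_counter() - start
    return result


timings = {}
text_in = timed("stt", fake_stt, b"\x00\x01", timings)
reply = timed("llm", fake_llm, text_in, timings)
audio_out = timed("tts", fake_tts, reply, timings)

for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.3f} ms")
print(f"total: {sum(timings.values()) * 1000:.3f} ms")
```

Timing each stage on its own shows which part of the pipeline to optimize first when a conversation feels slow.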

### 💡 **Quick Questions**:

3. If you were designing a voice-based AI tutor, what qualities would you want in its TTS voice (tone, speed, clarity, etc.)?

- Answer 👉 I would want the TTS voice to be clear, friendly, and natural-sounding. The tone should be encouraging and patient, the speed should be moderate and adjustable, and the pronunciation should be accurate to help learners understand easily.

---

## 🌟 Using APIs for LLMs with Groq 🌟

In [4]:
from groq import Groq

client = Groq(api_key="API key")

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Hello! What is conversational AI?"}]
)

print(response.choices[0].message.content)


Conversational AI, also known as conversational interfaces or chatbots, is a type of artificial intelligence (AI) that allows computers to engage in natural-sounding conversations with humans. It enables users to interact with systems, applications, or services through text or voice, just like they would with another human being.

Conversational AI uses various techniques, such as:

1. **Natural Language Processing (NLP)**: This involves analyzing and understanding human language, including syntax, semantics, and pragmatics.
2. **Machine Learning**: This enables the system to learn from user interactions and improve its responses over time.
3. **Dialogue Management**: This involves managing the conversation flow, including recognizing user inputs, generating responses, and determining when to ask follow-up questions.

Conversational AI has many applications, such as:

1. **Customer service bots**: These help customers with inquiries, complaints, or support requests, 24/7.
2. **Virtual 
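
The `groq` client above wraps a plain HTTP call that carries JSON. The sketch below builds the same kind of request by hand so you can see what travels over the wire. A few assumptions: the URL is Groq's OpenAI-compatible chat completions endpoint, the key is read from an environment variable named `GROQ_API_KEY`, and the standard library's `urllib` is used here instead of `requests` so the example runs without extra installs. The `User-Agent` value and the helper names are illustrative.

```python
import json
import os
import urllib.request

# Assumed endpoint: Groq's OpenAI-compatible chat completions URL.
GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"


def build_chat_request(prompt, api_key, model="llama-3.1-8b-instant"):
    """Build (but do not send) an HTTP POST request for one user prompt."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GROQ_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Illustrative value; some gateways reject the default urllib agent.
            "User-Agent": "week2-chatbot-demo/0.1",
        },
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Read the key from the environment instead of hard-coding it in the notebook.
api_key = os.environ.get("GROQ_API_KEY", "")
request = build_chat_request("Hello! What is conversational AI?", api_key)
print(request.get_method(), request.full_url)
print(json.loads(request.data)["messages"])
# With a real key set, send_chat_request(request) would perform the call.
```

Keeping the key in an environment variable also means it never ends up in a shared notebook or a Git commit.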

---

## 🌟 Assignments 🌟

### 📝 Assignment 1: LLM Understanding

* Write a short note (3–4 sentences) explaining the difference between **encoder-only, decoder-only, and encoder-decoder LLMs**.
* Give one example usage of each.


Encoder-only LLMs, like BERT, are designed mainly for understanding and
analyzing text by creating contextual embeddings; they excel at tasks
such as classification and sentiment analysis but cannot generate new text.
Decoder-only LLMs, such as GPT, focus on generating text by predicting
the next word in a sequence, making them ideal for chatbots and creative
writing. Encoder-decoder LLMs, like T5 or BART, combine both architectures
to transform input text into output text, which is useful for tasks like
translation and summarization.

**Examples:**  
- Encoder-only: Sentiment analysis (BERT)  
- Decoder-only: Conversational chatbot (GPT)  
- Encoder-decoder: Machine translation (T5)

### 📝 Assignment 2: STT/TTS Exploration

* Find **one STT model** and **one TTS model** (other than Whisper/Google).
* Write down:

  * What it does.
  * One possible application.

**STT Model: Vosk**  
- What it does: Vosk listens to your voice and turns it into written words.  
- One possible application: It can be used in offline voice typing apps, so you can talk and your words appear as text even without internet.

**TTS Model: Amazon Polly**  
- What it does: Amazon Polly takes written text and reads it out loud in a human-like voice.  
- One possible application: It can be used to make talking books for people who have trouble reading.

### 📝 Assignment 3: Build a Chatbot with Memory

* Write a Python program that:

  * Takes user input in a loop.
  * Sends it to Groq API.
  * Stores the last 5 messages in memory.
  * Ends when user types `"quit"`.

In [5]:

from groq import Groq

client = Groq(api_key="API Key")

# Conversation history sent to the model on every turn.
messages = []

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        print("Chatbot: Goodbye!")
        break
    messages.append({"role": "user", "content": user_input})
    # Keep only the last 5 messages so the history stays short.
    messages = messages[-5:]

    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=messages
    )
    bot_reply = response.choices[0].message.content
    print("Chatbot:", bot_reply)
    messages.append({"role": "assistant", "content": bot_reply})

You: hi
Chatbot: Hello. Is there something I can help you with?
You: how are you
Chatbot: I'm just a computer program, so I don't have feelings in the same way that humans do, but I'm functioning properly and ready to help with any questions or tasks you may have. How about you? How's your day going?
You: what are you doing
Chatbot: I'm a large language model, so my primary function is to process and respond to text-based input. Here are some ways I'm currently "doing things":

1. **Listening**: I'm waiting for your next message, and I'll respond based on what you type.
2. **Learning**: In the background, I'm continuously learning from the vast amount of text data I was trained on, which helps me improve my language understanding and generation capabilities.
3. **Generating text**: When you ask me a question or request information, I'm generating text that's relevant and accurate based on my training data.
4. **Improving**: My developers are constantly working on updating and refining 
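
Another way to cap the history is a `collections.deque` with `maxlen`, which drops the oldest entry automatically whenever a new one is added. This is a minimal sketch, not the only way to do it; `ChatMemory` is a hypothetical helper class, and the messages are dummy data.

```python
from collections import deque


class ChatMemory:
    """Keep only the most recent `max_messages` chat messages."""

    def __init__(self, max_messages=5):
        # A deque with maxlen discards the oldest item when it is full.
        self._messages = deque(maxlen=max_messages)

    def add(self, role, content):
        self._messages.append({"role": role, "content": content})

    def as_list(self):
        """Return a plain list, the shape a chat API call expects."""
        return list(self._messages)


memory = ChatMemory(max_messages=5)
for i in range(1, 8):
    role = "user" if i % 2 else "assistant"
    memory.add(role, f"message {i}")

print([m["content"] for m in memory.as_list()])
# ['message 3', 'message 4', 'message 5', 'message 6', 'message 7']
```

In the chatbot loop, `memory.as_list()` would be passed as the `messages` argument.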

### 📝 Assignment 4: Preprocessing Function

* Write a function to clean user input:

  * Lowercase text.
  * Remove punctuation.
  * Strip extra spaces.

Test with: `"  HELLo!!!  How ARE you?? "`


In [6]:

import string

def clean_text(text):

    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = ' '.join(text.split())
    return text

test_input = "  HELLo!!!  How ARE you?? "
print(clean_text(test_input))

hello how are you


### 📝 Assignment 5: Text Preprocessing

* Write a function that:

    * Converts text to lowercase.
    * Removes punctuation & numbers.
    * Removes stopwords (`the, is, and...`).
    * Applies stemming or lemmatization.
    * Removes words shorter than 3 characters.
    * Keeps only nouns, verbs, and adjectives (using POS tagging).

In [11]:
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag, word_tokenize

# Download each NLTK resource only if it is not already installed.
# Newer NLTK releases look for the "punkt_tab" and "..._eng" variants.
resources = {
    "punkt": "tokenizers/punkt",
    "punkt_tab": "tokenizers/punkt_tab",
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "averaged_perceptron_tagger_eng": "taggers/averaged_perceptron_tagger_eng",
}
for package, path in resources.items():
    try:
        nltk.data.find(path)
    except LookupError:
        nltk.download(package, quiet=True)


def preprocess_text(text):
    """Return the lemmatized nouns, verbs, and adjectives in text as a list."""
    text = text.lower()
    # Drop everything except lowercase letters and whitespace (punctuation, digits).
    text = re.sub(r"[^a-z\s]", "", text)
    words = word_tokenize(text)
    stop_words = set(stopwords.words("english"))
    words = [w for w in words if w not in stop_words]
    # Without a POS argument the lemmatizer treats every word as a noun,
    # so verb forms such as "jumping" are left unchanged.
    lemmatizer = WordNetLemmatizer()
    words = [lemmatizer.lemmatize(w) for w in words]
    words = [w for w in words if len(w) >= 3]
    tagged_words = pos_tag(words)
    # Penn Treebank tags for nouns, verbs, and adjectives.
    allowed_tags = {"NN", "NNS", "NNP", "NNPS",
                    "VB", "VBD", "VBG", "VBN", "VBP", "VBZ",
                    "JJ", "JJR", "JJS"}

    filtered_words = [word for word, tag in tagged_words if tag in allowed_tags]

    return filtered_words


sample_text = "The quick brown foxes were jumping over the lazy dogs in 2025!"
print("Original:", sample_text)
print("Processed:", preprocess_text(sample_text))

Original: The quick brown foxes were jumping over the lazy dogs in 2025!
Processed: ['quick', 'brown', 'fox', 'jumping', 'lazy', 'dog']
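
The final "keep only nouns, verbs, and adjectives" step can also be written with tag prefixes instead of listing every tag. The sketch below is independent of NLTK: the tags are copied by hand from the cat sentence in the Hints section, where a tagger such as `nltk.pos_tag` would normally produce them. `keep_content_words` is a hypothetical helper name.

```python
def keep_content_words(tagged_words, prefixes=("NN", "VB", "JJ")):
    """Keep words whose Penn Treebank tag starts with one of `prefixes`."""
    return [word for word, tag in tagged_words if tag.startswith(prefixes)]


# Hand-written (word, tag) pairs for "The cat is sleeping on the mat."
tagged = [
    ("The", "DT"), ("cat", "NN"), ("is", "VBZ"), ("sleeping", "VBG"),
    ("on", "IN"), ("the", "DT"), ("mat", "NN"),
]

print(keep_content_words(tagged))  # ['cat', 'is', 'sleeping', 'mat']
```

Note that "is" survives here because it is tagged as a verb; in `preprocess_text` it is removed earlier by the stopword filter.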


**Chatbot with Text Preprocessing**

This code combines the Groq-powered chatbot (LLaMA 3.1 model) with the `preprocess_text` function. Each reply is printed twice: once as returned by the model ("Original") and once as the list of words that survive cleaning, stopword removal, lemmatization, and POS filtering ("Processed"), so you can see which content words the pipeline keeps.

In [14]:

from groq import Groq


def preprocess_text(text):
    """Return the lemmatized nouns, verbs, and adjectives in text as a list."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)
    words = word_tokenize(text)
    stop_words = set(stopwords.words("english"))
    words = [w for w in words if w not in stop_words]
    lemmatizer = WordNetLemmatizer()
    words = [lemmatizer.lemmatize(w) for w in words]
    words = [w for w in words if len(w) >= 3]
    tagged_words = pos_tag(words)
    allowed_tags = {"NN", "NNS", "NNP", "NNPS",
                    "VB", "VBD", "VBG", "VBN", "VBP", "VBZ",
                    "JJ", "JJR", "JJS"}

    filtered_words = [word for word, tag in tagged_words if tag in allowed_tags]

    return filtered_words


# Preprocess the last reply left over from the earlier chatbot cell.
sample_text = bot_reply
print("Original:", sample_text)
print("Processed:", preprocess_text(sample_text))

client = Groq(api_key="API Key")

messages = []

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        print("Chatbot: Goodbye!")
        break
    messages.append({"role": "user", "content": user_input})
    # Keep only the last 5 messages so the history stays short.
    messages = messages[-5:]

    # Send to Groq API
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=messages
    )
    bot_reply = response.choices[0].message.content
    # Show the raw reply and its preprocessed form side by side.
    print("Original:", bot_reply)
    print("Processed:", preprocess_text(bot_reply))

    messages.append({"role": "assistant", "content": bot_reply})

Original: It seems like you meant to type "hello". How can I assist you today?
Processed: ['seems', 'meant', 'type', 'hello', 'assist', 'today']
You: hi
Original: It's nice to meet you. Is there something I can help you with or would you like to chat?
Processed: ['nice', 'meet', 'something', 'help', 'like', 'chat']
You: how are you
Original: I'm just a computer program, so I don't have feelings or emotions like humans do. I'm functioning properly and ready to help answer any questions or provide information you might need. What about you? How's your day going?
Processed: ['computer', 'program', 'dont', 'feeling', 'emotion', 'human', 'functioning', 'ready', 'help', 'answer', 'question', 'information', 'need', 'hows', 'day', 'going']
You: good one
Original: I guess I could have done better than reiterating the obvious. What's on your mind? Want to talk about something specific or just chat about a topic?
Processed: ['guess', 'done', 'reiterating', 'obvious', 'whats', 'mind', 'want', 'tal

### 📝 Assignment 6: Reflection

* Answer in 2–3 sentences:

    * Why is context memory important in chatbots?
    * Why should beginners always check **API limits and pricing**?

Context memory is important in chatbots because it helps the bot remember previous messages, making conversations feel more natural and allowing it to give relevant, accurate responses. Beginners should always check API limits and pricing to avoid unexpected costs and ensure their project does not stop working due to hitting usage limits.
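
One common way to cope with rate limits is to wait and retry, doubling the wait each time (exponential backoff). The sketch below shows the pattern with a fake API call so it runs without a network. `RateLimitError`, `call_with_backoff`, and `flaky_call` are hypothetical names for this example; a real client library raises its own exception type, which you would catch instead.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error standing in for an API's 'too many requests' failure."""


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimitError wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * 2 ** attempt)


# Demo with a fake API call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("slow down")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_call, sleep=waits.append)
print(result, waits)  # ok [1.0, 2.0]
```

Passing `sleep=waits.append` is only a trick for the demo; leave the default in real code so the program actually pauses between attempts.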

---

### **Hints:**

1) Stemming:
    - Cuts off word endings to get the “root.”
    - Very mechanical → may produce non-real words.
    - Example:
        - "studies" → "studi"
        - "running" → "run"

2) Lemmatization:
    - Smarter → uses vocabulary + grammar rules.
    - Always gives a real word (the **lemma**).
    - Example:
        - "studies" → "study"
        - "running" → "run"

3) Part-of-Speech (POS) tagging means labeling each word in a sentence with its grammatical role — like **noun, verb, adjective, adverb, pronoun, etc.**

    - Example:
        - Sentence → *“The cat is sleeping on the mat.”*

    - POS tags →
        - The → Determiner (DT)
        - cat → Noun (NN)
        - is → Verb (VBZ)
        - sleeping → Verb (VBG)
        - on → Preposition (IN)
        - the → Determiner (DT)
        - mat → Noun (NN)

    - **In short:** POS tagging helps machines understand **how words function in a sentence**, which is useful in NLP tasks like machine translation, text classification, and question answering.
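
To see the difference between hints 1 and 2 in code, here is a toy sketch. It is not the Porter stemmer or the WordNet lemmatizer: `toy_stem` applies a few hand-picked suffix rules, and `toy_lemmatize` looks words up in a four-entry dictionary that stands in for a real vocabulary. Both names and the dictionary are made up for illustration.

```python
def toy_stem(word):
    """Strip a few common suffixes mechanically; the result may not be a real word."""
    if word.endswith("ies"):
        return word[:-3] + "i"
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo consonant doubling: "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


# A lemmatizer consults a vocabulary; this tiny dictionary stands in for one.
TOY_LEMMAS = {"studies": "study", "running": "run", "mice": "mouse", "better": "good"}


def toy_lemmatize(word):
    """Look the word up in the toy vocabulary; fall back to the word itself."""
    return TOY_LEMMAS.get(word, word)


for word in ["studies", "running", "mice"]:
    print(word, "->", toy_stem(word), "|", toy_lemmatize(word))
```

The stemmer turns "studies" into the non-word "studi" and cannot handle "mice" at all, while the dictionary lookup returns "study" and "mouse". In NLTK, `PorterStemmer` and `WordNetLemmatizer` play these two roles.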


---

### ✅ Recap

This week you learned:

* **LLMs**: Types, uses, must-knows.
* **STT & TTS**: How they connect with LLMs.
* **APIs**: Connecting to LLMs with Groq.
* Built your first chatbot foundation.