# **Buckle Up! We Are Starting Our Week 2 Roller Coaster**

In our first week, we covered some theoretical concepts and completed our setup, so it's time to start building!

## 📓**Conversational AI Concepts & Model Pipelines**

🎯 By the end of this week, you will:

- Understand LLM, STT, and TTS models and their roles.

- Know how to connect to LLMs through APIs (using Groq as an example).

- Use Python (requests + JSON) for API interaction.

- Start building a basic chatbot with memory and preprocessing.

---

## 🌟 Large Language Models (LLMs) 🌟

---

### ❗ **Question 1**: What is an LLM?

👉 It’s like a super-smart text predictor that can read, understand, and generate human-like sentences.

You give it some words → it guesses the next words in a way that makes sense.

For example:

1) You ask a question → it gives you an answer.

2) You write a sentence → it can complete it.

3) You give it a topic → it can write an essay, code, or even a story.

So, it's a type of AI model trained on huge amounts of text data to understand and generate text.
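
To make the "text predictor" idea concrete, here is a toy sketch that counts which word follows which in a tiny made-up corpus and then guesses the most frequent follower. It is only an illustration of next-word prediction; real LLMs use neural networks trained on tokens, not simple word counts, and the names `train_bigrams` and `predict_next` are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict, word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Tiny made-up corpus, only for illustration
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # cat
print(predict_next(model, "sat"))  # on
```

The model answers "cat" after "the" simply because that pair appears most often in the corpus, which is the same "predict from training data" idea at a much smaller scale.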

---

### Types of LLMs

1. Encoder-only models (e.g., BERT)

    - Best for understanding text (classification, sentiment analysis, embeddings).

    - ❌ Not good at generating text.

2. Decoder-only models (e.g., GPT, LLaMA, Mistral)

    - Best for text generation (chatbots, writing, summarization).

    - This is the type we use in chatbots.

3. Encoder-decoder models (e.g., T5, BART)

    - Good at transforming text (translation, summarization, Q&A).

### Must-Knows about LLMs

- They don’t “think” like humans → They predict text based on training.

- Garbage in → garbage out: Poor prompts = poor answers.

- Token limits: Models can only “see” a limited number of tokens (word pieces) at a time, known as the context window.

- Biases: Trained on internet text → may reflect biases/errors.
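
The token limit point can be sketched in a few lines: keep only the most recent messages that fit inside a budget. The "about 4 characters per token" rule used here is a rough assumption for English text, not how any real tokenizer works, and `estimate_tokens` and `fit_to_budget` are hypothetical helper names.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def fit_to_budget(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept = []
    used = 0
    # Walk from newest to oldest so recent context survives
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    # Restore chronological order
    return list(reversed(kept))


history = ["a" * 40, "b" * 40, "c" * 40]  # roughly 10 tokens each
trimmed = fit_to_budget(history, max_tokens=25)
print(len(trimmed))  # 2 (the oldest message is dropped)
```

For real projects, the model provider's own tokenizer gives the exact count; this sketch only shows why older messages have to be dropped.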

### 💡 **Quick Questions**:

1. Why might a chatbot built on BERT (encoder-only) struggle to answer open-ended questions?

- Answer 👉 BERT is mainly designed to understand text, not to generate it. So, while it’s good at finding meaning in sentences, it struggles to create long, open-ended answers like a conversation would need.

---

## 🌟 Speech-to-Text (STT) 🌟

---

### ❗ **Question 2**: What is STT?

👉 It listens to your voice and turns it into written text.

- Converts **audio → text**.
- Enables voice input for conversational AI.
- Think of it as the **ears** of the chatbot.

**Popular STT Models**:

1) **Whisper (OpenAI)** – strong at multilingual speech recognition.
2) **Google Speech-to-Text API** – widely used, real-time transcription.
3) **Vosk** – lightweight, offline speech recognition.

**Common Usages**

1) Voice assistants (Alexa, Siri, Google Assistant).
2) Automated captions in meetings or lectures.
3) Voice-enabled customer support.

---

### Must-Knows about STT

- Accuracy depends on **noise, accents, clarity of speech**.

- Some models need an **internet connection** (API-based); others run **offline**.

- Preprocessing audio (noise reduction) improves results.
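
As a small illustration of audio preprocessing, the sketch below applies a crude noise gate (silencing very quiet samples) and then peak normalization to a list of float samples. Real noise reduction usually relies on more advanced signal processing; the threshold value and the function names here are assumptions chosen only for the example.

```python
def noise_gate(samples: list[float], threshold: float = 0.05) -> list[float]:
    """Replace samples quieter than `threshold` with silence."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def normalize_peak(samples: list[float], target: float = 1.0) -> list[float]:
    """Scale samples so the loudest one reaches `target`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all silence, nothing to scale
    scale = target / peak
    return [s * scale for s in samples]


# Made-up samples: small values stand in for background noise
audio = [0.01, -0.02, 0.25, -0.5, 0.03]
cleaned = normalize_peak(noise_gate(audio))
print(cleaned)  # [0.0, 0.0, 0.5, -1.0, 0.0]
```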


### 💡 **Quick Questions**:

2. Why do you think meeting transcription apps like Zoom or Google Meet struggle when multiple people talk at once?

- Answer 👉 When people talk at once, their voices are mixed into a single audio signal. The software then has a hard time separating the words and working out which speaker said what.

---

## 🌟 Text-to-Speech (TTS) 🌟

---

### ❗ **Question 3**: What is TTS?

👉 It takes written text and speaks it out loud in a human-like voice.

- Converts **text → audio (speech)**.
- Think of it as the **mouth** of the chatbot.
- Makes AI “speak” naturally.

**Popular TTS Models**:

1) **Google TTS** – supports many languages and voices.
2) **Amazon Polly** – lifelike voice synthesis with customization.
3) **ElevenLabs** – cutting-edge, realistic voice cloning.

**Common Usages**

1) Screen readers for visually impaired users.
2) AI chatbots with voice output.
3) Audiobooks or podcast generation.

---

### Must-Knows about TTS

- Some voices sound robotic; others use **neural TTS** for natural tones.

- Latency matters → If too slow, conversation feels unnatural.

- Some TTS services allow **custom voices**.
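
Since latency matters, it helps to measure it. Here is a minimal sketch of timing a TTS call with `time.perf_counter`. The `fake_synthesize` function is a hypothetical stand-in that just sleeps and returns dummy bytes; in practice you would swap in the call to whichever TTS service you use.

```python
import time


def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a real TTS call; returns dummy audio bytes."""
    time.sleep(0.01)  # pretend the service takes a moment
    return b"\x00" * len(text)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


audio, seconds = timed(fake_synthesize, "Hello, learner!")
print(f"Synthesized {len(audio)} bytes in {seconds * 1000:.1f} ms")
```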

### 💡 **Quick Questions**:

3. If you were designing a voice-based AI tutor, what qualities would you want in its TTS voice (tone, speed, clarity, etc.)?

- Answer 👉 The voice should sound clear, friendly, and natural. It should also speak at a comfortable speed so learners can easily follow along.

---

## 🌟 Using APIs for LLMs with Groq 🌟

In [None]:
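import os

from groq import Groq

# Read the key from an environment variable instead of hardcoding it
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))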

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Hello! What is conversational AI?"}]
)

print(response.choices[0].message.content)


Conversational AI refers to the technology that enables computers or digital systems to simulate human-like conversations with humans. This is achieved through the use of natural language processing (NLP) and machine learning algorithms that allow AI systems to understand, interpret, and respond to human input in a way that feels natural and intuitive.

Conversational AI can take many forms, including:

1. **Chatbots**: These are AI-powered software programs that can engage in text-based conversations with humans, often used to provide customer support, answer frequently asked questions, or facilitate transactions.
2. **Virtual assistants**: These are AI-powered digital assistants, such as Siri, Google Assistant, or Alexa, that can understand voice commands and respond with relevant information or actions.
3. **Voice-controlled interfaces**: These are AI-powered interfaces that allow users to interact with devices using voice commands, such as smart speakers or home automation systems.
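
Under the hood, a chat request is just JSON going out and JSON coming back. The sketch below builds the request body and pulls the reply out of a response, without calling the network. The `sample_response` is hand-written to mirror the `choices[0].message.content` shape used in the cell above; a real response contains more fields, and `build_payload` and `extract_reply` are hypothetical helper names.

```python
import json


def build_payload(model: str, user_text: str) -> dict:
    """Build the JSON body for a single-turn chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }


def extract_reply(response_body: dict) -> str:
    """Pull the assistant's text out of a chat completion response."""
    return response_body["choices"][0]["message"]["content"]


payload = build_payload("llama-3.1-8b-instant", "Hello!")
print(json.dumps(payload, indent=2))

# Hand-written example response (not real API output)
sample_response = json.loads(
    '{"choices": [{"message": {"role": "assistant", "content": "Hi there!"}}]}'
)
print(extract_reply(sample_response))  # Hi there!
```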

---

## 🌟 Assignments 🌟

### 📝 Assignment 1: LLM Understanding

* Write a short note (3–4 sentences) explaining the difference between **encoder-only, decoder-only, and encoder-decoder LLMs**.
* Give one example usage of each.


**Answer:**

- An **encoder-only** model is mainly used to understand text. It reads whole sentences and builds a representation of their meaning, but it is not designed to generate new text. E.g., **BERT** is used for sentiment analysis.

- A **decoder-only** model is mainly used to generate text. It predicts the next word from the text so far, so it can write full answers, stories, or code. E.g., **GPT** is used in chatbots.

- An **encoder-decoder** model does both together. The encoder understands the input, and the decoder uses that understanding to produce new text, which makes it useful for tasks like translation and summarization. E.g., **T5** is used in machine translation and summarization.

### 📝 Assignment 2: STT/TTS Exploration

* Find **one STT model** and **one TTS model** (other than Whisper/Google).
* Write down:

  * What it does.
  * One possible application.

**Answer:**

**STT Model – Vosk**
- It converts speech into text and works offline without needing the internet.
- One application is creating real-time transcription apps for classrooms or meetings.

**TTS Model – Tacotron 2**
- It converts written text into natural-sounding human speech.
- One application is building voice assistants that can read messages aloud.

### 📝 Assignment 3: Build a Chatbot with Memory

* Write a Python program that:

  * Takes user input in a loop.
  * Sends it to Groq API.
  * Stores the last 5 messages in memory.
  * Ends when user types `"quit"`.

In [2]:
import requests

# Replace with your actual API key
API_KEY = "your_groq_api_key"
API_URL = "https://api.groq.com/openai/v1/chat/completions"

# To store last 5 messages
memory = []

while True:
    user_input = input("You: ")

    # Exit condition
    if user_input.lower() == "quit":
        print("Chatbot: Goodbye 👋")
        break

    # Add user input to memory
    memory.append({"role": "user", "content": user_input})

    # Keep only last 5 messages
    if len(memory) > 5:
        memory = memory[-5:]

    # Prepare API request
    headers = {"Authorization": f"Bearer {API_KEY}"}
    payload = {
        "model": "llama-3.1-8b-instant",   # same Groq model as the earlier example
        "messages": memory
    }

    # Call Groq API (fail fast on HTTP errors instead of a confusing KeyError)
    response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    response.raise_for_status()
    bot_reply = response.json()["choices"][0]["message"]["content"]

    # Print bot reply
    print("Chatbot:", bot_reply)

    # Save bot reply in memory
    memory.append({"role": "assistant", "content": bot_reply})

    # Keep only last 5 messages again
    if len(memory) > 5:
        memory = memory[-5:]


You: hi can you please tell me about the AI
Chatbot: Artificial Intelligence (AI)!

AI is a rapidly evolving field that involves creating machines that can perform tasks that typically require human intelligence, such as:

1. **Learning**: AI systems can learn from data, adapt to new situations, and improve their performance over time.
2. **Recognition**: AI can recognize patterns, images, speech, and text, and make decisions based on that recognition.
3. **Reasoning**: AI systems can reason and draw conclusions based on given data and rules.

There are many types of AI, including:

1. **Machine Learning** (ML): AI that can learn from data without being explicitly programmed.
2. **Deep Learning** (DL): A type of machine learning that uses neural networks to analyze data.
3. **Natural Language Processing** (NLP): AI that can understand, generate, and process human language.
4. **Robotics**: AI that controls robots and enables them to perform tasks.
5. **Expert Systems**: AI that mimics 
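
The "keep only the last 5 messages" step in the program above could also be sketched with `collections.deque`, which drops the oldest item automatically once it is full. This is one possible alternative, not a required change; the message contents below are made up.

```python
from collections import deque

# A deque with maxlen=5 silently discards the oldest entry when a sixth is added
memory = deque(maxlen=5)

for i in range(1, 8):
    memory.append({"role": "user", "content": f"message {i}"})

# Convert to a plain list before sending it as the "messages" field
messages = list(memory)
print([m["content"] for m in messages])
# ['message 3', 'message 4', 'message 5', 'message 6', 'message 7']
```

With this approach the two `if len(memory) > 5` checks are no longer needed, because the size limit is enforced by the data structure itself.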

### 📝 Assignment 4: Preprocessing Function

* Write a function to clean user input:

  * Lowercase text.
  * Remove punctuation.
  * Strip extra spaces.

Test with: `"  HELLo!!!  How ARE you?? "`


In [3]:
import string

def clean_text(text: str) -> str:
    # Lowercase
    text = text.lower()
    # Remove punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove extra spaces
    text = " ".join(text.split())
    return text

# Test
sample = " HELLo!!! How ARE you?? "
print("Before:", sample)
print("After:", clean_text(sample))


Before:  HELLo!!! How ARE you?? 
After: hello how are you
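
One thing to keep in mind: `string.punctuation` only covers ASCII characters, so curly quotes or an ellipsis character would survive the cleaning above. A possible variant, sketched here under the assumption that you want to drop every character Unicode classifies as punctuation, uses the standard `unicodedata` module. The name `clean_text_unicode` is made up for this example.

```python
import unicodedata


def clean_text_unicode(text: str) -> str:
    """Lowercase, drop Unicode punctuation, and collapse extra spaces."""
    text = text.lower()
    # Unicode general categories starting with "P" are punctuation
    text = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    return " ".join(text.split())


print(clean_text_unicode("  “HELLo”… How ARE you¿? "))  # hello how are you
```

Note that this variant keeps symbols such as `$` or `+`, which Unicode classifies as symbols rather than punctuation, so the two functions are not exact replacements for each other.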


### 📝 Assignment 5: Text Preprocessing

* Write a function that:

    * Converts text to lowercase.
    * Removes punctuation & numbers.
    * Removes stopwords (`the, is, and...`).
    * Applies stemming or lemmatization.
    * Removes words shorter than 3 characters.
    * Keeps only nouns, verbs, and adjectives (using POS tagging).

In [4]:
import nltk
import string
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag, word_tokenize

In [13]:
# Download necessary resources (run once)
# Core downloads
nltk.download("punkt")
nltk.download("punkt_tab")
nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("omw-1.4")

# Taggers (new + legacy names, so it works in both environments)
nltk.download("averaged_perceptron_tagger")
nltk.download("averaged_perceptron_tagger_eng")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


True

In [14]:
def preprocess_text(text: str) -> str:
    # Lowercase
    text = text.lower()

    # Remove punctuation & numbers
    text = re.sub(r"[^a-z\s]", "", text)

    # Tokenize
    words = word_tokenize(text)

    # Remove stopwords
    stop_words = set(stopwords.words("english"))
    words = [w for w in words if w not in stop_words]

    # Lemmatizer
    lemmatizer = WordNetLemmatizer()

    # POS tagging
    pos_tags = pos_tag(words)

    cleaned_words = []
    for word, tag in pos_tags:
        # Remove short words (<3 chars)
        if len(word) < 3:
            continue

        # Keep only nouns, verbs, adjectives
        if tag.startswith("N"):
            lemma = lemmatizer.lemmatize(word, "n")  # noun
        elif tag.startswith("V"):
            lemma = lemmatizer.lemmatize(word, "v")  # verb
        elif tag.startswith("J"):
            lemma = lemmatizer.lemmatize(word, "a")  # adjective
        else:
            continue

        cleaned_words.append(lemma)

    return " ".join(cleaned_words)

In [15]:
sample = "The quick brown foxes are jumping quickly over the lazy dogs 123!!!"
print("Before:", sample)
print("After:", preprocess_text(sample))

Before: The quick brown foxes are jumping quickly over the lazy dogs 123!!!
After: quick brown fox jump lazy dog
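
The `if`/`elif` chain in `preprocess_text` maps Penn Treebank tags (like `NNS` or `VBG`) to the one-letter part-of-speech codes passed to the lemmatizer. As a small sketch, the same mapping can be written as a lookup table, which needs no NLTK to try out. The helper name `penn_to_wordnet` is hypothetical.

```python
def penn_to_wordnet(tag: str):
    """Map a Penn Treebank tag to 'n', 'v', or 'a'; return None for other tags."""
    prefix_map = {"N": "n", "V": "v", "J": "a"}
    # Only the first letter matters: NN/NNS -> noun, VB/VBG -> verb, JJ/JJR -> adjective
    return prefix_map.get(tag[:1])


for tag in ["NNS", "VBG", "JJ", "RB"]:
    print(tag, "->", penn_to_wordnet(tag))
# NNS -> n
# VBG -> v
# JJ -> a
# RB -> None
```

Inside the loop, a `None` result would mean "skip this word", which matches the `else: continue` branch above.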




---

### Second Method

In [11]:
!pip install spacy
!python -m spacy download en_core_web_sm


Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [12]:
import spacy

# Load small English model
nlp = spacy.load("en_core_web_sm")

def preprocess_text_spacy(text: str) -> str:
    doc = nlp(text.lower())

    cleaned = []
    for token in doc:
        # Skip stopwords, punctuation, numbers, short words
        if token.is_stop or token.is_punct or token.is_digit or len(token.text) < 3:
            continue

        # Keep only nouns, verbs, adjectives
        if token.pos_ in ["NOUN", "VERB", "ADJ"]:
            cleaned.append(token.lemma_)  # Lemmatized form

    return " ".join(cleaned)

# Test
sample = "The quick brown foxes are jumping quickly over the lazy dogs 123!!!"
print("Before:", sample)
print("After:", preprocess_text_spacy(sample))


Before: The quick brown foxes are jumping quickly over the lazy dogs 123!!!
After: quick brown fox jump lazy dog


### 📝 Assignment 6: Reflection

* Answer in 2–3 sentences:

    * Why is context memory important in chatbots?
    * Why should beginners always check **API limits and pricing**?

**Answer:**

**Context memory** is important in chatbots because it helps them remember previous parts of the conversation, making replies more natural and relevant instead of starting fresh every time.

Beginners should always check **API limits** and pricing to avoid unexpected costs and to design their applications within the allowed usage, ensuring smooth and affordable development.
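
To see why pricing is worth checking, here is a back-of-the-envelope cost sketch. The prices below are HYPOTHETICAL placeholders, not real Groq prices; always look up the current numbers on your provider's pricing page. The function name `estimate_cost` is also made up.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    price_in_per_million: float,
    price_out_per_million: float,
) -> float:
    """Estimate cost in dollars from token counts and per-million-token prices."""
    total = (
        input_tokens * price_in_per_million
        + output_tokens * price_out_per_million
    )
    return total / 1_000_000


# HYPOTHETICAL prices in dollars per million tokens (placeholders only)
cost = estimate_cost(
    input_tokens=200_000,
    output_tokens=50_000,
    price_in_per_million=0.10,
    price_out_per_million=0.20,
)
print(f"Estimated cost: ${cost:.4f}")
```

Running a quick estimate like this before a big experiment makes it much easier to stay within free-tier limits or a small budget.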

---

