# **Buckle Up! We Are Starting Our Week 2 Roller Coaster**

In our first week, we covered some theoretical concepts and completed our setup, so it's time to start building!

## 📓**Conversational AI Concepts & Model Pipelines**

🎯 By the end of this week, you will:

- Understand LLMs, STT, TTS models and their roles.

- Know how to connect to LLMs with APIs (Groq as example).

- Use Python (requests + JSON) for API interaction.

- Start building a basic chatbot with memory and preprocessing.

---

## 🌟 Large Language Models (LLMs) 🌟

---

### ❗ **Question 1**: What is an LLM?

👉 It’s like a super-smart text predictor that can read, understand, and generate human-like sentences.

You give it some words → it guesses the next words in a way that makes sense.

For example:

1) You ask a question → it gives you an answer.

2) You write a sentence → it can complete it.

3) You give it a topic → it can write an essay, code, or even a story.

So, it's a type of AI trained on huge amounts of text data to generate or understand text.
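To make "guessing the next words" concrete, here is a toy sketch. It is not how a real LLM works internally (real models use neural networks trained on vastly more text); it only counts which word most often follows another in a tiny made-up corpus. The function names and the corpus are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Tiny made-up corpus, for illustration only.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" most often here
print(predict_next(model, "on"))   # "the" is the only follower of "on"
```

The takeaway is the idea, not the technique: the model has no understanding of cats or sofas, it only reproduces patterns from its training text.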

---

### Types of LLMs

1. Encoder-only models (e.g., BERT)

    - Best for understanding text (classification, sentiment analysis, embeddings).

    - ❌ Not good at generating text.

2. Decoder-only models (e.g., GPT, LLaMA, Mistral)

    - Best for text generation (chatbots, writing, summarization).

    - What we use in chatbots.

3. Encoder-decoder models (e.g., T5, BART)

    - Good at transforming text (translation, summarization, Q&A).

### Must-Knows about LLMs

- They don’t “think” like humans → They predict text based on training.

- Garbage in → garbage out: Poor prompts = poor answers.

- Token limits: Models can only “see” a certain number of tokens (words or pieces of words) at a time. This window is shared by your prompt, the chat history, and the reply.

- Biases: Trained on internet text → may reflect biases/errors.
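One practical consequence of token limits is that long chat histories have to be trimmed before they are sent to the model. The sketch below keeps only the most recent messages that fit a budget. It assumes a very rough stand-in for a tokenizer (one token per whitespace-separated word); real tokenizers split text differently, so treat the counts as approximate. `rough_token_count` and `fit_to_budget` are hypothetical helper names.

```python
def rough_token_count(text):
    """Rough stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_budget(messages, max_tokens):
    """Keep the most recent messages whose combined rough token count fits max_tokens."""
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = rough_token_count(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "Tell me about large language models"},
    {"role": "assistant", "content": "They predict the next token"},
    {"role": "user", "content": "What is a token limit"},
]

# With a budget of 10 rough tokens, only the two newest messages fit.
for message in fit_to_budget(history, max_tokens=10):
    print(message["role"], "->", message["content"])
```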

### 💡 **Quick Questions**: 

1. Why might a chatbot built on BERT (encoder-only) struggle to answer open-ended questions?

- BERT as an encoder-only model is designed to understand and process text input rather than generate text. It excels at comprehension tasks like classification and sentiment analysis but lacks the generative capabilities needed for open-ended responses. When faced with questions requiring creative or lengthy responses, BERT doesn't have the architecture to effectively predict sequential token outputs that form coherent, contextually-appropriate text like decoder-only models (GPT, LLaMA) can.

---

## 🌟 Speech-to-Text (STT) 🌟

---

### ❗ **Question 2**: What is STT?

👉 It listens to your voice and turns it into written text.

- Converts **audio → text**.
- Enables voice input for conversational AI.
- Think of it as the **ears** of the chatbot.

**Popular STT Models**:

1) **Whisper (OpenAI)** – strong at multilingual speech recognition.
2) **Google Speech-to-Text API** – widely used, real-time transcription.
3) **Vosk** – lightweight, offline speech recognition.

**Common Usages**

1) Voice assistants (Alexa, Siri, Google Assistant).
2) Automated captions in meetings or lectures.
3) Voice-enabled customer support.

---

### Must-Knows about STT

- Accuracy depends on **noise, accents, clarity of speech**.

- Some models need **internet connection** (API-based), others run **offline**.

- Preprocessing audio (noise reduction) improves results.
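To compare how noise, accents, or preprocessing affect accuracy, you need a number to measure. A common choice (not covered above, added here as an illustration) is word error rate: the word-level edit distance between the correct transcript and the model's transcript, divided by the number of words in the correct one. A minimal sketch, assuming simple lowercase whitespace splitting:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, deletions, insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One wrong word out of five reference words.
print(word_error_rate("turn on the kitchen lights", "turn on the chicken lights"))
```

Lower is better: 0.0 means a perfect transcript. Running the same audio through an STT model with and without noise reduction and comparing the two scores is one way to check whether preprocessing actually helped.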


### 💡 **Quick Questions**: 

2. Why do you think meeting transcription apps like Zoom or Google Meet struggle when multiple people talk at once?

- Meeting transcription apps struggle with overlapping speech because STT models are typically trained on single-speaker audio. When multiple people talk simultaneously, the audio contains overlapping frequencies and patterns that confuse the model. The system cannot easily separate distinct voices, determine who said what (speaker diarization), or process the mixed acoustic signals accurately. Additionally, varying volumes and accents compound the problem, making it difficult for the model to produce accurate transcriptions.

---

## 🌟 Text-to-Speech (TTS) 🌟

---

### ❗ **Question 3**: What is TTS?

👉 It takes written text and speaks it out loud in a human-like voice.

- Converts **text → audio (speech)**.
- Think of it as the **mouth** of the chatbot.
- Makes AI “speak” naturally.

**Popular TTS Models**:

1) **Google TTS** – supports many languages and voices.
2) **Amazon Polly** – lifelike voice synthesis with customization.
3) **ElevenLabs** – cutting-edge, realistic voice cloning.

**Common Usages**

1) Screen readers for visually impaired users.
2) AI chatbots with voice output.
3) Audiobooks or podcast generation.

---

### Must-Knows about TTS

- Some voices sound robotic; others use **neural TTS** for natural tones.

- Latency matters → If too slow, conversation feels unnatural.

- Some TTS services allow **custom voices**.
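Because latency matters, it helps to measure each stage of a voice pipeline (STT, then LLM, then TTS) separately. The sketch below times three stand-in functions with `time.perf_counter`. The `fake_*` functions are hypothetical placeholders, not real services; in a real project you would swap in your actual STT, LLM, and TTS calls.

```python
import time


def timed(stage_name, func, *args):
    """Run func(*args), print how long it took, and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"{stage_name}: {elapsed * 1000:.1f} ms")
    return result, elapsed


# Hypothetical stand-ins for real STT, LLM, and TTS calls.
def fake_stt(audio_bytes):
    return "what is conversational ai"


def fake_llm(text):
    return f"You asked: {text}"


def fake_tts(text):
    time.sleep(0.05)      # pretend synthesis takes about 50 ms
    return b"\x00" * 100  # placeholder audio bytes


def run_pipeline(audio_bytes):
    """Chain STT -> LLM -> TTS and collect the latency of each stage."""
    text, t_stt = timed("STT", fake_stt, audio_bytes)
    reply, t_llm = timed("LLM", fake_llm, text)
    audio, t_tts = timed("TTS", fake_tts, reply)
    return audio, {"stt": t_stt, "llm": t_llm, "tts": t_tts}


audio, timings = run_pipeline(b"fake-audio")
print(f"total: {sum(timings.values()) * 1000:.1f} ms")
```

Per-stage numbers show where the delay comes from, which is more useful than a single total when a conversation feels sluggish.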

### 💡 **Quick Questions**: 

3. If you were designing a voice-based AI tutor, what qualities would you want in its TTS voice (tone, speed, clarity, etc.)?

- For an AI tutor, ideal TTS voice qualities would include:
    - Clear pronunciation and articulation for educational content
    - Natural, warm tone that feels engaging but professional
    - Appropriate pacing (not too fast for complex topics, not too slow to bore students)
    - Expressive intonation to emphasize important points and maintain interest
    - Ability to vary speaking style based on content (explanatory vs. encouraging)
    - Consistent volume levels and natural-sounding pauses
    - Accent that's widely understandable for the target audience

---

## 🌟 Using APIs for LLMs with Groq 🌟

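```python
import os

from groq import Groq

# Read the key from the environment instead of leaving it empty or hard-coding it.
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Hello! What is conversational AI?"}],
)

# The reply text lives in the first choice's message.
print(response.choices[0].message.content)
```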


Conversational AI, also known as conversational artificial intelligence or chatbots, is a type of artificial intelligence (AI) that enables computers to understand and respond to human language inputs in a natural and conversational way. It's designed to mimic human-like conversations, using natural language processing (NLP) and machine learning (ML) algorithms.

Conversational AI systems can engage in conversation with humans through various interfaces, such as text chat windows, voice assistants, messaging apps, or even customer service chatbots. They can understand and respond to a wide range of user queries, from simple questions to more complex requests.

Some common applications of conversational AI include:

1. **Customer service and support**: Chatbots can help resolve customer complaints, answer frequently asked questions, and provide basic support.
2. **Virtual assistants**: Conversational AI powers virtual assistants like Siri, Alexa, and Google Assistant, which can perform 
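This week's goals also mention plain `requests` + JSON. The same request as the SDK cell above can be sketched without the `groq` package. This assumes Groq's OpenAI-compatible chat completions endpoint and a `GROQ_API_KEY` environment variable; check the current Groq documentation before relying on the URL or the response shape. `build_chat_request` and `ask_groq` are hypothetical helper names.

```python
import json
import os

# Assumed OpenAI-compatible endpoint; verify against the Groq docs.
GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"


def build_chat_request(api_key, user_text, model="llama-3.1-8b-instant"):
    """Return (headers, body) for a single-message chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    })
    return headers, body


def ask_groq(user_text):
    """Send the request and return the reply text (needs network access and a valid key)."""
    import requests  # imported here so build_chat_request works without the package

    api_key = os.environ["GROQ_API_KEY"]  # assumed environment variable name
    headers, body = build_chat_request(api_key, user_text)
    response = requests.post(GROQ_CHAT_URL, headers=headers, data=body, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()["choices"][0]["message"]["content"]


# Inspect the JSON that would be sent, without calling the API.
headers, body = build_chat_request("demo-key", "Hello! What is conversational AI?")
print(json.dumps(json.loads(body), indent=2))
```

Seeing the raw JSON makes it clear what the SDK does for you: it builds the same `model` + `messages` payload, adds the authorization header, and parses the response.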

---

## 🌟 Assignments 🌟

### 📝 Assignment 1: LLM Understanding

* Write a short note (3–4 sentences) explaining the difference between **encoder-only, decoder-only, and encoder-decoder LLMs**.
* Give one example usage of each.

Encoder-only models like BERT are designed to understand text by creating rich representations but they can't generate content on their own, which makes them great for text classification or sentiment analysis. Decoder-only models like GPT are built to generate text by predicting what comes next in a sequence and they're what powers most modern chatbots. Encoder-decoder models like T5 combine both approaches by first encoding input text to understand it and then decoding to generate new text, which is really useful when you need to transform content from one form to another.

Example usages:
- Encoder-only (BERT): Analyzing customer reviews to determine if they're positive or negative
- Decoder-only (GPT): Writing a creative story continuation from a prompt
- Encoder-decoder (T5): Summarizing a long article into a few bullet points

### 📝 Assignment 2: STT/TTS Exploration

* Find **one STT model** and **one TTS model** (other than Whisper/Google).
* Write down:

  * What it does.
  * One possible application.

STT Model: Mozilla DeepSpeech
- What it does: DeepSpeech is an open-source speech-to-text engine that uses a model trained by machine learning techniques based on Baidu's Deep Speech research. It's designed to run on-device rather than sending audio to cloud servers, which is great for privacy.
- Possible application: I could use it in a voice-controlled smart home system that works offline, so people's voice commands for turning lights on/off or adjusting the thermostat aren't sent to external servers.

TTS Model: Coqui TTS
- What it does: Coqui is an open-source deep learning toolkit for text-to-speech with pretrained models. It supports multiple languages and can generate fairly natural-sounding speech without requiring cloud services.
- Possible application: I could build an accessibility app that reads website content aloud for visually impaired users while maintaining their privacy since all processing happens locally on their device.

### 📝 Assignment 3: Build a Chatbot with Memory

* Write a Python program that:

  * Takes user input in a loop.
  * Sends it to Groq API.
  * Stores the last 5 messages in memory.
  * Ends when user types `"quit"`.

```python
import os

from groq import Groq

# Read the key from the environment instead of hard-coding it in the notebook.
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))


def chatbot_with_memory():
    memory = []      # user and assistant messages, oldest first
    memory_size = 5  # number of most recent messages sent with each request
    print("Welcome to my chatbot! Type 'quit' to exit.")
    while True:
        user_input = input("You: ")
        if user_input.lower() == "quit":
            print("Thanks for chatting! Goodbye!")
            break
        memory.append({"role": "user", "content": user_input})

        # Always start with the system prompt, then add the last few messages.
        messages = [
            {"role": "system", "content": "You are a helpful assistant."}
        ]
        messages.extend(memory[-memory_size:])

        try:
            response = client.chat.completions.create(
                model="llama-3.1-8b-instant",
                messages=messages,
            )
            bot_response = response.choices[0].message.content
            print(f"Bot: {bot_response}")
            memory.append({"role": "assistant", "content": bot_response})

            # Cap stored history so the list does not grow without bound.
            if len(memory) > memory_size * 2:
                memory = memory[-(memory_size * 2):]

        except Exception as e:
            print(f"Oops! Something went wrong: {e}")


if __name__ == "__main__":
    chatbot_with_memory()
```

Welcome to my chatbot! Type 'quit' to exit.
Bot: Quantum computing is a revolutionary technology that uses the principles of quantum mechanics to perform computations that are beyond the capabilities of classical computers. It offers a new approach to processing information, one that leverages the unique properties of quantum systems to solve complex problems exponentially faster than classical computers.

**Key Principles:**

1. **Quantum Bits (Qubits):** Quantum computers use quantum bits, or qubits, which can exist in multiple states simultaneously, unlike classical bits which can only be 0 or 1. This property, known as superposition, allows qubits to process multiple possibilities at the same time.
2. **Entanglement:** Quantum computers use entanglement, a phenomenon where two or more qubits become connected in such a way that the state of one qubit affects the state of the others, even when separated by large distances.
3. **Quantum Tunneling:** Quantum computers use quantum tunne
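The "last 5 messages" rule can also be expressed with `collections.deque`, which drops the oldest item automatically once `maxlen` is reached. This is an alternative sketch, not the approach used in the cell above; `make_memory` and `build_messages` are hypothetical helper names, and no API call is made.

```python
from collections import deque


def make_memory(max_messages=5):
    """Create a memory that silently discards the oldest message when full."""
    return deque(maxlen=max_messages)


def build_messages(memory, system_prompt="You are a helpful assistant."):
    """Prepend the system prompt to whatever is currently remembered."""
    return [{"role": "system", "content": system_prompt}, *memory]


memory = make_memory(5)
for i in range(1, 8):
    memory.append({"role": "user", "content": f"message {i}"})

messages = build_messages(memory)
print(len(messages))           # system prompt + 5 remembered messages
print(messages[1]["content"])  # oldest message still remembered
```

With a `deque`, there is no separate trimming step to forget, at the cost of storing exactly as many messages as you send.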

### 📝 Assignment 4: Preprocessing Function

* Write a function to clean user input:

  * Lowercase text.
  * Remove punctuation.
  * Strip extra spaces.

Test with: `"  HELLo!!!  How ARE you?? "`

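```python
import string


def clean_user_input(text):
    """Lowercase the text, remove ASCII punctuation, and collapse extra spaces."""
    text = text.lower()
    # Translation table that deletes every character in string.punctuation.
    translator = str.maketrans('', '', string.punctuation)
    text = text.translate(translator)
    # split() with no arguments drops leading, trailing, and repeated whitespace.
    text = " ".join(text.split())
    return text


clean_text = clean_user_input("  HELLo!!!  How ARE you?? ")
print(clean_text)
```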


hello how are you
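One limitation worth knowing: `string.punctuation` only contains ASCII characters, so curly quotes and the ellipsis character that users often paste from phones or word processors are left in place. A possible variant, shown as a sketch rather than a required part of the assignment, uses `unicodedata` to drop every character whose Unicode category is punctuation. `clean_unicode_input` is a hypothetical name.

```python
import unicodedata


def clean_unicode_input(text):
    """Lowercase, remove Unicode punctuation, and collapse extra spaces."""
    text = text.lower()
    # Unicode punctuation categories all start with "P" (Po, Pi, Pf, Pd, ...).
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())


print(clean_unicode_input("  HELLo!!!  How ARE you?? "))
print(clean_unicode_input("“Wow”… it’s great"))
```

The trade-off: symbols such as `$` or `+` are classified as symbols rather than punctuation in Unicode, so this variant keeps them, while the `string.punctuation` version removes them. Pick whichever matches the input your chatbot expects.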
