# **Buckle Up! We are starting our Week 2 roller coaster**

In our first week, we covered some theoretical concepts and completed our setup, so it's time to start building!

## 📓**Conversational AI Concepts & Model Pipelines**

🎯 By the end of this week, you will:

- Understand LLMs, STT, TTS models and their roles.

- Know how to connect to LLMs with APIs (Groq as example).

- Use Python (requests + JSON) for API interaction.

- Start building a basic chatbot with memory and preprocessing.

---

## 🌟 Large Language Models (LLMs) 🌟

---

### ❗ **Question 1**: What is an LLM?

👉 It’s like a super-smart text predictor that can read, understand, and generate human-like sentences.

You give it some words → it guesses the next words in a way that makes sense.

For example:

1) You ask a question → it gives you an answer.

2) You write a sentence → it can complete it.

3) You give it a topic → it can write an essay, code, or even a story.

So, it's a type of AI trained on huge amounts of text data to generate or understand text.
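
To make the "text predictor" idea concrete, here is a toy sketch that counts which word follows which in a tiny corpus and then guesses the most frequent follower. It is only an illustration: real LLMs use neural networks over tokens rather than word counts, and the names `train_bigram_model` and `predict_next` are made up for this example.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Tiny made-up corpus, just enough to show the counting idea.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigram_model(corpus)

print(predict_next(model, "the"))      # "cat" follows "the" most often in this corpus
print(predict_next(model, "sat"))      # "on" is the only follower of "sat"
print(predict_next(model, "unicorn"))  # None: the word never appeared
```

The prediction is only as good as the text the model has seen, which is the same reason LLMs need huge training datasets.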

---

### Types of LLMs

1. Encoder-only models (e.g., BERT)

    - Best for understanding text (classification, sentiment analysis, embeddings).

    - ❌ Not good at generating text.

2. Decoder-only models (e.g., GPT, LLaMA, Mistral)

    - Best for text generation (chatbots, writing, summarization).

    - What we use in chatbots.

3. Encoder-decoder models (e.g., T5, BART)

    - Good at transforming text (translation, summarization, Q&A).

### Must-Knows about LLMs

- They don’t “think” like humans → They predict text based on training.

- Garbage in → garbage out: Poor prompts = poor answers.

- Token limits: Models can only “see” a certain number of tokens (pieces of words) at a time. This limit is called the context window.

- Biases: Trained on internet text → may reflect biases/errors.
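
The token limit above is the reason chatbots trim old messages. Here is a minimal sketch of that idea, assuming a very rough "one word = one token" estimate. Real tokenizers split text into word pieces, so actual counts will differ, and `rough_token_count` and `fit_to_budget` are hypothetical helper names.

```python
def rough_token_count(text):
    """Very rough stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_budget(messages, max_tokens):
    """Keep the most recent messages whose combined rough token count fits the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = rough_token_count(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    # Restore chronological order before returning.
    return list(reversed(kept))

history = [
    {"role": "user", "content": "Tell me about speech recognition"},
    {"role": "assistant", "content": "It converts spoken audio into text"},
    {"role": "user", "content": "And text to speech"},
]

trimmed = fit_to_budget(history, max_tokens=10)
for message in trimmed:
    print(message["role"], "->", message["content"])
```

With a budget of 10, the oldest message is dropped and the two most recent ones are kept.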

### 💡 **Quick Questions**:

1. Why might a chatbot built on BERT (encoder-only) struggle to answer open-ended questions?

- Answer 👉 Because BERT is encoder-only, it's built to understand and classify text, not to generate it, so it struggles as a chatbot.

---

## 🌟 Speech-to-Text (STT) 🌟

---

### ❗ **Question 2**: What is STT?

👉 It listens to your voice and turns it into written text.

- Converts **audio → text**.
- Enables voice input for conversational AI.
- Think of it as the **ears** of the chatbot.

**Popular STT Models**:

1) **Whisper (OpenAI)** – strong at multilingual speech recognition.
2) **Google Speech-to-Text API** – widely used, real-time transcription.
3) **Vosk** – lightweight, offline speech recognition.

**Common Usages**

1) Voice assistants (Alexa, Siri, Google Assistant).
2) Automated captions in meetings or lectures.
3) Voice-enabled customer support.

---

### Must-Knows about STT

- Accuracy depends on **noise, accents, clarity of speech**.

- Some models need **internet connection** (API-based), others run **offline**.

- Preprocessing audio (noise reduction) improves results.


### 💡 **Quick Questions**:

2. Why do you think meeting transcription apps like Zoom or Google Meet struggle when multiple people talk at once?

- Answer 👉 When people talk simultaneously, the AI struggles to differentiate between voices. Background noise and different accents make it even harder to transcribe.

---

## 🌟 Text-to-Speech (TTS) 🌟

---

### ❗ **Question 3**: What is TTS?

👉 It takes written text and speaks it out loud in a human-like voice.

- Converts **text → audio (speech)**.
- Think of it as the **mouth** of the chatbot.
- Makes AI “speak” naturally.

**Popular TTS Models**:

1) **Google TTS** – supports many languages and voices.
2) **Amazon Polly** – lifelike voice synthesis with customization.
3) **ElevenLabs** – cutting-edge, realistic voice cloning.

**Common Usages**

1) Screen readers for visually impaired users.
2) AI chatbots with voice output.
3) Audiobooks or podcast generation.

---

### Must-Knows about TTS

- Some voices sound robotic; others use **neural TTS** for natural tones.

- Latency matters → If too slow, conversation feels unnatural.

- Some TTS services allow **custom voices**.
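
Putting the "ears", the LLM, and the "mouth" together, one voice turn can be sketched as three steps in a row, with a timer around each step because latency matters. This is only a sketch under assumptions: `run_voice_turn` and the three `fake_*` functions are placeholders invented for this example, and in a real app each would call an actual STT, LLM, or TTS service.

```python
import time

def run_voice_turn(audio, transcribe, generate_reply, synthesize):
    """Run one voice turn: audio -> text -> reply text -> reply audio.

    The three callables are placeholders for real STT, LLM, and TTS services.
    Returns the reply audio plus the seconds spent in each stage.
    """
    timings = {}

    start = time.perf_counter()
    user_text = transcribe(audio)            # STT: the "ears"
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    reply_text = generate_reply(user_text)   # LLM: decides what to say
    timings["llm"] = time.perf_counter() - start

    start = time.perf_counter()
    reply_audio = synthesize(reply_text)     # TTS: the "mouth"
    timings["tts"] = time.perf_counter() - start

    return reply_audio, timings

# Stand-in stages so the sketch runs without any external service.
def fake_stt(audio):
    return audio.decode("utf-8")

def fake_llm(text):
    return f"You said: {text}"

def fake_tts(text):
    return text.encode("utf-8")

reply_audio, timings = run_voice_turn(b"hello there", fake_stt, fake_llm, fake_tts)
print(reply_audio)
print(timings)
```

Timing each stage separately makes it easier to see which part of the pipeline is slowing the conversation down.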

### 💡 **Quick Questions**:

3. If you were designing a voice-based AI tutor, what qualities would you want in its TTS voice (tone, speed, clarity, etc.)?

- Answer 👉 I would go for a soft and friendly tone, with clear pronunciation and adjustable speed. I’d also add options for multiple languages so that it is accessible to everyone.

---

## 🌟 Using APIs for LLMs with Groq 🌟

In [12]:
!pip install groq

Collecting groq
  Downloading groq-0.31.0-py3-none-any.whl.metadata (16 kB)
Downloading groq-0.31.0-py3-none-any.whl (131 kB)
Installing collected packages: groq
Successfully installed groq-0.31.0


In [13]:
from google.colab import userdata

GROQ_API_KEY=userdata.get("GROQ_API_KEY")

In [14]:
from groq import Groq

client = Groq(api_key=GROQ_API_KEY)

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Hello! What is conversational AI?"}]
)

print(response.choices[0].message.content)

Conversational AI, also known as conversational interface or chatbots, is a type of artificial intelligence (AI) that enables machines to have interactions with humans through natural language conversations. It uses a combination of natural language processing (NLP), machine learning (ML), and dialog management to understand, interpret, and respond to user inputs.

Conversational AI systems aim to simulate human-like conversations, enabling users to interact with them as they would with a human being. This can take various forms, such as:

1. Chatbots and virtual assistants, like Siri, Alexa, or Google Assistant, that can answer questions, provide information, or perform tasks.
2. Social media interfaces that allow users to interact with companies or services through messaging platforms.
3. Customer service chatbots that help resolve customer inquiries and issues.
4. Voice assistants integrated into home devices, cars, or other smart devices.

Conversational AI relies on the following 
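
The cell above uses the `groq` SDK. Since this week's goals also mention `requests` + JSON, here is a hedged sketch of what the same call might look like as a plain HTTP request. It assumes Groq's OpenAI-compatible endpoint at the URL below and the usual `choices[0].message.content` response shape, so please check the official API reference before relying on it. The helper names (`build_chat_request`, `extract_reply`, `ask_groq`) are made up for this example.

```python
import json

# Assumed endpoint (OpenAI-compatible); verify against the Groq docs.
GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_request(api_key, user_message, model="llama-3.1-8b-instant"):
    """Build the HTTP headers and JSON payload for one chat request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return headers, payload

def extract_reply(response_json):
    """Pull the assistant's text out of the parsed JSON response."""
    return response_json["choices"][0]["message"]["content"]

def ask_groq(api_key, user_message):
    """Send the request over HTTP and return the reply text."""
    import requests  # third-party; imported here so the helpers above work without it

    headers, payload = build_chat_request(api_key, user_message)
    response = requests.post(
        GROQ_CHAT_URL, headers=headers, data=json.dumps(payload), timeout=30
    )
    response.raise_for_status()  # surface HTTP errors such as 401 or 429
    return extract_reply(response.json())

# Inspect the JSON that would be sent, without making a network call.
headers, payload = build_chat_request("YOUR_API_KEY", "Hello! What is conversational AI?")
print(json.dumps(payload, indent=2))
```

In Colab, `ask_groq(GROQ_API_KEY, "Hello!")` would make the real call. The SDK does the same work for you, but seeing the raw payload helps when debugging.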

---

## 🌟 Assignments 🌟

### 📝 Assignment 1: LLM Understanding

* Write a short note (3–4 sentences) explaining the difference between **encoder-only, decoder-only, and encoder-decoder LLMs**.
* Give one example usage of each.


#### **Answer**

Encoder-only models (like BERT) are designed to understand and classify text. They process text bidirectionally but can't generate it, making them useful for tasks like sentiment analysis.

Decoder-only models (like GPT) predict the next word sequentially, making them ideal for text generation. They are useful for chatbots or any type of writing.

Encoder-decoder models (like T5) combine both understanding and generation of text, and are commonly used in translation or summarization.

### 📝 Assignment 2: STT/TTS Exploration

* Find **one STT model** and **one TTS model** (other than Whisper/Google).
* Write down:

  * What it does.
  * One possible application.

#### **Answer**
#### **STT Model: AssemblyAI**

- **What it does:** AssemblyAI is a cloud-based speech-to-text API that transcribes both pre-recorded and streaming audio. It supports features like speaker diarization, sentiment analysis, topic detection, entity detection, automated punctuation, and automatic language detection.
- **Application**: It can be used for automated transcription for meetings or podcasts with clear speaker labels. I have used it in one of my projects.


#### **TTS Model: Tacotron 2 (developed by Google; NVIDIA published a widely used open-source implementation)**

- **What it does:** Tacotron 2 is a neural text-to-speech model that generates natural-sounding human speech from text. It uses a sequence-to-sequence network with attention to predict mel spectrograms, which a vocoder (WaveNet in the original paper) then converts into audio.
- **Application**: It can be used to generate voiceovers for videos or audiobooks, making it valuable for content creators, e-learning platforms, and publishers.

### 📝 Assignment 3: Build a Chatbot with Memory

* Write a Python program that:

  * Takes user input in a loop.
  * Sends it to Groq API.
  * Stores the last 5 messages in memory.
  * Ends when user types `"quit"`.

In [15]:
MAX_MEMORY = 5  # number of most recent messages (user + assistant) to keep
memory = []

chatbot_name = "ChatGroq"

# instructions for the model, sent with every request instead of being glued onto each user message
system_message = {"role": "system", "content": "You are a helpful assistant."}

print(f"{chatbot_name}: A chatbot with Memory - Type 'quit' to exit\n\n")

while True:
    user_input = input("You: ")

    if user_input.lower().strip() == "quit":
        print(f"{chatbot_name}: Goodbye!")
        break

    # add the user message to memory, keeping only the last MAX_MEMORY messages
    memory.append({"role": "user", "content": user_input})
    memory = memory[-MAX_MEMORY:]

    # send the system message plus the remembered conversation to the Groq API
    chat_completion = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[system_message] + memory
    )

    # get the chatbot response
    bot_response = chat_completion.choices[0].message.content
    print(f"{chatbot_name}: ", bot_response)

    # add the bot response to memory and trim again to the last MAX_MEMORY messages
    memory.append({"role": "assistant", "content": bot_response})
    memory = memory[-MAX_MEMORY:]

ChatGroq: A chatbot with Memory - Type 'quit' to exit


You: How will you define Agentic AI?
ChatGroq:  Agentic AI, also known as Autonomous or Goal-Driven AI, refers to a type of artificial intelligence that possesses the ability to act independently, make decisions, and take actions to achieve its goals or objectives. This form of AI is often described as "agent-like" because it can perceive its environment, reason about its goals, and act accordingly to achieve those goals.

Agentic AI is typically characterized by the following features:

1. **Autonomy**: Agentic AI can operate independently, making decisions without human intervention.
2. **Goal-Oriented**: Agentic AI has specific objectives or goals that it strives to achieve.
3. **Self-Direction**: Agentic AI can adjust its actions based on the goals it has set.
4. **Learning**: Agentic AI can learn from its experiences, adapting to changing environments and refining its decision-making processes.
5. **Intentionality**: Agentic 

In [20]:
import json

print("Chatbot Memory:")
for i, message in enumerate(memory):
    print(f"Message {i+1}:")
    print(f"  Role: {message['role']}")
    print(f"  Content: {json.dumps(message['content'], indent=2)}")

Chatbot Memory:
Message 1:
  Role: assistant
  Content: "The question of whether AI will replace humans is a complex and multifaceted one. Some experts predict that AI will automate many tasks, but it's unlikely to completely replace humans across the board. Here's a balanced perspective:\n\n**Arguments for potential automation and replacement:**\n\n1. **Task-oriented automation**: AI can automate tasks that are repetitive, rule-based, and follow a predictable pattern. This includes jobs where tasks are well-defined, such as:\n\t* Data entry and processing\n\t* Bookkeeping and accounting\n\t* Manufacturing and assembly line work\n\t* Customer service (e.g., answering frequently asked questions)\n2. **Augmentation**: AI will enhance human capabilities, freeing us up to focus on creative, strategic, and high-value tasks. For example:\n\t* Virtual assistants can help with scheduling, email management, and research\n\t* AI-powered tools can assist with content creation, design, and writing
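
As a side note, the "keep only the last 5 messages" behaviour can also be sketched with `collections.deque`, which drops the oldest item automatically once `maxlen` is reached. This is an alternative illustration rather than the code that produced the output above, and `recent_messages` and `remember` are names made up for this example.

```python
from collections import deque

MAX_MESSAGES = 5

# A deque with maxlen discards the oldest entry whenever a new one would overflow it.
recent_messages = deque(maxlen=MAX_MESSAGES)

def remember(role, content):
    """Store one chat message; older messages fall off automatically."""
    recent_messages.append({"role": role, "content": content})

# Simulate four question/answer turns (8 messages in total).
for turn in range(1, 5):
    remember("user", f"question {turn}")
    remember("assistant", f"answer {turn}")

# Chat APIs expect a plain list, so convert before sending.
messages_for_api = list(recent_messages)
print(len(messages_for_api))   # 5
print(messages_for_api[0])     # {'role': 'assistant', 'content': 'answer 2'}
```

Notice that the oldest remembered message is an assistant reply, just like in the memory dump above: trimming by message count can cut a question/answer pair in half.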

### 📝 Assignment 4: Preprocessing Function

* Write a function to clean user input:

  * Lowercase text.
  * Remove punctuation.
  * Strip extra spaces.

Test with: `"  HELLo!!!  How ARE you?? "`


In [16]:
import re

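def preprocess_text(text):
    # lowercase everything
    text = text.lower()
    # remove punctuation (\w treats "_" as a word character, so drop it explicitly)
    text = re.sub(r'[^\w\s]|_', '', text)
    # collapse runs of whitespace into one space and strip both ends
    text = re.sub(r'\s+', ' ', text.strip())
    return text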

text = " HELLo!!!  How ARE you?? "
preprocessed_text = preprocess_text(text)

print(f"Original: '{text}'")

print(f"Preprocessed: '{preprocessed_text}'")

Original: ' HELLo!!!  How ARE you?? '
Preprocessed: 'hello how are you'


### 📝 Assignment 5: Text Preprocessing

* Write a function that:

    * Converts text to lowercase.
    * Removes punctuation & numbers.
    * Removes stopwords (`the, is, and...`).
    * Applies stemming or lemmatization.
    * Removes words shorter than 3 characters.
    * Keeps only nouns, verbs, and adjectives (using POS tagging).

In [18]:
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk import pos_tag, word_tokenize

nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [19]:
def advanced_preprocess(text, use_stemming=False):
  # Lowercase
  text = text.lower()

  # Remove punctuation and numbers
  text = re.sub(r'[^a-zA-Z\s]', '', text)

  # Tokenize
  tokens = word_tokenize(text)

  # Remove stopwords
  stop_words = set(stopwords.words('english'))
  tokens = [word for word in tokens if word not in stop_words]

  # Keep only nouns, verbs, and adjectives (using POS tagging).
  # This runs before stemming/lemmatization so both modes are filtered the same way.
  allowed_pos = {'NN', 'NNS', 'NNP', 'NNPS', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'JJ', 'JJR', 'JJS'}
  tokens = [word for word, pos in pos_tag(tokens) if pos in allowed_pos]

  # Apply stemming or lemmatization
  if use_stemming:
      stemmer = PorterStemmer()
      tokens = [stemmer.stem(word) for word in tokens]
  else:
      lemmatizer = WordNetLemmatizer()
      tokens = [lemmatizer.lemmatize(word) for word in tokens]

  # Remove words shorter than 3 characters
  tokens = [word for word in tokens if len(word) >= 3]

  return ' '.join(tokens)

# Test
text = "The children were running quickly across the fields while their teacher reminded them that being better prepared would help them perform well in future competitions"
l_processed = advanced_preprocess(text)
s_processed = advanced_preprocess(text, use_stemming=True)
print(f"Original: {text}")
print(f"Stemmed Text: {s_processed}")
print(f"Lemmatized Text: {l_processed}")

Original: The children were running quickly across the fields while their teacher reminded them that being better prepared would help them perform well in future competitions
Stemmed Text: children run field remind prepar help perform futur competit
Lemmatized Text: child running field reminded prepared help perform future competition


### 📝 Assignment 6: Reflection

* Answer in 2–3 sentences:

    * Why is context memory important in chatbots?
    * Why should beginners always check **API limits and pricing**?
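
One practical side of the second question: APIs usually cap how many requests you can send in a given time window, and a chatbot loop can hit that cap quickly. A common pattern is to wait and retry with growing pauses (exponential backoff). Below is a minimal sketch of that pattern. `RateLimitError` here is a hypothetical stand-in, since real client libraries raise their own exception types, and `flaky_request` is a fake request used only for the demo.

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises when the API says 'too many requests'."""

def call_with_backoff(make_request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call make_request, pausing base_delay * 1, 2, 4, ... seconds after each rate-limited attempt."""
    for attempt in range(max_attempts):
        try:
            return make_request()
        except RateLimitError:
            # Give up after the final attempt instead of waiting again.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

# Demo: a fake request that is rate limited twice, then succeeds.
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("too many requests")
    return "ok"

waits = []  # record the pauses instead of really sleeping
result = call_with_backoff(flaky_request, sleep=waits.append)
print(result, waits)  # ok [1.0, 2.0]
```

Passing `sleep=waits.append` is only a trick to keep the demo instant. In real use you would leave the default `time.sleep`.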

---

### **Hints:**

1) Stemming:
    - Cuts off word endings to get the “root.”
    - Very mechanical → may produce non-real words.
    - Example:
        - "studies" → "studi"
        - "running" → "run"

2) Lemmatization:
    - Smarter → uses vocabulary + grammar rules.
    - Always gives a real word (the **lemma**).
    - Example:
        - "studies" → "study"
        - "running" → "run" (when the word is tagged as a verb)

3) Part-of-Speech (POS) tagging means labeling each word in a sentence with its grammatical role — like **noun, verb, adjective, adverb, pronoun, etc.**

    - Example:
        - Sentence → *“The cat is sleeping on the mat.”*

    - POS tags →
        - The → Determiner (DT)
        - cat → Noun (NN)
        - is → Verb (VBZ)
        - sleeping → Verb (VBG)
        - on → Preposition (IN)
        - the → Determiner (DT)
        - mat → Noun (NN)

    - **In short:** POS tagging helps machines understand **how words function in a sentence**, which is useful in NLP tasks like machine translation, text classification, and question answering.
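
To see the difference between hints 1 and 2 in code, here is a toy sketch: a crude suffix stripper standing in for stemming, and a tiny lookup table standing in for lemmatization. Both are simplified illustrations written for this note. `toy_stem` is not the Porter algorithm, and `TOY_LEMMAS` is a hand-made dictionary, whereas NLTK's lemmatizer uses WordNet.

```python
def toy_stem(word):
    """Crude suffix stripping; NOT the Porter algorithm, only an illustration."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "i"  # studies -> studi
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo a doubled final consonant: runn -> run
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # fields -> field
    return word

# Hand-made lookup table standing in for a real vocabulary.
TOY_LEMMAS = {"studies": "study", "running": "run", "children": "child"}

def toy_lemmatize(word):
    """Look the word up in the vocabulary; unknown words are returned unchanged."""
    word = word.lower()
    return TOY_LEMMAS.get(word, word)

for word in ["studies", "running", "children"]:
    print(f"{word}: stem={toy_stem(word)}, lemma={toy_lemmatize(word)}")
```

The stemmer happily produces "studi" and cannot handle the irregular "children", while the lookup returns real words but only for entries it knows about.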


---

### ✅ Recap

This week you learned:

* **LLMs**: Types, uses, must-knows.
* **STT & TTS**: How they connect with LLMs.
* **APIs**: Connecting to LLMs with Groq.
* Built your first chatbot foundation.