# **Buckle Up! We are starting our Week 2 roller coaster**

In our first week we covered some theoretical concepts and completed our setup, so it's time to start building!

## 📓**Conversational AI Concepts & Model Pipelines**

🎯 By the end of this week, you will:

- Understand LLMs, STT, TTS models and their roles.

- Know how to connect to LLMs with APIs (Groq as example).

- Use Python (requests + JSON) for API interaction.

- Start building a basic chatbot with memory and preprocessing.

---

## 🌟 Large Language Models (LLMs) 🌟

---

### ❗ **Question 1**: What is an LLM?

👉 It’s like a super-smart text predictor that can read, understand, and generate human-like sentences.

You give it some words → it guesses the next words in a way that makes sense.

For example:

1) You ask a question → it gives you an answer.

2) You write a sentence → it can complete it.

3) You give it a topic → it can write an essay, code, or even a story.

So, it's a type of AI trained on huge amounts of text data to generate or understand text.
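
To make the "text predictor" idea concrete, here is a toy sketch that only counts which word follows which in a tiny made-up corpus. It is not how a real LLM works internally (those use neural networks over tokens), and the names `train_bigrams` and `predict_next` are hypothetical, chosen just for this illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Tiny made-up corpus, only for illustration
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)

print(predict_next(model, "the"))      # most common word after "the"
print(predict_next(model, "sat"))      # most common word after "sat"
print(predict_next(model, "unicorn"))  # unseen word
```

An LLM does the same kind of "what comes next?" guessing, but with far more context than one previous word and far more training text.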

---

### Types of LLMs

1. Encoder-only models (e.g., BERT)

    - Best for understanding text (classification, sentiment analysis, embeddings).

    - ❌ Not good at generating text.

2. Decoder-only models (e.g., GPT, LLaMA, Mistral)

    - Best for text generation (chatbots, writing, summarization).

    - What we use in chatbots.

3. Encoder-decoder models (e.g., T5, BART)

    - Good at transforming text (translation, summarization, Q&A).

### Must-Knows about LLMs

- They don’t “think” like humans → They predict text based on training.

- Garbage in → garbage out: Poor prompts = poor answers.

- Token limits: Models can only “see” a limited number of tokens (words or word pieces) at a time, so very long conversations must be shortened.

- Biases: Trained on internet text → may reflect biases/errors.
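
The token-limit point can be sketched as a small helper that keeps only the most recent messages fitting a budget. The "about 4 characters per token" estimate is a rough assumption, not a real tokenizer, and `estimate_tokens` and `trim_to_budget` are hypothetical names used only for this example.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (an assumption, not a real tokenizer)."""
    return max(1, len(text) // 4)

def trim_to_budget(messages, max_tokens):
    """Keep the most recent messages whose estimated total stays within the budget."""
    kept = []
    total = 0
    # Walk backwards so the newest messages are considered first
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    kept.reverse()  # restore chronological order
    return kept

messages = [
    {"role": "user", "content": "a" * 40},       # about 10 tokens
    {"role": "assistant", "content": "b" * 40},  # about 10 tokens
    {"role": "user", "content": "c" * 20},       # about 5 tokens
]
trimmed = trim_to_budget(messages, max_tokens=15)
print([m["content"][0] for m in trimmed])
```

For exact counts you would use the tokenizer that belongs to the model you are calling; the estimate here is only meant to show the trimming logic.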

### 💡 **Quick Questions**: 

1. Why might a chatbot built on BERT (encoder-only) struggle to answer open-ended questions?

- Answer 👉 BERT is great at understanding text but can’t really generate answers because it’s encoder-only. It can find relevant info but struggles with open-ended questions since it doesn’t have a decoder to produce free-form text.

---

## 🌟 Speech-to-Text (STT) 🌟

---

### ❗ **Question 2**: What is STT?

👉 It listens to your voice and turns it into written text.

- Converts **audio → text**.
- Enables voice input for conversational AI.
- Think of it as the **ears** of the chatbot.

**Popular STT Models**:

1) **Whisper (OpenAI)** – strong at multilingual speech recognition.
2) **Google Speech-to-Text API** – widely used, real-time transcription.
3) **Vosk** – lightweight, offline speech recognition.

**Common Usages**

1) Voice assistants (Alexa, Siri, Google Assistant).
2) Automated captions in meetings or lectures.
3) Voice-enabled customer support.

---

### Must-Knows about STT

- Accuracy depends on **noise, accents, clarity of speech**.

- Some models need **internet connection** (API-based), others run **offline**.

- Preprocessing audio (noise reduction) improves results.


### 💡 **Quick Questions**: 

2. Why do you think meeting transcription apps like Zoom or Google Meet struggle when multiple people talk at once?

- Answer 👉 When multiple people talk at once, overlapping speech makes it hard for the system to separate the voices and work out who said what.

---

## 🌟 Text-to-Speech (TTS) 🌟

---

### ❗ **Question 3**: What is TTS?

👉 It takes written text and speaks it out loud in a human-like voice.

- Converts **text → audio (speech)**.
- Think of it as the **mouth** of the chatbot.
- Makes AI “speak” naturally.

**Popular TTS Models**:

1) **Google TTS** – supports many languages and voices.
2) **Amazon Polly** – lifelike voice synthesis with customization.
3) **ElevenLabs** – cutting-edge, realistic voice cloning.

**Common Usages**

1) Screen readers for visually impaired users.
2) AI chatbots with voice output.
3) Audiobooks or podcast generation.

---

### Must-Knows about TTS

- Some voices sound robotic; others use **neural TTS** for natural tones.

- Latency matters → If too slow, conversation feels unnatural.

- Some TTS services allow **custom voices**.
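
Putting the "ears" (STT), the LLM, and the "mouth" (TTS) together, a voice chatbot is a three-stage pipeline, and the latency of each stage adds up. The sketch below uses placeholder functions that return canned values instead of calling real models; `speech_to_text`, `generate_reply`, `text_to_speech`, and `run_pipeline` are all hypothetical names for illustration.

```python
import time

def speech_to_text(audio_bytes):
    """Placeholder STT: a real system would call Whisper, Vosk, or a cloud API."""
    return "hello there"

def generate_reply(text):
    """Placeholder LLM: a real system would call a chat completion API."""
    return f"You said: {text}"

def text_to_speech(text):
    """Placeholder TTS: returns fake audio bytes instead of real speech."""
    return text.encode("utf-8")

def run_pipeline(audio_bytes):
    """Run STT -> LLM -> TTS and record how long each stage takes."""
    timings = {}

    start = time.perf_counter()
    text = speech_to_text(audio_bytes)
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    reply = generate_reply(text)
    timings["llm"] = time.perf_counter() - start

    start = time.perf_counter()
    audio_out = text_to_speech(reply)
    timings["tts"] = time.perf_counter() - start

    return audio_out, timings

audio_out, timings = run_pipeline(b"\x00\x01")
print(audio_out)
for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.3f} ms")
```

With real models in place of the placeholders, the per-stage timings show where a slow response comes from, which is what the latency point above is about.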

### 💡 **Quick Questions**: 

3. If you were designing a voice-based AI tutor, what qualities would you want in its TTS voice (tone, speed, clarity, etc.)?

- Answer 👉 I’d want a clear, friendly, and engaging tone that feels approachable. The speed should be fast enough to stay efficient but slow enough for learners to follow.

---

## 🌟 Using APIs for LLMs with Groq 🌟

In [None]:
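import os
from groq import Groq

# Read the key from the GROQ_API_KEY environment variable instead of pasting it into the notebook
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

# One user message in, one assistant reply out
response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Hello! What is conversational AI?"}]
)

# The reply text lives in the first choice
print(response.choices[0].message.content)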


Conversational AI refers to the technology that enables computers or digital systems to simulate human-like conversations with humans. This is achieved through the use of natural language processing (NLP) and machine learning algorithms that allow AI systems to understand, interpret, and respond to human input in a way that feels natural and intuitive.

Conversational AI can take many forms, including:

1. **Chatbots**: These are AI-powered software programs that can engage in text-based conversations with humans, often used to provide customer support, answer frequently asked questions, or facilitate transactions.
2. **Virtual assistants**: These are AI-powered digital assistants, such as Siri, Google Assistant, or Alexa, that can understand voice commands and respond with relevant information or actions.
3. **Voice-controlled interfaces**: These are AI-powered interfaces that allow users to interact with devices using voice commands, such as smart speakers or home automation systems.
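
The learning goals mention requests + JSON, while the cell above uses the Groq SDK. Here is a minimal sketch of the same call over plain HTTP. It assumes Groq's OpenAI-compatible endpoint URL and a `GROQ_API_KEY` environment variable; `build_request` and `ask` are hypothetical helper names, and the URL is worth checking against the Groq API reference before relying on it.

```python
import json
import os

# Assumed endpoint (OpenAI-compatible chat completions route)
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(prompt, model="llama-3.1-8b-instant", api_key=None):
    """Build the headers and JSON body for one chat completion request."""
    key = api_key or os.environ.get("GROQ_API_KEY", "")
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

def ask(prompt):
    """Send the request and return the reply text (needs network access and a valid key)."""
    import requests  # third-party package: pip install requests
    headers, payload = build_request(prompt)
    resp = requests.post(GROQ_URL, headers=headers, data=json.dumps(payload), timeout=30)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Inspect the JSON body without sending anything
headers, payload = build_request("Hello! What is conversational AI?", api_key="YOUR_KEY")
print(json.dumps(payload, indent=2))
# reply = ask("Hello! What is conversational AI?")  # uncomment once GROQ_API_KEY is set
```

The payload has the same `model` and `messages` fields that the SDK call takes, which is why the SDK version reads so similarly.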

---

## 🌟 Assignments 🌟

### 📝 Assignment 1: LLM Understanding

* Write a short note (3–4 sentences) explaining the difference between **encoder-only, decoder-only, and encoder-decoder LLMs**.
* Give one example usage of each.

Answer:

Encoder-only models focus on understanding text. They’re great for tasks like classification or extracting answers from a passage. Example: BERT for sentiment analysis.

Decoder-only models generate text based on a prompt, predicting one token at a time. Example: GPT for chat or story generation.

Encoder-decoder models both understand input and generate output, making them ideal for translation or summarization. Example: T5 for text summarization.

### 📝 Assignment 2: STT/TTS Exploration

* Find **one STT model** and **one TTS model** (other than Whisper/Google).
* Write down:

  * What it does.
  * One possible application.

TTS model: ElevenLabs. Produces highly realistic, human-like speech from text. Application: audiobook narration or personalized AI tutors.

STT model: Mozilla DeepSpeech. Converts spoken language into text using a neural network. Application: dictation software or voice-controlled apps.

### 📝 Assignment 3: Build a Chatbot with Memory

* Write a Python program that:

  * Takes user input in a loop.
  * Sends it to Groq API.
  * Stores the last 5 messages in memory.
  * Ends when user types `"quit"`.

In [4]:
import os
from groq import Groq

# Read the key from the environment instead of hard-coding it in the notebook
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

MAX_HISTORY = 10  # 5 user messages + 5 assistant replies
history = []

print("Type message ('quit' to exit):")

while True:
    user_input = input("You: ")
    if user_input.strip().lower() == "quit":
        print("Exiting.")
        break

    # Add the new user message, then keep only the most recent entries
    history.append({"role": "user", "content": user_input})
    history = history[-MAX_HISTORY:]

    # Send the trimmed history so the model sees the recent context
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=history
    )

    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    history = history[-MAX_HISTORY:]

    print("Groq:", reply)

def showHistory():
    """Print the messages currently held in memory."""
    print("\nHistory:\n")
    for msg in history:
        role = "You" if msg["role"] == "user" else "Groq"
        print(f"{role}: {msg['content']}")

Type message ('quit' to exit):


You:  hi


Groq: Hello, how can I assist you today?


You:  who are you


Groq: I'm an artificial intelligence (AI) model called a conversational AI or chatbot. My purpose is to assist and communicate with you through text-based conversations. I'm designed to understand and respond to a wide range of questions and topics.

I don't have a personal identity, emotions, or consciousness like a human being. I'm a software program running on computer servers, trained on a massive dataset of text to learn patterns and relationships.

I can help with:

- Answering questions on various topics, such as history, science, entertainment, and more
- Generating text, like stories or summaries
- Translating languages
- Providing definitions and explanations
- Chatting in a friendly, natural-sounding way (like this conversation!)

Feel free to ask me anything, and I'll do my best to help!


You:  whats my name


Groq: Unfortunately, I don't know your name yet. This is the start of our conversation, and I don't have any prior information about you. If you'd like, you can share your name with me, and I can use it in our conversation.


You:  Rayan


Groq: Nice to meet you, Rayan. It's good to know your name. Now that I have it, I can refer to you personally in our conversation. How's your day going so far?


You:  whats my name


Groq: Your name is Rayan. You just told me!


You:  Tell the full form of GPT


Groq: The full form of GPT is Generative Pre-trained Transformer.


You:  perfect


Groq: Glad I could get it right, Rayan. If you have any more questions or topics you'd like to discuss, feel free to ask!


You:  whats Man City


Groq: Manchester City, commonly referred to as Man City, is an English professional football club based in Manchester, England. They are a prominent team in the Premier League, one of England's top-tier football leagues. The team has undergone significant transformations and has achieved numerous successes in recent years, making them one of the most successful and popular football clubs in the world.


You:  great


Groq: Glad I could provide some info, Rayan. Do you have a favorite team or player?


You:  exit


Groq: It was nice chatting with you, Rayan. Goodbye!


You:  quit


Exiting.


In [5]:
showHistory()


History:

You: Tell the full form of GPT
Groq: The full form of GPT is Generative Pre-trained Transformer.
You: perfect
Groq: Glad I could get it right, Rayan. If you have any more questions or topics you'd like to discuss, feel free to ask!
You: whats Man City
Groq: Manchester City, commonly referred to as Man City, is an English professional football club based in Manchester, England. They are a prominent team in the Premier League, one of England's top-tier football leagues. The team has undergone significant transformations and has achieved numerous successes in recent years, making them one of the most successful and popular football clubs in the world.
You: great
Groq: Glad I could provide some info, Rayan. Do you have a favorite team or player?
You: exit
Groq: It was nice chatting with you, Rayan. Goodbye!
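
The history printed above holds only the last 10 entries, which is the sliding window at work. A possible alternative to slicing the list by hand is `collections.deque` with `maxlen`, which drops the oldest entry automatically. This is a sketch of that option, not the approach used in the cell above; `remember` and `MAX_MESSAGES` are hypothetical names.

```python
from collections import deque

MAX_MESSAGES = 10  # 5 user messages + 5 assistant replies

history = deque(maxlen=MAX_MESSAGES)

def remember(role, content):
    """Append one message; the deque silently drops the oldest entry when full."""
    history.append({"role": role, "content": content})

# Simulate 7 exchanges without calling the API
for i in range(1, 8):
    remember("user", f"question {i}")
    remember("assistant", f"answer {i}")

print(len(history))           # never grows past MAX_MESSAGES
print(history[0]["content"])  # oldest message still in memory

# The chat API expects a plain list, so convert before sending
messages = list(history)
```

Anything that has slid out of the window is no longer sent to the model, so the chatbot can only "remember" what is still inside it.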


### 📝 Assignment 4: Preprocessing Function

* Write a function to clean user input:

  * Lowercase text.
  * Remove punctuation.
  * Strip extra spaces.

Test with: `"  HELLo!!!  How ARE you?? "`


In [7]:
import string

def preprocessing(text):
    text = text.lower()                                               # lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove ASCII punctuation
    text = " ".join(text.split())                                     # collapse extra whitespace
    return text

text = "  HELLo!!!  How ARE you?? "
ans = preprocessing(text)
print(ans)

hello how are you


### 📝 Assignment 5: Text Preprocessing

* Write a function that:

    * Converts text to lowercase.
    * Removes punctuation & numbers.
    * Removes stopwords (`the, is, and...`).
    * Applies stemming or lemmatization.
    * Removes words shorter than 3 characters.
    * Keeps only nouns, verbs, and adjectives (using POS tagging).

In [19]:
import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag
from nltk.corpus import wordnet
# Run these once to fetch the required NLTK data
# (recent NLTK releases may also ask for 'punkt_tab'):
#nltk.download('punkt')
#nltk.download('stopwords')
#nltk.download('wordnet')
#nltk.download('averaged_perceptron_tagger_eng')

lemmatizer = WordNetLemmatizer()
# Build the stopword set once instead of once per token
STOPWORDS = set(stopwords.words('english'))

def getWordnetPOS(treebank_tag):
    """Map a Penn Treebank tag to a WordNet POS, or None for other word classes."""
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    else:
        return None

def preprocessText(text):
    text = text.lower()
    text = re.sub(r'\d+', '', text)                                   # remove numbers
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    tokens = word_tokenize(text)
    # Drop stopwords and words shorter than 3 characters
    tokens = [w for w in tokens if w not in STOPWORDS and len(w) >= 3]
    posTokens = pos_tag(tokens)
    filteredTokens = []
    # Keep only nouns, verbs, and adjectives, lemmatized with their POS
    for word, tag in posTokens:
        wntag = getWordnetPOS(tag)
        if wntag:
            filteredTokens.append(lemmatizer.lemmatize(word, wntag))
    return filteredTokens

text = "Hi! My name is Rayan!"
ans = preprocessText(text)
print(ans)

['name', 'rayan']


### 📝 Assignment 6: Reflection

* Answer in 2–3 sentences:

    * Why is context memory important in chatbots?
    * Why should beginners always check **API limits and pricing**?

Context memory is important in chatbots because it lets them remember previous messages, understand ongoing conversations, and provide relevant responses. Beginners should check API limits and pricing to avoid unexpected costs and rate-limiting issues while experimenting with APIs.

---

### **Hints:**

1) Stemming:
    - Cuts off word endings to get the “root.”
    - Very mechanical → may produce non-real words.
    - Example:
        - "studies" → "studi"
        - "running" → "run"

2) Lemmatization:
    - Smarter → uses vocabulary + grammar rules.
    - Usually returns a real dictionary word (the **lemma**).
    - Example:
        - "studies" → "study"
        - "running" → "run" (when treated as a verb)

3) Part-of-Speech (POS) tagging means labeling each word in a sentence with its grammatical role — like **noun, verb, adjective, adverb, pronoun, etc.**

    - Example:
        - Sentence → *“The cat is sleeping on the mat.”*

    - POS tags →
        - The → Determiner (DT)
        - cat → Noun (NN)
        - is → Verb (VBZ)
        - sleeping → Verb (VBG)
        - on → Preposition (IN)
        - the → Determiner (DT)
        - mat → Noun (NN)

    - **In short:** POS tagging helps machines understand **how words function in a sentence**, which is useful in NLP tasks like machine translation, text classification, and question answering.
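
To see the contrast between hints 1 and 2 in code, here is a deliberately tiny sketch. `toy_stem` applies a few mechanical suffix rules (it is not the Porter stemmer), and `toy_lemmatize` looks words up in a small hand-written dictionary (real lemmatizers rely on resources such as WordNet). Both functions and the `LEMMAS` table are hypothetical and only meant for illustration.

```python
def toy_stem(word):
    """Strip a few common suffixes mechanically; the result may not be a real word."""
    if word.endswith("ies"):
        return word[:-3] + "i"  # studies -> studi
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undouble a repeated final letter: runn -> run
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

# Tiny hand-written lemma dictionary, for illustration only
LEMMAS = {"studies": "study", "running": "run", "better": "good", "mice": "mouse"}

def toy_lemmatize(word):
    """Look the word up in the vocabulary; fall back to the word itself."""
    return LEMMAS.get(word, word)

for w in ["studies", "running", "mice"]:
    print(w, "->", toy_stem(w), "|", toy_lemmatize(w))
```

The rule-based stemmer leaves "mice" untouched and produces the non-word "studi", while the dictionary lookup returns real words, which is the trade-off described in the hints.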


---

### ✅ Recap

This week you learned:

* **LLMs**: Types, uses, must-knows.
* **STT & TTS**: How they connect with LLMs.
* **APIs**: Connecting to LLMs with Groq.
* **Chatbot**: Built the foundation of your first chatbot.