# **Assignment 1: LLM Understanding**
**Q. Write a short note (3–4 sentences) explaining the difference between encoder-only, decoder-only, and encoder-decoder LLMs.
Give one example usage of each.**

- **Encoder-only models** (e.g., BERT) read the whole input at once, attending in both directions, and produce contextual embeddings. They are mainly used for classification, sentiment analysis, and information retrieval.

- **Decoder-only models** (e.g., GPT) generate text one token at a time by predicting the next token from the tokens before it. They are used for chatbots, story writing, and code generation.

- **Encoder-decoder models** (e.g., T5, BART) encode the input with one stack and generate the output with another, making them useful for tasks like machine translation, summarization, and question answering.
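
One concrete way to see the difference is the attention mask. Encoder-only models let every token attend to every other token, while decoder-only models restrict each token to itself and earlier positions. The sketch below builds both masks in plain Python. The function names are illustrative and do not come from any library.

```python
def bidirectional_mask(n: int) -> list[list[bool]]:
    """Encoder-style mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n: int) -> list[list[bool]]:
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Each row is one query position; True marks a position it may attend to.
for row in causal_mask(3):
    print(row)
```

An encoder-decoder model combines the two: its encoder uses the bidirectional pattern, and its decoder uses the causal pattern while also attending to the encoder output.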

# **Assignment 2: STT/TTS Exploration**
**Q. Find one STT model and one TTS model (other than Whisper/Google) and write down:**

**What it does and one possible application.**

**Speech-to-Text (STT) model:**

- Model: Wav2Vec 2.0 (by Meta, open-sourced and available on Hugging Face)

- What it does: Learns speech representations directly from raw audio through self-supervised pretraining and, once fine-tuned on labeled speech, converts spoken language into text.

- Application: Real-time transcription for customer service calls.

**Text-to-Speech (TTS) model:**

- Model: FastSpeech 2 (by Microsoft, available on Hugging Face)

- What it does: Generates high-quality, natural-sounding speech from text. It is non-autoregressive, so inference is faster than in older autoregressive models such as Tacotron 2.

- Application: Audiobook generation with different voices and emotions.

# **Assignment 3: Build a Chatbot with Memory**

**Q. Write a Python program that: Takes user input in a loop, Sends it to Groq API, Stores the last 5 messages in memory, Ends when user types "quit".**

In [2]:
!pip install groq

Collecting groq
  Downloading groq-0.31.0-py3-none-any.whl.metadata (16 kB)
Downloading groq-0.31.0-py3-none-any.whl (131 kB)
Installing collected packages: groq
Successfully installed groq-0.31.0


In [6]:
from groq import Groq
from google.colab import userdata
api_key = userdata.get('GROQ_API_KEY')

In [28]:
class GroqChatClient():
  def __init__(self, model_id = "llama-3.3-70b-versatile", system_message = None, api_key = None):
    if api_key:
      self.client = Groq(api_key = api_key)
    else:
      self.client = Groq()

    # Model used for every request (defaults to the value in the signature).
    self.model_id = model_id

    # Conversation history: user prompts and model responses. It is resent with each
    # request so that the model keeps context across multiple turns.
    self.messages = []

    # The system message defines the LLM's persona.
    # If one is provided, it is stored as the first entry of self.messages.
    if system_message:
      self.messages.append({"role":"system","content": system_message})

  # Format a prompt as a message dictionary.
  def format_prompt(self, prompt, role = "user"):
    return {"role": role, "content": prompt}

  def send_request(self, message, max_tokens = 1024, temperature = 0.5, stream = False, stop = None):
    # Append the new message (a dictionary built by format_prompt) to self.messages
    # so that the stored history is sent with the request and the model can recall past context.
    self.messages.append(message)

    # Keep only the last 5 messages. A system message, if present, is always kept
    # in addition, so the list can then hold 6 entries.
    history_limit = 5

    # Check the stored history itself rather than a global system_message variable.
    if self.messages[0]["role"] == "system":
      self.messages = self.messages[:1] + self.messages[1:][-history_limit:]
    else:
      self.messages = self.messages[-history_limit:]

    # This line makes the actual call to the Groq API.
    chat_completion = self.client.chat.completions.create(
        messages = self.messages, # The trimmed conversation history.
        model = self.model_id,
        max_tokens = max_tokens,
        temperature = temperature, # Controls the randomness of the output. closer to 0 -> more focused output, closer to 1 ->  more creative.
        stream = stream,
        stop = stop,
    )
    if not stream:
      response = {
          "content": chat_completion.choices[0].message.content,
          "finish_reason": chat_completion.choices[0].finish_reason,
          "role": chat_completion.choices[0].message.role,

          "prompt_tokens": chat_completion.usage.prompt_tokens,
          "prompt_time": chat_completion.usage.prompt_time,

          "completion_tokens": chat_completion.usage.completion_tokens,
          "completion_time": chat_completion.usage.completion_time,

          "total_tokens": chat_completion.usage.total_tokens,
          "total_time": chat_completion.usage.total_time,
      }
      self.messages.append(self.format_prompt(prompt = response["content"], role = response["role"]))
      return response
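    # When streaming, the reply is not stored here; the caller collects the chunks
    # and appends the assistant message to self.messages itself.
    return chat_completion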
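
The trimming step inside `send_request` can be isolated as a small pure function, which makes it easy to check without calling the API. This is a minimal sketch: `trim_history` is a hypothetical helper and is not part of the Groq client. It assumes `limit` is at least 1.

```python
def trim_history(messages: list[dict], limit: int = 5) -> list[dict]:
    """Keep an optional leading system message plus the last `limit` other messages.

    Assumes limit >= 1 (a slice of [-0:] would return the whole list).
    """
    if messages and messages[0]["role"] == "system":
        return messages[:1] + messages[1:][-limit:]
    return messages[-limit:]


# One system message followed by seven user messages.
history = [{"role": "system", "content": "Be concise."}]
history += [{"role": "user", "content": f"message {i}"} for i in range(1, 8)]

trimmed = trim_history(history)
print([m["content"] for m in trimmed])
# ['Be concise.', 'message 3', 'message 4', 'message 5', 'message 6', 'message 7']
```

The check looks at the stored history rather than at an outside variable, so the function behaves the same wherever it is called from.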

In [31]:
if __name__ == "__main__":
  system_message = """
  You are a crime fiction book reviewer. Use simple words. Do not answer irrelevant questions. Be concise. The flow of the response must be engaging.
  """.strip().replace("\n", " ")

  client = GroqChatClient(system_message = system_message, api_key = api_key)
  stream_response = True # The response is received in small chunks rather than as a single complete block of text.

  while True:
    user_input = input("Enter your message (or type 'quit'): ")
    if user_input.lower() == 'quit':
      break

    response = client.send_request(
        client.format_prompt(prompt = user_input),
        stream = stream_response)

    # Print the streamed chunks as they arrive and collect them into the full reply.
    message = ''
    for chunk in response:
      content_chunk = chunk.choices[0].delta.content
      if content_chunk:
        print(content_chunk, end = "")
        message += content_chunk
    print() # End the line so that the next prompt starts on a new line.

    # Streamed replies are not stored by send_request, so add the reply to the history here.
    client.messages.append(client.format_prompt(message, 'assistant'))

Enter your message (or type 'quit'): what is the radius of earth?
That's not about crime fiction. I review crime books, not science facts. Want to talk about a crime novel instead?
Enter your message (or type 'quit'): what do you think about the novel point of origin written by patricia cornwell?
Now that's a great topic. "Point of Origin" by Patricia Cornwell is a thrilling novel. It's part of her Kay Scarpetta series. The story is intense, with a complex plot and intriguing characters. Cornwell's writing is detailed and realistic, making it feel like you're part of the investigation. I think it's a must-read for fans of crime fiction. Have you read it?
Enter your message (or type 'quit'): yes i have and i want to know you thoughts on ihe plot
The plot of "Point of Origin" is engaging and suspenseful. It revolves around a series of arson attacks and murders in Virginia. Kay Scarpetta, the protagonist, is tasked with investigating these crimes. As the story unfolds, it becomes clear that the c

# **Assignment 4: Preprocessing Function**

**Q. Write a function to clean user input: Lowercase text, Remove punctuation, Strip extra spaces.
Test with: "  HELLo!!!  How ARE you??"**

In [32]:
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.corpus import wordnet

In [33]:
nltk.download('stopwords')
nltk.download('punkt_tab')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


True

In [34]:
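# Note: this rebinds the name `stopwords` from the NLTK corpus module to a set of words,
# so running this cell a second time without re-importing raises an AttributeError.
stopwords = set(stopwords.words('english'))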

In [35]:
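def preprocess_text(text: str) -> str:
  text = text.lower()                  # lowercase
  text = re.sub(r'[^\w\s]', '', text)  # remove punctuation (keeps letters, digits, underscores)
  text = re.sub(r'\s+', ' ', text)     # collapse runs of whitespace into a single space
  return text.strip()                  # drop leading and trailing spaces

In [36]:
preprocess_text("  HELLo!!!  How ARE you?? ")

'hello how are you'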
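
As a regex-free alternative, the same three steps can be sketched with `str.translate` and `str.split` from the standard library. The name `clean_text` is hypothetical. This version removes only ASCII punctuation (including underscores) and collapses internal runs of whitespace as well as stripping the ends.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    # split() with no arguments splits on any whitespace run and drops empty pieces.
    return " ".join(text.split())


print(repr(clean_text("  HELLo!!!  How ARE you?? ")))
# 'hello how are you'
```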

# **Assignment 5: Text Preprocessing**
**Write a function that:**
- **Converts text to lowercase.**
- **Removes punctuation & numbers.**
- **Removes stopwords (the, is, and...).**
- **Applies stemming or lemmatization.**
- **Removes words shorter than 3 characters.**
- **Keeps only nouns, verbs, and adjectives (using POS tagging).**

In [None]:
# NLTK's pos_tag function returns tags in the Penn Treebank format, whereas WordNetLemmatizer expects a simpler POS code (e.g., 'v' for verb).
# This function converts a Treebank tag to the WordNet format and returns None for any other part of speech.
def get_wordnet_pos(tag):
  if tag.startswith('J'):
    return wordnet.ADJ
  elif tag.startswith('V'):
    return wordnet.VERB
  elif tag.startswith('N'):
    return wordnet.NOUN
  else:
    return None
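
The same mapping can also be written as a lookup table keyed on the first letter of the tag. The sketch below uses the single-letter codes that the WordNet constants resolve to (`'a'`, `'v'`, `'n'`, `'r'`), so it runs without NLTK installed. The names `PENN_PREFIX_TO_WORDNET` and `penn_to_wordnet` are illustrative.

```python
from typing import Optional

# Single-letter WordNet POS codes; these match nltk.corpus.wordnet.ADJ, VERB, NOUN, ADV.
PENN_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Map a Penn Treebank tag (e.g. 'VBD') to a WordNet POS code, or None."""
    return PENN_PREFIX_TO_WORDNET.get(tag[:1])


for tag in ["JJ", "VBD", "NNS", "RB", "DT"]:
    print(tag, "->", penn_to_wordnet(tag))
```

The table includes adverbs (`'r'`) for completeness. Since the assignment keeps only nouns, verbs, and adjectives, a caller would still filter on the returned code.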

In [40]:
def preprocess_text_1(text: str) -> list[str]:
  # convert text to lowercase
  text = text.lower()

  # remove punctuation and numbers
  text = re.sub(r'[^a-zA-Z\s]', '', text)

  # tokenize text
  tokens = word_tokenize(text)

  # remove stopwords and words shorter than 3 characters (text is already lowercase)
  filtered_tokens = [word for word in tokens if word not in stopwords and len(word) >= 3]

  # perform POS tagging
  # Note: tagging the filtered tokens gives the tagger less context than tagging the full sentence.
  pos_tagged_sentence = nltk.pos_tag(filtered_tokens)

  # lemmatize and keep only nouns, adjectives, and verbs
  lemmatizer = WordNetLemmatizer()
  lemmatized_words = []

  for word, tag in pos_tagged_sentence:
    wn_tag = get_wordnet_pos(tag)

    if wn_tag in (wordnet.NOUN, wordnet.ADJ, wordnet.VERB):
      lemmatized_words.append(lemmatizer.lemmatize(word, wn_tag))

  return lemmatized_words

In [41]:
preprocess_text_1("The aurora borealis, or northern lights, are colorful light displays in the sky caused by charged particles from the sun colliding with atmospheric gases in Earth's polar regions.")

['aurora',
 'borealis',
 'northern',
 'light',
 'colorful',
 'light',
 'display',
 'sky',
 'cause',
 'charge',
 'particle',
 'collide',
 'atmospheric',
 'gas',
 'earth',
 'polar',
 'region']

# **Assignment 6: Reflection**

**Q. Why is context memory important in chatbots?**
- Context memory allows chatbots to remember past messages, helping them answer follow-up questions and hold natural conversations. The model only sees the messages sent with each request, so without stored history a chatbot would treat every new message as a separate query.

**Q. Why should beginners always check API limits and pricing?**
- Beginners should always check API limits and pricing to avoid unexpected charges. Knowing the limits also helps them design an application that does not fail when it hits a usage cap.
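
As a rough illustration of the pricing point, the token counts returned with each response (`prompt_tokens` and `completion_tokens` in the Assignment 3 response dictionary) can be turned into a cost estimate. The helper `estimate_cost` and the prices used below are hypothetical and for illustration only. Real prices come from the provider's pricing page.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the cost of one request from token counts and per-million-token prices."""
    return (prompt_tokens * input_price_per_million
            + completion_tokens * output_price_per_million) / 1_000_000


# Hypothetical prices in dollars per million tokens, not actual provider pricing.
cost = estimate_cost(prompt_tokens=1200, completion_tokens=300,
                     input_price_per_million=0.50,
                     output_price_per_million=1.00)
print(f"Estimated cost: ${cost:.6f}")
```

Summing this value over every request in a session gives a running total that can be compared against a budget.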


### **Extra:**

In [37]:
wordnet.ADJ

'a'

In [16]:
num = ['1','2','3','4','5','6','7']
num[-5:]

['3', '4', '5', '6', '7']

In [24]:
print("num[:1] :",num[:1],"--------", "num[1:] :",num[1:])

num[:1] : ['1'] -------- num[1:] : ['2', '3', '4', '5', '6', '7']


In [25]:
num[:1] + num[1:][-5:]

['1', '3', '4', '5', '6', '7']