# **Textual Data Analysis - Exercise - 11**


---


## **Name: Ayesha Zafar**
## **Date: 26/02/2025**


---

### Building a basic chatbot

In this exercise, you will build a minimal chatbot using the transformers library and a pretrained model.

First, if you haven't done so already, study the Using LLMs with transformers notebook. That should give you all the necessary prerequisites for completing this exercise.

Using what you learned from that notebook, write a Python script that takes input from a user, passes it to an LLM, and prints out the LLM response, repeating until the user requests to exit while keeping track of the message history. You can use this template as a starting point:

[YOUR CODE HERE: load model]

messages = []
while True:
    user_input = input('Say something ("exit" to quit): ')
    if user_input == 'exit':
        break
    messages.append({'role': 'user', 'content': user_input})
    [YOUR CODE HERE:  call model, print its output, and add it to messages]
(You may want to tweak generation parameters and add additional features like having the text "reset" reset the message queue to make testing easier.)

You can use e.g. one of the following models:

- HuggingFaceTB/SmolLM2-135M-Instruct
- HuggingFaceTB/SmolLM2-360M-Instruct
- HuggingFaceTB/SmolLM2-1.7B-Instruct

(For best results, use the largest model with GPU acceleration. If you're running on a GPU with more memory, you can of course use even larger models, just pick an appropriate one from https://huggingface.co/models.)

Test your chatbot with the following and provide logs of your discussions:

- At least 5 questions about basic facts about the world (e.g. the capital of a country)
- At least 5 arithmetic questions ranging from trivial ("what is 1+1?") to more complex
- Inform the system of a secret word (e.g. "zebra"), then after a few other questions ask it what the secret word is. Make sure you understand where the memory of that secret word is stored.

Very briefly, what did you think about the capabilities of the chatbot you created? Of the factors impacting LLM performance that we discussed in the lecture, which one (or ones) do you think are primarily limiting its performance?

---



Step 1. Importing required libraries

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

Step 2. Setting the required model

In [3]:
model_name = "HuggingFaceTB/SmolLM2-1.7B-Instruct"

Step 3. Loading the model and tokenizer

In [4]:
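# Load the tokenizer and model; use half precision on GPU to save memory.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
# Place the model explicitly: GPU if available, otherwise CPU.
# (device_map="auto" is dropped so that placement is decided in one place only.)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)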

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(49152, 2048, padding_idx=2)
    (layers): ModuleList(
      (0-23): 24 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (v_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=2048, out_features=8192, bias=False)
          (up_proj): Linear(in_features=2048, out_features=8192, bias=False)
          (down_proj): Linear(in_features=8192, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((2048,), eps=1e-05)
 

Step 4. Implementation of chatbot

In [5]:
chat_history = []

def chatbot():
    global chat_history
    print("\n🤖 **Chatbot Ready!** Type 'reset' to clear history or 'exit' to quit.\n")

    while True:
        user_input = input("👤 **You:** ").strip()
        if user_input.lower() == "exit":
            print("\n👋 **Goodbye! Have a great day!**\n")
            break
        elif user_input.lower() == "reset":
            chat_history = []
            print("\n🔄 **Chat history cleared!**\n")
            continue

        chat_history.append({"role": "user", "content": user_input})
        # Keep only the last 10 messages (about 5 exchanges) to bound the prompt length.
        if len(chat_history) > 10:
            chat_history = chat_history[-10:]

        # Render the history with the model's chat template, then tokenize it.
        prompt = tokenizer.apply_chat_template(chat_history, tokenize=False, add_generation_prompt=True)
        inputs = tokenizer(prompt, return_tensors="pt").to(device)

        output_ids = model.generate(**inputs, max_new_tokens=150)

        # generate() returns prompt + completion. Decode only the new tokens,
        # so no fragile string matching on role markers is needed.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        bot_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

        chat_history.append({"role": "assistant", "content": bot_response})

        print(f"🤖 **Assistant:** {bot_response}\n")
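
The exercise suggests tweaking generation parameters, while the loop above calls `model.generate` with only `max_new_tokens=150`. As a minimal sketch, the presets below collect keyword arguments that `generate` accepts. The preset names, the helper `pick_generation_kwargs`, and all values are illustrative assumptions, not settings used in this notebook.

```python
# Hypothetical presets of keyword arguments for model.generate().
# The parameter names are standard transformers generation options;
# the values are illustrative assumptions, not tuned settings.

# Deterministic decoding: suited to factual and arithmetic questions.
GREEDY_KWARGS = {
    "max_new_tokens": 150,
    "do_sample": False,
}

# Sampled decoding: more varied answers, at the cost of reproducibility.
SAMPLING_KWARGS = {
    "max_new_tokens": 300,  # a larger budget reduces mid-sentence cut-offs
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
}


def pick_generation_kwargs(deterministic: bool) -> dict:
    """Return a copy of the chosen preset so callers can modify it safely."""
    preset = GREEDY_KWARGS if deterministic else SAMPLING_KWARGS
    return dict(preset)


# Usage inside the chat loop (needs the model and inputs from the notebook):
# output_ids = model.generate(**inputs, **pick_generation_kwargs(deterministic=True))
```

Greedy decoding makes test logs reproducible, which helps when comparing answers to the same arithmetic question across runs.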

Step 5. Calling the chatbot

In [7]:
if __name__ == "__main__":
    chatbot()


🤖 **Chatbot Ready!** Type 'reset' to clear history or 'exit' to quit.

👤 **You:** Who made spider-man?
🤖 **Assistant:** Spider-Man was created by writer-artist Stan Lee and artist Steve Ditko. The first appearance of Spider-Man was in the comic book "Amazing Fantasy" #15, which was published in August 1962. The character was first introduced in the United States, but he has since become a global icon.

👤 **You:** What is the capital of Brazil?
🤖 **Assistant:** The capital of Brazil is Brasília. It was officially inaugurated on April 22, 1960, after being moved from Rio de Janeiro. Brasília is located in the country's central-western region and serves as the country's political and administrative capital.

👤 **You:** What color is the sky?
🤖 **Assistant:** The sky is typically blue. This is due to a phenomenon called Rayleigh scattering, where the shorter blue wavelengths of light are scattered more than the longer red wavelengths by the tiny molecules of gases in the Earth's atmospher

## Evaluation of Chatbot
The chatbot performed well on general knowledge and arithmetic questions, giving mostly accurate and well-structured responses. It also recalled the secret word after several intervening questions. That memory does not live in the model itself: it lives in `chat_history`, which is re-sent to the model in full on every turn.
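
To make concrete where the memory of the secret word lives, the following sketch replaces the LLM with a stub. `fake_model` and `chat_turn` are hypothetical names introduced for illustration only. The stub has no state of its own and can only "recall" what is present in the message list it receives, which mirrors how the real model sees the secret word only when the history containing it is passed in again.

```python
def fake_model(messages):
    """Stand-in for the LLM (hypothetical stub): it keeps no state and can
    only read the messages passed to it on this call."""
    question = messages[-1]["content"].lower()
    if "what is the secret word" in question:
        for msg in messages[:-1]:
            text = msg["content"].lower()
            if msg["role"] == "user" and "secret word is" in text:
                word = text.split("secret word is")[-1].strip(" .!\"'")
                return f"The secret word is {word}."
        return "I don't know the secret word."
    return "OK."


def chat_turn(messages, user_input, reply_fn=fake_model):
    """Append the user message, get a reply, and append it to the history."""
    messages.append({"role": "user", "content": user_input})
    reply = reply_fn(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


history = []
chat_turn(history, "The secret word is zebra.")
chat_turn(history, "What is the capital of Brazil?")
with_history = chat_turn(history, "What is the secret word?")

history = []  # equivalent to typing "reset"
after_reset = chat_turn(history, "What is the secret word?")

print(with_history)
print(after_reset)
```

The same question gets a different answer purely because the list passed in has changed, not because anything inside the "model" changed.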

### Limitations:

- Lack of Deep Reasoning - It answered factual and math-based questions correctly, but it lacks deeper reasoning and contextual understanding.

- Limited Long-Term Memory - The chatbot remembers information only within a session, and only while the relevant messages are still in `chat_history`. After a reset or restart, or once a message falls outside the last 10 retained, it is forgotten.

- Occasional Redundant Responses - The chatbot initially hesitated to remember the secret word. I had to tell it explicitly to "Remember it"; otherwise it did not understand the statement.

### Factors Limiting Performance

- Context Window Size - The chatbot retains only a limited number of messages (the code keeps the last 10 messages, roughly five user/assistant exchanges), which limits long-term recall.

- Model Size & Training Data - While SmolLM2-1.7B is a strong model for its size, larger models (like GPT-4) generally have better reasoning, factual accuracy, and memory capabilities.

- Token Limit in Responses - Each response is capped at 150 new tokens (`max_new_tokens=150`), which can truncate longer or more complex answers.
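
The history cap behind the first factor can be sketched as a small pure function. `trim_history` and `max_messages` are hypothetical names, and the sketch assumes the history alternates user and assistant messages. Unlike the plain slice in the notebook, it also drops a leading assistant message so that the trimmed history always starts on a user turn.

```python
def trim_history(messages, max_messages=10):
    """Keep at most the last `max_messages` messages, starting on a user turn."""
    trimmed = messages[-max_messages:] if max_messages > 0 else []
    # A plain slice can leave an orphaned assistant reply at the front;
    # drop it so the model never sees an answer without its question.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


# Build 6 full exchanges (12 messages) plus one new user message.
history = []
for i in range(6):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})
history.append({"role": "user", "content": "question 6"})

kept = trim_history(history, max_messages=10)
print(len(kept), kept[0]["content"])  # prints: 9 question 2
```

With a cap of 10 messages, only about five question/answer pairs remain visible to the model, so a secret word stated earlier than that is lost even though the session was never reset.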