# Creating a simple ChatBot with open-source LLMs using Python and Hugging Face:

## Introduction: Under the hood of a ChatBot


### Intro: How does a ChatBot work?

A chatbot is a computer program that takes text as input and returns a corresponding text response.

Under the hood, a modern chatbot is powered by a large language model (LLM), which acts as its brain. An LLM is a neural network, usually built on an architecture called the *transformer*, that has been trained on huge amounts of text, including many examples of human conversation. From this training it learns the patterns of language, which lets it generate human-like responses that make sense in context.

The LLM works with numbers rather than raw text, so it is paired with a *tokenizer* that converts text to numbers and back. Here's a simplified explanation of how these pieces work together in a chatbot:

- **Input processing**: When you send a message to the chatbot, the tokenizer breaks your message into smaller parts called *tokens* (words, pieces of words, or punctuation) and maps each token to an integer ID that the model can process.
- **Understanding context**: The transformer layers of the LLM process these token IDs. Using a mechanism called *attention*, the model relates each token to the other tokens in the input, building a representation of your message's context from the patterns it learned during training.
- **Generating a response**: The model then generates a response one token at a time, each time predicting a likely next token given the input and the tokens it has already produced. The tokenizer decodes the generated token IDs back into text, which is sent back to you.
- **Iterative conversation**: As the conversation continues, this process repeats for each new message. Because the model itself does not remember earlier messages, the previous turns are passed in along with each new message so that the response stays relevant.

The key is that the LLM learns language patterns from a large amount of text data, while the tokenizer handles the technical work of converting between text and tokens. Together they let the chatbot hold a back-and-forth conversation with you.
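To make these steps concrete, here is a toy sketch of the tokenize, generate, and decode cycle in plain Python. The vocabulary, the word-splitting rule, and the `NEXT_TOKEN` lookup table are all made up for illustration: a real tokenizer uses a learned subword vocabulary, and a real LLM computes each next-token prediction from the whole sequence rather than looking up only the last token.

```python
# Toy illustration of tokenize -> generate -> decode.
# TOY_VOCAB and NEXT_TOKEN are hypothetical stand-ins for a learned
# vocabulary and a trained model.
TOY_VOCAB = ["<unk>", "<eos>", "hello", ",", "how", "are", "you", "?", "i", "am", "fine", "."]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(TOY_VOCAB)}
UNK_ID = TOKEN_TO_ID["<unk>"]
EOS_ID = TOKEN_TO_ID["<eos>"]

# Hypothetical "learned" table: last token ID -> predicted next token ID.
NEXT_TOKEN = {
    TOKEN_TO_ID["?"]: TOKEN_TO_ID["i"],
    TOKEN_TO_ID["i"]: TOKEN_TO_ID["am"],
    TOKEN_TO_ID["am"]: TOKEN_TO_ID["fine"],
    TOKEN_TO_ID["fine"]: TOKEN_TO_ID["."],
    TOKEN_TO_ID["."]: EOS_ID,
}


def encode(text):
    """Split text into word/punctuation tokens and map each one to an integer ID."""
    for mark in ",?.":
        text = text.replace(mark, f" {mark} ")
    return [TOKEN_TO_ID.get(piece, UNK_ID) for piece in text.lower().split()]


def generate(input_ids, max_new_tokens=10):
    """Greedy generation, one token per step, stopping at <eos>.

    This toy only looks at the last token; a real model conditions on the
    entire input plus everything generated so far.
    """
    sequence = list(input_ids)
    output = []
    for _ in range(max_new_tokens):
        next_id = NEXT_TOKEN.get(sequence[-1], EOS_ID)
        if next_id == EOS_ID:
            break
        output.append(next_id)
        sequence.append(next_id)
    return output


def decode(ids):
    """Map token IDs back to text (real tokenizers also restore spacing)."""
    return " ".join(TOY_VOCAB[i] for i in ids)


ids = encode("Hello, how are you?")
print(ids)                    # [2, 3, 4, 5, 6, 7]
print(decode(generate(ids)))  # i am fine .
```

Words missing from the vocabulary map to `<unk>`; real subword tokenizers avoid most unknown tokens by splitting rare words into smaller known pieces.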

### Intro: Hugging Face

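Hugging Face is a company that focuses on natural language processing (NLP) and AI. It provides a variety of tools, resources, and services to support NLP tasks, including the open-source `transformers` library used below and the Hugging Face Hub, which hosts pretrained models.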

In [1]:
# Installing Requirements:
!pip install transformers torch  # PyTorch is the backend used by return_tensors="pt" below



In [2]:
# Import required tools from the transformers library
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

## Choosing a Model:

Choosing the right model for your purposes is an important part of building chatbots! You can browse the different types of models available on the Hugging Face website: https://huggingface.co/models.

LLMs differ from each other in their architecture and in how they are trained. Let's go over some examples to see how different models suit different tasks.

- **Text Generation**:
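    If you need a general-purpose text generation model, consider a GPT-style model such as GPT-2, which is freely available on Hugging Face. (GPT-3 is known for impressive language generation, but its weights are not openly released; it is only accessible through OpenAI's API.)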
    Example: You want to build a chatbot that generates creative and coherent responses to user input.

- **Sentiment Analysis**:
    For sentiment analysis tasks, models like BERT or RoBERTa are popular choices. Versions of them fine-tuned on labeled examples learn to recognize the sentiment and emotional tone of text.
    Example: You want to analyze customer feedback and determine whether it is positive or negative.

- **Named Entity Recognition**:
    LLMs such as BERT, GPT-2, or RoBERTa can be fine-tuned for Named Entity Recognition (NER) tasks. They perform well at identifying and extracting entities like person names, locations, and organizations.
    Example: You want to build a system that extracts names of people and places from a given text.

- **Question Answering**:
    When fine-tuned on question-answering data, models like BERT, GPT-2, or XLNet can be effective for question answering tasks. They can interpret a question and locate the answer in the given context.
    Example: You want to build a chatbot that can answer factual questions from a given set of documents.

- **Language Translation**:
    For language translation tasks, you can consider models like MarianMT or T5. MarianMT models are trained specifically to translate between a given pair of languages, while T5 is a general text-to-text model whose training also covered translation.
    Example: You want to build a language translation tool that translates English text to French.
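Most of these use cases map onto task names accepted by `transformers.pipeline`. The table below is a hypothetical sketch: the task strings are real pipeline tasks, but pairing them with these particular Hub model IDs is only an illustrative choice, and `load_pipeline` is a helper introduced here, not part of the library. The import is deferred so the table can be inspected without installing `transformers` or downloading anything.

```python
# Hypothetical mapping: use case -> (pipeline task name, example Hub model ID).
# The model IDs are illustrative picks, not recommendations.
USE_CASE_TO_PIPELINE = {
    "text generation": ("text-generation", "gpt2"),
    "sentiment analysis": ("sentiment-analysis", "distilbert-base-uncased-finetuned-sst-2-english"),
    "named entity recognition": ("ner", "dslim/bert-base-NER"),
    "question answering": ("question-answering", "distilbert-base-cased-distilled-squad"),
    "translation": ("translation_en_to_fr", "Helsinki-NLP/opus-mt-en-fr"),  # a MarianMT model
}


def load_pipeline(use_case):
    """Build a pipeline for one of the use cases above (downloads the model on first call)."""
    from transformers import pipeline  # deferred so the table works without transformers installed

    task, model_id = USE_CASE_TO_PIPELINE[use_case]
    return pipeline(task, model=model_id)
```

For example, `translator = load_pipeline("translation")` followed by `translator("Hello, how are you?")[0]["translation_text"]` would return the French translation as a string.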

However, these examples are only a starting point: how well an LLM fits your application depends on many factors such as data availability, performance requirements, resource constraints, and domain-specific considerations. It's important to explore different LLMs thoroughly and experiment with them to find the best match for your specific application.

Other important factors to consider when choosing an LLM include (but are not limited to):
- Licensing: Ensure you are allowed to use your chosen model the way you intend
- Model size: Larger models may be more accurate, but might also come at the cost of greater resource requirements
- Training data: Ensure that the model's training data aligns with the domain or context you intend to use the LLM for
- Performance and accuracy: Consider factors like accuracy, runtime, or any other metrics that are important for your specific use case
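For the model size factor, a useful rule of thumb is that the weights alone take roughly (number of parameters) × (bytes per parameter). The sketch below applies that arithmetic; `estimate_weight_memory_gb` is a hypothetical helper, and the estimate deliberately ignores activations and framework overhead, so real memory use will be higher.

```python
def estimate_weight_memory_gb(num_parameters, bytes_per_parameter=4):
    """Approximate memory for a model's weights only, in GB (1 GB = 1e9 bytes).

    bytes_per_parameter: 4 for float32, 2 for float16/bfloat16, 1 for 8-bit weights.
    Treat the result as a lower bound: inference also needs memory for activations.
    """
    return num_parameters * bytes_per_parameter / 1e9


# Example: a model with about 400 million parameters at different precisions.
for label, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{label}: ~{estimate_weight_memory_gb(400_000_000, nbytes):.1f} GB")
# float32: ~1.6 GB
# float16: ~0.8 GB
# int8: ~0.4 GB
```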

In [3]:
model_name = "facebook/blenderbot-400M-distill"

## Fetch the Model and initialize a Tokenizer:

In [4]:
# Load the model and tokenizer (downloaded on the first run; later runs reuse the local cache)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


## Chat:

Now that we're all set up, let's start chatting!

There are several steps we'll follow to have an effective conversation with our chatbot.

Before interacting with our model, we need to initialize an object where we can store our conversation history:

1. Initialize an object to store the conversation history

Afterwards, we'll do the following for each interaction with the model:

2. Join the conversation history into a single string
3. Fetch the prompt from the user
4. Tokenize the prompt together with the history
5. Generate output from the model using the tokenized prompt and history
6. Decode the output
7. Update the conversation history

### Keeping track of conversation history

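The conversation history is important because the model has no memory of its own: it only sees the text we pass in on each call. By including the previous turns with every new prompt, we let the chatbot reference the earlier conversation when generating its output.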

For our simple implementation in Python, a list is enough. Per the Hugging Face implementation, we will use this list to store the conversation history as alternating user inputs and model outputs:

```
conversation_history

>> [input_1, output_1, input_2, output_2, ...]
```

Let's initialize this list before any conversations occur.

In [5]:
conversation_history = []
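A plain list grows without limit, but the model can only read a fixed number of input tokens. One simple alternative, sketched here under the assumption that counting turns is a good enough proxy for length, is a `collections.deque` with `maxlen`, which silently drops the oldest entries. `MAX_TURNS` is an assumed value, not something taken from the model.

```python
from collections import deque

# Assumed budget: keep at most this many (input, response) turns.
MAX_TURNS = 3
bounded_history = deque(maxlen=2 * MAX_TURNS)  # two entries per turn

# Simulate four turns; always append in pairs so inputs and responses stay aligned.
for i in range(1, 5):
    bounded_history.append(f"input_{i}")
    bounded_history.append(f"output_{i}")

print(list(bounded_history))
# ['input_2', 'output_2', 'input_3', 'output_3', 'input_4', 'output_4']

# str.join accepts a deque just like a list:
history_string = "\n".join(bounded_history)
```

Counting turns is cruder than counting tokens (one long message can still overflow the model), but it needs no tokenizer calls.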

### Encoding the Conversation History:

During each interaction, we will pass our conversation history to the model along with our input so that it may also reference the previous conversation when generating the next answer.

In [6]:
history_string = "\n".join(conversation_history)
history_string 

''

### Fetch Prompt from User:

Before we build a simple interactive terminal chatbot, let's walk through a single exchange. For this example, the input will be:

In [7]:
input_text ="Hello, how are you?"
input_text

'Hello, how are you?'

### Tokenization of User Prompt and Chat History:

In [8]:
inputs = tokenizer.encode_plus(history_string, input_text, return_tensors="pt")
inputs

{'input_ids': tensor([[6950,   19,  544,  366,  304,   38]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]])}
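Here the encoded input is only 6 tokens, but it grows every turn because the whole history is re-encoded with each new prompt, and the model accepts only a fixed number of input tokens. The helper below is a hypothetical sketch (not part of `transformers`) that reads the two limits Hugging Face objects commonly expose, `tokenizer.model_max_length` and `model.config.max_position_embeddings`, and returns the smaller one. The `SimpleNamespace` objects are stand-ins so the example runs without downloading a model; the value 128 matches the limit reported in the warning at the end of this notebook.

```python
from types import SimpleNamespace


def max_input_tokens(tokenizer, model):
    """Return the tightest known input-length limit, in tokens.

    Reads tokenizer.model_max_length and model.config.max_position_embeddings
    when present. Tokenizers with no configured limit report a very large
    placeholder value, so taking the minimum falls back to the model's limit.
    """
    limits = []
    tok_limit = getattr(tokenizer, "model_max_length", None)
    if isinstance(tok_limit, int):
        limits.append(tok_limit)
    config = getattr(model, "config", None)
    pos_limit = getattr(config, "max_position_embeddings", None)
    if isinstance(pos_limit, int):
        limits.append(pos_limit)
    if not limits:
        raise ValueError("no length limit found on the tokenizer or the model config")
    return min(limits)


# Stand-in objects mimicking the attributes of a real tokenizer and model.
fake_tokenizer = SimpleNamespace(model_max_length=128)
fake_model = SimpleNamespace(config=SimpleNamespace(max_position_embeddings=128))
print(max_input_tokens(fake_tokenizer, fake_model))  # 128
```

With the real objects loaded above, you could compare `max_input_tokens(tokenizer, model)` against `inputs["input_ids"].shape[-1]` before calling `model.generate`.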

In [9]:
tokenizer.pretrained_vocab_files_map

{}

### Generate output from Model:

In [10]:
outputs = model.generate(**inputs)
outputs

tensor([[   1, 6950, 6950, 6950,   19,   19,  281,  632,  265,  265, 1710, 1710,
         1710,   21,   21,   21,  281,  632,  584,   21,   21,    2]])

### Decode Output:

In [11]:
response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
response

'Hello Hello Hello,, I am a a hell hell hell... I am good..'

### Update Conversation History:
All we need to do here is add both the input and the response to `conversation_history` as plain text.

In [12]:
conversation_history.append(input_text)
conversation_history.append(response)
conversation_history

['Hello, how are you?',
 'Hello Hello Hello,, I am a a hell hell hell... I am good..']

## Repeat:
Now we can put everything in a loop and run a whole conversation! (Please note that each response can take a while to generate. The loop has no exit condition, so stop it by interrupting the kernel.)

In [13]:
while True:
    # Create conversation history string
    history_string = "\n".join(conversation_history)

    # Get the input data from the user
    input_text = input("> ")

    # Tokenize the input text and history
    inputs = tokenizer.encode_plus(history_string, input_text, return_tensors="pt")

    # Generate the response from the model
    outputs = model.generate(**inputs)

    # Decode the response
    response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    print(response)

    # Add interaction to conversation history
    conversation_history.append(input_text)
    conversation_history.append(response)

>  Who is the president of the United States?


I am not sure who the president is, but I do know that Donald Trump is the current president.


>  What is the capital of France?


The capital is Paris, France. It is the most populous city in France.


>  What do you think about artificial intelligence?


I don't know much about it, but it sounds interesting. Do you have any hobbies?


>  Do you have a favorite movie?


I love movies. My favorite movie of all time is The Godfather. What is yours?


>  Do you like music?


Token indices sequence length is longer than the specified maximum sequence length for this model (154 > 128). Running this sequence through the model will result in indexing errors


IndexError: index out of range in self
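The loop crashes once the history plus the new prompt exceed the model's maximum input length (the warning reports 154 > 128 tokens). One way to avoid this, sketched below, is to drop the oldest (input, response) pairs until everything fits. `trim_history` and `count_words` are hypothetical helpers introduced for illustration; `count_words` is a crude stand-in for a real tokenizer so the example runs on its own.

```python
def trim_history(history, user_text, count_tokens, max_tokens):
    """Drop the oldest (input, response) pairs until history + new input fit.

    count_tokens(history_string, user_text) must return the number of tokens the
    model would receive. Returns a new list; the caller's list is not modified.
    If the new input alone is too long, the result is an empty history and the
    input itself still needs truncating.
    """
    trimmed = list(history)
    while trimmed and count_tokens("\n".join(trimmed), user_text) > max_tokens:
        trimmed = trimmed[2:]  # remove one input and its response together
    return trimmed


def count_words(history_string, user_text):
    # Hypothetical stand-in for a tokenizer: one token per whitespace-separated word.
    return len(history_string.split()) + len(user_text.split())


sample_history = ["Hi there", "Hello! How are you?", "Good thanks", "Glad to hear it."]
print(trim_history(sample_history, "Do you like music?", count_words, max_tokens=10))
# ['Good thanks', 'Glad to hear it.']
```

To use it in the chat loop, read `input_text` first and then, before building `history_string`, reassign `conversation_history = trim_history(conversation_history, input_text, lambda h, u: len(tokenizer.encode_plus(h, u)["input_ids"]), tokenizer.model_max_length)`. Dropping the oldest turns keeps the most recent context, which usually matters most for the next reply. To leave the loop cleanly, you could also add a check such as `if input_text.strip().lower() == "quit": break`.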