<a href="https://colab.research.google.com/github/Tolulade-A/opensource-llm-chatbot-blenderbot/blob/main/Chatbot_with_Open_Source_LLM_%26_Hugging_Face.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview

This project aims to build a chatbot platform using Large Language Models (LLMs). Tools used here include:



*   Hugging Face Model Hub (to browse models)
*   Python
*   Transformers (the Hugging Face library that loads and runs our LLM)
*   BlenderBot by Facebook AI (the LLM)



LLM use cases:

Knowing your use case helps in choosing the right large language model for your application.


1.   Text generation
2.   Language translation
3.   Question & Answering
4.   Sentiment analysis
5.   Named entity recognition

Additional factors to consider when selecting a language model for your project include:

1.  Licensing (open source, commercially available, etc.)
2.  Type of training data
3.  Model size
4.  Performance & accuracy



The **Transformers** library and the LLM work together within an application to enable conversation.

What does a transformer-based chatbot actually do?

1. **Input processing**: When the app (here, a chatbot) receives a message, the input text is broken down into tokens (smaller units of text).
2. **Context understanding**: The tokens are passed to the LLM, which, having been trained on a large amount of data, uses them to model the context of your message.
3. **Response output**: The LLM generates a response as tokens, and Transformers converts those tokens back into text that you can read.
4. Repeat.
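To make these steps concrete, here is a deliberately tiny sketch of the same flow. The vocabulary, the whitespace tokenizer and `toy_model` are hypothetical stand-ins invented for illustration: a real LLM such as BlenderBot uses a learned subword vocabulary and a neural network, not a lookup rule.

```python
# Toy illustration of the tokenize -> model -> detokenize flow.
# VOCAB and toy_model are hypothetical stand-ins, not BlenderBot internals.

VOCAB = ["<unk>", "hello", "how", "are", "you", "i", "am", "fine", "thanks"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def tokenize(text: str) -> list[int]:
    """Input processing: split on whitespace and map each word to an ID."""
    return [TOKEN_TO_ID.get(word, TOKEN_TO_ID["<unk>"]) for word in text.lower().split()]


def toy_model(input_ids: list[int]) -> list[int]:
    """Stand-in for the LLM: returns a canned reply if it sees the word 'how'."""
    if TOKEN_TO_ID["how"] in input_ids:
        return [TOKEN_TO_ID[w] for w in ["i", "am", "fine", "thanks"]]
    return [TOKEN_TO_ID["hello"]]


def detokenize(output_ids: list[int]) -> str:
    """Response output: map IDs back to words and join them."""
    return " ".join(VOCAB[i] for i in output_ids)


ids = tokenize("Hello how are you")
print(ids)                         # [1, 2, 3, 4]
print(detokenize(toy_model(ids)))  # i am fine thanks
```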



**Step 1:** Install requirements

In [2]:
!pip install transformers


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m


**Step 2:** Import dependencies from the Transformers library

In [3]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM



**Step 3:** Choose the model you want

Hugging Face hosts a hub of models for this:

https://huggingface.co/models

Let's use "facebook/blenderbot-400M-distill". It's open source and, as a distilled model of roughly 400M parameters, relatively fast.




In [4]:
model_name = "facebook/blenderbot-400M-distill"

**Step 4:** Fetch and initialise the model and tokeniser

The first time this code runs, the host machine downloads the model from the Hugging Face Hub. On later runs, the script does not re-download the model and instead loads the locally cached copy.

Two terms here: **model** and **tokenizer**.

In this script, we initialise variables using two handy classes from the transformers library:

`model` is created with `AutoModelForSeq2SeqLM`, which loads the appropriate sequence-to-sequence model class for our chosen checkpoint and lets us interact with the language model.

`tokenizer` is created with `AutoTokenizer`, which loads the matching tokenizer and prepares our input for the language model. It does so by converting our text input into "tokens" (numeric IDs), which is how the model interprets the text.

In [5]:
# Load model (downloaded on first run, then loaded from the local cache on subsequent runs)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

**Step 5:** Chatbot Process

To have an effective conversation with our chatbot, we follow the process below.

Before interacting with our model, we need to **initialize an object** where we can **store** our conversation **history**.

Thereafter, we'll do the following for each interaction with the model:

1. Encode conversation history as a string
2. Fetch prompt from user
3. Tokenise prompt and history
4. Generate output from model using prompt and history
5. Decode output
6. Update conversation history
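The six steps can be sketched as a single Python function. This is a minimal sketch, assuming the tokenizer and model calls are injected as plain functions (`encode`, `generate` and `decode` are hypothetical parameter names) so that the flow runs without downloading a model; in the notebook they roughly correspond to `tokenizer.encode_plus`, `model.generate` and `tokenizer.decode`.

```python
from typing import Callable, List


def chat_turn(
    conversation_history: List[str],
    prompt: str,
    encode: Callable[[str, str], object],
    generate: Callable[[object], object],
    decode: Callable[[object], str],
) -> str:
    """Run one chatbot turn following steps 1-6 (hypothetical helper)."""
    history_string = "\n".join(conversation_history)  # 1. encode history as a string
    # 2. the prompt has already been fetched by the caller
    inputs = encode(history_string, prompt)            # 3. tokenise prompt + history
    outputs = generate(inputs)                         # 4. generate output
    response = decode(outputs)                         # 5. decode output
    conversation_history.extend([prompt, response])    # 6. update history
    return response


# Usage with trivial stand-ins (hypothetical, for illustration only):
history: List[str] = []
reply = chat_turn(
    history,
    "Do you like cake?",
    encode=lambda hist, text: (hist, text),
    generate=lambda inputs: f"You said: {inputs[1]}",
    decode=lambda outputs: outputs,
)
print(reply)    # You said: Do you like cake?
print(history)  # ['Do you like cake?', 'You said: Do you like cake?']
```

Injecting the three calls keeps the turn logic independent of any particular model, so the same function could wrap the real tokenizer and model.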



**Step 5a:** Keeping track of conversation history

The conversation history is important when interacting with a chatbot because the chatbot references the earlier turns of the conversation when generating output.

For our simple implementation in Python, we can simply use a **list**. Following the Hugging Face implementation, we will use this list to store the conversation history as alternating inputs and outputs:

`conversation_history`

`>> [input_1, output_1, input_2, output_2, ...]`
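Because inputs and outputs alternate, the flat list can be split back into (input, output) pairs with slicing. A small sketch with made-up messages:

```python
conversation_history = [
    "Hi!", "Hello, how are you?",
    "Good, thanks.", "Glad to hear it.",
]

# Even indices hold user inputs, odd indices hold model outputs.
user_inputs = conversation_history[0::2]
model_outputs = conversation_history[1::2]
turns = list(zip(user_inputs, model_outputs))

print(turns)
# [('Hi!', 'Hello, how are you?'), ('Good, thanks.', 'Glad to hear it.')]
```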

In [6]:
conversation_history = []

**Step 5b:** Encoding the conversation history

During each interaction, we **pass our conversation history to the model along with our input**, so that it can also reference the previous conversation when *generating the next answer*.

In this notebook, we pass the conversation history to the tokenizer as a single string, with each element separated by the newline character `'\n'`. Thus, we create such a string.

We'll use Python's `join()` method to do exactly that. Initially, `history_string` will be an empty string, which is fine; it will grow as the conversation goes on.

In [7]:
history_string = "\n".join(conversation_history)
history_string

''

**Step 5c:** Fetch prompt from user

Let's try an example prompt:

In [8]:
#input_text = "hello, how are you doing ?"
input_text = "I had a piece of chocolate cake for lunch, so it was not too bad. Do you like cake ?"

**Step 5d:** Tokenisation of User Prompt and Chat History

Tokens in NLP are the individual units (such as words, subwords, or characters) that text is divided into. Tokenisation is the process of splitting text into tokens; encoding (sometimes called vectorisation) converts those tokens into numerical IDs that the model can process.

Here we use the tokenizer's `encode_plus` method to perform both steps (in recent versions of Transformers, calling `tokenizer(...)` directly is the recommended equivalent). Let's encode our inputs (prompt and chat history) as tokens so that we can pass them to the model.
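As a rough illustration of what encoding a (history, prompt) pair produces, the toy encoder below assigns integer IDs to words and returns a dictionary with `input_ids` and `attention_mask`, the same keys that appear in the real tokenizer's output below. The word-level vocabulary and the separator and end-of-sequence IDs are hypothetical; this sketch does not reproduce how BlenderBot's actual tokenizer joins the pair or which special tokens it adds.

```python
from typing import Dict, List

SEP_ID = 1  # hypothetical separator between history and prompt
EOS_ID = 2  # hypothetical end-of-sequence marker
vocab: Dict[str, int] = {}  # grows as new words are seen


def word_ids(text: str) -> List[int]:
    """Assign each new lower-cased word the next free ID (after the special IDs)."""
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab) + 3
        ids.append(vocab[word])
    return ids


def encode_pair(history: str, prompt: str) -> Dict[str, List[int]]:
    """Encode (history, prompt) into one ID sequence plus an attention mask."""
    history_ids = word_ids(history)
    prompt_ids = word_ids(prompt)
    input_ids = history_ids + ([SEP_ID] if history_ids else []) + prompt_ids + [EOS_ID]
    return {"input_ids": input_ids, "attention_mask": [1] * len(input_ids)}


print(encode_pair("", "do you like cake"))
# {'input_ids': [3, 4, 5, 6, 2], 'attention_mask': [1, 1, 1, 1, 1]}
print(encode_pair("i like cake", "do you"))
# {'input_ids': [7, 5, 6, 1, 3, 4, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
```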

In [9]:
inputs = tokenizer.encode_plus(history_string, input_text, return_tensors="pt")
inputs

{'input_ids': tensor([[ 281,  562,  265, 2725,  306, 7764, 6141,  335, 5344,   19,  394,  312,
          372,  368,  618,  810,   21,  946,  304,  398, 6141, 2453]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

In doing so, we've created a dictionary-like `BatchEncoding` object. Its `input_ids` entry holds the token IDs, and its `attention_mask` entry marks which positions the model should attend to (1 for real tokens, 0 for padding). To learn more about tokens and their associated pretrained vocabulary files, you can explore the `pretrained_vocab_files_map` attribute, which maps pretrained models to their corresponding vocabulary files.
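The attention mask matters most when several inputs of different lengths are batched together and the shorter ones are padded. Below is a minimal sketch of right-padding and mask construction, assuming a hypothetical padding ID of 0 (the example IDs are borrowed from the output above). With a single unpadded input, as in our example, every mask entry is 1.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical padding ID for this illustration


def pad_batch(sequences: List[List[int]]) -> Dict[str, List[List[int]]]:
    """Right-pad ID sequences to equal length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = max_len - len(seq)
        input_ids.append(seq + [PAD_ID] * padding)
        # 1 = real token the model should attend to, 0 = padding to ignore
        attention_mask.append([1] * len(seq) + [0] * padding)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[281, 562, 265], [946, 304]])
print(batch["input_ids"])       # [[281, 562, 265], [946, 304, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```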

In [10]:
#tokenizer.pretrained_vocab_files_map

**Step 5e:** Generate output from model

Now that our inputs (both the conversation history and the new prompt) are ready, we can pass them to the model and generate a response. Following the documentation, we call `generate()` and pass the inputs as keyword arguments (kwargs) by unpacking the dictionary with `**`.
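Conceptually, `generate()` builds the response one token at a time. The sketch below shows plain greedy decoding: always pick the highest-scoring next token until an end-of-sequence token or a length limit is reached. It is a simplified illustration, not the library's implementation; the decoding strategy actually used for this model comes from its generation configuration and may differ (for example, beam search). `toy_scores` is a hypothetical stand-in for the model's next-token scores.

```python
from typing import Callable, List


def greedy_generate(
    next_token_scores: Callable[[List[int]], List[float]],
    start_id: int,
    eos_id: int,
    max_length: int = 20,
) -> List[int]:
    """Repeatedly append the highest-scoring next token until EOS or max_length."""
    output = [start_id]
    while len(output) < max_length:
        scores = next_token_scores(output)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        output.append(next_id)
        if next_id == eos_id:
            break
    return output


def toy_scores(prefix: List[int]) -> List[float]:
    """Hypothetical 'model': prefers last token + 1, then EOS (ID 2) once 4 tokens exist."""
    vocab_size = 10
    scores = [0.0] * vocab_size
    if len(prefix) >= 4:
        scores[2] = 1.0
    else:
        scores[(prefix[-1] + 1) % vocab_size] = 1.0
    return scores


print(greedy_generate(toy_scores, start_id=5, eos_id=2))  # [5, 6, 7, 8, 2]
```

The `max_length` cap plays the same role as the `max_length` argument passed to `model.generate()` later in the Flask app.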

In [11]:
outputs = model.generate(**inputs)
outputs


tensor([[   1,  281,  361,  398, 6141,   19,  373,  281,  476,  368,  265,  893,
         1599,  306, 7764,   21,  714,  906,  306, 6141,  372,  312,   38,    2]])

**Step 5f:** Decode output

We can decode the output using `tokenizer.decode()`. This is known as "detokenization" or "reconstruction": the process of merging individual tokens back into readable text. Passing `skip_special_tokens=True` drops special tokens (such as start- and end-of-sequence markers) from the result, and `.strip()` removes any surrounding whitespace.
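To illustrate what `skip_special_tokens` does, here is a toy decoder over a hypothetical vocabulary in which IDs 0 to 2 are special tokens. A real subword tokenizer also merges word pieces and handles spacing, which this sketch ignores.

```python
from typing import Dict, List, Set

# Hypothetical toy vocabulary: IDs 0-2 are special tokens, the rest are words.
ID_TO_TOKEN: Dict[int, str] = {
    0: "<pad>", 1: "<s>", 2: "</s>",
    3: "i", 4: "do", 5: "like", 6: "cake",
}
SPECIAL_IDS: Set[int] = {0, 1, 2}


def decode(ids: List[int], skip_special_tokens: bool = False) -> str:
    """Map IDs back to tokens and join them, optionally dropping special tokens."""
    tokens = [
        ID_TO_TOKEN[i] for i in ids
        if not (skip_special_tokens and i in SPECIAL_IDS)
    ]
    return " ".join(tokens)


output_ids = [1, 3, 4, 5, 6, 2]
print(decode(output_ids))                            # <s> i do like cake </s>
print(decode(output_ids, skip_special_tokens=True))  # i do like cake
```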

In [12]:
response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
response

"I do like cake, but I'm not a big fan of chocolate. What kind of cake was it?"

Alright! We've successfully had an interaction with our chatbot! We've given it a prompt, and we received its response.

Now, all that's left to do is to update our conversation history, so that we may pass it with the next iteration.

**Step 5g**: Update Conversation History

All we need to do here is add both the input and response to `conversation_history` in plaintext.


In [13]:
conversation_history.append(input_text)
conversation_history.append(response)
conversation_history

['I had a piece of chocolate cake for lunch, so it was not too bad. Do you like cake ?',
 "I do like cake, but I'm not a big fan of chocolate. What kind of cake was it?"]
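Note that `conversation_history` grows with every turn, while BlenderBot accepts only a fairly short input (see `model.config.max_position_embeddings`), so a long conversation may eventually exceed it. One possible mitigation is to join only the most recent turns. A sketch with a hypothetical `trimmed_history` helper:

```python
from typing import List


def trimmed_history(conversation_history: List[str], max_turns: int = 2) -> str:
    """Join only the most recent max_turns (input, response) pairs."""
    # Each turn is two entries (input, response), so keep the last 2 * max_turns items.
    # Guard max_turns == 0 explicitly: a [-0:] slice would return the whole list.
    recent = conversation_history[-2 * max_turns:] if max_turns > 0 else []
    return "\n".join(recent)


history = ["hi", "hello", "how are you?", "good", "like cake?", "yes"]
print(repr(trimmed_history(history, max_turns=2)))
# 'how are you?\ngood\nlike cake?\nyes'
```

The trimmed string would replace `"\n".join(conversation_history)` when building `history_string`; the full list can still be kept for logging.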

**Step 6**: Repeat

We have gone through all the steps of interacting with our chatbot. Now we can put everything in a loop and run a whole conversation!

This loop is for demonstration only: type `quit` (or interrupt the cell) to stop it.


In [None]:
while True:
    # Get the input data from the user; type "quit" to end the conversation
    input_text = input("> ")
    if input_text.strip().lower() == "quit":
        break

    # Create conversation history string
    history_string = "\n".join(conversation_history)

    # Tokenize the input text and history
    inputs = tokenizer.encode_plus(history_string, input_text, return_tensors="pt")

    # Generate the response from the model
    outputs = model.generate(**inputs)

    # Decode the response and show it to the user
    response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    print(response)

    # Add interaction to conversation history
    conversation_history.append(input_text)
    conversation_history.append(response)

## **Deploying the Chatbot to a Backend Server**

## **Flask App on Colab**

Colab does not expose a public port for a Flask server, so we use ngrok to make the Flask app reachable from the internet.



In [None]:
!pip install flask-ngrok flask-cors
# flask-ngrok is only needed when running Flask on Colab, not in a local IDE; flask-cors is used by the app below



**Setup and Installation of Ngrok**

In [None]:
# install ngrok linux version using the following command or you can get the
# latest version from its official website- https://dashboard.ngrok.com/get-started/setup

!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.tgz

--2023-07-08 22:42:06--  https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.tgz
Resolving bin.equinox.io (bin.equinox.io)... 54.237.133.81, 18.205.222.128, 52.202.168.65, ...
Connecting to bin.equinox.io (bin.equinox.io)|54.237.133.81|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13856790 (13M) [application/octet-stream]
Saving to: ‘ngrok-stable-linux-amd64.tgz.1’


2023-07-08 22:42:06 (176 MB/s) - ‘ngrok-stable-linux-amd64.tgz.1’ saved [13856790/13856790]



In [None]:
# extract the downloaded file using the following command

!tar -xvf /content/ngrok-stable-linux-amd64.tgz

ngrok


**The next step is to get your AuthToken from ngrok using this link:** https://dashboard.ngrok.com/get-started/your-authtoken

In [None]:
# replace YOUR_AUTHTOKEN with your own AuthToken, then execute this command

!./ngrok authtoken YOUR_AUTHTOKEN


Authtoken saved to configuration file: /root/.ngrok2/ngrok.yml


**Chatbot App**

In [None]:
from flask import Flask, request
from flask_cors import CORS
import json
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

from flask_ngrok import run_with_ngrok  # for Colab use only, remove in a local IDE


app = Flask(__name__)
run_with_ngrok(app)  # for Colab use only, remove in a local IDE
CORS(app)  # allow browser front-ends served from other origins to call this API


model_name = "facebook/blenderbot-400M-distill"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Note: this single global history is shared by every client of the server
conversation_history = []

@app.route('/chatbot', methods=['POST'])
def handle_prompt():
    # Read the prompt from the HTTP request body, e.g. {"prompt": "Hello!"}
    data = json.loads(request.get_data(as_text=True))
    input_text = data['prompt']

    # Create conversation history string
    history_string = "\n".join(conversation_history)

    # Tokenize the input text and history
    inputs = tokenizer.encode_plus(history_string, input_text, return_tensors="pt")

    # Generate the response from the model (max_length=60 limits the response length)
    outputs = model.generate(**inputs, max_length=60)

    # Decode the response
    response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

    # Add interaction to conversation history
    conversation_history.append(input_text)
    conversation_history.append(response)

    return response


if __name__ == '__main__':
    app.run()

 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
INFO:werkzeug:[33mPress CTRL+C to quit[0m


 * Running on http://41c4-35-221-27-36.ngrok-free.app
 * Traffic stats available on http://127.0.0.1:4040


INFO:werkzeug:127.0.0.1 - - [09/Jul/2023 00:58:14] "GET / HTTP/1.1" 404 -
INFO:werkzeug:127.0.0.1 - - [09/Jul/2023 00:58:19] "GET /favicon.ico HTTP/1.1" 404 -
INFO:werkzeug:127.0.0.1 - - [09/Jul/2023 00:58:23] "GET /chatbot HTTP/1.1" 405 -
INFO:werkzeug:127.0.0.1 - - [09/Jul/2023 00:58:23] "GET /chatbot HTTP/1.1" 405 -
INFO:werkzeug:127.0.0.1 - - [09/Jul/2023 00:58:24] "GET /favicon.ico HTTP/1.1" 404 -
INFO:werkzeug:127.0.0.1 - - [09/Jul/2023 00:58:39] "POST /chatbot HTTP/1.1" 200 -
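The log shows that `GET` requests to `/` and `/chatbot` fail (404 and 405), because the app only defines a `POST` route at `/chatbot`. A client must send a POST request whose JSON body has a `prompt` key. The sketch below builds such a request with Python's standard library; `build_chat_request` is a hypothetical helper, and the base URL is a placeholder for the ngrok URL printed when the app starts.

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST request with a JSON body {"prompt": ...} for the /chatbot route."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chatbot",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("https://example.ngrok-free.app", "Do you like cake?")
print(req.get_method(), req.full_url)  # POST https://example.ngrok-free.app/chatbot
print(req.data)                        # b'{"prompt": "Do you like cake?"}'

# To actually send it (requires the server above to be running):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The server returns the chatbot's reply as plain text in the response body.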
