##18. Install & Import Necessary Libraries for Inference
* The trained model is now ready to drive the chatbot. This section sets up the inference environment and imports the required libraries.
* The Gradio library provides a simple, interactive web interface to the chatbot. The package is installed and imported in this section.
* The remaining libraries support running the trained model; the purpose of each is noted in the code comments.


In [1]:
# Install Gradio, For User-Friendly Web Interface to Interact with the chatbot
!pip install Gradio
# SentencePiece for tokenizing inputs to T5 model
!pip install sentencepiece

# Import necessary libraries
import os             # For loading the trained model
import re             # For cleaning input text
import torch          # Tensor computation and Model handling
import gradio as gr   # Gradio for building web interface

# Import T5 model and tokenizer from the Hugging Face Transformers library
from transformers import T5ForConditionalGeneration, T5Tokenizer

Collecting Gradio
  Downloading gradio-5.3.0-py3-none-any.whl.metadata (15 kB)
Collecting aiofiles<24.0,>=22.0 (from Gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting fastapi<1.0,>=0.115.2 (from Gradio)
  Downloading fastapi-0.115.2-py3-none-any.whl.metadata (27 kB)
Collecting ffmpy (from Gradio)
  Downloading ffmpy-0.4.0-py3-none-any.whl.metadata (2.9 kB)
Collecting gradio-client==1.4.2 (from Gradio)
  Downloading gradio_client-1.4.2-py3-none-any.whl.metadata (7.1 kB)
Collecting httpx>=0.24.1 (from Gradio)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting huggingface-hub>=0.25.1 (from Gradio)
  Downloading huggingface_hub-0.26.1-py3-none-any.whl.metadata (13 kB)
Collecting markupsafe~=2.0 (from Gradio)
  Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting orjson~=3.0 (from Gradio)
  Downloading orjson-3.10.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.wh

##19. Storage Access for Trained Model (Optional)
* This code was developed predominantly in the Google Colab environment, and the trained model was stored on Google Drive. This section mounts Google Drive with the relevant user credentials so that the model trained in the sections above can be loaded.
* **Google Drive access is now optional.** The trained model is also uploaded to the Hugging Face repository 'mniazm/t5cornel150k', from which it can be downloaded and used directly.
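
The choice between the two storage locations can be made in code. The following is a minimal sketch, assuming the local directory name and repository ID used later in this notebook; `resolve_model_path` is a hypothetical helper, not part of the original code.

```python
import os

def resolve_model_path(local_dir, hub_id):
    """Return local_dir if it exists on disk, otherwise the Hub repository ID.

    Hypothetical helper: prefers a locally saved model (for example on a
    mounted Google Drive) and falls back to the Hugging Face repository.
    """
    if local_dir and os.path.isdir(local_dir):
        return local_dir
    return hub_id

# Directory name and repository ID as used elsewhere in this notebook
model_path = resolve_model_path('./flan_t5B_cornell_150k', 'mniazm/t5cornel150k')
```

The returned value can be passed to `from_pretrained`, which accepts either a local directory or a repository ID.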


In [None]:
# Mount Google Drive to access the trained model
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/Data/

Mounted at /content/drive
/content/drive/MyDrive/Data


##20. Chatbot Response Generator with History Depth Manager
* This section provides the response generator for the multi-turn conversational chatbot. It uses the conversation history, whose depth is limited by the number of exchanges and tokens defined in the text or GUI interface.
* The generate_response_with_history function generates a response from the concatenated text of the available history. The response is decoded into human-readable text without special tokens. The nature of the response is controlled by the following parameters:
  * temperature: Controls the randomness of the model's response. Lower values such as 0.7 make the response more focused and predictable, while higher values increase randomness in the output.
  * num_beams: Number of candidate sequences (beams) explored in parallel. More beams widen the search for a high-probability response at a higher compute cost.
  * top_k: Number of most likely tokens from which the next token is sampled.
  * top_p: Cumulative probability threshold for nucleus sampling; the next token is sampled from the smallest set of tokens whose combined probability reaches this value.
  * rep_penalty: Penalty for repeating tokens.
* The truncate_conversation_history function keeps the conversation within the specified token length limit. If the history contains more tokens than the limit, the oldest entries are removed first.
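
To make the effect of temperature, top_k, and top_p concrete, here is a simplified sketch on a toy four-token vocabulary. It is an illustration under stated assumptions, not the Transformers implementation: `filtered_distribution` is a hypothetical function, and the logits are made up.

```python
import math

def filtered_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Apply temperature, then top-k, then top-p; return renormalized probabilities."""
    # Temperature scaling: values below 1.0 sharpen the distribution
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least likely
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter)
    if top_k > 0:
        order = order[:top_k]
    kept_mass = sum(probs[i] for i in order)
    renorm = {i: probs[i] / kept_mass for i in order}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += renorm[i]
        if cumulative >= top_p:
            break

    final_mass = sum(renorm[i] for i in kept)
    return {i: renorm[i] / final_mass for i in kept}

toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for four tokens
print(filtered_distribution(toy_logits, temperature=0.95, top_k=50, top_p=0.85))
```

The next token would then be drawn at random from the returned dictionary of token index to probability; tokens filtered out can never be chosen.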


In [2]:
# Generate response from trained model, considering conversation history
def generate_response_with_history(conversation_history, max_length,
                                   temperature, num_beams, top_k, top_p,
                                   rep_penalty):

    # Concatenate the dialogue exchanges without speaker tag
    # (split only once so that colons inside a message are preserved)
    input_text = " ".join([text.split(": ", 1)[1] \
                           for text in conversation_history])

    # Tokenize the concatenated input text
    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Generate a response without gradient calculation
    with torch.no_grad():
        # Model response guided by the parameters
        output_ids = model.generate(
            input_ids,
            max_length=max_length,
            # Number of beams explored during the search
            num_beams=num_beams,
            early_stopping=True,
            # Controls randomness
            temperature=temperature,
            # Limits pool to top k tokens
            top_k=top_k,
            # Nucleus sampling, cumulative probability
            top_p=top_p,
            # Penalty for repeated phrases
            repetition_penalty=rep_penalty,
            # Sample only when temperature differs from 1.0
            do_sample=(temperature != 1.0)
        )

    # Decode output tokens into a human-readable string without special tokens
    response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return response

# Truncate conversation history if it exceeds the token threshold
def truncate_conversation_history(conversation_history, \
                                  tokenizer, max_token_length):
    # Tokenize the conversation history, excluding speaker tags
    tokenized_history = tokenizer.encode(" ".join([text.split(": ", 1)[1]
                    for text in conversation_history]), return_tensors="pt")

    # If tokenized history exceeds max_token_length, remove older conversation
    while tokenized_history.shape[1] > max_token_length and \
                                          len(conversation_history) > 1:
        conversation_history.pop(0)  # Remove the oldest conversation
        tokenized_history = tokenizer.encode(" ".join([text.split(": ", 1)[1]
                    for text in conversation_history]), return_tensors="pt")

    return conversation_history

##21. Text Interface to Chatbot
* This section provides a text-based interface to the chatbot through the terminal.
* The trained model and tokenizer are loaded from the Hugging Face repository (or, optionally, from a local directory). The text interface captures plain-text input, updates the conversation history, and limits the history depth according to the defined parameter set, which enables multi-turn contextual conversation.
* The conversation loop generates a response according to the parameter set, prints it to the user, and adds it to the conversation history.
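
The two history limits (number of exchanges, then token budget) can be sketched without loading the model. This is an illustrative sketch only: `count_tokens` is a whitespace stand-in for the real tokenizer, and `bound_history` and `strip_speaker_tag` are hypothetical names.

```python
def strip_speaker_tag(turn):
    """Drop the leading 'You: ' or 'Me: ' tag, keeping any later colons intact."""
    return turn.split(": ", 1)[1] if ": " in turn else turn

def count_tokens(text):
    # Stand-in for a real tokenizer: one token per whitespace-separated word
    return len(text.split())

def bound_history(history, max_turns, max_tokens):
    """Keep the last max_turns entries, then drop the oldest until under max_tokens."""
    history = history[-max_turns:]
    while len(history) > 1 and \
            sum(count_tokens(strip_speaker_tag(t)) for t in history) > max_tokens:
        history = history[1:]  # Remove the oldest entry
    return history

history = [
    "You: What a beautiful day",
    "Me: It was a beautiful day!",
    "You: Where did you go?",
    "Me: san diego",
]
print(bound_history(history, max_turns=3, max_tokens=8))
```

With these toy limits, the first entry is dropped by the exchange cap and the second by the token budget, so only the last two entries remain as context.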


In [3]:
# Text Interface to Chatbot

# Optional: Google Drive path to the directory where the trained model is saved
# model_path = './flan_t5B_cornell_150k'

# Recommended: Download the trained models stored at Hugging Face Repository
model_path = 'mniazm/t5cornel150k'

# Load the tokenizer from model_path (Hub repository or local directory)
tokenizer = T5Tokenizer.from_pretrained(model_path)
# Load the trained model from model_path
model = T5ForConditionalGeneration.from_pretrained(model_path)

# Check if a GPU is available and use it; otherwise, fallback to CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using GPU:", torch.cuda.get_device_name(device))
else:
    device = torch.device("cpu")
    print("Using CPU")
print(f"Using device: {device}")

# Move the model to the GPU or CPU device
model = model.to(device)

# Parameter definitions for Response Control
params = {
    # Maximum number of exchanges to retain in the conversation history
    'max_hist_conv': 10,
    # Maximum token length for conversation history
    'max_hist_token': 512,
    # Maximum token length for responses
    'max_resp_token': 50,
    # Controls response randomness: Higher the value more the randomness
    'temperature': 0.95,
    # Beam search size: number of candidate sequences explored
    'num_beams': 5,
    # Top-k sampling: Number of most likely tokens
    'top_k': 50,
    # Nucleus sampling: Cumulative Probability
    'top_p': 0.85,
    # Penalty for repeated phrases: Discourages repetition
    'rep_penalty': 5.0
}

# Chat loop with conversation history
print("I'm a FLAN-T5 model! I will talk to you till you say 'bye' :)")
conversation_history = []  # Initialize empty conversation history list

# Main conversation loop
while True:
    raw_input = input("You: ")
    # Sanitize user input by removing special characters
    user_input = re.sub(r'[^\w\s!?.,]', '', raw_input)

    # Check if user wants to end the conversation
    if user_input.strip().lower() == 'bye':
        print("Ending conversation. Goodbye!")
        break  # Exit the loop to end conversation

    # Add the user's input to the conversation history with 'You:' tag
    conversation_history.append(f"You: {user_input}")

    # Generate the chatbot response using updated history
    response = generate_response_with_history(conversation_history,
                        params['max_resp_token'], params['temperature'],
                        params['num_beams'], params['top_k'],
                        params['top_p'], params['rep_penalty'])
    print(f"Chatbot: {response}")

    # Add the model's response to the conversation history
    conversation_history.append(f"Me: {response}")

    # Keep only the last max_hist_conv exchanges
    if len(conversation_history) > params['max_hist_conv']:
        conversation_history = conversation_history[-params['max_hist_conv']:]

    # Truncate the conversation history if token length exceeds max_hist_token
    conversation_history = truncate_conversation_history(conversation_history,
                            tokenizer, params['max_hist_token'])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Using GPU: Tesla T4
Using device: cuda
I'm a FLAN-T5 model! I will talk to you till you say 'bye' :)
You: What a beautiful day
Chatbot: It was a beautiful day!
You: Where did you go?
Chatbot: san diego
You: Did you meet anyone there?
Chatbot: no i did not meet anyone there
You: Is seattle close by?
Chatbot: yes
You: How far?
Chatbot: 5 miles
You: bye
Ending conversation. Goodbye!


##22. Web GUI Interface to Chatbot
* This section provides a web GUI, built with Gradio, for interacting with the chatbot.
* The trained model and tokenizer are loaded from the Hugging Face repository (or, optionally, from a local directory). The web interface captures plain-text input, updates the conversation history, and limits the history depth according to the defined parameter set, which enables multi-turn contextual conversation.
* The chatbot function generates a response according to the parameter set, returns it to the interface for display, and adds it to the conversation history.
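
The callback in this section keeps its history in a module-level global. One possible alternative, sketched here under assumptions, is to hold the history in a closure so the callback can be exercised without Gradio or the model. `make_chatbot` and the `echo` stub are hypothetical and stand in for `generate_response_with_history`.

```python
def make_chatbot(generate_fn, max_turns=10):
    """Build a callback that keeps its own history instead of a module-level global."""
    history = []

    def chatbot(user_input):
        # Record the user's turn, then generate from a copy of the history
        history.append(f"You: {user_input}")
        response = generate_fn(list(history))
        history.append(f"Me: {response}")
        # Trim in place so the closure keeps pointing at the same list
        del history[:-max_turns]
        return response

    return chatbot

# Stub generator for illustration: reports how many turns it was given
echo = lambda turns: f"heard {len(turns)} turn(s)"
bot = make_chatbot(echo, max_turns=2)
print(bot("hi"))
print(bot("again"))
```

The returned function has the same single-text-in, single-text-out shape as the `chatbot` function passed to `gr.Interface`, so it could be supplied as `fn` in the same way.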

In [5]:
# Web-GUI Interface to Chatbot

# Optional: Google Drive path to the directory where the trained model is saved
# model_path = './flan_t5B_cornell_150k'

# Recommended: Download the trained models stored at Hugging Face Repository
model_path = 'mniazm/t5cornel150k'

# Load the tokenizer from model_path (Hub repository or local directory)
tokenizer = T5Tokenizer.from_pretrained(model_path)
# Load the trained model from model_path
model = T5ForConditionalGeneration.from_pretrained(model_path)

# Check if a GPU is available and use it; otherwise, fallback to CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using GPU:", torch.cuda.get_device_name(device))
else:
    device = torch.device("cpu")
    print("Using CPU")
print(f"Using device: {device}")

# Move the model to the GPU or CPU device
model = model.to(device)

# Parameter definitions for Response Control
params = {
    # Maximum number of exchanges to retain in the conversation history
    'max_hist_conv': 10,
    # Maximum token length for conversation history
    'max_hist_token': 512,
    # Maximum token length for responses
    'max_resp_token': 50,
    # Controls response randomness: Higher the value more the randomness
    'temperature': 0.95,
    # Beam search size: number of candidate sequences explored
    'num_beams': 5,
    # Top-k sampling: Number of most likely tokens
    'top_k': 50,
    # Nucleus sampling: Cumulative Probability
    'top_p': 0.85,
    # Penalty for repeated phrases: Discourages repetition
    'rep_penalty': 5.0
}

# Chat loop with conversation history
# print("I'm a FLAN-T5 model! I will talk to you till you say 'bye' :)")
conversation_history = []  # Initialize empty conversation history list

# Chatbot response function to handle user input and generate model responses
def chatbot(user_input):
    global conversation_history # Access global conversation history variable

    # Check if the user terminates the conversation with 'bye'
    if user_input.strip().lower() == "bye":
        # Close the Gradio interface after responding
        iface.close()  # Close the interface
        return "Goodbye! It was nice talking to you." # Farewell message

    # Add user input to conversation history with prefix "You:"
    conversation_history.append(f"You: {user_input}")

    # Generate the chatbot response using updated history
    response = generate_response_with_history(conversation_history,
                        params['max_resp_token'], params['temperature'],
                        params['num_beams'], params['top_k'],
                        params['top_p'], params['rep_penalty'])

    # Add model response to history with prefix "Me:"
    conversation_history.append(f"Me: {response}")

    # Limit conversation history to "max_hist_conv" exchanges
    if len(conversation_history) > params['max_hist_conv']:
        conversation_history = conversation_history[-params['max_hist_conv']:]

    # Truncate conversation history if token length exceeds 'max_hist_token'
    conversation_history = truncate_conversation_history(conversation_history, \
                                tokenizer, params['max_hist_token'])
    # Return chatbot's response to be displayed in the interface
    return response

# Gradio Interface for user interaction with the chatbot
iface = gr.Interface(
    fn=chatbot,         # Function call when the user submits input
    inputs="text",      # Plain text input
    outputs="text",     # Plain text output returned
    title="USD-AAI-520 Group 3 FLAN-T5 Chatbot",
    description="I'm a FLAN-T5 model! I will talk to you till you say 'bye' :)."
)

# Launch Gradio Interface allowing users to interact with the chatbot
iface.launch()

Using GPU: Tesla T4
Using device: cuda
Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://58805bb3ce0d68cc19.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)



