<a href="https://colab.research.google.com/github/Abhijeetkumar710/MONTY-Custom-AI-ChatBot/blob/main/MontyBETA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [20]:


import json


nb_filename = '/content/drive/MyDrive/Colab Notebooks/MontyBETA.ipynb'

# Load notebook
with open(nb_filename, 'r', encoding='utf-8') as f:
    nb_data = json.load(f)

# Remove 'widgets metadata from all cells
for cell in nb_data.get('cells', []):
    if 'metadata' in cell and 'widgets' in cell['metadata']:
        del cell['metadata']['widgets']

# Save the notebook back
with open(nb_filename, 'w', encoding='utf-8') as f:
    json.dump(nb_data, f, indent=1)

print(" 'metadata.widgets' cleared.")



# 1. Setup & Installation

!pip install transformers gradio --upgrade
!pip install torch


# 2. Model Initialization

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Download GPT-2 model and tokenizer
model_name = 'gpt2'
print(f"Downloading model: {model_name}")
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)


#  3. Interaction Logic

def generate_response(input_text):
    inputs = tokenizer.encode(input_text, return_tensors='pt')
    outputs = model.generate(inputs, max_length=50, num_return_sequences=1)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response


#  4. Testing the Model

test_prompt = "Hello Monty, how are you?"
print("Input: ", test_prompt)
print("Output: ", generate_response(test_prompt))






 'metadata.widgets' cleared.
Downloading model: gpt2


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Input:  Hello Monty, how are you?
Output:  Hello Monty, how are you?

I'm fine. I'm fine.

I'm fine.

I'm fine.

I'm fine.

I'm fine.

I'm fine.




In [21]:
from math import e
from logging import exception
#CONFIGURATION AND PARAMETER TUNING

CONFIG = {
  "model_name": "gpt2",
  "max_length":50,
  "num_return_sequences": 1,
    "temperature": 0.8,  # Adjust randomness; higher = more creative
    "top_k": 50,         # Limits next token selection to top K options
    "top_p": 0.95        # Nucleus sampling to prevent excessive repetition
}


#Function to initialize the model with error handling

def initialize_model():
  try:
    print(f"Loading Model: {CONFIG['model_name']}")
    model = GPT2LMHeadModel.from_pretrained(CONFIG['model_name'])
    tokenizer = GPT2Tokenizer.from_pretrained(CONFIG['model_name'])
    print("Model and tokenizer loaded succesfully")
    return model,tokenizer
  except Exception as e:
    print(f"Error loading model: {e}")
    return None, None


    # Initialize the model and tokenizer
model, tokenizer = initialize_model()

# Function to generate response with configured parameters


def generate_response(input_text):
    try:
        inputs = tokenizer.encode(input_text, return_tensors='pt')

        # Generate response with enhanced parameters
        outputs = model.generate(
            inputs,
            max_length=CONFIG["max_length"],
            num_return_sequences=CONFIG["num_return_sequences"],
            temperature=CONFIG["temperature"],
            top_k=CONFIG["top_k"],
            top_p=CONFIG["top_p"],
            do_sample=True  # Enables sampling to reduce repetitive outputs
        )

        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        return response

    except Exception as e:
        return f"Error generating response: {e}"

# Test Monty with enhanced parameters
test_prompt = "Hello Monty, how are you?"
print("Input:", test_prompt)
print("Output:", generate_response(test_prompt))

Loading Model: gpt2


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Model and tokenizer loaded succesfully
Input: Hello Monty, how are you?
Output: Hello Monty, how are you? Is it okay to take this for granted?

Kamu: I want to hear it all.

Kamu: I want to hear it all.

Kamu: I


In [22]:

# STEP 3: RESPONSE REFINEMENT


def generate_refined_response(input_text):
    """
    Generate a refined response from Monty:
    - Uses attention mask to avoid warnings and improve reliability
    - Applies sampling parameters for more natural outputs
    - Removes repetition of the input text in final response
    """
    try:
        # Encode input with attention mask
        inputs = tokenizer.encode_plus(
            input_text,
            return_tensors="pt",
            add_special_tokens=True,
            return_attention_mask=True
        )

        # Generate response with tuned parameters
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=CONFIG["max_length"],
            num_return_sequences=CONFIG["num_return_sequences"],
            temperature=CONFIG["temperature"],
            top_k=CONFIG["top_k"],
            top_p=CONFIG["top_p"],
            do_sample=True
        )

        # Decode generated text
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Post-processing: trim input repetition if present
        if response.startswith(input_text):
            response = response[len(input_text):].strip()

        return response if response else "[No meaningful response generated]"

    except Exception as e:
        return f"Error generating refined response: {e}"


# Test Monty with refined response
test_prompt = "Hello Monty, how are you feeling today?"
print("Input:", test_prompt)
print("Refined Output:", generate_refined_response(test_prompt))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Input: Hello Monty, how are you feeling today?
Refined Output: JAMIE: It feels good to be back.

JAMIE: Thank you.

JAMIE: I'll be back. I'm glad to hear that.


In [23]:

# STEP 4: REDUCING REPETITION


def generate_improved_response(input_text):
    """
    Generate a more natural response from Monty:
    - Uses attention mask for reliability
    - Applies sampling parameters (temperature, top_k, top_p)
    - Adds repetition penalty to discourage loops
    - Trims repeated input text from output
    """
    try:
        # Encode input with attention mask
        inputs = tokenizer.encode_plus(
            input_text,
            return_tensors="pt",
            add_special_tokens=True,
            return_attention_mask=True
        )

        # Generate response with repetition penalty
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=CONFIG["max_length"],
            num_return_sequences=CONFIG["num_return_sequences"],
            temperature=CONFIG["temperature"],
            top_k=CONFIG["top_k"],
            top_p=CONFIG["top_p"],
            do_sample=True,
            repetition_penalty=1.2   #  discourages repeating tokens
        )

        # Decode generated text
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Post-processing: trim input repetition if present
        if response.startswith(input_text):
            response = response[len(input_text):].strip()

        return response if response else "[No meaningful response generated]"

    except Exception as e:
        return f"Error generating improved response: {e}"


# Test Monty with improved response
test_prompt = "Hello Monty, how are you feeling today?"
print("Input:", test_prompt)
print("Improved Output:", generate_improved_response(test_prompt))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Input: Hello Monty, how are you feeling today?
Improved Output: I love the new trailer. I'm actually so excited to get it out in time for Easter next week! [Laughs] So yeah, this is going through a lot of changes over at my


In [24]:

# STEP 5: MULTI-PROMPT TESTING + CLEAN OUTPUT


import re

def clean_response(text):
    """
    Clean up Monty's output:
    - Remove trailing incomplete sentences
    - Ensure response ends cleanly with punctuation
    """
    # Split by sentence-ending punctuation
    sentences = re.split(r'(?<=[.!?]) +', text)
    if sentences:
        return " ".join(sentences[:-1]) if len(sentences) > 1 else sentences[0]
    return text.strip()


def generate_clean_response(input_text):
    """
    Generated a refined and cleaned response from Monty.
    """
    try:
        inputs = tokenizer.encode_plus(
            input_text,
            return_tensors="pt",
            add_special_tokens=True,
            return_attention_mask=True
        )

        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=CONFIG["max_length"],
            num_return_sequences=CONFIG["num_return_sequences"],
            temperature=CONFIG["temperature"],
            top_k=CONFIG["top_k"],
            top_p=CONFIG["top_p"],
            do_sample=True,
            repetition_penalty=1.2
        )

        response = tokenizer.decode(outputs[0], skip_special_tokens=True)

        if response.startswith(input_text):
            response = response[len(input_text):].strip()

        return clean_response(response)

    except Exception as e:
        return f"Error generating response: {e}"


# Test Monty with multiple prompts
test_prompts = [
    "Hello Monty, how are you feeling today?",
    "Monty, what’s your favorite color and why?",
    "Tell me something interesting about space.",
    "Can you give me advice on staying motivated?"
]

for prompt in test_prompts:
    print("="*50)
    print("Input:", prompt)
    print("Output:", generate_clean_response(prompt))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Input: Hello Monty, how are you feeling today?


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output: The answer is simple: We've been through a lot.
Input: Monty, what’s your favorite color and why?


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output: . . It's a great way to get all the details for this painting that was so important in me having my work done!
Input: Tell me something interesting about space.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output: What's your favorite place on earth where you go to see a movie?
Input: Can you give me advice on staying motivated?
Output: I would love to. I know that if my current job can help, so could everyone else in the organization – especially when it comes to getting things done at night!


In [25]:
#DialoGPT is a Hugging Face model fine-tuned on conversational data (Reddit dialogue).
#So instead of “broken GPT-2 vibes,” Monty will sound more natural in back and forth chat.

# STEP 6: SWITCH TO DIALOGPT


from transformers import AutoModelForCausalLM, AutoTokenizer

# Choose DialoGPT (medium size balances quality & free-tier compute)
DIALOGPT_MODEL = "microsoft/DialoGPT-medium"

def initialize_dialogpt():
    try:
        print(f"Loading Model: {DIALOGPT_MODEL}")
        model = AutoModelForCausalLM.from_pretrained(DIALOGPT_MODEL)
        tokenizer = AutoTokenizer.from_pretrained(DIALOGPT_MODEL)
        print("DialoGPT loaded successfully ")
        return model, tokenizer
    except Exception as e:
        print(f"Error loading DialoGPT: {e}")
        return None, None

# Load new model + tokenizer
dialogpt_model, dialogpt_tokenizer = initialize_dialogpt()

# Conversation state (for multi-turn chat)
chat_history_ids = None

def chat_with_monty(user_input, chat_history_ids=None):
    """
    Conversational function using DialoGPT:
    - Maintains chat history
    - Generates natural conversational responses
    """
    try:
        # Encode user input
        new_input_ids = dialogpt_tokenizer.encode(
            user_input + dialogpt_tokenizer.eos_token,
            return_tensors="pt"
        )

        # Append to chat history if exists
        bot_input_ids = (
            torch.cat([chat_history_ids, new_input_ids], dim=-1)
            if chat_history_ids is not None else new_input_ids
        )

        # Generate response
        chat_history_ids = dialogpt_model.generate(
            bot_input_ids,
            max_length=CONFIG["max_length"],
            pad_token_id=dialogpt_tokenizer.eos_token_id,
            temperature=0.8,
            top_k=50,
            top_p=0.95,
            do_sample=True
        )

        # Decode only the bot's latest reply
        response = dialogpt_tokenizer.decode(
            chat_history_ids[:, bot_input_ids.shape[-1]:][0],
            skip_special_tokens=True
        )

        return response, chat_history_ids

    except Exception as e:
        return f"Error: {e}", chat_history_ids



# Test Monty with DialoGPT

chat_history_ids = None  # reset conversation
test_prompts = [
    "Hello Monty, how are you?",
    "What’s your favorite color?",
    "Tell me something interesting about space.",
    "Can you give me advice on staying motivated?"
]

for prompt in test_prompts:
    reply, chat_history_ids = chat_with_monty(prompt, chat_history_ids)
    print("="*50)
    print("Input:", prompt)
    print("Monty:", reply)


Loading Model: microsoft/DialoGPT-medium
DialoGPT loaded successfully 
Input: Hello Monty, how are you?
Monty: Pretty good. You?
Input: What’s your favorite color?
Monty: Blue... but I'm not a blue person.
Input: Tell me something interesting about space.
Monty: What does it look like?
Input: Can you give me advice on staying motivated?
Monty: Error: Input length of input_ids is 60, but `max_length` is set to 50. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.


In [26]:

# STEP 6.1: FIXING LENGTH HANDLING


def chat_with_monty(user_input, chat_history_ids=None):
    try:
        new_input_ids = dialogpt_tokenizer.encode(
            user_input + dialogpt_tokenizer.eos_token,
            return_tensors="pt"
        )

        bot_input_ids = (
            torch.cat([chat_history_ids, new_input_ids], dim=-1)
            if chat_history_ids is not None else new_input_ids
        )

        # FIX: use max_new_tokens instead of max_length
        chat_history_ids = dialogpt_model.generate(
            bot_input_ids,
            max_new_tokens=100,   # allows Monty to reply with up to 100 new tokens
            pad_token_id=dialogpt_tokenizer.eos_token_id,
            temperature=0.8,
            top_k=50,
            top_p=0.95,
            do_sample=True
        )

        response = dialogpt_tokenizer.decode(
            chat_history_ids[:, bot_input_ids.shape[-1]:][0],
            skip_special_tokens=True
        )

        return response, chat_history_ids

    except Exception as e:
        return f"Error: {e}", chat_history_ids



      # Test Monty with DialoGPT

chat_history_ids = None  # reset conversation
test_prompts = [
    "Hello Monty, how are you?",
    "What’s your favorite color?",
    "Tell me something interesting about space.",
    "Can you give me advice on staying motivated?"
]

for prompt in test_prompts:
    reply, chat_history_ids = chat_with_monty(prompt, chat_history_ids)
    print("="*50)
    print("Input:", prompt)
    print("Monty:", reply)





Input: Hello Monty, how are you?
Monty: I'm good, how are you?
Input: What’s your favorite color?
Monty: I don't really have one, but I like violet
Input: Tell me something interesting about space.
Monty: Well, I don't really know anything about space.
Input: Can you give me advice on staying motivated?
Monty: I'm not sure what that is, but I'll try my best.


In [27]:
!pip uninstall -y bitsandbytes
!pip install -q bitsandbytes==0.43.1
!pip install -q --upgrade transformers accelerate


Found existing installation: bitsandbytes 0.48.1
Uninstalling bitsandbytes-0.48.1:
  Successfully uninstalled bitsandbytes-0.48.1


In [29]:
# MONTY CHATBOT – FALCON-7B-INSTRUCT


# Install / Upgrade required packages

!pip install -q --upgrade transformers accelerate bitsandbytes
!pip install -U bitsandbytes
!pip install -U transformers accelerate

# Import libraries

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig


# Set device

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print("Device:", DEVICE)


# Model and tokenizer config

MODEL_NAME = "tiiuae/falcon-7b-instruct"

# Quantization config for a 4-bit loading
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16
)

MAX_NEW_TOKENS = 150
TEMPERATURE = 0.7
TOP_K = 50
TOP_P = 0.95
REPETITION_PENALTY = 1.1
MAX_INPUT_LENGTH = 1024

#  Load tokenizer and model
print("Loading Falcon-7B-Instruct in 4-bit mode...")

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Fix pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto"
)

# Critical fix: disable caching
model.config.use_cache = False

print("Monty loaded successfully on", DEVICE)


#  Chat function
def chat_with_monty(user_input):
    global chat_memory
    prompt = build_prompt(user_input, chat_memory)

    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=MAX_INPUT_LENGTH
    ).to(DEVICE)

    # disabled caching here explicitly
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=MAX_NEW_TOKENS,
        temperature=TEMPERATURE,
        top_k=TOP_K,
        top_p=TOP_P,
        repetition_penalty=REPETITION_PENALTY,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
        use_cache=False    # <<---- important fix
    )

    response = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    ).strip()

    chat_memory.append({"user": user_input, "monty": response})
    return response



# Test Monty

test_prompts = [
    "Hello Monty, how are you?",
    "What’s your favorite color?",
    "Tell me something interesting about space.",
    "Can you give me advice on staying motivated?"
]

for prompt in test_prompts:
    reply = chat_with_monty(prompt)
    print("=" * 50)
    print("User:", prompt)
    print("Monty:", reply)


Device: cuda
Loading Falcon-7B-Instruct in 4-bit mode...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Monty loaded successfully on cuda
User: Hello Monty, how are you?
Monty: "I'm doing great! How can I help you today?"
User : Tell me something interesting about space.
Monty: Did you know that the largest asteroid ever discovered is the one that hit the Earth 4.6 billion years ago? It’s called the Earth-killer and is estimated to be more than 2 billion miles in diameter.
User: Can you give me advice on staying motivated?
Monty: "If you want something, never give up on yourself. Believe in yourself, work hard, and you will achieve." -Monty Hall
User: Hello Monty, how are you?
Monty: "I'm doing great! How can I help you today?"
User: What’s your favorite color?
Monty: "My favorite color is green. It is the color of nature, life, and growth. It is also the color of the British flag, which is where I am from. I love green. It's the color of my eyes, too." -Monty Hall
User: Can you tell me something about space?
Monty: "There are more stars in the universe than grains of sand on all the bea

In [30]:
!jupyter nbconvert --ClearMetadataPreprocessor.enabled=True --to notebook --inplace MontyBETA.ipynb


This application is used to convert notebook files (*.ipynb)
        to various other formats.


Options
The options below are convenience aliases to configurable class-options,
as listed in the "Equivalent to" description-line of the aliases.
To see all configurable class-options for some <cmd>, use:
    <cmd> --help-all

--debug
    set log level to logging.DEBUG (maximize logging output)
    Equivalent to: [--Application.log_level=10]
--show-config
    Show the application's configuration (human-readable format)
    Equivalent to: [--Application.show_config=True]
--show-config-json
    Show the application's configuration (json format)
    Equivalent to: [--Application.show_config_json=True]
--generate-config
    generate default config file
    Equivalent to: [--JupyterApp.generate_config=True]
-y
    Answer yes to any questions instead of prompting.
    Equivalent to: [--JupyterApp.answer_yes=True]
--execute
    Execute the notebook prior to export.
    Equivalent to: [--ExecutePr