<a href="https://colab.research.google.com/github/Ncn914491/mlcollab_notebooks/blob/main/gemma_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gemma 2 2B Chatbot in Google Colab

This notebook implements a simple chatbot using Google's Gemma 2 2B model. The chatbot includes:
- Easy setup for Google Colab
- Conversation memory and context management
- Interactive chat interface
- Basic error handling (errors are reported in the chat instead of crashing it)

## Requirements
- Google Colab (recommended: GPU runtime for better performance)
- Hugging Face account (for model access)
- Internet connection for model download

## Model Information
- **Model**: google/gemma-2-2b-it (instruction-tuned version)
- **Size**: ~2.6B parameters
- **Memory**: roughly 5 GB for the weights in 16-bit precision; the 4-bit quantized load used below needs considerably less
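
As a rough check on the memory figure, weight storage is parameter count × bytes per parameter. The sketch below assumes ~2.6 billion parameters and ignores activations, the KV cache, and quantization overhead, so treat its numbers as lower bounds rather than measured usage.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: ~2.6e9 parameters for gemma-2-2b-it. Activations, the KV cache,
# and quantization overhead are ignored, so these are lower bounds.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: ~{weight_memory_gb(2.6e9, dtype):.1f} GB")
```

bitsandbytes typically quantizes only the linear layers (embeddings stay in 16-bit), so a real 4-bit load lands somewhere between the `4-bit` and `float16` rows.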


## 1. Setup and Installation

First, let's install the necessary dependencies and check our environment.

In [1]:
# Install required packages
!pip install -q transformers torch accelerate bitsandbytes
!pip install -q huggingface_hub

# Check GPU availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️  No GPU detected. The model will run on CPU (much slower).")


## 2. Hugging Face Authentication

You need to authenticate with Hugging Face to access the Gemma model.

**Steps:**
1. Go to [Hugging Face](https://huggingface.co/) and create an account
2. Visit the [Gemma 2 2B model page](https://huggingface.co/google/gemma-2-2b-it) and accept the license
3. Generate an access token at [Settings > Access Tokens](https://huggingface.co/settings/tokens)
4. Run the cell below and enter your token when prompted

In [2]:
from huggingface_hub import login
import getpass

# Authenticate with Hugging Face
print("Please enter your Hugging Face access token:")
token = getpass.getpass("Token: ")
login(token=token)
print("✅ Successfully authenticated with Hugging Face!")

Please enter your Hugging Face access token:
Token: ··········
✅ Successfully authenticated with Hugging Face!
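
Typing the token every session works, but it can also come from the `HF_TOKEN` environment variable (which `huggingface_hub` also checks on its own) or from Colab's Secrets panel via `google.colab.userdata`. A minimal sketch; `get_hf_token` is a hypothetical helper, not part of either library, and the secret name `HF_TOKEN` is an assumption.

```python
import getpass
import os

def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return a Hugging Face token without hard-coding it in the notebook.

    Lookup order (one possible choice):
      1. the environment variable named by `env_var`
      2. Colab's Secrets panel, when running inside Colab
      3. an interactive getpass prompt
    """
    token = os.environ.get(env_var)
    if token:
        return token
    try:
        from google.colab import userdata  # only importable inside Colab
        token = userdata.get(env_var)
        if token:
            return token
    except Exception:
        # Not in Colab, secret missing, or notebook access not granted.
        pass
    return getpass.getpass("Token: ")
```

With this helper, the cell above reduces to `login(token=get_hf_token())`.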


## 3. Model Configuration and Loading

Now let's load the Gemma 2 2B model with optimized settings for Colab.

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch
import warnings
warnings.filterwarnings('ignore')

# Model configuration
MODEL_NAME = "google/gemma-2-2b-it"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Configure quantization for memory efficiency (if GPU available)
if torch.cuda.is_available():
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
    )

    print("Loading model with 4-bit quantization...")
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=quantization_config,
        device_map="auto",
        torch_dtype=torch.float16,
    )
else:
    print("Loading model on CPU...")
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float32,
        device_map="cpu"
    )

print("✅ Model loaded successfully!")
print(f"Model device: {next(model.parameters()).device}")

Loading tokenizer...


Loading model with 4-bit quantization...


✅ Model loaded successfully!
Model device: cuda:0
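
`load_in_4bit=True` with `bnb_4bit_quant_type="nf4"` stores weights in small blocks, each with its own scale. The sketch below shows that general idea using plain symmetric absmax quantization to signed 4-bit integers (-7 to 7). It is a simplification, not bitsandbytes' NF4 (which uses non-uniform levels based on a normal distribution), and it omits the double quantization of the scales; the block size here is an illustrative choice.

```python
# Simplified blockwise absmax quantization to signed 4-bit integers.
# Illustration only: NF4 uses non-uniform levels, and double quantization
# additionally compresses the per-block scales.

def quantize_block(values, bits=4):
    """Map floats to integers in [-qmax, qmax] using one scale per block."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4 bits
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def quantize_blockwise(values, block_size=64, bits=4):
    """Split a flat list of weights into blocks, each with its own scale."""
    return [
        quantize_block(values[start:start + block_size], bits)
        for start in range(0, len(values), block_size)
    ]

def dequantize_blockwise(blocks):
    """Reconstruct approximate weights as code * scale for every block."""
    restored = []
    for codes, scale in blocks:
        restored.extend(c * scale for c in codes)
    return restored

weights = [0.12, -0.5, 0.03, 0.9, -0.27, 0.0, 0.41, -0.88]
blocks = quantize_blockwise(weights, block_size=4)
restored = dequantize_blockwise(blocks)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(blocks)
print(f"max reconstruction error: {max_err:.4f}")
```

Per-block scales keep one large weight from degrading the precision of every other weight in the tensor; the rounding error of each value is at most half of its block's scale.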


## 4. Chatbot Implementation

Let's create a chatbot class that keeps conversation memory and formats each prompt with Gemma's chat template.

In [4]:
class GemmaChatbot:
    def __init__(self, model, tokenizer, max_history=10):
        self.model = model
        self.tokenizer = tokenizer
        self.conversation_history = []
        self.max_history = max_history

        # Generation parameters
        self.generation_config = {
            'max_new_tokens': 512,
            'temperature': 0.7,
            'top_p': 0.9,
            'do_sample': True,
            'pad_token_id': tokenizer.eos_token_id,
        }

    def format_conversation(self, user_input):
        """Format the conversation for Gemma's chat template"""
        messages = []

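        # Add the last `max_history` exchanges. history[-0:] would return the
        # whole list, so max_history <= 0 means "no history".
        recent = self.conversation_history[-self.max_history:] if self.max_history > 0 else []
        for user_msg, bot_msg in recent: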
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": bot_msg})

        # Add current user input
        messages.append({"role": "user", "content": user_input})

        return self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )

    def generate_response(self, user_input):
        """Generate a response to user input"""
        try:
            # Format the conversation
            formatted_input = self.format_conversation(user_input)

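            # Tokenize. The chat template already starts with <bos>, so don't let
            # the tokenizer add a second one. No truncation here: default
            # right-side truncation would cut off the newest turn and the
            # generation prompt; prompt length is bounded by max_history instead.
            inputs = self.tokenizer(
                formatted_input,
                return_tensors="pt",
                add_special_tokens=False,
            )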

            # Move to same device as model
            inputs = {k: v.to(self.model.device) for k, v in inputs.items()}

            # Generate response
            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,
                    **self.generation_config
                )

            # Decode response
            response = self.tokenizer.decode(
                outputs[0][inputs['input_ids'].shape[1]:],
                skip_special_tokens=True
            ).strip()

            # Add to conversation history
            self.conversation_history.append((user_input, response))

            return response

        except Exception as e:
            error_msg = f"Sorry, I encountered an error: {str(e)}"
            print(f"Error details: {e}")
            return error_msg

    def clear_history(self):
        """Clear conversation history"""
        self.conversation_history = []
        print("Conversation history cleared!")

    def get_history_summary(self):
        """Get a summary of conversation history"""
        if not self.conversation_history:
            return "No conversation history yet."

        summary = f"Conversation history ({len(self.conversation_history)} exchanges):\n"
        for i, (user_msg, bot_msg) in enumerate(self.conversation_history[-5:], 1):
            summary += f"{i}. User: {user_msg[:50]}{'...' if len(user_msg) > 50 else ''}\n"
            summary += f"   Bot: {bot_msg[:50]}{'...' if len(bot_msg) > 50 else ''}\n"
        return summary

# Initialize the chatbot
chatbot = GemmaChatbot(model, tokenizer)
print("✅ Chatbot initialized and ready to chat!")

✅ Chatbot initialized and ready to chat!
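
`format_conversation` relies on `tokenizer.apply_chat_template` to turn the message list into Gemma's turn format. To make that format visible, the function below approximates what Gemma's template produces: a `<bos>` token, then each turn wrapped in `<start_of_turn>`/`<end_of_turn>`, with `assistant` renamed to `model`. It is a hand-written illustration (`render_gemma_prompt` is a hypothetical name); real code should keep calling `apply_chat_template`, which reads the template shipped with the tokenizer. Because the rendered string already starts with `<bos>`, tokenizing it with default settings would prepend a second one.

```python
def render_gemma_prompt(messages, add_generation_prompt=True, bos="<bos>"):
    """Approximate Gemma's chat template, for illustration only.

    - Gemma's template has no "system" role, so it is rejected here.
    - Roles must alternate user / assistant, starting with user.
    """
    parts = [bos]
    for i, msg in enumerate(messages):
        role = msg["role"]
        if role == "system":
            raise ValueError("System role not supported by Gemma's template")
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError("Roles must alternate user/assistant/user/...")
        turn_role = "model" if role == "assistant" else "user"
        parts.append(f"<start_of_turn>{turn_role}\n{msg['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)

messages = [
    {"role": "user", "content": "Hello! What's your name?"},
    {"role": "assistant", "content": "Hello! My name is Gemma."},
    {"role": "user", "content": "Can you help me with Python?"},
]
print(render_gemma_prompt(messages))
```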


## 5. Test the Chatbot

Let's test the chatbot with a simple conversation to make sure everything is working.

In [5]:
# Test basic functionality
print("Testing the chatbot...\n")

test_messages = [
    "Hello! What's your name?",
    "Can you help me with Python programming?",
    "What's the weather like today?"
]

for msg in test_messages:
    print(f"👤 User: {msg}")
    response = chatbot.generate_response(msg)
    print(f"🤖 Bot: {response}")
    print("-" * 50)

Testing the chatbot...

👤 User: Hello! What's your name?
🤖 Bot: Hello! My name is Gemma. 😊  How can I help you today?
--------------------------------------------------
👤 User: Can you help me with Python programming?
🤖 Bot: I can definitely help you with Python programming!  

To give you the best help, tell me:

* **What are you trying to do?**  (e.g., "I want to learn how to create a simple calculator", "I need to sort a list of numbers", "I'm having trouble with a specific error message") 
* **What's your current level of experience?** (e.g., "I'm a beginner", "I know some basic syntax", "I'm familiar with other programming languages")
* **Do you have any code you're working on?** (If so, please share it with me!) 

The more details you can give me, the better I can assist you.  

I'm excited to help you on your Python journey! 🚀 🐍
--------------------------------------------------
👤 User: What's the weather like today?
🤖 Bot: I can't give you real-time information like the weather
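
The test above needs the full model. The history windowing in `format_conversation` can also be checked on its own: the sketch below copies that message-assembly logic into a standalone function (`build_messages` is a hypothetical name) so you can confirm that only the last `max_history` exchanges reach the prompt. It adds one guard: `history[-0:]` returns the whole list, so here `max_history <= 0` means no history at all.

```python
def build_messages(history, user_input, max_history=10):
    """Standalone copy of GemmaChatbot.format_conversation's message assembly.

    `history` is a list of (user_msg, bot_msg) tuples, oldest first.
    Unlike history[-0:], which returns the whole list, max_history <= 0
    here means "send no history".
    """
    recent = history[-max_history:] if max_history > 0 else []
    messages = []
    for user_msg, bot_msg in recent:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": user_input})
    return messages

history = [(f"question {i}", f"answer {i}") for i in range(12)]
msgs = build_messages(history, "latest question", max_history=3)
print(len(msgs))            # 7: three exchanges (6 messages) + the new question
print(msgs[0]["content"])   # question 9
```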

## 6. Interactive Chat Interface

Now you can have an interactive conversation with the chatbot. Run the cell below and start chatting!

In [7]:
import time

def interactive_chat():
    print("🤖 Gemma Chatbot is ready! Type 'quit' to exit, 'clear' to clear history, or 'history' to see conversation summary.\n")

    while True:
        try:
            user_input = input("👤 You: ").strip()

            if user_input.lower() in ['quit', 'exit', 'bye']:
                print("🤖 Goodbye! Thanks for chatting!")
                break
            elif user_input.lower() == 'clear':
                chatbot.clear_history()
                continue
            elif user_input.lower() == 'history':
                print("\n" + chatbot.get_history_summary() + "\n")
                continue
            elif not user_input:
                print("Please enter a message or 'quit' to exit.")
                continue

            print("🤖 Thinking...", end="", flush=True)
            start_time = time.time()

            response = chatbot.generate_response(user_input)

            end_time = time.time()
            print(f"\r🤖 Bot ({end_time - start_time:.1f}s): {response}\n")

        except KeyboardInterrupt:
            print("\n🤖 Chat interrupted. Goodbye!")
            break
        except Exception as e:
            print(f"\n❌ Error: {e}\n")

# Start interactive chat
interactive_chat()

🤖 Gemma Chatbot is ready! Type 'quit' to exit, 'clear' to clear history, or 'history' to see conversation summary.

👤 You: give me a python code to append 2 numbers to a list
🤖 Bot (45.9s): ```python
def append_numbers(numbers, num1, num2):
  """
  Appends two numbers to a list.

  Args:
      numbers: The existing list.
      num1: The first number.
      num2: The second number.

  Returns:
      A new list with the two numbers appended to the end.
  """
  new_numbers = numbers.copy()  # Make a copy to avoid modifying the original list
  new_numbers.append(num1)
  new_numbers.append(num2)
  return new_numbers

# Example usage
my_list = [1, 2, 3]
new_list = append_numbers(my_list, 4, 5)
print(new_list)  # Output: [1, 2, 3, 4, 5]
```

**Explanation:**

1. **Function Definition:**
   - `def append_numbers(numbers, num1, num2):` defines a function named `append_numbers` that takes three arguments:
     - `numbers`: The list you want to modify.
     - `num1`: The first number to add.
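
The `if`/`elif` chain in `interactive_chat` works, but a lookup table keeps the built-in commands in one place as more are added. A minimal sketch, assuming a chatbot object with the `clear_history` and `get_history_summary` methods from Section 4; `handle_command` and `StubBot` are hypothetical names, and the stub exists only so the sketch runs without loading the model.

```python
EXIT_WORDS = {"quit", "exit", "bye"}

def handle_command(user_input, bot):
    """Handle built-in chat commands.

    Returns "exit", "empty", "handled", or None. None means the text is an
    ordinary message that should be sent to the model.
    """
    command = user_input.strip().lower()
    if command in EXIT_WORDS:
        return "exit"
    if not command:
        return "empty"
    actions = {
        "clear": bot.clear_history,
        "history": lambda: print(bot.get_history_summary()),
    }
    action = actions.get(command)
    if action is None:
        return None
    action()
    return "handled"

class StubBot:
    """Stand-in with the two methods handle_command uses (no model needed)."""
    def __init__(self):
        self.conversation_history = [("hi", "hello")]

    def clear_history(self):
        self.conversation_history = []

    def get_history_summary(self):
        return f"{len(self.conversation_history)} exchange(s)"

bot = StubBot()
print(handle_command("History", bot))   # prints "1 exchange(s)", then "handled"
```

In the loop, `"exit"` breaks, `"empty"` and `"handled"` continue, and `None` falls through to `chatbot.generate_response(user_input)`.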
    

## 7. Advanced Features and Customization

The cell below defines presets that adjust the sampling parameters (`temperature`, `top_p`, `top_k`) stored in `chatbot.generation_config`:

In [8]:
# Adjust generation parameters for different conversation styles

def set_creative_mode():
    """More creative and diverse responses"""
    chatbot.generation_config.update({
        'temperature': 0.9,
        'top_p': 0.95,
        'top_k': 50
    })
    print("🎨 Creative mode activated!")

def set_focused_mode():
    """More focused and consistent responses"""
    chatbot.generation_config.update({
        'temperature': 0.3,
        'top_p': 0.8,
        'top_k': 20
    })
    print("🎯 Focused mode activated!")

def set_balanced_mode():
    """Balanced responses (default)"""
    chatbot.generation_config.update({
        'temperature': 0.7,
        'top_p': 0.9,
        'top_k': 40
    })
    print("⚖️ Balanced mode activated!")

# Example usage:
print("Available modes:")
print("- set_creative_mode(): More creative responses")
print("- set_focused_mode(): More focused responses")
print("- set_balanced_mode(): Default balanced responses")

# Call one of these to make the sampled responses more varied or more focused
# set_creative_mode()

Available modes:
- set_creative_mode(): More creative responses
- set_focused_mode(): More focused responses
- set_balanced_mode(): Default balanced responses
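
To see what `temperature`, `top_k`, and `top_p` do, here is a pure-Python sketch of one sampling step over toy logits: divide by temperature, apply softmax, keep the `top_k` most likely tokens, keep the smallest set whose renormalized probability reaches `top_p`, then sample. It follows the usual definitions of these filters; the exact ordering and edge-case handling inside `transformers`' `generate` may differ, and the logits are made up.

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.9, rng=random):
    """Sample one token index from raw logits with temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    # Temperature: divide logits, then softmax (subtract the max for stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most likely indices (top_k <= 0 disables the filter).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p (nucleus): renormalize over the top-k set, then keep the smallest
    # prefix whose cumulative probability reaches top_p.
    mass_k = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass_k
        if cumulative >= top_p:
            break

    # Sample among the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]   # toy scores for a 4-token vocabulary
rng = random.Random(0)
# Focused preset: at temperature 0.3, token 0 alone holds ~96% of the mass,
# so the nucleus keeps only that token.
focused = [sample_next_token(logits, 0.3, 20, 0.8, rng) for _ in range(10)]
creative = [sample_next_token(logits, 0.9, 50, 0.95, rng) for _ in range(10)]
print("focused: ", focused)
print("creative:", creative)
```

Lower temperature sharpens the distribution before the filters run, which is why `set_focused_mode()` yields more repeatable answers than `set_creative_mode()`. These parameters only take effect because the chatbot sets `do_sample=True`.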


## 8. Troubleshooting and Tips

### Common Issues:

1. **Out of Memory Error**:
   - Make sure you're using a GPU runtime in Colab
   - Try reducing `max_new_tokens` in `chatbot.generation_config`
   - Clear the conversation history more often, or lower `chatbot.max_history`

2. **Slow Responses**:
   - Ensure GPU runtime is enabled
   - Consider using a smaller model if needed
   - Reduce `max_new_tokens` for faster responses

3. **Model Access Issues**:
   - Verify you've accepted the Gemma license on Hugging Face
   - Check that your Hugging Face token is valid
   - Ensure you have internet connectivity
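
For out-of-memory problems, a fixed `max_history` count can still let a few long exchanges grow the prompt. One alternative is to drop the oldest exchanges until an estimated token count fits a budget. A sketch, assuming a `count_tokens` callable; in the notebook that could be `lambda s: len(tokenizer(s, add_special_tokens=False)["input_ids"])`, while the whitespace counter below is only a stand-in so the example runs on its own. `trim_history` and the budget values are hypothetical.

```python
def trim_history(history, count_tokens, budget):
    """Drop the oldest (user, bot) exchanges until the total fits `budget`.

    Returns a new list that keeps the most recent exchanges, oldest first.
    """
    kept = []
    used = 0
    for user_msg, bot_msg in reversed(history):   # newest first
        cost = count_tokens(user_msg) + count_tokens(bot_msg)
        if used + cost > budget:
            break
        kept.append((user_msg, bot_msg))
        used += cost
    kept.reverse()                                # restore chronological order
    return kept

def whitespace_tokens(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())

history = [
    ("one two three", "four five"),     # 5 "tokens"
    ("six seven", "eight nine ten"),    # 5
    ("eleven", "twelve thirteen"),      # 3
]
print(trim_history(history, whitespace_tokens, budget=8))
# [('six seven', 'eight nine ten'), ('eleven', 'twelve thirteen')]
```

Applied to `chatbot.conversation_history` before each call, the budget should leave headroom for the new message, the template's turn markers, and `max_new_tokens`.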

### Performance Tips:
- Use GPU runtime for best performance
- Keep conversation history short (default `max_history`: 10 exchanges)
- Adjust generation parameters based on your needs
- Clear history periodically for long conversations

### Customization Ideas:
- Add role instructions (teacher, assistant, etc.); Gemma's chat template has no `system` role, so prepend them to the first user message
- Implement conversation themes or contexts
- Add response filtering or safety checks
- Create specialized chatbots for different domains
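
Because Gemma's chat template accepts only `user` and `assistant` turns (a `system` message raises an error), a common workaround is to prepend role instructions to the first user turn. A sketch; `with_instructions` is a hypothetical helper and the instruction text is only an example.

```python
def with_instructions(messages, instructions):
    """Return a copy of `messages` with `instructions` prepended to the first user turn."""
    result = [dict(m) for m in messages]   # copy each dict so the caller's list is untouched
    for msg in result:
        if msg["role"] == "user":
            msg["content"] = f"{instructions}\n\n{msg['content']}"
            break
    return result

messages = [{"role": "user", "content": "Explain list comprehensions."}]
print(with_instructions(messages, "You are a patient Python tutor. Keep answers short."))
```

Inside `GemmaChatbot.format_conversation`, this would be applied to `messages` just before `apply_chat_template`, so the instructions ride along with whichever user turn is oldest in the history window.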