# 🚀 LocalLab on Google Colab

This notebook provides a complete guide to running LocalLab on Google Colab. Follow each section step by step.

## Table of Contents
1. Setup & Installation
2. Configuration
3. Start Server
4. Connect Client
5. Usage Examples
6. System Monitoring
7. Cleanup
8. Troubleshooting

## 1. Setup & Installation

First, install the LocalLab server and the Python client package that the connection step imports as `locallab_client`.

In [None]:
!pip install locallab locallab-client

## 2. Configuration

### 2.1 Enter Your Tokens
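Enter your ngrok token and, optionally, your Hugging Face token. Input is hidden as you type, and pressing Enter without typing skips the Hugging Face token: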

In [None]:
import os
from getpass import getpass

print("Enter your ngrok token (get one from https://dashboard.ngrok.com):")
ngrok_token = getpass()
os.environ["NGROK_AUTH_TOKEN"] = ngrok_token

print("\nEnter your Hugging Face token (optional, get from https://huggingface.co/settings/tokens):")
hf_token = getpass()
if hf_token:
    os.environ["HUGGINGFACE_TOKEN"] = hf_token
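
Because `getpass()` returns an empty string when you press Enter without typing, it can help to confirm which tokens were actually captured before moving on. The sketch below is one way to do that without printing the secrets. `mask` and `token_status` are hypothetical helper names, not part of LocalLab.

```python
import os

def mask(secret: str, visible: int = 4) -> str:
    """Return a display-safe version of a secret, showing only its last characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]

def token_status(names=("NGROK_AUTH_TOKEN", "HUGGINGFACE_TOKEN"), env=None) -> dict:
    """Map each variable name to a masked value, or None if it is unset or empty."""
    env = os.environ if env is None else env
    return {name: (mask(env[name]) if env.get(name) else None) for name in names}
```

Calling `print(token_status())` after the cell above shows `None` for any token that was left blank.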

### 2.2 Select Model
Choose your model configuration:

In [None]:
# @title Model Configuration
model_choice = "microsoft/phi-2" # @param ["microsoft/phi-2", "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "custom"]
custom_model = "" # @param {type:"string"}

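# Resolve the model ID once so both methods below use the same value.
# Selecting "custom" without filling in custom_model is an error,
# not a request for a model literally named "custom".
if model_choice == "custom":
    if not custom_model.strip():
        raise ValueError('Set custom_model to a Hugging Face model ID when model_choice is "custom".')
    model_id = custom_model.strip()
else:
    model_id = model_choice

# Method 1: Using CLI (recommended)
!locallab config --model {model_id}

# Method 2: Using environment variables (backup method)
os.environ["HUGGINGFACE_MODEL"] = model_id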

### 2.3 Performance Settings

In [None]:
# @title Optimization Settings
enable_quantization = True # @param {type:"boolean"}
quantization_type = "int8" # @param ["fp16", "int8", "int4"]
enable_flash_attention = True # @param {type:"boolean"}
enable_attention_slicing = True # @param {type:"boolean"}

# Method 1: Using CLI (recommended)
!locallab config --quantize {str(enable_quantization).lower()} --quantize-type {quantization_type} \
                 --flash-attention {str(enable_flash_attention).lower()} \
                 --attention-slicing {str(enable_attention_slicing).lower()}

# Method 2: Using environment variables (backup method)
os.environ["LOCALLAB_ENABLE_QUANTIZATION"] = str(enable_quantization)
os.environ["LOCALLAB_QUANTIZATION_TYPE"] = quantization_type
os.environ["LOCALLAB_ENABLE_FLASH_ATTENTION"] = str(enable_flash_attention)
os.environ["LOCALLAB_ENABLE_ATTENTION_SLICING"] = str(enable_attention_slicing)
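
To get a feel for what `quantization_type` changes, a rough estimate of weight memory is the parameter count times the bytes per parameter. The sketch below uses the standard widths (2 bytes for fp16, 1 for int8, 0.5 for int4) and approximate parameter counts of 2.7 billion for phi-2 and 1.1 billion for TinyLlama. It ignores activations, the KV cache, and quantization overhead, so treat the numbers as lower bounds. `estimate_weight_gb` is a hypothetical helper, not a LocalLab function.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_weight_gb(num_params: float, quantization_type: str) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[quantization_type] / 1e9

# Approximate parameter counts (assumed round figures)
for name, params in [("microsoft/phi-2", 2.7e9), ("TinyLlama-1.1B", 1.1e9)]:
    for qtype in BYTES_PER_PARAM:
        print(f"{name:18s} {qtype}: ~{estimate_weight_gb(params, qtype):.2f} GB")
```

If the estimate for your chosen model is close to the GPU memory of the Colab runtime, a lower-precision setting or a smaller model is the safer choice.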

## 3. Start Server

Now start the LocalLab server. The cell keeps running for as long as the server is up, and its output is where the public URL appears:

In [None]:
# Start server with ngrok enabled using CLI (notice the ! prefix)
!locallab start --use-ngrok

# This will show the public URL in logs like:
# 🚀 Ngrok Public URL: https://abc123.ngrok.app
# COPY THIS URL - you'll need it to connect!
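
If you have the server output available as text, the public URL can be pulled out with a regular expression instead of being copied by hand. This sketch assumes the log line looks like the example above. The pattern and the `extract_ngrok_url` name are illustrative, and the real log format may differ between versions.

```python
import re
from typing import Optional

# Matches an https URL whose host contains "ngrok", e.g. https://abc123.ngrok.app
NGROK_URL = re.compile(r"https://[A-Za-z0-9.-]*ngrok[A-Za-z0-9.-]*")

def extract_ngrok_url(log_text: str) -> Optional[str]:
    """Return the first ngrok-looking URL in log_text, or None if there is none."""
    match = NGROK_URL.search(log_text)
    # Drop a trailing period in case the URL ended a sentence in the log
    return match.group(0).rstrip(".") if match else None
```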

## 4. Connect Client

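Copy the ngrok URL from the server logs and paste it into `server_url` below. Because the server cell occupies this runtime while it is running, run the client cells from a separate notebook or from your own machine with `locallab-client` installed: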

In [None]:
from locallab_client import LocalLabClient

# @title Client Connection
server_url = "" # @param {type:"string"}

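# Fail early with a clear message if the URL field was left empty
if not server_url.startswith("http"):
    raise ValueError("Paste the ngrok URL from the server logs into server_url.")

client = LocalLabClient(server_url)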

# Test connection
is_healthy = await client.health_check()
print(f"Server connection: {'OK' if is_healthy else 'Failed'}")
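
The tunnel can take a few seconds to become reachable, so a single failed check is not always conclusive. Here is a minimal retry sketch. `wait_until_healthy` is a hypothetical helper that accepts any zero-argument async callable returning a boolean, such as `client.health_check` above.

```python
import asyncio

async def wait_until_healthy(check, attempts: int = 5, delay: float = 2.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        try:
            if await check():
                return True
        except Exception as exc:  # a network error counts as a failed attempt
            print(f"Attempt {attempt} failed: {exc}")
        if attempt < attempts:
            await asyncio.sleep(delay)
    return False
```

In a notebook cell this would be used as `is_healthy = await wait_until_healthy(client.health_check)`.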

## 5. Usage Examples

### 5.1 Basic Text Generation

In [None]:
# @title Text Generation
prompt = "Write a story about a robot learning to paint" # @param {type:"string"}
temperature = 0.7 # @param {type:"slider", min:0.1, max:1.0, step:0.1}

response = await client.generate(prompt, temperature=temperature)
print("AI Response:")
print(response)

### 5.2 Chat Completion

In [None]:
# @title Chat with AI
system_message = "You are a helpful assistant." # @param {type:"string"}
user_message = "What is artificial intelligence?" # @param {type:"string"}

response = await client.chat([
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message}
])

print("AI Response:")
print(response.choices[0].message.content)
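
For a multi-turn conversation, the same message format can be extended by appending each reply before sending the next request. This is a sketch that assumes the server accepts an `assistant` role alongside `system` and `user`, as OpenAI-style chat formats do. `Conversation` is a hypothetical helper, not part of the client.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Conversation:
    """Accumulates chat messages in the role/content format used above."""
    system_message: str
    messages: List[Dict[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The system message always comes first
        self.messages.append({"role": "system", "content": self.system_message})

    def add_user(self, content: str) -> None:
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content: str) -> None:
        self.messages.append({"role": "assistant", "content": content})
```

Each turn would then call `add_user(...)`, pass `conversation.messages` to `client.chat`, and store the reply with `add_assistant(...)` so the next request carries the full history.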

### 5.3 Streaming Response

In [None]:
# @title Stream Generation
stream_prompt = "Tell me a story" # @param {type:"string"}

print("AI Response:")
async for token in client.stream_generate(stream_prompt):
    print(token, end="", flush=True)
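
To keep the full text as well as print it live, the tokens can be accumulated as they arrive. The sketch below works with any async iterator of strings. `collect_stream` is a hypothetical helper, and `fake_stream` stands in for `client.stream_generate(...)` so the example runs without a server.

```python
import asyncio

async def collect_stream(token_stream, echo: bool = True) -> str:
    """Consume an async iterator of text chunks and return them joined together."""
    parts = []
    async for token in token_stream:
        if echo:
            print(token, end="", flush=True)
        parts.append(token)
    return "".join(parts)

async def fake_stream():
    """Stand-in async generator that yields a few chunks, like a streaming reply."""
    for token in ["Once", " upon", " a", " time"]:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield token
```

In a notebook cell this would be used as `text = await collect_stream(client.stream_generate(stream_prompt))`.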

## 6. System Monitoring

In [None]:
# Get system information
system_info = await client.get_system_info()

print("System Status:")
print(f"CPU Usage: {system_info.cpu_usage}%")
print(f"Memory Usage: {system_info.memory_usage}%")
if system_info.gpu_info:
    print(f"GPU: {system_info.gpu_info.device}")
    print(f"GPU Memory Used: {system_info.gpu_info.used_memory}MB")
print(f"Active Model: {system_info.active_model}")
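
The same fields can drive a simple threshold check. This is a sketch only: `usage_warnings` is a hypothetical helper, the 90% limit is an arbitrary choice, and `sample` is a stand-in object carrying the `cpu_usage` and `memory_usage` attributes printed above.

```python
from types import SimpleNamespace

def usage_warnings(info, limit: float = 90.0) -> list:
    """Return a warning string for each usage percentage at or above `limit`."""
    warnings = []
    for label, attr in [("CPU", "cpu_usage"), ("Memory", "memory_usage")]:
        value = getattr(info, attr, None)  # tolerate a missing field
        if value is not None and value >= limit:
            warnings.append(f"{label} usage is {value}% (limit {limit}%)")
    return warnings

# Stand-in for the object returned by client.get_system_info()
sample = SimpleNamespace(cpu_usage=35.0, memory_usage=93.5)
print(usage_warnings(sample))
```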

## 7. Cleanup

When you're done, clean up resources:

In [None]:
# Unload model to free memory
await client.unload_model()

# Close client connection
await client.close()
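
If a cell raises partway through, the cleanup calls above never run. One common pattern is to wrap the work in `try`/`finally`. The sketch below uses a hypothetical `run_with_cleanup` helper and assumes only that the client exposes the `unload_model()` and `close()` coroutines used above.

```python
async def run_with_cleanup(client, work):
    """Await work(client), then always unload the model and close the client."""
    try:
        return await work(client)
    finally:
        try:
            await client.unload_model()
        finally:
            # Close the connection even if unloading the model failed
            await client.close()
```

Here `work` is any async function that takes the client, for example one that wraps the generation calls from Section 5.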

## 8. Troubleshooting

If you encounter issues:

1. **Connection Issues**
   - Verify ngrok token is correct
   - Check server logs for errors
   - Ensure URL is copied correctly

2. **Memory Issues**
   - Enable quantization
   - Try a smaller model
   - Check GPU memory usage

3. **Model Loading Issues**
   - Verify Hugging Face token
   - Check model name is correct
   - Ensure sufficient resources
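
For the memory checks above, GPU usage can be read from `nvidia-smi`, which is normally present on Colab GPU runtimes. This sketch uses two hypothetical helpers: `parse_gpu_memory` turns the tool's CSV output into numbers, and `gpu_memory_mb` returns `None` when no NVIDIA driver is available.

```python
import shutil
import subprocess

def parse_gpu_memory(csv_text: str) -> list:
    """Parse 'used, total' lines (in MiB) into a list of (used, total) integer pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) == 2 and all(part.isdigit() for part in parts):
            rows.append((int(parts[0]), int(parts[1])))
    return rows

def gpu_memory_mb():
    """Return per-GPU (used, total) memory in MiB, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_memory(result.stdout) if result.returncode == 0 else None
```

A result of `None` on Colab usually means the runtime type is set to CPU rather than GPU.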

For more help, visit:
- [Troubleshooting Guide](https://github.com/UtkarshTheDev/LocalLab/blob/main/docs/TROUBLESHOOTING.md)
- [FAQ](https://github.com/UtkarshTheDev/LocalLab/blob/main/docs/FAQ.md)
- [GitHub Issues](https://github.com/UtkarshTheDev/LocalLab/issues)