# 🌿 Empathetic Friend - Colab API Server

Run your fine-tuned empathetic chatbot as an API with a **public URL**!

**Features:**
- 🚀 Free T4 GPU from Colab
- 🌐 Public URL via Gradio share links or ngrok (share with anyone!)
- 💬 Both a REST API and a chat interface
- ⚡ GPU-accelerated inference

---

## How to Use

1. **Run cells 1-4** to install dependencies, set your configuration, load the model, and define the generation function
2. **Choose an option:**
   - **Option A (Gradio)**: Beautiful chat interface with public URL
   - **Option B (FastAPI)**: REST API for integration with apps
   - **Option C (Both)**: Chat + API together


In [None]:
# ============================================================
# CELL 1: Install Dependencies
# ============================================================
%pip install -q transformers accelerate torch gradio fastapi uvicorn pyngrok nest-asyncio

print("✅ Dependencies installed!")


In [None]:
# ============================================================
# CELL 2: Configuration - UPDATE THIS!
# ============================================================

MODEL_ID = "Someet24/empathetic-qwen3-8b-11-01"  # Your HuggingFace model

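# ngrok auth token: needed for Option B, since ngrok no longer opens tunnels
# without an account token. Option A uses Gradio's share link and skips ngrok.
# Get a free token at: https://dashboard.ngrok.com/get-started/your-authtoken
NGROK_AUTH_TOKEN = None  # e.g., "2abc123def456..."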

SYSTEM_PROMPT = """You are a warm, supportive, and empathetic friend. You listen carefully to what people share and respond with genuine care and understanding.

When someone shares their feelings:
1. Acknowledge and validate their emotions first
2. Show that you understand their situation
3. Ask thoughtful follow-up questions
4. Offer support without being preachy or giving unsolicited advice
5. Never minimize their feelings with phrases like "just" or "at least"

You're not a therapist - you're a caring friend who's always there to listen."""

print(f"üì¶ Model: {MODEL_ID}")
print(f"üîë ngrok token: {'Set' if NGROK_AUTH_TOKEN else 'Not set (will use random URL)'}")


In [None]:
# ============================================================
# CELL 3: Load the Model
# ============================================================
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

print(f"üîÑ Loading model: {MODEL_ID}")
print(f"üñ•Ô∏è  GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}")

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("‚úÖ Model loaded!")
print(f"üìä GPU Memory: {torch.cuda.memory_allocated()/1024**3:.2f} GB")


In [None]:
# ============================================================
# CELL 4: Define Generation Function
# ============================================================

def generate_response(user_message, history=None, temperature=0.7, max_tokens=256):
    """Generate an empathetic reply.

    `history` is a list of (user_msg, assistant_msg) pairs; an empty
    assistant_msg marks a turn that was never answered.
    """
    if history is None:
        history = []

    # Build the conversation in the role/content format chat templates expect
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    for user_msg, assistant_msg in history:
        messages.append({"role": "user", "content": user_msg})
        if assistant_msg:
            messages.append({"role": "assistant", "content": assistant_msg})

    messages.append({"role": "user", "content": user_message})

    # Apply the model's chat template; fall back to a plain-text prompt if the
    # tokenizer has no usable template
    try:
        text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    except Exception:
        text = f"System: {SYSTEM_PROMPT}\n\n"
        for user_msg, assistant_msg in history:
            text += f"User: {user_msg}\n\n"
            if assistant_msg:
                text += f"Assistant: {assistant_msg}\n\n"
        text += f"User: {user_message}\n\nAssistant:"

    # Tokenize and move the inputs to the model's device
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    # Sample a reply (temperature must be > 0 when do_sample=True)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=temperature,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
        )

    # Decode only the newly generated tokens, not the prompt
    response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

    # Qwen3 may emit a <think>...</think> reasoning block before the answer.
    # Keep only the text after the closing tag; the opening tag can end up in
    # the prompt rather than the output, so check for "</think>".
    if "</think>" in response:
        response = response.split("</think>")[-1].strip()

    return response

# Test it!
print("🧪 Testing generation...")
test_response = generate_response("I'm feeling a bit anxious today.")
print(f"\n🤖 Response: {test_response}")


---

# 🎯 Option A: Gradio Chat Interface (Easiest!)

Run this cell to get a beautiful chat interface with a **public URL** you can share!


In [None]:
# ============================================================
# OPTION A: Gradio Chat Interface
# ============================================================
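import gradio as gr

def to_pairs(history):
    """Normalize Gradio chat history to (user, assistant) pairs.

    Older Gradio versions pass [user, assistant] pairs by default, while newer
    ones can pass {"role": ..., "content": ...} dicts instead; handle both.
    """
    pairs = []
    for item in history or []:
        if isinstance(item, dict):
            content = item.get("content")
            if not isinstance(content, str):
                continue  # skip non-text entries such as files
            if item.get("role") == "user":
                pairs.append([content, ""])
            elif item.get("role") == "assistant" and pairs and not pairs[-1][1]:
                pairs[-1][1] = content
        else:
            user_msg, assistant_msg = item
            pairs.append([user_msg, assistant_msg or ""])
    return pairs

def chat(message, history):
    """Gradio chat function: reply to `message` given the earlier turns."""
    return generate_response(message, to_pairs(history))

# Create Gradio interface
demo = gr.ChatInterface(
    fn=chat,
    title="🌿 Empathetic Friend",
    description="A safe space to share what's on your mind. I'm here to listen without judgment.",
    examples=[
        "I'm feeling overwhelmed with work lately.",
        "My friend hasn't talked to me in weeks.",
        "I just got some exciting news!",
        "I've been feeling lonely since moving to a new city.",
    ],
    theme=gr.themes.Soft(),
)

# Launch with a public share link (debug=True keeps the cell running and shows errors)
print("🚀 Starting Gradio server...")
print("📋 You'll get a PUBLIC URL below that you can share with anyone!")
demo.launch(share=True, debug=True)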


---

# 🔌 Option B: FastAPI REST API

Run these cells to get a REST API you can call from any application.

⚠️ **Don't run Option A and Option B at the same time!** Each launch cell keeps running until you stop it, so stop one before starting the other.


In [None]:
# ============================================================
# OPTION B: FastAPI Setup
# ============================================================
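from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, Field
from typing import List
import threading
import uvicorn
import nest_asyncio
from pyngrok import ngrok

# Allow nested event loops (needed for Colab)
nest_asyncio.apply()

# Create FastAPI app
app = FastAPI(title="Empathetic Chat API", version="1.0.0")

# Let browser clients on other origins (e.g. the JavaScript example below) call the API
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])

# Run only one generation on the GPU at a time
gpu_lock = threading.Lock()

# Request/Response models
class Message(BaseModel):
    role: str  # "system", "user", or "assistant"
    content: str

class ChatRequest(BaseModel):
    messages: List[Message]
    temperature: float = Field(0.7, gt=0)  # sampling needs a positive temperature
    max_tokens: int = Field(256, ge=1)

class ChatResponse(BaseModel):
    response: str

class SimpleRequest(BaseModel):
    message: str
    temperature: float = Field(0.7, gt=0)
    max_tokens: int = Field(256, ge=1)

def run_generation(user_message, history, temperature, max_tokens):
    """Serialize access to the model across concurrent requests."""
    with gpu_lock:
        return generate_response(user_message, history, temperature, max_tokens)

# Endpoints. The generating endpoints are plain `def`, so FastAPI runs them in a
# worker thread instead of blocking the event loop (and /health) while generating.
@app.get("/")
async def root():
    return {"status": "running", "model": MODEL_ID}

@app.post("/chat", response_model=ChatResponse)
def chat_endpoint(request: ChatRequest):
    """Chat with conversation history; the last message must come from the user."""
    history = []
    user_message = ""

    for msg in request.messages:
        if msg.role == "user":
            if user_message:
                history.append((user_message, ""))  # previous user turn went unanswered
            user_message = msg.content
        elif msg.role == "assistant":
            if user_message:
                history.append((user_message, msg.content))
                user_message = ""
        # Other roles (e.g. "system") are ignored; SYSTEM_PROMPT is always used

    if not user_message:
        raise HTTPException(status_code=400, detail="The last message must have role 'user'.")

    response = run_generation(user_message, history, request.temperature, request.max_tokens)
    return ChatResponse(response=response)

@app.post("/simple", response_model=ChatResponse)
def simple_chat(request: SimpleRequest):
    """Simple single-turn chat."""
    response = run_generation(request.message, None, request.temperature, request.max_tokens)
    return ChatResponse(response=response)

@app.get("/health")
async def health():
    return {"status": "healthy", "gpu": torch.cuda.is_available()}

print("✅ FastAPI app created!")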
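
The `/chat` endpoint has to turn a flat list of role/content messages into the `(user, assistant)` pairs that `generate_response` expects, plus one final user message. A standalone sketch of that conversion as a hypothetical `split_messages` function on plain dicts, so it can be tried without FastAPI; it mirrors the endpoint's loop, including pairing consecutive user turns with an empty reply:

```python
def split_messages(messages):
    """Split role/content dicts into (history_pairs, final_user_message).

    Mirrors the /chat endpoint: roles other than "user"/"assistant" are ignored,
    and a user turn with no reply is kept with an empty assistant message.
    """
    history = []
    user_message = ""
    for msg in messages:
        if msg["role"] == "user":
            if user_message:
                history.append((user_message, ""))  # previous user turn went unanswered
            user_message = msg["content"]
        elif msg["role"] == "assistant":
            if user_message:
                history.append((user_message, msg["content"]))
                user_message = ""
    return history, user_message


history, last = split_messages([
    {"role": "user", "content": "I had a bad day"},
    {"role": "assistant", "content": "I'm sorry to hear that..."},
    {"role": "user", "content": "My boss yelled at me"},
])
print(history)  # [('I had a bad day', "I'm sorry to hear that...")]
print(last)     # My boss yelled at me
```

An empty `last` means the conversation ended on an assistant turn, so there is nothing for the model to reply to.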


In [None]:
# ============================================================
# Start API Server with Public URL
# ============================================================
PORT = 8000

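# Set the ngrok auth token (ngrok needs one to open tunnels)
if NGROK_AUTH_TOKEN:
    ngrok.set_auth_token(NGROK_AUTH_TOKEN)
else:
    print("⚠️  NGROK_AUTH_TOKEN is not set in Cell 2; ngrok.connect() will likely fail.")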

# Create the tunnel; connect() returns an NgrokTunnel object, so keep its URL string
public_url = ngrok.connect(PORT).public_url

print(f"\n{'='*60}")
print(f"üåê PUBLIC API URL: {public_url}")
print(f"{'='*60}")
print(f"\nüìã API Endpoints:")
print(f"   GET  {public_url}/          - Status")
print(f"   POST {public_url}/simple    - Simple chat")
print(f"   POST {public_url}/chat      - Chat with history")
print(f"   GET  {public_url}/health    - Health check")
print(f"   GET  {public_url}/docs      - Interactive docs")
print(f"\nüí° Example curl command:")
print(f'   curl -X POST "{public_url}/simple" \\')
print(f'        -H "Content-Type: application/json" \\')
print(f'        -d \'{{"message": "I am feeling anxious today"}}\'')
print(f"\n{'='*60}")
print("\n⚠️  Keep this cell running! The server stops when you stop the cell.")

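# Run the server on the notebook's own event loop; this blocks until you stop the cell.
# (uvicorn.run() starts a new event loop, which can fail inside Colab's already-running one.)
config = uvicorn.Config(app, host="0.0.0.0", port=PORT)
server = uvicorn.Server(config)
await server.serve()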
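
The curl command printed by this cell can also be reproduced with only the Python standard library, which is handy on machines where `requests` is not installed. A sketch using `urllib.request`; `build_simple_request` and `call_simple` are hypothetical helper names, and the 120-second timeout is an assumption to allow for slow generation:

```python
import json
import urllib.request


def build_simple_request(api_url, message, temperature=0.7, max_tokens=256):
    """Build a POST request for the /simple endpoint (nothing is sent yet)."""
    body = json.dumps({
        "message": message,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/simple",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_simple(api_url, message, timeout=120):
    """Send the request and return the model's reply text."""
    req = build_simple_request(api_url, message)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Usage from another script (replace the placeholder with your ngrok URL):
# print(call_simple("https://YOUR_NGROK_URL", "I am feeling anxious today"))
```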


---

# 📚 How to Call the API

Replace `YOUR_NGROK_URL` with the public ngrok URL printed by the Option B cell.


In [None]:
# Python Example - Run this from any Python script!
python_example = '''
import requests

API_URL = "YOUR_NGROK_URL"  # Replace with your URL

# Simple chat
response = requests.post(
    f"{API_URL}/simple",
    json={"message": "I'm feeling anxious about my exam tomorrow"}
)
print(response.json()["response"])

# Chat with history
response = requests.post(
    f"{API_URL}/chat",
    json={
        "messages": [
            {"role": "user", "content": "I had a bad day"},
            {"role": "assistant", "content": "I'm sorry to hear that..."},
            {"role": "user", "content": "My boss yelled at me"}
        ],
        "temperature": 0.7,
        "max_tokens": 256
    }
)
print(response.json()["response"])
'''

print("üêç Python Example:")
print(python_example)

# JavaScript Example
js_example = '''
// JavaScript/Fetch Example
const API_URL = "YOUR_NGROK_URL";

async function chat(message) {
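    const response = await fetch(`${API_URL}/simple`, {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            // Skips ngrok's browser warning page on free accounts
            "ngrok-skip-browser-warning": "true",
        },
        body: JSON.stringify({ message })
    });
    if (!response.ok) throw new Error(`API error: ${response.status}`);
    const data = await response.json();
    return data.response;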
}

// Usage (top-level await needs an ES module or the browser console)
const reply = await chat("I need someone to talk to");
console.log(reply);
'''

print("\nüìú JavaScript Example:")
print(js_example)
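
The server keeps no memory between requests, so a client that wants a multi-turn conversation has to resend the full history to `/chat` on every turn. A minimal sketch of that bookkeeping; `ChatSession` is a hypothetical class, and the `post` argument is an assumed hook that lets you plug in any HTTP function (for example one built on `requests.post`) or a fake for testing:

```python
from typing import Callable, Dict, List

# post(path, payload) -> parsed JSON response; the transport is up to the caller
PostFn = Callable[[str, dict], dict]


class ChatSession:
    """Keeps the running message list and sends all of it to /chat on each turn."""

    def __init__(self, post: PostFn, temperature: float = 0.7, max_tokens: int = 256):
        self.post = post
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.messages: List[Dict[str, str]] = []

    def send(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        payload = {
            "messages": list(self.messages),  # copy, so later turns don't change this payload
            "temperature": self.temperature,
            "max_tokens": self.max_tokens,
        }
        try:
            reply = self.post("/chat", payload)["response"]
        except Exception:
            self.messages.pop()  # undo the user turn so the history stays consistent
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Usage with a real server (assumes `requests` is installed):
# import requests
# API_URL = "YOUR_NGROK_URL"
# session = ChatSession(lambda path, data: requests.post(f"{API_URL}{path}", json=data).json())
# print(session.send("I had a bad day"))
```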
