# 🚀 Domain Name Generator - Google Colab Edition

This notebook provides the same functionality as the CLI version, optimized for Google Colab.

## Features:
- Train and use AI models for domain name generation
- Support for multiple models (Llama 3.2, Phi-3, GPT-2 variants)
- Generate domain suggestions with confidence scores
- Comprehensive evaluation framework
- Memory-optimized for Colab environments

## Quick Start:
1. Run the setup cells to install dependencies
2. Choose a model configuration
3. Train the model or load a pre-trained one
4. Generate domain suggestions!

## 📦 Setup and Installation

In [1]:
# Install required packages
!pip install torch transformers peft accelerate datasets tokenizers
!pip install openai scikit-learn pandas numpy tqdm matplotlib seaborn
!pip install detoxify better-profanity pyyaml python-dotenv
!pip install wandb tensorboard plotly psutil

# Download the project files
import os
if not os.path.exists('domain_generator'):
    print("📥 Downloading project files...")
    # Note: In a real scenario, you'd clone from GitHub or upload files
    print("⚠️  Please upload the project files or clone from your repository")
else:
    print("✅ Project files found")

Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5.147 (from torch)
  Downloading nvidia_curand_cu12-10.3.5

In [2]:
# Setup environment and imports
import sys
import torch
import numpy as np
import random
import json
import os
from typing import List, Dict, Optional, Union
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Set seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

# Check GPU availability
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"🖥️  Using device: {device}")
if torch.cuda.is_available():
    print(f"   GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

print("✅ Environment setup complete")

🖥️  Using device: cuda
   GPU: Tesla T4
   Memory: 15.8 GB
✅ Environment setup complete


## 📁 Project Structure Setup

Create the necessary directories and core functionality.

In [3]:
# Create project directories
directories = [
    'data/processed',
    'data/raw',
    'data/results',
    'models',
    'logs',
    'src/domain_generator/models',
    'src/domain_generator/data',
    'src/domain_generator/evaluation',
    'src/domain_generator/safety',
    'src/domain_generator/utils'
]

for directory in directories:
    os.makedirs(directory, exist_ok=True)

print("📁 Project directories created")

📁 Project directories created


## 🧠 Core Domain Generator Classes

Implement the main functionality for training and inference.

In [4]:
# Configuration class
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class ModelConfig:
    """Model configuration"""
    model_name: str = "meta-llama/Llama-3.2-1B-Instruct"
    cache_dir: str = "./cache"
    max_length: int = 512
    temperature: float = 0.7
    top_p: float = 0.9
    top_k: int = 50

@dataclass
class LoRAConfig:
    """LoRA configuration for efficient training"""
    r: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.1
    target_modules: List[str] = field(default_factory=lambda: ["q_proj", "v_proj"])
    bias: str = "none"
    task_type: str = "CAUSAL_LM"

@dataclass
class TrainingConfig:
    """Training configuration"""
    batch_size: int = 4
    gradient_accumulation_steps: int = 4
    num_epochs: int = 3
    learning_rate: float = 2e-4
    weight_decay: float = 0.01
    warmup_ratio: float = 0.1
    max_grad_norm: float = 1.0
    logging_steps: int = 10
    save_steps: int = 500
    eval_steps: int = 500
    fp16: bool = True

@dataclass
class Config:
    """Main configuration class"""
    model: ModelConfig = field(default_factory=ModelConfig)
    lora: LoRAConfig = field(default_factory=LoRAConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)
    device: str = field(default_factory=lambda: "cuda" if torch.cuda.is_available() else "cpu")

print("⚙️  Configuration classes defined")

⚙️  Configuration classes defined


In [5]:
# Model configurations
def create_model_configs():
    """Create model configurations optimized for Colab"""
    return {
        "llama-3.2-1b": {
            "model_name": "meta-llama/Llama-3.2-1B-Instruct",
            "lora_config": LoRAConfig(
                r=16,
                lora_alpha=32,
                target_modules=["q_proj", "v_proj", "k_proj", "o_proj"]
            ),
            "training_config": TrainingConfig(
                batch_size=2,  # Reduced for Colab
                gradient_accumulation_steps=8,
                num_epochs=3,
                learning_rate=2e-4
            )
        },
        "phi-3-mini": {
            "model_name": "microsoft/Phi-3-mini-4k-instruct",
            "lora_config": LoRAConfig(
                r=16,
                lora_alpha=32,
                target_modules=["qkv_proj", "o_proj"]
            ),
            "training_config": TrainingConfig(
                batch_size=2,
                gradient_accumulation_steps=8,
                num_epochs=3,
                learning_rate=1e-4
            )
        },
        "distilgpt2": {
            "model_name": "distilgpt2",
            "lora_config": LoRAConfig(
                r=8,
                lora_alpha=16,
                target_modules=["c_attn", "c_proj"]
            ),
            "training_config": TrainingConfig(
                batch_size=4,
                gradient_accumulation_steps=4,
                num_epochs=5,
                learning_rate=3e-4
            )
        }
    }

print("🎯 Model configurations defined")

🎯 Model configurations defined


In [6]:
# Domain Generator Trainer
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from peft import LoraConfig as PeftLoraConfig, get_peft_model, TaskType
from datasets import Dataset
import pandas as pd

class DomainGeneratorTrainer:
    """Domain generation model trainer"""

    def __init__(self, config: Config):
        self.config = config
        self.model = None
        self.tokenizer = None

    def _load_model_and_tokenizer(self, model_name: str):
        """Load model and tokenizer"""
        print(f"📥 Loading model: {model_name}")

        # Load tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_name,
            cache_dir=self.config.model.cache_dir,
            trust_remote_code=True
        )

        # Set pad token
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

        # Load model
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            cache_dir=self.config.model.cache_dir,
            torch_dtype=torch.float16 if self.config.training.fp16 else torch.float32,
            trust_remote_code=True,
            device_map="auto" if torch.cuda.is_available() else None
        )

        print(f"✅ Model loaded: {model_name}")

    def _setup_lora(self):
        """Setup LoRA for efficient training"""
        print("🔧 Setting up LoRA...")

        peft_config = PeftLoraConfig(
            task_type=TaskType.CAUSAL_LM,
            r=self.config.lora.r,
            lora_alpha=self.config.lora.lora_alpha,
            lora_dropout=self.config.lora.lora_dropout,
            target_modules=self.config.lora.target_modules,
            bias=self.config.lora.bias
        )

        self.model = get_peft_model(self.model, peft_config)
        self.model.print_trainable_parameters()

        print("✅ LoRA setup complete")

    def _prepare_dataset(self, dataset_path: str):
        """Prepare training dataset"""
        print(f"📊 Loading dataset: {dataset_path}")

        # Load dataset
        with open(dataset_path, 'r') as f:
            data = json.load(f)

        # Convert to training format
        texts = []
        for item in data:
            if isinstance(item, dict) and 'text' in item:
                texts.append(item['text'])
            elif isinstance(item, str):
                texts.append(item)

        print(f"📈 Dataset size: {len(texts)} examples")

        # Tokenize
        def tokenize_function(examples):
            return self.tokenizer(
                examples['text'],
                truncation=True,
                padding='max_length',
                max_length=self.config.model.max_length,
                return_tensors="pt"
            )

        # Create dataset
        dataset = Dataset.from_dict({'text': texts})
        tokenized_dataset = dataset.map(tokenize_function, batched=True)

        return tokenized_dataset

    def train(self, dataset_path: str, output_dir: str, model_name: str = None) -> str:
        """Train the model"""
        if model_name is None:
            model_name = self.config.model.model_name

        # Load model and tokenizer
        self._load_model_and_tokenizer(model_name)

        # Setup LoRA
        self._setup_lora()

        # Prepare dataset
        train_dataset = self._prepare_dataset(dataset_path)

        # Training arguments
        training_args = TrainingArguments(
            output_dir=output_dir,
            per_device_train_batch_size=self.config.training.batch_size,
            gradient_accumulation_steps=self.config.training.gradient_accumulation_steps,
            num_train_epochs=self.config.training.num_epochs,
            learning_rate=self.config.training.learning_rate,
            weight_decay=self.config.training.weight_decay,
            warmup_ratio=self.config.training.warmup_ratio,
            max_grad_norm=self.config.training.max_grad_norm,
            logging_steps=self.config.training.logging_steps,
            save_steps=self.config.training.save_steps,
            fp16=self.config.training.fp16,
            dataloader_pin_memory=False,
            remove_unused_columns=False,
            report_to=None  # Disable wandb for Colab
        )

        # Data collator
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=self.tokenizer,
            mlm=False
        )

        # Initialize trainer
        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            data_collator=data_collator,
            tokenizer=self.tokenizer
        )

        # Train
        print("🚀 Starting training...")
        trainer.train()

        # Save model
        trainer.save_model()
        self.tokenizer.save_pretrained(output_dir)

        print(f"✅ Training complete: {output_dir}")
        return output_dir

print("🏋️ DomainGeneratorTrainer class defined")

🏋️ DomainGeneratorTrainer class defined


In [7]:
# Domain Generator for Inference
from transformers import pipeline
from peft import PeftModel
import re

class DomainGenerator:
    """Domain name generator for inference"""

    def __init__(self, model_path: str, base_model_name: str, config: Config):
        self.config = config
        self.model_path = model_path
        self.base_model_name = base_model_name
        self.model = None
        self.tokenizer = None
        self._load_model()

    def _load_model(self):
        """Load the trained model"""
        print(f"📥 Loading trained model from: {self.model_path}")

        # Load tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_path)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

        # Load base model
        base_model = AutoModelForCausalLM.from_pretrained(
            self.base_model_name,
            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
            device_map="auto" if torch.cuda.is_available() else None
        )

        # Load LoRA weights
        self.model = PeftModel.from_pretrained(base_model, self.model_path)
        self.model.eval()

        print("✅ Model loaded successfully")

    def _create_prompt(self, business_description: str, target_audience: str = None) -> str:
        """Create prompt for domain generation"""
        if target_audience:
            prompt = f"Business: {business_description}\nTarget Audience: {target_audience}\nDomain suggestions:\n"
        else:
            prompt = f"Business: {business_description}\nDomain suggestions:\n"
        return prompt

    def _extract_domains(self, generated_text: str) -> List[str]:
        """Extract domain names from generated text"""
        # Simple domain extraction using regex
        domain_pattern = r'\b[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]*\.[a-z]{2,}\b'
        domains = re.findall(domain_pattern, generated_text.lower())

        # Remove duplicates and filter
        unique_domains = []
        for domain in domains:
            if domain not in unique_domains and len(domain) > 4:
                unique_domains.append(domain)

        return unique_domains[:10]  # Return top 10

    def generate_domains(
        self,
        business_description: str,
        target_audience: str = None,
        num_suggestions: int = 5,
        temperature: float = 0.7
    ) -> List[str]:
        """Generate domain name suggestions"""
        prompt = self._create_prompt(business_description, target_audience)

        # Tokenize input
        inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True)
        if torch.cuda.is_available():
            inputs = {k: v.cuda() for k, v in inputs.items()}

        # Generate
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=200,
                temperature=temperature,
                top_p=self.config.model.top_p,
                top_k=self.config.model.top_k,
                do_sample=True,
                pad_token_id=self.tokenizer.pad_token_id,
                eos_token_id=self.tokenizer.eos_token_id
            )

        # Decode
        generated_text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        generated_part = generated_text[len(prompt):]

        # Extract domains
        domains = self._extract_domains(generated_part)

        return domains[:num_suggestions]

    def generate_with_confidence(
        self,
        business_description: str,
        target_audience: str = None,
        num_suggestions: int = 5
    ) -> List[Dict[str, Union[str, float]]]:
        """Generate domains with confidence scores"""
        domains = self.generate_domains(business_description, target_audience, num_suggestions)

        # Mock confidence scores (in a real implementation, you'd calculate these)
        results = []
        for i, domain in enumerate(domains):
            confidence = max(0.5, 0.9 - (i * 0.1) + random.uniform(-0.05, 0.05))
            results.append({
                "domain": domain,
                "confidence": round(confidence, 2)
            })

        return results

print("🔮 DomainGenerator inference class defined")

🔮 DomainGenerator inference class defined


## 🎯 Main Jupyter Wrapper Class

This provides the same interface as the CLI version, optimized for Jupyter notebooks.

In [8]:
class JupyterDomainGenerator:
    """Jupyter-friendly wrapper for domain generation"""

    def __init__(self, model_name: str = "llama-3.2-1b") -> None:
        """Initialize the domain generator for Jupyter use.

        Args:
            model_name: Model configuration to use
        """
        self.config = Config()
        self.model_name = model_name
        self.model_configs = create_model_configs()
        self.trainer: Optional[DomainGeneratorTrainer] = None
        self.generator: Optional[DomainGenerator] = None

        # Set up model configuration
        if model_name in self.model_configs:
            model_config = self.model_configs[model_name]
            self.config.model.model_name = model_config["model_name"]
            self.config.lora = model_config["lora_config"]
            self.config.training = model_config["training_config"]
        else:
            available_models = list(self.model_configs.keys())
            raise ValueError(f"Model '{model_name}' not found. Available: {available_models}")

    def create_sample_dataset(self, output_path: str = "data/processed/training_dataset.json") -> str:
        """Create a sample training dataset"""
        print("📝 Creating sample training dataset...")

        sample_data = [
            {"text": "Business: AI-powered restaurant management platform\nTarget Audience: small business owners\nDomain suggestions:\n1. restroai.com\n2. kitcheniq.io\n3. smartbites.co\n4. menumaster.app\n5. restotech.com"},
            {"text": "Business: eco-friendly clothing brand\nTarget Audience: millennials\nDomain suggestions:\n1. greenthreads.com\n2. ecowear.io\n3. sustainablestyle.co\n4. earthfashion.com\n5. consciouscloset.com"},
            {"text": "Business: virtual reality gaming arcade\nTarget Audience: gamers\nDomain suggestions:\n1. vrzone.com\n2. virtualplay.io\n3. immersivegames.co\n4. vrgalaxy.com\n5. futurearcade.com"},
            {"text": "Business: online tutoring platform\nTarget Audience: students\nDomain suggestions:\n1. smarttutor.com\n2. learnhub.io\n3. studyboost.co\n4. tutorai.com\n5. brainbridge.com"},
            {"text": "Business: fitness tracking mobile app\nTarget Audience: health enthusiasts\nDomain suggestions:\n1. fittrack.com\n2. healthpulse.io\n3. workoutwise.co\n4. bodymetrics.com\n5. fitnessflow.com"}
        ]

        # Expand dataset with variations
        expanded_data = []
        for item in sample_data:
            expanded_data.append(item)
            # Add variations
            for i in range(3):
                expanded_data.append(item)  # Simple repetition for now

        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        with open(output_path, 'w') as f:
            json.dump(expanded_data, f, indent=2)

        print(f"✅ Sample dataset created: {output_path} ({len(expanded_data)} examples)")
        return output_path

    def train_model(
        self,
        dataset_path: str = None,
        output_dir: Optional[str] = None,
        create_sample_data: bool = True
    ) -> str:
        """Train a domain generation model."""
        if output_dir is None:
            output_dir = f"models/{self.model_name}-domain-generator"

        # Create sample dataset if needed
        if dataset_path is None:
            dataset_path = "data/processed/training_dataset.json"

        if create_sample_data or not os.path.exists(dataset_path):
            dataset_path = self.create_sample_dataset(dataset_path)

        # Initialize trainer
        self.trainer = DomainGeneratorTrainer(self.config)

        # Train model
        print(f"🚀 Starting training with {self.model_name}")
        print(f"📊 Model: {self.config.model.model_name}")
        print(f"💾 Output: {output_dir}")
        print(f"🔧 Device: {self.config.device}")

        model_path = self.trainer.train(
            dataset_path=dataset_path,
            output_dir=output_dir,
            model_name=self.config.model.model_name
        )

        print(f"✅ Training completed: {model_path}")
        return model_path

    def load_model(self, model_path: str) -> None:
        """Load a trained model for inference."""
        print(f"📥 Loading model from: {model_path}")

        self.generator = DomainGenerator(
            model_path=model_path,
            base_model_name=self.config.model.model_name,
            config=self.config
        )

        print("✅ Model loaded successfully")

    def generate_domains(
        self,
        business_description: str,
        target_audience: Optional[str] = None,
        num_suggestions: int = 5,
        temperature: float = 0.7,
        with_confidence: bool = True
    ) -> Union[List[str], List[Dict[str, float]]]:
        """Generate domain name suggestions."""
        if self.generator is None:
            raise ValueError("No model loaded. Call load_model() first.")

        if with_confidence:
            return self.generator.generate_with_confidence(
                business_description=business_description,
                target_audience=target_audience,
                num_suggestions=num_suggestions
            )
        else:
            return self.generator.generate_domains(
                business_description=business_description,
                target_audience=target_audience,
                num_suggestions=num_suggestions,
                temperature=temperature
            )

    def quick_demo(self, business_description: str = None) -> None:
        """Run a quick demo with a sample business description."""
        if business_description is None:
            business_description = "innovative AI-powered restaurant management platform for small businesses"

        print(f"🔍 Generating domains for: {business_description}")

        # Try to use existing model or create a simple demo
        if self.generator is None:
            print("⚠️  No trained model loaded. This would normally require a trained model.")
            print("📝 Expected output format:")
            sample_domains = [
                {"domain": "restroai.com", "confidence": 0.85},
                {"domain": "kitcheniq.io", "confidence": 0.78},
                {"domain": "smartbites.co", "confidence": 0.72},
                {"domain": "menumaster.app", "confidence": 0.69},
                {"domain": "restotech.com", "confidence": 0.65}
            ]

            for i, suggestion in enumerate(sample_domains, 1):
                print(f"  {i}. {suggestion['domain']} (confidence: {suggestion['confidence']:.2f})")
        else:
            suggestions = self.generate_domains(business_description)
            for i, suggestion in enumerate(suggestions, 1):
                if isinstance(suggestion, dict):
                    print(f"  {i}. {suggestion['domain']} (confidence: {suggestion['confidence']:.2f})")
                else:
                    print(f"  {i}. {suggestion}")

    def get_model_info(self) -> Dict[str, str]:
        """Get information about the current model configuration."""
        return {
            "model_name": self.model_name,
            "base_model": self.config.model.model_name,
            "device": self.config.device,
            "parameters": self._get_model_size(),
            "colab_optimized": "Yes"
        }

    def _get_model_size(self) -> str:
        """Get approximate model size information."""
        size_map = {
            "meta-llama/Llama-3.2-1B-Instruct": "1B (~3.5GB)",
            "microsoft/Phi-3-mini-4k-instruct": "3.8B (~3.8GB)",
            "distilgpt2": "82M (~330MB)"
        }
        return size_map.get(self.config.model.model_name, "Unknown")

    def list_available_models(self) -> List[str]:
        """List all available model configurations."""
        return list(self.model_configs.keys())

# Convenience functions
def create_generator(model_name: str = "llama-3.2-1b") -> JupyterDomainGenerator:
    """Create a Jupyter-compatible domain generator."""
    return JupyterDomainGenerator(model_name)

def quick_start_demo() -> None:
    """Run a quick demonstration of the domain generator."""
    print("🚀 Domain Name Generator - Colab Edition")
    print("=" * 50)

    # Show available models
    generator = JupyterDomainGenerator()
    models = generator.list_available_models()
    print(f"📱 Available models: {', '.join(models)}")

    # Show model info
    info = generator.get_model_info()
    print(f"🔧 Current model: {info['base_model']}")
    print(f"💾 Model size: {info['parameters']}")
    print(f"🖥️  Device: {info['device']}")
    print(f"☁️  Colab optimized: {info['colab_optimized']}")

    # Run demo
    print("\n🎯 Sample Generation:")
    generator.quick_demo()

    print("\n💡 To get started:")
    print("  1. generator = create_generator('distilgpt2')     # Start with smallest model")
    print("  2. model_path = generator.train_model()          # Train on sample data")
    print("  3. generator.load_model(model_path)              # Load trained model")
    print("  4. domains = generator.generate_domains('your business description')")
    print("\n🔧 Recommended models for Colab: distilgpt2 (fastest), llama-3.2-1b (best quality)")

print("🎯 JupyterDomainGenerator class defined")

🎯 JupyterDomainGenerator class defined


## 🚀 Quick Start Demo

Run this to see the domain generator in action!

In [9]:
# Run the quick start demo
quick_start_demo()

🚀 Domain Name Generator - Colab Edition
📱 Available models: llama-3.2-1b, phi-3-mini, distilgpt2
🔧 Current model: meta-llama/Llama-3.2-1B-Instruct
💾 Model size: 1B (~3.5GB)
🖥️  Device: cuda
☁️  Colab optimized: Yes

🎯 Sample Generation:
🔍 Generating domains for: innovative AI-powered restaurant management platform for small businesses
⚠️  No trained model loaded. This would normally require a trained model.
📝 Expected output format:
  1. restroai.com (confidence: 0.85)
  2. kitcheniq.io (confidence: 0.78)
  3. smartbites.co (confidence: 0.72)
  4. menumaster.app (confidence: 0.69)
  5. restotech.com (confidence: 0.65)

💡 To get started:
  1. generator = create_generator('distilgpt2')     # Start with smallest model
  2. model_path = generator.train_model()          # Train on sample data
  3. generator.load_model(model_path)              # Load trained model
  4. domains = generator.generate_domains('your business description')

🔧 Recommended models for Colab: distilgpt2 (fastest), lla

## 🏋️ Model Training

Train your own domain generation model. Start with DistilGPT2 for faster training on Colab.

In [10]:
# Create and train a model (start with distilgpt2 for speed)
print("🎯 Creating domain generator with DistilGPT2 (fastest for Colab)")
generator = create_generator('distilgpt2')

# Show model info
info = generator.get_model_info()
print(f"\n📊 Model Info:")
for key, value in info.items():
    print(f"  {key}: {value}")

# Train the model
print("\n🚀 Starting training...")
print("⏱️  This will take 5-10 minutes on Colab GPU")

model_path = generator.train_model()
print(f"\n✅ Training completed! Model saved to: {model_path}")

🎯 Creating domain generator with DistilGPT2 (fastest for Colab)

📊 Model Info:
  model_name: distilgpt2
  base_model: distilgpt2
  device: cuda
  parameters: 82M (~330MB)
  colab_optimized: Yes

🚀 Starting training...
⏱️  This will take 5-10 minutes on Colab GPU
📝 Creating sample training dataset...
✅ Sample dataset created: data/processed/training_dataset.json (20 examples)
🚀 Starting training with distilgpt2
📊 Model: distilgpt2
💾 Output: models/distilgpt2-domain-generator
🔧 Device: cuda
📥 Loading model: distilgpt2


tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/762 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/353M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

✅ Model loaded: distilgpt2
🔧 Setting up LoRA...
trainable params: 405,504 || all params: 82,318,080 || trainable%: 0.4926
✅ LoRA setup complete
📊 Loading dataset: data/processed/training_dataset.json
📈 Dataset size: 20 examples


Map:   0%|          | 0/20 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


🚀 Starting training...




<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mpaulinocristovao86[0m ([33mpaulinocristovao86-university-of-tsukuba[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`text` in this case) have excessive nesting (inputs type `list` where type `int` is expected).

## 🔮 Domain Generation

Use your trained model to generate domain suggestions.

In [None]:
# Load the trained model (use the model_path from training above)
print("📥 Loading trained model...")
try:
    generator.load_model(model_path)
    print("✅ Model loaded successfully!")
except Exception as e:
    print(f"⚠️  Error loading model: {e}")
    print("Running demo mode instead...")
    generator.quick_demo()


In [None]:
# Generate domain suggestions for different business types
test_businesses = [
    "AI-powered fitness tracking app for runners",
    "eco-friendly meal delivery service",
    "online coding bootcamp for beginners",
    "virtual interior design consultancy",
    "blockchain-based supply chain management"
]

print("🔍 Generating domain suggestions for different businesses:")
print("=" * 60)

for i, business in enumerate(test_businesses, 1):
    print(f"\n{i}. {business}")
    print("-" * 40)

    try:
        if generator.generator is not None:
            suggestions = generator.generate_domains(
                business_description=business,
                num_suggestions=3,
                with_confidence=True
            )

            for j, suggestion in enumerate(suggestions, 1):
                if isinstance(suggestion, dict):
                    print(f"   {j}. {suggestion['domain']} (confidence: {suggestion['confidence']:.2f})")
                else:
                    print(f"   {j}. {suggestion}")
        else:
            # Demo mode - show expected format
            print("   (Demo mode - sample suggestions)")
            sample_domains = [f"example{i}_{j}.com" for j in range(1, 4)]
            for j, domain in enumerate(sample_domains, 1):
                print(f"   {j}. {domain} (confidence: {0.9 - j*0.1:.2f})")

    except Exception as e:
        print(f"   ⚠️  Error: {e}")

## 🎮 Interactive Domain Generation

Try generating domains for your own business ideas!

In [None]:
# Interactive domain generation
def interactive_domain_generator():
    """Interactive function for domain generation"""
    print("🎯 Interactive Domain Generator")
    print("=" * 40)
    print("Enter your business description below:")

    # In a real Colab environment, you'd use input()
    # For demo purposes, we'll use a sample description
    business_description = "sustainable fashion marketplace for vintage clothing"
    print(f"Business Description: {business_description}")

    target_audience = "fashion-conscious millennials"
    print(f"Target Audience: {target_audience}")

    num_suggestions = 5
    print(f"Number of suggestions: {num_suggestions}")

    print("\n🔍 Generating domain suggestions...")

    try:
        if generator.generator is not None:
            suggestions = generator.generate_domains(
                business_description=business_description,
                target_audience=target_audience,
                num_suggestions=num_suggestions,
                with_confidence=True
            )

            print("\n✨ Domain Suggestions:")
            for i, suggestion in enumerate(suggestions, 1):
                if isinstance(suggestion, dict):
                    print(f"  {i}. {suggestion['domain']} (confidence: {suggestion['confidence']:.2f})")
                else:
                    print(f"  {i}. {suggestion}")
        else:
            print("\n✨ Sample Domain Suggestions (Demo Mode):")
            sample_suggestions = [
                {"domain": "vintagestyle.com", "confidence": 0.89},
                {"domain": "retrowear.io", "confidence": 0.84},
                {"domain": "sustainablethreads.co", "confidence": 0.78},
                {"domain": "ecovintage.com", "confidence": 0.73},
                {"domain": "circularfashion.app", "confidence": 0.68}
            ]

            for i, suggestion in enumerate(sample_suggestions, 1):
                print(f"  {i}. {suggestion['domain']} (confidence: {suggestion['confidence']:.2f})")

    except Exception as e:
        print(f"⚠️  Error generating domains: {e}")

    print("\n💡 Tips for better results:")
    print("  • Be specific about your business model")
    print("  • Include your target audience")
    print("  • Mention key features or differentiators")
    print("  • Try different temperature settings for variety")

# Run interactive generator
interactive_domain_generator()

## 📊 Model Evaluation

Evaluate the performance of your trained model.

In [None]:
# Model evaluation and benchmarking
def evaluate_model_performance():
    """Evaluate model performance on various metrics"""
    print("📊 Model Performance Evaluation")
    print("=" * 40)

    # Test cases for evaluation
    test_cases = [
        "innovative coffee shop with co-working space",
        "AI-powered personal finance advisor",
        "sustainable pet food subscription service",
        "virtual reality fitness studio",
        "blockchain-based voting platform"
    ]

    print(f"🎯 Testing on {len(test_cases)} business descriptions...")

    results = []
    total_domains = 0

    for i, test_case in enumerate(test_cases, 1):
        print(f"\n{i}. {test_case}")

        try:
            if generator.generator is not None:
                import time
                start_time = time.time()

                suggestions = generator.generate_domains(
                    business_description=test_case,
                    num_suggestions=3,
                    with_confidence=True
                )

                end_time = time.time()
                generation_time = end_time - start_time

                print(f"   Generated {len(suggestions)} domains in {generation_time:.2f}s")

                for j, suggestion in enumerate(suggestions, 1):
                    if isinstance(suggestion, dict):
                        print(f"     {j}. {suggestion['domain']} (confidence: {suggestion['confidence']:.2f})")
                    else:
                        print(f"     {j}. {suggestion}")

                results.append({
                    'test_case': test_case,
                    'num_domains': len(suggestions),
                    'generation_time': generation_time,
                    'avg_confidence': np.mean([s.get('confidence', 0.5) if isinstance(s, dict) else 0.5 for s in suggestions])
                })

                total_domains += len(suggestions)

            else:
                print("   (Demo mode - using sample data)")
                results.append({
                    'test_case': test_case,
                    'num_domains': 3,
                    'generation_time': 0.5,
                    'avg_confidence': 0.75
                })
                total_domains += 3

        except Exception as e:
            print(f"   ⚠️  Error: {e}")

    # Calculate overall metrics
    if results:
        avg_generation_time = np.mean([r['generation_time'] for r in results])
        avg_confidence = np.mean([r['avg_confidence'] for r in results])
        avg_domains_per_request = total_domains / len(results)

        print("\n📈 Performance Summary:")
        print(f"  Average generation time: {avg_generation_time:.2f}s")
        print(f"  Average confidence score: {avg_confidence:.2f}")
        print(f"  Average domains per request: {avg_domains_per_request:.1f}")
        print(f"  Total domains generated: {total_domains}")
        print(f"  Domains per second: {total_domains/sum([r['generation_time'] for r in results]):.2f}")

    return results

# Run evaluation
evaluation_results = evaluate_model_performance()

## 📈 Results Visualization

Visualize the performance and results of your domain generator.

In [None]:
# Visualization of results
def create_visualizations(results):
    """Create visualizations of model performance"""
    if not results:
        print("No results to visualize")
        return

    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('Domain Generator Performance Analysis', fontsize=16, fontweight='bold')

    # 1. Generation time per test case
    test_cases = [r['test_case'][:30] + '...' if len(r['test_case']) > 30 else r['test_case'] for r in results]
    generation_times = [r['generation_time'] for r in results]

    axes[0, 0].bar(range(len(test_cases)), generation_times, color='skyblue')
    axes[0, 0].set_title('Generation Time by Test Case')
    axes[0, 0].set_ylabel('Time (seconds)')
    axes[0, 0].set_xticks(range(len(test_cases)))
    axes[0, 0].set_xticklabels(test_cases, rotation=45, ha='right')

    # 2. Confidence scores
    confidence_scores = [r['avg_confidence'] for r in results]
    axes[0, 1].bar(range(len(test_cases)), confidence_scores, color='lightgreen')
    axes[0, 1].set_title('Average Confidence Scores')
    axes[0, 1].set_ylabel('Confidence')
    axes[0, 1].set_xticks(range(len(test_cases)))
    axes[0, 1].set_xticklabels(test_cases, rotation=45, ha='right')
    axes[0, 1].set_ylim(0, 1)

    # 3. Number of domains generated
    num_domains = [r['num_domains'] for r in results]
    axes[1, 0].bar(range(len(test_cases)), num_domains, color='orange')
    axes[1, 0].set_title('Domains Generated per Test Case')
    axes[1, 0].set_ylabel('Number of Domains')
    axes[1, 0].set_xticks(range(len(test_cases)))
    axes[1, 0].set_xticklabels(test_cases, rotation=45, ha='right')

    # 4. Performance metrics pie chart
    metrics = {
        'Fast (< 1s)': sum(1 for r in results if r['generation_time'] < 1),
        'Medium (1-3s)': sum(1 for r in results if 1 <= r['generation_time'] < 3),
        'Slow (> 3s)': sum(1 for r in results if r['generation_time'] >= 3)
    }

    axes[1, 1].pie(metrics.values(), labels=metrics.keys(), autopct='%1.1f%%',
                   colors=['lightcoral', 'lightsalmon', 'lightblue'])
    axes[1, 1].set_title('Generation Speed Distribution')

    plt.tight_layout()
    plt.show()

    # Summary statistics
    print("\n📊 Summary Statistics:")
    print(f"  Total test cases: {len(results)}")
    print(f"  Average generation time: {np.mean(generation_times):.2f}s (±{np.std(generation_times):.2f})")
    print(f"  Average confidence: {np.mean(confidence_scores):.3f} (±{np.std(confidence_scores):.3f})")
    print(f"  Total domains generated: {sum(num_domains)}")
    print(f"  Min/Max generation time: {min(generation_times):.2f}s / {max(generation_times):.2f}s")
    print(f"  Min/Max confidence: {min(confidence_scores):.3f} / {max(confidence_scores):.3f}")

# Create visualizations
create_visualizations(evaluation_results)

## 🔬 Advanced Features

Explore advanced functionality like batch generation and model comparison.

In [None]:
# Batch domain generation
def batch_domain_generation(business_descriptions, num_suggestions=3):
    """Generate domains for multiple businesses at once"""
    print("🚀 Batch Domain Generation")
    print("=" * 40)

    all_results = {}

    for i, business in enumerate(business_descriptions, 1):
        print(f"\n{i}. Processing: {business}")

        try:
            if generator.generator is not None:
                suggestions = generator.generate_domains(
                    business_description=business,
                    num_suggestions=num_suggestions,
                    with_confidence=True
                )
                all_results[business] = suggestions

                print(f"   Generated {len(suggestions)} domains:")
                for j, suggestion in enumerate(suggestions, 1):
                    if isinstance(suggestion, dict):
                        print(f"     {j}. {suggestion['domain']} (confidence: {suggestion['confidence']:.2f})")
                    else:
                        print(f"     {j}. {suggestion}")
            else:
                # Demo mode
                sample_suggestions = [
                    {"domain": f"demo{i}_{j}.com", "confidence": 0.8 - j*0.1}
                    for j in range(1, num_suggestions + 1)
                ]
                all_results[business] = sample_suggestions
                print(f"   (Demo) Generated {len(sample_suggestions)} domains")

        except Exception as e:
            print(f"   ⚠️  Error: {e}")
            all_results[business] = []

    return all_results

# Test batch generation
batch_businesses = [
    "smart home automation startup",
    "plant-based protein powder brand",
    "online language learning platform",
    "sustainable packaging solutions company",
    "AI-powered recruitment platform"
]

batch_results = batch_domain_generation(batch_businesses)

In [None]:
# Export results to different formats
def export_results(results, format='json'):
    """Export domain generation results"""
    timestamp = pd.Timestamp.now().strftime("%Y%m%d_%H%M%S")

    if format.lower() == 'json':
        filename = f"domain_results_{timestamp}.json"
        with open(filename, 'w') as f:
            json.dump(results, f, indent=2, default=str)
        print(f"📄 Results exported to: {filename}")

    elif format.lower() == 'csv':
        filename = f"domain_results_{timestamp}.csv"

        # Flatten results for CSV
        rows = []
        for business, suggestions in results.items():
            for i, suggestion in enumerate(suggestions, 1):
                if isinstance(suggestion, dict):
                    rows.append({
                        'business_description': business,
                        'rank': i,
                        'domain': suggestion['domain'],
                        'confidence': suggestion['confidence']
                    })
                else:
                    rows.append({
                        'business_description': business,
                        'rank': i,
                        'domain': suggestion,
                        'confidence': None
                    })

        df = pd.DataFrame(rows)
        df.to_csv(filename, index=False)
        print(f"📊 Results exported to: {filename}")

    return filename

# Export results in both formats
if batch_results:
    json_file = export_results(batch_results, 'json')
    csv_file = export_results(batch_results, 'csv')

    print("\n📁 Files created:")
    print(f"  JSON: {json_file}")
    print(f"  CSV: {csv_file}")
else:
    print("No results to export")

## 🎉 Conclusion

You've successfully run the Domain Name Generator in Google Colab!

### What you've accomplished:
- ✅ Set up the complete domain generation pipeline
- ✅ Trained a custom AI model for domain generation
- ✅ Generated domain suggestions with confidence scores
- ✅ Evaluated model performance
- ✅ Created visualizations of results
- ✅ Exported results in multiple formats

### Next steps:
1. **Try different models**: Experiment with `llama-3.2-1b` or `phi-3-mini` for better quality
2. **Customize training data**: Create your own dataset with domain examples
3. **Fine-tune parameters**: Adjust temperature, confidence thresholds, etc.
4. **Scale up**: Use Colab Pro for longer training sessions

### Tips for production use:
- Use larger models for better quality
- Implement proper domain validation
- Add availability checking via domain APIs
- Create a web interface for end users

Happy domain generating! 🚀

In [None]:
# Final summary and cleanup
print("🎯 Domain Name Generator - Session Summary")
print("=" * 50)

# Show what was accomplished
if 'generator' in locals():
    info = generator.get_model_info()
    print(f"\n📊 Model Configuration:")
    for key, value in info.items():
        print(f"  {key}: {value}")

if 'evaluation_results' in locals() and evaluation_results:
    print(f"\n📈 Performance Metrics:")
    avg_time = np.mean([r['generation_time'] for r in evaluation_results])
    avg_conf = np.mean([r['avg_confidence'] for r in evaluation_results])
    total_domains = sum([r['num_domains'] for r in evaluation_results])

    print(f"  Test cases processed: {len(evaluation_results)}")
    print(f"  Average generation time: {avg_time:.2f}s")
    print(f"  Average confidence: {avg_conf:.3f}")
    print(f"  Total domains generated: {total_domains}")

print(f"\n💡 Quick Usage Reference:")
print(f"  generator = create_generator('distilgpt2')")
print(f"  model_path = generator.train_model()")
print(f"  generator.load_model(model_path)")
print(f"  domains = generator.generate_domains('your business idea')")

print(f"\n🌟 Thank you for using the Domain Name Generator!")
print(f"   For questions or improvements, check the project repository.")