# Enhanced AI Engineer Assignment - Domain Name Generator

First of all, thank you for the opportunity to work on this assignment and potentially join your team.

Before diving into the technical details, I’d like to briefly share my initial thoughts. In my opinion, this task could be handled with prompt engineering alone. For example, a simple system prompt such as *“You are a domain name generator. Given a business description, suggest 3 relevant, memorable domain names. Format: domain1.com, domain2.net, domain3.org”* would likely work very well with GPT-4, Claude, or even smaller models like GPT-3.5.

Of course, I understand that the real goal of this assignment is not just to generate domain names, which is a relatively simple task, but to evaluate engineering and machine learning skills more broadly.






In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Creation of the Dataset










In this section, we simulate business descriptions and domain names using structured templates and random sampling. Each data sample contains:

- A business description (e.g., "best clothing store in city")
- A list of 3 domain name suggestions generated from predefined patterns (e.g., theclothingstorespot.com, clothingstorecity.net)

We combine business types, locations, adjectives, and domain patterns to diversify the dataset. The goal is to create a varied training set that can later be used to fine-tune or evaluate an open-source language model.

Of course, this is not the most sophisticated dataset: the domain names are pattern-based and somewhat formulaic. Still, it is good enough for testing technical ideas and building the rest of the evaluation pipeline.
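
To get a feel for how much variety these templates can produce, the sketch below counts the distinct descriptions reachable from the vocabularies used in the generation cell (business types and adjectives per group, plus the four locations). The helper name `count_unique_descriptions` is made up for this illustration.

```python
# Vocabulary copied from the dataset-generation cell of this notebook.
business_groups = {
    "food": ["restaurant", "coffee shop", "bakery", "food truck"],
    "tech": ["tech startup", "AI solution agency"],
    "professional": ["consulting firm", "law firm", "dental clinic"],
    "wellness": ["fitness gym", "yoga studio", "spa", "hair salon"],
    "retail": ["clothing store", "bookstore", "pet store"],
    "creative": ["photography studio"],
}
group_adjectives = {
    "food": ["best", "premium", "downtown"],
    "tech": ["expert", "professional", "innovative"],
    "professional": ["expert", "trusted", "professional"],
    "wellness": ["premium", "best", "quality"],
    "retail": ["best", "trendy"],
    "creative": ["creative", "expert"],
}
locations = ["downtown", "city", "local", "neighborhood"]


def count_unique_descriptions():
    """Count distinct '<adjective> <business> in <location>' strings."""
    return sum(
        len(types) * len(group_adjectives[group]) * len(locations)
        for group, types in business_groups.items()
    )


print(count_unique_descriptions())  # 188
```

With 1000 samples drawn from 188 possible descriptions, most descriptions appear several times, which is worth keeping in mind when reading the training loss later on.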

In [None]:
import pandas as pd
import json
import random


business_groups = {
    "food": ["restaurant", "coffee shop", "bakery", "food truck"],
    "tech": ["tech startup", "AI solution agency"],
    "professional": ["consulting firm", "law firm", "dental clinic"],
    "wellness": ["fitness gym", "yoga studio", "spa", "hair salon"],
    "retail": ["clothing store", "bookstore", "pet store"],
    "creative": ["photography studio"]
}

group_adjectives = {
    "food": ["best", "premium", "downtown"],
    "tech": ["expert", "professional", "innovative"],
    "professional": ["expert", "trusted", "professional"],
    "wellness": ["premium", "best", "quality"],
    "retail": ["best", "trendy"],
    "creative": ["creative", "expert"]
}

group_patterns = {
    "food": ["best{business_type}", "{business_type}{location}", "the{business_type}spot"],
    "tech": ["{adjective}{business_type}", "{business_type}pro", "{business_type}{location}"],
    "professional": ["{adjective}{business_type}", "{business_type}pro","{business_type}{location}"],
    "wellness": ["{adjective}{business_type}", "best{business_type}", "{business_type}pro"],
    "retail": ["best{business_type}", "{business_type}{location}", "the{business_type}spot"],
    "creative": ["the{business_type}spot", "{adjective}{business_type}", "{business_type}{location}"]
}

def get_group(business):
    for group, types in business_groups.items():
        if business in types:
            return group
    return "general"

# Generate the synthetic dataset
def generate_synthetic_data(num_samples=1000):
    data = []
    all_businesses = sum(business_groups.values(), [])

    for _ in range(num_samples):
        business = random.choice(all_businesses)
        group = get_group(business)

        location = random.choice(["downtown", "city", "local", "neighborhood"])
        adjective = random.choice(group_adjectives[group])
        description = f"{adjective} {business} in {location}"

        # Generate domain
        domains = []
        used_patterns = set()
        patterns_pool = group_patterns[group]

        while len(domains) < 3 and len(used_patterns) < len(patterns_pool):
            pattern = random.choice(patterns_pool)
            if pattern in used_patterns:
                continue
            used_patterns.add(pattern)

            domain_name = pattern.format(
                business_type=business.replace(" ", ""),
                location=location,
                adjective=adjective
            )
            extension = random.choice([".com", ".net", ".org", ".io"])
            domains.append(domain_name.lower() + extension)

        data.append({
            "business_description": description,
            "domain_suggestions": domains
        })

    return data

# Create and save the dataset
dataset = generate_synthetic_data(1000)
df = pd.DataFrame(dataset)
df.to_json("/content/drive/MyDrive/FamilyWall/Data/synthetic_dataset.json", orient="records", indent=2)
print(f"Generated {len(dataset)} examples with realistic domain suggestions.")

Generated 1000 examples with realistic domain suggestions.


## GPT-2 + LoRA

1. Why GPT-2?

- Lightweight and fast for experimentation
- Good text generation capabilities
- Well supported by the transformers library

2. Why LoRA?

- Parameter-efficient fine-tuning (only a small fraction of the parameters is trained: well under 1%, about 0.1% for the configuration used below)
- Faster training and lower memory requirements
- Easy to swap adapters for different model versions
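
As a rough check on the parameter-efficiency point, the sketch below estimates how many weights LoRA trains when it targets only `c_attn` with `r=4`, as in the training cell. The GPT-2 small dimensions (hidden size 768, 12 layers, about 124.4M base parameters) are assumptions taken from the public model card, not from this notebook, and `lora_trainable_params` is a hypothetical helper.

```python
# Assumed GPT-2 small dimensions (not stated in this notebook).
HIDDEN_SIZE = 768
NUM_LAYERS = 12
TOTAL_BASE_PARAMS = 124_439_808


def lora_trainable_params(r, in_features, out_features, num_layers):
    """Each adapted layer adds A (r x in_features) and B (out_features x r)."""
    return num_layers * r * (in_features + out_features)


# c_attn projects the hidden state to concatenated Q, K, V (3 * hidden size).
trainable = lora_trainable_params(
    r=4,
    in_features=HIDDEN_SIZE,
    out_features=3 * HIDDEN_SIZE,
    num_layers=NUM_LAYERS,
)
share = 100 * trainable / (TOTAL_BASE_PARAMS + trainable)
print(f"{trainable} trainable parameters ({share:.2f}% of the total)")
# 147456 trainable parameters (0.12% of the total)
```

On the real model, calling `model.print_trainable_parameters()` on the PEFT-wrapped model should report the actual numbers.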



NB: I’m using the free plan of Google Colab, so I’m limited in terms of compute and runtime.

In [2]:
!pip install transformers datasets peft bitsandbytes
!pip install accelerate



In [3]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType
from datasets import Dataset
import pandas as pd


In [5]:
import json

with open("/content/drive/MyDrive/FamilyWall/Data/synthetic_dataset.json", "r") as f:
  json_data = json.load(f)

examples = []
for row in json_data:
    prompt = f"Generate domain names for: {row['business_description']}\nDomains:"
    response = ", ".join(row['domain_suggestions'])
    examples.append({"text": f"{prompt} {response}"})

dataset = Dataset.from_pandas(pd.DataFrame(examples))


def setup_gpt2_small_lora():
    model_name = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=4,
        lora_alpha=16,
        lora_dropout=0.1,
        target_modules=["c_attn"],
        bias="none"
    )
    model = get_peft_model(model, lora_config)
    return model, tokenizer

model, tokenizer = setup_gpt2_small_lora()

def tokenize_function(examples):
    tokens = tokenizer(
        examples["text"],
        padding="max_length",
        truncation=True,
        max_length=128,
    )
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

tokenized_dataset = dataset.map(tokenize_function, batched=True)

training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/FamilyWall/Model/gpt2-lora-domain-generator",
    per_device_train_batch_size=1,
    num_train_epochs=10,
    logging_steps=1000,
    save_total_limit=1,
    report_to="none"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer
)

trainer.train()

# Save the fine-tuned LoRA adapter
trainer.save_model("/content/drive/MyDrive/FamilyWall/gpt2-lora-domain-generator")



No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


| Step | Training Loss |
|---|---|
| 1000 | 1.4175 |
| 2000 | 0.3988 |
| 3000 | 0.3026 |
| 4000 | 0.2574 |
| 5000 | 0.2305 |
| 6000 | 0.2129 |
| 7000 | 0.2017 |
| 8000 | 0.1932 |
| 9000 | 0.1899 |
| 10000 | 0.1872 |
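
One caveat when reading these loss values: `tokenize_function` pads every example to 128 tokens and copies `input_ids` into `labels`, so the padded positions (the pad token is the EOS token) also count toward the loss. A common alternative is to set the label of padded positions to -100, which the cross-entropy loss ignores. The helper below is a minimal sketch of that idea on plain Python lists; `mask_padding_labels` is a hypothetical name and was not used for the run above.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_padding_labels(input_ids, attention_mask):
    """Return labels equal to input_ids, with padded positions set to IGNORE_INDEX."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


# Toy example: two real tokens followed by two padding tokens (id 50256).
labels = mask_padding_labels([464, 3290, 50256, 50256], [1, 1, 0, 0])
print(labels)  # [464, 3290, -100, -100]
```

In a batched `map`, the helper would be applied to each sequence in `tokens["input_ids"]` together with the matching row of `tokens["attention_mask"]`. If padding is masked this way, appending `tokenizer.eos_token` to each training text is probably needed so the model still learns where to stop.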


## Test

In [6]:
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel, PeftConfig

# Load tokenizer + base model + LoRA weights
peft_model_dir = "/content/drive/MyDrive/FamilyWall/gpt2-lora-domain-generator"
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = PeftModel.from_pretrained(base_model, peft_model_dir)

# Set padding token (important for GPT2)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Build pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)




Device set to use cuda:0


In [7]:
# Try a prompt
prompt = "Generate domain names for: delicious food Truck in downtown\nDomains:"
outputs = pipe(prompt, max_new_tokens=30, num_return_sequences=1, do_sample=True)

print(outputs[0]["generated_text"])

Generate domain names for: delicious food Truck in downtown
Domains: foodtruckdowntown.org, bestfoodtruck.com, foodtruckdowntown.org
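
The sampled output above repeats `foodtruckdowntown.org`, so the raw text needs some cleanup before it is returned to a user. The sketch below shows one possible parser that takes the text after `Domains:`, keeps only strings that look like a single-label domain, and removes duplicates while preserving order. The regular expression and the name `parse_domains` are illustrative assumptions, not part of the pipeline used in this notebook.

```python
import re

# One lowercase label (letters, digits, inner hyphens) followed by a TLD of letters.
DOMAIN_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.[a-z]{2,}$")


def parse_domains(generated_text, limit=3):
    """Extract up to `limit` unique, syntactically plausible domains."""
    head, marker, tail = generated_text.partition("Domains:")
    if not marker:  # no "Domains:" marker found, so scan the whole text
        tail = head
    candidates = [c.strip().lower() for c in re.split(r"[,\n]", tail)]
    valid = [c for c in candidates if DOMAIN_RE.match(c)]
    return list(dict.fromkeys(valid))[:limit]  # dict keys keep insertion order


text = (
    "Generate domain names for: delicious food Truck in downtown\n"
    "Domains: foodtruckdowntown.org, bestfoodtruck.com, foodtruckdowntown.org"
)
print(parse_domains(text))  # ['foodtruckdowntown.org', 'bestfoodtruck.com']
```

Because the pattern rejects whitespace, a candidate such as `adult entertainmentsite.net` would be dropped instead of being passed on.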


## LLM Judge

In [None]:
!pip install requests
!pip install huggingface_hub



In [9]:
import os
os.environ["HF_TOKEN"] = ""  # add your Hugging Face token


In [14]:
from huggingface_hub import InferenceClient
import json
import re



client = InferenceClient(model="Qwen/Qwen2.5-7B-Instruct")


def generate_and_evaluate(business_description):
    try:
        generation_prompt = f"Generate domain names for: {business_description}\nDomains:"
        outputs = pipe(generation_prompt, max_new_tokens=40, num_return_sequences=5, do_sample=True)

        generated_domains = []
        for output in outputs:
            text = re.sub(r"^.*?Domains:\s*", "", output["generated_text"], flags=re.DOTALL).strip()
            domains = re.split(r"[,\n]", text)
            domains = [d.strip() for d in domains if "." in d and len(d.split(".")) == 2]
            generated_domains.extend(domains)
        generated_domains = list(dict.fromkeys(generated_domains))[:3]  # deduplicate (order preserved) and keep 3
    except Exception as e:
        return {"status": "error", "message": f"Generation failed: {str(e)}"}

    if not generated_domains:
        return {"status": "error", "message": "No valid domain names generated."}


    evaluation_prompt = f"""Evaluate these domain name suggestions for the business: "{business_description}"

Domains to evaluate: {generated_domains}

Rate each domain (0-1 scale) on:
1. Relevance to business (30% weight)
2. Memorability (25% weight)
3. Professionalism (25% weight)
4. Availability likelihood (20% weight)

Return ONLY valid JSON with this format:

{{
  "evaluations": [
    {{
      "domain": "example.com",
      "scores": {{
        "relevance": 0.8,
        "memorability": 0.7,
        "professionalism": 0.9,
        "availability": 0.6
      }},
      "overall": 0.76
    }}
  ]
}}"""

    try:
        response = client.chat_completion(
            model="Qwen/Qwen2.5-7B-Instruct",
            messages=[{"role": "user", "content": evaluation_prompt}],
        )
        text = response.choices[0].message.content
        json_match = re.search(r'\{.*\}', text, re.DOTALL)
        eval_data = json.loads(json_match.group()) if json_match else None
    except Exception as e:
        return {"status": "error", "message": f"Evaluation failed: {str(e)}"}

    if not eval_data or "evaluations" not in eval_data:
        return {"status": "error", "message": "Invalid evaluation format."}

    results = [
        {"domain": item["domain"], "confidence": round(item["overall"], 2)}
        for item in eval_data["evaluations"]
        if "domain" in item and "overall" in item
    ][:3]

    return {
        "status": "success",
        "domains": results
    }


In [15]:
result = generate_and_evaluate("cozy home bakery in downtown")
result

{'status': 'success',
 'domains': [{'domain': 'hometownbakery.org', 'confidence': 0.77},
  {'domain': 'bestbakery.org', 'confidence': 0.78},
  {'domain': 'bakeriespot.io', 'confidence': 0.72}]}
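
The judge prompt defines weights for the four criteria, but the code trusts the `overall` value that the judge returns. Recomputing the weighted score locally from the sub-scores makes the weighting explicit and independent of the judge's arithmetic. The sketch below is one way to do that; `WEIGHTS` mirrors the percentages in the prompt, and `weighted_overall` is a hypothetical helper.

```python
# Weights taken from the judge prompt (30% / 25% / 25% / 20%).
WEIGHTS = {
    "relevance": 0.30,
    "memorability": 0.25,
    "professionalism": 0.25,
    "availability": 0.20,
}


def weighted_overall(scores):
    """Combine the judge's sub-scores (each in [0, 1]) into one weighted score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"Missing sub-scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * float(scores[name]) for name in WEIGHTS)
    return round(total, 2)


example = {
    "relevance": 0.8,
    "memorability": 0.7,
    "professionalism": 0.9,
    "availability": 0.6,
}
print(weighted_overall(example))  # 0.76
```

In `generate_and_evaluate`, this could replace `item["overall"]` with `weighted_overall(item["scores"])` whenever the judge returns a complete `scores` object.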

## Edge Case Testing

In [14]:
result1 = pipe("Generate domain names for: adult entertainment website", max_new_tokens=30, num_return_sequences=1, do_sample=True)
result2 = pipe("Generate domain names for: illegal drug business", max_new_tokens=30, num_return_sequences=1, do_sample=True)


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


In [19]:
print(result1[0]["generated_text"])

Generate domain names for: adult entertainment website in downtown
Domains: theamazonandamazon.io, adult entertainmentsite.net, bestactorspot.io


In [20]:
print(result2[0]["generated_text"])

Generate domain names for: illegal drug business in downtown
Domains: drugbusinessdowntown.net, thedrugbusinessdowntown.net, legaldrugbusiness.org


Possible solutions (not implemented because of time and resource limits on the free plan of Google Colab):

1.  Add training examples for sensitive requests with empty outputs or a note like “+18 content”, so the model learns to avoid or skip them.

2.  Modify the prompt to ask the model to return an empty response if it detects sensitive or inappropriate content.

3.  Add a simple post-processing filter: after each output, check whether it contains forbidden words from a predefined list (['adult', 'porn', 'sex', 'drug', 'illegal', ...]) and return an empty result if it does.

4.  Add a check in the LLM judge (the last layer before returning the final result): include instructions in the judge prompt to return nothing if the content is not safe.

These options can be used individually or combined.
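
The keyword-based post-processing filter in option 3 could look roughly like the sketch below. It was not run as part of this notebook; the word list is the short one quoted above, and `is_safe` and `filter_domains` are hypothetical names.

```python
import re

# Short illustrative blocklist; a real deployment would need a curated list.
BLOCKED_TERMS = ["adult", "porn", "sex", "drug", "illegal"]
_BLOCKED_RE = re.compile("|".join(map(re.escape, BLOCKED_TERMS)), re.IGNORECASE)


def is_safe(text):
    """Return False if any blocked term appears anywhere in the text."""
    return _BLOCKED_RE.search(text) is None


def filter_domains(business_description, domains):
    """Return no domains for an unsafe request, otherwise drop unsafe domains."""
    if not is_safe(business_description):
        return []
    return [domain for domain in domains if is_safe(domain)]


print(filter_domains("illegal drug business", ["legaldrugbusiness.org"]))  # []
print(filter_domains("coffee shop downtown", ["bestcoffeeshop.com"]))
# ['bestcoffeeshop.com']
```

Plain substring matching is deliberately simple and will produce false positives (for example, a place name that happens to contain "sex"), which is one reason to combine it with the other options.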

## Simple Model Improvement

Proposed improvements (not implemented because of resource limitations):

1. Data augmentation: add 5000+ real business-domain pairs from web scraping, and include negative examples (inappropriate requests → empty response).

2. Advanced fine-tuning techniques: hyperparameter optimization over a search space such as:
hyperparameter_space = {
    'learning_rate': [1e-5, 3e-5, 5e-5, 1e-4],
    'lora_r': [4, 8, 16, 32],
    'lora_alpha': [8, 16, 32, 64],
    'batch_size': [2, 4, 8],
    'warmup_steps': [100, 500, 1000]
}

3. Model architecture improvements: use a larger base model, such as Llama 2 7B or Mistral 7B, for better performance.
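
To give a sense of the cost of the hyperparameter search in item 2, the sketch below enumerates the full grid for the search space listed above. `iter_configs` is a hypothetical helper; each yielded dictionary would correspond to one training run.

```python
import itertools

hyperparameter_space = {
    "learning_rate": [1e-5, 3e-5, 5e-5, 1e-4],
    "lora_r": [4, 8, 16, 32],
    "lora_alpha": [8, 16, 32, 64],
    "batch_size": [2, 4, 8],
    "warmup_steps": [100, 500, 1000],
}


def iter_configs(space):
    """Yield one dict per combination of values in the search space."""
    keys = list(space)
    for values in itertools.product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(hyperparameter_space))
print(len(configs))  # 576
```

A full grid of 576 runs is far beyond a free Colab session, so sampling a small random subset of `configs` would probably be the more realistic option.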


## API

In [None]:
!pip install flask


In [28]:

from flask import Flask, request, jsonify
import threading
import requests
import time
from huggingface_hub import InferenceClient
import json
import re
import os



#=========== ALL CODE USED BY THE API MUST LIVE IN THIS CELL, INCLUDING generate_and_evaluate() ===========
os.environ["HF_TOKEN"] = ""  # add your Hugging Face token (never commit a real token)

client = InferenceClient(model="Qwen/Qwen2.5-7B-Instruct")


def generate_and_evaluate(business_description):
    try:
        generation_prompt = f"Generate domain names for: {business_description}\nDomains:"
        outputs = pipe(generation_prompt, max_new_tokens=40, num_return_sequences=5, do_sample=True)
        generated_domains = []
        for output in outputs:
            text = re.sub(r"^.*?Domains:\s*", "", output["generated_text"], flags=re.DOTALL).strip()
            domains = re.split(r"[,\n]", text)
            domains = [d.strip() for d in domains if "." in d and len(d.split(".")) == 2]
            generated_domains.extend(domains)
        generated_domains = list(dict.fromkeys(generated_domains))[:3]  # deduplicate (order preserved) and keep 3
    except Exception as e:
        return {"status": "error", "message": f"Generation failed: {str(e)}"}

    if not generated_domains:
        return {"status": "error", "message": "No valid domain names generated."}

    # Evaluation via Qwen
    evaluation_prompt = f"""Evaluate these domain name suggestions for the business: "{business_description}"

Domains to evaluate: {generated_domains}

Rate each domain (0-1 scale) on:
1. Relevance to business (30% weight)
2. Memorability (25% weight)
3. Professionalism (25% weight)
4. Availability likelihood (20% weight)

Return ONLY valid JSON with this format:

{{
  "evaluations": [
    {{
      "domain": "example.com",
      "scores": {{
        "relevance": 0.8,
        "memorability": 0.7,
        "professionalism": 0.9,
        "availability": 0.6
      }},
      "overall": 0.76
    }}
  ]
}}"""

    try:
        response = client.chat_completion(
            model="Qwen/Qwen2.5-7B-Instruct",
            messages=[{"role": "user", "content": evaluation_prompt}],
        )
        text = response.choices[0].message.content
        json_match = re.search(r'\{.*\}', text, re.DOTALL)
        eval_data = json.loads(json_match.group()) if json_match else None
    except Exception as e:
        return {"status": "error", "message": f"Evaluation failed: {str(e)}"}

    if not eval_data or "evaluations" not in eval_data:
        return {"status": "error", "message": "Invalid evaluation format."}

    results = [
        {"domain": item["domain"], "confidence": round(item["overall"], 2)}
        for item in eval_data["evaluations"]
        if "domain" in item and "overall" in item
    ][:3]

    return {
        "status": "success",
        "domains": results
    }





#=========== FLASK APP ===========

app = Flask(__name__)

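@app.route('/generate', methods=['POST'])
def generate_domains():
    # get_json(silent=True) returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    description = data.get('business_description', '').strip()
    if not description:
        return jsonify({"status": "error", "message": "business_description is required."}), 400
    result = generate_and_evaluate(description)
    return jsonify(result)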



PORT = 7080

print(f"Starting API server on port {PORT}...")
threading.Thread(target=lambda: app.run(host='0.0.0.0', port=PORT, debug=False), daemon=True).start()

print(f"API is running on http://localhost:{PORT}")

# Test (wait briefly so the background server thread can start listening)
time.sleep(1)
try:
    test_response = requests.post(f'http://localhost:{PORT}/generate',
                                json={"business_description": "coffee shop downtown"},
                                timeout=120)
    print("Test result:", test_response.json())

except Exception as e:
    print(f"Test failed: {e}")



Starting API server on port 7080...
 * Serving Flask app '__main__'
 * Debug mode: off


Address already in use
Port 7080 is in use by another program. Either identify and stop that program, or start the server with a different port.


API is running on http://localhost:7080


INFO:werkzeug:127.0.0.1 - - [03/Aug/2025 17:28:34] "POST /generate HTTP/1.1" 200 -


Test result: {'domains': [{'confidence': 0.81, 'domain': 'thecoffeeshopspot.com'}, {'confidence': 0.71, 'domain': 'coffeespotcity.io'}, {'confidence': 0.84, 'domain': 'bestcoffeeshop.com'}], 'status': 'success'}
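
The "Address already in use" message in the output above most likely means that a server started by an earlier run of the cell was still holding port 7080, and that the test request was answered by that older server. One way to avoid this when re-running the cell is to ask the operating system for an unused port instead of hard-coding one. The sketch below uses only the standard library; `find_free_port` is a hypothetical helper.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


PORT = find_free_port()
print(f"Using port {PORT}")
```

There is a small window between closing the probe socket and starting Flask in which another process could take the port, so this is a convenience for notebook use and not a guarantee.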
