# SafeHire Compliance & Hiring Assistant

**SafeHire** is a verification platform for Kenyan domestic workers. This QA chat helps users with:

- **Compliance** — documents, contracts, minimum wage, rights (employer & worker)
- **Red flags** — what to watch out for when hiring or being hired
- **How to hire a good nanny** — practical steps, interviews, references, verification

Includes: **Gradio UI**, **streaming**, a **system prompt** (focus + user type), **scraped knowledge** from the web, a **model switch** (GPT / Llama), and optional **tools**.

In [28]:
# imports
import os
import sys
import json
from datetime import datetime, timezone
from dotenv import load_dotenv
from openai import OpenAI
import gradio as gr

# Use week2 scraper (fetch_website_contents) — find week2 folder from cwd or parents
for _base in [os.getcwd(), os.path.join(os.getcwd(), ".."), os.path.join(os.getcwd(), "..", "..")]:
    if os.path.isfile(os.path.join(_base, "scraper.py")):
        sys.path.insert(0, _base)
        break
from scraper import fetch_website_contents

load_dotenv(override=True)
api_key = os.getenv("OPENAI_API_KEY")
OLLAMA_BASE_URL = "http://localhost:11434/v1"

if api_key and api_key.startswith("sk-") and len(api_key) > 10:
    print("OpenAI API key looks good so far")
else:
    print("OPENAI_API_KEY not set or invalid — add it to .env to use GPT")


openai = OpenAI() if api_key else None
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key="ollama")


OpenAI API key looks good so far


In [37]:
# constants
MODEL_GPT = "gpt-4o-mini"
MODEL_LLAMA = "llama3.2"

### Scrape compliance & hiring knowledge from the web

We fetch content from trusted URLs (e.g. labour/rights pages) so the assistant can ground answers in real guidelines. Add or change `COMPLIANCE_URLS` to point to Kenyan labour ministry, NITA, or domestic-worker guidance pages.

In [38]:
# URLs to scrape for compliance and hiring guidance (edit to add Kenyan official sources)
COMPLIANCE_URLS = [
    "https://www.ilo.org/global/topics/domestic-workers/lang--en/index.htm",  # ILO domestic workers
    # Add more, e.g. "https://www.labour.go.ke/", or NITA / training provider pages
]

COMPLIANCE_KNOWLEDGE = ""
try:
    for url in COMPLIANCE_URLS:
        try:
            COMPLIANCE_KNOWLEDGE += fetch_website_contents(url) + "\n\n"
            print(f"Loaded {len(COMPLIANCE_KNOWLEDGE)} chars of compliance knowledge.")
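        except NameError:
            # Scraper was never imported: re-raise so the outer handler uses the fallback text
            raise
        except Exception as e:
            COMPLIANCE_KNOWLEDGE += f"[Could not fetch {url}: {e}]\n\n"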
except NameError:
    # If scraper import failed, use fallback text so the notebook still runs
    COMPLIANCE_KNOWLEDGE = """SafeHire guidance (fallback when scraping is unavailable):
- Employers: verify ID, references, and any training certificates. Use a written contract (hours, pay, leave).
- Workers: keep copies of your contract; know your rights to rest, pay, and safe work.
- Red flags: no contract, payment in cash only with no record, pressure to skip verification, unclear job description.
- Hiring a good nanny: define role clearly, check references, agree on trial period, discuss house rules and boundaries."""
    print("Using fallback knowledge (scraper not loaded).")

# Curated Kenya minimum wage for domestic workers (so the assistant can answer accurately)
# Source: Regulation of Wages (General) (Amendment) Order, 2024. Update when new orders are gazetted.
KENYA_MINIMUM_WAGE_KNOWLEDGE = """
Kenya minimum wage for domestic workers (housekeepers, house helps, maids) — effective from 1 November 2024:

- **Major cities (Nairobi, Mombasa, Kisumu, Nakuru, Eldoret):** KSh 16,113.75 per month for general domestic workers; KSh 17,976.54 per month for night guards.
- **Former municipalities and town councils:** KSh 14,866.92 per month (general); KSh 16,665.96 (night guards).
- **Rural areas and smaller towns:** KSh 8,596.49 per month (general); KSh 10,253.06 (night guards).

**Housing allowance:** If the employer does not provide free accommodation, an additional 15% housing allowance must be paid (e.g. for Nairobi: KSh 2,417.06 on top of KSh 16,113.75, total KSh 18,530.81 per month before overtime).

**As of 2026:** These rates from the 2024 Order remain in effect until a new wage order is gazetted. New rates are typically announced around May for implementation in July; confirm with the Ministry of Labour (labour.go.ke) or the Kenya Gazette for the latest.
"""
COMPLIANCE_KNOWLEDGE = (COMPLIANCE_KNOWLEDGE.strip() or "No scraped content.") + "\n\n" + KENYA_MINIMUM_WAGE_KNOWLEDGE.strip()

# The curated wage text is always appended above, so COMPLIANCE_KNOWLEDGE is never empty here
print(f"Loaded {len(COMPLIANCE_KNOWLEDGE)} chars of compliance knowledge (includes Kenya minimum wage).")

Loaded 2002 chars of compliance knowledge.
Loaded 2978 chars of compliance knowledge (includes Kenya minimum wage).
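
The housing allowance figure quoted in the curated wage text can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the 15% rate and the Nairobi base wage stated above; `monthly_pay_with_housing` and `HOUSING_ALLOWANCE_RATE` are hypothetical names used only for this illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: 15% housing allowance, as stated in the curated wage text
HOUSING_ALLOWANCE_RATE = Decimal("0.15")


def monthly_pay_with_housing(base_wage: str) -> tuple[Decimal, Decimal]:
    """Return (housing allowance, total monthly pay), rounded to cents."""
    base = Decimal(base_wage)
    allowance = (base * HOUSING_ALLOWANCE_RATE).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return allowance, base + allowance


allowance, total = monthly_pay_with_housing("16113.75")
print(f"Allowance: KSh {allowance}, total: KSh {total}")
```

Using `Decimal` rather than floats keeps the cent rounding explicit, which matters when the result is quoted to users as a legal minimum.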


### System prompt: SafeHire focus + user type

The assistant adapts to **focus** (Compliance, Red flags, or Hiring a good nanny) and **user type** (Employer, Domestic worker, or Agency). It uses the scraped **compliance knowledge** above to ground answers where relevant.

In [39]:
def build_system_prompt(focus: str, user_type: str, compliance_knowledge: str = "") -> str:
    return f"""You are SafeHire's compliance and hiring assistant for Kenyan domestic workers.
The user is asking as: **{user_type}**.
They have chosen focus: **{focus}**.

Your role:
- Help with **compliance**: documents, contracts, minimum wage, employer and worker rights in Kenya.
- Explain **red flags**: warning signs when hiring or being hired (e.g. no contract, no verification, pressure to skip checks).
- Advise on **how to hire a good nanny**: practical steps, interviews, references, verification, trial periods, house rules.

Use the following scraped/curated knowledge when it is relevant to the question. If the user's question is outside this content, use general good practice and say when they should confirm with official Kenyan sources (e.g. Ministry of Labour, NITA).

--- Scraped/curated knowledge (use when relevant) ---
{compliance_knowledge[:8000] if compliance_knowledge else "No scraped content. Rely on general best practice and recommend official sources."}
--- End of knowledge ---

Answer in clear, practical language. Structure with headings where helpful. Be Kenya-aware. When you need to look up specific guidance from our scraped compliance content (e.g. documents, rights, red flags), use the search_compliance_knowledge tool with a short query.
Add sources when you can after answering the question.
"""

In [40]:
# Tool: search our scraped compliance knowledge (SafeHire-relevant)
def search_compliance_knowledge(query: str, max_chars: int = 1200) -> str:
    """
    Search the scraped compliance/hiring knowledge for a short query.
    Returns the most relevant excerpt so the assistant can ground answers in real guidelines.
    """
    # dir() inside a function lists local names only, so look the global up explicitly
    knowledge = globals().get("COMPLIANCE_KNOWLEDGE", "")
    if not knowledge or not query or not query.strip():
        return "No compliance knowledge available to search, or query was empty."
    query_lower = query.strip().lower()
    text = knowledge.lower()
    # Find first occurrence and return surrounding context
    idx = text.find(query_lower)
    if idx == -1:
        # No exact match: fall back to the first occurrence of any query word longer than two characters
        words = [w for w in query_lower.split() if len(w) > 2]
        for word in words:
            idx = text.find(word)
            if idx != -1:
                break
    if idx == -1:
        return "No matching excerpt found in compliance knowledge. Use general best practice."
    start = max(0, idx - 200)
    end = min(len(knowledge), idx + max_chars)
    excerpt = knowledge[start:end].strip()
    return excerpt[:max_chars]

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_compliance_knowledge",
            "description": "Search the scraped compliance and hiring guidance (documents, rights, red flags, domestic workers) for a specific topic. Use when the user asks about requirements, documents, red flags, or official guidance so you can cite the source.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Short search query, e.g. 'minimum wage', 'contract', 'red flags', 'documents required'"},
                },
                "required": ["query"],
                "additionalProperties": False,
            },
        },
    }
]
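
The search above returns a fixed window of text around the first match. One possible alternative, not what this notebook does, is to rank whole paragraphs by how many query words they contain. The sketch below assumes paragraphs are separated by blank lines; `best_paragraph` and the sample text are hypothetical.

```python
def best_paragraph(knowledge: str, query: str, max_chars: int = 1200) -> str:
    """Return the paragraph that shares the most words with the query, or ""."""
    # Ignore very short words, mirroring the len(w) > 2 filter used above
    words = {w for w in query.lower().split() if len(w) > 2}
    paragraphs = [p.strip() for p in knowledge.split("\n\n") if p.strip()]
    if not words or not paragraphs:
        return ""

    def score(paragraph: str) -> int:
        lower = paragraph.lower()
        return sum(1 for w in words if w in lower)

    best = max(paragraphs, key=score)
    return best[:max_chars] if score(best) > 0 else ""


sample = (
    "Employers should verify ID and references.\n\n"
    "Use a written contract covering hours, pay, and leave.\n\n"
    "Red flags: no contract, cash-only payment with no record."
)
print(best_paragraph(sample, "written contract"))
```

Returning whole paragraphs avoids cutting an excerpt mid-sentence, at the cost of ignoring where in the paragraph the match occurs.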

In [41]:
def handle_tool_calls(message):
    """Execute tool calls and return tool messages for the API."""
    responses = []
    for tool_call in message.tool_calls:
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments or "{}")
        if name == "search_compliance_knowledge":
            result = search_compliance_knowledge(args.get("query", ""))
        else:
            result = f"Unknown tool: {name}"
        responses.append({"role": "tool", "content": result, "tool_call_id": tool_call.id})
    return responses
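
To see the shape of the messages this handler produces without calling the API, the tool-call objects can be imitated with plain namespaces. This is a sketch under stated assumptions: `SimpleNamespace` stands in for the SDK's message objects, and `fake_search` and `to_tool_messages` are hypothetical stand-ins for `search_compliance_knowledge` and `handle_tool_calls`.

```python
import json
from types import SimpleNamespace


def fake_search(query: str) -> str:
    # Hypothetical stand-in for search_compliance_knowledge
    return f"excerpt for: {query}"


def to_tool_messages(message, registry):
    """Run each requested tool and wrap its result as a role="tool" message."""
    out = []
    for call in message.tool_calls:
        args = json.loads(call.function.arguments or "{}")
        fn = registry.get(call.function.name)
        result = fn(**args) if fn else f"Unknown tool: {call.function.name}"
        out.append({"role": "tool", "content": result, "tool_call_id": call.id})
    return out


# A fake assistant message carrying one tool call, with arguments as a JSON string
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(
                name="search_compliance_knowledge",
                arguments='{"query": "minimum wage"}',
            ),
        )
    ]
)

tool_messages = to_tool_messages(
    fake_message, {"search_compliance_knowledge": fake_search}
)
print(tool_messages)
```

Each tool message carries the `tool_call_id` of the call it answers, which is how the model matches results to its requests.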

In [42]:
def stream_chat(message, history, focus: str, user_type: str, model_label: str, use_tools: bool):
    """
    Stream the assistant reply. Supports OpenAI (with optional tools), and Ollama.
    history: list of {"role": "user"|"assistant", "content": "..."}
    Yields accumulated response text for Gradio streaming.
    """
    # dir() inside a function lists local names only, so look the global up explicitly
    knowledge = globals().get("COMPLIANCE_KNOWLEDGE", "")
    system_prompt = build_system_prompt(focus, user_type, knowledge)
    history_conv = [{"role": h["role"], "content": h["content"]} for h in history]
    messages = [{"role": "system", "content": system_prompt}] + history_conv + [{"role": "user", "content": message}]

    if model_label == "Llama 3.2 (Ollama)":
        stream = ollama.chat.completions.create(model=MODEL_LLAMA, messages=messages, stream=True)
        acc = ""
        for chunk in stream:
            delta = (chunk.choices[0].delta.content or "") if chunk.choices else ""
            acc += delta
            yield acc
        return


    if model_label == "GPT-4o-mini (OpenAI)" and openai:
        # Pass `tools` only when enabled, rather than sending an explicit tools=None
        extra = {"tools": TOOLS} if use_tools else {}
        response = openai.chat.completions.create(
            model=MODEL_GPT, messages=messages, stream=False, **extra
        )
        # Keep resolving tool calls until the model returns a normal reply
        while response.choices[0].finish_reason == "tool_calls":
            msg = response.choices[0].message
            messages.append(msg)
            messages.extend(handle_tool_calls(msg))
            response = openai.chat.completions.create(
                model=MODEL_GPT, messages=messages, stream=False, **extra
            )
        final_content = response.choices[0].message.content or ""
        # The tool loop is non-streaming, so replay the finished reply character by character
        acc = ""
        for c in final_content:
            acc += c
            yield acc
        return

    yield "Unsupported model or missing API key."
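
Both branches follow the same pattern: each yield carries the full text so far, not just the newest piece, so the UI can simply overwrite the assistant message. A minimal sketch of that pattern, using a hypothetical `accumulate` helper and hand-written deltas in place of real API chunks:

```python
def accumulate(deltas):
    """Yield the running concatenation of text deltas, treating None as empty."""
    acc = ""
    for delta in deltas:
        acc += delta or ""
        yield acc


# Streaming chunks can carry no content, hence the None in the sample
snapshots = list(accumulate(["Ver", "ify ", None, "ID"]))
print(snapshots)
```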

### Gradio UI

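SafeHire QA chat: choose **focus** (Compliance / Red flags / Hiring a good nanny), **user type** (Employer / Worker / Agency), and **model**. Replies **stream** into the chat window; the Llama path streams tokens as they arrive, while the GPT path replays the finished reply because its tool loop is non-streaming.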

In [46]:
FOCUS_OPTIONS = [
    "Compliance & documents",
    "Red flags & safety",
    "How to hire a good nanny",
]
USER_TYPE_OPTIONS = ["Employer", "Domestic worker", "Agency"]
MODEL_LABELS = ["GPT-4o-mini (OpenAI)", "Llama 3.2 (Ollama)"]

# Logo: load from static folder (static/logo.png or logo.jpg) or set LOGO_URL to a URL. Uses base64 so local file works.
import base64
LOGO_URL = None  # set to a URL if you prefer (e.g. "https://...")
LOGO_HTML = ""
if not LOGO_URL:
    for base in [os.getcwd(), os.path.join(os.getcwd(), ".."), os.path.join(os.getcwd(), "..", "..")]:
        static_dir = os.path.join(base, "static")
        for name in ["logo.png", "logo.jpg", "logo.jpeg", "logo.svg"]:
            path = os.path.join(static_dir, name)
            if os.path.isfile(path):
                with open(path, "rb") as f:
                    b64 = base64.b64encode(f.read()).decode()
                ext = name.rsplit(".", 1)[-1].lower()
                # Map extensions to valid MIME types (.jpg must be served as image/jpeg)
                mime = {"svg": "image/svg+xml", "jpg": "image/jpeg"}.get(ext, f"image/{ext}")
                LOGO_HTML = f"<img src='data:{mime};base64,{b64}' alt='SafeHire' class='header-logo' />"
                break
        if LOGO_HTML:
            break
else:
    LOGO_HTML = f"<img src='{LOGO_URL}' alt='SafeHire' class='header-logo' />"

HEADER_CSS = """
.header { align-items: center; gap: 12px; }
.header > div:first-child { padding: 0 !important; flex: 0 0 auto !important; min-width: 0 !important; }
.header .html-container { padding: 0 !important; max-width: none !important; }
.header [class*='svelte'] { padding: 0 !important; }
.header-logo { height: 56px; width: auto; display: block; object-fit: contain; }
.header .markdown { margin: 0 !important; }
.header .markdown h2 { margin: 0 !important; }
"""

with gr.Blocks(title="SafeHire – Compliance & Hiring Assistant", theme=gr.themes.Soft(), css=HEADER_CSS) as demo:
    with gr.Row(elem_classes="header"):
        gr.HTML(LOGO_HTML if LOGO_HTML else "<span></span>")
        gr.Markdown("## SafeHire – Compliance & Hiring Assistant")
    gr.Markdown("Ask about **compliance**, **red flags**, or **how to hire a good nanny** for Kenyan domestic workers. Set your options, then type your question below.")
    with gr.Row():
        focus_dropdown = gr.Dropdown(choices=FOCUS_OPTIONS, value=FOCUS_OPTIONS[0], label="Focus")
        user_type_dropdown = gr.Dropdown(choices=USER_TYPE_OPTIONS, value="Employer", label="I am a")
        model_dropdown = gr.Dropdown(choices=MODEL_LABELS, value=MODEL_LABELS[0], label="Model")
        use_tools = gr.Checkbox(value=True, label="Use tools (OpenAI only)")
    chat_column = gr.Column(visible=False)
    with chat_column:
        chatbot = gr.Chatbot(type="messages", label=None, height=400)
    with gr.Row():
        msg = gr.Textbox(placeholder="e.g. What documents should I ask for? What are red flags when hiring a nanny?", label="Your question", scale=4)
        submit = gr.Button("Send", variant="primary", scale=1)
    clear = gr.Button("Clear history")

    def user_commit(message, history):
        if not message or not message.strip():
            return message, history, gr.update()
        return message, history + [{"role": "user", "content": message}], gr.update(visible=True)

    def bot_stream(message, history, focus, user_type, model_label, use_tools):
        if not message or not message.strip():
            # This function is a generator, so emit the unchanged history instead of returning it
            yield history
            return
        new_history = history
        prev_history = history[:-1] if history and history[-1].get("role") == "user" else history
        full = ""
        for partial in stream_chat(message, prev_history, focus, user_type, model_label, use_tools):
            full = partial
            yield new_history + [{"role": "assistant", "content": full}]

    def clear_msg():
        return ""

    msg.submit(user_commit, [msg, chatbot], [msg, chatbot, chat_column]).then(
        bot_stream, [msg, chatbot, focus_dropdown, user_type_dropdown, model_dropdown, use_tools], chatbot
    ).then(clear_msg, None, [msg])
    submit.click(user_commit, [msg, chatbot], [msg, chatbot, chat_column]).then(
        bot_stream, [msg, chatbot, focus_dropdown, user_type_dropdown, model_dropdown, use_tools], chatbot
    ).then(clear_msg, None, [msg])
    clear.click(fn=lambda: [], inputs=None, outputs=chatbot)

demo.launch(inbrowser=True, share=True)

* Running on local URL:  http://127.0.0.1:7868
* Running on public URL: https://bb18358e2754e9e527.gradio.live



