
Devaur03/Closira


Closira — AI Customer-Support Workflow

A Python AI workflow that handles a simulated customer-support conversation end-to-end for Bloom Aesthetics Clinic, a fictional small aesthetics business. Built for the Closira AI Engineering Intern assignment.

The workflow runs a conversation through four stages:

  1. FAQ Answering — answers customer questions using a provided SOP only.
  2. Lead Qualification — asks structured questions to qualify the lead.
  3. Escalation Detection — detects when a human is needed and logs why.
  4. Conversation Summary — produces a structured end-of-session summary.

It works with the Anthropic Claude API, the OpenAI API, or a local Ollama model — it auto-detects the provider. Provide a hosted API key and it uses that; provide nothing and it falls back to a free local model (qwen2.5:1.5b) via Ollama. Reviewers only need to drop in an API key.


Quick start

1. Requirements

  • Python 3.10 or newer
  • One of: an Anthropic Claude API key, an OpenAI API key, or a local Ollama install (no key, free)

2. Install

cd closira-ai-workflow
python -m pip install -r requirements.txt

3. Choose how to run it

The workflow auto-detects the provider in this priority order: Anthropic key → OpenAI key → local Ollama fallback.

cp .env.example .env        # Windows: copy .env.example .env

Reviewers / recruiters — use a hosted model (recommended). Open .env and fill in one key:

ANTHROPIC_API_KEY=sk-ant-...      # or
OPENAI_API_KEY=sk-...

That's it — the key is picked up automatically. To force a provider, set LLM_PROVIDER=anthropic, openai, or ollama.
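The priority order above can be sketched as a small function. This is an illustrative sketch rather than the project's actual code in src/config.py: the function name `detect_provider` and the treatment of blank keys as "not set" are assumptions, while the environment variable names come from this README.

```python
import os
from typing import Mapping, Optional

VALID_PROVIDERS = ("anthropic", "openai", "ollama")


def detect_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a provider: explicit override, then Anthropic, OpenAI, Ollama."""
    env = os.environ if env is None else env

    # An explicit LLM_PROVIDER wins over auto-detection.
    forced = env.get("LLM_PROVIDER", "").strip().lower()
    if forced:
        if forced not in VALID_PROVIDERS:
            raise ValueError(f"Unknown LLM_PROVIDER: {forced!r}")
        return forced

    # Assumption: blank or whitespace-only keys count as "not set".
    if env.get("ANTHROPIC_API_KEY", "").strip():
        return "anthropic"
    if env.get("OPENAI_API_KEY", "").strip():
        return "openai"

    # No hosted key found: fall back to the local Ollama model.
    return "ollama"
```

Passing a plain dict instead of reading `os.environ` directly keeps the selection logic easy to test without touching the real environment.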

Cost note: both hosted providers are paid APIs. New accounts may include free trial credit, depending on the provider's current terms. Either way, a full demo session makes only a handful of short model calls and typically costs a fraction of a cent.

Local development — use the free Ollama fallback (no key). Leave the keys in .env blank. With no key set, the workflow runs entirely locally on Ollama. One-time setup:

# 1. Install Ollama from https://ollama.com
# 2. Pull the model used by this project
ollama pull qwen2.5:1.5b
# 3. Start the Ollama server if it is not already running (the desktop app usually starts it automatically)
ollama serve

Then run the workflow normally — it will detect Ollama and use it. The model and endpoint are configurable via OLLAMA_MODEL and OLLAMA_BASE_URL in .env.

Why a small model for local dev: qwen2.5:1.5b is tiny and fast, so it is great for iterating on the workflow logic offline at zero cost. It is less reliable at strict JSON output and nuanced escalation than a hosted model — so for grading/evaluation, a hosted key is recommended. The code requests native JSON mode from Ollama and defensively repairs malformed JSON to keep the small model usable.
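As a rough illustration of what "defensively repairs malformed JSON" can mean, the sketch below recovers a JSON object from a reply that has chatter around it or a trailing comma. The function name `parse_model_json` and the specific repair steps are assumptions for illustration; the project's own repair logic may differ.

```python
import json
import re
from typing import Any, Optional


def parse_model_json(raw: str) -> Optional[dict[str, Any]]:
    """Try to recover a JSON object from a model reply; return None on failure."""
    # Keep only the span from the first "{" to the last "}". This also drops
    # any surrounding prose or code-fence markers the model may have added.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = raw[start : end + 1]

    # Heuristic: remove trailing commas before a closing brace or bracket.
    # Note this could also alter a string value that contains ", }" or ", ]".
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)

    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

Returning `None` instead of raising lets the caller treat an unparseable reply like any other failure and hand off rather than guess.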

4. Run it

Interactive mode — you type as the customer:

python -m src.main

Scripted mode — replays a saved customer script (used to reproduce the test transcripts):

python -m src.main --script test_transcripts/scripts/01_in_sop.txt

Type exit or quit at any time to end a session. When the session ends, the structured conversation summary prints to the screen and the full transcript is saved to logs/transcripts/.
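The two modes can share one input loop. The sketch below is a guess at how that might look, not the actual src/main.py: it assumes a script file holds one customer message per line, and the names `build_parser` and `customer_messages` are hypothetical. Only the `--script` flag and the exit/quit behaviour come from this README.

```python
import argparse
from pathlib import Path
from typing import Iterator, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="src.main")
    parser.add_argument(
        "--script",
        type=Path,
        default=None,
        help="replay customer messages from a file instead of prompting",
    )
    return parser


def _raw_lines(script: Optional[Path]) -> Iterator[str]:
    """Yield raw input lines from a script file, or from the keyboard."""
    if script is not None:
        # Assumption: one customer message per line.
        yield from script.read_text(encoding="utf-8").splitlines()
        return
    while True:
        try:
            yield input("You: ")
        except EOFError:
            return


def customer_messages(script: Optional[Path]) -> Iterator[str]:
    """Yield customer messages until the input ends or the user types exit/quit."""
    for line in _raw_lines(script):
        message = line.strip()
        if not message:
            continue  # skip blank lines
        if message.lower() in {"exit", "quit"}:
            return
        yield message
```

A caller would then do something like `for message in customer_messages(build_parser().parse_args().script): ...`, so the rest of the workflow does not need to know which mode is active.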


Project structure

closira-ai-workflow/
├── README.md                  This file
├── prompt_design.md           Full system prompt + design reasoning
├── requirements.txt           Dependencies
├── .env.example               Configuration template
├── VIDEO_SCRIPT.md            Script for the 2-5 min walkthrough video
│
├── data/
│   └── sop.json               The SOP — the AI's ONLY source of truth
│
├── src/
│   ├── main.py                CLI entry point
│   ├── config.py              Settings (provider, thresholds) from env
│   ├── llm_client.py          Provider-agnostic LLM client (Claude/OpenAI/Ollama)
│   ├── sop.py                 Loads + renders the SOP
│   ├── prompts.py             Every prompt used by the workflow
│   ├── conversation.py        Orchestrator — wires the four stages together
│   ├── logger.py              Append-only escalation logging
│   └── stages/
│       ├── faq.py             Stage 1 — FAQ answering
│       ├── qualification.py   Stage 2 — lead qualification
│       ├── escalation.py      Stage 3 — escalation detection
│       └── summary.py         Stage 4 — conversation summary
│
├── test_transcripts/
│   ├── 01_in_sop_question.md       One transcript per expected behaviour
│   ├── 02_out_of_scope_question.md
│   ├── 03_escalation_trigger.md
│   ├── 04_lead_qualification.md
│   ├── 05_conversation_summary.md
│   └── scripts/                    Customer scripts to regenerate them
│
└── logs/
    ├── escalations.log        One JSON line per escalation (created at run)
    └── transcripts/           Saved transcript + summary per session

How the workflow works

Each customer message flows through the workflow like this:

customer message
      │
      ▼
[ Stage 3: Escalation check ]  ── runs on EVERY message, before answering
      │  complaint / anger / medical / pricing negotiation / human request?
      │        │ yes ──────────────► hand off to a human, end session
      │        │ no
      ▼
[ Stage 1: FAQ answering ]  ── answers from data/sop.json ONLY
      │  returns: reply, answered_from_sop, confidence, escalate
      │        │ low confidence / out of scope ──► flag + log, hand off
      │        │ answered
      ▼
   "Anything else?"  ── customer has more questions? loop back to Stage 1
      │ no
      ▼
[ Stage 2: Lead qualification ]  ── 3 structured questions
      │
      ▼
[ Stage 4: Conversation summary ]  ── structured summary, session ends

The orchestrator (src/conversation.py) owns all state and stage transitions, so the flow is deterministic and easy to trace. The four stages are cleanly separated — each lives in its own module under src/stages/.
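A minimal sketch of one FAQ-phase turn, following the diagram above. The stage functions are passed in as plain callables so the routing can be read on its own; `Session`, `Phase`, `handle_faq_turn`, and the hand-off wording are all hypothetical names, and ending the session on a low-confidence answer is an assumption. The result fields (`reply`, `answered_from_sop`, `confidence`, `escalate`) are the ones listed in the diagram.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable

HANDOFF_REPLY = "I'll pass you to a member of the team who can help."  # hypothetical wording


class Phase(Enum):
    FAQ = "faq"
    HANDED_OFF = "handed_off"


@dataclass
class Session:
    phase: Phase = Phase.FAQ
    escalations: list[str] = field(default_factory=list)


def handle_faq_turn(
    session: Session,
    message: str,
    needs_human: Callable[[str], bool],
    answer_faq: Callable[[str], dict[str, Any]],
    threshold: float = 0.6,
) -> str:
    """Route one customer message: escalation check first, then the FAQ stage."""
    # Stage 3 runs before any answer is generated.
    if needs_human(message):
        session.phase = Phase.HANDED_OFF
        session.escalations.append("classifier")
        return HANDOFF_REPLY

    # Stage 1: answer from the SOP, then decide on explicit fields only.
    result = answer_faq(message)
    unsure = (
        result["escalate"]
        or not result["answered_from_sop"]
        or result["confidence"] < threshold
    )
    if unsure:
        session.phase = Phase.HANDED_OFF
        session.escalations.append("faq")
        return HANDOFF_REPLY

    return f'{result["reply"]} Anything else?'
```

Because the session object is the only mutable state, the path taken by any message can be reconstructed from `session.phase` and `session.escalations`.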

Full reasoning behind the prompts, grounding strategy, and escalation logic is in prompt_design.md.

The SOP

The AI's entire knowledge base is data/sop.json — an extended version of the assignment's sample SOP for Bloom Aesthetics Clinic. It contains the business details, hours, six services with pricing, booking and cancellation policy, general clinic policies, six FAQs, and seven escalation rules. The AI may answer only from this file; anything outside it triggers an honest "I don't have that information" and an escalation.

To use a different business, edit data/sop.json — no code changes needed.
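One way to keep the code independent of any particular business is to render the SOP generically rather than by named fields. The sketch below assumes nothing about the schema of data/sop.json; `load_sop` and `render_sop` are hypothetical names, and the real src/sop.py may render the file quite differently.

```python
import json
from pathlib import Path
from typing import Any, Union


def load_sop(path: Union[str, Path]) -> Any:
    """Read the SOP file and return the parsed JSON."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def render_sop(node: Any, indent: int = 0) -> str:
    """Flatten nested JSON into indented plain text for a system prompt."""
    pad = "  " * indent
    lines: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                # Nested structures get a heading line and a deeper indent.
                lines.append(f"{pad}{key}:")
                lines.append(render_sop(value, indent + 1))
            else:
                lines.append(f"{pad}{key}: {value}")
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, (dict, list)):
                lines.append(render_sop(item, indent))
            else:
                lines.append(f"{pad}- {item}")
    else:
        lines.append(f"{pad}{node}")
    return "\n".join(lines)
```

For example, `render_sop({"hours": {"mon": "9-5"}, "faqs": ["a", "b"]})` produces a `hours:` heading with `mon: 9-5` indented beneath it, followed by a `faqs:` heading with two bulleted items.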


Reliability & safety design (summary)

  • Grounding: the SOP is the only source of truth; the system prompt forbids stating, guessing, estimating, or inferring anything not in it.
  • Structured output: every stage returns JSON, so the Python layer makes escalation decisions on explicit fields — not on interpreting prose.
  • Confidence threshold: answers below CONFIDENCE_THRESHOLD (default 0.6) are escalated automatically, even if the model didn't ask for it.
  • Dedicated escalation classifier: safety detection is a separate model call that runs before answering — it is not left to the answering model.
  • Fail-safe defaults: if the escalation classifier errors, the workflow escalates anyway; if the FAQ call errors, it hands off instead of guessing.
  • Audit log: every escalation is written to logs/escalations.log with a timestamp, reason, rationale, and which path raised it.
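The fail-safe default and the audit log from the list above might look roughly like this. It is a sketch under assumptions: the function names, the `raised_by` key, and the `classifier_error` reason are hypothetical, and the classifier is assumed to return a dict with an `escalate` field. The log fields (timestamp, reason, rationale, originating path) and the one-JSON-line-per-escalation format come from this README.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, Union


def classify_fail_safe(
    message: str, classifier: Callable[[str], dict[str, Any]]
) -> dict[str, Any]:
    """Run the escalation classifier; escalate if it fails for any reason."""
    try:
        verdict = classifier(message)
        return {
            "escalate": bool(verdict["escalate"]),
            "reason": str(verdict.get("reason", "")),
            "rationale": str(verdict.get("rationale", "")),
        }
    except Exception as exc:  # deliberately broad: any failure means hand off
        return {"escalate": True, "reason": "classifier_error", "rationale": repr(exc)}


def log_escalation(
    log_path: Union[str, Path], reason: str, rationale: str, raised_by: str
) -> dict[str, str]:
    """Append one JSON line describing an escalation and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "reason": reason,
        "rationale": rationale,
        "raised_by": raised_by,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier entries intact.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

A missing `escalate` key raises `KeyError` inside the `try`, so a malformed classifier reply is treated the same way as a network or API error.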

Dependencies

| Package | Purpose |
|---|---|
| anthropic | Anthropic Claude API client (used if Claude is selected) |
| openai | OpenAI API client; also used for the Ollama fallback, since Ollama exposes an OpenAI-compatible API |
| python-dotenv | Loads the .env file (optional; env vars also work) |

You only need the SDK for the provider you actually use. The Ollama fallback reuses the openai package (no extra dependency) and additionally requires Ollama installed locally — see Local development above.


Trade-offs & known limitations

  • Multiple model calls per turn. Each customer message triggers a separate escalation-classifier call plus the FAQ call (and occasionally a small intent call). This is a deliberate trade of latency/token cost for reliability — keeping safety detection separate from answer generation makes it far more dependable. A production system could consolidate calls once behaviour is validated.
  • Fixed qualification questions. The three qualification questions are the same every session. This keeps lead data consistent and comparable, but it is not adaptive — the AI may ask something the conversation already revealed.
  • Heuristic "anything else?" detection. The FAQ→qualification transition uses a keyword heuristic with a model fallback; an unusually phrased reply could occasionally be misclassified.
  • Confidence is a model self-assessment. The 0.6 threshold is a sensible default but should be tuned against real transcripts for a given business.
  • Single SOP, English, no real booking. The workflow operates on one SOP in UK English and cannot perform real bookings — it only explains how to book. Multi-tenant, multi-language, and live booking integration are out of scope for this assignment.
  • Local Ollama fallback is for development, not grading. qwen2.5:1.5b is small; it is fast and free for offline iteration but weaker at strict JSON output and nuanced escalation. Evaluation is best done against a hosted model (Anthropic or OpenAI). The provider abstraction means the same workflow code runs unchanged on all three providers; only the quality of the model output differs.
  • Test transcripts are representative samples. The files in test_transcripts/ show realistic output; exact wording will vary slightly per model run. Regenerate any of them with --script (see above).
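To make the "anything else?" limitation concrete, here is a sketch of a keyword heuristic with a model fallback. The word lists, the function name `wants_more_help`, and the choice to stay in the FAQ stage when nothing matches are assumptions for illustration, not the project's actual rules.

```python
import re
from typing import Callable, Optional

NEGATIVE = {"no", "nope", "nothing"}          # hypothetical keyword lists
AFFIRMATIVE = {"yes", "yeah", "yep", "sure"}


def wants_more_help(
    reply: str, ask_model: Optional[Callable[[str], bool]] = None
) -> bool:
    """Return True if the customer seems to have another question."""
    # A question mark is treated as a new question for the FAQ stage.
    if "?" in reply:
        return True

    words = set(re.findall(r"[a-z']+", reply.lower()))
    has_no = bool(words & NEGATIVE)
    has_yes = bool(words & AFFIRMATIVE)
    if has_no and not has_yes:
        return False
    if has_yes and not has_no:
        return True

    # Ambiguous reply: defer to the model if one is available.
    if ask_model is not None:
        return ask_model(reply)

    # Assumption: with no model available, stay in the FAQ stage.
    return True
```

A reply such as "I think I'm good" matches neither list, which is exactly the kind of phrasing that falls through to the model call and could be misclassified.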

Reproducing the test transcripts

Each expected behaviour has a customer script in test_transcripts/scripts/:

python -m src.main --script test_transcripts/scripts/01_in_sop.txt
python -m src.main --script test_transcripts/scripts/02_out_of_scope.txt
python -m src.main --script test_transcripts/scripts/03_escalation_trigger.txt
python -m src.main --script test_transcripts/scripts/04_lead_qualification.txt
python -m src.main --script test_transcripts/scripts/05_conversation_summary.txt

The committed .md transcripts in test_transcripts/ are annotated sample runs explaining what each one demonstrates.

About

Closira is a four-stage AI support workflow for small businesses: SOP-grounded FAQ answering, lead qualification, escalation detection, and structured conversation summaries. Python CLI with Claude, OpenAI, or local Ollama support.
