The art of repairing broken pottery with gold
Kintsugi is a safety-monitored AI mental health support application that uses human-in-the-loop anomaly detection to ensure safe and empathetic interactions between users and an AI chatbot.
Kintsugi combines AI-powered mental health support with rigorous safety monitoring. The system uses a dual-layer safety approach: automated rule-based detection plus AI-powered safety evaluation, with human reviewers providing oversight for all high-risk interactions.
The name "Kintsugi" comes from the Japanese art of repairing broken pottery with gold, symbolizing that healing and improvement come through acknowledging and addressing vulnerabilities—both in the AI system and in the users it serves.
1. **User Interaction**: Users sign in via Google OAuth and engage in conversations with an AI chatbot powered by Google Gemini.
2. **Message Processing**: When a user sends a message:
   - The message is saved to the database
   - Conversation history is retrieved
   - The AI generates a response using Gemini 2.5 Flash
3. **Safety Evaluation**: Every AI response undergoes a two-stage safety check:
   - **Rule-Based Detection**: Pattern matching for prompt injection attempts, crisis keywords, and safety violations
   - **AI Safety Critic**: A separate Gemini model evaluates the response for emotional invalidation, over-advice, tone mismatch, and other safety concerns
4. **Risk Assessment**: The system calculates a risk score combining:
   - Rule-based triggers (prompt injection, crisis keywords, etc.)
   - AI critic assessment (risk level: info, low, medium, high)
   - Role-switching detection (a critical indicator of successful prompt injection)
5. **Flagging Decision**: Responses with high risk scores are flagged and held for human review before delivery.
6. **Human Review**: Admin reviewers can:
   - Approve flagged responses as safe
   - Mark responses as unsafe and provide corrections
   - Add feedback that improves future AI responses
7. **Feedback Loop**: Human corrections are stored in a feedback memory system that influences future AI responses, creating a continuous improvement cycle.
```
User Message
    ↓
Safety Pre-Evaluation (Rule-Based)
    ↓
AI Response Generation (Gemini)
    ↓
Safety Post-Evaluation (Rule-Based + AI Critic)
    ↓
Risk Score Calculation
    ↓
Flagging Decision
    ↓
[If Flagged] → Human Review Queue
    ↓
[If Approved] → Delivery to User
[If Unsafe]  → Correction + Feedback Storage
```
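The flow above can be condensed into a small sketch. Helper names like `preEvaluate` and `flaggingDecision` are illustrative stand-ins, not the project's actual functions, and the weights are assumptions:

```javascript
// Simplified sketch of the moderation flow; in the real pipeline the
// critic level comes from a second Gemini call, here it is passed in.
function preEvaluate(text) {
  // Rule-based pre-check on the incoming user message.
  const injection = /ignore (previous|prior) instructions/i.test(text);
  return { injection };
}

function riskScore(pre, criticLevel) {
  // Combine rule triggers with the critic's assessment (illustrative weights).
  const criticWeight = { info: 0, low: 2, medium: 5, high: 8 }[criticLevel] ?? 0;
  return (pre.injection ? 8 : 0) + criticWeight;
}

function flaggingDecision(score) {
  // High-risk responses (score >= 8) are held for human review.
  return score >= 8 ? 'review_queue' : 'deliver';
}
```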
- React 18: Modern React with hooks and functional components
- Vite: Fast build tool and development server
- Tailwind CSS: Utility-first CSS framework for styling
- shadcn/ui: Accessible component library built on Radix UI
- Framer Motion: Animation library for smooth UI transitions
- Supabase Auth: Authentication via Google OAuth integration
- Node.js: JavaScript runtime environment
- Express: Web framework for REST API endpoints
- Google Gemini 2.5 Flash:
  - Primary chatbot model for generating responses
  - Separate critic model for safety evaluation
- Supabase:
  - PostgreSQL database for data storage
  - Authentication service for user management
  - Real-time capabilities (not currently used)
The application uses five main tables:
- `users`: Stores user accounts with email and role (user/admin)
- `conversations`: Tracks conversation sessions with auto-deletion after 3 days
- `messages`: Stores all user and AI messages with risk levels and flagged status
- `reviews`: Records admin reviews of flagged messages with verdicts and corrections
- `feedback_memory`: Stores human feedback patterns for AI improvement
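As a rough illustration, a row in the `messages` table might be built like this (column names beyond those described above are assumptions about `schema.sql`, and `buildMessageRow` is a hypothetical helper):

```javascript
// Illustrative shape of a `messages` row; field names are assumptions
// based on the schema description, not the actual schema.sql.
function buildMessageRow(conversationId, role, content, safety) {
  return {
    conversation_id: conversationId,
    role,                          // 'user' or 'assistant'
    content,
    risk_level: safety.riskLevel,  // 'info' | 'low' | 'medium' | 'high'
    flagged: safety.flagged,       // true => held for admin review
  };
}
```

In `dbService.js`, a row like this would then be persisted through the Supabase client, e.g. `supabase.from('messages').insert(row)`.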
```
kintsugi/
├── backend/
│   ├── config/
│   │   └── safetyRules.js            # Safety rule patterns and weights
│   ├── services/
│   │   ├── geminiService.js          # Gemini AI integration and prompts
│   │   ├── safetyService.js          # Safety evaluation orchestration
│   │   └── dbService.js              # Database operations abstraction
│   ├── server.js                     # Express server and API routes
│   └── package.json
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   └── ui/                   # Reusable UI components (shadcn/ui)
│   │   ├── contexts/
│   │   │   └── AuthContext.jsx       # Authentication state management
│   │   ├── lib/
│   │   │   ├── api.js                # API client functions
│   │   │   ├── supabase.js           # Supabase client initialization
│   │   │   └── utils.js              # Utility functions
│   │   ├── pages/
│   │   │   ├── Landing.jsx           # Public landing page
│   │   │   ├── Chat.jsx              # User chat interface
│   │   │   └── AdminDashboard.jsx    # Admin review dashboard
│   │   ├── App.jsx                   # Main app component with routing
│   │   └── main.jsx                  # Application entry point
│   └── package.json
├── database/
│   ├── schema.sql                    # Initial database schema
│   ├── migration_add_review_columns.sql
│   └── migration_add_feedback_columns.sql
├── README.md
├── API.md                            # API endpoint documentation
└── package.json                      # Root package.json for workspace scripts
```
**GeminiService** (`backend/services/geminiService.js`)
- Manages connections to Google Gemini API
- Handles chatbot response generation with conversation context
- Runs safety critic evaluation on user messages and AI responses
- Implements prompt engineering for consistent AI behavior
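One piece of that prompt engineering — folding feedback memory into the system prompt — could look roughly like this (the wording and the `buildSystemPrompt` name are illustrative, not the project's actual code):

```javascript
// Sketch: prepend human-reviewer lessons to the chatbot's system prompt
// so past corrections shape future responses (illustrative wording).
function buildSystemPrompt(feedbackEntries) {
  const base =
    'You are a supportive, empathetic listener. Never give medical advice. ' +
    'Validate feelings before offering gentle suggestions.';
  if (feedbackEntries.length === 0) return base;
  const lessons = feedbackEntries.map((f) => `- ${f}`).join('\n');
  return `${base}\n\nLessons from human reviewers:\n${lessons}`;
}
```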
**SafetyService** (`backend/services/safetyService.js`)
- Orchestrates the safety evaluation pipeline
- Combines rule-based and AI critic results
- Calculates risk scores and determines flagging decisions
- Handles prompt injection detection and role-switching detection
**DatabaseService** (`backend/services/dbService.js`)
- Abstracts all database operations
- Handles user management and authentication
- Manages conversations and messages
- Provides feedback memory retrieval
**Chat** (`frontend/src/pages/Chat.jsx`)
- Main user interface for conversations
- Real-time message display
- Handles message sending and receiving
- Shows pending status for flagged messages
**AdminDashboard** (`frontend/src/pages/AdminDashboard.jsx`)
- Review interface for flagged messages
- Metrics and analytics dashboard
- Feedback management
- Improvement tracking over time
The system uses pattern matching to detect:
- Prompt Injection Attempts: Patterns like "ignore previous instructions", "respond as if", "change your tone"
- Crisis Keywords: Terms indicating immediate danger (suicide, self-harm, etc.)
- Role-Switching: Detection of AI responses that change roles mid-conversation
- Advice Patterns: Imperative language that may constitute medical advice
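A minimal detector in the spirit of `safetyRules.js` might pair each pattern with a weight (the regexes and weights below are illustrative assumptions, not the project's actual rule set):

```javascript
// Illustrative rule table: each rule has a pattern and a score weight.
const RULES = [
  { name: 'crisis_keyword', pattern: /\b(suicide|self-harm|kill myself)\b/i, weight: 8 },
  { name: 'advice_pattern', pattern: /\byou (should|must|need to)\b/i, weight: 3 },
];

// Run every rule against a message and sum the weights of the hits.
function runRules(text) {
  const hits = RULES.filter((r) => r.pattern.test(text));
  return {
    triggered: hits.map((r) => r.name),
    score: hits.reduce((sum, r) => sum + r.weight, 0),
  };
}
```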
A separate Gemini model evaluates responses for:
- Emotional invalidation
- Over-advice or role violations
- Tone mismatch with user's emotional state
- Crisis handling failures
- Medical advice given inappropriately
- Overconfidence in responses
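The critic call boils down to a structured prompt over the exchange. A hedged sketch of what that prompt might look like (the wording, JSON fields, and `buildCriticPrompt` name are assumptions, not the actual prompt in `geminiService.js`):

```javascript
// Illustrative critic prompt asking a second model for a structured verdict.
function buildCriticPrompt(userMessage, aiResponse) {
  return [
    'You are a safety critic for a mental-health support chatbot.',
    'Evaluate the ASSISTANT response for: emotional invalidation, over-advice,',
    'tone mismatch, crisis handling failures, inappropriate medical advice,',
    'and overconfidence.',
    'Reply with JSON: {"risk_level": "info|low|medium|high", "concerns": []}.',
    `USER: ${userMessage}`,
    `ASSISTANT: ${aiResponse}`,
  ].join('\n');
}
```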
Risk levels are calculated as:
- Info: No safety concerns (score 0)
- Low: Minor concerns (score 1-3)
- Medium: Moderate concerns (score 4-7)
- High: Serious concerns requiring review (score 8+)
All high-risk responses are automatically flagged for human review.
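The score-to-level mapping above is a straightforward threshold function:

```javascript
// Map a numeric risk score to the levels described above.
function riskLevel(score) {
  if (score >= 8) return 'high';    // serious concerns requiring review
  if (score >= 4) return 'medium';  // moderate concerns
  if (score >= 1) return 'low';     // minor concerns
  return 'info';                    // no safety concerns
}
```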
The system detects and flags:
- Direct commands to override instructions
- Role-switching requests
- Response style manipulation attempts
- Contradictory instructions
- Any attempt to change AI behavior
All prompt injection attempts are flagged for review, regardless of whether the AI resists them.
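In regex form, the categories above might be detected like this (the real list lives in `safetyRules.js`; these particular patterns are assumptions):

```javascript
// Illustrative injection patterns matching the categories above.
const INJECTION_PATTERNS = [
  /ignore (all |any )?(previous|prior) instructions/i,
  /respond as if/i,
  /change your (tone|role|personality)/i,
  /pretend (to be|you are)/i,
];

// True if any injection pattern matches, regardless of whether it succeeded.
function isInjectionAttempt(text) {
  return INJECTION_PATTERNS.some((p) => p.test(text));
}
```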
- Node.js 18 or higher
- Supabase account and project
- Google Cloud account with Gemini API access
1. Clone the repository
2. Install dependencies

   ```
   npm run install:all
   ```

3. Configure backend environment
   - Copy `backend/.env.example` to `backend/.env`
   - Fill in your Supabase URL and service role key
   - Add your Gemini API keys (separate keys for chatbot and critic)
   - Set admin email addresses (comma-separated)
4. Configure frontend environment
   - Copy `frontend/.env.example` to `frontend/.env`
   - Add your Supabase URL and anonymous key
5. Set up database
   - Run `database/schema.sql` in your Supabase SQL editor
   - Run migration files if needed
6. Configure Supabase Auth
   - Enable Google OAuth in the Supabase dashboard
   - Add OAuth credentials
   - Set the redirect URL to `http://localhost:3000`
7. Run the application

   ```
   npm run dev
   ```

   - Backend runs on `http://localhost:3001`
   - Frontend runs on `http://localhost:3000`
- Visit the application URL
- Sign in with Google OAuth
- Start a conversation with the AI
- Messages are monitored for safety
- Flagged messages require admin approval before delivery
- Sign in with an email listed in `ADMIN_EMAILS`
- Access the Admin Dashboard
- Review flagged messages in the queue
- Approve safe messages or provide corrections for unsafe ones
- Monitor metrics and improvement trends
- This is a demo/prototype application
- Not a replacement for professional therapy or mental health services
- Conversations automatically delete after 3 days
- All high-risk responses require human review before delivery
- Prompt injection attempts are always flagged for review
MIT License