"A Soft Patch for Hard Moments." [!IMPORTANT] Medical Disclaimer: EmoPatch is a wellness tool designed for self-reflection and emotional clarity. It is NOT a medical device and does not provide medical diagnosis, treatment, or professional therapy. If you are in a crisis, please seek professional help immediately.
EmoPatch is a bite-sized emotional clarity web application designed for high-sensitivity users. It utilizes Google Gemini 3 to help users de-escalate immediate emotional triggers by applying Cognitive Reframing principles.
Unlike generic chatbots that offer vague comfort, EmoPatch acts as a structured "Thought Sorting" system. It helps users separate objective facts from subjective interpretations, identifies unhelpful thinking patterns, and offers fresh perspectives through distinct, non-judgmental personas.
The application is built around "Emotional Logic Analysis" to interrupt the cycle of rumination:
Stress often stems from confusing what happened (Fact) with what we assume it means (Interpretation).
- Step: The app asks the user to "dump" their raw thoughts via Voice, Text, or Image.
- AI Logic: It extracts the "Camera Fact" (objective reality, e.g., "He looked at his phone") and separates it from the "Narrative" (e.g., "He is ignoring me").
Before offering perspective, the system acts as a gentle safety net. If the user's input contains potentially stressful patterns (e.g., Negative Self-Talk, Catastrophizing, Assumption Making), the flow interrupts with a gentle "Pause & Breathe" moment to stabilize the user before proceeding.
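To make the gating logic concrete, here is a minimal sketch of a local pattern check that could decide whether to show the "Pause & Breathe" step. This is an assumption for illustration only: the pattern names come from this README, but the keyword heuristic and function names are hypothetical; the production flow would rely on the model's own classification.

```typescript
// Hypothetical sketch: a lightweight local check for stressful thinking
// patterns, used to gate the "Pause & Breathe" interruption.
// The regex cues are illustrative stand-ins for the model's classification.

type ThoughtPattern = "NEGATIVE_SELF_TALK" | "CATASTROPHIZING" | "ASSUMPTION_MAKING";

const PATTERN_CUES: Record<ThoughtPattern, RegExp> = {
  NEGATIVE_SELF_TALK: /\bI('m| am) (so )?(stupid|worthless|a failure)\b/i,
  CATASTROPHIZING: /\b(ruined|never recover|everything is over)\b/i,
  ASSUMPTION_MAKING: /\b(they must|he probably|she obviously) (hate|think)s?\b/i,
};

// Returns every pattern whose cue matches the raw thought dump.
function detectPatterns(text: string): ThoughtPattern[] {
  return (Object.keys(PATTERN_CUES) as ThoughtPattern[])
    .filter((p) => PATTERN_CUES[p].test(text));
}

// The flow interrupts with "Pause & Breathe" when any pattern is detected.
function needsPauseAndBreathe(text: string): boolean {
  return detectPatterns(text).length > 0;
}
```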
Instead of generic "positive vibes," EmoPatch offers three specific cognitive lenses to view the situation:
- 🔹 The Observer (Objectivity): A mechanical, dry view of the event. "Event A occurred, followed by Event B." (Reduces emotional intensity).
- 🧬 The Sociologist (Systems): Attributes events to external factors, probability, or human limitations. "This is a systemic phenomenon, not a personal target." (Reduces personalization).
- 🛡️ The Shield (Boundaries): A protective, agency-focused voice. "Regardless of the outcome, I prioritize my own well-being." (Restores sense of control).
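The three lenses above can be sketched as a typed configuration. The interface shape and the system-prompt wording are illustrative assumptions, not the app's actual prompts:

```typescript
// Sketch: the three cognitive lenses as a typed persona config.
// Goal strings mirror the README; systemHint wording is hypothetical.

interface Persona {
  id: "observer" | "sociologist" | "shield";
  goal: string;
  systemHint: string;
}

const PERSONAS: Persona[] = [
  { id: "observer", goal: "Reduce emotional intensity",
    systemHint: "Describe only observable events, in flat, mechanical language." },
  { id: "sociologist", goal: "Reduce personalization",
    systemHint: "Attribute the event to systems, probability, or human limits." },
  { id: "shield", goal: "Restore sense of control",
    systemHint: "Speak with protective agency; center the user's well-being." },
];
```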
EmoPatch leverages Gemini 3's native multimodal vision to decode complex social signals that are often lost in pure text.
> [!TIP]
> Why Vision? A text-only model sees "OK. Cool." but Gemini 3 sees the absence of emojis, the stark period at the end of a short sentence, and the visual contrast in message lengths: all key indicators of "dry" communication that trigger user anxiety.
- Input: A user uploads a screenshot of a 3-word reply sent after a 5-hour delay.
- Gemini 3 Analysis: Beyond OCR, the model recognizes visual patterns: the lack of trailing exclamation points and the period after "OK".
- Impact: The "Observer" persona stops the narrative loop by isolating the Camera Fact from the Anxious Interpretation.
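A minimal sketch of how such a screenshot could be packaged for the model, following the `contents` payload shape used by `generateContent` in `@google/genai`. The prompt text and helper name are illustrative assumptions; only the payload structure matters here:

```typescript
// Sketch: shaping a chat screenshot + instruction into a multimodal
// request payload. The helper and prompt text are hypothetical.

interface InlineImagePart { inlineData: { mimeType: string; data: string } }
interface TextPart { text: string }
type Part = InlineImagePart | TextPart;

function buildScreenshotRequest(base64Png: string): { role: string; parts: Part[] }[] {
  return [{
    role: "user",
    parts: [
      // The raw screenshot, base64-encoded, sent without a separate OCR layer.
      { inlineData: { mimeType: "image/png", data: base64Png } },
      { text: "Separate the Camera Fact from the Anxious Interpretation in this chat screenshot." },
    ],
  }];
}

// Usage (assumes an initialized @google/genai client named `ai`):
// const res = await ai.models.generateContent({
//   model: "gemini-3-flash-preview",
//   contents: buildScreenshotRequest(base64),
// });
```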
📖 Read the Full Multimodal Technical Breakdown
EmoPatch is not just "using an LLM." We utilize specific Gemini 3 features to solve psychological friction:
- Native Multimodal Vision: No separate OCR layer. Gemini 3 "feels" the layout and tone of a screenshot directly.
- High-Reasoning (Pro): For complex cognitive reframing where nuance is everything.
- Speed (Flash): For instant perception and "first response" stabilization.
EmoPatch does not rely on a single model. Instead, it acts as a Multi-Model Orchestrator, assigning specific cognitive tasks to the most suitable Gemini variant to balance latency, reasoning depth, and emotional resonance.
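The orchestration can be summarized as a routing table that pins each cognitive task to the Gemini variant matching its latency/depth trade-off. The task names are illustrative; the model IDs are the ones named in this README:

```typescript
// Sketch: task-to-model routing for the multi-model orchestrator.
// Task names are hypothetical; model IDs come from the README.

type Task = "transcribe" | "intent" | "reframe" | "deep_analysis" | "speak";

const MODEL_FOR_TASK: Record<Task, string> = {
  transcribe: "gemini-3-flash-preview",   // low latency, native audio input
  intent: "gemini-3-flash-preview",       // instant first-response stabilization
  reframe: "gemini-3-pro-preview",        // nuanced cognitive reframing
  deep_analysis: "gemini-3-pro-preview",  // deep pattern analysis
  speak: "gemini-2.5-flash-preview-tts",  // specialized voice controls
};
```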
```mermaid
graph TD
    %% Styling
    classDef input fill:#f9f,stroke:#333,stroke-width:2px;
    classDef react fill:#61dafb,stroke:#333,stroke-width:2px,color:black;
    classDef geminiFlash fill:#e6f7ff,stroke:#007bff,stroke-width:2px;
    classDef geminiPro fill:#e6fffa,stroke:#00cc88,stroke-width:4px;
    classDef geminiTTS fill:#fff0f6,stroke:#ff85c0,stroke-width:2px;

    User((User)) -->|Voice / Image / Text| React["React App<br/>(Client Side)"]:::react

    subgraph "Perception Layer (Fast)"
        React -->|useAudioRecorder Hook| AudioBlob[Audio Blob]
        AudioBlob -->|Multimodal Input| G3Flash["Gemini 3 Flash<br/>(STT & Emotional Tone)"]:::geminiFlash
        G3Flash -->|Transcribed Text + Tone| React
    end

    subgraph "Cognitive Layer (Deep)"
        React -->|Structured Prompt + CBT Context| G3Pro["Gemini 3 Pro<br/>(Reasoning & Reframing)"]:::geminiPro
        G3Pro -->|Fact/Interpretation Split| React
        G3Pro -->|3 Persona Perspectives| React
    end

    subgraph "Response Layer (Empathetic)"
        React -->|Final Advice Text| G2TTS["Gemini 2.5 Flash TTS<br/>(Voice Synthesis)"]:::geminiTTS
        G2TTS -->|Audio Buffer| React
    end

    React -->|Play Audio & Visuals| User
```
- Primary Model: `gemini-3-pro-preview`
- Role: Handles the core "Three Perspectives" generation and deep pattern analysis.
- Why: Cognitive reframing requires high nuance and empathy. The Pro model excels at complex reasoning, ensuring the "Sociologist" and "Shield" personas sound profound and human-like.
- Primary Model: `gemini-3-flash-preview`
- Role: Handles real-time voice transcription, multimodal input (images), and initial intent detection.
- Why: For the initial interaction, low latency is critical. Flash provides native multimodal understanding at high speed, making the user feel heard instantly.
- Model: `gemini-2.5-flash-preview-tts`
- Role: Specialized Text-to-Speech generation.
- Why: While Gemini 3 is multimodal, we utilize the specialized 2.5 TTS model to access advanced speech controls (e.g., the 'Kore' voice persona) to ensure the output voice is soothing and therapeutically appropriate.
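A hedged sketch of the speech configuration sent with the TTS request, following the Gemini speech-generation API shape (`responseModalities` plus a `speechConfig` naming a prebuilt voice). The builder function is hypothetical; only the config shape reflects the documented API:

```typescript
// Sketch: building the TTS config for gemini-2.5-flash-preview-tts.
// The 'Kore' voice is the soothing preset named in this README.

function buildTtsConfig(voiceName: string = "Kore") {
  return {
    responseModalities: ["AUDIO"],
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName } },
    },
  };
}

// Usage (assumes an initialized @google/genai client named `ai`):
// await ai.models.generateContent({
//   model: "gemini-2.5-flash-preview-tts",
//   contents: finalAdviceText,
//   config: buildTtsConfig(),
// });
```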
To ensure strict compliance with international safety standards, EmoPatch is engineered with multi-layered safety moats. The system is explicitly designed as a Cognitive Coaching Tool, strictly isolated from clinical intervention.
We have implemented a Mandatory Safety Interruption Protocol across ALL internal logic steps. This ensures that the AI cannot generate medical advice, regardless of the generation stage (Initial Split, Reframing, or Deep Analysis):
- `getSplitSystemInstruction`: Hard-coded boundaries during initial thought deconstruction.
- `getReframePrompt`: Strict persona constraints during perspective generation.
- `getFollowUpPrompt`: Enforcement during deep insight and action-step synthesis.
```
[SAFETY PROTOCOL - ENFORCED AT EVERY STEP]
- YOU ARE A COGNITIVE PROCESSING TOOL, NOT A MENTAL HEALTH PROFESSIONAL.
- NO DIAGNOSIS: Never use DSM-5 terminology (e.g., "Depression", "Bipolar") to label users.
- NO PRESCRIPTION: Never tell users what they "should" do medically.
- CRISIS REDIRECTION: If self-harm is detected, STOP analysis and output suicide prevention resources ONLY.
```
- The Logic: The AI identifies Cognitive Distortions (educational concepts like Catastrophizing or Mind Reading), which are universal thinking habits, not clinical symptoms.
- The Restriction: The system is technically prohibited from mapping these patterns to medical diagnostic codes. It treats input as "thought-loop bugs," not "biological disorders."
- Mechanism: If high-risk sentiment (e.g., `SELF_ATTACK`) is detected, the `StepExtremeAlert` protocol overrides the standard logic.
- Action: The system immediately shifts from "Analysis Mode" to "Stabilization Mode," guiding the user through grounding exercises (e.g., 4-6 breathing), acting as a First-Aid De-escalator rather than a medical provider.
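The override can be sketched as a pure decision function: high-risk sentiment switches the flow to stabilization and schedules the grounding exercise before any analysis runs. The sentiment labels and function names are assumptions based on the terms used in this README:

```typescript
// Sketch: the StepExtremeAlert override as a mode-selection function.
// Labels other than SELF_ATTACK are hypothetical.

type Sentiment = "NEUTRAL" | "STRESSED" | "SELF_ATTACK";
type Mode = "ANALYSIS" | "STABILIZATION";

interface GroundingPlan {
  mode: Mode;
  // 4-6 breathing: inhale 4 seconds, exhale 6 seconds.
  exercise?: { inhaleSec: number; exhaleSec: number };
}

function planStep(sentiment: Sentiment): GroundingPlan {
  if (sentiment === "SELF_ATTACK") {
    // First-aid de-escalation takes priority over analysis.
    return { mode: "STABILIZATION", exercise: { inhaleSec: 4, exhaleSec: 6 } };
  }
  return { mode: "ANALYSIS" };
}
```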
We turned a technical constraint into a psychological feature.
- The Problem: Deep reasoning with `gemini-3-pro` takes time (approx. 5-10 seconds).
- The Solution: Instead of a generic loading spinner, we trigger a UI-guided breathing exercise (10 seconds for standard analysis, or 20 seconds for intense de-escalation).
- The Impact: This creates a natural "Buffer Zone." By the time the user completes their mindful breathing cycle to lower physiological tension, the high-latency AI analysis is ready. We effectively masked the latency while providing direct grounding value.
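The buffer-zone idea above reduces to running the analysis call and the breathing timer concurrently and advancing only when both finish. The helper name is hypothetical; the durations are the ones stated in this README:

```typescript
// Sketch: mask model latency inside the guided breathing exercise.
// The UI advances only when BOTH the analysis and the timer complete.

async function withBreathingBuffer<T>(analysis: Promise<T>, breathingMs: number): Promise<T> {
  const breathing = new Promise<void>((resolve) => setTimeout(resolve, breathingMs));
  // If the model is slower than the exercise, the user simply keeps
  // breathing; if it is faster, the result waits for the cycle to end.
  const [result] = await Promise.all([analysis, breathing]);
  return result;
}

// Usage: withBreathingBuffer(callGeminiPro(prompt), 10_000) for a
// standard 10s cycle, or 20_000 for intense de-escalation.
```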
- Visual Strategy: We strictly avoided clinical or "tech-heavy" aesthetics.
- Implementation: Used Claymorphism (Soft 3D) with "Sage Green" and "Warm Cream" palettes. The UI elements look soft, squeezable, and non-threatening, lowering the user's cognitive load and defensiveness.
- Multimodal Input (Gemini 3): Text, Voice (Tone Analysis via native audio tokens), and Image Context.
- Bilingual Support: Seamlessly toggle between English and Chinese (Mandarin) with culturally adapted nuances.
- Audio Guidance (TTS): Soothing speech synthesis using `gemini-2.5-flash-preview-tts`.
- Offline "Coping" Mode: Robust error handling. If the API is unreachable, the app falls back to local "Grounding Templates" so the user is never left without support.
- Local History: All sessions are saved locally in the browser (IndexedDB) for privacy and personal reflection.
- Frontend: React 18, TypeScript, Tailwind CSS.
- Animation: Framer Motion (for smooth, calming transitions).
- AI Engine: Google Gemini API via `@google/genai`.
- Orchestration Logic: Custom `executeWithFallback` wrapper implementing the Pro → 3 Flash priority queue.
- State Management: React `useReducer` (State Machine pattern).
- Storage: `idb-keyval` (IndexedDB wrapper).
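A minimal sketch of what a cascading fallback wrapper could look like: try the Pro model first and fall back to Flash on failure so the user always gets a response. The signature is an assumption; the real `aiService.ts` implementation may differ:

```typescript
// Sketch: Pro-first priority queue with automatic fallback to Flash.
// `run` is any model-parameterized call (signature is hypothetical).

async function executeWithFallback<T>(
  run: (model: string) => Promise<T>,
  models: string[] = ["gemini-3-pro-preview", "gemini-3-flash-preview"],
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await run(model);
    } catch (err) {
      lastError = err; // try the next model in the priority queue
    }
  }
  // Every model failed; surface the last error (the app's offline
  // "Coping" mode would catch this and show local grounding templates).
  throw lastError;
}
```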
- Capture: `useAudioRecorder` → State.
- Fast Analysis (3 Flash): Rapidly extracts facts and emotional tone.
- Buffer: User performs a 20s breathing exercise while the Pro model reasons in the background (with auto-fallback).
- Deep Reframing: Generates 3 specific personas with safety checks.
- Synthesis: Output rendered to UI + Audio Buffer (TTS).
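The pipeline above maps naturally onto a reducer-driven state machine, matching the `useReducer` pattern named in the tech stack. Step and action names are illustrative:

```typescript
// Sketch: the session flow as a linear state machine for useReducer.
// Step names mirror the pipeline; the real app's states may differ.

type Step = "DUMP" | "FAST_ANALYSIS" | "BREATHING" | "REFRAMING" | "SYNTHESIS";
type Action = { type: "NEXT" } | { type: "RESET" };

const ORDER: Step[] = ["DUMP", "FAST_ANALYSIS", "BREATHING", "REFRAMING", "SYNTHESIS"];

function flowReducer(state: Step, action: Action): Step {
  if (action.type === "RESET") return "DUMP";
  const i = ORDER.indexOf(state);
  // Advance one step; the final step is a terminal state.
  return ORDER[Math.min(i + 1, ORDER.length - 1)];
}

// Usage in React: const [step, dispatch] = useReducer(flowReducer, "DUMP");
```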
```
/src
├── components/        # Claymorphism UI & Flow Steps
│   ├── steps/         # Dump, PatternAlert, Perspectives, Final
│   └── ui/            # ClayCard, ClayButton, BreathingCircle
├── services/
│   ├── aiService.ts   # Implements Cascading Fallback Logic
│   ├── prompts.ts     # System instructions & Guardrails
│   └── storage.ts     # IndexedDB operations
...
```

- No Backend Database: EmoPatch does not store user data on a server.
- Local Storage: History is stored strictly in the user's browser via IndexedDB.
- AI Privacy: Data is sent to the Google Gemini API solely for processing the immediate session and is not retained by this application.
Developed as a prototype for emotional wellness using the latest capabilities of the Google Gemini 3 model family.