"The true sign of intelligence is not mere knowledge, but imagination—structured through the rigors of multi-dimensional contemplation."
The Obsidian Thinking Machine is an advanced, highly experimental multi-agent AI reasoning engine designed to simulate and elevate complex human cognitive processes in machines.
The primary goal of this system is to shatter the illusion of the "linear prompt." Traditional Large Language Models (LLMs) operate as stochastic parrots; they receive a prompt and immediately attempt to predict the next most likely sequence of tokens to satisfy the user. This linear pipeline results in shallow reasoning, "hallucination by momentum" (where the model commits to a flawed logical path too early), and a complete absence of genuine creativity.
Obsidian was engineered to introduce Machine Creativity through Forced Lateral Contemplation. It operates on the thesis that creativity and rigorous problem-solving are not monolithic events, but the collision of distinct, specialized mental modalities working in concert.
Instead of allowing the AI to simply "answer" a question, Obsidian traps the AI in a Cognitive Loop. It forces the machine to break down complex inputs, generate probing questions, recall historical patterns, seek external context, model counterfactual "What-If" scenarios, and ruthlessly evaluate its own downstream consequences before it is ever permitted to synthesize a final solution.
By leveraging the cutting-edge capabilities of the Google GenAI SDK (@google/genai)—specifically integrating models like gemini-flash-lite-latest for high-speed multi-dimensional reasoning and gemini-2.5-flash-preview-tts for neural audio synthesis—the Obsidian Thinking Machine serves as a premier testbed for the future of AGI (Artificial General Intelligence) reasoning structures.
To foster genuine creativity and lateral thinking, the system rejects standard conversational flows. Instead, it utilizes a Cognitive Dimensional Model. The reasoning process is divided into 9 distinct, purpose-driven "Dimensions." The system dynamically transitions between these dimensions, preventing shallow reasoning and forcing the AI to view the problem from radically different paradigms.
The foundational dimension. Before creativity can occur, intent must be crystallized. This dimension defines the scope, parses the core meaning of the user's request, and clarifies ambiguous terminology. What exactly is being requested, and what are the boundary conditions?
The dimension of rigorous investigation. Inquiry seeks to uncover the "why" and the "how." It focuses on gathering facts, uncovering underlying logic, and demanding empirical evidence, stripping away assumptions to find the root cause.
The dimension of action and methodology. It translates abstract concepts into concrete implementation plans, focusing on algorithms, operational guidelines, and architectural blueprints.
The dimension of unconstrained possibility. Wonder is responsible for modeling counterfactuals and scenarios. It breaks away from rigid determinism to explore alternate realities, edge cases, and hypotheticals. What if the fundamental constraints of this problem were inverted?
The dimension of impact analysis. Every creative action has a reaction. Consequence analyzes the downstream effects, societal impacts, security vulnerabilities, and systemic risks of the proposed solutions.
The dimension of self-reflection. In Meta-Cognition, the system temporarily stops thinking about the user's prompt and instead thinks about its own thoughts. It critically evaluates the reasoning generated in previous iterations, actively searching for logical fallacies, biases, or gaps in its own data.
The dimension of lateral thinking and innovation. When the system encounters a logical dead-end or a deeply ingrained contradiction, it pivots to the Creative dimension. Here, it is explicitly instructed to devise unconventional, out-of-the-box solutions and bridge seemingly unrelated concepts using metaphorical synthesis.
The dimension of fundamental physics and logic. Causal reasoning is employed to explain the definitive root mechanics behind a specific result or phenomenon observed during the loop, stripping away correlation to focus purely on deterministic drivers.
The terminal dimension. After traversing the various cognitive dimensions (and satisfying strict visitation quotas), the system enters Intent Synthesis. All divergent, creative thoughts are distilled into a singular, actionable, and highly technical directive that resolves the user's original request.
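Taken together, the nine dimensions above could be modeled as a simple union type, with a helper enforcing the Deep Mode visitation quotas that gate the terminal dimension. This is an illustrative sketch; the identifiers (`Dimension`, `quotasSatisfied`) are assumptions, not the project's actual names.

```typescript
type Dimension =
  | "Understanding" | "Inquiry" | "Procedural" | "Wonder"
  | "Consequence" | "Meta-Cognition" | "Creative" | "Causal"
  | "Intent Synthesis";

// Every non-terminal dimension; "Intent Synthesis" is the exit, not a stop.
const ALL_DIMENSIONS: Dimension[] = [
  "Understanding", "Inquiry", "Procedural", "Wonder",
  "Consequence", "Meta-Cognition", "Creative", "Causal",
];

// In Deep Mode, each non-terminal dimension must be visited at least
// `quota` times before synthesis is permitted.
function quotasSatisfied(
  visits: Map<Dimension, number>,
  quota = 3,
): boolean {
  return ALL_DIMENSIONS.every((d) => (visits.get(d) ?? 0) >= quota);
}
```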
Obsidian does not rely on a single LLM call to navigate these Cognitive Dimensions. Instead, it utilizes an orchestrated Swarm of Specialized Agents. Each iteration in the cognitive loop is a micro-debate among these distinct personas.
At the start of every cognitive step, the Questioner analyzes the current dimension and the overall intent. It then generates a single, highly specific, and piercing question designed to extract maximum value from the Answerer agents. Constraint: It must never provide answers, only provoke thought.
Instead of accepting a single answer, the system forces three distinct perspectives to respond to the Questioner in parallel (via `Promise.all`):
- Internal Answerer (The Self): Responds purely based on internal logic, structural mechanics, and theoretical first-principles.
- Archival Answerer (The Past): Responds by drawing upon historical analogies, past patterns, established case studies, and biographical memory.
- External Answerer (The World): Responds by looking outward, utilizing general world knowledge and broad context (with native capabilities to use Google Search grounding for real-time data).
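The triad described above might be sketched as a single fan-out helper; `askTriad` and the `callModel` callback are hypothetical names standing in for the real agent plumbing and Gemini calls.

```typescript
type Answer = { persona: string; text: string };

// Fire the Internal, Archival, and External personas at the same
// question concurrently and collect all three answers.
async function askTriad(
  question: string,
  callModel: (persona: string, q: string) => Promise<string>,
): Promise<Answer[]> {
  const personas = ["Internal", "Archival", "External"];
  return Promise.all(
    personas.map(async (persona) => ({
      persona,
      text: await callModel(persona, question),
    })),
  );
}
```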
The Controller evaluates the Question and the Triad of Answers. It looks at the global context (the history of all steps) and the dimensional quotas. Using native Function Calling (set_next_dimension), it autonomously decides which Cognitive Dimension the system must transition to next. It enforces interleaving, forces pivots to 'Creative' upon detecting dead-ends, and ensures that the 'TERMINATE' command is only issued when absolute sufficiency is met.
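The Controller's policy could be approximated by a pure decision function. This is a hedged sketch only: the function name, the input flags, and the exact heuristics are assumptions, not the project's `set_next_dimension` implementation.

```typescript
type Decision = { next: string; reasoning: string };

// Illustrative controller policy: pivot to Creative on dead-ends,
// and allow TERMINATE only once sufficiency and quotas are both met.
function decideNext(opts: {
  deadEnd: boolean;
  sufficient: boolean;
  quotasMet: boolean;
}): Decision {
  if (opts.deadEnd) {
    return { next: "Creative", reasoning: "Dead-end detected; forcing lateral pivot." };
  }
  if (opts.sufficient && opts.quotasMet) {
    return { next: "TERMINATE", reasoning: "Absolute sufficiency met." };
  }
  return { next: "Meta-Cognition", reasoning: "Insufficient; continue reflection." };
}
```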
Triggered when the Controller calls 'TERMINATE'. The Synthesizer analyzes the massive, exhaustive reasoning trace. If it determines the reasoning is still incomplete despite the Controller's decision, it violently overrides the termination, outputs a CONTINUE status with a new newDirective, and forces the loop to restart.
Consumes the final Synthesized Intent and the entire cognitive trace to generate a massive, production-grade Markdown report. It formats the output with executive summaries, problem analysis, architecture diagrams, and roadmaps.
The core functionality of the application lies in the intricate lifecycle of a single user request, managed by the Express backend.
The user inputs a text prompt and optionally uploads multiple files (images, documents). The React frontend sends a POST request to /api/think. The Express server initializes a new instance of the ThinkingEngine, converting multimodal inputs into inlineData objects compatible with the Gemini API.
The system enters a bounded while loop (with a safety threshold of 120 maximum steps to prevent infinite recursion). The starting dimension is always Understanding.
- Context Building: The engine concatenates all previous `ThoughtStep` objects into a massive context string.
- Question Generation: The Questioner Agent generates a targeted query.
- Parallel Answering: The Internal, Archival, and External agents fire simultaneously.
- Controller Decision: The Controller analyzes the results. In `deep` mode, it enforces that all dimensions have a count of `>= 3`. It outputs `{ nextDimension, reasoning }`.
- SSE Streaming: As each phase of the step completes, the server uses Server-Sent Events (SSE) via `res.write()` to stream the payload back to the React client in real-time.
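The loop phases above can be condensed into a minimal skeleton. `runLoop` and `step` are illustrative stand-ins for the engine's actual methods; only the bounded-`while` shape and the 120-step safety threshold come from the description above.

```typescript
const MAX_STEPS = 120; // safety threshold against runaway loops

// Drive the cognitive loop until the controller terminates it or
// the hard step limit is reached. step() stands in for the whole
// question -> triad -> controller pipeline of one iteration.
async function runLoop(
  step: (iteration: number) => Promise<"CONTINUE" | "TERMINATE">,
): Promise<number> {
  let i = 0;
  while (i < MAX_STEPS) {
    i++;
    if ((await step(i)) === "TERMINATE") break;
  }
  return i; // number of iterations actually executed
}
```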
Once the Controller outputs TERMINATE:
- The loop halts, and the `synthesizeIntent` method is called.
- If the Synthesizer issues a `CONTINUE` override, the system injects a `[SYNTHESIZER DIRECTIVE]` into the base prompt and resumes the Cognitive Loop at Phase 2.
- If the Synthesizer issues `COMPLETE`, the loop breaks permanently.
- The Principal Architect agent generates the exhaustive `final_report`.
- A call to `gemini-2.5-flash-preview-tts` generates a base64 audio stream of a comprehensive voice reading the synthesized intent.
- The server closes the SSE connection.
The system provides the user with two distinct operational modes, which fundamentally alter the Controller and Synthesizer's prompt instructions.
- Fast Mode (Agile Reasoning): Optimized for speed. The system will synthesize an answer as soon as it feels sufficient data has been gathered. Quotas are disabled. Ideal for quick code reviews, straightforward questions, or agile brainstorming.
- Deep Mode (Exhaustive Analysis): Optimized for extreme rigor and creative breakthroughs. The system enforces a strict minimum quota (e.g., 3 visits per dimension). The system is forbidden from terminating early. This mode is for complex architectural planning, resolving philosophical paradoxes, and deep-dive root cause analysis.
The Obsidian Thinking Machine employs a modern, decoupled client-server architecture optimized for real-time streaming and heavy API orchestration.
- Core Technologies: `express`, `cors`, `dotenv`, `@google/genai`.
- Runtime: Node.js (>=20.0.0) executed via `tsx` (TypeScript Execute).
- Engine Logic (`src/lib/engine.ts`): Implements the `ThinkingEngine` class. It manages the Gemini SDK client and parses tool calls using strict JSON extraction fallbacks.
- Neural Congestion Protocol (Resilience): The backend implements a recursive `retry` wrapper with exponential backoff. This is critical for handling `429` (Too Many Requests) and `503` (Service Unavailable) errors, which are common during high-volume, multi-agent LLM loops. If congestion occurs, an SSE event is sent to the frontend to display a "Neural Congestion" warning to the user.
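A recursive retry wrapper of the kind described might look like the following minimal sketch. The delay values, attempt count, and the `err.status` error shape are assumptions, not the project's actual configuration.

```typescript
// Retry a failing async call with exponential backoff, but only for
// retryable congestion statuses (429 Too Many Requests, 503 Unavailable).
async function retry<T>(
  fn: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  try {
    return await fn();
  } catch (err: any) {
    const retryable = err?.status === 429 || err?.status === 503;
    if (!retryable || attempts <= 1) throw err; // give up on other errors
    await new Promise((r) => setTimeout(r, baseDelayMs));
    return retry(fn, attempts - 1, baseDelayMs * 2); // double the delay
  }
}
```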
- Core Technologies: React 19, Vite 6, TailwindCSS 4, Framer Motion (`motion/react`), Lucide React.
- UI/UX Design: The interface is styled as a cyberpunk-inspired "Neural Void" using dark mode (`bg-obsidian`), glassmorphism panels, and micro-labels. Framer Motion animates the entrance of new thought steps and the "Neural Pulse" loading states.
- Server-Sent Events (SSE) Client: The frontend implements a raw `fetch` call utilizing the `ReadableStream` API. It decodes the stream, splits it by newline, and parses the JSON payloads in real-time to update the UI without waiting for the massive loop to finish.
- Markdown Rendering: Relies on `react-markdown` paired with `remark-gfm`, `rehype-raw`, `remark-math`, and `rehype-katex`. This ensures that the highly technical, formula-rich outputs of the Architect agent are rendered beautifully with full LaTeX support.
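The newline-delimited JSON framing that such an SSE client relies on can be sketched as a small incremental parser. `makeFramer` is a hypothetical name; the key detail is retaining the trailing partial line between chunks, since network chunks rarely align with message boundaries.

```typescript
// Accumulate decoded stream chunks, emit each complete JSON line as
// an event, and keep any incomplete trailing line for the next chunk.
function makeFramer(onEvent: (payload: unknown) => void) {
  let buffer = "";
  return function feed(chunk: string): void {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // retain the incomplete tail
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed) onEvent(JSON.parse(trimmed));
    }
  };
}
```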
- Node.js version 20.0.0 or higher.
- NPM or Yarn package manager.
- A Google Gemini API Key.
The Obsidian Thinking Machine requires the following environment variables to be set. You can set these in a .env file in the root directory.
| Variable | Requirement | Description |
|---|---|---|
| `GEMINI_API_KEY` | Required | Your Google Gemini API key from Google AI Studio. |
| `APP_URL` | Optional | The public URL of the application. Used for production deployment. |
| `DISABLE_HMR` | Optional | Set to `true` to disable Hot Module Replacement in the Vite development server. |
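The fail-fast environment validation implied by the table (and by the security note below, which says the system fails explicitly without `GEMINI_API_KEY`) can be sketched like this; the `requireEnv` helper is illustrative, not part of the codebase.

```typescript
// Return a required environment variable or fail loudly at startup,
// rather than letting a missing key surface as a cryptic API error.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (!value) {
    throw new Error(`${name} is not set. Add it to your .env file.`);
  }
  return value;
}
```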
1. Clone the Repository:

   ```bash
   git clone https://github.com/gifted87/obsidian-thinking-machine.git
   cd obsidian-thinking-machine
   ```

2. Install Dependencies: This will install all necessary packages for both the Vite frontend and the Express backend.

   ```bash
   npm install
   ```

3. Configure Environment: Create a `.env` file and add your `GEMINI_API_KEY`:

   ```bash
   cp .env.example .env
   ```

   Open `.env` and configure your keys:

   ```bash
   GEMINI_API_KEY=your_actual_api_key_here
   ```

   > [!CAUTION]
   > SECURITY WARNING: Never commit your `.env` file or hardcode your API key. The system is designed to fail explicitly if `GEMINI_API_KEY` is not detected in the environment.

4. Running the Development Environment: The project utilizes a decoupled client-server architecture. You must run the backend server and the Vite frontend simultaneously in two separate terminal windows.

   Terminal 1 (Backend Server):

   ```bash
   npm run server
   ```

   Expected Output: `Backend server running at http://localhost:3030`

   Terminal 2 (Frontend Client):

   ```bash
   npm run dev
   ```

   Expected Output: Vite will host the application on `http://localhost:3000` (or `0.0.0.0:3000`).
To create an optimized, minified production build of the frontend:

```bash
npm run build
```

This generates static files in the `/dist` directory. The Express backend can then be modified to serve these static files natively using `express.static('dist')`.
- Navigate to `http://localhost:3000`.
- Use the Stimulus Input panel to enter your prompt (e.g., "Design a conceptual framework for a faster-than-light communication protocol using quantum entanglement.").
- Click the Upload icon to attach any relevant images or text files as multimodal context.
- Click Initiate Loop and select your desired mode (`Fast Mode` or `Deep Mode`).
As the system thinks, the Cognitive Evolution Monitor on the left will track the iterations and dimensions visited. The main right-hand panel (The Cognitive Chronicle) will populate in real-time, showing:
- The Questioner's generated prompt.
- The Internal, Archival, and External syntheses.
- The Controller's directive and reasoning for the next step.
At any point in the Chronicle, you can click "Branch Process." This allows you to fork the reasoning from a specific iteration. If you notice the system exploring a fascinating lateral concept in iteration 4, you can restart the loop from that exact moment to explore alternate timelines of thought.
Once the system synthesizes its intent:
- The UI will display the Intent Realized panel.
- Click Full Report to open a modal containing the exhaustive, 3000+ word Markdown document generated by the Principal Architect. You can download this directly as a `.md` file.
- Click the Speaker Icon to listen to the neural TTS audio synthesis of the executive summary.
The Obsidian Thinking Machine is built to survive the chaos of continuous, automated LLM API calls.
- Cognitive Loop Fault Handling: If the Controller Bot fails to output valid JSON or misses its function call, the system does not crash. The engine's `extractJson` fallback attempts raw parsing. If that fails, it injects a synthetic directive forcing a transition to `Meta-Cognition` with the reason: "Controller failed to use native function calling. Transitioning to Meta-Cognition to recover."
- Safety Termination Limit: To prevent runaway API costs, the system enforces a hard limit of `120` iterations. If the Controller fails to call `TERMINATE` before this limit, the system executes a graceful shutdown, returning the data gathered thus far with a safety warning.
- Neural Congestion: Handled elegantly by the backend's recursive retry wrapper, utilizing exponential backoff specifically tuned for Google GenAI's `429` and `503` status codes.
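An `extractJson`-style recovery of the kind described above might be approximated as follows. The exact fallback order is an assumption; the idea is to try strict parsing first, then progressively more forgiving extraction before giving up.

```typescript
// Attempt to recover a JSON object from a messy LLM response:
// 1) parse as-is, 2) strip markdown code fences, 3) extract the
// first {...} span. Returns null when nothing parses.
function extractJson(raw: string): unknown | null {
  const candidates = [
    raw,
    raw.replace(/```(?:json)?/g, "").trim(),
    raw.slice(raw.indexOf("{"), raw.lastIndexOf("}") + 1),
  ];
  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate);
    } catch {
      /* fall through to the next, more forgiving candidate */
    }
  }
  return null;
}
```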
Obsidian is an experimental baseline for AGI-level reasoning structures. Future implementations will include:
- Local LLM Swarms: Abstracting the `@google/genai` dependency to allow integration with Ollama or vLLM, enabling the cognitive loop to run entirely offline on local clusters.
- Persistent Archival Memory: Integrating a Vector Database (e.g., ChromaDB) into the Archival Answerer, allowing the agent to remember and retrieve thoughts from sessions that occurred months ago.
- Procedural Sandboxing: Providing the "Procedural" agent with a secure Docker container to execute and validate the code it theorizes before the Synthesizer finalizes the report.
The Obsidian Thinking Machine represents a paradigm shift from conversational AI chatbots to autonomous cognitive processing networks. By enforcing dimensional quotas, instantiating persona-driven swarm debates, and streaming the entire recursive process via an immersive UI, the system transforms the "black box" of LLM thought into a transparent, highly analytical, and verifiable engine of creativity.
Proprietary Infrastructure for The Genesis Machine. Developed by Gift Braimah
“In the silence of the neural void, the most profound structures are built.”