Obsidian Thinking Machine: The Prestige Cognitive Layer

"The true sign of intelligence is not mere knowledge, but imagination—structured through the rigors of multi-dimensional contemplation."

1. Executive Summary & The Goal of the System

The Obsidian Thinking Machine is an advanced, highly experimental multi-agent AI reasoning engine designed to simulate and elevate complex human cognitive processes in machines.

The primary goal of this system is to shatter the illusion of the "linear prompt." Traditional Large Language Models (LLMs) operate as stochastic parrots; they receive a prompt and immediately attempt to predict the next most likely sequence of tokens to satisfy the user. This linear pipeline results in shallow reasoning, "hallucination by momentum" (where the model commits to a flawed logical path too early), and a complete absence of genuine creativity.

Obsidian was engineered to introduce Machine Creativity through Forced Lateral Contemplation. It operates on the thesis that creativity and rigorous problem-solving are not monolithic events, but the collision of distinct, specialized mental modalities working in concert.

Instead of allowing the AI to simply "answer" a question, Obsidian traps the AI in a Cognitive Loop. It forces the machine to break down complex inputs, generate probing questions, recall historical patterns, seek external context, model counterfactual "What-If" scenarios, and ruthlessly evaluate its own downstream consequences before it is ever permitted to synthesize a final solution.

By leveraging the cutting-edge capabilities of the Google GenAI SDK (@google/genai)—specifically integrating models like gemini-flash-lite-latest for high-speed multi-dimensional reasoning and gemini-2.5-flash-preview-tts for neural audio synthesis—the Obsidian Thinking Machine serves as a premier testbed for the future of AGI (Artificial General Intelligence) reasoning structures.


2. The Solution to Machine Creativity: The 9-Dimensional Framework

To foster genuine creativity and lateral thinking, the system rejects standard conversational flows. Instead, it utilizes a Cognitive Dimensional Model. The reasoning process is divided into 9 distinct, purpose-driven "Dimensions." The system dynamically transitions between these dimensions, preventing shallow reasoning and forcing the AI to view the problem from radically different paradigms.

2.1. Understanding

The foundational dimension. Before creativity can occur, intent must be crystallized. This dimension defines the scope, parses the core meaning of the user's request, and clarifies ambiguous terminology. What exactly is being requested, and what are the boundary conditions?

2.2. Inquiry

The dimension of rigorous investigation. Inquiry seeks to uncover the "why" and the "how." It focuses on gathering facts, uncovering underlying logic, and demanding empirical evidence, stripping away assumptions to find the root cause.

2.3. Procedural

The dimension of action and methodology. It translates abstract concepts into concrete implementation plans, focusing on algorithms, operational guidelines, and architectural blueprints.

2.4. Wonder (The Catalyst of Creativity)

The dimension of unconstrained possibility. Wonder is responsible for modeling counterfactuals and scenarios. It breaks away from rigid determinism to explore alternate realities, edge cases, and hypotheticals. What if the fundamental constraints of this problem were inverted?

2.5. Consequence

The dimension of impact analysis. Every creative action has a reaction. Consequence analyzes the downstream effects, societal impacts, security vulnerabilities, and systemic risks of the proposed solutions.

2.6. Meta-Cognition

The dimension of self-reflection. In Meta-Cognition, the system temporarily stops thinking about the user's prompt and instead thinks about its own thoughts. It critically evaluates the reasoning generated in previous iterations, actively searching for logical fallacies, biases, or gaps in its own data.

2.7. Creative

The dimension of lateral thinking and innovation. When the system encounters a logical dead-end or a deeply ingrained contradiction, it pivots to the Creative dimension. Here, it is explicitly instructed to devise unconventional, out-of-the-box solutions and bridge seemingly unrelated concepts using metaphorical synthesis.

2.8. Causal

The dimension of fundamental physics and logic. Causal reasoning is employed to explain the definitive root mechanics behind a specific result or phenomenon observed during the loop, stripping away correlation to focus purely on deterministic drivers.

2.9. Intent Synthesis

The terminal dimension. After traversing the various cognitive dimensions (and satisfying strict visitation quotas), the system enters Intent Synthesis. All divergent, creative thoughts are distilled into a singular, actionable, and highly technical directive that resolves the user's original request.


3. Structural Organization: The Autonomous Agent Swarm

Obsidian does not rely on a single LLM call to navigate these Cognitive Dimensions. Instead, it utilizes an orchestrated Swarm of Specialized Agents. Each iteration in the cognitive loop is a micro-debate among these distinct personas.

3.1. The Questioner Agent (The Catalyst)

At the start of every cognitive step, the Questioner analyzes the current dimension and the overall intent. It then generates a single, highly specific, and piercing question designed to extract maximum value from the Answerer agents. Constraint: It must never provide answers, only provoke thought.

3.2. The Triad of Answerers (Triangulation of Truth)

Instead of a single answer, the system forces three distinct perspectives to respond to the Questioner in parallel (via Promise.all):

  • Internal Answerer (The Self): Responds purely based on internal logic, structural mechanics, and theoretical first-principles.
  • Archival Answerer (The Past): Responds by drawing upon historical analogies, past patterns, established case studies, and biographical memory.
  • External Answerer (The World): Responds by looking outward, utilizing general world knowledge and broad context (with native capabilities to use Google Search grounding for real-time data).
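
As a hedged sketch, the triad's fan-out is a single Promise.all over three persona-tagged calls. The names askAgent, triangulate, and AnswerTriad below are illustrative, not the project's actual API; the stub stands in for a real Gemini SDK call.

```typescript
// Hypothetical sketch of the Triad pattern: three personas answer the same
// question in parallel. Names are illustrative, not the project's real API.
type Persona = "internal" | "archival" | "external";

interface AnswerTriad {
  internal: string;
  archival: string;
  external: string;
}

// Stand-in for a model call; the real engine would invoke the Gemini SDK here.
async function askAgent(persona: Persona, question: string): Promise<string> {
  return `[${persona}] answer to: ${question}`;
}

async function triangulate(question: string): Promise<AnswerTriad> {
  // Promise.all fires all three perspectives simultaneously.
  const [internal, archival, external] = await Promise.all([
    askAgent("internal", question),
    askAgent("archival", question),
    askAgent("external", question),
  ]);
  return { internal, archival, external };
}
```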

3.3. The Controller Agent (The Prefrontal Cortex)

The Controller evaluates the Question and the Triad of Answers. It looks at the global context (the history of all steps) and the dimensional quotas. Using native Function Calling (set_next_dimension), it autonomously decides which Cognitive Dimension the system must transition to next. It enforces interleaving, forces pivots to 'Creative' upon detecting dead-ends, and ensures that the 'TERMINATE' command is only issued when absolute sufficiency is met.
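
A hedged sketch of what the set_next_dimension tool declaration might look like. The dimension names and the TERMINATE sentinel come from this document; the JSON-schema-style shape mirrors Gemini function declarations, but the exact literal below is an assumption, not the project's actual code.

```typescript
// Illustrative tool declaration for the Controller's function call.
const DIMENSIONS = [
  "Understanding", "Inquiry", "Procedural", "Wonder", "Consequence",
  "Meta-Cognition", "Creative", "Causal", "Intent Synthesis",
] as const;

const setNextDimensionTool = {
  name: "set_next_dimension",
  description:
    "Transition the cognitive loop to the next dimension, or TERMINATE when sufficiency is met.",
  parameters: {
    type: "OBJECT",
    properties: {
      nextDimension: {
        type: "STRING",
        // TERMINATE is a sentinel command, not a dimension.
        enum: [...DIMENSIONS, "TERMINATE"],
      },
      reasoning: { type: "STRING" },
    },
    required: ["nextDimension", "reasoning"],
  },
};
```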

3.4. The Intent Synthesizer (The Quality Gate)

Triggered when the Controller calls 'TERMINATE'. The Synthesizer analyzes the massive, exhaustive reasoning trace. If it determines the reasoning is still incomplete despite the Controller's decision, it forcibly overrides the termination, outputs a CONTINUE status with a new newDirective, and forces the loop to restart.

3.5. The Principal Architect (The Author)

Consumes the final Synthesized Intent and the entire cognitive trace to generate a massive, production-grade Markdown report. It formats the output with executive summaries, problem analysis, architecture diagrams, and roadmaps.


4. The Request Lifecycle & Cognitive Loop Mechanics

The core functionality of the application lies in the intricate lifecycle of a single user request, managed by the Express backend.

Phase 1: Initiation and Multimodal Processing

The user inputs a text prompt and optionally uploads multiple files (images, documents). The React frontend sends a POST request to /api/think. The Express server initializes a new instance of the ThinkingEngine, converting multimodal inputs into inlineData objects compatible with the Gemini API.
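
The inlineData part shape (a mimeType plus base64-encoded data) is the Gemini API's standard inline-media format; the helper name toInlineData below is illustrative, a minimal sketch of the conversion Phase 1 describes.

```typescript
// Sketch of Phase 1's multimodal conversion: turning an uploaded file buffer
// into the inlineData part shape the Gemini API expects.
interface InlinePart {
  inlineData: { mimeType: string; data: string };
}

function toInlineData(buffer: Buffer, mimeType: string): InlinePart {
  // Gemini inline media is transported as base64 text.
  return {
    inlineData: { mimeType, data: buffer.toString("base64") },
  };
}
```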

Phase 2: The Cognitive Loop (The while loop)

The system enters a bounded while loop (capped at a hard safety threshold of 120 steps to prevent runaway iteration). The starting dimension is always Understanding.

  1. Context Building: The engine concatenates all previous ThoughtStep objects into a massive context string.
  2. Question Generation: The Questioner Agent generates a targeted query.
  3. Parallel Answering: The Internal, Archival, and External agents fire simultaneously.
  4. Controller Decision: The Controller analyzes the results. In deep mode, it enforces that all dimensions have a count of >= 3. It outputs { nextDimension, reasoning }.
  5. SSE Streaming: As each phase of the step completes, the server uses Server-Sent Events (SSE) via res.write() to stream the payload back to the React client in real-time.
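
The loop skeleton above can be sketched as follows. The step internals are stubbed out, and names like runStep and quotaMet are illustrative, assuming the 120-step cap, the Understanding start, and deep mode's visit-count-of-3 rule described in this section.

```typescript
// Minimal sketch of the bounded cognitive loop (Phase 2).
const MAX_STEPS = 120; // hard safety threshold from the lifecycle description

interface ThoughtStep {
  dimension: string;
  question: string;
  answers: string[];
  nextDimension: string; // Controller output, or "TERMINATE"
}

// Deep mode rule: every dimension must be visited at least `min` times.
function quotaMet(steps: ThoughtStep[], dims: string[], min = 3): boolean {
  return dims.every(
    (d) => steps.filter((s) => s.dimension === d).length >= min,
  );
}

function runLoop(
  runStep: (dim: string, history: ThoughtStep[]) => ThoughtStep,
): ThoughtStep[] {
  const history: ThoughtStep[] = [];
  let dimension = "Understanding"; // starting dimension is always Understanding
  for (let i = 0; i < MAX_STEPS; i++) {
    const step = runStep(dimension, history); // question + triad + Controller
    history.push(step);
    if (step.nextDimension === "TERMINATE") break;
    dimension = step.nextDimension;
  }
  return history;
}
```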

Phase 3: Synthesis and Overrides

Once the Controller outputs TERMINATE:

  1. The loop halts, and the synthesizeIntent method is called.
  2. If the Synthesizer issues a CONTINUE override, the system injects a [SYNTHESIZER DIRECTIVE] into the base prompt and resumes the Cognitive Loop at Phase 2.
  3. If the Synthesizer issues COMPLETE, the loop breaks permanently.

Phase 4: Final Generation and Neural TTS

  1. The Principal Architect agent generates the exhaustive final_report.
  2. A call to the gemini-2.5-flash-preview-tts model generates a base64 audio stream of a comprehensive voice reading the synthesized intent.
  3. The server closes the SSE connection.

5. Operational Modes: Fast vs. Deep Reasoning

The system provides the user with two distinct operational modes, which fundamentally alter the Controller and Synthesizer's prompt instructions.

  • Fast Mode (Agile Reasoning): Optimized for speed. The system will synthesize an answer as soon as it feels sufficient data has been gathered. Quotas are disabled. Ideal for quick code reviews, straightforward questions, or agile brainstorming.
  • Deep Mode (Exhaustive Analysis): Optimized for extreme rigor and creative breakthroughs. The system enforces a strict minimum quota (e.g., 3 visits per dimension). The system is forbidden from terminating early. This mode is for complex architectural planning, resolving philosophical paradoxes, and deep-dive root cause analysis.

6. System Architecture and Technical Stack

The Obsidian Thinking Machine employs a modern, decoupled client-server architecture optimized for real-time streaming and heavy API orchestration.

6.1. Backend Architecture (Node.js + Express)

  • Core Technologies: express, cors, dotenv, @google/genai.
  • Runtime: Node.js (>=20.0.0) executed via tsx (TypeScript Execute).
  • Engine Logic (src/lib/engine.ts): Implements the ThinkingEngine class. It manages the Gemini SDK client and parses tool calls using strict JSON extraction fallbacks.
  • Neural Congestion Protocol (Resilience): The backend implements a recursive retry wrapper with exponential backoff. This is critical for handling 429 (Too Many Requests) and 503 (Service Unavailable) errors, which are common during high-volume, multi-agent LLM loops. If congestion occurs, an SSE event is sent to the frontend to display a "Neural Congestion" warning to the user.
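
A minimal sketch of such a recursive retry wrapper, assuming the 429/503 status codes named above; the names withBackoff and the delay constants are illustrative, not the project's actual implementation.

```typescript
// Hedged sketch of the "Neural Congestion" resilience pattern: recursive
// retry with exponential backoff for retryable HTTP statuses.
const RETRYABLE = new Set([429, 503]);

async function withBackoff<T>(
  call: () => Promise<T>,
  attempt = 0,
  maxAttempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  try {
    return await call();
  } catch (err: any) {
    // Only retry rate-limit / unavailable errors, and only up to the cap.
    if (!RETRYABLE.has(err?.status) || attempt >= maxAttempts - 1) throw err;
    // Exponential backoff: 500ms, 1s, 2s, 4s, ...
    const delay = baseDelayMs * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delay));
    return withBackoff(call, attempt + 1, maxAttempts, baseDelayMs);
  }
}
```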

6.2. Frontend Architecture (React + Vite)

  • Core Technologies: React 19, Vite 6, TailwindCSS 4, Framer Motion (motion/react), Lucide React.
  • UI/UX Design: The interface is styled as a cyberpunk-inspired "Neural Void" using dark mode (bg-obsidian), glassmorphism panels, and micro-labels. Framer Motion animates the entrance of new thought steps and the "Neural Pulse" loading states.
  • Server-Sent Events (SSE) Client: The frontend implements a raw fetch call utilizing the ReadableStream API. It decodes the stream, splits it by newline, and parses the JSON payloads in real-time to update the UI without waiting for the massive loop to finish.
  • Markdown Rendering: Relies on react-markdown paired with remark-gfm, rehype-raw, remark-math, and rehype-katex. This ensures that the highly technical, formula-rich outputs of the Architect agent are rendered beautifully with full LaTeX support.
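
The decode-split-parse step of the SSE client can be factored as a pure function, sketched below under the assumption that each event arrives as a newline-terminated `data: {json}` line (bare JSON lines are also accepted). The name parseSseChunk is illustrative; in the client, each chunk from reader.read() is passed through a TextDecoder and then this function, with the leftover partial line carried between reads.

```typescript
// Pure helper for the SSE client: feed the carried-over buffer plus a new
// decoded chunk, get back the complete parsed events and the new buffer.
function parseSseChunk(
  buffer: string,
  chunk: string,
): { events: unknown[]; buffer: string } {
  const lines = (buffer + chunk).split("\n");
  // The last element may be an incomplete line; keep it for the next read.
  const rest = lines.pop() ?? "";
  const events: unknown[] = [];
  for (const line of lines) {
    // Accept both "data: {json}" framing and bare JSON lines.
    const data = line.startsWith("data: ") ? line.slice(6) : line;
    if (data.trim()) events.push(JSON.parse(data));
  }
  return { events, buffer: rest };
}
```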

7. Setup, Installation, and Deployment Guide

7.1. Prerequisites

  • Node.js version 20.0.0 or higher.
  • NPM or Yarn package manager.
  • A Google Gemini API Key.

7.2. Environment Variables

The Obsidian Thinking Machine requires the following environment variables to be set. You can set these in a .env file in the root directory.

  • GEMINI_API_KEY (Required): Your Google Gemini API key from Google AI Studio.
  • APP_URL (Optional): The public URL of the application. Used for production deployment.
  • DISABLE_HMR (Optional): Set to true to disable Hot Module Replacement in the Vite development server.

7.3. Local Setup Instructions

  1. Clone the Repository:

    git clone https://github.com/gifted87/obsidian-thinking-machine.git
    cd obsidian-thinking-machine
  2. Install Dependencies: This will install all necessary packages for both the Vite frontend and the Express backend.

    npm install
  3. Configure Environment: Create a .env file and add your GEMINI_API_KEY:

    cp .env.example .env

    Open .env and configure your keys:

    GEMINI_API_KEY=your_actual_api_key_here

    [!CAUTION] SECURITY WARNING: Never commit your .env file or hardcode your API key. The system is designed to fail explicitly if GEMINI_API_KEY is not detected in the environment.

  4. Running the Development Environment: The project utilizes a decoupled client-server architecture. You must run the backend server and the Vite frontend simultaneously in two separate terminal windows.

    Terminal 1 (Backend Server):

    npm run server

    Expected Output: Backend server running at http://localhost:3030

    Terminal 2 (Frontend Client):

    npm run dev

    Expected Output: Vite will host the application on http://localhost:3000 (or 0.0.0.0:3000).

7.4. Building for Production

To create an optimized, minified production build of the frontend:

npm run build

This generates static files in the /dist directory. The Express backend can then be modified to serve these static files natively using express.static('dist').


8. Usage Guide & User Interface

Initiating the Loop

  1. Navigate to http://localhost:3000.
  2. Use the Stimulus Input panel to enter your prompt. (e.g., "Design a conceptual framework for a faster-than-light communication protocol using quantum entanglement.")
  3. Click the Upload icon to attach any relevant images or text files as multimodal context.
  4. Click Initiate Loop and select your desired mode (Fast Mode or Deep Mode).

Monitoring the Cognitive Chronicle

As the system thinks, the Cognitive Evolution Monitor on the left will track the iterations and dimensions visited. The main right-hand panel (The Cognitive Chronicle) will populate in real-time, showing:

  • The Questioner's generated prompt.
  • The Internal, Archival, and External syntheses.
  • The Controller's directive and reasoning for the next step.

Branching Thoughts

At any point in the Chronicle, you can click "Branch Process." This allows you to fork the reasoning from a specific iteration. If you notice the system exploring a fascinating lateral concept in iteration 4, you can restart the loop from that exact moment to explore alternate timelines of thought.

The Final Report & Neural Audio

Once the system synthesizes its intent:

  • The UI will display the Intent Realized panel.
  • Click Full Report to open a modal containing the exhaustive, 3000+ word markdown document generated by the Principal Architect. You can download this directly as a .md file.
  • Click the Speaker Icon to listen to the neural TTS audio synthesis of the executive summary.

9. System Integrity & Error Recovery

The Obsidian Thinking Machine is built to survive the chaos of continuous, automated LLM API calls.

  • Cognitive Loop Fault Handling: If the Controller Bot fails to output valid JSON or misses its function call, the system does not crash. The engine's extractJson fallback attempts raw parsing. If that fails, it injects a synthetic directive forcing a transition to Meta-Cognition with the reason: "Controller failed to use native function calling. Transitioning to Meta-Cognition to recover."
  • Safety Termination Limit: To prevent runaway API costs, the system enforces a hard limit of 120 iterations. If the Controller fails to call TERMINATE before this limit, the system executes a graceful shutdown, returning the data gathered thus far with a safety warning.
  • Neural Congestion: Handled elegantly by the backend's recursive retry wrapper, utilizing exponential backoff specifically tuned for Google GenAI's 429 and 503 status codes.
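
A plausible sketch of the extractJson fallback named above: try a direct parse, then fall back to the first balanced {...} block in the raw model text. The function name mirrors the document; the internals are an assumption, and the naive brace counting deliberately ignores braces inside JSON strings.

```typescript
// Sketch of a strict-JSON extraction fallback for malformed model output.
function extractJson(raw: string): unknown {
  try {
    return JSON.parse(raw); // happy path: the output is already valid JSON
  } catch {
    // Fall back: scan for the first balanced top-level object.
    // (Naive: does not account for braces inside string literals.)
    const start = raw.indexOf("{");
    if (start === -1) return null;
    let depth = 0;
    for (let i = start; i < raw.length; i++) {
      if (raw[i] === "{") depth++;
      else if (raw[i] === "}") depth--;
      if (depth === 0) {
        try {
          return JSON.parse(raw.slice(start, i + 1));
        } catch {
          return null;
        }
      }
    }
    return null;
  }
}
```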

10. Future Evolution & Roadmap

Obsidian is an experimental baseline for AGI-level reasoning structures. Future implementations will include:

  • Local LLM Swarms: Abstracting the @google/genai dependency to allow integration with Ollama or vLLM, enabling the cognitive loop to run entirely offline on local clusters.
  • Persistent Archival Memory: Integrating a Vector Database (e.g., ChromaDB) into the Archival Answerer, allowing the agent to remember and retrieve thoughts from sessions that occurred months ago.
  • Procedural Sandboxing: Providing the "Procedural" agent with a secure Docker container to execute and validate the code it theorizes before the Synthesizer finalizes the report.

11. Conclusion & License

The Obsidian Thinking Machine represents a paradigm shift from conversational AI chatbots to autonomous cognitive processing networks. By enforcing dimensional quotas, instantiating persona-driven swarm debates, and streaming the entire recursive process via an immersive UI, the system transforms the "black box" of LLM thought into a transparent, highly analytical, and verifiable engine of creativity.

Proprietary Infrastructure for The Genesis Machine. Developed by Gift Braimah


“In the silence of the neural void, the most profound structures are built.”
