Changelog - v1.4.0-beta.2

This is a broad product beta, not a narrow patch. It bundles everything that
landed between v1.3.0-beta.1 and today — including the work that had been
drafted for a separate v1.3.0-beta.2 language-hint release and the larger
feature set that briefly carried a v1.4.0-beta.1 working version.

🎉 New Features

Project Skills with Sandboxed Code Execution

TensorPM Skills are now a first-class, project-local extension surface. Skills
package instructions, scripts, and templates that the in-app TensorPM agent
can invoke against the current project, sandboxed in a deny-by-default Deno
runtime.

What users can do now:

Ask the agent in chat to compute over project data ("how much effort is
still open in this sprint?") — the agent writes a small TypeScript snippet
that runs locally in an isolated Deno subprocess.
Install skills from an online catalog via the File Explorer context menu
("Install skill from catalog…"). The new Skill Catalog modal shows a clear
permission diff before anything is installed.
See generated artifacts (PowerPoint, Word, Excel, PDF, images) inline on
chat messages and reuse them from <project>/exports/<skill-id>/.
See trust badges in the File Explorer when a skill needs re-approval —
e.g. after an update that asks for new permissions.

Architecture:

Lives under src/backend/services/codeExecution/: ExecutionService,
DenoEngine, SkillRegistry, SchemaValidator, PermissionMapper,
TpmContextSnapshot, ArtifactStore.
Each run spawns a fresh Deno subprocess (no shared REPL state). The Deno
binary is fetched per platform/arch (scripts/fetch-deno.mjs) and
code-signed on macOS during packaging (scripts/afterPack-sign-deno.cjs).
Skills use Anthropic-compatible SKILL.md YAML frontmatter (name,
description, version, runtime.engine, permissions, limits) plus
the TensorPM-specific scripts: extension, in which each entry is its own
callable operation.
A single chat tool, execute_code, accepts either ad-hoc code or a
skillId + scriptId + inputs. A companion describe_skill exposes
installed skills for discovery.
A new @tensorpm/sdk (src/shared/tpm-sdk/) is auto-generated for the
sandbox; skills read a filtered project snapshot from TPM_IN_PATH and
write outputs/artifacts to TPM_OUT_PATH.

Security model (deny-by-default):

No --allow-run, --allow-ffi, --allow-sys, --allow-import.
--allow-env is restricted to TPM_RUN_DIR/TPM_IN_PATH/TPM_OUT_PATH.
Default policy for ad-hoc code: read-only on tpmInDir, write-only on
tpmOutDir, no network, 30 s CPU cap, 256 MB memory cap.
Two-tier trust:
- Install-time trust: clicking "Install" in the catalog auto-approves
  the skill, bound to a sha256 hash of the whole skill folder plus a
  permissions fingerprint. Manually copied skills are never auto-trusted.
- Runtime trust: every execute_code re-checks the hash + fingerprint
  and rejects with "skill is not approved" on mismatch.
Chrome-style update behavior: when an update widens permissions, the
new version is installed but trust is revoked — the user must explicitly
re-approve before the new version can run.
Symlinks are rejected at every walk (assertSkillFolderStructure,
SkillApprovalStore.walkSync).
The skill manifest name must match the folder name, so catalog uninstalls
can't hit the wrong folder.
Catalog payloads: HTTPS-only, sha256 must match the catalog entry, hard
50 MiB cap (enforced during streaming against Content-Length spoofing),
extraction via system tar with no path-escape.
Output disk cap: DenoEngine.measureDirBytes polls out/ every 100 ms and
SIGKILLs the subprocess once the directory exceeds
SANDBOX_OUTPUT_CAPS.totalArtifactBytes (~50 MiB).
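
As a rough illustration of the deny-by-default policy above, the sketch below assembles a Deno command line for an ad-hoc run. The SandboxPolicy shape and the buildDenoArgs helper are hypothetical and are not the actual DenoEngine code; only the Deno CLI flag names are real. The 30 s cap is assumed to be enforced by the parent process, since there is no Deno flag for it.

```typescript
// Hypothetical policy shape; the real DenoEngine types may differ.
interface SandboxPolicy {
  tpmInDir: string; // read-only input directory
  tpmOutDir: string; // write-only output directory
  memoryMb: number; // V8 heap cap in megabytes
}

// Only these variables are exposed to the sandboxed script.
const ALLOWED_ENV_VARS = ["TPM_RUN_DIR", "TPM_IN_PATH", "TPM_OUT_PATH"];

function buildDenoArgs(policy: SandboxPolicy, scriptPath: string): string[] {
  return [
    "run",
    "--no-prompt", // never fall back to interactive permission prompts
    `--allow-read=${policy.tpmInDir}`,
    `--allow-write=${policy.tpmOutDir}`,
    `--allow-env=${ALLOWED_ENV_VARS.join(",")}`,
    `--v8-flags=--max-old-space-size=${policy.memoryMb}`,
    // Deliberately absent: --allow-net, --allow-run, --allow-ffi,
    // --allow-sys, --allow-import.
    scriptPath,
  ];
}
```

Because permissions are granted flag by flag, anything not listed stays denied, so the function never needs an explicit deny list.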

Catalog & install flow:

CatalogService fetches catalog.json from
raw.githubusercontent.com/Neo552/TensorPM-DesktopSkills/main/
(overridable via TPM_REMOTE_CATALOG_URL).
Layered source: remote → on-disk cache
(<userData>/skills-catalog.cache.json). ETag-based If-None-Match
conditional fetch; 6-hour cache freshness window; 4 s network timeout for
fast boot; offline empty-state when both fail.
Phase-1 refactor removed the legacy SkillSeeder. There are no built-in
default skills — all skills must be installed from the catalog or placed
manually in <project>/skills/.
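
The layered lookup can be pictured as a small decision function. This is a sketch under assumptions: the cache entry shape and planCatalogLoad are hypothetical, and it assumes that a cache younger than the 6-hour window is served without touching the network, which the notes above do not state explicitly.

```typescript
// Hypothetical on-disk cache entry; field names are illustrative.
interface CatalogCacheEntry {
  etag: string | null;
  fetchedAtMs: number;
  body: string;
}

const FRESHNESS_MS = 6 * 60 * 60 * 1000; // 6-hour freshness window
const NETWORK_TIMEOUT_MS = 4000; // 4 s timeout for fast boot

type CatalogPlan =
  | { kind: "use-cache" }
  | { kind: "fetch"; headers: Record<string, string>; timeoutMs: number };

function planCatalogLoad(cache: CatalogCacheEntry | null, nowMs: number): CatalogPlan {
  // Fresh cache: assumed to be served directly.
  if (cache && nowMs - cache.fetchedAtMs < FRESHNESS_MS) {
    return { kind: "use-cache" };
  }
  // Stale or missing cache: revalidate, sending the ETag when we have one
  // so the server can answer 304 Not Modified.
  const headers: Record<string, string> = {};
  if (cache && cache.etag) headers["If-None-Match"] = cache.etag;
  return { kind: "fetch", headers, timeoutMs: NETWORK_TIMEOUT_MS };
}
```

If the planned fetch fails or times out, the caller would fall back to the stale cache, and to the offline empty state when no cache exists.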

Vendored sandbox libraries:

Four MIT-licensed npm libraries ship as offline ESM bundles in
resources/sandbox-libs/, built via scripts/build-sandbox-libs.mjs. Deno
imports them through an import map as @tensorpm/<id>:

Import	Library	Purpose
`@tensorpm/pptx`	PptxGenJS	`.pptx` presentations
`@tensorpm/docx`	docx (dolanmiu)	`.docx` with tables/lists/styles
`@tensorpm/pdf`	pdf-lib	Create / modify PDFs
`@tensorpm/xlsx`	write-excel-file	`.xlsx` workbooks with formulas

The chat agent's system prompt advertises these via
formatSandboxLibsPromptSection(), so the agent knows without a tool call
what's available offline.

Post-review hardening:

.git/ directories are now rejected inside skill payloads (previously
silently skipped, which would have let hostile skills hide files outside
the trust hash).
Manifest name is validated against the catalog skill ID before the
atomic rename into <project>/skills/<id>/, so a mismatched catalog entry
can't overwrite an existing installation.
describe_skill is now gated on trust state. Untrusted skills no longer
leak their instructions (SKILL.md body) into agent context — closing a
prompt-injection path where catalog content could reach the agent before
user approval.

Material Management

Material is now a first-class project area for procurement-heavy work
(construction, renovation). A new "Material" tab lives in the Budget area
alongside the existing Budget tab.

What users see:

Create / edit / delete materials, including sub-materials of arbitrary
depth (expand/collapse via chevron, parentItemId).
Columns: Name, Quantity, Status, NeededBy, EstimatedCost, Supplier — with
drag-and-drop reordering, hide/show, and resizing.
Full-text search and filters by Status (multi-select), Supplier,
NeededBy range, EstimatedCost range.
Sort ascending/descending on all columns.
Inline edit drafts per item; quantity mode is either fixed or dynamic
(formula-based, e.g. participants * 1.1).
Full DE/EN i18n under material.*.

Data model (src/types/Material.ts):

Roughly 25 fields per MaterialItem:

Identity: id, projectId, parentItemId
Description: name, description, productCode, manufacturer,
revision
Quantity: quantity, unit, quantityMode, quantityFormula,
wasteFactor
Status (10 values, CHECK-constrained): planned, quoteRequested,
quoted, approved, ordered, partiallyDelivered, delivered,
installed, blocked, cancelled
Dates: neededBy, deliveryDate, leadTimeDays, lastVerifiedAt
Cost: estimatedCost / actualCost as MonetaryValue in project
storage currency
Other: supplier, location, categoryId, notes, sourceFileId

Hierarchy is FK-validated, including cycle detection
(assertNoParentCycle in MaterialRepository).
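
A minimal sketch of parent-cycle detection, assuming items are available as an in-memory list. The function name and the MaterialNode type are illustrative; the real assertNoParentCycle in MaterialRepository may work differently (for example, directly against the database).

```typescript
// Illustrative subset of MaterialItem: only the fields the check needs.
interface MaterialNode {
  id: string;
  parentItemId: string | null;
}

// Throws if making `newParentId` the parent of `itemId` would create a cycle.
function assertNoParentCycleSketch(
  items: MaterialNode[],
  itemId: string,
  newParentId: string | null,
): void {
  const parentOf = new Map<string, string | null>();
  for (const item of items) parentOf.set(item.id, item.parentItemId);

  // Walk up from the proposed parent; reaching the item itself is a cycle.
  const visited = new Set<string>([itemId]);
  let cursor: string | null = newParentId;
  while (cursor !== null) {
    if (visited.has(cursor)) {
      throw new Error(`Parent cycle detected at ${cursor}`);
    }
    visited.add(cursor);
    const next = parentOf.get(cursor);
    if (next === undefined) throw new Error(`Unknown parent ${cursor}`);
    cursor = next;
  }
}
```

The visited set also guards against cycles that already exist in stored data, so the walk always terminates.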

Redux & IPC:

Slice: materialSlice.ts, state shape { [projectId]: ProjectMaterialContainer }.
Thunks: fetchProjectMaterialContainer, saveMaterialItem,
removeMaterialItem.
IPC: material.loadProjectContainer, material.saveItem,
material.deleteItem via the new typed window.api.material namespace.

AI integration:

Material has its own subagent (MaterialAgentService) running a 6-iteration
tool loop with dedicated prompts. Tool definitions:

Top-level: material_agent (the main chat agent delegates to it)
Internal: material_list_items, material_save_item,
material_delete_item
Result submit: submit_material_agent_result (outcomes: answer,
materials_updated, needs_clarification, no_material_data)

changeMode gates writes: direct (chat command, may mutate) vs. propose
(connector/PDF extraction, returns proposedChanges only) — matching the
"distillation is always human-in-the-loop" guardrail.
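
The changeMode gate can be sketched as follows. The MaterialChange and outcome shapes and the routeMaterialChanges helper are hypothetical; the point is only that propose mode never invokes the write callback.

```typescript
type ChangeMode = "direct" | "propose";

// Hypothetical change record; the real tool payloads carry more fields.
interface MaterialChange {
  op: "save" | "delete";
  itemId: string;
}

interface MaterialAgentOutcome {
  applied: MaterialChange[];
  proposedChanges: MaterialChange[];
}

function routeMaterialChanges(
  mode: ChangeMode,
  changes: MaterialChange[],
  apply: (change: MaterialChange) => void,
): MaterialAgentOutcome {
  if (mode === "propose") {
    // Connector / PDF extraction path: return proposals, mutate nothing.
    return { applied: [], proposedChanges: changes.slice() };
  }
  // Chat command path: writes go through immediately.
  for (const change of changes) apply(change);
  return { applied: changes.slice(), proposedChanges: [] };
}
```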

Distiller path: the Distiller's Phase-2 (apply after user approval) calls
the atomic material_save_item / material_delete_item tools directly,
bypassing the subagent loop. Stale-id writes now hard-error instead of
silently creating a new row, so the Distiller can't "finalize" a write that
never happened.

Schema & sync:

Migrations 027-add-material-items, 028-add-material-item-hierarchy,
029-add-material-settings.
material_items is added to the unified schema with softDelete: true,
synced_at clear-on-update trigger, and full e2e.sensitiveFields
coverage — every content field is client-side encrypted in E2E-encrypted
cloud workspaces.

Distiller v2 Review Flow

The Distiller pipeline now runs as a five-phase human-in-the-loop (HITL)
loop instead of three phases.

The new flow per signal:

Phase 0 — Pre-check (new): a cheap small-model call decides whether
the signal is project-relevant at all. For projects above the
safeBudgetTokens threshold, the pre-check can also return fieldHints
so Phase 1 only loads the relevant project sections.
Phase 1 — Present: the Distiller analyses the signal against the
project graph and emits one present_distillate per affected field.
The user clicks Approve / Append / Reject / Skip on each carousel
card. For field: "decisions", Append is disabled because decisions
flow through dedicated write tools.
Phase 2 — Apply: only the write tools for approved fields are
offered to the model. The system enriches the batch decision with the
original proposal payload and strips proposal payloads for
rejected/skipped cards, so nothing can be applied that the user didn't
approve.
Phase 3 — Verify/Complete: mark_update_complete is now position-
guarded — it must be the last tool call of the turn, and is blocked if a
prior tool call failed.
Final closing turn (new): a tools-free mini-turn writes a short user-
facing summary, without re-shipping the full project graph.
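
The Phase 3 position guard could look roughly like this. The ToolCallRecord shape and checkCompletionGuard are assumptions for illustration; only the tool name mark_update_complete comes from the notes above.

```typescript
// Hypothetical record of one tool call within a single model turn.
interface ToolCallRecord {
  toolName: string;
  ok: boolean; // false if the tool call failed
}

interface GuardResult {
  allowed: boolean;
  reason?: string;
}

function checkCompletionGuard(calls: ToolCallRecord[]): GuardResult {
  const idx = calls.findIndex(c => c.toolName === "mark_update_complete");
  if (idx === -1) return { allowed: true }; // nothing to guard in this turn
  if (idx !== calls.length - 1) {
    return { allowed: false, reason: "mark_update_complete must be the last tool call of the turn" };
  }
  if (calls.slice(0, idx).some(c => !c.ok)) {
    return { allowed: false, reason: "a prior tool call failed" };
  }
  return { allowed: true };
}
```

A blocked result is what would trigger the repair turn instead of the completion being accepted.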

Routing improvements:

safeBudgetFraction is wired into actual routing — safe budget is now the
active model's context window × the configured fraction, instead of a
fixed 160k threshold.
Per-update review-context isolation: proposal cards from one raw update no
longer leak into the next update's context.
Deterministic no-signal handling: if no signal or batch decision is
pending, the system responds with a fixed message and skips the LLM
entirely.
parseDistillerBatchDecision is now set-based: a field with conflicting
approve/reject actions is treated as skipped, and the chat appends a
system note asking the user to re-present a consolidated proposal.
mark_update_complete after a failure now triggers a repair turn instead
of being silently accepted.
Material items are stripped from normal-chat context (they must go
through material_agent), but remain available to Distiller Phase 2.
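
Two of the routing rules above lend themselves to a short sketch: the model-aware safe budget and the set-based batch decision. All names here are hypothetical, and the sketch assumes that any pair of differing actions on the same field counts as a conflict (the notes only name approve/reject explicitly).

```typescript
type CardAction = "approve" | "append" | "reject" | "skip";

interface CardDecision {
  field: string;
  action: CardAction;
}

// Safe budget = active model's context window times the configured fraction.
function computeSafeBudgetTokens(contextWindowTokens: number, safeBudgetFraction: number): number {
  return Math.floor(contextWindowTokens * safeBudgetFraction);
}

// Collect actions per field as a set; conflicting actions collapse to "skip".
function resolveBatchDecision(decisions: CardDecision[]): Map<string, CardAction> {
  const actionsByField = new Map<string, Set<CardAction>>();
  for (const d of decisions) {
    const actions = actionsByField.get(d.field) ?? new Set<CardAction>();
    actions.add(d.action);
    actionsByField.set(d.field, actions);
  }
  const resolved = new Map<string, CardAction>();
  actionsByField.forEach((actions, field) => {
    resolved.set(field, actions.size === 1 ? Array.from(actions)[0] : "skip");
  });
  return resolved;
}
```

Using a set makes duplicate identical actions harmless while still surfacing genuine conflicts.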

Prompt caching:

The full Distiller prompt was restructured into static prelude + phase
block + memory + signal + project graph, with stable promptCacheKeys
(distiller:<projectId>:<chatInstanceId> vs.
chat:<projectId>:<chatInstanceId>) and an Anthropic-specific
SYSTEM_PROMPT_PRELUDE_END_SENTINEL marking the cache boundary. Memory-field
notes are sorted to stop insertion-order drift from invalidating the cache.
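
A small sketch of the two cache-stability ideas: a stable key per mode and a deterministic ordering for memory notes. The helper names, the MemoryNote shape, and the sort key (field, then note text) are assumptions; the actual sort criteria are not stated above.

```typescript
type PromptMode = "distiller" | "chat";

// Key format taken from the notes: <mode>:<projectId>:<chatInstanceId>.
function buildPromptCacheKey(mode: PromptMode, projectId: string, chatInstanceId: string): string {
  return `${mode}:${projectId}:${chatInstanceId}`;
}

// Hypothetical memory note shape.
interface MemoryNote {
  field: string;
  note: string;
}

function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Sort a copy so the rendered block is identical regardless of insertion order.
function renderMemoryBlock(notes: MemoryNote[]): string {
  return notes
    .slice()
    .sort((a, b) => compareStrings(a.field, b.field) || compareStrings(a.note, b.note))
    .map(n => `${n.field}: ${n.note}`)
    .join("\n");
}
```

Byte-identical prompt prefixes are what allow a provider-side prompt cache to hit, which is why ordering matters here.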

Decisions in Distiller and Trail

Decisions become a first-class Distillate field type.

New field: "decisions" in ContextChangeProposalFieldType,
DISTILLER_TOOLS.field.enum, and the field label map.
New DecisionFieldRenderer for Trail rendering: each plan entry maps to
one of five tool actions (record, supersede, withdraw, link,
unlink) with its own badge (ScrollText, GitBranch, XCircle, Link2,
Link2Off), and shows the status transition (Active → Superseded /
Active → Withdrawn) and meta table.
Phase-2 write tools for decisions are unlocked via
getDistillerPhase2WriteTools when the user approves the decisions
field; the actual write goes through record_decision,
supersede_decision, withdraw_decision, link_decision,
unlink_decision.
trackableWriteTools in chatService now includes the five decision
tools, so decision writes land in distiller_tool_trace and produce
Trail events.

MCP shape: the old list_decisions lookup tool is removed. Decisions
are project context; agents read them through get_project and write via
the explicit decision tools. External MCP clients that relied on
list_decisions need to migrate.

AI Panel: Docking & History

The AI panel turns from a right-only sidebar into a four-position dock.

Dock positions: left, right, top, bottom. Each dock remembers its own
size (left/right docks: 300–800 px wide; top/bottom docks: 180–700 px
high), persisted to localStorage (aiPanelPosition, aiPanelSizes), with a
migration path from the old aiPanelWidth key.
Layout state survives project switches: position, sizes, and the
maximized flag are intentionally preserved on RESET_STATE.
Side rail for top/bottom docks: when the panel is horizontal, header
controls move to a vertical strip on the right and the top header is
hidden.
History drawer: horizontal docks expose a History button that opens
recent chats in a slide-out aside; vertical docks keep the existing
inline welcome-screen history.
Distiller button (Droplets icon) is now in the panel header. It toggles
distiller mode on and off and shows a pending-count badge.
Fullbleed layout: top/bottom docks use the same 760 px centered chat
layout as maximized mode.
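
Per-dock size handling and the legacy-key migration might be sketched like this. The helper names are hypothetical, and the sketch assumes the 300–800 px range applies to the width of left/right docks and the 180–700 px range to the height of top/bottom docks, with the old aiPanelWidth value carried over to the right dock.

```typescript
type DockPosition = "left" | "right" | "top" | "bottom";
type DockSizes = Record<DockPosition, number>;

// Assumed mapping of the documented ranges to dock orientation.
const WIDTH_RANGE = { min: 300, max: 800 }; // left / right docks
const HEIGHT_RANGE = { min: 180, max: 700 }; // top / bottom docks

function clampDockSize(position: DockPosition, sizePx: number): number {
  const range = position === "left" || position === "right" ? WIDTH_RANGE : HEIGHT_RANGE;
  return Math.min(range.max, Math.max(range.min, sizePx));
}

// `legacyWidth` is the raw string stored under the old aiPanelWidth key,
// or null if the key was never written.
function migrateLegacyWidth(legacyWidth: string | null, defaults: DockSizes): DockSizes {
  const parsed = legacyWidth === null ? NaN : Number(legacyWidth);
  if (!Number.isFinite(parsed)) return { ...defaults };
  return { ...defaults, right: clampDockSize("right", parsed) };
}
```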

Image Processing and Tool Call Persistence

Images in the Distiller: DistillerService now detects image files
and runs ImageAnalyzer with OCR + object detection, feeding
description, extracted text, and detected elements into the distillation
context. Screenshots and photos now contribute meaningfully to raw
updates.
Tool calls survive cancel/reload: previously, cancelling a chat
stream lost any tool-call results that had already executed. Tool calls
are now persisted alongside the partial AI message, with frontend merge
logic in chatSlice that re-attaches tool pills and interrupt markers
when messages are reloaded from the DB.
Cleaner interrupt UI: the stop-marker is suppressed when a user
message already follows the interrupted assistant message.

Render, HTML, Inline Image, and Vision Tooling

A new render_html tool lets the agent emit HTML and have it rendered into
PNG (preview) or PDF (saved under <project>/exports/render/) inside an
isolated offscreen BrowserWindow with a strict network allowlist and
sandbox: true / contextIsolation: true.

Pool max 2, 30 s timeout, 5 MB HTML / 4 MB PNG / 25 MB PDF caps.
Vision loop: for providers that accept tool-result images (Anthropic,
Gemini), the rendered PNG is returned inline so the model can iterate on
what it produced. For providers that don't (OpenAI / Mistral / Proxy /
Ollama), the agent gets the saved path and a hint instead.
Capability map: new MODEL_CAPABILITIES in aiConfig.ts and
providerSupportsToolResultImage() whitelist make the vision-loop
decision explicit per provider and per model.

Today's vision-loop reality:

Provider	Vision-capable models	Inline tool-result image	Effect in render loop
Anthropic (BYOK)	Opus 4.7/4.6, Sonnet 4.6, Haiku 4.5	Yes	Full vision loop
Google Gemini	3.x family	Yes	Full vision loop
OpenAI (BYOK)	GPT-5.5/Pro, GPT-5.4 family	No (tool-role text-only)	Text-only fallback
Mistral	Medium/Small	No	Text-only fallback
Kimi K2.6 (Cloudflare proxy)	Yes (per capability map)	No (OpenAI-compatible)	Text-only fallback
Ollama / Local	None by default (capability defaults to false)	No	Text-only fallback

Proxy/Free/Cloud/Pro users therefore do not yet get the vision iteration in
the render loop. That stays open until the proxy transport carries
multimodal tool results.
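
The per-provider decision can be sketched as below. The ProviderId union, the allowlist constant, and buildRenderResult are illustrative stand-ins, not the actual MODEL_CAPABILITIES or providerSupportsToolResultImage() implementation.

```typescript
type ProviderId = "anthropic" | "gemini" | "openai" | "mistral" | "proxy" | "ollama";

// Providers whose wire format accepts images inside tool results (per the table above).
const TOOL_RESULT_IMAGE_PROVIDERS: ReadonlyArray<ProviderId> = ["anthropic", "gemini"];

function supportsToolResultImageSketch(provider: ProviderId): boolean {
  return TOOL_RESULT_IMAGE_PROVIDERS.indexOf(provider) !== -1;
}

type RenderToolResult =
  | { kind: "inline-image"; pngBase64: string }
  | { kind: "path-hint"; savedPath: string; hint: string };

function buildRenderResult(
  provider: ProviderId,
  modelHasVision: boolean,
  pngBase64: string,
  savedPath: string,
): RenderToolResult {
  // Both conditions must hold: the model can see images AND the transport can carry them.
  if (modelHasVision && supportsToolResultImageSketch(provider)) {
    return { kind: "inline-image", pngBase64 };
  }
  return { kind: "path-hint", savedPath, hint: "Rendered output saved to disk; image not attached." };
}
```

This separation explains the Kimi row in the table: the model is vision-capable, but the OpenAI-compatible transport still forces the text-only fallback.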

Preferred Language Hint

When you've selected German as your TensorPM UI language, the AI now picks
up that preference and replies in German for short or ambiguous messages
without overriding messages that have a clear language signal.

Your message	Reply
"Hallo, wie geht's?"	German (clear German)
"Hi, how do I do X?"	English (clear English wins)
"hi" / "ok" / "thanks"	German (preference breaks the tie)

How it works: a new getLanguageDirective() helper adds the line
[User] Preferred language: German to chat and action-item system prompts
only when language === 'de'. The existing "respond in the user's language"
core instruction continues to drive clear-signal cases. English-UI users
see no change. No DB or sync changes.
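
In sketch form, the helper amounts to a conditional one-liner. The function below is named differently from the real getLanguageDirective() on purpose: the signature and the buildSystemPrompt wrapper are assumptions, and only the directive text and the language === 'de' condition come from the notes above.

```typescript
// Returns the directive line for German UI users and an empty string otherwise.
function getLanguageDirectiveSketch(language: string): string {
  return language === "de" ? "[User] Preferred language: German" : "";
}

// Hypothetical wrapper showing where the directive would be appended.
function buildSystemPrompt(basePrompt: string, language: string): string {
  const directive = getLanguageDirectiveSketch(language);
  return directive ? `${basePrompt}\n${directive}` : basePrompt;
}
```

Because the directive is only a hint, the existing core instruction still decides the reply language whenever the message itself has a clear signal.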

🏗️ Improvements

Architecture: Typed IPC Domain Namespaces

The renderer/preload boundary was fully migrated from stringly-typed
window.electron.ipcRenderer.invoke('channel-string', …) to typed domain
namespaces under window.api.<namespace>.<method>().

138/138 invoke channels, 40/40 event channels, and 2/2 send channels
migrated. 6365 tests green.
Three new type-registry files under src/types/electron/:
domainChannels.ts, domainService.ts, eventChannels.ts.
Implementation in src/preload/services/domainService.ts builds the
namespaces over a generic invoke<TChannel> helper that pairs args ↔
result on the type level. Event handlers track wrappers in a WeakMap so
off* finds the same listener — fixing a recurring subscription leak.
24 namespaces: project, actionItems, people, settings, ai,
projectSettings, files, shell, wizard, proxyAuth, updates,
theme, app, apiKeys, trail, distiller, budget, material,
browserUse, githubCopilot, localCodingAgents, testRecorder,
e2e, events.
Backend cleanup: ApiKeyIPCHandler collapses 10 per-provider channels
(save-openai-api-key, get-openai-api-key, …) into two generic
api-key:get / api-key:save channels. Brave-specific channels removed
(no remaining consumers).
CI gate: scripts/check-renderer-ipc-boundaries.ts scans src/frontend,
e2e/, and tests/unit/frontend/ for raw ipcRenderer.invoke(,
removed window.electron.<namespace> access, and dynamic API-key
channel templates. Reintroduction now fails CI.
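
The typed invoke helper and the WeakMap-based listener tracking can be illustrated with a reduced example. ChannelMap, RawInvoke, RawEmitter, and both factory functions are hypothetical; they only mirror the pattern described above, not the contents of domainChannels.ts or domainService.ts.

```typescript
// Hypothetical registry: channel name -> argument tuple and result type.
interface ChannelMap {
  "material.saveItem": { args: [{ id: string; name: string }]; result: { ok: boolean } };
  "material.deleteItem": { args: [string]; result: { ok: boolean } };
}

type RawInvoke = (channel: string, ...args: unknown[]) => Promise<unknown>;

// Pairs args and result at the type level; the runtime call stays generic.
function createInvoker(raw: RawInvoke) {
  return function invoke<TChannel extends keyof ChannelMap>(
    channel: TChannel,
    ...args: ChannelMap[TChannel]["args"]
  ): Promise<ChannelMap[TChannel]["result"]> {
    return raw(channel, ...args) as Promise<ChannelMap[TChannel]["result"]>;
  };
}

type Listener<T> = (payload: T) => void;
type RawListener = (event: unknown, payload: unknown) => void;
interface RawEmitter {
  on(channel: string, listener: RawListener): void;
  off(channel: string, listener: RawListener): void;
}

function createEventApi<T>(emitter: RawEmitter, channel: string) {
  // Maps each user listener to the wrapper actually registered, so off()
  // removes the same function object that on() added.
  const wrappers = new WeakMap<Listener<T>, RawListener>();
  return {
    on(listener: Listener<T>): void {
      let wrapper = wrappers.get(listener);
      if (!wrapper) {
        wrapper = (_event, payload) => listener(payload as T);
        wrappers.set(listener, wrapper);
      }
      emitter.on(channel, wrapper);
    },
    off(listener: Listener<T>): void {
      const wrapper = wrappers.get(listener);
      if (!wrapper) return;
      emitter.off(channel, wrapper);
      wrappers.delete(listener);
    },
  };
}
```

Without the map, off() would receive the user's function while the emitter holds the wrapper, so the removal would silently miss and the subscription would leak.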

This does not affect skills or the @tensorpm/sdk — those run in the Deno
sandbox and communicate via JSON files, not IPC.

Shared Guidance Type Foundations

The four guidance types (Context, Strategic, Coverage, Execution)
moved from src/backend/services/ai/Guidance/guidanceTypes.ts to
src/shared/types/guidance.ts. Renderer code can now consume them with
type safety without backend-path hacks. The same commit also relocates
aiConfig, markdownConverter, and aiTextFormatter; extracts
rootReducer.ts from store.ts; and pulls chat-context filters,
history-sanitizer, and token-accounting helpers out of chatService.ts
into chat/.

Proxy 5xx Error Handling

Two related fixes in the Kimi-K2.6-via-Cloudflare-Workers pipeline:

ProxyHttpClient: for statusCode >= 500, the error message preference
now flips to errorData.message || errorData.error || …, because the
proxy puts the human-readable text in message and only the class name
("ServerError") in error.
proxyErrorHandler: fallback detection now also matches
errorMessage === 'ServerError' and
errorMessage.startsWith('Cloudflare Workers AI server error').

All three paths now reliably yield the same toast: "The AI service is
temporarily unavailable. Please try again in a few moments."
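
A sketch of the two changes, with hypothetical function names. The ordering for status codes below 500 is inferred from the word "flips" above, and the generic last-resort string is a placeholder for the fallback that the notes elide.

```typescript
interface ProxyErrorData {
  message?: string;
  error?: string;
}

const UNAVAILABLE_TOAST =
  "The AI service is temporarily unavailable. Please try again in a few moments.";

function pickProxyErrorMessage(statusCode: number, data: ProxyErrorData): string {
  // Placeholder last resort; the real fallback chain is not shown in the notes.
  const generic = `Request failed with status ${statusCode}`;
  return statusCode >= 500
    ? data.message || data.error || generic // 5xx: human-readable text lives in `message`
    : data.error || data.message || generic; // assumed pre-existing order for other codes
}

function isServerErrorFallback(errorMessage: string): boolean {
  return (
    errorMessage === "ServerError" ||
    errorMessage.startsWith("Cloudflare Workers AI server error")
  );
}

function toastFor(errorMessage: string): string {
  return isServerErrorFallback(errorMessage) ? UNAVAILABLE_TOAST : errorMessage;
}
```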

Documentation Screenshot E2E Coverage

A new Playwright suite (e2e/specs/docs-screenshots.spec.ts) writes
screenshots directly into the sibling TensorPMWebsite/public/images/docs
folder. Onboarding, project-context dashboard, action items, timeline,
guidance, people, budget, files, trail, and AI integration settings are
all auto-shot from a seeded createDocumentationProject() with realistic
categories, dates, budget buckets, expenses, context-change proposals, and
a seeded decision row. Docs images can no longer drift behind the UI.

Coverage Split Layout Margin

The Guidance Coverage split layout was edge-to-edge on desktop and mobile.
Margins are now symmetric: 1.5 rem desktop, 1 rem mobile (top margin
unchanged).

Prettier Formatting Sweep

A repository-wide Prettier reformat (138 files) was committed separately to
keep the IPC refactor's review surface smaller. It touches SKILL.md, older
changelogs, MCP tool definitions, several Trail/Budget/Wizard components,
and the four vendored sandbox bundles. No behavior change.

🐛 Bug Fixes

Distiller

Stale review cards no longer leak into later apply/final turns.
Skipped updates now get a fresh revisit memory.
Final-completion wording no longer frames normal completion as a
repair-only step.
Material write tools are no longer offered by Phase 2 only to be rejected
at execution. They're also no longer visible in normal chat.
Normal chat no longer reintroduces raw material items after a write
refresh.
Distiller routing now uses dynamic model-aware safe budgets instead of a
fixed token threshold.
Pre-check token/credit logging uses the actual provider/model metadata.
safeBudgetFraction is now actually used.
Unsupported Mistral prompt-cache-key expectations removed.
First-token timeout raised centrally for longer Distiller/Claude turns.
ActionItems proposals with unresolvable rows now fall back to
originalChanges so the review card remains visible.

Materials

Partial material updates no longer wipe existing metadata (category,
waste factor, revision, source file, verification timestamp).
Empty-string material IDs during approved creates now generate a UUID
instead of failing as a missing ID.
Material context filtering is preserved across project refreshes.
New regression tests cover direct material tool execution and metadata
preservation.

Skills

Install-time trust boundaries tightened (.git/ rejected; manifest name
validated against catalog ID before atomic rename).
Runtime trust gating extended to describe_skill (no leaking of
untrusted SKILL.md bodies into agent context).
Vendored sandbox bundles reformatted for consistent diff output.
Additional skill lifecycle, catalog, installer, trust, and artifact-
export coverage.

UI

Coverage split layout margin fix (desktop and mobile).
AI panel docking layout and history behavior reworked.
Material Panel styling.
Distiller decision/proposal styling.
Trail rendering for decisions.
File Explorer skill badge/status rendering.

Tests & Infrastructure

ActionItemExecutor test suite split from one mega-file into focused
files: delete, generate, reEvaluate, split, updateFields.
New ActionItemExecutor.updateFields.test.ts (24 tests, 1208 lines)
covers the previously-untested update path with focus on assignee
resolution (ambiguity, email/name case-folding, unknown agent providers),
dependency validation, effort range, budget sanitization.
New provider-cache integration tests for ChatGPT, Claude, and Gemini
verify that enableSystemPromptCaching / promptCacheKey actually
produce cache hits.
New helper for single-column migration tests.
Render local-AI E2Es, Distiller full-coverage E2Es, documentation
screenshot E2Es, and material UI/store coverage all added.

📦 Dependency Updates

Production:

@anthropic-ai/sdk 0.92.0 → 0.96.0 (BYOK Anthropic)
openai 6.35.0 → 6.38.0 (BYOK OpenAI)
@google/genai 1.51.0 → 1.52.0
@github/copilot 1.0.36 → 1.0.45
@powersync/common 1.52.0 → 1.53.1, @powersync/node 0.18.4 → 0.18.6
better-sqlite3 12.9.0 → 12.10.0
i18next 26.0.10 → 26.2.0, react-i18next 17.0.7 → 17.0.8
mermaid 11.14.0 → 11.15.0
zod 4.4.2 → 4.4.3

Security (isolated single-PR bumps, GHSA-style):

protobufjs 7.5.5 → 7.5.8
@protobufjs/utf8 1.1.0 → 1.1.1

Development (all SemVer minor/patch, no breaking):

Electron 40.9.3 → 40.10.0
TipTap suite (12 packages) 3.22.5 → 3.23.4
Vitest / @vitest/coverage-v8 4.1.5 → 4.1.6, fast-check 4.7.0 → 4.8.0
@playwright/test 1.59.1 → 1.60.0
@reduxjs/toolkit 2.11.2 → 2.12.0, react-redux 9.2.0 → 9.3.0
React / React DOM 19.2.5 → 19.2.6
@typescript-eslint 8.59.1 → 8.59.3, tsx 4.21.0 → 4.22.1,
dompurify 3.4.2 → 3.4.4, various @types/node patches

📝 Notes

Material Sync Deployment

material_items is a new sync table. Cloud-workspace deployments must
apply the corresponding add-material-items.sql on the
TensorPMSync/server/scripts/ side before this desktop release ships,
otherwise PowerSync upload queues will stack 500s with
relation "material_items" does not exist.

Deployment order:

Apply material sync SQL on the sync backend.
Compact / verify the sync deployment.
Verify staging sync behavior.
Ship the desktop binary.

MCP Decision API

External MCP clients should no longer rely on the removed list_decisions
tool. Decisions are part of project context — read via get_project, write
via the explicit decision tools (record_decision, supersede_decision,
withdraw_decision, link_decision, unlink_decision).

Skill Catalog: No Bundled Defaults

The Phase-1 catalog refactor removed the legacy SkillSeeder. There are no
built-in skills shipped with the desktop binary. All skills must be
installed from the remote catalog (or placed manually in
<project>/skills/).

Provider Vision-Loop Scope

Inline-image tool-result transport is currently limited to Anthropic and
Gemini wire formats. OpenAI, Mistral, Kimi-via-proxy, and local Ollama
models get the rendered file saved to disk plus a hint, not the actual
image. The proxy multimodal path is on the roadmap.

E2E Policy for This Release

The release E2E gate now runs Playwright in fail-fast mode
(--max-failures=1). If one E2E test fails, the release process stops
immediately instead of continuing through the rest of the suite.

Known Follow-Up Areas

Not blockers, but useful:

Cache discipline improvements for provider tool/system prompt caching.
Further extraction of chatService.streamInstanceMessage.
Additional unit coverage around Distiller routing branches.
Consolidation of duplicated material status constants.
MCP API version/changelog note for external clients affected by the
decision tool changes.

📅 Release Info

Version: 1.4.0-beta.2
Release Date: May 18, 2026
Previous Version: 1.3.0-beta.1
Type: Beta Minor Release (bundles drafted 1.3.0-beta.2 language work
and the larger feature set that briefly carried a 1.4.0-beta.1 working
version)

TensorPM v1.4.0-beta.2 (Beta)