
fix(aem-workflow-skills) - updated gaps in the workflow skills.#108

Open
akankshajain18 wants to merge 25 commits into adobe:main from akankshajain18:skill-workflow-update

akankshajain18 (Collaborator) commented Apr 28, 2026

Description

Comprehensive quality pass across all seven AEM Workflow skills for both AEM 6.5 LTS and AEM as a Cloud Service variants. Changes span 99 files covering correctness, IDE-LLM consumability, and structural integrity.

Skills updated: workflow-debugging, workflow-development, workflow-launchers, workflow-model-design, workflow-orchestrator, workflow-triaging, workflow-triggering

Related Issue

Motivation and Context

The aem-workflow skill suite is consumed directly by AI-IDE tooling to guide developers through workflow tasks. Incorrect API names, broken cross-references, outdated XML formats, and missing runbook entries cause the LLM to generate invalid code or misleading guidance. This PR closes all known correctness and consumability defects identified during a structured QA sweep of both the 6.5 LTS and Cloud Service variants.

How Has This Been Tested?

  • Each commit was validated against the specific defect it closes (broken links resolved, API names verified against AEM Javadoc, XML formats verified against CRX/DE-observed model structure).
  • Cross-references between workflow-triaging and workflow-debugging symptom_id sets were reconciled manually.
  • AEMaaCS runbooks were reviewed against Cloud Service documentation for accuracy.
  • Model XML format changes were verified against the /conf/global/settings/workflow/models path structure used in 6.5 LTS and AEMaaCS.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • I have signed the Adobe Open Source CLA.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

akankshajain18 and others added 21 commits April 29, 2026 02:11
…rectness

Bundle 12 runbooks + 3 docs under references/ in both 6.5-lts and
cloud-service variants, plus OSGi config examples and a working
StaleWorkflowServlet for cloud-service. Also apply targeted Cloud
Service correctness fixes that the previous JMX-copied content masked:

- StaleWorkflowServlet: add 403 guard on wfSession.isSuperuser() — the
  prior code silently returned only workflows the caller initiated for
  non-superusers, so ops would read "staleCount: 0" while the system
  had a large stale backlog. Push RUNNING state filter into the JCR
  query instead of loading every workflow into memory.
- Cloud Service SKILL.md: replace /libs/granite/operations/config/maintenance
  (/libs is read-only on AEMaaCS) with
  /conf/global/settings/granite/operations/maintenance. Add post-deploy
  verification step for queue.maxparallel override with service.ranking
  tiebreak guidance and an equal-ranking duplicate-registration warning.
- Correct the cq.workflow.job.max.procs myth in both variants — real
  parallelism knob is queue.maxparallel on the Granite Workflow Queue
  (verified against WorkflowSessionFactory source).
- Rewrite 4 Cloud Service runbooks (stale-workflows, failed-work-items,
  purge-and-cleanup, job-throughput-and-concurrency) — they were
  previously byte-identical copies of 6.5-lts, telling customers to
  invoke JMX operations that are not reachable on Cloud Service.
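
The service.ranking tiebreak guidance can be illustrated with the standard OSGi selection rule: the highest `service.ranking` wins, and on a tie the registration with the lowest `service.id` (the one registered first) wins. The sketch below assumes the queue.maxparallel override is resolved by that rule; `ConfigRegistration`, `RankingResolver`, and all sample values are hypothetical and not part of the Sling or Granite API. The tie check shows why an equal-ranking duplicate is risky: which value takes effect then depends only on registration order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for one registered queue configuration; not a Sling or OSGi type.
record ConfigRegistration(String pid, int serviceRanking, long serviceId, int maxParallel) {}

final class RankingResolver {
    // Standard OSGi ordering: highest service.ranking first; on a tie, lowest service.id first.
    static final Comparator<ConfigRegistration> OSGI_ORDER =
            Comparator.comparingInt(ConfigRegistration::serviceRanking).reversed()
                    .thenComparingLong(ConfigRegistration::serviceId);

    private RankingResolver() {}

    /** Returns the registration that would win under OSGI_ORDER, if any. */
    static Optional<ConfigRegistration> pickEffective(List<ConfigRegistration> regs) {
        return regs.stream().min(OSGI_ORDER);
    }

    /** True when the winning ranking is shared, so only registration order decides. */
    static boolean hasRankingTie(List<ConfigRegistration> regs) {
        return pickEffective(regs)
                .map(top -> regs.stream()
                        .filter(r -> r.serviceRanking() == top.serviceRanking())
                        .count() > 1)
                .orElse(false);
    }
}
```

A post-deploy check in this spirit would flag any tie before trusting the effective maxParallel value.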

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… 3 docs for AEMaaCS correctness

The Cloud Service variant of workflow-debugging shipped with runbooks and docs
that were byte-identical copies of the 6.5-lts variant, instructing customers
to invoke JMX operations (countStaleWorkflows, retryFailedWorkItems,
purgeCompleted, countRunningWorkflows, returnSystemJobInfo, etc.) that are
not reachable on AEMaaCS. This commit diverges the Cloud Service copies and
replaces every JMX remediation with its AEMaaCS-correct equivalent.

Runbooks rewritten (Cloud Service variant):
- runbook-decision-guide.md — first-action column now routes to servlet /
  OSGi-config / Developer Console paths; adds a JMX→CS translation table.
- runbook-workflow-stuck.md — full CS rewrite; adds thread-pool saturation
  check for system-wide auto-advance failure.
- runbook-workflow-fails-or-shows-error.md — CS-correct retry/terminate
  flow; propagates the audit-trail warning for bulk replay (pharma / finance
  / legal must not use terminate+restart).
- runbook-task-not-in-inbox.md — IMS-federation-aware; adds principal-
  rotation gotcha (assigning to individuals is a time bomb).
- runbook-inbox-and-permissions.md — group-vs-individual superuser guidance
  with repoinit patterns; warns against toggling enforce flags off in prod.
- runbook-launcher-not-starting.md — ui.content deploy flow, run-mode
  scoping, /libs read-only guard, CRX/DE non-durability note.
- runbook-model-delete-and-update.md — replaces JMX countRunningWorkflows
  with Workflow Console + custom read-only servlet; Sync silent-failure
  gotcha (empty OR/AND branch).
- runbook-validate-workflow-setup.md — checklist-first restructure with a
  copy-paste pre-release block.

Docs rewritten:
- mbeans.md — full rewrite as a JMX→Cloud Service operation translation
  table. The previous file described unreachable JMX infrastructure as if
  it were callable, which silently misled customers.
- error-patterns.md — adds Cloud Service log-access context (Cloud Manager
  Logs, Splunk, Developer Console), logger-class reference, and the
  LogManager factory config approach for raising log levels without the Felix Web Console.
- debugging-index.md — variant note clarifying that symptom_ids are
  portable but runbook_ref targets are CS-specific.

Verification:
- 12/12 CS runbooks now diverge from 6.5-lts (was 0/12 before this series).
- 3/3 CS docs files diverge from 6.5-lts.
- 0 broken intra-skill links across all 15 CS runbook+doc files.
- 0 prescriptive JMX call-sites in CS runbooks (operation names appear
  only in "not reachable on AEMaaCS — use X instead" translation context).
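
A verification like "0 broken intra-skill links" can be approximated with a small scanner over the markdown files. This is a hypothetical sketch (`LinkChecker` is not part of the PR): it only handles inline `[text](target)` links, ignores external URLs and same-page anchors, strips `#fragment` suffixes, and takes file contents as an in-memory map keyed by repo-relative path so it runs without touching disk.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkChecker {
    // Inline markdown links: [text](target). Reference-style links are not handled.
    private static final Pattern LINK = Pattern.compile("\\[[^\\]]*\\]\\(([^)\\s]+)\\)");

    private LinkChecker() {}

    /**
     * Returns "source -> target" entries for relative links whose resolved file is missing.
     * files maps a repo-relative path (e.g. "runbooks/a.md") to its markdown content.
     */
    static List<String> brokenLinks(Map<String, String> files) {
        Set<String> known = files.keySet();
        List<String> broken = new ArrayList<>();
        for (Map.Entry<String, String> e : files.entrySet()) {
            Path dir = Path.of(e.getKey()).getParent(); // null for top-level files
            Matcher m = LINK.matcher(e.getValue());
            while (m.find()) {
                String target = m.group(1);
                if (target.startsWith("http://") || target.startsWith("https://") || target.startsWith("#")) {
                    continue; // external links and same-page anchors are out of scope
                }
                int hash = target.indexOf('#');
                String file = hash >= 0 ? target.substring(0, hash) : target;
                Path resolved = (dir == null ? Path.of(file) : dir.resolve(file)).normalize();
                if (!known.contains(resolved.toString().replace('\\', '/'))) {
                    broken.add(e.getKey() + " -> " + target);
                }
            }
        }
        return broken;
    }
}
```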

Scope: all changes live under cloud-service/.../workflow-debugging/references/.
No changes to workflow-orchestrator, workflow-development, workflow-triaging,
workflow-triggering, workflow-launchers, or workflow-model-design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…unbooks

11 runbooks referenced 7 sibling docs that were never authored:
configurations.md, custom-process-development.md, jcr-paths.md,
workflow-editor-and-steps.md, authoring-and-inbox.md, references-and-sources.md,
and examples/example-jmx-purge-and-restart.md. The dead links made 9 cross-refs
non-functional in runbook-validate-workflow-setup.md alone and another 21
across the rest of the 6.5-lts runbooks.

Fix: remove the broken links and replace load-bearing ones with inline
pointers to SKILL.md Step 5 (which already holds the OSGi property matrix
they were trying to point at). Dropped links where the target was duplicative
of content elsewhere in the skill family (custom-process-development,
workflow-editor-and-steps) rather than authoring new docs.

Per-file:
- runbook-decision-guide.md: configurations.md → inline SKILL.md Step 5 ref
- runbook-failed-work-items.md: References section cleaned; SKILL.md pointer
- runbook-inbox-and-permissions.md: References cleaned; SKILL.md pointer
- runbook-job-throughput-and-concurrency.md: 2 configurations.md refs removed
- runbook-launcher-not-starting.md: References cleaned; inline path note
- runbook-model-delete-and-update.md: jcr-paths.md removed (paths already inline)
- runbook-purge-and-cleanup.md: References cleaned; SKILL.md pointer
- runbook-task-not-in-inbox.md: References cleaned; SKILL.md pointer
- runbook-workflow-fails-or-shows-error.md: References cleaned
- runbook-workflow-stuck.md: inline custom-process ref replaced with Felix
  Console hint; References cleaned
- runbook-validate-workflow-setup.md: full rewrite — 9 broken links scattered
  across body and routing table; now a checklist-first runbook with a
  copy-paste pre-release block

Verified: 0 broken intra-skill links across both 6.5-lts and cloud-service
variants after this change.

Scope: only plugins/aem/6.5-lts/skills/aem-workflow/workflow-debugging/.
No changes to cloud-service, workflow-orchestrator, workflow-development,
workflow-triaging, workflow-triggering, workflow-launchers, or
workflow-model-design. No new docs authored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… symptom_id, /libs leak

Four in-scope residuals from the post-rewrite audit:

- cloud-service runbooks/README.md — was written pre-rewrite and warned that
  "the runbooks in this folder still reference JMX MBeans — use the
  translation table below". After the 12-runbook rewrite that framing is
  obsolete and contradicts the runbooks. Replaced with a light table of
  contents + JMX-operation → runbook pointers + bundled-artifact index.

- debugging-index.md (both variants) — workflow_auto_advance_failure appears
  in each variant's SKILL.md Step 1 table but was missing from the
  machine-readable YAML index and lookup table. Added a YAML block with
  root_cause_categories and a lookup-table row so scripts / agents keying
  on debugging-index.md can classify auto-advance failures.

- 6.5-lts runbook-decision-guide.md — cloud-service version got the
  workflow_auto_advance_failure row in the prior rewrite pass; 6.5-lts
  still had the original 11-row table. Added the matching row.

- cloud-service Purge Scheduler example JSON — the "//" comment block cited
  /libs/granite/operations/config/maintenance as the scheduling window
  location. That's the 6.5-lts path and is read-only on AEMaaCS. Updated
  the comment to point at /conf/global/settings/granite/operations/maintenance
  (matching the P2-a fix applied to SKILL.md and runbook-purge-and-cleanup.md
  earlier). JSON still parses.

Verified post-fix:
- 0 broken intra-skill links across both variants.
- 0 obsolete JMX-framing mentions in README.md.
- workflow_auto_advance_failure present in SKILL.md + debugging-index.md +
  runbook-decision-guide.md in both variants.
- Purge Scheduler JSON still valid; /libs only mentioned in the corrective
  "Do NOT reference /libs on AEMaaCS" comment.

Scope: only plugins/aem/{6.5-lts,cloud-service}/skills/aem-workflow/workflow-debugging/.
No changes to workflow-orchestrator, workflow-development, workflow-triaging,
workflow-triggering, workflow-launchers, or workflow-model-design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… set with workflow-debugging

The References section in workflow-triaging/SKILL.md (both variants) pointed
at 4 paths relative to an "aem-agent-marketplace-workflow-knowledge-base" root
that does not exist in this repo:

- aem-agent-marketplace-workflow-knowledge-base/docs/debugging-index.md
- runbooks/runbook-decision-guide.md (triaging has no runbooks/ folder)
- Workflow-docs/splunk-workflow-triaging.md (parent dir does not exist)
- docs/error-patterns.md (triaging has no docs/ folder)

Replaced with resolvable paths into the sibling workflow-debugging skill
(../workflow-debugging/references/docs/ and .../runbooks/). Dropped the
Splunk-file reference in favor of a pointer to Step 3 where the Splunk
queries are already inlined.

Also added the workflow_auto_advance_failure symptom row to Step 1 in both
variants — it was present in workflow-debugging/SKILL.md but missing from
triaging, so "workflow auto-advance stopped firing" had no classifier.
Triaging and debugging now share an identical 12-symptom taxonomy.

Verified:
- 0 broken links in workflow-triaging/SKILL.md (both variants).
- Symptom-id set matches 1:1 between triaging and debugging.
- No changes to workflow-debugging, workflow-orchestrator, workflow-development,
  workflow-triggering, workflow-launchers, or workflow-model-design.
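
The manual 1:1 reconciliation could also be scripted as a two-way set difference. A minimal sketch, assuming the symptom_id values have already been extracted from each SKILL.md / debugging-index.md (extraction is not shown); `SymptomParity` is hypothetical, `workflow_auto_advance_failure` comes from the text above, and `workflow_stuck` in the usage is a placeholder id.

```java
import java.util.Set;
import java.util.TreeSet;

final class SymptomParity {
    private SymptomParity() {}

    /** Ids present in left but missing from right, sorted for stable reporting. */
    static Set<String> missingFrom(Set<String> left, Set<String> right) {
        Set<String> diff = new TreeSet<>(left);
        diff.removeAll(right);
        return diff;
    }

    /** 1:1 parity means neither side has an id the other lacks. */
    static boolean inParity(Set<String> triaging, Set<String> debugging) {
        return missingFrom(triaging, debugging).isEmpty()
                && missingFrom(debugging, triaging).isEmpty();
    }
}
```

Before this commit, such a check would have reported workflow_auto_advance_failure as present in debugging but missing from triaging.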

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llback, and polish

Technical accuracy
- ProjectEditorsChooser now returns a rep:principalName (it reads the editors
  multi-value property on the project roles node) instead of a JCR path.
- Workflow rules noted as ECMAScript (Rhino); removed incorrect Groovy mention.
- Workflow-package detection uses adaptTo(ResourceCollection.class) — the
  primary-type check for cq:WorkflowContentPackage was inaccurate.
- getRoutes comment clarified; points to getBackRoutes for back routes.
- DS R6 template now guards on payloadType == JCR_PATH.
- Dropped unused @Reference fields from DS R6 and Felix SCR templates.
- SimpleMetaDataMap availability note added for test scope.

Scope and consistency
- Variant Scope now includes AMS deployments of 6.5 LTS (aligns with parent).
- Felix SCR lifecycle clarifier: supported only for the lifetime of 6.5 LTS.
- quick-start-guide.md intent tree adds debugging + triaging rows.

Customer-facing safety
- Rollback section added: bundle uninstall (with change-control caveat),
  launcher disable, pre-termination confirmation, 15-min verification window.
- Escalation section added: four trigger criteria + artifact collection pack.
- PII / payload-content logging guardrail.
- Local-only caveats on curl -u admin:admin examples.

Navigation
- SKILL.md References expanded to link all five foundation files.
- New Audience line and inline Prerequisites summary in SKILL.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mo Banner Approval validation

Gaps exposed when validating an agent-generated workflow against the spec:

- Add canonical design-time model page wrapper (cq:Page + cq:WorkflowModel)
  to jcr-paths-reference.md, with the correct cq:template
  (/libs/settings/workflow/templates/model) and sling:resourceType, plus an
  explicit "do not use" list covering the agent-hallucinated path
  (.../workflow-model) and the legacy /etc-era path (/libs/cq/workflow/...).

- Add Guardrail in SKILL.md: model XML and Java are co-authored. The
  PROCESS= value on a cq:WorkflowNode must resolve to a deployed FQ class
  or process.label, otherwise the engine fails with "Process not found".

- Add Pattern 4 to participant-step-patterns.md: route a participant step
  to the workflow initiator via a chooser that reads the engine-set
  'initiator' metadata key. Includes a no-go on PARTICIPANT="$initiator$"
  substitution which is not consistently supported on 6.5 LTS.

- Add Notification-Only Participant Steps section: one node per distinct
  outcome; do not multiplex outcomes through a shared step that branches
  in Java. Email path explicitly out of scope.

- Add OR_SPLIT-after-Participant section to variables-and-metadata.md
  with worked Approve/Reject ECMAScript transitions, document-order
  evaluation rule, and explicit anti-pattern callout.

- Remove the Testing section from process-step-patterns.md. Workflow-step
  testing is generic AEM testing (better served by AEM Mocks /
  aem-mock-junit5) and out of scope for this skill. Also removes the
  SimpleMetaDataMap visibility caveat that came with it.
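
The co-authoring guardrail (every PROCESS= value must resolve to a deployed class or process.label) lends itself to a pre-deploy check. A hypothetical sketch: `ProcessRefCheck` is not in the PR, the sets of deployed class names and `process.label` values are assumed to be gathered separately, the class names in the usage are illustrative, and a regex stands in for a full XML parser for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProcessRefCheck {
    // Matches PROCESS="..." but not PROCESS_ARGS or PROCESS_AUTO_ADVANCE, since '=' must follow directly.
    private static final Pattern PROCESS_ATTR = Pattern.compile("\\bPROCESS=\"([^\"]*)\"");

    private ProcessRefCheck() {}

    /** Returns PROCESS values that match neither a deployed class name nor a process.label. */
    static List<String> unresolved(String modelXml, Set<String> deployedClassNames, Set<String> processLabels) {
        List<String> missing = new ArrayList<>();
        Matcher m = PROCESS_ATTR.matcher(modelXml);
        while (m.find()) {
            String value = m.group(1);
            if (!deployedClassNames.contains(value) && !processLabels.contains(value)) {
                missing.add(value); // would surface at runtime as "Process not found"
            }
        }
        return missing;
    }
}
```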

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… api-reference copies

WorkflowTransition.getRule() is evaluated by the Rhino ECMAScript engine on
AEM 6.5 LTS. Groovy is not part of the Granite workflow rule pipeline —
the prior "ECMA/Groovy" wording was misleading and could push customers to
write Groovy rules that silently never compile. Aligns six sibling
api-reference.md copies and drops the now-redundant "Groovy is not supported"
line in variables-and-metadata.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tes and patterns

Bring the skill's copy-ready samples and decision rules up to the level a
Cursor / Claude Code consumer needs to generate correct AEM 6.5 LTS workflow
code on the first pass.

SKILL.md:
- Variant Scope: explicit DS R6 vs Felix SCR decision rule for new code,
  matching the project's existing annotation style rather than mixing both.
- Variant Scope: AEMaaCS stop-rule — refuses to generate 6.5-only patterns
  for cloud-service projects.
- Felix SCR template: replaced "// same body as DS R6 example" with the
  full body so an LLM does not mis-substitute.
- Workflow checklist: payload-type guard moved before payload extraction,
  matching the corrected Pattern 1 ordering.

process-step-patterns.md:
- Pattern 1: type-check before getPayload().toString() (BLOB/null safety).
- Pattern 5: getServiceResourceResolver / commit() exception handling
  added — the prior snippet did not compile against
  WorkflowProcess.execute(...) signature.
- Pattern 6: null guard on resolver.getResource(payloadPath) before
  adaptTo(Node.class) — prevents NPE → stuck workflow on missing payload.
- Pattern 2: PROCESS_ARGS marked legacy with explicit "do not generate
  for new steps" directive so an LLM defaults to named args.
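
The corrected Pattern 1 and Pattern 6 ordering (type check, then toString(), then a null-checked lookup before adapting) can be shown without AEM on the classpath. This is a sketch under stated assumptions: `PayloadInfo` is a hypothetical stand-in for the payload type and payload values a step reads from the workflow data, the `repository` map stands in for `resolver.getResource(path)`, and returning a status string replaces whatever logging the real step would do.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the payload type and payload a step reads; not the AEM interface.
record PayloadInfo(String payloadType, Object payload) {}

final class PayloadGuard {
    static final String JCR_PATH = "JCR_PATH";

    private PayloadGuard() {}

    /** Pattern 1 ordering: check the type (and null) before calling toString() on the payload. */
    static Optional<String> jcrPath(PayloadInfo data) {
        if (!JCR_PATH.equals(data.payloadType()) || data.payload() == null) {
            return Optional.empty(); // BLOB or missing payload: skip instead of throwing
        }
        return Optional.of(data.payload().toString());
    }

    /** Pattern 6 ordering: look the path up and null-check before any adaptTo-style step. */
    static String describe(PayloadInfo data, Map<String, String> repository) {
        Optional<String> path = jcrPath(data);
        if (path.isEmpty()) {
            return "skipped: unsupported payload type " + data.payloadType();
        }
        String resource = repository.get(path.get()); // hypothetical getResource() stand-in
        if (resource == null) {
            return "skipped: payload not found at " + path.get();
        }
        return "processing " + path.get() + " (" + resource + ")";
    }
}
```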

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…architecture considerations

Bring the model-design skill up to the level a Cursor / Claude Code consumer
needs to produce deployable, well-shaped AEM 6.5 LTS workflow models on the
first pass. Several samples taught patterns that would silently fail to
deploy or stall at runtime.

SKILL.md:
- Audience, Prerequisites, Required Permissions sections — match the
  sibling workflow-development skill's structure.
- Dependencies section — explicit Java-first / model-second order of
  operations; prevents the most common cross-skill failure mode (Process
  not found).
- Variant Scope: AEMaaCS stop-rule.
- Workflow checklist: payload-type wording aligned with the corrected
  guardrail (cq:Page collection via adaptTo(ResourceCollection.class)).
- Workflow checklist step 8: verify the model loads in the Workflow Model
  Editor — catches page-wrapper failures at the developer's desk.
- Architecture Considerations: transient vs persistent, participant
  timeouts, Goto retry caps, design-time purge configuration, model
  versioning. Closes the gap that the skill taught XML structure but not
  workflow design judgment.

model-xml-reference.md:
- Full canonical /conf model XML structure: cq:Page → cq:PageContent (with
  required cq:template and sling:resourceType) → cq:WorkflowModel "model"
  child. Prior incomplete root-only XML deployed but the Workflow Model
  Editor would not load it.
- Common-pitfalls block listing the wrong template/structure variants the
  community frequently reaches for.
- SetVariableProcess argument modes (LITERAL, RELATIVE_TO_PAYLOAD,
  ABSOLUTE_PATH, EXPRESSION, VARIABLE, JSON_DOT_NOTATION, XPATH).

65-lts-guardrails.md:
- Workflow-package detection: replaced cq:WorkflowContentPackage primary-
  type check with adaptTo(ResourceCollection.class) — the prior pattern
  silently missed every multi-page workflow payload.
- Manual purge curl: local-development-only warning + admin-credential
  caveat.

model-design-patterns.md:
- Pattern 2: declared `routes` via session.getRoutes(item, false) — prior
  snippet referenced an undefined variable.
- Pattern 4: Long retryCount with 0L default — prior int default risked
  ClassCastException across step boundaries.
- OR_SPLIT and Goto rules: strict equality (===) with String(...) wrap to
  match the sibling skill's convention; eliminates Rhino coercion fragility.
- Pattern 4 Goto rule: type-safe raw read + longValue() to survive Java
  Long values being read from Rhino.
- Pattern 5 (Task Manager): canonical PROCESS XML with PROCESS=Task Manager
  Step, PROCESS_AUTO_ADVANCE={Boolean}false, taskTitle/Description/
  Instructions/Owner/Priority. Prior version showed only the conceptual
  flow.
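
The Pattern 4 fix comes down to reading a boxed number without assuming its concrete type. A minimal sketch, assuming a plain `Map<String, Object>` in place of the workflow metadata map and a hypothetical `retryCount` key with a hypothetical cap of 3: reading through `Number.longValue()` accepts either `Integer` or `Long`, whereas a direct `(Long)` cast on a stored `Integer` throws `ClassCastException`.

```java
import java.util.Map;

final class RetryCounter {
    static final String RETRY_KEY = "retryCount"; // hypothetical metadata key
    static final long MAX_RETRIES = 3L;           // hypothetical hard cap for the Goto loop

    private RetryCounter() {}

    /** Reads the counter as a long whatever boxed Number type was stored; absent means 0L. */
    static long read(Map<String, Object> meta) {
        Object raw = meta.get(RETRY_KEY);
        return raw instanceof Number ? ((Number) raw).longValue() : 0L;
    }

    /** Increments and stores the counter as a Long; returns true while another retry is allowed. */
    static boolean incrementAndCheck(Map<String, Object> meta) {
        long next = read(meta) + 1L;
        meta.put(RETRY_KEY, next); // autoboxed to Long, never Integer
        return next <= MAX_RETRIES;
    }
}
```

Storing the value as a Long on every write keeps later readers, including the Rhino-side Goto rule, from seeing mixed types.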

step-types-catalog.md:
- ECMAScript (Rhino) terminology aligned with api-reference.md.
- Goto Step section (XML + arg explanation + hard-cap rule), so retry-loop
  generation is fully self-contained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…chitecture considerations

Bring the triggering skill up to parity with the now-richer workflow-development
and workflow-model-design skills. Fixes defect classes that an IDE-LLM consumer
would faithfully reproduce — wrong workflow-package detection, NPE-prone model
lookup, deprecation-confusion, and a Sling Scheduler example that taught
workflow-flood patterns.

SKILL.md:
- Audience, Dependencies, Prerequisites, Required Permissions sections —
  parity with sibling skills; Dependencies explicitly states the upstream
  workflow-model-design and workflow-development requirements.
- Variant Scope: AEMaaCS stop-rule (no traditional replication agents,
  different HTTP auth, /etc paths deprecated).
- Manage Publication payload: replaced cq:WorkflowContentPackage description
  with the corrected workflow-package guidance — cq:Page collection under
  /var/workflow/packages/ (newer) or /etc/workflow/packages/ (legacy),
  detected at runtime via adaptTo(ResourceCollection.class).
- Programmatic example: getModel() null-guard with descriptive
  WorkflowException; explicit non-blocking note on startWorkflow().
- HTTP Workflow API: local-development-only warning on curl examples plus
  service-account guidance for non-local environments.
- Architecture Considerations: async-by-default, bulk-trigger caps,
  transient workflows for high-volume triggers, recursive-trigger
  prevention, mechanism-stacking warning, initiator-as-service-user
  semantics.
- Triggering Mechanisms Summary: removed unclear "Classic UI Activate"
  row; replaced with explicit "Replication Trigger (6.5 LTS only)" row
  that points at Section 5.

programmatic-api.md:
- Service Class Pattern: getModel() null-guard with descriptive exception;
  inline async-execution note.
- Sling Scheduler example: hard cap (MAX_PER_RUN=500), per-iteration
  try/catch around startWorkflow (one bad payload no longer aborts the
  batch), getModel null check, named LOG instead of inline
  LoggerFactory.getLogger(getClass()), specific exception types instead
  of generic Exception, transient-workflow cross-reference for
  high-volume triggers.
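
The scheduler hardening above (fail fast on a missing model, cap each run, isolate per-payload failures, use a named logger and a specific exception type) can be sketched in plain Java. This is not the Granite API: `Starter`, `StartFailedException`, the `knownModels` set standing in for the getModel() lookup, and the use of `java.util.logging` instead of SLF4J are all assumptions for illustration; only MAX_PER_RUN=500 comes from the commit message.

```java
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

final class CappedBatchTrigger {
    private static final Logger LOG = Logger.getLogger(CappedBatchTrigger.class.getName());
    static final int MAX_PER_RUN = 500; // cap value quoted in the commit message

    /** Hypothetical checked exception for one failed start. */
    static final class StartFailedException extends Exception {
        StartFailedException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for starting one workflow; not the Granite WorkflowSession API. */
    interface Starter {
        void start(String modelPath, String payloadPath) throws StartFailedException;
    }

    /** Counts per run; deferred payloads are left for the next scheduled run. */
    record Result(int started, int failed, int deferred) {}

    private CappedBatchTrigger() {}

    static Result run(Set<String> knownModels, String modelPath, List<String> payloads, Starter starter) {
        // Mirrors the getModel() null-guard: fail fast with a descriptive message.
        if (!knownModels.contains(modelPath)) {
            throw new IllegalStateException("Workflow model not found: " + modelPath);
        }
        int limit = Math.min(payloads.size(), MAX_PER_RUN);
        int started = 0;
        int failed = 0;
        for (String payload : payloads.subList(0, limit)) {
            try {
                starter.start(modelPath, payload);
                started++;
            } catch (StartFailedException e) {
                // One bad payload is logged and skipped instead of aborting the batch.
                LOG.log(Level.WARNING, "Could not start workflow for " + payload, e);
                failed++;
            }
        }
        return new Result(started, failed, payloads.size() - limit);
    }
}
```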

triggering-mechanisms.md:
- Manage Publication payload: same cq:WorkflowContentPackage correction
  as SKILL.md.
- Service user requirement: corrected the false claim that
  SlingRepository.loginService() is deprecated. loginService() is the supported
  method; only loginAdministrative() is deprecated and must not be used.
- HTTP Workflow REST API: local-development-only warning on curl examples.

65-lts-guardrails.md (this skill's copy):
- Workflow Packages section: replaced primary-type check with the
  canonical adaptTo(ResourceCollection.class) pattern, matching the fix
  already applied in workflow-model-design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r AI-IDE consumers

UX review pass on the post-architecture-fix skill. The skill produced excellent
codegen but went quiet on the post-trigger conversation — "did it work?", "how
do I parse the response?", "can I cancel?", "why isn't it running?" all had
no canonical answer. These six additions close that loop.

SKILL.md:
- Triggering Mechanisms Summary: added a "Common Scenarios" sub-table that
  routes developer intent ("nightly batch job processes pending assets",
  "CI pipeline triggers review after deploy", etc.) directly to the right
  mechanism. Lifts the Decision Matrix from the reference into the
  first-read surface.
- Manage Publication: added the payload-shape pairing requirement — the
  selected workflow model must be designed for multi-page payloads, with
  PROCESS steps that adapt the payload to ResourceCollection and iterate.
  Prevents the "model + Manage Publication" combination that fails on the
  first step.
- HTTP Workflow API: added an explicit response-shape note. POST returns
  201 with the new instance path in the Location response header (capture
  this in CI scripts); 4xx/5xx with Sling JSON error body on failure;
  GET returns a JSON array; DELETE returns 200.
- HTTP Workflow API: added cancellation semantics — termination is
  irreversible, completed steps are not rolled back, in-flight execute()
  is abandoned, the instance becomes ABORTED and cannot be resumed.
  Prefer suspend/resume for business-critical workflows.
- Verifying the Trigger: new section with three concrete confirmation
  paths (UI / HTTP / Java). Closes the most common post-trigger question:
  "did it actually start?"
- When the trigger succeeds but the workflow doesn't progress: explicit
  failure-mode handoff. Most common cause is a missing WorkflowProcess
  registration (cross-ref to workflow-development Dependencies); for full
  diagnosis points at workflow-debugging. Closes the "I triggered something
  and it's broken" conversational loop.
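
The cancellation semantics can be summarized as a small state model. This is a hypothetical sketch of the rules described above, not an AEM or Granite type: SUSPENDED can return to RUNNING, while ABORTED, like COMPLETED, is terminal. That asymmetry is why suspend/resume is the safer choice for business-critical workflows.

```java
/** Hypothetical model of the instance states described above; not an AEM or Granite type. */
enum InstanceState {
    RUNNING, SUSPENDED, COMPLETED, ABORTED;

    /** Whether an instance in this state may move to {@code next}. */
    boolean canMoveTo(InstanceState next) {
        switch (this) {
            case RUNNING:
                return next == SUSPENDED || next == COMPLETED || next == ABORTED;
            case SUSPENDED:
                // Resume is possible; terminating from SUSPENDED is still irreversible.
                return next == RUNNING || next == ABORTED;
            default:
                return false; // COMPLETED and ABORTED are terminal: no resume, no rollback
        }
    }

    boolean isTerminal() {
        return this == COMPLETED || this == ABORTED;
    }
}
```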

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hitecture considerations

Add Audience, Variant Scope (with AEMaaCS stop-rule), Dependencies, Prerequisites,
Required Permissions, Common Scenarios (intent→pattern + when-not-to-use),
Architecture Considerations (glob narrowing, multi-event amplification, loop
prevention, transient workflows, lower-env discipline, mechanism stacking), and
Verifying the Launcher sections to SKILL.md. Document `transient` and
`noProcess` properties in launcher-config-reference. Flag `runModes` honoring as
unreliable on 6.5 LTS in both files and steer toward `config.author/` packaging.
Add a local-dev-only guardrail on the `curl -u admin:admin` debug example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ten routing

Add Audience, Variant Scope (with AEMaaCS stop-rule), Dependencies, and
Cross-Cutting Invariants (loop prevention, JMX safety, AEMaaCS stop-rule) to
SKILL.md. Flip routing default to debugging-first; load workflow-triaging only
when the user explicitly invokes a multi-instance / log-mining context. Drop
production-support routing rows that force IDE-LLM hallucination of
Splunk / multi-host / ticket context, and remove Patterns E and G; renumber
Pattern F to E. Fix Pattern A step 6 ("Sync via Package Manager" was
incorrect — split into the real Tools → Workflow → Models → Sync action and
the Maven autoInstallPackage iteration path). Flip Pattern C service-user
mapping to a dedicated-user-first recommendation with narrow ACLs. Extend
workflow-debugging reference loading to include runbooks and docs subdirs.
Add local-dev-only guardrail to quick-start-guide.md curl examples.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s all 7 skills

Strengthen the AEMaaCS stop-rule in workflow-debugging and workflow-triaging so
their Variant Scope blocks match the imperative "Not for AEM as a Cloud Service"
form already used by the dev cluster — IDE LLM consumers were getting weaker
guidance from the operational skills. Propagate the orchestrator's full
JMX-safety invariant ("Never recommend JMX remediation without confirming target
instance with the user") into the two skills that actually emit JMX commands.
Add a launcher-re-trigger loop-prevention guardrail (setUserData
"workflowmanager") to workflow-development Guardrails — the skill that emits
process-step code now carries the constraint at the point of code generation,
not just from the orchestrator. Add Audience and Dependencies blocks to
workflow-debugging, workflow-triaging, and workflow-development so all 7 skills
share the same opening structure. Fix workflow-orchestrator frontmatter
description (still said "spanning development and production support" after the
body was cleaned), and add a runModes-on-launchers reliability row to its
Guardrails Summary table. Add a "Routing back to dev skills" subsection to
workflow-debugging so diagnoses that conclude in code/model defects can route
forward to the right dev skill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p-prevention guardrail

The loop-prevention guardrail in workflow-orchestrator, workflow-launchers, and
workflow-development told the LLM to call
`session.getWorkspace().getObservationManager().setUserData("workflowmanager")`
on the `session` parameter — but in `WorkflowProcess.execute(WorkItem,
WorkflowSession, MetaDataMap)` that parameter is a `WorkflowSession`, which
does not expose `getWorkspace()`. An IDE LLM faithfully reproducing the
guardrail would emit code that does not compile. Update all three sites to
make the JCR-Session-vs-WorkflowSession distinction explicit and show the
`adaptTo(javax.jcr.Session.class)` step inline, plus a note that a service-user
`ResourceResolver` write path uses a different `Session` instance and must be
tagged separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ice variant — Phase 1

Replace the launchers debug curl, which sent admin:admin to an /etc endpoint,
with Tools → Workflow → Launchers UI guidance plus a local-AEMaaCS-SDK-only
fallback. Neither part of the old call is appropriate for Cloud Service:
production auth is IMS-based, and /etc/workflow/launcher.json is not the
canonical surface on AEMaaCS. Add an imperative safety guardrail above the OOTB launcher
overlay section warning never to disable dam_update_asset_* / dam_xmp_writeback
without confirming intent — disabling these silently breaks asset processing.
Flip the orchestrator service-user mapping (Guardrails Summary table and
Pattern C step 3) from "Always use workflow-process-service" to "Use a
dedicated service user with narrow ACLs" — reusing the OOTB privileged user as
an application sub-service violates least-privilege. Fix the
workflow-development Variant Scope claim that the bundle "goes into the ui.apps content
package" — the Java source lives in the core (or equivalent) Maven module and
the built bundle is wrapped by the all content package, not ui.apps. Add a
Tools → Workflow → Models → Sync step to orchestrator Pattern A so the
runtime path at /var/workflow/models/<id> matches design-time — without it the
engine cannot resolve the deployed model. Fix the workflow-debugging Reference
Loading Order entry, which pointed at a non-existent reference.md; it now lists
the debugging-index.md and runbooks paths that actually exist on disk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… cloud-service variant — Phase 2

Mirror the 6.5-lts suite-level review work for the AEMaaCS variant, with
AEMaaCS-specific framing throughout — the underlying tools and surfaces differ
(no Felix Console JMX in production, Cloud Manager pipeline-only deploy,
Developer Console for diagnostics, IMS auth, all-package wrapping) so the
guardrail wording is not a literal port of the 6.5-lts text.

Add Audience + Variant Scope (with reverse "Not for AEM 6.5 LTS" stop-rule) +
Dependencies blocks to all 7 cloud-service skills so the IDE LLM has the same
opening structure across both variants. Add Cross-Cutting Invariants to the
orchestrator (loop prevention with WorkflowSession-vs-JCR-Session
disambiguation, AEMaaCS-flavored JMX-safety: "no JMX in production — use Inbox
Retry / Purge Scheduler / Cloud Manager-driven config changes", and the 6.5-LTS
reverse stop-rule). Propagate the JMX-safety framing into workflow-debugging
and workflow-triaging where JMX commands are emitted, replacing the weaker
existing wording with imperative "Never recommend JMX-based remediation"
guardrails. Remove the Production Support routing rows that forced LLM
hallucination of Splunk / multi-host / ticket context and remove orchestrator
Pattern D ("Workflow errors on host X for past 4 hours"); rename Pattern E to
Pattern D. Flip the routing default from triaging-first to debugging-first;
load workflow-triaging only on explicit invocation. Bring workflow-development
Guardrails to parity with the 6.5-lts variant (PII/payload-content logging
guard, model-vs-Java co-authorship constraint, loop-prevention with
disambiguated JCR Session adapt step). Add an Architecture Considerations
section to workflow-launchers (glob narrowing, multi-event amplification, loop
prevention, transient workflows, lower-env discipline via run-mode-aware
folders, mechanism stacking) — adjusted for AEMaaCS auto-scaling cost
implications. Strengthen the runModes guidance: the property is not reliably
honored, and the canonical AEMaaCS pattern is run-mode-aware folder packaging
(config.author/ style). Fix the workflow-debugging file's broken
[reference.md](reference.md) link by listing the actual references that exist
on disk (debugging-index.md, runbooks/, error-patterns.md, mbeans.md). Add a
"Routing back to dev skills" closing subsection to workflow-debugging matching
the 6.5-lts variant. Add a runModes reliability row to the orchestrator
Guardrails Summary. Add a local-AEMaaCS-SDK-only guardrail above the HTTP
Workflow API curl examples in workflow-triggering, and flip its service-user
mapping recommendation from "workflow-process-service" to a dedicated user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ase 3

Update workflow-orchestrator frontmatter description from "spanning development
and production support" (which contradicted the cleaned body) to a routing-
focused line with the AEMaaCS-only / 6.5-LTS-stop-rule gate so the LLM's
skill-selection step sees the variant gate without loading the body. Add an
"author-tier only by default" note above the orchestrator's Quick Architecture
Recap so the LLM does not assume publish-tier workflow infrastructure on
AEMaaCS (publish is read-mostly and replication-driven; workflow execution
runs on author). Add a "cloud environments only" note above the Cloud Service
Production Support Constraints table that distinguishes cloud environments
from the local AEMaaCS SDK — the SDK has Felix Console with JMX, accepts
Package Manager uploads, supports admin:admin auth, and gives jstack access,
none of which apply to cloud, and conflating the two leads the LLM to suggest
local affordances against cloud. Sweep ECMA / Groovy / "ECMA (JavaScript)"
rule-language references to "ECMAScript (Rhino)" across workflow-model-design
SKILL.md, model-xml-reference, step-types-catalog, model-design-patterns,
workflow-development variables-and-metadata, and the five workflow-foundation
api-reference duplicates. Groovy was never the correct term; workflow rules
on AEMaaCS are evaluated by Rhino. Mirrors the 6.5-lts ac5e5f6 sweep so the
LLM emits the canonical wording.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rrectness defects

The same logical workflow-foundation/ directory was duplicated across 5 sub-skills
with 4 different SHAs of 65-lts-guardrails.md, 3 of quick-start-guide.md, and 2 of
jcr-paths-reference.md. The orchestrator copy (loaded first per its own SKILL.md)
carried a stale workflow-package detection pattern (cq:WorkflowContentPackage +
/etc/workflow/packages/) that an IDE LLM would faithfully emit, silently making
multi-page Manage-Publication payloads fall through the single-payload branch.

Consolidate to one canonical workflow-foundation/ under workflow-orchestrator/
referenced by relative path from each sub-skill. SHA drift is now structurally
impossible. Promote the corrected ResourceCollection adapter pattern, the local-dev
curl disclaimer, and the design-time model page wrapper template into the canonical
copies. Delete the four duplicate trees.

Also close three smaller IDE-LLM defects:
- condition-patterns.md multi-condition .content.xml example was missing the
  backslash-escaped commas FileVault requires; an LLM copying it produced a
  launcher that silently never matched. Fix and call out the failure mode.
- condition-patterns.md "Run Mode Patterns" section taught runModes="[author]"
  as primary, contradicting every SKILL.md's "unreliable on 6.5 LTS — package
  under config.author/" caveat. Add the same caveat.
- workflow-orchestrator Pattern C said "do not reuse workflow-process-service"
  while workflow-triggering correctly says it is the standard target. Align
  orchestrator with triggering — reuse OOTB unless the starter writes outside
  scope or compliance requires a narrower user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…orkflow skills

The "use OOTB workflow-process-service vs. dedicated service user"
guidance had no Adobe-doc backing and kept flipping across commits.
Drop the prescriptive lines from workflow-orchestrator constraint
table, Pattern C step 3, and workflow-triggering Variant Scope /
Required Permissions in both cloud-service and 6.5-lts variants.

Mechanical content (ACL list, ServiceUserMapper requirement,
"never admin credentials") is preserved where it appears in the
non-prescriptive Guardrails sections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member

@rombert rombert left a comment


@akankshajain18 - please ensure that the Adobe CLA bot identifies you as an employee. Then I can grant you write access and you can submit the PR from a branch of this repo so the extended checks will run.

akankshajain18 and others added 2 commits April 30, 2026 14:35
Replace the incorrect cq:WorkflowModel/cq:WorkflowNode/cq:WorkflowTransition
structure (runtime /var format) with the correct flow/parsys design-time format
that the AEM Workflow Model Editor actually reads and writes at /conf. Fixes four
reference files:

- model-xml-reference.md: fix cq:template to /libs/cq/workflow/templates/model,
  add cq:designPath, replace model/nodes/transitions with flow/parsys structure,
  rewrite pitfalls section, clarify variables as runtime-only, add Sync instruction
- step-types-catalog.md: rewrite all step snippets to nt:unstructured +
  sling:resourceType, add sling:resourceType table, document initiatorparticipant,
  remove {Boolean} type hints, remove transition XML (not in flow layer)
- model-design-patterns.md: fix Pattern 5 Task Manager XML node type, annotate
  Pattern 6 variables block as runtime-only
- SKILL.md: add Sync + test-instance verification to checklist, add /conf vs /var
  distinction note under storage table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Apply the same flow/parsys design-time format corrections as the 6.5 LTS fix,
preserving AEMaaCS-specific differences (Cloud Manager deploy, no /etc path,
EXTERNAL_PROCESS step type, cloud-service-guardrails references):

- model-xml-reference.md: was using cq:WorkflowModel as root element (runtime
  format) and wrong file path jcr:content/model/.content.xml; rewritten to
  correct cq:Page/flow/parsys design-time structure, fix cq:template to
  /libs/cq/workflow/templates/model, add cq:designPath, rewrite pitfalls,
  clarify variables as runtime-only, add /etc deprecation note
- step-types-catalog.md: rewrite all step snippets to nt:unstructured +
  sling:resourceType, add sling:resourceType table including EXTERNAL_PROCESS,
  document initiatorparticipant, remove {Boolean} type hints, remove transition
  XML, add AEMaaCS note on Activate/Deactivate Page
- SKILL.md: fix checklist step 5, add Sync + test-instance verification steps,
  fix Cloud Service Deployment path, add /conf vs /var distinction note
- model-design-patterns.md: annotate Pattern 6 variables block as runtime-only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

@trieloff trieloff left a comment


Dumping a 99-file PR without substantive description, motivation, or explanation makes the work for maintainers incredibly difficult. Improve the quality or close the PR.

@akankshajain18
Collaborator Author

> Dumping a 99-file PR without substantive description, motivation, or explanation makes the work for maintainers incredibly difficult. Improve the quality or close the PR.

@trieloff, the changes address gaps in the workflow skills. This PR is the initial work on the workflow skills; no other skill has been modified, and all file changes and improvements are confined to the workflow skills.
Because there are many files, I have not squashed the commits; each commit message describes in detail what was changed.
I will update the overall PR description.
Going forward, I will keep each PR to a single skill, get it approved, and then move on to the other workflow skills.

Please check the commit history for details.

@akankshajain18
Collaborator Author

> @akankshajain18 - please ensure that the Adobe CLA bot identifies you as an employee. Then I can grant you write access and you can submit the PR from a branch of this repo so the extended checks will run.

Update: the Adobe CLA bot now identifies me as an employee.

@akankshajain18 akankshajain18 requested review from rombert and trieloff May 4, 2026 16:52
…to cloud-service variant

Add Output Contract, Default Path Rule, Runtime Model Structure, and
Forbidden Patterns sections to the cloud-service workflow-model-design
skill — aligning it with the equivalent fixes applied to 6.5-lts in
the preceding commit.
@rombert
Member

rombert commented May 5, 2026

@akankshajain18 - now that you're part of this github org I have added you as a collaborator so you can push to branch of this repo and have the full CI checks run.

Not sure if you can reuse the same PR but please explicitly address Lars' comment from #108 (review) when you update.

Thanks!


3 participants