Merged
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

+### Added
+
+- `page get --as raw` emits the page body's untouched source — storage-format
+XHTML, or server-rendered HTML with `--body-format view` — with no
+markdown/text rendering. Use it to inspect macros, round-trip-edit a page or
+debug. It requires `--scope full`.
+- `page get` now reports a `render_notes` field when markdown/text rendering
+drops or degrades content (macros without a native rendering, images shown
+as placeholders). Rendering loss was previously silent; when `render_notes`
+is present, re-read with `--as raw` for the full source.
+
## [0.3.1] - 2026-05-19

### Fixed
11 changes: 9 additions & 2 deletions docs/cli/index.html
@@ -635,12 +635,16 @@ <h2>confluence-cli page get</h2>
outline list the headings (start here when the structure is unknown)
section one section, identified by --section &lt;id&gt; from the outline
keyword blocks matching --keyword, with their heading for context
-full the entire body (default)</p>
+full the entire body (default)
+
+Rendering to markdown/text drops content it cannot represent (macros,
+images); when that happens the result carries a render_notes field.
+Use --as raw to get the untouched source body instead.</p>
<h3>Options</h3>
<table>
<thead><tr><th>Flag</th><th>Default</th><th>Description</th></tr></thead>
<tbody>
-<tr><td><code>--as</code></td><td><code>markdown</code></td><td>render body as markdown or text</td></tr>
+<tr><td><code>--as</code></td><td><code>markdown</code></td><td>output form: markdown, text or raw (unrendered source)</td></tr>
<tr><td><code>--body-format</code></td><td><code>storage</code></td><td>source body format: storage or view</td></tr>
<tr><td><code>--detail</code></td><td><code>simple</code></td><td>block detail: simple, with-ids or full</td></tr>
<tr><td><code>--keyword</code></td><td></td><td>keyword (with --scope keyword)</td></tr>
@@ -656,6 +660,9 @@ <h3>Examples</h3><pre><span class="c"> # render the whole page as Markdown</spa
confluence-cli page get 123456 --scope outline
confluence-cli page get 123456 --scope section --section sec-2

+<span class="c"> # get the untouched storage XHTML (macros and all)</span>
+confluence-cli page get 123456 --as raw
+
<span class="c"> # a page URL works in place of an ID</span>
confluence-cli page get https://wiki.example.com/pages/viewpage.action?pageId=123456</pre>
</section>
4 changes: 4 additions & 0 deletions docs/technical-design.md
@@ -168,6 +168,10 @@ CI checks that it does not drift); this section no longer maintains a parallel list of commands, so as to rule out

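Prompts from the interactive wizards (`config init`, `auth login`) go to stderr; errors go only to stderr.

+Read side: the `--as markdown|text` rendering of `page get` is lossy (unsupported macros are dropped, images
+are degraded to placeholders). The loss is no longer silent: when rendering drops content, the output carries a `render_notes` field, while `--as raw`
+returns the unrendered body source verbatim (storage XHTML or view HTML) as the lossless way out.
+

### 6.2 Errors

Errors are written to **stderr** as JSON: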
6 changes: 6 additions & 0 deletions internal/app/app_test.go
@@ -35,6 +35,12 @@ func mockConfluence(t *testing.T) *httptest.Server {
"body":{"storage":{"value":"<h1>Hi</h1><p>body text</p>","representation":"storage"}},
"_links":{"webui":"/display/ENG/Welcome"}}`))
})
+mux.HandleFunc("/rest/api/content/790", func(w http.ResponseWriter, r *http.Request) {
+w.Write([]byte(`{"id":"790","type":"page","status":"current","title":"Macro Page",
+"space":{"key":"ENG"},"version":{"number":1},
+"body":{"storage":{"value":"<p>before</p><ac:structured-macro ac:name=\"view-file\"><ac:parameter ac:name=\"name\"><ri:attachment ri:filename=\"resume.pdf\"/></ac:parameter></ac:structured-macro>","representation":"storage"}},
+"_links":{"webui":"/display/ENG/Macro"}}`))
+})
mux.HandleFunc("/rest/api/content/404", func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusNotFound)
w.Write([]byte(`{"message":"No content found"}`))
47 changes: 34 additions & 13 deletions internal/app/page.go
@@ -27,6 +27,9 @@ type pageOutput struct {
Body string `json:"body,omitempty"`
ScopeApplied string `json:"scope_applied,omitempty"`
Truncated bool `json:"truncated,omitempty"`
+// RenderNotes lists content the markdown/text renderer could not represent
+// (e.g. unrendered macros). When non-empty, re-read with --as raw.
+RenderNotes []string `json:"render_notes,omitempty"`
}

// dryRunOutput is the result shape emitted for a --dry-run write.
@@ -154,12 +157,17 @@ func newPageGetCmd(s *appState) *cobra.Command {
" outline list the headings (start here when the structure is unknown)\n" +
" section one section, identified by --section <id> from the outline\n" +
" keyword blocks matching --keyword, with their heading for context\n" +
-" full the entire body (default)",
+" full the entire body (default)\n\n" +
+"Rendering to markdown/text drops content it cannot represent (macros,\n" +
+"images); when that happens the result carries a render_notes field.\n" +
+"Use --as raw to get the untouched source body instead.",
Example: " # render the whole page as Markdown\n" +
" confluence-cli page get 123456\n\n" +
" # list the headings, then read just one section\n" +
" confluence-cli page get 123456 --scope outline\n" +
" confluence-cli page get 123456 --scope section --section sec-2\n\n" +
+" # get the untouched storage XHTML (macros and all)\n" +
+" confluence-cli page get 123456 --as raw\n\n" +
" # a page URL works in place of an ID\n" +
" confluence-cli page get https://wiki.example.com/pages/viewpage.action?pageId=123456",
Args: cobra.ExactArgs(1),
@@ -187,17 +195,30 @@ func newPageGetCmd(s *appState) *cobra.Command {
Version: page.Version, Ancestors: page.Ancestors,
}
if !noBody && page.Body != nil {
-rendered, err := render.Render(page.Body.Value, render.Options{
-Scope: scope, Detail: detail, As: as,
-SectionID: section, Keyword: keyword,
-})
-if err != nil {
-return err
+if as == "raw" {
+// raw emits the body exactly as fetched, with no rendering;
+// slicing the unparsed source is not supported.
+if scope != render.ScopeFull {
+return cerrors.New(cerrors.CategoryUsage, "RAW_NEEDS_FULL_SCOPE",
+"--as raw supports only --scope full").
+WithHint("Drop --scope, or drop --as raw to read a section.")
+}
+out.Body = page.Body.Value
+out.ScopeApplied = "raw"
+} else {
+rendered, err := render.Render(page.Body.Value, render.Options{
+Scope: scope, Detail: detail, As: as,
+SectionID: section, Keyword: keyword,
+})
+if err != nil {
+return err
+}
+out.Outline = rendered.Outline
+out.Body = rendered.Body
+out.ScopeApplied = rendered.ScopeApplied
+out.Truncated = rendered.Truncated
+out.RenderNotes = rendered.Notes
}
-out.Outline = rendered.Outline
-out.Body = rendered.Body
-out.ScopeApplied = rendered.ScopeApplied
-out.Truncated = rendered.Truncated
}
return s.emit(out)
},
@@ -208,12 +229,12 @@ func newPageGetCmd(s *appState) *cobra.Command {
f.StringVar(&scope, "scope", "full", "read scope: full, outline, section or keyword")
f.StringVar(&section, "section", "", "section ID (with --scope section)")
f.StringVar(&keyword, "keyword", "", "keyword (with --scope keyword)")
-f.StringVar(&as, "as", "markdown", "render body as markdown or text")
+f.StringVar(&as, "as", "markdown", "output form: markdown, text or raw (unrendered source)")
f.BoolVar(&noBody, "no-body", false, "fetch metadata only, skip the body")
enumComplete(cmd, "body-format", "storage", "view")
enumComplete(cmd, "detail", "simple", "with-ids", "full")
enumComplete(cmd, "scope", "full", "outline", "section", "keyword")
-enumComplete(cmd, "as", "markdown", "text")
+enumComplete(cmd, "as", "markdown", "text", "raw")
return cmd
}

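The `omitempty` tag on `RenderNotes` is what keeps `render_notes` out of the JSON for a faithful rendering and for `--as raw`, whose branch never sets the field (the property `TestCmdPageGetRaw` checks). A minimal sketch with the standard `encoding/json` package; `pageOutputSketch` and `encode` are hypothetical names, and the struct copies only the four `pageOutput` fields visible in this diff:

```go
package main

import "encoding/json"

// pageOutputSketch is a hypothetical copy of the tail of pageOutput from this
// diff; the metadata fields (id, title, ...) are left out for brevity.
type pageOutputSketch struct {
	Body         string   `json:"body,omitempty"`
	ScopeApplied string   `json:"scope_applied,omitempty"`
	Truncated    bool     `json:"truncated,omitempty"`
	RenderNotes  []string `json:"render_notes,omitempty"`
}

// encode marshals o with plain json.Marshal. Marshal cannot fail for this
// struct (strings, a bool and a string slice), so the error is ignored.
func encode(o pageOutputSketch) string {
	b, _ := json.Marshal(o)
	return string(b)
}
```

Both a nil slice and an empty `[]string{}` are omitted. Note also that `json.Marshal` escapes `<`, `>` and `&` as `\u003c`-style sequences by default, so raw XHTML bodies may look escaped on the wire while decoding back to the identical string; whether the CLI's `emit` turns that escaping off is not shown in this diff.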
45 changes: 45 additions & 0 deletions internal/app/writes_test.go
@@ -351,6 +351,51 @@ func TestCmdSearchListEnvelope(t *testing.T) {
}
}

+func TestCmdPageGetRenderNotes(t *testing.T) {
+srv := mockConfluence(t)
+out, err := runCLI(t, srv, "page", "get", "790")
+if err != nil {
+t.Fatal(err)
+}
+var got map[string]any
+json.Unmarshal([]byte(out), &got)
+notes, _ := got["render_notes"].([]any)
+if len(notes) == 0 {
+t.Fatalf("page with a view-file macro should report render_notes:\n%s", out)
+}
+if first, _ := notes[0].(string); !strings.Contains(first, "view-file") {
+t.Errorf("render_notes should name the macro: %v", notes)
+}
+}
+
+func TestCmdPageGetRaw(t *testing.T) {
+srv := mockConfluence(t)
+out, err := runCLI(t, srv, "page", "get", "790", "--as", "raw")
+if err != nil {
+t.Fatal(err)
+}
+var got map[string]any
+json.Unmarshal([]byte(out), &got)
+body, _ := got["body"].(string)
+// raw emits the storage source untouched — the macro tag must survive.
+if !strings.Contains(body, "<ac:structured-macro") {
+t.Errorf("--as raw should return the unrendered source:\n%s", body)
+}
+if got["scope_applied"] != "raw" {
+t.Errorf("scope_applied = %v, want raw", got["scope_applied"])
+}
+if _, leaked := got["render_notes"]; leaked {
+t.Errorf("raw output renders nothing, so it must carry no render_notes")
+}
+}
+
+func TestCmdPageGetRawRejectsPartialScope(t *testing.T) {
+srv := mockConfluence(t)
+if _, err := runCLI(t, srv, "page", "get", "790", "--as", "raw", "--scope", "outline"); err == nil {
+t.Fatal("expected an error: --as raw supports only --scope full")
+}
+}
+
func TestCmdWhoami(t *testing.T) {
srv := mockConfluence(t)
out, err := runCLI(t, srv, "whoami")
44 changes: 39 additions & 5 deletions internal/render/parser.go
@@ -32,23 +32,57 @@ type Block struct {
SectionID string
}

-// parse converts storage-format XHTML into a flat slice of Blocks.
-func parse(storage string) []Block {
+// parse converts storage-format XHTML into a flat slice of Blocks. It also
+// returns notes describing content that markdown/text rendering cannot
+// faithfully represent (see lossNotes).
+func parse(storage string) ([]Block, []string) {
// Confluence wraps macro code in CDATA, which the HTML parser would drop;
// unwrap it so the inner text survives as text nodes.
src := strings.NewReplacer("<![CDATA[", "", "]]>", "").Replace(storage)

root, err := html.Parse(strings.NewReader("<html><body>" + src + "</body></html>"))
if err != nil {
-return []Block{{Kind: KindPara, Text: strings.TrimSpace(stripTags(storage))}}
+return []Block{{Kind: KindPara, Text: strings.TrimSpace(stripTags(storage))}}, nil
}
body := findBody(root)
if body == nil {
-return nil
+return nil, nil
}
var blocks []Block
walkBlocks(body, &blocks)
-return blocks
+return blocks, lossNotes(body)
}
+
+// lossNotes walks the parsed tree and reports content that markdown/text
+// rendering drops or degrades: structured macros without a native rendering,
+// and images (shown only as a placeholder). Each kind is reported once.
+func lossNotes(root *html.Node) []string {
+seen := map[string]bool{}
+var notes []string
+var walk func(*html.Node)
+walk = func(n *html.Node) {
+if n.Type == html.ElementNode {
+switch strings.ToLower(n.Data) {
+case "ac:structured-macro":
+name := attrNS(n, "name")
+// code/noformat macros render losslessly as code blocks.
+if name != "" && name != "code" && name != "noformat" && !seen["macro:"+name] {
+seen["macro:"+name] = true
+notes = append(notes, "unrendered macro: "+name+" (use --as raw to see the source)")
+}
+case "ac:image":
+if !seen["image"] {
+seen["image"] = true
+notes = append(notes, "an image is shown only as a placeholder (use --as raw to see the source)")
+}
+}
+}
+for c := n.FirstChild; c != nil; c = c.NextSibling {
+walk(c)
+}
+}
+walk(root)
+return notes
+}

func findBody(n *html.Node) *html.Node {
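`lossNotes` relies on `golang.org/x/net/html` and on an `attrNS` helper this diff does not show. To experiment with the detection rules outside the repository, here is a stdlib-only sketch that swaps in a different technique, streaming tokens with `encoding/xml`. `storageLossNotes` and `attr` are hypothetical names, the note texts are shortened, and it assumes a page's storage XHTML is close enough to XML for a non-strict decoder, which may not hold for every page:

```go
package main

import (
	"encoding/xml"
	"errors"
	"io"
	"strings"
)

// storageLossNotes is a hypothetical stdlib-only variant of lossNotes (not
// part of this PR). It reports each lossy construct once, in document order.
func storageLossNotes(storage string) ([]string, error) {
	// Wrap the fragment in a single root. encoding/xml keeps an undeclared
	// prefix such as "ac" verbatim in Name.Space.
	d := xml.NewDecoder(strings.NewReader("<root>" + storage + "</root>"))
	d.Strict = false                // tolerate HTML-isms found in real pages
	d.Entity = xml.HTMLEntity       // resolve &nbsp; and other HTML entities
	d.AutoClose = xml.HTMLAutoClose // accept unclosed <br>, <img>, ...

	seen := map[string]bool{}
	var notes []string
	for {
		tok, err := d.Token()
		if errors.Is(err, io.EOF) {
			return notes, nil
		}
		if err != nil {
			return notes, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Space != "ac" {
			continue
		}
		switch start.Name.Local {
		case "structured-macro":
			name := attr(start, "ac", "name")
			// code/noformat render losslessly as code blocks.
			if name != "" && name != "code" && name != "noformat" && !seen["macro:"+name] {
				seen["macro:"+name] = true
				notes = append(notes, "unrendered macro: "+name)
			}
		case "image":
			if !seen["image"] {
				seen["image"] = true
				notes = append(notes, "an image is shown only as a placeholder")
			}
		}
	}
}

// attr returns the value of the space:local attribute of e, or "".
func attr(e xml.StartElement, space, local string) string {
	for _, a := range e.Attr {
		if a.Name.Space == space && a.Name.Local == local {
			return a.Value
		}
	}
	return ""
}
```

The deduplication mirrors `lossNotes`: the `seen` map keys on `macro:<name>` and `image`, so a page with ten `info` macros yields a single note.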
8 changes: 6 additions & 2 deletions internal/render/render.go
@@ -64,16 +64,20 @@ type Rendered struct {
Body string `json:"body"`
ScopeApplied string `json:"scope_applied"`
Truncated bool `json:"truncated"`
+// Notes lists content the renderer could not represent (macros without a
+// native rendering, images shown as placeholders). It is empty when the
+// markdown/text output is a faithful representation of the source.
+Notes []string `json:"notes,omitempty"`
}

// Render parses storage-format XHTML and renders it according to opt.
func Render(storage string, opt Options) (Rendered, error) {
opt = opt.withDefaults()
-blocks := parse(storage)
+blocks, notes := parse(storage)
assignSections(blocks)
outline := buildOutline(blocks)

-result := Rendered{Outline: outline, ScopeApplied: opt.Scope}
+result := Rendered{Outline: outline, ScopeApplied: opt.Scope, Notes: notes}

switch opt.Scope {
case ScopeFull:
31 changes: 31 additions & 0 deletions internal/render/render_test.go
@@ -167,3 +167,34 @@ func TestRenderLink(t *testing.T) {
t.Errorf("link not rendered:\n%s", got.Body)
}
}
+
+func TestRenderNotesReportsDroppedMacro(t *testing.T) {
+t.Parallel()
+storage := `<p>intro</p>` +
+`<ac:structured-macro ac:name="view-file"><ac:parameter ac:name="name">` +
+`<ri:attachment ri:filename="resume.pdf"/></ac:parameter></ac:structured-macro>` +
+`<p><ac:image><ri:attachment ri:filename="diagram.png"/></ac:image></p>`
+got, err := Render(storage, Options{Scope: ScopeFull})
+if err != nil {
+t.Fatal(err)
+}
+joined := strings.Join(got.Notes, "\n")
+if !strings.Contains(joined, "view-file") {
+t.Errorf("notes should report the dropped view-file macro: %v", got.Notes)
+}
+if !strings.Contains(joined, "image") {
+t.Errorf("notes should report the placeholdered image: %v", got.Notes)
+}
+}
+
+func TestRenderNotesEmptyForPlainPage(t *testing.T) {
+t.Parallel()
+// Headings, paragraphs, lists and code macros all render losslessly.
+got, err := Render(sample, Options{Scope: ScopeFull})
+if err != nil {
+t.Fatal(err)
+}
+if len(got.Notes) != 0 {
+t.Errorf("a faithfully rendered page should carry no notes: %v", got.Notes)
+}
+}
35 changes: 29 additions & 6 deletions skills/confluence/references/reading-pages.md
@@ -56,16 +56,39 @@ Returns each block containing the term plus its nearest heading for context.

## Output syntax

-`--as markdown` (default) renders headings, lists, code and tables as Markdown.
-`--as text` produces plain text. `--no-body` fetches metadata only.
-`--body-format storage|view` selects the source representation (default
-`storage`).
+`--as` controls the output form:
+
+| `--as` | output |
+|--------|--------|
+| `markdown` (default) | headings, lists, code, tables rendered as Markdown |
+| `text` | plain text |
+| `raw` | the body's **untouched source** — no rendering (requires `--scope full`) |
+
+`--no-body` fetches metadata only. `--body-format storage|view` selects the
+source representation to fetch (default `storage`).
+
+## Rendering loss — macros and images
+
+`markdown` / `text` rendering cannot represent every Confluence construct:
+macros without a native rendering (e.g. `view-file`) are dropped, and images
+become a `[image]` placeholder. When that happens `page get` reports a
+**`render_notes`** array naming what was lost.
+
+**If you see `render_notes`, the rendered `body` is incomplete.** Re-read the
+page with `--as raw` to get the exact storage XHTML — macros and all — e.g. to
+verify an embedded file or to round-trip-edit the page.
+
+```bash
+confluence-cli page get 12345 # render_notes appears if content was dropped
+confluence-cli page get 12345 --as raw # the full, unrendered storage source
+```

## Result shape

`page get` returns: `id`, `title`, `space_key`, `status`, `url`, `version`,
-`ancestors`, and — when a body was fetched — `outline`, `body`, `scope_applied`
-and `truncated`. A `truncated: true` means the scope omitted part of the page.
+`ancestors`, and — when a body was fetched — `outline`, `body`, `scope_applied`,
+`truncated` and (when rendering dropped content) `render_notes`. A
+`truncated: true` means the scope omitted part of the page.

## Browsing the page tree

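The read-then-fallback workflow described in `reading-pages.md` can be scripted. A minimal Go sketch, assuming `confluence-cli` is on `PATH` and that `page get` prints its JSON result on stdout; `runner`, `execRunner`, `pageResult`, `getPage` and `readPage` are hypothetical names, and the runner is injected so the logic can be exercised without a Confluence server:

```go
package main

import (
	"encoding/json"
	"os/exec"
)

// runner runs confluence-cli with args and returns its stdout (hypothetical).
type runner func(args ...string) ([]byte, error)

// execRunner runs the real binary; it assumes confluence-cli is on PATH and
// prints its JSON result on stdout.
func execRunner(args ...string) ([]byte, error) {
	return exec.Command("confluence-cli", args...).Output()
}

// pageResult holds the two result fields this sketch needs.
type pageResult struct {
	Body        string   `json:"body"`
	RenderNotes []string `json:"render_notes"`
}

// getPage runs `page get <id>` plus any extra flags and decodes the result.
func getPage(run runner, id string, flags ...string) (pageResult, error) {
	var res pageResult
	out, err := run(append([]string{"page", "get", id}, flags...)...)
	if err != nil {
		return res, err
	}
	err = json.Unmarshal(out, &res)
	return res, err
}

// readPage reads a page as Markdown and re-reads it with --as raw only when
// render_notes reports that the rendering dropped content. raw stays empty
// when the Markdown body is faithful.
func readPage(run runner, id string) (rendered, raw string, notes []string, err error) {
	md, err := getPage(run, id)
	if err != nil || len(md.RenderNotes) == 0 {
		return md.Body, "", md.RenderNotes, err
	}
	// No --scope flag: --as raw accepts only the default scope, full.
	src, err := getPage(run, id, "--as", "raw")
	if err != nil {
		return md.Body, "", md.RenderNotes, err
	}
	return md.Body, src.Body, md.RenderNotes, nil
}
```

Calling `readPage(execRunner, "12345")` performs the second command from the bash example only when the first result carries `render_notes`, so faithful pages cost a single request.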