Processing Instruction data lost unless it looks like attributes #133

@vblz

Summary

The data (ProcInst.Inst) of processing instructions (xml.ProcInst) is dropped when it isn’t attribute-like.
Per XML 1.0 §2.6, PI data after the target is an arbitrary string (not attributes) and should be preserved verbatim.

Example

package main

import (
	"encoding/xml"
	"fmt"
	"strings"
	"github.com/antchfx/xmlquery"
)

const content = `<?ProcInstTag random string ?><?AnotherProcInst a="b"?><a/>`

func main() {
	doc, err := xmlquery.Parse(strings.NewReader(content))
	if err != nil { panic(err) }
	fmt.Println(doc.OutputXML(true))

	d := xml.NewDecoder(strings.NewReader(content))
	t, _ := d.Token()
	fmt.Printf("%#v\n", t)
	t, _ = d.Token()
	fmt.Printf("%#v\n", t)
}

Output

<?ProcInstTag?><?AnotherProcInst a="b"?><a></a>
xml.ProcInst{Target:"ProcInstTag", Inst:[]uint8{0x72, 0x61, 0x6e, 0x64, 0x6f, 0x6d, 0x20, 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x20}}
xml.ProcInst{Target:"AnotherProcInst", Inst:[]uint8{0x61, 0x3d, 0x22, 0x62, 0x22}}

• xmlquery drops the PI data for <?ProcInstTag random string ?>, serializing it as <?ProcInstTag?>.
• Go’s encoding/xml correctly exposes both Inst values.

Expected

Round-tripping should keep PI data intact:

<?ProcInstTag random string ?><?AnotherProcInst a="b"?><a></a>

The parsed Node is also expected to contain the value from ProcInst.Inst.
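For reference, the expected output above matches what a plain token round trip through Go's encoding/xml produces. The sketch below is only an illustration of the desired behaviour using the standard library; the helper name roundTrip is made up for this example and nothing here is xmlquery code.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// roundTrip (hypothetical helper) decodes src token by token and
// re-encodes every token unchanged with encoding/xml.
func roundTrip(src string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break // end of input
		}
		if err != nil {
			return "", err
		}
		// xml.ProcInst tokens carry Target and Inst; the encoder writes
		// Inst back as-is after the target.
		if err := enc.EncodeToken(tok); err != nil {
			return "", err
		}
	}
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := roundTrip(`<?ProcInstTag random string ?><?AnotherProcInst a="b"?><a/>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because the decoder hands over Inst as raw bytes and the encoder writes those bytes back, no attribute parsing is involved at any point.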

Why it matters

ProcInst.Inst content is arbitrary text; treating it as attributes causes silent data loss.

RCA

ProcInst.Inst seems to be retained only when it resembles attributes. The parser splits it on spaces and keeps only name=value pairs, so anything else is already gone by the time the node is serialized:

xmlquery/parse.go

Lines 337 to 348 in deb27cf

case xml.ProcInst: // Processing Instruction
	if p.prev.Type != DeclarationNode {
		p.level++
	}
	node := &Node{Type: DeclarationNode, Data: tok.Target, level: p.level, LineNumber: p.currentLine}
	pairs := strings.Split(string(tok.Inst), " ")
	for _, pair := range pairs {
		pair = strings.TrimSpace(pair)
		if i := strings.Index(pair, "="); i > 0 {
			AddAttr(node, pair[:i], strings.Trim(pair[i+1:], `"'`))
		}
	}
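The effect of that loop can be reproduced in isolation. The following is a minimal standalone sketch that copies the split-on-space and split-on-"=" logic from the excerpt into a hypothetical function, attrLikePairs, so the loss is visible without xmlquery; the function name and the return type are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// attrLikePairs (hypothetical) mirrors the loop from the excerpt:
// split on spaces, keep only chunks that contain "=" after position 0.
func attrLikePairs(inst string) [][2]string {
	var out [][2]string
	for _, pair := range strings.Split(inst, " ") {
		pair = strings.TrimSpace(pair)
		if i := strings.Index(pair, "="); i > 0 {
			out = append(out, [2]string{pair[:i], strings.Trim(pair[i+1:], `"'`)})
		}
	}
	return out
}

func main() {
	// Free-form PI data: no chunk contains "=", so nothing is kept.
	fmt.Println(len(attrLikePairs("random string ")))
	// Attribute-like PI data: one pair is recognised.
	fmt.Println(attrLikePairs(`a="b"`))
	// A quoted value containing a space is cut at the first space.
	fmt.Println(attrLikePairs(`title="hello world"`))
}
```

With this logic, "random string " yields no pairs at all, which matches the empty <?ProcInstTag?> in the output. As a side observation, a quoted value that contains a space would also be truncated, since the split happens before the quotes are considered.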
