-
Notifications
You must be signed in to change notification settings - Fork 17
goxpath needs a rewrite #6
Comments
subscribe. |
Been busy and still have no time to work on this, but finally came up with some realistic pseudocode that outlines some fixes to what I've complained about: package main
import "github.com/ChrisTrenkamp/xsel"
type FooStruct struct {
Foo string `xml:"string"`
}
func main() {
xReader, _ := xsel.XmlReader(os.Open("..."))
store, _ := xsel.InMemoryStore(xReader)
for result := range xsel.Select("//foo", store) {
f := &FooStruct{}
_ := xsel.UnmarshalXml(f, result)
}
//Each of these returns an error as well, but it should get the point across
var arrResult []xsel.Result = xsel.All(xsel.Select("//foo", store))
var strResult string = xsel.AsString(xsel.Select("//foo", store))
var numResult float64 = xsel.AsNum(xsel.Select("//foo", store))
} First things first, it'll be a complete rewrite. The only thing that'll be transferred over is all the test cases. Let's call it xsel; it's easier to type than goxpath and has a bit of a catchier name (pretty sure I would fail marketing 101). The first interface is the The second interface is the When submitting an XPath query ( A common situation I kept running into was unmarshalling the XML into a struct, which, to be honest, is probably what 95% of people using XPath really need. There will be a default implementation for reading it into xml-tagged structs. The last three function calls are simply helpers around the XPath query. |
Better 4 years late than never, I suppose: https://github.com/ChrisTrenkamp/xsel Here's a recap on what I wanted to accomplish: tree's are not streamedA streaming API was incredibly difficult to create, and difficult and awkward to use. This goal has been trashed. tree's are format-agnostic, but the core library depends specifically on XML elementsxsel's core XPath logic is disconnected from any direct XML dependencies. XML tropes such as namespaces and processing instructions are defined in the package, but are not directly tied to any XML libraries and are not required. tree's cannot be alteredxsel has two general interfaces, node's for defining the data points themselves, and cursor's for defining the parent-child relationships and position numbers. It mandates the order in which namespaces and attributes are created, and it mandates how position numbers are created, but nothing else beyond that. It should be possible to define your own interface for creating modifiable tree structures. The public API is uglyThe following is an example for defining a custom function's in xsel. Should be easier to read and follow. package main
import (
"bytes"
"fmt"
"github.com/ChrisTrenkamp/xsel/exec"
"github.com/ChrisTrenkamp/xsel/grammar"
"github.com/ChrisTrenkamp/xsel/node"
"github.com/ChrisTrenkamp/xsel/parser"
"github.com/ChrisTrenkamp/xsel/store"
)
func main() {
xml := `
<root>
<a>This is an element.</a>
<!-- This is a comment. -->
</root>
`
isComment := func(context exec.Context, args ...exec.Result) (exec.Result, error) {
nodeSet, isNodeSet := context.Result().(exec.NodeSet)
if !isNodeSet || len(nodeSet) == 0 {
return exec.Bool(false), nil
}
_, isComment := nodeSet[0].Node().(node.Comment)
return exec.Bool(isComment), nil
}
contextSettings := func(c *exec.ContextSettings) {
c.FunctionLibrary[exec.Name("", "is-comment")] = isComment
}
xpath := grammar.MustBuild(`//node()[is-comment()]`)
parser := parser.ReadXml(bytes.NewBufferString(xml))
cursor, _ := store.CreateInMemory(parser)
result, _ := exec.Exec(cursor, &xpath, contextSettings)
fmt.Println(result) // This is a comment.
} There are also other improvements, such as separating the node information from the parent-child/position relationship's, making the API cleaner and easier to extend. It also uses a parser generator. It was a fun learning experience to hand-write my own lexer/parser, but it was also a mess and a nightmare to maintain. |
goxpath was mainly written for myself because the tools I used were either too complicated and cumbersome or insufficient. I made a lot of decisions early on that I now regret. Some of them were fixable, but others are inherit to its design and cannot be fixed without gutting huge portions of the library. I don't use goxpath as much as I was about half a year ago (and the commit history reflects that), but whenever I do, it has a bunch of flaws that bug the hell out of me. Here's a short list of ones that desperately need to be addressed:
tree's are not streamed
Anyone who has used goxpath on big XML files will notice this. My job is to work with many, many small XML files, so it was never really a concern, but as I start working with bigger XML files, this is starting to turn its ugly head.
The problem, however, is the core Go XML library does not have the ability to read previous elements, so if the tree automatically releases memory in the background, there's no way to get it back without re-opening the file.
tree's are format-agnostic, but the core library depends specifically on XML elements
I should have thought the core library out more thoroughly before putting in explicit XML dependencies. I should have made the data points on tree node's an interface so other format's can just define their own internal data points without making some ugly translation into XML.
tree's cannot be altered
It's becoming more apparent to me that people want this library for editing tree's, not just querying them. It can be done, but the default xmltree package was not made with editing in mind.
The public API is ugly
I focused a lot of the internal library and didn't pay much attention to the public API. It's not as fluent as I would like it to be, and it needs to be more elegant.
The text was updated successfully, but these errors were encountered: