Skip to content
/ go-exml Public
forked from lucsky/go-exml

An event based XML parsing API for Go

License

Notifications You must be signed in to change notification settings

akavel/go-exml

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

go-exml Build Status

The go-exml package provides an intuitive event based XML parsing API which sits on top of a standard Go encoding/xml/Decoder, greatly simplifying the parsing code while retaining the raw speed and low memory overhead of the underlying stream engine, regardless of the size of the input. The module takes care of the complex tasks of maintaining contexts between event handlers allowing you to concentrate on dealing with the actual structure of the XML document.

Installation

go get github.com/lucsky/go-exml

or

go get http://gopkg.in/lucsky/go-exml.v1

Usage

The best way to illustrate how go-exml makes parsing very easy is to look at actual examples. Consider the following contrived sample document:

<?xml version="1.0"?>
<address-book name="homies">
    <contact>
        <first-name>Tim</first-name>
        <last-name>Cook</last-name>
        <address>Cupertino</address>
    </contact>
    <contact>
        <first-name>Steve</first-name>
        <last-name>Ballmer</last-name>
        <address>Redmond</address>
    </contact>
    <contact>
        <first-name>Mark</first-name>
        <last-name>Zuckerberg</last-name>
        <address>Menlo Park</address>
    </contact>
</address-book>

Here is a way to parse it into an array of contact objects using go-exml:

package main

import (
    "fmt"
    "os"

    "github.com/lucsky/go-exml"
)

type AddressBook struct {
    Name     string
    Contacts []*Contact
}

type Contact struct {
    FirstName string
    LastName  string
    Address   string
}

func main() {
    reader, _ := os.Open("example.xml")
    defer reader.Close()

    addressBook := AddressBook{}
    decoder := exml.NewDecoder(reader)

    decoder.On("address-book", func(attrs exml.Attrs) {
        addressBook.Name, _ = attrs.Get("name")

        decoder.On("contact", func(attrs exml.Attrs) {
            contact := &Contact{}
            addressBook.Contacts = append(addressBook.Contacts, contact)

            decoder.On("first-name", func(attrs exml.Attrs) {
                decoder.On("$text", func(text exml.CharData) {
                    contact.FirstName = string(text)
                })
            })

            decoder.On("last-name", func(attrs exml.Attrs) {
                decoder.On("$text", func(text exml.CharData) {
                    contact.LastName = string(text)
                })
            })

            decoder.On("address", func(attrs exml.Attrs) {
                decoder.On("$text", func(text exml.CharData) {
                    contact.Address = string(text)
                })
            })
        })
    })

    decoder.Run()

    fmt.Printf("Address book: %s\n", addressBook.Name)
    for _, c := range addressBook.Contacts {
        fmt.Printf("- %s %s @ %s\n", c.FirstName, c.LastName, c.Address)
    }
}

To reduce the amount and depth of event callbacks that you have to write, go-exml provides stacked events. Here's how you can simplify the last example using stacked events:

...
...

contact := &Contact{}

decoder.On("first-name/$text", func(text exml.CharData) {
    contact.FirstName = string(text)
})

decoder.On("last-name/$text", func(text exml.CharData) {
    contact.LastName = string(text)
})

decoder.On("address/$text", func(text exml.CharData) {
    contact.Address = string(text)
})

...
...

Finally, since using nodes text content to initialize struct fields is a pretty frequent task, go-exml provides a shortcut to make it shorter to write. Let's revisit the previous example and use this shortcut:

...
...

contact := &Contact{}
decoder.On("first-name/$text", decoder.Assign(&contact.FirstName))
decoder.On("last-name/$text", decoder.Assign(&contact.LastName))
decoder.On("address/$text", decoder.Assign(&contact.Address))

...
...

Another shortcut allows to accumulate text content from various nodes to a single slice:

...
...

info := []string{}
decoder.On("first-name/$text", decoder.Append(&info))
decoder.On("last-name/$text", decoder.Append(&info))
decoder.On("address/$text", decoder.Append(&info))

...
...

Benchmarks

The included benchmarks show that go-exml can be massively faster than standard unmarshaling and the difference would most likely be even greater for bigger inputs.

% go test -bench . -benchmem
OK: 16 passed
PASS
Benchmark_UnmarshalSimple      50000         55643 ns/op        6141 B/op        128 allocs/op
Benchmark_UnmarshalText       100000         23253 ns/op        3522 B/op         64 allocs/op
Benchmark_UnmarshalCDATA      100000         24455 ns/op        3611 B/op         64 allocs/op
Benchmark_UnmarshalMixed      100000         29596 ns/op        4120 B/op         70 allocs/op
Benchmark_DecodeSimple       5000000           375 ns/op          57 B/op          3 allocs/op
Benchmark_DecodeText         5000000           541 ns/op         123 B/op          5 allocs/op
Benchmark_DecodeCDATA        5000000           541 ns/op         123 B/op          5 allocs/op
Benchmark_DecodeMixed        5000000           541 ns/op         123 B/op          5 allocs/op
ok      github.com/lucsky/go-exml   23.966s

License

Code is under the BSD 2 Clause (NetBSD) license.

About

An event based XML parsing API for Go

Resources

License

Stars

Watchers

Forks

Packages

No packages published