
breaking: varint encoding for column index and field lengths #314

Open
@michaelkirk

Currently, column indexes are u16 and field lengths (for String, Binary, etc.) are u32. In practice I expect column indexes to almost always fit in a 1-byte varint, and field lengths typically in 3 bytes (if not 2).
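For reference, here is a minimal sketch of unsigned LEB128 varint encoding (the scheme protobuf uses), assuming the format would adopt something like it; the prototype branch may differ in details. Each byte carries 7 payload bits plus a continuation bit, so values up to 127 take 1 byte, up to 16,383 take 2, and up to 2,097,151 take 3. `encode_varint` and `decode_varint` are illustrative names, not part of any FlatGeobuf API.

```rust
/// Append `value` to `out` as an unsigned LEB128 varint:
/// 7 payload bits per byte, high bit set on every byte except the last.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode an unsigned LEB128 varint from the front of `buf`, returning
/// (value, bytes consumed), or None if truncated or longer than 10 bytes.
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate() {
        if i >= 10 {
            return None;
        }
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

fn main() {
    // Round-trip a few boundary values and report the encoded size.
    for &v in [3u64, 127, 128, 16_383, 16_384, 2_097_151, 65_535].iter() {
        let mut buf = Vec::new();
        encode_varint(v, &mut buf);
        let (decoded, used) = decode_varint(&buf).expect("valid varint");
        assert_eq!(decoded, v);
        println!("{} -> {} byte(s)", v, used);
    }
}
```

One trade-off: the worst case grows slightly. A column index near the u16 maximum (65,535) needs 3 bytes instead of 2, and a length near the u32 maximum needs 5 bytes instead of 4.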

The properties data is already not randomly accessible; it must be processed serially, so switching to variable-length headers loses no functionality there.
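Because properties are already read front to back, a variable-length header only means the reader advances a cursor by however many bytes each varint used. The sketch below assumes a simplified, hypothetical layout in which every property is a string: a varint column index, a varint byte length, then the UTF-8 bytes. Real FlatGeobuf columns are typed, and fixed-size types would carry no length prefix; `read_string_properties` is an illustrative name.

```rust
/// Decode an unsigned LEB128 varint from the front of `buf`, returning
/// (value, bytes consumed), or None if truncated or longer than 10 bytes.
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate() {
        if i >= 10 {
            return None;
        }
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Walk a properties buffer serially. Hypothetical layout: each property is
/// a varint column index, a varint byte length, then that many UTF-8 bytes.
/// Returns None on truncated or malformed input.
fn read_string_properties(buf: &[u8]) -> Option<Vec<(u64, String)>> {
    let mut props = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let (column, n) = decode_varint(&buf[pos..])?;
        pos += n;
        let (len, n) = decode_varint(&buf[pos..])?;
        pos += n;
        let end = pos.checked_add(usize::try_from(len).ok()?)?;
        let bytes = buf.get(pos..end)?;
        props.push((column, String::from_utf8(bytes.to_vec()).ok()?));
        pos = end;
    }
    Some(props)
}

fn main() {
    // Column 0 = "abc", column 2 = "x": each property header is 2 bytes.
    let buf = [0x00, 0x03, b'a', b'b', b'c', 0x02, 0x01, b'x'];
    println!("{:?}", read_string_properties(&buf));
}
```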

This would be a major breaking change, so I don't expect it to be adopted anytime soon, but if you end up making a breaking format release in #81, consider bundling this change with it.

I made a prototype here: https://github.com/michaelkirk/flatgeobuf/tree/mkirk/varint

I was working with OpenAddresses data, which consists mostly of point geometries with short string columns. Using varints for column indexes and field lengths produced a file 85% the size of the original (15% smaller).
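Most of the saving comes from per-property overhead. With the current layout, every String property carries a 2-byte column index and a 4-byte length (6 bytes); with varints, a small column index and a short string need just 1 byte each. The sketch below does that arithmetic for hypothetical (column index, length) pairs, assuming LEB128-style varints. It does not reproduce the 85% figure, which also depends on the geometry and other data in the file; all function names are illustrative.

```rust
/// Bytes needed to store `v` as an unsigned LEB128 varint (7 bits per byte).
fn varint_len(mut v: u64) -> usize {
    let mut n = 1;
    while v >= 0x80 {
        v >>= 7;
        n += 1;
    }
    n
}

/// Current fixed layout: u16 column index + u32 length prefix.
fn fixed_header_len() -> usize {
    std::mem::size_of::<u16>() + std::mem::size_of::<u32>()
}

/// Proposed layout: varint column index + varint length prefix.
fn varint_header_len(column: u64, len: u64) -> usize {
    varint_len(column) + varint_len(len)
}

/// Header bytes saved across a record's string properties,
/// given as hypothetical (column index, byte length) pairs.
fn header_bytes_saved(props: &[(u64, u64)]) -> i64 {
    props
        .iter()
        .map(|&(col, len)| fixed_header_len() as i64 - varint_header_len(col, len) as i64)
        .sum()
}

fn main() {
    // Hypothetical record: three short string columns.
    let props: [(u64, u64); 3] = [(0, 5), (1, 20), (2, 9)];
    println!("fixed header: {} bytes/property", fixed_header_len());
    println!("saved per record: {} bytes", header_bytes_saved(&props));
}
```

For a record like this, header overhead drops from 18 bytes to 6, which matters most when the string values themselves are only a few bytes long.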
