
Add an optional MIME types VLR that describes other VLRs in the file #142

Open
hobu opened this issue Nov 9, 2023 · 1 comment

hobu commented Nov 9, 2023

What is the issue about?

Inquiry about the specification

Issue description

Problem

I wish the LAS ecosystem had better VLR interoperability. Unless they are baked into the specification, VLRs are not really consumable without "just knowing" a particular user_id / record_id combination. Usually that's only your own VLRs, but a particular application might know about one or two other applications' VLRs and treat them accordingly.

One increasingly common need is storing metadata WITH the point cloud data. Sometimes that metadata is a full FGDC metadata document, sometimes it's just a Word .docx or a .pdf, or maybe it is a simple Markdown text file that describes how the file was made. Regardless, the specification has no way to communicate the type of content inside a VLR. It would be really nice to be able to do this for metadata.

The NGA BPF Specification has a concept of a "Bundle File" that is a little like a VLR. The idea is to stuff whatever you want into a blob and give it a filename. The content type is only implicitly defined by that filename's extension, however. There's no MIME type to explicitly tell you what that file is supposed to be. I think we could do better with LAS by providing an optional VLR that gives a simple map of user_id / record_id / mimetype / (optional) filename.

Proposal

[
    {
        "user_id":"PDAL",
        "record_id":12,
        "mimetype":"application/json",
        "description":"PDAL metadata output as a JSON document"
    },
    {
        "user_id":"USGS",
        "record_id":86,
        "mimetype":"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "filename":"metadata.docx",
        "description":"Random stuff pasted into a word document that MIGHT describe how the data came into being"
    }
]
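As a rough illustration of how a reader might consume such a map, here is a minimal Python sketch. It assumes the VLR payload is the JSON array shown above (trimmed slightly); the `build_mime_index` helper is hypothetical and not part of any existing library, and how the payload bytes are pulled out of the file is left out entirely.

```python
import json

# Hypothetical payload of the proposed MIME types VLR (entries trimmed from the proposal).
payload = """
[
    {"user_id": "PDAL", "record_id": 12, "mimetype": "application/json",
     "description": "PDAL metadata output as a JSON document"},
    {"user_id": "USGS", "record_id": 86,
     "mimetype": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
     "filename": "metadata.docx"}
]
"""


def build_mime_index(text):
    """Map (user_id, record_id) -> entry for every entry in the JSON array."""
    return {(e["user_id"], e["record_id"]): e for e in json.loads(text)}


index = build_mime_index(payload)

# Look up the content type of a VLR the reader knows nothing else about.
entry = index.get(("USGS", 86))
if entry is not None:
    print(entry["mimetype"], entry.get("filename"))
```

Keying on the (user_id, record_id) pair mirrors how VLRs are already identified, so a reader can ask "what is in this VLR?" without knowing the application that wrote it.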

Notes:

  1. We should use JSON Schema to describe a schema document for these things (and any other JSON VLRs we might make).
  2. This isn't a replacement for the header.
  3. You would probably write this at the end of the file as an EVLR, since you don't know your content types until after you have written them all.
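To make note 1 concrete, one possible shape for such a schema is sketched below as a Python dictionary. Nothing here is agreed upon: the required fields follow the suggestion made later in this thread (user_id, record_id, mimetype), the 16-character and 0 to 65535 limits are taken from the sizes of the User ID and Record ID fields in the LAS VLR header, and `missing_required` is a hypothetical helper rather than a real JSON Schema validator.

```python
import json

# Sketch of a JSON Schema for the proposed VLR payload; every detail is an assumption.
MIME_VLR_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "array",
    "items": {
        "type": "object",
        "required": ["user_id", "record_id", "mimetype"],
        "properties": {
            "user_id": {"type": "string", "maxLength": 16},
            "record_id": {"type": "integer", "minimum": 0, "maximum": 65535},
            "mimetype": {"type": "string"},
            "filename": {"type": "string"},
            "description": {"type": "string"},
        },
        # Extra fields stay legal so writers can add their own content.
        "additionalProperties": True,
    },
}


def missing_required(entry):
    """Return the required keys that an entry lacks (not full schema validation)."""
    return [k for k in MIME_VLR_SCHEMA["items"]["required"] if k not in entry]


print(json.dumps(MIME_VLR_SCHEMA, indent=2))
print(missing_required({"user_id": "PDAL", "record_id": 12}))
```

A real implementation would hand the schema to a proper JSON Schema validator; the helper only shows which keys the schema treats as mandatory.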

FAQ

Why make a new VLR instead of augmenting the current VLR headers?

Because it should be optional and we don't want to cause people to change any existing software.

Why use JSON?

It's what people use for this kind of thing nowadays. Depending on the schema, it can also be extendable so people can add their own stuff to it if they want. That said, I'm biased toward JSON as a contributing author to the GeoJSON specification, so take my suggestion accordingly 😛


hobu commented Apr 5, 2024

Additional comments:

  • MIME types are how the internet communicates the content of files and protocols. Aligning LAS with these conventions will make it easier for people using LAS in the context of other systems to communicate the content of LAS files.
  • MIME types are already registered for LAS, LAZ, and BPF. The current list can be found at https://www.iana.org/assignments/media-types/media-types.xhtml
  • It is likely that this VLR would be written as an EVLR at the end of a file, but it wouldn't have to be.
  • I would propose that only user_id, record_id and mimetype be REQUIRED in each entry. Any other fields, including complex JSON objects if desired, would be explicitly allowed.
  • If a record_id/user_id pair is not matched in the file, it should be ignored. This would allow applications to write a stock VLR for all of the VLR mimetypes they might add to a file.
  • If there is an existing mimetype VLR/EVLR in the file, writers must APPEND their entries to the JSON block, not overwrite.
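The last two rules could be sketched in Python roughly as follows. Both function names are hypothetical, and one behavior is an assumption this thread does not settle: when a new entry repeats a (user_id, record_id) pair that is already listed, the sketch keeps the existing entry and skips the new one.

```python
def append_entries(existing, new):
    """Append new entries after the existing ones; never overwrite what is there.

    Assumption: an entry whose (user_id, record_id) pair is already listed is skipped.
    """
    seen = {(e["user_id"], e["record_id"]) for e in existing}
    merged = list(existing)
    for entry in new:
        key = (entry["user_id"], entry["record_id"])
        if key not in seen:
            merged.append(entry)
            seen.add(key)
    return merged


def applicable_entries(entries, vlr_keys):
    """Ignore entries whose (user_id, record_id) pair is not present in the file."""
    return [e for e in entries if (e["user_id"], e["record_id"]) in vlr_keys]


existing = [{"user_id": "PDAL", "record_id": 12, "mimetype": "application/json"}]
stock = [
    {"user_id": "USGS", "record_id": 86, "mimetype": "text/markdown"},
    {"user_id": "PDAL", "record_id": 12, "mimetype": "text/plain"},
]

merged = append_entries(existing, stock)
# Suppose the file only contains the USGS/86 VLR.
print(applicable_entries(merged, {("USGS", 86)}))
```

Because unmatched pairs are simply ignored on read, a writer can emit the same stock list for every file without checking which VLRs it actually wrote.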
