Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardization of common extrabytes #37

Open
esilvia opened this issue Sep 21, 2017 · 105 comments
Open

Standardization of common extrabytes #37

esilvia opened this issue Sep 21, 2017 · 105 comments
Assignees
Labels
enhancement wiki Changes to be incorporated into the wiki

Comments

@esilvia
Copy link
Member

esilvia commented Sep 21, 2017

We've discussed listing some standardized extrabytes either in the specification itself or a supplementary document. This would encourage their adoption by the community and formalize common extrabytes as a guideline to future implementations.

We need to figure out the following:

  1. Which extrabytes merit standardization?
  2. Which fields should be formalized? e.g., optional fields like min, max, and nodata might not make sense.
  3. Should data_type be formalized?
  4. Where will this list live? Will it formally be included in the specification itself (thereby requiring ASPRS approval every time one gets added), or perhaps as a wiki page on GitHub with a link from the specification? I propose the latter.
  5. What will be the approval process for new additions? (I propose people submit new Issues and then LWG votes yes/no).
  6. Should units be formalized? For example, will we have to have separate entries for "refracted depth" in meters and feet?

Below is a link to what I think is a decent start to the standardized extrabytes. Once we get some agreement on a few of these I can start building a wiki page or contribute to Martin's pull request. Which one we do depends on the answer to the 4th question.

Standard ExtraBytes v1.docx

@esilvia esilvia self-assigned this Sep 21, 2017
@rapidlasso
Copy link
Member

rapidlasso commented Sep 21, 2017

3.) I think for data_type we can make a "best" and "worst" practice recommendation that shows what scales and offsets use the fewest bytes possible to get reasonable resolution.
BEST:
For a return echo width stored with 0.1 ns resolution use an 0.1 scaled unsigned char to cover the range 0.0 ns to 25.5 ns or an 0.1 scaled unsigned short to cover the range 0.0 ns to 6553.5 ns.
For a return echo width stored with 0.01 ns resolution use an 0.01 scaled unsigned short to cover the range 0.00 ns to 655.35 ns.
WORST:
Avoid using floats or doubles and email Martin if you really want to start a heated discussion about storing linear measurements using a floating-point format. (-;

@lgraham-geocue
Copy link

Other useful stuff under the "common" category. Not fleshed out but just introduced as ideas:

Group ID - this is a 32 bit (64 bit?) unsigned int that is used as a group index (or Object ID, OID). For example, all points that "belong" to a specific building roof identified as Object ID 246 will have this tag set to 246. Relationship maps can be built in a VLR, EVLR.

Sigma X, Y, Z --> Standard deviation of the point expressed in the units of the projection. For Geographic data, the units are meters. The values are doubles or they could just be long and follow the point scaling.

Normal - 3 tuple that defines the normal to the surface at the point of incidence. Direction is opposite the ray direction (toward the laser scanner).

@esilvia esilvia added this to the v1.4 R14 milestone Sep 27, 2017
@rapidlasso
Copy link
Member

rapidlasso commented Oct 3, 2017

Two issues with the Normal that @lgraham-geocue suggests.

(1) Directions are always troublesome because they are difficult to re-project correctly when going from one CRS to another. In the PulseWaves format we've solved this by expressing directions vectors as two points. Re-projecting both is always going to be correct (even if we go to non-euclidean space). How about a "trajectory" index instead that reference "trajectory points" stored in the same LAS file (but marked synthetic) that are on the trajectory. These "trajectory points" are then given the same index so they can be paired up with the actual returns.

(2) Triplets have been deprecated.

@esilvia
Copy link
Member Author

esilvia commented Oct 3, 2017

@rapidlasso What is there to be gained by allowing the data_type to vary for a given ExtraByte definition? I've noticed that you allow it to vary for the "height from ground" extrabyte in your tools, but that's caused my implementations a little trouble when some files have it defined one way while others have it defined another way.

I guess this begs the question of why we're standardizing. I believe it's to encourage implementation by more software vendors, which means simplification is key. In my opinion that means guaranteeing a 1-to-1 relationship of the key attributes with a certain EB code. At a minimum I think data_type, name, and nodata should be fixed, while description, scale, offset, and validity of min/max are recommended.

What do you think about releasing a series instead? e,g, "height from ground [cm]" with data_type int16 gets one Standard value (e.g., 200) while "height from ground [mm]" with data_type int32 gets the next value (e.g., 201)?

@esilvia
Copy link
Member Author

esilvia commented Oct 3, 2017

@rapidlasso Good point about the difficulties with reprojection. I've had this struggle with the Origin vector of FWF data, and I've often wondered whether those vectors are getting modified correctly.

Unfortunately, if the points get shifted (e.g., from calibration) I doubt whether any software would also update the point coordinates. That's the advantage of the vector. As you point out, though, the disadvantage is that they're only valid for a given projection.

@lgraham-geocue
Copy link

lgraham-geocue commented Oct 3, 2017 via email

@rapidlasso
Copy link
Member

@esilvia, not feeling strongly about the data type issue. Your suggestion is also good as it would prohibit folks from storing "height above ground" or "echo width" as floating point numbers. Now that is something that I really do feel strongly about. How do I allow different data types in LASlib? I have a "get attribute as float value" function to use "extra bytes" for processing so the actual storage format of the extra attribute does not matter in my implementation.

@rapidlasso
Copy link
Member

@esilvia and @lgraham-geocue my suggestion is to start this standardization with very few (two or three) additional attributes that are likely to be used or that are already used. "Height above ground" is an obvious candidate for derived (i.e. not new) information. "Echo width" is an obvious candidate for additional (i.e. new) information. I would recommend to start with just those two and see how it works out before adding a larger number ...

@lgraham-geocue
Copy link

Yes, I agree. The more complex, the lower the adoption rate. I would like to see Group added in this initial change. It is just an unsigned long (4 byte) or unsigned long long (8 byte). In the initial version, there would be no restrictions on its use other than initializing to zero (meaning no group membership). We could write a short "best practices" on using Group but it would only be a guideline, not a requirement.

@esilvia
Copy link
Member Author

esilvia commented Apr 13, 2018

@lgraham-geocue I like the idea of a GroupID/ObjectID attribute. Should the NULL value be 0 or INTMAX? Not sure which is more intuitive.

What if there are two different kinds of groups that a point could belong to? Should we include recommendations for supporting multiple attributes of the same kind? e.g., GroupID[1], GroupID[2], etc?

@esilvia
Copy link
Member Author

esilvia commented Apr 13, 2018

Any preference on how to differentiate between the 32-bit and 64-bit ObjectID definitions? LongObjectID for 64bit?

@lgraham-geocue
Copy link

lgraham-geocue commented Apr 13, 2018 via email

@esilvia
Copy link
Member Author

esilvia commented Sep 5, 2018

Here's an update to the proposed standard extrabytes.
Standard ExtraBytes v2.docx

@esilvia
Copy link
Member Author

esilvia commented Sep 5, 2018

And here's another update including some of the feedback I got this summer at JALBTCX, adding the horizontal and vertical uncertainty fields.
Standard ExtraBytes v3.docx

@rapidlasso
Copy link
Member

The "Range" which is "defined as the three-dimensional distance from the sensor to the point, the range is useful for multiple computations such as intensity attenuation and measurement bias." is suggested to be of data type float. I vehemently oppose that. The data type should be an unsigned integer (or even just an unsigned short) with a scale that is similar to that of the LAS points (or less precise) and an offset of zero.

@rapidlasso
Copy link
Member

Are the tuples and triples finally deprecated? I'd like to completely remove them from LASlib. They never were properly supported and I've never seen them used anywhere.

@rapidlasso
Copy link
Member

I suggest we start with one or two or three standardization that are reasonably simple. My votes go to:

  • echo width (widely used by RIEGL exports)
  • height above ground (needed by many, already implemented and used long time in LAStools)
  • group ID (requested by Lewis and use case exists for Terrasolid's new group functionality)

@lgraham-geocue
Copy link

lgraham-geocue commented Sep 13, 2018 via email

@lgraham-geocue
Copy link

lgraham-geocue commented Sep 13, 2018 via email

@hobu
Copy link
Contributor

hobu commented Sep 13, 2018

In addition, all distance units in the file should be (we would say “must” in the spec) be in the vertical units of the Spatial Reference System of the file. I say vertical units because, in the USA, there are still some “official” SRS with horizontal in feet (INT, Survey?) and vertical in meters.

LAS abdicates responsibility for the coordinate system by handing it off to WKT. I disagree that the specification should get involved here, because the spec and the SRS are inevitably going to get into conflict.

LAS should investigate requiring OGC WKT2 in a future revision. WKT2 handles more situations and is more complete. See https://gdalbarn.com/ for some discussion related to the GDAL project on the topic (thanks for the contribution @lgraham-geocue!)

(tuples and triples) (Maybe Howard (Butler) is using them for something?

Triplets are common in graphics scenarios, and I proposed them thinking they would be well aligned with LAS. They aren't, and they introduce as many problems as they might solve. Few softwares produce or consume them. They should be dropped. No one will miss their removal.

@esilvia
Copy link
Member Author

esilvia commented Sep 21, 2018

@rapidlasso Tuples and triples will be officially dropped with the next revision (#1).

I agree that range could be confusing because of potential desynchronization with the SRS units, but I believe that fixing it at meters and leaving it with the points prevents its loss when the trajectory files inevitably get lost. We hard-code units for the angles (at degrees), so I don't see why we can't do this with Range. Software can easily change units displayed while leaving the units stored untouched.

You've persuaded me that starting with a small handful is a good idea, and I like Martin's list. I'm tempted to add the topobathy-related ones, but perhaps that's better left in the LDP?

@lgraham-geocue
Copy link

lgraham-geocue commented Sep 21, 2018 via email

@rapidlasso
Copy link
Member

@esilvia "Tuples and triples will be officially dropped with the next revision". Happy to hear that. I just kicked them out of LASlib last week ... (-:

@rapidlasso
Copy link
Member

rapidlasso commented Sep 21, 2018

@lgraham-geocue I disagree. A range is - similar to the scan angle - something measured by a scientific measurement instrument and should follow international standards. I could see how your argument could apply to "height above ground" but even here I'm leaning to always making the measurement unit part of the standardized "extra bytes" because (1) the CRS often gets stripped, (2) reprojecting coordinates from a feet-based CRS to a meter-based CRS (or vice versa) without rescaling the "extra bytes" leads to wrong ranges / heights above ground, and (3) the best choices in scale and offset change when we go from meter to feet. A scale factor of 0.01 may be good when measuring the range or the height above ground for an airborne scan in meters but it is overly precise for feet. We will open a whole can of worms of "extra bytes" that do not have the correct unit or where we do not know the correct unit if we let the vertical unit of the CRS decide this.

@esilvia
Copy link
Member Author

esilvia commented Oct 1, 2018

@rapidlasso You make a strong point regarding the scale/offset also being unit-dependent. I think that's also a strong argument in favor of fixing the units.

@lgraham-geocue observed that the horiziontal and vertical units can be different in LAS files, which is something I've also observed to my chagrin. Since Range is a 3D measurement, it could get very, very weird if the vertical units are meters and horizontal units are feet. I think this is another argument in favor of fixing the units at meters.

I can be persuaded that the height-from-ground will match the vertical units of the LAS file. Simple, and I think it's what people would expect when they receive data.

So here's the plan: I'm going to publish the following "Standard" extrabytes as a GitHub wiki page (https://github.com/ASPRSorg/LAS/wiki/Standard-ExtraByte-Definitions):

  • Pulse Width
  • Height Above Ground
  • Group ID (aka Object ID)
  • Horizontal Uncertainty
  • Vertical Uncertainty
  • Bathymetric Flags
  • Submerged/Refracted Vector Length (aka Refracted Depth)
  • Range

All of these Standard ExtraBytes will be assigned an integer value (ID) that can be assigned to the first two bytes of the ExtraByte definition structure (currently Reserved). It's a little longer than Martin's list but I think it captures the ones I've seen in the wild. I didn't get any feedback on incorporating the ExtraByte definitions from the topobathy LDP, so I decided to include the ones that I've seen most often.

Rather than include these definitions in the specification itself, I'll update the ExtraByte VLR description in the specification with a link to the wiki page and claim the two Reserved bytes for the ID field, which must be 0 unless it adheres to one of the definitions on the wiki page.

All of these changes will be included with the R14 revision, which I plan to submit to ASPRS in the next week or two. Last chance to comment. @rapidlasso @lgraham-geocue @csevcik01 @hobu @anayegandhi @jdnimetz

@rapidlasso
Copy link
Member

rapidlasso commented Mar 16, 2020

Do the TPU values require error values that are in increments of femtometer close to zero but drop exponentially to increments of decimeters close to one million? Then a float32 representation is suitable.

If the increments with which the error is to be expressed should be a constant centimeter or millimeter throughout the entire error value range then an unsigned integer scaled by 0.01 or 0.001 is the correct approach.

@parrishOSU
Copy link

The range of plausible vertical uncertainty values, considering the range of possible data sources, is probably meters to millimeters. Adding one order of magnitude in either direction gives tens of meters to tenths of millimeters. If we use a scale factor, where is the scale factor stored? Is it the same as for the X Y Z coordinates?

@rapidlasso
Copy link
Member

rapidlasso commented Mar 16, 2020

The "scale factor" is core part of the "extra byte" definition. I recently published a little blog post on how to use txt2las (which is open source) to map the information stored in these kind of ASCII lines of LiDAR information to the LAS format and you see examples with different numbers of decimal digits being used there:

1, 290.243, 28.663, -11.787, 0.060, -0.052, 0.997, 517.3170, -58.6934, 313.0817, 52
1, 290.208, 28.203, -11.825, 0.062, -0.056, 0.996, 517.3167, -58.6934, 313.0817, 49
1, 290.182, 27.739, -11.852, 0.063, -0.055, 0.997, 517.3164, -58.6935, 313.0817, 53
1, 290.165, 27.272, -11.866, 0.061, -0.058, 0.996, 517.3161, -58.6935, 313.0817, 53
1, 290.163, 26.800, -11.858, 0.061, -0.053, 0.997, 517.3157, -58.6935, 313.0817, 68
...

@ASPRSorg ASPRSorg deleted a comment from rapidlasso Jan 15, 2021
@esilvia esilvia removed this from the v1.4 R16 milestone Mar 15, 2021
@rapidlasso
Copy link
Member

The "beam ID" seems a rather easy first candidate for standardization. Clearly there is a need and clearly users already store this information to "extra bytes" like here as "Velodyne Rings". In this blog post I describe how to copy the beam ID from the "point source ID" field or from the "user data" field into a new "extra bytes" attribute with two calls to las2las, namely

las2las ^
-i Samara\Drone\00_raw_aligned\*.laz ^
-add_attribute 1 "laser beam ID" "which beam ranged this return" ^
-odir Samara\Drone\00_raw_temp -olaz

las2las ^
-i Samara\Drone\00_raw_temp\*.laz ^
-copy_user_data_into_attribute 0 ^
-set_user_data 0 ^
-set_point_source 0 ^
-odir Samara\Drone\00_raw_ready -olaz

@jo-chemla
Copy link

jo-chemla commented Aug 23, 2023

Hi there, checking in to see if there is now a ~standardized way to store surface normals within a las/laz file extra bytes fields.

There have been discussions on this thread regarding the way to store normals (3 floats xyz coords vs more efficient way to store a unit vector), but nothing regarding a commonly-agreed way to name these attributes eg NORMAL_X, normal z etc. A source from geocue indicates that Agisoft Metashape, Riegl and TerraScan all support exporting surface normals to las, but probably each with its own field name (could not check).
Are there other places where this discussion could happen? The topic does not seem heavily discussed on laserscanningforum for example.

PS I read the type 22 tuples and triples has been deprecated so the surface normal coordinates should be stored as individual field. PDAL defined the NormalX, NormalY, NormalZ fields, see dimensions, so this could be used as the standard definition for normal coords.

@jo-chemla
Copy link

jo-chemla commented Aug 28, 2023

After digging through a bit, here is what I found for normals naming convention for las files:

Lib/Software Normals Naming convention Reference
PDAL main convention NormalX PDAL dimensions
PDAL other supported any variation of nx, normal_x, normalx Dimension.json or official docs
Agisoft Metashape normal x with a space attribute0 -0.992126 0.992126 ('normal x') using LASTools/lasinfo
Other like Riegel, Terrascan No las sample found

PS: if a dedicated github issue should be opened specifically for the normal coords naming convention, please do tell me!

@jedfrechette
Copy link

The normals naming convention that we've adopted internally is NormalX. The reasoning for that choice was:

  1. It's what PDAL uses.
  2. After normalizing for case it is the same as what the E57 Normals extension uses: http://www.libe57.org/E57_EXT_surface_normals.txt
  3. It does not use any special characters, i.e. spaces.
  4. It is more explicit than abbreviations like nx

@jo-chemla
Copy link

Thanks for clarifying, I did not know NormalX was the convention adopted within the LAS spec. Great to hear!

@jedfrechette
Copy link

Sorry, just to clarify when I wrote "we've adopted..." I wasn't commenting on behalf of ASPRS or the LAS committee. That's just the convention we've ended up adopting within my company. For us that convention isn't specific to LAS either, but we needed a general convention to follow for data formats/applications to don't have their own definition.

@jo-chemla
Copy link

Woops sorry about the confusion and thanks for the contribution! Awaiting feedback from the LAS committee then.

@hobu
Copy link
Contributor

hobu commented Aug 31, 2023

Hi there, checking in to see if there is now a ~standardized way to store surface normals within a las/laz file extra bytes fields.

As you're finding out in PDAL PDAL/PDAL#4144 and maybe other places, I don't think this exists. We chose the NormalX composition because PDAL uses strings to define dimension types and it seemed both explicit enough and clean enough without spaces.

@abellgithub
Copy link

abellgithub commented Aug 31, 2023 via email

@esilvia
Copy link
Member Author

esilvia commented Sep 5, 2023

A while ago I became aware of a single-field convention being used in TerraSolid for this purpose... if I remember correctly they had packed the normal_x/y/z values into a 64bit or 32bit integer ExtraByte. It'll take me a while to dig that out of my notes if it's of interest.

Beyond that, the waveform PDRFs do include a directional vector that I've seen leveraged as a surface normal, but I wouldn't recommend it.

@esilvia
Copy link
Member Author

esilvia commented Sep 5, 2023

Just found my notes. In TerraSolid they store the Normal Vector ExtraByte as a 32bit integer -- 2 bits for the "model type" and then 15 bits for each polar angle. I don't have more detailed notes than that, but maybe someone has a contact at TerraSolid that can provide more detail?

Related: #1 (comment)

@esilvia
Copy link
Member Author

esilvia commented Sep 5, 2023

One final comment... I found an offline discussion thread between me, Martin, and the folks at AgiSoft about a method of encoding that may resemble their final implementation of normal vectors. I'm uploading it here now that we have a more public way to have these discussions.
Quantum Spatial Mail - tuples in LAS extra bytes.pdf

Beginning a new discussion thread on normal vectors may be helpful for posterity.

@agisoft-metashape
Copy link

Hello Evon, thanks for sharing our discussion.

We decided to use lower case name with space for normal vector extrabytes because this convention was used in other places we could find:
https://github.com/ASPRSorg/LAS/wiki/Standard-ExtraByte-Definitions
#37 (comment)

It seems a good idea to keep consistent convention for extrabyte names. LAS 1.4 R15 specification includes several examples of extrabyte names using lower case naming convention with spaces. It also has a link to official LAS wiki with Standard ExtraByte Definitions list that has even more examples, all using the same lower case naming convention. It makes me believe that it is an accepted naming convention for LAS extrabytes.

If lower case naming convention with spaces is indeed preferred for LAS extrabytes, then maybe adding support for this convention in PDAL is a better solution.

Please correct me if I am wrong and camel case convention should be preferred instead (does it need to be reflected in the LAS specification then?), or if there are any good reasons to ignore consistency in this case.

Dmitry

@hobu
Copy link
Contributor

hobu commented Sep 6, 2023

If lower case naming convention with spaces is indeed preferred for LAS extrabytes, then maybe adding support for this convention in PDAL is a better solution.

PDAL supports names in spaces, but since it supports so many other formats, it discourages them and normalizes them out.

Please correct me if I am wrong and camel case convention should be preferred instead (does it need to be reflected in the LAS specification then?), or if there are any good reasons to ignore consistency in this case.

The specification doesn't say and I think it is up to implementers to do what they want. That said, commonly agreed upon dimension names that mean specific things (like nx) would be a welcome community resource. We have https://github.com/PDAL/PDAL/blob/master/pdal/Dimension.json for PDAL but it is just a list more than an attempt at getting anyone to agree on name + meaning.

@esilvia
Copy link
Member Author

esilvia commented Sep 6, 2023

Thanks Dmitry @agisoft-metashape !

I've been following the lowercase-with-spaces-allowed convention for all of the standardized ExtraBytes in the wiki as a way of normalizing them, and I intend to continue that trend unless I hear a compelling reason to abandon it.

Nevertheless, as @hobu notes the specification itself is silent on a preferred naming convention.

@agisoft-metashape
Copy link

Thank you @esilvia and @hobu for your comments.

We are going to add support for PDAL style names ('NormalX', etc) on import in the next Metashape version. For export we will probably keep using our current lower case names until there is a consensus on the better solution.

@hobu PDAL 2.3.0 indeed normalized 'normal x' name to 'NormalX' on input, but that is not the case with PDAL 2.5.5 any more. If such normalization is indeed intended, maybe it can be added to PDAL again? This will probably solve reported compatibility issue.

@abellgithub
Copy link

Spaces in names is yuck. Other than that, I have no opinion.

@rapidlasso
Copy link
Member

Thanks Evon for sharing this conversation.
Thanks @ALL for the comments and the input.
Just to make my opinion known:
I also think the naming of extra bytes is not part of the
specifiaction.
On the other hand it would be good to have an orientation
how to name fields with a certain content.
Therefore the Wiki seems to be a good place.
About "good" names I think it is worth to avoid spaces in names
when names come close to identifiers or tokens which are may
also used in programming languages, databases and so on.
As soon we list those names in a wiki it is not a very good
argument to say "Is was like this in the past..."
So I would prefer names in
camel case or with "_" instead of a space.

@esilvia
Copy link
Member Author

esilvia commented Nov 30, 2023

Spaces in names is yuck. Other than that, I have no opinion.

I'm at JALBTCX this week discussing the LDP v2 that was developed in #117 and approaching publication. This includes updates to the ExtraBytes being used, and I'm trying to decide whether or not I should rename the ExtraByte fields in the v2 upgrade.

Can @abellgithub or anyone elaborate on "spaces in names is yuck"? I get that the dev community has historically avoided spaces like the plague (windows path errors, anyone?), but any sane language or library should be able to handle spaces in a char array, especially one that interacts with LAS which explicitly specifies that its char arrays are null-terminated.

Why would removing spaces be a need here? Are folks trying to use ExtraByte names as variable names or key values? Or is this not a need I should attempt to support?

@kjwaters
Copy link

kjwaters commented Nov 30, 2023 via email

@esilvia
Copy link
Member Author

esilvia commented Nov 30, 2023

My main reason to avoid spaces is that they mean you have to remember to quote or escape them any time you might want to use that name in a command line interface. Kirk

@kjwaters OH that makes total sense. I'll propose that change in the Bathy Working Group meeting tonight, if I get a chance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement wiki Changes to be incorporated into the wiki
Projects
None yet
Development

No branches or pull requests