Standardization of common extrabytes #37
3.) I think for data_type we can make a "best" and "worst" practice recommendation that shows what scales and offsets use the fewest bytes possible to get reasonable resolution. |
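As a concrete sketch of that best/worst practice trade-off (all names and the 0.01 scale here are illustrative assumptions, not spec text): a 2-byte unsigned integer with a centimeter scale covers 0 to 655.35 m of height above ground at 1 cm resolution, half the size of a 4-byte float.

```python
# Hypothetical sketch: quantizing an extra-bytes attribute with a
# scale/offset pair instead of storing a raw float. The scale 0.01
# (centimeter steps) and the uint16 range are assumptions for illustration.

def encode_scaled(value, scale=0.01, offset=0.0):
    """Quantize a physical value to an integer using the given scale/offset."""
    return round((value - offset) / scale)

def decode_scaled(raw, scale=0.01, offset=0.0):
    """Recover the physical value from the stored integer."""
    return raw * scale + offset

# 123.456 m of height above ground fits comfortably in a uint16 at 1 cm steps.
raw = encode_scaled(123.456)
assert 0 <= raw <= 65535
print(decode_scaled(raw))
```

The round trip loses at most half a scale step (5 mm here), which is the "reasonable resolution" argument in a nutshell.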
|
Other useful stuff under the "common" category. Not fleshed out but just introduced as ideas:
Group ID: a 32-bit (64-bit?) unsigned int that is used as a group index (or Object ID, OID). For example, all points that "belong" to a specific building roof identified as Object ID 246 will have this tag set to 246. Relationship maps can be built in a VLR or EVLR.
Sigma X, Y, Z: the standard deviation of the point expressed in the units of the projection. For geographic data, the units are meters. The values are doubles, or they could just be longs that follow the point scaling.
Normal: a 3-tuple that defines the normal to the surface at the point of incidence. The direction is opposite the ray direction (toward the laser scanner). |
|
Two issues with the Normal that @lgraham-geocue suggests. (1) Directions are always troublesome because they are difficult to re-project correctly when going from one CRS to another. In the PulseWaves format we've solved this by expressing direction vectors as two points. Re-projecting both is always going to be correct (even if we go to non-Euclidean space). How about a "trajectory" index instead that references "trajectory points" stored in the same LAS file (but marked synthetic) that lie on the trajectory? These "trajectory points" are then given the same index so they can be paired up with the actual returns. (2) Triplets have been deprecated. |
|
@rapidlasso What is there to be gained by allowing the data_type to vary for a given ExtraByte definition? I've noticed that you allow it to vary for the "height from ground" extrabyte in your tools, but that's caused my implementations a little trouble when some files have it defined one way while others have it defined another way. I guess this begs the question of why we're standardizing. I believe it's to encourage implementation by more software vendors, which means simplification is key. In my opinion that means guaranteeing a 1-to-1 relationship of the key attributes with a certain EB code. At a minimum, what do you think about releasing a series instead? E.g., "height from ground [cm]". |
|
@rapidlasso Good point about the difficulties with reprojection. I've had this struggle with the Origin vector of FWF data, and I've often wondered whether those vectors are getting modified correctly. Unfortunately, if the points get shifted (e.g., from calibration) I doubt whether any software would also update the point coordinates. That's the advantage of the vector. As you point out, though, the disadvantage is that they're only valid for a given projection. |
|
Just always store the normals in ECEF.
|
|
@esilvia, not feeling strongly about the data type issue. Your suggestion is also good as it would prohibit folks from storing "height above ground" or "echo width" as floating point numbers. Now that is something that I really do feel strongly about. How do I allow different data types in LASlib? I have a "get attribute as float value" function to use "extra bytes" for processing so the actual storage format of the extra attribute does not matter in my implementation. |
|
@esilvia and @lgraham-geocue my suggestion is to start this standardization with very few (two or three) additional attributes that are likely to be used or that are already used. "Height above ground" is an obvious candidate for derived (i.e. not new) information. "Echo width" is an obvious candidate for additional (i.e. new) information. I would recommend starting with just those two and seeing how it works out before adding a larger number ... |
|
Yes, I agree. The more complex, the lower the adoption rate. I would like to see Group added in this initial change. It is just an unsigned long (4 byte) or unsigned long long (8 byte). In the initial version, there would be no restrictions on its use other than initializing to zero (meaning no group membership). We could write a short "best practices" on using Group but it would only be a guideline, not a requirement. |
|
@lgraham-geocue I like the idea of a GroupID/ObjectID attribute. What if there are two different kinds of groups that a point could belong to? Should we include recommendations for supporting multiple attributes of the same kind? E.g., GroupID[1], GroupID[2], etc.? |
|
Any preference on how to differentiate between the 32-bit and 64-bit ObjectID definitions? LongObjectID for 64bit? |
|
It may complicate it a bit. A simple 32 or 64 bit Group number would probably be a good start (if we have only one, I would prefer 64 bit).
|
|
Here's an update to the proposed standard extrabytes. |
|
And here's another update including some of the feedback I got this summer at JALBTCX, adding the horizontal and vertical uncertainty fields. |
|
The "Range" which is "defined as the three-dimensional distance from the sensor to the point, the range is useful for multiple computations such as intensity attenuation and measurement bias." is suggested to be of data type float. I vehemently oppose that. The data type should be an unsigned integer (or even just an unsigned short) with a scale that is similar to that of the LAS points (or less precise) and an offset of zero. |
|
Are the tuples and triples finally deprecated? I'd like to completely remove them from LASlib. They never were properly supported and I've never seen them used anywhere. |
|
I suggest we start with one, two, or three standardizations that are reasonably simple. My votes go to:
|
|
I have never encountered them being used. Maybe Howard (Butler) is using them for something? I think he was the one who advocated for these structures.
|
|
Completely agree. In addition, all distance units in the file should (we would say “must” in the spec) be in the vertical units of the Spatial Reference System of the file. I say vertical units because, in the USA, there are still some “official” SRS with horizontal in feet (INT, Survey?) and vertical in meters.
|
LAS abdicates responsibility for the coordinate system by handing it off to WKT. I disagree that the specification should get involved here, because the spec and the SRS are inevitably going to get into conflict. LAS should investigate requiring OGC WKT2 in a future revision. WKT2 handles more situations and is more complete. See https://gdalbarn.com/ for some discussion related to the GDAL project on the topic (thanks for the contribution @lgraham-geocue!)
Triplets are common in graphics scenarios, and I proposed them thinking they would be well aligned with LAS. They aren't, and they introduce as many problems as they might solve. Few software packages produce or consume them. They should be dropped. No one will miss their removal. |
|
@rapidlasso Tuples and triples will be officially dropped with the next revision (#1). I agree that range could be confusing because of potential desynchronization with the SRS units, but I believe that fixing it at meters and leaving it with the points prevents its loss when the trajectory files inevitably get lost. We hard-code units for the angles (at degrees), so I don't see why we can't do this with Range. Software can easily change units displayed while leaving the units stored untouched. You've persuaded me that starting with a small handful is a good idea, and I like Martin's list. I'm tempted to add the topobathy-related ones, but perhaps that's better left in the LDP? |
|
To be LAS 1.4, PDRF 6 compliant, a LAS file must have the vertical units encoded in the WKT. USGS has been rejecting data that do not have the units tag properly set.
We are inviting a space-probe-collision error by allowing mixed, unlabeled linear units, especially for states that do not use meters for anything.
|
|
@esilvia "Tuples and triples will be officially dropped with the next revision". Happy to hear that. I just kicked them out of LASlib last week ... (-: |
|
@lgraham-geocue I disagree. A range is - similar to the scan angle - something measured by a scientific measurement instrument and should follow international standards. I could see how your argument could apply to "height above ground" but even here I'm leaning toward always making the measurement unit part of the standardized "extra bytes" because (1) the CRS often gets stripped, (2) reprojecting coordinates from a feet-based CRS to a meter-based CRS (or vice versa) without rescaling the "extra bytes" leads to wrong ranges / heights above ground, and (3) the best choices in scale and offset change when we go from meters to feet. A scale factor of 0.01 may be good when measuring the range or the height above ground for an airborne scan in meters but it is overly precise for feet. We will open a whole can of worms of "extra bytes" that do not have the correct unit, or where we do not know the correct unit, if we let the vertical unit of the CRS decide this. |
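Point (2) can be sketched numerically (the values, scales, and function name below are invented for illustration): a height stored at a feet-appropriate scale must be both unit-converted and re-quantized when the file moves to a meter-based CRS; reprojecting the coordinates alone leaves the extra bytes wrong.

```python
# Hypothetical sketch: rescaling a "height above ground" extra-bytes value
# when going from a US-survey-foot CRS to a meter CRS. Scales are assumed.
US_SURVEY_FOOT = 1200 / 3937  # meters per US survey foot (exact ratio)

def reproject_height(raw, scale_ft, new_scale_m=0.01):
    # Decode the physical value in feet, convert to meters, then
    # re-encode with a meter-appropriate scale.
    meters = raw * scale_ft * US_SURVEY_FOOT
    return round(meters / new_scale_m)

# 3281 raw counts at scale 0.01 ft is 32.81 ft, i.e. about 10.00 m.
print(reproject_height(3281, 0.01))
```

Simply keeping the raw integer (or keeping scale 0.01 but now "in feet") would silently change the meaning of every stored value, which is the can of worms described above.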
|
@rapidlasso You make a strong point regarding the scale/offset also being unit-dependent. I think that's also a strong argument in favor of fixing the units. @lgraham-geocue observed that the horizontal and vertical units can be different in LAS files, which is something I've also observed to my chagrin. Since Range is a 3D measurement, it could get very, very weird if the vertical units are meters and horizontal units are feet. I think this is another argument in favor of fixing the units at meters. I can be persuaded that the height-from-ground will match the vertical units of the LAS file. Simple, and I think it's what people would expect when they receive data. So here's the plan: I'm going to publish the following "Standard" extrabytes as a GitHub wiki page (https://github.com/ASPRSorg/LAS/wiki/Standard-ExtraByte-Definitions):
All of these Standard ExtraBytes will be assigned an integer value (ID) that can be assigned to the first two bytes of the ExtraByte definition structure (currently Reserved). It's a little longer than Martin's list but I think it captures the ones I've seen in the wild. I didn't get any feedback on incorporating the ExtraByte definitions from the topobathy LDP, so I decided to include the ones that I've seen most often. Rather than include these definitions in the specification itself, I'll update the ExtraByte VLR description in the specification with a link to the wiki page and claim the two Reserved bytes for the ID field, which must be 0 unless it adheres to one of the definitions on the wiki page. All of these changes will be included with the R14 revision, which I plan to submit to ASPRS in the next week or two. Last chance to comment. @rapidlasso @lgraham-geocue @csevcik01 @hobu @anayegandhi @jdnimetz |
|
LAS in TLS is less common but it happens because e57 and similar formats aren't widely supported, nor is there a standard way to store the setup location (something I'd like to fix). If we did a SF of 0.02 then we could have a range of 0-5.10 meters, with of course a slight decrease in precision. IMHO any more than 5ish meters starts to lose usefulness, but maybe satellite LiDAR hits that range? 1 byte per point isn't a huge issue, although remember that storing Hz Precision also means storing Vt Precision, so it's actually 2 bytes vs 4 bytes per point. Again, not a huge issue because storage is relatively cheap, but is there really a need for it? Maybe there is. @gimahori might know. |
|
Indeed, storage is cheap and if the range is unused then the upper byte (or the upper bits of the upper byte) are mostly zero meaning they disappear when compressing the LAS file with LASzip (or any other redundancy removing compression scheme). |
|
LAS is no longer only used for laser data. We use LAS for multibeam echo sounder data. A typical system has 400 beams over a 150 degree arc. This gives a 0.375 degrees beam separation which is only part of the uncertainty and increases further away from nadir. Depending on range (water depth) that value can get big. 100m water depth results in a 9.5m beam width at the edge vs 0.65m at nadir. At 1000m they get 10x worse. |
|
I agree with @manfred-brands. The standardization document should recommend suitable data types and strongly discourage the use of floats or doubles, but allow the data producer to populate scale and offset values that are suitable for their data, just like the LAS standard does for the x/y/z coordinates. But it is really important that the standardization document contains concrete use examples so we don't end up with attributes that are stored as 64-bit integers, or at picometer scales, or as double-precision floating-point. In the LASlib API I include a convenience function that can read any of the additional attributes from any scaled and offset representation and present it as a double-precision floating-point value for processing. |
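Such a convenience accessor could look roughly like the sketch below. This is not the actual LASlib API; the function name, the byte layout, and the use of struct format codes are all assumptions made for illustration.

```python
# Hypothetical sketch of a "read any extra-bytes attribute as a double"
# helper: whatever integer type, scale, and offset the producer chose,
# the caller sees a plain floating-point value.
import struct

def get_attribute_as_double(record_bytes, start, fmt, scale=1.0, offset=0.0):
    """Decode one extra-bytes attribute to a double, applying scale/offset.
    fmt is a struct format code, e.g. 'H' for uint16, 'i' for int32."""
    raw, = struct.unpack_from('<' + fmt, record_bytes, start)
    return raw * scale + offset

# A uint16 raw value of 12346 with scale 0.01 decodes to about 123.46.
payload = struct.pack('<H', 12346)
print(get_attribute_as_double(payload, 0, 'H', scale=0.01))
```

The point of the indirection is exactly what the comment above says: the stored representation stops mattering to processing code.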
|
From today's conference call: The purpose of standardization is fourfold:
In that light, your points make sense to me, and imo also make the answer about units obvious. The standard ExtraBytes can recommend a standard unit, offset, and scale, but allow for deviations when the underlying technology, site, or application require greater range and/or precision. If we don't do this, then we'll end up with multiple versions of the same "standard" ExtraByte for different levels of precision, and I believe that would be counterproductive to the stated goals. Thanks for providing some clarity on this issue. I believe that we can move forward with this information. |
|
I recommend we start (quickly) with one or two "standardized additional attributes" and see what we learn in the process of adding them as addendums (?) to the specification and implementing them in a few software packages. My number one pick would be "echo width" in tenths of a nanosecond. My number two pick would be "height above ground" in centimeters or millimeters. |
|
I think echo width would be tough for anything other than a Riegl sensor but very useful.
The most used auxiliary data we employ (which we already encode as extra bytes) is the emission point to impact point unit vector (e.g. if you back-trace this unit vector from a point, it points to the spot on the trajectory where the point was emitted). This is basically the same thing as the "point walking" vector in the waveform spec. You need this (or some similar geometry metadata) to colorize a LIDAR point.
Lewis
|
|
@lgraham-geocue how do you currently encode this "emission point to impact point unit vector" into extra bytes? I assume you use three different additional attributes, one for each vector component? What data type, scale, and offset are you using? |
|
Just the very simple structure below where “ux” means the unit vector in the x direction, etc. It is in the LAS spatial reference system and LAS units for range.
float range; // extra bytes - distance to point
float ux; // extra bytes unit vec x
float uy; // extra bytes unit vec y
float uz; // extra bytes unit vec z
|
|
@lgraham-geocue that is exactly what I was afraid of. (-: You are hereby excused from designing the storage details for standardization of "additional attributes" via extra bytes ... (-; But seriously. For all near-nadir shots the ux and uy components will be close to zero and lead to very inefficient (aka over-precise) storage. We had this discussion before. It originally started when a fully flexible 2.0 version of the LAS specification was first proposed. It (fortunately) has died. This was about storing xyz in floating-point but the same argument holds for the three components of a unit vector. If we need to store unit vectors it may be worthwhile using a concise coding such as [Deering 1995]. The full discussion against floating-point is still available here and a screen shot of the opening argument is attached: |
|
I'll just take this opportunity to also mention that putting range before the x,y,z should "trigger" any data storage designer. LAS has had some extremely unfortunate "design" in the past, leading to every implementation having to read piece-by-piece and move things around due to misaligned structures, and even more pain writing. So moving forward it would be good to engage in practices that allow storage in a machine read/writable format without needing to move things around. In this case, if it were to remain as 4 floats, having range come after the normal (in the w component) would make a lot more sense than range in x, x in y, etc. But separate from that, I agree with Martin that storing a unit vector in 3 floats is excessive.
Regards
Dave Pevreal
|
|
Alas, Martin, I must take you to school – you have been living in the sheltered world of ALS for too long! Let's give a fair shake to MLS, TLS, and (if I might coin the expression) DLS, Drone Laser Scanning. I am not forgetting BLS (as in bathymetric, but I am not an expert in that area).
It is a rather long discussion that touches on several areas of core LAS design so I’ll try to take some time this weekend to put down the thoughts. One thing to give some thought to is the tension in LAS – should we always design to optimize the ease of implementation in LAZ (your concern here, no doubt) or optimize for exploitation of LAS?
In the meantime, dust off your copy of Hamming (still just about the best reference for “on the metal” coding) and reacquaint yourself with optimal representation of random numbers distributed somewhat uniformly from 0 to 1 (well, -1 to 1 but that’s just a side detail).
Later,
Lewis
|
|
Don't jump to conclusions too quickly about me having hidden LAZ intentions. Three fluffy floats will LAZ-compress at a higher compression rate than more compact unit vector representations. A recent paper on efficient storage of unit vectors (here with applications as shading normals) also provides an accessible explanation of why three floats are überfluffy, alongside a number of better alternatives. I think the "opt32" mapping looks promising. Lewis' emotional response suggests that surface normals are not a suitable starting candidate for the first standardized additional attribute. (-; Maybe the beam or beamlet ID needed for Velodyne, Ouster, SPL100 and upcoming scanners is a less contentious candidate? |
|
This all seems to have gotten very confusing and confused. Can someone summarize the basic proposal and goal? |
|
Picking a common extrabyte data set and setting the standard as to how that data is stored in extrabytes. Lewis and Martin are discussing the best format for storing the data for one common extrabyte data set, but there are other common extrabyte data sets that could benefit from standardization until they can agree on the best format (for processing speed, not storage, my two cents) for that one particular set.
-George
|
|
So a few notes on LAS:
First off, the executive summary of my points (pun intended)
* Consider all use cases when modifying, adding to LAS
* Do not add any more data to a LAS point record that requires reference outside of the point record to decode the point (e.g. scale, offset in a header). Keep points self-contained.
Stating the obvious but LAS is ubiquitous, supporting (there are probably other cases of which I am not aware):
ALS – traditional Airborne Laser Scanning
MLS – Mobile Laser Scanning
DLS – Drone Laser Scanning – I have found DLS to contain elements of both ALS and MLS but is more akin to MLS
TLS – Tripod Laser Scanning (where it seems e57 never really gained much traction)
BLS – Bathymetric Laser Scanning (where the first LAS “profile” was designed)
The first consideration (and my objection to Martin’s comments) is that when we design for LAS, we tend to think primarily of our own domain space. Consider Martin’s comment - “For all near-nadir shots the ux and uy components will be close to zero…” The application space we are dealing with is Mobile Laser Scanning (MLS) and Drone Laser Scanning (DLS). Here the “incident” angle unit vectors range from -1.0 to 1.0 for all three coordinates since we are often looking “up”, “sideways” and so forth. This is obvious for MLS. For DLS this happens when you are flying below the pit edge, below guard wires and so on.
Incident angle is incredibly valuable and is used for a variety of functions. One of the use cases for us is ray tracing to coincidently acquired high resolution images. This requires a resolution of each component of around 24 bits. The more common use is for visualization of the point cloud (for example, which points on a road sign to display based on eye point). This use case could probably work fine with 6 to 8 bits of resolution.
For numbers normalized on the span -1.0 to 1.0 that have a relatively uniform distribution and required resolutions in the 16 to 32 bit range, "Float" is the ideal structure (not to mention it is now the ubiquitous data type in hardware renderers). So float/doubles are not necessarily bad words (note that if you normalize a project to -1.0 --> 1.0, double can have some advantages as the representation, hence its inclusion in the proposed LAS 2.0 standard). I think most folks are aware of the logarithmic loss of resolution in these data types when you move out beyond the base range, so we would not use them in denormalized work (in my data sample of extra bytes from a DLS application, Range is a float, but this is actually a hardware value from the sensor whose original data type is float).
However, my point really isn’t about the ideal data types for storage in LAS. It is really about how we often view components of LAS from the narrow perspective of the use cases with which we deal – e.g. Martin’s assumption of the ALS use case in our discussion. We all tend to do this so we just need to be aware.
Of course, when we deal with augmentation such as we do through Extra Bytes, we do not necessarily have to please everyone! But if a data tuple has high value in more than one domain, we should probably give a lot of thought to those other domains (as is the case for Incident unit vector).
A second consideration of LAS is the following:
LAS was initially intended as a transport format (I know because I was a member of the initial design effort of LAS – prior to it being handed to ASPRS). At that time, Terrasolid had worked with Earth Data to design the Earth Data Binary (EBN) format and tended to use it internally. Army Corps was using a precursor of LAS from Enerquest. Optech had yet another format (though Optech was not part of the original LAS consortium – they came in a bit later). So LAS was an effort to have a neutral exchange format. None of us considered LAS an exploitation format (that is, a format we would use internally in a software application) – if we had, we never would have released anything since everyone would have radically different ideas of what goes into an exploitation format (trees, tiling, ….).
However, many, many software implementations elected to use LAS directly in exploitation (in retrospect, we could have prevented this by using structures not amenable to random access!). So now I think the LAS committee has an obligation to at least consider the ramifications of direct exploitation. This has huge implications, the most serious being that a point needs to stand on its own, as much as possible. This is not the case in LAS today because one has to have information from the LAS header to scale and translate positional coordinates. In a typical project where you have many contributing LAS files, this causes a really nasty bit of bookkeeping. Consider merging N files from ALS with M files from MLS and a few files from DLS where each contractor scaled the points differently (of course, most are not aware of this since software algorithms tend to compute these values rather than allow input from a user). To handle this efficiently, you have to prescan and figure out essentially the lowest common denominator. Obviously we all do this routinely but it is still a big issue. We have run into cases where denormalizing a project without resolution loss was not possible.
Now consider the complexity of introducing another parameter in LAS with scaling and offsets definable to different values than the overall point scale/offset. This then begins to blow up the normalization problem.
My general point here is that we need to, as much as possible, have LAS normalization values at the point record level (and since this blows up the point record size, it means we'll use denormal data). Arbitrary normalization parameters work just fine per file but are a huge issue in heterogeneous projects. I can see this really going wild with the use of Extra Bytes. Obviously some stuff has to be at the file level, such as the Spatial Reference System, but don't make me go to a header to normalize/denormalize some other parameter.
A second consideration is compression and when we talk LAS, we mean LAZ. LAZ is great – we need to make certain that mods to LAS via new versions, extra bytes and so forth does not break LAZ.
However, we need to really think about the philosophy of transport versus exploitation. I (and I may be in the minority here) consider LAZ to be a transport/storage format because (to my knowledge, anyway) LAZ does not support random access. So why is this important? Well, LAS should always be optimized for exploitation since so much software uses it as such. This means LAS should be based on the data types most natural (e.g. fastest) for exploitation. If data conversion is needed for efficient compression, then this should be a function of the serialize/deserialize to/from LAZ. Of course, the big hassle with this scheme is maintaining LAZ as 100% lossless (e.g. quantizing an angle in float to a 16 bit representation, etc.).
Well, this note is definitely rambling on so a repeat of the original plea:
* Consider all use cases when modifying, adding to LAS
* Do not add any more data to a LAS point record that requires reference outside of the point record to use.
|
|
I would appreciate it if @lgraham-geocue could stop suggesting my comments are driven by me only "knowing about ALS" or me only "caring about LAZ" or the like. This is getting old. In the seminal paper "Geometry Compression" from SIGGRAPH 95, Deering kick-started research on better representations of surface normals or unit vectors, noting that "Traditionally 96-bit normals (three 32-bit IEEE floating-point numbers) are used in calculations to determine 8-bit color intensities. 96 bits of information theoretically could be used to represent 2 to the power of 96 different normals spread evenly over the surface of a unit sphere. This is a normal every 2 to the power of -46 radians in any direction. Such angles are so exact that spreading out angles evenly in every direction from earth you could point out any rock on Mars with sub-centimeter accuracy." The summary paper I cited earlier points out that "Consider a straight forward representation of points on the unit sphere. A structure comprising three 32-bit floating scalars (struct { float x, y, z; }) occupies 3 floats = 96 bits per unit vector. This representation spans the full 3D real space, R3, distributing precision approximately exponentially away from the origin until it jumps to infinity. Since almost all representable points in this representation are not on the unit sphere, almost all 96-bit patterns are useless for representing unit vectors. Thus, a huge number of patterns have been wasted for our purpose, and there is an opportunity to achieve the same set of representable vectors using fewer bits, or to increase effective precision at the same number of bits." So I am just one among pretty much every geometry storage researcher in the world who would say that it's time to move past storing three floats for unit vectors or surface normals. For every "additional attribute" stored as extra bytes we specify these things in the VLR:
1. starting byte
2. data type
3. no data value
4. scale
5. offset
@lgraham-geocue, are you suggesting that reading 1 to 3 is ok but using 4 and 5 is too complex? |
|
I am not all that concerned with the format used to store attributes so long as it supports the use case, is easily understandable by implementors and is not excessively bloated.
On the Comment:
For "additional attributes" stored as extra bytes we specify these things in the VLR:
1. starting byte
2. data type
3. no data value
4. scale
5. offset
You are suggesting that reading 1 to 3 is ok but using 4 and 5 is too complex?
Yes. I think that 1-3 are likely identical for a particular attribute, so no issue. However, suppose we have 20 files with 20 different scale/offset values for the same attribute (e.g. range from point to sensor). You cannot denormalize a point using only the point record – you have to read the VLR. But we do not have a really good, reliable method to go from a point to the containing file to select the appropriate VLR. This forces coders to carry the VLR information along with the points just to be able to decode the Extra Bytes. This matters for issues such as keeping a cache of normalized points on a display list and only denormalizing on the way to the renderer (hence my comment about LAS being used as an exploitation format).
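The merge problem above can be sketched as follows (all file contents, scales, and helper names here are invented): without per-point normalization, combining files whose producers chose different scales forces a pre-scan for a common quantization and a re-quantization pass.

```python
# Hypothetical sketch of merging extra-bytes attributes from files with
# different scale/offset choices ("lowest common denominator" pre-scan).

def common_scale(scales):
    # The finest scale present preserves every file's resolution.
    return min(scales)

def requantize(raw, scale, offset, new_scale, new_offset):
    value = raw * scale + offset          # decode with the file's VLR
    return round((value - new_offset) / new_scale)  # re-encode commonly

files = [
    {"scale": 0.01,  "offset": 0.0, "raw": [1234, 5678]},    # cm steps
    {"scale": 0.001, "offset": 0.0, "raw": [12340, 56780]},  # mm steps
]
target = common_scale(f["scale"] for f in files)
merged = [requantize(r, f["scale"], f["offset"], target, 0.0)
          for f in files for r in f["raw"]]
print(merged)  # all values now share the finest (mm) quantization
```

Every raw value must be touched, and the per-file VLR must be in hand for each point, which is exactly the bookkeeping burden being described.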
Perhaps this should be a change to consider – a standard way to point to a reduced set of records from the point data record. Sadly, the only thing that comes to mind is a GUID – way too expensive.
Anyway, food for thought.
|
|
I recently published a little blog post on how to map the information stored in these kinds of ASCII lines of LiDAR information to the LAS format: The first number is either a classification into ground, vegetation, or other surface, or an identifier for a planar shape that the return is part of. The next three numbers are the x, y, and z coordinates of the LiDAR point in some local coordinate system. The next three numbers are the x, y, and z coordinates of an estimated surface normal. The next three numbers are the x, y, and z coordinates of the sensor position in the same coordinate system. The last number is the intensity of the LiDAR return. |
|
The cBLUE topo-bathy lidar TPU tool (https://github.com/noaa-rsd/cBLUE.github.io) is currently storing vertical uncertainty values in extra bytes as floats, rather than uchar, for increased precision. This differs from the LWG's DRAFT Standard ExtraByte Definitions but seems to be working for those groups using the tool. Input on this? @forkozi @esilvia ? |
|
Do the TPU values require error values in femtometer increments close to zero that drop exponentially to decimeter increments close to one million? Then a float32 representation is suitable. If the increments with which the error is to be expressed should be a constant centimeter or millimeter throughout the entire error value range, then an unsigned integer scaled by 0.01 or 0.001 is the correct approach. |
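The contrast can be checked numerically. This sketch approximates the float32 step size (ulp) near a value, assuming normal (non-denormal, nonzero) inputs; the function name is invented.

```python
# Illustrative comparison: float32 spacing grows with magnitude, while a
# scaled integer keeps a constant increment across its whole range.
import math

def float32_ulp(x):
    """Approximate spacing between adjacent float32 values near x
    (valid for normal, nonzero x)."""
    return 2.0 ** (math.floor(math.log2(abs(x))) - 23)

# Near 1 m a float32 resolves ~0.12 micrometers; near 100000 m only ~7.8 mm.
print(float32_ulp(1.0), float32_ulp(100000.0))
# A uint32 scaled by 0.001 resolves exactly 1 mm everywhere in its range.
```

For a TPU value that is physically meaningful to, say, a millimeter, the float32's extra precision near zero is wasted and its coarser steps at large values are a liability, which is the argument for the scaled integer.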
|
The range of plausible vertical uncertainty values, considering the range of possible data sources, is probably meters to millimeters. Adding one order of magnitude in either direction gives tens of meters to tenths of millimeters. If we use a scale factor, where is the scale factor stored? Is it the same as for the X Y Z coordinates? |
|
The "scale factor" is core part of the "extra byte" definition. I recently published a little blog post on how to use txt2las (which is open source) to map the information stored in these kind of ASCII lines of LiDAR information to the LAS format and you see examples with different numbers of decimal digits being used there: 1, 290.243, 28.663, -11.787, 0.060, -0.052, 0.997, 517.3170, -58.6934, 313.0817, 52 |
|
The "beam ID" seems a rather easy first candidate for standardization. Clearly there is a need and clearly users already store this information to "extra bytes" like here as "Velodyne Rings". In this blog post I describe how to copy the beam ID from the "point source ID" field or from the "user data" field into a new "extra bytes" attribute with two calls to las2las, namely |

We've discussed listing some standardized extrabytes either in the specification itself or in a supplementary document. This would encourage their adoption by the community and formalize common extrabytes as a guideline for future implementations.
We need to figure out the following:
Should the data_type be formalized?
Below is a link to what I think is a decent start to the standardized extrabytes. Once we get some agreement on a few of these I can start building a wiki page or contribute to Martin's pull request. Which one we do depends on the answer to the 4th question.
Standard ExtraBytes v1.docx