lat and lon only need to be 32 bits #65

hallahan · 2022-10-23T22:05:01Z

lat and lon in the schema are defined as 40 bits each, represented as signed 64 bit integers.

Lines 83 to 85 in 5b3b629

    
    lat: i64 : 40;
    /// Longitude (scaled with `COORD_SCALE`).
    lon: i64 : 40;

Though the OSM PBF format allows the precision of a coordinate to be configured, in practice OSM lat and lon values always fit in 32 bits. The database behind the website / Editing API is the source of truth, and there you can see they are defined as 32 bit integers:

https://github.com/openstreetmap/openstreetmap-website/blob/f407def8ba4bc55ea70807c33ff011472bdf8720/db/structure.sql#L444-L445

Many fields have the bit width of 40. Why is this?


VeaaC · 2022-10-24T05:58:58Z

You are indeed correct: OSM uses at most [-180, 180) * 10,000,000 which should fit into a signed 32 bit integer. I will change this later today.
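The range argument above can be checked with a few lines of Rust. This is only a sketch: the value of `COORD_SCALE` is assumed to be 10,000,000 based on the comment above, and `scale_to_i32` is a hypothetical helper, not part of osmflat.

```rust
/// Assumed scale factor: degrees * 10,000,000 (taken from the comment above,
/// not read from the osmflat sources).
const COORD_SCALE: i64 = 10_000_000;

/// Hypothetical helper: scales a coordinate in degrees and returns it as an
/// i32, or None if the scaled value does not fit into a signed 32 bit integer.
fn scale_to_i32(degrees: f64) -> Option<i32> {
    let scaled = (degrees * COORD_SCALE as f64).round();
    if scaled >= i32::MIN as f64 && scaled <= i32::MAX as f64 {
        Some(scaled as i32)
    } else {
        // Also reached for NaN, since both comparisons are false.
        None
    }
}
```

The extreme longitude of 180 degrees scales to 1,800,000,000, which is below `i32::MAX` (2,147,483,647), so every valid lat and lon fits. With this scale the limit is only exceeded at roughly 214.7 degrees.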

Most of the non-coordinate fields using 40 bits are indices and/or Ids. For Ids we need to use more than 32 bits since OSM exceeded that address space a while ago already. Indices need to use more than 32 bits if they need to address more than 2^32 entities, which can be the case for OSM data.

VeaaC · 2022-10-24T14:42:34Z

@hallahan I incorporated this into a long-standing change I had planned that makes osmflat more compact by making identifiers optional: #70

Feel free to check it out

hallahan · 2022-10-24T15:27:47Z

Great changes!

The unique feature I'm seeing with flatdata is the ability to specify the exact size to pack a given field in a struct. Is there a specific performance reason behind choosing 40 for many fields? Or, is there some sort of heuristic?

I'm guessing you just figured that 32 bits is not enough, and 8 bits more gets you a large enough number? Does that play well with alignment though?

VeaaC · 2022-10-25T11:02:36Z

32 bits are not enough (OSM already exceeds it for several items), and 40 bits seems to be the next larger size that should last for a very long time (about 1 trillion items). Flatdata pads structs to the next byte boundary, so using fewer than 40 bits usually does not save much (unless you have multiple sub-40 bit references in the same struct).
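To make the arithmetic concrete, here is a small Rust sketch. Both functions are hypothetical helpers for illustration. They model only the "pad to the next byte boundary" rule described above and are not flatdata's actual layout code.

```rust
/// Number of distinct values an unsigned field of `bits` width can hold.
/// Only valid for `bits` < 64 in this sketch.
fn capacity(bits: u32) -> u64 {
    1u64 << bits
}

/// Size in bytes of a struct whose fields have the given bit widths,
/// assuming the fields are packed back to back and the total is rounded
/// up to the next full byte.
fn padded_size_bytes(field_bits: &[u32]) -> u32 {
    let total: u32 = field_bits.iter().sum();
    (total + 7) / 8
}
```

Under these assumptions `capacity(40)` is 1,099,511,627,776, and a struct with a single 36 bit field still occupies 5 bytes, the same as one with a 40 bit field. Only with two such fields does the narrower width save a byte (9 bytes for 36 + 36 bits versus 10 bytes for 40 + 40 bits).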

Using 40 bits instead of 32/64 does of course shift the alignment. Flatdata handles that well, but depending on the platform there is a (slight) performance hit (usually compensated for by having less data than with 64 bits).
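As an illustration of what reading such a field involves, the sketch below loads a 40 bit value from an arbitrary byte offset. It assumes a little-endian, byte-aligned field. `read_u40` is a hypothetical helper and not how flatdata's generated code is actually written.

```rust
/// Hypothetical helper: reads a little-endian 40 bit unsigned value that
/// starts `offset` bytes into `data`. Returns None if fewer than 5 bytes
/// are available at that offset.
fn read_u40(data: &[u8], offset: usize) -> Option<u64> {
    let end = offset.checked_add(5)?;
    let bytes = data.get(offset..end)?;
    // Copy into a zeroed 8 byte buffer so the upper 24 bits are 0. The copy
    // avoids relying on the source being aligned for a 64 bit load.
    let mut buf = [0u8; 8];
    buf[..5].copy_from_slice(bytes);
    Some(u64::from_le_bytes(buf))
}
```

Because consecutive 5 byte fields start at offsets 0, 5, 10 and so on, most of them are not aligned for a native 32 or 64 bit load, which is where the small per-access cost mentioned above comes from.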

hallahan · 2022-10-26T16:36:19Z

Completed via #70

hallahan closed this as completed Oct 26, 2022
