lat and lon only need to be 32 bits #65

Closed
hallahan opened this issue Oct 23, 2022 · 5 comments

Comments

@hallahan
Contributor

hallahan commented Oct 23, 2022

lat and lon in the schema are defined as 40 bits each, backed by signed 64-bit integers:

lat: i64 : 40;
/// Longitude (scaled with `COORD_SCALE`).
lon: i64 : 40;

Although the OSM PBF format allows coordinates to be stored at a configurable precision, in practice OSM lat and lon values always fit in 32 bits. The database behind the website / Editing API is the source of truth, and there you can see they are defined as 32-bit integers:

https://github.com/openstreetmap/openstreetmap-website/blob/f407def8ba4bc55ea70807c33ff011472bdf8720/db/structure.sql#L444-L445

Many fields have a bit width of 40. Why is this?

@VeaaC
Collaborator

VeaaC commented Oct 24, 2022

You are indeed correct: OSM coordinates span at most [-180, 180) * 10,000,000, which fits into a signed 32-bit integer. I will change this later today.
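
A quick way to sanity-check that range claim is sketched below. It assumes `COORD_SCALE` is 10,000,000 (the factor quoted above; the schema snippet does not show the actual value), and `scaled_coord` is a hypothetical helper, not part of osmflat.

```rust
/// Assumed scale factor: 10^7 units per degree, as quoted in the comment above.
const COORD_SCALE: i64 = 10_000_000;

/// Hypothetical helper: scales a coordinate in degrees to an integer and
/// returns it only if the result fits into a signed 32-bit integer.
fn scaled_coord(degrees: f64) -> Option<i32> {
    let scaled = (degrees * COORD_SCALE as f64).round() as i64;
    i32::try_from(scaled).ok()
}

/// Largest scaled magnitude a longitude can reach (180 degrees).
const MAX_SCALED_LON: i64 = 180 * COORD_SCALE;
```

With these definitions, `scaled_coord(-180.0)` yields `Some(-1_800_000_000)`, and `MAX_SCALED_LON` (1.8 billion) stays below `i32::MAX` (about 2.147 billion), so every valid longitude, and therefore every latitude, fits.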

Most of the non-coordinate fields using 40 bits are indices and/or Ids. For Ids we need more than 32 bits, since OSM exceeded that address space a while ago already. Indices need more than 32 bits if they have to address more than 2^32 entities, which can be the case for OSM data.
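
To illustrate the width requirement, here is a minimal sketch of a fit check for unsigned fields. `fits_in_bits` is a hypothetical helper, and the Id used in the usage note is a made-up value just above 2^32, not a real OSM Id.

```rust
/// Hypothetical helper: returns true if `value` can be stored in an
/// unsigned field that is `bits` wide.
fn fits_in_bits(value: u64, bits: u32) -> bool {
    // A 64-bit (or wider) field holds any u64; this also avoids shifting by 64.
    bits >= 64 || value < (1u64 << bits)
}
```

For example, `fits_in_bits(5_000_000_000, 32)` is false while `fits_in_bits(5_000_000_000, 40)` is true.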

@VeaaC
Collaborator

VeaaC commented Oct 24, 2022

@hallahan I incorporated this into a long-standing change I had planned that makes osmflat more compact by making identifiers optional: #70

Feel free to check it out

@hallahan
Contributor Author

Great changes!

The unique feature I'm seeing with flatdata is the ability to specify the exact size to pack a given field in a struct. Is there a specific performance reason behind choosing 40 for many fields? Or, is there some sort of heuristic?

I'm guessing you just figured that 32 bits is not enough, and 8 more bits gets you a large enough number? Does that play well with alignment, though?

@VeaaC
Collaborator

VeaaC commented Oct 25, 2022

32 bits are not enough (OSM already exceeds that for several items), and 40 bits seems to be the next larger size that should last for a very long time (2^40 is roughly 1.1 trillion items). Flatdata pads structs to the next byte boundary, so using fewer than 40 bits usually does not save much (unless you have multiple sub-40-bit references in the same struct).
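
The padding argument can be sketched as simple arithmetic. This assumes fields are packed back to back and only the struct total is rounded up to whole bytes, as described above; `packed_size_bytes` is a hypothetical helper, not a flatdata API.

```rust
/// Hypothetical helper: size in bytes of a packed struct whose fields have
/// the given bit widths, assuming the fields are laid out back to back and
/// the total is rounded up to the next byte boundary.
fn packed_size_bytes(field_bits: &[u32]) -> u32 {
    let total_bits: u32 = field_bits.iter().sum();
    (total_bits + 7) / 8
}
```

Under that assumption, a single 33-bit field still occupies 5 bytes, the same as a 40-bit one, whereas two 36-bit fields take 9 bytes instead of the 10 needed for two 40-bit fields.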

Using 40 bits instead of 32/64 does of course shift the alignment. Flatdata handles that well, but depending on the platform there is a (slight) performance hit, usually compensated by having less data than with 64 bits.
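
To show where that extra cost comes from, here is a rough sketch of reading a bit-packed field that does not start on a natural word boundary. It assumes a little-endian bit layout and is not flatdata's actual reader; `read_bits` is a hypothetical function.

```rust
/// Hypothetical reader: extracts an unsigned value of `bits` width (1..=56)
/// starting at `bit_offset` from a little-endian packed buffer.
fn read_bits(data: &[u8], bit_offset: usize, bits: u32) -> u64 {
    assert!((1..=56).contains(&bits), "width must be between 1 and 56 bits");
    let first_byte = bit_offset / 8;
    let shift = (bit_offset % 8) as u32;
    assert!(first_byte < data.len(), "offset is out of bounds");

    // Copy up to 8 bytes into a zero-padded scratch buffer, so the load
    // works regardless of how the field is aligned within `data`.
    let mut buf = [0u8; 8];
    let end = (first_byte + 8).min(data.len());
    buf[..end - first_byte].copy_from_slice(&data[first_byte..end]);

    // Shift away the leading bits of the first byte, then mask to the width.
    let raw = u64::from_le_bytes(buf);
    (raw >> shift) & ((1u64 << bits) - 1)
}
```

The extra copy, shift, and mask are the work that a naturally aligned 32-bit or 64-bit load would not need.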

@hallahan
Contributor Author

Completed via #70
