Make osmflat more compact (especially when compressed) #70

VeaaC · 2022-10-24T14:40:53Z

Store granularity explicitly instead of pre-multiplied numbers
Move Ids to separate optional sub-archive
Reduce bits needed for coordinates from 40 to 32
Remove unused header information
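
To illustrate why storing the granularity explicitly can let coordinates fit into 32 bits, here is a minimal sketch. It assumes the OSM PBF convention that raw coordinates are multiples of a granularity given in nanodegrees (100 by default). The function names are hypothetical and are not taken from the osmflat schema or API.

```rust
/// Number of bits a signed (two's complement) integer needs in order to
/// cover the symmetric range [-max_abs, max_abs].
fn signed_bits_needed(max_abs: u64) -> u32 {
    // Bits required for the magnitude alone.
    let magnitude_bits = 64 - max_abs.leading_zeros();
    // One extra bit for the sign.
    magnitude_bits + 1
}

/// Convert a raw coordinate to degrees, given a granularity in nanodegrees.
/// Assumption: degrees = raw * granularity * 1e-9 (PBF convention, zero offset).
fn to_degrees(raw: i32, granularity_nanodeg: i32) -> f64 {
    (raw as i64 * granularity_nanodeg as i64) as f64 / 1e9
}
```

Under these assumptions, `signed_bits_needed(180_000_000_000)` is 39 (a longitude pre-multiplied to nanodegrees, which the old 40-bit field covered), while `signed_bits_needed(1_800_000_000)` is 32 (the same longitude in units of 100 nanodegrees).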

Comparison:

The following variants are compared:

Compressed PBF (internal zlib compression)
Unpacked osmflat with and without Ids (only the new version makes them optional)
Compression with pzstd level 3
Compression with shuffly ( https://github.com/VeaaC/shuffly ) + pzstd level 3
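
As a rough illustration of why reordering bytes before compression can help with fixed-size records, the sketch below transposes a buffer of records so that the i-th byte of every record is stored contiguously. This is a simplified stand-in for the general idea, not shuffly's actual algorithm or API. The function names and the record size are assumptions made for the example.

```rust
/// Byte-transpose `data`, viewed as consecutive records of `record_size` bytes.
/// After the transform, byte 0 of every record comes first, then byte 1, etc.
/// Returns `None` if the input is not a whole number of records.
fn shuffle_bytes(data: &[u8], record_size: usize) -> Option<Vec<u8>> {
    if record_size == 0 || data.len() % record_size != 0 {
        return None;
    }
    let n = data.len() / record_size;
    let mut out = vec![0u8; data.len()];
    for (i, record) in data.chunks_exact(record_size).enumerate() {
        for (j, &byte) in record.iter().enumerate() {
            out[j * n + i] = byte;
        }
    }
    Some(out)
}

/// Inverse of `shuffle_bytes`: restore the original record layout.
fn unshuffle_bytes(data: &[u8], record_size: usize) -> Option<Vec<u8>> {
    if record_size == 0 || data.len() % record_size != 0 {
        return None;
    }
    let n = data.len() / record_size;
    let mut out = vec![0u8; data.len()];
    for i in 0..n {
        for j in 0..record_size {
            out[i * record_size + j] = data[j * n + i];
        }
    }
    Some(out)
}
```

For sorted little-endian integers, the high-order bytes of neighbouring records are mostly identical, so after the transpose they form long runs that a general-purpose compressor such as zstd can encode cheaply.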

Before:

Dataset	PBF (zlib)	osmflat with Ids	zstd + osmflat with Ids	shuffly + zstd + osmflat with Ids
Berlin	70M	227M	94M	58M
Europe	26G	89G	41G	23G
Planet	66G	223G	102G	57G

After:

Dataset	PBF (zlib)	osmflat with Ids	osmflat without Ids	zstd + osmflat with Ids	zstd + osmflat without Ids	shuffly + zstd + osmflat with Ids	shuffly + zstd + osmflat without Ids
Berlin	70M	214M	176M	83M	69M	53M	49M
Europe	26G	84G	68G	36G	30G	20G	19G
Planet	66G	208G	164G	97G	76G	48G	47G

Observations:

The new version is more compact in all scenarios.
Using shuffly is still worth it (Ids compress by more than a factor of 20), but a bit less so than before because of the rearranged data and the explicit granularity.
The shuffly-compressed version is smaller than the compressed PBF (with and without Ids).
Most people might want to use a version without Ids, since it saves disk space.
Ids are almost free if compressed with shuffly (because the data is sorted by Id).
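
The cost of keeping Ids can be read off the "After" table by subtracting the size without Ids from the size with Ids. The sketch below does this for the Planet row. The numbers are copied from the table and are rounded to whole gigabytes, so the results are only approximate. The names are hypothetical.

```rust
/// Extra size attributable to Ids: size with Ids minus size without Ids.
fn id_overhead(with_ids: u32, without_ids: u32) -> u32 {
    with_ids.saturating_sub(without_ids)
}

/// (variant, size with Ids, size without Ids) in GB, taken from the
/// Planet row of the "After" table above.
const PLANET_AFTER_GB: [(&str, u32, u32); 3] = [
    ("osmflat", 208, 164),
    ("zstd + osmflat", 97, 76),
    ("shuffly + zstd + osmflat", 48, 47),
];

/// Id overhead in GB for each variant of the Planet dataset.
fn planet_id_overheads() -> Vec<(&'static str, u32)> {
    PLANET_AFTER_GB
        .iter()
        .map(|&(label, with_ids, without_ids)| (label, id_overhead(with_ids, without_ids)))
        .collect()
}
```

For the Planet row this gives roughly 44G uncompressed, 21G with zstd alone, and 1G with shuffly + zstd.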

hallahan · 2022-10-25T15:22:00Z

Such a big difference! I wonder how much better it would be if you reduced the integer sizes to 38 bits instead of 40?

VeaaC · 2022-10-26T06:56:29Z

Not much, if anything at all: most structures would only be a few bits smaller, and flatdata rounds up to the next byte. It would only help if a structure had 4 references, each saving 2 bits.
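
The rounding argument can be checked with a small sketch. It assumes only what the comment above states, namely that a structure's total bit width is rounded up to the next whole byte. The function name is hypothetical and this is not flatdata's actual layout code.

```rust
/// Size in bytes of a structure whose fields have the given bit widths,
/// assuming the total is padded up to the next whole byte.
fn struct_size_bytes(field_bits: &[u32]) -> u32 {
    let total_bits: u32 = field_bits.iter().sum();
    (total_bits + 7) / 8
}
```

With a single reference, 40 bits and 38 bits both occupy 5 bytes, so nothing is saved. With four references, 4 × 40 = 160 bits is 20 bytes, while 4 × 38 = 152 bits is 19 bytes, which is the one case where the narrower fields pay off.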

hallahan · 2022-10-26T16:25:42Z

How do you do your size benchmarking? Is there a script somewhere?

boxdot · 2022-10-26T18:10:27Z

> How do you do your size benchmarking? Is there a script somewhere?

+1 for documenting the command lines which produced the above numbers

VeaaC mentioned this pull request Oct 24, 2022

lat and lon only need to be 32 bits #65

Closed

VeaaC force-pushed the compact branch from 07c0408 to 6d5c79c on October 24, 2022 15:09

hallahan reviewed Oct 24, 2022

flatdata/osm.flatdata (resolved)

osmflatc/src/main.rs (resolved)

VeaaC mentioned this pull request Oct 26, 2022

metrics about file size #59

Open

boxdot requested changes Oct 26, 2022

flatdata/osm.flatdata (resolved)

VeaaC added 6 commits October 26, 2022 16:31

Make schema more compact:

85ad952

* Store granularity explicitly instead of pre-multiplied numbers
* Move Ids to separate optional sub-archive
* Reduce bits needed for coordinates from 40 to 32
* Remove unused header information

Fix compilation

e4f0c41

Fixes after rebase

3c30e74

Silence clippy

c14b38d

Fix clippy

c9c418a

Bump versions

4124a08

VeaaC force-pushed the compact branch from 038768f to 4124a08 on October 26, 2022 14:33

boxdot self-requested a review October 26, 2022 14:35

boxdot approved these changes Oct 26, 2022

VeaaC merged commit 6c74740 into boxdot:master Oct 26, 2022

VeaaC deleted the compact branch October 26, 2022 15:08
