Skip to content

Experiment: Use fast serialization library for serializing graph

Andrew Byrd edited this page Dec 17, 2014 · 1 revision

This is based on issue #1506

I made tests on master with http://ruedigermoeller.github.io/fast-serialization/

According to current master (3. oct. 2014) Fast serialization doesn't improve build/read times and serialization size at least for New York.

All tests are made on Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz 64 bit with 8 GB of memory of which 50% is free (one FF and Netbeans needs 4 GB of RAM)

Graph build time master Slovenia

real    1m6.558s
user    1m29.557s
sys     0m4.947s
File size: 173006442 (165MB)
With OSM of Slovenia and GTFS of Maribor and without Elevation data
OSM Size 741854134 (708MB) .osm
GTFS Size 886354 (866K)

Graph build time master Maribor

real    0m13.609s
user    0m23.867s
sys     0m0.507s
File size: 19919430 (19MB)
With OSM Maribor and GTFS of Maribor and without Elevation data
OSM Size 62386091 (60MB) .osm
GTFS Size 886354 (866K)

Graph build time master fast serialization Slovenia

real    1m5.240s
user    1m31.330s
sys     0m3.573s
File size: 151043817 (145MB)
With OSM of Slovenia and GTFS of Maribor and without Elevation data
OSM Size 741854134 (708MB) .osm
GTFS Size 886354 (866K)

Graph build time master fast serialization Maribor

real    0m11.599s
user    0m22.997s
sys     0m0.330s
File size: 17311379 (17MB)
With OSM Maribor and GTFS of Maribor and without Elevation data
OSM Size 62386091 (60MB) .osm
GTFS Size 886354 (866K)

Graph build is negligible faster with fast serialization. But is 13% smaller in Slovenia and Maribor case. Fast serialization needs Java 1.7 but can be also made to work with 1.6 http://ruedigermoeller.github.io/fast-serialization/

This is made with drop in replacement Fast serialization, file size and speed can probably be improved because now reflection is used if there is no special FST serializers for classes and then Java serialization is used according to Fast serialization wiki

Microbenchmarking largest PlainStreets

I tested largest 20 PlainStreets from Slovenia in microbenchmark serialization-benchmark

Tested Custom serializers

I added custom serializers for Vertex, IntersectionVertex, Coordinate and PlainStreetEdge

But even with this serializers results aren't much improved:

lib                                     read (ns)     write (ns)     total (ns)   size (bytes)
FST Vertex S  Coor S                     31138816       27979583       59118399       13702695
FST Vertex S                             32457766       28876080       61333846       13702695
FSTSpeed                                 32287995       29663330       61951325       13578853
FST Coor S                               31924599       30170905       62095504       13578853
FST                                      32561549       29855780       62417329       13578853
Java built in                            37825031       48773887       86598918       14187397

We can see that just adding Fast_serializer which is couple of lines change improves writing+reading speed for 24ms on these plainStreets. Which is 27% improvement. Size wise improvements aren't so big just 4% improvement.

Adding custom serializer for Coordinates and/or Vertex improves read+writing speed a litle. (5.3%) from fast serializer Adding Vertex raises size a little. (1% worse) from normal fast_serializer. Adding PlainSerializer doesn't improve speed and size.

Adding coordinate and Vertex custom serializers could bring 32% improvement speed wise and (3.4%) improvement size wise.

Adding coordinate in full graph would be easy. Adding serializers for all different Vertexes would be hard. One reason is because each needs a Graph on creation. And in serializer class we don't have Graph object. So we would need to go over all vertexes after serialization and add Graph object to them.

Tested registering classes

I also tested registering classes. Where you just write which classes will be used and name is used only first time otherwise only number is used.

This lowers speed a little 2% and improves size (less then 1%). But this test case has only 20 Plainstreets and more classes there are larger are savings.

Interestingly largest PlainStreet Derived classes are AreaEdges. Largest PlainStreet in Slovenia is 16 264 bytes. (there are only two objects this size), but largest areaEdge has 74 877 bytes and there is at least 20 of them.

Testing FST in New York with new geometry optimizations

I tested FST serialization with current master (61d7f21d66d84853edc2c77) with New York pbf from geofabrik. (New geometry optimizations have reduced graph so much that I can build it on my PC).

Plain master: graph build time 9 seconds. and 262M (274557152)

Plain FST: graph build time 10 seconds. and 265M (277130075)

FST with custom serializers for Vertex and coordinate: graph build time 7 seconds and 271M (283903975)

FST with custom serializers for StreetEdge, Vertex and coordinate: graph build time 12 seconds and 420M (439935905)

Build time is time from Writing graph LOG output to Graph Written. Full build times are around 2m30s

According to these results FST serialization doesn't improve building speed or size with new geometry optimizations

I was planning to test protostuff but I don't think it would improve.

The documentation on this wiki is outdated and should not be used

unless you are intentionally working with legacy versions of OpenTripPlanner. Please consult the current documentation at readthedocs

Clone this wiki locally