First of all, thank you for making ProtoBuf available in Julia! I am currently using this package to read scenarios from the waymo-open-dataset. While this has worked reliably, I am running into significant performance issues due to the "topology" of the messages.
Some relevant details:
Each scenario consists of approximately 2 MB of binary data. Each scenario contains map features, which in turn consist of some metadata and a number of MapPoints. Each MapPoint only encodes three floats for the x, y, and z position, so the individual messages are quite small. However, a single scenario can contain more than 100,000 MapPoints (spread over the different features of the map).
As a result, loading a single scenario takes about 150 ms on my machine. This may not sound too bad, but in my current use case I need to iterate over about 400,000 of these scenarios, potentially multiple times.
After doing some profiling, I found that 60% of the time is spent pre-allocating memory for all the small dicts that are needed for each MapPoint object. Specifically, most of the time is spent in this line of Julia Base.
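To illustrate the kind of overhead I mean, here is a rough sketch that uses only Julia Base and does not touch ProtoBuf at all. The `PlainPoint` struct and both helper functions are hypothetical stand-ins, and the assumption that one small `Dict` per point is representative of the per-message bookkeeping is mine, based on the profile above. It compares building 100,000 points backed by one small `Dict` each against the same points stored as plain immutable structs:

```julia
# Hypothetical stand-in for a decoded map point.
# This is NOT the type that ProtoBuf generates for MapPoint.
struct PlainPoint
    x::Float64
    y::Float64
    z::Float64
end

# One small Dict per point. Assumption: this mimics the per-message
# bookkeeping that shows up in the profile.
function points_with_dicts(n::Integer)
    return [Dict{Symbol,Float64}(:x => i, :y => 2i, :z => 3i) for i in 1:n]
end

# The same data as immutable structs, stored inline in a single array.
function points_as_structs(n::Integer)
    return [PlainPoint(i, 2i, 3i) for i in 1:n]
end

n = 100_000

# Warm up so that compilation is not included in the measurements.
points_with_dicts(10)
points_as_structs(10)

dict_bytes = @allocated points_with_dicts(n)
struct_bytes = @allocated points_as_structs(n)
dict_time = @elapsed points_with_dicts(n)
struct_time = @elapsed points_as_structs(n)

println("Dict-backed points: ", dict_bytes, " bytes, ", dict_time, " s")
println("Plain structs:      ", struct_bytes, " bytes, ", struct_time, " s")
```

The absolute numbers will differ between machines and Julia versions, and the sketch ignores the actual decoding work, so it should only be read as an indication of how much per-object `Dict` allocation can cost at this message count.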
For comparison, the same kind of data loads in about 11 ms from a JSON file and in about 3 ms from the binary format created by Serialization.serialize. Of course, these formats have other issues, so I would really like to use ProtoBuf here. But I cannot get it up to speed.
Are there known ways to reduce the amount of overhead that arises in this setting with many small proto sub-messages? Or is this issue inherent to the current design of the library?