Rendering time metric #18

rafatower · 2017-04-25T09:57:28Z

Add metric to measure the time spent on rendering vs postgres

rafatower · 2017-05-08T16:46:33Z

A bit of history

Current implementation

See mapnik::progress_timer and MAPNIK_STATS:

It spits results to an std::ostream, which is not the best option.

This is the typical usage:

#ifdef MAPNIK_STATS
    mapnik::progress_timer __stats__(std::clog, "postgis_datasource::init");
#endif

__stats__ (and in some places __stats2__) is just the name of the variable. When the destructor is called (after getting out of scope), the timer is stopped and the stats dumped to the stream.

rafatower · 2017-06-07T13:56:30Z

Requirements

Functional requirements

Original requirement from Santana: we want to know how much time mapnik spends querying DB versus rendering.
The same question, more fine-grained (thanks Raul): "How much time mapnik spends in DB vs rendering, given a map and a zoom level [and possibly a bounding box]"

Other requirements:

From Provide performance stats to applications using Mapnik mapnik/mapnik#780
- Layers: is the layer being reprojected? time spent?
- Features: # fetched, # filtered, # rendered, time spent fetching, rendering
- Total render time, total time
From Statistics processor to enable details on rendering time at runtime mapnik/mapnik#1956
- get details of how long it takes to query data and render it
- Design a new statistics_processor that will return data when called (to be easily aggregated) instead of printing to stderr

Non-functional requirements

The API should be generic enough to allow other mapnik contributors to add more metrics (e.g: to other input plugins, grid and vector tiles, etc.)
It should be possible to enable/disable the feature through configuration, ideally at runtime (rather than at compilation time)
It should be lightweight in terms of CPU.
It should be lightweight in terms of memory footprint, that is, total memory used shall be bounded.

rafatower · 2017-06-07T14:02:31Z

Proposed roadmap

Milestone 1: Coarse-grain DB vs rendering times (not per map yet)

At this point I'm pretty confident we can achieve this, useful both for benchmarking and getting overall metrics in production.

Mapnik: API & implementation for cumulative timing stats
Mapnik: Inject code to measure DB and rendering time
Node-mapnik: bindings for timer_stats class
Windshaft: wrapper around node-mapnik timer_stats (IMHO this is not really needed)
Windshaft-cartodb: expose metrics functionality through endpoints

Milestone 2: Finer-grain metrics (per map)

Mapnik: Research how the information flows from Windshaft-cartodb to mapnik. Modify function interfaces when needed to pass layergroup_id, map_id or equivalent concepts to store finer-grain stats (as done and then reverted in mapnik@6bbbe15 )
Decide on storage and memory utilization strategy: ideally the stats should be memory-bounded by design. Storing finer-grained stats implies that more memory is to be used, but we cannot let mapnik run out of memory because of stats. There are several possible solutions to this:
- flush them out often enough
- use a LRU-like storage strategy, or a circular buffer
- keep just the top-N
- use sampling

Milestone 3: Extended API and configurable storage strategy

Some ideas for a future milestone:

For the first iteration at least I'm focusing the efforts on timing stats. Other possible stats may be interesting, namely counters.
The aggregation, if any, would be delegated on the client at least until this iteration.
Ideally, and to cover different use cases, the storage strategy could be chosen through configuration at server start and run times.

cc/ @jorgesancha

rafatower · 2017-06-08T17:08:56Z

Why this is important

This is needed to better know our platform and its usage. That will allow us to:

set up sensible limits
debug performance issues experienced by particular users
do some capacity planning
optimize what is needed to be optimized

rafatower · 2017-06-08T17:30:26Z

I'm working right now on different branches with the hope of getting the first iteration complete soon:

mapnik: https://github.com/CartoDB/mapnik/compare/v3.0.x...CartoDB:18-performance-stats-take2-v3.0.x?expand=1
node-mapnik: CartoDB/node-mapnik@v3.5.14...CartoDB:timer-stats-flush-v3.5.14
Windshaft (pretty dumb, most likely not needed): https://github.com/CartoDB/Windshaft/compare/mapnik-timer-stats?expand=1
Windshaft-cartodb: https://github.com/CartoDB/Windshaft-cartodb/compare/mapnik-timer-stats?expand=1

This is yet WIP and probably lacks a lock to avoid concurrency issues... but the hardest parts to integrate are pretty much under control.

I'll probably add some notes about the internals of mapnik and node-mapnik somewhere, which can be helpful to newcomers.

Edit: added Windshaft-cartodb branch.

rafatower · 2017-06-08T17:34:11Z

A small test I use to check it actually gets data from node-mapnik (to be added to the corresponding branch):

// file my-test.js
var mapnik = require('.');
var fs = require('fs');

// register fonts and datasource plugins
mapnik.register_default_fonts();
mapnik.register_default_input_plugins();

var map = new mapnik.Map(256, 256);
var out = '';
map.load('./test/stylesheet.xml', function(err,map) {
    if (err) throw err;
    map.zoomAll();
    var im = new mapnik.Image(256, 256);
    map.render(im, function(err,im) {
        if (err) throw err;
        console.log(mapnik.TimerStats.flush());
    });
    
});

and the output:

$ time node my-test.js
total_map_rendering	cpu_time = 5.582 ms	wall_time = 5.58496 ms


real	0m0.184s
user	0m0.168s
sys	0m0.020s

rafatower · 2017-06-13T11:52:27Z

A bit ugly but it starts to work in my development environment.

This is just an example after tinkering a little with one simple map (based on world borders):

$ curl localhost:8181/stats
{"stats":"total_map_rendering\tcpu_time = 1220.97 ms\twall_time = 827.765 ms\npostgis_datasource::features_with_context::get_resultset\tcpu_time = 367.364 ms\twall_time = 402.992 ms\n","ok":true}

The same output beautified:

total_map_rendering                                         cpu_time = 1220.97 ms   wall_time = 827.765 ms
postgis_datasource::features_with_context::get_resultset    cpu_time = 367.364 ms   wall_time = 402.992 ms

Some things I'd need to do:

check why wall_time is bigger than cpu_time when rendering. It may turn out that multiple threads and cores lead to those results when using a pool of threads. I believe it can be checked by instructing node to use just one thread.
add a lock to grant exclusive access to the unordered map used as internal storage of stats, in order to avoid race conditions.
modify the mapnik/node-mapnik interface and wrappers to return a json object instead of a plain string, depending on what we decide with regards to interface and granularity.

rafatower · 2017-06-13T11:59:27Z

Using UV_THREADPOOL_SIZE=1 I indeed got different results, with cpu < wall times (mind that this is not a formal benchmark):

total_map_rendering                                         cpu_time = 473.751 ms   wall_time = 533.123 ms
postgis_datasource::features_with_context::get_resultset    cpu_time = 136.841 ms   wall_time = 313.367 ms

rafatower · 2017-06-14T10:44:42Z

Something worth noting: if we wanted to deploy this changes before they were accepted upstream (or backported to mapnik v3.0.x series for what it matters), we'd need to set up our own building and packaging infrastructure as we're currently using vanilla mapnik and node-mapnik.

rafatower · 2017-06-16T14:45:11Z

Thanks @dgaubert I could run a few tests and validate that the metrics are consistent. I disabled cache and metatiling to get them.

# reset the logs
$ truncate -s0 /tmp/postgres.log
$ curl -s localhost:8181/stats | jsonlint -p
{
  "stats": {},
  "ok": true
}

# request a tile
$ curl -v -s -o /dev/null "http://cdb.localhost.lan:8181/api/v1/map/cdb@99ca6525@ee41f1f57f9852eea1910c4f84f4e5ac:1486641153877/1/11/524/761.png" 2>&1 | grep X-Tiler-Profiler
< X-Tiler-Profiler: {"setDBAuth":7,"res":159,"getTileOrGrid":159,"getRenderer":30,"render-png":129,"render":67,"encode":62,"total":166}

# get the stats
$ curl -s localhost:8181/stats | jsonlint -p
{
  "stats": {
    "total_map_rendering": {
      "cpu_time": 44.790000000000006,
      "wall_time": 66.33305549621582
    },
    "postgis_datasource::features_with_context::get_resultset": {
      "cpu_time": 0.522,
      "wall_time": 19.935131072998047
    }
  },
  "ok": true
}

# get the postgres query times from postgres itself, quick'n'dirty way
$ cat /tmp/postgres.log | grep duration: | grep -v parse | sed 's/.*duration: \([0-9.]*\) ms.*/\1/g' | awk '{SUM+=$1}END{print SUM}'
19.647

(Note: I'm using a debug build, so no optimizations and a lot of traces. The important point here is understanding the process and check that the measurements make sense)

rafatower · 2017-06-16T16:19:14Z

I added a new metric there, save_to_string, that corresponds to tilelive-mapnik "encode" phase.

Taking into account that postgis queries are part of the total_rendering_time, we have all the information we wanted and the metrics almost fit to the millisecond.

Here another example:

$ curl -v -s -o /dev/null "http://cdb.localhost.lan:8181/api/v1/map/cdb@99ca6525@ee41f1f57f9852eea1910c4f84f4e5ac:1486641153877/1/11/524/761.png" 2>&1 | grep X-Tiler-Profiler
< X-Tiler-Profiler: {"req2params_setup":1,"setDBAuth":3,"res":38,"getTileOrGrid":38,"render-png":38,"render":26,"encode":11,"total":42}

$ curl -s localhost:8181/stats | jsonlint -p
{
  "stats": {
    "total_map_rendering": {
      "cpu_time": 9.753,
      "wall_time": 25.885820388793945
    },
    "save_to_string": {
      "cpu_time": 11.009,
      "wall_time": 11.010885238647461
    },
    "postgis_datasource::features_with_context::get_resultset": {
      "cpu_time": 1.195,
      "wall_time": 17.41814613342285
    }
  },
  "ok": true
}

I'll stop there until further evaluation.

rafatower · 2017-06-19T10:08:16Z

cc/ @javisantana

rafatower · 2017-06-20T08:22:03Z

Under review by upstream project maintainers: mapnik#3705

rafatower · 2017-06-23T13:26:27Z

The review above is stalled. I'll try a different approach: collect and pass the stats info back to the caller, when the caller wants to do so.

It seems like node-mapnik passes a "closure" to the mapnik renderer, which turns out to be a pretty open structure of attributes. I think that the stats collection could be triggered by the presence of an attribute, say named stats.

rafatower · 2017-07-07T16:48:59Z

Systemtap

I just wanted to add a quick note here about a possible different approach: use systemtap.

Basically with it we can instrument both kernel and user-space code and get metrics from there. I have very little experience with it, but I could easily make it run in my local setup:

I got the identifiers by getting symbols from the binaries (not stripped):

$ readelf -Ws libmapnik.so | grep feature_style_processor.*apply 
   467: 00000000001c7c00   306 FUNC    WEAK   DEFAULT   11 _ZN6mapnik23feature_style_processorINS_12agg_rendererINS_5imageINS_7rgba8_tEEENS_25label_collision_detector4EEEE5applyERKNS_5layerERSt3setISsSt4lessISsESaISsEEd
  2207: 00000000001c4a80  1087 FUNC    WEAK   DEFAULT   11 _ZN6mapnik23feature_style_processorINS_12agg_rendererINS_5imageINS_7rgba8_tEEENS_25label_collision_detector4EEEE5applyEd
  3128: 00000000001c40d0   411 FUNC    WEAK   DEFAULT   11 _ZN6mapnik23feature_style_processorINS_13grid_rendererINS_8hit_gridINS_9gray64s_tEEEEEE14apply_to_layerERKNS_5layerERS5_RKNS_10projectionEddjjRKNS_5box2dIdEEiRSt3setISsSt4lessISsESaISsEE
...

the script:

# file /tmp/test.stp
probe process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/mapnik/input/postgis.input").function("_ZNK18postgis_datasource21features_with_contextERKN6mapnik5queryESt10shared_ptrINS0_17IProcessorContextEE")
{
}

probe process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/mapnik/input/postgis.input").function("_ZNK18postgis_datasource17features_at_pointERKN6mapnik5coordIdLi2EEEd")
{
}


probe process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/libmapnik.so").function("_ZN6mapnik23feature_style_processor*apply*")
{
}

(I think it supports c++ symbol demangling so the script could be simplified)

then running it:

$ sudo stap -t /tmp/test.stp 
^C----- probe hit report: 
process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/mapnik/input/postgis.input").function("_ZNK18postgis_datasource21features_with_contextERKN6mapnik5queryESt10shared_ptrINS0_17IProcessorContextEE"), (/tmp/test.stp:1:1), hits: 54, cycles: 520min/1552avg/4405max, from: process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/mapnik/input/postgis.input").function("_ZNK18postgis_datasource21features_with_contextERKN6mapnik5queryESt10shared_ptrINS0_17IProcessorContextEE") from: process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/mapnik/input/postgis.input").function("_ZNK18postgis_datasource21features_with_contextERKN6mapnik5queryESt10shared_ptrINS0_17IProcessorContextEE"), index: 0
process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/libmapnik.so").function("_ZN6mapnik23feature_style_processorINS_13grid_rendererINS_8hit_gridINS_9gray64s_tEEEEEE5applyERKNS_5layerERSt3setISsSt4lessISsESaISsEEd"), (/tmp/test.stp:10:1), hits: 17, cycles: 1924min/5325avg/9615max, from: process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/libmapnik.so").function("_ZN6mapnik23feature_style_processorINS_13grid_rendererINS_8hit_gridINS_9gray64s_tEEEEEE5applyERKNS_5layerERSt3setISsSt4lessISsESaISsEEd") from: process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/libmapnik.so").function("_ZN6mapnik23feature_style_processor*apply*"), index: 9
process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/libmapnik.so").function("_ZN6mapnik23feature_style_processorINS_13grid_rendererINS_8hit_gridINS_9gray64s_tEEEEEE14apply_to_layerERKNS_5layerERS5_RKNS_10projectionEddjjRKNS_5box2dIdEEiRSt3setISsSt4lessISsESaISsEE"), (/tmp/test.stp:10:1), hits: 17, cycles: 438min/1295avg/4971max, from: process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/libmapnik.so").function("_ZN6mapnik23feature_style_processorINS_13grid_rendererINS_8hit_gridINS_9gray64s_tEEEEEE14apply_to_layerERKNS_5layerERS5_RKNS_10projectionEddjjRKNS_5box2dIdEEiRSt3setISsSt4lessISsESaISsEE") from: process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/libmapnik.so").function("_ZN6mapnik23feature_style_processor*apply*"), index: 10
process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/libmapnik.so").function("_ZN6mapnik23feature_style_processorINS_12agg_rendererINS_5imageINS_7rgba8_tEEENS_25label_collision_detector4EEEE5applyEd"), (/tmp/test.stp:10:1), hits: 37, cycles: 2189min/4295avg/10093max, from: process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/libmapnik.so").function("_ZN6mapnik23feature_style_processorINS_12agg_rendererINS_5imageINS_7rgba8_tEEENS_25label_collision_detector4EEEE5applyEd") from: process("/home/rtorre_unencrypted/src/Windshaft-cartodb/node_modules/mapnik/lib/binding/node-v48-linux-x64/libmapnik.so").function("_ZN6mapnik23feature_style_processor*apply*"), index: 11
----- refresh report:

As you can see in the examples it is a very powerful tool and would not require changes in the source code, as long as the binaries are not stripped (contain the function symbols), which is the case with our current mapnik version.

For this to work a kernel with the matching headers and debugging symbols is required AFAIK.

@pllopis if you have any experience with this tool, I'd appreciate a "crash course" :)

rafatower · 2017-08-09T14:02:07Z

Here's another example that I wrote while checking metrics for the MVT blog:

# file mapnik.stp
# based on https://sourceware.org/systemtap/examples/profiling/fntimes.stp

function time() { return gettimeofday_us() }

function print_time(t)
{
    printf("function %s took %d us\n", ppfunc(), t)
}

probe process("node_modules/mapnik/lib/binding/node-v48-linux-x64/mapnik/input/postgis.input").function("*get_resultset*").return
{
    print_time(time() - @entry(time()));
}

probe process("node_modules/mapnik/lib/binding/node-v48-linux-x64/mapnik.node").function("*update_tile*").return
{
    print_time(time() - @entry(time()));
}

and here's the output:

$ sudo stap mapnik.stp
function _ZNK18postgis_datasource13get_resultsetERSt10shared_ptrI10ConnectionERKSsRKS0_IN6mapnik4PoolIS1_17ConnectionCreatorEEES0_INS6_17IProcessorContextEE took 73232 us
function _ZN6mapnik16vector_tile_impl9processor11update_tileERNS0_4tileEdii took 152431 us

rafatower · 2017-12-20T14:56:34Z

Superseeded by mapnik#3767

rafatower added the enhancement label Apr 25, 2017

rafatower self-assigned this Apr 25, 2017

rafatower pushed a commit that referenced this issue May 25, 2017

First version of reworked progress_timer class #18

94c5134

rafatower pushed a commit that referenced this issue May 25, 2017

Encapsulate into a class timer_stats_ #18

d84ad7a

rafatower pushed a commit that referenced this issue May 25, 2017

Function to get metrics #18

d59662d

rafatower pushed a commit that referenced this issue May 25, 2017

Optimization relying on default constructor #18

880d108

rafatower pushed a commit that referenced this issue May 25, 2017

Use const& for keys #18

66d05a7

rafatower pushed a commit that referenced this issue May 25, 2017

Function to reset a counter #18

37b0e80

rafatower pushed a commit that referenced this issue May 25, 2017

Function to reset all counters #18

7a217b3

rafatower pushed a commit that referenced this issue May 25, 2017

Add iterators to ease traversal #18

8df9475

rafatower removed their assignment Jun 16, 2017

rafatower self-assigned this Jun 20, 2017

rochoa assigned Algunenano and unassigned rafatower Aug 21, 2017

rafatower closed this as completed Dec 20, 2017

rafatower unassigned Algunenano Dec 20, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rendering time metric #18

Rendering time metric #18

rafatower commented Apr 25, 2017

rafatower commented May 8, 2017

rafatower commented Jun 7, 2017

rafatower commented Jun 7, 2017 •

edited

rafatower commented Jun 8, 2017 •

edited

rafatower commented Jun 8, 2017 •

edited

rafatower commented Jun 8, 2017

rafatower commented Jun 13, 2017 •

edited

rafatower commented Jun 13, 2017

rafatower commented Jun 14, 2017

rafatower commented Jun 16, 2017

rafatower commented Jun 16, 2017

rafatower commented Jun 19, 2017

rafatower commented Jun 20, 2017

rafatower commented Jun 23, 2017

rafatower commented Jul 7, 2017 •

edited

rafatower commented Aug 9, 2017

rafatower commented Dec 20, 2017

Rendering time metric #18

Rendering time metric #18

Comments

rafatower commented Apr 25, 2017

rafatower commented May 8, 2017

A bit of history

Current implementation

rafatower commented Jun 7, 2017

Requirements

Functional requirements

Non-functional requirements

rafatower commented Jun 7, 2017 • edited

Proposed roadmap

Milestone 1: Coarse-grain DB vs rendering times (not per map yet)

Milestone 2: Finer-grain metrics (per map)

Milestone 3: Extended API and configurable storage strategy

rafatower commented Jun 8, 2017 • edited

Why this is important

rafatower commented Jun 8, 2017 • edited

rafatower commented Jun 8, 2017

rafatower commented Jun 13, 2017 • edited

rafatower commented Jun 13, 2017

rafatower commented Jun 14, 2017

rafatower commented Jun 16, 2017

rafatower commented Jun 16, 2017

rafatower commented Jun 19, 2017

rafatower commented Jun 20, 2017

rafatower commented Jun 23, 2017

rafatower commented Jul 7, 2017 • edited

Systemtap

rafatower commented Aug 9, 2017

rafatower commented Dec 20, 2017

rafatower commented Jun 7, 2017 •

edited

rafatower commented Jun 8, 2017 •

edited

rafatower commented Jun 8, 2017 •

edited

rafatower commented Jun 13, 2017 •

edited

rafatower commented Jul 7, 2017 •

edited