Trial tracemalloc as memory tracker (replacement v2 PR) #5946

pp-mo · 2024-05-09T11:37:33Z

WIP DO NOT MERGE THIS, EVER

Some modified code for memory benchmarks, evaluating "tracemalloc" against our existing Linux process-RSS measuring technique.

Additional context : sample output from a desktop run

$ asv run -b Memcheck* --quick -e -E existing:/tmp/persistent/newconda-envs/iris3/bin/python 
Couldn't load asv.plugins._mamba_helpers because
No module named 'libmambapy'
· Ignoring ASV setting(s): `python`. Benchmark environment management is delegated to third party script(s).
· Discovering benchmarks
· Running 4 total benchmarks (1 commits * 1 environments * 4 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_tmp_persistent_newconda-envs_iris3_bin_python
[ 0.00%] ··· Importing benchmark suite produced output:
[ 0.00%] ···· /net/home/h05/itpp/git/iris/iris_main/benchmarks/benchmarks/generate_data/__init__.py:52: UserWarning: No BENCHMARK_DATA env var, defaulting to /net/home/h05/itpp/git/iris/iris_main/benchmarks/.data. Note that some benchmark files are GB in size.
              warn(message)

[12.50%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc                                                                                                                                 ok
[12.50%] ··· ========= ========== ========= =========
             --                   nworkers           
             --------- ------------------------------
              nblocks      1          4         9    
             ========= ========== ========= =========
                 1      568±0ms    543±0ms   514±0ms 
                 4      144±0ms    181±0ms   151±0ms 
                 9      55.9±0ms   116±0ms   109±0ms 
             ========= ========== ========= =========

[25.00%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc                                                                                                                         ok
[25.00%] ··· ========= ======= ======= =======
             --                nworkers       
             --------- -----------------------
              nblocks     1       4       9   
             ========= ======= ======= =======
                 1      305.3   305.3   305.3 
                 4       19.2    76.4    76.4 
                 9       4.0     15.2    33.9 
             ========= ======= ======= =======

[37.50%] ··· memtrace_evaluation.MemcheckRunstyles.time_calc                                                                                                                                        ok
[37.50%] ··· ============= ================= ================= =================== ===================
             --                                          runtype / ysize                              
             ------------- ---------------------------------------------------------------------------
                measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000 
             ============= ================= ================= =================== ===================
              tracemalloc       91.2±0ms          228±0ms            40.2±0ms            82.4±0ms     
                  rss           87.0±0ms          222±0ms            45.5±0ms            81.5±0ms     
             ============= ================= ================= =================== ===================

[50.00%] ··· memtrace_evaluation.MemcheckRunstyles.track_addmem_calc                                                                                                                                ok
[50.00%] ··· ============= ================= ================= =================== ===================
             --                                          runtype / ysize                              
             ------------- ---------------------------------------------------------------------------
                measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000 
             ============= ================= ================= =================== ===================
              tracemalloc         17.0              67.9               0.2                 0.2        
                  rss             9.1               34.4               0.9                 0.9        
             ============= ================= ================= =================== ===================

pp-mo · 2024-05-09T11:37:57Z

What I think we can deduce from the benchmark results

neither "rss" or "tracemalloc" tracks memory in subprocesses
- compare MemcheckRunstyles.track_addmem_calc , any "threads" with "processes"
"rss" method is not picking up on memory used in multiple threads, but "tracemalloc" is
- compare MemcheckRunstyles.track_addmem_calc , "threads" results ..
- .. these are ~2 times as big for "tracemalloc" as for "rss" (this for nblocks=6, nworkers=4)

codecov · 2024-05-09T16:53:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.78%. Comparing base (a56f939) to head (84ff4ad).
Report is 9 commits behind head on main.

❗ Current head 84ff4ad differs from pull request most recent head 8f7166e

Please upload reports for the commit 8f7166e to get more accurate results.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #5946   +/-   ##
=======================================
  Coverage   89.78%   89.78%           
=======================================
  Files          93       93           
  Lines       23007    23007           
  Branches     5017     5017           
=======================================
  Hits        20657    20657           
  Misses       1620     1620           
  Partials      730      730

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pp-mo · 2024-05-09T17:12:30Z

Updated ideas

@trexfeathers pointed out that the above were run on a machine with only 4 CPUs (hyperthread), hence nworkers=4/9 can be expected to be questionable.
Also a bit difficult to interpret -- questioned whether the nworkers/nblocks axes were correct !

Hence changes in d039ad0

n_workers defaults to 3
n_workers = [1, 2, 3, 4] in BlocksAndWorkers testing
MemcheckBlocksAndWorkers_Rss repeats the n/n matrix with the Rss measure

New results

typical run ...

$ asv run -b Memcheck --quick -e -E existing:/tmp/persistent/newconda-envs/iris3/bin/python
---
    Couldn't load asv.plugins._mamba_helpers because
    No module named 'libmambapy'
    · Ignoring ASV setting(s): `python`. Benchmark environment management is delegated to third party script(s).
    · Discovering benchmarks
    · Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
    [ 0.00%] ·· Benchmarking existing-py_tmp_persistent_newconda-envs_iris3_bin_python
    [ 0.00%] ··· Importing benchmark suite produced output:
    [ 0.00%] ···· /net/home/h05/itpp/git/iris/iris_main/benchmarks/benchmarks/generate_data/__init__.py:52: UserWarning: No BENCHMARK_DATA env var, defaulting to /net/home/h05/itpp/git/iris/iris_main/benchmarks/.data. Note that some benchmark files are GB in size.
                  warn(message)

    [ 8.33%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc                                                                                                                                 ok
    [ 8.33%] ··· ========= ========== ========== ========= =========
                 --                         nworkers
                 --------- -----------------------------------------
                  nblocks      1          2          3         4
                 ========= ========== ========== ========= =========
                     1      546±0ms    508±0ms    518±0ms   517±0ms
                     4      120±0ms    156±0ms    151±0ms   139±0ms
                     9      54.6±0ms   71.2±0ms   141±0ms   122±0ms
                 ========= ========== ========== ========= =========

    [16.67%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc                                                                                                                         ok
    [16.67%] ··· ========= ======= ======= ======= =======
                 --                    nworkers
                 --------- -------------------------------
                  nblocks     1       2       3       4
                 ========= ======= ======= ======= =======
                     1      305.3   305.3   305.3   305.3
                     4       19.2    38.3    57.3    76.4
                     9       4.0     7.7     11.5    15.2
                 ========= ======= ======= ======= =======

    [25.00%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc                                                                                                                             ok
    [25.00%] ··· ========= ========== ========== ========== ==========
                 --                          nworkers
                 --------- -------------------------------------------
                  nblocks      1          2          3          4
                 ========= ========== ========== ========== ==========
                     1      501±0ms    518±0ms    570±0ms    597±0ms
                     4      121±0ms    150±0ms    132±0ms    161±0ms
                     9      52.4±0ms   67.5±0ms   64.3±0ms   64.0±0ms
                 ========= ========== ========== ========== ==========

    [33.33%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc                                                                                                                     ok
    [33.33%] ··· ========= ======= ======= ======= =======
                 --                    nworkers
                 --------- -------------------------------
                  nblocks     1       2       3       4
                 ========= ======= ======= ======= =======
                     1      305.8   305.8   305.8   305.8
                     4       19.8    38.9    20.0    20.8
                     9       4.6     8.0     11.6    15.3
                 ========= ======= ======= ======= =======

    [41.67%] ··· memtrace_evaluation.MemcheckRunstyles.time_calc                                                                                                                                        ok
    [41.67%] ··· ============= ================= ================= =================== ===================
                 --                                          runtype / ysize
                 ------------- ---------------------------------------------------------------------------
                    measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000
                 ============= ================= ================= =================== ===================
                  tracemalloc       97.5±0ms          208±0ms            39.9±0ms            94.1±0ms
                      rss           49.5±0ms          203±0ms            51.5±0ms            91.6±0ms
                 ============= ================= ================= =================== ===================

    [50.00%] ··· memtrace_evaluation.MemcheckRunstyles.track_addmem_calc                                                                                                                                ok
    [50.00%] ··· ============= ================= ================= =================== ===================
                 --                                          runtype / ysize
                 ------------- ---------------------------------------------------------------------------
                    measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000
                 ============= ================= ================= =================== ===================
                  tracemalloc         12.8              51.0               0.2                 0.2
                      rss             13.2              51.1               0.8                 0.8
                 ============= ================= ================= =================== ===================

Some interpretation

RunStyle checks are now more understandable.

MemcheckRunstyles.time_calc (run **WITHOUT** --quick)
      --                                          runtype / ysize
      ------------- ---------------------------------------------------------------------------
         measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000
      ============= ================= ================= =================== ===================
       tracemalloc       58.2±8ms          179±5ms            44.3±1ms            86.4±3ms
           rss           71.1±6ms          171±8ms            43.7±3ms            91.2±3ms
      ============= ================= ================= =================== ===================

BUT N.B. results are still not very stable from time to time

      ============= ================= ================= =================== ===================
       tracemalloc       62.9±5ms          172±10ms           41.8±4ms            89.1±6ms
           rss           62.3±7ms          173±9ms            49.1±4ms           91.3±0.9ms
      ============= ================= ================= =================== ===================
       tracemalloc       58.2±3ms          152±20ms           44.9±5ms            91.8±5ms
           rss          58.7±10ms          170±7ms            46.1±2ms            91.6±7ms
      ============= ================= ================= =================== ===================

((but generally, ARE about the same for both methods, as we would expect))
conclude :

~same for methods
processes slightly quicker than threads generally (??odd?)

MemcheckBlocksAndWorkers

General picture

memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc
     --                         nworkers
     --------- -----------------------------------------
      nblocks      1          2          3         4
     ========= ========== ========== ========= =========
         1      546±0ms    508±0ms    518±0ms   517±0ms
         4      120±0ms    156±0ms    151±0ms   139±0ms
         9      54.6±0ms   71.2±0ms   141±0ms   122±0ms
     ========= ========== ========== ========= =========

memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc
     --                    nworkers
     --------- -------------------------------
      nblocks     1       2       3       4
     ========= ======= ======= ======= =======
         1      305.3   305.3   305.3   305.3
         4       19.2    38.3    57.3    76.4
         9       4.0     7.7     11.5    15.2
     ========= ======= ======= ======= =======

MEMORY now makes more sense : can see memory reducing with nblocks, and increasing with nworkers
TIME is still weird -- always seems to take longer with more workers ??

compare methods

memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc
     --                    nworkers
     --------- -------------------------------
      nblocks     1       2       3       4
     ========= ======= ======= ======= =======
         1      305.8   305.8   305.8   305.8
         4       19.8    38.9    20.0    20.8
         9       4.6     8.0     11.6    15.3
     ========= ======= ======= ======= =======

conclude

results are mostly similar,
... except RSS tends to under-read ..
.. and EXCEPT for nblocks=4
* TYPICAL (see e.g. below)

/nworkers	1	2	3	4
tracemalloc	19.2	38.3	57.3	76.4
rss	19.8	38.9	20.0	20.8

POSSIBLY
- RSS shows re-use of blocks within the operation ?
- .. or RSS "misses" some pages ?
generally, it is NOT like RSS is only tracking 1 worker
- since, with many blocks, mem-usage generally grows with nworkers, just as for tracemalloc

pp-mo · 2024-05-09T17:15:00Z

Outcome

methods are ~equivalent in scope,
- both can measure across threads
- neither can measure across processes
but tracemalloc is probably more reliable
- rss results are less consistent on repeat
- rss seems to under-read in some cases

stephenworsley · 2024-05-15T08:57:12Z

benchmarks/benchmarks/memtrace_evaluation/__init__.py

+        nblocks = params["nblocks"]
+        nworkers = params["nworkers"]
+
+        nyfull = ysize // nblocks


It looks like you're applying this division twice, I think that's why adding blocks has such an exagerated effect on performance.

Oh, yes I see !
Hopefully that may explain the peculiar behaviour of the timings too.

I will fix this and re-investigate ...

84ff4ad fixes, I think

benchmarks/benchmarks/memtrace_evaluation/memory_exercising_task.py

pp-mo · 2024-05-16T14:33:48Z

Full results, from new code introduced in 84ff4ad
from $ asv run -b "Memcheck*" -e -E existing:<iris-test-python-path>/bin/python
also with export DATA_GEN_PYTHON=<iris-test-python-path>/bin/python
and all the numpy threading control settings given here in the Iris 'Dask Best Practice' section

$ asv run -b "Memcheck*" -e -E existing:/tmp/persistent/newconda-envs/iris3/bin/python
 . . .
· Discovering benchmarks
· Running 8 total benchmarks (1 commits * 1 environments * 8 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_tmp_persistent_newconda-envs_iris3_bin_python
[ 0.00%] ··· Importing benchmark suite produced output:
[ 0.00%] ···· /tmp/persistent/git/iris/benchmarks/benchmarks/generate_data/__init__.py:52: UserWarning: No BENCHMARK_DATA env var, defaulting to /tmp/persistent/git/iris/benchmarks/.data. Note that some benchmark files are GB in size.
              warn(message)

[ 6.25%] ··· Running (memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc--).
[18.75%] ··· Running (memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc--).
[31.25%] ··· Running (memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc--)..
[56.25%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc                                                                                                                       ok
[56.25%] ··· ========= ========== ========== ========= =========
             --                         nworkers                
             --------- -----------------------------------------
              nblocks      1          2          3         4    
             ========= ========== ========== ========= =========
                 1      531±5ms    538±8ms    532±7ms   531±6ms 
                 4      535±6ms    518±9ms    518±8ms   512±8ms 
                 9      539±10ms   514±10ms   515±7ms   520±7ms 
             ========= ========== ========== ========= =========

[62.50%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc                                                                                                               ok
[62.50%] ··· ========= ======= ======= ======= =======
             --                    nworkers           
             --------- -------------------------------
              nblocks     1       2       3       4   
             ========= ======= ======= ======= =======
                 1      305.3   305.3   305.3   305.3 
                 4       76.4   152.7   229.0   305.3 
                 9       34.1    68.0   101.9   135.8 
             ========= ======= ======= ======= =======

[68.75%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc                                                                                                                   ok
[68.75%] ··· ========= ========== ========== ========== ==========
             --                          nworkers                 
             --------- -------------------------------------------
              nblocks      1          2          3          4     
             ========= ========== ========== ========== ==========
                 1      547±20ms   545±20ms   563±20ms   555±20ms 
                 4      562±20ms   528±20ms   526±20ms   524±20ms 
                 9      561±20ms   528±20ms   529±20ms   516±9ms  
             ========= ========== ========== ========== ==========

[75.00%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc                                                                                                           ok
[75.00%] ··· ========= ======= ======= ======= =======
             --                    nworkers           
             --------- -------------------------------
              nblocks     1       2       3       4   
             ========= ======= ======= ======= =======
                 1      305.8   305.8   305.8   305.8 
                 4       77.0    83.6    82.3    82.3 
                 9       34.5    37.1    36.9    36.9 
             ========= ======= ======= ======= =======

[81.25%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc                                                                                                             ok
[81.25%] ··· ========= ========= ========= ========= =========
             --                        nworkers               
             --------- ---------------------------------------
              nblocks      1         2         3         4    
             ========= ========= ========= ========= =========
                 1      546±8ms   546±9ms   550±6ms   554±7ms 
                 4      544±5ms   283±2ms   287±3ms   178±4ms 
                 9      547±4ms   313±2ms   203±5ms   208±3ms 
             ========= ========= ========= ========= =========

[87.50%] ··· memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc                                                                                                     ok
[87.50%] ··· ========= ===== ===== ===== =====
             --                nworkers       
             --------- -----------------------
              nblocks    1     2     3     4  
             ========= ===== ===== ===== =====
                 1      0.1   0.1   0.1   0.1 
                 4      0.2   0.2   0.2   0.2 
                 9      0.3   0.3   0.3   0.3 
             ========= ===== ===== ===== =====

[93.75%] ··· memtrace_evaluation.MemcheckRunstyles.time_calc                                                                                                                              ok
[93.75%] ··· ============= ================= ================= =================== ===================
             --                                          runtype / ysize                              
             ------------- ---------------------------------------------------------------------------
                measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000 
             ============= ================= ================= =================== ===================
              tracemalloc       190±4ms          1.03±0.02s          117±2ms             389±4ms      
                  rss           190±4ms           1.04±0s            118±1ms             385±5ms      
             ============= ================= ================= =================== ===================

[100.00%] ··· memtrace_evaluation.MemcheckRunstyles.track_addmem_calc                                                                                                                      ok
[100.00%] ··· ============= ================= ================= =================== ===================
              --                                          runtype / ysize                              
              ------------- ---------------------------------------------------------------------------
                 measure     threads / 10000   threads / 40000   processes / 10000   processes / 40000 
              ============= ================= ================= =================== ===================
               tracemalloc         76.4             305.3               0.2                 0.2        
                   rss             76.6             109.3               0.8                 0.4        
              ============= ================= ================= =================== ===================

pp-mo · 2024-05-16T14:42:27Z

Conclusions:
Since we fixed the task-size scaling (and added a process-based BlocksAndWorkers), the results are now roughly :

Runstyle checks show that old + new memory schemes deliver roughly the same numbers
- both methods can track memory in thread workers, but not subprocesses
- (we find that tracemalloc delivers more stable answers -- but this was not re-retested)
threaded workers : space required reduces with nblocks + increases with nworkers, as you would expect
threaded workers : time taken is ~constant : this seems to indicate that due to Python GIL requirements, they don't actually run in parallel
parallel workers : time taken scales as expected (they do run in parallel, and task division gives speedups)

github-actions · 2024-05-16T15:07:28Z

⏱️ Performance Benchmark Report: `a3a20b5`

Performance shifts

Full benchmark results


Benchmarks that have stayed the same:

| Change   | Before [95b7ffe5]    | After [a3a20b56]    | Ratio   | Benchmark (Parameter)                                                                                |
|----------|----------------------|---------------------|---------|------------------------------------------------------------------------------------------------------|
|          | 1.12±0.01ms          | 1.12±0.02ms         | 1.00    | cube.CubeCreation.time_create(False, 'construct')                                                    |
|          | 408±5μs              | 409±7μs             | 1.00    | cube.CubeCreation.time_create(False, 'instantiate')                                                  |
|          | 949±10μs             | 960±10μs            | 1.01    | cube.CubeCreation.time_create(True, 'construct')                                                     |
|          | 583±10μs             | 580±9μs             | 0.99    | cube.CubeCreation.time_create(True, 'instantiate')                                                   |
|          | 223±4ms              | 219±4ms             | 0.98    | cube.CubeEquality.time_equality(False, False, 'all_equal')                                           |
|          | 111±2ms              | 111±1ms             | 1.00    | cube.CubeEquality.time_equality(False, False, 'coord_inequality')                                    |
|          | 230±3ms              | 231±2ms             | 1.01    | cube.CubeEquality.time_equality(False, False, 'data_inequality')                                     |
|          | 16.5±0.2μs           | 17.2±0.4μs          | 1.05    | cube.CubeEquality.time_equality(False, False, 'metadata_inequality')                                 |
|          | 306±2ms              | 298±3ms             | 0.97    | cube.CubeEquality.time_equality(False, True, 'all_equal')                                            |
|          | 195±2ms              | 195±2ms             | 1.00    | cube.CubeEquality.time_equality(False, True, 'coord_inequality')                                     |
|          | 316±3ms              | 316±3ms             | 1.00    | cube.CubeEquality.time_equality(False, True, 'data_inequality')                                      |
|          | 17.0±0.3μs           | 17.0±0.2μs          | 1.00    | cube.CubeEquality.time_equality(False, True, 'metadata_inequality')                                  |
|          | 220±3ms              | 222±2ms             | 1.01    | cube.CubeEquality.time_equality(True, False, 'all_equal')                                            |
|          | 111±0.9ms            | 111±2ms             | 1.00    | cube.CubeEquality.time_equality(True, False, 'coord_inequality')                                     |
|          | 232±5ms              | 232±3ms             | 1.00    | cube.CubeEquality.time_equality(True, False, 'data_inequality')                                      |
|          | 54.2±0.9μs           | 53.7±0.3μs          | 0.99    | cube.CubeEquality.time_equality(True, False, 'metadata_inequality')                                  |
|          | 305±4ms              | 305±5ms             | 1.00    | cube.CubeEquality.time_equality(True, True, 'all_equal')                                             |
|          | 195±3ms              | 196±2ms             | 1.01    | cube.CubeEquality.time_equality(True, True, 'coord_inequality')                                      |
|          | 315±3ms              | 314±3ms             | 1.00    | cube.CubeEquality.time_equality(True, True, 'data_inequality')                                       |
|          | 55.0±0.9μs           | 55.4±0.7μs          | 1.01    | cube.CubeEquality.time_equality(True, True, 'metadata_inequality')                                   |
|          | 401±3ns              | 416±3ns             | 1.04    | experimental.ugrid.regions_combine.CombineRegionsComputeRealData.time_compute_data(50)               |
|          | 257±3ms              | 260±3ms             | 1.01    | experimental.ugrid.regions_combine.CombineRegionsComputeRealData.time_compute_data(500)              |
|          | 14.5±0.08ms          | 14.4±0.2ms          | 0.99    | experimental.ugrid.regions_combine.CombineRegionsCreateCube.time_create_combined_cube(50)            |
|          | 16.4±0.3ms           | 16.2±0.1ms          | 0.98    | experimental.ugrid.regions_combine.CombineRegionsCreateCube.time_create_combined_cube(500)           |
|          | 5.0                  | 5.0                 | 1.00    | experimental.ugrid.regions_combine.CombineRegionsCreateCube.track_addedmem_create_combined_cube(50)  |
|          | 5.0                  | 5.0                 | 1.00    | experimental.ugrid.regions_combine.CombineRegionsCreateCube.track_addedmem_create_combined_cube(500) |
|          | 104±0.7ms            | 103±1ms             | 0.99    | experimental.ugrid.regions_combine.CombineRegionsFileStreamedCalc.time_stream_file2file(50)          |
|          | 704±7ms              | 703±1ms             | 1.00    | experimental.ugrid.regions_combine.CombineRegionsFileStreamedCalc.time_stream_file2file(500)         |
|          | 66.1±2ms             | 66.3±0.8ms          | 1.00    | experimental.ugrid.regions_combine.CombineRegionsSaveData.time_save(50)                              |
|          | 657±8ms              | 657±6ms             | 1.00    | experimental.ugrid.regions_combine.CombineRegionsSaveData.time_save(500)                             |
|          | 2.1752849999999997   | 2.1752849999999997  | 1.00    | experimental.ugrid.regions_combine.CombineRegionsSaveData.track_filesize_saved(50)                   |
|          | 216.01528499999998   | 216.01528499999998  | 1.00    | experimental.ugrid.regions_combine.CombineRegionsSaveData.track_filesize_saved(500)                  |
|          | 660±4μs              | 660±9μs             | 1.00    | import_iris.Iris.time__concatenate                                                                   |
|          | 179±1μs              | 184±4μs             | 1.03    | import_iris.Iris.time__constraints                                                                   |
|          | 110±0.7μs            | 110±0.9μs           | 1.00    | import_iris.Iris.time__data_manager                                                                  |
|          | 94.3±0.5μs           | 94.0±0.8μs          | 1.00    | import_iris.Iris.time__deprecation                                                                   |
|          | 137±2μs              | 136±0.5μs           | 0.99    | import_iris.Iris.time__lazy_data                                                                     |
|          | 900±5μs              | 899±10μs            | 1.00    | import_iris.Iris.time__merge                                                                         |
|          | 78.4±0.6μs           | 78.3±0.6μs          | 1.00    | import_iris.Iris.time__representation                                                                |
|          | 491±3μs              | 487±2μs             | 0.99    | import_iris.Iris.time_analysis                                                                       |
|          | 142±2μs              | 144±2μs             | 1.01    | import_iris.Iris.time_analysis__area_weighted                                                        |
|          | 110±3μs              | 111±0.6μs           | 1.01    | import_iris.Iris.time_analysis__grid_angles                                                          |
|          | 243±2μs              | 245±1μs             | 1.01    | import_iris.Iris.time_analysis__interpolation                                                        |
|          | 185±2μs              | 189±2μs             | 1.02    | import_iris.Iris.time_analysis__regrid                                                               |
|          | 112±1μs              | 111±0.9μs           | 0.99    | import_iris.Iris.time_analysis__scipy_interpolate                                                    |
|          | 140±1μs              | 140±1μs             | 1.00    | import_iris.Iris.time_analysis_calculus                                                              |
|          | 331±4μs              | 331±2μs             | 1.00    | import_iris.Iris.time_analysis_cartography                                                           |
|          | 94.9±0.3μs           | 95.1±0.5μs          | 1.00    | import_iris.Iris.time_analysis_geomerty                                                              |
|          | 221±3μs              | 222±3μs             | 1.00    | import_iris.Iris.time_analysis_maths                                                                 |
|          | 98.3±0.4μs           | 98.5±1μs            | 1.00    | import_iris.Iris.time_analysis_stats                                                                 |
|          | 176±2μs              | 177±1μs             | 1.00    | import_iris.Iris.time_analysis_trajectory                                                            |
|          | 307±6μs              | 300±3μs             | 0.98    | import_iris.Iris.time_aux_factory                                                                    |
|          | 85.1±1μs             | 85.4±1μs            | 1.00    | import_iris.Iris.time_common                                                                         |
|          | 162±2μs              | 165±4μs             | 1.02    | import_iris.Iris.time_common_lenient                                                                 |
|          | 987±10μs             | 986±9μs             | 1.00    | import_iris.Iris.time_common_metadata                                                                |
|          | 133±1μs              | 133±1μs             | 1.00    | import_iris.Iris.time_common_mixin                                                                   |
|          | 1.17±0.01ms          | 1.18±0.01ms         | 1.00    | import_iris.Iris.time_common_resolve                                                                 |
|          | 201±2μs              | 202±2μs             | 1.01    | import_iris.Iris.time_config                                                                         |
|          | 116±0.9μs            | 116±0.7μs           | 1.00    | import_iris.Iris.time_coord_categorisation                                                           |
|          | 355±5μs              | 352±4μs             | 0.99    | import_iris.Iris.time_coord_systems                                                                  |
|          | 737±7μs              | 733±5μs             | 0.99    | import_iris.Iris.time_coords                                                                         |
|          | 658±10μs             | 656±8μs             | 1.00    | import_iris.Iris.time_cube                                                                           |
|          | 225±1μs              | 226±3μs             | 1.01    | import_iris.Iris.time_exceptions                                                                     |
|          | 78.8±0.5μs           | 77.8±1μs            | 0.99    | import_iris.Iris.time_experimental                                                                   |
|          | 189±1μs              | 189±1μs             | 1.00    | import_iris.Iris.time_fileformats                                                                    |
|          | 251±2μs              | 249±2μs             | 0.99    | import_iris.Iris.time_fileformats__ff                                                                |
|          | 2.71±0.02ms          | 2.69±0.02ms         | 0.99    | import_iris.Iris.time_fileformats__ff_cross_references                                               |
|          | 79.1±1μs             | 79.7±1μs            | 1.01    | import_iris.Iris.time_fileformats__pp_lbproc_pairs                                                   |
|          | 114±1μs              | 115±0.9μs           | 1.00    | import_iris.Iris.time_fileformats_abf                                                                |
|          | 348±4μs              | 349±2μs             | 1.00    | import_iris.Iris.time_fileformats_cf                                                                 |
|          | 5.31±0.06ms          | 5.40±0.06ms         | 1.02    | import_iris.Iris.time_fileformats_dot                                                                |
|          | 75.2±0.9μs           | 75.6±2μs            | 1.01    | import_iris.Iris.time_fileformats_name                                                               |
|          | 260±2μs              | 261±2μs             | 1.01    | import_iris.Iris.time_fileformats_name_loaders                                                       |
|          | 120±2μs              | 120±1μs             | 1.00    | import_iris.Iris.time_fileformats_netcdf                                                             |
|          | 121±1μs              | 122±1μs             | 1.01    | import_iris.Iris.time_fileformats_nimrod                                                             |
|          | 208±5μs              | 208±4μs             | 1.00    | import_iris.Iris.time_fileformats_nimrod_load_rules                                                  |
|          | 769±7μs              | 765±7μs             | 1.00    | import_iris.Iris.time_fileformats_pp                                                                 |
|          | 181±4μs              | 180±4μs             | 1.00    | import_iris.Iris.time_fileformats_pp_load_rules                                                      |
|          | 134±2μs              | 134±1μs             | 1.00    | import_iris.Iris.time_fileformats_pp_save_rules                                                      |
|          | 520±7μs              | 513±2μs             | 0.99    | import_iris.Iris.time_fileformats_rules                                                              |
|          | 221±2μs              | 219±1μs             | 0.99    | import_iris.Iris.time_fileformats_structured_array_identification                                    |
|          | 84.4±0.7μs           | 84.2±0.4μs          | 1.00    | import_iris.Iris.time_fileformats_um                                                                 |
|          | 161±0.7μs            | 161±2μs             | 1.00    | import_iris.Iris.time_fileformats_um__fast_load                                                      |
|          | 136±1μs              | 137±1μs             | 1.01    | import_iris.Iris.time_fileformats_um__fast_load_structured_fields                                    |
|          | 76.7±0.5μs           | 77.6±0.8μs          | 1.01    | import_iris.Iris.time_fileformats_um__ff_replacement                                                 |
|          | 82.7±1μs             | 82.4±2μs            | 1.00    | import_iris.Iris.time_fileformats_um__optimal_array_structuring                                      |
|          | 986±10μs             | 987±7μs             | 1.00    | import_iris.Iris.time_fileformats_um_cf_map                                                          |
|          | 137±2μs              | 137±0.9μs           | 1.00    | import_iris.Iris.time_io                                                                             |
|          | 171±2μs              | 172±2μs             | 1.01    | import_iris.Iris.time_io_format_picker                                                               |
|          | 203±3μs              | 209±2μs             | 1.03    | import_iris.Iris.time_iris                                                                           |
|          | 129±2μs              | 129±1μs             | 1.01    | import_iris.Iris.time_iterate                                                                        |
|          | 8.52±0.1ms           | 8.55±0.07ms         | 1.00    | import_iris.Iris.time_palette                                                                        |
|          | 339±6μs              | 342±2μs             | 1.01    | import_iris.Iris.time_plot                                                                           |
|          | 106±1μs              | 104±1μs             | 0.99    | import_iris.Iris.time_quickplot                                                                      |
|          | 2.07±0.01ms          | 2.08±0.01ms         | 1.00    | import_iris.Iris.time_std_names                                                                      |
|          | 1.78±0.01ms          | 1.77±0.01ms         | 0.99    | import_iris.Iris.time_symbols                                                                        |
|          | 22.6±1ms             | 23.1±1ms            | 1.03    | import_iris.Iris.time_tests                                                                          |
|          | 230±4μs              | 228±4μs             | 0.99    | import_iris.Iris.time_third_party_cartopy                                                            |
|          | 4.83±0.04ms          | 4.83±0.02ms         | 1.00    | import_iris.Iris.time_third_party_cf_units                                                           |
|          | 108±1μs              | 107±0.4μs           | 0.99    | import_iris.Iris.time_third_party_cftime                                                             |
|          | 2.82±0.01ms          | 2.81±0.01ms         | 1.00    | import_iris.Iris.time_third_party_matplotlib                                                         |
|          | 1.06±0ms             | 1.06±0.01ms         | 1.00    | import_iris.Iris.time_third_party_numpy                                                              |
|          | 159±2μs              | 158±2μs             | 1.00    | import_iris.Iris.time_third_party_scipy                                                              |
|          | 100±0.6μs            | 101±2μs             | 1.00    | import_iris.Iris.time_time                                                                           |
|          | 319±2μs              | 317±3μs             | 0.99    | import_iris.Iris.time_util                                                                           |
|          | 75.4±2μs             | 73.4±1μs            | 0.97    | iterate.IZip.time_izip                                                                               |
|          | 8.03±0.2ms           | 8.04±0.06ms         | 1.00    | load.LoadAndRealise.time_load((1280, 960, 5), False, 'FF')                                           |
|          | 23.8±0.6ms           | 23.3±0.6ms          | 0.98    | load.LoadAndRealise.time_load((1280, 960, 5), False, 'NetCDF')                                       |
|          | 9.08±0.1ms           | 8.79±0.2ms          | 0.97    | load.LoadAndRealise.time_load((1280, 960, 5), False, 'PP')                                           |
|          | 8.06±0.1ms           | 8.10±0.04ms         | 1.00    | load.LoadAndRealise.time_load((1280, 960, 5), True, 'FF')                                            |
|          | 20.9±0.7ms           | 20.8±0.2ms          | 1.00    | load.LoadAndRealise.time_load((1280, 960, 5), True, 'NetCDF')                                        |
|          | 8.84±0.2ms           | 8.84±0.06ms         | 1.00    | load.LoadAndRealise.time_load((1280, 960, 5), True, 'PP')                                            |
|          | 1.37±0.01s           | 1.34±0.01s          | 0.98    | load.LoadAndRealise.time_load((2, 2, 1000), False, 'FF')                                             |
|          | 20.5±0.2ms           | 20.7±0.09ms         | 1.01    | load.LoadAndRealise.time_load((2, 2, 1000), False, 'NetCDF')                                         |
|          | 1.49±0.02s           | 1.51±0.02s          | 1.01    | load.LoadAndRealise.time_load((2, 2, 1000), False, 'PP')                                             |
|          | 1.35±0.01s           | 1.35±0.01s          | 1.00    | load.LoadAndRealise.time_load((2, 2, 1000), True, 'FF')                                              |
|          | 20.8±0.5ms           | 20.6±0.1ms          | 0.99    | load.LoadAndRealise.time_load((2, 2, 1000), True, 'NetCDF')                                          |
|          | 1.50±0.02s           | 1.49±0.01s          | 1.00    | load.LoadAndRealise.time_load((2, 2, 1000), True, 'PP')                                              |
|          | 3.91±0.03ms          | 3.91±0.03ms         | 1.00    | load.LoadAndRealise.time_load((50, 50, 2), False, 'FF')                                              |
|          | 20.0±0.4ms           | 19.6±0.2ms          | 0.98    | load.LoadAndRealise.time_load((50, 50, 2), False, 'NetCDF')                                          |
|          | 4.16±0.02ms          | 4.17±0.01ms         | 1.00    | load.LoadAndRealise.time_load((50, 50, 2), False, 'PP')                                              |
|          | 3.87±0.02ms          | 3.91±0.04ms         | 1.01    | load.LoadAndRealise.time_load((50, 50, 2), True, 'FF')                                               |
|          | 20.3±0.4ms           | 19.5±0.2ms          | 0.96    | load.LoadAndRealise.time_load((50, 50, 2), True, 'NetCDF')                                           |
|          | 4.16±0.02ms          | 4.20±0.03ms         | 1.01    | load.LoadAndRealise.time_load((50, 50, 2), True, 'PP')                                               |
|          | 32.7±2ms             | 32.2±3ms            | 0.98    | load.LoadAndRealise.time_realise((1280, 960, 5), False, 'FF')                                        |
|          | 19.5±0.5ms           | 19.2±0.2ms          | 0.98    | load.LoadAndRealise.time_realise((1280, 960, 5), False, 'NetCDF')                                    |
|          | 14.1±2ms             | 14.0±2ms            | 0.99    | load.LoadAndRealise.time_realise((1280, 960, 5), False, 'PP')                                        |
|          | 25.7±2ms             | 25.4±1ms            | 0.99    | load.LoadAndRealise.time_realise((1280, 960, 5), True, 'FF')                                         |
|          | 70.6±2ms             | 70.7±2ms            | 1.00    | load.LoadAndRealise.time_realise((1280, 960, 5), True, 'NetCDF')                                     |
|          | 25.7±1ms             | 25.3±2ms            | 0.98    | load.LoadAndRealise.time_realise((1280, 960, 5), True, 'PP')                                         |
|          | 437±3ms              | 435±2ms             | 0.99    | load.LoadAndRealise.time_realise((2, 2, 1000), False, 'FF')                                          |
|          | 2.75±0.1ms           | 2.78±0.1ms          | 1.01    | load.LoadAndRealise.time_realise((2, 2, 1000), False, 'NetCDF')                                      |
|          | 444±3ms              | 440±3ms             | 0.99    | load.LoadAndRealise.time_realise((2, 2, 1000), False, 'PP')                                          |
|          | 442±5ms              | 442±3ms             | 1.00    | load.LoadAndRealise.time_realise((2, 2, 1000), True, 'FF')                                           |
|          | 2.82±0.07ms          | 2.73±0.07ms         | 0.97    | load.LoadAndRealise.time_realise((2, 2, 1000), True, 'NetCDF')                                       |
|          | 449±4ms              | 447±4ms             | 1.00    | load.LoadAndRealise.time_realise((2, 2, 1000), True, 'PP')                                           |
|          | 1.55±0.07ms          | 1.56±0.08ms         | 1.01    | load.LoadAndRealise.time_realise((50, 50, 2), False, 'FF')                                           |
|          | 2.82±0.09ms          | 2.76±0.05ms         | 0.98    | load.LoadAndRealise.time_realise((50, 50, 2), False, 'NetCDF')                                       |
|          | 1.56±0.1ms           | 1.59±0.09ms         | 1.02    | load.LoadAndRealise.time_realise((50, 50, 2), False, 'PP')                                           |
|          | 1.62±0.08ms          | 1.58±0.06ms         | 0.98    | load.LoadAndRealise.time_realise((50, 50, 2), True, 'FF')                                            |
|          | 2.88±0.08ms          | 2.84±0.07ms         | 0.99    | load.LoadAndRealise.time_realise((50, 50, 2), True, 'NetCDF')                                        |
|          | 1.59±0.06ms          | 1.56±0.07ms         | 0.98    | load.LoadAndRealise.time_realise((50, 50, 2), True, 'PP')                                            |
|          | 357±3ms              | 357±6ms             | 1.00    | load.ManyVars.time_many_var_load                                                                     |
|          | 8.12±0.02ms          | 8.12±0.04ms         | 1.00    | load.STASHConstraint.time_stash_constraint((1280, 960, 5), 'FF')                                     |
|          | 8.95±0.08ms          | 9.00±0.07ms         | 1.01    | load.STASHConstraint.time_stash_constraint((1280, 960, 5), 'PP')                                     |
|          | 1.35±0.01s           | 1.36±0.01s          | 1.01    | load.STASHConstraint.time_stash_constraint((2, 2, 1000), 'FF')                                       |
|          | 1.52±0.02s           | 1.52±0.01s          | 1.00    | load.STASHConstraint.time_stash_constraint((2, 2, 1000), 'PP')                                       |
|          | 3.95±0.02ms          | 3.98±0.02ms         | 1.01    | load.STASHConstraint.time_stash_constraint((2, 2, 2), 'FF')                                          |
|          | 4.23±0.03ms          | 4.23±0.02ms         | 1.00    | load.STASHConstraint.time_stash_constraint((2, 2, 2), 'PP')                                          |
|          | 8.06±0.03ms          | 8.08±0.06ms         | 1.00    | load.StructuredFF.time_structured_load((1280, 960, 5), False)                                        |
|          | 4.74±0.03ms          | 4.73±0.05ms         | 1.00    | load.StructuredFF.time_structured_load((1280, 960, 5), True)                                         |
|          | 1.34±0.02s           | 1.32±0.01s          | 0.98    | load.StructuredFF.time_structured_load((2, 2, 1000), False)                                          |
|          | 372±6ms              | 368±2ms             | 0.99    | load.StructuredFF.time_structured_load((2, 2, 1000), True)                                           |
|          | 3.92±0.01ms          | 3.90±0.03ms         | 0.99    | load.StructuredFF.time_structured_load((2, 2, 2), False)                                             |
|          | 3.53±0.02ms          | 3.54±0.03ms         | 1.00    | load.StructuredFF.time_structured_load((2, 2, 2), True)                                              |
|          | 148±0.6ms            | 148±0.5ms           | 1.00    | load.TimeConstraint.time_time_constraint(20, 'FF')                                                   |
|          | 23.1±0.3ms           | 22.9±0.2ms          | 0.99    | load.TimeConstraint.time_time_constraint(20, 'NetCDF')                                               |
|          | 166±3ms              | 162±0.9ms           | 0.98    | load.TimeConstraint.time_time_constraint(20, 'PP')                                                   |
|          | 29.5±0.3ms           | 29.3±0.2ms          | 0.99    | load.TimeConstraint.time_time_constraint(3, 'FF')                                                    |
|          | 22.9±0.3ms           | 22.9±0.1ms          | 1.00    | load.TimeConstraint.time_time_constraint(3, 'NetCDF')                                                |
|          | 31.5±0.1ms           | 31.5±0.5ms          | 1.00    | load.TimeConstraint.time_time_constraint(3, 'PP')                                                    |
|          | 17.2±0.2ms           | 17.2±0.4ms          | 1.00    | load.ugrid.BasicLoading.time_load_file(1)                                                            |
|          | 40.9±0.3ms           | 41.0±0.6ms          | 1.00    | load.ugrid.BasicLoading.time_load_file(200000)                                                       |
|          | 14.0±0.1ms           | 14.1±0.4ms          | 1.00    | load.ugrid.BasicLoading.time_load_mesh(1)                                                            |
|          | 21.5±0.3ms           | 21.8±0.3ms          | 1.02    | load.ugrid.BasicLoading.time_load_mesh(200000)                                                       |
|          | 17.3±0.3ms           | 17.2±0.2ms          | 0.99    | load.ugrid.BasicLoadingTime.time_load_file(1)                                                        |
|          | 19.9±0.4ms           | 20.2±0.6ms          | 1.01    | load.ugrid.BasicLoadingTime.time_load_file(200000)                                                   |
|          | 14.1±0.2ms           | 14.0±0.2ms          | 0.99    | load.ugrid.BasicLoadingTime.time_load_mesh(1)                                                        |
|          | 16.7±0.3ms           | 16.6±0.2ms          | 0.99    | load.ugrid.BasicLoadingTime.time_load_mesh(200000)                                                   |
|          | 18.3±0.2ms           | 18.6±0.4ms          | 1.02    | load.ugrid.Callback.time_load_file_callback(1)                                                       |
|          | 50.2±0.7ms           | 50.5±0.4ms          | 1.01    | load.ugrid.Callback.time_load_file_callback(200000)                                                  |
|          | 18.2±0.3ms           | 18.1±0.2ms          | 0.99    | load.ugrid.CallbackTime.time_load_file_callback(1)                                                   |
|          | 21.9±0.8ms           | 21.9±0.4ms          | 1.00    | load.ugrid.CallbackTime.time_load_file_callback(200000)                                              |
|          | 2.75±0.2ms           | 2.73±0.1ms          | 0.99    | load.ugrid.DataRealisation.time_realise_data(10000)                                                  |
|          | 5.36±0.1ms           | 3.62±1ms            | ~0.67   | load.ugrid.DataRealisation.time_realise_data(200000)                                                 |
|          | 39.3±4ms             | 39.5±2ms            | 1.01    | load.ugrid.DataRealisationTime.time_realise_data(10000)                                              |
|          | 827±10ms             | 821±8ms             | 0.99    | load.ugrid.DataRealisationTime.time_realise_data(200000)                                             |
|          | 276±0.8ms            | 276±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(1, 1)                                         |
|          | 276±3ms              | 275±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(1, 2)                                         |
|          | 277±1ms              | 278±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(1, 3)                                         |
|          | 277±1ms              | 278±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(1, 4)                                         |
|          | 274±2ms              | 274±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(4, 1)                                         |
|          | 267±0.7ms            | 266±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(4, 2)                                         |
|          | 269±1ms              | 268±0.6ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(4, 3)                                         |
|          | 270±0.8ms            | 270±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(4, 4)                                         |
|          | 280±2ms              | 279±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(9, 1)                                         |
|          | 271±1ms              | 269±0.7ms           | 0.99    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(9, 2)                                         |
|          | 272±2ms              | 273±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(9, 3)                                         |
|          | 273±2ms              | 272±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.time_calc(9, 4)                                         |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(1, 1)                                 |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(1, 2)                                 |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(1, 3)                                 |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(1, 4)                                 |
|          | 76.4                 | 76.4                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(4, 1)                                 |
|          | 152.7                | 152.7               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(4, 2)                                 |
|          | 229.0                | 229.0               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(4, 3)                                 |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(4, 4)                                 |
|          | 34.1                 | 34.1                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(9, 1)                                 |
|          | 68.0                 | 68.0                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(9, 2)                                 |
|          | 101.9                | 101.9               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(9, 3)                                 |
|          | 135.8                | 135.8               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers.track_addmem_calc(9, 4)                                 |
|          | 275±1ms              | 276±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(1, 1)                                     |
|          | 276±1ms              | 276±0.8ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(1, 2)                                     |
|          | 276±1ms              | 277±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(1, 3)                                     |
|          | 276±1ms              | 278±2ms             | 1.01    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(1, 4)                                     |
|          | 275±2ms              | 276±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(4, 1)                                     |
|          | 267±2ms              | 266±0.5ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(4, 2)                                     |
|          | 268±1ms              | 269±0.9ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(4, 3)                                     |
|          | 271±1ms              | 270±0.7ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(4, 4)                                     |
|          | 277±1ms              | 276±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(9, 1)                                     |
|          | 271±2ms              | 270±0.5ms           | 0.99    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(9, 2)                                     |
|          | 272±1ms              | 273±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(9, 3)                                     |
|          | 272±1ms              | 272±0.8ms           | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.time_calc(9, 4)                                     |
|          | 307.9                | 307.9               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(1, 1)                             |
|          | 307.9                | 307.9               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(1, 2)                             |
|          | 307.9                | 307.8               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(1, 3)                             |
|          | 307.9                | 307.9               | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(1, 4)                             |
|          | 79.1                 | 79.1                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(4, 1)                             |
|          | 83.4                 | 83.4                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(4, 2)                             |
|          | 83.5                 | 83.5                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(4, 3)                             |
|          | 83.4                 | 83.4                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(4, 4)                             |
|          | 36.7                 | 36.8                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(9, 1)                             |
|          | 38.0                 | 37.8                | 0.99    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(9, 2)                             |
|          | 38.0                 | 37.9                | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(9, 3)                             |
|          | 38.2                 | 37.8                | 0.99    | memtrace_evaluation.MemcheckBlocksAndWorkers_Rss.track_addmem_calc(9, 4)                             |
|          | 286±1ms              | 286±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(1, 1)                               |
|          | 292±1ms              | 291±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(1, 2)                               |
|          | 297±1ms              | 297±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(1, 3)                               |
|          | 302±2ms              | 303±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(1, 4)                               |
|          | 286±2ms              | 286±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(4, 1)                               |
|          | 158±1ms              | 161±7ms             | 1.02    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(4, 2)                               |
|          | 162±1ms              | 162±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(4, 3)                               |
|          | 157±2ms              | 157±2ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(4, 4)                               |
|          | 289±2ms              | 290±3ms             | 1.01    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(9, 1)                               |
|          | 175±2ms              | 177±7ms             | 1.01    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(9, 2)                               |
|          | 171±0.9ms            | 171±1ms             | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(9, 3)                               |
|          | 176±2ms              | 179±2ms             | 1.02    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.time_calc(9, 4)                               |
|          | 0.1                  | 0.1                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(1, 1)                       |
|          | 0.1                  | 0.1                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(1, 2)                       |
|          | 0.1                  | 0.1                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(1, 3)                       |
|          | 0.1                  | 0.1                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(1, 4)                       |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(4, 1)                       |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(4, 2)                       |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(4, 3)                       |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(4, 4)                       |
|          | 0.3                  | 0.3                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(9, 1)                       |
|          | 0.3                  | 0.3                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(9, 2)                       |
|          | 0.3                  | 0.3                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(9, 3)                       |
|          | 0.3                  | 0.3                 | 1.00    | memtrace_evaluation.MemcheckBlocksAndWorkers_processes.track_addmem_calc(9, 4)                       |
|          | 107±5ms              | 109±3ms             | 1.01    | memtrace_evaluation.MemcheckRunstyles.time_calc('rss', 'processes', 10000)                           |
|          | 308±5ms              | 311±6ms             | 1.01    | memtrace_evaluation.MemcheckRunstyles.time_calc('rss', 'processes', 40000)                           |
|          | 121±1ms              | 121±0.4ms           | 1.00    | memtrace_evaluation.MemcheckRunstyles.time_calc('rss', 'threads', 10000)                             |
|          | 537±1ms              | 536±0.9ms           | 1.00    | memtrace_evaluation.MemcheckRunstyles.time_calc('rss', 'threads', 40000)                             |
|          | 108±4ms              | 108±4ms             | 1.00    | memtrace_evaluation.MemcheckRunstyles.time_calc('tracemalloc', 'processes', 10000)                   |
|          | 309±3ms              | 310±4ms             | 1.01    | memtrace_evaluation.MemcheckRunstyles.time_calc('tracemalloc', 'processes', 40000)                   |
|          | 121±0.7ms            | 121±0.5ms           | 1.00    | memtrace_evaluation.MemcheckRunstyles.time_calc('tracemalloc', 'threads', 10000)                     |
|          | 537±2ms              | 538±2ms             | 1.00    | memtrace_evaluation.MemcheckRunstyles.time_calc('tracemalloc', 'threads', 40000)                     |
|          | 2.5                  | 2.5                 | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('rss', 'processes', 10000)                   |
|          | 2.6                  | 2.5                 | 0.96    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('rss', 'processes', 40000)                   |
|          | 79.0                 | 79.0                | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('rss', 'threads', 10000)                     |
|          | 110.3                | 110.3               | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('rss', 'threads', 40000)                     |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('tracemalloc', 'processes', 10000)           |
|          | 0.2                  | 0.2                 | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('tracemalloc', 'processes', 40000)           |
|          | 76.4                 | 76.4                | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('tracemalloc', 'threads', 10000)             |
|          | 305.3                | 305.3               | 1.00    | memtrace_evaluation.MemcheckRunstyles.track_addmem_calc('tracemalloc', 'threads', 40000)             |
|          | 183±2ms              | 185±2ms             | 1.01    | merge_concat.Concatenate.time_concatenate                                                            |
|          | 45.4±0.5ms           | 45.6±1ms            | 1.00    | merge_concat.Merge.time_merge                                                                        |
|          | 6.67±0.08ms          | 6.53±0.03ms         | 0.98    | plot.AuxSort.time_aux_sort                                                                           |
|          | 76.1±3ms             | 79.0±2ms            | 1.04    | regridding.CurvilinearRegridding.time_regrid_pic                                                     |
|          | 98.1±0.5ms           | 97.5±0.6ms          | 0.99    | regridding.HorizontalChunkedRegridding.time_regrid_area_w                                            |
|          | 48.6±2ms             | 49.0±2ms            | 1.01    | regridding.HorizontalChunkedRegridding.time_regrid_area_w_new_grid                                   |
|          | 4.03±0.05ms          | 4.04±0.03ms         | 1.00    | save.NetcdfSave.time_netcdf_save_cube(50, False)                                                     |
|          | 70.9±0.3ms           | 70.6±0.4ms          | 0.99    | save.NetcdfSave.time_netcdf_save_cube(50, True)                                                      |
|          | 52.5±2ms             | 52.6±1ms            | 1.00    | save.NetcdfSave.time_netcdf_save_cube(600, False)                                                    |
|          | 568±6ms              | 567±4ms             | 1.00    | save.NetcdfSave.time_netcdf_save_cube(600, True)                                                     |
|          | 90.3±0.5ns           | 91.4±0.7ns          | 1.01    | save.NetcdfSave.time_netcdf_save_mesh(50, False)                                                     |
|          | 54.6±0.8ms           | 54.1±0.4ms          | 0.99    | save.NetcdfSave.time_netcdf_save_mesh(50, True)                                                      |
|          | 90.1±0.8ns           | 90.9±2ns            | 1.01    | save.NetcdfSave.time_netcdf_save_mesh(600, False)                                                    |
|          | 498±4ms              | 501±2ms             | 1.01    | save.NetcdfSave.time_netcdf_save_mesh(600, True)                                                     |
|          | 42.0±0.7ms           | 42.2±1ms            | 1.00    | stats.PearsonR.time_lazy                                                                             |
|          | 19.1±0.3ms           | 18.9±0.2ms          | 0.99    | stats.PearsonR.time_real                                                                             |
|          | 23.0±0.8ms           | 23.0±0.7ms          | 1.00    | trajectory.TrajectoryInterpolation.time_trajectory_linear                                            |
|          | 57.4±0.5ms           | 57.8±0.6ms          | 1.01    | trajectory.TrajectoryInterpolation.time_trajectory_nearest                                           |

Generated by GHA run 9113925886

pp-mo · 2024-05-16T16:51:32Z

Note: the results reported in the whole-benchmark runs behave a little differently.

the parameter tables are laid out differently in some cases, which is odd but maybe due to some additional config in the GHA action ?
the memory scaling for threaded workers is different -- I suspect the number of available CPUs is different

pp-mo · 2024-05-20T23:37:43Z

Added
Investigate repeated ops with simulated memory 'leak'.

Just for reference.
A sample output ...

$ python bm_runner.py custom run -- -b "MemcheckTaskRepeats" -e --quick -E existing:/tmp/persistent/newconda-envs/iris3/bin/python
 . . .
    BM_RUNNER DEBUG: asv run -b MemcheckTaskRepeats -e --quick -E existing:/tmp/persistent/newconda-envs/iris3/bin/python
 . . .
· Ignoring ASV setting(s): `python`. Benchmark environment management is delegated to third party script(s).
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_tmp_persistent_newconda-envs_iris3_bin_python
[ 0.00%] ··· Importing benchmark suite produced output:
[ 0.00%] ···· /net/home/h05/itpp/git/iris/iris_main/benchmarks/benchmarks/generate_data/__init__.py:52: UserWarning: No BENCHMARK_DATA env var, defaulting to /net/home/h05/itpp/git/iris/iris_main/benchmarks/.data. Note that some benchmark files are GB in size.
              warn(message)

[100.00%] ··· memtrace_evaluation.MemcheckTaskRepeats.track_mem                                                                                                      ok
[100.00%] ··· ======= =======
               nreps         
              ------- -------
                 1      76.8 
                 2      77.1 
                 3     102.9 
                 4     103.3 
              ======= =======

Some combined results

                   nreps
                  ------- -------
                     1      76.8,  76.8,  76.9,  76.8,  76.8,  76.8,
                     2      77.1,  77.0,  77.1,  77.0,  77.0,  77.1,
                     3     102.8, 103.0, 103.0, 103.0, 103.0, 102.9,
                     4     103.3, 103.2, 103.5, 103.4, 103.2, 103.3,
                  ======= =======
(( incremental differences, from above numbers ))
                     1-2     0.3   0.2   0.2   0.2   0.3
                     2-3    25.7  26.0  25.9  26.0  25.8
                     3-4     0.5   0.2   0.5   0.4   0.4
                  ======= =======

Conclusions

The allocation pattern is a trifle hard to explain.
Sizes don't relate very directly to the actual allocations and are irregular
- this presumably show the python heap memory handling in action
This raises some doubts how stable results may be.

Trial tracemalloc as alternative memory tracker.

94494fb

pp-mo closed this May 9, 2024

This was referenced May 9, 2024

Memory benchmark trials (--OLD BROKEN VERSION : please see #5946 instead ) #5945

Closed

Use tracemalloc for memory measurements. #5948

Merged

Use more practical nworkers params.

d039ad0

pp-mo reopened this May 9, 2024

pp-mo closed this May 9, 2024

pp-mo mentioned this pull request May 10, 2024

Investigate tracemalloc #5947

Closed

stephenworsley reviewed May 15, 2024

View reviewed changes

benchmarks/benchmarks/memtrace_evaluation/memory_exercising_task.py Show resolved Hide resolved

Fix task scaling; test WorkersAndBlocks with process-based tasks.

84ff4ad

pp-mo reopened this May 16, 2024

pp-mo added the benchmark_this Request that this pull request be benchmarked to check if it introduces performance shifts label May 16, 2024

pp-mo closed this May 16, 2024

Investigate repeated ops with simulated memory 'leak'.

8f7166e

pp-mo reopened this May 20, 2024

pp-mo closed this May 20, 2024

pp-mo mentioned this pull request May 21, 2024

Add memory benchmarks #5960

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trial tracemalloc as memory tracker (replacement v2 PR) #5946

Trial tracemalloc as memory tracker (replacement v2 PR) #5946

pp-mo commented May 9, 2024

pp-mo commented May 9, 2024

codecov bot commented May 9, 2024 •

edited

Loading

pp-mo commented May 9, 2024 •

edited

Loading

pp-mo commented May 9, 2024 •

edited

Loading

stephenworsley May 15, 2024

pp-mo May 16, 2024

pp-mo May 16, 2024

pp-mo commented May 16, 2024 •

edited

Loading

pp-mo commented May 16, 2024 •

edited

Loading

github-actions bot commented May 16, 2024

pp-mo commented May 16, 2024

pp-mo commented May 20, 2024 •

edited

Loading

Trial tracemalloc as memory tracker (replacement v2 PR) #5946

Trial tracemalloc as memory tracker (replacement v2 PR) #5946

Conversation

pp-mo commented May 9, 2024

WIP DO NOT MERGE THIS, EVER

Additional context : sample output from a desktop run

pp-mo commented May 9, 2024

What I think we can deduce from the benchmark results

codecov bot commented May 9, 2024 • edited Loading

Codecov Report

pp-mo commented May 9, 2024 • edited Loading

Updated ideas

New results

Some interpretation

RunStyle checks are now more understandable.

MemcheckBlocksAndWorkers

pp-mo commented May 9, 2024 • edited Loading

Outcome

stephenworsley May 15, 2024

Choose a reason for hiding this comment

pp-mo May 16, 2024

Choose a reason for hiding this comment

pp-mo May 16, 2024

Choose a reason for hiding this comment

pp-mo commented May 16, 2024 • edited Loading

pp-mo commented May 16, 2024 • edited Loading

github-actions bot commented May 16, 2024

⏱️ Performance Benchmark Report: a3a20b5

pp-mo commented May 16, 2024

pp-mo commented May 20, 2024 • edited Loading

Some combined results

Conclusions

codecov bot commented May 9, 2024 •

edited

Loading

pp-mo commented May 9, 2024 •

edited

Loading

pp-mo commented May 9, 2024 •

edited

Loading

pp-mo commented May 16, 2024 •

edited

Loading

pp-mo commented May 16, 2024 •

edited

Loading

⏱️ Performance Benchmark Report: `a3a20b5`

pp-mo commented May 20, 2024 •

edited

Loading