attempt at optimizing allocs in replace.c #142

lefessan · 2024-04-23T08:33:27Z

There are two steps to remove allocations:

use a circular buffer for the list of tokens
remove re-allocation of tokens passing from one stream to another

cobc/replace.c

lefessan · 2024-04-23T09:14:28Z

This version passes the testsuite on my computer.

@GitMensch You may want to test it on your example to measure the number of allocations. It's supposed to allocate less than the original implementation in gnucobol 3.1, as there is no allocation of intermediate lists.

GitMensch · 2024-04-23T09:45:57Z

related to #75

GitMensch · 2024-04-23T10:42:04Z

This looks quite promising. perf stat, first for 3.2, then for 3.2 + this change:

          2,369.99 msec task-clock:u                     #    0.937 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
           356,148      page-faults:u                    #  150.274 K/sec
     6,269,470,708      cycles:u                         #    2.645 GHz
    14,738,503,488      instructions:u                   #    2.35  insn per cycle
     3,446,098,461      branches:u                       #    1.454 G/sec
         8,069,400      branch-misses:u                  #    0.23% of all branches

       2.528378960 seconds time elapsed

       1.615366000 seconds user
       0.741772000 seconds sys

          1,255.32 msec task-clock:u                     #    0.959 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
            91,721      page-faults:u                    #   73.066 K/sec
     3,975,614,937      cycles:u                         #    3.167 GHz
     9,728,980,463      instructions:u                   #    2.45  insn per cycle
     2,260,897,196      branches:u                       #    1.801 G/sec
         4,786,343      branch-misses:u                  #    0.21% of all branches

       1.308611289 seconds time elapsed

       1.039849000 seconds user
       0.208398000 seconds sys

especially look at "instructions" and "page faults".

relation to 3.1.2:

               193      page-faults:u                    #   37.357 /sec
     4,975,843,684      cycles:u                         #    0.963 GHz
     6,891,334,795      instructions:u                   #    1.38  insn per cycle

Numbers from valgrind:

total heap usage	allocs	frees	bytes allocated
GC 3.1.2	5,526	5,526	20,443,861
GC 3.2	21,677,571	21,677,571	893,390,820
GC 3.2 (after this PR)	7,803,907	7,803,903	244,140,683

So still much more than 3.1.2, but definitely a good direction.

Note that you likely have some cobc_malloc somewhere, as only after this patch valgrind outputs "heap in use at exit: 320 bytes in 4 blocks". (with 3.1.2 and 3.2 valgrind outputs "0 bytes").

For comparison to the linked PR, here's the flamegraph and the cost-overview:

If you don't have an idea how to improve more (if it helps I could drop the inlining, providing a cleaner flamegraph and cost analysis (but of course less similar to production builds of cobc), then I'd be fine with upstreaming this now (with a changelog entry, of course and maybe a minor note to NEWS that cobc's memory usage during parsing in comparison to 3.2 was decreased).

But possibly you see something from the list above leading to even more improvements?

GitMensch · 2024-04-23T10:52:45Z

Oops - leads to 4 compile-errors in NIST SM and additional

In file included from /usr/include/string.h:535,
                 from /tmp/cob88280_0.c:8:
In function 'memmove',
    inlined from 'OBIC3A_' at /tmp/cob88280_0.c:1677:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:36:10: warning: '__builtin_memmove' reading 20 bytes from a region of size 0 [-Wstringop-overread]
   36 |   return __builtin___memmove_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   37 |                                   __glibc_objsize0 (__dest));
      |                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~

GitMensch · 2024-04-23T19:53:54Z

Oh - if possible (for any cases like this - where NIST fails but our internal does not): please add "the issue" for the current failure to the testsuite, as this ensures higher coverage there ("someday" we don't want to run NIST that often, but our coverage is definitely not there yet).

lefessan · 2024-04-23T22:33:52Z

I think the bug is fixed now, and I added some more optimizations that I would be interested to check their performance.

About the freeing to perform, is there a function that is calling all the freeing functions on exit ?

GitMensch · 2024-04-23T22:45:22Z

About the freeing to perform, is there a function that is calling all the freeing functions on exit ?

Yes, on different places, as long as you use the right malloc - which is possibly cobc_plex_malloc in replace.c

I think the bug is fixed now

Seems so - could you get a grasp what that triggered and add it to the internal testsuite, please?

and I added some more optimizations that I would be interested to check their performance

Will check in 7 to 17 hours and report back ;-)
Maybe it is useful to check in replace.c if the outline comments in the header are correct and if there are any doxygen comments on functions are missing, as well as crafting the Changelog entry and possibly a minor NEWS one.

Note: the amount of one- or two-liner static function... looks special.
Please check if these are useful and mark them as always inlined.

GitMensch · 2024-04-24T07:36:22Z

Again: much better (apart from the memory free part better than both 3.1.2 and 3.2


            654.86 msec task-clock:u                     #    0.742 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               180      page-faults:u                    #  274.869 /sec
     2,190,772,514      cycles:u                         #    3.345 GHz
     5,304,004,935      instructions:u                   #    2.42  insn per cycle
     1,309,395,192      branches:u                       #    2.000 G/sec
         4,460,189      branch-misses:u                  #    0.34% of all branches

       0.882801772 seconds time elapsed

       0.583984000 seconds user
       0.071812000 seconds sys

Numbers from valgrind:

total heap usage	allocs	frees	bytes allocated
GC 3.1.2	5,526	5,526	20,443,861
GC 3.2	21,677,571	21,677,571	893,390,820
GC 3.2 (after this PR now)	4,465	4,463	20,390,950

with the flamegraph not showing the replace functions any more.

Therefore I'd say - issue fixed, only needs the "additional" parts (check use of memory-functions, reformat, in-file documentation check; handling TODOs, Changelog, ideally testsuite, possibly minor NEWS entry) then looks good to go.

For the format:

-       if (copy_repls->replace_list == NULL &&
-            copy_repls->current_list == NULL &&
-            replace_repls->replace_list == NULL &&
-            replace_repls->current_list == NULL) {
-		return cb_ppecho_direct (text, token);
-	}
+	if (copy_repls->replace_list == NULL
+	 && copy_repls->current_list == NULL
+	 && replace_repls->replace_list == NULL
+	 && replace_repls->current_list == NULL) {
+		return cb_ppecho_direct (text, token);
+	}

It likely is useful to also compare the actual REPLACE code. I'll add a REPLACE statement + COPY REPLACING then reinspect (later, possibly in an hour).

lefessan · 2024-04-24T15:46:04Z

Great. I cleaned the PR with ChangeLog and NEWS sections. Allocations should also be correctly freed at the end.

I did not add a new entry in the testsuite, as the bug was in the data-structure token_queue, and, unless we modify this data structure, which is unlikely, it cannot be triggered anymore. The difference between the testsuite and the NIST tests came from a test in NIST that had a replacement longer than 8 tokens, which seems not to exist in our testsuite (and my token_queue implementation had a bug on that constant...).

GitMensch

The difference between the testsuite and the NIST tests came from a test in NIST that had a replacement longer than 8 tokens, which seems not to exist in our testsuite

Please add it (likely by adjusting one of the existing tests) - this way we ensure things like that are catched easier in the future.

I may do a last try for getting the numbers in this issue correct, both with and without REPLACE later this evening.

GitMensch · 2024-04-24T16:58:01Z

NEWS

+   comparison between a numeric DISPLAY variable to another or to a
+   literal comparison between numeric DISPLAY or BCD variable to zero
+   INSPECT CONVERTING (and "simple" INSPECT REPLACING), in general and
+   especially if both from and to are constants
+


The old indent is strange, but I think it matches the one from previous versions. If this is the case then either leave it as-is or, potentially better, replace in both cases by indented extra list starting with *

Looking at NEWS from 3.2 and earlier version, I think the former indent was wrong even for them.

thank you for fixing all of these with an * iteration

cobc/replace.c

GitMensch · 2024-04-24T18:45:26Z

make checkall also passed on the dev environment at work updated stats:

            661.16 msec task-clock:u                     #    0.735 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               178      page-faults:u                    #  269.225 /sec
     2,242,123,238      cycles:u                         #    3.391 GHz
     5,304,096,492      instructions:u                   #    2.37  insn per cycle
     1,309,409,904      branches:u                       #    1.980 G/sec
         4,453,625      branch-misses:u                  #    0.34% of all branches

       0.899029345 seconds time elapsed

       0.584002000 seconds user
       0.075081000 seconds sys

now with

HEAP SUMMARY:
   in use at exit: 0 bytes in 0 blocks
 total heap usage: 4,465 allocs, 4,465 frees, 20,390,950 bytes allocated

All heap blocks were freed -- no leaks are possible

The "simple" adjustment did work fine (one of each,. the second nearly globally), resulting in

          1,019.13 msec task-clock:u                     #    0.957 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
            50,342      page-faults:u                    #   49.397 K/sec
     3,341,557,957      cycles:u                         #    3.279 GHz
     7,982,333,596      instructions:u                   #    2.39  insn per cycle
     1,889,809,554      branches:u                       #    1.854 G/sec
         4,612,794      branch-misses:u                  #    0.24% of all branches

       1.064843192 seconds time elapsed

       0.899148000 seconds user
       0.115552000 seconds sys
       
 HEAP SUMMARY:
    in use at exit: 0 bytes in 0 blocks
  total heap usage: 4,285,236 allocs, 4,285,236 frees, 144,827,144 bytes allocated

All heap blocks were freed -- no leaks are possible

... but I'm not sure if my more complex test program for COPY REPLACING / REPLACE does work correctly.
I should inspect that tomorrow (but still would be fine if you want to check that in upstream today).

* Add two fast paths that completely avoid the replacement machinery (these two fast paths were formerly present in 3.1); * Use a circular buffer instead of a list for the temporary queue of tokens in streams; * Do not reallocate tokens passed from the copy-replacing stream to the replace stream

lefessan · 2024-04-25T06:30:39Z

I added a test case for a long replacement (and a combinaison of COPY and REPLACE on the same tokens), that would trigger the same error as the NIST test.

I am not sure what you mean by your complex program: except for the two fast paths (that were already present in 3.1.2), this PR does not change the behavior of the COPY/REPLACE, so anything that worked with 3.1.2 and 3.2 should also work with this one.

GitMensch · 2024-04-25T08:17:32Z

so anything that worked with 3.1.2 and 3.2 should also work with this one

I think so - as noted this is good to be checked in upstream.

As also noted I do need to recheck the example I've created in any case.

lefessan · 2024-04-25T08:58:14Z

I am thinking about writing a blog post about all this (how we fixed the initial bug and how we optimized it to match the perf of 3.1.2), do you happen to have a flamegraph of the final version also ? It could make a nice illustration of the evolution of performances during the work...

GitMensch · 2024-04-25T19:07:31Z

Here you go.

GitMensch · 2024-04-25T19:47:51Z

The recordings for the flamegraph were taken by perf record call-graph dwarf,8096 --aio -z -- cobc B BIG.cob -o BIG.i, the analysis was done using Hotspot from KDAB.

lefessan · 2024-04-26T06:29:37Z

Thanks ! Is big.cob public ?

GitMensch · 2024-04-26T12:00:51Z

Thanks ! Is big.cob public ?

No chance. But I can share that the file is generated by Oracle Pro*COB from an (also generated) file handling SQL.
After all processing of copybooks and removing any comment or empty lines this source still has 747,203 LOC - and the whole part of preparsing the Pro*COB generated file (.cob -> .i - the part that we inspected here) now takes 661.31 msec task-clock; the complete processing (.cob with -fsyntax-only) takes 4,738.12 msec task-clock and processing + codegen (.cob-> .c) needs 7,693.71 msec task-clock.

lefessan · 2024-04-27T12:44:03Z

Merged in SVN !

GitMensch · 2024-04-27T13:42:17Z

Well done. If you write that blog entry please drop a note in the discussion boards under GnuCOBOL.

Note: while we've inspected the preprocessing here I've worked this week on the parsing speed and memory usage, especially for source files with lots of FILLERs as seen especially in generated programs and EXEC SQL preprocessed ones (speed inspected with perf/Hotspot, memory usage with Heaptrack) and hope to commit this soon as well.

... actually I hope to commit anything of the 95% stuff laying around, those tend to pile up.

lefessan marked this pull request as draft April 23, 2024 08:33

GitMensch reviewed Apr 23, 2024

View reviewed changes

cobc/replace.c Show resolved Hide resolved

lefessan force-pushed the z-2024-04-22-optimized-replacing branch from 9f44da1 to d210c6e Compare April 24, 2024 15:39

lefessan marked this pull request as ready for review April 24, 2024 15:39

GitMensch requested changes Apr 24, 2024

View reviewed changes

lefessan force-pushed the z-2024-04-22-optimized-replacing branch from d210c6e to 7f60896 Compare April 24, 2024 18:11

lefessan force-pushed the z-2024-04-22-optimized-replacing branch from 7f60896 to 78ef11d Compare April 25, 2024 06:26

GitMensch added the Ready for SVN label Apr 25, 2024

lefessan added the Merged in SVN label Apr 27, 2024

lefessan closed this Apr 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

attempt at optimizing allocs in replace.c #142

attempt at optimizing allocs in replace.c #142

lefessan commented Apr 23, 2024 •

edited

Loading

lefessan commented Apr 23, 2024

GitMensch commented Apr 23, 2024

GitMensch commented Apr 23, 2024 •

edited

Loading

GitMensch commented Apr 23, 2024

GitMensch commented Apr 23, 2024

lefessan commented Apr 23, 2024

GitMensch commented Apr 23, 2024 •

edited

Loading

GitMensch commented Apr 24, 2024 •

edited

Loading

lefessan commented Apr 24, 2024

GitMensch left a comment

GitMensch Apr 24, 2024

lefessan Apr 24, 2024

GitMensch Apr 24, 2024

GitMensch commented Apr 24, 2024

lefessan commented Apr 25, 2024

GitMensch commented Apr 25, 2024

lefessan commented Apr 25, 2024

GitMensch commented Apr 25, 2024

GitMensch commented Apr 25, 2024

lefessan commented Apr 26, 2024

GitMensch commented Apr 26, 2024 •

edited

Loading

lefessan commented Apr 27, 2024

GitMensch commented Apr 27, 2024 •

edited

Loading

attempt at optimizing allocs in replace.c #142

attempt at optimizing allocs in replace.c #142

Conversation

lefessan commented Apr 23, 2024 • edited Loading

lefessan commented Apr 23, 2024

GitMensch commented Apr 23, 2024

GitMensch commented Apr 23, 2024 • edited Loading

GitMensch commented Apr 23, 2024

GitMensch commented Apr 23, 2024

lefessan commented Apr 23, 2024

GitMensch commented Apr 23, 2024 • edited Loading

GitMensch commented Apr 24, 2024 • edited Loading

lefessan commented Apr 24, 2024

GitMensch left a comment

Choose a reason for hiding this comment

GitMensch Apr 24, 2024

Choose a reason for hiding this comment

lefessan Apr 24, 2024

Choose a reason for hiding this comment

GitMensch Apr 24, 2024

Choose a reason for hiding this comment

GitMensch commented Apr 24, 2024

lefessan commented Apr 25, 2024

GitMensch commented Apr 25, 2024

lefessan commented Apr 25, 2024

GitMensch commented Apr 25, 2024

GitMensch commented Apr 25, 2024

lefessan commented Apr 26, 2024

GitMensch commented Apr 26, 2024 • edited Loading

lefessan commented Apr 27, 2024

GitMensch commented Apr 27, 2024 • edited Loading

lefessan commented Apr 23, 2024 •

edited

Loading

GitMensch commented Apr 23, 2024 •

edited

Loading

GitMensch commented Apr 23, 2024 •

edited

Loading

GitMensch commented Apr 24, 2024 •

edited

Loading

GitMensch commented Apr 26, 2024 •

edited

Loading

GitMensch commented Apr 27, 2024 •

edited

Loading