Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simulation performance #961

Open
ydnatag opened this issue Jun 25, 2019 · 51 comments
Open

Simulation performance #961

ydnatag opened this issue Jun 25, 2019 · 51 comments
Labels
category:performance performance topics

Comments

@ydnatag
Copy link

ydnatag commented Jun 25, 2019

Hi,
At work, i'm porting all the repositories to mainline cocotb with python3 and i'm seeing that the performance using python3 is ~4 times worse than python2 For executing the tests, i just changed the PYTHON_BIN env,

I can't share a testcase now, i will try to recreate this performance issue with a sharable test.

My environment:

  • debian stretch
  • python2.7
  • python3.5
  • current cocotb master

Thanks

@ydnatag
Copy link
Author

ydnatag commented Jun 25, 2019

Some results:

Python2

*************************************************************************************************************
** TEST                                                 PASS/FAIL  SIM TIME(NS)  REAL TIME(S)  RATIO(NS/S) **
*************************************************************************************************************
** test.xxxxxxxxxxxxxxxxxxxx                              PASS        60983.81          1.20     50900.73  **
** test.xxxxxxxxxxxxxxxxxx                                PASS        79006.15          4.00     19732.29  **
** test.xxxxxxxxxxxxxx                                    PASS          566.95          0.02     25764.83  **
** test.xxxxxxxxxxxxxxxxxxxxxxx                           PASS        64058.68          1.28     49926.37  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxx                         PASS       160903.77          3.10     51852.69  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       220670.00         44.15      4998.32  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       270395.00         54.00      5007.50  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    PASS       394325.00         86.88      4538.70  **
** test.xxxxxxxxxx                                        PASS       160353.47          2.99     53638.34  **
** test.xxxxxxxxxxxxxx                                    PASS         5362.68          0.16     33913.34  **
*************************************************************************************************************

*************************************************************************************
**                                 ERRORS : 0                                      **
*************************************************************************************
**                               SIM TIME : 1485586.65 NS                          **
**                              REAL TIME : 199.12 S                               **
**                        SIM / REAL TIME : 7460.66 NS/S                           **
*************************************************************************************

Python3

*************************************************************************************************************
** TEST                                                 PASS/FAIL  SIM TIME(NS)  REAL TIME(S)  RATIO(NS/S) **
*************************************************************************************************************
** test.xxxxxxxxxxxxxxxxxxxx                              PASS        61183.91          3.54     17286.30  **
** test.xxxxxxxxxxxxxxxxxx                                PASS        79206.25          9.50      8340.80  **
** test.xxxxxxxxxxxxxx                                    PASS          767.05          0.06     13165.36  **
** test.xxxxxxxxxxxxxxxxxxxxxxx                           PASS        64258.78          3.16     20356.97  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxx                         PASS       161103.87          7.86     20496.36  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       221070.00        113.51      1947.57  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       270795.00        142.54      1899.82  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    PASS       568430.00        343.62      1654.23  **
** test.xxxxxxxxxx                                        PASS       160553.57          8.03     20004.19  **
** test.xxxxxxxxxxxxxx                                    PASS         5562.78          0.38     14753.74  **
*************************************************************************************************************

*************************************************************************************
**                                 ERRORS : 0                                      **
*************************************************************************************
**                               SIM TIME : 1940301.22 NS                          **
**                              REAL TIME : 849.55 S                               **
**                        SIM / REAL TIME : 2283.91 NS/S                           **
*************************************************************************************

@eric-wieser
Copy link
Member

Which version of python 2 and 3?

@imphil imphil added the type:question a support question label Jun 25, 2019
@ydnatag
Copy link
Author

ydnatag commented Jun 26, 2019

Python versions:

  • python 3.5.3
  • python 2.7.13

@imphil
Copy link
Member

imphil commented Jun 26, 2019

Could you possibly test with a newer Python 3 version, Python 3.7 being preferred? (Only to narrow down the problem a bit, we are still aiming for good performance in Python 3.5.)

@cmarqu
Copy link
Contributor

cmarqu commented Jun 26, 2019

There is also an environment variable COCOTB_ENABLE_PROFILING:
https://cocotb.readthedocs.io/en/latest/building.html?highlight=profile#additional-environment-variables

@ydnatag
Copy link
Author

ydnatag commented Jun 26, 2019

I supposed that i was using the same COCOTB but in one terminal i was inside a docker with an older cocotb. So, I rerun all the test with current and old commit and i got the following results:

Current master

Python 3.7.3

*************************************************************************************************************
** TEST                                                 PASS/FAIL  SIM TIME(NS)  REAL TIME(S)  RATIO(NS/S) **
*************************************************************************************************************
** test.xxxxxxxxxxxxxxxxxxxx                              PASS        61183.91          2.48     24695.05  **
** test.xxxxxxxxxxxxxxxxxx                                PASS        79206.25          6.52     12139.58  **
** test.xxxxxxxxxxxxxx                                    PASS          767.05          0.04     17737.50  **
** test.xxxxxxxxxxxxxxxxxxxxxxx                           PASS        64258.78          2.13     30143.39  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxx                         PASS       161103.87          5.47     29432.25  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       252351.29         92.17      2737.78  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       302076.66        109.01      2771.22  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    PASS       947690.01        421.32      2249.33  **
** test.xxxxxxxxxx                                        PASS       160553.57          5.66     28375.40  **
** test.xxxxxxxxxxxxxx                                    PASS         5562.78          0.26     21150.07  **
*************************************************************************************************************
                                                                                                             
*************************************************************************************
**                                 ERRORS : 0                                      **
*************************************************************************************
**                               SIM TIME : 2034754.18 NS                          **
**                              REAL TIME : 645.18 S                               **
**                        SIM / REAL TIME : 3153.80 NS/S                           **
*************************************************************************************

Python 2.7.3

*************************************************************************************************************
** TEST                                                 PASS/FAIL  SIM TIME(NS)  REAL TIME(S)  RATIO(NS/S) **
*************************************************************************************************************
** test.xxxxxxxxxxxxxxxxxxxx                              PASS        61183.91          3.54     17282.70  **
** test.xxxxxxxxxxxxxxxxxx                                PASS        79206.25          8.68      9128.62  **
** test.xxxxxxxxxxxxxx                                    PASS          767.05          0.05     14443.75  **
** test.xxxxxxxxxxxxxxxxxxxxxxx                           PASS        64258.78          3.12     20622.67  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxx                         PASS       161103.87          7.78     20709.44  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       252351.29        117.08      2155.40  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       302076.66        147.67      2045.64  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    PASS       600020.65        329.24      1822.44  **
** test.xxxxxxxxxx                                        PASS       160553.57          7.75     20720.70  **
** test.xxxxxxxxxxxxxx                                    PASS         5562.78          0.36     15554.01  **
*************************************************************************************************************
                                                                                                             
*************************************************************************************
**                                 ERRORS : 0                                      **
*************************************************************************************
**                               SIM TIME : 1687084.81 NS                          **
**                              REAL TIME : 625.35 S                               **
**                        SIM / REAL TIME : 2697.80 NS/S                           **
*************************************************************************************

Old Cocotb commit: 2744c4f

Python 2.7

*************************************************************************************************************
** TEST                                                 PASS/FAIL  SIM TIME(NS)  REAL TIME(S)  RATIO(NS/S) **
*************************************************************************************************************
** test.xxxxxxxxxxxxxxxxxxxxxxxxx                          N/A            0.00          0.00         0.00  **
** test.xxxxxxxxxxxxxxxxxxxx                              PASS        61183.91          1.94     31557.20  **
** test.xxxxxxxxxxxxxxxxxx                                PASS        79206.25          5.59     14173.21  **
** test.xxxxxxxxxxxxxx                                    PASS          767.05          0.04     18289.67  **
** test.xxxxxxxxxxxxxxxxxxxxxxx                           PASS        64258.78          1.93     33259.93  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxx                         PASS       161103.87          4.79     33622.94  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       252351.29         58.97      4279.14  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       302076.66         74.35      4062.96  **
** test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    PASS       600020.65        171.52      3498.33  **
** test.xxxxxxxxxx                                        PASS       160553.57          4.81     33392.15  **
** test.xxxxxxxxxxxxxx                                    PASS         5562.78          0.25     22497.06  **
*************************************************************************************************************
           
*************************************************************************************
**                                 ERRORS : 0                                      **
*************************************************************************************
**                               SIM TIME : 1687084.81 NS                          **
**                              REAL TIME : 324.24 S                               **
**                        SIM / REAL TIME : 5203.12 NS/S                           **
*************************************************************************************

Python 3.5.3:

* TEST                                                 PASS/FAIL  SIM TIME(NS)  REAL TIME(S)  RATIO(NS/S) **
************************************************************************************************************
* test.xxxxxxxxxxxxxxxxxxxxxxxxx                          N/A            0.00          0.00         0.00  **
* test.xxxxxxxxxxxxxxxxxxxx                              PASS        61183.91          1.95     31354.22  **
* test.xxxxxxxxxxxxxxxxxx                                PASS        79206.25          5.48     14465.62  **
* test.xxxxxxxxxxxxxx                                    PASS          767.05          0.04     17940.26  **
* test.xxxxxxxxxxxxxxxxxxxxxxx                           PASS        64258.78          1.96     32822.68  **
* test.xxxxxxxxxxxxxxxxxxxxxxxxx                         PASS       161103.87          4.88     33025.67  **
* test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       252351.29         60.02      4204.34  **
* test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx              PASS       302076.66         74.06      4078.62  **
* test.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    PASS       773851.17        241.63      3202.65  **
* test.xxxxxxxxxx                                        PASS       160553.57          5.01     32063.15  **
* test.xxxxxxxxxxxxxx                                    PASS         5562.78          0.25     21906.17  **
************************************************************************************************************
                                                          
************************************************************************************
*                                 ERRORS : 0                                      **
************************************************************************************
*                               SIM TIME : 1860915.33 NS                          **
*                              REAL TIME : 395.37 S                               **
*                        SIM / REAL TIME : 4706.75 NS/S                           **
************************************************************************************

Summary

The performance doesn't depends on the python version. Cocotb suffered a loss of performance in the last year. Maibe it is expected, i don't know. I will do a the profiling and share the results.

@imphil
Copy link
Member

imphil commented Jun 26, 2019

The simulated time differs between your test runs, so things are not fully deterministic.

My observations

  1. Looking at the "SIM / REAL TIME" metric:
    In the newer commit, Python 3.7 is faster than Python 2.7.
    In the older commit, Python 3.5 is slower than Python 2.7.
    So this could suggest: "Python 3.5 is slower than Python 3.7, with Python 3.7 being faster than Python 2.7."
  2. Independent of the Python version, cocotb is slower with today's master than it was with 2744c4f (December 2018).

Can you confirm the latter two observations with a more deterministic test case? The first item would then be a documentation issue (suggest to people to use a modern Python version), while the second observation would require more investigation. It would be very helpful to have a deterministic test case and bisect the cocotb history between 2744c4f and master to figure out in which commit (if it is a single one) the performance regression was introduced.

@imphil
Copy link
Member

imphil commented Jun 26, 2019

And no, the performance regression is not expected.

@ydnatag
Copy link
Author

ydnatag commented Jun 26, 2019

Just one testcase has randomization (the testcase with the longest name), the others are deterministic. I can confirm both observations. These results are similar in all our tests (we have around 50 cores that are currently tested with cocotb).

Currently i'm looking for a shorter test case without confidential information for sharing the profiling.

@eric-wieser
Copy link
Member

As a hunch, #870 might be a good starting point to test before and after

@ydnatag
Copy link
Author

ydnatag commented Jun 26, 2019

Using python3.7

Current master

commit: 714e129

SIM TIME : 270162.40 NS 
REAL TIME : 16.99 S      
SIM / REAL TIME : 15900.06 NS/S

After Fix scheduler issues caused by compound triggers

commit: a354327

SIM TIME : 270162.40 NS 
REAL TIME : 14.08 S      
SIM / REAL TIME : 19192.90 NS/S

Before

commit: 5ac9bfb

SIM TIME : 270162.40 NS 
REAL TIME : 13.78 S      
SIM / REAL TIME : 19599.73 NS/S

My freezed version:

commit: 2744c4f

SIM TIME : 270162.40 NS 
REAL TIME : 8.52 S       
SIM / REAL TIME : 31705.06 NS/S

Observation

#870 doesn't affect the performance considerably.

@ydnatag
Copy link
Author

ydnatag commented Jun 26, 2019

I'm searching the guilty commits with brute force: An script which executes and grep the performance for each commit.

@imphil
Copy link
Member

imphil commented Jun 26, 2019

For the difference between the Python versions I've opened #965 to track the further investigation and implementation, leaving this bug to actually figure out the performance regression within cocotb.

@ydnatag
Copy link
Author

ydnatag commented Jun 26, 2019

Here the results:

image

CSV file with the results:
performance.txt

The outlier near 2019-01-04 was my fault. I slept the process.

As is shown in the image, the performance has a negative derivative along the time and 2/3 commits with higher performance loss:

Of course, those results are relative to the test but i think they are a good estimation.

@eric-wieser
Copy link
Member

What's the y axis on that graph?

@ydnatag
Copy link
Author

ydnatag commented Jun 27, 2019

The Y axis is SIM / REAL TIME measument. I call it "performance".

For having a better performane estimator, It should be normalized to the max SIM / REAL TIME. But for this investigation i think it is enough.

@eric-wieser
Copy link
Member

eric-wieser commented Jun 27, 2019

It would be good to confirm that the sim time is the same across all commits, in case the cause of the increased runtime is something like "we fixed a bug where the test ended silently too soon"

@ydnatag
Copy link
Author

ydnatag commented Jun 27, 2019

I checked that simtime is the same. . If you want i can share the scripts to recreate these results. The scripts are so ugly but useful.

@eric-wieser
Copy link
Member

eric-wieser commented Jun 27, 2019

So, I've found one obvious performance cost already - a logger is created for every single coroutine, which for a very simple test-case of:

import cocotb
from cocotb.triggers import Timer, RisingEdge, First
from cocotb.clock import Clock

@cocotb.test()
def test1(dut):
	clk_gen = cocotb.fork(Clock(dut.clk, 100).start())
	for i in range(10000):
		yield [RisingEdge(dut.clk), Timer(200)]

Seems to spend 12% of it's time calling logging.getLogger

Some of the scheduler refactors replaced some special-casing with more coroutines. What I didn't realize is that the each mini coroutine creates and destroys a full logger instance, which is not cheap at all.

@eric-wieser
Copy link
Member

Patch for that at #967. @andresdemski, please let me know how much that helps your more realistic suite.

@ydnatag
Copy link
Author

ydnatag commented Jun 27, 2019

I reprocess all the commits but this time i recorded simtime.
performance.txt

image

As is shown in the picture, simtime is constant

@ydnatag
Copy link
Author

ydnatag commented Jun 27, 2019

I tried #967 . The performance is improved a 10% but still far from the best performance

@ydnatag
Copy link
Author

ydnatag commented Jun 27, 2019

Here the profiling:

Old cocotb version

profile_oldversion

Current master

profile_master

@eric-wieser
Copy link
Member

What's the tool you used to produce those diagrams?

@ydnatag
Copy link
Author

ydnatag commented Jun 27, 2019

The same command that it is in endian_swapper example

$ make COCOTB_ENABLE_PROFILING=true
$ python3 -m gprof2dot -f pstats test_profile.pstat | dot -Tsvg -o profile.svg

@eric-wieser
Copy link
Member

Any chance you could upload the svg version?

eric-wieser added a commit to eric-wieser/cocotb that referenced this issue Jun 27, 2019
From the investigation at cocotbgh-961, this is shown to be 3% of the runtime on master.
The version of python does not change at runtime, so we can replace this function with a simple constant.
Doing so also brings our name in line with six's.
@imphil
Copy link
Member

imphil commented Jun 28, 2019

Let's please keep the discussion in this issue focused on understanding and fixing the performance issues.

@ydnatag
Copy link
Author

ydnatag commented Jul 3, 2019

I made a simple tool which allow us to do performance analysis in an easy way: https://github.com/andresdemski/cocotb-performance

Terminal output:

$ ./run_in_docker.sh python3 compare.py                                                                                                                                                                     
sha256:7a4d132d15ca92f95b7f75b637c2c95b277bf7a3609918815f276bbe9fb8c5e4

 url https://github.com/potentialventures/cocotb/
 commit: master
 path: /tmp/cocotb-master
 results: 
    simtime: 142610.10 ns
    realtime: 18.65s
    performance: 7648.49 ns/s


 url https://github.com/eric-wieser/cocotb/
 commit: avoid-runtime-coroutine-creation
 path: /tmp/cocotb-eric-wieser
 results: 
    simtime: 142610.10 ns
    realtime: 18.20s
    performance: 7833.63 ns/s


 url https://github.com/potentialventures/cocotb/
 commit: 2744c4fec607bf598d677484e6c0290183360f7a
 path: /tmp/cocotb-master
 results: 
    simtime: 142610.10 ns
    realtime: 15.53s
    performance: 9182.79 ns/s

@imphil
Copy link
Member

imphil commented Jul 3, 2019

@andresdemski now that all outstanding performance PRs have been merged, how close to 1.1 performance are we with current master? I'd like to use this information to make a judgement call if we can actually release master right now, or if we need to do further performance work. (A small performance regression is fine, a huge regression would be a bad thing.)

@eric-wieser
Copy link
Member

If #764 (f692e8d) is really significant, it would be fairly straightforward to revert - most of the value in it was to provide the simplicity needed to refactor. The refactor has now happened, and the three profiler.disable()s that touched became one. I suppose the cost of nullcontext() might be further from 0 than I hoped.

@ydnatag
Copy link
Author

ydnatag commented Jul 3, 2019

image

our current situation.

@eric-wieser
Copy link
Member

Can you replot that graph with an origin of 0?

@ydnatag
Copy link
Author

ydnatag commented Jul 3, 2019

@imphil
Copy link
Member

imphil commented Jul 3, 2019

So we're roughly 30 % slower than we were in the 1.1 release.

@eric-wieser
Copy link
Member

eric-wieser commented Jul 3, 2019

Phrased another way, (that) simulation will take 1 / (1 - 0.3) = 1.4 times the time it did in 1.1.

At the point this issue was opened, the simulation took 2.0 times longer

@eric-wieser
Copy link
Member

eric-wieser commented Jul 4, 2019

I investigated some more - the graph is misleading, as it doesn't handle branches well. Here's a better plot, that draws the branches separately:

The labels are linked to the commits on github.

Edit: updated to use the new data

@ydnatag ydnatag closed this as completed Jul 4, 2019
@ydnatag ydnatag reopened this Jul 4, 2019
@imphil
Copy link
Member

imphil commented Jul 5, 2019

Thanks @andresdemski for all the analysis runs, and @eric-wieser for the great work on performance patches! Are there any more patches upcoming in the next couple of days which target performance? (Otherwise I'm going to release 1.2 with what we have now, and we can build on top of that without time pressure.)

@eric-wieser
Copy link
Member

@imphil: Came up with another one at #1012.

Thinking back, I think #727 (and perhaps some of #723) might have been a performance mistake, and the cost of the weak dictionaries may not justify the minor gain of avoiding reference cycles. I don't really want to look at reverting those just yet.

In the long term though, there's a lot of efficiency to be gained by eliminating internal uses of Join triggers where using await / yield from directly is possible.

@cmarqu
Copy link
Contributor

cmarqu commented Jul 15, 2019

With these real-life testcases and current performance numbers, it may be worth re-investigating gcc options like in #178 (this would be set in
https://github.com/potentialventures/cocotb/blob/master/cocotb/share/lib/simulator/Makefile#L33 I suppose).

@ydnatag
Copy link
Author

ydnatag commented Nov 27, 2019

Update 11/26/19

image

No considerable changes since July.

Profile

image

profile_11_26_19.zip

@cmarqu cmarqu changed the title Simulation performace Simulation performance May 1, 2020
@cmarqu cmarqu added the category:performance performance topics label May 1, 2020
@github-actions
Copy link
Contributor

Has your question been resolved? If so please close this issue. If it has not been resolved, you may need to provide more information. If no more activity on this issue occurs in 7 days, it will be closed.

@themperek
Copy link
Contributor

themperek commented Dec 6, 2020

One can also use pprofile to generate reports.

Example for matrix_multiplier commit: f8c0e40

pprofile_matrix_multiplier_f8c0e40.log

@themperek
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:performance performance topics
Projects
None yet
Development

No branches or pull requests

6 participants