Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UI / execution slows down for very long programs (thousands of cycles) #132

Closed
glpuga opened this issue Aug 9, 2016 · 38 comments
Closed

Comments

@glpuga
Copy link

glpuga commented Aug 9, 2016

I was test running the eduMIPS v1.2.2 using the program that is attached to this issue, and I noted that for long simulation times, the rate at which the simulation advances (as seen in the cycles counter and registers windows updates) falls lower the longer the simulation time goes on.

On my computer, after just a one or two minutes the simulation rate drops from from almost a hundred cycles a second to just a few cycles per second, and keeps falling.

I tried disabling the the synchronization of the graphical interface after every cycle in the settings menu, but it made no difference, simulation rate seems to fall anyway.

hailstoneenglish.s.zip

@lupino3
Copy link
Member

lupino3 commented Aug 10, 2016

Hi Gerardo, thanks for the report.

I will have a look at it as soon as possible. I think we have never ran programs so long in EduMIPS64, the simulator is mostly tested on short programs, about a hundred cycles.

You mention that the speed drops after two minutes, and that the original speed is ~100 cycles/second. This means that your code reaches more than 12000 cycles? That's definitely never been tested. :)

I will try to reproduce the problem in the upcoming days (most likely next week) and see if I can isolate the defect and fix it.

Do you have any possible suspects? Does the memory usage grow during these 2 minutes? Maybe there is some memory inefficiency, or we reach the maximum amount of memory reserved by the JVM.

Did you try increasing the default memory that the JVM reserves to the process? See http://stackoverflow.com/a/2294280/193852. Try using a higher value. If I remember correctly, the default is at most 1/4th of your RAM. Try increasing that to half your RAM.

Thanks again for the bug report!

@glpuga
Copy link
Author

glpuga commented Aug 10, 2016

Hi Andrea,

Fearing that I might have exaggerated, this time I measured some hard data by running the simulation with "sync graphics with CPU in multi-step execution" turned off. I also set "interval
between cycles" to 0.

After two minutes of simulation the execution rate falls from 90 cycles/second to 13 cycles/second. Because of the speed drop, whereas after the first 10 seconds the simulated time has advanced by 900 cycles, at the 2 minutes time mark the simulated time has yet to reach 4500 cycles.

I attached the data that I measured as an example. However, bear in mind that multiple runs seem to vary quite a bit, probably depending on how the java processes are distributed on my CPUs. Whatever the peak simulation rate is, however, there is always a steep drop after just a few tenths of seconds.

When "sync graphics with CPU in multi-step execution" is on, the speed decline is less pronounced (probably because of the time required for screen updates), but also visible after a few minutes.

I have not noticed any increase in memory consumption as the simulation advances that could explain the rate drop. I have yet to test increasing the amount of memory for the JVM, but since the computer I'm working on has 16Gb to start with, even assuming that only a fourth of the total memory is being used by the JVM it seems unlikely that lack of memory is the problem.

Maybe you are right that the simulation time is kind of long. I had never given it much thought before because using winMIPS the full length of the program (about 50000 simulated cycles) runs in about a second. It's a program that I use to show the impact on cpu performance when the number of instructions between conditional jumps is small, by comparing the effect in the final cycle count of small changes in the program, like more aggressive use of delay slots and the like, so many cycles made for good statistics.

You are welcome. Thank you for maintaining open source projects with educational value!

edumipssimcyles
edumipssimrate

@lupino3
Copy link
Member

lupino3 commented Aug 11, 2016

Wow, thanks a lot for the detailed analysis.

I tried to run the program yesterday using the Swing interface and kept memory under control, and indeed it is very low and not a concern. As soon as I get some time I will attach a profiler to the simulator (YourKit, most likely) and will see how this time is spent.

The program is long, but if WinMIPS64 can execute it in 1 second then we should try to get as close as possible :)

I suspect that the issue might be with the graphical interface or the mechanisms used to synchronize it with the core. If I run your code in the web-based prototype (http://www.edumips.org/edumips64.html), and fix the warnings related to non-MIPS64 instructions, then the whole code runs in 2-3 seconds.

Sounds like an interesting bug to debug :) Thanks for spotting it!

@lupino3
Copy link
Member

lupino3 commented Aug 11, 2016

The "don't sync the UI" option does not work properly, I have opened issue #133 to deal with it.

I can indeed see the slowness as the number of instructions grows.

screenshot
Looking at some profiling data, taken ~4000 cycles after the execution started,it seems that most of the time is spent drawing the Cycles window. This makes sense to me: it is the only window for which the effort to paint it grows with the number of cycles executed.

I will dig a bit deeper and try to fix all the issues.

Gerardo, may I ask you to include your program as a unit test for the simulator, under the GPL license?

Thanks,
Andrea

@lupino3
Copy link
Member

lupino3 commented Aug 11, 2016

Finally, I ran the code as part of unit tests locally, and it still takes ages. The profile suggests that CycleBuilder is the culprit, and indeed in the code there is a linear search in a list of instructions.

Few things to fix. :) Hope we can get the performance as good as WinMIPS64 with a few of these fixes.

lupino3 added a commit that referenced this issue Aug 11, 2016
This avoids a quadratic behavior in CycleBuilder (O(number_of_cycles^2)), and
instead makes it linear. Improves the situation described in #132 but does not
fully fix it.
@lupino3
Copy link
Member

lupino3 commented Aug 11, 2016

The CycleBuilder code should be fixed in b0f654c. After that commit, it takes 4 seconds for the program to run inside unit tests (so, no GUI whatsoever).

For the drawing part, we could look into using the AWT clip region to minimize the amount of data drawn. Looks like I should study http://www.oracle.com/technetwork/java/painting-140037.html.

@glpuga
Copy link
Author

glpuga commented Aug 12, 2016

Gerardo, may I ask you to include your program
as a unit test for the simulator, under the GPL license?

Of course, you can include it. I'm glad it was useful.

Gerardo.-

lupino3 added a commit that referenced this issue Aug 23, 2016
Kindly donated by Gerardo Puga in Issue #132.
lupino3 added a commit that referenced this issue Aug 23, 2016
@lupino3
Copy link
Member

lupino3 commented Aug 24, 2016

So, I have fixed the sync option (#133), so the code I am about to merge into master should be able to run your program with GUI synchronization disabled.

The cycles window is horrible though (#134), I'll need to fix it before making a new release.

@KeyC0de
Copy link

KeyC0de commented May 3, 2017

It is unbearably slow. It only works properly for the first few seconds. Having disabled visual sync and setting to 0ms (or 1ms). This program is not "ready". I would also like to know why such a big difference to the MIPS instructions. Why can't you be consistent with the MIPS ISA? Anyway. I wanted to report that this program only works for very small (possibly with not much looping). It takes about ~10 minutes to execute 8k instructions. After than it's not possible. Splendid program but this needs fixing.

@lupino3
Copy link
Member

lupino3 commented May 3, 2017

Hi, have you tried the last JAR? It should be a bit better: http://lupino3.github.io/edumips64/edumips64-latest.jar.

I'll try to prioritize this bugfix, thanks for the report!

@KeyC0de
Copy link

KeyC0de commented May 3, 2017

OMG! The problem is completely fixed! I just ran my same program and it took only 20 seconds for 40k cycles! I have only attempted to run it up to 10k cycles before and it took 15 minutes (it was impossible to continue). Thanks so much! The program is great! Config: 0ms delay, no visual sync

@lupino3
Copy link
Member

lupino3 commented May 3, 2017

Yeah, I think I fixed most performance issues in commit b0f654c, but the problem is that this broke part of the UI, and I had very little time in the meantime to devote to the project 😭

I'll try to push it to a releasable state as soon as possible!

@lupino3
Copy link
Member

lupino3 commented May 3, 2017

Actually @CobraL0rd would you be open to donating your large program under GPL to be used as a test case? In this way I can use it as a regression test for execution speed. Thanks!

@KeyC0de
Copy link

KeyC0de commented May 3, 2017

Sure. Once it's done. I'm on a last detail.

@lupino3
Copy link
Member

lupino3 commented May 3, 2017

Cool, thanks!

@lupino3 lupino3 changed the title The longer the simulation goes on, the lower the simulation rate UI slows down for very long programs (thousands of cycles) May 3, 2017
@lupino3 lupino3 changed the title UI slows down for very long programs (thousands of cycles) UI / execution slows down for very long programs (thousands of cycles) May 3, 2017
@lupino3
Copy link
Member

lupino3 commented May 3, 2017

I have renamed the issue because the main root cause of core engine slowness was fixed. I am going to do a release with the fix so that users can benefit from it, and address the UI slowness in the next bugfix release (#134 might also be related to this problem).

@lupino3
Copy link
Member

lupino3 commented May 3, 2017

The speed improvements are in the newly-released version v1.2.3 (https://github.com/lupino3/edumips64/releases/tag/v1.2.3). The UI still is buggy, but it's definitely better to release incremental improvements now and then focus on the UI for the next release. Thanks everyone for the bug reports and the help!

@KeyC0de
Copy link

KeyC0de commented May 4, 2017

My program is probably ready. It took one minute and a half to run 100k cycles, ~93k instructions. What would giving my program to be used as a "test case" entail?

@lupino3
Copy link
Member

lupino3 commented May 4, 2017

If you attach it to this bug under GPL license, I'll add it to the test suite.

If you want to do it yourself, you'd have to add the files to the src/test/resources directory, and add a new test case in EndToEndTests.java, similar to what was done in commit 0c62ada

One minute is pretty long, I hope we can get it down to a few seconds!

@KeyC0de
Copy link

KeyC0de commented May 4, 2017

This program was made for a university project of mine. I've got to clear some things first and i will give it to you within a month. Don't worry one bit. I am not going to forget about you. I just have to finish the class first. I don't want to risk my grades if my code leaks, or something else happens idk. I will give my program to you in precisely 1 month from now 100%.

By the way my professor told me that 1.5 minute for the simulation isn't bad..

Thanks again.

@lupino3
Copy link
Member

lupino3 commented May 4, 2017

Sounds good, thanks for the help! Hope version 1.2.3 works well for you :)

@lupino3
Copy link
Member

lupino3 commented May 16, 2017

@CobraL0rd BTW have you tried running after disabling the option "Sync graphics with the UI"? That should be significantly faster.

@KeyC0de
Copy link

KeyC0de commented May 24, 2017

Yes. I mentioned that it was a setting i disabled in my configuration.

@lupino3
Copy link
Member

lupino3 commented May 24, 2017

Yeah then I'd be grateful to get your program at some point, so I can profile the simulator's execution and see if I can find any bottlenecks. Thanks!

@KeyC0de
Copy link

KeyC0de commented May 26, 2017

Ok i just run my mips program on the same EDUMIPS64 edition in a Ubuntu Linux x86_64 machine and it took 5 seconds! Same Java 8 version. Before i run it on Windows 8.1. Strange and interesting since it runs in the java virtual machine and the underlying HW is the same (my same computer, just dual boot), the running time was 18 times faster.

@lupino3
Copy link
Member

lupino3 commented May 26, 2017

Hm, interesting! I don't have access right now to a Windows machine, but I may try to find one to debug this slowness.

Can you confirm the EduMIPS64 versions you are using?

@KeyC0de
Copy link

KeyC0de commented May 26, 2017

It is the latest version, 1.2.2.

@lupino3
Copy link
Member

lupino3 commented May 26, 2017

Latest is actually 1.2.3, which might be faster. And I implemented some improvements to Swing threading which will be in 1.2.4.

Would you mind please trying the JAR at http://lupino3.github.io/edumips64/edumips64-latest.jar on Windows and letting me know how quick it is?

Thanks for your help!

@KeyC0de
Copy link

KeyC0de commented May 26, 2017

Ok i didn't notice you were rolling out versions so fast. I tried 1.2.3. On linux it took 4-5 seconds (approximately the same). On Windows it lasted 5 more seconds 95 in total. Previously it run on 90 seconds. The display of results (input/output operations) is also much slower (displayed) on Windows. On linux it takes almost instantly, on Windows it takes about 2 seconds to display 150 output lines.

@lupino3
Copy link
Member

lupino3 commented May 26, 2017

Thanks! How about the latest JAR on Windows? I changed how UI rendering interacts with Swing (c591f4b and 2f49b2e) and I'd expect it to be way faster.

@KeyC0de
Copy link

KeyC0de commented May 27, 2017

I tested the latest jar on Windows too. I told you about it in my previous post. It takes 4-5 additional seconds 95 in total.

@KeyC0de
Copy link

KeyC0de commented Aug 27, 2017

@lupino3 Hey. I can provide you the program. Let me know if you want it and where do i post it. Sorry for the delay.

@lupino3
Copy link
Member

lupino3 commented Aug 27, 2017

@Nik-Lz yes, thanks. Feel free to either add it to a gist (gist.github.com), add it to the test directory (https://github.com/lupino3/edumips64/tree/master/src/test/resources) or send it via email. Thanks!

@KeyC0de
Copy link

KeyC0de commented Aug 27, 2017

I uploaded it on my Gist, along with its C implementation. Hopefully it can be of some use.

@lupino3
Copy link
Member

lupino3 commented Aug 27, 2017

Thanks, can you please share the link here?

@KeyC0de
Copy link

KeyC0de commented Aug 27, 2017

@lupino3
Copy link
Member

lupino3 commented Aug 29, 2017

Added your program as a unit test, thanks. It will be helpful to detect performance regressions.

I will try to troubleshoot slowness on Windows machines once I get access to one of them. Thanks!

lupino3 added a commit that referenced this issue Oct 8, 2017
Profiling revealed that a lot of time in unit tests is spent throwing
and catching the RAWException exception across method boundaries.

Apparently, throwing the exception within the method is much cheaper,
hence the changes in this commit.

In this commit, the Instruction.ID() method returns true when a RAW
conflict is detected, false otherwise. While this is not a clean
interface, it makes the longest unit test (testSetBitSort) roughly
30% faster.

Also, while changing the instructions, some of the simplifications
suggested by IntelliJ were applied.

This should mitigate in part Issue #132.
lupino3 added a commit that referenced this issue Oct 8, 2017
Profiling revealed that a lot of time in unit tests is spent throwing
and catching the RAWException exception across method boundaries.

Apparently, throwing the exception within the method is much cheaper,
hence the changes in this commit.

In this commit, the Instruction.ID() method returns true when a RAW
conflict is detected, false otherwise. While this is not a clean
interface, it makes the longest unit test (testSetBitSort) roughly
30% faster.

Also, while changing the instructions, some of the simplifications
suggested by IntelliJ were applied.

This should mitigate in part Issue #132.
@lupino3
Copy link
Member

lupino3 commented Jul 27, 2018

I am going to close this issue.

Various speed-ups have been implemented, and when disabling UI sync the execution is very fast.

Please reopen if you think this should be addressed differently. Thanks!

@lupino3 lupino3 closed this as completed Jul 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants