Iterations for a single time step are not synchronised on HELICS 3.2.1 #2408

Closed
Getnet-Ayele opened this issue Aug 4, 2022 · 19 comments
Labels
bug Issue concerning incorrect code operation

Comments

@Getnet-Ayele

Describe the bug
Iterations fall out of synchronization when the federates run at different speeds.

What is the expected behavior?
Iterations should be synchronized across federates, just as time steps are.

To Reproduce
Steps to reproduce the behavior:
Here is the dummy project:
https://github.com/Getnet-Ayele/HELICS_SAInt_DummyFederates

Environment (please complete the following information):

  • Operating System: Windows
  • Language Extension: C#
  • Compiler or setup process: Visual Studio 2019
  • HELICS version: 3.2.1 (as stated in the issue title)


@Getnet-Ayele Getnet-Ayele added the bug Issue concerning incorrect code operation label Aug 4, 2022
@phlptp
Member

phlptp commented Aug 4, 2022

I will take a close look and see if I can get the example running on my system

@phlptp
Member

phlptp commented Aug 4, 2022

I ran it. What should I be looking for, in terms of what it shows versus what you would expect?

@Getnet-Ayele
Author

Expected: The diverging time steps should be the same in both federates and the result should not change by repeating the simulation or by changing the delay in one of the federates. Below are some screenshots showing the diverging time steps for different simulations.

  • Setting the delay=0ms in the electric federate:

image
Repeating the simulation:
image

  • Setting the delay =100ms in the electric federate after the time step 5.

image
Repeating the simulation
image

@phlptp
Member

phlptp commented Aug 11, 2022

Are you able to test this with the develop branch? There was a minor fix related to iterations in a recent PR. I am not 100% sure it is related, but it is possibly connected.

@Getnet-Ayele
Author

Getnet-Ayele commented Aug 19, 2022

I have tested it with the "develop branch". The issue is still not resolved.
In the updated test case (available here https://github.com/Getnet-Ayele/HELICS_SAInt_DummyFederates), only Time-step 2 is supposed to fail (diverge). However, more time-steps are diverging when I increase the delay in the electric federate.

I used the following command for the iteration:

h.helicsFederateRequestTimeIterative(vfed, TimeStep, HelicsIterationRequest.HELICS_ITERATION_REQUEST_FORCE_ITERATION, out helics_iter_status);

@eranschweitzer
Contributor

You should probably be using helics_iteration_request_iterate_if_needed. That will wait to see whether any other federate has published new values. With force_iteration, the federate iterates immediately.
The trick is to use iterate_if_needed until you are satisfied and want to move on, and then not publish, which signals that wish to the federation.

@phlptp
Member

phlptp commented Sep 2, 2022

force_iteration does not immediately grant an iteration. At least in theory, it behaves identically to iterate_if_needed, except that where iterate_if_needed would have granted the next time step, force_iteration instead grants an additional iteration.
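
Read literally, that description amounts to a small decision rule. The sketch below only models the comment and is not HELICS code: the function name, the string labels, and the `new_data_pending` flag are illustrative assumptions, and the sketch says nothing about when a grant is issued, which is the point disputed in the following replies.

```python
def described_grant(request: str, new_data_pending: bool) -> str:
    """Model of the rule described in the comment above (not HELICS source).

    request: "iterate_if_needed" or "force_iteration" (illustrative labels).
    new_data_pending: True if another federate published new values
        at the current time (an assumed reading of "needed").
    """
    if request not in ("iterate_if_needed", "force_iteration"):
        raise ValueError(f"unsupported request: {request}")
    if new_data_pending:
        # Both request types iterate when there is new data to consume.
        return "iterating"
    # No new data: iterate_if_needed moves on, force_iteration iterates once more.
    return "iterating" if request == "force_iteration" else "next_step"
```

Under this reading, the two requests differ only in the case where nothing new has been published.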

@eranschweitzer
Contributor

That was not what I was seeing when I was working on the iteration example. I think we even discussed this at some point. In my experience, force_iteration behaved as described by @Getnet-Ayele: the federate issuing this call begins to reiterate immediately, irrespective of what else is going on in the federation.

@Getnet-Ayele
Author

I tried the iterate_if_needed option. The issue is still there. By comparison, force_iteration looks slightly better.

The reason could be that I have only two federates and they exchange data in both directions. If one is iterating, it is also publishing new values at each iteration. Hence, the other federate should also be iterating and trying to get those latest values. However, there is no way for each federate to know the iteration number of the other federate. Is there a way for HELICS to count the iterations of each federate internally for each time step?

In the code shown below, I called the helics_iteration_request_force_iteration just before each subscription and publication so that I can access the latest values available. If I remove those lines, the iteration misalignment gets worse.

image

@eranschweitzer
Contributor

eranschweitzer commented Sep 7, 2022 via email

@phlptp
Member

phlptp commented Sep 16, 2022

Can you try this again with the new 3.3.0 release? I am still working on translating the entire program into C to verify, but a bug that could be related has been fixed in the new release since we last spoke.

@Getnet-Ayele
Author

Thank you for the update!

I updated the HELICS library to 3.3.0. The issue is not resolved and looks the same as before.

I also tried a for loop instead of Thread.Sleep() to make the electric federate slower. The problem is still there.
image

@eranschweitzer
Contributor

Would it be possible for you to provide some sort of log output for how the federates are moving through time and exchanging information?

iterate_if_needed really should be the call you use, so the fact that force_iteration seems needed is an indication that something may not be set up correctly.

@Getnet-Ayele
Author

> Would it be possible for you to provide some sort of log output for how the federates are moving through time and exchanging information?
>
> iterate_if_needed really should be the call you use, so the fact that force_iteration seems needed is an indication that something may not be set up correctly.

@eranschweitzer
Here are the log files for different simulations. Additional log files of the results are now made available in the Output folder.

The reference log files are taken from the simulation which gave me better results, but they do not necessarily mean that the iterations are synchronized.

Let me know if you need further information about the log file or to run the dummy federates again.

@eranschweitzer
Contributor

@Getnet-Ayele, thank you for sharing those.
Here is my suspicion about at least some of what is going on.

  1. I think you might not be logging things quite correctly, judging by the iterate_if_needed log. Your iter counter appears to keep increasing even when your federates get iteration status 0, which means NEXT_STEP, which in turn means time should actually be progressing.
  2. Based on the order of things in the logs and on what you posted before, it appears that your federate code is structured as follows:
       • call request_time
       • call get_subscriptions
       • process
       • call send_publications
     The issue is that you really should be publishing before calling request_time, as the presence of new data is what tells HELICS to return the ITERATING status.
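
The ordering suggested above (publish, then request time, then read inputs and process) can be sketched as a per-time-step loop. This is a minimal sketch, not the author's federate: `h` stands for the `helics` Python module and is passed in as a parameter, the call `helicsFederateRequestTimeIterative` and the `helics_iteration_*` constants are assumed to match the Python binding, and `publish`, `read_inputs`, and `solve` are hypothetical callbacks.

```python
def iterate_at_time_step(h, fed, next_time, publish, read_inputs, solve):
    """Iterate at the current time until HELICS grants next_time.

    h:           the helics Python module, or a stand-in with the same names
    publish:     hypothetical callback that publishes this federate's outputs
    read_inputs: hypothetical callback returning the latest subscribed values
    solve:       hypothetical callback; returns True if the outputs changed
    """
    iteration = 0    # local counter, starts at zero for every time step
    changed = True   # publish once on entering the time step
    while True:
        if changed:
            publish()  # publish BEFORE requesting time; new data drives ITERATING
        granted, status = h.helicsFederateRequestTimeIterative(
            fed, next_time, h.helics_iteration_request_iterate_if_needed
        )
        if status == h.helics_iteration_result_next_step:
            return granted, iteration  # time advanced; stop counting here
        if status != h.helics_iteration_result_iterating:
            raise RuntimeError(f"unexpected iteration status: {status}")
        iteration += 1
        changed = solve(read_inputs())  # not publishing signals "ready to move on"
```

Because the counter is local to the function, it cannot keep growing after a NEXT_STEP grant, which addresses the logging concern in point 1.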

There is some general information about how to structure iteration here, including a pseudo-code example.
While these are more "python like" the concept should be the same.

There is further an example on how to set up iteration here and the code is here.
These examples are also of two different Federates, so there is actually a good deal of similarity between that and your problem (other than the examples are in Python and not C#).

I would strongly encourage you to take a look at those examples and compare their structure to yours.
If it is not the same, consider reworking your federate structure and see if that works.
If they are the same...then we probably do have a bug 😅

@Getnet-Ayele
Author

@eranschweitzer, thank you for your detailed feedback!

I will go through your suggestions and will let you know the outcome.

@Getnet-Ayele
Author

@eranschweitzer and @phlptp, thank you again for your feedback.

Following your suggestions, the iterations are now synchronised regardless of the delay between the federates. iterate_if_needed is the one that worked perfectly.

In addition to the order of publications, time_request, and subscriptions, I noticed that I was wrongly using:

  • h.helicsFederateEnterExecutingMode(vfed)
  • h.helicsFederateRequestTime(vfed, TimeStep)
  • h.helicsFederateRequestTimeIterative(vfed, TimeStep, HelicsIterationRequest.HELICS_ITERATION_REQUEST_FORCE_ITERATION)

Which are now replaced with:

  • h.helicsFederateEnterInitializingMode(vfed)
  • h.helicsFederateEnterExecutingModeIterative(vfed, HelicsIterationRequest.HELICS_ITERATION_REQUEST_ITERATE_IF_NEEDED)
  • h.helicsFederateRequestTimeIterative(vfed, TimeStep, HelicsIterationRequest.HELICS_ITERATION_REQUEST_ITERATE_IF_NEEDED)
  • a check of the returned iteration status after each iterative time request
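
The replacement startup sequence listed above can be sketched in Python as follows. This is a sketch under assumptions rather than the author's C# code: `h` stands for the `helics` Python module, whose `helicsFederateEnterInitializingMode` and `helicsFederateEnterExecutingModeIterative` calls are assumed to mirror the C# ones, and `on_iterating` is a hypothetical callback.

```python
def enter_executing_mode(h, fed, on_iterating):
    """Enter initializing mode, then iterate into executing mode.

    h:            the helics Python module, or a stand-in with the same names
    on_iterating: hypothetical callback run after each ITERATING result,
                  e.g. read initial inputs and publish only if values changed
    Returns the number of initialization iterations performed.
    """
    h.helicsFederateEnterInitializingMode(fed)
    rounds = 0
    while True:
        status = h.helicsFederateEnterExecutingModeIterative(
            fed, h.helics_iteration_request_iterate_if_needed
        )
        # Check the returned status after every iterative request.
        if status == h.helics_iteration_result_next_step:
            return rounds  # the federate is now in executing mode
        if status != h.helics_iteration_result_iterating:
            raise RuntimeError(f"unexpected iteration status: {status}")
        rounds += 1
        on_iterating()
```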

The log file can be found here. Log.txt

@Getnet-Ayele
Author

@eranschweitzer : Regarding the maximum iteration, HelicsProperties.HELICS_PROPERTY_INT_MAX_ITERATIONS.

In the absence of convergence, is the iterative time request supposed to return an h.helics_iteration_result_next_step status when the iteration count exceeds the maximum?

In my case, it was iterating beyond MAX_ITERATIONS. To overcome that I used while(Iter < MAX_ITERATIONS) {... ; Iter +=1;...} in the inner loop of the main co-simulation code in both federates.
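
The workaround described above, a local counter that bounds the inner loop, might look like the following in Python. This is only an illustration of the guard: `step` is a hypothetical callable standing in for one publish/request/read cycle, and what the federate does after hitting the cap is left to the caller.

```python
def iterate_with_cap(step, max_iterations):
    """Run step() until it stops iterating or the cap is reached.

    step: hypothetical callable performing one iteration cycle; it returns
          True while the federation is still iterating and False once the
          next time step has been granted.
    Returns (iterations_performed, cap_reached).
    """
    iteration = 0
    while iteration < max_iterations:
        if not step():
            return iteration, False  # granted the next step before the cap
        iteration += 1
    return iteration, True  # cap reached without a next-step grant
```

For the cap to keep the federates aligned, both federates would need to apply the same limit, as the author reports doing.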

@phlptp
Member

phlptp commented Oct 18, 2022

I will look into that. It is possible that some of the recent changes caused that property to stop working correctly.
